revert smarter entrypoint script by 2.10 #78

maxheld83 · 2020-02-28T17:16:35Z

Edit from future: see #78 (comment) for reversion deadlines / strategy.

The smarter entrypoint script introduced in 51cb2b6 closing #54 may add some unwelcome complexity.
For example (maybe the only case), @zspitz noticed that the entrypoint script won't let pandoc containers accept pandoc arguments and input files in arbitrary orders (see pandoc/pandoc-action-example#3 and jgm/pandoc-website#38).

@tarleb suggested to improve the auto detection, but I'm wondering whether it would not just make more sense to revert to something simple:

ENTRYPOINT ["pandoc"]
CMD ["--help"]

I know a lot of thought went into #54 so feel free to close this.

My expectation was actually that there'd be no such cleverness, and it surprised me that this could be a source of problems; much like @zspitz I would've never guessed it.

In #54, @tarleb mentioned that:

New users are repeatedly irritated by our use of ENTRYPOINT.

But isn't ENTRYPOINT ["pandoc"] the default for containers?
And users can always override with docker run --entrypoint=/bin/bash pandoc ls, right?

I'm also not sure the helper script in the dockerfile docs apply here:

The ENTRYPOINT instruction can also be used in combination with a helper script, allowing it to function in a similar way to the command above, even when starting the tool may require more than one step.

(emphasis added).

But this (more than one step) isn't the case for pandoc, right?

The text was updated successfully, but these errors were encountered:

svenevs · 2020-02-28T20:30:59Z

This is somewhat confusing ... I'm both able to and not able to reproduce this.

$ docker run --rm --volume "$(pwd):/data" pandoc/latex:latest test.txt -o test.html
$ docker run --rm --volume "$(pwd):/data" pandoc/latex:latest test.txt -o test.pdf

Works exactly as expected. But if I do

$ docker run --rm --volume "$(pwd):/data" pandoc/core:latest docs/patient-guide.md -o output.pdf -t html
/usr/local/bin/docker-entrypoint.sh: exec: line 11: docs/patient-guide.md: Permission denied

It seems like the ash shell's command -v does something dumb with paths that have a slash in them?

# (in docker container)
$ command -v docs/patient-guide.md
docs/patient-guide.md
$ command -v test.txt

And apparently it's not a builtin, but a script floating around:

#!/bin/sh
# $FreeBSD: src/usr.bin/alias/generic.sh,v 1.2 2005/10/24 22:32:19 cperciva Exp $
# This file is in the public domain.
builtin `echo ${0##*/} | tr \[:upper:] \[:lower:]` ${1+"$@"}
# file: /usr/bin/command

I'm not sure what to make of this.

tarleb · 2020-03-02T10:04:20Z

~~Wow, that's pretty awful. How about we switch to which? Not exactly the same, but arguably closer to what users might expect.~~

Some more googling shows that it's a bad idea

tarleb · 2020-03-02T11:14:07Z

I created a PR which double-checks that the value returned by command -v is actually executable. That change should make unexpected behavior much less likely.

svenevs · 2020-03-04T18:49:29Z

I see that @maxheld83 approves, but to be honest the more I think about it the more I think this isn't robust enough. @zspitz reports in pandoc/pandoc-action-example#3 (comment)

fails with an error:

Syntax error: unexpected newline

This is subtle, but fundamentally different than the error I reported:

/usr/local/bin/docker-entrypoint.sh: exec: line 11: docs/patient-guide.md: Permission denied

Why I'm worried:

$ ls -al tidepool-hebrew-docs/docs/*.md
-rwxrwxr-x. 1 sven sven 6.3K Mar  4 13:46 tidepool-hebrew-docs/docs/patient-guide.md*
-rwxrwxr-x. 1 sven sven   74 Mar  4 13:46 tidepool-hebrew-docs/docs/test.md*

The executable bit on those files is set, meaning that the proposed patch will produce the exact same error message. We either need to be smarter, or dumber. In this particular scenario, we can coach @zspitz to remove the executable bit from their .md files, but I anticipate this same issue will come up again and unfortunately the errors associated with this are not going to be the same and may cause a lot of confusion -- the error message depends entirely on what the contents of the file are.

tarleb · 2020-03-04T20:44:54Z

I see four options right now:

Revert to pandoc as entrypoint. Pro: simple and straight-forward. Con: You have to know about the entrypoint setting in order to change it; caused confusion before; might break more recent setups relying on the new script.
Unset entrypoint. Pros: very simple, makes using pandoc-citeproc and pandoc equally easy, makes container behaves like many other standard containers. Cons: breaks almost all existing setups.
Update script as proposed in PR. Pros: least amount of code changed; existing workflows should remain unaffected. Cons: not robust, potentially bad for windows users.
3.5 Use bash instead of ash for docker-entrypoint.sh: pros and cons as for 3.
Try to make the entrypoint even smarter, e.g. by checking the first argument for known document file extensions. Pros: things should Just Work™ in 99% of all cases. Cons: additional complexity and potential rabbit hole.

All of these options are bad in their own way. I prefer 3, but 4 should be manageable, too.

zspitz · 2020-03-04T22:11:12Z

@svenevs

we can coach @zspitz to remove the executable bit from their .md files

I would note that I am pretty much entirely developing on Windows, where (IIUC) I have to do this through Git. Admittedly, it could be argued that anyone who is familiar with Docker is also familiar with Linux permissions; but that isn't true in my case -- I've gotten to the Docker container via GitHub Actions, and have virtually no prior experience with Docker.

svenevs · 2020-03-05T09:38:09Z

it could be argued that anyone who is familiar with Docker is also familiar with Linux permissions; but that isn't true in my case -- I've gotten to the Docker container via GitHub Actions, and have virtually no prior experience with Docker.

Yeah, I definitely don't think you're alone in that! Permissions are important and all, but I think we can do better. Here's what I was thinking of doing in pseudocode:

if -x command -v ${1}:
    type=get_file_type ${1}
    if type is not text (e.g., binary):
        # this would be the (probably exceedingly rare) case
        # of somebody passing through a precompiled binary
        use ${1} rather than push pandoc in front
    else
        # file type is text, but it could still be a script
        if first two chars are exactly `#!` (the shebang) \
            *and* -x command -v ${shebang of ${1}}:
            use ${1} directly
        else
            do pandoc $@
        fi
    fi
fi

Basically, the idea is see if it's executable. If it is, try and extract the shebang line and see if it is a usable command. The likelihood that anybody has a markdown document that starts with exactly #! and contains a meaningful interpreter is very small in my opinion. We also want to avoid any magic file extension hacks, because of things like people using README rather than README.md or something.

Still fleshing out the implementation, but wanted to solicit input. I wonder, for example, if we need to go the extra step of checking if the shebang line has a valid command (/bin/sh extracted from #!/bin/sh or #! /bin/sh) or if we can simply assume that the presence of exactly #! is enough to say "ok then it's a script".

P.S. In my previous post I mentioned /usr/bin/command. I think I'm going crazy, but I can't find that file anymore and haven't the slightest idea how it works. But the / behavior is documented in the manpages (apk add man man-pages and then man command).

tarleb · 2020-03-05T09:57:09Z

Sounds good! ~~Would it be sufficient to check "first characters are #!" and skip "type is not text"? That would save us a package, and I don't think binary/text gives that much information~~.
Edit: Ignore me. I need more coffee.

if [ "${1#-}" != "${1}" ] || \
   [ ! -x "$(command -v "${1}")" ] || \
   [ "$(head -c2 "${1}")" != '#!' ]
then
  set -- pandoc "$@"
fi

exec "$@"

maxheld83 · 2020-03-05T16:49:19Z

Not my call, but I'm in favor of 1)

Revert to pandoc as entrypoint. Pro: simple and straight-forward. Con: You have to know about the entrypoint setting in order to change it; caused confusion before; might break more recent setups relying on the new script.

My (limited) experience with images for popular tools (such as pandoc) has been a simple

ENTRYPOINT pandoc
CMD --help

So I've come to expect that.

I'm personally not much worried about users being confused by that, because that just means they need to understand how docker ENTRYPOINT and CMD interact, which is easily googleable and which is explained everywhere up and down the internet, so we can link to it.

My (limited) experience has been that the multiplicative complexity of "smart" solutions bites you in the behind in the medium and long term.

tarleb · 2020-03-05T21:42:56Z

@svenevs and I had a quick chat about this and I think we now both agree with you @maxheld83.

alerque · 2020-04-20T12:47:11Z

Isn't there a simpler solution to keep the smarts and not loose the argument issue? Check if the arg is an existing file first, then if it begins with a dash, then if it is a valid shell command. The only possible argument that can be out of order is the input filename(s) which will exist as files at run-time.

maxheld83 · 2020-04-20T14:36:52Z

I am not super experienced with a whole bunch of other docker images, but the ones I have used such as the R images did not seem to feature this kind of cleverness, so I am used to overwriting the entrypoint if I need to.
Are there examples of mainstream images that include this kind of cleverness?
I think it would be good from a UX point to just do what most similar (ie. some CLI tool) images do.
Very possible that I am completely wrong here and have just used weird containers to come to this kind of expectation.

tarleb · 2020-04-20T20:07:06Z

We took inspiration (and most of the code for docker-entrypoint) from nodejs/docker-node. Other examples which do entrypoint magic of sorts:

and so on.

But there doesn't seem to be consensus about how the magic is supposed to work. E.g. some projects require users to specify an executable as argument, but run additional magic for some executable names, e.g. elasticsearch.

Other projects keep it simple: Python doesn't set an entrypoint at all, but just sets CMD ["python3"]. GCC sets neither ENTRYPOINT nor CMD.

It seems hard to identify a common theme here. In the end, there will always be users who will be surprised by our choices.

I tend to agree with @maxheld83 that simplicity trumps magic, especially now that we have even better documentation in the form of pandoc-action-examples. I'd either go for ENTRYPOINT ["/usr/bin/pandoc"] or CMD ["pandoc"].

svenevs · 2020-07-08T08:40:48Z

We are gearing up for an exciting new quasi-release that adds both ubuntu based images and crossref. This will be getting pushed out as 2.9.2.1 (2.10 will come along soon enough!). The plan:

pandoc/core:2.9.2.1 and pandoc/latex:2.9.2.1 will retain their original entrypoint (and latex 2019 version).
pandoc/crossref:2.9.2.1 (to be released soon) will have /usr/local/bin/pandoc.
All ubunutu 2.9.2.1 images will have /usr/local/bin/pandoc.

In other words, all new images being released will have the entrypoint reverted to /usr/local/bin/pandoc. Please update your usage to explicitly supply a non-pandoc --entrypoint if needed, by 2.10's release all images will have a /usr/local/bin/pandoc entrypoint again.

Isn't there a simpler solution to keep the smarts and not loose the argument issue?

Unfortunately, there is no robust enough solution to warrant its use given the kinds of errors and confusion it causes. It basically boils down to (a) pandoc is super flexible wrt argument ordering, and (b) the lack of a truly reliable way to check if $1 is intended to be a script or a file input to pandoc.

CC: @RicardoLinck (#105 interested party)

K4zuki · 2020-07-10T13:34:12Z

May I ask which feature(s) to be installed in which image(s); how ? cells will be filled?
I know one could combine latex and crossref by COPY feature in his Dockerfile so it is ok for me even if both are 'N'.

↓ variant \ feature →	pandoc	pandoc-crossref	LaTeX
*-core	Y	N	N
*-crossref	Y	Y	?
*-latex	Y	?	Y

tarleb · 2020-07-10T13:40:57Z

This is our current design:

↓ variant \ feature →	pandoc	pandoc-crossref	LaTeX
*-core	Y	N	N
*-crossref	Y	Y	N
*-latex	Y	Y	Y

It has the drawback that we can publish -latex images only if pandoc-crossref is available for the underlying pandoc version -- which, for example, is not yet the case for 2.10.

K4zuki · 2020-07-11T02:27:10Z

@tarleb thanks for answering. I would wait for crossref release for every pandoc release (I have been doing so) so It is still fine for me.

need to preserve docker entry point and tex version on these images see #78

alerque · 2021-02-21T06:52:40Z

Just for the record since this discussion seems to have wandered into other areas—

Regarding the issue of entry points, since my original comment I have come around to the OP and @svenevs' way of thinking such as here.

So called "smart" entry points are more trouble than they are worth. I started using them on several other projects and included a lot more smarts. It was pretty snazzy, but the edge cases just kept cropping up. I've since reverted to plain hard coded entry points that are clearly the main purpose of the container and left all the other uses to specify their own entry point. This has cleared up a lot of confusion, caused less unexpected CI errors, etc.

I would support the same move for these Pandoc containers: pandoc should be the entry point with nothing fancier. Specifying --entrypoint=sh makes it quite clear what the syntax of arguments should be.

Fixes #78.

svenevs · 2021-02-21T07:36:40Z

Thanks @alerque it seems we fell down on some book-keeping, see #136 I think we can close this issue once that's gone. The entrypoint is currently /usr/local/bin/pandoc.

Fixes #78.

maxheld83 mentioned this issue Feb 28, 2020

Add note about argument order pandoc/pandoc-action-example#3

Closed

maxheld83 mentioned this issue Apr 20, 2020

Install additional packages to the docker file pandoc/pandoc-action-example#6

Closed

This was referenced May 30, 2020

Run first command only if it is executable #79

Closed

docker-entrypoint.sh misbehaves when input files are under subdirectories #86

Closed

tarleb added this to To do in Unify distro images May 30, 2020

tarleb mentioned this issue Jul 6, 2020

pandoc/core:2.9.2.1 docker-entrypoint.sh not found #105

Closed

svenevs added this to To do in 2.10 via automation Jul 8, 2020

svenevs changed the title ~~revert smarter entrypoint script~~ revert smarter entrypoint script by 2.10 Jul 8, 2020

svenevs pinned this issue Jul 8, 2020

svenevs added a commit that referenced this issue Jul 22, 2020

stashing planned hacks for pandoc/{core,latex}:2.9.2.1 post xref

8c45017

need to preserve docker entry point and tex version on these images see #78

svenevs mentioned this issue Jul 28, 2020

openBinaryFile: does not exist (No such file or directory) #116

Closed

svenevs added a commit that referenced this issue Feb 21, 2021

remove no longer used docker-entrypoint.sh

902e2a2

Fixes #78.

svenevs mentioned this issue Feb 21, 2021

remove no longer used docker-entrypoint.sh #136

Merged

alerque closed this as completed in #136 Feb 21, 2021

Unify distro images automation moved this from To do to Done Feb 21, 2021

alerque pushed a commit that referenced this issue Feb 21, 2021

Remove no longer used docker-entrypoint.sh (#136)

0abc049

Fixes #78.

2.10 automation moved this from To do to Done Feb 21, 2021

svenevs unpinned this issue Sep 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

revert smarter entrypoint script by 2.10 #78

revert smarter entrypoint script by 2.10 #78

maxheld83 commented Feb 28, 2020 •

edited by svenevs

svenevs commented Feb 28, 2020

tarleb commented Mar 2, 2020 •

edited

tarleb commented Mar 2, 2020

svenevs commented Mar 4, 2020

tarleb commented Mar 4, 2020

zspitz commented Mar 4, 2020 •

edited

svenevs commented Mar 5, 2020

tarleb commented Mar 5, 2020 •

edited

maxheld83 commented Mar 5, 2020

tarleb commented Mar 5, 2020

alerque commented Apr 20, 2020

maxheld83 commented Apr 20, 2020

tarleb commented Apr 20, 2020

svenevs commented Jul 8, 2020

K4zuki commented Jul 10, 2020

tarleb commented Jul 10, 2020

K4zuki commented Jul 11, 2020

alerque commented Feb 21, 2021 •

edited

svenevs commented Feb 21, 2021

revert smarter entrypoint script by 2.10 #78

revert smarter entrypoint script by 2.10 #78

Comments

maxheld83 commented Feb 28, 2020 • edited by svenevs

svenevs commented Feb 28, 2020

tarleb commented Mar 2, 2020 • edited

tarleb commented Mar 2, 2020

svenevs commented Mar 4, 2020

tarleb commented Mar 4, 2020

zspitz commented Mar 4, 2020 • edited

svenevs commented Mar 5, 2020

tarleb commented Mar 5, 2020 • edited

maxheld83 commented Mar 5, 2020

tarleb commented Mar 5, 2020

alerque commented Apr 20, 2020

maxheld83 commented Apr 20, 2020

tarleb commented Apr 20, 2020

svenevs commented Jul 8, 2020

K4zuki commented Jul 10, 2020

tarleb commented Jul 10, 2020

K4zuki commented Jul 11, 2020

alerque commented Feb 21, 2021 • edited

svenevs commented Feb 21, 2021

maxheld83 commented Feb 28, 2020 •

edited by svenevs

tarleb commented Mar 2, 2020 •

edited

zspitz commented Mar 4, 2020 •

edited

tarleb commented Mar 5, 2020 •

edited

alerque commented Feb 21, 2021 •

edited