Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

revert smarter entrypoint script by 2.10 #78

Closed
maxheld83 opened this issue Feb 28, 2020 · 19 comments · Fixed by #136
Closed

revert smarter entrypoint script by 2.10 #78

maxheld83 opened this issue Feb 28, 2020 · 19 comments · Fixed by #136

Comments

@maxheld83
Copy link
Contributor

maxheld83 commented Feb 28, 2020

Edit from future: see #78 (comment) for reversion deadlines / strategy.

The smarter entrypoint script introduced in 51cb2b6 closing #54 may add some unwelcome complexity.
For example (maybe the only case), @zspitz noticed that the entrypoint script won't let pandoc containers accept pandoc arguments and input files in arbitrary orders (see pandoc/pandoc-action-example#3 and jgm/pandoc-website#38).

@tarleb suggested to improve the auto detection, but I'm wondering whether it would not just make more sense to revert to something simple:

ENTRYPOINT ["pandoc"]
CMD ["--help"]

I know a lot of thought went into #54 so feel free to close this.

My expectation was actually that there'd be no such cleverness, and it surprised me that this could be a source of problems; much like @zspitz I would've never guessed it.

In #54, @tarleb mentioned that:

New users are repeatedly irritated by our use of ENTRYPOINT.

But isn't ENTRYPOINT ["pandoc"] the default for containers?
And users can always override with docker run --entrypoint=/bin/bash pandoc ls, right?

I'm also not sure the helper script in the dockerfile docs apply here:

The ENTRYPOINT instruction can also be used in combination with a helper script, allowing it to function in a similar way to the command above, even when starting the tool may require more than one step.

(emphasis added).

But this (more than one step) isn't the case for pandoc, right?

@svenevs
Copy link
Member

svenevs commented Feb 28, 2020

This is somewhat confusing ... I'm both able to and not able to reproduce this.

$ docker run --rm --volume "$(pwd):/data" pandoc/latex:latest test.txt -o test.html
$ docker run --rm --volume "$(pwd):/data" pandoc/latex:latest test.txt -o test.pdf

Works exactly as expected. But if I do

$ docker run --rm --volume "$(pwd):/data" pandoc/core:latest docs/patient-guide.md -o output.pdf -t html
/usr/local/bin/docker-entrypoint.sh: exec: line 11: docs/patient-guide.md: Permission denied

It seems like the ash shell's command -v does something dumb with paths that have a slash in them?

# (in docker container)
$ command -v docs/patient-guide.md
docs/patient-guide.md
$ command -v test.txt

And apparently it's not a builtin, but a script floating around:

#!/bin/sh
# $FreeBSD: src/usr.bin/alias/generic.sh,v 1.2 2005/10/24 22:32:19 cperciva Exp $
# This file is in the public domain.
builtin `echo ${0##*/} | tr \[:upper:] \[:lower:]` ${1+"$@"}
# file: /usr/bin/command

I'm not sure what to make of this.

@tarleb
Copy link
Member

tarleb commented Mar 2, 2020

Wow, that's pretty awful. How about we switch to which? Not exactly the same, but arguably closer to what users might expect.

Some more googling shows that it's a bad idea

@tarleb
Copy link
Member

tarleb commented Mar 2, 2020

I created a PR which double-checks that the value returned by command -v is actually executable. That change should make unexpected behavior much less likely.

@svenevs
Copy link
Member

svenevs commented Mar 4, 2020

I see that @maxheld83 approves, but to be honest the more I think about it the more I think this isn't robust enough. @zspitz reports in pandoc/pandoc-action-example#3 (comment)

fails with an error:

Syntax error: unexpected newline

This is subtle, but fundamentally different than the error I reported:

/usr/local/bin/docker-entrypoint.sh: exec: line 11: docs/patient-guide.md: Permission denied

Why I'm worried:

$ ls -al tidepool-hebrew-docs/docs/*.md
-rwxrwxr-x. 1 sven sven 6.3K Mar  4 13:46 tidepool-hebrew-docs/docs/patient-guide.md*
-rwxrwxr-x. 1 sven sven   74 Mar  4 13:46 tidepool-hebrew-docs/docs/test.md*

The executable bit on those files is set, meaning that the proposed patch will produce the exact same error message. We either need to be smarter, or dumber. In this particular scenario, we can coach @zspitz to remove the executable bit from their .md files, but I anticipate this same issue will come up again and unfortunately the errors associated with this are not going to be the same and may cause a lot of confusion -- the error message depends entirely on what the contents of the file are.

@tarleb
Copy link
Member

tarleb commented Mar 4, 2020

I see four options right now:

  1. Revert to pandoc as entrypoint. Pro: simple and straight-forward. Con: You have to know about the entrypoint setting in order to change it; caused confusion before; might break more recent setups relying on the new script.

  2. Unset entrypoint. Pros: very simple, makes using pandoc-citeproc and pandoc equally easy, makes container behaves like many other standard containers. Cons: breaks almost all existing setups.

  3. Update script as proposed in PR. Pros: least amount of code changed; existing workflows should remain unaffected. Cons: not robust, potentially bad for windows users.
    3.5 Use bash instead of ash for docker-entrypoint.sh: pros and cons as for 3.

  4. Try to make the entrypoint even smarter, e.g. by checking the first argument for known document file extensions. Pros: things should Just Work™ in 99% of all cases. Cons: additional complexity and potential rabbit hole.

All of these options are bad in their own way. I prefer 3, but 4 should be manageable, too.

@zspitz
Copy link

zspitz commented Mar 4, 2020

@svenevs

we can coach @zspitz to remove the executable bit from their .md files

I would note that I am pretty much entirely developing on Windows, where (IIUC) I have to do this through Git. Admittedly, it could be argued that anyone who is familiar with Docker is also familiar with Linux permissions; but that isn't true in my case -- I've gotten to the Docker container via GitHub Actions, and have virtually no prior experience with Docker.

@svenevs
Copy link
Member

svenevs commented Mar 5, 2020

it could be argued that anyone who is familiar with Docker is also familiar with Linux permissions; but that isn't true in my case -- I've gotten to the Docker container via GitHub Actions, and have virtually no prior experience with Docker.

Yeah, I definitely don't think you're alone in that! Permissions are important and all, but I think we can do better. Here's what I was thinking of doing in pseudocode:

if -x command -v ${1}:
    type=get_file_type ${1}
    if type is not text (e.g., binary):
        # this would be the (probably exceedingly rare) case
        # of somebody passing through a precompiled binary
        use ${1} rather than push pandoc in front
    else
        # file type is text, but it could still be a script
        if first two chars are exactly `#!` (the shebang) \
            *and* -x command -v ${shebang of ${1}}:
            use ${1} directly
        else
            do pandoc $@
        fi
    fi
fi

Basically, the idea is see if it's executable. If it is, try and extract the shebang line and see if it is a usable command. The likelihood that anybody has a markdown document that starts with exactly #! and contains a meaningful interpreter is very small in my opinion. We also want to avoid any magic file extension hacks, because of things like people using README rather than README.md or something.

Still fleshing out the implementation, but wanted to solicit input. I wonder, for example, if we need to go the extra step of checking if the shebang line has a valid command (/bin/sh extracted from #!/bin/sh or #! /bin/sh) or if we can simply assume that the presence of exactly #! is enough to say "ok then it's a script".

P.S. In my previous post I mentioned /usr/bin/command. I think I'm going crazy, but I can't find that file anymore and haven't the slightest idea how it works. But the / behavior is documented in the manpages (apk add man man-pages and then man command).

@tarleb
Copy link
Member

tarleb commented Mar 5, 2020

Sounds good! Would it be sufficient to check "first characters are #!" and skip "type is not text"? That would save us a package, and I don't think binary/text gives that much information.
Edit: Ignore me. I need more coffee.

if [ "${1#-}" != "${1}" ] || \
   [ ! -x "$(command -v "${1}")" ] || \
   [ "$(head -c2 "${1}")" != '#!' ]
then
  set -- pandoc "$@"
fi

exec "$@"

@maxheld83
Copy link
Contributor Author

Not my call, but I'm in favor of 1)

  1. Revert to pandoc as entrypoint. Pro: simple and straight-forward. Con: You have to know about the entrypoint setting in order to change it; caused confusion before; might break more recent setups relying on the new script.

My (limited) experience with images for popular tools (such as pandoc) has been a simple

ENTRYPOINT pandoc
CMD --help

So I've come to expect that.

I'm personally not much worried about users being confused by that, because that just means they need to understand how docker ENTRYPOINT and CMD interact, which is easily googleable and which is explained everywhere up and down the internet, so we can link to it.

My (limited) experience has been that the multiplicative complexity of "smart" solutions bites you in the behind in the medium and long term.

@tarleb
Copy link
Member

tarleb commented Mar 5, 2020

@svenevs and I had a quick chat about this and I think we now both agree with you @maxheld83.

@alerque
Copy link
Collaborator

alerque commented Apr 20, 2020

Isn't there a simpler solution to keep the smarts and not loose the argument issue? Check if the arg is an existing file first, then if it begins with a dash, then if it is a valid shell command. The only possible argument that can be out of order is the input filename(s) which will exist as files at run-time.

@maxheld83
Copy link
Contributor Author

I am not super experienced with a whole bunch of other docker images, but the ones I have used such as the R images did not seem to feature this kind of cleverness, so I am used to overwriting the entrypoint if I need to.
Are there examples of mainstream images that include this kind of cleverness?
I think it would be good from a UX point to just do what most similar (ie. some CLI tool) images do.
Very possible that I am completely wrong here and have just used weird containers to come to this kind of expectation.

@tarleb
Copy link
Member

tarleb commented Apr 20, 2020

We took inspiration (and most of the code for docker-entrypoint) from nodejs/docker-node. Other examples which do entrypoint magic of sorts:

and so on.

But there doesn't seem to be consensus about how the magic is supposed to work. E.g. some projects require users to specify an executable as argument, but run additional magic for some executable names, e.g. elasticsearch.

Other projects keep it simple: Python doesn't set an entrypoint at all, but just sets CMD ["python3"]. GCC sets neither ENTRYPOINT nor CMD.

It seems hard to identify a common theme here. In the end, there will always be users who will be surprised by our choices.

I tend to agree with @maxheld83 that simplicity trumps magic, especially now that we have even better documentation in the form of pandoc-action-examples. I'd either go for ENTRYPOINT ["/usr/bin/pandoc"] or CMD ["pandoc"].

@svenevs
Copy link
Member

svenevs commented Jul 8, 2020

We are gearing up for an exciting new quasi-release that adds both ubuntu based images and crossref. This will be getting pushed out as 2.9.2.1 (2.10 will come along soon enough!). The plan:

  1. pandoc/core:2.9.2.1 and pandoc/latex:2.9.2.1 will retain their original entrypoint (and latex 2019 version).
  2. pandoc/crossref:2.9.2.1 (to be released soon) will have /usr/local/bin/pandoc.
  3. All ubunutu 2.9.2.1 images will have /usr/local/bin/pandoc.

In other words, all new images being released will have the entrypoint reverted to /usr/local/bin/pandoc. Please update your usage to explicitly supply a non-pandoc --entrypoint if needed, by 2.10's release all images will have a /usr/local/bin/pandoc entrypoint again.

Isn't there a simpler solution to keep the smarts and not loose the argument issue?

Unfortunately, there is no robust enough solution to warrant its use given the kinds of errors and confusion it causes. It basically boils down to (a) pandoc is super flexible wrt argument ordering, and (b) the lack of a truly reliable way to check if $1 is intended to be a script or a file input to pandoc.

CC: @RicardoLinck (#105 interested party)

@svenevs svenevs changed the title revert smarter entrypoint script revert smarter entrypoint script by 2.10 Jul 8, 2020
@svenevs svenevs pinned this issue Jul 8, 2020
@K4zuki
Copy link
Contributor

K4zuki commented Jul 10, 2020

May I ask which feature(s) to be installed in which image(s); how ? cells will be filled?
I know one could combine latex and crossref by COPY feature in his Dockerfile so it is ok for me even if both are 'N'.

↓ variant \ feature → pandoc pandoc-crossref LaTeX
*-core Y N N
*-crossref Y Y ?
*-latex Y ? Y

@tarleb
Copy link
Member

tarleb commented Jul 10, 2020

This is our current design:

↓ variant \ feature → pandoc pandoc-crossref LaTeX
*-core Y N N
*-crossref Y Y N
*-latex Y Y Y

It has the drawback that we can publish -latex images only if pandoc-crossref is available for the underlying pandoc version -- which, for example, is not yet the case for 2.10.

@K4zuki
Copy link
Contributor

K4zuki commented Jul 11, 2020

@tarleb thanks for answering. I would wait for crossref release for every pandoc release (I have been doing so) so It is still fine for me.

svenevs added a commit that referenced this issue Jul 22, 2020
need to preserve docker entry point and tex version on these images
see #78
@alerque
Copy link
Collaborator

alerque commented Feb 21, 2021

Just for the record since this discussion seems to have wandered into other areas—

Regarding the issue of entry points, since my original comment I have come around to the OP and @svenevs' way of thinking such as here.

So called "smart" entry points are more trouble than they are worth. I started using them on several other projects and included a lot more smarts. It was pretty snazzy, but the edge cases just kept cropping up. I've since reverted to plain hard coded entry points that are clearly the main purpose of the container and left all the other uses to specify their own entry point. This has cleared up a lot of confusion, caused less unexpected CI errors, etc.

I would support the same move for these Pandoc containers: pandoc should be the entry point with nothing fancier. Specifying --entrypoint=sh makes it quite clear what the syntax of arguments should be.

svenevs added a commit that referenced this issue Feb 21, 2021
@svenevs
Copy link
Member

svenevs commented Feb 21, 2021

Thanks @alerque it seems we fell down on some book-keeping, see #136 I think we can close this issue once that's gone. The entrypoint is currently /usr/local/bin/pandoc.

Unify distro images automation moved this from To do to Done Feb 21, 2021
alerque pushed a commit that referenced this issue Feb 21, 2021
2.10 automation moved this from To do to Done Feb 21, 2021
@svenevs svenevs unpinned this issue Sep 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
2.10
  
Done
Development

Successfully merging a pull request may close this issue.

6 participants