Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harden shell scripts with improved robustness & verbosity #233

Merged
merged 10 commits into from Jun 27, 2019

Conversation

Projects
None yet
3 participants
@dhimmel
Copy link
Member

commented Jun 10, 2019

Writing good shell scripts that provide useful debugging information seems to be challenging. This PR attempts to adopt best practices as discussed here.

@dongbohu @michaelmhoffman @agitter feel free to suggest any other ways our shell scripts could be improved.

@michaelmhoffman

This comment has been minimized.

Copy link
Contributor

commented Jun 11, 2019

This is the template I use for shell scripts:

#!/usr/bin/env bash

## script.sh: <short description here>

## Copyright 2019 Michael M. Hoffman <michael.hoffman@utoronto.ca>

set -o nounset -o pipefail -o errexit

if [[ $# != 0 || "${1:-}" == "--help" || "${1:-}" == "-h" ]]; then
    echo usage: "$0"
    exit 2
fi

Rationale/discussion:

  1. I believe there are several bashisms in this script already. I think it's fine to use bashisms because it is hard to imagine running this in an environment where bash isn't already there. But if so you should specify that it is bash at the top. (Also use bash build/script.sh instead of sh build/script.sh because the latter isn't going to work for some versions of sh.)

  2. I recommend set -o pipefail because it will catch some things you want to be caught by set -o errexit but are masked.

  3. Checking arguments and providing usage information is a good idea too.

  4. I do not recommend set -x/set -o xtrace. Producing that level of noise by default violates best practices of Unix command-line interfaces, and leads to alert fatigue and people ignoring important information. For troubleshooting you can always run bash -x script.sh. As I use them, set -o nounset -o pipefail -o errexit are really failsafes for edge cases a programmer didn't consider rather than an expected path. You can also use trap ... ERR or trap ... EXIT to do something additional in case of unexpected failure (like suggesting running the script with bash -x).

Additional comments on shell style that jump out at me:

  1. You used ${...} in this patch in many places where you probably meant to use $(...), I don't think it will work in its current form, and I'm surprised it passed CI.

  2. It is good practice to almost always use double-quotes around parameter expansion to avoid any problems with weird characters. For example --bibliography=$BIBLIOGRAPHY_PATH --> --bibliography="$BIBLIOGRAPHY_PATH"

  3. ...PATH is usually used for a colon-separated list of directories, not a single file or directory

  4. You may want some of these variables to be set only in cases where the user hasn't set and exported them. You can do that with a formulation like TZ="${TZ:-Etc/UTC}". I find overriding user settings of TZ and LC_ALL rather aggressive. If the latter is only to change Python settings you probably want to use PYTHONIOENCODING instead or even better fix it in the Python script itself.

  5. I would advise adding shellcheck to your editor and your CI workflow which would catch some of the errors above.

  6. Debug info should go to stderr, not stdout. You want to use echo >&2 "...". If the message isn't hard-coded and you are doing any shell expansion, printf is recommended instead.

  7. You don't need ["${BUILD_PDF:-true}" != "false"], just ["${BUILD_PDF:-}" != "false"]. Better yet, the usual convention is to test [ "${BUILD_PDF:-}" ] which will let a user set BUILD_PDF to anything to mean true and empty or unset to mean false. Using a command-line argument is a more expected interface in most cases though.

You may also find the Hoffman Lab application command-line user-interface checklist helpful.

@dhimmel

This comment has been minimized.

Copy link
Member Author

commented Jun 11, 2019

Thanks @michaelmhoffman for the excellent feedback and shell wisdom.

I'll work on integrating your points. In the longterm, it is not clear that we will continue managing the build with a shell script, rather than say migrating the pipeline into Python. So given that consideration, I will probably avoid large changes like adding command-line arguments to the shell scripts... until at least we have a better idea of the ideal future architecture. However, many other suggestions are immediately actionable and I'm excited to implement them!

dhimmel added some commits Jun 11, 2019

@@ -56,7 +56,7 @@ pandoc --verbose \
DOCKER_EXISTS="$(command -v docker || true)"

# Create PDF output (unless BUILD_PDF environment variable equals "false")
if [ "${BUILD_PDF:-true}" != "false" ] && [ -z "$DOCKER_EXISTS" ]; then
if [ "$(BUILD_PDF:-true)" != "false" ] && [ -z "$DOCKER_EXISTS" ]; then

This comment has been minimized.

Copy link
@michaelmhoffman

michaelmhoffman Jun 11, 2019

Contributor

This should be parameter expansion (${BUILD_PDF:-true}) not command substitution ($(command -v docker)). Same below for some variables.

This comment has been minimized.

Copy link
@dhimmel

dhimmel Jun 11, 2019

Author Member

Ah that distinction was lost on me, and the root cause of all this back and forth incorrect usage!

dhimmel added some commits Jun 11, 2019

Revert to parameter expansion
Properly use command substitution and parameter expansion in the
appropriate places. They are not the same thing.
#233 (comment)
@agitter
Copy link
Member

left a comment

I'm leaving a few comments on the changes that have been made so far. I'm not much of a shell scripting expert so I defer to @michaelmhoffman's suggestions above. There are still a few others we could implement in this pull request or a subsequent one.


set -o errexit \
-o nounset \
-o xtrace

This comment has been minimized.

Copy link
@agitter

agitter Jun 23, 2019

Member

The suggestion to not set xtrace here and instead allow it optionally with bash -x script.sh makes sense to me.

This comment has been minimized.

Copy link
@dhimmel

dhimmel Jun 24, 2019

Author Member

Changed in da0b397 along with adding -o pipefail. From these GNU docs

The exit status of a pipeline is the exit status of the last command in the pipeline, unless the pipefail option is enabled (see The Set Builtin). If pipefail is enabled, the pipeline's return status is the value of the last (rightmost) command to exit with a non-zero status, or zero if all commands exit successfully. If the reserved word '!' precedes the pipeline, the exit status is the logical negation of the exit status as described above. The shell waits for all commands in the pipeline to terminate before returning a value.

So I am not sure why pipefail does not interfere with the error suppression in:

DOCKER_EXISTS="$(command -v docker || true)"

This comment has been minimized.

Copy link
@michaelmhoffman

michaelmhoffman Jun 24, 2019

Contributor

command -v docker || true is a list of commands, not a pipeline.

# Set options for extra caution & debugging
set -o errexit \
-o nounset \
-o xtrace

This comment has been minimized.

Copy link
@agitter

agitter Jun 23, 2019

Member

Same xtrace comment as above


# Set timezone used by Python for setting the manuscript's date
export TZ=Etc/UTC
# Default Python to read/write text files using UTF-8 encoding
export LC_ALL=en_US.UTF-8

# Generate reference information
echo "Retrieving and processing reference metadata"
>&2 echo "Retrieving and processing reference metadata"

This comment has been minimized.

Copy link
@agitter

agitter Jun 23, 2019

Member

I'm not accustomed to seeing >&2 at the beginning of the line, but it looks like there is not single convention for this.

This comment has been minimized.

Copy link
@michaelmhoffman

michaelmhoffman Jun 24, 2019

Contributor

I agree this seems rather odd to me but I did not comment on it earlier. More important is a consistent use of whatever convention you choose. My own preference in descending order would be:

echo >&2 "Retrieving and processing reference metadata"
echo "Retrieving and processing reference metadata" >&2
>&2 echo "Retrieving and processing reference metadata"

Normally I do redirections at the end of a command but with echo or printf it can get lost at the end.

@@ -54,8 +60,8 @@ pandoc --verbose \
DOCKER_EXISTS="$(command -v docker || true)"

# Create PDF output (unless BUILD_PDF environment variable equals "false")
if [ "$BUILD_PDF" != "false" ] && [ -z "$DOCKER_EXISTS" ]; then
echo "Exporting PDF manuscript using WeasyPrint"
if [ "${BUILD_PDF:-true}" != "false" ] && [ -z "$DOCKER_EXISTS" ]; then

This comment has been minimized.

Copy link
@agitter

agitter Jun 23, 2019

Member

Change to use michaelmhoffman's tip 11? #233 (comment)

This comment has been minimized.

Copy link
@dhimmel

dhimmel Jun 24, 2019

Author Member

You don't need ["${BUILD_PDF:-true}" != "false"], just ["${BUILD_PDF:-}" != "false"]. Better yet, the usual convention is to test [ "${BUILD_PDF:-}" ] which will let a user set BUILD_PDF to anything to mean true and empty or unset to mean false. Using a command-line argument is a more expected interface in most cases though.

I don't think [ "${BUILD_PDF:-}" ] is what we wan't here because it defaults to not building the PDF if BUILD_PDF is unset. However, we want PDF building unless explicitly disabled with BUILD_PDF=false. I also think we want a solution where BUILD_PDF=false and BUILD_PDF=true perform as written, such that they can be easily toggled in CI configs. We could do ["${BUILD_PDF:-}" != "false"], which saves a few characters but is a bit less readable for the non-expert.

This comment has been minimized.

Copy link
@michaelmhoffman

michaelmhoffman Jun 24, 2019

Contributor

You could call it SUPPRESS_PDF or better yet MANUBOT_SUPPRESS_PDF (or something semantically equivalent).

Another alternative is simply to set the variable upstream with something like:

BUILD_PDF="${BUILD_PDF:-true}"

Then you can just use "${BUILD_PDF}" from then on forward.

@dhimmel

This comment has been minimized.

Copy link
Member Author

commented Jun 24, 2019

Ready for review. I addressed several of @michaelmhoffman's suggestions, but not all of them. Some I'd like to address at a future time, but don't want to hold up this PR. Would be good for reviewers to take one more pass to make sure I haven't done anything incorrect.

@agitter
Copy link
Member

left a comment

I re-reviewed and did some light testing in my environment that does not have Docker. The build script worked as expected.

Two other recommendations that were not addressed in this pull request are:

You may want some of these variables to be set only in cases where the user hasn't set and exported them. You can do that with a formulation like TZ="${TZ:-Etc/UTC}". I find overriding user settings of TZ and LC_ALL rather aggressive. If the latter is only to change Python settings you probably want to use PYTHONIOENCODING instead or even better fix it in the Python script itself.

I would advise adding shellcheck to your editor and your CI workflow which would catch some of the errors above.

I recommend we merge and create new issues for these suggestions. The shellcheck comment raises a larger issue. The CI in this repository is being used to generate manuscripts. We don't have separate CI to test the build process and may want to consider that. We would not want those tests to propagate to downstream manuscripts though.

@dhimmel

This comment has been minimized.

Copy link
Member Author

commented Jun 27, 2019

I would advise adding shellcheck to your editor and your CI workflow which would catch some of the errors above.

For the record, I satisfied all shellcheck grievances besides quoting:

rootstock/ci/deploy.sh

Lines 19 to 20 in 2fd84f1

-K $encrypted_9befd6eddffe_key \
-iv $encrypted_9befd6eddffe_iv \

Did not modify those lines to avoid conflicts since manuscript instances change them.

You may want some of these variables to be set only in cases where the user hasn't set and exported them.

I started modifying this, but stopped because I remembered we had problems in the past and would want to make sure we're not introducing a bug regression. Also it's not entirely clear to me that we shouldn't always use UTC timezone for manuscripts. Anyways, just want to spend some more time learning how these variables and configurations work before blindly changing.

@dhimmel dhimmel merged commit ddb0288 into manubot:master Jun 27, 2019

1 check passed

Travis CI - Pull Request Build Passed
Details

dhimmel added a commit that referenced this pull request Jun 27, 2019

Harden shell scripts with improved robustness & verbosity (#233)
This build is based on
ddb0288.

This commit was created by the following Travis CI build and job:
https://travis-ci.com/manubot/rootstock/builds/117215168
https://travis-ci.com/manubot/rootstock/jobs/211660565

[ci skip]

The full commit message that triggered this build is copied below:

Harden shell scripts with improved robustness & verbosity (#233)

Merges #233

Hat tip to Michael Hoffman (https://hoffmanlab.org/) for feedback.

* Apply shell best practices
References:
https://kvz.io/blog/2013/11/21/bash-best-practices/
https://www.davidpashley.com/articles/writing-robust-shell-scripts/
#233 (comment)

* Specify bash (not just sh). Other shellcheck suggestions
Properly use command substitution and parameter expansion in the
appropriate places.

* echo diagnostic messages to stderr
Previously echo defaulted to stdout
https://stackoverflow.com/a/41766832/4651668
Placement of stderr redirect to: "echo >&2"

* Update bash options: xtrace specified by calling command

* Quote variable names

* deploy.sh: use Travis env vars for CI build URLs
These environment variables were added to Travis in late 2018
travis-ci/travis-ci#8935 (comment)

dhimmel added a commit that referenced this pull request Jun 27, 2019

Harden shell scripts with improved robustness & verbosity (#233)
This build is based on
ddb0288.

This commit was created by the following Travis CI build and job:
https://travis-ci.com/manubot/rootstock/builds/117215168
https://travis-ci.com/manubot/rootstock/jobs/211660565

[ci skip]

The full commit message that triggered this build is copied below:

Harden shell scripts with improved robustness & verbosity (#233)

Merges #233

Hat tip to Michael Hoffman (https://hoffmanlab.org/) for feedback.

* Apply shell best practices
References:
https://kvz.io/blog/2013/11/21/bash-best-practices/
https://www.davidpashley.com/articles/writing-robust-shell-scripts/
#233 (comment)

* Specify bash (not just sh). Other shellcheck suggestions
Properly use command substitution and parameter expansion in the
appropriate places.

* echo diagnostic messages to stderr
Previously echo defaulted to stdout
https://stackoverflow.com/a/41766832/4651668
Placement of stderr redirect to: "echo >&2"

* Update bash options: xtrace specified by calling command

* Quote variable names

* deploy.sh: use Travis env vars for CI build URLs
These environment variables were added to Travis in late 2018
travis-ci/travis-ci#8935 (comment)
@agitter

This comment has been minimized.

Copy link
Member

commented Jun 27, 2019

That reasoning make sense to me.

Your comment about regression testing is what I was hinting at. Because the manubot package and the Rootstock content are still evolving, we can't easily test that changes to the build process still generate identical manuscripts. However, is there a lighter form of testing we may eventually want?

The manubot package is already testing the process command so we don't need to test that here.

@dhimmel

This comment has been minimized.

Copy link
Member Author

commented Jul 1, 2019

However, is there a lighter form of testing we may eventually want?

I can think of two possibilities:

  1. Move more rootstock functionality to the manubot package where we have a testing framework. build.sh would probably be the hardest to migrate, but parts of deploy.sh could be part of the python library. Although testing deployment is also super difficult.

  2. Have a manuscript repository with extra tests that somehow updates whenever rootstock does. Or potentially switch from a clone based setup method to a template based on, as per #6 (comment). Then perhaps we could have extra tests in the template repo.

No immediate easy solutions come to mind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.