Add hygienic alternatives to Python virtual environment scripts #236
I would prefer that we fix the seemingly small bugs in the current scripts rather than writing everything from scratch. Would these problems be solved by changing venvcreate/venvactivate to use private variables (eg |
@ShaneHarvey I considered replacement as improvement, but due to the nature of the issues being addressed, I am concerned that doing so may have an unbounded scope of potentially breaking changes to existing users of these scripts. As the version of DET scripts being used by Drivers are often not pinned to a given commit (just a By providing these scripts separately, Drivers will be given the opportunity to gradually test and migrate to using these hygienic scripts before further behavior-changing modifications are introduced to DET scripts (I am eyeing the |
If this was a change that we knew would break all driver testing I might agree with this approach but I don't think that's the position we're in. The minor tweaks to avoid shadowing or unintentionally reusing $PYTHON seem like small fixes that have low chance of breaking anyone. The plus side of my suggestion is that in the best case we fix these bugs and drivers don't need to change anything at all. The normal approach we'd use is to fix the bug, test it in 1 or 2 drivers, merge, and then send an announcement in #drivers that teams might see python env setup failures in CSFLE/MONGODB-AWS test suites with a suggestion of how to fix it.
I can understand the concerns of Shane, though I don't share them. It seems like a step in the right direction to make our builds more hygienic and robust and this approach is superior to incremental bug fixes to the existing scripts. That said, I'll wait on approving to see what others might have to say.
.evergreen/find-python3.sh
Outdated
local -r bin="${1:?'is_venv_capable requires a name or path of a python binary to test'}"

# Use a temporary directory to avoid polluting the caller's enviornment.
- # Use a temporary directory to avoid polluting the caller's enviornment.
+ # Use a temporary directory to avoid polluting the caller's environment.
Fixed.
.evergreen/find-python3.sh
Outdated
local -r bin="${1:?'is_virtualenv_capable requires a name or path of a python binary to test'}"

# Use a temporary directory to avoid polluting the caller's enviornment.
- # Use a temporary directory to avoid polluting the caller's enviornment.
+ # Use a temporary directory to avoid polluting the caller's environment.
Fixed.
.evergreen/find-python3.sh
Outdated
}

# /opt/mongodbtoolchain/vX/bin/python
append_bins "/opt/mongodbtoolchain" "v[0-9]*" "bin/python3" "bin/python"
This should be reconsidered. We use the toolchain in various places and those uses have the potential to become problematic. Specifically, the toolchain is intended for the server project, so it can break without warning. It may be that the way this script checks for presence and capability of Python is resilient against unexpected changes to the toolchain. If that's the case then a comment to that effect might be good here.
Prioritization of toolchain over system binaries was motivated by older tests where the selection of a binary under /opt/python failed where /opt/mongodbtoolchain did not. Revisiting the tests, this does not appear to be the case for any of the currently supported variants. Given this, I am willing to revert the ordering to the "prefer system binaries first, toolchain binaries last" order I had initially preferred.
Regarding resilience, the is_venv_capable and is_virtualenv_capable tests ensure that, regardless of the state of the toolchain, the script will correctly deduce whether the associated binaries can create a virtual environment. Unless the toolchain breaks the currently established pattern of placing Python binaries under /opt/mongodbtoolchain/vX/bin for a given value of X (currently 2 through 4, I believe), this script should continue to behave as intended.
Upon further testing, I rediscovered that prioritizing system binaries (e.g. plain python or python3) may lead to cryptography installation failures on rhel81-power8-small. In response, I have demoted the priority of system Python binaries so that they are tested only after all other "explicitly-managed" Python binaries have been tested.
These scripts LGTM; cool bash coding, thanks @eramongodb 🧑🔧 .
I do wonder if we could just replace the old scripts with these new scripts as part of this PR, though. Doing it in separate PRs just seems like prolonging the possible breaking of drivers' reliance on the scripts.
Changes LGTM.
If this was a change that we knew would break all driver testing I might agree with this approach but I don't think that's the position we're in. The minor tweaks to avoid shadowing or unintentionally reusing $PYTHON seem like small fixes that have low chance of breaking anyone.
Testing the activate-kmstlsvenv.sh script with Go suggests that the Evergreen config must be updated to explicitly use bash:
[2022/10/10 17:17:53.902] sh: 17: ./activate-kmstlsvenv.sh: [[: not found
[2022/10/10 17:17:53.902] sh: 52: ../find-python3.sh: Syntax error: "(" unexpected
[2022/10/10 17:17:53.903] Command ''shell.exec' in "start-cse-servers"' failed: shell script encountered problem: exit code 2.
I suggest filing a DRIVERS ticket to request drivers use the new scripts. Once all drivers have implemented the required changes, the existing scripts can be safely updated.
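The two errors in the log above are consistent with a POSIX sh (such as dash) meeting Bash-only syntax like [[ and arrays. As a rough illustration only, a script meant to be sourced from Bash could fail early with a readable message instead. The function name require_bash is hypothetical and is not part of this PR; the guard itself sticks to POSIX syntax so that it also parses under sh.

```bash
#!/usr/bin/env bash
# Hypothetical guard (not part of the PR). It uses only POSIX sh syntax so
# that the check itself still parses when the file is read by a non-Bash shell.
require_bash() {
  # Bash sets BASH_VERSION; plain sh implementations such as dash do not.
  if [ -z "${BASH_VERSION:-}" ]; then
    echo "this script must be sourced from a Bash shell" 1>&2
    return 1
  fi
  return 0
}
```

A caller would run require_bash || return 1 near the top of the sourced file, before any Bash-specific constructs are reached. Note that this only helps if the Bash-only syntax lives in a separate file sourced after the guard, since a POSIX shell may reject the whole file at parse time.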
This was the point I was trying to make. I don't think we should make all teams update their scripts to use this new approach. Instead we can make the small $PYTHON bug fixes to the existing scripts and teams (most likely) wouldn't need to do any work. I do like the modern bash and nice error messages though.
I see. I misinterpreted the point. I am not opposed to the small fixes in the existing scripts. But IMO this PR is still an overall improvement. I do not consider it a requirement to update the existing scripts to merge.
This may be somewhat feasible for These scripts were motivated by external discussions suggesting that Evergreen supports Bash on all distros, even on Windows via Cygwin (see also the Running ShellCheck on
Similarly, running ShellCheck on
The However, the Given the primary goal of this PR is to avoid variable leakage, being unable to use Bash in full due to semi-backwards-compatibility requirements with current use of |
My understanding is that we don't need to use bash local variables, for example we can prefix the vars to avoid clashing in practice:
venvcreate () {
+ __PYTHON="$1"
- PYTHON="$1"
Another suitable option would be to remove the var altogether and use "$1" instead of "$PYTHON".
@@ -0,0 +1,29 @@
#!/usr/bin/env bash
Aren't shebangs meaningless (ignored) for scripts that are intended to be sourced? E.g., changing this to #!/usr/bin/env sh would have no effect.
The reason is twofold:
- To document that the scripts are designed to be used in Bash shells.
- To indicate to ShellCheck that the script should be analyzed as a Bash script.
Gotcha, thanks for explaining that.
.evergreen/find-python3.sh
Outdated
trap 'rm -rf "$tmp"' EXIT

# Evaluate the result of this function.
"$bin" -m venv "$tmp" 1>&2
I recall that in some broken environments python -m venv/virtualenv can succeed but the activate script fails. But no need to change anything now. Let's wait until we actually encounter such a failure before we try to workaround it.
It is fairly trivial to add activation to the test, so I do not mind including it. However, I did not uncover during testing any distros where activation failed when venv creation did not.
More importantly, your comment made me realize that venvcreate needs to account for possible failure during venv creation + activation when falling back to using virtualenv to match the capability detection logic. I have updated it accordingly.
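A capability test that covers both creation and activation might look roughly like the sketch below. This is an assumption-laden illustration, not the code in this PR: the function name venv_capable_with_activation is hypothetical, and the Scripts/activate branch assumes the usual venv layout on Windows. The function body is a subshell, so the variables, the EXIT trap, and the activated environment are all discarded when it finishes.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not the PR's implementation).
# The body is wrapped in ( ... ) so it runs in a subshell: nothing set here,
# including the activated environment, survives in the caller's shell.
venv_capable_with_activation() (
  bin="${1:?'expected a name or path of a python binary to test'}"

  # Use a temporary directory to avoid polluting the caller's environment.
  tmp="$(mktemp -d)" || exit 1
  trap 'rm -rf "$tmp"' EXIT

  # Step 1: creation must succeed. Diagnostics go to stderr only.
  "$bin" -m venv "$tmp" 1>&2 || exit 1

  # Step 2: activation must succeed too. The script location differs
  # between POSIX-style (bin/) and Windows-style (Scripts/) layouts.
  if [[ -f "$tmp/bin/activate" ]]; then
    . "$tmp/bin/activate" 1>&2
  elif [[ -f "$tmp/Scripts/activate" ]]; then
    . "$tmp/Scripts/activate" 1>&2
  else
    exit 1
  fi
)
```

A caller can then write venv_capable_with_activation python3 2>/dev/null && echo "usable" without worrying about leftover state.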
we can prefix the vars to avoid clashing in practice
This is not a proper solution, as it does not address the issue of variable leakage. This just changes the name of the variable being leaked. The environment will still be polluted by variables. This is not scalable.
Another suitable option would be to remove the var altogether and use "$1" instead of "$PYTHON".
This is also not a scalable solution. Forbidding the use of named variables in non-subshell functions in order to continue supporting the Bourne shell (where local is not available) does not align with the goals of this PR, which, as stated earlier, "is to improve the robustness and hygiene of DET shell scripts by taking full advantage of the features provided by Bash".
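The difference being argued here can be shown with a small, self-contained sketch. Both functions are hypothetical stand-ins written for illustration, not the DET implementations: the first assigns a global the way the current venvcreate is described as doing, and the second uses Bash's local builtin.

```bash
#!/usr/bin/env bash
# Hypothetical functions for illustration only; not the DET implementations.

# Plain assignment: PYTHON remains set in the caller's shell after the
# function returns, whatever the variable happens to be named.
leaky_create() {
  PYTHON="$1"
}

# Function-scoped variable: discarded when the function returns. The result
# is handed back on stdout instead of through a shared variable.
hygienic_create() {
  local -r python="$1"
  printf '%s\n' "$python"
}
```

After leaky_create /some/python, a later script that reads $PYTHON silently picks up /some/python. After hygienic_create /some/python, the caller's shell has no python variable at all. Renaming PYTHON to __PYTHON in the first function would change which name leaks, not whether one leaks.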
Latest changes are verified by this patch. The set of distros tested are (best-effort) all non-EOL'd distros as currently listed in the MongoDB Platform Roadmap. The additions to the Evergreen config are omitted from this PR, but may be included if desirable. The root cause of the test failure on macos-1012 is being tracked by BUILD-16106.
.evergreen/find-python3.sh
Outdated
local -r version_str="$(perl -lne 'print $1 if m/.*([0-9]+\.[0-9]+\.[0-9]*)/' -- <(printf "%s" "$version_output"))"

# Evaluate 3.0.0 <= x.y.z.
sort -CV -- <(printf "%s\n%s\n" "3.0.0" "$version_str")
FYI this logic can be implemented much simpler in python. For example:
if python -c "import sys; exit(sys.version_info[0] < 3)"; then
  echo "python 3"
else
  echo "python 2"
fi
this would avoid the dependency on "sort".
Verified this suggestion works as suggested. Thank you!
However, it does not completely remove the GNU Coreutils sort dependency, as sort is used below to prioritize newer MongoDB Toolchain and Python versions under /opt. It seems like macos-1012 is circumstantially passing due to the system python3 satisfying the requirements despite sort command failures when searching for binaries under /opt.
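For reference, the interpreter-side check suggested above could be wrapped as a predicate roughly as follows. This is only a sketch: the name is_python3_sketch is hypothetical, and the function in the PR may differ. It relies on sys.exit treating a boolean as an integer exit status, so a major version below 3 yields status 1.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not the PR's implementation).
# Returns 0 if the given binary is a Python 3 (or newer) interpreter,
# non-zero if it is older, missing, or fails to run.
is_python3_sketch() {
  local -r bin="${1:?'expected a name or path of a python binary to test'}"

  # sys.exit(False) exits with status 0 and sys.exit(True) with status 1,
  # so the comparison result maps directly onto the shell's success/failure.
  "$bin" -c 'import sys; sys.exit(sys.version_info[0] < 3)' 2>/dev/null
}
```

Because the version comparison happens inside the interpreter, this particular check needs neither perl nor sort -V.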
Unless there are any remaining concerns, I think I am comfortable merging this PR given all distros currently supported according to the MongoDB Platform Roadmap are currently passing despite BUILD-16106. The plan is to then create a DRIVERS Task instructing Drivers to replace use of |
DRIVERS-2497 has been created.
Description
This PR is the first of several PRs whose goal is to improve the robustness and hygiene of DET shell scripts by taking full advantage of the features provided by Bash.
This PR is focused on providing improved alternatives to scripts that pertain to Python virtual environment creation and activation. This PR does not change the behavior of any existing scripts. Followup PRs will attempt to incorporate these new scripts into existing scripts, which may change their behavior.
find-python3.sh
This file defines four functions:
The primary purpose of this file is to provide find_python3, which is implemented in terms of the other three functions. find_python3 aims to serve as a reliable and comprehensive utility function to replace the numerous instances of "find a python3 binary capable of creating a virtual environment" conditional logic found in many Evergreen scripts, both in the DET repository (e.g. activate_venv.sh) and in many other Drivers repositories. These instances differ from one another and are often not comprehensive in their handling of various variant-specific idiosyncrasies.
Most notably, find_python3 and the other functions provided by find-python3.sh ensure no variables used within the functions are leaked into the parent shell. The name of the resulting python3 binary (and only the name) is printed to stdout on success; all diagnostic and error messages are printed to stderr and may be silenced by redirecting 2>/dev/null. This allows for convenient patterns such as the following:
This is unlike, for example, the current utils.sh, which "leaks" the $PYTHON variable upon invocation of venvcreate and can inadvertently affect the behavior of other scripts if one is not careful, such as when subsequently using set-temp-creds.sh:
This error can be worked around by explicitly setting PYTHON=python after activate_venv.sh to ensure the venv python is used instead of any prior value of $PYTHON. However, it would be preferable if such variable leaks did not occur in the first place.
venv-utils.sh
This file is a hygienic alternative to utils.sh.
As described above, the $PYTHON variable (among others) is leaked by the venvcreate and venvactivate functions defined by utils.sh. Furthermore, their behavior is affected by the $OS environment variable, which is often set and used by other Evergreen scripts in ways inconsistent with what utils.sh expects. This can indirectly break the behavior of utils.sh functions. Lastly, venvcreate is defined to exit 1 on error (terminating the shell), which limits the user's error handling options.
The venvcreate and venvactivate functions provided by venv-utils.sh resolve each of these issues. All parameters to each function are explicitly documented and asserted. Each function's behavior depends solely on the provided arguments, with no dependency on environment variables. On error, each function immediately returns a non-zero exit code to allow callers to handle errors as appropriate to their needs.
activate-kmstlsvenv.sh
This file is a hygienic alternative to activate_venv.sh.
As with activate_venv.sh, this script is meant to be invoked by the caller to create and activate the kmstlsvenv virtual environment. It is implemented in terms of both find-python3.sh and venv-utils.sh to avoid the variable leakage problems described above. Furthermore, on error, it immediately returns a non-zero exit code instead of continuing execution as is currently done by activate_venv.sh.
ShellCheck
All three files were analyzed by ShellCheck using the command shellcheck -x ./path/to/file.sh from the root directory of the DET repository. Automated ShellCheck integration is outside the scope of this PR, but hopefully in the near future more DET scripts will be analyzed by ShellCheck to improve their quality.
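Returning to the output contract described for find-python3.sh (result on stdout, diagnostics on stderr), the sketch below shows how a caller could rely on it. The function find_python3_stub and the path it prints are stand-ins invented for this illustration so that the example runs on its own; the real find_python3 is not reproduced here.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for find_python3, used only to illustrate the
# stdout/stderr contract; the path below is made up for the example.
find_python3_stub() {
  echo "diagnostic: testing candidate binaries" 1>&2 # Diagnostics: stderr.
  echo "/usr/bin/python3"                            # Result only: stdout.
}

# Command substitution captures stdout alone, so the variable holds just the
# binary name. Redirecting stderr silences the diagnostics.
python_binary="$(find_python3_stub 2>/dev/null)" ||
  echo "no suitable python3 binary found" 1>&2
```

Since the assignment runs in the caller's own shell, the only variable introduced is the one the caller chose to name.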