DRIVERS-2497 Fix paths on Cygwin and Python package dependencies #244

Merged
merged 15 commits into from Nov 16, 2022


@eramongodb (Contributor) commented Nov 10, 2022

Description

This PR is a follow-up to #236 and applies a critical patch to DRIVERS-2497 required by Windows distros.

This PR is verified by this patch.

Paths on Cygwin

The tests used to validate behavior in #236 did not account for the presence of pre commands, which modify the environment so that it no longer accurately reflects the environment used by Drivers (for example, by modifying the $PATH variable to prefer binaries provided by $MONGODB_BINARIES, such as mktemp). This hid the Cygwin path conversion required by Python binaries on Windows when creating the virtual environment, as demonstrated below on a windows-64-vs2017-large distro (virtualenv is used to obtain informative output; venv demonstrates similar behavior but without any output):

$ pwd
/home/Administrator

$ # As found by find_python3.
$ PYTHON="C:/python/Python310/python.exe"

$ # Command behaves as expected given a relative path.
$ # Note `dest=C:\cygwin\home\Administrator\venv` in output.
$ "$PYTHON" -m virtualenv -p "$PYTHON" venv
created virtual environment CPython3.10.2.final.0-64 in 5641ms
  creator CPython3Windows(dest=C:\cygwin\home\Administrator\venv, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=C:\Users\Administrator\AppData\Local\pypa\virtualenv)
    added seed packages: pip==22.2.2, setuptools==65.4.1, wheel==0.37.1
  activators BashActivator,BatchActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator

$ find venv -mindepth 1 -maxdepth 1
venv/.gitignore
venv/Lib
venv/pyvenv.cfg
venv/Scripts

$ # As done by is_venv_capable and is_virtualenv_capable.
$ # /tmp/tmp.RukOr5qrGb
$ VENV="$(mktemp -d)"

$ # Command succeeds but places virtual environment in an unexpected location.
$ # Note `dest=C:\tmp\tmp.RukOr5qrGb` in output.
$ "$PYTHON" -m virtualenv -p "$PYTHON" "$VENV"
created virtual environment CPython3.10.2.final.0-64 in 1047ms
  creator CPython3Windows(dest=C:\tmp\tmp.RukOr5qrGb, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=C:\Users\Administrator\AppData\Local\pypa\virtualenv)
    added seed packages: pip==22.2.2, setuptools==65.4.1, wheel==0.37.1
  activators BashActivator,BatchActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator

$ # Intended directory is unexpectedly empty despite successful command.
$ find "$VENV" -maxdepth 0 -empty
/tmp/tmp.RukOr5qrGb

A sanity check was added to the is_venv_capable and is_virtualenv_capable functions to ensure correct behavior, as well as an explicit check for the presence of an activation script to provide more informative error messages if one still cannot be found.
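A minimal sketch of these two post-creation checks is shown below. It assumes that a virtual environment created by a Windows-native Python places its activation script under `Scripts/` while a POSIX Python uses `bin/`; the function names `venv_dir_is_empty`, `find_activate_script`, and `check_venv` are hypothetical and do not come from the PR's scripts.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers illustrating the sanity checks described
# above. They are not the functions used by the PR's scripts.

# Succeeds if "$1" is an existing, empty directory. Mirrors the
# `find ... -maxdepth 0 -type d -empty` test quoted in the review.
venv_dir_is_empty() {
  [[ -n "$(find "$1" -maxdepth 0 -type d -empty 2>/dev/null)" ]]
}

# Prints the path of the activation script inside the virtual environment.
# Assumption: POSIX Pythons use bin/, Windows-native Pythons use Scripts/.
find_activate_script() {
  local venv="$1" candidate
  for candidate in "$venv/bin/activate" "$venv/Scripts/activate"; do
    if [[ -f "$candidate" ]]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "could not find an activation script in $venv" 1>&2
  return 1
}

# Runs both checks after the creation command has reported success.
check_venv() {
  local venv="$1"
  if venv_dir_is_empty "$venv"; then
    echo "$venv is empty despite successful creation of virtual environment!" 1>&2
    return 1
  fi
  find_activate_script "$venv" >/dev/null
}
```

`check_venv` fails fast on an empty directory before looking for `activate`, which separates "nothing was written here" (the Cygwin symptom above) from "something was written, but no activation script was found".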

Seed Packages

In addition to ensuring venvcreate handles paths correctly on Cygwin, the venvcreate function was updated to ensure that all three "seed" packages (pip, setuptools, and wheel) are consistently installed in the virtual environment, as the default behavior varies between venv and virtualenv and across their respective versions.

A drive-by fix to correctly pass -p "$bin" when using the virtualenv module was also applied.

The --no-cache-dir argument was removed as it was unnecessary.

The --system-site-packages argument was added to improve script performance.

Error handling of the venvcreate function was improved to ensure the virtual environment is only activated on success. deactivate is only possible/necessary if venvcreate was successful.
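The flow described in this section (try `venv`, then `virtualenv -p "$bin"`, pass `--system-site-packages`, activate only on success, then upgrade the seed packages) might look roughly like the sketch below. `venvcreate_sketch` is a hypothetical name, and the sketch omits details of the real venvcreate such as Cygwin path conversion.

```shell
#!/usr/bin/env bash
# Sketch only: a hypothetical approximation of the venvcreate flow described
# above, not the PR's actual implementation.
#
# Usage: venvcreate_sketch PYTHON_BIN DIR
# On success, the virtual environment in DIR is left activated.
venvcreate_sketch() {
  local bin="$1" dir="$2" mod

  for mod in venv virtualenv; do
    case "$mod" in
    venv)
      "$bin" -m venv --system-site-packages "$dir" || continue
      ;;
    virtualenv)
      # -p: make virtualenv use the same interpreter as "$bin".
      "$bin" -m virtualenv -p "$bin" --system-site-packages "$dir" || continue
      ;;
    esac

    # Only activate once creation has succeeded.
    # shellcheck disable=SC1090,SC1091
    if [[ -f "$dir/bin/activate" ]]; then
      . "$dir/bin/activate" || return
    elif [[ -f "$dir/Scripts/activate" ]]; then
      . "$dir/Scripts/activate" || return
    else
      echo "could not find an activation script in $dir" 1>&2
      return 1
    fi

    # Consistently install the three seed packages regardless of which
    # module (and which version of it) created the environment.
    if ! python -m pip install --upgrade pip setuptools wheel; then
      deactivate
      return 1
    fi
    return 0
  done

  echo "could not create a virtual environment in $dir with $bin" 1>&2
  return 1
}
```

Because `deactivate` is only defined once an activation script has been sourced, a caller can only (and need only) call it after `venvcreate_sketch` returns success.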

kmstlsvenv Packages

As a result of ensuring up-to-date pip, setuptools, and wheel packages in the virtual environment, some distros began to encounter issues installing the packages required by the kmstlsvenv virtual environment. I took this opportunity to narrow down, as strictly as possible, the conditions under which the default behavior does not suffice for a successful installation.

The actual packages required by kmstlsvenv scripts, boto3 and pykmip, are still pinned to ~=1.19.0 and ~=0.10.0 respectively. All additional, conditionally pinned packages are dependencies required by these two packages.

The greenlet package is conditionally pinned to <2.0 to avoid build failures on macos-1012.

The setuptools package is conditionally pinned to <65.0 to avoid build failures on windows-64-2016 (see BUILD-16233).

The cryptography package is conditionally pinned to <3.4 to avoid dependency on the presence of a Rust compiler when a cryptography wheel is not available. The associated conditions were narrowed down as much as possible to allow/encourage use of up-to-date packages whenever possible.
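The conditional pinning above could be expressed as a small helper that prints one requirement specifier per line. This is a sketch only: `kmstlsvenv_requirements` and its distro-name argument are hypothetical, and the cryptography pin is left out because the review thread below replaced it with an install-then-fall-back approach.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper; the real script derives its conditions
# differently.
#
# Usage: kmstlsvenv_requirements DISTRO
kmstlsvenv_requirements() {
  local distro="$1"
  # Packages actually used by the kmstlsvenv scripts.
  local -a packages=("boto3~=1.19.0" "pykmip~=0.10.0")

  # Pins on transitive dependencies, applied only where they are needed.
  if [[ "$distro" == macos-1012* ]]; then
    packages+=("greenlet<2.0")
  fi
  if [[ "$distro" == windows-* && "$distro" =~ 2016 ]]; then
    packages+=("setuptools<65.0") # See BUILD-16233.
  fi

  printf '%s\n' "${packages[@]}"
}
```

A caller could read the output into an array (for example with a `while read -r` loop) and pass it to `python -m pip install`.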

As with venvcreate, error handling of the activate_kmstlsvenv function was improved to ensure the virtual environment is only activated on success. deactivate is only possible/necessary if activate_kmstlsvenv was successful.

@rcsanchez97 (Contributor) left a comment

LGTM

If my confusion is unfounded, feel free to ignore it; otherwise, consider clarifying the comment a bit.

.evergreen/csfle/activate-kmstlsvenv.sh
"$bin" -m "$mod" --system-site-packages "$real_path" || continue
;;
virtualenv)
# -p: ensure correct Python binary is used by virtual environment.

Contributor

Is -p actually needed here? -p defaults to the current version of Python, so this seems redundant.

Contributor Author

Yes, it is required, as some old versions of virtualenv do not correctly select the Python binary used to create the virtual environment. This is documented by this comment in the old utils.sh script, but I observed it to be an issue on more than just Debian 10 distros. I wanted to link to a relevant bug report, but could not find one.
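One way to catch the interpreter mismatch described here is to compare the version reported by the requested binary with the one inside the new environment. The helpers below are a hypothetical sketch, not code from this PR or from the old utils.sh script.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for verifying which interpreter a
# virtual environment ended up using.

# Prints MAJOR.MINOR.MICRO for the given Python binary.
python_version_of() {
  "$1" -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])'
}

# Usage: same_python_version EXPECTED_BIN ACTUAL_BIN
same_python_version() {
  local expected actual
  expected="$(python_version_of "$1")" || return
  actual="$(python_version_of "$2")" || return
  if [[ "$expected" != "$actual" ]]; then
    echo "expected Python $expected but virtual environment uses $actual" 1>&2
    return 1
  fi
}
```

For example, after activating the environment: `same_python_version "$bin" python`.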

Contributor

Oh, can you add the comment from the old script? It's much more informative.

Contributor Author

Will do. 👍

local -r real_path="$(cygpath -aw "$tmp")" || return
"$bin" -m venv "$real_path" || return
else
"$bin" -m venv "$tmp" || return

Contributor

Can this be refactored to avoid duplicating "$bin" -m venv "$tmp" || return?

Contributor Author

I opted for a dedicated real_path variable only when required, but I can refactor it to reduce duplication instead.
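A minimal sketch of the deduplicated form: compute the target path once, then invoke `venv` in a single place. Detecting Cygwin by checking for `cygpath` on `$PATH` is an assumption made for illustration, and `venv_target_path` and `create_venv` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical refactor; the real script may detect Cygwin
# differently (assumption: presence of cygpath on $PATH).

# Prints the path to hand to Python: an absolute Windows path when cygpath
# is available, otherwise the path unchanged.
venv_target_path() {
  if command -v cygpath >/dev/null 2>&1; then
    cygpath -aw "$1"
  else
    printf '%s\n' "$1"
  fi
}

# Usage: create_venv PYTHON_BIN DIR
create_venv() {
  local bin="$1" tmp="$2" real_path
  # Assign separately from `local` so a failing substitution is not masked.
  real_path="$(venv_target_path "$tmp")" || return
  "$bin" -m venv "$real_path" || return
}
```

The sketch declares `real_path` separately from its assignment: with `local -r real_path="$(...)" || return`, the `|| return` tests the exit status of `local` itself, which succeeds even if the command substitution fails.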


# Sanity check: on some environments (such as Cygwin) creation of the virtual
# environment may succeed but place the environment in an unexpected location.
if [[ -n "$(find "$tmp" -maxdepth 0 -type d -empty 2>/dev/null)" ]]; then

Contributor

Can you show an example of this happening? Regardless, can we remove this check, since it's already handled by the if/elif/else below?

Contributor Author

The example is as described in the PR description under "Paths on Cygwin".

I suppose it could be considered redundant due to the checks below. The intent of this check was to test if there are any files placed in the intended directory at all, which I felt to be different enough from whether or not an activation script could be found. I can remove/simplify if preferable.

Contributor

Yeah I would prefer removing it because it simplifies the script and we don't do anything special for an empty dir.

if [[ -n "$(find "$tmp" -maxdepth 0 -type d -empty 2>/dev/null)" ]]; then
echo "$tmp is empty despite successful creation of virtual environment!"
return 1
fi

Contributor

Same comments as above.


if [[ "$windows_os_name" =~ 2016 ]]; then
# Avoid `RuntimeError: Could not determine home directory.` on
# windows-64-2016. See BUILD-16233.

Contributor

Is this only windows-64-2016? What about windows-64-vsMulti-small (Microsoft Windows Server 2019 Datacenter)?

Contributor

Update: I reproduced the same issue there. This probably hits all windows hosts on evergreen.

Contributor Author

According to the patch testing windows-64-2019, there appeared to be no issue. I was not aware of windows-64-vsMulti-small. It is unclear to me what the difference between these distros may be.

Contributor

I'm not sure why windows-64-2019 works but windows-64-vsMulti-small doesn't. Either way the issue needs to be fixed on windows-64-vsMulti-small too because that's what we test on in pymongo.

Contributor Author

Just documenting what we discussed via other channels: windows-64-vsMulti-small was added to the test suite but did not demonstrate the failure observed when testing on a spawn host, and this issue could be related to BUILD-12392.

fi
fi

# Avoid `error: can't find Rust compiler`.

Contributor

What if instead of trying to pinpoint which platforms need cryptography<3.4 we just try to install the latest version and if that fails fallback to cryptography<3.4? Like this:

python -m pip install cryptography || python -m pip install 'cryptography<3.4' || ...

python -m pip install -U "${packages[@]}" || ...

This is simpler and should work on more platforms.

Contributor Author

It is indeed simpler, but I deliberately opted for the current approach in order to be very explicit about conditions that require workarounds and as narrow as possible in the application of said workarounds.

This was motivated by the status quo, where generally-applied workarounds such as pinning cryptography to ~=3.4.8 or using CRYPTOGRAPHY_DONT_BUILD_RUST=1 continued to demonstrate unexpected failures, and the conditions for said failures appeared to be inconsistent and opaque. Whenever I encountered such a failure, it was unclear to me whether it was already known or a new problem, and where the blame should be assigned (did I break it, or did the environment change without my knowing?).

My hope was that being explicit in this manner would make it easier to maintain this script moving forward, with simplifications/removals of special-casing being applied in a controlled and targeted manner.

Contributor

I'd prefer the generic one to avoid needing to tweak and maintain these 30 extra lines which may or may not cover all the hosts drivers test on. I think a good compromise would be to use the generic approach but add an informative comment that specifically explains why the workaround exists like:

# Installing newer versions of cryptography requires rust when a wheel is not available.
# Fallback to an older version that does not require rust if the install fails. This is needed
# for at least the RHEL 6.2, powerpc64le, zSeries, and power8 hosts.
python -m pip install cryptography || python -m pip install 'cryptography<3.4' || ...

What do you think?

Contributor Author

I think that is an acceptable compromise. Would appreciate other reviewers' thoughts on this before committing to the refactor.

Contributor

I have a slight preference for the compromise. It may require fewer changes to this script as distros change or more distros are added.

Contributor Author

Done. Verified by this patch.
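The generic fallback agreed on in this thread can be sketched as a helper that tries requirement specifiers in order until pip succeeds. `install_first_available` is a hypothetical name; the real script may simply inline the `||` chain instead.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper illustrating "try the latest, then fall
# back" installation.
#
# Usage: install_first_available PYTHON_BIN SPEC [FALLBACK_SPEC...]
# Tries each requirement specifier in order and stops at the first one that
# pip installs successfully.
install_first_available() {
  local python="$1" spec
  shift
  for spec in "$@"; do
    if "$python" -m pip install "$spec"; then
      return 0
    fi
    echo "failed to install '$spec', trying next fallback" 1>&2
  done
  return 1
}

# Installing newer versions of cryptography requires Rust when a wheel is not
# available, so fall back to cryptography<3.4 if the first attempt fails:
#   install_first_available python cryptography 'cryptography<3.4' || exit
```

Keeping the fallback order in the argument list makes it easy to drop the `cryptography<3.4` entry later without touching any platform-specific conditions.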

@kevinAlbs kevinAlbs removed the request for review from benjirewis November 11, 2022 19:53

4 participants