-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dev env setup script #5717
Dev env setup script #5717
Conversation
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
…v-setup-script Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
…v-setup-script Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
dev/dev-env-setup.sh
Outdated
brew update | ||
brew install pyenv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this script only support macOS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great point. I'll add the pyenv setup for a few OS's.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just provide the pyenv installation instructions and let contributors install pyenv by themselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned about maintenance costs. If we support a few OS, I think we need to test the script on the supported operating systems.
dev/dev-env-setup.sh
Outdated
MLFLOW_HOME=$(pwd) | ||
|
||
# Get the minimum supported version from MLflow to ensure any feature development adheres to legacy Python versions | ||
min_py_version=$(grep "python_requires=" "$MLFLOW_HOME/setup.py" | grep -E -o "([0-9]{1,}\.)+[0-9]{1,}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to add a command to print out the minimum supported python version in setup.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I wasn't clear. I meant a command like ListDependencies
:
Line 90 in f7a42be
class ListDependencies(distutils.cmd.Command): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added (but not pushed yet)
dev/dev-env-setup.sh
Outdated
rm -rf "$tmp_dir" | ||
|
||
# Install dev requirements and test plugin | ||
pip install "$( (( $quiet == 1 && $verbose == 0 )) && printf %s '-q' )" -r "$MLFLOW_HOME/requirements/dev-requirements.txt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Core maintainers like us need all the extra dependencies, but most contributors don't, and installing all of them is difficult to succeed and takes a long time. Can we install a minimum set of dev dependencies by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make the [extras] optional? Or:
Options:
--packages
"light" - skinny, small, lint
"full" - light + large, doc, extra-ml
"all" - dev-requirements + [extras]
WDYT?
@@ -0,0 +1,158 @@ | |||
#!/usr/bin/env bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we run this script in one of the github action workflows to ensure it works?
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
dev/dev-env-setup.sh
Outdated
pip install $( (( quiet == 1 && verbose == 0 )) && printf %s '-q' ) -e "$MLFLOW_HOME/tests/resources//mlflow-test-plugin" | ||
echo "Finished installing pip dependencies." | ||
else | ||
pip install $( (( quiet == 1 && verbose == 0 )) && printf %s '-q' ) -r "$MLFLOW_HOME/requirements/skinny-requirements.txt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this line because skinny-requirements.txt
is a subset of small-requirements.txt
.
dev/dev-env-setup.sh
Outdated
pip install $( (( quiet == 1 && verbose == 0 )) && printf %s '-q' ) -e "$MLFLOW_HOME/tests/resources//mlflow-test-plugin" | ||
echo "Finished installing pip dependencies." | ||
else | ||
pip install $( (( quiet == 1 && verbose == 0 )) && printf %s '-q' ) -r "$MLFLOW_HOME/requirements/skinny-requirements.txt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this line because skinny-requirements.txt
is a subset of small-requirements.txt
.
@@ -0,0 +1,90 @@ | |||
name: Dev environment setup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to run this workflow in any PRs? or only when we change dev/dev-env-setup.sh
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'd likely be running this on a weekly schedule. I really don't want to run this on PRs or during nightly builds since it's going to impact our PR feedback heavily for something that really shouldn't be changing very often at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can do something similar to the cross-version test. Run this job weekly and when we update dev/dev-env-setup.sh
.
run: | ||
shell: bash --noprofile --norc -exo pipefail {0} | ||
|
||
jobs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use strategy.matrix
to DRY the code:
mlflow/.github/workflows/js.yml
Lines 23 to 24 in 8b0bfc5
strategy: | |
matrix: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great idea!
.github/workflows/dev-setup.yml
Outdated
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dev/dev-env-setup.sh
Outdated
deactivate | ||
rm -rf "$directory" | ||
echo "Virtual environment removed from '$directory'. Installing new instance." | ||
pyenv exec virtualenv "$directory" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does virtualenv automatically pick up the current python version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it goes through a discovery process that, if the version is found on the local system, will by default create a symlink. Do you think we should disable this and do a copy mode to force a replica?
dev/dev-env-setup.sh
Outdated
rm -rf "$directory" | ||
echo "Virtual environment removed from '$directory'. Installing new instance." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use could use --clear
flag:
$ virtualenv --help
...
--clear remove the destination directory if exist before starting (will overwrite files otherwise) (default: False)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea!
dev/dev-env-setup.sh
Outdated
echo "$PYENV_BIN" >> "$GITHUB_PATH" | ||
echo "PYENV_ROOT=$PYENV_ROOT" >> "$GITHUB_ENV" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these two lines should be executed only in GitHub Actions workflows.
.github/workflows/dev-setup.yml
Outdated
os: [ubuntu-latest, windows-latest, macos-latest] | ||
include: | ||
- os: ubuntu-latest | ||
shell: bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this because we set the workflow-level default shell (bash --noprofile --norc -exo pipefail {0}
).
dev/dev-env-setup.sh
Outdated
git clone --depth 1 https://github.com/pyenv/pyenv.git "$HOME/.pyenv" | ||
PYENV_ROOT="$HOME/.pyenv" | ||
PYENV_BIN="$PYENV_ROOT/bin" | ||
if [ "$MLFLOW_DEV_ENV_CI_RUN" == 1 ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if [ "$MLFLOW_DEV_ENV_CI_RUN" == 1 ]; then | |
if [ ! -z "$GITHUB_ACTIONS" ]; then |
Can we use the GITHUB_ACTIONS
environment variable instead?
I'd say no because the number of contributions to MLflow UI is not large. We can add this later if there is a request. |
dev/dev-env-setup.sh
Outdated
wget -O ~/brew_install.sh https://raw.githubusercontent.com/Homebrew/install/master/install.sh | ||
bash ~/brew_install.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wget -O ~/brew_install.sh https://raw.githubusercontent.com/Homebrew/install/master/install.sh | |
bash ~/brew_install.sh | |
bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" |
to avoid creating brew_install.sh
. I took this command from https://brew.sh/.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! Added!
dev/dev-env-setup.sh
Outdated
esac | ||
|
||
# Install the Python version if it cannot be found | ||
pyenv install -s $PY_INSTALL_VERSION |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest use pyenv install -s -v $PY_INSTALL_VERSION
the -v
make pyenv print compilation status to stdout which helps debugging when error happens.
and shall we add a "pyenv_root_dir" argument to allow user to specify where pyenv download the python tarballs to ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the -v make pyenv print compilation status to stdout which helps debugging when error happens.
Have you tried running pyenv install -v
? It's too verbose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we add a "pyenv_root_dir" argument to allow user to specify where pyenv download the python tarballs to ?
I don't see a need for that argument . In what situation would that argument be useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tried running pyenv install -v? It's too verbose
Yes it might be too verbose but it helps provides more clues when compiling failed.
I don't see a need for that argument . In what situation would that argument be useful?
E.g the case the system disk remaining capacity is not sufficient, we can specify the path to be on other disk. Just a minor suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it might be too verbose but it helps provides more clues when compiling failed.
See https://github.com/pyenv/pyenv/blob/fab0082bd5cdda07f0bfdd69a9c676bc2d2906b3/plugins/python-build/bin/python-build#L135. pyenv stores build logs in a temporary file and prints out its path when build fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g the case the system disk remaining capacity is not sufficient, we can specify the path to be on other disk. Just a minor suggestion.
Can we just remove some files to free disk space?
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
…v-setup-script Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
|
||
Example usage: | ||
|
||
From root of MLflow repository on local with a destination virtualenv path of <MLFLOW_HOME>/.venvs/mlflow-dev: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From root of MLflow repository on local with a destination virtualenv path of <MLFLOW_HOME>/.venvs/mlflow-dev: | |
From root of MLflow repository on local with a destination virtualenv path of <MLFLOW_HOME>/.venv: |
Can we use .venv
instead of .venvs/mlflow-dev-env
?
https://docs.python.org/3/library/venv.html#creating-virtual-environments says:
... (a common name for the target directory is
.venv
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw https://github.com/mlflow/mlflow/pull/5717/files#diff-76e7ca2f2843ac5f659b45382768c1b64bf8704e8d3c6189c4e3623f5d465bc2R162-R170. Now the current approach makes sense.
git config --global user.email "test@mlflow.org" | ||
- name: Run Environment tests | ||
run: | | ||
TERM=xterm bash ./dev/test-dev-env-setup.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is TERM=xterm
only required on github actions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. In a local terminal client there will be some defined terminal value to allow for syntactic formatting. I took this tip directly from a Github dev on an issue that they responded on that recommended specifying this if using any sort of tput commands within the script.
dev/dev-env-setup.sh
Outdated
# Check if brew is installed and install it if it isn't present | ||
# Note: if xcode isn't installed, this will fail. | ||
if [ -z "$(command -v brew)" ]; then | ||
echo "Brew is required to install pyenv on MacOS. Installing in your home directory." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
echo "Brew is required to install pyenv on MacOS. Installing in your home directory." | |
echo "Homebrew is required to install pyenv on MacOS. Installing in your home directory." |
nit
PYENV_ROOT="$HOME/.pyenv" | ||
PYENV_BIN="$PYENV_ROOT/bin" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PYENV_ROOT="$HOME/.pyenv" | |
PYENV_BIN="$PYENV_ROOT/bin" | |
PYENV_ROOT="$HOME/.pyenv" | |
PYENV_BIN="$PYENV_ROOT/bin" | |
PATH="$PYENV_BIN:$PATH" |
I think we need to add PYENV_BIN
to PATH
, otherwise pyenv ...
doesn't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
dev/dev-env-setup.sh
Outdated
rd="$MLFLOW_HOME/requirements" | ||
|
||
# Get the minimum supported version from MLflow to ensure any feature development adheres to legacy Python versions | ||
min_py_version=$(grep "python_requires=" "$MLFLOW_HOME/setup.py" | grep -E -o "([0-9]{1,}\.)+[0-9]{1,}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
min_py_version=$(grep "python_requires=" "$MLFLOW_HOME/setup.py" | grep -E -o "([0-9]{1,}\.)+[0-9]{1,}") | |
min_py_version=$(python setup.py -q min_python_version) |
Let's use the min_python_version
command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated!
dev/dev-env-setup.sh
Outdated
|
||
-h, -help, --help Display help | ||
|
||
-d, -directory --directory The path to install the virtual environment into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does ./dev/dev-env-setup.sh -directory
or ./dev/dev-env-setup.sh --directory
work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it does now!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw do we need -directory
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed those redundant arg keys
dev/test-dev-env-setup.sh
Outdated
err=0 | ||
trap 'err=1' ERR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we fail-fast?
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
dev/dev-env-setup.sh
Outdated
if [[ -n "$override_py_ver" ]]; then | ||
echo "$(tput bold; tput setaf 1)You are overriding the recommended version of Python for MLflow development: $min_py_version. $(tput sgr0)" | ||
min_py_version="$(grep -o "^[0-9]*\.[0-9]*" <<< "$override_py_ver")" | ||
fi | ||
|
||
# Resolve a minor version to the latest micro version | ||
case $min_py_version in | ||
"3.7") PY_INSTALL_VERSION="3.7.13" ;; | ||
"3.8") PY_INSTALL_VERSION="3.8.13" ;; | ||
"3.9") PY_INSTALL_VERSION="3.9.11" ;; | ||
"3.10") PY_INSTALL_VERSION="3.10.3" ;; | ||
esac | ||
|
||
microver=$(grep -o '\.' <<< "$override_py_ver" | wc -l) | ||
if [[ $microver -gt 1 ]]; then | ||
PY_INSTALL_VERSION=$override_py_ver | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block is a bit difficult to understand for me. Can we do something like this?
minor_to_micro() {
case $1 in
"3.7") echo "3.7.13" ;;
"3.8") echo "3.8.13" ;;
"3.9") echo "3.9.11" ;;
"3.10") echo "3.10.3" ;;
esac
}
if override_py_ver is specified
if override_py_ver looks like 3.x
PY_INSTALL_VERSION=$(minor_to_micro $override_py_ver)
elif override_py_ver looks like 3.x.y
PY_INSTALL_VERSION=$override_py_ver
else
echo("Invalid version ...")
exit 1
fi
else
PY_INSTALL_VERSION=$(minor_to_micro $min_py_version)
fi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great idea! This logic is much easier to follow.
.github/workflows/dev-setup.yml
Outdated
with: | ||
repository: ${{ github.event.inputs.repository }} | ||
ref: ${{ github.event.inputs.ref }} | ||
- uses: ./.github/actions/setup-pyenv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- uses: ./.github/actions/setup-pyenv |
Can we remove this line and test pyenv installation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely! Removing and validating CI build.
CONTRIBUTING.rst
Outdated
|
||
.. code-block:: bash | ||
|
||
dev/dev-env-setup.sh -d ~/.venvs/mlflow-dev -q |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dev/dev-env-setup.sh -d ~/.venvs/mlflow-dev -q | |
dev/dev-env-setup.sh -d .venvs/mlflow-dev -q |
Can we create an environment in the current working directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great catch. Simplifies local env activation for the repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once the remaining comments are addressed!
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
What changes are proposed in this pull request?
Add a development environment setup script for Python dev
How is this patch tested?
Manually
Does this PR change the documentation?
ci/circleci: build_doc
check. If it's successful, proceed to thenext step, otherwise fix it.
Details
on the right to open the job page of CircleCI.Artifacts
tab.docs/build/html/index.html
.Release Notes
Is this a user-facing change?
Add a Development Environment setup script for automated construction of a CI-friendly Python virtual environment.
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts
: Artifact stores and artifact loggingarea/build
: Build and test infrastructure for MLflowarea/docs
: MLflow documentation pagesarea/examples
: Example codearea/model-registry
: Model Registry service, APIs, and the fluent client calls for Model Registryarea/models
: MLmodel format, model serialization/deserialization, flavorsarea/projects
: MLproject format, project running backendsarea/scoring
: MLflow Model server, model deployment tools, Spark UDFsarea/server-infra
: MLflow Tracking server backendarea/tracking
: Tracking Service, tracking client APIs, autologgingInterface
area/uiux
: Front-end, user experience, plotting, JavaScript, JavaScript dev serverarea/docker
: Docker use across MLflow's components, such as MLflow Projects and MLflow Modelsarea/sqlalchemy
: Use of SQLAlchemy in the Tracking Service or Model Registryarea/windows
: Windows supportLanguage
language/r
: R APIs and clientslanguage/java
: Java APIs and clientslanguage/new
: Proposals for new client languagesIntegrations
integrations/azure
: Azure and Azure ML integrationsintegrations/sagemaker
: SageMaker integrationsintegrations/databricks
: Databricks integrationsHow should the PR be classified in the release notes? Choose one:
rn/breaking-change
- The PR will be mentioned in the "Breaking Changes" sectionrn/none
- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" sectionrn/feature
- A new user-facing feature worth mentioning in the release notesrn/bug-fix
- A user-facing bug fix worth mentioning in the release notesrn/documentation
- A user-facing documentation change worth mentioning in the release notes