Dev env setup script #5717

Merged
merged 23 commits into from
May 10, 2022

BenWilson2
Member

What changes are proposed in this pull request?

Add a development environment setup script for Python dev

How is this patch tested?

Manually

Does this PR change the documentation?

  • No. You can skip the rest of this section.
  • Yes. Make sure the changed pages / sections render correctly by following the steps below.
  1. Check the status of the ci/circleci: build_doc check. If it's successful, proceed to the
    next step, otherwise fix it.
  2. Click Details on the right to open the job page of CircleCI.
  3. Click the Artifacts tab.
  4. Click docs/build/html/index.html.
  5. Find the changed pages / sections and make sure they render correctly.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Add a Development Environment setup script for automated construction of a CI-friendly Python virtual environment.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@github-actions github-actions bot added area/build Build and test infrastructure for MLflow area/docs Documentation issues rn/feature Mention under Features in Changelogs. labels Apr 18, 2022
Comment on lines 64 to 65
brew update
brew install pyenv
Member

Does this script only support macOS?

Member Author

Great point. I'll add the pyenv setup for a few OSes.

Member

Can we just provide the pyenv installation instructions and let contributors install pyenv by themselves?

Member

I'm concerned about maintenance costs. If we support several operating systems, I think we need to test the script on each of them.
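
For context on what per-OS support would involve, here is a minimal sketch of OS detection in bash. The function names `os_family` and `pyenv_install_hint` are hypothetical and not part of the PR; the macOS and Linux commands are the ones quoted elsewhere in this review, and the fallback message is an assumption.

```bash
#!/usr/bin/env bash
# Map a kernel name (as printed by `uname -s`) to a coarse OS family.
# With no argument, the current system is inspected.
os_family() {
  local kernel="${1:-$(uname -s)}"
  case "$kernel" in
    Darwin) echo "macos" ;;
    Linux) echo "linux" ;;
    MINGW*|MSYS*|CYGWIN*) echo "windows" ;;
    *) echo "unsupported" ;;
  esac
}

# Hypothetical helper: print (not run) the pyenv install route per family.
pyenv_install_hint() {
  case "$(os_family "${1:-}")" in
    macos) echo "brew install pyenv" ;;
    linux) echo "git clone --depth 1 https://github.com/pyenv/pyenv.git ~/.pyenv" ;;
    *) echo "install pyenv manually" ;;
  esac
}
```

Each branch is a code path that would need its own CI coverage, which is the maintenance cost raised above.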

MLFLOW_HOME=$(pwd)

# Get the minimum supported version from MLflow to ensure any feature development adheres to legacy Python versions
min_py_version=$(grep "python_requires=" "$MLFLOW_HOME/setup.py" | grep -E -o "([0-9]{1,}\.)+[0-9]{1,}")
Member

We might want to add a command to print out the minimum supported python version in setup.py.

Member Author

added!

Member

Sorry I wasn't clear. I meant a command like ListDependencies:

class ListDependencies(distutils.cmd.Command):

Member Author

added (but not pushed yet)

dev/dev-env-setup.sh (outdated)
rm -rf "$tmp_dir"

# Install dev requirements and test plugin
pip install "$( (( $quiet == 1 && $verbose == 0 )) && printf %s '-q' )" -r "$MLFLOW_HOME/requirements/dev-requirements.txt"
Member (@harupy, Apr 19, 2022)

Core maintainers like us need all the extra dependencies, but most contributors don't, and installing all of them is hard to get right and takes a long time. Can we install a minimal set of dev dependencies by default?

Member Author

Should we make the [extras] optional? Or:

Options:
--packages
"light" - skinny, small, lint
"full" - light + large, doc, extra-ml
"all" - dev-requirements + [extras]

WDYT?
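
A rough sketch of how the proposed tiers could map to requirement files. The function names and the `<name>-requirements.txt` file naming are assumptions for illustration; only the tier contents come from the proposal above. `install_tier` prints the pip commands instead of running them.

```bash
#!/usr/bin/env bash
# Translate a tier name into the requirement sets listed in the proposal.
requirements_for() {
  local tier="$1"
  case "$tier" in
    light) echo "skinny small lint" ;;
    full)  echo "skinny small lint large doc extra-ml" ;;
    all)   echo "dev" ;;
    *) echo "Unknown package tier: $tier" >&2; return 1 ;;
  esac
}

# Dry run: print one pip command per requirement set in the tier.
# $1 is the requirements directory, $2 is the tier name.
install_tier() {
  local rd="$1" tier="$2" names name
  names="$(requirements_for "$tier")" || return 1
  for name in $names; do
    echo pip install -r "$rd/${name}-requirements.txt"
  done
  # The "all" tier additionally installs the package extras.
  if [ "$tier" = "all" ]; then
    echo pip install -e ".[extras]"
  fi
}
```

For example, `install_tier requirements light` prints three `pip install -r` lines, one per requirement set.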

@@ -0,0 +1,158 @@
#!/usr/bin/env bash
Member (@harupy, Apr 19, 2022)

Can we run this script in one of the github action workflows to ensure it works?

pip install $( (( quiet == 1 && verbose == 0 )) && printf %s '-q' ) -e "$MLFLOW_HOME/tests/resources//mlflow-test-plugin"
echo "Finished installing pip dependencies."
else
pip install $( (( quiet == 1 && verbose == 0 )) && printf %s '-q' ) -r "$MLFLOW_HOME/requirements/skinny-requirements.txt"
Member

I think we can remove this line because skinny-requirements.txt is a subset of small-requirements.txt.

@@ -0,0 +1,90 @@
name: Dev environment setup
Member

Do we need to run this workflow on every PR, or only when we change dev/dev-env-setup.sh?

Member Author

I think we'd likely be running this on a weekly schedule. I really don't want to run this on PRs or during nightly builds since it's going to impact our PR feedback heavily for something that really shouldn't be changing very often at all.

Member

I think we can do something similar to the cross-version test. Run this job weekly and when we update dev/dev-env-setup.sh.

run:
  shell: bash --noprofile --norc -exo pipefail {0}

jobs:
Member (@harupy, Apr 21, 2022)

Let's use strategy.matrix to DRY the code:

strategy:
  matrix:

Member Author

great idea!

Comment on lines 88 to 90



Member

Suggested change

deactivate
rm -rf "$directory"
echo "Virtual environment removed from '$directory'. Installing new instance."
pyenv exec virtualenv "$directory"
Member

Does virtualenv automatically pick up the current python version?

Member Author

It goes through a discovery process and, if the version is found on the local system, creates a symlink by default. Do you think we should disable this and use copy mode to force a replica?

Comment on lines 165 to 166
rm -rf "$directory"
echo "Virtual environment removed from '$directory'. Installing new instance."
Member

Can we use the --clear flag instead?

$ virtualenv --help
...
  --clear                       remove the destination directory if exist before starting (will overwrite files otherwise) (default: False)

Member Author

good idea!

Comment on lines 101 to 102
echo "$PYENV_BIN" >> "$GITHUB_PATH"
echo "PYENV_ROOT=$PYENV_ROOT" >> "$GITHUB_ENV"
Member

I think these two lines should be executed only in GitHub Actions workflows.

os: [ubuntu-latest, windows-latest, macos-latest]
include:
  - os: ubuntu-latest
    shell: bash
Member (@harupy, Apr 23, 2022)

I think we can remove this because we set the workflow-level default shell (bash --noprofile --norc -exo pipefail {0}).

git clone --depth 1 https://github.com/pyenv/pyenv.git "$HOME/.pyenv"
PYENV_ROOT="$HOME/.pyenv"
PYENV_BIN="$PYENV_ROOT/bin"
if [ "$MLFLOW_DEV_ENV_CI_RUN" == 1 ]; then
Member

Suggested change
- if [ "$MLFLOW_DEV_ENV_CI_RUN" == 1 ]; then
+ if [ ! -z "$GITHUB_ACTIONS" ]; then

Can we use the GITHUB_ACTIONS environment variable instead?
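
As a sketch of the suggestion, the two CI-only lines could be wrapped in a guard like the one below. The function name `export_pyenv_for_ci` is hypothetical. `GITHUB_ACTIONS`, `GITHUB_PATH`, and `GITHUB_ENV` are set by the GitHub Actions runner, so nothing is written on a contributor's machine.

```bash
#!/usr/bin/env bash
# Publish pyenv's location to later workflow steps, but only inside GitHub Actions.
# $1 is the pyenv root directory (for example "$HOME/.pyenv").
export_pyenv_for_ci() {
  local pyenv_root="$1"
  if [ -n "${GITHUB_ACTIONS:-}" ]; then
    # Lines appended to $GITHUB_PATH are prepended to PATH in subsequent steps.
    echo "$pyenv_root/bin" >> "$GITHUB_PATH"
    # Lines appended to $GITHUB_ENV become environment variables in subsequent steps.
    echo "PYENV_ROOT=$pyenv_root" >> "$GITHUB_ENV"
  fi
}
```

Using `-n` with a default expansion is equivalent to the `! -z` test in the suggestion and also stays safe under `set -u`.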

harupy (Member) commented Apr 23, 2022

> and shall we support installing "node.js" in the env setup script? node.js is used in the MLflow web server.

I'd say no because the number of contributions to MLflow UI is not large. We can add this later if there is a request.

Comment on lines 89 to 90
wget -O ~/brew_install.sh https://raw.githubusercontent.com/Homebrew/install/master/install.sh
bash ~/brew_install.sh
Member (@harupy, Apr 23, 2022)

Suggested change
- wget -O ~/brew_install.sh https://raw.githubusercontent.com/Homebrew/install/master/install.sh
- bash ~/brew_install.sh
+ bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

to avoid creating brew_install.sh. I took this command from https://brew.sh/.

Member Author

thanks! Added!

esac

# Install the Python version if it cannot be found
pyenv install -s $PY_INSTALL_VERSION
Collaborator

I suggest using pyenv install -s -v $PY_INSTALL_VERSION.
The -v flag makes pyenv print the compilation status to stdout, which helps with debugging when an error occurs.

Also, shall we add a "pyenv_root_dir" argument to let users specify where pyenv downloads the Python tarballs?

Member (@harupy, Apr 23, 2022)

> The -v flag makes pyenv print the compilation status to stdout, which helps with debugging when an error occurs.

Have you tried running pyenv install -v? It's too verbose.

Member

> Shall we add a "pyenv_root_dir" argument to let users specify where pyenv downloads the Python tarballs?

I don't see a need for that argument. In what situation would it be useful?

Collaborator (@WeichenXu123, Apr 23, 2022)

> Have you tried running pyenv install -v? It's too verbose.

Yes, it might be too verbose, but it provides more clues when compilation fails.

> I don't see a need for that argument. In what situation would it be useful?

For example, when the system disk does not have enough free space, we could point the path at another disk. Just a minor suggestion.

Member (@harupy, Apr 23, 2022)

> Yes, it might be too verbose, but it provides more clues when compilation fails.

See https://github.com/pyenv/pyenv/blob/fab0082bd5cdda07f0bfdd69a9c676bc2d2906b3/plugins/python-build/bin/python-build#L135. pyenv stores build logs in a temporary file and prints out its path when the build fails.

Member

> For example, when the system disk does not have enough free space, we could point the path at another disk. Just a minor suggestion.

Can we just remove some files to free disk space?

Example usage:

From root of MLflow repository on local with a destination virtualenv path of <MLFLOW_HOME>/.venvs/mlflow-dev:
Member (@harupy, May 2, 2022)

Suggested change
- From root of MLflow repository on local with a destination virtualenv path of <MLFLOW_HOME>/.venvs/mlflow-dev:
+ From root of MLflow repository on local with a destination virtualenv path of <MLFLOW_HOME>/.venv:

Can we use .venv instead of .venvs/mlflow-dev?

https://docs.python.org/3/library/venv.html#creating-virtual-environments says:

... (a common name for the target directory is .venv).

    git config --global user.email "test@mlflow.org"
- name: Run Environment tests
  run: |
    TERM=xterm bash ./dev/test-dev-env-setup.sh
Member (@harupy, May 2, 2022)

Is TERM=xterm only required on github actions?

Member Author

Yes. A local terminal client already defines a terminal type, which allows text formatting. I took this tip directly from a GitHub developer's reply on an issue, which recommended setting TERM when a script uses any tput commands.
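
An alternative to setting TERM=xterm in the workflow is to make the script itself tolerate a missing terminal. The sketch below is an assumption for illustration and not what the PR does; the function name `warn` is hypothetical. It emits tput sequences only when stdout is a terminal and TERM is usable, and falls back to plain text otherwise.

```bash
#!/usr/bin/env bash
# Print a bold red message on a capable terminal, plain text anywhere else
# (for example in CI logs or when the output is piped).
warn() {
  local start="" reset=""
  if [ -t 1 ] && [ -n "${TERM:-}" ] && [ "$TERM" != "dumb" ]; then
    # If tput fails for any reason, fall back to no formatting at all.
    start="$(tput bold 2>/dev/null; tput setaf 1 2>/dev/null)" || start=""
    reset="$(tput sgr0 2>/dev/null)" || reset=""
  fi
  printf '%s%s%s\n' "$start" "$1" "$reset"
}
```

The terminal check happens before any command substitution on purpose: inside `$( ... )` stdout is a pipe, so a check placed there would never detect a terminal.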

# Check if brew is installed and install it if it isn't present
# Note: if xcode isn't installed, this will fail.
if [ -z "$(command -v brew)" ]; then
echo "Brew is required to install pyenv on MacOS. Installing in your home directory."
Member

Suggested change
- echo "Brew is required to install pyenv on MacOS. Installing in your home directory."
+ echo "Homebrew is required to install pyenv on MacOS. Installing in your home directory."

nit

Comment on lines +103 to +104
PYENV_ROOT="$HOME/.pyenv"
PYENV_BIN="$PYENV_ROOT/bin"
Member (@harupy, May 2, 2022)

Suggested change
- PYENV_ROOT="$HOME/.pyenv"
- PYENV_BIN="$PYENV_ROOT/bin"
+ PYENV_ROOT="$HOME/.pyenv"
+ PYENV_BIN="$PYENV_ROOT/bin"
+ PATH="$PYENV_BIN:$PATH"

I think we need to add PYENV_BIN to PATH, otherwise pyenv ... doesn't work.

Member Author

good catch

rd="$MLFLOW_HOME/requirements"

# Get the minimum supported version from MLflow to ensure any feature development adheres to legacy Python versions
min_py_version=$(grep "python_requires=" "$MLFLOW_HOME/setup.py" | grep -E -o "([0-9]{1,}\.)+[0-9]{1,}")
Member

Suggested change
- min_py_version=$(grep "python_requires=" "$MLFLOW_HOME/setup.py" | grep -E -o "([0-9]{1,}\.)+[0-9]{1,}")
+ min_py_version=$(python setup.py -q min_python_version)

Let's use the min_python_version command.

Member Author

updated!


-h, -help, --help Display help

-d, -directory --directory The path to install the virtual environment into
Member

Does ./dev/dev-env-setup.sh -directory or ./dev/dev-env-setup.sh --directory work?

Member Author

it does now!

Member

btw do we need -directory?

Member Author

removed those redundant arg keys
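
A minimal sketch of an option loop that accepts only the short and GNU-style long forms, so that a single-dash `-directory` is rejected. This is an illustration and not the script's actual parser; the `--quiet` long name and the usage string are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical parser: accepts -d/--directory, -q/--quiet and -h/--help only.
# Results are stored in the global variables `directory` and `quiet`.
parse_args() {
  directory=""
  quiet=0
  while [[ $# -gt 0 ]]; do
    case "$1" in
      -d|--directory)
        # Guard against a trailing flag with no value, which would otherwise loop forever.
        if [[ $# -lt 2 ]]; then
          echo "Missing value for $1" >&2
          return 1
        fi
        directory="$2"
        shift 2
        ;;
      -q|--quiet) quiet=1; shift ;;
      -h|--help) echo "Usage: dev-env-setup.sh [-d DIR] [-q]"; return 0 ;;
      *) echo "Unknown option: $1" >&2; return 1 ;;
    esac
  done
}
```

With this shape, `parse_args -d .venv -q` sets `directory=.venv` and `quiet=1`, while `parse_args -directory .venv` fails with an "Unknown option" message.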

Comment on lines 6 to 7
err=0
trap 'err=1' ERR
Member

Can we fail-fast?
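
To make the difference concrete, here is a small sketch comparing the deferred-error pattern from the snippet above with a fail-fast setup. The wrapper function names are hypothetical; each one runs a tiny script in a child bash process so the two behaviours can be observed side by side.

```bash
#!/usr/bin/env bash
# Deferred style (as in the snippet under review): the ERR trap records the
# failure, the script keeps going, and the status is reported at the end.
run_deferred() {
  bash -c 'err=0; trap "err=1" ERR; false; echo "still running"; exit "$err"'
}

# Fail-fast style: with errexit enabled, the first failing command ends the
# script immediately, so the echo is never reached.
run_fail_fast() {
  bash -c 'set -euo pipefail; false; echo "not reached"'
}
```

Both variants exit with a non-zero status; the difference is whether the commands after the failure still run.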

Comment on lines 153 to 169
if [[ -n "$override_py_ver" ]]; then
  echo "$(tput bold; tput setaf 1)You are overriding the recommended version of Python for MLflow development: $min_py_version. $(tput sgr0)"
  min_py_version="$(grep -o "^[0-9]*\.[0-9]*" <<< "$override_py_ver")"
fi

# Resolve a minor version to the latest micro version
case $min_py_version in
  "3.7") PY_INSTALL_VERSION="3.7.13" ;;
  "3.8") PY_INSTALL_VERSION="3.8.13" ;;
  "3.9") PY_INSTALL_VERSION="3.9.11" ;;
  "3.10") PY_INSTALL_VERSION="3.10.3" ;;
esac

microver=$(grep -o '\.' <<< "$override_py_ver" | wc -l)
if [[ $microver -gt 1 ]]; then
  PY_INSTALL_VERSION=$override_py_ver
fi
Member (@harupy, May 10, 2022)

This block is a bit difficult to understand for me. Can we do something like this?

minor_to_micro() {
  case $1 in
    "3.7") echo "3.7.13" ;;
    "3.8") echo "3.8.13" ;;
    "3.9") echo "3.9.11" ;;
    "3.10") echo "3.10.3" ;;
  esac
}

if override_py_ver is specified
  if override_py_ver looks like 3.x
      PY_INSTALL_VERSION=$(minor_to_micro $override_py_ver)
  elif override_py_ver looks like 3.x.y
      PY_INSTALL_VERSION=$override_py_ver
  else
      echo("Invalid version ...")
      exit 1
  fi
else
  PY_INSTALL_VERSION=$(minor_to_micro $min_py_version)
fi

Member Author

great idea! This logic is much easier to follow.
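
The pseudocode in this thread can be sketched as runnable bash roughly as follows. This is one possible rendering, not necessarily the code that was merged. The wrapper name `resolve_py_version`, the regular expressions, and the failure return for unknown minor versions are assumptions; the version table is copied from the comment above.

```bash
#!/usr/bin/env bash
# Map a supported minor version to the micro version listed in the review.
minor_to_micro() {
  case "$1" in
    "3.7") echo "3.7.13" ;;
    "3.8") echo "3.8.13" ;;
    "3.9") echo "3.9.11" ;;
    "3.10") echo "3.10.3" ;;
    *) return 1 ;;  # assumption: unknown minor versions are an error
  esac
}

# $1: user override (may be empty), $2: minimum supported version.
# Prints the version to install, or fails on malformed input.
resolve_py_version() {
  local override="$1" min_version="$2"
  if [[ -n "$override" ]]; then
    if [[ "$override" =~ ^3\.[0-9]+$ ]]; then
      minor_to_micro "$override"
    elif [[ "$override" =~ ^3\.[0-9]+\.[0-9]+$ ]]; then
      echo "$override"
    else
      echo "Invalid Python version: $override" >&2
      return 1
    fi
  else
    minor_to_micro "$min_version"
  fi
}
```

In the script this would be used as `PY_INSTALL_VERSION=$(resolve_py_version "$override_py_ver" "$min_py_version")`.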

with:
repository: ${{ github.event.inputs.repository }}
ref: ${{ github.event.inputs.ref }}
- uses: ./.github/actions/setup-pyenv
Member

Suggested change
- uses: ./.github/actions/setup-pyenv

Can we remove this line and test pyenv installation?

Member Author

definitely! Removing and validating CI build.

CONTRIBUTING.rst Outdated

.. code-block:: bash

dev/dev-env-setup.sh -d ~/.venvs/mlflow-dev -q
Member

Suggested change
- dev/dev-env-setup.sh -d ~/.venvs/mlflow-dev -q
+ dev/dev-env-setup.sh -d .venvs/mlflow-dev -q

Can we create an environment in the current working directory?

Member Author

great catch. Simplifies local env activation for the repo.

Member (@harupy left a comment)

LGTM once the remaining comments are addressed!

Labels
area/build Build and test infrastructure for MLflow area/docs Documentation issues rn/feature Mention under Features in Changelogs.
3 participants