pyfunc.spark_udf: conda env dir / cache dir isolation and NFS optimization #5561

WeichenXu123 · 2022-03-30T13:25:24Z

What changes are proposed in this pull request?

For pyfunc.spark_udf:

Isolate the conda env dir and package cache dir for each Python process.
NFS optimization: create the conda env once on the driver side (writing the env dir into an NFS-mounted directory), so that executors read the env set up by the driver directly instead of rebuilding it.
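
A rough sketch of the two points above, for illustration only. This is not the code in this PR: `make_isolated_env_root`, `resolve_env_root`, and the `nfs_root` argument are hypothetical names, and the real change may lay out directories differently.

```python
import os
import tempfile
import uuid


def make_isolated_env_root(base_dir):
    """Create a fresh env root with its own "pkgs" cache dir (hypothetical helper)."""
    # The pid plus a random suffix keeps concurrent Python processes apart.
    name = f"conda_env_root_{os.getpid()}_{uuid.uuid4().hex}"
    env_root = os.path.join(base_dir, name)
    os.makedirs(os.path.join(env_root, "pkgs"))
    return env_root


def resolve_env_root(nfs_root=None):
    """Pick the base directory for the env root (hypothetical helper).

    If a shared, NFS-mounted directory is available, the driver can create
    the env root there once and pass the path to executors. Otherwise each
    process falls back to its local temp dir.
    """
    base_dir = nfs_root if nfs_root is not None else tempfile.gettempdir()
    return make_isolated_env_root(base_dir)
```

With a shared directory, the driver would call `resolve_env_root(nfs_root)` once and ship the returned path to executors; how that path is passed along is outside this sketch.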

How is this patch tested?

Manually:

Does this PR change the documentation?

No. You can skip the rest of this section.
Yes. Make sure the changed pages / sections render correctly by following the steps below.

Check the status of the ci/circleci: build_doc check. If it's successful, proceed to the
next step, otherwise fix it.
Click Details on the right to open the job page of CircleCI.
Click the Artifacts tab.
Click docs/build/html/index.html.
Find the changed pages / sections and make sure they render correctly.

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

For pyfunc.spark_udf:

Isolate the conda env dir and package cache dir for each Python process.
NFS optimization: create the conda env once on the driver side (writing the env dir into an NFS-mounted directory), so that executors read the env set up by the driver directly instead of rebuilding it.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>

dbczumar

@WeichenXu123 Looks great! Tried it out on Databricks & it seemed to work well! Left a few small comments.

dbczumar · 2022-03-31T20:00:54Z

mlflow/pyfunc/__init__.py

+        # Create individual package cache dir "pkgs" under the conda_env_root_dir
+        # for each python process.


If the user forks a new process after running this method, _CONDA_ENV_ROOT_DIR will still be defined and the same cache dir will be reused. Shall we handle this case?

Good question... This might be hard to address.
But in what case would a user need to fork a process? Is that common?

Could be! Can't we store and check the process id (https://docs.python.org/3/library/os.html#os.getpid)? If the pid doesn't match the one used to configure _CONDA_ENV_ROOT_DIR the first time, can we overwrite it?

Addressed! See the cache_return_value_per_process decorator, which is now applied to get_or_create_nfs_tmp_dir, get_or_create_tmp_dir, and _get_or_create_env_root_dir.
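
For readers following the thread, the pid check suggested above could look roughly like this. It is a minimal sketch under assumptions, not necessarily what mlflow/utils/process.py does: `cache_per_process` is a hypothetical name, and it only supports hashable positional arguments.

```python
import functools
import os


def cache_per_process(fn):
    """Cache fn's return value, but recompute it in a different process.

    Hypothetical sketch: each cache entry remembers the pid that produced it.
    A forked child inherits the cache dict, sees a pid mismatch, and calls fn
    again instead of reusing the parent's value.
    """
    cache = {}

    @functools.wraps(fn)
    def wrapped(*args):
        pid = os.getpid()
        entry = cache.get(args)
        if entry is None or entry[0] != pid:
            entry = (pid, fn(*args))
            cache[args] = entry
        return entry[1]

    return wrapped
```

Applied to a function that creates a directory, this returns the same directory on repeated calls within one process, while a forked child gets a fresh one on its first call.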

harupy

LGTM once the remaining comments are addressed.

WeichenXu123 added 24 commits March 28, 2022 17:38

4891f7d init
fa0577b update
49530eb update
f9a4cdd init
933b6f5 update
12ebd39 update
4365e2d pull out get_or_create_conda_env
bb39bea update
abab678 update
c473412 update doc
bdc7e4a add test
d2fb9cf move test
b3c59ae update doc
da92ba3 merge
9814066 update
2a5867f update
2dbbbf2 update
d9903b9 update
1436860 update
79704d9 update
0ca1f55 update
e2a62a3 update
9c16e67 update
f807bca update

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>

dbczumar reviewed Mar 31, 2022

WeichenXu123 added 5 commits April 1, 2022 08:09

07c530f update
914a308 merge master
ec00847 update
dbeec68 update
be78087 update
9f83a50 update

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>