Enable auto dependency inference in spark flavor #4759

harupy · 2021-08-31T06:40:04Z

Signed-off-by: harupy <hkawamura0130@gmail.com>

What changes are proposed in this pull request?

Enable auto dependency inference in spark flavor.

How is this patch tested?

Existing tests

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: harupy <hkawamura0130@gmail.com>

WeichenXu123 · 2021-08-31T09:13:13Z

mlflow/utils/_capture_modules.py

+    if flavor == "spark" and is_in_databricks_runtime():
+        from dbruntime.spark_connection import initialize_spark_connection
+
+        initialize_spark_connection()


What happens if we are not in the Databricks runtime?

@WeichenXu123

If not, a new spark session is created in the following code:

mlflow/mlflow/spark.py

Lines 687 to 700 in c4b8e84

    if spark is None:
        # NB: If there is no existing Spark context, create a new local one.
        # NB: We're disabling caching on the new context since we do not need it and we want to
        # avoid overwriting cache of underlying Spark cluster when executed on a Spark Worker
        # (e.g. as part of spark_udf).
        spark = (
            pyspark.sql.SparkSession.builder.config("spark.python.worker.reuse", True)
            .config("spark.databricks.io.cache.enabled", False)
            # In Spark 3.1 and above, we need to set this conf explicitly to enable creating
            # a SparkSession on the workers
            .config("spark.executor.allowSparkContext", "true")
            .master("local[1]")
            .getOrCreate()
        )

Shall we add a try/catch here with fallback handling?
For example, a user might upgrade or downgrade the built-in MLflow of the Databricks runtime and find that the dbruntime.spark_connection.initialize_spark_connection API does not exist.

Agree with @WeichenXu123. I think a try/catch is a good idea here.
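To illustrate the fallback idea discussed above, here is a minimal sketch (not the code merged in this PR). The helper name `try_initialize_spark_connection` and its return convention are hypothetical; only the module and function names come from the diff. It treats a missing or renamed hook as "not available" and lets the caller fall back, while errors raised by the hook itself still surface.

```python
import importlib


def try_initialize_spark_connection(
    module_name="dbruntime.spark_connection",
    func_name="initialize_spark_connection",
):
    """Call the runtime hook if it exists.

    Returns True if the hook was found and called, False if the module or
    the function could not be located (hypothetical convention).
    """
    try:
        module = importlib.import_module(module_name)
        init_fn = getattr(module, func_name)
    except (ImportError, AttributeError):
        # The hook was moved, renamed, or is absent in this runtime version.
        return False
    # Errors raised by the hook itself are not swallowed here.
    init_fn()
    return True
```

A caller could check the return value and, on False, continue to the existing code path that creates a local Spark session.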


harupy · 2021-08-31T10:29:06Z

mlflow/utils/_capture_modules.py

+    if flavor == "spark" and is_in_databricks_runtime():
+        from dbruntime.spark_connection import initialize_spark_connection


This approach breaks when initialize_spark_connection is renamed or moved to a different module.

Should we make the spark initialization process modifiable via monkey-patching or an environment variable?
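As a rough sketch of the environment-variable option raised in this comment (this is not something the PR implements), the hook location could be read from a variable in `module:function` form. The variable name `EXAMPLE_SPARK_INIT_HOOK` and the helper `resolve_init_hook` are hypothetical and exist only for illustration.

```python
import importlib
import os

# Default taken from the diff in this PR.
DEFAULT_HOOK = "dbruntime.spark_connection:initialize_spark_connection"
# Hypothetical variable name, used only in this sketch.
ENV_VAR = "EXAMPLE_SPARK_INIT_HOOK"


def resolve_init_hook(environ=None):
    """Return the callable named by ENV_VAR, or the default hook."""
    environ = os.environ if environ is None else environ
    spec = environ.get(ENV_VAR, DEFAULT_HOOK)
    module_name, sep, func_name = spec.partition(":")
    if not sep or not func_name:
        raise ValueError(f"Expected 'module:function', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)
```

With this shape, a rename of the hook on the runtime side could be handled by changing the variable rather than the library code, at the cost of one more setting to document.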

Add an MLR test to prevent it from breaking.

dbczumar

@harupy LGTM once https://github.com/mlflow/mlflow/pull/4759/files#r699873541 is addressed. Great work!


harupy · 2021-09-02T05:51:17Z

mlflow/utils/_capture_modules.py

+        try:
+            # pylint: disable=import-error
+            from dbruntime.spark_connection import initialize_spark_connection
+
+            initialize_spark_connection()
+        except Exception:
+            pass


@WeichenXu123 @dbczumar Added try-catch. We could optimistically run the subsequent process after the failure, but mlflow.spark.load_pyfunc will also probably fail if initialize_spark_connection fails.

@harupy Makes sense that it will likely fail. Can we raise a clearer error message for debugging purposes?

Suggested change

-        try:
-            # pylint: disable=import-error
-            from dbruntime.spark_connection import initialize_spark_connection
-
-            initialize_spark_connection()
-        except Exception:
-            pass
+        try:
+            # pylint: disable=import-error
+            from dbruntime.spark_connection import initialize_spark_connection
+
+            initialize_spark_connection()
+        except Exception as e:
+            raise Exception("Attempted to initialize a spark session to load the model but failed") from e

Sounds good to me!
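The suggested change relies on Python's exception chaining: `raise ... from e` stores the original error on the new exception's `__cause__`, so the traceback shows both. A small standalone sketch, where `initialize_or_raise` and the failing stand-in function are hypothetical and the message is the one from the suggestion:

```python
def initialize_or_raise(init_fn):
    """Run init_fn and wrap any failure in a clearer, chained error."""
    try:
        init_fn()
    except Exception as e:
        raise Exception(
            "Attempted to initialize a spark session to load the model but failed"
        ) from e


def failing_init():
    # Stand-in for a hook that is missing in the current runtime.
    raise ImportError("No module named 'dbruntime'")


try:
    initialize_or_raise(failing_init)
except Exception as err:
    # The original ImportError is preserved for debugging.
    original = err.__cause__
```

Compared with `except Exception: pass`, this fails early with a message that names the step that went wrong, while keeping the underlying error available.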


* use infer_pip_requirements to spark.py
* fix tests
* fix tests
* rename test
* workaround for databricks
* workaround for pyspark in databricks
* rename test
* address comments
* better error message
* fix error message

Signed-off-by: harupy <hkawamura0130@gmail.com>


use infer_pip_requirements to spark.py

f236960

Signed-off-by: harupy <hkawamura0130@gmail.com>

github-actions bot added the rn/none List under Small Changes in Changelogs. label Aug 31, 2021

harupy added 4 commits August 31, 2021 16:36

fix tests

d7b1be6

Signed-off-by: harupy <hkawamura0130@gmail.com>

fix tests

9750551

Signed-off-by: harupy <hkawamura0130@gmail.com>

rename test

a35420f

Signed-off-by: harupy <hkawamura0130@gmail.com>

workaround for databricks

9ecc045

Signed-off-by: harupy <hkawamura0130@gmail.com>

WeichenXu123 reviewed Aug 31, 2021


harupy added 2 commits August 31, 2021 18:30

workaround for pyspark in databricks

a665a74

Signed-off-by: harupy <hkawamura0130@gmail.com>

rename test

a4755d5

Signed-off-by: harupy <hkawamura0130@gmail.com>

harupy commented Aug 31, 2021


dbczumar approved these changes Sep 2, 2021


address comments

9f1e636

Signed-off-by: harupy <hkawamura0130@gmail.com>

harupy force-pushed the spark-auto-dep-inference branch from 1aa274c to 9f1e636 Compare September 2, 2021 05:47

harupy commented Sep 2, 2021


harupy added 2 commits September 3, 2021 09:00

better error message

6dfffa0

Signed-off-by: harupy <hkawamura0130@gmail.com>

fix error message

c4966cf

Signed-off-by: harupy <hkawamura0130@gmail.com>

harupy force-pushed the spark-auto-dep-inference branch from b289979 to c4966cf Compare September 3, 2021 05:04

harupy merged commit b707d9e into mlflow:master Sep 3, 2021
