Separate MLeap tests and unpin pyspark #4243
Conversation
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
```python
# setting this env variable is needed when using Spark with Arrow >= 0.15.0
# because of a change in Arrow IPC format
# https://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html#compatibiliy-setting-for-pyarrow--0150-and-spark-23x-24x
os.environ["ARROW_PRE_0_15_IPC_FORMAT"] = "1"
```
This causes the following error in Spark 3.x:

```
E RuntimeError: Arrow legacy IPC format is not supported in PySpark, please unset ARROW_PRE_0_15_IPC_FORMAT
```
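A hedged sketch of how the workaround could be gated by pyspark version instead of being unconditional (the helper name and the injectable `environ` parameter are illustrative, not MLflow's actual code):

```python
import os


def configure_arrow_ipc(pyspark_version, environ=os.environ):
    """Set the legacy Arrow IPC flag only where it is needed.

    Spark < 3.0 with Arrow >= 0.15.0 needs ARROW_PRE_0_15_IPC_FORMAT=1,
    while Spark 3.x raises a RuntimeError when the variable is set, so
    it is removed there. (Illustrative helper, not MLflow's code.)
    """
    major = int(pyspark_version.split(".", 1)[0])
    if major < 3:
        environ["ARROW_PRE_0_15_IPC_FORMAT"] = "1"
    else:
        environ.pop("ARROW_PRE_0_15_IPC_FORMAT", None)
```

A test-setup hook could then call `configure_arrow_ipc(pyspark.__version__)` once, instead of hard-coding the environment variable.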
```diff
-with pytest.raises(Py4JJavaError):
+with pytest.raises(pyspark.sql.utils.PythonException):
     res = data.withColumn("res1", udf("a", "b")).select("res1").toPandas()
```
This line doesn't raise Py4JJavaError in pyspark 3.x:

```
> raise converted from None
E pyspark.sql.utils.PythonException:
E   An exception was thrown from the Python worker. Please see the stack trace below.
```
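One way to keep a single test working across pyspark versions is to pick the expected exception by version. This is a sketch assuming the class names quoted in this discussion; the hypothetical helper returns a dotted name so it can be resolved lazily (e.g. with importlib) and stays importable without Spark installed:

```python
def expected_udf_error_name(pyspark_version):
    """Dotted name of the exception a failing Python UDF surfaces.

    pyspark 3.x converts Python-worker failures into
    pyspark.sql.utils.PythonException; 2.x surfaces the raw
    py4j.protocol.Py4JJavaError. (Hypothetical helper for
    illustration, not part of the MLflow test suite.)
    """
    major = int(pyspark_version.split(".", 1)[0])
    if major >= 3:
        return "pyspark.sql.utils.PythonException"
    return "py4j.protocol.Py4JJavaError"
```

The test would resolve the returned name to a class and pass it to `pytest.raises(...)`, rather than branching the whole test body on the Spark version.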
```diff
@@ -40,7 +40,7 @@ onnxruntime
 # mleap format via ``mlflow.spark.log_model``, ``mlflow.spark.save_model``
 mleap
 # Required by mlflow.spark
-pyspark==2.4.0
+pyspark
```
`pyspark>=2.4.0`?
I don't think we need to set the minimum version.
```sh
# MLeap doesn't support spark 3.x (https://github.com/combust/mleap#mleapspark-version)
pip install pyspark==2.4.5
pytest --verbose tests/spark/test_mleap_model_export.py --large
```
Do we need a separate CI action for mleap?
I don't think we need it for now.
* unpin pyspark
* install pyspark 2.4.5
* Separate mleap tests
* Remove ARROW_PRE_0_15_IPC_FORMAT
* fix
* lint
* fix tests
* remove blankline
* Fix error test
* remove unused import

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: Yiqing Wang <yiqing@wangemail.com>
What changes are proposed in this pull request?
Separate MLeap tests and unpin pyspark to use the latest version
How is this patch tested?
Existing tests
Release Notes
Is this a user-facing change?
(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)
What component(s), interfaces, languages, and integrations does this PR affect?
Components

- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/projects: MLproject format, project running backends
- area/scoring: Local serving, model deployment tools, spark UDFs
- area/server-infra: MLflow server, JavaScript dev server
- area/tracking: Tracking Service, tracking client APIs, autologging

Interface

- area/uiux: Front-end, user experience, JavaScript, plotting
- area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support

Language

- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages

Integrations

- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

- rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
- rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- rn/feature - A new user-facing feature worth mentioning in the release notes
- rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
- rn/documentation - A user-facing documentation change worth mentioning in the release notes