Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate MLeap tests and unpin pyspark #4243

Merged
merged 10 commits into from
Apr 14, 2021
Merged

Conversation

harupy
Copy link
Member

@harupy harupy commented Apr 13, 2021

What changes are proposed in this pull request?

Separate MLeap tests and unpin pyspark to use the latest version

How is this patch tested?

Existing tests

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: Local serving, model deployment tools, spark UDFs
  • area/server-infra: MLflow server, JavaScript dev server
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, JavaScript, plotting
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
@github-actions github-actions bot added the rn/none List under Small Changes in Changelogs. label Apr 13, 2021
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Comment on lines -57 to -61
# setting this env variable is needed when using Spark with Arrow >= 0.15.0
# because of a change in Arrow IPC format
# https://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html# \
# compatibiliy-setting-for-pyarrow--0150-and-spark-23x-24x
os.environ["ARROW_PRE_0_15_IPC_FORMAT"] = "1"
Copy link
Member Author

@harupy harupy Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes the following error in spark 3.x:

E           RuntimeError: Arrow legacy IPC format is not supported in PySpark, please unset ARROW_PRE_0_15_IPC_FORMAT

https://github.com/mlflow/mlflow/runs/2330038685#step:6:649

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Comment on lines -143 to 139
with pytest.raises(Py4JJavaError):
with pytest.raises(pyspark.sql.utils.PythonException):
res = data.withColumn("res1", udf("a", "b")).select("res1").toPandas()
Copy link
Member Author

@harupy harupy Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line doesn't raise Py4JJavaError in pyspark 3.x.

>               raise converted from None
E               pyspark.sql.utils.PythonException: 
E                 An exception was thrown from the Python worker. Please see the stack trace below.

https://github.com/mlflow/mlflow/runs/2330220978#step:6:611

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
@harupy harupy changed the title [WIP] Separate MLeap tests Separate MLeap tests Apr 13, 2021
@harupy harupy changed the title Separate MLeap tests Separate MLeap tests and unpin pyspark Apr 13, 2021
@@ -40,7 +40,7 @@ onnxruntime
# mleap format via ``mlflow.spark.log_model``, ``mlflow.spark.save_model``
mleap
# Required by mlflow.spark
pyspark==2.4.0
pyspark
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pyspark>=2.4.0 ?

Copy link
Member Author

@harupy harupy Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to set the minimum version.


# MLeap doesn't support spark 3.x (https://github.com/combust/mleap#mleapspark-version)
pip install pyspark==2.4.5
pytest --verbose tests/spark/test_mleap_model_export.py --large
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a separate CI action for mleap ?

Copy link
Member Author

@harupy harupy Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need it for now.

@harupy harupy merged commit 77e5c44 into mlflow:master Apr 14, 2021
@harupy harupy deleted the separate-mleap branch April 14, 2021 07:36
YQ-Wang pushed a commit to YQ-Wang/mlflow that referenced this pull request May 29, 2021
* unpin pyspark

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* install pyspark 2.4.5

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Separate mleap tests

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Remove ARROW_PRE_0_15_IPC_FORMAT

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* fix

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* lint

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* fix tests

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* remove blankline

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Fix error test

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* remove unused import

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: Yiqing Wang <yiqing@wangemail.com>
harupy added a commit to wamartin-aml/mlflow that referenced this pull request Jun 7, 2021
* unpin pyspark

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* install pyspark 2.4.5

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Separate mleap tests

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Remove ARROW_PRE_0_15_IPC_FORMAT

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* fix

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* lint

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* fix tests

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* remove blankline

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Fix error test

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* remove unused import

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rn/none List under Small Changes in Changelogs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants