Move utility functions used in sklearn autologging to reuse them in pyspark autologging #4252

Merged: 17 commits, Apr 19, 2021

@harupy harupy commented Apr 15, 2021

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

What changes are proposed in this pull request?

Move the utility functions used in sklearn autologging to reuse them in pyspark autologging.

How is this patch tested?

Existing unit tests

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: Local serving, model deployment tools, spark UDFs
  • area/server-infra: MLflow server, JavaScript dev server
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, JavaScript, plotting
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
@github-actions github-actions bot added the rn/none label (List under Small Changes in Changelogs.) Apr 15, 2021
Comment on lines 434 to 443
def _get_training_session():
    """
    Returns a session manager for nested autologging runs.
    """
    # NOTE: The current implementation doesn't guarantee thread-safety, but that's okay for now
    # because:
    # 1. We don't currently have any use cases for allow_children=True.
    # 2. The list append & pop operations are thread-safe, so we will always clear the session stack
    #    once all _SklearnTrainingSessions exit.
    class _TrainingSession(object):
@harupy harupy Apr 15, 2021

Created this function to avoid using the same session manager across different flavors.

_SklearnSession = _get_training_session()
_PysparkSession = _get_training_session()
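
To illustrate why a factory helps here, the following is a minimal sketch, not MLflow's actual implementation. The names get_training_session_sketch, TrainingSession, allow_children, and should_log are assumptions made for this example. The point it shows is that a class defined inside a function is created anew on each call, so the class-level session stack is not shared between the sklearn and pyspark managers.

```python
def get_training_session_sketch():
    """Build and return a new session-manager class with its own stack."""

    class TrainingSession:
        # Created anew on every factory call, so each returned class
        # tracks its own nesting independently of the others.
        _session_stack = []

        def __init__(self, allow_children=True):
            self.allow_children = allow_children
            self._parent = None

        def __enter__(self):
            stack = TrainingSession._session_stack
            self._parent = stack[-1] if stack else None
            stack.append(self)
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            TrainingSession._session_stack.pop()

        def should_log(self):
            # Log at the top level, or when the enclosing session permits it.
            return self._parent is None or self._parent.allow_children


    return TrainingSession


# Hypothetical per-flavor managers, mirroring the two lines above.
SklearnSession = get_training_session_sketch()
PysparkSession = get_training_session_sketch()

with SklearnSession(allow_children=False) as parent:
    with SklearnSession() as child:
        child_logs = child.should_log()  # nested under the same flavor
    with PysparkSession() as other:
        other_logs = other.should_log()  # different flavor, separate stack

print(child_logs, other_logs)  # prints "False True"
```

With a single shared class, the pyspark session in this sketch would see the sklearn session as its parent and be suppressed as well.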

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
@@ -15,3 +21,64 @@ def test_get_unique_resource_id_with_invalid_max_length_throws_exception():

    with pytest.raises(ValueError):
        get_unique_resource_id(max_length=0)


def test_truncate_dict():
@harupy harupy Apr 15, 2021

Added a few tests for _chunk_dict, _truncate_dict, and _get_fully_qualified_class_name.
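
For readers unfamiliar with these helpers, here is a rough sketch of what functions with these names could plausibly do, inferred from the names alone. The signatures, the `_sketch` suffix, and the truncation behavior are assumptions for illustration and may differ from the code moved in this PR.

```python
from fractions import Fraction
from itertools import islice


def chunk_dict_sketch(d, chunk_size):
    """Yield successive sub-dicts of `d` holding at most `chunk_size` items."""
    keys = iter(d)
    for _ in range(0, len(d), chunk_size):
        yield {k: d[k] for k in islice(keys, chunk_size)}


def _shorten(value, max_length):
    # Return the value unchanged if its string form fits within the limit.
    text = str(value)
    if max_length is None or len(text) <= max_length:
        return value
    return text[:max_length]


def truncate_dict_sketch(d, max_key_length=None, max_value_length=None):
    """Cut over-long keys and values down to the given lengths.

    Note: two long keys that share a prefix would collide after truncation;
    this sketch does not handle that case.
    """
    return {
        _shorten(k, max_key_length): _shorten(v, max_value_length)
        for k, v in d.items()
    }


def fully_qualified_class_name_sketch(obj):
    """Return '<module>.<class name>' for the class of `obj`."""
    cls = type(obj)
    return cls.__module__ + "." + cls.__name__


chunks = list(chunk_dict_sketch({"a": 1, "b": 2, "c": 3}, chunk_size=2))
truncated = truncate_dict_sketch(
    {"alpha": "0123456789"}, max_key_length=3, max_value_length=4
)
class_name = fully_qualified_class_name_sketch(Fraction(1, 2))

print(chunks)      # [{'a': 1, 'b': 2}, {'c': 3}]
print(truncated)   # {'alp': '0123'}
print(class_name)  # fractions.Fraction
```

Chunking and truncation of this kind are typically useful when a logging backend limits how many parameters fit in one request or how long a key or value may be.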

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
@harupy harupy changed the title from "Reorganize utility functions used in sklearn to reuse them in pyspark autologging" to "Move utility functions used in sklearn to reuse them in pyspark autologging" Apr 15, 2021
@harupy harupy changed the title from "Move utility functions used in sklearn to reuse them in pyspark autologging" to "Move utility functions used in sklearn autologging to reuse them in pyspark autologging" Apr 15, 2021
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
@@ -16,7 +16,6 @@ sklearn:
    maximum: "0.24.1"
    requirements: ["matplotlib"]
    run: |
      pytest tests/sklearn/test_sklearn_training_session.py --large

tests/sklearn/test_sklearn_training_session.py has been renamed to tests/autologging/test_training_session.py, which is executed in dev/run-python-flavor-tests.sh.

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
@harupy harupy merged commit c603c37 into mlflow:master Apr 19, 2021
@harupy harupy deleted the move-sklearn-utils branch April 19, 2021 03:48
YQ-Wang pushed a commit to YQ-Wang/mlflow that referenced this pull request May 29, 2021
…yspark autologging (mlflow#4252)

* Reorganize utility functions used in sklearn

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* nit

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* add _get_fully_qualified_class_name

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Add tests

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* lint

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* move file

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* refactor

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* remove duplicated file

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* generate _TrainingSession in each test

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Fix test

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* run test_training_session in large test

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* fix test

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* docstrings

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Address comments

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* use util functions

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* fix

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: Yiqing Wang <yiqing@wangemail.com>
harupy added a commit to wamartin-aml/mlflow that referenced this pull request Jun 7, 2021
…yspark autologging (mlflow#4252)