mlflow / mlflow

Move utility functions used in sklearn autologging to reuse them in pyspark autologging #4252

Merged

harupy merged 17 commits into mlflow:master from harupy:move-sklearn-utils

Apr 19, 2021


harupy commented Apr 15, 2021

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

What changes are proposed in this pull request?

Move the utility functions used in sklearn autologging to reuse them in pyspark autologging.

How is this patch tested?

Existing unit tests

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/projects: MLproject format, project running backends
area/scoring: Local serving, model deployment tools, spark UDFs
area/server-infra: MLflow server, JavaScript dev server
area/tracking: Tracking Service, tracking client APIs, autologging

Interface

area/uiux: Front-end, user experience, JavaScript, plotting
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

          Reorganize utility functions used in sklearn

d818bcc

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

github-actions bot added the rn/none List under Small Changes in Changelogs. label

harupy commented

mlflow/utils/autologging_utils/__init__.py (outdated)

Comment on lines 434 to 443

+              def _get_training_session():
+                  """
+                  Returns a session manager for nested autologging runs.
+                  """
+                  # NOTE: The current implementation doesn't guarantee thread-safety, but that's okay for now
+                  # because:
+                  # 1. We don't currently have any use cases for allow_children=True.
+                  # 2. The list append & pop operations are thread-safe, so we will always clear the session stack
+                  #    once all _SklearnTrainingSessions exit.
+                  class _TrainingSession(object):

harupy Apr 15, 2021

Created this function to avoid using the same session manager across different flavors.

_SklearnSession = _get_training_session()
_PysparkSession = _get_training_session()

harupy added 9 commits

April 15, 2021 09:28

nit

3fa2359

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>


          add _get_fully_qualified_class_name

ed76382

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>


          Add tests

748a29e

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>


          lint

89e34ff

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>


          move file

2b7b717

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>


          refactor

2b18499

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>


          remove duplicated file

d0d9006

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>


          generate _TrainingSession in each test

609cb9f

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>


          Fix test

d592c8e

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

harupy commented

tests/utils/test_utils.py

                   with pytest.raises(ValueError):
                       get_unique_resource_id(max_length=0)
+              def test_truncate_dict():

harupy Apr 15, 2021

Added a few tests for _chunk_dict, _truncate_dict, and _get_fully_qualified_class_name.

          run test_training_session in large test

cba2c30

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

harupy changed the title from "Reorganize utility functions used in sklearn to reuse them in pyspark autologging" to "Move utility functions used in sklearn to reuse them in pyspark autologging"

harupy changed the title from "Move utility functions used in sklearn to reuse them in pyspark autologging" to "Move utility functions used in sklearn autologging to reuse them in pyspark autologging"


          fix test

867a1a4

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

harupy commented

mlflow/ml-package-versions.yml

@@ -16,7 +16,6 @@ sklearn:
                   maximum: "0.24.1"
                   requirements: ["matplotlib"]
                   run: |
-                    pytest tests/sklearn/test_sklearn_training_session.py --large

harupy Apr 15, 2021

tests/sklearn/test_sklearn_training_session.py has been renamed to tests/autologging/test_training_session.py which is executed in dev/run-python-flavor-tests.sh.

          docstrings

b14250b

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

WeichenXu123 reviewed

mlflow/utils/autologging_utils/__init__.py (outdated)

WeichenXu123 reviewed

tests/autologging/test_training_session.py (outdated)


          Address comments

322d066

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

WeichenXu123 approved these changes

harupy added 3 commits

April 19, 2021 12:02


          resolve conflicts

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>


          use util functions

4dc9a0e

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

fix

d9033da

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

harupy merged commit c603c37 into mlflow:master

harupy deleted the move-sklearn-utils branch

April 19, 2021 03:48

YQ-Wang pushed a commit to YQ-Wang/mlflow that referenced this pull request


Move utility functions used in sklearn autologging to reuse them in pyspark autologging (mlflow#4252)

e1e5bfa

* Reorganize utility functions used in sklearn

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* nit

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* add _get_fully_qualified_class_name

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Add tests

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* lint

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* move file

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* refactor

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* remove duplicated file

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* generate _TrainingSession in each test

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Fix test

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* run test_training_session in large test

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* fix test

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* docstrings

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Address comments

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* use util functions

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* fix

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: Yiqing Wang <yiqing@wangemail.com>

harupy added a commit to wamartin-aml/mlflow that referenced this pull request


Move utility functions used in sklearn autologging to reuse them in pyspark autologging (mlflow#4252)

835757a
