Update mlflow.set_tracking_uri to support pathlib.Path (#5820) #5824

Merged

Conversation

cacharle
Contributor

@cacharle cacharle commented May 6, 2022

What changes are proposed in this pull request?

Fixes #5820

How is this patch tested?

Tests that set_tracking_uri converts a path to a URI string.

Does this PR change the documentation?

  • No. You can skip the rest of this section.
  • Yes. Make sure the changed pages / sections render correctly by following the steps below.
  1. Check the status of the ci/circleci: build_doc check. If it's successful, proceed to the
    next step, otherwise fix it.
  2. Click Details on the right to open the job page of CircleCI.
  3. Click the Artifacts tab.
  4. Click docs/build/html/index.html.
  5. Find the changed pages / sections and make sure they render correctly.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

mlflow.set_tracking_uri now supports pathlib.Path.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

@github-actions

github-actions bot commented May 6, 2022

@cacharle Thanks for the contribution! The DCO check failed. Please sign off your commits by following the instructions here: https://github.com/mlflow/mlflow/runs/6320105654. See https://github.com/mlflow/mlflow/blob/master/CONTRIBUTING.rst#sign-your-work for more details.

@github-actions github-actions bot added area/tracking Tracking service, tracking client APIs, autologging rn/feature Mention under Features in Changelogs. labels May 6, 2022
@cacharle cacharle force-pushed the 5820-allow-passing-path-to-set-tracking-uri branch from 2defdd4 to 1cf5e21 Compare May 6, 2022 09:01
Collaborator

@dbczumar dbczumar left a comment

@cacharle LGTM once docs errors (https://github.com/mlflow/mlflow/pull/5824/files#r867309500) are addressed. Thanks so much for your contribution!

@cacharle cacharle force-pushed the 5820-allow-passing-path-to-set-tracking-uri branch from 26135fd to 649bbef Compare May 9, 2022 07:16
@cacharle cacharle marked this pull request as ready for review May 9, 2022 07:26
@harupy
Member

harupy commented May 9, 2022

@cacharle I think we need to add pathlib.Path in nitpick_ignore to fix the build_doc failure:

nitpick_ignore = [
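As a rough sketch of what such an entry could look like in a Sphinx `conf.py`: `nitpick_ignore` takes `(type, target)` tuples. The exact entry below is an assumption for illustration, not the line that was committed in this PR.

```python
# Sketch of a Sphinx conf.py fragment (assumed entry, not the committed change).
# Each item is a (reference type, target) pair that nitpicky mode should not warn about.
nitpick_ignore = [
    ("py:class", "pathlib.Path"),
]
```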

@harupy
Member

harupy commented May 9, 2022

Can you also fix the DCO check failure?

Signed-off-by: Charles Cabergs <charles.cabergs@colruytgroup.com>
Signed-off-by: Charles Cabergs <charles.cabergs@colruytgroup.com>
@cacharle cacharle force-pushed the 5820-allow-passing-path-to-set-tracking-uri branch from 649bbef to 20f9368 Compare May 9, 2022 08:48
…erence

Signed-off-by: Charles Cabergs <charles.cabergs@colruytgroup.com>
Member

@harupy harupy left a comment

LGTM!

@harupy harupy enabled auto-merge (squash) May 9, 2022 09:11


@pytest.mark.parametrize("absolute", [True, False], ids=["absolute", "relative"])
def test_set_tracking_uri_with_path(tmp_path, monkeypatch, absolute):
Member

This test fails on Windows when absolute is False:

D:\a\mlflow\mlflow\mlflow\tracking\_tracking_service\utils.py:75: in set_tracking_uri
    uri = uri.resolve().as_uri()
        uri        = WindowsPath('foo/bar')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = WindowsPath('foo/bar')

    def as_uri(self):
        """Return the path as a 'file' URI."""
        if not self.is_absolute():
>           raise ValueError("relative path can't be expressed as a file URI")
E           ValueError: relative path can't be expressed as a file URI
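The error in the traceback can be reproduced with the standard library alone, independent of MLflow. A minimal sketch (the variable names are illustrative): `as_uri()` refuses any path that is not absolute, so the path has to be made absolute before the call.

```python
from pathlib import Path

relative = Path("foo/bar")
error = None

try:
    relative.as_uri()  # a relative path cannot be expressed as a file URI
except ValueError as exc:
    error = str(exc)
    print(f"as_uri() rejected {relative}: {error}")

# Anchoring the path at the current working directory makes it absolute,
# after which as_uri() succeeds.
absolute = Path.cwd() / relative
print(absolute.as_uri())
```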

Contributor Author

Weird, I call resolve just before as_uri so the path should be absolute.

Member

@harupy harupy May 9, 2022

Member

Can we call mkdir before resolve?

uri.mkdir(exist_ok=True)
uri = uri.resolve().as_uri()

Contributor Author

@cacharle cacharle May 9, 2022

I ran some tests on Windows with Python 3.7 and absolute seemed to work.

I didn't know absolute was deprecated, though.

absolute is apparently more solid now: python/cpython#26153

Contributor Author

Okay, I checked and MLflow creates the directory anyway when we call start_run.

So mkdir should be fine.

Member

@harupy harupy May 9, 2022

@cacharle Does this work on windows?

(Path.cwd() / uri).resolve()

Test on Linux

# absolute path
>>> (Path.cwd() / Path("foo").resolve()).resolve()
PosixPath('/home/haru/Desktop/repositories/mlflow/foo')
# relative path
>>> (Path.cwd() / Path("foo")).resolve()
PosixPath('/home/haru/Desktop/repositories/mlflow/foo')
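The two cases above give the same result because joining onto an already-absolute path discards the left-hand side, so prefixing `Path.cwd()` only affects relative inputs. A small sketch of that idea, where `to_file_uri` is a hypothetical helper name and not a function in MLflow:

```python
from pathlib import Path


def to_file_uri(path: Path) -> str:
    """Hypothetical helper: anchor at the cwd, normalize, then build a file URI."""
    # If `path` is already absolute, `Path.cwd() / path` is just `path`.
    return (Path.cwd() / path).resolve().as_uri()


relative = Path("foo")
absolute = Path.cwd() / "foo"

# Both spellings of the same location produce the same URI.
print(to_file_uri(relative))
print(to_file_uri(absolute))
```

Nothing here creates the directory, which is the point of preferring this over the `mkdir` approach.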

Contributor Author

Works on Windows as well.

Member

@harupy harupy May 10, 2022

Awesome, let's use this to avoid creating a directory. Looks like you already pushed a commit.

Signed-off-by: Charles Cabergs <charles.cabergs@colruytgroup.com>
auto-merge was automatically disabled May 9, 2022 10:19

Head branch was pushed to by a user without write access

…rectory

Signed-off-by: Charles Cabergs <charles.cabergs@colruytgroup.com>
…h on Windows+Python3.7

Signed-off-by: Charles Cabergs <charles.cabergs@colruytgroup.com>
@harupy
Member

harupy commented May 10, 2022

@cacharle Thanks for the update! Triggered the CI checks.

@cacharle
Contributor Author

cacharle commented May 10, 2022

Well, the Path.cwd() / uri worked on my coworker's machine (I don't have access to a Windows machine to run the tests on), so I'm not sure what's happening here. We could just try calling uri.absolute() instead of uri.resolve().

Or uri.absolute().resolve(): .absolute() to handle the weird Windows behavior and .resolve() for a clean URI.
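To illustrate the difference between the two calls: `.absolute()` only prepends the current working directory and leaves `..` segments in place, while `.resolve()` also normalizes them. A minimal sketch with an illustrative path (the directories do not need to exist):

```python
from pathlib import Path

messy = Path("foo/../bar")

# absolute() anchors the path at the cwd but keeps the ".." segment.
made_absolute = messy.absolute()

# resolve() on the already-absolute path collapses ".." into a clean path.
cleaned = made_absolute.resolve()

print(made_absolute)
print(cleaned)
print(cleaned.as_uri())
```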

…n3.7 .resolve() bug

Signed-off-by: Charles Cabergs <charles.cabergs@colruytgroup.com>
@harupy
Member

harupy commented May 10, 2022

The test failure report indicates Path("foo/bar").resolve().as_uri() failed:

__________________ test_set_tracking_uri_with_path[relative] __________________

tmp_path = WindowsPath('C:/Users/runneradmin/AppData/Local/Temp/pytest-of-runneradmin/pytest-0/test_set_tracking_uri_with_pat1')
monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x000001874D42C6C8>
absolute = False

    @pytest.mark.parametrize("absolute", [True, False], ids=["absolute", "relative"])
    def test_set_tracking_uri_with_path(tmp_path, monkeypatch, absolute):
        monkeypatch.chdir(tmp_path)
        path = Path("foo/bar")
        if absolute:
            path = tmp_path / path
        with mock.patch("mlflow.tracking._tracking_service.utils._tracking_uri", None):
            set_tracking_uri(path)
>           assert get_tracking_uri() == path.resolve().as_uri()

absolute   = False
monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x000001874D42C6C8>
path       = WindowsPath('foo/bar')
tmp_path   = WindowsPath('C:/Users/runneradmin/AppData/Local/Temp/pytest-of-runneradmin/pytest-0/test_set_tracking_uri_with_pat1')

D:\a\mlflow\mlflow\tests\tracking\_tracking_service\test_utils.py:374: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = WindowsPath('foo/bar')

    def as_uri(self):
        """Return the path as a 'file' URI."""
        if not self.is_absolute():
>           raise ValueError("relative path can't be expressed as a file URI")
E           ValueError: relative path can't be expressed as a file URI

self       = WindowsPath('foo/bar')

c:\hostedtoolcache\windows\python\3.7.9\x64\lib\pathlib.py:739: ValueError

I think we can just do:

assert get_tracking_uri() == (tmp_path / path).resolve().as_uri()
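The suggested assertion can be tried without MLflow or pytest. In the sketch below, `set_tracking_uri` and `get_tracking_uri` are simplified stand-ins written for illustration (they are not MLflow's implementation), and `check_relative_path` is a hypothetical helper that plays the role of the `tmp_path` and `monkeypatch.chdir` fixtures.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional, Union

_tracking_uri: Optional[str] = None  # stand-in for module-level state


def set_tracking_uri(uri: Union[str, Path]) -> None:
    """Stand-in: convert Path inputs to a file URI, store strings unchanged."""
    global _tracking_uri
    if isinstance(uri, Path):
        uri = uri.absolute().resolve().as_uri()
    _tracking_uri = uri


def get_tracking_uri() -> Optional[str]:
    return _tracking_uri


def check_relative_path() -> bool:
    """Run from a temporary cwd and compare against an absolute expectation."""
    previous = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        os.chdir(tmp_path)
        try:
            path = Path("foo/bar")
            set_tracking_uri(path)
            # The expected value is built from an absolute path, so it does not
            # depend on how resolve() treats a relative, non-existent path.
            return get_tracking_uri() == (tmp_path / path).resolve().as_uri()
        finally:
            os.chdir(previous)


print(check_relative_path())
```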

Signed-off-by: Charles Cabergs <charles.cabergs@colruytgroup.com>
@harupy
Member

harupy commented May 10, 2022

LGTM!

@harupy harupy merged commit a1822cb into mlflow:master May 10, 2022
Labels
area/tracking Tracking service, tracking client APIs, autologging rn/feature Mention under Features in Changelogs.
Development

Successfully merging this pull request may close these issues.

[FR] Allow passing pathlib.Path to set_tracking_uri
3 participants