Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce get_parent_run fluent API #8493

Merged
merged 7 commits into from
May 24, 2023

Conversation

annzhang-db
Copy link
Collaborator

@annzhang-db annzhang-db commented May 22, 2023

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

  • Add get_parent_run to mlflow client API
  • Add get_parent_run fluent API

How is this patch tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests (describe details, including test results, below)

Does this PR change the documentation?

  • No. You can skip the rest of this section.
  • Yes. Make sure the changed pages / sections render correctly in the documentation preview.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Introduce mlflow.get_parent_run() fluent API

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>
Signed-off-by: Ann Zhang <ann.zhang@databricks.com>
@mlflow-automation
Copy link
Collaborator

mlflow-automation commented May 22, 2023

Documentation preview for 6a6ff89 will be available here when this CircleCI job completes successfully.

More info

@github-actions github-actions bot added area/tracking Tracking service, tracking client APIs, autologging rn/feature Mention under Features in Changelogs. labels May 22, 2023
Copy link
Collaborator

@prithvikannan prithvikannan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great @annzhang-db . Can we also extend this to the mlflow client APIs (see mlflow/tracking/client.py)? Similar to the way we call MlflowClient().get_run(run_id), it would be helpful to call MlflowClient().get_parent_run(run_id).

Copy link
Collaborator

@sunishsheth2009 sunishsheth2009 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me :) Awesome work

@@ -518,6 +518,43 @@ def get_run(run_id: str) -> Run:
return MlflowClient().get_run(run_id)


def get_parent_run(run_id: str) -> Optional[Run]:
"""
Gets the parent run for the given run id.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gets the parent run for the given run id if one exists

Comment on lines 551 to 555
child_run = MlflowClient().get_run(run_id)
parent_run_id = child_run.data.tags.get(MLFLOW_PARENT_RUN_ID)
if parent_run_id is None:
return None
return MlflowClient().get_run(parent_run_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
child_run = MlflowClient().get_run(run_id)
parent_run_id = child_run.data.tags.get(MLFLOW_PARENT_RUN_ID)
if parent_run_id is None:
return None
return MlflowClient().get_run(parent_run_id)
client = MlflowClient()
child_run = client.get_run(run_id)
parent_run_id = child_run.data.tags.get(MLFLOW_PARENT_RUN_ID)
if parent_run_id is None:
return None
return client.get_run(parent_run_id)

Can we reuse the client?

Comment on lines 1258 to 1269
def test_get_parent_run():
parent_run_id = mlflow.start_run().info.run_id
mlflow.log_param("a", 1)
mlflow.log_metric("b", 2.0)
child_run_id = mlflow.start_run(nested=True).info.run_id
mlflow.end_run()
mlflow.end_run()

parent_run = mlflow.get_parent_run(child_run_id)
assert parent_run.info.run_id == parent_run_id
assert parent_run.data.metrics == {"b": 2.0}
assert parent_run.data.params == {"a": "1"}
Copy link
Member

@harupy harupy May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also add a test that ensures get_parent_run returns None if the parent run doesn't exist?

Copy link
Member

@harupy harupy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left comments, otherwise LGTM!

Comment on lines 1259 to 1264
parent_run_id = mlflow.start_run().info.run_id
mlflow.log_param("a", 1)
mlflow.log_metric("b", 2.0)
child_run_id = mlflow.start_run(nested=True).info.run_id
mlflow.end_run()
mlflow.end_run()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
parent_run_id = mlflow.start_run().info.run_id
mlflow.log_param("a", 1)
mlflow.log_metric("b", 2.0)
child_run_id = mlflow.start_run(nested=True).info.run_id
mlflow.end_run()
mlflow.end_run()
with mlflow.start_run() as parent:
mlflow.log_param("a", 1)
mlflow.log_metric("b", 2.0)
with mlflow.start_run(nested=True) as child:

Can we use with mlflow.start_run()?

Copy link
Collaborator

@WeichenXu123 WeichenXu123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once @harupy 's comments are addressed

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>
Signed-off-by: Ann Zhang <ann.zhang@databricks.com>
Copy link
Collaborator

@prithvikannan prithvikannan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work @annzhang-db ! Just one small nit comment and a test

with mlflow.start_run(nested=True) as child_run:
child_run_id = child_run.info.run_id

parent_run = mlflow.get_parent_run(child_run_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we update this to demonstrate the MlflowClient approach? something like

parent_run = MlflowClient().get_parent_run

child_run_id: 7d175204675e40328e46d9a6a5a7ee6a
parent_run_id: 8979459433a24a52ab3be87a229a9cdf
"""
return MlflowClient().get_parent_run(run_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

neat!

mlflow/tracking/client.py Show resolved Hide resolved
Signed-off-by: Ann Zhang <ann.zhang@databricks.com>
Signed-off-by: Ann Zhang <ann.zhang@databricks.com>
Signed-off-by: Ann Zhang <ann.zhang@databricks.com>
@annzhang-db annzhang-db enabled auto-merge (squash) May 24, 2023 17:35
@annzhang-db annzhang-db merged commit 9ec828d into mlflow:master May 24, 2023
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/tracking Tracking service, tracking client APIs, autologging rn/feature Mention under Features in Changelogs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants