Create job experiment if the start_run source is a job #5267

Merged
sunishsheth2009 merged 6 commits into master from sunish-default-exp-jos
Jan 19, 2022

Conversation

sunishsheth2009
Collaborator

What changes are proposed in this pull request?

Create job experiment if the start_run source is a job

How is this patch tested?

Added some logs:

Input to create experiment

2022/01/13 19:49:33.666 INFO MlflowBackend[tenant=868247350988813 root=ServiceMain-eec664f1855c0002 parent=HttpServer-eec664f1855ec357 op=ServerBackend-18c564a1126911d6 traceId=274a307994797ce8658da572b6af27b6 spanId=66c31d6457b930f5][MlflowBackend.scala:475]: Creating experiment: name: "job:/429899706490992"
tags {
  key: "mlflow.databricks.jobTypeInfo"
  value: "NORMAL"
}
tags {
  key: "mlflow.experiment.sourceType"
  value: "JOB"
}
tags {
  key: "mlflow.experiment.sourceId"
  value: "429899706490992"
}
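
The request logged above can be summarized in a few lines. This is a minimal sketch, not the code added by this PR: the helper name `build_job_experiment_request` is hypothetical, and the name format and tag keys are copied from the log output above rather than from the MLflow source.

```python
def build_job_experiment_request(job_id, job_type_info="NORMAL"):
    """Return the (name, tags) pair seen in the "Creating experiment" log line.

    Hypothetical helper for illustration only; it is not an MLflow API.
    """
    job_id = str(job_id)
    # The experiment name in the log is "job:/" followed by the job ID.
    name = f"job:/{job_id}"
    tags = {
        "mlflow.databricks.jobTypeInfo": job_type_info,
        "mlflow.experiment.sourceType": "JOB",
        "mlflow.experiment.sourceId": job_id,
    }
    return name, tags


name, tags = build_job_experiment_request(429899706490992)
print(name)
print(tags)
```

The tags dictionary has the same three keys as the "tags from fluent py" line printed by the client further down.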

Output:

2022/01/13 19:49:33.667 INFO MlflowBackend[tenant=868247350988813 root=ServiceMain-eec664f1855c0002 parent=HttpServer-eec664f1855ec357 op=ServerBackend-18c564a1126911d6 traceId=274a307994797ce8658da572b6af27b6 spanId=66c31d6457b930f5][MlflowBackend.scala:1850]: Create experiment: sourceId Some(429899706490992) and sourceType is Some(JOB)
2022/01/13 19:49:33.667 INFO MlflowBackend[tenant=868247350988813 root=ServiceMain-eec664f1855c0002 parent=HttpServer-eec664f1855ec357 op=ServerBackend-18c564a1126911d6 traceId=274a307994797ce8658da572b6af27b6 spanId=66c31d6457b930f5][MlflowBackend.scala:1854]: In job experiment use case
2022/01/13 19:49:33.846 WARN DefaultEventProcessor[root=ServiceMain-18c564a112690002 parent=Thread-1 op=ServiceMain-18c564a112690002][:]: Received HTTP error 429 for posting 2 event(s) - will retry
2022/01/13 19:49:33.846 WARN DefaultEventProcessor[root=ServiceMain-18c564a112690002 parent=Thread-1 op=ServiceMain-18c564a112690002][:]: Will retry posting 2 event(s) after 1000 milliseconds
[901.385s][info   ][gc] GC(59) Pause Young (Normal) (G1 Evacuation Pause) 235M->118M(281M) 17.621ms
2022/01/13 19:49:34.106 INFO MlflowBackend[tenant=868247350988813 root=ServiceMain-eec664f1855c0002 parent=HttpServer-eec664f1855ec374 op=ServerBackend-18c564a1126911de traceId=cc7f73f12aa9063f389db65a1747776c spanId=c2c83dbceffff1d6][MlflowBackend.scala:681]: Create run: experiment_id: "bd5ae78fb6a84d8dab899a65440fe289"
user_id: "root"
start_time: 1642103373919
tags {
  key: "mlflow.user"
  value: "root"
}
tags {
  key: "mlflow.source.name"
  value: "jobs/429899706490992/run/904118399597297"
}
tags {
  key: "mlflow.source.type"
  value: "JOB"
}
tags {
  key: "mlflow.databricks.jobID"
  value: "429899706490992"
}
tags {
  key: "mlflow.databricks.jobRunID"
  value: "904118399597297"
}
tags {
  key: "mlflow.databricks.jobType"
  value: "notebook"
}
tags {
  key: "mlflow.databricks.webappURL"
  value: "https://test-shard-sunish.dev.databricks.com"
}
tags {
  key: "mlflow.databricks.workspaceURL"
  value: "https://test-shard-sunish.dev.databricks.com"
}
tags {
  key: "mlflow.databricks.workspaceID"
  value: "868247350988813"
}
tags {
  key: "mlflow.databricks.cluster.id"
  value: "0110-230652-e9pz4u4p"
}
tags {
  key: "mlflow.databricks.notebook.commandID"
  value: "4868218422452830782_4685242605830543012_job-429899706490992-run-904118399597297-action-3598691533021880"
}
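
When checking logs like the one above, it can help to turn the `tags { key: ... value: ... }` blocks into a dictionary. The following is a rough sketch under stated assumptions: `parse_logged_tags` is a hypothetical helper, the regular expression only handles the simple quoted form shown in this log, and the sample text is a shortened copy of the run tags above.

```python
import re

# Matches one block of the form: tags { key: "..." value: "..." }
TAG_BLOCK = re.compile(r'tags\s*\{\s*key:\s*"([^"]*)"\s*value:\s*"([^"]*)"\s*\}')


def parse_logged_tags(log_text):
    """Collect every tags block in the log text into a {key: value} dict."""
    return dict(TAG_BLOCK.findall(log_text))


sample = '''
tags {
  key: "mlflow.source.name"
  value: "jobs/429899706490992/run/904118399597297"
}
tags {
  key: "mlflow.databricks.jobID"
  value: "429899706490992"
}
tags {
  key: "mlflow.databricks.jobRunID"
  value: "904118399597297"
}
'''

run_tags = parse_logged_tags(sample)

# In this log, the source name is built from the job ID and job run ID tags.
expected_source = "jobs/{}/run/{}".format(
    run_tags["mlflow.databricks.jobID"], run_tags["mlflow.databricks.jobRunID"]
)
print(run_tags["mlflow.source.name"] == expected_source)
```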

Error:

tags from fluent py {'mlflow.databricks.jobTypeInfo': 'NORMAL', 'mlflow.experiment.sourceType': 'JOB', 'mlflow.experiment.sourceId': '429899706490992'}
2022/01/13 19:49:33 INFO mlflow.tracking.fluent: Job experiment with experiment_id 'bd5ae78fb6a84d8dab899a65440fe289' created
RestException: INVALID_PARAMETER_VALUE: experiment_id parameter must be a long, found 'bd5ae78fb6a84d8dab899a65440fe289'

The error comes from the create-run path here: https://livegrep.dev.databricks.com/view/databricks/universe/mlflow/src/main/scala/com/databricks/mlflow/MlflowBackend.scala#L683

This will be fixed in follow-up PRs.
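
To illustrate why the run creation is rejected, here is a minimal sketch of the kind of check the error message implies. It is an assumption, not the backend's actual Scala code: `is_long_experiment_id` is a hypothetical name, and "long" is taken to mean a signed 64-bit integer.

```python
def is_long_experiment_id(experiment_id):
    """Return True if experiment_id parses with int() and fits in a signed 64-bit long.

    Hypothetical approximation of the backend check; the real check is not shown in this PR.
    """
    try:
        value = int(experiment_id)
    except (TypeError, ValueError):
        return False
    return -(2**63) <= value < 2**63


# The job ID from the logs is numeric, but the new job experiment ID is a hex string.
print(is_long_experiment_id("429899706490992"))
print(is_long_experiment_id("bd5ae78fb6a84d8dab899a65440fe289"))
```

The second call returns False because the ID contains hexadecimal letters, which matches the `INVALID_PARAMETER_VALUE` error above.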

Job run: https://dbc-98e50cb7-ce96.dev.databricks.com/?o=868247350988813#job/429899706490992/run/989940483449209

Does this PR change the documentation?

  • No. You can skip the rest of this section.
  • Yes. Make sure the changed pages / sections render correctly by following the steps below.
  1. Check the status of the ci/circleci: build_doc check. If it's successful, proceed to the
    next step, otherwise fix it.
  2. Click Details on the right to open the job page of CircleCI.
  3. Click the Artifacts tab.
  4. Click docs/build/html/index.html.
  5. Find the changed pages / sections and make sure they render correctly.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

@github-actions

@sunishsheth2009 Thanks for the contribution! The DCO check failed. Please sign off your commits by following the instructions here: https://github.com/mlflow/mlflow/runs/4811177187. See https://github.com/mlflow/mlflow/blob/master/CONTRIBUTING.rst#sign-your-work for more details.

@github-actions bot added the area/tracking, integrations/databricks, and rn/feature labels on Jan 14, 2022
Collaborator

@apurva-koti apurva-koti left a comment

lgtm!

Signed-off-by: Sunish Sheth <sunishsheth2009@gmail.com>
Signed-off-by: Sunish Sheth <sunishsheth2009@gmail.com>
Signed-off-by: Sunish Sheth <sunishsheth2009@gmail.com>
Signed-off-by: Sunish Sheth <sunishsheth2009@gmail.com>
Signed-off-by: Sunish Sheth <sunishsheth2009@gmail.com>
Signed-off-by: Sunish Sheth <sunishsheth2009@gmail.com>
@sunishsheth2009 merged commit 1ce3b5e into master on Jan 19, 2022
@sunishsheth2009 deleted the sunish-default-exp-jos branch on March 10, 2022 22:44
Labels
area/tracking Tracking service, tracking client APIs, autologging integrations/databricks Databricks integrations rn/feature Mention under Features in Changelogs.
4 participants