Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add autodetection of job environments to R client #2272

Merged
merged 9 commits into from
Jan 13, 2020

Conversation

dbczumar
Copy link
Collaborator

@dbczumar dbczumar commented Jan 8, 2020

What changes are proposed in this pull request?

Adds autodetection of Databricks job environments to the MLflow R client.

How is this patch tested?

These MLflow client changes were tested manually on Databricks against an updated Spark image that defines subroutines for fetching job information from the SparkR driver.

Unit tests are pending

Release Notes

The MLflow R client now detects and records Databricks Job information when the client is used within a Databricks Job.

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s) does this PR affect?

  • UI
  • CLI
  • API
  • REST-API
  • Examples
  • Docs
  • Tracking
  • Projects
  • Artifacts
  • Models
  • Scoring
  • Serving
  • R
  • Java
  • Python

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

@dbczumar dbczumar added the rn/feature Mention under Features in Changelogs. label Jan 8, 2020
Copy link
Contributor

@tomasatdatabricks tomasatdatabricks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I had just a minor comment / suggestion.

experiment_id = experiment_id %||% notebook_info$id,
...
)
))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be cleaner to do this as if(notebook) {...} elseif (job) {...} else {nextMethod()} ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would look cleaner, but I think we'd have to fetch the job info prior to the if/else-if/else block to clean this up, resulting in extra function call within Databricks notebooks. Let me know if you see a way around this / think that's a better idea than the current solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see. I think that's ok then.

Copy link
Contributor

@tomasatdatabricks tomasatdatabricks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

experiment_id = experiment_id %||% notebook_info$id,
...
)
))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see. I think that's ok then.

@dbczumar dbczumar merged commit 3b8f9d8 into mlflow:master Jan 13, 2020
amrqt added a commit to criteo-forks/mlflow that referenced this pull request Jan 15, 2020
* Add autodetection of job environments to R client (mlflow#2272)

* R client detection

* Efficiency

* Simplify

* Return tags

* Add test cases

* Lint

* Tweak test name

* EarlyStopping Callback support for Keras Autologging (mlflow#2219)

Keras autologging now supports the EarlyStopping callback. If EarlyStopping.restore_best_weights==True, then the metrics of the restored model will be logged as an extra step.

* Add REPL-aware listener for Spark datasource autologging (mlflow#2249)

Add REPL-aware listener for Spark datasource autologging (mlflow#2249)

* Fix XGBoost and LightGBM flavor tests (mlflow#2244)

Add objective and num_class to xgb.train() and lgb.train() because they try to solve a regression task by default, but the iris dataset is a dataset for a classification task.

* Add status in RunView and ExperimentRunsTable (mlflow#1816)

* Changes needed to support the sqlplugin (mlflow#2285)

* Changes needed to support the sqlplugin

* Edited plugin name in setup file

* Updated plugin name in setup file

Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com>

* remove version suffix dev0

* add pillow fix

Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com>
Co-authored-by: juntai-zheng <39497939+juntai-zheng@users.noreply.github.com>
Co-authored-by: Siddharth Murching <smurching@gmail.com>
Co-authored-by: Harutaka Kawamura <hkawamura0130@gmail.com>
Co-authored-by: Nicolas Laille <nlaille@users.noreply.github.com>
Co-authored-by: Avrilia Floratou <avflor@microsoft.com>
amrqt added a commit to criteo-forks/mlflow that referenced this pull request Jan 16, 2020
* Add autodetection of job environments to R client (mlflow#2272)

* R client detection

* Efficiency

* Simplify

* Return tags

* Add test cases

* Lint

* Tweak test name

* EarlyStopping Callback support for Keras Autologging (mlflow#2219)

Keras autologging now supports the EarlyStopping callback. If EarlyStopping.restore_best_weights==True, then the metrics of the restored model will be logged as an extra step.

* Add REPL-aware listener for Spark datasource autologging (mlflow#2249)

Add REPL-aware listener for Spark datasource autologging (mlflow#2249)

* Fix XGBoost and LightGBM flavor tests (mlflow#2244)

Add objective and num_class to xgb.train() and lgb.train() because they try to solve a regression task by default, but the iris dataset is a dataset for a classification task.

* Add status in RunView and ExperimentRunsTable (mlflow#1816)

* Changes needed to support the sqlplugin (mlflow#2285)

* Changes needed to support the sqlplugin

* Edited plugin name in setup file

* Updated plugin name in setup file

Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com>

* Elevate how to load mleap flavor (mlflow#2211)

* Elevate how to load mleap flavor.

* Load using loadPipeline.

* Get rid of extra char.

Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com>

* pillow fix (mlflow#2307)

* Add EarlyStopping integration to TensorFlow.Keras autologging (mlflow#2301)

Merging TF.Keras EarlyStopping integration

* Document MLflow plugin system (mlflow#2270)

Adds an mlflow.org doc page explaining how to write & use MLflow plugins.

Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com>
Co-authored-by: juntai-zheng <39497939+juntai-zheng@users.noreply.github.com>
Co-authored-by: Siddharth Murching <smurching@gmail.com>
Co-authored-by: Harutaka Kawamura <hkawamura0130@gmail.com>
Co-authored-by: Nicolas Laille <nlaille@users.noreply.github.com>
Co-authored-by: Avrilia Floratou <avflor@microsoft.com>
Co-authored-by: Stephanie Bodoff <stephanie.bodoff@databricks.com>
avflor pushed a commit to avflor/mlflow that referenced this pull request Aug 22, 2020
* R client detection

* Efficiency

* Simplify

* Return tags

* Add test cases

* Lint

* Tweak test name
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rn/feature Mention under Features in Changelogs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants