Add autodetection of job environments to R client #2272

dbczumar · 2020-01-08T02:06:05Z

What changes are proposed in this pull request?

Adds autodetection of Databricks job environments to the MLflow R client.

How is this patch tested?

These MLflow client changes were tested manually on Databricks against an updated Spark image that defines subroutines for fetching job information from the SparkR driver.

Unit tests are pending.

Release Notes

The MLflow R client now detects and records Databricks Job information when the client is used within a Databricks Job.

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s) does this PR affect?

How should the PR be classified in the release notes? Choose one:

rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

tomasatdatabricks

Looks good; I just had a minor comment / suggestion.

tomasatdatabricks · 2020-01-10T23:23:14Z

mlflow/R/mlflow/R/databricks-utils.R

        experiment_id = experiment_id %||% notebook_info$id,
        ...
-      )
+      ))


Would it be cleaner to do this as if (notebook) {...} else if (job) {...} else {NextMethod()}?

dbczumar · 2020-01-13

This would look cleaner, but I think we'd have to fetch the job info before the if / else if / else block to make that work, resulting in an extra function call within Databricks notebooks. Let me know if you see a way around this or think that's a better idea than the current solution.

Oh I see. I think that's ok then.
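For readers following the thread, here is a minimal sketch of the trade-off discussed above: nesting the job lookup inside the non-notebook branch means it is only called when the notebook check fails. Everything in it is hypothetical and written for illustration (`fetch_notebook_info`, `fetch_job_info`, `resolve_context`, the `calls` counter, and the local `%||%` helper); it is not the code in databricks-utils.R and does not call any MLflow or Databricks API.

```r
# Hypothetical stand-ins for illustration only; not the MLflow or Databricks API.
`%||%` <- function(x, y) if (is.null(x)) y else x

# Counter used to show how often the (assumed costly) job lookup runs.
calls <- new.env()
calls$job <- 0

fetch_notebook_info <- function(in_notebook) {
  if (in_notebook) list(id = "123", path = "/Users/me/nb") else list(id = NA, path = NA)
}

fetch_job_info <- function(in_job) {
  calls$job <- calls$job + 1
  if (in_job) list(job_id = "42", run_id = "7") else list(job_id = NA, run_id = NA)
}

resolve_context <- function(in_notebook, in_job, experiment_id = NULL) {
  notebook_info <- fetch_notebook_info(in_notebook)
  if (!is.na(notebook_info$id) && !is.na(notebook_info$path)) {
    # Notebook detected: return early, so the job lookup is never made.
    return(list(source = "notebook",
                experiment_id = experiment_id %||% notebook_info$id))
  }
  # Only reached outside a notebook, so the extra call is paid only here.
  job_info <- fetch_job_info(in_job)
  if (!is.na(job_info$job_id) && !is.na(job_info$run_id)) {
    return(list(source = "job", experiment_id = experiment_id))
  }
  # Neither environment detected; a real S3 method would fall back to NextMethod().
  list(source = "default", experiment_id = experiment_id)
}
```

A flat if / else if / else chain would need both lookups evaluated up front to test them in the conditions, which is the extra call the reply above is concerned about.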

tomasatdatabricks

lgtm

tomasatdatabricks · 2020-01-13T18:35:31Z

Oh I see. I think that's ok then.

* Add autodetection of job environments to R client (mlflow#2272)
* R client detection
* Efficiency
* Simplify
* Return tags
* Add test cases
* Lint
* Tweak test name
* EarlyStopping Callback support for Keras Autologging (mlflow#2219)
  Keras autologging now supports the EarlyStopping callback. If EarlyStopping.restore_best_weights==True, then the metrics of the restored model will be logged as an extra step.
* Add REPL-aware listener for Spark datasource autologging (mlflow#2249)
  Add REPL-aware listener for Spark datasource autologging (mlflow#2249)
* Fix XGBoost and LightGBM flavor tests (mlflow#2244)
  Add objective and num_class to xgb.train() and lgb.train() because they try to solve a regression task by default, but the iris dataset is a dataset for a classification task.
* Add status in RunView and ExperimentRunsTable (mlflow#1816)
* Changes needed to support the sqlplugin (mlflow#2285)
* Changes needed to support the sqlplugin
* Edited plugin name in setup file
* Updated plugin name in setup file
  Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com>
* remove version suffix dev0
* add pillow fix

Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com>
Co-authored-by: juntai-zheng <39497939+juntai-zheng@users.noreply.github.com>
Co-authored-by: Siddharth Murching <smurching@gmail.com>
Co-authored-by: Harutaka Kawamura <hkawamura0130@gmail.com>
Co-authored-by: Nicolas Laille <nlaille@users.noreply.github.com>
Co-authored-by: Avrilia Floratou <avflor@microsoft.com>

* Add autodetection of job environments to R client (mlflow#2272)
* R client detection
* Efficiency
* Simplify
* Return tags
* Add test cases
* Lint
* Tweak test name
* EarlyStopping Callback support for Keras Autologging (mlflow#2219)
  Keras autologging now supports the EarlyStopping callback. If EarlyStopping.restore_best_weights==True, then the metrics of the restored model will be logged as an extra step.
* Add REPL-aware listener for Spark datasource autologging (mlflow#2249)
  Add REPL-aware listener for Spark datasource autologging (mlflow#2249)
* Fix XGBoost and LightGBM flavor tests (mlflow#2244)
  Add objective and num_class to xgb.train() and lgb.train() because they try to solve a regression task by default, but the iris dataset is a dataset for a classification task.
* Add status in RunView and ExperimentRunsTable (mlflow#1816)
* Changes needed to support the sqlplugin (mlflow#2285)
* Changes needed to support the sqlplugin
* Edited plugin name in setup file
* Updated plugin name in setup file
  Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com>
* Elevate how to load mleap flavor (mlflow#2211)
* Elevate how to load mleap flavor.
* Load using loadPipeline.
* Get rid of extra char.
  Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com>
* pillow fix (mlflow#2307)
* Add EarlyStopping integration to TensorFlow.Keras autologging (mlflow#2301)
  Merging TF.Keras EarlyStopping integration
* Document MLflow plugin system (mlflow#2270)
  Adds an mlflow.org doc page explaining how to write & use MLflow plugins.

Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com>
Co-authored-by: juntai-zheng <39497939+juntai-zheng@users.noreply.github.com>
Co-authored-by: Siddharth Murching <smurching@gmail.com>
Co-authored-by: Harutaka Kawamura <hkawamura0130@gmail.com>
Co-authored-by: Nicolas Laille <nlaille@users.noreply.github.com>
Co-authored-by: Avrilia Floratou <avflor@microsoft.com>
Co-authored-by: Stephanie Bodoff <stephanie.bodoff@databricks.com>

* R client detection
* Efficiency
* Simplify
* Return tags
* Add test cases
* Lint
* Tweak test name

dbczumar added 4 commits December 28, 2019 00:27

R client detection

6cf0117

Efficiency

6f359a3

Simplify

3188f6e

Return tags

c8b0b88

dbczumar added the rn/feature label ("Mention under Features in Changelogs.") Jan 8, 2020

dbczumar requested a review from tomasatdatabricks January 8, 2020 02:06

dbczumar added 4 commits January 10, 2020 15:09

Add test cases

97f7fbc

Lint

913b79e

Merge remote-tracking branch 'origin/master' into r_job_detect

91e0b3d

Tweak test name

bf0b6ec

tomasatdatabricks reviewed Jan 10, 2020

Merge remote-tracking branch 'origin/master' into r_job_detect

1bb86e5

tomasatdatabricks approved these changes Jan 13, 2020

dbczumar merged commit 3b8f9d8 into mlflow:master Jan 13, 2020

avflor pushed a commit to avflor/mlflow that referenced this pull request Aug 22, 2020

Add autodetection of job environments to R client (mlflow#2272)

8b8fea5

* R client detection
* Efficiency
* Simplify
* Return tags
* Add test cases
* Lint
* Tweak test name
