Add autodetection of job environments to R client #2272
Conversation
Looks good, I had just a minor comment / suggestion.
    experiment_id = experiment_id %||% notebook_info$id,
    ...
  )
))
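The `%||%` in the snippet above is R's null-default operator: it returns its left operand unless that operand is NULL, in which case it returns the right operand. A minimal Python analogue of the same idea (the `notebook_info` field name is illustrative):

```python
def null_default(value, fallback):
    """Return value unless it is None, else fallback -- akin to R's %||%."""
    return fallback if value is None else value

# If the caller did not supply an experiment_id, fall back to the
# notebook's id (the dict shape here is illustrative, not the client's API).
notebook_info = {"id": "1234"}
experiment_id = null_default(None, notebook_info["id"])
```

Note that, like `%||%` with NULL, only `None` triggers the fallback; falsy values such as `0` or `False` pass through unchanged.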
Would it be cleaner to do this as if (notebook) {...} else if (job) {...} else {nextMethod()}?
This would look cleaner, but I think we'd have to fetch the job info prior to the if/else-if/else block, resulting in an extra function call within Databricks notebooks. Let me know if you see a way around this, or if you think that's a better trade-off than the current solution.
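The trade-off discussed above can be sketched in Python (the helper names are illustrative, not the R client's actual API): the cheap notebook check runs first, and the job-info fetch only happens when that check fails, so notebook sessions never pay for the extra call.

```python
def detect_environment(in_notebook, fetch_job_info):
    """Check the cheap notebook signal first; only invoke the (potentially
    expensive) job-info fetch when we are not inside a notebook.
    fetch_job_info returns a dict of job metadata, or None."""
    if in_notebook:
        return "notebook"
    job_info = fetch_job_info()  # the extra call happens only on this path
    if job_info is not None:
        return "job"
    return "default"  # fall through to the default method
```

Restructuring this into a flat if / else-if / else chain would require computing `job_info` up front, which is exactly the extra call the comment above wants to avoid.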
Oh I see. I think that's ok then.
lgtm
* Add autodetection of job environments to R client (mlflow#2272)
  * R client detection
  * Efficiency
  * Simplify
  * Return tags
  * Add test cases
  * Lint
  * Tweak test name
* EarlyStopping Callback support for Keras Autologging (mlflow#2219): Keras autologging now supports the EarlyStopping callback. If EarlyStopping.restore_best_weights == True, the metrics of the restored model are logged as an extra step.
* Add REPL-aware listener for Spark datasource autologging (mlflow#2249)
* Fix XGBoost and LightGBM flavor tests (mlflow#2244): Add objective and num_class to xgb.train() and lgb.train(), because they solve a regression task by default while the iris dataset is a classification dataset.
* Add status in RunView and ExperimentRunsTable (mlflow#1816)
* Changes needed to support the sqlplugin (mlflow#2285)
  * Edited plugin name in setup file
  * Updated plugin name in setup file
* Remove version suffix dev0
* Add pillow fix

Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com>
Co-authored-by: juntai-zheng <39497939+juntai-zheng@users.noreply.github.com>
Co-authored-by: Siddharth Murching <smurching@gmail.com>
Co-authored-by: Harutaka Kawamura <hkawamura0130@gmail.com>
Co-authored-by: Nicolas Laille <nlaille@users.noreply.github.com>
Co-authored-by: Avrilia Floratou <avflor@microsoft.com>
* Add autodetection of job environments to R client (mlflow#2272)
  * R client detection
  * Efficiency
  * Simplify
  * Return tags
  * Add test cases
  * Lint
  * Tweak test name
* EarlyStopping Callback support for Keras Autologging (mlflow#2219): Keras autologging now supports the EarlyStopping callback. If EarlyStopping.restore_best_weights == True, the metrics of the restored model are logged as an extra step.
* Add REPL-aware listener for Spark datasource autologging (mlflow#2249)
* Fix XGBoost and LightGBM flavor tests (mlflow#2244): Add objective and num_class to xgb.train() and lgb.train(), because they solve a regression task by default while the iris dataset is a classification dataset.
* Add status in RunView and ExperimentRunsTable (mlflow#1816)
* Changes needed to support the sqlplugin (mlflow#2285)
  * Edited plugin name in setup file
  * Updated plugin name in setup file
* Elevate how to load mleap flavor (mlflow#2211)
  * Load using loadPipeline
  * Get rid of extra char
* pillow fix (mlflow#2307)
* Add EarlyStopping integration to TensorFlow.Keras autologging (mlflow#2301)
* Document MLflow plugin system (mlflow#2270): Adds an mlflow.org doc page explaining how to write and use MLflow plugins.

Co-authored-by: dbczumar <39497902+dbczumar@users.noreply.github.com>
Co-authored-by: juntai-zheng <39497939+juntai-zheng@users.noreply.github.com>
Co-authored-by: Siddharth Murching <smurching@gmail.com>
Co-authored-by: Harutaka Kawamura <hkawamura0130@gmail.com>
Co-authored-by: Nicolas Laille <nlaille@users.noreply.github.com>
Co-authored-by: Avrilia Floratou <avflor@microsoft.com>
Co-authored-by: Stephanie Bodoff <stephanie.bodoff@databricks.com>
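The EarlyStopping behavior noted in the commit messages above can be sketched in plain Python (a simplified model of the logging flow, not MLflow's actual implementation): when restore_best_weights is set, the best epoch's metric is logged once more as one extra step after training stops.

```python
def autolog_with_early_stopping(epoch_losses, patience, restore_best_weights):
    """Simulate early stopping on a loss series and return the list of
    (step, loss) pairs that would be logged. If restore_best_weights is
    True, the restored best model's loss is re-logged as one extra step."""
    logged = []
    best_loss, best_step, wait = float("inf"), -1, 0
    for step, loss in enumerate(epoch_losses):
        logged.append((step, loss))
        if loss < best_loss:
            best_loss, best_step, wait = loss, step, 0
        else:
            wait += 1
            if wait >= patience:  # patience exhausted: training stops here
                break
    if restore_best_weights and best_step >= 0:
        # Extra step: metrics of the restored (best) model.
        logged.append((len(logged), best_loss))
    return logged
```

The extra step makes the restored model's metrics visible at the end of the metric history rather than leaving the last logged value as the worse, post-best epoch.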
* R client detection
* Efficiency
* Simplify
* Return tags
* Add test cases
* Lint
* Tweak test name
What changes are proposed in this pull request?
Adds autodetection of Databricks job environments to the MLflow R client.
How is this patch tested?
These MLflow client changes were tested manually on Databricks against an updated Spark image that defines subroutines for fetching job information from the SparkR driver.
Unit tests are pending.
Release Notes
The MLflow R client now detects and records Databricks Job information when the client is used within a Databricks Job.
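When job detection succeeds, the client records the job context as run tags. A minimal Python sketch of that shape (the tag keys and field names below are illustrative assumptions, not necessarily the exact keys the R client writes):

```python
def job_context_tags(job_info):
    """Build a dict of run tags from detected job metadata.
    Returns an empty dict when no job environment was detected.
    Tag keys below are illustrative placeholders."""
    if not job_info:
        return {}
    return {
        "databricks.jobID": job_info["job_id"],
        "databricks.jobRunID": job_info["run_id"],
        "databricks.jobType": job_info.get("job_type", "unknown"),
    }
```

Returning an empty dict for the non-job case lets the caller merge the result into the run's tag set unconditionally.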
Is this a user-facing change?
What component(s) does this PR affect?
How should the PR be classified in the release notes? Choose one:
- rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
- rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- rn/feature - A new user-facing feature worth mentioning in the release notes
- rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
- rn/documentation - A user-facing documentation change worth mentioning in the release notes