MLflow Dataset Tracking #8186
* Source reg Signed-off-by: dbczumar <corey.zumar@databricks.com> * Rename Signed-off-by: dbczumar <corey.zumar@databricks.com> * Registry Signed-off-by: dbczumar <corey.zumar@databricks.com> * Data Signed-off-by: dbczumar <corey.zumar@databricks.com> * Rename data.py Signed-off-by: dbczumar <corey.zumar@databricks.com> * Dataset sources Signed-off-by: dbczumar <corey.zumar@databricks.com> * Partial Signed-off-by: dbczumar <corey.zumar@databricks.com> * Sources Signed-off-by: dbczumar <corey.zumar@databricks.com> * More Signed-off-by: dbczumar <corey.zumar@databricks.com> * Source Signed-off-by: dbczumar <corey.zumar@databricks.com> * done Signed-off-by: dbczumar <corey.zumar@databricks.com> * dbfs data source Signed-off-by: dbczumar <corey.zumar@databricks.com> * More Signed-off-by: dbczumar <corey.zumar@databricks.com> * Dummy Signed-off-by: dbczumar <corey.zumar@databricks.com> * Tweaks Signed-off-by: dbczumar <corey.zumar@databricks.com> * Some docs Signed-off-by: dbczumar <corey.zumar@databricks.com> * Some docstrings Signed-off-by: dbczumar <corey.zumar@databricks.com> * Progress Signed-off-by: dbczumar <corey.zumar@databricks.com> * More Signed-off-by: dbczumar <corey.zumar@databricks.com> * Working pandas Signed-off-by: dbczumar <corey.zumar@databricks.com> * Pandas works :D Signed-off-by: dbczumar <corey.zumar@databricks.com> * Colspec in schema Signed-off-by: dbczumar <corey.zumar@databricks.com> * Test structure Signed-off-by: dbczumar <corey.zumar@databricks.com> * Move Signed-off-by: dbczumar <corey.zumar@databricks.com> * Some tests Signed-off-by: dbczumar <corey.zumar@databricks.com> * Suite Signed-off-by: dbczumar <corey.zumar@databricks.com> * fixes Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fixes Signed-off-by: dbczumar <corey.zumar@databricks.com> * Many test Signed-off-by: dbczumar <corey.zumar@databricks.com> * Add files Signed-off-by: dbczumar <corey.zumar@databricks.com> * Blacken Signed-off-by: dbczumar 
<corey.zumar@databricks.com> * CI Signed-off-by: dbczumar <corey.zumar@databricks.com> * Remove todo Signed-off-by: dbczumar <corey.zumar@databricks.com> * Simplify Signed-off-by: dbczumar <corey.zumar@databricks.com> * Resource init Signed-off-by: dbczumar <corey.zumar@databricks.com> * Removals Signed-off-by: dbczumar <corey.zumar@databricks.com> * Tweak Signed-off-by: dbczumar <corey.zumar@databricks.com> * Remove unused Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fix tests, rename Signed-off-by: dbczumar <corey.zumar@databricks.com> * Remove dataset stuff Signed-off-by: dbczumar <corey.zumar@databricks.com> * More docstrings Signed-off-by: dbczumar <corey.zumar@databricks.com> * Better docs Signed-off-by: dbczumar <corey.zumar@databricks.com> * Blank init Signed-off-by: dbczumar <corey.zumar@databricks.com> * Restore file Signed-off-by: dbczumar <corey.zumar@databricks.com> * Datasets files Signed-off-by: dbczumar <corey.zumar@databricks.com> * Lint Signed-off-by: dbczumar <corey.zumar@databricks.com> * Better docstrings Signed-off-by: dbczumar <corey.zumar@databricks.com> * Docstring Signed-off-by: dbczumar <corey.zumar@databricks.com> --------- Signed-off-by: dbczumar <corey.zumar@databricks.com>
…ies (#8051) * Source reg Signed-off-by: dbczumar <corey.zumar@databricks.com> * Rename Signed-off-by: dbczumar <corey.zumar@databricks.com> * Registry Signed-off-by: dbczumar <corey.zumar@databricks.com> * Data Signed-off-by: dbczumar <corey.zumar@databricks.com> * Rename data.py Signed-off-by: dbczumar <corey.zumar@databricks.com> * Dataset sources Signed-off-by: dbczumar <corey.zumar@databricks.com> * Partial Signed-off-by: dbczumar <corey.zumar@databricks.com> * Sources Signed-off-by: dbczumar <corey.zumar@databricks.com> * More Signed-off-by: dbczumar <corey.zumar@databricks.com> * Source Signed-off-by: dbczumar <corey.zumar@databricks.com> * done Signed-off-by: dbczumar <corey.zumar@databricks.com> * dbfs data source Signed-off-by: dbczumar <corey.zumar@databricks.com> * More Signed-off-by: dbczumar <corey.zumar@databricks.com> * Dummy Signed-off-by: dbczumar <corey.zumar@databricks.com> * Tweaks Signed-off-by: dbczumar <corey.zumar@databricks.com> * Some docs Signed-off-by: dbczumar <corey.zumar@databricks.com> * Some docstrings Signed-off-by: dbczumar <corey.zumar@databricks.com> * Progress Signed-off-by: dbczumar <corey.zumar@databricks.com> * More Signed-off-by: dbczumar <corey.zumar@databricks.com> * Working pandas Signed-off-by: dbczumar <corey.zumar@databricks.com> * Pandas works :D Signed-off-by: dbczumar <corey.zumar@databricks.com> * Colspec in schema Signed-off-by: dbczumar <corey.zumar@databricks.com> * Test structure Signed-off-by: dbczumar <corey.zumar@databricks.com> * Move Signed-off-by: dbczumar <corey.zumar@databricks.com> * Some tests Signed-off-by: dbczumar <corey.zumar@databricks.com> * Suite Signed-off-by: dbczumar <corey.zumar@databricks.com> * fixes Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fixes Signed-off-by: dbczumar <corey.zumar@databricks.com> * Many test Signed-off-by: dbczumar <corey.zumar@databricks.com> * Add files Signed-off-by: dbczumar <corey.zumar@databricks.com> * Blacken Signed-off-by: dbczumar 
<corey.zumar@databricks.com> * CI Signed-off-by: dbczumar <corey.zumar@databricks.com> * Remove todo Signed-off-by: dbczumar <corey.zumar@databricks.com> * Simplify Signed-off-by: dbczumar <corey.zumar@databricks.com> * Resource init Signed-off-by: dbczumar <corey.zumar@databricks.com> * Removals Signed-off-by: dbczumar <corey.zumar@databricks.com> * Tweak Signed-off-by: dbczumar <corey.zumar@databricks.com> * Remove unused Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fix tests, rename Signed-off-by: dbczumar <corey.zumar@databricks.com> * Remove dataset stuff Signed-off-by: dbczumar <corey.zumar@databricks.com> * More docstrings Signed-off-by: dbczumar <corey.zumar@databricks.com> * Better docs Signed-off-by: dbczumar <corey.zumar@databricks.com> * Blank init Signed-off-by: dbczumar <corey.zumar@databricks.com> * Restore file Signed-off-by: dbczumar <corey.zumar@databricks.com> * Datasets files Signed-off-by: dbczumar <corey.zumar@databricks.com> * Lint Signed-off-by: dbczumar <corey.zumar@databricks.com> * Better docstrings Signed-off-by: dbczumar <corey.zumar@databricks.com> * Docstring Signed-off-by: dbczumar <corey.zumar@databricks.com> * Register artifact sources Signed-off-by: dbczumar <corey.zumar@databricks.com> * Get it working Signed-off-by: dbczumar <corey.zumar@databricks.com> * artifact DS Signed-off-by: dbczumar <corey.zumar@databricks.com> * Some test coverage Signed-off-by: dbczumar <corey.zumar@databricks.com> * Test, docstring Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fix windows Signed-off-by: dbczumar <corey.zumar@databricks.com> * Assert on content Signed-off-by: dbczumar <corey.zumar@databricks.com> --------- Signed-off-by: dbczumar <corey.zumar@databricks.com>
…or downloads (#8069) * fix Signed-off-by: dbczumar <corey.zumar@databricks.com> * Remove separator Signed-off-by: dbczumar <corey.zumar@databricks.com> * Lint Signed-off-by: dbczumar <corey.zumar@databricks.com> --------- Signed-off-by: dbczumar <corey.zumar@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
* Source reg Signed-off-by: dbczumar <corey.zumar@databricks.com> * Rename Signed-off-by: dbczumar <corey.zumar@databricks.com> * Registry Signed-off-by: dbczumar <corey.zumar@databricks.com> * Data Signed-off-by: dbczumar <corey.zumar@databricks.com> * Rename data.py Signed-off-by: dbczumar <corey.zumar@databricks.com> * Dataset sources Signed-off-by: dbczumar <corey.zumar@databricks.com> * Partial Signed-off-by: dbczumar <corey.zumar@databricks.com> * Sources Signed-off-by: dbczumar <corey.zumar@databricks.com> * More Signed-off-by: dbczumar <corey.zumar@databricks.com> * Source Signed-off-by: dbczumar <corey.zumar@databricks.com> * done Signed-off-by: dbczumar <corey.zumar@databricks.com> * dbfs data source Signed-off-by: dbczumar <corey.zumar@databricks.com> * More Signed-off-by: dbczumar <corey.zumar@databricks.com> * Dummy Signed-off-by: dbczumar <corey.zumar@databricks.com> * Tweaks Signed-off-by: dbczumar <corey.zumar@databricks.com> * Some docs Signed-off-by: dbczumar <corey.zumar@databricks.com> * Some docstrings Signed-off-by: dbczumar <corey.zumar@databricks.com> * Progress Signed-off-by: dbczumar <corey.zumar@databricks.com> * More Signed-off-by: dbczumar <corey.zumar@databricks.com> * Working pandas Signed-off-by: dbczumar <corey.zumar@databricks.com> * Pandas works :D Signed-off-by: dbczumar <corey.zumar@databricks.com> * Colspec in schema Signed-off-by: dbczumar <corey.zumar@databricks.com> * Test structure Signed-off-by: dbczumar <corey.zumar@databricks.com> * Move Signed-off-by: dbczumar <corey.zumar@databricks.com> * Some tests Signed-off-by: dbczumar <corey.zumar@databricks.com> * Suite Signed-off-by: dbczumar <corey.zumar@databricks.com> * fixes Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fixes Signed-off-by: dbczumar <corey.zumar@databricks.com> * Many test Signed-off-by: dbczumar <corey.zumar@databricks.com> * Add files Signed-off-by: dbczumar <corey.zumar@databricks.com> * Blacken Signed-off-by: dbczumar 
<corey.zumar@databricks.com> * CI Signed-off-by: dbczumar <corey.zumar@databricks.com> * Remove todo Signed-off-by: dbczumar <corey.zumar@databricks.com> * Simplify Signed-off-by: dbczumar <corey.zumar@databricks.com> * Resource init Signed-off-by: dbczumar <corey.zumar@databricks.com> * Removals Signed-off-by: dbczumar <corey.zumar@databricks.com> * Tweak Signed-off-by: dbczumar <corey.zumar@databricks.com> * Remove unused Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fix tests, rename Signed-off-by: dbczumar <corey.zumar@databricks.com> * Remove dataset stuff Signed-off-by: dbczumar <corey.zumar@databricks.com> * More docstrings Signed-off-by: dbczumar <corey.zumar@databricks.com> * Better docs Signed-off-by: dbczumar <corey.zumar@databricks.com> * Blank init Signed-off-by: dbczumar <corey.zumar@databricks.com> * Restore file Signed-off-by: dbczumar <corey.zumar@databricks.com> * Datasets files Signed-off-by: dbczumar <corey.zumar@databricks.com> * Lint Signed-off-by: dbczumar <corey.zumar@databricks.com> * Better docstrings Signed-off-by: dbczumar <corey.zumar@databricks.com> * Docstring Signed-off-by: dbczumar <corey.zumar@databricks.com> * HF source base Signed-off-by: dbczumar <corey.zumar@databricks.com> * More args Signed-off-by: dbczumar <corey.zumar@databricks.com> * More args Signed-off-by: dbczumar <corey.zumar@databricks.com> * Progress Signed-off-by: dbczumar <corey.zumar@databricks.com> * HF - needs tests Signed-off-by: dbczumar <corey.zumar@databricks.com> * Updates Signed-off-by: dbczumar <corey.zumar@databricks.com> * Loosen dict requirements Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fix windows Signed-off-by: dbczumar <corey.zumar@databricks.com> * to_pyfunc, targets, test Signed-off-by: dbczumar <corey.zumar@databricks.com> * Docstrings and couple tests Signed-off-by: dbczumar <corey.zumar@databricks.com> * Mixin Signed-off-by: dbczumar <corey.zumar@databricks.com> * hyphen source type Signed-off-by: dbczumar 
<corey.zumar@databricks.com> * Digest fixes Signed-off-by: dbczumar <corey.zumar@databricks.com> * Add consistent digest big data test Signed-off-by: dbczumar <corey.zumar@databricks.com> * Format Signed-off-by: dbczumar <corey.zumar@databricks.com> --------- Signed-off-by: dbczumar <corey.zumar@databricks.com>
* add numpy dataset Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * test for numpy dataset Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * add pandas dataset Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * update Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * test for deterministic hash Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * add property Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * targets in numpy Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * PyFuncConvertibleDatasetMixin Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * create delta and spark dataset sources Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fix delta and spark source Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fixes Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * test from_pandas and from_numpy Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * lint Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * delta information Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * from_pandas with delta and spark sources Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * lint Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * tablse Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * lint Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fix delta tests Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * tests Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * lint Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * _get_table_info_if_uc Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * databricks-uc host creds Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * _is_uc_table Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * split out spark and delta dataset 
source tests Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * Autoformat: https://github.com/mlflow/mlflow/actions/runs/4602300008 Signed-off-by: mlflow-automation <mlflow-automation@users.noreply.github.com> * cleanup Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * addressing comments Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * bump delta core to 2.2.0 Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fixed all references of spark session Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * small fixes Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * remove import Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> --------- Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> Signed-off-by: mlflow-automation <mlflow-automation@users.noreply.github.com> Co-authored-by: mlflow-automation <mlflow-automation@users.noreply.github.com>
Signed-off-by: dbczumar <corey.zumar@databricks.com>
* Check out data model files except sql Signed-off-by: dbczumar <corey.zumar@databricks.com> * Working filestore Signed-off-by: dbczumar <corey.zumar@databricks.com> * Simplify Signed-off-by: dbczumar <corey.zumar@databricks.com> * Test cases Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fix test Signed-off-by: dbczumar <corey.zumar@databricks.com> * SQL, REST notimplemennt Signed-off-by: dbczumar <corey.zumar@databricks.com> * Address comment Signed-off-by: dbczumar <corey.zumar@databricks.com> * Address comments Signed-off-by: dbczumar <corey.zumar@databricks.com> * Coverage Signed-off-by: dbczumar <corey.zumar@databricks.com> * Add internal Signed-off-by: dbczumar <corey.zumar@databricks.com> * Remove exp Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fix Signed-off-by: dbczumar <corey.zumar@databricks.com> * Pass Signed-off-by: dbczumar <corey.zumar@databricks.com> --------- Signed-off-by: dbczumar <corey.zumar@databricks.com>
* Add log inputs to rest store Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * test_rest_store Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * Autoformat: https://github.com/mlflow/mlflow/actions/runs/4604613188 Signed-off-by: mlflow-automation <mlflow-automation@users.noreply.github.com> * log_inputs api Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * bulk writes Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * implement read inputs to run via _get_run_inputs Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * unused import Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * test_log_inputs_fails_with_missing_inputs and test_log_inputs_fails_with_too_large_inputs Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * search_runs and test case Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * more tests [wip] Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fixed write side Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fixing some tests Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fix overwrite issue Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * cleanup Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * teardown Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> --------- Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> Signed-off-by: mlflow-automation <mlflow-automation@users.noreply.github.com> Co-authored-by: mlflow-automation <mlflow-automation@users.noreply.github.com>
@prithvikannan Thank you for the contribution! Could you fix the following issue(s)? ⚠ DCO check: The DCO check failed. Please sign off your commit(s). See https://github.com/mlflow/mlflow/blob/master/CONTRIBUTING.md#sign-your-work for more details.
* Add log inputs to rest store Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * test_rest_store Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * Autoformat: https://github.com/mlflow/mlflow/actions/runs/4604613188 Signed-off-by: mlflow-automation <mlflow-automation@users.noreply.github.com> * log_inputs api Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * bulk writes Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * draft for log_inputs fluent api Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fluent log_input api Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * test case Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * implement read inputs to run via _get_run_inputs Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * unused import Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * small fixes Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * test_log_inputs_fails_with_missing_inputs and test_log_inputs_fails_with_too_large_inputs Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * search_runs and test case Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * more tests [wip] Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * pylint Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fixing up test_log_input Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fixed write side Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fixing some tests Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fix overwrite issue Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * cleanup Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * Autoformat: https://github.com/mlflow/mlflow/actions/runs/4627612637 Signed-off-by: mlflow-automation <mlflow-automation@users.noreply.github.com> * teardown Signed-off-by: Prithvi Kannan 
<prithvi.kannan@databricks.com> * small fixes Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fixes Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> --------- Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> Signed-off-by: mlflow-automation <mlflow-automation@users.noreply.github.com> Co-authored-by: mlflow-automation <mlflow-automation@users.noreply.github.com>
* Add log inputs to rest store Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * test_rest_store Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * Autoformat: https://github.com/mlflow/mlflow/actions/runs/4604613188 Signed-off-by: mlflow-automation <mlflow-automation@users.noreply.github.com> * log_inputs api Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * bulk writes Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * draft for log_inputs fluent api Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fluent log_input api Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * test case Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * implement read inputs to run via _get_run_inputs Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * unused import Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * small fixes Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * test_log_inputs_fails_with_missing_inputs and test_log_inputs_fails_with_too_large_inputs Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * search_runs and test case Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * more tests [wip] Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * pylint Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fixing up test_log_input Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fixed write side Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fixing some tests Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fix overwrite issue Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * cleanup Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * Autoformat: https://github.com/mlflow/mlflow/actions/runs/4627612637 Signed-off-by: mlflow-automation <mlflow-automation@users.noreply.github.com> * teardown Signed-off-by: Prithvi Kannan 
<prithvi.kannan@databricks.com> * python server log inputs Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * small fixes Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * check keys Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * add tests Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fixes Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * remove run_uuid Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> --------- Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> Signed-off-by: mlflow-automation <mlflow-automation@users.noreply.github.com> Co-authored-by: mlflow-automation <mlflow-automation@users.noreply.github.com>
* create a code dataset source Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * tests Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * mlflow_source_type and mlflow_source_name Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> --------- Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com>
* add numpy dataset Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * test for numpy dataset Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * add pandas dataset Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * update Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * Partial spark ds, hash is broken Signed-off-by: dbczumar <corey.zumar@databricks.com> * test for deterministic hash Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * add property Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * targets in numpy Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * PyFuncConvertibleDatasetMixin Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * create delta and spark dataset sources Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fix delta and spark source Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * spark Signed-off-by: dbczumar <corey.zumar@databricks.com> * spark Signed-off-by: dbczumar <corey.zumar@databricks.com> * fixes Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * Progress Signed-off-by: dbczumar <corey.zumar@databricks.com> * Add Signed-off-by: dbczumar <corey.zumar@databricks.com> * test from_pandas and from_numpy Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * lint Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * delta information Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * progress Signed-off-by: dbczumar <corey.zumar@databricks.com> * Progress Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fix kwagrs Signed-off-by: dbczumar <corey.zumar@databricks.com> * Register Signed-off-by: dbczumar <corey.zumar@databricks.com> * Dedupe Signed-off-by: dbczumar <corey.zumar@databricks.com> * remove nl Signed-off-by: dbczumar <corey.zumar@databricks.com> * Address comments Signed-off-by: dbczumar <corey.zumar@databricks.com> * test case for spark dataset 
Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * approx count Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * skeleton for various from_spark tests Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * tests for properties Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fix _is_delta_table Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * test cleanup Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * move pyspark import Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * trying again Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * create spark_delta_utils Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * pyspark import into util fn Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * import utils inside loaders Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * check for pyspark in sys modules Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * move pyspark import to load in spark and delta source Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * lint Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> --------- Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> Signed-off-by: dbczumar <corey.zumar@databricks.com> Co-authored-by: Prithvi Kannan <prithvi.kannan@databricks.com>
Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com>
* tensorflow dataset Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * more progress Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fix schema and profile Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * test tensor Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * lint Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * move tf imports Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * small fix Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * test_tensorflow_dataset Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * remove reference to dataframe Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> --------- Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com>
Signed-off-by: dbczumar <corey.zumar@databricks.com>
…nd profile (#8305) * Infer schem dict Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fix profile and schema for np Signed-off-by: dbczumar <corey.zumar@databricks.com> --------- Signed-off-by: dbczumar <corey.zumar@databricks.com>
Signed-off-by: dbczumar <corey.zumar@databricks.com>
…8315) * Use dataset as default Signed-off-by: dbczumar <corey.zumar@databricks.com> * Dataset Signed-off-by: dbczumar <corey.zumar@databricks.com> --------- Signed-off-by: dbczumar <corey.zumar@databricks.com>
* Patches Signed-off-by: dbczumar <corey.zumar@databricks.com> * Make dataset sources importable Signed-off-by: dbczumar <corey.zumar@databricks.com> * Test Signed-off-by: dbczumar <corey.zumar@databricks.com> * Remove protocol Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fix Signed-off-by: dbczumar <corey.zumar@databricks.com> * Cherry pick Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fix entrypoint loading error and revert dummy dataset Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fix Signed-off-by: dbczumar <corey.zumar@databricks.com> * Comment Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fix typo Signed-off-by: dbczumar <corey.zumar@databricks.com> * DS Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fix attempt Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fix silliness Signed-off-by: dbczumar <corey.zumar@databricks.com> --------- Signed-off-by: dbczumar <corey.zumar@databricks.com>
* load from source Signed-off-by: dbczumar <corey.zumar@databricks.com> * Load source Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fix Signed-off-by: dbczumar <corey.zumar@databricks.com> * Experimental Signed-off-by: dbczumar <corey.zumar@databricks.com> --------- Signed-off-by: dbczumar <corey.zumar@databricks.com>
* tensorflow dataset targets Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * test coverage Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * cosmetic changes: Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> --------- Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com>
* starting out Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * draft of branching with mlflow dataset Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * add to_evaluation_dataset to pyfunc mixin Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * to_evaluation_dataset Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * log input Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * add to_evaluation_dataset to all dataset types Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * remove tensorflow impl Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * use metric prefix as name Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * add test without metric_prefix Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * update to_evaluation_dataset and add test cases Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * update docstrings and use client api Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fix case with no context Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * make targets optional Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * disable=unused-variable Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fix tensorflow targets and expand tests for to_evaluation_dataset Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * only support eval dataset if Tensor Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> --------- Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com>
…a, improve test coverage (#8304) * SQL Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fix approx count performance Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fixes Signed-off-by: dbczumar <corey.zumar@databricks.com> --------- Signed-off-by: dbczumar <corey.zumar@databricks.com> Signed-off-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com>
* Patches Signed-off-by: dbczumar <corey.zumar@databricks.com> * Make dataset sources importable Signed-off-by: dbczumar <corey.zumar@databricks.com> * Test Signed-off-by: dbczumar <corey.zumar@databricks.com> * Remove protocol Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fix Signed-off-by: dbczumar <corey.zumar@databricks.com> * Cherry pick Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fix entrypoint loading error and revert dummy dataset Signed-off-by: dbczumar <corey.zumar@databricks.com> * Make experimental Signed-off-by: dbczumar <corey.zumar@databricks.com> * Mark DS and sources experimental Signed-off-by: dbczumar <corey.zumar@databricks.com> * Experimental entities Signed-off-by: dbczumar <corey.zumar@databricks.com> * Experimental input tags Signed-off-by: dbczumar <corey.zumar@databricks.com> * get run docstrings Signed-off-by: dbczumar <corey.zumar@databricks.com> --------- Signed-off-by: dbczumar <corey.zumar@databricks.com>
* Update schema in sklearn and xgboost tests Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * empty Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> --------- Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com>
* Update optional targets logic with datasets Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * small fix Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * targets Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> --------- Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com>
* Update optional targets logic with datasets Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * small fix Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * targets Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * Add CodeDatasetSource to dataset_source_registry Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> --------- Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com>
Signed-off-by: dbczumar <corey.zumar@databricks.com>
* docs progress Signed-off-by: dbczumar <corey.zumar@databricks.com> * Initial API docs Signed-off-by: dbczumar <corey.zumar@databricks.com> * Renaming for tf Signed-off-by: dbczumar <corey.zumar@databricks.com> * Partial Signed-off-by: dbczumar <corey.zumar@databricks.com> * fix Signed-off-by: dbczumar <corey.zumar@databricks.com> * data rst Signed-off-by: dbczumar <corey.zumar@databricks.com> * Install deps Signed-off-by: dbczumar <corey.zumar@databricks.com> * Fix Signed-off-by: dbczumar <corey.zumar@databricks.com> * Dataset Signed-off-by: dbczumar <corey.zumar@databricks.com> * Progress Signed-off-by: dbczumar <corey.zumar@databricks.com> * fix some doc references Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * alias Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * small fix Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * use numpy with a local array Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * pyspark import Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * remove annotations for spark df Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * remove mlflow.data.DatasetSource and mlflow.data.Dataset Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * add dataset sources Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * double backtick Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * tracking doc Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * small fix Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fix Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * fix py class Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> * pysaprk Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> --------- Signed-off-by: dbczumar <corey.zumar@databricks.com> Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> Co-authored-by: Prithvi Kannan <prithvi.kannan@databricks.com>
tensorflow
pyspark
datasets
Do we need these for building docs?
Yeah, we need these for the docs build because, otherwise, the mlflow.data.from_tensorflow, etc. methods are not defined
LGTM once @harupy's comments are addressed (or feel free to file a follow-up). Thanks @prithvikannan!
Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com>
Related Issues/PRs
#xxx
What changes are proposed in this pull request?
log_inputs and fluent API log_input
evaluate()
How is this patch tested?
Does this PR change the documentation?
Release Notes
Is this a user-facing change?
Introduce Dataset Tracking to MLflow! Now users can log datasets as inputs to MLflow runs.
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
area/projects: MLproject format, project running backends
area/scoring: MLflow Model server, model deployment tools, Spark UDFs
area/server-infra: MLflow Tracking server backend
area/tracking: Tracking Service, tracking client APIs, autologging
Interface
area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support
Language
language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages
Integrations
integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations
How should the PR be classified in the release notes? Choose one:
rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes