mlflow-export-import - Databricks Notebooks

Overview

  • Databricks notebooks to perform MLflow export and import operations.
  • Use these notebooks when you want to copy MLflow objects from one Databricks workspace (tracking server) to another.
  • To copy MLflow objects between workspaces, you first need a shared cloud bucket mounted on each workspace's DBFS (see the example check after this list).
  • The notebooks use Git integration with Databricks Repos, though they can also be run from a plain (non-Repo) workspace folder.
  • See _README.py for more details.
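
For example, once the shared bucket is mounted in both workspaces, each should see the same DBFS path. The mount point below is a hypothetical name; substitute whatever you mounted:

# List the shared mount from either workspace
# (the mount point /mnt/mlflow-export is an assumed name)
databricks fs ls dbfs:/mnt/mlflow-export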

Databricks notebooks

There are two types of notebooks:

  • Standard widget-based notebooks that call the MLflow Export Import API.
  • Console script notebooks that use the shell to invoke the standard Python console scripts described here. Slightly experimental.

Standard widget-based notebooks

Single notebooks

Export and import one MLflow object.

Export                Import
Export_Run            Import_Run
Export_Experiment     Import_Experiment
Export_Model          Import_Model
Export_Model_Version  Import_Model_Version

Copy an MLflow object.

MLflow object
Copy_Model_Version
Copy_Run

Bulk notebooks

Export and import multiple MLflow objects.

Export              Import
Export_Experiments  Import_Experiments
Export_Models       Import_Models
Export_All          Use Import_Models
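
For illustration, the equivalent bulk console scripts (see the next section) take a comma-delimited list of objects. The model names and output directory below are hypothetical; confirm the exact options with each script's --help:

export-models --models sklearn_wine,sklearn_iris --output-dir /dbfs/mnt/mlflow-export/models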

Console script shell notebooks

Experimental

Using Databricks %sh cell mode, you can execute MLflow Export Import scripts from the Linux shell. See the _README.py and Console_Scripts notebooks.

From a notebook you can then call a script such as:

export-model --help
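
As a fuller sketch, an export/import round trip between workspaces might look like the following. The model name and shared-mount directory are hypothetical, and the option names should be verified against each script's --help output:

# In the source workspace: export the registered model to the shared mount
export-model --model sklearn_wine --output-dir /dbfs/mnt/mlflow-export/sklearn_wine

# In the destination workspace: import it under a new name
import-model --model sklearn_wine_imported --input-dir /dbfs/mnt/mlflow-export/sklearn_wine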

Import notebooks into Databricks workspace

You can load these notebooks into Databricks either as a workspace folder or a Git Repo.

Load directory as Databricks workspace folder

See the Workspace CLI.

git clone https://github.com/mlflow/mlflow-export-import

databricks workspace import_dir \
  databricks_notebooks \
  /Users/me@mycompany.com/mlflow-export-import
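
To confirm the import, list the destination folder (same path as above):

databricks workspace ls /Users/me@mycompany.com/mlflow-export-import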

Clone directory as Databricks Git Repo

You can load a Git Repo either through the Databricks UI or via the command line.

1. Load through Databricks UI

See Clone a Git Repo & other common Git operations.

2. Load from command line with curl

Note: use the curl version, since the Databricks CLI does not appear to support the sparse checkout option.

curl \
  https://my.company.com/api/2.0/repos \
  -H "Authorization: Bearer MY_TOKEN" \
  -X POST \
  -d '{
    "url": "https://github.com/mlflow/mlflow-export-import",
    "provider": "gitHub",
    "path": "/Repos/me@my.company.com/mlflow-export-import",
    "sparse_checkout": {
      "patterns": [ "databricks_notebooks" ]
    }
  }'
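
To verify that the repo was created, list your repos with a GET request against the same endpoint (same hypothetical host and token as above):

curl \
  https://my.company.com/api/2.0/repos \
  -H "Authorization: Bearer MY_TOKEN"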