# Databricks CLI API Demonstration

This notebook demonstrates how to interact with the Databricks Command Line Interface (CLI) using Python helper functions defined in `databricks_cli_utils.py`. We will cover basic operations for managing clusters, interacting with the Databricks File System (DBFS), and submitting jobs.

**Prerequisites** (if you need help creating or configuring any of the following, refer to `databricks_cli.api.md`):
* Databricks CLI installed and configured (via `databricks configure --token` or environment variables `DATABRICKS_HOST`/`DATABRICKS_TOKEN`). This configuration needs to be done in the environment where this notebook is run (e.g., inside the Docker container).
* A `config/cluster_config.json` file defining a valid cluster specification.
* A pre-configured Databricks Job (you will need its ID later).

### Check that your Databricks CLI config is in place before starting

```python
import os
from configparser import ConfigParser

cfg_path = os.path.expanduser("~/.databrickscfg")

if os.path.isfile(cfg_path):
    # Parse the file to make sure it actually contains a token
    # (only the [DEFAULT] profile is checked here)
    parser = ConfigParser()
    parser.read(cfg_path)
    if parser.has_option("DEFAULT", "token"):
        print("Yes, databricks config found")
    else:
        print("Config file exists, but no token found")
else:
    print("No config file found")
```

```
Yes, databricks config found
```


```python
# %load_ext autoreload
# %autoreload 2

import logging
import os
import json
import time

# Import our utility functions
import databricks_cli_utils as dcu

if not logging.getLogger().hasHandlers():
    logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
_LOG = logging.getLogger(__name__)

# --- Configuration ---
CLUSTER_CONFIG_PATH = "config/cluster_config.json"
CLUSTER_ID_FILE = "config/cluster_id.txt"

TEST_DIR = "data"
TEST_UPLOAD_FILENAME = "api_notebook_test_upload.txt"
TEST_DOWNLOAD_FILENAME = "api_notebook_test_download.txt"
TEST_UPLOAD_LOCAL_PATH = os.path.join(TEST_DIR, TEST_UPLOAD_FILENAME)
TEST_DOWNLOAD_LOCAL_PATH = os.path.join(TEST_DIR, TEST_DOWNLOAD_FILENAME)

# Define DBFS paths for testing (use a distinct path for safety)
DBFS_TEST_DIR = "dbfs:/api_notebook_tests"
DBFS_TEST_UPLOAD_PATH = f"{DBFS_TEST_DIR}/{TEST_UPLOAD_FILENAME}"

# !! IMPORTANT: Replace with a valid Job ID from your Databricks workspace !!
# Create a simple job in Databricks UI first (e.g., one that runs a basic notebook)
JOB_ID = "90640552909146"

os.makedirs(TEST_DIR, exist_ok=True)

# ID of a cluster created by this notebook, used by the cleanup cell.
# Note: re-running this cell resets it to None.
cluster_id_to_delete = None

print("Setup complete. Utility functions imported.")
print(f"Cluster config path: {CLUSTER_CONFIG_PATH}")
print(f"Test upload file path: {TEST_UPLOAD_LOCAL_PATH}")
print(f"DBFS test upload path: {DBFS_TEST_UPLOAD_PATH}")
print(f"Job ID to use (replace if needed): {JOB_ID}")
```

```
Setup complete. Utility functions imported.
Cluster config path: config/cluster_config.json
Test upload file path: data/api_notebook_test_upload.txt
DBFS test upload path: dbfs:/api_notebook_tests/api_notebook_test_upload.txt
Job ID to use (replace if needed): 90640552909146
```


## 2. Authentication Prerequisite

The utility functions (`dcu.*_cli`) rely on the Databricks CLI being properly authenticated in the environment where this notebook is running.

This configuration must be done **before running** the cells below that interact with Databricks. The two common methods are:

1.  **`databricks configure --token`:** Run this command in the terminal *once* and provide your Databricks Host URL and Personal Access Token (PAT). Credentials are saved in `~/.databrickscfg`.
2.  **Environment Variables:** Set `DATABRICKS_HOST` (your workspace URL) and `DATABRICKS_TOKEN` (your PAT) in your environment before starting Jupyter.

We will assume one of these methods has been completed.
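The check at the top of this notebook only looks for a token in the `[DEFAULT]` profile of `~/.databrickscfg`. It therefore reports a missing config even when credentials come from the environment-variable method. The sketch below checks both methods described above. The function name `detect_databricks_auth` is hypothetical and not part of `databricks_cli_utils`. The order of the checks (environment variables first, then the config file) is an illustrative choice.

```python
import os
from configparser import ConfigParser
from typing import Optional


def detect_databricks_auth(cfg_path: Optional[str] = None, profile: str = "DEFAULT") -> Optional[str]:
    """Return "env", "config_file", or None if no usable credentials are found.

    Hypothetical helper: checks DATABRICKS_HOST/DATABRICKS_TOKEN first,
    then a host and token in the given profile of ~/.databrickscfg.
    """
    if os.environ.get("DATABRICKS_HOST") and os.environ.get("DATABRICKS_TOKEN"):
        return "env"
    path = cfg_path or os.path.expanduser("~/.databrickscfg")
    if os.path.isfile(path):
        parser = ConfigParser()
        parser.read(path)
        # has_option() returns False for a missing profile and, for named
        # profiles, also falls back to values set under [DEFAULT]
        if parser.has_option(profile, "host") and parser.has_option(profile, "token"):
            return "config_file"
    return None
```

For example, `print(detect_databricks_auth() or "No Databricks credentials found")` reports which method the CLI is likely to pick up.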

## 3. Cluster Management

The Databricks CLI allows programmatic control over compute clusters. We can create, monitor, and delete clusters.

---

**Note:** The following cells interact live with your Databricks workspace and require the CLI to be configured. They may incur costs if resources are left running.

---

### 3.1 Create Cluster (`databricks clusters create`)

This command creates a new cluster based on a JSON configuration file. Our wrapper function `dcu.create_cluster_cli()` handles the call and saves the resulting `cluster_id`.

Ensure `config/cluster_config.json` exists and contains valid settings before running the next cell.
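Before submitting, it can help to confirm that the file parses as JSON and contains the fields a cluster spec usually needs. The sketch below assumes `config/cluster_config.json` follows the Clusters API create-request shape: `spark_version`, `node_type_id`, and either `num_workers` or `autoscale`. The helper name is hypothetical. The required-field list is a simplification; for example, a spec that uses an instance pool may omit `node_type_id`.

```python
import json
from typing import List, Tuple


def check_cluster_config(path: str) -> Tuple[dict, List[str]]:
    """Load a cluster spec and list fields that look missing (hypothetical helper)."""
    with open(path, "r") as f:
        spec = json.load(f)  # raises json.JSONDecodeError on malformed JSON
    problems = []
    for key in ("spark_version", "node_type_id"):
        if key not in spec:
            problems.append(f"missing '{key}'")
    # A fixed-size cluster sets num_workers; an autoscaling one sets autoscale
    if "num_workers" not in spec and "autoscale" not in spec:
        problems.append("missing 'num_workers' or 'autoscale'")
    return spec, problems
```

Usage: `spec, problems = check_cluster_config(CLUSTER_CONFIG_PATH)`, then print `problems` if the list is non-empty.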

```python
_LOG.info(f"Attempting to create cluster using config: {CLUSTER_CONFIG_PATH}")

# Check if config file exists before calling
if not os.path.exists(CLUSTER_CONFIG_PATH):
    _LOG.error(f"Cluster configuration file not found at {CLUSTER_CONFIG_PATH}. Please create it.")
else:
    cluster_id_created = dcu.create_cluster_cli(
        config_file=CLUSTER_CONFIG_PATH,
        cluster_id_file=CLUSTER_ID_FILE
    )

    if cluster_id_created:
        cluster_id_to_delete = cluster_id_created
        _LOG.info(f"Cluster creation initiated. ID: {cluster_id_to_delete}")
        print(f"Cluster creation request submitted. Cluster ID: {cluster_id_to_delete}")
        try:
            with open(CLUSTER_ID_FILE, 'r') as f:
                id_from_file = f.read().strip()
            print(f"Cluster ID saved to {CLUSTER_ID_FILE}: {id_from_file}")
        except Exception as e:
            print(f"Could not read cluster ID file {CLUSTER_ID_FILE}: {e}")
    else:
        _LOG.error("Cluster creation failed. Check logs for details.")
        print("Cluster creation failed.")
```

```
2025-05-17 20:38:33,988 - __main__ - INFO - Attempting to create cluster using config: config/cluster_config.json
2025-05-17 20:38:37,176 - databricks_cli_utils - INFO - Submitted cluster creation. ID: 0517-203836-cjzo8kc
2025-05-17 20:38:37,187 - databricks_cli_utils - INFO - Saved cluster ID to config/cluster_id.txt
2025-05-17 20:38:37,188 - __main__ - INFO - Cluster creation initiated. ID: 0517-203836-cjzo8kc

Cluster creation request submitted. Cluster ID: 0517-203836-cjzo8kc
Cluster ID saved to config/cluster_id.txt: 0517-203836-cjzo8kc
```
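The create call only *submits* the request, and a new cluster typically spends some time in `PENDING` before it reaches `RUNNING`. The sketch below shows one way to check its state, under these assumptions:

* `dcu._run_databricks_cli` returns the command output in `raw_stdout` (as in section 4.3).
* `databricks clusters get` prints the cluster as JSON with a `state` field.
* The command uses the legacy CLI form with the `--cluster-id` flag. The newer unified CLI takes the ID as a positional argument instead.

`parse_cluster_state` is a hypothetical helper.

```python
import json
from typing import Optional


def parse_cluster_state(raw_stdout: str) -> Optional[str]:
    """Extract the 'state' field (e.g. PENDING, RUNNING, TERMINATED) from cluster JSON."""
    try:
        info = json.loads(raw_stdout)
    except json.JSONDecodeError:
        return None
    return info.get("state") if isinstance(info, dict) else None


# Hypothetical wiring (legacy CLI syntax assumed; requires a configured CLI):
# result = dcu._run_databricks_cli(
#     ['databricks', 'clusters', 'get', '--cluster-id', cluster_id_to_delete])
# if result["success"]:
#     print(parse_cluster_state(result["raw_stdout"]))
```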


## 4. DBFS Interaction (`databricks fs ...`)

The CLI allows interaction with the Databricks File System (DBFS) for storing and retrieving files.

---

**Note:** The following cells interact live with your Databricks workspace and require the CLI to be configured.

---

### 4.1 Create Directory (`databricks fs mkdirs`)

`fs cp` usually creates missing parent directories on upload. You can also create directories explicitly with `mkdirs`, which is useful for setting up a directory structure in advance. Like `mkdir -p`, it succeeds if the directory already exists.

```python
DBFS_MKDIRS_PATH = f"{DBFS_TEST_DIR}/newly_created_dir"
print(f"Explicitly creating directory: {DBFS_MKDIRS_PATH}...")
# Note: Requires configured CLI
result = dcu._run_databricks_cli(['databricks', 'fs', 'mkdirs', DBFS_MKDIRS_PATH])
if result["success"]:
    print(f"Successfully created or ensured directory exists: {DBFS_MKDIRS_PATH}")
else:
    print(f"Failed to create directory: {result.get('error', 'Unknown')}")
```

```
Explicitly creating directory: dbfs:/api_notebook_tests/newly_created_dir...
Successfully created or ensured directory exists: dbfs:/api_notebook_tests/newly_created_dir
```
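The `success`/`error` check in the cell above recurs in several cells of this notebook. A small hypothetical helper can centralize it. It relies only on the `success`, `error`, and `raw_stdout` keys already used here. It raises instead of printing, so a failed step stops a scripted run rather than continuing silently.

```python
from typing import Any, Dict


class DatabricksCliError(RuntimeError):
    """Raised when a wrapped CLI call reports failure (hypothetical)."""


def require_success(result: Dict[str, Any], action: str) -> str:
    """Return the command's raw stdout, or raise if the call failed."""
    if not result.get("success"):
        raise DatabricksCliError(f"{action} failed: {result.get('error', 'Unknown')}")
    return result.get("raw_stdout", "")
```

Usage: `require_success(dcu._run_databricks_cli(['databricks', 'fs', 'mkdirs', DBFS_MKDIRS_PATH]), "mkdirs")`.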


### 4.2 Upload File (`databricks fs cp`)

We can upload local files to DBFS. Let's create a small test file locally first.

```python
import datetime

_LOG.info(f"Creating dummy file for upload at: {TEST_UPLOAD_LOCAL_PATH}")
try:
    with open(TEST_UPLOAD_LOCAL_PATH, "w") as f:
        f.write(f"This is a test file uploaded by API notebook at {datetime.datetime.now()}.\n")
    print(f"Dummy file created at {TEST_UPLOAD_LOCAL_PATH}")
except IOError as e:
    _LOG.error(f"Failed to create dummy file: {e}")
    print(f"Error creating dummy file: {e}")

# Upload it using the utility function
if os.path.exists(TEST_UPLOAD_LOCAL_PATH):
    _LOG.info(f"Uploading {TEST_UPLOAD_LOCAL_PATH} to {DBFS_TEST_UPLOAD_PATH}")
    print(f"Uploading {TEST_UPLOAD_LOCAL_PATH} to {DBFS_TEST_UPLOAD_PATH}...")
    success = dcu.upload_to_dbfs_cli(
        local_path=TEST_UPLOAD_LOCAL_PATH,
        dbfs_path=DBFS_TEST_UPLOAD_PATH,
        overwrite=True
    )
    if success:
        print("Upload successful.")
    else:
        print("Upload failed.")
else:
    print("Skipping upload, dummy file not created.")
```

```
2025-05-17 20:38:38,479 - __main__ - INFO - Creating dummy file for upload at: data/api_notebook_test_upload.txt
2025-05-17 20:38:38,489 - __main__ - INFO - Uploading data/api_notebook_test_upload.txt to dbfs:/api_notebook_tests/api_notebook_test_upload.txt

Dummy file created at data/api_notebook_test_upload.txt
Uploading data/api_notebook_test_upload.txt to dbfs:/api_notebook_tests/api_notebook_test_upload.txt...

2025-05-17 20:38:39,853 - databricks_cli_utils - INFO - Uploaded 'data/api_notebook_test_upload.txt' to 'dbfs:/api_notebook_tests/api_notebook_test_upload.txt'

Upload successful.
```


### 4.3 List Files (`databricks fs ls`)

We can list files in DBFS. For commands like `ls`, our `_run_databricks_cli` helper returns the raw CLI output in `result["raw_stdout"]`.

```python
_LOG.info(f"Listing contents of {DBFS_TEST_DIR}")
print(f"Listing contents of {DBFS_TEST_DIR}...")
result = dcu._run_databricks_cli(['databricks', 'fs', 'ls', DBFS_TEST_DIR])

if result["success"]:
    print("Successfully listed files. Raw output:")
    # Print the raw output which contains the file listing
    print(result["raw_stdout"])
else:
    print(f"Failed to list files: {result.get('error', 'Unknown')}")
```

```
2025-05-17 20:38:39,862 - __main__ - INFO - Listing contents of dbfs:/api_notebook_tests

Listing contents of dbfs:/api_notebook_tests...
Successfully listed files. Raw output:
api_notebook_test_upload.txt
newly_created_dir
```
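`raw_stdout` is plain text, so checking that specific entries exist means parsing it. The sketch below assumes one entry per line, as in the output above. The helper name is hypothetical, and other CLI versions may format the listing differently.

```python
from typing import List


def parse_ls_output(raw_stdout: str) -> List[str]:
    """Split name-per-line `fs ls` output into a list, dropping blank lines."""
    return [line.strip() for line in raw_stdout.splitlines() if line.strip()]


# Example using the listing shown above
listing = parse_ls_output("api_notebook_test_upload.txt\nnewly_created_dir\n")
missing = {"api_notebook_test_upload.txt", "newly_created_dir"} - set(listing)
print(missing or "All expected entries present")
```

In the notebook, pass `result["raw_stdout"]` instead of the literal string.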


### 4.4 Download File (`databricks fs cp`)

We can also download files from DBFS back to our local system.

```python
_LOG.info(f"Downloading {DBFS_TEST_UPLOAD_PATH} to {TEST_DOWNLOAD_LOCAL_PATH}")
print(f"Downloading {DBFS_TEST_UPLOAD_PATH} to {TEST_DOWNLOAD_LOCAL_PATH}...")
success = dcu.download_from_dbfs_cli(
    dbfs_path=DBFS_TEST_UPLOAD_PATH,
    local_path=TEST_DOWNLOAD_LOCAL_PATH,
    overwrite=True
)

if success:
    print("Download successful.")
    # Verify content locally
    try:
        with open(TEST_DOWNLOAD_LOCAL_PATH, "r") as f:
            content = f.read()
        print(f"Content of downloaded file: '{content.strip()}'")
        os.remove(TEST_DOWNLOAD_LOCAL_PATH)
        print(f"Cleaned up local download file: {TEST_DOWNLOAD_LOCAL_PATH}")
    except Exception as e:
        print(f"Could not read or delete downloaded file: {e}")
else:
    print("Download failed.")

# Remove the local copy of the uploaded test file
if os.path.exists(TEST_UPLOAD_LOCAL_PATH):
    try:
        os.remove(TEST_UPLOAD_LOCAL_PATH)
        print(f"Cleaned up local upload file: {TEST_UPLOAD_LOCAL_PATH}")
    except Exception as e:
        print(f"Could not remove local upload file: {e}")
```

```
2025-05-17 20:38:40,825 - __main__ - INFO - Downloading dbfs:/api_notebook_tests/api_notebook_test_upload.txt to data/api_notebook_test_download.txt

Downloading dbfs:/api_notebook_tests/api_notebook_test_upload.txt to data/api_notebook_test_download.txt...

2025-05-17 20:38:42,006 - databricks_cli_utils - INFO - Downloaded 'dbfs:/api_notebook_tests/api_notebook_test_upload.txt' to 'data/api_notebook_test_download.txt'

Download successful.
Content of downloaded file: 'This is a test file uploaded by API notebook at 2025-05-17 20:38:38.486143.'
Cleaned up local download file: data/api_notebook_test_download.txt
Cleaned up local upload file: data/api_notebook_test_upload.txt
```
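Printing the downloaded text is only a visual check. A stricter round-trip test compares checksums of the uploaded and downloaded local files before either one is deleted. The sketch below uses `hashlib`, and both helper names are hypothetical. In the cell above, the comparison would go right after the download succeeds and before the `os.remove` calls.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """True if both files have identical bytes (compared via SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Usage: `assert same_content(TEST_UPLOAD_LOCAL_PATH, TEST_DOWNLOAD_LOCAL_PATH)`.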


## 5. Job Management

The CLI can trigger pre-defined jobs and check the status of job runs.

---

**Note:** The following cells interact live with your Databricks workspace and require the CLI to be configured. You also need to set `JOB_ID` in the setup cell to a valid Job ID from your workspace.

---

### 5.1 Submit Job (`databricks jobs run-now`)

This command starts a run of an existing job. Our wrapper `dcu.submit_notebook_job_cli()` returns the `run_id`.

```python
run_id = None
_LOG.info(f"Attempting to submit job with ID: {JOB_ID}")
print(f"Attempting to submit job with ID: {JOB_ID}...")

run_id = dcu.submit_notebook_job_cli(JOB_ID)

if run_id:
    _LOG.info(f"Job submitted successfully. Run ID: {run_id}")
    print(f"Job submitted successfully. Run ID: {run_id}")
else:
    _LOG.error("Job submission failed.")
    print("Job submission failed.")
```

```
2025-05-17 20:38:42,027 - __main__ - INFO - Attempting to submit job with ID: 90640552909146

Attempting to submit job with ID: 90640552909146...

2025-05-17 20:38:44,070 - databricks_cli_utils - INFO - Submitted job '90640552909146'. Run ID: 57365404224252
2025-05-17 20:38:44,071 - __main__ - INFO - Job submitted successfully. Run ID: 57365404224252

Job submitted successfully. Run ID: 57365404224252
```


### 5.2 Get Job Run Status (`databricks runs get`)

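Using the `run_id`, we can check the status of that specific job run. The status includes its life-cycle state (e.g., PENDING, RUNNING, TERMINATED) and, once the run has finished, its result state (e.g., SUCCESS, FAILED).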

```python
if run_id:
    _LOG.info(f"Checking status for job run: {run_id}")
    print(f"Checking status for job run: {run_id}...")
    time.sleep(5)
    run_state = dcu.get_job_run_status_cli(run_id)
    if run_state:
        _LOG.info(f"Current job run state dictionary: {run_state}")
        print(f"Life Cycle State: {run_state.get('life_cycle_state', 'N/A')}")
        print(f"Result State: {run_state.get('result_state', 'N/A')}")
        print(f"State Message: {run_state.get('state_message', 'N/A')}")
    else:
        _LOG.warning("Could not retrieve job run status.")
        print("Could not retrieve job run status.")
else:
    _LOG.warning("Skipping job status check as job submission failed or was skipped.")
    print("Skipping job status check - no run ID available.")
```

```
2025-05-17 20:45:04,912 - __main__ - INFO - Checking status for job run: 57365404224252

Checking status for job run: 57365404224252...

2025-05-17 20:45:10,757 - databricks_cli_utils - INFO - Job run '57365404224252' status: TERMINATED
2025-05-17 20:45:10,759 - __main__ - INFO - Current job run state dictionary: {'life_cycle_state': 'TERMINATED', 'result_state': 'SUCCESS', 'state_message': '', 'user_cancelled_or_timedout': False}

Life Cycle State: TERMINATED
Result State: SUCCESS
State Message: 
```
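The cell above sleeps a fixed 5 seconds and reads the state once, so a longer job may still report `RUNNING` at that point. The polling sketch below repeats the check until the run reaches a terminal life-cycle state. It takes the status function as a parameter: pass `dcu.get_job_run_status_cli`, which returns a dict with `life_cycle_state` as shown above. The helper name, poll interval, and timeout are illustrative assumptions. The terminal states listed (`TERMINATED`, `SKIPPED`, `INTERNAL_ERROR`) follow the Databricks Jobs API.

```python
import time
from typing import Callable, Optional

# Life-cycle states after which a run will not change further
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_for_run(
    run_id: str,
    get_status: Callable[[str], Optional[dict]],
    poll_seconds: float = 15.0,
    timeout_seconds: float = 1800.0,
) -> Optional[dict]:
    """Poll until the run is terminal; return its final state dict, or None on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        state = get_status(run_id)
        if state and state.get("life_cycle_state") in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    return None
```

Usage: `final = wait_for_run(run_id, dcu.get_job_run_status_cli)`. Then inspect `final.get("result_state")` if `final` is not `None`.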


## 6. Cleanup

It's crucial to terminate compute resources like clusters when finished to avoid ongoing costs. We also clean up test files.

---

**Note:** The following cell interacts live with your Databricks workspace and requires the CLI to be configured.

---

```python
# Delete the cluster created earlier in this notebook
if cluster_id_to_delete:
    _LOG.info(f"Attempting to delete cluster: {cluster_id_to_delete}")
    print(f"Attempting to delete cluster: {cluster_id_to_delete}...")
    deleted = dcu.delete_cluster_cli(cluster_id_to_delete)
    if deleted:
        _LOG.info("Cluster deletion request submitted successfully.")
        print("Cluster deletion request submitted successfully.")
        if os.path.exists(CLUSTER_ID_FILE):
            try:
                os.remove(CLUSTER_ID_FILE)
                _LOG.info(f"Removed cluster ID file: {CLUSTER_ID_FILE}")
                print(f"Removed cluster ID file: {CLUSTER_ID_FILE}")
            except OSError as e:
                _LOG.warning(f"Could not remove cluster ID file {CLUSTER_ID_FILE}: {e}")
                print(f"Warning: Could not remove cluster ID file {CLUSTER_ID_FILE}")
        cluster_id_to_delete = None
    else:
        _LOG.error("Cluster deletion failed. Manual cleanup may be required.")
        print("ERROR: Cluster deletion failed. Please check the Databricks UI.")
else:
    _LOG.info("No cluster ID was stored from this session, skipping cluster deletion.")
    print("No cluster created in this session, skipping deletion.")

# Clean up the test directory created on DBFS
_LOG.info(f"Attempting to remove DBFS test directory: {DBFS_TEST_DIR}")
print(f"Attempting to remove DBFS test directory: {DBFS_TEST_DIR}...")
cleanup_result = dcu._run_databricks_cli(['databricks', 'fs', 'rm', '-r', DBFS_TEST_DIR])
if cleanup_result["success"]:
    print(f"Successfully removed {DBFS_TEST_DIR}")
else:
    print(f"Warning: Failed to remove {DBFS_TEST_DIR} - {cleanup_result.get('error', 'Unknown')}")

_LOG.info("API Notebook cleanup attempt finished.")
print("Cleanup finished.")
```

```
2025-05-17 20:45:26,429 - __main__ - INFO - No cluster ID was stored from this session, skipping cluster deletion.
2025-05-17 20:45:26,432 - __main__ - INFO - Attempting to remove DBFS test directory: dbfs:/api_notebook_tests

No cluster created in this session, skipping deletion.
Attempting to remove DBFS test directory: dbfs:/api_notebook_tests...

2025-05-17 20:45:27,542 - __main__ - INFO - API Notebook cleanup attempt finished.

Successfully removed dbfs:/api_notebook_tests
Cleanup finished.
```
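The recorded output above reports no cluster to delete, even though section 3.1 created `0517-203836-cjzo8kc`, so this cell did not terminate that cluster. `cluster_id_to_delete` lives only in kernel memory, and it was most likely reset in between. For example, re-running the setup cell sets it back to `None`. Section 3.1 also saved the ID to `CLUSTER_ID_FILE`, so the cleanup cell could fall back to that file. The sketch below shows one way to do it; the helper name is hypothetical. A stale file left over from an earlier session would also be picked up.

```python
import os
from typing import Optional


def resolve_cluster_id(in_memory_id: Optional[str], id_file: str) -> Optional[str]:
    """Prefer the ID held in memory; otherwise fall back to the ID saved on disk."""
    if in_memory_id:
        return in_memory_id
    if os.path.isfile(id_file):
        with open(id_file, "r") as f:
            saved = f.read().strip()
        return saved or None
    return None
```

Usage: run `cluster_id_to_delete = resolve_cluster_id(cluster_id_to_delete, CLUSTER_ID_FILE)` just before the `if cluster_id_to_delete:` check in the cleanup cell.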


## End of API Demonstration