## Introduction to the Databricks API

**What is the Databricks API?**
- A REST interface for interacting programmatically with a Databricks workspace.
- It automates workflows, manages clusters, runs jobs, and retrieves run results.

**Use Cases:**
- Automating job execution.
- Managing Databricks resources.
- Integrating with external systems.

**Key Components:**
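- Authentication: each request carries a bearer token in the `Authorization` header.
- REST API endpoints, such as `/api/2.0/workspace/list` and `/api/2.1/jobs/create`.
- Access tokens, which should be kept out of source code.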
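The three components above fit together in a few lines. This is a minimal sketch: the helper names `api_url` and `auth_headers` are hypothetical (they are not part of any Databricks library), and the host and token are placeholders.

```python
def api_url(host: str, endpoint: str) -> str:
    """Join a workspace host and an API path without doubling slashes."""
    return f"{host.rstrip('/')}/{endpoint.lstrip('/')}"


def auth_headers(token: str) -> dict:
    """Build the bearer-token header sent with each REST call."""
    if not token:
        raise ValueError("A personal access token is required.")
    return {"Authorization": f"Bearer {token}"}


# Placeholder values for illustration only
url = api_url("https://example.azuredatabricks.net/", "/api/2.1/jobs/list")
headers = auth_headers("dapi-placeholder")
print(url)
```

Failing early on an empty token avoids sending an unauthenticated request and then having to interpret the resulting error response.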

**Example**

In [1]:
import os
import requests
import json

# Workspace URL
databricks_url = "https://adb-1420859118153884.4.azuredatabricks.net"

# Read the token from the DATABRICKS_TOKEN environment variable.
# Set it outside the notebook; a token hardcoded in a cell is visible
# to anyone who can read the notebook.
databricks_token = os.getenv("DATABRICKS_TOKEN")
if not databricks_token:
    raise RuntimeError("Set the DATABRICKS_TOKEN environment variable first.")

# Header used by every API call below
headers = {"Authorization": f"Bearer {databricks_token}"}
print("Token configured successfully!")


Token configured successfully!


### List Workspace Directory

In [4]:
# Request URL for listing the objects in a workspace directory
url = f"{databricks_url}/api/2.0/workspace/list"

print("Calling: ", url)

data = {
    "path": "/Users/pankaj.py2@outlook.com"
}

# Send the path as a query parameter; GET requests should not carry a body
response = requests.get(url, headers=headers, params=data)
response_json = response.json()
display(response_json)

Calling:  https://adb-1420859118153884.4.azuredatabricks.net/api/2.0/workspace/list


{'objects': [{'object_type': 'NOTEBOOK',
   'path': '/Users/pankaj.py2@outlook.com/Extract Data',
   'language': 'PYTHON',
   'created_at': 1734243890988,
   'modified_at': 1734329594855,
   'object_id': 846899863485469,
   'resource_id': '846899863485469'},
  {'object_type': 'NOTEBOOK',
   'path': '/Users/pankaj.py2@outlook.com/pyspark-join-df',
   'language': 'PYTHON',
   'created_at': 1734002990081,
   'modified_at': 1734066860587,
   'object_id': 1648461507245785,
   'resource_id': '1648461507245785'},
  {'object_type': 'NOTEBOOK',
   'path': '/Users/pankaj.py2@outlook.com/pyspark-analytics',
   'language': 'PYTHON',
   'created_at': 1734060767426,
   'modified_at': 1734244342169,
   'object_id': 2110172399548968,
   'resource_id': '2110172399548968'},
  {'object_type': 'NOTEBOOK',
   'path': '/Users/pankaj.py2@outlook.com/PySpark-Broadcast-Join',
   'language': 'PYTHON',
   'created_at': 1734060777993,
   'modified_at': 1734070675297,
   'object_id': 2110172399548995,
   'resource ... (output truncated)
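The raw listing above is verbose, and its timestamps are epoch milliseconds. A small sketch of how the response could be reduced to notebook paths with readable modification times follows. The function name `summarize_notebooks` is hypothetical, and the `DIRECTORY` entry in the sample is invented to show the filtering.

```python
from datetime import datetime, timezone


def summarize_notebooks(response_json: dict) -> list:
    """Return (path, last-modified UTC datetime) for each notebook entry."""
    rows = []
    for obj in response_json.get("objects", []):
        if obj.get("object_type") != "NOTEBOOK":
            continue  # skip directories and other object types
        # modified_at is in milliseconds since the Unix epoch
        modified = datetime.fromtimestamp(obj["modified_at"] / 1000, tz=timezone.utc)
        rows.append((obj["path"], modified))
    return rows


# Sample shaped like the response above; the DIRECTORY entry is made up
sample = {"objects": [
    {"object_type": "NOTEBOOK",
     "path": "/Users/pankaj.py2@outlook.com/Extract Data",
     "modified_at": 1734329594855},
    {"object_type": "DIRECTORY",
     "path": "/Users/pankaj.py2@outlook.com/archive"},
]}

notebook_rows = summarize_notebooks(sample)
for path, modified in notebook_rows:
    print(path, modified.isoformat())
```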

### Create a new Job with cluster configuration

In [5]:
# Create a new job
job_payload = {
    "name": "ETL-SQl-to-DL",
    "new_cluster": {
        "num_workers": 0,
        "spark_version": "13.3.x-scala2.12",
        "spark_conf": {
            "spark.master": "local[*, 4]",
            "spark.databricks.cluster.profile": "singleNode"
        },
        "node_type_id": "Standard_DS3_v2"
    },
    "notebook_task": {
        "notebook_path": "/Users/pankaj.py2@outlook.com/Extract-Data-Task1"
    }
}

create_job_response = requests.post(
    f"{databricks_url}/api/2.1/jobs/create",
    headers=headers,
    json=job_payload
)

if create_job_response.status_code == 200:
    job_id = create_job_response.json().get("job_id")
    print(f"Job created with ID: {job_id}")
else:
    print(f"Failed to create job: {create_job_response.text}")


Job created with ID: 561370898228136
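The payload above puts `new_cluster` and `notebook_task` at the top level of the job. Jobs API 2.1 is documented around a `tasks` array instead, so the same job can likely be expressed in that layout. The sketch below only builds the payload; the helper name `build_single_task_job` and the `task_key` value `"main"` are assumptions, and the exact schema should be checked against the Jobs API reference for your workspace.

```python
import json


def build_single_task_job(name: str, notebook_path: str, cluster: dict,
                          task_key: str = "main") -> dict:
    """Wrap one notebook task and its cluster in the multi-task `tasks` layout."""
    return {
        "name": name,
        "tasks": [
            {
                "task_key": task_key,  # assumed value; must be unique within the job
                "notebook_task": {"notebook_path": notebook_path},
                "new_cluster": cluster,
            }
        ],
    }


# Same single-node cluster settings as the cell above
cluster = {
    "num_workers": 0,
    "spark_version": "13.3.x-scala2.12",
    "spark_conf": {
        "spark.master": "local[*, 4]",
        "spark.databricks.cluster.profile": "singleNode",
    },
    "node_type_id": "Standard_DS3_v2",
}

payload = build_single_task_job(
    "ETL-SQl-to-DL",
    "/Users/pankaj.py2@outlook.com/Extract-Data-Task1",
    cluster,
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary would be passed as `json=payload` to the same `jobs/create` call.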


### List the Jobs

In [33]:
# API endpoint for listing jobs
list_jobs_url = f"{databricks_url}/api/2.1/jobs/list"

# Make the API call
response = requests.get(
    list_jobs_url,
    headers=headers
)

# Parse and display the response
if response.status_code == 200:
    jobs = response.json().get("jobs", [])
    if not jobs:
        print("No jobs found in the workspace.")
    else:
        print("Jobs in the workspace:")
        for job in jobs:
            print(f"Job ID: {job['job_id']}, Job Name: {job['settings']['name']}")
else:
    print(f"Error fetching jobs: {response.status_code} - {response.text}")

Jobs in the workspace:
Job ID: 561370898228136, Job Name: ETL-SQl-to-DL
Job ID: 920062048610217, Job Name: MY-ETL-JOB
Job ID: 513102915751090, Job Name: MyFirstJob3
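The listing above reads a single page of results, which is enough for a workspace with three jobs. For larger workspaces the list endpoint is paginated. The sketch below assumes the response carries `has_more` and `next_page_token` fields and that the next page is requested with a `page_token` query parameter; treat those names as assumptions to verify against the Jobs API reference. The page fetcher is injected so the loop can be exercised without a network call.

```python
from typing import Callable, Iterator, Optional


def iter_jobs(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield jobs from every page, following next_page_token until exhausted."""
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get("jobs", [])
        token = page.get("next_page_token")
        if not page.get("has_more") or not token:
            break


# Stand-in for a real call such as:
#   requests.get(list_jobs_url, headers=headers,
#                params={"page_token": token}).json()
fake_pages = {
    None: {"jobs": [{"job_id": 1}, {"job_id": 2}],
           "has_more": True, "next_page_token": "p2"},
    "p2": {"jobs": [{"job_id": 3}], "has_more": False},
}

job_ids = [job["job_id"] for job in iter_jobs(lambda token: fake_pages[token])]
print(job_ids)
```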


In [20]:
job_id

561370898228136

### Trigger the job

In [21]:
# Pass parameter to notebook
parameters = {
    "file" : "new_file_name"
}

# Trigger the job
run_response = requests.post(
    f"{databricks_url}/api/2.1/jobs/run-now",
    headers=headers,
    json={
        "job_id": job_id,
        "notebook_params" : parameters
    }
)

output = run_response.json()
print(f"Job run initiated: {output}")
run_id = output.get("run_id","")


Job run initiated: {'run_id': 578636408087868, 'number_in_job': 578636408087868}


### Get the Job Output

In [30]:
# Specify the run ID for which to get output
run_id = "578636408087868"

# API endpoint to get the job run output
get_output_url = f"{databricks_url}/api/2.1/jobs/runs/get-output"

# Make the API call
response = requests.get(
    get_output_url,
    headers=headers,
    params={"run_id": run_id}
)

# Parse and display the response
if response.status_code == 200:
    output = response.json()
    print("Job Run Output:")
    print(output.get("notebook_output", {}).get("result", "No output available"))
else:
    print(f"Error fetching job output: {response.status_code} - {response.text}")

Job Run Output:
new_file_name


### Get the Job Status

In [31]:
# Specify the run ID for which to get details
run_id = "578636408087868"

# API endpoint to get the job run details
get_run_url = f"{databricks_url}/api/2.1/jobs/runs/get"

# Make the API call
response = requests.get(
    get_run_url,
    headers=headers,
    params={"run_id": run_id}
)

# Parse and display the response
if response.status_code == 200:
    run_details = response.json()
    run_state = run_details.get("state", {})
    life_cycle_state = run_state.get("life_cycle_state", "Unknown")
    result_state = run_state.get("result_state", "Unknown")
    run_page_url = run_details.get("run_page_url", "N/A")

    print(f"Run ID: {run_id}")
    print(f"Lifecycle State: {life_cycle_state}")
    print(f"Result State: {result_state}")
    print(f"Run Page URL: {run_page_url}")

    if result_state == "SUCCESS":
        print("The job run was successful!")
    elif result_state == "FAILED":
        print("The job run failed.")
    else:
        print("The job run is in progress or has an unknown status.")
else:
    print(f"Error fetching job details: {response.status_code} - {response.text}")

Run ID: 578636408087868
Lifecycle State: TERMINATED
Result State: SUCCESS
Run Page URL: https://adb-1420859118153884.4.azuredatabricks.net/?o=1420859118153884#job/561370898228136/run/578636408087868
The job run was successful!
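The status check above is a single snapshot. A run that was just triggered will usually still be pending or running, so a caller typically polls until the lifecycle state is terminal. The sketch below is one way to do that; the function name `wait_for_run`, the poll interval, and the timeout are assumptions, and the state source is injected so the loop can run without calling the API.

```python
import time
from typing import Callable

# Lifecycle states after which a run will not change again
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_for_run(get_state: Callable[[], dict], poll_seconds: float = 10.0,
                 timeout_seconds: float = 600.0) -> dict:
    """Poll get_state() until the run reaches a terminal lifecycle state."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state.get("life_cycle_state") in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("Run did not reach a terminal state in time.")
        time.sleep(poll_seconds)


# Stand-in for a function that calls jobs/runs/get and returns
# response.json()["state"]
states = iter([
    {"life_cycle_state": "PENDING"},
    {"life_cycle_state": "RUNNING"},
    {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"},
])

final_state = wait_for_run(lambda: next(states), poll_seconds=0.01)
print(final_state)
```

Checking `result_state` only after the lifecycle state is terminal avoids reporting an in-progress run as having an unknown result.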


### Delete a Job

In [34]:
# API endpoint to delete a job
delete_job_url = f"{databricks_url}/api/2.1/jobs/delete"
job_id = 513102915751090  # job IDs are integers in the API

# Make the API call
response = requests.post(
    delete_job_url,
    headers=headers,
    json={"job_id": job_id}
)

# Parse and display the response
if response.status_code == 200:
    print(f"Job ID {job_id} has been successfully deleted.")
else:
    print(f"Error deleting job: {response.status_code} - {response.text}")

Job ID 513102915751090 has been successfully deleted.


### List Active Job Runs Only

In [32]:
# API endpoint to list active job runs
list_active_jobs_url = f"{databricks_url}/api/2.1/jobs/runs/list"

# Fetch currently executing jobs
params = {
    "active_only": "true"  # Filter to include only active/running jobs
}

response = requests.get(
    list_active_jobs_url,
    headers=headers,
    params=params
)

# Parse and display the response
if response.status_code == 200:
    active_jobs = response.json().get("runs", [])
    if not active_jobs:
        print("No jobs are currently running.")
    else:
        print("Currently Executing Jobs:")
        for job in active_jobs:
            print(f"Run ID: {job['run_id']}, Job ID: {job['job_id']}, Job Name: {job['run_name']}, State: {job['state']['life_cycle_state']}")
else:
    print(f"Error fetching executing jobs: {response.status_code} - {response.text}")


No jobs are currently running.
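The loop above indexes each run with `job['run_name']` and `job['state']['life_cycle_state']`, which raises `KeyError` if a field is absent. A defensive formatter is sketched below; the name `format_run` and the fallback strings are assumptions, and the second sample run is invented to show the fallbacks.

```python
def format_run(run: dict) -> str:
    """Format one run entry, tolerating missing optional fields."""
    state = run.get("state", {}).get("life_cycle_state", "UNKNOWN")
    return (
        f"Run ID: {run.get('run_id')}, Job ID: {run.get('job_id')}, "
        f"Run Name: {run.get('run_name', '<unnamed>')}, State: {state}"
    )


# Sample entries; the second one is made up and omits optional fields
sample_runs = [
    {"run_id": 578636408087868, "job_id": 561370898228136,
     "run_name": "ETL-SQl-to-DL", "state": {"life_cycle_state": "RUNNING"}},
    {"run_id": 1, "job_id": 2},
]

lines = [format_run(run) for run in sample_runs]
for line in lines:
    print(line)
```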
