### What is the Databricks REST API?
The Databricks REST API is a set of HTTP endpoints for working with Databricks resources such as clusters, jobs, notebooks, files, and secrets. It lets you automate, integrate, and manage your Databricks workspace without going through the web UI.

### Use Cases of Databricks REST API
- 🚀 Automate workflows: Trigger jobs, create clusters, monitor executions.
- 📁 Manage workspace assets: Upload/download notebooks, scripts, and files.
- 🔐 Manage secrets: Create scopes, add/update secrets.
- 🔄 CI/CD integration: Deploy notebooks/jobs through pipelines.
- 👥 Manage users and permissions.

### Commonly Used API Categories
| Resource        | Example Use                          |
| --------------- | ------------------------------------ |
| `Clusters API`  | Create, start, stop clusters         |
| `Jobs API`      | Submit/run/monitor jobs              |
| `Workspace API` | Import/export notebooks              |
| `DBFS API`      | Upload/download files                |
| `Secrets API`   | Manage secrets for authentication    |
| `SQL API`       | Execute SQL queries programmatically |

[Databricks REST API reference (Azure)](https://docs.databricks.com/api/azure/workspace/introduction)

### Authentication for Databricks REST API
To access the Databricks REST API securely, authentication is required. This ensures only authorized users or systems can perform actions like running jobs, uploading notebooks, or managing secrets.

This guide covers two token types that Azure Databricks accepts for authentication:

### Personal Access Token (PAT)
- **Best for**: Individual use, scripts, automation tools
- **Not ideal for**: Multi-user or enterprise-wide integrations with strict governance

#### How to generate a PAT:
1. Log in to your Databricks workspace.
1. Go to User Settings (click your avatar).
1. Click "Access Tokens".
1. Click "Generate New Token".
1. Copy the token right away (you won’t be able to see it again).

#### Using PAT in API calls:
Set the Authorization header like this:
```
headers = {
    "Authorization": "Bearer <your_pat_token>"
}
```

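To keep the token out of the notebook source, one option is to read it from an environment variable. The sketch below assumes a variable named `DATABRICKS_TOKEN` and a hypothetical helper `build_auth_headers`; neither comes from the original notebook.

```python
import os


def build_auth_headers(token=None):
    """Return the Authorization header for a Databricks REST call.

    Falls back to the DATABRICKS_TOKEN environment variable
    (an assumed name, chosen for illustration) when no token is passed.
    """
    token = token or os.environ.get("DATABRICKS_TOKEN", "")
    if not token:
        raise ValueError("No token provided and DATABRICKS_TOKEN is not set")
    return {"Authorization": f"Bearer {token}"}
```

The returned dictionary can be passed as `headers=` to any `requests` call in this guide.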

### Azure Active Directory (AAD) Token
- **Best for**: Enterprise apps, Azure-integrated workflows, user impersonation
- **Slightly more complex to set up**

#### How it works:
AAD authentication uses OAuth 2.0 to obtain a token from Microsoft Entra ID (formerly Azure Active Directory). This allows:

- Centralized token lifecycle management
- Role-based access control
- Audit logging

#### Steps to use AAD token:
1. Register your app in Microsoft Entra ID (Azure AD).
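1. Get a token from Microsoft Entra ID via OAuth 2.0. With the Azure CLI, request it for the Azure Databricks resource, whose application ID is `2ff814a6-3304-4ab8-85cb-cd0e6f879c1d`:

   `az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d`
1. Pass the returned `accessToken` value as the bearer token in the `Authorization` header.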

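The steps above could be scripted roughly as follows. This is a sketch that assumes the Azure CLI is installed and already logged in; the helper names `parse_cli_token` and `get_aad_token` are hypothetical, and `resource` is whatever value you pass to `--resource` in step 2.

```python
import json
import subprocess


def parse_cli_token(cli_output):
    """Extract the bearer token from the JSON printed by the Azure CLI."""
    return json.loads(cli_output)["accessToken"]


def get_aad_token(resource):
    """Call `az account get-access-token` and return the access token.

    Assumes `az` is on PATH and a login session already exists.
    """
    completed = subprocess.run(
        ["az", "account", "get-access-token",
         "--resource", resource, "--output", "json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI reports a failure
    )
    return parse_cli_token(completed.stdout)
```

The result of `get_aad_token(...)` can then be assigned to `TOKEN` in place of a PAT.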

In [0]:
import requests

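# Define workspace URL (no trailing slash, so endpoint paths join cleanly) and token
DATABRICKS_INSTANCE = 'https://adb-3251879884955162.2.azuredatabricks.net'
TOKEN = ''  # Paste a PAT or AAD token here; never commit a real token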

### List Notebooks

In [0]:
# Set the base path to root folder or any sub-folder (e.g., "/Users/your.name")
list_path = "/Users/admin@pankajacksgmail.onmicrosoft.com/neurealm/JobsAndPipeline"

# API endpoint
url = f"{DATABRICKS_INSTANCE}/api/2.0/workspace/list"

# Request headers
headers = {
    "Authorization": f"Bearer {TOKEN}"
}

# Request params
params = {
    "path": list_path
}

# Make the request
response = requests.get(url, headers=headers, params=params)

# Display result
if response.status_code == 200:
    items = response.json().get("objects", [])
    for item in items:
        print(f"{item['object_type']}: {item['path']}")
else:
    print(f"Error: {response.status_code}, {response.text}")

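The listing call returns only the direct children of `path`. To collect notebooks in sub-folders as well, the same call can be repeated for each `DIRECTORY` entry. The sketch below takes the listing call as a function argument; `walk_notebooks` and `list_objects` are hypothetical names introduced for illustration.

```python
def walk_notebooks(list_objects, root):
    """Yield the path of every notebook under root, including sub-folders.

    list_objects(path) must return the "objects" list from one
    workspace/list response for that path.
    """
    pending = [root]  # folders still to be listed
    while pending:
        folder = pending.pop()
        for item in list_objects(folder):
            if item["object_type"] == "DIRECTORY":
                pending.append(item["path"])
            elif item["object_type"] == "NOTEBOOK":
                yield item["path"]
```

With `requests`, `list_objects` could be `lambda p: requests.get(url, headers=headers, params={"path": p}).json().get("objects", [])`.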

### Create a Job

In [0]:
NOTEBOOK_PATH = "/Users/admin@pankajacksgmail.onmicrosoft.com/neurealm/JobsAndPipeline/Random-String"
CLUSTER_ID = "0625-053440-himm25sq"  # Optional: or use new_cluster block

# Endpoint to create job
url = f"{DATABRICKS_INSTANCE}/api/2.2/jobs/create"

# Headers
headers = {
    "Authorization": f"Bearer {TOKEN}"
}

# Job definition payload (Jobs API 2.1 and later expect tasks inside a "tasks" array)
payload = {
    "name": "My Notebook Job via API",
    "tasks": [
        {
            "task_key": "run_notebook",
            "existing_cluster_id": CLUSTER_ID,
            "notebook_task": {
                "notebook_path": NOTEBOOK_PATH,
                "base_parameters": {
                    "param1": "value1"  # optional; read in the notebook with dbutils.widgets.get
                }
            }
        }
    ]
}

# Make API call
response = requests.post(url, headers=headers, json=payload)

# Print result
if response.status_code == 200:
    job_id = response.json()["job_id"]
    print(f"Job created successfully! Job ID: {job_id}")
else:
    print(f"Failed to create job. Status: {response.status_code}, Error: {response.text}")

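The comment on `CLUSTER_ID` mentions a `new_cluster` block as an alternative to an existing cluster. A rough sketch of that variant, written in the `tasks` array form used by Jobs API 2.1 and later, is shown below. The helper `build_notebook_job`, the task key, the notebook path, and the cluster values (`spark_version`, `node_type_id`, `num_workers`) are illustrative assumptions; substitute values that are valid in your workspace.

```python
def build_notebook_job(name, notebook_path, existing_cluster_id=None, new_cluster=None):
    """Build a jobs/create payload with one notebook task.

    Exactly one of existing_cluster_id or new_cluster must be given.
    """
    if (existing_cluster_id is None) == (new_cluster is None):
        raise ValueError("Provide exactly one of existing_cluster_id or new_cluster")
    task = {
        "task_key": "run_notebook",  # assumed task name
        "notebook_task": {"notebook_path": notebook_path},
    }
    if existing_cluster_id is not None:
        task["existing_cluster_id"] = existing_cluster_id
    else:
        task["new_cluster"] = new_cluster
    return {"name": name, "tasks": [task]}


# Placeholder cluster spec; these values are assumptions, not recommendations.
example_cluster = {
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 1,
}
job_payload = build_notebook_job(
    "My Notebook Job via API",
    "/Users/your.name/my-notebook",
    new_cluster=example_cluster,
)
```

`job_payload` can be posted to the `jobs/create` endpoint in the same way as `payload` above.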

### List All Jobs

In [0]:
# API Endpoint to list jobs
url = f'{DATABRICKS_INSTANCE}/api/2.2/jobs/list'

# Make the GET request
response = requests.get(url, headers=headers)

# Display response
if response.status_code == 200:
    jobs = response.json()
    for job in jobs.get('jobs', []):
        print(f"Job ID: {job['job_id']}, Name: {job['settings']['name']}")
else:
    print(f"Error: {response.status_code}, {response.text}")

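A single `jobs/list` call returns one page of results, so workspaces with many jobs need several calls. The sketch below assumes the response carries a `next_page_token` field that is sent back as the `page_token` query parameter, as described in the Jobs API reference; `iter_jobs` and `get_page` are hypothetical names.

```python
def iter_jobs(get_page):
    """Yield every job, requesting further pages until none remain.

    get_page(params) must return the decoded JSON body of one
    jobs/list response for the given query parameters.
    """
    params = {}
    while True:
        body = get_page(params)
        yield from body.get("jobs", [])
        token = body.get("next_page_token")
        if not token:
            break  # no further pages
        params = {"page_token": token}
```

With `requests`, `get_page` could be `lambda params: requests.get(url, headers=headers, params=params).json()`.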

### Run a Job

In [0]:
url = f"{DATABRICKS_INSTANCE}/api/2.2/jobs/run-now"
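data = {
    "job_id": job_id,
    # "notebook_params": {"param1": "overriding_val"}  # optional: override notebook parameters
}

response = requests.post(url, headers=headers, json=data)

if response.status_code == 200:
    run_id = response.json()["run_id"]  # needed later to check status and output
    print(f"Job started successfully! Run ID: {run_id}")
else:
    print(f"Failed to start job. Status: {response.status_code}, Error: {response.text}")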


### Get Job Status and Output

In [0]:
import requests

RUN_ID = 994265203054286  # Replace with actual run ID

# Common headers
headers = {
    "Authorization": f"Bearer {TOKEN}"
}

# STEP 1: Get job run status
status_url = f"{DATABRICKS_INSTANCE}/api/2.2/jobs/runs/get"
status_response = requests.get(status_url, headers=headers, params={"run_id": RUN_ID})

if status_response.status_code != 200:
    # Raise instead of calling exit(), which is meant for interactive shells and can stop the notebook session
    raise RuntimeError(f"Failed to get job status. Status Code: {status_response.status_code}")

run_data = status_response.json()
life_cycle = run_data["state"]["life_cycle_state"]
result_state = run_data["state"].get("result_state", "N/A")
print(f"\n=== Job Status ===\nLife Cycle State: {life_cycle}\nResult State: {result_state}")

# STEP 2: Get job output (only once the run has reached a terminal state)
if life_cycle in ["TERMINATED", "SKIPPED", "INTERNAL_ERROR"]:
    # For jobs defined with a "tasks" array, output belongs to the task run, not the parent run
    tasks = run_data.get("tasks", [])
    output_run_id = tasks[0]["run_id"] if tasks else RUN_ID

    output_url = f"{DATABRICKS_INSTANCE}/api/2.2/jobs/runs/get-output"
    output_response = requests.get(output_url, headers=headers, params={"run_id": output_run_id})

    if output_response.status_code == 200:
        output_data = output_response.json()
        # "result" holds the value the notebook passed to dbutils.notebook.exit()
        result = output_data.get("notebook_output", {}).get("result", "No output (dbutils.notebook.exit() was not called).")
        print(f"\n=== Job Output ===\n{result}")
    else:
        print(f"Failed to get output. Status Code: {output_response.status_code}")
else:
    print("\nJob is still running. Wait for it to complete before checking output.")

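Rather than re-running the cell by hand until the job finishes, the status check can be repeated in a loop. The following is a minimal polling sketch using the same terminal states as above; `wait_for_run`, `get_run`, and the default intervals are assumptions made for illustration.

```python
import time

TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_for_run(get_run, run_id, poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Poll a run until it reaches a terminal life cycle state, then return it.

    get_run(run_id) must return the decoded JSON body of a runs/get response.
    Raises TimeoutError if the run is still active after timeout_seconds.
    """
    waited = 0
    while True:
        run = get_run(run_id)
        if run["state"]["life_cycle_state"] in TERMINAL_STATES:
            return run
        if waited >= timeout_seconds:
            raise TimeoutError(f"Run {run_id} still active after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

With `requests`, `get_run` could be `lambda rid: requests.get(status_url, headers=headers, params={"run_id": rid}).json()`. The `sleep` parameter exists only so the wait can be replaced in tests.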