# Databricks Compute (Clusters) Architecture

**Objective:**
Understand the compute resources that power Databricks workloads. We will cover compute types, access modes, runtimes, cost management, permissions, and monitoring.

**Agenda:**
1.  **Types of Compute** (All-Purpose vs. Job)
2.  **Access Modes** (Single User vs. Shared)
3.  **Databricks Runtime (DBR) & Photon**
4.  **Cost Management** (Auto-scaling & Auto-termination)
5.  **Permissions**
6.  **Monitoring** (Spark UI & Logs)

## 1. Types of Compute

### A. All-Purpose Compute
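*   **Use Case:** Interactive analysis and development in notebooks (Python, SQL, Scala, R). Databricks SQL queries run on SQL warehouses, which are a separate compute type.
*   **Lifecycle:** You create, start, and terminate it manually (or let auto-termination stop it when idle).
*   **Cost:** Billed at a higher DBU rate than Job compute because it is designed for interactivity.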
*   **Creation:** Created via the "Compute" tab in the UI.

### B. Job Compute
*   **Use Case:** Running automated jobs (Workflows).
*   **Lifecycle:** Created automatically when a job starts and terminates immediately when the job finishes.
*   **Cost:** Cheaper than All-Purpose compute.
*   **Creation:** Defined within the "Workflows" tab when setting up a job task.

> **Best Practice:** Always use **Job Compute** for production scheduled workloads to save costs.
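
As a rough illustration of the difference, a job task can carry its own `new_cluster` specification, so the cluster exists only for that run, while a task that sets `existing_cluster_id` reuses a long-lived cluster. The sketch below builds both shapes as plain dictionaries in the style of a Jobs API payload. The task keys, notebook paths, node type, and cluster ID are hypothetical placeholders, and the helper `compute_type` is only for illustration.

```python
# Sketch of two job task definitions (Jobs API style payloads).
# All names, paths, node types, and IDs below are hypothetical placeholders.

job_compute_task = {
    "task_key": "nightly_etl",
    "notebook_task": {"notebook_path": "/Workspace/etl/nightly"},
    # A fresh cluster is created for this run and terminated when it finishes.
    "new_cluster": {
        "spark_version": "14.3.x-scala2.12",
        "node_type_id": "i3.xlarge",  # cloud-specific; adjust for your provider
        "num_workers": 2,
    },
}

all_purpose_task = {
    "task_key": "adhoc_check",
    "notebook_task": {"notebook_path": "/Workspace/etl/adhoc"},
    # Reuses an existing interactive cluster, billed at the all-purpose rate.
    "existing_cluster_id": "0000-000000-example",
}


def compute_type(task: dict) -> str:
    """Classify a task by the kind of compute it requests."""
    if "new_cluster" in task:
        return "job"
    if "existing_cluster_id" in task:
        return "all-purpose"
    return "unspecified"


for task in (job_compute_task, all_purpose_task):
    print(task["task_key"], "->", compute_type(task))
```

For scheduled production work, the first shape is the one the best practice above points to.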

## 2. Access Modes (Crucial for Unity Catalog)

When creating a cluster, you must select an Access Mode. The access mode determines who can use the cluster and how data access is isolated and governed.

| Access Mode | Unity Catalog Support | Description |
| :--- | :--- | :--- |
| **Single User** | ✅ Yes | Designed for a single developer. Supports Python, SQL, Scala, R. The user acts as the owner. |
| **Shared** | ✅ Yes | Designed for multiple users to share the same cluster securely. Good for concurrent SQL/Python usage. Limits some filesystem operations for security. |
| **No Isolation Shared** | ❌ No | Legacy mode. Does not support Unity Catalog governance features. |
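
When a cluster is defined through the API rather than the UI, the access mode is, to my understanding, expressed through the `data_security_mode` field of the cluster specification. The mapping below is a minimal sketch of how the UI labels in the table line up with that field; verify the exact values against the Clusters API reference for your workspace. The helper name `supports_unity_catalog` is hypothetical.

```python
# Assumed mapping: UI access mode label -> `data_security_mode` value.
ACCESS_MODE_TO_API = {
    "Single User": "SINGLE_USER",
    "Shared": "USER_ISOLATION",
    "No Isolation Shared": "NONE",
}

# Modes that the table above marks as supporting Unity Catalog.
UNITY_CATALOG_MODES = {"SINGLE_USER", "USER_ISOLATION"}


def supports_unity_catalog(access_mode_label: str) -> bool:
    """Return True if the given UI access mode supports Unity Catalog."""
    return ACCESS_MODE_TO_API[access_mode_label] in UNITY_CATALOG_MODES


for label in ACCESS_MODE_TO_API:
    print(f"{label}: Unity Catalog = {supports_unity_catalog(label)}")
```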

## 3. Databricks Runtime (DBR) & Photon

### Databricks Runtime (DBR)
A pre-configured environment that includes Spark, Python, Scala, and standard libraries.
*   **LTS (Long Term Support):** Recommended for stability (e.g., 13.3 LTS, 14.3 LTS).
*   **ML Variants:** Ship with machine learning libraries (e.g., TensorFlow, PyTorch) pre-installed.

### Photon Acceleration
*   A native vectorized execution engine written in C++ rather than running on the JVM.
*   **Benefits:** Significantly speeds up SQL and DataFrame operations.
*   **Cost:** Higher DBU rate, but often a lower *total* cost because jobs finish faster.
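
The cost trade-off is simple arithmetic: a run costs roughly DBU rate × price per DBU × hours, so a higher rate pays off once the speedup exceeds the rate ratio. The numbers below are made up purely for illustration (they are not Databricks list prices), and the sketch ignores the cloud provider's VM charges, which also shrink as runtime drops.

```python
def run_cost(dbu_per_hour: float, price_per_dbu: float, hours: float) -> float:
    """DBU cost of one run: rate x price x duration."""
    return dbu_per_hour * price_per_dbu * hours


# Hypothetical figures: Photon doubles the DBU rate but the job runs 2.5x faster.
standard_cost = run_cost(dbu_per_hour=10.0, price_per_dbu=0.15, hours=2.0)
photon_cost = run_cost(dbu_per_hour=20.0, price_per_dbu=0.15, hours=0.8)

# Photon breaks even when the speedup equals the ratio of the DBU rates.
break_even_speedup = 20.0 / 10.0

print(f"Standard run: ${standard_cost:.2f}")
print(f"Photon run:   ${photon_cost:.2f}")
print(f"Break-even speedup: {break_even_speedup:.1f}x")
```

With these assumed figures the Photon run is cheaper; with a speedup below 2x it would not be.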

In [None]:
# Practical: Check whether the current cluster has Photon enabled
# The runtime version tag identifies which DBR build the cluster is running
runtime_version = spark.conf.get("spark.databricks.clusterUsageTags.sparkVersion", "Unknown")

print(f"Cluster Runtime Version Tag: {runtime_version}")

# Check specifically for the Photon engine.
# Note: this key may not be set on every runtime or access mode, hence the fallback.
try:
    photon_status = spark.conf.get("spark.databricks.service.client.photonEnabled")
    print(f"Photon Enabled: {photon_status}")
except Exception:
    print("Could not determine Photon status from the Spark config; check the cluster's configuration page.")

## 4. Cost Management

### Auto-scaling
*   Allows the cluster to resize based on workload.
*   **Min Workers:** The cluster will never go below this.
*   **Max Workers:** The cluster will never exceed this (budget control).

### Auto-termination
*   **Critical Feature:** Automatically stops the cluster after `X` minutes of inactivity.
*   **Recommendation:** Set this to 15-30 minutes for development clusters to prevent accidental overnight costs.
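
Both settings live in the cluster specification. The sketch below shows a development cluster spec using the Clusters API field names `autoscale` and `autotermination_minutes`, plus a small hypothetical helper, `check_cost_controls`, that flags specs missing these guardrails. The cluster name, node type, and thresholds are assumptions chosen for illustration.

```python
# Hypothetical development cluster spec with both cost controls set.
dev_cluster_spec = {
    "cluster_name": "dev-sandbox",
    "spark_version": "14.3.x-scala2.12",
    "node_type_id": "i3.xlarge",  # cloud-specific placeholder
    "autoscale": {"min_workers": 1, "max_workers": 4},
    "autotermination_minutes": 20,
}


def check_cost_controls(spec: dict, max_idle_minutes: int = 30) -> list:
    """Return a list of cost-control problems found in a cluster spec."""
    problems = []

    # A value of 0 (or a missing field) is treated here as "never auto-terminate".
    idle = spec.get("autotermination_minutes", 0)
    if idle == 0:
        problems.append("auto-termination is disabled")
    elif idle > max_idle_minutes:
        problems.append(f"auto-termination exceeds {max_idle_minutes} minutes")

    autoscale = spec.get("autoscale")
    if autoscale is None:
        problems.append("no autoscale range (fixed-size cluster)")
    elif autoscale["min_workers"] > autoscale["max_workers"]:
        problems.append("min_workers is greater than max_workers")

    return problems


print(check_cost_controls(dev_cluster_spec) or "Cost controls look reasonable.")
```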

In [None]:
# Practical: Inspect the current cluster configuration
# Cluster usage tags in the Spark config describe how this specific cluster is set up
# Note: This requires access to the context tags, which some access modes may restrict

try:
    cluster_name = spark.conf.get("spark.databricks.clusterUsageTags.clusterName")
    print(f"Current Cluster Name: {cluster_name}")

    workers = spark.conf.get("spark.databricks.clusterUsageTags.clusterWorkers")
    print(f"Target Workers: {workers}")
except Exception as e:
    # Surface the underlying error instead of silently discarding it
    print(f"Unable to fetch cluster tags: {e}")

## 5. Permissions
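Who can do what with a cluster? Permission levels are cumulative: each level includes the abilities of the levels below it.

1.  **Can Manage:** Can edit the configuration, resize, change permissions, and delete the cluster, in addition to everything below.
2.  **Can Restart:** Can start, restart, and terminate the cluster (useful for clearing cache/state), in addition to attaching notebooks.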
3.  **Can Attach To:** Can connect notebooks to the cluster to run code, but cannot start/stop it.
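
A minimal sketch of how these levels might be modeled, assuming (as I understand the Databricks documentation) that the levels are cumulative and are named `CAN_ATTACH_TO`, `CAN_RESTART`, and `CAN_MANAGE` in the Permissions API. The action names, the user and group names, and the `is_allowed` helper are hypothetical.

```python
# Permission levels ordered from least to most privileged.
PERMISSION_ORDER = ["CAN_ATTACH_TO", "CAN_RESTART", "CAN_MANAGE"]

# Minimum level assumed for a few representative actions.
REQUIRED_LEVEL = {
    "attach_notebook": "CAN_ATTACH_TO",
    "restart": "CAN_RESTART",
    "edit_config": "CAN_MANAGE",
}


def is_allowed(granted_level: str, action: str) -> bool:
    """True if the granted level is at or above the level the action needs."""
    granted_rank = PERMISSION_ORDER.index(granted_level)
    required_rank = PERMISSION_ORDER.index(REQUIRED_LEVEL[action])
    return granted_rank >= required_rank


# Shape of an access control list as it would appear in a permissions payload.
acl_payload = {
    "access_control_list": [
        {"user_name": "analyst@example.com", "permission_level": "CAN_ATTACH_TO"},
        {"group_name": "data-engineers", "permission_level": "CAN_RESTART"},
    ]
}

for entry in acl_payload["access_control_list"]:
    principal = entry.get("user_name") or entry.get("group_name")
    level = entry["permission_level"]
    allowed = [a for a in REQUIRED_LEVEL if is_allowed(level, a)]
    print(f"{principal} ({level}): {allowed}")
```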

## 6. Monitoring (Spark UI & Logs)
*   **Spark UI:** Used for inspecting DAGs and diagnosing stage and task performance.
*   **Driver Logs:** `Standard Output` (print statements), `Standard Error`, and `Log4j` logs are found here. Essential for debugging errors.
*   **Metrics:** CPU and memory utilization charts (Ganglia on older runtimes, replaced by the built-in compute metrics UI on newer ones).