# Databricks Architecture & Roles

**Objective:** Understand the high-level architecture of Databricks, specifically how it integrates with cloud providers (Azure/AWS/GCP), the separation of responsibilities (Control Plane vs. Data Plane), and the administrative roles required to manage the platform.

---

## 1. Cloud Account & Workspace Structure

Databricks operates on top of major cloud providers: **AWS, Azure, and GCP**.

### The Hierarchy
1.  **Databricks Account:** The top-level entity. You can have one Databricks Account managing multiple Workspaces.
2.  **Workspaces:** Isolated environments for different teams or stages (e.g., `Dev`, `UAT`, `Prod`).
    *   All workspaces are managed by the same Account.
    *   Users and Groups are often defined at the Account level and assigned to specific Workspaces.
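The hierarchy can be pictured as a small data model. The sketch below is purely illustrative: the `Account` and `Workspace` classes, the `assign` method, and the sample users are hypothetical names invented for this example and do not correspond to any Databricks API. It also assumes, for simplicity, that every user is defined at the Account level before being assigned.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical stand-in for one isolated environment (e.g. Dev)."""
    name: str
    members: set = field(default_factory=set)


@dataclass
class Account:
    """Hypothetical stand-in for the top-level Databricks Account."""
    users: set = field(default_factory=set)
    workspaces: dict = field(default_factory=dict)

    def add_workspace(self, name):
        self.workspaces[name] = Workspace(name)

    def assign(self, user, workspace_name):
        # Simplifying assumption: users are defined at the Account level first.
        if user not in self.users:
            raise ValueError(f"{user} is not defined at the Account level")
        self.workspaces[workspace_name].members.add(user)


account = Account(users={"alice", "bob"})
for env in ("Dev", "UAT", "Prod"):
    account.add_workspace(env)

account.assign("alice", "Dev")
account.assign("alice", "Prod")
account.assign("bob", "Dev")

for ws in account.workspaces.values():
    print(ws.name, sorted(ws.members))
```

One Account holds all three Workspaces, and the same user can be assigned to more than one of them.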

### Workspace URL Anatomy
When you launch a workspace, you will see a URL structure similar to this:
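*   Azure: `https://adb-<workspace-id>.<random-number>.azuredatabricks.net`
*   AWS: `https://<deployment-name>.cloud.databricks.com`

On Azure, the number that follows `adb-` in the URL is the **Workspace ID**. On AWS, the Workspace ID appears after `o=` in the URL once you are logged in.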

In [None]:
# Simulation: extracting the Workspace ID from an Azure Databricks URL.
# In a real environment, you might parse this to confirm which environment you are connected to.
import re
from urllib.parse import urlparse


def parse_workspace_url(url):
    """Return the Workspace ID from an Azure-style URL, or None if not recognized."""
    print(f"Analyzing URL: {url}")
    # Inspect only the hostname so that "adb-" elsewhere in the URL is ignored.
    host = urlparse(url).hostname or ""
    # Azure Databricks hostnames start with "adb-<workspace-id>."
    match = re.match(r"adb-(\d+)\.", host)
    if match is None:
        print("Standard URL format not detected.")
        return None
    workspace_id = match.group(1)
    print(f"Workspace ID: {workspace_id}")
    return workspace_id


# Example URL from the video
url_1 = "https://adb-1563232110.azuredatabricks.net"
parse_workspace_url(url_1)
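The hostname suffix also hints at which cloud a workspace runs on. The following is a minimal sketch, assuming the three suffixes listed in `CLOUD_SUFFIXES`; the GCP suffix and the helper name `detect_cloud` are additions for illustration, and the table may not cover every deployment type.

```python
from urllib.parse import urlparse

# Assumed hostname suffixes for workspace URLs on each cloud.
# This table is illustrative and may not cover every deployment type.
CLOUD_SUFFIXES = {
    ".azuredatabricks.net": "Azure",
    ".gcp.databricks.com": "GCP",
    ".cloud.databricks.com": "AWS",
}


def detect_cloud(url):
    """Return the cloud provider implied by a workspace URL, or None."""
    host = urlparse(url).hostname or ""
    for suffix, cloud in CLOUD_SUFFIXES.items():
        if host.endswith(suffix):
            return cloud
    return None


print(detect_cloud("https://adb-1563232110.azuredatabricks.net"))
```

Matching on the parsed hostname rather than the raw string avoids false positives when a suffix happens to appear in the path or query string.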

## 2. High-Level Architecture: Control Plane vs. Data Plane

This split is the central concept for security and architecture: Databricks separates the services it operates from the compute and storage that run in your cloud account.

| Feature | **Control Plane** (Managed by Databricks) | **Data Plane** (Managed by Customer) |
| :--- | :--- | :--- |
| **Location** | Databricks Cloud Account | **Your** Cloud Account (Azure/AWS/GCP) |
| **Responsibility** | Backend services, Web Application, Orchestration | Data Processing, Storage, Compute execution |
| **Data Storage** | Stores **Metadata**, Notebook code, Job configurations | Stores **Actual Data** (in your Data Lake/Blob Storage) |
| **Compute** | Manages cluster configurations | **Clusters run here.** Compute instances spin up in your VPC/VNet. |
| **Security** | **Your stored data does not reside here.** | **Your stored data stays in your account.** |

### Key Takeaway
*   The **Web App** (UI), **Notebooks**, and **Job Definitions** live in the Control Plane.
*   The **Clusters** (Virtual Machines) and **Data** (Customer Data) live in the Data Plane (Your Account).
*   If a cluster needs data from an external source (e.g., MySQL, another Data Lake), the connection initiates from the **Data Plane** (Your cloud network), not the Control Plane.
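The placement rules from the table and takeaways above can be written down as a simple lookup. This is only a sketch: the asset labels in `ASSET_PLANE` and the helper names `group_by_plane` and `connection_origin` are hypothetical, chosen for this example.

```python
from collections import defaultdict

# Placement of each asset type, taken from the table and takeaways above.
# The dictionary keys are hypothetical labels chosen for this sketch.
ASSET_PLANE = {
    "web_app": "control",
    "notebook_code": "control",
    "job_definition": "control",
    "cluster_configuration": "control",
    "cluster_vm": "data",
    "customer_data": "data",
}


def group_by_plane(asset_plane):
    """Invert the mapping: plane name -> sorted list of assets in it."""
    grouped = defaultdict(list)
    for asset, plane in asset_plane.items():
        grouped[plane].append(asset)
    return {plane: sorted(assets) for plane, assets in grouped.items()}


def connection_origin(client_asset):
    """Outbound connections start in the plane where the client runs."""
    return ASSET_PLANE[client_asset]


planes = group_by_plane(ASSET_PLANE)
print("Control Plane:", planes["control"])
print("Data Plane:", planes["data"])
# A cluster reading from MySQL connects from the plane the cluster runs in.
print("MySQL connection starts in:", connection_origin("cluster_vm"), "plane")
```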

## 3. Roles and Responsibilities

To manage this infrastructure, Databricks defines four specific roles.

### 1. Account Administrator
*   **Scope:** The entire Databricks Account (every Workspace under it).
*   **Tasks:**
    *   Create new Workspaces (e.g., Setup Dev, QA, Prod).
    *   Manage the Unity Catalog Metastore.
    *   Manage Account-level Users and Groups.

### 2. Metastore Administrator
*   **Scope:** Unity Catalog (Data Governance).
*   **Tasks:**
    *   Create **Catalogs**.
    *   Manage data objects (Schemas, Tables).
    *   Delegate privileges/permissions to other users/groups.
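Delegation in Unity Catalog is expressed with SQL `GRANT` statements. The sketch below only builds the statements as strings; the catalog, schema, table, and group names (`demo_catalog`, `sales`, `orders`, `data_analysts`) and the helper `grant_statement` are hypothetical. In a Databricks notebook, each string could then be passed to `spark.sql(...)`.

```python
def grant_statement(privilege, securable_type, securable_name, principal):
    """Build a Unity Catalog GRANT statement as a string."""
    return f"GRANT {privilege} ON {securable_type} {securable_name} TO `{principal}`"


# All object and group names below are hypothetical examples.
statements = [
    "CREATE CATALOG IF NOT EXISTS demo_catalog",
    grant_statement("USE CATALOG", "CATALOG", "demo_catalog", "data_analysts"),
    grant_statement("USE SCHEMA", "SCHEMA", "demo_catalog.sales", "data_analysts"),
    grant_statement("SELECT", "TABLE", "demo_catalog.sales.orders", "data_analysts"),
]

for sql in statements:
    print(sql)
```

Reading a table requires access at every level above it, which is why the sketch grants usage on the catalog and schema before `SELECT` on the table.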

### 3. Workspace Administrator
*   **Scope:** A specific Workspace (e.g., the Admin of just the 'Dev' environment).
*   **Tasks:**
    *   Manage Workspace-level users and groups.
    *   Manage Workspace assets (Cluster policies, Folders).
    *   Grant permissions within that specific workspace.

### 4. Owner
*   **Scope:** Specific Objects.
*   **Definition:** By default, the user who creates an object is its **Owner**.
*   **Example:** If you create a Table or a Schema, you are its Owner. You have full control over it and can grant access to others.
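The ownership rule can be illustrated with a toy model. This is a deliberately simplified sketch, not how Databricks implements permissions: the `SecurableObject` class and its `grant` method are hypothetical, and the model assumes that only the Owner may grant access, ignoring administrators.

```python
class SecurableObject:
    """Toy model of an object (e.g. a Table) with an Owner and grants."""

    def __init__(self, name, creator):
        self.name = name
        self.owner = creator  # the creator becomes the Owner
        self.grants = {}      # principal -> set of privileges

    def grant(self, acting_user, principal, privilege):
        # Simplifying assumption: only the Owner can grant access.
        if acting_user != self.owner:
            raise PermissionError(f"{acting_user} does not own {self.name}")
        self.grants.setdefault(principal, set()).add(privilege)


table = SecurableObject("sales.orders", creator="alice")
table.grant("alice", "bob", "SELECT")  # allowed: alice is the Owner

try:
    table.grant("bob", "carol", "SELECT")  # rejected: bob is not the Owner
except PermissionError as err:
    print("Denied:", err)

print(table.owner, table.grants)
```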

In [None]:
# Concept Check
roles = {
    "Account Admin": "Creates Workspaces & Metastores",
    "Metastore Admin": "Manages Catalogs & Data Permissions",
    "Workspace Admin": "Manages specific Workspace Users & Compute",
    "Owner": "Creator of a specific object (Table/Job)"
}

print("Hierarchy of Roles in Databricks:\n")
for role, desc in roles.items():
    print(f"🔹 {role}: {desc}")

## Next Steps
Now that we understand the architecture and roles, the next session begins the **Hands-on Setup**. We will:
1.  Set up Databricks on Azure.
2.  Create our first Workspace.
3.  Log in as an Administrator.