# Delta Tables: Cloning, CTAS, and Views

**Objective:** Master different ways to copy and reference data in Databricks using Delta Lake features. We will cover:
1.  **CTAS (Create Table As Select):** Creating a new physical table from a query.
2.  **Views:** Creating logical references (Standard vs. Temporary).
3.  **Deep Clone:** Creating a full independent copy of a Delta table (Data + Metadata).
4.  **Shallow Clone:** Creating a copy of metadata only (Data files are referenced, not copied).

---

## 1. Setup Data
We will use the `dev.bronze` schema and the `emp` table we created previously. The view example in Section 3 assumes `emp` has a numeric `salary` column.

In [None]:
-- Ensure we are in the correct context
USE CATALOG dev;
USE SCHEMA bronze;

-- Verify source data
SELECT * FROM emp;
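
Before copying anything, record the source table's physical layout so you can compare it with the copies created later. This is a minimal check, assuming `emp` is a Delta table in `dev.bronze`:

In [None]:
-- Inspect the source table's format, storage location, file count, and size
DESCRIBE DETAIL emp;

The `format` column should read `delta`. Compare `location`, `numFiles`, and `sizeInBytes` against the same columns for the CTAS table and the clones.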

## 2. CTAS (Create Table As Select)
This creates a **new physical table** that holds a copy of the rows the query returns when the statement runs. Later changes to the source are not reflected.

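*   **Pros:** Full independence from source.
*   **Cons:** Duplicates data storage.
*   **Metadata:** Only the schema produced by the query is carried over. The source's table properties, partitioning, constraints, and history are not copied.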

In [None]:
-- Create a new table 'emp_ctas' based on 'emp'
CREATE TABLE IF NOT EXISTS emp_ctas
AS SELECT * FROM emp;

-- Verify data
SELECT * FROM emp_ctas;

-- Verify History (Version 0 will show CREATE TABLE AS SELECT)
DESCRIBE HISTORY emp_ctas;
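
CTAS copies rows only once, so `emp_ctas` does not track later changes to `emp`. Because of `IF NOT EXISTS`, re-running the cell above does nothing instead of refreshing the copy. The sketch below checks the snapshot against its source and then rebuilds it:

In [None]:
-- Compare source and snapshot row counts; they diverge once emp changes
SELECT
  (SELECT COUNT(*) FROM emp)      AS emp_rows,
  (SELECT COUNT(*) FROM emp_ctas) AS emp_ctas_rows;

-- Rebuild the snapshot from the current source (adds a new version to emp_ctas history)
CREATE OR REPLACE TABLE emp_ctas
AS SELECT * FROM emp;

`CREATE OR REPLACE` keeps the table's existing history and appends a new version. Dropping the table and recreating it would discard that history.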

## 3. Views
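A view stores a query, not data. Every read re-runs that query against the current contents of the underlying tables, so no data is duplicated. Standard views are registered in the catalog. Temporary views exist only in the current Spark session.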

### A. Standard View (Permanent)
*   Persisted in Unity Catalog.
*   Available to other users with permission.

In [None]:
CREATE OR REPLACE VIEW v_emp_high_salary AS
SELECT * FROM emp WHERE salary > 50000;

SELECT * FROM v_emp_high_salary;
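
A view stores only its defining query, so inspecting it returns SQL text rather than data files. Here is a quick way to confirm this:

In [None]:
-- Returns the CREATE VIEW statement stored in the catalog (no data files behind it)
SHOW CREATE TABLE v_emp_high_salary;

The query re-runs on every read. Any row later added to `emp` with `salary > 50000` therefore appears in the view without a refresh.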

### B. Temporary View
*   **Session-scoped:** Only exists for the current Spark session/notebook run. It is not registered in Unity Catalog, so other users and notebooks cannot see it.
*   **Dropped automatically** when the cluster restarts or the notebook detaches.

In [None]:
CREATE OR REPLACE TEMPORARY VIEW v_temp_emp AS
SELECT * FROM emp;

-- This works now, but if you detach and reattach, this view will disappear.
SELECT * FROM v_temp_emp;
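
To see the difference in scope, list the views visible in the current schema. Spark's `SHOW VIEWS` also returns session-local temporary views and flags them in its `isTemporary` column. The expected flags below assume both views from this section exist in the current session:

In [None]:
-- v_emp_high_salary should show isTemporary = false (catalog object)
-- v_temp_emp should show isTemporary = true (session only)
SHOW VIEWS;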

## 4. Delta Cloning
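Cloning creates a copy of an existing Delta table. Unlike CTAS, a clone also copies the source table's metadata (schema, partitioning, table properties). The clone gets its own transaction log, so its history starts fresh at version 0.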

### A. Deep Clone
*   **Behavior:** Copies source table data AND metadata to a new location.
*   **Use Case:** Archiving, migrating tables, and testing with full isolation.
*   **Independence:** Completely independent of the source table after creation.

In [None]:
-- Create a Deep Clone
CREATE TABLE IF NOT EXISTS emp_deep_clone
DEEP CLONE emp;

-- Verify History (Operation will show CLONE)
DESCRIBE HISTORY emp_deep_clone;
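
Two follow-ups relate to the archiving and migration use cases above:

*   A clone can pin a specific source version, as long as that version's files are still within the source's retention period.
*   Re-running a deep clone with `CREATE OR REPLACE` refreshes the target. Databricks documents this refresh as incremental, copying only files that changed.

In this sketch, `emp_archive_v0` is a hypothetical table name:

In [None]:
-- Archive the source exactly as it was at version 0 (hypothetical target name)
CREATE TABLE IF NOT EXISTS emp_archive_v0
DEEP CLONE emp VERSION AS OF 0;

-- Re-sync the existing deep clone with the latest version of emp
CREATE OR REPLACE TABLE emp_deep_clone
DEEP CLONE emp;

-- The clone's history now includes a new version for the re-sync
DESCRIBE HISTORY emp_deep_clone;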

### B. Shallow Clone
*   **Behavior:** Copies **ONLY metadata** (transaction log). The data files are **not** copied; they are referenced from the source.
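*   **Speed:** Fast and cheap. Only the transaction log is written, so creation time and storage do not depend on how much data the source holds.
*   **Use Case:** Testing pipelines on production data without duplicating storage costs.
*   **Warning:** The clone depends on the source's data files. If those files are removed, queries on the shallow clone fail. This happens, for example, when a managed source table is dropped or when `VACUUM` on the source deletes files the clone still references.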

In [None]:
-- Create a Shallow Clone
CREATE TABLE IF NOT EXISTS emp_shallow_clone
SHALLOW CLONE emp;

-- Verify History (Operation will show CLONE; the operation parameters identify it as a shallow clone)
DESCRIBE HISTORY emp_shallow_clone;

-- Verify data (It reads from source files)
SELECT * FROM emp_shallow_clone;
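
The dependency runs one way. A shallow clone reads the source's files, but writes to the clone create new files under the clone's own location and leave the source untouched. The sketch below confirms this. The `+ 1000` adjustment and the `salary > 50000` filter are illustrative values, and the update modifies only `emp_shallow_clone`:

In [None]:
-- Modify only the clone (illustrative change; emp is not touched)
UPDATE emp_shallow_clone
SET salary = salary + 1000
WHERE salary > 50000;

-- The source history has no new UPDATE entry...
DESCRIBE HISTORY emp;

-- ...while the clone records the UPDATE as its own new version
DESCRIBE HISTORY emp_shallow_clone;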

## 5. Comparison Summary

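| Feature | Data Copied? | Metadata Copied? | Independent of Source? | Speed |
| :--- | :--- | :--- | :--- | :--- |
| **View** | No (query only) | No (stores the query definition) | No (reads the source on every query) | Instant (no data written) |
| **CTAS** | Yes | Schema only | Yes | Slow (rewrites all rows) |
| **Deep Clone** | Yes | Yes (schema, partitioning, properties; not history) | Yes | Slow (copies all data files) |
| **Shallow Clone** | No (files referenced) | Yes (schema, partitioning, properties; not history) | No (depends on source files) | Fast (writes metadata only) |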
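
To tear down this exercise, drop the views and copies created in this notebook. `emp` can stay, and none of these statements changes it. This is a minimal clean-up sketch:

In [None]:
-- Views hold no data; dropping them removes only their definitions
DROP VIEW IF EXISTS v_temp_emp;
DROP VIEW IF EXISTS v_emp_high_salary;

-- Dropping a shallow clone removes its own log and files, not emp's data files
DROP TABLE IF EXISTS emp_shallow_clone;

-- Deep clone and CTAS tables own their data, so dropping them does not affect emp
DROP TABLE IF EXISTS emp_deep_clone;
DROP TABLE IF EXISTS emp_ctas;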

## Next Steps
In the next video, we will explore **Delta Lake Merge (Upsert)** operations to handle data updates and inserts efficiently.