# Managed vs. External Delta Lake Tables

This notebook demonstrates the difference between **Managed** and **External** Delta Lake tables using **PySpark**.

Delta Lake brings ACID transactions to Apache Spark and big data workloads. The distinction between managed and external tables matters for data management and governance because it determines where the data files live and whether `DROP TABLE` deletes them.


## 🔍 Definitions

### Managed Table
- Spark manages both the **metadata** and the **data**; the data files live under Spark's warehouse directory (`spark.sql.warehouse.dir`).
- Dropping the table deletes both the metadata and the data files.

### External Table
- Spark manages only the **metadata**.
- The data resides at a location you specify, outside Spark's warehouse directory.
- Dropping the table deletes only the metadata, not the data files.
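
The drop semantics above can be modelled without Spark. The following is a toy sketch only: `ToyCatalog` and its methods are hypothetical names invented for illustration and are not part of Spark or Delta Lake. It assumes a local filesystem and represents each table as a directory.

```python
import os
import shutil
import tempfile


class ToyCatalog:
    """Hypothetical stand-in for a catalog; NOT a Spark or Delta Lake API."""

    def __init__(self, warehouse_dir):
        self.warehouse_dir = warehouse_dir
        self.tables = {}  # table name -> (location, is_managed)

    def create_managed(self, name):
        # Managed: the catalog picks the location inside its own warehouse.
        location = os.path.join(self.warehouse_dir, name)
        os.makedirs(location, exist_ok=True)
        self.tables[name] = (location, True)
        return location

    def register_external(self, name, location):
        # External: the catalog only records where the data already lives.
        self.tables[name] = (location, False)

    def drop(self, name):
        # Metadata is always removed; data is removed only for managed tables.
        location, is_managed = self.tables.pop(name)
        if is_managed:
            shutil.rmtree(location, ignore_errors=True)
```

Dropping a table created with `create_managed` removes its directory, while dropping one added with `register_external` leaves the directory in place, which mirrors the behaviour the two examples below demonstrate with real Delta tables.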


In [None]:
from pyspark.sql import SparkSession

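# Assumes the Delta Lake JARs are already on the classpath
# (for example, via the delta-spark package or spark-submit --packages).
# The two configs below enable Delta's SQL extensions and make Delta
# the implementation of the session catalog.
spark = SparkSession.builder \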
    .appName("DeltaTableExample") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .getOrCreate()


## 📘 Example 1: Creating a Managed Delta Table

We create a managed Delta table by writing a DataFrame with `saveAsTable` and no explicit path, so Spark chooses the storage location.


In [None]:
data = [(1, "Alice", 30), (2, "Bob", 25)]
columns = ["id", "name", "age"]

df = spark.createDataFrame(data, columns)

# Save as a managed table; mode("overwrite") keeps the cell re-runnable
df.write.format("delta").mode("overwrite").saveAsTable("managed_people")


In [None]:
# Query the managed table
spark.sql("SELECT * FROM managed_people").show()
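
Before dropping a table, it can help to confirm how the catalog classifies it. Below is a minimal sketch that uses `spark.catalog.listTables()`, whose entries expose `name` and `tableType`. The helper name `table_type` is an assumption made for illustration, and the exact strings reported (typically `MANAGED` or `EXTERNAL`) may depend on the Spark and Delta versions in use.

```python
def table_type(spark, table_name, database=None):
    """Return the catalog's tableType for `table_name`, or None if not found.

    `spark` is an active SparkSession; `database=None` means the current database.
    """
    for table in spark.catalog.listTables(database):
        # Catalog names are usually lower-cased, so compare case-insensitively.
        if table.name.lower() == table_name.lower():
            return table.tableType
    return None
```

In this notebook, `table_type(spark, "managed_people")` would be expected to report a managed table at this point, assuming the default session catalog.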


In [None]:
# Drop the managed table (this deletes both metadata and data)
spark.sql("DROP TABLE managed_people")
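
To check that the data files are really gone, one option is to look at the directory where the managed table used to live. The sketch below assumes a local filesystem and the usual Hive-style layout (`<warehouse>/<table>` for the default database, `<warehouse>/<db>.db/<table>` otherwise). The helper name `local_table_dir` is hypothetical, and the layout may differ on other catalogs or platforms.

```python
import os
from urllib.parse import unquote, urlparse


def local_table_dir(warehouse_dir, table_name, database="default"):
    """Best-guess local directory of a managed table (Hive-style layout assumed)."""
    parsed = urlparse(warehouse_dir)
    # spark.sql.warehouse.dir is often reported as a file: URI; keep only the path.
    root = unquote(parsed.path) if parsed.scheme == "file" else warehouse_dir
    parts = [root]
    if database.lower() != "default":
        parts.append(f"{database.lower()}.db")
    parts.append(table_name.lower())
    return os.path.join(*parts)
```

With the warehouse location taken from `spark.conf.get("spark.sql.warehouse.dir")`, passing the result of `local_table_dir(..., "managed_people")` to `os.path.exists` should return `False` after the drop, under the assumptions above.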


## 📘 Example 2: Creating an External Delta Table

We create an external Delta table by writing data to a specific path and registering it as a table.


In [None]:
external_path = "/tmp/external_people"

df.write.format("delta").mode("overwrite").save(external_path)

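# Register the existing Delta files as an external table.
# IF NOT EXISTS keeps the cell re-runnable; the schema comes from the Delta log.
spark.sql(f"CREATE TABLE IF NOT EXISTS external_people USING DELTA LOCATION '{external_path}'")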


In [None]:
# Query the external table
spark.sql("SELECT * FROM external_people").show()


In [None]:
# Drop the external table (this deletes only the metadata)
spark.sql("DROP TABLE external_people")


In [None]:
# Load the data directly from the path to verify it still exists
spark.read.format("delta").load(external_path).show()
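
The same check can be made on the filesystem without Spark, because a Delta table keeps its transaction log in a `_delta_log` subdirectory. The sketch below assumes a local path; `delta_commit_files` is a hypothetical helper name, and it only lists JSON commit files rather than parsing them.

```python
import os


def delta_commit_files(table_path):
    """Return the sorted JSON commit file names under <table_path>/_delta_log.

    An empty list means no Delta transaction log was found at that path.
    """
    log_dir = os.path.join(table_path, "_delta_log")
    if not os.path.isdir(log_dir):
        return []
    return sorted(
        name
        for name in os.listdir(log_dir)
        if name.endswith(".json") and os.path.isfile(os.path.join(log_dir, name))
    )
```

A non-empty result for `delta_commit_files(external_path)` after the drop indicates that the table's log, and therefore its data directory, survived.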


## ✅ Summary

| Feature | Managed Table | External Table |
|--------|----------------|----------------|
| Metadata Managed By | Spark | Spark |
| Data Managed By | Spark | User |
| Drop Table Deletes Data | ✅ Yes | ❌ No |
| Use Case | Datasets whose lifecycle Spark should own | Data shared with other tools or that must outlive the table |

Choose a managed table when Spark should own the data's lifecycle, and an external table when the files must survive a `DROP TABLE` or be shared with other systems.
