# Legacy Data Management: Hive Metastore

**Objective:** Before we enable Unity Catalog, it is important to understand how data was managed in the "Legacy" world using the **Hive Metastore**. We will explore the default behavior of Managed vs. External tables and how file storage works in DBFS (the Databricks File System).

---

## 1. The Default Hive Metastore

Every Databricks workspace comes with a built-in Hive Metastore.
*   **Catalog Name:** `hive_metastore` (the only catalog available before Unity Catalog is enabled).
*   **Default Schema:** `default`.
*   **Default Storage Location (managed tables):** `dbfs:/user/hive/warehouse/`.
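Conceptually, an unqualified table name is expanded using the current catalog (`hive_metastore`) and the current schema (`default` until a `USE` statement changes it). The sketch below is a simplified model of that expansion, not Databricks' actual resolver; it assumes names contain no backtick-quoted dots.

```python
# Simplified model of table-name resolution in the legacy workspace.
# Assumption: the current catalog is always "hive_metastore" and the current
# schema is whatever the last USE statement selected ("default" at start).

def resolve_table_name(name: str, current_schema: str = "default",
                       current_catalog: str = "hive_metastore") -> str:
    """Expand a 1-, 2-, or 3-part table name to catalog.schema.table."""
    parts = name.split(".")
    if len(parts) == 1:
        catalog, schema, table = current_catalog, current_schema, parts[0]
    elif len(parts) == 2:
        schema, table = parts
        catalog = current_catalog
    elif len(parts) == 3:
        catalog, schema, table = parts
    else:
        raise ValueError(f"Expected 1 to 3 name parts, got {len(parts)}: {name!r}")
    return f"{catalog}.{schema}.{table}"


print(resolve_table_name("managed_emp"))                                # hive_metastore.default.managed_emp
print(resolve_table_name("managed_emp", current_schema="legacy_demo"))  # hive_metastore.legacy_demo.managed_emp
print(resolve_table_name("legacy_demo.external_emp"))                   # hive_metastore.legacy_demo.external_emp
```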

### Managed vs. External Tables
1.  **Managed Table:** Databricks manages both the **Metadata** and the **Data**.
    *   *Drop Table:* Deletes Metadata + Data files.
    *   *Location:* `dbfs:/user/hive/warehouse/<schema>.db/<table>/`
2.  **External Table:** Databricks manages only the **Metadata**. You manage the **Data**.
    *   *Drop Table:* Deletes Metadata only. Data files remain intact.
    *   *Location:* A path you supply with `LOCATION` (e.g., `dbfs:/mnt/data/` or a cloud storage path).
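The difference between the two table types comes down to what `DROP TABLE` removes. The toy model below is only an illustration: plain dicts stand in for the metastore and for DBFS or cloud storage, and the class and method names are hypothetical.

```python
# Toy model of DROP TABLE semantics in the Hive Metastore.
# Assumption: dicts stand in for the real metastore and for DBFS / object storage.
from dataclasses import dataclass


@dataclass
class TableEntry:
    table_type: str  # "MANAGED" or "EXTERNAL"
    location: str


class ToyMetastore:
    def __init__(self) -> None:
        self.tables: dict[str, TableEntry] = {}   # metadata
        self.storage: dict[str, list] = {}        # location -> rows (the "data files")

    def create_table(self, name: str, location: str, external: bool) -> None:
        self.tables[name] = TableEntry("EXTERNAL" if external else "MANAGED", location)
        self.storage.setdefault(location, [])

    def insert(self, name: str, *rows: tuple) -> None:
        self.storage[self.tables[name].location].extend(rows)

    def drop_table(self, name: str) -> None:
        entry = self.tables.pop(name)               # metadata is always removed
        if entry.table_type == "MANAGED":
            self.storage.pop(entry.location, None)  # data is removed only for managed tables


ms = ToyMetastore()
ms.create_table("managed_emp", "dbfs:/user/hive/warehouse/legacy_demo.db/managed_emp", external=False)
ms.create_table("external_emp", "dbfs:/FileStore/tables/legacy_demo/external_emp", external=True)
ms.insert("managed_emp", (1, "John", "IT"))
ms.insert("external_emp", (10, "Mike", "Finance"))
ms.drop_table("managed_emp")
ms.drop_table("external_emp")
print(sorted(ms.storage))  # ['dbfs:/FileStore/tables/legacy_demo/external_emp']
```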

In [None]:
%sql
-- Step 1: Setup - Create a schema (database)
-- The %sql magic must be the first line of the cell; this schema is created in the legacy Hive Metastore
CREATE SCHEMA IF NOT EXISTS legacy_demo;
USE legacy_demo;

## 2. Working with Managed Tables
Let's create a **Managed Table**. We will not specify a `LOCATION`, so it will use the default DBFS path.
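The default path for a managed table can be derived from the schema and table names. This is a sketch under stated assumptions: the warehouse root is the default `dbfs:/user/hive/warehouse`, the schema was created without its own `LOCATION`, and names are stored in lowercase. The special case for `default` (tables sit directly under the warehouse root, with no `default.db` folder) matches Hive's convention as we understand it; confirm against the `DESCRIBE EXTENDED` output.

```python
# Sketch: derive the default location of a managed table in the Hive Metastore.
# Assumptions: default warehouse root, schema created without LOCATION, lowercase names.

WAREHOUSE_ROOT = "dbfs:/user/hive/warehouse"


def default_managed_location(schema: str, table: str,
                             warehouse_root: str = WAREHOUSE_ROOT) -> str:
    schema, table = schema.lower(), table.lower()
    if schema == "default":
        # The `default` schema lives directly at the warehouse root.
        return f"{warehouse_root}/{table}"
    return f"{warehouse_root}/{schema}.db/{table}"


print(default_managed_location("legacy_demo", "managed_emp"))
# dbfs:/user/hive/warehouse/legacy_demo.db/managed_emp
```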

In [None]:
%sql
-- 2.1 Create a managed table (no LOCATION clause)
CREATE TABLE IF NOT EXISTS managed_emp (
    id INT,
    name STRING,
    dept STRING
);

-- Insert Data
INSERT INTO managed_emp VALUES 
(1, 'John', 'IT'),
(2, 'Jane', 'HR');

-- Verify Data
SELECT * FROM managed_emp;

In [None]:
%sql
-- 2.2 Check table details (metadata)
-- Look for "Type: MANAGED" and "Location: dbfs:/user/hive/warehouse/legacy_demo.db/managed_emp"
DESCRIBE EXTENDED managed_emp;
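To check the table type from Python instead of reading the output by eye, you could scan the rows for the `Type` and `Location` entries in the `# Detailed Table Information` section. A minimal sketch, assuming each row is a `(col_name, data_type)` pair; in a notebook such a list could be built with `[(r.col_name, r.data_type) for r in spark.sql("DESCRIBE EXTENDED managed_emp").collect()]`. The `sample_rows` below are an abridged, illustrative example, not captured output.

```python
# Sketch: extract Type and Location from DESCRIBE EXTENDED rows.
# Only rows after the "# Detailed Table Information" marker are considered,
# so a data column that happens to be named "Type" is not mistaken for metadata.

def table_type_and_location(rows):
    info = {}
    in_detail = False
    for col_name, data_type in rows:
        if col_name.strip() == "# Detailed Table Information":
            in_detail = True
            continue
        if in_detail and col_name in ("Type", "Location"):
            info[col_name] = data_type
    return info.get("Type"), info.get("Location")


# Illustrative, abridged rows (hypothetical, not real output).
sample_rows = [
    ("id", "int"),
    ("name", "string"),
    ("dept", "string"),
    ("", ""),
    ("# Detailed Table Information", ""),
    ("Table", "managed_emp"),
    ("Type", "MANAGED"),
    ("Location", "dbfs:/user/hive/warehouse/legacy_demo.db/managed_emp"),
]
print(table_type_and_location(sample_rows))
# ('MANAGED', 'dbfs:/user/hive/warehouse/legacy_demo.db/managed_emp')
```

The same helper applies unchanged to `external_emp`, where `Type` should read `EXTERNAL`.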

### Verify Managed Table Behavior (DROP)
If we drop a managed table, the underlying files should be deleted.
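A small helper makes the "files are gone" check explicit. This is a sketch: `path_exists` is a hypothetical name, and it assumes the listing function raises when the path is missing, which is how `dbutils.fs.ls` behaves in a notebook (call it as `path_exists(dbutils.fs.ls, path)`). Matching on the message text `FileNotFoundException` is an assumption about how the Java error surfaces in Python.

```python
# Sketch: report whether a storage path still exists.
# `ls` is any listing function that raises for missing paths (e.g. dbutils.fs.ls).

def path_exists(ls, path: str) -> bool:
    try:
        ls(path)
        return True
    except Exception as exc:
        # Assumption: a missing path shows up as FileNotFoundError locally, or as an
        # error whose message mentions java.io.FileNotFoundException on Databricks.
        if isinstance(exc, FileNotFoundError) or "FileNotFoundException" in str(exc):
            return False
        raise  # other failures (e.g. permissions) should not be reported as "missing"


# Local stand-in for dbutils.fs.ls so the helper can be tried outside Databricks.
existing_paths = {"dbfs:/FileStore/tables/legacy_demo/external_emp"}

def fake_ls(path):
    if path not in existing_paths:
        raise FileNotFoundError(path)
    return []

print(path_exists(fake_ls, "dbfs:/FileStore/tables/legacy_demo/external_emp"))       # True
print(path_exists(fake_ls, "dbfs:/user/hive/warehouse/legacy_demo.db/managed_emp"))  # False
```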

In [None]:
%sql
-- 2.3 Drop the managed table
DROP TABLE managed_emp;

-- The table's former location (from the DESCRIBE output above) should no longer exist.
-- Check from a Python cell: dbutils.fs.ls("dbfs:/user/hive/warehouse/legacy_demo.db/managed_emp")
-- should fail because the directory has been deleted.

## 3. Working with External Tables
Now, let's create an **External Table**. We must explicitly specify a `LOCATION`.

In [None]:
# 3.1 Define a custom location for the external table
external_path = "dbfs:/FileStore/tables/legacy_demo/external_emp"

# Remove leftovers from a previous run (demo only); True = delete recursively
dbutils.fs.rm(external_path, True)

In [None]:
%sql
-- 3.2 Create the external table by giving an explicit LOCATION.
-- SQL cells cannot see the Python variable, so the path is repeated literally.
CREATE TABLE IF NOT EXISTS external_emp (
    id INT,
    name STRING,
    dept STRING
)
LOCATION 'dbfs:/FileStore/tables/legacy_demo/external_emp';

-- Insert Data
INSERT INTO external_emp VALUES 
(10, 'Mike', 'Finance'),
(20, 'Sarah', 'Ops');

In [None]:
%sql
-- 3.3 Check table details
-- Look for "Type: EXTERNAL" and the custom Location we provided.
DESCRIBE EXTENDED external_emp;

### Verify External Table Behavior (DROP)
If we drop an external table, the data files should **persist**.

In [None]:
%sql
-- 3.4 Drop the external table
DROP TABLE external_emp;

In [None]:
# 3.5 Verify data persistence
# The table is gone from the metastore, but its files should still exist at the path.
files = dbutils.fs.ls("dbfs:/FileStore/tables/legacy_demo/external_emp")
display(files)

# Expect Delta table files: a `_delta_log/` directory plus Parquet data files.

## 4. Recreating Table from Existing Data
Since the data exists for the external table, we can recreate the table metadata and instantly access the data again.
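Recreating the table works without a column list because, on Databricks Runtime 8.0 and above, `CREATE TABLE` defaults to Delta, and a Delta table keeps its schema in the `_delta_log/` directory next to the Parquet data files. Before re-registering a path, you might sanity-check its listing. A minimal sketch: `summarize_listing` is a hypothetical helper that takes entry names such as `[f.name for f in dbutils.fs.ls(path)]`, and the file names in the example are made up.

```python
# Sketch: summarize a directory listing to see whether it looks like a Delta table.
# Assumption: directory names may carry a trailing "/", as in dbutils.fs.ls output.

def summarize_listing(names):
    cleaned = [n.rstrip("/") for n in names]
    return {
        "is_delta": "_delta_log" in cleaned,  # the transaction log holds the schema
        "data_files": sum(1 for n in cleaned if n.endswith(".parquet")),
    }


# Illustrative names only (hypothetical files).
print(summarize_listing(["_delta_log/", "part-00000-a.snappy.parquet", "part-00001-b.snappy.parquet"]))
# {'is_delta': True, 'data_files': 2}
```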

In [None]:
%sql
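-- Recreate the table over the existing data; no column list is needed because
-- the schema is read from the Delta files already at this location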
CREATE TABLE IF NOT EXISTS recovered_emp
LOCATION 'dbfs:/FileStore/tables/legacy_demo/external_emp';

-- Query Data immediately
SELECT * FROM recovered_emp;

## 5. Views in Hive Metastore
Views are named queries saved in the metastore. They hold no data themselves; every query against a view reads the current contents of its underlying tables.
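The toy model below illustrates that behavior in plain Python: the "metastore" stores only the view's definition (a source table plus a filter), and rows are computed when the view is read. It is not how Spark executes views; the data structures, the view name, and the extra row are hypothetical.

```python
# Toy illustration of a view as a stored query, not stored rows.
# Assumption: tables are lists of dicts; the view is (source table, predicate).

base_tables = {
    "recovered_emp": [
        {"id": 10, "name": "Mike", "dept": "Finance"},
        {"id": 20, "name": "Sarah", "dept": "Ops"},
    ]
}

views = {"v_finance_employees": ("recovered_emp", lambda row: row["dept"] == "Finance")}


def query_view(name):
    source, predicate = views[name]
    # Evaluated at read time against the base table's current contents.
    return [row for row in base_tables[source] if predicate(row)]


print([r["name"] for r in query_view("v_finance_employees")])  # ['Mike']

# A new base-table row appears in the view without recreating it.
base_tables["recovered_emp"].append({"id": 30, "name": "Ana", "dept": "Finance"})
print([r["name"] for r in query_view("v_finance_employees")])  # ['Mike', 'Ana']
```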

In [None]:
%sql
-- Create a view over the recovered table, filtered to the Finance department
CREATE VIEW IF NOT EXISTS v_finance_employees AS
SELECT * FROM recovered_emp WHERE dept = 'Finance';

SELECT * FROM v_finance_employees;

## Next Steps
We have seen how managed tables, external tables, and views behave in the Legacy Hive Metastore. In the next session, we will officially **Enable Unity Catalog** to bring modern governance, a three-level namespace (`catalog.schema.table`), and centralized security to our workspace.