### OPTIMIZE COMMAND IN DATABRICKS (DELTA LAKE)

#### PURPOSE:
- The OPTIMIZE command in Databricks compacts small files into larger ones within a Delta table.
- This improves query performance by reducing the number of files that Spark needs to read.

#### WHY NEEDED:
- Delta tables accumulate many small files because each update, merge, or streaming micro-batch writes new data files.
- Every file adds open, metadata, and scheduling overhead at read time, so OPTIMIZE combines the small files into fewer, larger ones to speed up reads.

#### HOW IT WORKS:
- It rewrites the data files within each partition (if the table is partitioned) into larger, optimized files.
- It uses a bin-packing algorithm to combine smaller files into files close to a target size (about 1 GB by default).
- It affects only the physical layout of the data; the table's contents do NOT change.
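
The bin-packing idea above can be illustrated with a small, self-contained Python sketch. This is a simplified greedy approximation written for this note, not Delta Lake's actual implementation; the function name `plan_bins` and the example file sizes are hypothetical.

```python
from typing import List

TARGET_BYTES = 1024 ** 3  # ~1 GB target size, as described above


def plan_bins(file_sizes: List[int], target: int = TARGET_BYTES) -> List[List[int]]:
    """Greedily group file sizes into bins whose totals stay at or below target.

    A file that is larger than the target on its own ends up alone in a bin.
    """
    bins: List[List[int]] = []
    current: List[int] = []
    current_total = 0
    for size in sorted(file_sizes):
        # Close the current bin when adding this file would exceed the target.
        if current and current_total + size > target:
            bins.append(current)
            current, current_total = [], 0
        current.append(size)
        current_total += size
    if current:
        bins.append(current)
    return bins


# Hypothetical file sizes in bytes; each inner list would become one output file.
MB = 1024 ** 2
example_sizes = [300 * MB, 200 * MB, 600 * MB, 50 * MB, 900 * MB]
for group in plan_bins(example_sizes):
    print([s // MB for s in group], "->", sum(group) // MB, "MB")
```

Each group represents the set of input files that would be rewritten into a single output file. Sorting first keeps the sketch deterministic; a real engine may order and split files differently.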


In [0]:
%sql
-- Step 1: Create a Delta table
CREATE OR REPLACE TABLE inceptez_catalog.inputdb.tblsales
(
  sales_id INT,
  product_id INT,
  region STRING,
  sales_amount DOUBLE,
  sales_date DATE
)
USING DELTA;

In [0]:
%sql
-- Step 2: Insert sample data

-- Let’s add multiple small batches to simulate many small files:

INSERT INTO inceptez_catalog.inputdb.tblsales VALUES
  (1, 101, 'North', 1000.50, '2025-10-16'),
  (2, 102, 'South', 500.75, '2025-10-16'),
  (3, 103, 'East', 700.20, '2025-10-16'),
  (4, 104, 'West', 1200.00, '2025-10-16');

INSERT INTO inceptez_catalog.inputdb.tblsales VALUES
  (5, 101, 'North', 800.00, '2025-10-17'),
  (6, 102, 'South', 450.00, '2025-10-17'),
  (7, 103, 'East', 600.00, '2025-10-17'),
  (8, 104, 'West', 1100.00, '2025-10-17');


In [0]:
%sql
SELECT * FROM inceptez_catalog.inputdb.tblsales;
-- Python equivalents (run in a Python cell, not in %sql):
--   spark.sql("SELECT * FROM inceptez_catalog.inputdb.tblsales")
--   spark.table("inceptez_catalog.inputdb.tblsales")

In [0]:
%sql
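-- Step 3: Check fragmentation
-- The numFiles and sizeInBytes columns show how many data files back the table and their total size.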

DESCRIBE DETAIL inceptez_catalog.inputdb.tblsales;
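
One way to read the `DESCRIBE DETAIL` output is to turn `numFiles` and `sizeInBytes` into an average file size. The helper below is a minimal sketch; `average_file_size_mb` is a hypothetical name, and the sample values are made up for illustration rather than taken from this table.

```python
from typing import Mapping


def average_file_size_mb(detail: Mapping[str, int]) -> float:
    """Return the mean data-file size in MB from a DESCRIBE DETAIL row."""
    num_files = detail["numFiles"]
    if num_files == 0:
        return 0.0  # an empty table has no files to average
    return detail["sizeInBytes"] / num_files / (1024 ** 2)


# In a notebook the row could be fetched like this (needs a Spark session):
#   detail = spark.sql(
#       "DESCRIBE DETAIL inceptez_catalog.inputdb.tblsales"
#   ).first().asDict()
# The numbers below are illustrative only, not real output.
sample_detail = {"numFiles": 2, "sizeInBytes": 4096}
print(f"{average_file_size_mb(sample_detail):.4f} MB per file")
```

An average far below the target file size, combined with a high `numFiles`, suggests the table is a candidate for compaction.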

In [0]:
%sql
-- Step 4: Optimize the table
-- OPTIMIZE compacts the table's small Parquet files into fewer, larger files
-- (about 1 GB each by default).
-- Fewer files mean less metadata to track and faster reads.

OPTIMIZE inceptez_catalog.inputdb.tblsales;

In [0]:
%sql
-- Step 5: Verify compaction
-- Re-run DESCRIBE DETAIL; numFiles should now be lower than it was in Step 3.

DESCRIBE DETAIL inceptez_catalog.inputdb.tblsales;
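
The before/after comparison can also be scripted. The sketch below assumes the two `DESCRIBE DETAIL` rows have already been captured as dictionaries (for example via `.first().asDict()` on the result); `compaction_summary` is a hypothetical helper, and the file counts are placeholders.

```python
from typing import Mapping


def compaction_summary(before: Mapping[str, int], after: Mapping[str, int]) -> str:
    """Describe how numFiles changed between two DESCRIBE DETAIL snapshots."""
    removed = before["numFiles"] - after["numFiles"]
    if removed > 0:
        return f"compacted: {before['numFiles']} -> {after['numFiles']} files"
    return f"no compaction: still {after['numFiles']} files"


# Placeholder snapshots; real values come from DESCRIBE DETAIL before and after OPTIMIZE.
print(compaction_summary({"numFiles": 2}, {"numFiles": 1}))
```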