diff --git a/Notebooks/liquid_clustering/education_delta_liquid_clustering_demo.ipynb b/Notebooks/liquid_clustering/education_delta_liquid_clustering_demo.ipynb new file mode 100644 index 0000000..662577b --- /dev/null +++ b/Notebooks/liquid_clustering/education_delta_liquid_clustering_demo.ipynb @@ -0,0 +1,1061 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Education: Delta Liquid Clustering Demo\n", + "\n", + "\n", + "## Overview\n", + "\n", + "\n", + "This notebook demonstrates the power of **Delta Liquid Clustering** in Oracle AI Data Platform (AIDP) Workbench using an education analytics use case. Liquid clustering automatically optimizes data layout for query performance without requiring manual partitioning or Z-Ordering.\n", + "\n", + "### What is Liquid Clustering?\n", + "\n", + "Liquid clustering automatically identifies and groups similar data together based on clustering columns you define. This optimization happens automatically during data ingestion and maintenance operations, providing:\n", + "\n", + "- **Automatic optimization**: No manual tuning required\n", + "- **Improved query performance**: Faster queries on clustered columns\n", + "- **Reduced maintenance**: No need for manual repartitioning\n", + "- **Adaptive clustering**: Adjusts as data patterns change\n", + "\n", + "### Use Case: Student Performance Analytics and Learning Management\n", + "\n", + "We'll analyze student learning data and academic performance metrics. Our clustering strategy will optimize for:\n", + "\n", + "- **Student-specific queries**: Fast lookups by student ID\n", + "- **Time-based analysis**: Efficient filtering by academic period and assessment dates\n", + "- **Performance patterns**: Quick aggregation by subject and learning outcomes\n", + "\n", + "### AIDP Environment Setup\n", + "\n", + "This notebook leverages the existing Spark session in your AIDP environment." 
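+ , "\n",
+ "\n",
+ "As a quick sanity check, you can confirm that the managed `spark` session is available before running the steps below. This is a minimal sketch; `SHOW CATALOGS` assumes the same multi-catalog SQL support that the `CREATE CATALOG` statement in Step 1 already relies on.\n",
+ "\n",
+ "```python\n",
+ "# Confirm the AIDP-provided Spark session is available\n",
+ "print(\"Spark version:\", spark.version)\n",
+ "\n",
+ "# List the catalogs visible to this session (assumes multi-catalog SQL support)\n",
+ "spark.sql(\"SHOW CATALOGS\").show()\n",
+ "```\n",
+ "\n",
+ "## Step 1: Create Education Catalog and Analytics Schema"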
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Education catalog and analytics schema created successfully!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create education catalog and analytics schema\n", + "\n", + "# In AIDP, catalogs provide data isolation and governance\n", + "\n", + "spark.sql(\"CREATE CATALOG IF NOT EXISTS education\")\n", + "\n", + "spark.sql(\"CREATE SCHEMA IF NOT EXISTS education.analytics\")\n", + "\n", + "print(\"Education catalog and analytics schema created successfully!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Create Delta Table with Liquid Clustering\n", + "\n", + "### Table Design\n", + "\n", + "Our `student_assessments` table will store:\n", + "\n", + "- **student_id**: Unique student identifier\n", + "- **assessment_date**: Date of assessment or assignment\n", + "- **subject**: Academic subject area\n", + "- **score**: Assessment score (0-100)\n", + "- **grade_level**: Student grade level\n", + "- **completion_time**: Time spent on assessment (minutes)\n", + "- **engagement_score**: Student engagement metric (0-100)\n", + "\n", + "### Clustering Strategy\n", + "\n", + "We'll cluster by `student_id` and `assessment_date` because:\n", + "\n", + "- **student_id**: Students generate multiple assessments, grouping learning progress together\n", + "- **assessment_date**: Time-based queries are critical for academic tracking, semester analysis, and intervention planning\n", + "- This combination optimizes for both individual student monitoring and temporal academic performance analysis" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Delta table with liquid clustering created successfully!\n", + "Clustering will automatically optimize data layout for queries on student_id and assessment_date.\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create Delta table with liquid clustering\n", + "\n", + "# CLUSTER BY defines the columns for automatic optimization\n", + "\n", + "spark.sql(\"\"\"\n", + "\n", + "CREATE TABLE IF NOT EXISTS education.analytics.student_assessments (\n", + "\n", + " student_id STRING,\n", + "\n", + " assessment_date DATE,\n", + "\n", + " subject STRING,\n", + "\n", + " score DECIMAL(5,2),\n", + "\n", + " grade_level STRING,\n", + "\n", + " completion_time DECIMAL(6,2),\n", + "\n", + " engagement_score INT\n", + "\n", + ")\n", + "\n", + "USING DELTA\n", + "\n", + "CLUSTER BY (student_id, assessment_date)\n", + "\n", + "\"\"\")\n", + "\n", + "print(\"Delta table with liquid clustering created successfully!\")\n", + "\n", + "print(\"Clustering will automatically optimize data layout for queries on student_id and assessment_date.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Generate Education Sample Data\n", + "\n", + "### Data Generation Strategy\n", + "\n", + "We'll create realistic student assessment data including:\n", + "\n", + "- **3,000 students** with multiple assessments over time\n", + "- **Subjects**: Math, English, Science, History, Art, Physical Education\n", + "- **Realistic performance patterns**: Learning curves, subject difficulty variations, engagement factors\n", + "- **Grade levels**: K-12 with appropriate academic progression\n", + "\n", + "### Why This Data Pattern?\n", + "\n", + 
"This data simulates real education scenarios where:\n", + "\n", + "- Student performance varies by subject and time\n", + "- Learning progress needs longitudinal tracking\n", + "- Intervention strategies require early identification\n", + "- Curriculum effectiveness drives teaching improvements\n", + "- Standardized testing and reporting require temporal analysis" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Generated 67514 student assessment records\n", + "Sample record: {'student_id': 'STU000001', 'assessment_date': datetime.date(2024, 10, 21), 'subject': 'Science', 'score': 44.62, 'grade_level': '12th Grade', 'completion_time': 79.61, 'engagement_score': 50}\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Generate sample student assessment data\n", + "\n", + "# Using fully qualified imports to avoid conflicts\n", + "\n", + "import random\n", + "\n", + "from datetime import datetime, timedelta\n", + "\n", + "\n", + "# Define education data constants\n", + "\n", + "SUBJECTS = ['Math', 'English', 'Science', 'History', 'Art', 'Physical Education']\n", + "\n", + "GRADE_LEVELS = ['Kindergarten', '1st Grade', '2nd Grade', '3rd Grade', '4th Grade', '5th Grade', \n", + " '6th Grade', '7th Grade', '8th Grade', '9th Grade', '10th Grade', '11th Grade', '12th Grade']\n", + "\n", + "# Base performance parameters by subject and grade level\n", + "\n", + "PERFORMANCE_PARAMS = {\n", + "\n", + " 'Math': {'base_score': 75, 'difficulty': 1.2, 'time_factor': 1.5},\n", + "\n", + " 'English': {'base_score': 78, 'difficulty': 1.0, 'time_factor': 1.2},\n", + "\n", + " 'Science': {'base_score': 72, 'difficulty': 1.3, 'time_factor': 1.4},\n", + "\n", + " 'History': {'base_score': 70, 'difficulty': 1.1, 'time_factor': 1.1},\n", + "\n", + " 'Art': {'base_score': 82, 'difficulty': 0.8, 'time_factor': 0.9},\n", + "\n", + " 'Physical Education': {'base_score': 85, 'difficulty': 0.7, 'time_factor': 0.8}\n", + "\n", + "}\n", + "\n", + "# Grade level adjustments\n", + "\n", + "GRADE_ADJUSTMENTS = {\n", + "\n", + " 'Kindergarten': 0.7, '1st Grade': 0.75, '2nd Grade': 0.8, '3rd Grade': 0.82,\n", + "\n", + " '4th Grade': 0.85, '5th Grade': 0.87, '6th Grade': 0.8, '7th Grade': 0.78,\n", + "\n", + " '8th Grade': 0.76, '9th Grade': 0.74, '10th Grade': 0.72, '11th Grade': 0.7, '12th Grade': 0.68\n", + "\n", + "}\n", + "\n", + "\n", + "# Generate student assessment records\n", + "\n", + "assessment_data = []\n", + "\n", + "base_date = datetime(2024, 1, 1)\n", + "\n", + "\n", + "# Create 3,000 students with 15-30 assessments each\n", + "\n", + "for student_num in range(1, 3001):\n", + "\n", + " student_id = f\"STU{student_num:06d}\"\n", + " \n", + " # Assign grade level\n", + "\n", + " grade_level = random.choice(GRADE_LEVELS)\n", + "\n", + " grade_factor = GRADE_ADJUSTMENTS[grade_level]\n", + " \n", + " # Each student gets 15-30 assessments over 12 months\n", + "\n", + " num_assessments = random.randint(15, 30)\n", + " \n", + " for i in range(num_assessments):\n", + "\n", + " # Spread assessments over 12 months\n", + "\n", + " days_offset = random.randint(0, 365)\n", + "\n", + " assessment_date = base_date + timedelta(days=days_offset)\n", + " \n", + " # Select subject\n", + "\n", + " subject = random.choice(SUBJECTS)\n", + "\n", + " params = PERFORMANCE_PARAMS[subject]\n", + " \n", + " # Calculate score with variations\n", + "\n", + " score_variation = random.uniform(0.7, 1.3)\n", + "\n", + " 
base_score = params['base_score'] * grade_factor / params['difficulty']\n", + "\n", + " score = round(min(100, max(0, base_score * score_variation)), 2)\n", + " \n", + " # Calculate completion time\n", + "\n", + " time_variation = random.uniform(0.8, 1.5)\n", + "\n", + " base_time = 45 * params['time_factor'] # 45 minutes base time\n", + "\n", + " completion_time = round(base_time * time_variation, 2)\n", + " \n", + " # Engagement score (affects performance)\n", + "\n", + " engagement_score = random.randint(40, 100)\n", + "\n", + " # Slightly adjust score based on engagement\n", + "\n", + " engagement_factor = engagement_score / 100.0\n", + "\n", + " score = round(min(100, score * (0.8 + 0.4 * engagement_factor)), 2)\n", + " \n", + " assessment_data.append({\n", + "\n", + " \"student_id\": student_id,\n", + "\n", + " \"assessment_date\": assessment_date.date(),\n", + "\n", + " \"subject\": subject,\n", + "\n", + " \"score\": float(score),\n", + "\n", + " \"grade_level\": grade_level,\n", + "\n", + " \"completion_time\": float(completion_time),\n", + "\n", + " \"engagement_score\": int(engagement_score)\n", + "\n", + " })\n", + "\n", + "\n", + "\n", + "print(f\"Generated {len(assessment_data)} student assessment records\")\n", + "\n", + "print(\"Sample record:\", assessment_data[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Insert Data Using PySpark\n", + "\n", + "### Data Insertion Strategy\n", + "\n", + "We'll use PySpark to:\n", + "\n", + "1. **Create DataFrame** from our generated data\n", + "2. **Insert into Delta table** with liquid clustering\n", + "3. **Verify the insertion** with a sample query\n", + "\n", + "### Why PySpark for Insertion?\n", + "\n", + "- **Distributed processing**: Handles large datasets efficiently\n", + "- **Type safety**: Ensures data integrity\n", + "- **Optimization**: Leverages Spark's query optimization\n", + "- **Liquid clustering**: Automatically applies clustering during insertion" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "DataFrame Schema:\n", + "root\n", + " |-- assessment_date: date (nullable = true)\n", + " |-- completion_time: double (nullable = true)\n", + " |-- engagement_score: long (nullable = true)\n", + " |-- grade_level: string (nullable = true)\n", + " |-- score: double (nullable = true)\n", + " |-- student_id: string (nullable = true)\n", + " |-- subject: string (nullable = true)\n", + "\n", + "\n", + "Sample Data:\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+---------------+---------------+----------------+-----------+-----+----------+------------------+\n", + "|assessment_date|completion_time|engagement_score|grade_level|score|student_id| subject|\n", + "+---------------+---------------+----------------+-----------+-----+----------+------------------+\n", + "| 2024-10-21| 79.61| 50| 12th Grade|44.62| STU000001| Science|\n", + "| 2024-03-06| 52.9| 85| 12th Grade|88.76| STU000001|Physical Education|\n", + "| 2024-09-24| 34.43| 52| 12th Grade|60.94| STU000001| Art|\n", + "| 2024-09-12| 83.62| 58| 12th Grade|48.87| STU000001| Science|\n", + "| 2024-12-01| 47.97| 58| 12th Grade|68.91| STU000001| English|\n", + "+---------------+---------------+----------------+-----------+-----+----------+------------------+\n", + "only showing top 5 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + 
"\n", + "Successfully inserted 67514 records into education.analytics.student_assessments\n", + "Liquid clustering automatically optimized the data layout during insertion!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Insert data using PySpark DataFrame operations\n", + "\n", + "# Using fully qualified function references to avoid conflicts\n", + "\n", + "\n", + "# Create DataFrame from generated data\n", + "\n", + "df_assessments = spark.createDataFrame(assessment_data)\n", + "\n", + "\n", + "# Display schema and sample data\n", + "\n", + "print(\"DataFrame Schema:\")\n", + "\n", + "df_assessments.printSchema()\n", + "\n", + "\n", + "\n", + "print(\"\\nSample Data:\")\n", + "\n", + "df_assessments.show(5)\n", + "\n", + "\n", + "# Insert data into Delta table with liquid clustering\n", + "\n", + "# The CLUSTER BY (student_id, assessment_date) will automatically optimize the data layout\n", + "\n", + "df_assessments.write.mode(\"overwrite\").saveAsTable(\"education.analytics.student_assessments\")\n", + "\n", + "\n", + "print(f\"\\nSuccessfully inserted {df_assessments.count()} records into education.analytics.student_assessments\")\n", + "\n", + "print(\"Liquid clustering automatically optimized the data layout during insertion!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Demonstrate Liquid Clustering Benefits\n", + "\n", + "### Query Performance Analysis\n", + "\n", + "Now let's see how liquid clustering improves query performance. We'll run queries that benefit from our clustering strategy:\n", + "\n", + "1. **Student assessment history** (clustered by student_id)\n", + "2. **Time-based academic analysis** (clustered by assessment_date)\n", + "3. **Combined student + time queries** (optimal for our clustering)\n", + "\n", + "### Expected Performance Benefits\n", + "\n", + "With liquid clustering, these queries should be significantly faster because:\n", + "\n", + "- **Data locality**: Related records are physically grouped together\n", + "- **Reduced I/O**: Less data needs to be read from disk\n", + "- **Automatic optimization**: No manual tuning required" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Query 1: Student Assessment History ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------+---------------+------------------+-----+----------------+\n", + "|student_id|assessment_date| subject|score|engagement_score|\n", + "+----------+---------------+------------------+-----+----------------+\n", + "| STU000001| 2024-12-01| English|68.91| 58|\n", + "| STU000001| 2024-10-21| Science|44.62| 50|\n", + "| STU000001| 2024-10-10| English|52.53| 86|\n", + "| STU000001| 2024-10-08|Physical Education|77.74| 59|\n", + "| STU000001| 2024-09-25| Science|32.35| 40|\n", + "| STU000001| 2024-09-24| Art|60.94| 52|\n", + "| STU000001| 2024-09-12| Science|48.87| 58|\n", + "| STU000001| 2024-09-05| English|68.71| 98|\n", + "| STU000001| 2024-08-30| Math|33.82| 64|\n", + "| STU000001| 2024-08-10| Math|53.37| 60|\n", + "| STU000001| 2024-08-06|Physical Education|76.45| 80|\n", + "| STU000001| 2024-05-06| Art|55.28| 83|\n", + "| STU000001| 2024-04-25| English|44.24| 71|\n", + "| STU000001| 2024-04-13|Physical Education|90.22| 55|\n", + "| STU000001| 2024-04-11| Science|37.37| 71|\n", + "| STU000001| 2024-03-06|Physical Education|88.76| 85|\n", + "| STU000001| 
2024-02-18| Art|58.32| 82|\n", + "| STU000001| 2024-01-04|Physical Education|100.0| 92|\n", + "+----------+---------------+------------------+-----+----------------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Records found: 18\n", + "\n", + "=== Query 2: Recent Low Performance Issues ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+---------------+----------+-------+-----+------------+\n", + "|assessment_date|student_id|subject|score| grade_level|\n", + "+---------------+----------+-------+-----+------------+\n", + "| 2024-09-22| STU000220|Science|25.83| 12th Grade|\n", + "| 2024-10-01| STU001661|Science|25.87| 12th Grade|\n", + "| 2024-08-09| STU001500|Science|26.33| 12th Grade|\n", + "| 2024-09-02| STU001198|Science|26.54| 12th Grade|\n", + "| 2024-12-04| STU002836|Science|26.57| 11th Grade|\n", + "| 2024-08-26| STU000831|Science|26.61| 12th Grade|\n", + "| 2024-10-15| STU001401|Science|26.71| 11th Grade|\n", + "| 2024-10-11| STU001198|Science|26.78| 12th Grade|\n", + "| 2024-06-23| STU000386|Science|26.78| 11th Grade|\n", + "| 2024-07-08| STU001919|Science|26.95| 11th Grade|\n", + "| 2024-12-08| STU002914|Science|27.05| 12th Grade|\n", + "| 2024-12-05| STU002552|Science|27.06| 11th Grade|\n", + "| 2024-10-01| STU001135|Science|27.07|Kindergarten|\n", + "| 2024-10-15| STU001119|Science|27.28| 12th Grade|\n", + "| 2024-12-19| STU001299|Science|27.33|Kindergarten|\n", + "| 2024-06-01| STU000557|Science|27.34| 12th Grade|\n", + "| 2024-12-04| STU002453|Science| 27.4| 12th Grade|\n", + "| 2024-11-03| STU001202|Science|27.41|Kindergarten|\n", + "| 2024-09-21| STU002152|Science|27.49|Kindergarten|\n", + "| 2024-11-29| STU002524|Science| 27.5| 10th Grade|\n", + "+---------------+----------+-------+-----+------------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Low performance issues found: 19160\n", + "\n", + "=== Query 3: Student Performance Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------+---------------+------------------+-----+----------------+\n", + "|student_id|assessment_date| subject|score|engagement_score|\n", + "+----------+---------------+------------------+-----+----------------+\n", + "| STU000001| 2024-04-11| Science|37.37| 71|\n", + "| STU000001| 2024-04-13|Physical Education|90.22| 55|\n", + "| STU000001| 2024-04-25| English|44.24| 71|\n", + "| STU000001| 2024-05-06| Art|55.28| 83|\n", + "| STU000001| 2024-08-06|Physical Education|76.45| 80|\n", + "| STU000001| 2024-08-10| Math|53.37| 60|\n", + "| STU000001| 2024-08-30| Math|33.82| 64|\n", + "| STU000001| 2024-09-05| English|68.71| 98|\n", + "| STU000001| 2024-09-12| Science|48.87| 58|\n", + "| STU000001| 2024-09-24| Art|60.94| 52|\n", + "| STU000001| 2024-09-25| Science|32.35| 40|\n", + "| STU000001| 2024-10-08|Physical Education|77.74| 59|\n", + "| STU000001| 2024-10-10| English|52.53| 86|\n", + "| STU000001| 2024-10-21| Science|44.62| 50|\n", + "| STU000001| 2024-12-01| English|68.91| 58|\n", + "| STU000002| 2024-05-10|Physical Education|100.0| 71|\n", + "| STU000002| 2024-05-26| History|60.61| 42|\n", + "| STU000002| 2024-06-02| History|63.75| 97|\n", + "| STU000002| 2024-06-10| Science|34.97| 62|\n", + "| STU000002| 2024-06-22| Math|45.26| 72|\n", + 
"+----------+---------------+------------------+-----+----------------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Performance trend records found: 17102\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Demonstrate liquid clustering benefits with optimized queries\n", + "\n", + "\n", + "# Query 1: Student assessment history - benefits from student_id clustering\n", + "\n", + "print(\"=== Query 1: Student Assessment History ===\")\n", + "\n", + "student_history = spark.sql(\"\"\"\n", + "\n", + "SELECT student_id, assessment_date, subject, score, engagement_score\n", + "\n", + "FROM education.analytics.student_assessments\n", + "\n", + "WHERE student_id = 'STU000001'\n", + "\n", + "ORDER BY assessment_date DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "student_history.show()\n", + "\n", + "print(f\"Records found: {student_history.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 2: Time-based academic performance analysis - benefits from assessment_date clustering\n", + "\n", + "print(\"\\n=== Query 2: Recent Low Performance Issues ===\")\n", + "\n", + "low_performance = spark.sql(\"\"\"\n", + "\n", + "SELECT assessment_date, student_id, subject, score, grade_level\n", + "\n", + "FROM education.analytics.student_assessments\n", + "\n", + "WHERE assessment_date >= '2024-06-01' AND score < 60\n", + "\n", + "ORDER BY score ASC, assessment_date DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "low_performance.show()\n", + "\n", + "print(f\"Low performance issues found: {low_performance.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 3: Combined student + time query - optimal for our clustering strategy\n", + "\n", + "print(\"\\n=== Query 3: Student Performance Trends ===\")\n", + "\n", + "performance_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT student_id, assessment_date, subject, score, engagement_score\n", + "\n", + "FROM education.analytics.student_assessments\n", + "\n", + "WHERE student_id LIKE 'STU000%' AND assessment_date >= '2024-04-01'\n", + "\n", + "ORDER BY student_id, assessment_date\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "performance_trends.show()\n", + "\n", + "print(f\"Performance trend records found: {performance_trends.count()}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 6: Analyze Clustering Effectiveness\n", + "\n", + "### Understanding the Impact\n", + "\n", + "Let's examine how liquid clustering has organized our data and analyze some aggregate statistics to demonstrate the education insights possible with this optimized structure.\n", + "\n", + "### Key Analytics\n", + "\n", + "- **Student performance patterns** and learning analytics\n", + "- **Subject difficulty analysis** and curriculum effectiveness\n", + "- **Grade level progression** and academic growth\n", + "- **Engagement correlations** and intervention opportunities" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Student Performance Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------+-----------------+---------+--------------+-------------------+-----------+\n", + "|student_id|total_assessments|avg_score|avg_engagement|avg_completion_time|grade_level|\n", + 
"+----------+-----------------+---------+--------------+-------------------+-----------+\n", + "| STU002351| 15| 84.67| 68.2| 55.49| 5th Grade|\n", + "| STU001691| 25| 83.65| 75.04| 54.83| 4th Grade|\n", + "| STU002992| 23| 82.1| 74.87| 54.35| 4th Grade|\n", + "| STU001644| 17| 80.76| 69.0| 56.12| 5th Grade|\n", + "| STU001131| 16| 80.72| 68.31| 50.13| 7th Grade|\n", + "| STU001347| 15| 80.6| 71.8| 55.47| 7th Grade|\n", + "| STU000282| 16| 80.19| 66.06| 53.04| 5th Grade|\n", + "| STU000129| 15| 80.17| 72.8| 52.55| 5th Grade|\n", + "| STU001565| 22| 80.09| 66.95| 53.57| 2nd Grade|\n", + "| STU002167| 20| 80.04| 75.3| 53.31| 4th Grade|\n", + "+----------+-----------------+---------+--------------+-------------------+-----------+\n", + "\n", + "\n", + "=== Subject Performance Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+------------------+-----------------+---------+-------------------+--------------+---------------+\n", + "| subject|total_assessments|avg_score|avg_completion_time|avg_engagement|unique_students|\n", + "+------------------+-----------------+---------+-------------------+--------------+---------------+\n", + "|Physical Education| 11307| 91.75| 41.32| 69.94| 2910|\n", + "| Art| 11268| 82.93| 46.66| 69.98| 2922|\n", + "| English| 11150| 64.39| 62.29| 70.03| 2939|\n", + "| History| 11269| 52.47| 56.92| 69.64| 2923|\n", + "| Math| 11267| 51.97| 77.57| 70.03| 2914|\n", + "| Science| 11253| 45.72| 72.4| 70.06| 2935|\n", + "+------------------+-----------------+---------+-------------------+--------------+---------------+\n", + "\n", + "\n", + "=== Grade Level Performance ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+------------+-----------------+---------+--------------+---------------+\n", + "| grade_level|total_assessments|avg_score|avg_engagement|unique_students|\n", + "+------------+-----------------+---------+--------------+---------------+\n", + "|Kindergarten| 4959| 60.22| 69.78| 219|\n", + "| 1st Grade| 5312| 63.47| 69.97| 235|\n", + "| 2nd Grade| 4908| 67.64| 69.88| 214|\n", + "| 3rd Grade| 5441| 68.08| 69.7| 251|\n", + "| 4th Grade| 5242| 70.34| 70.13| 235|\n", + "| 5th Grade| 5241| 71.8| 69.62| 229|\n", + "| 6th Grade| 4341| 66.94| 69.49| 191|\n", + "| 7th Grade| 5054| 66.42| 70.47| 222|\n", + "| 8th Grade| 5277| 64.84| 70.27| 237|\n", + "| 9th Grade| 5340| 63.39| 70.2| 240|\n", + "| 10th Grade| 5884| 61.88| 69.64| 260|\n", + "| 11th Grade| 5388| 60.38| 70.17| 239|\n", + "| 12th Grade| 5127| 58.81| 69.91| 228|\n", + "+------------+-----------------+---------+--------------+---------------+\n", + "\n", + "\n", + "=== Engagement vs Performance Correlation ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-----------------+----------------+---------+-------------------+\n", + "| engagement_level|assessment_count|avg_score|avg_completion_time|\n", + "+-----------------+----------------+---------+-------------------+\n", + "| High Engagement| 23188| 68.78| 59.6|\n", + "|Medium Engagement| 22098| 64.93| 59.48|\n", + "| Low Engagement| 22228| 60.8| 59.44|\n", + "+-----------------+----------------+---------+-------------------+\n", + "\n", + "\n", + "=== Monthly Academic Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------+-----------------+---------+--------------+---------------+\n", + "| 
month|total_assessments|avg_score|avg_engagement|active_students|\n", + "+-------+-----------------+---------+--------------+---------------+\n", + "|2024-01| 5706| 65.35| 69.86| 2545|\n", + "|2024-02| 5212| 64.94| 69.78| 2474|\n", + "|2024-03| 5650| 64.9| 69.78| 2540|\n", + "|2024-04| 5480| 64.95| 70.09| 2524|\n", + "|2024-05| 5869| 64.83| 69.88| 2591|\n", + "|2024-06| 5621| 64.72| 69.91| 2557|\n", + "|2024-07| 5632| 65.03| 70.03| 2530|\n", + "|2024-08| 5739| 65.47| 69.84| 2543|\n", + "|2024-09| 5527| 64.63| 70.23| 2505|\n", + "|2024-10| 5757| 64.85| 70.0| 2548|\n", + "|2024-11| 5639| 64.53| 70.43| 2549|\n", + "|2024-12| 5682| 64.5| 69.54| 2545|\n", + "+-------+-----------------+---------+--------------+---------------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Analyze clustering effectiveness and education insights\n", + "\n", + "\n", + "# Student performance analysis\n", + "\n", + "print(\"=== Student Performance Analysis ===\")\n", + "\n", + "student_performance = spark.sql(\"\"\"\n", + "\n", + "SELECT student_id, COUNT(*) as total_assessments,\n", + "\n", + " ROUND(AVG(score), 2) as avg_score,\n", + "\n", + " ROUND(AVG(engagement_score), 2) as avg_engagement,\n", + "\n", + " ROUND(AVG(completion_time), 2) as avg_completion_time,\n", + "\n", + " grade_level\n", + "\n", + "FROM education.analytics.student_assessments\n", + "\n", + "GROUP BY student_id, grade_level\n", + "\n", + "ORDER BY avg_score DESC\n", + "\n", + "LIMIT 10\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "student_performance.show()\n", + "\n", + "\n", + "# Subject performance analysis\n", + "\n", + "print(\"\\n=== Subject Performance Analysis ===\")\n", + "\n", + "subject_analysis = spark.sql(\"\"\"\n", + "\n", + "SELECT subject, COUNT(*) as total_assessments,\n", + "\n", + " ROUND(AVG(score), 2) as avg_score,\n", + "\n", + " ROUND(AVG(completion_time), 2) as avg_completion_time,\n", + "\n", + " ROUND(AVG(engagement_score), 2) as avg_engagement,\n", + "\n", + " COUNT(DISTINCT student_id) as unique_students\n", + "\n", + "FROM education.analytics.student_assessments\n", + "\n", + "GROUP BY subject\n", + "\n", + "ORDER BY avg_score DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "subject_analysis.show()\n", + "\n", + "\n", + "# Grade level performance\n", + "\n", + "print(\"\\n=== Grade Level Performance ===\")\n", + "\n", + "grade_performance = spark.sql(\"\"\"\n", + "\n", + "\n", + "SELECT \n", + " grade_level, \n", + " COUNT(*) AS total_assessments,\n", + " ROUND(AVG(score), 2) AS avg_score,\n", + " ROUND(AVG(engagement_score), 2) AS avg_engagement,\n", + " COUNT(DISTINCT student_id) AS unique_students\n", + "FROM education.analytics.student_assessments\n", + "GROUP BY grade_level\n", + "ORDER BY \n", + " CASE \n", + " WHEN grade_level = 'Kindergarten' THEN 0\n", + " ELSE CAST(REGEXP_REPLACE(grade_level, '[^0-9]', '') AS INT)\n", + " END;\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "grade_performance.show()\n", + "\n", + "\n", + "# Engagement vs performance correlation\n", + "\n", + "print(\"\\n=== Engagement vs Performance Correlation ===\")\n", + "\n", + "engagement_correlation = spark.sql(\"\"\"\n", + "\n", + "SELECT \n", + "\n", + " CASE \n", + "\n", + " WHEN engagement_score >= 80 THEN 'High Engagement'\n", + "\n", + " WHEN engagement_score >= 60 THEN 'Medium Engagement'\n", + "\n", + " WHEN engagement_score >= 40 THEN 'Low Engagement'\n", + "\n", + " ELSE 'Very Low Engagement'\n", + "\n", + " END as engagement_level,\n", + "\n", + " 
COUNT(*) as assessment_count,\n", + "\n", + " ROUND(AVG(score), 2) as avg_score,\n", + "\n", + " ROUND(AVG(completion_time), 2) as avg_completion_time\n", + "\n", + "FROM education.analytics.student_assessments\n", + "\n", + "GROUP BY \n", + "\n", + " CASE \n", + "\n", + " WHEN engagement_score >= 80 THEN 'High Engagement'\n", + "\n", + " WHEN engagement_score >= 60 THEN 'Medium Engagement'\n", + "\n", + " WHEN engagement_score >= 40 THEN 'Low Engagement'\n", + "\n", + " ELSE 'Very Low Engagement'\n", + "\n", + " END\n", + "\n", + "ORDER BY avg_score DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "engagement_correlation.show()\n", + "\n", + "\n", + "# Monthly academic trends\n", + "\n", + "print(\"\\n=== Monthly Academic Trends ===\")\n", + "\n", + "monthly_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT DATE_FORMAT(assessment_date, 'yyyy-MM') as month,\n", + "\n", + " COUNT(*) as total_assessments,\n", + "\n", + " ROUND(AVG(score), 2) as avg_score,\n", + "\n", + " ROUND(AVG(engagement_score), 2) as avg_engagement,\n", + "\n", + " COUNT(DISTINCT student_id) as active_students\n", + "\n", + "FROM education.analytics.student_assessments\n", + "\n", + "GROUP BY DATE_FORMAT(assessment_date, 'yyyy-MM')\n", + "\n", + "ORDER BY month\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "monthly_trends.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Key Takeaways: Delta Liquid Clustering in AIDP\n", + "\n", + "### What We Demonstrated\n", + "\n", + "1. **Automatic Optimization**: Created a table with `CLUSTER BY (student_id, assessment_date)` and let Delta automatically optimize data layout\n", + "\n", + "2. **Performance Benefits**: Queries on clustered columns (student_id, assessment_date) are significantly faster due to data locality\n", + "\n", + "3. **Zero Maintenance**: No manual partitioning, bucketing, or Z-Ordering required - Delta handles it automatically\n", + "\n", + "4. **Real-World Use Case**: Education analytics where student performance tracking and learning analytics are critical\n", + "\n", + "### AIDP Advantages\n", + "\n", + "- **Unified Analytics**: Seamlessly integrates with other AIDP services\n", + "- **Governance**: Catalog and schema isolation for education data\n", + "- **Performance**: Optimized for both OLAP and OLTP workloads\n", + "- **Scalability**: Handles education-scale data volumes effortlessly\n", + "\n", + "### Best Practices for Liquid Clustering\n", + "\n", + "1. **Choose clustering columns** based on your most common query patterns\n", + "2. **Start with 1-4 columns** - too many can reduce effectiveness\n", + "3. **Consider cardinality** - high-cardinality columns work best\n", + "4. **Monitor and adjust** as query patterns evolve\n", + "\n", + "### Next Steps\n", + "\n", + "- Explore other AIDP features like AI/ML integration\n", + "- Try liquid clustering with different column combinations\n", + "- Scale up to larger education datasets\n", + "- Integrate with real LMS systems and assessment platforms\n", + "\n", + "This notebook demonstrates how Oracle AI Data Platform makes advanced education analytics accessible while maintaining enterprise-grade performance and governance." 
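+ , "\n",
+ "\n",
+ "As a concrete follow-up to \"monitor and adjust\", the sketch below shows how you could inspect and later change the clustering keys. It is a minimal example; exact support for the `clusteringColumns` field in `DESCRIBE DETAIL`, `ALTER TABLE ... CLUSTER BY`, and `OPTIMIZE` depends on the Delta Lake version available in your AIDP workspace.\n",
+ "\n",
+ "```python\n",
+ "# Inspect the clustering columns currently defined on the table\n",
+ "spark.sql(\"DESCRIBE DETAIL education.analytics.student_assessments\").select(\"clusteringColumns\").show(truncate=False)\n",
+ "\n",
+ "# If query patterns shift toward subject-centric reporting, change the keys in place\n",
+ "spark.sql(\"ALTER TABLE education.analytics.student_assessments CLUSTER BY (subject, assessment_date)\")\n",
+ "\n",
+ "# Re-cluster existing files after changing keys or after large appends\n",
+ "spark.sql(\"OPTIMIZE education.analytics.student_assessments\")\n",
+ "```"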
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/Notebooks/liquid_clustering/energy_delta_liquid_clustering_demo.ipynb b/Notebooks/liquid_clustering/energy_delta_liquid_clustering_demo.ipynb new file mode 100644 index 0000000..bd8a724 --- /dev/null +++ b/Notebooks/liquid_clustering/energy_delta_liquid_clustering_demo.ipynb @@ -0,0 +1,1097 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Energy: Delta Liquid Clustering Demo\n", + "\n", + "\n", + "## Overview\n", + "\n", + "\n", + "This notebook demonstrates the power of **Delta Liquid Clustering** in Oracle AI Data Platform (AIDP) Workbench using an energy and utilities analytics use case. Liquid clustering automatically optimizes data layout for query performance without requiring manual partitioning or Z-Ordering.\n", + "\n", + "### What is Liquid Clustering?\n", + "\n", + "Liquid clustering automatically identifies and groups similar data together based on clustering columns you define. This optimization happens automatically during data ingestion and maintenance operations, providing:\n", + "\n", + "- **Automatic optimization**: No manual tuning required\n", + "- **Improved query performance**: Faster queries on clustered columns\n", + "- **Reduced maintenance**: No need for manual repartitioning\n", + "- **Adaptive clustering**: Adjusts as data patterns change\n", + "\n", + "### Use Case: Smart Grid Monitoring and Energy Consumption Analytics\n", + "\n", + "We'll analyze energy consumption and smart grid performance data. Our clustering strategy will optimize for:\n", + "\n", + "- **Meter-specific queries**: Fast lookups by meter ID\n", + "- **Time-based analysis**: Efficient filtering by reading date and time\n", + "- **Consumption patterns**: Quick aggregation by location and energy type\n", + "\n", + "### AIDP Environment Setup\n", + "\n", + "This notebook leverages the existing Spark session in your AIDP environment." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Energy catalog and analytics schema created successfully!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create energy catalog and analytics schema\n", + "\n", + "# In AIDP, catalogs provide data isolation and governance\n", + "\n", + "spark.sql(\"CREATE CATALOG IF NOT EXISTS energy\")\n", + "\n", + "spark.sql(\"CREATE SCHEMA IF NOT EXISTS energy.analytics\")\n", + "\n", + "print(\"Energy catalog and analytics schema created successfully!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Create Delta Table with Liquid Clustering\n", + "\n", + "### Table Design\n", + "\n", + "Our `energy_readings` table will store:\n", + "\n", + "- **meter_id**: Unique smart meter identifier\n", + "- **reading_date**: Date and time of meter reading\n", + "- **energy_type**: Type (Electricity, Gas, Water, Solar)\n", + "- **consumption**: Energy consumed (kWh, therms, gallons)\n", + "- **location**: Geographic location/region\n", + "- **peak_demand**: Peak usage during interval\n", + "- **efficiency_rating**: System efficiency (0-100)\n", + "\n", + "### Clustering Strategy\n", + "\n", + "We'll cluster by `meter_id` and `reading_date` because:\n", + "\n", + "- **meter_id**: Meters generate regular readings, grouping consumption history together\n", + "- **reading_date**: Time-based queries are critical for billing cycles, demand analysis, and seasonal patterns\n", + "- This combination optimizes for both meter monitoring and temporal energy consumption analysis" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Delta table with liquid clustering created successfully!\n", + "Clustering will automatically optimize data layout for queries on meter_id and reading_date.\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create Delta table with liquid clustering\n", + "\n", + "# CLUSTER BY defines the columns for automatic optimization\n", + "\n", + "spark.sql(\"\"\"\n", + "\n", + "CREATE TABLE IF NOT EXISTS energy.analytics.energy_readings (\n", + "\n", + " meter_id STRING,\n", + "\n", + " reading_date TIMESTAMP,\n", + "\n", + " energy_type STRING,\n", + "\n", + " consumption DECIMAL(10,3),\n", + "\n", + " location STRING,\n", + "\n", + " peak_demand DECIMAL(8,2),\n", + "\n", + " efficiency_rating INT\n", + "\n", + ")\n", + "\n", + "USING DELTA\n", + "\n", + "CLUSTER BY (meter_id, reading_date)\n", + "\n", + "\"\"\")\n", + "\n", + "print(\"Delta table with liquid clustering created successfully!\")\n", + "\n", + "print(\"Clustering will automatically optimize data layout for queries on meter_id and reading_date.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Generate Energy Sample Data\n", + "\n", + "### Data Generation Strategy\n", + "\n", + "We'll create realistic energy consumption data including:\n", + "\n", + "- **2,000 smart meters** with hourly readings over time\n", + "- **Energy types**: Electricity, Natural Gas, Water, Solar generation\n", + "- **Realistic consumption patterns**: Seasonal variations, peak usage times, efficiency differences\n", + "- **Geographic diversity**: Different locations with varying consumption profiles\n", + "\n", + "### Why This Data Pattern?\n", + "\n", + "This data simulates real energy scenarios 
where:\n", + "\n", + "- Consumption varies by time of day and season\n", + "- Peak demand impacts grid stability\n", + "- Efficiency ratings affect sustainability goals\n", + "- Geographic patterns drive infrastructure planning\n", + "- Real-time monitoring enables demand response programs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Generated 4320000 energy reading records\n", + "Sample record: {'meter_id': 'MTR000001', 'reading_date': datetime.datetime(2024, 1, 1, 0, 0), 'energy_type': 'Solar', 'consumption': -8.397, 'location': 'Residential_NYC', 'peak_demand': 11.81, 'efficiency_rating': 80}\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Generate sample energy consumption data\n", + "\n", + "# Using fully qualified imports to avoid conflicts\n", + "\n", + "import random\n", + "\n", + "from datetime import datetime, timedelta\n", + "\n", + "\n", + "# Define energy data constants\n", + "\n", + "ENERGY_TYPES = ['Electricity', 'Natural Gas', 'Water', 'Solar']\n", + "\n", + "LOCATIONS = ['Residential_NYC', 'Commercial_CHI', 'Industrial_HOU', 'Residential_LAX', 'Commercial_SFO']\n", + "\n", + "# Base consumption parameters by energy type and location\n", + "\n", + "CONSUMPTION_PARAMS = {\n", + "\n", + " 'Electricity': {\n", + "\n", + " 'Residential_NYC': {'base_consumption': 15, 'peak_factor': 2.5, 'efficiency': 85},\n", + "\n", + " 'Commercial_CHI': {'base_consumption': 150, 'peak_factor': 3.0, 'efficiency': 78},\n", + "\n", + " 'Industrial_HOU': {'base_consumption': 500, 'peak_factor': 2.2, 'efficiency': 92},\n", + "\n", + " 'Residential_LAX': {'base_consumption': 12, 'peak_factor': 2.8, 'efficiency': 88},\n", + "\n", + " 'Commercial_SFO': {'base_consumption': 180, 'peak_factor': 2.7, 'efficiency': 82}\n", + "\n", + " },\n", + "\n", + " 'Natural Gas': {\n", + "\n", + " 'Residential_NYC': {'base_consumption': 25, 'peak_factor': 1.8, 'efficiency': 90},\n", + "\n", + " 'Commercial_CHI': {'base_consumption': 80, 'peak_factor': 2.1, 'efficiency': 85},\n", + "\n", + " 'Industrial_HOU': {'base_consumption': 200, 'peak_factor': 1.9, 'efficiency': 95},\n", + "\n", + " 'Residential_LAX': {'base_consumption': 20, 'peak_factor': 2.0, 'efficiency': 87},\n", + "\n", + " 'Commercial_SFO': {'base_consumption': 95, 'peak_factor': 2.3, 'efficiency': 83}\n", + "\n", + " },\n", + "\n", + " 'Water': {\n", + "\n", + " 'Residential_NYC': {'base_consumption': 180, 'peak_factor': 1.5, 'efficiency': 88},\n", + "\n", + " 'Commercial_CHI': {'base_consumption': 450, 'peak_factor': 1.7, 'efficiency': 82},\n", + "\n", + " 'Industrial_HOU': {'base_consumption': 1200, 'peak_factor': 1.6, 'efficiency': 91},\n", + "\n", + " 'Residential_LAX': {'base_consumption': 160, 'peak_factor': 1.8, 'efficiency': 85},\n", + "\n", + " 'Commercial_SFO': {'base_consumption': 380, 'peak_factor': 1.9, 'efficiency': 79}\n", + "\n", + " },\n", + "\n", + " 'Solar': {\n", + "\n", + " 'Residential_NYC': {'base_consumption': -8, 'peak_factor': 3.5, 'efficiency': 78},\n", + "\n", + " 'Commercial_CHI': {'base_consumption': -75, 'peak_factor': 4.0, 'efficiency': 85},\n", + "\n", + " 'Industrial_HOU': {'base_consumption': -250, 'peak_factor': 3.8, 'efficiency': 88},\n", + "\n", + " 'Residential_LAX': {'base_consumption': -12, 'peak_factor': 4.2, 'efficiency': 82},\n", + "\n", + " 'Commercial_SFO': {'base_consumption': -95, 'peak_factor': 3.9, 'efficiency': 86}\n", + "\n", + " }\n", + "\n", + "}\n", + "\n", + "\n", + "# 
Generate energy reading records\n", + "\n", + "reading_data = []\n", + "\n", + "base_date = datetime(2024, 1, 1)\n", + "\n", + "\n", + "# Create 2,000 meters with hourly readings for 3 months\n", + "\n", + "for meter_num in range(1, 2001):\n", + "\n", + " meter_id = f\"MTR{meter_num:06d}\"\n", + " \n", + " # Each meter gets readings for 90 days (hourly)\n", + "\n", + " for day in range(90):\n", + "\n", + " for hour in range(24):\n", + "\n", + " reading_date = base_date + timedelta(days=day, hours=hour)\n", + " \n", + " # Select energy type and location for this meter\n", + "\n", + " energy_type = random.choice(ENERGY_TYPES)\n", + "\n", + " location = random.choice(LOCATIONS)\n", + " \n", + " params = CONSUMPTION_PARAMS[energy_type][location]\n", + " \n", + " # Calculate consumption with time-based variations\n", + "\n", + " # Seasonal variation (higher in winter for heating, summer for cooling)\n", + "\n", + " month = reading_date.month\n", + "\n", + " if energy_type in ['Electricity', 'Natural Gas']:\n", + "\n", + " if month in [12, 1, 2]: # Winter\n", + "\n", + " seasonal_factor = 1.4\n", + "\n", + " elif month in [6, 7, 8]: # Summer\n", + "\n", + " seasonal_factor = 1.3\n", + "\n", + " else:\n", + "\n", + " seasonal_factor = 1.0\n", + "\n", + " else:\n", + "\n", + " seasonal_factor = 1.0\n", + " \n", + " # Time-of-day variation\n", + "\n", + " hour_factor = 1.0\n", + "\n", + " if hour in [6, 7, 8, 17, 18, 19]: # Peak hours\n", + "\n", + " hour_factor = params['peak_factor']\n", + "\n", + " elif hour in [2, 3, 4, 5]: # Off-peak\n", + "\n", + " hour_factor = 0.4\n", + "\n", + " \n", + " # Calculate consumption\n", + "\n", + " consumption_variation = random.uniform(0.8, 1.2)\n", + "\n", + " consumption = round(params['base_consumption'] * seasonal_factor * hour_factor * consumption_variation, 3)\n", + " \n", + " # Peak demand (higher during peak hours)\n", + "\n", + " peak_demand = round(abs(consumption) * random.uniform(1.1, 1.5), 2)\n", + " \n", + " # Efficiency rating with some variation\n", + "\n", + " efficiency_variation = random.randint(-5, 3)\n", + "\n", + " efficiency_rating = max(0, min(100, params['efficiency'] + efficiency_variation))\n", + " \n", + " reading_data.append({\n", + "\n", + " \"meter_id\": meter_id,\n", + "\n", + " \"reading_date\": reading_date,\n", + "\n", + " \"energy_type\": energy_type,\n", + "\n", + " \"consumption\": consumption,\n", + "\n", + " \"location\": location,\n", + "\n", + " \"peak_demand\": peak_demand,\n", + "\n", + " \"efficiency_rating\": efficiency_rating\n", + "\n", + " })\n", + "\n", + "\n", + "\n", + "print(f\"Generated {len(reading_data)} energy reading records\")\n", + "\n", + "print(\"Sample record:\", reading_data[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Insert Data Using PySpark\n", + "\n", + "### Data Insertion Strategy\n", + "\n", + "We'll use PySpark to:\n", + "\n", + "1. **Create DataFrame** from our generated data\n", + "2. **Insert into Delta table** with liquid clustering\n", + "3. 
**Verify the insertion** with a sample query\n", + "\n", + "### Why PySpark for Insertion?\n", + "\n", + "- **Distributed processing**: Handles large datasets efficiently\n", + "- **Type safety**: Ensures data integrity\n", + "- **Optimization**: Leverages Spark's query optimization\n", + "- **Liquid clustering**: Automatically applies clustering during insertion" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "DataFrame Schema:\n", + "root\n", + " |-- consumption: double (nullable = true)\n", + " |-- efficiency_rating: long (nullable = true)\n", + " |-- energy_type: string (nullable = true)\n", + " |-- location: string (nullable = true)\n", + " |-- meter_id: string (nullable = true)\n", + " |-- peak_demand: double (nullable = true)\n", + " |-- reading_date: timestamp (nullable = true)\n", + "\n", + "\n", + "Sample Data:\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-----------+-----------------+-----------+---------------+---------+-----------+-------------------+\n", + "|consumption|efficiency_rating|energy_type| location| meter_id|peak_demand| reading_date|\n", + "+-----------+-----------------+-----------+---------------+---------+-----------+-------------------+\n", + "| -8.397| 80| Solar|Residential_NYC|MTR000001| 11.81|2024-01-01 00:00:00|\n", + "| -13.923| 83| Solar|Residential_LAX|MTR000001| 18.41|2024-01-01 01:00:00|\n", + "| 113.538| 83|Electricity| Commercial_SFO|MTR000001| 125.27|2024-01-01 02:00:00|\n", + "| 145.708| 78| Water| Commercial_CHI|MTR000001| 196.87|2024-01-01 03:00:00|\n", + "| 489.841| 86| Water| Industrial_HOU|MTR000001| 611.02|2024-01-01 04:00:00|\n", + "+-----------+-----------------+-----------+---------------+---------+-----------+-------------------+\n", + "only showing top 5 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\n", + "Successfully inserted 4320000 records into energy.analytics.energy_readings\n", + "Liquid clustering automatically optimized the data layout during insertion!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Insert data using PySpark DataFrame operations\n", + "\n", + "# Using fully qualified function references to avoid conflicts\n", + "\n", + "\n", + "# Create DataFrame from generated data\n", + "\n", + "df_readings = spark.createDataFrame(reading_data)\n", + "\n", + "\n", + "# Display schema and sample data\n", + "\n", + "print(\"DataFrame Schema:\")\n", + "\n", + "df_readings.printSchema()\n", + "\n", + "\n", + "\n", + "print(\"\\nSample Data:\")\n", + "\n", + "df_readings.show(5)\n", + "\n", + "\n", + "# Insert data into Delta table with liquid clustering\n", + "\n", + "# The CLUSTER BY (meter_id, reading_date) will automatically optimize the data layout\n", + "\n", + "df_readings.write.mode(\"overwrite\").saveAsTable(\"energy.analytics.energy_readings\")\n", + "\n", + "\n", + "print(f\"\\nSuccessfully inserted {df_readings.count()} records into energy.analytics.energy_readings\")\n", + "\n", + "print(\"Liquid clustering automatically optimized the data layout during insertion!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Demonstrate Liquid Clustering Benefits\n", + "\n", + "### Query Performance Analysis\n", + "\n", + "Now let's see how liquid clustering improves query performance. 
We'll run queries that benefit from our clustering strategy:\n", + "\n", + "1. **Meter reading history** (clustered by meter_id)\n", + "2. **Time-based consumption analysis** (clustered by reading_date)\n", + "3. **Combined meter + time queries** (optimal for our clustering)\n", + "\n", + "### Expected Performance Benefits\n", + "\n", + "With liquid clustering, these queries should be significantly faster because:\n", + "\n", + "- **Data locality**: Related records are physically grouped together\n", + "- **Reduced I/O**: Less data needs to be read from disk\n", + "- **Automatic optimization**: No manual tuning required" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Query 1: Meter Reading History ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+---------+-------------------+-----------+-----------+-----------+-----------------+\n", + "| meter_id| reading_date|energy_type|consumption|peak_demand|efficiency_rating|\n", + "+---------+-------------------+-----------+-----------+-----------+-----------------+\n", + "|MTR000001|2024-03-30 23:00:00| Solar| -76.497| 100.24| 83|\n", + "|MTR000001|2024-03-30 22:00:00| Water| 1110.183| 1612.46| 89|\n", + "|MTR000001|2024-03-30 21:00:00|Natural Gas| 20.917| 24.2| 88|\n", + "|MTR000001|2024-03-30 20:00:00| Water| 1129.645| 1513.78| 92|\n", + "|MTR000001|2024-03-30 19:00:00| Solar| -311.465| 355.24| 81|\n", + "|MTR000001|2024-03-30 18:00:00|Electricity| 1126.97| 1515.54| 92|\n", + "|MTR000001|2024-03-30 17:00:00| Water| 2149.727| 2591.09| 88|\n", + "|MTR000001|2024-03-30 16:00:00|Electricity| 188.143| 262.5| 84|\n", + "|MTR000001|2024-03-30 15:00:00|Electricity| 579.727| 817.15| 95|\n", + "|MTR000001|2024-03-30 14:00:00| Water| 404.661| 538.24| 78|\n", + "|MTR000001|2024-03-30 13:00:00|Electricity| 149.379| 182.73| 79|\n", + "|MTR000001|2024-03-30 12:00:00|Electricity| 149.926| 213.39| 76|\n", + "|MTR000001|2024-03-30 11:00:00| Solar| -243.733| 293.81| 88|\n", + "|MTR000001|2024-03-30 10:00:00|Natural Gas| 215.168| 322.07| 94|\n", + "|MTR000001|2024-03-30 09:00:00| Solar| -6.953| 7.84| 80|\n", + "|MTR000001|2024-03-30 08:00:00|Natural Gas| 233.304| 271.79| 84|\n", + "|MTR000001|2024-03-30 07:00:00|Natural Gas| 399.802| 554.49| 95|\n", + "|MTR000001|2024-03-30 06:00:00|Electricity| 1137.001| 1620.21| 91|\n", + "|MTR000001|2024-03-30 05:00:00| Solar| -3.402| 4.63| 80|\n", + "|MTR000001|2024-03-30 04:00:00| Solar| -29.498| 33.89| 85|\n", + "+---------+-------------------+-----------+-----------+-----------+-----------------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Records found: 24\n", + "\n", + "=== Query 2: Recent Peak Demand Issues ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------------------+---------+--------------+-----------+-----------+\n", + "| reading_date| meter_id| location|peak_demand|energy_type|\n", + "+-------------------+---------+--------------+-----------+-----------+\n", + "|2024-02-15 19:00:00|MTR000069|Industrial_HOU| 3390.26| Water|\n", + "|2024-02-15 18:00:00|MTR001732|Industrial_HOU| 3384.62| Water|\n", + "|2024-02-15 17:00:00|MTR001502|Industrial_HOU| 3349.98| Water|\n", + "|2024-02-15 17:00:00|MTR000428|Industrial_HOU| 3312.01| Water|\n", + "|2024-02-15 19:00:00|MTR001003|Industrial_HOU| 3282.39| Water|\n", + 
"|2024-02-15 17:00:00|MTR000272|Industrial_HOU| 3274.09| Water|\n", + "|2024-02-15 06:00:00|MTR000513|Industrial_HOU| 3273.61| Water|\n", + "|2024-02-15 06:00:00|MTR000856|Industrial_HOU| 3258.9| Water|\n", + "|2024-02-15 19:00:00|MTR000552|Industrial_HOU| 3237.69| Water|\n", + "|2024-02-15 19:00:00|MTR000486|Industrial_HOU| 3231.59| Water|\n", + "|2024-02-15 07:00:00|MTR001437|Industrial_HOU| 3226.26| Water|\n", + "|2024-02-15 19:00:00|MTR000779|Industrial_HOU| 3217.28| Water|\n", + "|2024-02-15 18:00:00|MTR001101|Industrial_HOU| 3204.85| Water|\n", + "|2024-02-15 08:00:00|MTR001956|Industrial_HOU| 3203.88| Water|\n", + "|2024-02-15 06:00:00|MTR000745|Industrial_HOU| 3199.02| Water|\n", + "|2024-02-15 06:00:00|MTR001977|Industrial_HOU| 3197.43| Water|\n", + "|2024-02-15 06:00:00|MTR001795|Industrial_HOU| 3196.6| Water|\n", + "|2024-02-15 17:00:00|MTR001725|Industrial_HOU| 3188.73| Water|\n", + "|2024-02-15 08:00:00|MTR000494|Industrial_HOU| 3185.26| Water|\n", + "|2024-02-15 18:00:00|MTR001679|Industrial_HOU| 3178.45| Water|\n", + "+-------------------+---------+--------------+-----------+-----------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Peak demand issues found: 23244\n", + "\n", + "=== Query 3: Meter Consumption Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+---------+-------------------+-----------+-----------+-----------------+\n", + "| meter_id| reading_date|energy_type|consumption|efficiency_rating|\n", + "+---------+-------------------+-----------+-----------+-----------------+\n", + "|MTR000001|2024-02-01 00:00:00|Electricity| 19.291| 85|\n", + "|MTR000001|2024-02-01 01:00:00|Electricity| 18.316| 82|\n", + "|MTR000001|2024-02-01 02:00:00|Natural Gas| 45.141| 81|\n", + "|MTR000001|2024-02-01 03:00:00|Natural Gas| 12.816| 88|\n", + "|MTR000001|2024-02-01 04:00:00| Water| 413.259| 88|\n", + "|MTR000001|2024-02-01 05:00:00| Water| 124.545| 82|\n", + "|MTR000001|2024-02-01 06:00:00|Electricity| 59.509| 84|\n", + "|MTR000001|2024-02-01 07:00:00|Natural Gas| 267.85| 81|\n", + "|MTR000001|2024-02-01 08:00:00|Electricity| 597.628| 77|\n", + "|MTR000001|2024-02-01 09:00:00|Natural Gas| 32.049| 85|\n", + "|MTR000001|2024-02-01 10:00:00| Solar| -10.908| 80|\n", + "|MTR000001|2024-02-01 11:00:00| Water| 432.552| 85|\n", + "|MTR000001|2024-02-01 12:00:00|Natural Gas| 261.021| 98|\n", + "|MTR000001|2024-02-01 13:00:00| Water| 529.122| 81|\n", + "|MTR000001|2024-02-01 14:00:00|Electricity| 677.571| 87|\n", + "|MTR000001|2024-02-01 15:00:00|Natural Gas| 32.76| 86|\n", + "|MTR000001|2024-02-01 16:00:00|Natural Gas| 269.902| 91|\n", + "|MTR000001|2024-02-01 17:00:00|Natural Gas| 46.793| 87|\n", + "|MTR000001|2024-02-01 18:00:00| Water| 344.857| 80|\n", + "|MTR000001|2024-02-01 19:00:00| Water| 674.861| 76|\n", + "+---------+-------------------+-----------+-----------+-----------------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Consumption trend records found: 50\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Demonstrate liquid clustering benefits with optimized queries\n", + "\n", + "\n", + "# Query 1: Meter reading history - benefits from meter_id clustering\n", + "\n", + "print(\"=== Query 1: Meter Reading History ===\")\n", + "\n", + "meter_history = spark.sql(\"\"\"\n", + "\n", + 
"SELECT meter_id, reading_date, energy_type, consumption, peak_demand, efficiency_rating\n", + "\n", + "FROM energy.analytics.energy_readings\n", + "\n", + "WHERE meter_id = 'MTR000001'\n", + "\n", + "ORDER BY reading_date DESC\n", + "\n", + "LIMIT 24\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "meter_history.show()\n", + "\n", + "print(f\"Records found: {meter_history.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 2: Time-based peak demand analysis - benefits from reading_date clustering\n", + "\n", + "print(\"\\n=== Query 2: Recent Peak Demand Issues ===\")\n", + "\n", + "peak_demand = spark.sql(\"\"\"\n", + "\n", + "SELECT reading_date, meter_id, location, peak_demand, energy_type\n", + "\n", + "FROM energy.analytics.energy_readings\n", + "\n", + "WHERE DATE(reading_date) = '2024-02-15' AND peak_demand > 200\n", + "\n", + "ORDER BY peak_demand DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "peak_demand.show()\n", + "\n", + "print(f\"Peak demand issues found: {peak_demand.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 3: Combined meter + time query - optimal for our clustering strategy\n", + "\n", + "print(\"\\n=== Query 3: Meter Consumption Trends ===\")\n", + "\n", + "consumption_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT meter_id, reading_date, energy_type, consumption, efficiency_rating\n", + "\n", + "FROM energy.analytics.energy_readings\n", + "\n", + "WHERE meter_id LIKE 'MTR000%' AND reading_date >= '2024-02-01'\n", + "\n", + "ORDER BY meter_id, reading_date\n", + "\n", + "LIMIT 50\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "consumption_trends.show()\n", + "\n", + "print(f\"Consumption trend records found: {consumption_trends.count()}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 6: Analyze Clustering Effectiveness\n", + "\n", + "### Understanding the Impact\n", + "\n", + "Let's examine how liquid clustering has organized our data and analyze some aggregate statistics to demonstrate the energy insights possible with this optimized structure.\n", + "\n", + "### Key Analytics\n", + "\n", + "- **Meter performance** and consumption patterns\n", + "- **Location-based energy usage** and demand analysis\n", + "- **Energy type efficiency** and sustainability metrics\n", + "- **Peak demand patterns** and grid optimization" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Meter Performance Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+---------+--------------+---------------+---------------+--------------+--------------------------+\n", + "| meter_id|total_readings|avg_consumption|max_peak_demand|avg_efficiency|total_absolute_consumption|\n", + "+---------+--------------+---------------+---------------+--------------+--------------------------+\n", + "|MTR001031| 2160| 217.68| 3438.64| 84.45| 618706.118|\n", + "|MTR001601| 2160| 212.481| 3290.42| 84.41| 615332.819|\n", + "|MTR000731| 2160| 208.168| 3384.62| 84.58| 613149.576|\n", + "|MTR000498| 2160| 218.411| 3269.85| 84.62| 610811.8|\n", + "|MTR001677| 2160| 214.368| 3111.93| 84.52| 610499.368|\n", + "|MTR000756| 2160| 207.871| 3170.51| 84.52| 609804.32|\n", + "|MTR000738| 2160| 212.499| 3368.23| 84.66| 608161.062|\n", + "|MTR001445| 2160| 211.693| 3419.94| 84.66| 605353.179|\n", + "|MTR000672| 2160| 199.036| 3233.82| 84.43| 605183.137|\n", + "|MTR000638| 2160| 204.707| 3185.66| 84.55| 
605092.247|\n", + "+---------+--------------+---------------+---------------+--------------+--------------------------+\n", + "\n", + "\n", + "=== Location-Based Consumption Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+---------------+--------------+-----------------+---------------+--------------+-------------+\n", + "| location|total_readings|total_consumption|avg_peak_demand|avg_efficiency|active_meters|\n", + "+---------------+--------------+-----------------+---------------+--------------+-------------+\n", + "| Industrial_HOU| 862832| 5.83683373747E8| 879.43| 90.51| 2000|\n", + "| Commercial_SFO| 862761| 2.22486257475E8| 335.26| 81.5| 2000|\n", + "| Commercial_CHI| 864849| 2.14778356813E8| 322.87| 81.5| 2000|\n", + "|Residential_NYC| 865090| 5.5338107161E7| 83.17| 84.25| 2000|\n", + "|Residential_LAX| 864468| 5.3089571892E7| 79.83| 84.5| 2000|\n", + "+---------------+--------------+-----------------+---------------+--------------+-------------+\n", + "\n", + "\n", + "=== Energy Type Efficiency Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-----------+--------------+---------------+--------------+---------------+-------------+\n", + "|energy_type|total_readings|avg_consumption|avg_efficiency|max_peak_demand|unique_meters|\n", + "+-----------+--------------+---------------+--------------+---------------+-------------+\n", + "| Water| 1080209| 506.2| 84.0| 3450.33| 2000|\n", + "|Electricity| 1080461| 274.209| 83.99| 2765.99| 2000|\n", + "| Solar| 1079655| 141.955| 82.8| 1705.38| 2000|\n", + "|Natural Gas| 1079675| 123.221| 87.0| 955.52| 2000|\n", + "+-----------+--------------+---------------+--------------+---------------+-------------+\n", + "\n", + "\n", + "=== Daily Consumption Patterns ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------+----+-----------------+---------------+-------------+\n", + "| date|hour|total_consumption|avg_peak_demand|reading_count|\n", + "+----------+----+-----------------+---------------+-------------+\n", + "|2024-02-01| 0| 451485.469| 293.78| 2000|\n", + "|2024-02-01| 1| 456081.564| 296.89| 2000|\n", + "|2024-02-01| 2| 186286.651| 121.16| 2000|\n", + "|2024-02-01| 3| 184992.821| 120.54| 2000|\n", + "|2024-02-01| 4| 188674.42| 122.68| 2000|\n", + "|2024-02-01| 5| 186561.449| 122.52| 2000|\n", + "|2024-02-01| 6| 984331.46| 640.24| 2000|\n", + "|2024-02-01| 7| 981030.505| 639.02| 2000|\n", + "|2024-02-01| 8| 975953.168| 633.09| 2000|\n", + "|2024-02-01| 9| 466503.579| 303.06| 2000|\n", + "|2024-02-01| 10| 445455.596| 289.59| 2000|\n", + "|2024-02-01| 11| 467970.723| 305.52| 2000|\n", + "|2024-02-01| 12| 448383.798| 292.21| 2000|\n", + "|2024-02-01| 13| 455059.613| 297.21| 2000|\n", + "|2024-02-01| 14| 439676.638| 286.3| 2000|\n", + "|2024-02-01| 15| 448438.104| 291.13| 2000|\n", + "|2024-02-01| 16| 454646.561| 293.63| 2000|\n", + "|2024-02-01| 17| 992647.303| 643.7| 2000|\n", + "|2024-02-01| 18| 989921.013| 640.01| 2000|\n", + "|2024-02-01| 19| 975567.504| 635.82| 2000|\n", + "+----------+----+-----------------+---------------+-------------+\n", + "only showing top 20 rows\n", + "\n", + "\n", + "=== Monthly Consumption Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------+-------------------+---------------+--------------+-------------+\n", + "| 
month|monthly_consumption|avg_peak_demand|avg_efficiency|active_meters|\n", + "+-------+-------------------+---------------+--------------+-------------+\n", + "|2024-01| 4.04155944396E8| 353.1| 84.45| 2000|\n", + "|2024-02| 3.78452396762E8| 353.45| 84.45| 2000|\n", + "|2024-03| 3.4676732593E8| 313.08| 84.45| 2000|\n", + "+-------+-------------------+---------------+--------------+-------------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Analyze clustering effectiveness and energy insights\n", + "\n", + "\n", + "# Meter performance analysis\n", + "\n", + "print(\"=== Meter Performance Analysis ===\")\n", + "\n", + "meter_performance = spark.sql(\"\"\"\n", + "\n", + "SELECT meter_id, COUNT(*) as total_readings,\n", + "\n", + " ROUND(AVG(consumption), 3) as avg_consumption,\n", + "\n", + " ROUND(MAX(peak_demand), 2) as max_peak_demand,\n", + "\n", + " ROUND(AVG(efficiency_rating), 2) as avg_efficiency,\n", + "\n", + " ROUND(SUM(ABS(consumption)), 3) as total_absolute_consumption\n", + "\n", + "FROM energy.analytics.energy_readings\n", + "\n", + "GROUP BY meter_id\n", + "\n", + "ORDER BY total_absolute_consumption DESC\n", + "\n", + "LIMIT 10\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "meter_performance.show()\n", + "\n", + "\n", + "# Location-based consumption analysis\n", + "\n", + "print(\"\\n=== Location-Based Consumption Analysis ===\")\n", + "\n", + "location_analysis = spark.sql(\"\"\"\n", + "\n", + "SELECT location, COUNT(*) as total_readings,\n", + "\n", + " ROUND(SUM(ABS(consumption)), 3) as total_consumption,\n", + "\n", + " ROUND(AVG(peak_demand), 2) as avg_peak_demand,\n", + "\n", + " ROUND(AVG(efficiency_rating), 2) as avg_efficiency,\n", + "\n", + " COUNT(DISTINCT meter_id) as active_meters\n", + "\n", + "FROM energy.analytics.energy_readings\n", + "\n", + "GROUP BY location\n", + "\n", + "ORDER BY total_consumption DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "location_analysis.show()\n", + "\n", + "\n", + "# Energy type efficiency analysis\n", + "\n", + "print(\"\\n=== Energy Type Efficiency Analysis ===\")\n", + "\n", + "energy_efficiency = spark.sql(\"\"\"\n", + "\n", + "SELECT energy_type, COUNT(*) as total_readings,\n", + "\n", + " ROUND(AVG(ABS(consumption)), 3) as avg_consumption,\n", + "\n", + " ROUND(AVG(efficiency_rating), 2) as avg_efficiency,\n", + "\n", + " ROUND(MAX(peak_demand), 2) as max_peak_demand,\n", + "\n", + " COUNT(DISTINCT meter_id) as unique_meters\n", + "\n", + "FROM energy.analytics.energy_readings\n", + "\n", + "GROUP BY energy_type\n", + "\n", + "ORDER BY avg_consumption DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "energy_efficiency.show()\n", + "\n", + "\n", + "# Daily consumption patterns\n", + "\n", + "print(\"\\n=== Daily Consumption Patterns ===\")\n", + "\n", + "daily_patterns = spark.sql(\"\"\"\n", + "\n", + "SELECT DATE(reading_date) as date, HOUR(reading_date) as hour,\n", + "\n", + " ROUND(SUM(ABS(consumption)), 3) as total_consumption,\n", + "\n", + " ROUND(AVG(peak_demand), 2) as avg_peak_demand,\n", + "\n", + " COUNT(*) as reading_count\n", + "\n", + "FROM energy.analytics.energy_readings\n", + "\n", + "WHERE DATE(reading_date) = '2024-02-01'\n", + "\n", + "GROUP BY DATE(reading_date), HOUR(reading_date)\n", + "\n", + "ORDER BY hour\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "daily_patterns.show()\n", + "\n", + "\n", + "# Monthly consumption trends\n", + "\n", + "print(\"\\n=== Monthly Consumption Trends ===\")\n", + "\n", + 
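"# DATE_FORMAT('yyyy-MM') below truncates each reading timestamp to its calendar month, rolling the hourly readings up into monthly totals\n",
+ "\n",
+ 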
"monthly_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT DATE_FORMAT(reading_date, 'yyyy-MM') as month,\n", + "\n", + " ROUND(SUM(ABS(consumption)), 3) as monthly_consumption,\n", + "\n", + " ROUND(AVG(peak_demand), 2) as avg_peak_demand,\n", + "\n", + " ROUND(AVG(efficiency_rating), 2) as avg_efficiency,\n", + "\n", + " COUNT(DISTINCT meter_id) as active_meters\n", + "\n", + "FROM energy.analytics.energy_readings\n", + "\n", + "GROUP BY DATE_FORMAT(reading_date, 'yyyy-MM')\n", + "\n", + "ORDER BY month\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "monthly_trends.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Key Takeaways: Delta Liquid Clustering in AIDP\n", + "\n", + "### What We Demonstrated\n", + "\n", + "1. **Automatic Optimization**: Created a table with `CLUSTER BY (meter_id, reading_date)` and let Delta automatically optimize data layout\n", + "\n", + "2. **Performance Benefits**: Queries on clustered columns (meter_id, reading_date) are significantly faster due to data locality\n", + "\n", + "3. **Zero Maintenance**: No manual partitioning, bucketing, or Z-Ordering required - Delta handles it automatically\n", + "\n", + "4. **Real-World Use Case**: Energy analytics where smart grid monitoring and consumption analysis are critical\n", + "\n", + "### AIDP Advantages\n", + "\n", + "- **Unified Analytics**: Seamlessly integrates with other AIDP services\n", + "- **Governance**: Catalog and schema isolation for energy data\n", + "- **Performance**: Optimized for both OLAP and OLTP workloads\n", + "- **Scalability**: Handles energy-scale data volumes effortlessly\n", + "\n", + "### Best Practices for Liquid Clustering\n", + "\n", + "1. **Choose clustering columns** based on your most common query patterns\n", + "2. **Start with 1-4 columns** - too many can reduce effectiveness\n", + "3. **Consider cardinality** - high-cardinality columns work best\n", + "4. **Monitor and adjust** as query patterns evolve\n", + "\n", + "### Next Steps\n", + "\n", + "- Explore other AIDP features like AI/ML integration\n", + "- Try liquid clustering with different column combinations\n", + "- Scale up to larger energy datasets\n", + "- Integrate with real smart meter and IoT sensor data\n", + "\n", + "This notebook demonstrates how Oracle AI Data Platform makes advanced energy analytics accessible while maintaining enterprise-grade performance and governance." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/Notebooks/liquid_clustering/financial_services_delta_liquid_clustering_demo.ipynb b/Notebooks/liquid_clustering/financial_services_delta_liquid_clustering_demo.ipynb new file mode 100644 index 0000000..17f1f68 --- /dev/null +++ b/Notebooks/liquid_clustering/financial_services_delta_liquid_clustering_demo.ipynb @@ -0,0 +1,948 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Financial Services: Delta Liquid Clustering Demo\n", + "\n", + "\n", + "## Overview\n", + "\n", + "\n", + "This notebook demonstrates the power of **Delta Liquid Clustering** in Oracle AI Data Platform (AIDP) Workbench using a financial services analytics use case. 
Liquid clustering automatically optimizes data layout for query performance without requiring manual partitioning or Z-Ordering.\n", + "\n", + "### What is Liquid Clustering?\n", + "\n", + "Liquid clustering automatically identifies and groups similar data together based on clustering columns you define. This optimization happens automatically during data ingestion and maintenance operations, providing:\n", + "\n", + "- **Automatic optimization**: No manual tuning required\n", + "- **Improved query performance**: Faster queries on clustered columns\n", + "- **Reduced maintenance**: No need for manual repartitioning\n", + "- **Adaptive clustering**: Adjusts as data patterns change\n", + "\n", + "### Use Case: Transaction Fraud Detection and Customer Analytics\n", + "\n", + "We'll analyze financial transaction records from a bank. Our clustering strategy will optimize for:\n", + "\n", + "- **Customer-specific queries**: Fast lookups by account ID\n", + "- **Time-based analysis**: Efficient filtering by transaction date\n", + "- **Fraud pattern detection**: Quick aggregation by transaction type and risk scores\n", + "\n", + "### AIDP Environment Setup\n", + "\n", + "This notebook leverages the existing Spark session in your AIDP environment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Financial services catalog and analytics schema created successfully!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create financial services catalog and analytics schema\n", + "\n", + "# In AIDP, catalogs provide data isolation and governance\n", + "\n", + "spark.sql(\"CREATE CATALOG IF NOT EXISTS finance\")\n", + "\n", + "spark.sql(\"CREATE SCHEMA IF NOT EXISTS finance.analytics\")\n", + "\n", + "print(\"Financial services catalog and analytics schema created successfully!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Create Delta Table with Liquid Clustering\n", + "\n", + "### Table Design\n", + "\n", + "Our `account_transactions` table will store:\n", + "\n", + "- **account_id**: Unique account identifier\n", + "- **transaction_date**: Date and time of transaction\n", + "- **transaction_type**: Type (Deposit, Withdrawal, Transfer, Payment, etc.)\n", + "- **amount**: Transaction amount\n", + "- **merchant_category**: Merchant type (Retail, Restaurant, Online, etc.)\n", + "- **location**: Transaction location\n", + "- **risk_score**: Fraud risk assessment (0-100)\n", + "\n", + "### Clustering Strategy\n", + "\n", + "We'll cluster by `account_id` and `transaction_date` because:\n", + "\n", + "- **account_id**: Customers often have multiple transactions, grouping their financial activity together\n", + "- **transaction_date**: Time-based queries are critical for fraud detection, spending analysis, and regulatory reporting\n", + "- This combination optimizes for both customer account analysis and temporal fraud pattern detection" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Delta table with liquid clustering created successfully!\n", + "Clustering will automatically optimize data layout for queries on account_id and transaction_date.\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create Delta table with liquid clustering\n", + "\n", + "# CLUSTER BY defines the columns for automatic optimization\n", + "\n", + 
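"# Clustering keys can be revised later without an immediate rewrite of existing data, e.g.\n",
+ "\n",
+ "#   ALTER TABLE finance.analytics.account_transactions CLUSTER BY (account_id)\n",
+ "\n",
+ "# (syntax per Delta Lake liquid clustering docs; confirm support in your AIDP runtime)\n",
+ "\n",
+ 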
"spark.sql(\"\"\"\n", + "\n", + "CREATE TABLE IF NOT EXISTS finance.analytics.account_transactions (\n", + "\n", + " account_id STRING,\n", + "\n", + " transaction_date TIMESTAMP,\n", + "\n", + " transaction_type STRING,\n", + "\n", + " amount DECIMAL(15,2),\n", + "\n", + " merchant_category STRING,\n", + "\n", + " location STRING,\n", + "\n", + " risk_score INT\n", + "\n", + ")\n", + "\n", + "USING DELTA\n", + "\n", + "CLUSTER BY (account_id, transaction_date)\n", + "\n", + "\"\"\")\n", + "\n", + "print(\"Delta table with liquid clustering created successfully!\")\n", + "\n", + "print(\"Clustering will automatically optimize data layout for queries on account_id and transaction_date.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Generate Financial Services Sample Data\n", + "\n", + "### Data Generation Strategy\n", + "\n", + "We'll create realistic financial transaction data including:\n", + "\n", + "- **5,000 accounts** with multiple transactions over time\n", + "- **Transaction types**: Deposits, withdrawals, transfers, payments, ATM withdrawals\n", + "- **Realistic temporal patterns**: Daily banking activity, weekend vs weekday patterns\n", + "- **Merchant categories**: Retail, restaurants, online shopping, utilities, entertainment\n", + "\n", + "### Why This Data Pattern?\n", + "\n", + "This data simulates real financial scenarios where:\n", + "\n", + "- Customers perform multiple transactions daily/weekly\n", + "- Fraud patterns emerge over time\n", + "- Regulatory reporting requires temporal analysis\n", + "- Risk scoring enables real-time fraud prevention\n", + "- Customer spending analysis drives personalized financial services" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Generated 150600 account transaction records\n", + "Sample record: {'account_id': 'ACC00000001', 'transaction_date': datetime.datetime(2024, 1, 5, 13, 0), 'transaction_type': 'ATM', 'amount': -412.88, 'merchant_category': 'Entertainment', 'location': 'ATM', 'risk_score': 27}\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Generate sample financial transaction data\n", + "\n", + "# Using fully qualified imports to avoid conflicts\n", + "\n", + "import random\n", + "\n", + "from datetime import datetime, timedelta\n", + "\n", + "\n", + "# Define financial data constants\n", + "\n", + "TRANSACTION_TYPES = ['Deposit', 'Withdrawal', 'Transfer', 'Payment', 'ATM']\n", + "\n", + "MERCHANT_CATEGORIES = ['Retail', 'Restaurant', 'Online', 'Utilities', 'Entertainment', 'Groceries', 'Healthcare', 'Transportation']\n", + "\n", + "LOCATIONS = ['New York, NY', 'Los Angeles, CA', 'Chicago, IL', 'Houston, TX', 'Miami, FL', 'Online', 'ATM']\n", + "\n", + "\n", + "# Generate account transaction records\n", + "\n", + "transaction_data = []\n", + "\n", + "base_date = datetime(2024, 1, 1)\n", + "\n", + "\n", + "# Create 5,000 accounts with 10-50 transactions each\n", + "\n", + "for account_num in range(1, 5001):\n", + "\n", + " account_id = f\"ACC{account_num:08d}\"\n", + " \n", + " # Each account gets 10-50 transactions over 12 months\n", + "\n", + " num_transactions = random.randint(10, 50)\n", + " \n", + " for i in range(num_transactions):\n", + "\n", + " # Spread transactions over 12 months with realistic timing\n", + "\n", + " days_offset = random.randint(0, 365)\n", + "\n", + " hours_offset = random.randint(0, 23)\n", + "\n", + " transaction_date = base_date 
+ timedelta(days=days_offset, hours=hours_offset)\n", + " \n", + " # Select transaction type\n", + "\n", + " transaction_type = random.choice(TRANSACTION_TYPES)\n", + " \n", + " # Amount based on transaction type\n", + "\n", + " if transaction_type in ['Deposit', 'Transfer']:\n", + "\n", + " amount = round(random.uniform(100, 10000), 2)\n", + "\n", + " elif transaction_type == 'ATM':\n", + "\n", + " amount = round(random.uniform(20, 500), 2) * -1\n", + "\n", + " else:\n", + "\n", + " amount = round(random.uniform(10, 2000), 2) * -1\n", + " \n", + " # Select merchant category and location\n", + "\n", + " merchant_category = random.choice(MERCHANT_CATEGORIES)\n", + "\n", + " if transaction_type == 'ATM':\n", + "\n", + " location = 'ATM'\n", + "\n", + " elif transaction_type == 'Online':\n", + "\n", + " location = 'Online'\n", + "\n", + " else:\n", + "\n", + " location = random.choice(LOCATIONS)\n", + " \n", + " # Risk score (0-100, higher = more suspicious)\n", + "\n", + " risk_score = random.randint(0, 100)\n", + " \n", + " transaction_data.append({\n", + "\n", + " \"account_id\": account_id,\n", + "\n", + " \"transaction_date\": transaction_date,\n", + "\n", + " \"transaction_type\": transaction_type,\n", + "\n", + " \"amount\": amount,\n", + "\n", + " \"merchant_category\": merchant_category,\n", + "\n", + " \"location\": location,\n", + "\n", + " \"risk_score\": risk_score\n", + "\n", + " })\n", + "\n", + "\n", + "\n", + "print(f\"Generated {len(transaction_data)} account transaction records\")\n", + "\n", + "print(\"Sample record:\", transaction_data[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Insert Data Using PySpark\n", + "\n", + "### Data Insertion Strategy\n", + "\n", + "We'll use PySpark to:\n", + "\n", + "1. **Create DataFrame** from our generated data\n", + "2. **Insert into Delta table** with liquid clustering\n", + "3. 
**Verify the insertion** with a sample query\n", + "\n", + "### Why PySpark for Insertion?\n", + "\n", + "- **Distributed processing**: Handles large datasets efficiently\n", + "- **Type safety**: Ensures data integrity\n", + "- **Optimization**: Leverages Spark's query optimization\n", + "- **Liquid clustering**: Automatically applies clustering during insertion" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "DataFrame Schema:\n", + "root\n", + " |-- account_id: string (nullable = true)\n", + " |-- amount: double (nullable = true)\n", + " |-- location: string (nullable = true)\n", + " |-- merchant_category: string (nullable = true)\n", + " |-- risk_score: long (nullable = true)\n", + " |-- transaction_date: timestamp (nullable = true)\n", + " |-- transaction_type: string (nullable = true)\n", + "\n", + "\n", + "Sample Data:\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-----------+--------+-----------+-----------------+----------+-------------------+----------------+\n", + "| account_id| amount| location|merchant_category|risk_score| transaction_date|transaction_type|\n", + "+-----------+--------+-----------+-----------------+----------+-------------------+----------------+\n", + "|ACC00000001| -412.88| ATM| Entertainment| 27|2024-01-05 13:00:00| ATM|\n", + "|ACC00000001| -372.97| ATM| Entertainment| 3|2024-04-15 21:00:00| ATM|\n", + "|ACC00000001|-1117.24|Houston, TX| Transportation| 32|2024-01-16 12:00:00| Withdrawal|\n", + "|ACC00000001| -1733.0|Houston, TX| Restaurant| 8|2024-12-20 09:00:00| Payment|\n", + "|ACC00000001| -164.06| ATM| Entertainment| 2|2024-02-12 12:00:00| ATM|\n", + "+-----------+--------+-----------+-----------------+----------+-------------------+----------------+\n", + "only showing top 5 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\n", + "Successfully inserted 150600 records into finance.analytics.account_transactions\n", + "Liquid clustering automatically optimized the data layout during insertion!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Insert data using PySpark DataFrame operations\n", + "\n", + "# Using fully qualified function references to avoid conflicts\n", + "\n", + "\n", + "# Create DataFrame from generated data\n", + "\n", + "df_transactions = spark.createDataFrame(transaction_data)\n", + "\n", + "\n", + "# Display schema and sample data\n", + "\n", + "print(\"DataFrame Schema:\")\n", + "\n", + "df_transactions.printSchema()\n", + "\n", + "\n", + "\n", + "print(\"\\nSample Data:\")\n", + "\n", + "df_transactions.show(5)\n", + "\n", + "\n", + "# Insert data into Delta table with liquid clustering\n", + "\n", + "# The CLUSTER BY (account_id, transaction_date) will automatically optimize the data layout\n", + "\n", + "df_transactions.write.mode(\"overwrite\").saveAsTable(\"finance.analytics.account_transactions\")\n", + "\n", + "\n", + "print(f\"\\nSuccessfully inserted {df_transactions.count()} records into finance.analytics.account_transactions\")\n", + "\n", + "print(\"Liquid clustering automatically optimized the data layout during insertion!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Demonstrate Liquid Clustering Benefits\n", + "\n", + "### Query Performance Analysis\n", + "\n", + "Now let's see how liquid clustering improves query 
performance. We'll run queries that benefit from our clustering strategy:\n", + "\n", + "1. **Account transaction history** (clustered by account_id)\n", + "2. **Time-based fraud analysis** (clustered by transaction_date)\n", + "3. **Combined account + time queries** (optimal for our clustering)\n", + "\n", + "### Expected Performance Benefits\n", + "\n", + "With liquid clustering, these queries should be significantly faster because:\n", + "\n", + "- **Data locality**: Related records are physically grouped together\n", + "- **Reduced I/O**: Less data needs to be read from disk\n", + "- **Automatic optimization**: No manual tuning required" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Query 1: Account Transaction History ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-----------+-------------------+----------------+--------+-----------------+\n", + "| account_id| transaction_date|transaction_type| amount|merchant_category|\n", + "+-----------+-------------------+----------------+--------+-----------------+\n", + "|ACC00000001|2024-12-20 09:00:00| Payment| -1733.0| Restaurant|\n", + "|ACC00000001|2024-10-24 09:00:00| Payment|-1689.16| Healthcare|\n", + "|ACC00000001|2024-09-03 20:00:00| ATM| -270.98| Entertainment|\n", + "|ACC00000001|2024-06-28 15:00:00| Transfer| 1288.76| Healthcare|\n", + "|ACC00000001|2024-06-25 15:00:00| Withdrawal|-1109.31| Entertainment|\n", + "|ACC00000001|2024-05-31 21:00:00| Withdrawal|-1323.84| Entertainment|\n", + "|ACC00000001|2024-04-15 21:00:00| ATM| -372.97| Entertainment|\n", + "|ACC00000001|2024-04-05 17:00:00| Withdrawal|-1532.56| Online|\n", + "|ACC00000001|2024-03-11 06:00:00| Deposit| 2533.68| Restaurant|\n", + "|ACC00000001|2024-02-29 17:00:00| Deposit| 3042.86| Entertainment|\n", + "|ACC00000001|2024-02-12 12:00:00| ATM| -164.06| Entertainment|\n", + "|ACC00000001|2024-01-16 12:00:00| Withdrawal|-1117.24| Transportation|\n", + "|ACC00000001|2024-01-05 13:00:00| ATM| -412.88| Entertainment|\n", + "+-----------+-------------------+----------------+--------+-----------------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Records found: 13\n", + "\n", + "=== Query 2: High-Risk Transactions Today ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------------+----------+----------------+------+----------+\n", + "|transaction_date|account_id|transaction_type|amount|risk_score|\n", + "+----------------+----------+----------------+------+----------+\n", + "+----------------+----------+----------------+------+----------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "High-risk transactions found: 0\n", + "\n", + "=== Query 3: Account Fraud Pattern Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-----------+-------------------+----------------+--------+----------+\n", + "| account_id| transaction_date|transaction_type| amount|risk_score|\n", + "+-----------+-------------------+----------------+--------+----------+\n", + "|ACC00000010|2024-06-04 14:00:00| Transfer| 1938.69| 89|\n", + "|ACC00000010|2024-07-11 20:00:00| Deposit| 8968.96| 53|\n", + "|ACC00000010|2024-07-19 14:00:00| Payment|-1926.25| 47|\n", + "|ACC00000010|2024-08-29 12:00:00| Deposit| 
9483.61| 28|\n", + "|ACC00000010|2024-09-20 23:00:00| Transfer| 7191.52| 73|\n", + "|ACC00000010|2024-09-22 11:00:00| Deposit| 5494.43| 25|\n", + "|ACC00000010|2024-10-05 21:00:00| Transfer| 6472.15| 4|\n", + "|ACC00000010|2024-10-15 18:00:00| Transfer| 5734.37| 90|\n", + "|ACC00000010|2024-11-24 09:00:00| Deposit| 4922.75| 53|\n", + "|ACC00000010|2024-12-17 07:00:00| Transfer| 5578.49| 63|\n", + "|ACC00000011|2024-06-11 18:00:00| Payment| -500.16| 98|\n", + "|ACC00000011|2024-06-26 07:00:00| ATM| -336.53| 89|\n", + "|ACC00000011|2024-08-26 02:00:00| Transfer| 9392.47| 82|\n", + "|ACC00000011|2024-09-15 16:00:00| Transfer| 1028.15| 54|\n", + "|ACC00000011|2024-09-16 21:00:00| Payment|-1566.64| 92|\n", + "|ACC00000011|2024-09-22 08:00:00| Deposit| 9293.03| 79|\n", + "|ACC00000011|2024-10-03 15:00:00| ATM| -186.99| 31|\n", + "|ACC00000011|2024-10-29 14:00:00| Deposit| 3884.05| 71|\n", + "|ACC00000011|2024-11-07 01:00:00| ATM| -160.3| 25|\n", + "|ACC00000011|2024-12-24 06:00:00| Withdrawal| -284.68| 3|\n", + "+-----------+-------------------+----------------+--------+----------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Pattern records found: 135\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Demonstrate liquid clustering benefits with optimized queries\n", + "\n", + "\n", + "# Query 1: Account transaction history - benefits from account_id clustering\n", + "\n", + "print(\"=== Query 1: Account Transaction History ===\")\n", + "\n", + "account_history = spark.sql(\"\"\"\n", + "\n", + "SELECT account_id, transaction_date, transaction_type, amount, merchant_category\n", + "\n", + "FROM finance.analytics.account_transactions\n", + "\n", + "WHERE account_id = 'ACC00000001'\n", + "\n", + "ORDER BY transaction_date DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "account_history.show()\n", + "\n", + "print(f\"Records found: {account_history.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 2: Time-based fraud analysis - benefits from transaction_date clustering\n", + "\n", + "print(\"\\n=== Query 2: High-Risk Transactions Today ===\")\n", + "\n", + "high_risk_today = spark.sql(\"\"\"\n", + "\n", + "SELECT transaction_date, account_id, transaction_type, amount, risk_score\n", + "\n", + "FROM finance.analytics.account_transactions\n", + "\n", + "WHERE DATE(transaction_date) = CURRENT_DATE AND risk_score > 70\n", + "\n", + "ORDER BY risk_score DESC, transaction_date DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "high_risk_today.show()\n", + "\n", + "print(f\"High-risk transactions found: {high_risk_today.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 3: Combined account + time query - optimal for our clustering strategy\n", + "\n", + "print(\"\\n=== Query 3: Account Fraud Pattern Analysis ===\")\n", + "\n", + "fraud_patterns = spark.sql(\"\"\"\n", + "\n", + "SELECT account_id, transaction_date, transaction_type, amount, risk_score\n", + "\n", + "FROM finance.analytics.account_transactions\n", + "\n", + "WHERE account_id LIKE 'ACC0000001%' AND transaction_date >= '2024-06-01'\n", + "\n", + "ORDER BY account_id, transaction_date\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "fraud_patterns.show()\n", + "\n", + "print(f\"Pattern records found: {fraud_patterns.count()}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 6: Analyze Clustering Effectiveness\n", + "\n", + "### Understanding the 
Impact\n", + "\n", + "Let's examine how liquid clustering has organized our data and analyze some aggregate statistics to demonstrate the financial insights possible with this optimized structure.\n", + "\n", + "### Key Analytics\n", + "\n", + "- **Transaction volume** by type and risk patterns\n", + "- **Customer spending analysis** and account segmentation\n", + "- **Fraud detection metrics** and risk scoring effectiveness\n", + "- **Merchant category trends** and spending patterns" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Transaction Analysis by Type ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------------+------------------+--------------+----------+--------------+\n", + "|transaction_type|total_transactions| total_amount|avg_amount|avg_risk_score|\n", + "+----------------+------------------+--------------+----------+--------------+\n", + "| Deposit| 30187|1.5221104223E8| 5042.27| 49.62|\n", + "| Withdrawal| 30174|-3.033876498E7| -1005.46| 50.26|\n", + "| Transfer| 30136|1.5238560295E8| 5056.6| 49.78|\n", + "| ATM| 30066| -7801233.33| -259.47| 50.03|\n", + "| Payment| 30037|-3.010117823E7| -1002.14| 50.05|\n", + "+----------------+------------------+--------------+----------+--------------+\n", + "\n", + "\n", + "=== Risk Score Distribution ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+--------------+-----------------+----------+\n", + "| risk_category|transaction_count|percentage|\n", + "+--------------+-----------------+----------+\n", + "|Very High Risk| 31042| 20.61|\n", + "| Medium Risk| 30113| 20.00|\n", + "| Low Risk| 30068| 19.97|\n", + "| High Risk| 29761| 19.76|\n", + "| Very Low Risk| 29616| 19.67|\n", + "+--------------+-----------------+----------+\n", + "\n", + "\n", + "=== Merchant Category Spending Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-----------------+------------+-------------+----------+--------+\n", + "|merchant_category|transactions| deposits| spending|avg_risk|\n", + "+-----------------+------------+-------------+----------+--------+\n", + "| Restaurant| 19064|3.827775625E7|8691779.96| 49.79|\n", + "| Groceries| 18890|3.820906261E7|8631780.31| 49.87|\n", + "| Retail| 18742|3.759042825E7|8619076.98| 49.87|\n", + "| Transportation| 18829|3.861681326E7|8509561.01| 50.29|\n", + "| Online| 18699|3.762347824E7|8507451.77| 50.11|\n", + "| Entertainment| 18728|3.803600515E7|8477999.58| 50.05|\n", + "| Utilities| 18728|3.746450578E7|8462294.11| 49.36|\n", + "| Healthcare| 18920|3.877859564E7|8341232.82| 50.25|\n", + "+-----------------+------------+-------------+----------+--------+\n", + "\n", + "\n", + "=== Monthly Transaction Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------+------------+-------------+---------------+--------------+\n", + "| month|transactions| net_flow|active_accounts|avg_risk_score|\n", + "+-------+------------+-------------+---------------+--------------+\n", + "|2024-01| 12934| 2.01199504E7| 4456| 50.27|\n", + "|2024-02| 11801|1.821607335E7| 4352| 50.08|\n", + "|2024-03| 12678|2.072403875E7| 4416| 49.11|\n", + "|2024-04| 12451|1.985937352E7| 4395| 50.24|\n", + "|2024-05| 12799| 1.98355254E7| 4452| 49.9|\n", + "|2024-06| 12219|1.878310794E7| 4383| 49.46|\n", + "|2024-07| 
12846|2.066751421E7| 4446| 49.88|\n", + "|2024-08| 12749|1.995521618E7| 4439| 50.16|\n", + "|2024-09| 12277|1.939366962E7| 4382| 49.9|\n", + "|2024-10| 12796|2.047159483E7| 4448| 49.95|\n", + "|2024-11| 12464|1.915816532E7| 4404| 50.27|\n", + "|2024-12| 12586|1.917123912E7| 4414| 50.16|\n", + "+-------+------------+-------------+---------------+--------------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Analyze clustering effectiveness and financial insights\n", + "\n", + "\n", + "# Transaction analysis by type\n", + "\n", + "print(\"=== Transaction Analysis by Type ===\")\n", + "\n", + "transaction_analysis = spark.sql(\"\"\"\n", + "\n", + "SELECT transaction_type, COUNT(*) as total_transactions,\n", + "\n", + " ROUND(SUM(amount), 2) as total_amount,\n", + "\n", + " ROUND(AVG(amount), 2) as avg_amount,\n", + "\n", + " ROUND(AVG(risk_score), 2) as avg_risk_score\n", + "\n", + "FROM finance.analytics.account_transactions\n", + "\n", + "GROUP BY transaction_type\n", + "\n", + "ORDER BY total_transactions DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "transaction_analysis.show()\n", + "\n", + "\n", + "# Risk score distribution\n", + "\n", + "print(\"\\n=== Risk Score Distribution ===\")\n", + "\n", + "risk_distribution = spark.sql(\"\"\"\n", + "\n", + "SELECT \n", + "\n", + " CASE \n", + "\n", + " WHEN risk_score >= 80 THEN 'Very High Risk'\n", + "\n", + " WHEN risk_score >= 60 THEN 'High Risk'\n", + "\n", + " WHEN risk_score >= 40 THEN 'Medium Risk'\n", + "\n", + " WHEN risk_score >= 20 THEN 'Low Risk'\n", + "\n", + " ELSE 'Very Low Risk'\n", + "\n", + " END as risk_category,\n", + "\n", + " COUNT(*) as transaction_count,\n", + "\n", + " ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) as percentage\n", + "\n", + "FROM finance.analytics.account_transactions\n", + "\n", + "GROUP BY \n", + "\n", + " CASE \n", + "\n", + " WHEN risk_score >= 80 THEN 'Very High Risk'\n", + "\n", + " WHEN risk_score >= 60 THEN 'High Risk'\n", + "\n", + " WHEN risk_score >= 40 THEN 'Medium Risk'\n", + "\n", + " WHEN risk_score >= 20 THEN 'Low Risk'\n", + "\n", + " ELSE 'Very Low Risk'\n", + "\n", + " END\n", + "\n", + "ORDER BY transaction_count DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "risk_distribution.show()\n", + "\n", + "\n", + "# Merchant category spending\n", + "\n", + "print(\"\\n=== Merchant Category Spending Analysis ===\")\n", + "\n", + "merchant_analysis = spark.sql(\"\"\"\n", + "\n", + "SELECT merchant_category, COUNT(*) as transactions,\n", + "\n", + " ROUND(SUM(CASE WHEN amount > 0 THEN amount ELSE 0 END), 2) as deposits,\n", + "\n", + " ROUND(SUM(CASE WHEN amount < 0 THEN ABS(amount) ELSE 0 END), 2) as spending,\n", + "\n", + " ROUND(AVG(risk_score), 2) as avg_risk\n", + "\n", + "FROM finance.analytics.account_transactions\n", + "\n", + "GROUP BY merchant_category\n", + "\n", + "ORDER BY spending DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "merchant_analysis.show()\n", + "\n", + "\n", + "# Monthly transaction trends\n", + "\n", + "print(\"\\n=== Monthly Transaction Trends ===\")\n", + "\n", + "monthly_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT DATE_FORMAT(transaction_date, 'yyyy-MM') as month,\n", + "\n", + " COUNT(*) as transactions,\n", + "\n", + " ROUND(SUM(amount), 2) as net_flow,\n", + "\n", + " COUNT(DISTINCT account_id) as active_accounts,\n", + "\n", + " ROUND(AVG(risk_score), 2) as avg_risk_score\n", + "\n", + "FROM finance.analytics.account_transactions\n", + "\n", + "GROUP BY 
DATE_FORMAT(transaction_date, 'yyyy-MM')\n", + "\n", + "ORDER BY month\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "monthly_trends.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Key Takeaways: Delta Liquid Clustering in AIDP\n", + "\n", + "### What We Demonstrated\n", + "\n", + "1. **Automatic Optimization**: Created a table with `CLUSTER BY (account_id, transaction_date)` and let Delta automatically optimize data layout\n", + "\n", + "2. **Performance Benefits**: Queries on clustered columns (account_id, transaction_date) are significantly faster due to data locality\n", + "\n", + "3. **Zero Maintenance**: No manual partitioning, bucketing, or Z-Ordering required - Delta handles it automatically\n", + "\n", + "4. **Real-World Use Case**: Financial services analytics where fraud detection and customer analysis are critical\n", + "\n", + "### AIDP Advantages\n", + "\n", + "- **Unified Analytics**: Seamlessly integrates with other AIDP services\n", + "- **Governance**: Catalog and schema isolation for financial data\n", + "- **Performance**: Optimized for both OLAP and OLTP workloads\n", + "- **Scalability**: Handles financial-scale data volumes effortlessly\n", + "\n", + "### Best Practices for Liquid Clustering\n", + "\n", + "1. **Choose clustering columns** based on your most common query patterns\n", + "2. **Start with 1-4 columns** - too many can reduce effectiveness\n", + "3. **Consider cardinality** - high-cardinality columns work best\n", + "4. **Monitor and adjust** as query patterns evolve\n", + "\n", + "### Next Steps\n", + "\n", + "- Explore other AIDP features like AI/ML integration\n", + "- Try liquid clustering with different column combinations\n", + "- Scale up to larger financial datasets\n", + "- Integrate with real banking systems and fraud detection platforms\n", + "\n", + "This notebook demonstrates how Oracle AI Data Platform makes advanced financial analytics accessible while maintaining enterprise-grade performance and governance." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/Notebooks/liquid_clustering/healthcare_delta_liquid_clustering_demo.ipynb b/Notebooks/liquid_clustering/healthcare_delta_liquid_clustering_demo.ipynb new file mode 100644 index 0000000..cf8f100 --- /dev/null +++ b/Notebooks/liquid_clustering/healthcare_delta_liquid_clustering_demo.ipynb @@ -0,0 +1,711 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Healthcare Analytics: Delta Liquid Clustering Demo\n", + "\n", + "## Overview\n", + "\n", + "This notebook demonstrates the power of **Delta Liquid Clustering** in Oracle AI Data Platform (AIDP) Workbench using a healthcare analytics use case. Liquid clustering is a revolutionary feature that automatically optimizes data layout for query performance without requiring manual partitioning or Z-Ordering.\n", + "\n", + "### What is Liquid Clustering?\n", + "\n", + "Liquid clustering automatically identifies and groups similar data together based on clustering columns you define. 
This optimization happens automatically during data ingestion and maintenance operations, providing:\n", + "\n", + "- **Automatic optimization**: No manual tuning required\n", + "- **Improved query performance**: Faster queries on clustered columns\n", + "- **Reduced maintenance**: No need for manual repartitioning\n", + "- **Adaptive clustering**: Adjusts as data patterns change\n", + "\n", + "### Use Case: Patient Diagnosis Analytics\n", + "\n", + "We'll analyze patient diagnosis records from a healthcare system. Our clustering strategy will optimize for:\n", + "- **Patient-specific queries**: Fast lookups by patient ID\n", + "- **Time-based analysis**: Efficient filtering by diagnosis date\n", + "- **Diagnosis patterns**: Quick aggregation by diagnosis type\n", + "\n", + "### AIDP Environment Setup\n", + "\n", + "This notebook leverages the existing Spark session in your AIDP environment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create healthcare catalog and gold schema\n", + "# In AIDP, catalogs provide data isolation and governance\n", + "\n", + "spark.sql(\"CREATE CATALOG IF NOT EXISTS healthcare\")\n", + "spark.sql(\"CREATE SCHEMA IF NOT EXISTS healthcare.gold\")\n", + "\n", + "print(\"Healthcare catalog and gold schema created successfully!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Create Delta Table with Liquid Clustering\n", + "\n", + "### Table Design\n", + "\n", + "Our `patient_diagnoses` table will store:\n", + "- **patient_id**: Unique patient identifier\n", + "- **diagnosis_date**: When the diagnosis was made\n", + "- **diagnosis_code**: ICD-10 diagnosis code\n", + "- **diagnosis_description**: Human-readable diagnosis\n", + "- **severity_level**: Critical, High, Medium, Low\n", + "- **treating_physician**: Physician ID\n", + "- **facility_id**: Healthcare facility\n", + "\n", + "### Clustering Strategy\n", + "\n", + "We'll cluster by `patient_id` and `diagnosis_date` because:\n", + "- **patient_id**: Patients often have multiple visits, grouping their records together\n", + "- **diagnosis_date**: Time-based queries are common in healthcare analytics\n", + "- This combination optimizes for both patient history lookups and temporal analysis" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Delta table with liquid clustering created successfully!\n", + "Clustering will automatically optimize data layout for queries on patient_id and diagnosis_date.\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create Delta table with liquid clustering\n", + "# CLUSTER BY defines the columns for automatic optimization\n", + "\n", + "spark.sql(\"\"\"\n", + "CREATE TABLE IF NOT EXISTS healthcare.gold.patient_diagnoses (\n", + " patient_id STRING,\n", + " diagnosis_date DATE,\n", + " diagnosis_code STRING,\n", + " diagnosis_description STRING,\n", + " severity_level STRING,\n", + " treating_physician STRING,\n", + " facility_id STRING\n", + ")\n", + "USING DELTA\n", + "CLUSTER BY (patient_id, diagnosis_date)\n", + "\"\"\")\n", + "\n", + "print(\"Delta table with liquid clustering created successfully!\")\n", + "print(\"Clustering will automatically optimize data layout for queries on patient_id and diagnosis_date.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Generate Healthcare Sample Data\n", + "\n", + 
"### Data Generation Strategy\n", + "\n", + "We'll create realistic healthcare data including:\n", + "- **100 patients** with multiple diagnoses over time\n", + "- **Common diagnoses**: Diabetes, Hypertension, Asthma, etc.\n", + "- **Realistic temporal patterns**: Follow-up visits, chronic condition management\n", + "- **Multiple facilities**: Different hospitals/clinics\n", + "\n", + "### Why This Data Pattern?\n", + "\n", + "This data simulates real healthcare scenarios where:\n", + "- Patients have multiple encounters\n", + "- Chronic conditions require ongoing monitoring\n", + "- Time-based analysis reveals treatment effectiveness\n", + "- Facility-level reporting is needed" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Generated 350 patient diagnosis records\n", + "Sample record: {'patient_id': 'PAT0001', 'diagnosis_date': datetime.date(2024, 2, 17), 'diagnosis_code': 'F41.9', 'diagnosis_description': 'Anxiety disorder, unspecified', 'severity_level': 'Medium', 'treating_physician': 'DR_SMITH', 'facility_id': 'CLINIC002'}\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Generate sample healthcare diagnosis data\n", + "# Using fully qualified pyspark.sql.functions to avoid conflicts\n", + "\n", + "import random\n", + "from datetime import datetime, timedelta\n", + "\n", + "# Define healthcare data constants\n", + "DIAGNOSES = [\n", + " (\"E11.9\", \"Type 2 diabetes mellitus without complications\", \"Medium\"),\n", + " (\"I10\", \"Essential hypertension\", \"High\"),\n", + " (\"J45.909\", \"Unspecified asthma, uncomplicated\", \"Medium\"),\n", + " (\"M54.5\", \"Low back pain\", \"Low\"),\n", + " (\"N39.0\", \"Urinary tract infection, site not specified\", \"Medium\"),\n", + " (\"Z51.11\", \"Encounter for antineoplastic chemotherapy\", \"Critical\"),\n", + " (\"I25.10\", \"Atherosclerotic heart disease of native coronary artery without angina pectoris\", \"High\"),\n", + " (\"F41.9\", \"Anxiety disorder, unspecified\", \"Medium\"),\n", + " (\"M79.3\", \"Panniculitis, unspecified\", \"Low\"),\n", + " (\"Z00.00\", \"Encounter for general adult medical examination without abnormal findings\", \"Low\")\n", + "]\n", + "\n", + "FACILITIES = [\"HOSP001\", \"HOSP002\", \"CLINIC001\", \"CLINIC002\", \"URGENT001\"]\n", + "PHYSICIANS = [\"DR_SMITH\", \"DR_JOHNSON\", \"DR_WILLIAMS\", \"DR_BROWN\", \"DR_JONES\", \"DR_GARCIA\", \"DR_MILLER\", \"DR_DAVIS\"]\n", + "\n", + "# Generate patient diagnosis records\n", + "patient_data = []\n", + "base_date = datetime(2024, 1, 1)\n", + "\n", + "# Create 100 patients with 2-5 diagnoses each\n", + "for patient_num in range(1, 101):\n", + " patient_id = f\"PAT{patient_num:04d}\"\n", + " \n", + " # Each patient gets 2-5 diagnoses over several months\n", + " num_diagnoses = random.randint(2, 5)\n", + " \n", + " for i in range(num_diagnoses):\n", + " # Spread diagnoses over 6 months\n", + " days_offset = random.randint(0, 180)\n", + " diagnosis_date = base_date + timedelta(days=days_offset)\n", + " \n", + " # Select random diagnosis\n", + " diagnosis_code, description, severity = random.choice(DIAGNOSES)\n", + " \n", + " # Select random facility and physician\n", + " facility = random.choice(FACILITIES)\n", + " physician = random.choice(PHYSICIANS)\n", + " \n", + " patient_data.append({\n", + " \"patient_id\": patient_id,\n", + " \"diagnosis_date\": diagnosis_date.date(),\n", + " \"diagnosis_code\": diagnosis_code,\n", + " 
\"diagnosis_description\": description,\n", + " \"severity_level\": severity,\n", + " \"treating_physician\": physician,\n", + " \"facility_id\": facility\n", + " })\n", + "\n", + "print(f\"Generated {len(patient_data)} patient diagnosis records\")\n", + "print(\"Sample record:\", patient_data[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Insert Data Using PySpark\n", + "\n", + "### Data Insertion Strategy\n", + "\n", + "We'll use PySpark to:\n", + "1. **Create DataFrame** from our generated data\n", + "2. **Insert into Delta table** with liquid clustering\n", + "3. **Verify the insertion** with a sample query\n", + "\n", + "### Why PySpark for Insertion?\n", + "\n", + "- **Distributed processing**: Handles large datasets efficiently\n", + "- **Type safety**: Ensures data integrity\n", + "- **Optimization**: Leverages Spark's query optimization\n", + "- **Liquid clustering**: Automatically applies clustering during insertion" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "DataFrame Schema:\n", + "root\n", + " |-- diagnosis_code: string (nullable = true)\n", + " |-- diagnosis_date: date (nullable = true)\n", + " |-- diagnosis_description: string (nullable = true)\n", + " |-- facility_id: string (nullable = true)\n", + " |-- patient_id: string (nullable = true)\n", + " |-- severity_level: string (nullable = true)\n", + " |-- treating_physician: string (nullable = true)\n", + "\n", + "\n", + "Sample Data:\n", + "+--------------+--------------+---------------------+-----------+----------+--------------+------------------+\n", + "|diagnosis_code|diagnosis_date|diagnosis_description|facility_id|patient_id|severity_level|treating_physician|\n", + "+--------------+--------------+---------------------+-----------+----------+--------------+------------------+\n", + "| F41.9| 2024-02-17| Anxiety disorder,...| CLINIC002| PAT0001| Medium| DR_SMITH|\n", + "| I10| 2024-01-15| Essential hyperte...| HOSP002| PAT0001| High| DR_JOHNSON|\n", + "| J45.909| 2024-02-13| Unspecified asthm...| HOSP002| PAT0001| Medium| DR_JONES|\n", + "| Z00.00| 2024-06-25| Encounter for gen...| URGENT001| PAT0002| Low| DR_DAVIS|\n", + "| Z00.00| 2024-01-24| Encounter for gen...| HOSP002| PAT0002| Low| DR_JONES|\n", + "+--------------+--------------+---------------------+-----------+----------+--------------+------------------+\n", + "only showing top 5 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\n", + "Successfully inserted 350 records into healthcare.gold.patient_diagnoses\n", + "Liquid clustering automatically optimized the data layout during insertion!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Insert data using PySpark DataFrame operations\n", + "# Using fully qualified function references to avoid conflicts\n", + "\n", + "# Create DataFrame from generated data\n", + "df_diagnoses = spark.createDataFrame(patient_data)\n", + "\n", + "# Display schema and sample data\n", + "print(\"DataFrame Schema:\")\n", + "df_diagnoses.printSchema()\n", + "\n", + "print(\"\\nSample Data:\")\n", + "df_diagnoses.show(5)\n", + "\n", + "# Insert data into Delta table with liquid clustering\n", + "# The CLUSTER BY (patient_id, diagnosis_date) will automatically optimize the data layout\n", + "df_diagnoses.write.mode(\"overwrite\").saveAsTable(\"healthcare.gold.patient_diagnoses\")\n", + "\n", + 
"print(f\"\\nSuccessfully inserted {df_diagnoses.count()} records into healthcare.gold.patient_diagnoses\")\n", + "print(\"Liquid clustering automatically optimized the data layout during insertion!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Demonstrate Liquid Clustering Benefits\n", + "\n", + "### Query Performance Analysis\n", + "\n", + "Now let's see how liquid clustering improves query performance. We'll run queries that benefit from our clustering strategy:\n", + "\n", + "1. **Patient history lookup** (clustered by patient_id)\n", + "2. **Time-based analysis** (clustered by diagnosis_date)\n", + "3. **Combined patient + time queries** (optimal for our clustering)\n", + "\n", + "### Expected Performance Benefits\n", + "\n", + "With liquid clustering, these queries should be significantly faster because:\n", + "- **Data locality**: Related records are physically grouped together\n", + "- **Reduced I/O**: Less data needs to be read from disk\n", + "- **Automatic optimization**: No manual tuning required" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Query 1: Patient Diagnosis History ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------+--------------+--------------+---------------------+--------------+\n", + "|patient_id|diagnosis_date|diagnosis_code|diagnosis_description|severity_level|\n", + "+----------+--------------+--------------+---------------------+--------------+\n", + "| PAT0001| 2024-01-15| I10| Essential hyperte...| High|\n", + "| PAT0001| 2024-02-13| J45.909| Unspecified asthm...| Medium|\n", + "| PAT0001| 2024-02-17| F41.9| Anxiety disorder,...| Medium|\n", + "+----------+--------------+--------------+---------------------+--------------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Records found: 3\n", + "\n", + "=== Query 2: Recent Critical Diagnoses ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+--------------+----------+--------------+---------------------+------------------+\n", + "|diagnosis_date|patient_id|diagnosis_code|diagnosis_description|treating_physician|\n", + "+--------------+----------+--------------+---------------------+------------------+\n", + "| 2024-06-25| PAT0061| Z51.11| Encounter for ant...| DR_WILLIAMS|\n", + "| 2024-06-24| PAT0099| Z51.11| Encounter for ant...| DR_GARCIA|\n", + "| 2024-06-19| PAT0082| Z51.11| Encounter for ant...| DR_BROWN|\n", + "| 2024-06-18| PAT0018| Z51.11| Encounter for ant...| DR_DAVIS|\n", + "| 2024-06-16| PAT0091| Z51.11| Encounter for ant...| DR_WILLIAMS|\n", + "| 2024-06-05| PAT0056| Z51.11| Encounter for ant...| DR_JOHNSON|\n", + "| 2024-06-03| PAT0042| Z51.11| Encounter for ant...| DR_JONES|\n", + "| 2024-05-31| PAT0062| Z51.11| Encounter for ant...| DR_SMITH|\n", + "| 2024-05-24| PAT0023| Z51.11| Encounter for ant...| DR_SMITH|\n", + "| 2024-05-24| PAT0088| Z51.11| Encounter for ant...| DR_BROWN|\n", + "| 2024-05-22| PAT0096| Z51.11| Encounter for ant...| DR_BROWN|\n", + "| 2024-05-14| PAT0097| Z51.11| Encounter for ant...| DR_SMITH|\n", + "| 2024-05-10| PAT0019| Z51.11| Encounter for ant...| DR_JONES|\n", + "| 2024-04-30| PAT0009| Z51.11| Encounter for ant...| DR_JOHNSON|\n", + "| 2024-04-24| PAT0026| Z51.11| Encounter for ant...| DR_SMITH|\n", + "| 2024-04-12| PAT0100| Z51.11| Encounter 
for ant...| DR_DAVIS|\n", + "| 2024-04-10| PAT0052| Z51.11| Encounter for ant...| DR_DAVIS|\n", + "| 2024-04-10| PAT0069| Z51.11| Encounter for ant...| DR_GARCIA|\n", + "| 2024-04-04| PAT0053| Z51.11| Encounter for ant...| DR_MILLER|\n", + "| 2024-04-03| PAT0057| Z51.11| Encounter for ant...| DR_SMITH|\n", + "+--------------+----------+--------------+---------------------+------------------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Critical diagnoses found: 21\n", + "\n", + "=== Query 3: Patient Timeline Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------+--------------+--------------+--------------+-----------+\n", + "|patient_id|diagnosis_date|diagnosis_code|severity_level|facility_id|\n", + "+----------+--------------+--------------+--------------+-----------+\n", + "| PAT0010| 2024-05-05| E11.9| Medium| HOSP002|\n", + "| PAT0010| 2024-05-21| M79.3| Low| URGENT001|\n", + "| PAT0010| 2024-06-28| M54.5| Low| HOSP002|\n", + "| PAT0011| 2024-03-09| F41.9| Medium| HOSP002|\n", + "| PAT0011| 2024-03-29| N39.0| Medium| CLINIC001|\n", + "| PAT0012| 2024-04-14| M54.5| Low| URGENT001|\n", + "| PAT0012| 2024-04-17| M79.3| Low| CLINIC002|\n", + "| PAT0012| 2024-06-03| I10| High| HOSP002|\n", + "| PAT0013| 2024-06-18| E11.9| Medium| CLINIC001|\n", + "| PAT0014| 2024-04-04| J45.909| Medium| HOSP001|\n", + "| PAT0014| 2024-05-13| N39.0| Medium| HOSP002|\n", + "| PAT0014| 2024-05-24| M54.5| Low| CLINIC002|\n", + "| PAT0015| 2024-04-16| N39.0| Medium| HOSP001|\n", + "| PAT0015| 2024-04-18| Z00.00| Low| URGENT001|\n", + "| PAT0015| 2024-04-27| F41.9| Medium| CLINIC002|\n", + "| PAT0016| 2024-04-30| E11.9| Medium| URGENT001|\n", + "| PAT0016| 2024-06-21| J45.909| Medium| HOSP002|\n", + "| PAT0017| 2024-05-24| Z00.00| Low| CLINIC001|\n", + "| PAT0018| 2024-05-01| M54.5| Low| HOSP002|\n", + "| PAT0018| 2024-06-18| Z51.11| Critical| HOSP002|\n", + "+----------+--------------+--------------+--------------+-----------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Timeline records found: 25\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Demonstrate liquid clustering benefits with optimized queries\n", + "\n", + "# Query 1: Patient history - benefits from patient_id clustering\n", + "print(\"=== Query 1: Patient Diagnosis History ===\")\n", + "patient_history = spark.sql(\"\"\"\n", + "SELECT patient_id, diagnosis_date, diagnosis_code, diagnosis_description, severity_level\n", + "FROM healthcare.gold.patient_diagnoses\n", + "WHERE patient_id = 'PAT0001'\n", + "ORDER BY diagnosis_date\n", + "\"\"\")\n", + "\n", + "patient_history.show()\n", + "print(f\"Records found: {patient_history.count()}\")\n", + "\n", + "# Query 2: Time-based analysis - benefits from diagnosis_date clustering\n", + "print(\"\\n=== Query 2: Recent Critical Diagnoses ===\")\n", + "recent_critical = spark.sql(\"\"\"\n", + "SELECT diagnosis_date, patient_id, diagnosis_code, diagnosis_description, treating_physician\n", + "FROM healthcare.gold.patient_diagnoses\n", + "WHERE diagnosis_date >= '2024-04-01' AND severity_level = 'Critical'\n", + "ORDER BY diagnosis_date DESC\n", + "\"\"\")\n", + "\n", + "recent_critical.show()\n", + "print(f\"Critical diagnoses found: {recent_critical.count()}\")\n", + "\n", + "# Query 3: Combined 
patient + time query - optimal for our clustering strategy\n", + "print(\"\\n=== Query 3: Patient Timeline Analysis ===\")\n", + "patient_timeline = spark.sql(\"\"\"\n", + "SELECT patient_id, diagnosis_date, diagnosis_code, severity_level, facility_id\n", + "FROM healthcare.gold.patient_diagnoses\n", + "WHERE patient_id LIKE 'PAT001%' AND diagnosis_date >= '2024-03-01'\n", + "ORDER BY patient_id, diagnosis_date\n", + "\"\"\")\n", + "\n", + "patient_timeline.show()\n", + "print(f\"Timeline records found: {patient_timeline.count()}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 6: Analyze Clustering Effectiveness\n", + "\n", + "### Understanding the Impact\n", + "\n", + "Let's examine how liquid clustering has organized our data and analyze some aggregate statistics to demonstrate the healthcare insights possible with this optimized structure.\n", + "\n", + "### Key Analytics\n", + "\n", + "- **Diagnosis frequency** by type\n", + "- **Severity distribution** across facilities\n", + "- **Physician workload** analysis\n", + "- **Temporal patterns** in diagnoses" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Diagnosis Frequency Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+--------------+-------------------------------------------------------------------------------+---------+----------+\n", + "|diagnosis_code|diagnosis_description |frequency|percentage|\n", + "+--------------+-------------------------------------------------------------------------------+---------+----------+\n", + "|Z00.00 |Encounter for general adult medical examination without abnormal findings |43 |12.29 |\n", + "|N39.0 |Urinary tract infection, site not specified |40 |11.43 |\n", + "|M54.5 |Low back pain |40 |11.43 |\n", + "|Z51.11 |Encounter for antineoplastic chemotherapy |38 |10.86 |\n", + "|J45.909 |Unspecified asthma, uncomplicated |37 |10.57 |\n", + "|F41.9 |Anxiety disorder, unspecified |36 |10.29 |\n", + "|E11.9 |Type 2 diabetes mellitus without complications |33 |9.43 |\n", + "|M79.3 |Panniculitis, unspecified |30 |8.57 |\n", + "|I10 |Essential hypertension |28 |8.00 |\n", + "|I25.10 |Atherosclerotic heart disease of native coronary artery without angina pectoris|25 |7.14 |\n", + "+--------------+-------------------------------------------------------------------------------+---------+----------+\n", + "\n", + "\n", + "=== Severity Distribution by Facility ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-----------+--------------+-----+\n", + "|facility_id|severity_level|count|\n", + "+-----------+--------------+-----+\n", + "| CLINIC001| Critical| 8|\n", + "| CLINIC001| High| 8|\n", + "| CLINIC001| Low| 20|\n", + "| CLINIC001| Medium| 25|\n", + "| CLINIC002| Critical| 8|\n", + "| CLINIC002| High| 9|\n", + "| CLINIC002| Low| 24|\n", + "| CLINIC002| Medium| 23|\n", + "| HOSP001| Critical| 6|\n", + "| HOSP001| High| 8|\n", + "| HOSP001| Low| 18|\n", + "| HOSP001| Medium| 30|\n", + "| HOSP002| Critical| 10|\n", + "| HOSP002| High| 14|\n", + "| HOSP002| Low| 27|\n", + "| HOSP002| Medium| 33|\n", + "| URGENT001| Critical| 6|\n", + "| URGENT001| High| 14|\n", + "| URGENT001| Low| 24|\n", + "| URGENT001| Medium| 35|\n", + "+-----------+--------------+-----+\n", + "\n", + "\n", + "=== Physician Workload Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": 
"display_data" + }, + { + "data": { + "text/plain": [ + "+------------------+---------------+---------------+-------------------+\n", + "|treating_physician|total_diagnoses|unique_patients|critical_case_ratio|\n", + "+------------------+---------------+---------------+-------------------+\n", + "| DR_BROWN| 57| 45| 0.123|\n", + "| DR_DAVIS| 56| 42| 0.089|\n", + "| DR_SMITH| 47| 38| 0.17|\n", + "| DR_GARCIA| 45| 38| 0.133|\n", + "| DR_WILLIAMS| 40| 30| 0.075|\n", + "| DR_MILLER| 38| 33| 0.079|\n", + "| DR_JOHNSON| 37| 35| 0.108|\n", + "| DR_JONES| 30| 27| 0.067|\n", + "+------------------+---------------+---------------+-------------------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Analyze clustering effectiveness and healthcare insights\n", + "\n", + "# Diagnosis frequency analysis\n", + "print(\"=== Diagnosis Frequency Analysis ===\")\n", + "diagnosis_freq = spark.sql(\"\"\"\n", + "SELECT diagnosis_code, diagnosis_description, COUNT(*) as frequency,\n", + " ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) as percentage\n", + "FROM healthcare.gold.patient_diagnoses\n", + "GROUP BY diagnosis_code, diagnosis_description\n", + "ORDER BY frequency DESC\n", + "\"\"\")\n", + "\n", + "diagnosis_freq.show(truncate=False)\n", + "\n", + "# Severity distribution by facility\n", + "print(\"\\n=== Severity Distribution by Facility ===\")\n", + "severity_by_facility = spark.sql(\"\"\"\n", + "SELECT facility_id, severity_level, COUNT(*) as count\n", + "FROM healthcare.gold.patient_diagnoses\n", + "GROUP BY facility_id, severity_level\n", + "ORDER BY facility_id, severity_level\n", + "\"\"\")\n", + "\n", + "severity_by_facility.show()\n", + "\n", + "# Physician workload analysis\n", + "print(\"\\n=== Physician Workload Analysis ===\")\n", + "physician_workload = spark.sql(\"\"\"\n", + "SELECT treating_physician, COUNT(*) as total_diagnoses,\n", + " COUNT(DISTINCT patient_id) as unique_patients,\n", + " ROUND(AVG(CASE WHEN severity_level = 'Critical' THEN 1 ELSE 0 END), 3) as critical_case_ratio\n", + "FROM healthcare.gold.patient_diagnoses\n", + "GROUP BY treating_physician\n", + "ORDER BY total_diagnoses DESC\n", + "\"\"\")\n", + "\n", + "physician_workload.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Key Takeaways: Delta Liquid Clustering in AIDP\n", + "\n", + "### What We Demonstrated\n", + "\n", + "1. **Automatic Optimization**: Created a table with `CLUSTER BY (patient_id, diagnosis_date)` and let Delta automatically optimize data layout\n", + "\n", + "2. **Performance Benefits**: Queries on clustered columns (patient_id, diagnosis_date) are significantly faster due to data locality\n", + "\n", + "3. **Zero Maintenance**: No manual partitioning, bucketing, or Z-Ordering required - Delta handles it automatically\n", + "\n", + "4. **Real-World Use Case**: Healthcare analytics where patient history lookups and temporal analysis are critical\n", + "\n", + "### AIDP Advantages\n", + "\n", + "- **Unified Analytics**: Seamlessly integrates with other AIDP services\n", + "- **Governance**: Catalog and schema isolation for healthcare data\n", + "- **Performance**: Optimized for both OLAP and OLTP workloads\n", + "- **Scalability**: Handles healthcare-scale data volumes effortlessly\n", + "\n", + "### Best Practices for Liquid Clustering\n", + "\n", + "1. **Choose clustering columns** based on your most common query patterns\n", + "2. **Start with 1-4 columns** - too many can reduce effectiveness\n", + "3. 
**Consider cardinality** - high-cardinality columns work best\n", + "4. **Monitor and adjust** as query patterns evolve\n", + "\n", + "### Next Steps\n", + "\n", + "- Explore other AIDP features like AI/ML integration\n", + "- Try liquid clustering with different column combinations\n", + "- Scale up to larger healthcare datasets\n", + "- Integrate with real healthcare systems\n", + "\n", + "This notebook demonstrates how Oracle AI Data Platform makes advanced analytics accessible while maintaining enterprise-grade performance and governance." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/Notebooks/liquid_clustering/hospitality_delta_liquid_clustering_demo.ipynb b/Notebooks/liquid_clustering/hospitality_delta_liquid_clustering_demo.ipynb new file mode 100644 index 0000000..fc9f81c --- /dev/null +++ b/Notebooks/liquid_clustering/hospitality_delta_liquid_clustering_demo.ipynb @@ -0,0 +1,993 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Hospitality: Delta Liquid Clustering Demo\n", + "\n", + "\n", + "## Overview\n", + "\n", + "\n", + "This notebook demonstrates the power of **Delta Liquid Clustering** in Oracle AI Data Platform (AIDP) Workbench using a hospitality and tourism analytics use case. Liquid clustering automatically optimizes data layout for query performance without requiring manual partitioning or Z-Ordering.\n", + "\n", + "### What is Liquid Clustering?\n", + "\n", + "Liquid clustering automatically identifies and groups similar data together based on clustering columns you define. This optimization happens automatically during data ingestion and maintenance operations, providing:\n", + "\n", + "- **Automatic optimization**: No manual tuning required\n", + "- **Improved query performance**: Faster queries on clustered columns\n", + "- **Reduced maintenance**: No need for manual repartitioning\n", + "- **Adaptive clustering**: Adjusts as data patterns change\n", + "\n", + "### Use Case: Hotel Guest Experience and Revenue Management\n", + "\n", + "We'll analyze hotel booking and guest experience data. Our clustering strategy will optimize for:\n", + "\n", + "- **Guest-specific queries**: Fast lookups by guest ID\n", + "- **Time-based analysis**: Efficient filtering by booking and stay dates\n", + "- **Revenue patterns**: Quick aggregation by room type and booking channels\n", + "\n", + "### AIDP Environment Setup\n", + "\n", + "This notebook leverages the existing Spark session in your AIDP environment." 
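, + "\n", + "As a quick optional sanity check (an addition to this demo, not required by it), you can confirm the session is live before creating any objects:\n", + "\n", + "```python\n", + "# Assumes the AIDP Workbench has already injected an active `spark` session, as noted above\n", + "print(spark.version)\n", + "```"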
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Hospitality catalog and analytics schema created successfully!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create hospitality catalog and analytics schema\n", + "\n", + "# In AIDP, catalogs provide data isolation and governance\n", + "\n", + "spark.sql(\"CREATE CATALOG IF NOT EXISTS hospitality\")\n", + "\n", + "spark.sql(\"CREATE SCHEMA IF NOT EXISTS hospitality.analytics\")\n", + "\n", + "print(\"Hospitality catalog and analytics schema created successfully!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Create Delta Table with Liquid Clustering\n", + "\n", + "### Table Design\n", + "\n", + "Our `guest_stays` table will store:\n", + "\n", + "- **guest_id**: Unique guest identifier\n", + "- **booking_date**: Date booking was made\n", + "- **check_in_date**: Guest arrival date\n", + "- **room_type**: Type of room booked\n", + "- **booking_channel**: How booking was made (OTA, Direct, etc.)\n", + "- **total_revenue**: Total booking revenue\n", + "- **guest_satisfaction**: Guest satisfaction score (1-10)\n", + "\n", + "### Clustering Strategy\n", + "\n", + "We'll cluster by `guest_id` and `booking_date` because:\n", + "\n", + "- **guest_id**: Guests often make multiple bookings, grouping their stay history together\n", + "- **booking_date**: Time-based queries are critical for revenue analysis, seasonal trends, and booking patterns\n", + "- This combination optimizes for both guest relationship management and temporal revenue analytics" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Delta table with liquid clustering created successfully!\n", + "Clustering will automatically optimize data layout for queries on guest_id and booking_date.\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create Delta table with liquid clustering\n", + "\n", + "# CLUSTER BY defines the columns for automatic optimization\n", + "\n", + "spark.sql(\"\"\"\n", + "\n", + "CREATE TABLE IF NOT EXISTS hospitality.analytics.guest_stays (\n", + "\n", + " guest_id STRING,\n", + "\n", + " booking_date DATE,\n", + "\n", + " check_in_date DATE,\n", + "\n", + " room_type STRING,\n", + "\n", + " booking_channel STRING,\n", + "\n", + " total_revenue DECIMAL(8,2),\n", + "\n", + " guest_satisfaction INT\n", + "\n", + ")\n", + "\n", + "USING DELTA\n", + "\n", + "CLUSTER BY (guest_id, booking_date)\n", + "\n", + "\"\"\")\n", + "\n", + "print(\"Delta table with liquid clustering created successfully!\")\n", + "\n", + "print(\"Clustering will automatically optimize data layout for queries on guest_id and booking_date.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Generate Hospitality Sample Data\n", + "\n", + "### Data Generation Strategy\n", + "\n", + "We'll create realistic hotel booking and guest data including:\n", + "\n", + "- **5,000 guests** with multiple bookings over time\n", + "- **Room types**: Standard, Deluxe, Suite, Executive\n", + "- **Booking channels**: Direct, Online Travel Agency, Corporate, Walk-in\n", + "- **Seasonal patterns**: Peak seasons, weekend vs weekday pricing\n", + "\n", + "### Why This Data Pattern?\n", + "\n", + "This data simulates real hospitality scenarios where:\n", + "\n", + "- Guest loyalty programs require 
historical booking tracking\n", + "- Revenue management depends on booking channel analysis\n", + "- Seasonal pricing strategies drive occupancy optimization\n", + "- Guest satisfaction impacts reputation and repeat business\n", + "- Channel performance requires continuous monitoring" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Generated 24925 guest booking records\n", + "Sample record: {'guest_id': 'GST000001', 'booking_date': datetime.date(2024, 9, 11), 'check_in_date': datetime.date(2024, 10, 8), 'room_type': 'Standard', 'booking_channel': 'Online Travel Agency', 'total_revenue': 97.25, 'guest_satisfaction': 7}\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Generate sample hospitality guest booking data\n", + "\n", + "# Using fully qualified imports to avoid conflicts\n", + "\n", + "import random\n", + "\n", + "from datetime import datetime, timedelta\n", + "\n", + "\n", + "# Define hospitality data constants\n", + "\n", + "ROOM_TYPES = ['Standard', 'Deluxe', 'Suite', 'Executive']\n", + "\n", + "BOOKING_CHANNELS = ['Direct', 'Online Travel Agency', 'Corporate', 'Walk-in']\n", + "\n", + "# Base revenue parameters by room type\n", + "\n", + "REVENUE_PARAMS = {\n", + "\n", + " 'Standard': {'base_rate': 120, 'satisfaction': 7.8},\n", + "\n", + " 'Deluxe': {'base_rate': 200, 'satisfaction': 8.2},\n", + "\n", + " 'Suite': {'base_rate': 350, 'satisfaction': 8.8},\n", + "\n", + " 'Executive': {'base_rate': 280, 'satisfaction': 8.5}\n", + "\n", + "}\n", + "\n", + "# Channel margins (affect final revenue)\n", + "\n", + "CHANNEL_MARGINS = {\n", + "\n", + " 'Direct': 1.0,\n", + "\n", + " 'Online Travel Agency': 0.85,\n", + "\n", + " 'Corporate': 0.90,\n", + "\n", + " 'Walk-in': 0.95\n", + "\n", + "}\n", + "\n", + "\n", + "# Generate guest booking records\n", + "\n", + "booking_data = []\n", + "\n", + "base_date = datetime(2024, 1, 1)\n", + "\n", + "\n", + "# Create 5,000 guests with 2-8 bookings each\n", + "\n", + "for guest_num in range(1, 5001):\n", + "\n", + " guest_id = f\"GST{guest_num:06d}\"\n", + " \n", + " # Each guest gets 2-8 bookings over 12 months\n", + "\n", + " num_bookings = random.randint(2, 8)\n", + " \n", + " for i in range(num_bookings):\n", + "\n", + " # Spread bookings over 12 months\n", + "\n", + " days_offset = random.randint(0, 365)\n", + "\n", + " booking_date = base_date + timedelta(days=days_offset)\n", + " \n", + " # Check-in date (usually within 1-30 days of booking)\n", + "\n", + " checkin_offset = random.randint(1, 30)\n", + "\n", + " check_in_date = booking_date + timedelta(days=checkin_offset)\n", + " \n", + " # Select room type\n", + "\n", + " room_type = random.choice(ROOM_TYPES)\n", + "\n", + " params = REVENUE_PARAMS[room_type]\n", + " \n", + " # Select booking channel\n", + "\n", + " booking_channel = random.choice(BOOKING_CHANNELS)\n", + "\n", + " channel_margin = CHANNEL_MARGINS[booking_channel]\n", + " \n", + " # Calculate revenue with variations\n", + "\n", + " # Seasonal pricing (higher in peak season)\n", + "\n", + " month = check_in_date.month\n", + "\n", + " if month in [6, 7, 8]: # Summer peak\n", + "\n", + " seasonal_factor = 1.3\n", + "\n", + " elif month in [11, 12]: # Holiday season\n", + "\n", + " seasonal_factor = 1.4\n", + "\n", + " else:\n", + "\n", + " seasonal_factor = 1.0\n", + " \n", + " # Weekend pricing\n", + "\n", + " if check_in_date.weekday() >= 5: # Saturday = 5, Sunday = 6\n", + "\n", + " weekend_factor 
= 1.2\n", + "\n", + " else:\n", + "\n", + " weekend_factor = 1.0\n", + " \n", + " # Stay length (1-7 nights)\n", + "\n", + " stay_length = random.randint(1, 7)\n", + " \n", + " # Calculate total revenue\n", + "\n", + " revenue_variation = random.uniform(0.9, 1.1)\n", + "\n", + " total_revenue = round(params['base_rate'] * stay_length * seasonal_factor * weekend_factor * channel_margin * revenue_variation, 2)\n", + " \n", + " # Guest satisfaction (varies by room type and some randomness)\n", + "\n", + " satisfaction_variation = random.randint(-2, 2)\n", + "\n", + " guest_satisfaction = max(1, min(10, params['satisfaction'] + satisfaction_variation))\n", + " \n", + " booking_data.append({\n", + "\n", + " \"guest_id\": guest_id,\n", + "\n", + " \"booking_date\": booking_date.date(),\n", + "\n", + " \"check_in_date\": check_in_date.date(),\n", + "\n", + " \"room_type\": room_type,\n", + "\n", + " \"booking_channel\": booking_channel,\n", + "\n", + " \"total_revenue\": float(total_revenue),\n", + "\n", + " \"guest_satisfaction\": int(guest_satisfaction)\n", + "\n", + " })\n", + "\n", + "\n", + "\n", + "print(f\"Generated {len(booking_data)} guest booking records\")\n", + "\n", + "print(\"Sample record:\", booking_data[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Insert Data Using PySpark\n", + "\n", + "### Data Insertion Strategy\n", + "\n", + "We'll use PySpark to:\n", + "\n", + "1. **Create DataFrame** from our generated data\n", + "2. **Insert into Delta table** with liquid clustering\n", + "3. **Verify the insertion** with a sample query\n", + "\n", + "### Why PySpark for Insertion?\n", + "\n", + "- **Distributed processing**: Handles large datasets efficiently\n", + "- **Type safety**: Ensures data integrity\n", + "- **Optimization**: Leverages Spark's query optimization\n", + "- **Liquid clustering**: Automatically applies clustering during insertion" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "DataFrame Schema:\n", + "root\n", + " |-- booking_channel: string (nullable = true)\n", + " |-- booking_date: date (nullable = true)\n", + " |-- check_in_date: date (nullable = true)\n", + " |-- guest_id: string (nullable = true)\n", + " |-- guest_satisfaction: long (nullable = true)\n", + " |-- room_type: string (nullable = true)\n", + " |-- total_revenue: double (nullable = true)\n", + "\n", + "\n", + "Sample Data:\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+--------------------+------------+-------------+---------+------------------+---------+-------------+\n", + "| booking_channel|booking_date|check_in_date| guest_id|guest_satisfaction|room_type|total_revenue|\n", + "+--------------------+------------+-------------+---------+------------------+---------+-------------+\n", + "|Online Travel Agency| 2024-09-11| 2024-10-08|GST000001| 7| Standard| 97.25|\n", + "| Direct| 2024-02-08| 2024-02-17|GST000001| 7| Suite| 841.19|\n", + "| Direct| 2024-11-10| 2024-11-12|GST000001| 6| Suite| 2441.59|\n", + "| Walk-in| 2024-06-16| 2024-06-25|GST000001| 8| Standard| 983.68|\n", + "|Online Travel Agency| 2024-12-27| 2025-01-19|GST000001| 10|Executive| 586.94|\n", + "+--------------------+------------+-------------+---------+------------------+---------+-------------+\n", + "only showing top 5 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\n", + 
"Successfully inserted 24925 records into hospitality.analytics.guest_stays\n", + "Liquid clustering automatically optimized the data layout during insertion!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Insert data using PySpark DataFrame operations\n", + "\n", + "# Using fully qualified function references to avoid conflicts\n", + "\n", + "\n", + "# Create DataFrame from generated data\n", + "\n", + "df_bookings = spark.createDataFrame(booking_data)\n", + "\n", + "\n", + "# Display schema and sample data\n", + "\n", + "print(\"DataFrame Schema:\")\n", + "\n", + "df_bookings.printSchema()\n", + "\n", + "\n", + "\n", + "print(\"\\nSample Data:\")\n", + "\n", + "df_bookings.show(5)\n", + "\n", + "\n", + "# Insert data into Delta table with liquid clustering\n", + "\n", + "# The CLUSTER BY (guest_id, booking_date) will automatically optimize the data layout\n", + "\n", + "df_bookings.write.mode(\"overwrite\").saveAsTable(\"hospitality.analytics.guest_stays\")\n", + "\n", + "\n", + "print(f\"\\nSuccessfully inserted {df_bookings.count()} records into hospitality.analytics.guest_stays\")\n", + "\n", + "print(\"Liquid clustering automatically optimized the data layout during insertion!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Demonstrate Liquid Clustering Benefits\n", + "\n", + "### Query Performance Analysis\n", + "\n", + "Now let's see how liquid clustering improves query performance. We'll run queries that benefit from our clustering strategy:\n", + "\n", + "1. **Guest booking history** (clustered by guest_id)\n", + "2. **Time-based revenue analysis** (clustered by booking_date)\n", + "3. **Combined guest + time queries** (optimal for our clustering)\n", + "\n", + "### Expected Performance Benefits\n", + "\n", + "With liquid clustering, these queries should be significantly faster because:\n", + "\n", + "- **Data locality**: Related records are physically grouped together\n", + "- **Reduced I/O**: Less data needs to be read from disk\n", + "- **Automatic optimization**: No manual tuning required" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Query 1: Guest Booking History ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+---------+------------+---------+-------------+------------------+\n", + "| guest_id|booking_date|room_type|total_revenue|guest_satisfaction|\n", + "+---------+------------+---------+-------------+------------------+\n", + "|GST000001| 2024-12-27|Executive| 586.94| 10|\n", + "|GST000001| 2024-11-10| Suite| 2441.59| 6|\n", + "|GST000001| 2024-09-11| Standard| 97.25| 7|\n", + "|GST000001| 2024-06-16| Standard| 983.68| 8|\n", + "|GST000001| 2024-02-08| Suite| 841.19| 7|\n", + "+---------+------------+---------+-------------+------------------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Records found: 5\n", + "\n", + "=== Query 2: Recent High-Value Bookings ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+------------+---------+---------+-------------+---------------+\n", + "|booking_date| guest_id|room_type|total_revenue|booking_channel|\n", + "+------------+---------+---------+-------------+---------------+\n", + "| 2024-10-16|GST002569| Suite| 4518.02| Direct|\n", + "| 2024-12-12|GST001814| Suite| 4512.27| 
Direct|\n", + "| 2024-11-08|GST004903| Suite| 4473.05| Direct|\n", + "| 2024-11-21|GST004845| Suite| 4469.66| Direct|\n", + "| 2024-12-12|GST002326| Suite| 4420.99| Direct|\n", + "| 2024-10-10|GST001027| Suite| 4291.87| Direct|\n", + "| 2024-11-30|GST002600| Suite| 4260.58| Walk-in|\n", + "| 2024-11-17|GST001894| Suite| 4236.16| Direct|\n", + "| 2024-10-04|GST004377| Suite| 4214.49| Walk-in|\n", + "| 2024-07-27|GST002825| Suite| 4168.56| Direct|\n", + "| 2024-12-28|GST002967| Suite| 4163.38| Direct|\n", + "| 2024-06-10|GST003431| Suite| 4098.36| Direct|\n", + "| 2024-07-04|GST000876| Suite| 4061.38| Direct|\n", + "| 2024-12-20|GST003771| Suite| 4057.82| Direct|\n", + "| 2024-07-24|GST004642| Suite| 4033.5| Direct|\n", + "| 2024-10-29|GST003491| Suite| 4013.08| Direct|\n", + "| 2024-12-03|GST000019| Suite| 3998.81| Corporate|\n", + "| 2024-06-03|GST004275| Suite| 3994.81| Direct|\n", + "| 2024-07-27|GST003054| Suite| 3992.25| Walk-in|\n", + "| 2024-06-18|GST003529| Suite| 3990.31| Direct|\n", + "+------------+---------+---------+-------------+---------------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "High-value bookings found: 6911\n", + "\n", + "=== Query 3: Guest Spending Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+---------+------------+---------+-------------+------------------+\n", + "| guest_id|booking_date|room_type|total_revenue|guest_satisfaction|\n", + "+---------+------------+---------+-------------+------------------+\n", + "|GST000001| 2024-06-16| Standard| 983.68| 8|\n", + "|GST000001| 2024-09-11| Standard| 97.25| 7|\n", + "|GST000001| 2024-11-10| Suite| 2441.59| 6|\n", + "|GST000001| 2024-12-27|Executive| 586.94| 10|\n", + "|GST000002| 2024-07-01| Deluxe| 928.56| 10|\n", + "|GST000003| 2024-05-09| Standard| 690.72| 9|\n", + "|GST000003| 2024-08-10| Standard| 119.51| 5|\n", + "|GST000003| 2024-08-26| Deluxe| 550.91| 9|\n", + "|GST000004| 2024-04-16| Standard| 557.51| 8|\n", + "|GST000004| 2024-06-17| Deluxe| 730.58| 10|\n", + "|GST000005| 2024-04-21|Executive| 315.68| 10|\n", + "|GST000005| 2024-06-30| Suite| 2723.72| 7|\n", + "|GST000005| 2024-09-06| Standard| 773.24| 6|\n", + "|GST000005| 2024-11-16| Deluxe| 2031.05| 9|\n", + "|GST000006| 2024-04-14| Suite| 1593.15| 10|\n", + "|GST000006| 2024-07-08|Executive| 905.4| 8|\n", + "|GST000006| 2024-08-20| Standard| 687.87| 6|\n", + "|GST000006| 2024-10-08| Suite| 951.49| 10|\n", + "|GST000006| 2024-11-20|Executive| 2022.85| 9|\n", + "|GST000007| 2024-07-03| Standard| 866.91| 6|\n", + "+---------+------------+---------+-------------+------------------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Spending trend records found: 3737\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Demonstrate liquid clustering benefits with optimized queries\n", + "\n", + "\n", + "# Query 1: Guest booking history - benefits from guest_id clustering\n", + "\n", + "print(\"=== Query 1: Guest Booking History ===\")\n", + "\n", + "guest_history = spark.sql(\"\"\"\n", + "\n", + "SELECT guest_id, booking_date, room_type, total_revenue, guest_satisfaction\n", + "\n", + "FROM hospitality.analytics.guest_stays\n", + "\n", + "WHERE guest_id = 'GST000001'\n", + "\n", + "ORDER BY booking_date DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + 
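"# Optional check (an addition, not part of the original demo): inspect the physical plan to see\n", + "# whether Delta prunes data files using the clustered guest_id predicate. `explain` is a standard\n", + "# PySpark DataFrame method; uncomment to run it.\n", + "# guest_history.explain(True)\n", + "\n", +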
"guest_history.show()\n", + "\n", + "print(f\"Records found: {guest_history.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 2: Time-based revenue analysis - benefits from booking_date clustering\n", + "\n", + "print(\"\\n=== Query 2: Recent High-Value Bookings ===\")\n", + "\n", + "high_value = spark.sql(\"\"\"\n", + "\n", + "SELECT booking_date, guest_id, room_type, total_revenue, booking_channel\n", + "\n", + "FROM hospitality.analytics.guest_stays\n", + "\n", + "WHERE booking_date >= '2024-06-01' AND total_revenue > 1000\n", + "\n", + "ORDER BY total_revenue DESC, booking_date DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "high_value.show()\n", + "\n", + "print(f\"High-value bookings found: {high_value.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 3: Combined guest + time query - optimal for our clustering strategy\n", + "\n", + "print(\"\\n=== Query 3: Guest Spending Trends ===\")\n", + "\n", + "spending_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT guest_id, booking_date, room_type, total_revenue, guest_satisfaction\n", + "\n", + "FROM hospitality.analytics.guest_stays\n", + "\n", + "WHERE guest_id LIKE 'GST000%' AND booking_date >= '2024-04-01'\n", + "\n", + "ORDER BY guest_id, booking_date\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "spending_trends.show()\n", + "\n", + "print(f\"Spending trend records found: {spending_trends.count()}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 6: Analyze Clustering Effectiveness\n", + "\n", + "### Understanding the Impact\n", + "\n", + "Let's examine how liquid clustering has organized our data and analyze some aggregate statistics to demonstrate the hospitality insights possible with this optimized structure.\n", + "\n", + "### Key Analytics\n", + "\n", + "- **Guest loyalty patterns** and repeat booking analysis\n", + "- **Revenue performance** by room type and booking channel\n", + "- **Seasonal trends** and occupancy optimization\n", + "- **Guest satisfaction** and service quality metrics" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Guest Loyalty Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+---------+--------------+-----------+-----------------+----------------+-----------------+\n", + "| guest_id|total_bookings|total_spent|avg_booking_value|avg_satisfaction|last_booking_date|\n", + "+---------+--------------+-----------+-----------------+----------------+-----------------+\n", + "|GST000291| 8| 15489.66| 1936.21| 7.5| 2024-11-28|\n", + "|GST001027| 7| 14781.78| 2111.68| 7.71| 2024-12-29|\n", + "|GST000705| 8| 14645.07| 1830.63| 7.0| 2024-12-03|\n", + "|GST001894| 7| 14147.55| 2021.08| 7.43| 2024-12-11|\n", + "|GST002089| 7| 14125.12| 2017.87| 9.0| 2024-10-23|\n", + "|GST003861| 8| 14003.91| 1750.49| 8.25| 2024-12-19|\n", + "|GST003563| 8| 13950.03| 1743.75| 7.13| 2024-12-17|\n", + "|GST004202| 8| 13918.78| 1739.85| 8.13| 2024-10-29|\n", + "|GST004845| 8| 13914.4| 1739.3| 7.75| 2024-11-21|\n", + "|GST001811| 8| 13865.4| 1733.18| 8.63| 2024-12-30|\n", + "+---------+--------------+-----------+-----------------+----------------+-----------------+\n", + "\n", + "\n", + "=== Room Type Performance ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+---------+--------------+-------------+-----------------------+----------------+-------------+\n", + 
"|room_type|total_bookings|total_revenue|avg_revenue_per_booking|avg_satisfaction|unique_guests|\n", + "+---------+--------------+-------------+-----------------------+----------------+-------------+\n", + "| Suite| 6197| 9648746.75| 1557.0| 7.98| 3580|\n", + "|Executive| 6189| 7706532.62| 1245.2| 8.01| 3573|\n", + "| Deluxe| 6273| 5616109.29| 895.28| 8.0| 3632|\n", + "| Standard| 6266| 3365445.72| 537.1| 7.0| 3609|\n", + "+---------+--------------+-------------+-----------------------+----------------+-------------+\n", + "\n", + "\n", + "=== Booking Channel Performance ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+--------------------+--------------+-------------+-----------+----------------+-------------+\n", + "| booking_channel|total_bookings|total_revenue|avg_revenue|avg_satisfaction|unique_guests|\n", + "+--------------------+--------------+-------------+-----------+----------------+-------------+\n", + "| Direct| 6255| 7150741.3| 1143.2| 7.76| 3593|\n", + "| Walk-in| 6131| 6663099.09| 1086.79| 7.73| 3589|\n", + "| Corporate| 6307| 6460978.61| 1024.41| 7.78| 3608|\n", + "|Online Travel Agency| 6232| 6062015.38| 972.72| 7.72| 3580|\n", + "+--------------------+--------------+-------------+-----------+----------------+-------------+\n", + "\n", + "\n", + "=== Monthly Revenue Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------+--------------+---------------+-----------------+----------------+-------------+\n", + "| month|total_bookings|monthly_revenue|avg_booking_value|avg_satisfaction|unique_guests|\n", + "+-------+--------------+---------------+-----------------+----------------+-------------+\n", + "|2024-01| 2047| 1889847.05| 923.23| 7.75| 1702|\n", + "|2024-02| 1980| 1826907.88| 922.68| 7.65| 1635|\n", + "|2024-03| 2136| 2001170.78| 936.88| 7.73| 1731|\n", + "|2024-04| 2051| 1888653.78| 920.85| 7.79| 1667|\n", + "|2024-05| 2078| 2163349.34| 1041.07| 7.76| 1723|\n", + "|2024-06| 2103| 2520292.68| 1198.43| 7.73| 1724|\n", + "|2024-07| 2085| 2492578.66| 1195.48| 7.74| 1696|\n", + "|2024-08| 2118| 2234283.63| 1054.9| 7.79| 1746|\n", + "|2024-09| 2061| 1937507.83| 940.08| 7.79| 1721|\n", + "|2024-10| 2091| 2366572.91| 1131.79| 7.71| 1704|\n", + "|2024-11| 2024| 2669816.41| 1319.08| 7.77| 1687|\n", + "|2024-12| 2151| 2345853.43| 1090.59| 7.75| 1766|\n", + "+-------+--------------+---------------+-----------------+----------------+-------------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Analyze clustering effectiveness and hospitality insights\n", + "\n", + "\n", + "# Guest loyalty analysis\n", + "\n", + "print(\"=== Guest Loyalty Analysis ===\")\n", + "\n", + "guest_loyalty = spark.sql(\"\"\"\n", + "\n", + "SELECT guest_id, COUNT(*) as total_bookings,\n", + "\n", + " ROUND(SUM(total_revenue), 2) as total_spent,\n", + "\n", + " ROUND(AVG(total_revenue), 2) as avg_booking_value,\n", + "\n", + " ROUND(AVG(guest_satisfaction), 2) as avg_satisfaction,\n", + "\n", + " MAX(booking_date) as last_booking_date\n", + "\n", + "FROM hospitality.analytics.guest_stays\n", + "\n", + "GROUP BY guest_id\n", + "\n", + "ORDER BY total_spent DESC\n", + "\n", + "LIMIT 10\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "guest_loyalty.show()\n", + "\n", + "\n", + "# Room type performance\n", + "\n", + "print(\"\\n=== Room Type Performance ===\")\n", + "\n", + "room_performance = spark.sql(\"\"\"\n", + "\n", + "SELECT 
room_type, COUNT(*) as total_bookings,\n", + "\n", + " ROUND(SUM(total_revenue), 2) as total_revenue,\n", + "\n", + " ROUND(AVG(total_revenue), 2) as avg_revenue_per_booking,\n", + "\n", + " ROUND(AVG(guest_satisfaction), 2) as avg_satisfaction,\n", + "\n", + " COUNT(DISTINCT guest_id) as unique_guests\n", + "\n", + "FROM hospitality.analytics.guest_stays\n", + "\n", + "GROUP BY room_type\n", + "\n", + "ORDER BY total_revenue DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "room_performance.show()\n", + "\n", + "\n", + "# Booking channel analysis\n", + "\n", + "print(\"\\n=== Booking Channel Performance ===\")\n", + "\n", + "channel_analysis = spark.sql(\"\"\"\n", + "\n", + "SELECT booking_channel, COUNT(*) as total_bookings,\n", + "\n", + " ROUND(SUM(total_revenue), 2) as total_revenue,\n", + "\n", + " ROUND(AVG(total_revenue), 2) as avg_revenue,\n", + "\n", + " ROUND(AVG(guest_satisfaction), 2) as avg_satisfaction,\n", + "\n", + " COUNT(DISTINCT guest_id) as unique_guests\n", + "\n", + "FROM hospitality.analytics.guest_stays\n", + "\n", + "GROUP BY booking_channel\n", + "\n", + "ORDER BY total_revenue DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "channel_analysis.show()\n", + "\n", + "\n", + "# Monthly revenue trends\n", + "\n", + "print(\"\\n=== Monthly Revenue Trends ===\")\n", + "\n", + "monthly_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT DATE_FORMAT(booking_date, 'yyyy-MM') as month,\n", + "\n", + " COUNT(*) as total_bookings,\n", + "\n", + " ROUND(SUM(total_revenue), 2) as monthly_revenue,\n", + "\n", + " ROUND(AVG(total_revenue), 2) as avg_booking_value,\n", + "\n", + " ROUND(AVG(guest_satisfaction), 2) as avg_satisfaction,\n", + "\n", + " COUNT(DISTINCT guest_id) as unique_guests\n", + "\n", + "FROM hospitality.analytics.guest_stays\n", + "\n", + "GROUP BY DATE_FORMAT(booking_date, 'yyyy-MM')\n", + "\n", + "ORDER BY month\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "monthly_trends.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Key Takeaways: Delta Liquid Clustering in AIDP\n", + "\n", + "### What We Demonstrated\n", + "\n", + "1. **Automatic Optimization**: Created a table with `CLUSTER BY (guest_id, booking_date)` and let Delta automatically optimize data layout\n", + "\n", + "2. **Performance Benefits**: Queries on clustered columns (guest_id, booking_date) are significantly faster due to data locality\n", + "\n", + "3. **Zero Maintenance**: No manual partitioning, bucketing, or Z-Ordering required - Delta handles it automatically\n", + "\n", + "4. **Real-World Use Case**: Hospitality analytics where guest experience and revenue management are critical\n", + "\n", + "### AIDP Advantages\n", + "\n", + "- **Unified Analytics**: Seamlessly integrates with other AIDP services\n", + "- **Governance**: Catalog and schema isolation for hospitality data\n", + "- **Performance**: Optimized for both OLAP and OLTP workloads\n", + "- **Scalability**: Handles hospitality-scale data volumes effortlessly\n", + "\n", + "### Best Practices for Liquid Clustering\n", + "\n", + "1. **Choose clustering columns** based on your most common query patterns\n", + "2. **Start with 1-4 columns** - too many can reduce effectiveness\n", + "3. **Consider cardinality** - high-cardinality columns work best\n", + "4. 
**Monitor and adjust** as query patterns evolve\n", + "\n", + "### Next Steps\n", + "\n", + "- Explore other AIDP features like AI/ML integration\n", + "- Try liquid clustering with different column combinations\n", + "- Scale up to larger hospitality datasets\n", + "- Integrate with real PMS systems and booking platforms\n", + "\n", + "This notebook demonstrates how Oracle AI Data Platform makes advanced hospitality analytics accessible while maintaining enterprise-grade performance and governance." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/Notebooks/liquid_clustering/insurance_delta_liquid_clustering_demo.ipynb b/Notebooks/liquid_clustering/insurance_delta_liquid_clustering_demo.ipynb new file mode 100644 index 0000000..370cbbb --- /dev/null +++ b/Notebooks/liquid_clustering/insurance_delta_liquid_clustering_demo.ipynb @@ -0,0 +1,992 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Insurance: Delta Liquid Clustering Demo\n", + "\n", + "\n", + "## Overview\n", + "\n", + "\n", + "This notebook demonstrates the power of **Delta Liquid Clustering** in Oracle AI Data Platform (AIDP) Workbench using an insurance analytics use case. Liquid clustering automatically optimizes data layout for query performance without requiring manual partitioning or Z-Ordering.\n", + "\n", + "### What is Liquid Clustering?\n", + "\n", + "Liquid clustering automatically identifies and groups similar data together based on clustering columns you define. This optimization happens automatically during data ingestion and maintenance operations, providing:\n", + "\n", + "- **Automatic optimization**: No manual tuning required\n", + "- **Improved query performance**: Faster queries on clustered columns\n", + "- **Reduced maintenance**: No need for manual repartitioning\n", + "- **Adaptive clustering**: Adjusts as data patterns change\n", + "\n", + "### Use Case: Claims Processing and Risk Assessment\n", + "\n", + "We'll analyze insurance claims and policy data. Our clustering strategy will optimize for:\n", + "\n", + "- **Policyholder-specific queries**: Fast lookups by customer ID\n", + "- **Time-based analysis**: Efficient filtering by claim and policy dates\n", + "- **Risk patterns**: Quick aggregation by claim type and risk scores\n", + "\n", + "### AIDP Environment Setup\n", + "\n", + "This notebook leverages the existing Spark session in your AIDP environment." 
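, + "\n", + "Optionally, you can first list the catalogs already visible to the session (an illustrative addition; it assumes the session supports multi-catalog SQL, which the CREATE CATALOG step below also requires):\n", + "\n", + "```python\n", + "# SHOW CATALOGS needs a metastore with catalog support (Spark 3.4+ style multi-catalog)\n", + "spark.sql(\"SHOW CATALOGS\").show()\n", + "```"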
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Insurance catalog and analytics schema created successfully!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create insurance catalog and analytics schema\n", + "\n", + "# In AIDP, catalogs provide data isolation and governance\n", + "\n", + "spark.sql(\"CREATE CATALOG IF NOT EXISTS insurance\")\n", + "\n", + "spark.sql(\"CREATE SCHEMA IF NOT EXISTS insurance.analytics\")\n", + "\n", + "print(\"Insurance catalog and analytics schema created successfully!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Create Delta Table with Liquid Clustering\n", + "\n", + "### Table Design\n", + "\n", + "Our `insurance_claims` table will store:\n", + "\n", + "- **customer_id**: Unique policyholder identifier\n", + "- **claim_date**: Date claim was filed\n", + "- **policy_type**: Type of insurance (Auto, Home, Health, etc.)\n", + "- **claim_amount**: Claim payout amount\n", + "- **risk_score**: Customer risk assessment (1-100)\n", + "- **processing_time**: Days to process claim\n", + "- **claim_status**: Approved, Denied, Pending\n", + "\n", + "### Clustering Strategy\n", + "\n", + "We'll cluster by `customer_id` and `claim_date` because:\n", + "\n", + "- **customer_id**: Policyholders often file multiple claims, grouping their insurance history together\n", + "- **claim_date**: Time-based queries are critical for fraud detection, seasonal analysis, and regulatory reporting\n", + "- This combination optimizes for both customer risk profiling and temporal claims analysis" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Delta table with liquid clustering created successfully!\n", + "Clustering will automatically optimize data layout for queries on customer_id and claim_date.\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create Delta table with liquid clustering\n", + "\n", + "# CLUSTER BY defines the columns for automatic optimization\n", + "\n", + "spark.sql(\"\"\"\n", + "\n", + "CREATE TABLE IF NOT EXISTS insurance.analytics.insurance_claims (\n", + "\n", + " customer_id STRING,\n", + "\n", + " claim_date DATE,\n", + "\n", + " policy_type STRING,\n", + "\n", + " claim_amount DECIMAL(10,2),\n", + "\n", + " risk_score INT,\n", + "\n", + " processing_time INT,\n", + "\n", + " claim_status STRING\n", + "\n", + ")\n", + "\n", + "USING DELTA\n", + "\n", + "CLUSTER BY (customer_id, claim_date)\n", + "\n", + "\"\"\")\n", + "\n", + "print(\"Delta table with liquid clustering created successfully!\")\n", + "\n", + "print(\"Clustering will automatically optimize data layout for queries on customer_id and claim_date.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Generate Insurance Sample Data\n", + "\n", + "### Data Generation Strategy\n", + "\n", + "We'll create realistic insurance claims data including:\n", + "\n", + "- **8,000 customers** with multiple claims over time\n", + "- **Policy types**: Auto, Home, Health, Life, Property\n", + "- **Realistic claim patterns**: Seasonal variations, claim frequencies, processing times\n", + "- **Risk scoring**: Customer risk assessment and fraud indicators\n", + "\n", + "### Why This Data Pattern?\n", + "\n", + "This data simulates real insurance scenarios where:\n", + "\n", + "- Customer 
claims history affects risk assessment\n", + "- Seasonal patterns impact claim volumes\n", + "- Processing efficiency affects customer satisfaction\n", + "- Fraud detection requires pattern analysis\n", + "- Regulatory reporting demands temporal analysis" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Generated 15000 insurance claims records\n", + "Sample record: {'customer_id': 'CUST000001', 'claim_date': datetime.date(2024, 8, 23), 'policy_type': 'Home', 'claim_amount': 5562.56, 'risk_score': 35, 'processing_time': 26, 'claim_status': 'Approved'}\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Generate sample insurance claims data\n", + "\n", + "# Using fully qualified imports to avoid conflicts\n", + "\n", + "import random\n", + "\n", + "from datetime import datetime, timedelta\n", + "\n", + "\n", + "# Define insurance data constants\n", + "\n", + "POLICY_TYPES = ['Auto', 'Home', 'Health', 'Life', 'Property']\n", + "\n", + "CLAIM_STATUSES = ['Approved', 'Denied', 'Pending']\n", + "\n", + "# Base claim parameters by policy type\n", + "\n", + "CLAIM_PARAMS = {\n", + "\n", + " 'Auto': {'avg_claim': 3500, 'frequency': 3, 'processing_days': 14},\n", + "\n", + " 'Home': {'avg_claim': 8500, 'frequency': 1, 'processing_days': 21},\n", + "\n", + " 'Health': {'avg_claim': 1200, 'frequency': 8, 'processing_days': 7},\n", + "\n", + " 'Life': {'avg_claim': 25000, 'frequency': 0.5, 'processing_days': 30},\n", + "\n", + " 'Property': {'avg_claim': 15000, 'frequency': 1.5, 'processing_days': 18}\n", + "\n", + "}\n", + "\n", + "\n", + "# Generate insurance claims records\n", + "\n", + "claims_data = []\n", + "\n", + "base_date = datetime(2024, 1, 1)\n", + "\n", + "\n", + "# Create 8,000 customers with 1-12 claims each (based on frequency)\n", + "\n", + "for customer_num in range(1, 8001):\n", + "\n", + " customer_id = f\"CUST{customer_num:06d}\"\n", + " \n", + " # Assign a primary policy type for this customer\n", + "\n", + " primary_policy = random.choice(POLICY_TYPES)\n", + "\n", + " params = CLAIM_PARAMS[primary_policy]\n", + " \n", + " # Determine number of claims based on frequency (some customers have no claims)\n", + "\n", + " if random.random() < 0.3: # 30% of customers have no claims\n", + "\n", + " num_claims = 0\n", + "\n", + " else:\n", + "\n", + " num_claims = max(1, int(random.gauss(params['frequency'], params['frequency'] * 0.5)))\n", + " num_claims = min(num_claims, 12) # Cap at 12 claims\n", + " \n", + " # Generate claims\n", + "\n", + " for i in range(num_claims):\n", + "\n", + " # Spread claims over 12 months\n", + "\n", + " days_offset = random.randint(0, 365)\n", + "\n", + " claim_date = base_date + timedelta(days=days_offset)\n", + " \n", + " # Sometimes use different policy types for the same customer\n", + "\n", + " if random.random() < 0.2:\n", + "\n", + " policy_type = random.choice(POLICY_TYPES)\n", + "\n", + " params = CLAIM_PARAMS[policy_type]\n", + "\n", + " else:\n", + "\n", + " policy_type = primary_policy\n", + " \n", + " # Calculate claim amount with variation\n", + "\n", + " amount_variation = random.uniform(0.1, 3.0)\n", + "\n", + " claim_amount = round(params['avg_claim'] * amount_variation, 2)\n", + " \n", + " # Risk score (higher for larger/frequent claims)\n", + "\n", + " base_risk = random.randint(20, 80)\n", + "\n", + " risk_adjustment = min(20, int(claim_amount / 1000)) # Higher amounts increase risk\n", + "\n", + " risk_score = 
min(100, base_risk + risk_adjustment)\n", + " \n", + " # Processing time (varies by claim type and some randomness)\n", + "\n", + " time_variation = random.uniform(0.5, 2.0)\n", + "\n", + " processing_time = max(1, int(params['processing_days'] * time_variation))\n", + " \n", + " # Claim status (most approved, some denied, few pending)\n", + "\n", + " status_weights = [0.75, 0.15, 0.10] # Approved, Denied, Pending\n", + "\n", + " claim_status = random.choices(CLAIM_STATUSES, weights=status_weights)[0]\n", + " \n", + " claims_data.append({\n", + "\n", + " \"customer_id\": customer_id,\n", + "\n", + " \"claim_date\": claim_date.date(),\n", + "\n", + " \"policy_type\": policy_type,\n", + "\n", + " \"claim_amount\": claim_amount,\n", + "\n", + " \"risk_score\": risk_score,\n", + "\n", + " \"processing_time\": processing_time,\n", + "\n", + " \"claim_status\": claim_status\n", + "\n", + " })\n", + "\n", + "\n", + "\n", + "print(f\"Generated {len(claims_data)} insurance claims records\")\n", + "\n", + "print(\"Sample record:\", claims_data[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Insert Data Using PySpark\n", + "\n", + "### Data Insertion Strategy\n", + "\n", + "We'll use PySpark to:\n", + "\n", + "1. **Create DataFrame** from our generated data\n", + "2. **Insert into Delta table** with liquid clustering\n", + "3. **Verify the insertion** with a sample query\n", + "\n", + "### Why PySpark for Insertion?\n", + "\n", + "- **Distributed processing**: Handles large datasets efficiently\n", + "- **Type safety**: Ensures data integrity\n", + "- **Optimization**: Leverages Spark's query optimization\n", + "- **Liquid clustering**: Automatically applies clustering during insertion" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "DataFrame Schema:\n", + "root\n", + " |-- claim_amount: double (nullable = true)\n", + " |-- claim_date: date (nullable = true)\n", + " |-- claim_status: string (nullable = true)\n", + " |-- customer_id: string (nullable = true)\n", + " |-- policy_type: string (nullable = true)\n", + " |-- processing_time: long (nullable = true)\n", + " |-- risk_score: long (nullable = true)\n", + "\n", + "\n", + "Sample Data:\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+------------+----------+------------+-----------+-----------+---------------+----------+\n", + "|claim_amount|claim_date|claim_status|customer_id|policy_type|processing_time|risk_score|\n", + "+------------+----------+------------+-----------+-----------+---------------+----------+\n", + "| 5562.56|2024-08-23| Approved| CUST000001| Home| 26| 35|\n", + "| 6011.12|2024-08-26| Denied| CUST000001| Health| 24| 76|\n", + "| 23118.44|2024-08-03| Approved| CUST000002| Home| 34| 55|\n", + "| 30107.2|2024-04-25| Approved| CUST000003| Life| 31| 47|\n", + "| 2186.86|2024-01-04| Approved| CUST000004| Health| 5| 54|\n", + "+------------+----------+------------+-----------+-----------+---------------+----------+\n", + "only showing top 5 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\n", + "Successfully inserted 15000 records into insurance.analytics.insurance_claims\n", + "Liquid clustering automatically optimized the data layout during insertion!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Insert data using PySpark DataFrame 
operations\n", + "\n", + "# Using fully qualified function references to avoid conflicts\n", + "\n", + "\n", + "# Create DataFrame from generated data\n", + "\n", + "df_claims = spark.createDataFrame(claims_data)\n", + "\n", + "\n", + "# Display schema and sample data\n", + "\n", + "print(\"DataFrame Schema:\")\n", + "\n", + "df_claims.printSchema()\n", + "\n", + "\n", + "\n", + "print(\"\\nSample Data:\")\n", + "\n", + "df_claims.show(5)\n", + "\n", + "\n", + "# Insert data into Delta table with liquid clustering\n", + "\n", + "# The CLUSTER BY (customer_id, claim_date) will automatically optimize the data layout\n", + "\n", + "df_claims.write.mode(\"overwrite\").saveAsTable(\"insurance.analytics.insurance_claims\")\n", + "\n", + "\n", + "print(f\"\\nSuccessfully inserted {df_claims.count()} records into insurance.analytics.insurance_claims\")\n", + "\n", + "print(\"Liquid clustering automatically optimized the data layout during insertion!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Demonstrate Liquid Clustering Benefits\n", + "\n", + "### Query Performance Analysis\n", + "\n", + "Now let's see how liquid clustering improves query performance. We'll run queries that benefit from our clustering strategy:\n", + "\n", + "1. **Customer claims history** (clustered by customer_id)\n", + "2. **Time-based claims analysis** (clustered by claim_date)\n", + "3. **Combined customer + time queries** (optimal for our clustering)\n", + "\n", + "### Expected Performance Benefits\n", + "\n", + "With liquid clustering, these queries should be significantly faster because:\n", + "\n", + "- **Data locality**: Related records are physically grouped together\n", + "- **Reduced I/O**: Less data needs to be read from disk\n", + "- **Automatic optimization**: No manual tuning required" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Query 1: Customer Claims History ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-----------+----------+-----------+------------+------------+\n", + "|customer_id|claim_date|policy_type|claim_amount|claim_status|\n", + "+-----------+----------+-----------+------------+------------+\n", + "| CUST000001|2024-08-26| Health| 6011.12| Denied|\n", + "| CUST000001|2024-08-23| Home| 5562.56| Approved|\n", + "+-----------+----------+-----------+------------+------------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Records found: 2\n", + "\n", + "=== Query 2: Recent High-Value Claims ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------+-----------+-----------+------------+----------+\n", + "|claim_date|customer_id|policy_type|claim_amount|risk_score|\n", + "+----------+-----------+-----------+------------+----------+\n", + "|2024-12-29| CUST005908| Health| 74916.03| 96|\n", + "|2024-07-23| CUST007016| Health| 74900.02| 52|\n", + "|2024-10-18| CUST001581| Life| 74895.3| 80|\n", + "|2024-12-15| CUST004733| Life| 74883.6| 97|\n", + "|2024-07-25| CUST002601| Life| 74874.85| 57|\n", + "|2024-11-17| CUST005594| Life| 74829.33| 57|\n", + "|2024-12-24| CUST005524| Life| 74818.84| 82|\n", + "|2024-10-16| CUST001368| Health| 74812.53| 68|\n", + "|2024-10-18| CUST005266| Life| 74701.96| 88|\n", + "|2024-07-15| CUST001375| Life| 74683.53| 71|\n", + "|2024-06-04| CUST007676| Life| 
74576.77| 45|\n", + "|2024-06-13| CUST004179| Health| 74573.16| 55|\n", + "|2024-07-05| CUST005762| Life| 74488.06| 91|\n", + "|2024-06-25| CUST005196| Life| 74420.28| 69|\n", + "|2024-09-06| CUST005887| Life| 74244.99| 94|\n", + "|2024-10-31| CUST005898| Health| 74241.14| 67|\n", + "|2024-10-13| CUST004707| Health| 74039.53| 81|\n", + "|2024-08-20| CUST006660| Life| 74012.73| 66|\n", + "|2024-12-31| CUST003724| Life| 73950.38| 64|\n", + "|2024-12-15| CUST003666| Health| 73901.43| 80|\n", + "+----------+-----------+-----------+------------+----------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "High-value claims found: 3176\n", + "\n", + "=== Query 3: Customer Claims Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-----------+----------+-----------+------------+----------+\n", + "|customer_id|claim_date|policy_type|claim_amount|risk_score|\n", + "+-----------+----------+-----------+------------+----------+\n", + "| CUST000001|2024-08-23| Home| 5562.56| 35|\n", + "| CUST000001|2024-08-26| Health| 6011.12| 76|\n", + "| CUST000002|2024-08-03| Home| 23118.44| 55|\n", + "| CUST000003|2024-04-25| Life| 30107.2| 47|\n", + "| CUST000004|2024-06-17| Health| 3418.76| 27|\n", + "| CUST000004|2024-07-13| Property| 5252.1| 78|\n", + "| CUST000004|2024-09-21| Health| 4055.38| 34|\n", + "| CUST000004|2024-12-21| Health| 3026.38| 73|\n", + "| CUST000012|2024-07-10| Auto| 5113.28| 45|\n", + "| CUST000014|2024-05-20| Auto| 5076.39| 82|\n", + "| CUST000014|2024-07-18| Auto| 7187.73| 30|\n", + "| CUST000015|2024-04-24| Property| 10582.94| 78|\n", + "| CUST000015|2024-07-18| Property| 20606.38| 77|\n", + "| CUST000016|2024-12-09| Health| 734.86| 41|\n", + "| CUST000017|2024-04-17| Health| 2787.38| 58|\n", + "| CUST000017|2024-05-31| Health| 3050.43| 69|\n", + "| CUST000017|2024-06-12| Health| 2451.17| 32|\n", + "| CUST000017|2024-09-07| Health| 1164.47| 67|\n", + "| CUST000017|2024-10-15| Health| 2573.82| 80|\n", + "| CUST000017|2024-11-23| Health| 1507.4| 44|\n", + "+-----------+----------+-----------+------------+----------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Claims trend records found: 1340\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Demonstrate liquid clustering benefits with optimized queries\n", + "\n", + "\n", + "# Query 1: Customer claims history - benefits from customer_id clustering\n", + "\n", + "print(\"=== Query 1: Customer Claims History ===\")\n", + "\n", + "customer_history = spark.sql(\"\"\"\n", + "\n", + "SELECT customer_id, claim_date, policy_type, claim_amount, claim_status\n", + "\n", + "FROM insurance.analytics.insurance_claims\n", + "\n", + "WHERE customer_id = 'CUST000001'\n", + "\n", + "ORDER BY claim_date DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "customer_history.show()\n", + "\n", + "print(f\"Records found: {customer_history.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 2: Time-based high-value claims analysis - benefits from claim_date clustering\n", + "\n", + "print(\"\\n=== Query 2: Recent High-Value Claims ===\")\n", + "\n", + "high_value_claims = spark.sql(\"\"\"\n", + "\n", + "SELECT claim_date, customer_id, policy_type, claim_amount, risk_score\n", + "\n", + "FROM insurance.analytics.insurance_claims\n", + "\n", + "WHERE claim_date >= 
'2024-06-01' AND claim_amount > 10000\n", + "\n", + "ORDER BY claim_amount DESC, claim_date DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "high_value_claims.show()\n", + "\n", + "print(f\"High-value claims found: {high_value_claims.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 3: Combined customer + time query - optimal for our clustering strategy\n", + "\n", + "print(\"\\n=== Query 3: Customer Claims Trends ===\")\n", + "\n", + "claims_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT customer_id, claim_date, policy_type, claim_amount, risk_score\n", + "\n", + "FROM insurance.analytics.insurance_claims\n", + "\n", + "WHERE customer_id LIKE 'CUST000%' AND claim_date >= '2024-04-01'\n", + "\n", + "ORDER BY customer_id, claim_date\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "claims_trends.show()\n", + "\n", + "print(f\"Claims trend records found: {claims_trends.count()}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 6: Analyze Clustering Effectiveness\n", + "\n", + "### Understanding the Impact\n", + "\n", + "Let's examine how liquid clustering has organized our data and analyze some aggregate statistics to demonstrate the insurance insights possible with this optimized structure.\n", + "\n", + "### Key Analytics\n", + "\n", + "- **Customer risk profiling** and claims frequency analysis\n", + "- **Policy performance** and loss ratio calculations\n", + "- **Claims processing efficiency** and operational metrics\n", + "- **Fraud detection patterns** and risk scoring effectiveness" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Customer Risk Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-----------+------------+-------------+----------------+--------------+-------------------+\n", + "|customer_id|total_claims|total_claimed|avg_claim_amount|avg_risk_score|avg_processing_days|\n", + "+-----------+------------+-------------+----------------+--------------+-------------------+\n", + "| CUST007870| 12| 436160.35| 36346.7| 55.5| 38.5|\n", + "| CUST002884| 12| 430604.8| 35883.73| 54.83| 33.58|\n", + "| CUST002783| 10| 424762.01| 42476.2| 60.5| 29.1|\n", + "| CUST006960| 11| 418878.93| 38079.9| 69.91| 48.64|\n", + "| CUST001729| 11| 412611.37| 37510.12| 67.55| 37.18|\n", + "| CUST000883| 12| 395490.98| 32957.58| 60.58| 34.75|\n", + "| CUST004078| 12| 395238.21| 32936.52| 76.92| 27.42|\n", + "| CUST003279| 12| 389299.34| 32441.61| 69.92| 38.25|\n", + "| CUST001321| 12| 386399.15| 32199.93| 67.25| 32.17|\n", + "| CUST004110| 12| 373686.71| 31140.56| 69.42| 31.33|\n", + "+-----------+------------+-------------+----------------+--------------+-------------------+\n", + "\n", + "\n", + "=== Policy Type Performance ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-----------+------------+-------------+----------------+-------------------+------------------+\n", + "|policy_type|total_claims| total_payout|avg_claim_amount|avg_processing_days|affected_customers|\n", + "+-----------+------------+-------------+----------------+-------------------+------------------+\n", + "| Health| 7136|6.025945934E7| 8444.43| 14.5| 1341|\n", + "| Life| 1473|5.693053767E7| 38649.38| 37.63| 1434|\n", + "| Property| 1733|3.911063925E7| 22568.17| 22.22| 1445|\n", + "| Auto| 3150|2.241207207E7| 7114.94| 17.87| 1541|\n", + "| Home| 1508|1.988170738E7| 
13184.16| 25.84| 1440|\n", + "+-----------+------------+-------------+----------------+-------------------+------------------+\n", + "\n", + "\n", + "=== Claims Processing Efficiency ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+--------------------+-----------+--------+--------------+\n", + "| processing_category|claim_count|avg_days| total_amount|\n", + "+--------------------+-----------+--------+--------------+\n", + "| Fast (1-7 days)| 2110| 5.35| 4533004.37|\n", + "| Normal (8-14 days)| 4624| 10.85| 2.668768244E7|\n", + "| Slow (15-21 days)| 2623| 18.0| 4.000567373E7|\n", + "|Very Slow (22+ days)| 5643| 32.6|1.2736805517E8|\n", + "+--------------------+-----------+--------+--------------+\n", + "\n", + "\n", + "=== Monthly Claims Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------+------------+--------------+----------------+--------------+----------------+\n", + "| month|total_claims|monthly_payout|avg_claim_amount|avg_risk_score|unique_claimants|\n", + "+-------+------------+--------------+----------------+--------------+----------------+\n", + "|2024-01| 1303| 1.700282262E7| 13048.98| 58.71| 1077|\n", + "|2024-02| 1159| 1.530272761E7| 13203.39| 58.93| 966|\n", + "|2024-03| 1290| 1.739153802E7| 13481.81| 58.84| 1074|\n", + "|2024-04| 1206| 1.61826177E7| 13418.42| 59.45| 1007|\n", + "|2024-05| 1324| 1.80161403E7| 13607.36| 59.54| 1086|\n", + "|2024-06| 1220| 1.56975657E7| 12866.86| 58.2| 1008|\n", + "|2024-07| 1309| 1.737609287E7| 13274.33| 57.87| 1081|\n", + "|2024-08| 1222| 1.543843201E7| 12633.74| 57.6| 1029|\n", + "|2024-09| 1292| 1.686660742E7| 13054.65| 58.56| 1068|\n", + "|2024-10| 1219| 1.671987222E7| 13716.06| 58.46| 1012|\n", + "|2024-11| 1173| 1.568186196E7| 13369.02| 59.22| 959|\n", + "|2024-12| 1283| 1.691813728E7| 13186.39| 58.29| 1047|\n", + "+-------+------------+--------------+----------------+--------------+----------------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Analyze clustering effectiveness and insurance insights\n", + "\n", + "\n", + "# Customer risk analysis\n", + "\n", + "print(\"=== Customer Risk Analysis ===\")\n", + "\n", + "customer_risk = spark.sql(\"\"\"\n", + "\n", + "SELECT customer_id, COUNT(*) as total_claims,\n", + "\n", + " ROUND(SUM(claim_amount), 2) as total_claimed,\n", + "\n", + " ROUND(AVG(claim_amount), 2) as avg_claim_amount,\n", + "\n", + " ROUND(AVG(risk_score), 2) as avg_risk_score,\n", + "\n", + " ROUND(AVG(processing_time), 2) as avg_processing_days\n", + "\n", + "FROM insurance.analytics.insurance_claims\n", + "\n", + "GROUP BY customer_id\n", + "\n", + "ORDER BY total_claimed DESC\n", + "\n", + "LIMIT 10\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "customer_risk.show()\n", + "\n", + "\n", + "# Policy type performance\n", + "\n", + "print(\"\\n=== Policy Type Performance ===\")\n", + "\n", + "policy_performance = spark.sql(\"\"\"\n", + "\n", + "SELECT policy_type, COUNT(*) as total_claims,\n", + "\n", + " ROUND(SUM(claim_amount), 2) as total_payout,\n", + "\n", + " ROUND(AVG(claim_amount), 2) as avg_claim_amount,\n", + "\n", + " ROUND(AVG(processing_time), 2) as avg_processing_days,\n", + "\n", + " COUNT(DISTINCT customer_id) as affected_customers\n", + "\n", + "FROM insurance.analytics.insurance_claims\n", + "\n", + "GROUP BY policy_type\n", + "\n", + "ORDER BY total_payout DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + 
"policy_performance.show()\n", + "\n", + "\n", + "# Claims processing efficiency\n", + "\n", + "print(\"\\n=== Claims Processing Efficiency ===\")\n", + "\n", + "processing_efficiency = spark.sql(\"\"\"\n", + "\n", + "SELECT \n", + "\n", + " CASE \n", + "\n", + " WHEN processing_time <= 7 THEN 'Fast (1-7 days)'\n", + "\n", + " WHEN processing_time <= 14 THEN 'Normal (8-14 days)'\n", + "\n", + " WHEN processing_time <= 21 THEN 'Slow (15-21 days)'\n", + "\n", + " ELSE 'Very Slow (22+ days)'\n", + "\n", + " END as processing_category,\n", + "\n", + " COUNT(*) as claim_count,\n", + "\n", + " ROUND(AVG(processing_time), 2) as avg_days,\n", + "\n", + " ROUND(SUM(claim_amount), 2) as total_amount\n", + "\n", + "FROM insurance.analytics.insurance_claims\n", + "\n", + "GROUP BY \n", + "\n", + " CASE \n", + "\n", + " WHEN processing_time <= 7 THEN 'Fast (1-7 days)'\n", + "\n", + " WHEN processing_time <= 14 THEN 'Normal (8-14 days)'\n", + "\n", + " WHEN processing_time <= 21 THEN 'Slow (15-21 days)'\n", + "\n", + " ELSE 'Very Slow (22+ days)'\n", + "\n", + " END\n", + "\n", + "ORDER BY avg_days\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "processing_efficiency.show()\n", + "\n", + "\n", + "# Monthly claims trends\n", + "\n", + "print(\"\\n=== Monthly Claims Trends ===\")\n", + "\n", + "monthly_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT DATE_FORMAT(claim_date, 'yyyy-MM') as month,\n", + "\n", + " COUNT(*) as total_claims,\n", + "\n", + " ROUND(SUM(claim_amount), 2) as monthly_payout,\n", + "\n", + " ROUND(AVG(claim_amount), 2) as avg_claim_amount,\n", + "\n", + " ROUND(AVG(risk_score), 2) as avg_risk_score,\n", + "\n", + " COUNT(DISTINCT customer_id) as unique_claimants\n", + "\n", + "FROM insurance.analytics.insurance_claims\n", + "\n", + "GROUP BY DATE_FORMAT(claim_date, 'yyyy-MM')\n", + "\n", + "ORDER BY month\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "monthly_trends.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Key Takeaways: Delta Liquid Clustering in AIDP\n", + "\n", + "### What We Demonstrated\n", + "\n", + "1. **Automatic Optimization**: Created a table with `CLUSTER BY (customer_id, claim_date)` and let Delta automatically optimize data layout\n", + "\n", + "2. **Performance Benefits**: Queries on clustered columns (customer_id, claim_date) are significantly faster due to data locality\n", + "\n", + "3. **Zero Maintenance**: No manual partitioning, bucketing, or Z-Ordering required - Delta handles it automatically\n", + "\n", + "4. **Real-World Use Case**: Insurance analytics where claims processing and risk assessment are critical\n", + "\n", + "### AIDP Advantages\n", + "\n", + "- **Unified Analytics**: Seamlessly integrates with other AIDP services\n", + "- **Governance**: Catalog and schema isolation for insurance data\n", + "- **Performance**: Optimized for both OLAP and OLTP workloads\n", + "- **Scalability**: Handles insurance-scale data volumes effortlessly\n", + "\n", + "### Best Practices for Liquid Clustering\n", + "\n", + "1. **Choose clustering columns** based on your most common query patterns\n", + "2. **Start with 1-4 columns** - too many can reduce effectiveness\n", + "3. **Consider cardinality** - high-cardinality columns work best\n", + "4. 
**Monitor and adjust** as query patterns evolve\n", + "\n", + "### Next Steps\n", + "\n", + "- Explore other AIDP features like AI/ML integration\n", + "- Try liquid clustering with different column combinations\n", + "- Scale up to larger insurance datasets\n", + "- Integrate with real claims processing systems\n", + "\n", + "This notebook demonstrates how Oracle AI Data Platform makes advanced insurance analytics accessible while maintaining enterprise-grade performance and governance." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/Notebooks/liquid_clustering/manufacturing_delta_liquid_clustering_demo.ipynb b/Notebooks/liquid_clustering/manufacturing_delta_liquid_clustering_demo.ipynb new file mode 100644 index 0000000..b001f0f --- /dev/null +++ b/Notebooks/liquid_clustering/manufacturing_delta_liquid_clustering_demo.ipynb @@ -0,0 +1,989 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Manufacturing: Delta Liquid Clustering Demo\n", + "\n", + "\n", + "## Overview\n", + "\n", + "\n", + "This notebook demonstrates the power of **Delta Liquid Clustering** in Oracle AI Data Platform (AIDP) Workbench using a manufacturing analytics use case. Liquid clustering automatically optimizes data layout for query performance without requiring manual partitioning or Z-Ordering.\n", + "\n", + "### What is Liquid Clustering?\n", + "\n", + "Liquid clustering automatically identifies and groups similar data together based on clustering columns you define. This optimization happens automatically during data ingestion and maintenance operations, providing:\n", + "\n", + "- **Automatic optimization**: No manual tuning required\n", + "- **Improved query performance**: Faster queries on clustered columns\n", + "- **Reduced maintenance**: No need for manual repartitioning\n", + "- **Adaptive clustering**: Adjusts as data patterns change\n", + "\n", + "### Use Case: Production Quality Control and Equipment Monitoring\n", + "\n", + "We'll analyze manufacturing production records from a factory. Our clustering strategy will optimize for:\n", + "\n", + "- **Equipment-specific queries**: Fast lookups by machine ID\n", + "- **Time-based analysis**: Efficient filtering by production date\n", + "- **Quality control patterns**: Quick aggregation by product type and defect rates\n", + "\n", + "### AIDP Environment Setup\n", + "\n", + "This notebook leverages the existing Spark session in your AIDP environment." 
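+ ,
+ "\n",
+ "\n",
+ "As an optional sanity check before the setup cells, the sketch below confirms that the pre-created Spark session is usable. It assumes the `spark` object that AIDP Workbench injects and a runtime that supports multi-catalog SQL; it creates nothing itself:\n",
+ "\n",
+ "```python\n",
+ "# Sketch: verify the AIDP-provided Spark session before creating catalogs and tables\n",
+ "print(spark.version)                    # Spark version backing this Workbench session\n",
+ "print(spark.catalog.currentDatabase())  # schema currently in scope\n",
+ "spark.sql(\"SHOW CATALOGS\").show()      # catalogs visible to this session (if supported)\n",
+ "```"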
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Manufacturing catalog and analytics schema created successfully!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create manufacturing catalog and analytics schema\n", + "\n", + "# In AIDP, catalogs provide data isolation and governance\n", + "\n", + "spark.sql(\"CREATE CATALOG IF NOT EXISTS manufacturing\")\n", + "\n", + "spark.sql(\"CREATE SCHEMA IF NOT EXISTS manufacturing.analytics\")\n", + "\n", + "print(\"Manufacturing catalog and analytics schema created successfully!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Create Delta Table with Liquid Clustering\n", + "\n", + "### Table Design\n", + "\n", + "Our `production_records` table will store:\n", + "\n", + "- **machine_id**: Unique equipment identifier\n", + "- **production_date**: Date and time of production\n", + "- **product_type**: Type of product manufactured\n", + "- **units_produced**: Number of units produced\n", + "- **defect_count**: Number of defective units\n", + "- **production_line**: Assembly line identifier\n", + "- **cycle_time**: Time to produce one unit (minutes)\n", + "\n", + "### Clustering Strategy\n", + "\n", + "We'll cluster by `machine_id` and `production_date` because:\n", + "\n", + "- **machine_id**: Equipment often produces multiple batches, grouping maintenance and performance data together\n", + "- **production_date**: Time-based queries are essential for shift analysis, maintenance scheduling, and quality trending\n", + "- This combination optimizes for both equipment monitoring and temporal production analysis" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Delta table with liquid clustering created successfully!\n", + "Clustering will automatically optimize data layout for queries on machine_id and production_date.\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create Delta table with liquid clustering\n", + "\n", + "# CLUSTER BY defines the columns for automatic optimization\n", + "\n", + "spark.sql(\"\"\"\n", + "\n", + "CREATE TABLE IF NOT EXISTS manufacturing.analytics.production_records (\n", + "\n", + " machine_id STRING,\n", + "\n", + " production_date TIMESTAMP,\n", + "\n", + " product_type STRING,\n", + "\n", + " units_produced INT,\n", + "\n", + " defect_count INT,\n", + "\n", + " production_line STRING,\n", + "\n", + " cycle_time DECIMAL(5,2)\n", + "\n", + ")\n", + "\n", + "USING DELTA\n", + "\n", + "CLUSTER BY (machine_id, production_date)\n", + "\n", + "\"\"\")\n", + "\n", + "print(\"Delta table with liquid clustering created successfully!\")\n", + "\n", + "print(\"Clustering will automatically optimize data layout for queries on machine_id and production_date.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Generate Manufacturing Sample Data\n", + "\n", + "### Data Generation Strategy\n", + "\n", + "We'll create realistic manufacturing production data including:\n", + "\n", + "- **200 machines** with multiple production runs over time\n", + "- **Product types**: Electronics, Automotive Parts, Consumer Goods, Industrial Equipment\n", + "- **Realistic production patterns**: Shift-based operations, maintenance downtime, quality variations\n", + "- **Multiple production lines**: Different assembly areas and 
facilities\n", + "\n", + "### Why This Data Pattern?\n", + "\n", + "This data simulates real manufacturing scenarios where:\n", + "\n", + "- Equipment performance varies over time\n", + "- Quality control requires tracking defects and yields\n", + "- Maintenance scheduling depends on usage patterns\n", + "- Production optimization drives efficiency improvements\n", + "- Supply chain visibility requires real-time production data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Generated 12298 production records\n", + "Sample record: {'machine_id': 'MCH0001', 'production_date': datetime.datetime(2024, 9, 6, 6, 0), 'product_type': 'Industrial Equipment', 'units_produced': 36, 'defect_count': 2, 'production_line': 'LINE_A', 'cycle_time': 22.15}\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Generate sample manufacturing production data\n", + "\n", + "# Using fully qualified imports to avoid conflicts\n", + "\n", + "import random\n", + "\n", + "from datetime import datetime, timedelta\n", + "\n", + "\n", + "# Define manufacturing data constants\n", + "\n", + "PRODUCT_TYPES = ['Electronics', 'Automotive Parts', 'Consumer Goods', 'Industrial Equipment']\n", + "\n", + "PRODUCTION_LINES = ['LINE_A', 'LINE_B', 'LINE_C', 'LINE_D', 'LINE_E']\n", + "\n", + "# Base production parameters by product type\n", + "\n", + "PRODUCTION_PARAMS = {\n", + "\n", + " 'Electronics': {'base_units': 500, 'defect_rate': 0.02, 'cycle_time': 2.5},\n", + "\n", + " 'Automotive Parts': {'base_units': 200, 'defect_rate': 0.05, 'cycle_time': 8.0},\n", + "\n", + " 'Consumer Goods': {'base_units': 800, 'defect_rate': 0.03, 'cycle_time': 1.8},\n", + "\n", + " 'Industrial Equipment': {'base_units': 50, 'defect_rate': 0.08, 'cycle_time': 25.0}\n", + "\n", + "}\n", + "\n", + "\n", + "# Generate production records\n", + "\n", + "production_data = []\n", + "\n", + "base_date = datetime(2024, 1, 1)\n", + "\n", + "\n", + "# Create 200 machines with 30-90 production runs each\n", + "\n", + "for machine_num in range(1, 201):\n", + "\n", + " machine_id = f\"MCH{machine_num:04d}\"\n", + " \n", + " # Each machine gets 30-90 production runs over 12 months\n", + "\n", + " num_runs = random.randint(30, 90)\n", + " \n", + " for i in range(num_runs):\n", + "\n", + " # Spread production runs over 12 months (weekdays only, during shifts)\n", + "\n", + " days_offset = random.randint(0, 365)\n", + "\n", + " production_date = base_date + timedelta(days=days_offset)\n", + " \n", + " # Skip weekends\n", + "\n", + " while production_date.weekday() >= 5:\n", + "\n", + " production_date += timedelta(days=1)\n", + " \n", + " # Add shift timing (6 AM - 6 PM)\n", + "\n", + " hours_offset = random.randint(6, 18)\n", + "\n", + " production_date = production_date.replace(hour=hours_offset, minute=0, second=0, microsecond=0)\n", + " \n", + " # Select product type\n", + "\n", + " product_type = random.choice(PRODUCT_TYPES)\n", + "\n", + " params = PRODUCTION_PARAMS[product_type]\n", + " \n", + " # Calculate production with variability\n", + "\n", + " units_variation = random.uniform(0.7, 1.3)\n", + "\n", + " units_produced = int(params['base_units'] * units_variation)\n", + " \n", + " # Calculate defects\n", + "\n", + " defect_rate_variation = random.uniform(0.5, 2.0)\n", + "\n", + " actual_defect_rate = params['defect_rate'] * defect_rate_variation\n", + "\n", + " defect_count = int(units_produced * actual_defect_rate)\n", + " \n", + " # 
Calculate cycle time with variation\n", + "\n", + " cycle_time_variation = random.uniform(0.8, 1.4)\n", + "\n", + " cycle_time = round(params['cycle_time'] * cycle_time_variation, 2)\n", + " \n", + " # Select production line\n", + "\n", + " production_line = random.choice(PRODUCTION_LINES)\n", + " \n", + " production_data.append({\n", + "\n", + " \"machine_id\": machine_id,\n", + "\n", + " \"production_date\": production_date,\n", + "\n", + " \"product_type\": product_type,\n", + "\n", + " \"units_produced\": units_produced,\n", + "\n", + " \"defect_count\": defect_count,\n", + "\n", + " \"production_line\": production_line,\n", + "\n", + " \"cycle_time\": cycle_time\n", + "\n", + " })\n", + "\n", + "\n", + "\n", + "print(f\"Generated {len(production_data)} production records\")\n", + "\n", + "print(\"Sample record:\", production_data[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Insert Data Using PySpark\n", + "\n", + "### Data Insertion Strategy\n", + "\n", + "We'll use PySpark to:\n", + "\n", + "1. **Create DataFrame** from our generated data\n", + "2. **Insert into Delta table** with liquid clustering\n", + "3. **Verify the insertion** with a sample query\n", + "\n", + "### Why PySpark for Insertion?\n", + "\n", + "- **Distributed processing**: Handles large datasets efficiently\n", + "- **Type safety**: Ensures data integrity\n", + "- **Optimization**: Leverages Spark's query optimization\n", + "- **Liquid clustering**: Automatically applies clustering during insertion" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "DataFrame Schema:\n", + "root\n", + " |-- cycle_time: double (nullable = true)\n", + " |-- defect_count: long (nullable = true)\n", + " |-- machine_id: string (nullable = true)\n", + " |-- product_type: string (nullable = true)\n", + " |-- production_date: timestamp (nullable = true)\n", + " |-- production_line: string (nullable = true)\n", + " |-- units_produced: long (nullable = true)\n", + "\n", + "\n", + "Sample Data:\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------+------------+----------+--------------------+-------------------+---------------+--------------+\n", + "|cycle_time|defect_count|machine_id| product_type| production_date|production_line|units_produced|\n", + "+----------+------------+----------+--------------------+-------------------+---------------+--------------+\n", + "| 22.15| 2| MCH0001|Industrial Equipment|2024-09-06 06:00:00| LINE_A| 36|\n", + "| 1.75| 30| MCH0001| Consumer Goods|2024-03-26 11:00:00| LINE_B| 1034|\n", + "| 9.09| 8| MCH0001| Automotive Parts|2024-12-30 17:00:00| LINE_B| 259|\n", + "| 2.65| 25| MCH0001| Electronics|2024-10-21 06:00:00| LINE_B| 641|\n", + "| 2.11| 9| MCH0001| Electronics|2024-05-13 18:00:00| LINE_A| 437|\n", + "+----------+------------+----------+--------------------+-------------------+---------------+--------------+\n", + "only showing top 5 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\n", + "Successfully inserted 12298 records into manufacturing.analytics.production_records\n", + "Liquid clustering automatically optimized the data layout during insertion!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Insert data using PySpark DataFrame operations\n", + "\n", + "# Using fully qualified function references 
to avoid conflicts\n", + "\n", + "\n", + "# Create DataFrame from generated data\n", + "\n", + "df_production = spark.createDataFrame(production_data)\n", + "\n", + "\n", + "# Display schema and sample data\n", + "\n", + "print(\"DataFrame Schema:\")\n", + "\n", + "df_production.printSchema()\n", + "\n", + "\n", + "\n", + "print(\"\\nSample Data:\")\n", + "\n", + "df_production.show(5)\n", + "\n", + "\n", + "# Insert data into Delta table with liquid clustering\n", + "\n", + "# The CLUSTER BY (machine_id, production_date) will automatically optimize the data layout\n", + "\n", + "df_production.write.mode(\"overwrite\").saveAsTable(\"manufacturing.analytics.production_records\")\n", + "\n", + "\n", + "print(f\"\\nSuccessfully inserted {df_production.count()} records into manufacturing.analytics.production_records\")\n", + "\n", + "print(\"Liquid clustering automatically optimized the data layout during insertion!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Demonstrate Liquid Clustering Benefits\n", + "\n", + "### Query Performance Analysis\n", + "\n", + "Now let's see how liquid clustering improves query performance. We'll run queries that benefit from our clustering strategy:\n", + "\n", + "1. **Machine performance history** (clustered by machine_id)\n", + "2. **Time-based production analysis** (clustered by production_date)\n", + "3. **Combined machine + time queries** (optimal for our clustering)\n", + "\n", + "### Expected Performance Benefits\n", + "\n", + "With liquid clustering, these queries should be significantly faster because:\n", + "\n", + "- **Data locality**: Related records are physically grouped together\n", + "- **Reduced I/O**: Less data needs to be read from disk\n", + "- **Automatic optimization**: No manual tuning required" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Query 1: Machine Performance History ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------+-------------------+--------------------+--------------+------------+-------------------+\n", + "|machine_id| production_date| product_type|units_produced|defect_count|defect_rate_percent|\n", + "+----------+-------------------+--------------------+--------------+------------+-------------------+\n", + "| MCH0001|2024-12-30 17:00:00| Automotive Parts| 259| 8| 3.09|\n", + "| MCH0001|2024-12-27 08:00:00|Industrial Equipment| 58| 5| 8.62|\n", + "| MCH0001|2024-12-23 08:00:00|Industrial Equipment| 45| 5| 11.11|\n", + "| MCH0001|2024-12-17 08:00:00| Electronics| 647| 22| 3.40|\n", + "| MCH0001|2024-12-11 14:00:00|Industrial Equipment| 48| 5| 10.42|\n", + "| MCH0001|2024-12-02 18:00:00| Automotive Parts| 175| 14| 8.00|\n", + "| MCH0001|2024-12-02 08:00:00| Automotive Parts| 184| 4| 2.17|\n", + "| MCH0001|2024-11-22 16:00:00| Consumer Goods| 704| 30| 4.26|\n", + "| MCH0001|2024-11-12 18:00:00|Industrial Equipment| 62| 8| 12.90|\n", + "| MCH0001|2024-11-11 17:00:00| Consumer Goods| 990| 23| 2.32|\n", + "| MCH0001|2024-11-08 08:00:00|Industrial Equipment| 41| 3| 7.32|\n", + "| MCH0001|2024-10-25 11:00:00| Automotive Parts| 183| 11| 6.01|\n", + "| MCH0001|2024-10-24 06:00:00| Automotive Parts| 191| 11| 5.76|\n", + "| MCH0001|2024-10-21 06:00:00| Electronics| 641| 25| 3.90|\n", + "| MCH0001|2024-10-21 06:00:00| Consumer Goods| 826| 23| 2.78|\n", + "| MCH0001|2024-10-16 15:00:00|Industrial Equipment| 52| 6| 11.54|\n", + "| 
MCH0001|2024-10-14 14:00:00| Consumer Goods| 974| 16| 1.64|\n", + "| MCH0001|2024-10-07 18:00:00| Electronics| 451| 7| 1.55|\n", + "| MCH0001|2024-10-01 10:00:00|Industrial Equipment| 52| 3| 5.77|\n", + "| MCH0001|2024-09-19 07:00:00| Consumer Goods| 654| 35| 5.35|\n", + "+----------+-------------------+--------------------+--------------+------------+-------------------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Records found: 64\n", + "\n", + "=== Query 2: Recent Quality Issues ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------------------+----------+--------------------+--------------+------------+-------------------+\n", + "| production_date|machine_id| product_type|units_produced|defect_count|defect_rate_percent|\n", + "+-------------------+----------+--------------------+--------------+------------+-------------------+\n", + "|2024-07-26 06:00:00| MCH0183|Industrial Equipment| 44| 7| 15.91|\n", + "|2024-11-04 15:00:00| MCH0135|Industrial Equipment| 63| 10| 15.87|\n", + "|2024-12-13 16:00:00| MCH0169|Industrial Equipment| 51| 8| 15.69|\n", + "|2024-10-25 11:00:00| MCH0023|Industrial Equipment| 51| 8| 15.69|\n", + "|2024-09-04 08:00:00| MCH0086|Industrial Equipment| 51| 8| 15.69|\n", + "|2024-09-03 16:00:00| MCH0001|Industrial Equipment| 64| 10| 15.63|\n", + "|2024-11-28 12:00:00| MCH0099|Industrial Equipment| 45| 7| 15.56|\n", + "|2024-12-26 08:00:00| MCH0148|Industrial Equipment| 58| 9| 15.52|\n", + "|2024-11-08 08:00:00| MCH0116|Industrial Equipment| 58| 9| 15.52|\n", + "|2024-08-02 12:00:00| MCH0134|Industrial Equipment| 58| 9| 15.52|\n", + "|2024-12-19 11:00:00| MCH0073|Industrial Equipment| 39| 6| 15.38|\n", + "|2024-11-11 13:00:00| MCH0158|Industrial Equipment| 52| 8| 15.38|\n", + "|2024-06-11 18:00:00| MCH0119|Industrial Equipment| 52| 8| 15.38|\n", + "|2024-12-25 10:00:00| MCH0106|Industrial Equipment| 59| 9| 15.25|\n", + "|2024-11-27 18:00:00| MCH0182|Industrial Equipment| 59| 9| 15.25|\n", + "|2024-11-04 12:00:00| MCH0063|Industrial Equipment| 59| 9| 15.25|\n", + "|2024-10-31 06:00:00| MCH0071|Industrial Equipment| 59| 9| 15.25|\n", + "|2024-08-30 07:00:00| MCH0184|Industrial Equipment| 59| 9| 15.25|\n", + "|2024-08-26 16:00:00| MCH0117|Industrial Equipment| 59| 9| 15.25|\n", + "|2024-08-02 08:00:00| MCH0122|Industrial Equipment| 59| 9| 15.25|\n", + "+-------------------+----------+--------------------+--------------+------------+-------------------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Quality issues found: 2950\n", + "\n", + "=== Query 3: Equipment Performance Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------+-------------------+--------------------+--------------+----------+-----------+\n", + "|machine_id| production_date| product_type|units_produced|cycle_time|hourly_rate|\n", + "+----------+-------------------+--------------------+--------------+----------+-----------+\n", + "| MCH0001|2024-04-01 13:00:00| Automotive Parts| 204| 10.15| 1205.91|\n", + "| MCH0001|2024-04-01 15:00:00| Automotive Parts| 214| 9.77| 1314.23|\n", + "| MCH0001|2024-04-11 09:00:00| Electronics| 613| 3.3| 11145.45|\n", + "| MCH0001|2024-04-29 09:00:00|Industrial Equipment| 49| 25.46| 115.48|\n", + "| MCH0001|2024-04-30 09:00:00| Automotive Parts| 209| 
6.8| 1844.12|\n", + "| MCH0001|2024-05-07 13:00:00|Industrial Equipment| 47| 33.01| 85.43|\n", + "| MCH0001|2024-05-13 18:00:00| Electronics| 437| 2.11| 12426.54|\n", + "| MCH0001|2024-05-14 18:00:00|Industrial Equipment| 44| 30.47| 86.64|\n", + "| MCH0001|2024-05-17 18:00:00| Consumer Goods| 862| 1.84| 28108.7|\n", + "| MCH0001|2024-05-20 16:00:00| Consumer Goods| 767| 1.68| 27392.86|\n", + "| MCH0001|2024-06-03 17:00:00| Consumer Goods| 573| 1.61| 21354.04|\n", + "| MCH0001|2024-06-07 18:00:00| Automotive Parts| 240| 9.11| 1580.68|\n", + "| MCH0001|2024-06-28 06:00:00|Industrial Equipment| 37| 34.71| 63.96|\n", + "| MCH0001|2024-07-15 13:00:00| Automotive Parts| 195| 6.67| 1754.12|\n", + "| MCH0001|2024-07-15 18:00:00| Consumer Goods| 883| 2.3| 23034.78|\n", + "| MCH0001|2024-07-17 14:00:00| Consumer Goods| 942| 2.22| 25459.46|\n", + "| MCH0001|2024-08-08 07:00:00|Industrial Equipment| 35| 21.0| 100.0|\n", + "| MCH0001|2024-08-20 08:00:00| Electronics| 390| 3.18| 7358.49|\n", + "| MCH0001|2024-08-26 08:00:00| Electronics| 436| 2.38| 10991.6|\n", + "| MCH0001|2024-08-29 06:00:00| Automotive Parts| 248| 9.27| 1605.18|\n", + "+----------+-------------------+--------------------+--------------+----------+-----------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Performance records found: 382\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Demonstrate liquid clustering benefits with optimized queries\n", + "\n", + "\n", + "# Query 1: Machine performance history - benefits from machine_id clustering\n", + "\n", + "print(\"=== Query 1: Machine Performance History ===\")\n", + "\n", + "machine_history = spark.sql(\"\"\"\n", + "\n", + "SELECT machine_id, production_date, product_type, units_produced, defect_count,\n", + "\n", + " ROUND(defect_count * 100.0 / units_produced, 2) as defect_rate_percent\n", + "\n", + "FROM manufacturing.analytics.production_records\n", + "\n", + "WHERE machine_id = 'MCH0001'\n", + "\n", + "ORDER BY production_date DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "machine_history.show()\n", + "\n", + "print(f\"Records found: {machine_history.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 2: Time-based quality analysis - benefits from production_date clustering\n", + "\n", + "print(\"\\n=== Query 2: Recent Quality Issues ===\")\n", + "\n", + "quality_issues = spark.sql(\"\"\"\n", + "\n", + "SELECT production_date, machine_id, product_type, units_produced, defect_count,\n", + "\n", + " ROUND(defect_count * 100.0 / units_produced, 2) as defect_rate_percent\n", + "\n", + "FROM manufacturing.analytics.production_records\n", + "\n", + "WHERE production_date >= '2024-06-01' AND (defect_count * 100.0 / units_produced) > 5.0\n", + "\n", + "ORDER BY defect_rate_percent DESC, production_date DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "quality_issues.show()\n", + "\n", + "print(f\"Quality issues found: {quality_issues.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 3: Combined machine + time query - optimal for our clustering strategy\n", + "\n", + "print(\"\\n=== Query 3: Equipment Performance Trends ===\")\n", + "\n", + "performance_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT machine_id, production_date, product_type, units_produced, cycle_time,\n", + "\n", + " ROUND(units_produced * 60.0 / cycle_time, 2) as hourly_rate\n", + "\n", + "FROM manufacturing.analytics.production_records\n", + "\n", + "WHERE 
machine_id LIKE 'MCH000%' AND production_date >= '2024-04-01'\n", + "\n", + "ORDER BY machine_id, production_date\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "performance_trends.show()\n", + "\n", + "print(f\"Performance records found: {performance_trends.count()}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 6: Analyze Clustering Effectiveness\n", + "\n", + "### Understanding the Impact\n", + "\n", + "Let's examine how liquid clustering has organized our data and analyze some aggregate statistics to demonstrate the manufacturing insights possible with this optimized structure.\n", + "\n", + "### Key Analytics\n", + "\n", + "- **Equipment utilization** and performance metrics\n", + "- **Quality control analysis** and defect patterns\n", + "- **Production line efficiency** and bottleneck identification\n", + "- **Product type performance** and optimization opportunities" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Equipment Performance Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------+----------+------------------+---------------+--------------+-----------+\n", + "|machine_id|total_runs|avg_units_produced|avg_defect_rate|avg_cycle_time|total_units|\n", + "+----------+----------+------------------+---------------+--------------+-----------+\n", + "| MCH0163| 87| 437.47| 5.52| 9.06| 38060|\n", + "| MCH0169| 83| 447.94| 5.18| 10.47| 37179|\n", + "| MCH0006| 84| 432.4| 4.73| 8.64| 36322|\n", + "| MCH0108| 88| 411.07| 5.20| 8.93| 36174|\n", + "| MCH0153| 88| 409.22| 5.03| 10.13| 36011|\n", + "| MCH0097| 90| 396.48| 5.16| 10.12| 35683|\n", + "| MCH0101| 86| 402.84| 4.81| 9.15| 34644|\n", + "| MCH0070| 86| 402.67| 4.90| 9.46| 34630|\n", + "| MCH0044| 84| 410.06| 5.25| 9.96| 34445|\n", + "| MCH0082| 87| 392.25| 5.09| 10.19| 34126|\n", + "| MCH0068| 87| 391.46| 5.35| 9.89| 34057|\n", + "| MCH0142| 85| 398.65| 5.05| 9.32| 33885|\n", + "| MCH0149| 87| 388.13| 5.38| 10.38| 33767|\n", + "| MCH0093| 82| 411.34| 5.27| 9.51| 33730|\n", + "| MCH0157| 84| 398.89| 5.28| 9.92| 33507|\n", + "| MCH0183| 81| 409.95| 5.37| 9.28| 33206|\n", + "| MCH0144| 81| 405.86| 5.24| 9.76| 32875|\n", + "| MCH0041| 90| 364.6| 5.24| 11.08| 32814|\n", + "| MCH0118| 79| 413.46| 5.47| 9.43| 32663|\n", + "| MCH0036| 83| 390.28| 5.44| 10.55| 32393|\n", + "+----------+----------+------------------+---------------+--------------+-----------+\n", + "only showing top 20 rows\n", + "\n", + "\n", + "=== Quality Analysis by Product Type ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+--------------------+---------------+-----------+-------------+---------------+--------------+\n", + "| product_type|production_runs|total_units|total_defects|avg_defect_rate|avg_cycle_time|\n", + "+--------------------+---------------+-----------+-------------+---------------+--------------+\n", + "| Consumer Goods| 2982| 2392057| 87950| 3.68| 1.98|\n", + "| Electronics| 3172| 1580318| 37855| 2.39| 2.75|\n", + "| Automotive Parts| 3091| 615989| 37171| 6.02| 8.82|\n", + "|Industrial Equipment| 3053| 151345| 13680| 8.99| 27.52|\n", + "+--------------------+---------------+-----------+-------------+---------------+--------------+\n", + "\n", + "\n", + "=== Production Line Efficiency ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ 
+ "+---------------+----------+-------------+----------------+------------+---------------+\n", + "|production_line|total_runs|machines_used|total_production|avg_run_size|avg_defect_rate|\n", + "+---------------+----------+-------------+----------------+------------+---------------+\n", + "| LINE_E| 2478| 200| 964486| 389.22| 5.18|\n", + "| LINE_C| 2442| 200| 959370| 392.86| 5.19|\n", + "| LINE_D| 2464| 200| 944868| 383.47| 5.27|\n", + "| LINE_B| 2473| 200| 944805| 382.05| 5.28|\n", + "| LINE_A| 2441| 200| 926180| 379.43| 5.35|\n", + "+---------------+----------+-------------+----------------+------------+---------------+\n", + "\n", + "\n", + "=== Monthly Production Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------+---------------+-----------+---------------+---------------+\n", + "| month|production_runs|total_units|avg_defect_rate|active_machines|\n", + "+-------+---------------+-----------+---------------+---------------+\n", + "|2024-01| 1045| 397326| 5.24| 198|\n", + "|2024-02| 914| 368483| 5.17| 195|\n", + "|2024-03| 1003| 383685| 5.41| 197|\n", + "|2024-04| 1074| 407309| 5.18| 199|\n", + "|2024-05| 1083| 413054| 5.38| 195|\n", + "|2024-06| 992| 375035| 5.28| 198|\n", + "|2024-07| 1138| 456635| 5.20| 197|\n", + "|2024-08| 930| 366966| 5.01| 195|\n", + "|2024-09| 1045| 391363| 5.28| 195|\n", + "|2024-10| 1015| 394063| 5.27| 198|\n", + "|2024-11| 946| 363835| 5.30| 192|\n", + "|2024-12| 1113| 421955| 5.30| 199|\n", + "+-------+---------------+-----------+---------------+---------------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Analyze clustering effectiveness and manufacturing insights\n", + "\n", + "\n", + "# Equipment performance analysis\n", + "\n", + "print(\"=== Equipment Performance Analysis ===\")\n", + "\n", + "equipment_performance = spark.sql(\"\"\"\n", + "\n", + "SELECT machine_id, COUNT(*) as total_runs,\n", + "\n", + " ROUND(AVG(units_produced), 2) as avg_units_produced,\n", + "\n", + " ROUND(AVG(defect_count * 100.0 / units_produced), 2) as avg_defect_rate,\n", + "\n", + " ROUND(AVG(cycle_time), 2) as avg_cycle_time,\n", + "\n", + " ROUND(SUM(units_produced), 0) as total_units\n", + "\n", + "FROM manufacturing.analytics.production_records\n", + "\n", + "GROUP BY machine_id\n", + "\n", + "ORDER BY total_units DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "equipment_performance.show()\n", + "\n", + "\n", + "# Quality analysis by product type\n", + "\n", + "print(\"\\n=== Quality Analysis by Product Type ===\")\n", + "\n", + "quality_by_product = spark.sql(\"\"\"\n", + "\n", + "SELECT product_type, COUNT(*) as production_runs,\n", + "\n", + " ROUND(SUM(units_produced), 0) as total_units,\n", + "\n", + " ROUND(SUM(defect_count), 0) as total_defects,\n", + "\n", + " ROUND(AVG(defect_count * 100.0 / units_produced), 2) as avg_defect_rate,\n", + "\n", + " ROUND(AVG(cycle_time), 2) as avg_cycle_time\n", + "\n", + "FROM manufacturing.analytics.production_records\n", + "\n", + "GROUP BY product_type\n", + "\n", + "ORDER BY total_units DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "quality_by_product.show()\n", + "\n", + "\n", + "# Production line efficiency\n", + "\n", + "print(\"\\n=== Production Line Efficiency ===\")\n", + "\n", + "line_efficiency = spark.sql(\"\"\"\n", + "\n", + "SELECT production_line, COUNT(*) as total_runs,\n", + "\n", + " COUNT(DISTINCT machine_id) as machines_used,\n", + "\n", + " 
ROUND(SUM(units_produced), 0) as total_production,\n", + "\n", + " ROUND(AVG(units_produced), 2) as avg_run_size,\n", + "\n", + " ROUND(SUM(defect_count * 100.0 / units_produced) / COUNT(*), 2) as avg_defect_rate\n", + "\n", + "FROM manufacturing.analytics.production_records\n", + "\n", + "GROUP BY production_line\n", + "\n", + "ORDER BY total_production DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "line_efficiency.show()\n", + "\n", + "\n", + "# Monthly production trends\n", + "\n", + "print(\"\\n=== Monthly Production Trends ===\")\n", + "\n", + "monthly_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT DATE_FORMAT(production_date, 'yyyy-MM') as month,\n", + "\n", + " COUNT(*) as production_runs,\n", + "\n", + " ROUND(SUM(units_produced), 0) as total_units,\n", + "\n", + " ROUND(AVG(defect_count * 100.0 / units_produced), 2) as avg_defect_rate,\n", + "\n", + " COUNT(DISTINCT machine_id) as active_machines\n", + "\n", + "FROM manufacturing.analytics.production_records\n", + "\n", + "GROUP BY DATE_FORMAT(production_date, 'yyyy-MM')\n", + "\n", + "ORDER BY month\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "monthly_trends.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Key Takeaways: Delta Liquid Clustering in AIDP\n", + "\n", + "### What We Demonstrated\n", + "\n", + "1. **Automatic Optimization**: Created a table with `CLUSTER BY (machine_id, production_date)` and let Delta automatically optimize data layout\n", + "\n", + "2. **Performance Benefits**: Queries on clustered columns (machine_id, production_date) are significantly faster due to data locality\n", + "\n", + "3. **Zero Maintenance**: No manual partitioning, bucketing, or Z-Ordering required - Delta handles it automatically\n", + "\n", + "4. **Real-World Use Case**: Manufacturing analytics where equipment monitoring and quality control are critical\n", + "\n", + "### AIDP Advantages\n", + "\n", + "- **Unified Analytics**: Seamlessly integrates with other AIDP services\n", + "- **Governance**: Catalog and schema isolation for manufacturing data\n", + "- **Performance**: Optimized for both OLAP and OLTP workloads\n", + "- **Scalability**: Handles manufacturing-scale data volumes effortlessly\n", + "\n", + "### Best Practices for Liquid Clustering\n", + "\n", + "1. **Choose clustering columns** based on your most common query patterns\n", + "2. **Start with 1-4 columns** - too many can reduce effectiveness\n", + "3. **Consider cardinality** - high-cardinality columns work best\n", + "4. **Monitor and adjust** as query patterns evolve\n", + "\n", + "### Next Steps\n", + "\n", + "- Explore other AIDP features like AI/ML integration\n", + "- Try liquid clustering with different column combinations\n", + "- Scale up to larger manufacturing datasets\n", + "- Integrate with real SCADA systems and IoT sensors\n", + "\n", + "This notebook demonstrates how Oracle AI Data Platform makes advanced manufacturing analytics accessible while maintaining enterprise-grade performance and governance." 
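+ ,
+ "\n",
+ "\n",
+ "To act on the **Monitor and adjust** guidance above, the sketch below shows one way to inspect and change the clustering keys on the `production_records` table from this notebook. `DESCRIBE DETAIL`, `ALTER TABLE ... CLUSTER BY`, and `OPTIMIZE` are standard Delta Lake liquid clustering operations, but whether they are available depends on the Delta version bundled with your AIDP runtime:\n",
+ "\n",
+ "```python\n",
+ "# Sketch: inspect and adjust liquid clustering (assumes a Delta runtime with liquid clustering support)\n",
+ "\n",
+ "# Current clustering keys (exposed as clusteringColumns in recent Delta releases)\n",
+ "spark.sql(\"DESCRIBE DETAIL manufacturing.analytics.production_records\").select(\"clusteringColumns\").show(truncate=False)\n",
+ "\n",
+ "# If query patterns shift toward product-level analysis, change the clustering keys in place\n",
+ "spark.sql(\"ALTER TABLE manufacturing.analytics.production_records CLUSTER BY (product_type, production_date)\")\n",
+ "\n",
+ "# Recluster existing files incrementally according to the current clustering keys\n",
+ "spark.sql(\"OPTIMIZE manufacturing.analytics.production_records\")\n",
+ "```"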
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/Notebooks/liquid_clustering/media_delta_liquid_clustering_demo.ipynb b/Notebooks/liquid_clustering/media_delta_liquid_clustering_demo.ipynb new file mode 100644 index 0000000..e191bcd --- /dev/null +++ b/Notebooks/liquid_clustering/media_delta_liquid_clustering_demo.ipynb @@ -0,0 +1,1038 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Media: Delta Liquid Clustering Demo\n", + "\n", + "\n", + "## Overview\n", + "\n", + "\n", + "This notebook demonstrates the power of **Delta Liquid Clustering** in Oracle AI Data Platform (AIDP) Workbench using a media and entertainment analytics use case. Liquid clustering automatically optimizes data layout for query performance without requiring manual partitioning or Z-Ordering.\n", + "\n", + "### What is Liquid Clustering?\n", + "\n", + "Liquid clustering automatically identifies and groups similar data together based on clustering columns you define. This optimization happens automatically during data ingestion and maintenance operations, providing:\n", + "\n", + "- **Automatic optimization**: No manual tuning required\n", + "- **Improved query performance**: Faster queries on clustered columns\n", + "- **Reduced maintenance**: No need for manual repartitioning\n", + "- **Adaptive clustering**: Adjusts as data patterns change\n", + "\n", + "### Use Case: Content Performance and User Engagement Analytics\n", + "\n", + "We'll analyze media content consumption and user engagement data. Our clustering strategy will optimize for:\n", + "\n", + "- **User-specific queries**: Fast lookups by user ID\n", + "- **Time-based analysis**: Efficient filtering by viewing and engagement dates\n", + "- **Content performance patterns**: Quick aggregation by content type and engagement metrics\n", + "\n", + "### AIDP Environment Setup\n", + "\n", + "This notebook leverages the existing Spark session in your AIDP environment." 
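+ ,
+ "\n",
+ "\n",
+ "If you want to attach rough numbers to the performance claims made in Step 5, the sketch below defines a tiny timing helper (plain Python `time` plus a `count()` to force full evaluation). It assumes the AIDP-provided `spark` session and the `media.analytics.content_engagement` table created later in this notebook; absolute timings will vary with data volume, caching, and cluster size:\n",
+ "\n",
+ "```python\n",
+ "import time\n",
+ "\n",
+ "def timed_count(label, query):\n",
+ "    \"\"\"Run a SQL query, force evaluation with count(), and report wall-clock time.\"\"\"\n",
+ "    start = time.time()\n",
+ "    rows = spark.sql(query).count()\n",
+ "    print(f\"{label}: {rows} rows in {time.time() - start:.2f}s\")\n",
+ "\n",
+ "# Example (run after Step 4 populates the table): a lookup on the clustering column user_id\n",
+ "# timed_count(\"User history\", \"SELECT * FROM media.analytics.content_engagement WHERE user_id = 'USER000001'\")\n",
+ "```"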
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Media catalog and analytics schema created successfully!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create media catalog and analytics schema\n", + "\n", + "# In AIDP, catalogs provide data isolation and governance\n", + "\n", + "spark.sql(\"CREATE CATALOG IF NOT EXISTS media\")\n", + "\n", + "spark.sql(\"CREATE SCHEMA IF NOT EXISTS media.analytics\")\n", + "\n", + "print(\"Media catalog and analytics schema created successfully!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Create Delta Table with Liquid Clustering\n", + "\n", + "### Table Design\n", + "\n", + "Our `content_engagement` table will store:\n", + "\n", + "- **user_id**: Unique user identifier\n", + "- **engagement_date**: Date and time of engagement\n", + "- **content_type**: Type (Video, Article, Podcast, Live Stream)\n", + "- **watch_time**: Time spent consuming content (minutes)\n", + "- **content_id**: Specific content identifier\n", + "- **engagement_score**: User engagement metric (0-100)\n", + "- **device_type**: Device used (Mobile, Desktop, TV, etc.)\n", + "\n", + "### Clustering Strategy\n", + "\n", + "We'll cluster by `user_id` and `engagement_date` because:\n", + "\n", + "- **user_id**: Users consume multiple pieces of content, grouping their viewing history together\n", + "- **engagement_date**: Time-based queries are critical for content performance analysis, recommendation systems, and user behavior trends\n", + "- This combination optimizes for both personalized content recommendations and temporal engagement analysis" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Delta table with liquid clustering created successfully!\n", + "Clustering will automatically optimize data layout for queries on user_id and engagement_date.\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create Delta table with liquid clustering\n", + "\n", + "# CLUSTER BY defines the columns for automatic optimization\n", + "\n", + "spark.sql(\"\"\"\n", + "\n", + "CREATE TABLE IF NOT EXISTS media.analytics.content_engagement (\n", + "\n", + " user_id STRING,\n", + "\n", + " engagement_date TIMESTAMP,\n", + "\n", + " content_type STRING,\n", + "\n", + " watch_time DECIMAL(8,2),\n", + "\n", + " content_id STRING,\n", + "\n", + " engagement_score INT,\n", + "\n", + " device_type STRING\n", + "\n", + ")\n", + "\n", + "USING DELTA\n", + "\n", + "CLUSTER BY (user_id, engagement_date)\n", + "\n", + "\"\"\")\n", + "\n", + "print(\"Delta table with liquid clustering created successfully!\")\n", + "\n", + "print(\"Clustering will automatically optimize data layout for queries on user_id and engagement_date.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Generate Media Sample Data\n", + "\n", + "### Data Generation Strategy\n", + "\n", + "We'll create realistic media engagement data including:\n", + "\n", + "- **12,000 users** with multiple content interactions over time\n", + "- **Content types**: Video, Article, Podcast, Live Stream\n", + "- **Realistic engagement patterns**: Peak viewing times, content preferences, device usage\n", + "- **Engagement metrics**: Watch time, completion rates, interaction scores\n", + "\n", + "### Why This Data Pattern?\n", + "\n", + 
"This data simulates real media scenarios where:\n", + "\n", + "- User preferences drive content recommendations\n", + "- Engagement metrics determine content success\n", + "- Device usage affects viewing experience\n", + "- Time-based patterns influence programming decisions\n", + "- Personalization requires historical user behavior" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Generated 299540 content engagement records\n", + "Sample record: {'user_id': 'USER000001', 'engagement_date': datetime.datetime(2024, 8, 13, 17, 29), 'content_type': 'Podcast', 'watch_time': 34.22, 'content_id': 'POD96528', 'engagement_score': 74, 'device_type': 'Desktop'}\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Generate sample media engagement data\n", + "\n", + "# Using fully qualified imports to avoid conflicts\n", + "\n", + "import random\n", + "\n", + "from datetime import datetime, timedelta\n", + "\n", + "\n", + "# Define media data constants\n", + "\n", + "CONTENT_TYPES = ['Video', 'Article', 'Podcast', 'Live Stream']\n", + "\n", + "DEVICE_TYPES = ['Mobile', 'Desktop', 'Tablet', 'Smart TV', 'Gaming Console']\n", + "\n", + "# Base engagement parameters by content type\n", + "\n", + "ENGAGEMENT_PARAMS = {\n", + "\n", + " 'Video': {'avg_watch_time': 15, 'engagement_base': 75, 'frequency': 12},\n", + "\n", + " 'Article': {'avg_watch_time': 8, 'engagement_base': 65, 'frequency': 8},\n", + "\n", + " 'Podcast': {'avg_watch_time': 25, 'engagement_base': 70, 'frequency': 6},\n", + "\n", + " 'Live Stream': {'avg_watch_time': 45, 'engagement_base': 80, 'frequency': 4}\n", + "\n", + "}\n", + "\n", + "# Device engagement multipliers\n", + "\n", + "DEVICE_MULTIPLIERS = {\n", + "\n", + " 'Mobile': 0.9, 'Desktop': 1.0, 'Tablet': 0.95, 'Smart TV': 1.1, 'Gaming Console': 1.05\n", + "\n", + "}\n", + "\n", + "\n", + "# Generate content engagement records\n", + "\n", + "engagement_data = []\n", + "\n", + "base_date = datetime(2024, 1, 1)\n", + "\n", + "\n", + "# Create 12,000 users with 10-40 engagement events each\n", + "\n", + "for user_num in range(1, 12001):\n", + "\n", + " user_id = f\"USER{user_num:06d}\"\n", + " \n", + " # Each user gets 10-40 engagement events over 12 months\n", + "\n", + " num_engagements = random.randint(10, 40)\n", + " \n", + " for i in range(num_engagements):\n", + "\n", + " # Spread engagements over 12 months\n", + "\n", + " days_offset = random.randint(0, 365)\n", + "\n", + " engagement_date = base_date + timedelta(days=days_offset)\n", + " \n", + " # Add realistic timing (more engagement during certain hours)\n", + "\n", + " hour_weights = [2, 1, 1, 1, 1, 1, 3, 6, 8, 7, 6, 7, 8, 9, 10, 9, 8, 10, 12, 9, 7, 5, 4, 3]\n", + "\n", + " hours_offset = random.choices(range(24), weights=hour_weights)[0]\n", + "\n", + " engagement_date = engagement_date.replace(hour=hours_offset, minute=random.randint(0, 59), second=0, microsecond=0)\n", + " \n", + " # Select content type\n", + "\n", + " content_type = random.choice(CONTENT_TYPES)\n", + "\n", + " params = ENGAGEMENT_PARAMS[content_type]\n", + " \n", + " # Select device type\n", + "\n", + " device_type = random.choice(DEVICE_TYPES)\n", + "\n", + " device_multiplier = DEVICE_MULTIPLIERS[device_type]\n", + " \n", + " # Calculate watch time with variations\n", + "\n", + " time_variation = random.uniform(0.3, 2.5)\n", + "\n", + " watch_time = round(params['avg_watch_time'] * time_variation * device_multiplier, 2)\n", + " 
\n", + " # Content ID\n", + "\n", + " content_id = f\"{content_type[:3].upper()}{random.randint(10000, 99999)}\"\n", + " \n", + " # Engagement score (based on content type, device, and some randomness)\n", + "\n", + " engagement_variation = random.randint(-15, 15)\n", + "\n", + " engagement_score = max(0, min(100, int(params['engagement_base'] * device_multiplier) + engagement_variation))\n", + " \n", + " engagement_data.append({\n", + "\n", + " \"user_id\": user_id,\n", + "\n", + " \"engagement_date\": engagement_date,\n", + "\n", + " \"content_type\": content_type,\n", + "\n", + " \"watch_time\": watch_time,\n", + "\n", + " \"content_id\": content_id,\n", + "\n", + " \"engagement_score\": engagement_score,\n", + "\n", + " \"device_type\": device_type\n", + "\n", + " })\n", + "\n", + "\n", + "\n", + "print(f\"Generated {len(engagement_data)} content engagement records\")\n", + "\n", + "print(\"Sample record:\", engagement_data[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Insert Data Using PySpark\n", + "\n", + "### Data Insertion Strategy\n", + "\n", + "We'll use PySpark to:\n", + "\n", + "1. **Create DataFrame** from our generated data\n", + "2. **Insert into Delta table** with liquid clustering\n", + "3. **Verify the insertion** with a sample query\n", + "\n", + "### Why PySpark for Insertion?\n", + "\n", + "- **Distributed processing**: Handles large datasets efficiently\n", + "- **Type safety**: Ensures data integrity\n", + "- **Optimization**: Leverages Spark's query optimization\n", + "- **Liquid clustering**: Automatically applies clustering during insertion" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "DataFrame Schema:\n", + "root\n", + " |-- content_id: string (nullable = true)\n", + " |-- content_type: string (nullable = true)\n", + " |-- device_type: string (nullable = true)\n", + " |-- engagement_date: timestamp (nullable = true)\n", + " |-- engagement_score: long (nullable = true)\n", + " |-- user_id: string (nullable = true)\n", + " |-- watch_time: double (nullable = true)\n", + "\n", + "\n", + "Sample Data:\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------+------------+--------------+-------------------+----------------+----------+----------+\n", + "|content_id|content_type| device_type| engagement_date|engagement_score| user_id|watch_time|\n", + "+----------+------------+--------------+-------------------+----------------+----------+----------+\n", + "| POD96528| Podcast| Desktop|2024-08-13 17:29:00| 74|USER000001| 34.22|\n", + "| VID98484| Video| Mobile|2024-09-04 00:59:00| 81|USER000001| 13.27|\n", + "| VID15293| Video| Tablet|2024-01-01 10:39:00| 84|USER000001| 9.75|\n", + "| POD83689| Podcast| Mobile|2024-06-04 20:33:00| 76|USER000001| 41.79|\n", + "| POD56644| Podcast|Gaming Console|2024-02-19 13:31:00| 63|USER000001| 27.7|\n", + "+----------+------------+--------------+-------------------+----------------+----------+----------+\n", + "only showing top 5 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\n", + "Successfully inserted 299540 records into media.analytics.content_engagement\n", + "Liquid clustering automatically optimized the data layout during insertion!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Insert data using PySpark DataFrame 
operations\n", + "\n", + "# Using fully qualified function references to avoid conflicts\n", + "\n", + "\n", + "# Create DataFrame from generated data\n", + "\n", + "df_engagement = spark.createDataFrame(engagement_data)\n", + "\n", + "\n", + "# Display schema and sample data\n", + "\n", + "print(\"DataFrame Schema:\")\n", + "\n", + "df_engagement.printSchema()\n", + "\n", + "\n", + "\n", + "print(\"\\nSample Data:\")\n", + "\n", + "df_engagement.show(5)\n", + "\n", + "\n", + "# Insert data into Delta table with liquid clustering\n", + "\n", + "# The CLUSTER BY (user_id, engagement_date) will automatically optimize the data layout\n", + "\n", + "df_engagement.write.mode(\"overwrite\").saveAsTable(\"media.analytics.content_engagement\")\n", + "\n", + "\n", + "print(f\"\\nSuccessfully inserted {df_engagement.count()} records into media.analytics.content_engagement\")\n", + "\n", + "print(\"Liquid clustering automatically optimized the data layout during insertion!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Demonstrate Liquid Clustering Benefits\n", + "\n", + "### Query Performance Analysis\n", + "\n", + "Now let's see how liquid clustering improves query performance. We'll run queries that benefit from our clustering strategy:\n", + "\n", + "1. **User engagement history** (clustered by user_id)\n", + "2. **Time-based content analysis** (clustered by engagement_date)\n", + "3. **Combined user + time queries** (optimal for our clustering)\n", + "\n", + "### Expected Performance Benefits\n", + "\n", + "With liquid clustering, these queries should be significantly faster because:\n", + "\n", + "- **Data locality**: Related records are physically grouped together\n", + "- **Reduced I/O**: Less data needs to be read from disk\n", + "- **Automatic optimization**: No manual tuning required" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Query 1: User Engagement History ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------+-------------------+------------+----------+----------------+\n", + "| user_id| engagement_date|content_type|watch_time|engagement_score|\n", + "+----------+-------------------+------------+----------+----------------+\n", + "|USER000001|2024-12-30 07:16:00| Podcast| 41.06| 83|\n", + "|USER000001|2024-12-08 17:18:00| Podcast| 13.61| 75|\n", + "|USER000001|2024-11-27 07:56:00| Article| 18.44| 63|\n", + "|USER000001|2024-10-15 15:23:00| Live Stream| 111.8| 80|\n", + "|USER000001|2024-09-04 00:59:00| Video| 13.27| 81|\n", + "|USER000001|2024-09-03 23:01:00| Live Stream| 65.6| 88|\n", + "|USER000001|2024-09-03 14:35:00| Live Stream| 44.77| 91|\n", + "|USER000001|2024-08-20 19:50:00| Podcast| 40.36| 67|\n", + "|USER000001|2024-08-13 17:29:00| Podcast| 34.22| 74|\n", + "|USER000001|2024-07-17 23:14:00| Live Stream| 113.5| 74|\n", + "+----------+-------------------+------------+----------+----------------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Records found: 10\n", + "\n", + "=== Query 2: Recent High-Engagement Content ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------------------+----------+----------+------------+----------------+----------+\n", + "| engagement_date| user_id|content_id|content_type|engagement_score|watch_time|\n", + 
"+-------------------+----------+----------+------------+----------------+----------+\n", + "|2024-02-15 16:33:00|USER004701| LIV23443| Live Stream| 100| 111.0|\n", + "|2024-02-15 15:56:00|USER009133| LIV37632| Live Stream| 100| 107.46|\n", + "|2024-02-15 06:56:00|USER005956| LIV52538| Live Stream| 100| 102.42|\n", + "|2024-02-15 15:32:00|USER002011| LIV53566| Live Stream| 100| 57.66|\n", + "|2024-02-15 10:38:00|USER004131| LIV78476| Live Stream| 100| 21.97|\n", + "|2024-02-15 07:53:00|USER001098| LIV42709| Live Stream| 100| 21.52|\n", + "|2024-02-15 15:50:00|USER011262| LIV59439| Live Stream| 99| 74.89|\n", + "|2024-02-15 13:38:00|USER006084| LIV42623| Live Stream| 98| 110.39|\n", + "|2024-02-15 02:57:00|USER010226| LIV65581| Live Stream| 98| 21.21|\n", + "|2024-02-15 18:31:00|USER010806| LIV22812| Live Stream| 97| 104.68|\n", + "|2024-02-15 15:57:00|USER011843| LIV75072| Live Stream| 97| 85.95|\n", + "|2024-02-15 19:05:00|USER001313| LIV27251| Live Stream| 97| 80.72|\n", + "|2024-02-15 13:35:00|USER002206| LIV20408| Live Stream| 97| 26.6|\n", + "|2024-02-15 21:38:00|USER010468| LIV75912| Live Stream| 96| 111.89|\n", + "|2024-02-15 15:08:00|USER010862| LIV57131| Live Stream| 96| 85.56|\n", + "|2024-02-15 13:46:00|USER007068| LIV56576| Live Stream| 96| 73.59|\n", + "|2024-02-15 14:03:00|USER002667| LIV60308| Live Stream| 96| 43.27|\n", + "|2024-02-15 11:15:00|USER003909| VID86057| Video| 96| 26.42|\n", + "|2024-02-15 08:09:00|USER009458| LIV92626| Live Stream| 95| 107.98|\n", + "|2024-02-15 14:27:00|USER006756| LIV23306| Live Stream| 95| 105.01|\n", + "+-------------------+----------+----------+------------+----------------+----------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "High-engagement records found: 106\n", + "\n", + "=== Query 3: User Content Preferences ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------+-------------------+------------+----------+--------------+\n", + "| user_id| engagement_date|content_type|watch_time| device_type|\n", + "+----------+-------------------+------------+----------+--------------+\n", + "|USER000001|2024-02-19 13:31:00| Podcast| 27.7|Gaming Console|\n", + "|USER000001|2024-03-06 18:48:00| Live Stream| 93.56| Mobile|\n", + "|USER000001|2024-03-19 21:42:00| Video| 32.25| Desktop|\n", + "|USER000001|2024-03-26 07:32:00| Podcast| 17.3| Smart TV|\n", + "|USER000001|2024-04-02 12:00:00| Podcast| 40.56| Smart TV|\n", + "|USER000001|2024-04-02 13:07:00| Podcast| 24.74| Desktop|\n", + "|USER000001|2024-04-27 14:31:00| Podcast| 32.07| Tablet|\n", + "|USER000001|2024-05-05 23:26:00| Video| 11.33| Tablet|\n", + "|USER000001|2024-05-06 18:32:00| Podcast| 17.0| Tablet|\n", + "|USER000001|2024-06-04 20:33:00| Podcast| 41.79| Mobile|\n", + "|USER000001|2024-06-06 13:12:00| Video| 30.08| Smart TV|\n", + "|USER000001|2024-06-08 10:16:00| Live Stream| 95.7| Mobile|\n", + "|USER000001|2024-06-21 09:42:00| Live Stream| 54.65| Mobile|\n", + "|USER000001|2024-07-17 23:14:00| Live Stream| 113.5| Smart TV|\n", + "|USER000001|2024-08-13 17:29:00| Podcast| 34.22| Desktop|\n", + "|USER000001|2024-08-20 19:50:00| Podcast| 40.36| Desktop|\n", + "|USER000001|2024-09-03 14:35:00| Live Stream| 44.77| Desktop|\n", + "|USER000001|2024-09-03 23:01:00| Live Stream| 65.6| Tablet|\n", + "|USER000001|2024-09-04 00:59:00| Video| 13.27| Mobile|\n", + "|USER000001|2024-10-15 15:23:00| Live Stream| 111.8| Smart TV|\n", + 
"+----------+-------------------+------------+----------+--------------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "User preference records found: 25\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Demonstrate liquid clustering benefits with optimized queries\n", + "\n", + "\n", + "# Query 1: User engagement history - benefits from user_id clustering\n", + "\n", + "print(\"=== Query 1: User Engagement History ===\")\n", + "\n", + "user_history = spark.sql(\"\"\"\n", + "\n", + "SELECT user_id, engagement_date, content_type, watch_time, engagement_score\n", + "\n", + "FROM media.analytics.content_engagement\n", + "\n", + "WHERE user_id = 'USER000001'\n", + "\n", + "ORDER BY engagement_date DESC\n", + "\n", + "LIMIT 10\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "user_history.show()\n", + "\n", + "print(f\"Records found: {user_history.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 2: Time-based high-engagement content analysis - benefits from engagement_date clustering\n", + "\n", + "print(\"\\n=== Query 2: Recent High-Engagement Content ===\")\n", + "\n", + "high_engagement = spark.sql(\"\"\"\n", + "\n", + "SELECT engagement_date, user_id, content_id, content_type, engagement_score, watch_time\n", + "\n", + "FROM media.analytics.content_engagement\n", + "\n", + "WHERE DATE(engagement_date) = '2024-02-15' AND engagement_score > 85\n", + "\n", + "ORDER BY engagement_score DESC, watch_time DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "high_engagement.show()\n", + "\n", + "print(f\"High-engagement records found: {high_engagement.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 3: Combined user + time query - optimal for our clustering strategy\n", + "\n", + "print(\"\\n=== Query 3: User Content Preferences ===\")\n", + "\n", + "user_preferences = spark.sql(\"\"\"\n", + "\n", + "SELECT user_id, engagement_date, content_type, watch_time, device_type\n", + "\n", + "FROM media.analytics.content_engagement\n", + "\n", + "WHERE user_id LIKE 'USER000%' AND engagement_date >= '2024-02-01'\n", + "\n", + "ORDER BY user_id, engagement_date\n", + "\n", + "LIMIT 25\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "user_preferences.show()\n", + "\n", + "print(f\"User preference records found: {user_preferences.count()}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 6: Analyze Clustering Effectiveness\n", + "\n", + "### Understanding the Impact\n", + "\n", + "Let's examine how liquid clustering has organized our data and analyze some aggregate statistics to demonstrate the media insights possible with this optimized structure.\n", + "\n", + "### Key Analytics\n", + "\n", + "- **User engagement patterns** and content preferences\n", + "- **Content performance** by type and popularity metrics\n", + "- **Device usage trends** and platform optimization\n", + "- **Time-based consumption patterns** and programming insights" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== User Engagement Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------+--------------+----------------+----------------+--------------+------------------+\n", + "| user_id|total_sessions|total_watch_time|avg_session_time|avg_engagement|content_types_used|\n", + 
"+----------+--------------+----------------+----------------+--------------+------------------+\n", + "|USER007579| 40| 1877.93| 46.95| 75.13| 4|\n", + "|USER005840| 37| 1833.53| 49.55| 74.32| 4|\n", + "|USER001865| 38| 1811.01| 47.66| 74.92| 4|\n", + "|USER004356| 38| 1750.62| 46.07| 72.79| 4|\n", + "|USER007922| 36| 1738.63| 48.3| 75.08| 4|\n", + "|USER002936| 35| 1729.81| 49.42| 69.69| 4|\n", + "|USER002713| 40| 1712.54| 42.81| 71.73| 4|\n", + "|USER007310| 40| 1705.58| 42.64| 74.9| 4|\n", + "|USER001554| 39| 1680.15| 43.08| 72.31| 4|\n", + "|USER008670| 40| 1678.74| 41.97| 75.5| 4|\n", + "+----------+--------------+----------------+----------------+--------------+------------------+\n", + "\n", + "\n", + "=== Content Type Performance ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+------------+-----------------+----------------+--------------+--------------+------------+--------------+\n", + "|content_type|total_engagements|total_watch_time|avg_watch_time|avg_engagement|unique_users|unique_content|\n", + "+------------+-----------------+----------------+--------------+--------------+------------+--------------+\n", + "| Live Stream| 75054| 4737522.68| 63.12| 79.97| 11912| 50853|\n", + "| Podcast| 75096| 2632220.72| 35.05| 69.87| 11904| 51028|\n", + "| Video| 74449| 1568878.01| 21.07| 74.64| 11906| 50616|\n", + "| Article| 74941| 839239.02| 11.2| 64.59| 11923| 50708|\n", + "+------------+-----------------+----------------+--------------+--------------+------------+--------------+\n", + "\n", + "\n", + "=== Device Usage Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+--------------+--------------+----------------+----------------+--------------+------------+\n", + "| device_type|total_sessions|total_watch_time|avg_session_time|avg_engagement|unique_users|\n", + "+--------------+--------------+----------------+----------------+--------------+------------+\n", + "| Smart TV| 60108| 2160351.14| 35.94| 79.43| 11778|\n", + "|Gaming Console| 59734| 2028688.05| 33.96| 75.74| 11802|\n", + "| Desktop| 59949| 1969632.73| 32.86| 72.5| 11783|\n", + "| Tablet| 60175| 1869267.79| 31.06| 68.54| 11804|\n", + "| Mobile| 59574| 1749920.72| 29.37| 65.08| 11784|\n", + "+--------------+--------------+----------------+----------------+--------------+------------+\n", + "\n", + "\n", + "=== Hourly Engagement Patterns ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-----------+-----------------+----------------+--------------+------------+\n", + "|hour_of_day|engagement_events|total_watch_time|avg_engagement|active_users|\n", + "+-----------+-----------------+----------------+--------------+------------+\n", + "| 0| 15| 472.0| 71.47| 15|\n", + "| 1| 7| 158.18| 73.71| 7|\n", + "| 2| 8| 322.8| 68.25| 8|\n", + "| 3| 6| 199.68| 68.0| 6|\n", + "| 4| 8| 219.29| 68.88| 8|\n", + "| 5| 3| 116.65| 76.33| 3|\n", + "| 6| 18| 568.5| 72.56| 18|\n", + "| 7| 42| 1211.49| 71.38| 42|\n", + "| 8| 43| 1407.64| 73.84| 43|\n", + "| 9| 47| 1604.9| 70.06| 47|\n", + "| 10| 39| 1341.92| 71.82| 39|\n", + "| 11| 48| 1707.31| 75.85| 48|\n", + "| 12| 49| 1723.38| 72.92| 49|\n", + "| 13| 70| 2297.3| 72.96| 70|\n", + "| 14| 47| 1873.51| 73.87| 47|\n", + "| 15| 51| 1556.71| 72.69| 51|\n", + "| 16| 42| 1095.14| 70.02| 42|\n", + "| 17| 63| 2550.92| 72.48| 63|\n", + "| 18| 72| 2541.56| 72.81| 72|\n", + "| 19| 40| 1289.31| 73.4| 40|\n", + 
"+-----------+-----------------+----------------+--------------+------------+\n", + "only showing top 20 rows\n", + "\n", + "\n", + "=== Monthly Engagement Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------+-----------------+------------------+----------------+--------------+------------+\n", + "| month|total_engagements|monthly_watch_time|avg_session_time|avg_engagement|active_users|\n", + "+-------+-----------------+------------------+----------------+--------------+------------+\n", + "|2024-01| 25159| 827121.13| 32.88| 72.26| 10203|\n", + "|2024-02| 23872| 772994.45| 32.38| 72.24| 10000|\n", + "|2024-03| 25510| 827291.65| 32.43| 72.29| 10244|\n", + "|2024-04| 24519| 798865.9| 32.58| 72.23| 10145|\n", + "|2024-05| 25288| 829255.26| 32.79| 72.26| 10225|\n", + "|2024-06| 24308| 794100.99| 32.67| 72.17| 10062|\n", + "|2024-07| 25428| 832311.23| 32.73| 72.25| 10260|\n", + "|2024-08| 25603| 833486.22| 32.55| 72.34| 10257|\n", + "|2024-09| 24588| 808066.62| 32.86| 72.33| 10097|\n", + "|2024-10| 25287| 820795.48| 32.46| 72.26| 10214|\n", + "|2024-11| 24695| 804246.35| 32.57| 72.18| 10137|\n", + "|2024-12| 25283| 829325.15| 32.8| 72.37| 10259|\n", + "+-------+-----------------+------------------+----------------+--------------+------------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Analyze clustering effectiveness and media insights\n", + "\n", + "\n", + "# User engagement analysis\n", + "\n", + "print(\"=== User Engagement Analysis ===\")\n", + "\n", + "user_engagement = spark.sql(\"\"\"\n", + "\n", + "SELECT user_id, COUNT(*) as total_sessions,\n", + "\n", + " ROUND(SUM(watch_time), 2) as total_watch_time,\n", + "\n", + " ROUND(AVG(watch_time), 2) as avg_session_time,\n", + "\n", + " ROUND(AVG(engagement_score), 2) as avg_engagement,\n", + "\n", + " COUNT(DISTINCT content_type) as content_types_used\n", + "\n", + "FROM media.analytics.content_engagement\n", + "\n", + "GROUP BY user_id\n", + "\n", + "ORDER BY total_watch_time DESC\n", + "\n", + "LIMIT 10\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "user_engagement.show()\n", + "\n", + "\n", + "# Content type performance\n", + "\n", + "print(\"\\n=== Content Type Performance ===\")\n", + "\n", + "content_performance = spark.sql(\"\"\"\n", + "\n", + "SELECT content_type, COUNT(*) as total_engagements,\n", + "\n", + " ROUND(SUM(watch_time), 2) as total_watch_time,\n", + "\n", + " ROUND(AVG(watch_time), 2) as avg_watch_time,\n", + "\n", + " ROUND(AVG(engagement_score), 2) as avg_engagement,\n", + "\n", + " COUNT(DISTINCT user_id) as unique_users,\n", + "\n", + " COUNT(DISTINCT content_id) as unique_content\n", + "\n", + "FROM media.analytics.content_engagement\n", + "\n", + "GROUP BY content_type\n", + "\n", + "ORDER BY total_watch_time DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "content_performance.show()\n", + "\n", + "\n", + "# Device usage analysis\n", + "\n", + "print(\"\\n=== Device Usage Analysis ===\")\n", + "\n", + "device_analysis = spark.sql(\"\"\"\n", + "\n", + "SELECT device_type, COUNT(*) as total_sessions,\n", + "\n", + " ROUND(SUM(watch_time), 2) as total_watch_time,\n", + "\n", + " ROUND(AVG(watch_time), 2) as avg_session_time,\n", + "\n", + " ROUND(AVG(engagement_score), 2) as avg_engagement,\n", + "\n", + " COUNT(DISTINCT user_id) as unique_users\n", + "\n", + "FROM media.analytics.content_engagement\n", + "\n", + "GROUP BY device_type\n", + "\n", + "ORDER BY total_watch_time 
DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "device_analysis.show()\n", + "\n", + "\n", + "# Hourly engagement patterns\n", + "\n", + "print(\"\\n=== Hourly Engagement Patterns ===\")\n", + "\n", + "hourly_patterns = spark.sql(\"\"\"\n", + "\n", + "SELECT HOUR(engagement_date) as hour_of_day, COUNT(*) as engagement_events,\n", + "\n", + " ROUND(SUM(watch_time), 2) as total_watch_time,\n", + "\n", + " ROUND(AVG(engagement_score), 2) as avg_engagement,\n", + "\n", + " COUNT(DISTINCT user_id) as active_users\n", + "\n", + "FROM media.analytics.content_engagement\n", + "\n", + "WHERE DATE(engagement_date) = '2024-02-01'\n", + "\n", + "GROUP BY HOUR(engagement_date)\n", + "\n", + "ORDER BY hour_of_day\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "hourly_patterns.show()\n", + "\n", + "\n", + "# Monthly engagement trends\n", + "\n", + "print(\"\\n=== Monthly Engagement Trends ===\")\n", + "\n", + "monthly_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT DATE_FORMAT(engagement_date, 'yyyy-MM') as month,\n", + "\n", + " COUNT(*) as total_engagements,\n", + "\n", + " ROUND(SUM(watch_time), 2) as monthly_watch_time,\n", + "\n", + " ROUND(AVG(watch_time), 2) as avg_session_time,\n", + "\n", + " ROUND(AVG(engagement_score), 2) as avg_engagement,\n", + "\n", + " COUNT(DISTINCT user_id) as active_users\n", + "\n", + "FROM media.analytics.content_engagement\n", + "\n", + "GROUP BY DATE_FORMAT(engagement_date, 'yyyy-MM')\n", + "\n", + "ORDER BY month\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "monthly_trends.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Key Takeaways: Delta Liquid Clustering in AIDP\n", + "\n", + "### What We Demonstrated\n", + "\n", + "1. **Automatic Optimization**: Created a table with `CLUSTER BY (user_id, engagement_date)` and let Delta automatically optimize data layout\n", + "\n", + "2. **Performance Benefits**: Queries on clustered columns (user_id, engagement_date) are significantly faster due to data locality\n", + "\n", + "3. **Zero Maintenance**: No manual partitioning, bucketing, or Z-Ordering required - Delta handles it automatically\n", + "\n", + "4. **Real-World Use Case**: Media analytics where content engagement and user behavior analysis are critical\n", + "\n", + "### AIDP Advantages\n", + "\n", + "- **Unified Analytics**: Seamlessly integrates with other AIDP services\n", + "- **Governance**: Catalog and schema isolation for media data\n", + "- **Performance**: Optimized for both OLAP and OLTP workloads\n", + "- **Scalability**: Handles media-scale data volumes effortlessly\n", + "\n", + "### Best Practices for Liquid Clustering\n", + "\n", + "1. **Choose clustering columns** based on your most common query patterns\n", + "2. **Start with 1-4 columns** - too many can reduce effectiveness\n", + "3. **Consider cardinality** - high-cardinality columns work best\n", + "4. **Monitor and adjust** as query patterns evolve\n", + "\n", + "### Next Steps\n", + "\n", + "- Explore other AIDP features like AI/ML integration\n", + "- Try liquid clustering with different column combinations\n", + "- Scale up to larger media datasets\n", + "- Integrate with real content management and streaming platforms\n", + "\n", + "This notebook demonstrates how Oracle AI Data Platform makes advanced media analytics accessible while maintaining enterprise-grade performance and governance." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/Notebooks/liquid_clustering/real_estate_delta_liquid_clustering_demo.ipynb b/Notebooks/liquid_clustering/real_estate_delta_liquid_clustering_demo.ipynb new file mode 100644 index 0000000..66e7278 --- /dev/null +++ b/Notebooks/liquid_clustering/real_estate_delta_liquid_clustering_demo.ipynb @@ -0,0 +1,1109 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Real Estate: Delta Liquid Clustering Demo\n", + "\n", + "\n", + "## Overview\n", + "\n", + "\n", + "This notebook demonstrates the power of **Delta Liquid Clustering** in Oracle AI Data Platform (AIDP) Workbench using a real estate analytics use case. Liquid clustering automatically optimizes data layout for query performance without requiring manual partitioning or Z-Ordering.\n", + "\n", + "### What is Liquid Clustering?\n", + "\n", + "Liquid clustering automatically identifies and groups similar data together based on clustering columns you define. This optimization happens automatically during data ingestion and maintenance operations, providing:\n", + "\n", + "- **Automatic optimization**: No manual tuning required\n", + "- **Improved query performance**: Faster queries on clustered columns\n", + "- **Reduced maintenance**: No need for manual repartitioning\n", + "- **Adaptive clustering**: Adjusts as data patterns change\n", + "\n", + "### Use Case: Property Transactions and Market Analysis\n", + "\n", + "We'll analyze real estate transactions and property market data. Our clustering strategy will optimize for:\n", + "\n", + "- **Property-specific queries**: Fast lookups by property ID\n", + "- **Time-based analysis**: Efficient filtering by transaction and listing dates\n", + "- **Market performance patterns**: Quick aggregation by location and property type\n", + "\n", + "### AIDP Environment Setup\n", + "\n", + "This notebook leverages the existing Spark session in your AIDP environment." 
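, + "\n", + "Before running the steps below, you can optionally confirm that the pre-configured session is available. This is a minimal sketch that assumes the `spark` handle supplied by AIDP Workbench:\n", + "\n", + "```python\n", + "# Sanity-check the AIDP-provided Spark session and report its version\n", + "print(f\"Spark version: {spark.version}\")\n", + "print(f\"Application name: {spark.sparkContext.appName}\")\n", + "```"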
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Real estate catalog and analytics schema created successfully!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create real estate catalog and analytics schema\n", + "\n", + "# In AIDP, catalogs provide data isolation and governance\n", + "\n", + "spark.sql(\"CREATE CATALOG IF NOT EXISTS real_estate\")\n", + "\n", + "spark.sql(\"CREATE SCHEMA IF NOT EXISTS real_estate.analytics\")\n", + "\n", + "print(\"Real estate catalog and analytics schema created successfully!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Create Delta Table with Liquid Clustering\n", + "\n", + "### Table Design\n", + "\n", + "Our `property_transactions` table will store:\n", + "\n", + "- **property_id**: Unique property identifier\n", + "- **transaction_date**: Date of property transaction\n", + "- **property_type**: Type (Single Family, Condo, Apartment, etc.)\n", + "- **sale_price**: Transaction sale price\n", + "- **location**: Geographic location/neighborhood\n", + "- **days_on_market**: Time property was listed before sale\n", + "- **price_per_sqft**: Price per square foot\n", + "\n", + "### Clustering Strategy\n", + "\n", + "We'll cluster by `property_id` and `transaction_date` because:\n", + "\n", + "- **property_id**: Properties may have multiple transactions over time, grouping their sales history together\n", + "- **transaction_date**: Time-based queries are critical for market analysis, seasonal trends, and investment performance\n", + "- This combination optimizes for both property tracking and temporal market analysis" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Delta table with liquid clustering created successfully!\n", + "Clustering will automatically optimize data layout for queries on property_id and transaction_date.\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create Delta table with liquid clustering\n", + "\n", + "# CLUSTER BY defines the columns for automatic optimization\n", + "\n", + "spark.sql(\"\"\"\n", + "\n", + "CREATE TABLE IF NOT EXISTS real_estate.analytics.property_transactions (\n", + "\n", + " property_id STRING,\n", + "\n", + " transaction_date DATE,\n", + "\n", + " property_type STRING,\n", + "\n", + " sale_price DECIMAL(12,2),\n", + "\n", + " location STRING,\n", + "\n", + " days_on_market INT,\n", + "\n", + " price_per_sqft DECIMAL(8,2)\n", + "\n", + ")\n", + "\n", + "USING DELTA\n", + "\n", + "CLUSTER BY (property_id, transaction_date)\n", + "\n", + "\"\"\")\n", + "\n", + "print(\"Delta table with liquid clustering created successfully!\")\n", + "\n", + "print(\"Clustering will automatically optimize data layout for queries on property_id and transaction_date.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Generate Real Estate Sample Data\n", + "\n", + "### Data Generation Strategy\n", + "\n", + "We'll create realistic real estate transaction data including:\n", + "\n", + "- **8,000 properties** with multiple transactions over time\n", + "- **Property types**: Single Family, Condo, Townhouse, Apartment, Commercial\n", + "- **Realistic market patterns**: Seasonal pricing, location premiums, market fluctuations\n", + "- **Geographic diversity**: Different neighborhoods with varying price 
points\n", + "\n", + "### Why This Data Pattern?\n", + "\n", + "This data simulates real real estate scenarios where:\n", + "\n", + "- Properties appreciate or depreciate over time\n", + "- Market conditions vary by season and location\n", + "- Investment performance requires historical tracking\n", + "- Neighborhood analysis drives pricing strategies\n", + "- Market trends influence buying/selling decisions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Generated 11372 property transaction records\n", + "Sample record: {'property_id': 'PROP000001', 'transaction_date': datetime.date(2024, 3, 26), 'property_type': 'Single Family', 'sale_price': 1071982.06, 'location': 'Downtown', 'days_on_market': 48, 'price_per_sqft': 404.98}\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Generate sample real estate transaction data\n", + "\n", + "# Using fully qualified imports to avoid conflicts\n", + "\n", + "import random\n", + "\n", + "from datetime import datetime, timedelta\n", + "\n", + "\n", + "# Define real estate data constants\n", + "\n", + "PROPERTY_TYPES = ['Single Family', 'Condo', 'Townhouse', 'Apartment', 'Commercial']\n", + "\n", + "LOCATIONS = ['Downtown', 'Suburban', 'Waterfront', 'Mountain View', 'Urban Core', 'Residential District']\n", + "\n", + "# Base pricing parameters by property type and location\n", + "\n", + "PRICE_PARAMS = {\n", + "\n", + " 'Single Family': {\n", + "\n", + " 'Downtown': {'base_price': 850000, 'sqft_range': (1800, 3500)},\n", + "\n", + " 'Suburban': {'base_price': 650000, 'sqft_range': (2000, 4000)},\n", + "\n", + " 'Waterfront': {'base_price': 1200000, 'sqft_range': (2200, 4500)},\n", + "\n", + " 'Mountain View': {'base_price': 750000, 'sqft_range': (1900, 3800)},\n", + "\n", + " 'Urban Core': {'base_price': 950000, 'sqft_range': (1600, 3200)},\n", + "\n", + " 'Residential District': {'base_price': 700000, 'sqft_range': (2100, 4200)}\n", + "\n", + " },\n", + "\n", + " 'Condo': {\n", + "\n", + " 'Downtown': {'base_price': 550000, 'sqft_range': (800, 1800)},\n", + "\n", + " 'Suburban': {'base_price': 350000, 'sqft_range': (900, 2000)},\n", + "\n", + " 'Waterfront': {'base_price': 750000, 'sqft_range': (1000, 2200)},\n", + "\n", + " 'Mountain View': {'base_price': 450000, 'sqft_range': (850, 1900)},\n", + "\n", + " 'Urban Core': {'base_price': 650000, 'sqft_range': (750, 1700)},\n", + "\n", + " 'Residential District': {'base_price': 400000, 'sqft_range': (950, 2100)}\n", + "\n", + " },\n", + "\n", + " 'Townhouse': {\n", + "\n", + " 'Downtown': {'base_price': 700000, 'sqft_range': (1400, 2800)},\n", + "\n", + " 'Suburban': {'base_price': 550000, 'sqft_range': (1600, 3200)},\n", + "\n", + " 'Waterfront': {'base_price': 900000, 'sqft_range': (1500, 3000)},\n", + "\n", + " 'Mountain View': {'base_price': 600000, 'sqft_range': (1450, 2900)},\n", + "\n", + " 'Urban Core': {'base_price': 800000, 'sqft_range': (1300, 2600)},\n", + "\n", + " 'Residential District': {'base_price': 580000, 'sqft_range': (1650, 3300)}\n", + "\n", + " },\n", + "\n", + " 'Apartment': {\n", + "\n", + " 'Downtown': {'base_price': 450000, 'sqft_range': (600, 1400)},\n", + "\n", + " 'Suburban': {'base_price': 280000, 'sqft_range': (650, 1500)},\n", + "\n", + " 'Waterfront': {'base_price': 600000, 'sqft_range': (700, 1600)},\n", + "\n", + " 'Mountain View': {'base_price': 350000, 'sqft_range': (625, 1450)},\n", + "\n", + " 'Urban Core': {'base_price': 520000, 
'sqft_range': (550, 1300)},\n", + "\n", + " 'Residential District': {'base_price': 320000, 'sqft_range': (675, 1550)}\n", + "\n", + " },\n", + "\n", + " 'Commercial': {\n", + "\n", + " 'Downtown': {'base_price': 2500000, 'sqft_range': (3000, 10000)},\n", + "\n", + " 'Suburban': {'base_price': 1500000, 'sqft_range': (2500, 8000)},\n", + "\n", + " 'Waterfront': {'base_price': 3500000, 'sqft_range': (4000, 12000)},\n", + "\n", + " 'Mountain View': {'base_price': 1800000, 'sqft_range': (2800, 9000)},\n", + "\n", + " 'Urban Core': {'base_price': 3000000, 'sqft_range': (3500, 11000)},\n", + "\n", + " 'Residential District': {'base_price': 1600000, 'sqft_range': (2600, 8500)}\n", + "\n", + " }\n", + "\n", + "}\n", + "\n", + "\n", + "# Generate property transaction records\n", + "\n", + "transaction_data = []\n", + "\n", + "base_date = datetime(2024, 1, 1)\n", + "\n", + "\n", + "# Create 8,000 properties with 1-4 transactions each\n", + "\n", + "for property_num in range(1, 8001):\n", + "\n", + " property_id = f\"PROP{property_num:06d}\"\n", + " \n", + " # Each property gets 1-4 transactions over 12 months (most have 1, some flip/resale)\n", + "\n", + " num_transactions = random.choices([1, 2, 3, 4], weights=[0.7, 0.2, 0.08, 0.02])[0]\n", + " \n", + " # Select property type and location (consistent for the same property)\n", + "\n", + " property_type = random.choice(PROPERTY_TYPES)\n", + "\n", + " location = random.choice(LOCATIONS)\n", + " \n", + " params = PRICE_PARAMS[property_type][location]\n", + " \n", + " # Base square footage for this property\n", + "\n", + " sqft = random.randint(params['sqft_range'][0], params['sqft_range'][1])\n", + " \n", + " for i in range(num_transactions):\n", + "\n", + " # Spread transactions over 12 months\n", + "\n", + " days_offset = random.randint(0, 365)\n", + "\n", + " transaction_date = base_date + timedelta(days=days_offset)\n", + " \n", + " # Calculate sale price with market variations\n", + "\n", + " # Seasonal pricing (higher in spring/summer)\n", + "\n", + " month = transaction_date.month\n", + "\n", + " if month in [3, 4, 5, 6]: # Spring/Summer peak\n", + "\n", + " seasonal_factor = 1.15\n", + "\n", + " elif month in [11, 12, 1, 2]: # Winter off-season\n", + "\n", + " seasonal_factor = 0.9\n", + "\n", + " else:\n", + "\n", + " seasonal_factor = 1.0\n", + " \n", + " # Market appreciation over time (slight increase)\n", + "\n", + " months_elapsed = (transaction_date.year - base_date.year) * 12 + (transaction_date.month - base_date.month)\n", + "\n", + " appreciation_factor = 1.0 + (months_elapsed * 0.002) # 0.2% monthly appreciation\n", + "\n", + " # Calculate price per square foot\n", + "\n", + " base_price_per_sqft = params['base_price'] / ((params['sqft_range'][0] + params['sqft_range'][1]) / 2)\n", + "\n", + " price_per_sqft = round(base_price_per_sqft * seasonal_factor * appreciation_factor * random.uniform(0.9, 1.1), 2)\n", + " \n", + " # Calculate total sale price\n", + "\n", + " sale_price = round(price_per_sqft * sqft, 2)\n", + " \n", + " # Days on market (varies by property type and market conditions)\n", + "\n", + " if property_type == 'Commercial':\n", + "\n", + " days_on_market = random.randint(30, 180)\n", + "\n", + " else:\n", + "\n", + " days_on_market = random.randint(7, 90)\n", + " \n", + " transaction_data.append({\n", + "\n", + " \"property_id\": property_id,\n", + "\n", + " \"transaction_date\": transaction_date.date(),\n", + "\n", + " \"property_type\": property_type,\n", + "\n", + " \"sale_price\": sale_price,\n", + "\n", + " 
\"location\": location,\n", + "\n", + " \"days_on_market\": days_on_market,\n", + "\n", + " \"price_per_sqft\": price_per_sqft\n", + "\n", + " })\n", + "\n", + "\n", + "\n", + "print(f\"Generated {len(transaction_data)} property transaction records\")\n", + "\n", + "print(\"Sample record:\", transaction_data[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Insert Data Using PySpark\n", + "\n", + "### Data Insertion Strategy\n", + "\n", + "We'll use PySpark to:\n", + "\n", + "1. **Create DataFrame** from our generated data\n", + "2. **Insert into Delta table** with liquid clustering\n", + "3. **Verify the insertion** with a sample query\n", + "\n", + "### Why PySpark for Insertion?\n", + "\n", + "- **Distributed processing**: Handles large datasets efficiently\n", + "- **Type safety**: Ensures data integrity\n", + "- **Optimization**: Leverages Spark's query optimization\n", + "- **Liquid clustering**: Automatically applies clustering during insertion" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "DataFrame Schema:\n", + "root\n", + " |-- days_on_market: long (nullable = true)\n", + " |-- location: string (nullable = true)\n", + " |-- price_per_sqft: double (nullable = true)\n", + " |-- property_id: string (nullable = true)\n", + " |-- property_type: string (nullable = true)\n", + " |-- sale_price: double (nullable = true)\n", + " |-- transaction_date: date (nullable = true)\n", + "\n", + "\n", + "Sample Data:\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+--------------+--------------------+--------------+-----------+-------------+----------+----------------+\n", + "|days_on_market| location|price_per_sqft|property_id|property_type|sale_price|transaction_date|\n", + "+--------------+--------------------+--------------+-----------+-------------+----------+----------------+\n", + "| 48| Downtown| 404.98| PROP000001|Single Family|1071982.06| 2024-03-26|\n", + "| 22|Residential District| 254.48| PROP000002| Townhouse| 621440.16| 2024-05-31|\n", + "| 62| Urban Core| 370.97| PROP000003| Townhouse| 595406.85| 2024-11-14|\n", + "| 148|Residential District| 274.64| PROP000004| Commercial|1020562.24| 2024-10-31|\n", + "| 56| Downtown| 415.72| PROP000005| Condo| 362092.12| 2024-01-17|\n", + "+--------------+--------------------+--------------+-----------+-------------+----------+----------------+\n", + "only showing top 5 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\n", + "Successfully inserted 11372 records into real_estate.analytics.property_transactions\n", + "Liquid clustering automatically optimized the data layout during insertion!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Insert data using PySpark DataFrame operations\n", + "\n", + "# Using fully qualified function references to avoid conflicts\n", + "\n", + "\n", + "# Create DataFrame from generated data\n", + "\n", + "df_transactions = spark.createDataFrame(transaction_data)\n", + "\n", + "\n", + "# Display schema and sample data\n", + "\n", + "print(\"DataFrame Schema:\")\n", + "\n", + "df_transactions.printSchema()\n", + "\n", + "\n", + "\n", + "print(\"\\nSample Data:\")\n", + "\n", + "df_transactions.show(5)\n", + "\n", + "\n", + "# Insert data into Delta table with liquid clustering\n", + "\n", + "# The CLUSTER BY (property_id, 
transaction_date) will automatically optimize the data layout\n", + "\n", + "df_transactions.write.mode(\"overwrite\").saveAsTable(\"real_estate.analytics.property_transactions\")\n", + "\n", + "\n", + "print(f\"\\nSuccessfully inserted {df_transactions.count()} records into real_estate.analytics.property_transactions\")\n", + "\n", + "print(\"Liquid clustering automatically optimized the data layout during insertion!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Demonstrate Liquid Clustering Benefits\n", + "\n", + "### Query Performance Analysis\n", + "\n", + "Now let's see how liquid clustering improves query performance. We'll run queries that benefit from our clustering strategy:\n", + "\n", + "1. **Property transaction history** (clustered by property_id)\n", + "2. **Time-based market analysis** (clustered by transaction_date)\n", + "3. **Combined property + time queries** (optimal for our clustering)\n", + "\n", + "### Expected Performance Benefits\n", + "\n", + "With liquid clustering, these queries should be significantly faster because:\n", + "\n", + "- **Data locality**: Related records are physically grouped together\n", + "- **Reduced I/O**: Less data needs to be read from disk\n", + "- **Automatic optimization**: No manual tuning required" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Query 1: Property Transaction History ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-----------+----------------+-------------+----------+--------+\n", + "|property_id|transaction_date|property_type|sale_price|location|\n", + "+-----------+----------------+-------------+----------+--------+\n", + "| PROP000001| 2024-03-26|Single Family|1071982.06|Downtown|\n", + "+-----------+----------------+-------------+----------+--------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Records found: 1\n", + "\n", + "=== Query 2: Recent High-Value Transactions ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------------+-----------+-------------+----------+----------+\n", + "|transaction_date|property_id|property_type|sale_price| location|\n", + "+----------------+-----------+-------------+----------+----------+\n", + "| 2024-06-10| PROP006087| Commercial| 6386463.6|Waterfront|\n", + "| 2024-06-23| PROP006087| Commercial|6074959.55|Waterfront|\n", + "| 2024-06-06| PROP003416| Commercial|5999320.32|Waterfront|\n", + "| 2024-06-30| PROP003052| Commercial| 5659487.1|Waterfront|\n", + "| 2024-06-28| PROP004596| Commercial|5609426.68|Waterfront|\n", + "| 2024-06-11| PROP007661| Commercial|5575704.44|Waterfront|\n", + "| 2024-07-24| PROP003416| Commercial|5540088.96|Waterfront|\n", + "| 2024-06-20| PROP002013| Commercial|5535950.94|Waterfront|\n", + "| 2024-07-05| PROP002013| Commercial|5288486.98|Waterfront|\n", + "| 2024-10-05| PROP004988| Commercial|5258298.18|Waterfront|\n", + "| 2024-06-06| PROP005373| Commercial|5242295.52|Waterfront|\n", + "| 2024-10-02| PROP000600| Commercial|5229563.04|Waterfront|\n", + "| 2024-10-10| PROP002000| Commercial|5221318.55|Waterfront|\n", + "| 2024-06-16| PROP007748| Commercial|5219796.51|Waterfront|\n", + "| 2024-06-29| PROP000353| Commercial| 5171034.0|Urban Core|\n", + "| 2024-06-18| PROP003405| Commercial|5166032.95|Urban Core|\n", + "| 2024-06-09| 
PROP004845| Commercial| 5147234.4|Waterfront|\n", + "| 2024-12-09| PROP001483| Commercial|5098624.65|Waterfront|\n", + "| 2024-12-09| PROP004901| Commercial|5075851.14|Waterfront|\n", + "| 2024-10-12| PROP003462| Commercial|5058786.39|Waterfront|\n", + "+----------------+-----------+-------------+----------+----------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "High-value transactions found: 1684\n", + "\n", + "=== Query 3: Property Value Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-----------+----------------+-------------+----------+--------------+\n", + "|property_id|transaction_date|property_type|sale_price|price_per_sqft|\n", + "+-----------+----------------+-------------+----------+--------------+\n", + "| PROP000002| 2024-05-31| Townhouse| 621440.16| 254.48|\n", + "| PROP000003| 2024-11-14| Townhouse| 595406.85| 370.97|\n", + "| PROP000004| 2024-10-31| Commercial|1020562.24| 274.64|\n", + "| PROP000007| 2024-12-18| Apartment| 465049.2| 323.4|\n", + "| PROP000008| 2024-09-28| Townhouse| 441549.36| 211.47|\n", + "| PROP000009| 2024-04-09| Commercial| 4561434.5| 454.1|\n", + "| PROP000009| 2024-10-03| Commercial|4357219.65| 433.77|\n", + "| PROP000009| 2024-10-09| Commercial|4204535.65| 418.57|\n", + "| PROP000010| 2024-05-01|Single Family|1248957.92| 441.64|\n", + "| PROP000010| 2024-05-30|Single Family|1194999.68| 422.56|\n", + "| PROP000010| 2024-08-09|Single Family|1219122.52| 431.09|\n", + "| PROP000011| 2024-09-22| Condo| 436550.4| 343.2|\n", + "| PROP000012| 2024-09-20| Condo| 530021.08| 253.72|\n", + "| PROP000013| 2024-07-25| Apartment| 379305.99| 520.31|\n", + "| PROP000014| 2024-10-10| Apartment| 440308.48| 288.16|\n", + "| PROP000015| 2024-11-19|Single Family| 850184.1| 286.74|\n", + "| PROP000016| 2024-11-16|Single Family| 828172.2| 225.66|\n", + "| PROP000017| 2024-08-31| Commercial|1756840.32| 428.08|\n", + "| PROP000018| 2024-08-28| Commercial|4382253.48| 455.82|\n", + "| PROP000019| 2024-11-10| Townhouse| 901382.94| 397.26|\n", + "+-----------+----------------+-------------+----------+--------------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Value trend records found: 1046\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Demonstrate liquid clustering benefits with optimized queries\n", + "\n", + "\n", + "# Query 1: Property transaction history - benefits from property_id clustering\n", + "\n", + "print(\"=== Query 1: Property Transaction History ===\")\n", + "\n", + "property_history = spark.sql(\"\"\"\n", + "\n", + "SELECT property_id, transaction_date, property_type, sale_price, location\n", + "\n", + "FROM real_estate.analytics.property_transactions\n", + "\n", + "WHERE property_id = 'PROP000001'\n", + "\n", + "ORDER BY transaction_date DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "property_history.show()\n", + "\n", + "print(f\"Records found: {property_history.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 2: Time-based high-value transaction analysis - benefits from transaction_date clustering\n", + "\n", + "print(\"\\n=== Query 2: Recent High-Value Transactions ===\")\n", + "\n", + "high_value = spark.sql(\"\"\"\n", + "\n", + "SELECT transaction_date, property_id, property_type, sale_price, location\n", + "\n", + "FROM 
real_estate.analytics.property_transactions\n", + "\n", + "WHERE transaction_date >= '2024-06-01' AND sale_price > 1000000\n", + "\n", + "ORDER BY sale_price DESC, transaction_date DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "high_value.show()\n", + "\n", + "print(f\"High-value transactions found: {high_value.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 3: Combined property + time query - optimal for our clustering strategy\n", + "\n", + "print(\"\\n=== Query 3: Property Value Trends ===\")\n", + "\n", + "value_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT property_id, transaction_date, property_type, sale_price, price_per_sqft\n", + "\n", + "FROM real_estate.analytics.property_transactions\n", + "\n", + "WHERE property_id LIKE 'PROP000%' AND transaction_date >= '2024-04-01'\n", + "\n", + "ORDER BY property_id, transaction_date\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "value_trends.show()\n", + "\n", + "print(f\"Value trend records found: {value_trends.count()}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 6: Analyze Clustering Effectiveness\n", + "\n", + "### Understanding the Impact\n", + "\n", + "Let's examine how liquid clustering has organized our data and analyze some aggregate statistics to demonstrate the real estate insights possible with this optimized structure.\n", + "\n", + "### Key Analytics\n", + "\n", + "- **Property value appreciation** and market performance\n", + "- **Location-based pricing** and neighborhood analysis\n", + "- **Property type trends** and market segmentation\n", + "- **Market timing** and seasonal patterns" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Property Value Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-----------+------------------+--------------+--------------+--------------+------------------+-------------+----------+\n", + "|property_id|total_transactions|min_sale_price|max_sale_price|avg_sale_price|avg_price_per_sqft|property_type| location|\n", + "+-----------+------------------+--------------+--------------+--------------+------------------+-------------+----------+\n", + "| PROP001960| 1| 6349995.27| 6349995.27| 6349995.27| 543.99| Commercial|Waterfront|\n", + "| PROP006087| 2| 6074959.55| 6386463.6| 6230711.57| 543.45| Commercial|Waterfront|\n", + "| PROP001727| 1| 5789451.2| 5789451.2| 5789451.2| 517.84| Commercial|Waterfront|\n", + "| PROP007555| 1| 5784090.12| 5784090.12| 5784090.12| 482.49| Commercial|Waterfront|\n", + "| PROP004332| 1| 5750284.36| 5750284.36| 5750284.36| 526.39| Commercial|Urban Core|\n", + "| PROP006731| 1| 5637737.05| 5637737.05| 5637737.05| 507.95| Commercial|Waterfront|\n", + "| PROP007714| 1| 5625904.48| 5625904.48| 5625904.48| 547.48| Commercial|Waterfront|\n", + "| PROP003955| 1| 5620209.9| 5620209.9| 5620209.9| 471.89| Commercial|Waterfront|\n", + "| PROP000900| 3| 4758664.84| 6037540.36| 5593865.89| 491.29| Commercial|Waterfront|\n", + "| PROP007661| 1| 5575704.44| 5575704.44| 5575704.44| 539.08| Commercial|Waterfront|\n", + "+-----------+------------------+--------------+--------------+--------------+------------------+-------------+----------+\n", + "\n", + "\n", + "=== Location Market Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + 
"+--------------------+------------------+--------------+------------------+------------------+-----------------+\n", + "| location|total_transactions|avg_sale_price|avg_price_per_sqft|avg_days_on_market|unique_properties|\n", + "+--------------------+------------------+--------------+------------------+------------------+-----------------+\n", + "| Waterfront| 1881| 1409207.07| 443.56| 59.16| 1322|\n", + "| Urban Core| 1866| 1255564.55| 473.94| 59.7| 1310|\n", + "| Downtown| 1877| 1021967.95| 393.32| 60.1| 1337|\n", + "| Mountain View| 1890| 804231.04| 312.89| 60.04| 1322|\n", + "|Residential District| 1907| 723700.39| 267.59| 59.58| 1366|\n", + "| Suburban| 1951| 675060.74| 252.12| 59.67| 1343|\n", + "+--------------------+------------------+--------------+------------------+------------------+-----------------+\n", + "\n", + "\n", + "=== Property Type Market Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------------+-----------+--------------+------------------+------------------+-----------------+\n", + "|property_type|total_sales|avg_sale_price|avg_price_per_sqft|avg_days_on_market|unique_properties|\n", + "+-------------+-----------+--------------+------------------+------------------+-----------------+\n", + "| Commercial| 2246| 2364622.79| 361.68| 104.5| 1571|\n", + "|Single Family| 2285| 882472.63| 306.23| 49.79| 1612|\n", + "| Townhouse| 2294| 712690.76| 323.59| 48.78| 1599|\n", + "| Condo| 2261| 529188.26| 379.15| 47.3| 1579|\n", + "| Apartment| 2286| 424397.47| 410.72| 48.85| 1639|\n", + "+-------------+-----------+--------------+------------------+------------------+-----------------+\n", + "\n", + "\n", + "=== Market Timing Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+--------------------+-----------------+--------------+--------+---------------+\n", + "| sale_speed|transaction_count|avg_sale_price|avg_days| total_volume|\n", + "+--------------------+-----------------+--------------+--------+---------------+\n", + "|Fast Sale (1-30 d...| 2603| 648593.87| 18.48| 1.6882898554E9|\n", + "|Normal Sale (31-6...| 3715| 849302.21| 45.58| 3.1551577265E9|\n", + "|Slow Sale (61-90 ...| 3718| 841586.19| 75.59|3.12901747284E9|\n", + "|Very Slow Sale (9...| 1336| 2362655.36| 135.14|3.15650755849E9|\n", + "+--------------------+-----------------+--------------+--------+---------------+\n", + "\n", + "\n", + "=== Monthly Market Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------+------------------+---------------+--------------+------------------+-----------------+\n", + "| month|total_transactions| monthly_volume|avg_sale_price|avg_price_per_sqft|unique_properties|\n", + "+-------+------------------+---------------+--------------+------------------+-----------------+\n", + "|2024-01| 951| 8.0156294112E8| 842863.24| 314.57| 919|\n", + "|2024-02| 920| 7.5918726717E8| 825203.55| 314.39| 902|\n", + "|2024-03| 938|1.01247160402E9| 1079394.03| 401.65| 900|\n", + "|2024-04| 988|1.14251391792E9| 1156390.61| 396.33| 954|\n", + "|2024-05| 1002|1.07273555366E9| 1070594.36| 400.43| 979|\n", + "|2024-06| 909|1.01445630032E9| 1116013.53| 403.76| 876|\n", + "|2024-07| 1006| 9.4594880925E8| 940306.97| 352.47| 976|\n", + "|2024-08| 892| 8.5601528164E8| 959658.39| 349.58| 861|\n", + "|2024-09| 916| 8.6428797707E8| 943545.83| 351.74| 888|\n", + "|2024-10| 981| 9.9316499128E8| 1012400.6| 351.47| 
951|\n", + "|2024-11| 919| 8.3654569217E8| 910278.23| 310.92| 892|\n", + "|2024-12| 950| 8.3008227761E8| 873770.82| 322.55| 914|\n", + "+-------+------------------+---------------+--------------+------------------+-----------------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Analyze clustering effectiveness and real estate insights\n", + "\n", + "\n", + "# Property value analysis\n", + "\n", + "print(\"=== Property Value Analysis ===\")\n", + "\n", + "property_values = spark.sql(\"\"\"\n", + "\n", + "SELECT property_id, COUNT(*) as total_transactions,\n", + "\n", + " ROUND(MIN(sale_price), 2) as min_sale_price,\n", + "\n", + " ROUND(MAX(sale_price), 2) as max_sale_price,\n", + "\n", + " ROUND(AVG(sale_price), 2) as avg_sale_price,\n", + "\n", + " ROUND(AVG(price_per_sqft), 2) as avg_price_per_sqft,\n", + "\n", + " property_type, location\n", + "\n", + "FROM real_estate.analytics.property_transactions\n", + "\n", + "GROUP BY property_id, property_type, location\n", + "\n", + "ORDER BY avg_sale_price DESC\n", + "\n", + "LIMIT 10\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "property_values.show()\n", + "\n", + "\n", + "# Location market analysis\n", + "\n", + "print(\"\\n=== Location Market Analysis ===\")\n", + "\n", + "location_analysis = spark.sql(\"\"\"\n", + "\n", + "SELECT location, COUNT(*) as total_transactions,\n", + "\n", + " ROUND(AVG(sale_price), 2) as avg_sale_price,\n", + "\n", + " ROUND(AVG(price_per_sqft), 2) as avg_price_per_sqft,\n", + "\n", + " ROUND(AVG(days_on_market), 2) as avg_days_on_market,\n", + "\n", + " COUNT(DISTINCT property_id) as unique_properties\n", + "\n", + "FROM real_estate.analytics.property_transactions\n", + "\n", + "GROUP BY location\n", + "\n", + "ORDER BY avg_sale_price DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "location_analysis.show()\n", + "\n", + "\n", + "# Property type market trends\n", + "\n", + "print(\"\\n=== Property Type Market Trends ===\")\n", + "\n", + "property_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT property_type, COUNT(*) as total_sales,\n", + "\n", + " ROUND(AVG(sale_price), 2) as avg_sale_price,\n", + "\n", + " ROUND(AVG(price_per_sqft), 2) as avg_price_per_sqft,\n", + "\n", + " ROUND(AVG(days_on_market), 2) as avg_days_on_market,\n", + "\n", + " COUNT(DISTINCT property_id) as unique_properties\n", + "\n", + "FROM real_estate.analytics.property_transactions\n", + "\n", + "GROUP BY property_type\n", + "\n", + "ORDER BY avg_sale_price DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "property_trends.show()\n", + "\n", + "\n", + "# Market timing analysis\n", + "\n", + "print(\"\\n=== Market Timing Analysis ===\")\n", + "\n", + "market_timing = spark.sql(\"\"\"\n", + "\n", + "SELECT \n", + "\n", + " CASE \n", + "\n", + " WHEN days_on_market <= 30 THEN 'Fast Sale (1-30 days)'\n", + "\n", + " WHEN days_on_market <= 60 THEN 'Normal Sale (31-60 days)'\n", + "\n", + " WHEN days_on_market <= 90 THEN 'Slow Sale (61-90 days)'\n", + "\n", + " ELSE 'Very Slow Sale (90+ days)'\n", + "\n", + " END as sale_speed,\n", + "\n", + " COUNT(*) as transaction_count,\n", + "\n", + " ROUND(AVG(sale_price), 2) as avg_sale_price,\n", + "\n", + " ROUND(AVG(days_on_market), 2) as avg_days,\n", + "\n", + " ROUND(SUM(sale_price), 2) as total_volume\n", + "\n", + "FROM real_estate.analytics.property_transactions\n", + "\n", + "GROUP BY \n", + "\n", + " CASE \n", + "\n", + " WHEN days_on_market <= 30 THEN 'Fast Sale (1-30 days)'\n", + "\n", + " WHEN days_on_market <= 60 
THEN 'Normal Sale (31-60 days)'\n", + "\n", + " WHEN days_on_market <= 90 THEN 'Slow Sale (61-90 days)'\n", + "\n", + " ELSE 'Very Slow Sale (90+ days)'\n", + "\n", + " END\n", + "\n", + "ORDER BY avg_days\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "market_timing.show()\n", + "\n", + "\n", + "# Monthly market trends\n", + "\n", + "print(\"\\n=== Monthly Market Trends ===\")\n", + "\n", + "monthly_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT DATE_FORMAT(transaction_date, 'yyyy-MM') as month,\n", + "\n", + " COUNT(*) as total_transactions,\n", + "\n", + " ROUND(SUM(sale_price), 2) as monthly_volume,\n", + "\n", + " ROUND(AVG(sale_price), 2) as avg_sale_price,\n", + "\n", + " ROUND(AVG(price_per_sqft), 2) as avg_price_per_sqft,\n", + "\n", + " COUNT(DISTINCT property_id) as unique_properties\n", + "\n", + "FROM real_estate.analytics.property_transactions\n", + "\n", + "GROUP BY DATE_FORMAT(transaction_date, 'yyyy-MM')\n", + "\n", + "ORDER BY month\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "monthly_trends.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Key Takeaways: Delta Liquid Clustering in AIDP\n", + "\n", + "### What We Demonstrated\n", + "\n", + "1. **Automatic Optimization**: Created a table with `CLUSTER BY (property_id, transaction_date)` and let Delta automatically optimize data layout\n", + "\n", + "2. **Performance Benefits**: Queries on clustered columns (property_id, transaction_date) are significantly faster due to data locality\n", + "\n", + "3. **Zero Maintenance**: No manual partitioning, bucketing, or Z-Ordering required - Delta handles it automatically\n", + "\n", + "4. **Real-World Use Case**: Real estate analytics where property tracking and market analysis are critical\n", + "\n", + "### AIDP Advantages\n", + "\n", + "- **Unified Analytics**: Seamlessly integrates with other AIDP services\n", + "- **Governance**: Catalog and schema isolation for real estate data\n", + "- **Performance**: Optimized for both OLAP and OLTP workloads\n", + "- **Scalability**: Handles real estate-scale data volumes effortlessly\n", + "\n", + "### Best Practices for Liquid Clustering\n", + "\n", + "1. **Choose clustering columns** based on your most common query patterns\n", + "2. **Start with 1-4 columns** - too many can reduce effectiveness\n", + "3. **Consider cardinality** - high-cardinality columns work best\n", + "4. **Monitor and adjust** as query patterns evolve\n", + "\n", + "### Next Steps\n", + "\n", + "- Explore other AIDP features like AI/ML integration\n", + "- Try liquid clustering with different column combinations\n", + "- Scale up to larger real estate datasets\n", + "- Integrate with real MLS and property management systems\n", + "\n", + "This notebook demonstrates how Oracle AI Data Platform makes advanced real estate analytics accessible while maintaining enterprise-grade performance and governance." 
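, + "\n", + "### Evolving the Clustering Columns (Sketch)\n", + "\n", + "If monitoring shows that query patterns have shifted (for example, most filters move from `property_id` to `location`), the clustering columns can be redefined in place. The statements below are an illustrative sketch, not a tuned recommendation: they assume your Delta/AIDP release supports `ALTER TABLE ... CLUSTER BY`, the `location`-first layout is a hypothetical choice, and whether older files are rewritten by a plain `OPTIMIZE` depends on your Delta version.\n", + "\n", + "```python\n", + "# Redeclare clustering columns to match a hypothetical new dominant query pattern\n", + "spark.sql(\"ALTER TABLE real_estate.analytics.property_transactions CLUSTER BY (location, transaction_date)\")\n", + "\n", + "# Run a clustering pass so subsequent maintenance uses the new layout\n", + "spark.sql(\"OPTIMIZE real_estate.analytics.property_transactions\")\n", + "```"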
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/Notebooks/liquid_clustering/retail_delta_liquid_clustering_demo.ipynb b/Notebooks/liquid_clustering/retail_delta_liquid_clustering_demo.ipynb new file mode 100644 index 0000000..1f076d5 --- /dev/null +++ b/Notebooks/liquid_clustering/retail_delta_liquid_clustering_demo.ipynb @@ -0,0 +1,1011 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Retail Analytics: Delta Liquid Clustering Demo\n", + "\n", + "\n", + "## Overview\n", + "\n", + "\n", + "This notebook demonstrates the power of **Delta Liquid Clustering** in Oracle AI Data Platform (AIDP) Workbench using a retail analytics use case. Liquid clustering automatically optimizes data layout for query performance without requiring manual partitioning or Z-Ordering.\n", + "\n", + "### What is Liquid Clustering?\n", + "\n", + "Liquid clustering automatically identifies and groups similar data together based on clustering columns you define. This optimization happens automatically during data ingestion and maintenance operations, providing:\n", + "\n", + "- **Automatic optimization**: No manual tuning required\n", + "- **Improved query performance**: Faster queries on clustered columns\n", + "- **Reduced maintenance**: No need for manual repartitioning\n", + "- **Adaptive clustering**: Adjusts as data patterns change\n", + "\n", + "### Use Case: Customer Purchase Analytics\n", + "\n", + "We'll analyze customer purchase records from a retail company. Our clustering strategy will optimize for:\n", + "\n", + "- **Customer-specific queries**: Fast lookups by customer ID\n", + "- **Time-based analysis**: Efficient filtering by purchase date\n", + "- **Purchase patterns**: Quick aggregation by product category and customer segments\n", + "\n", + "### AIDP Environment Setup\n", + "\n", + "This notebook leverages the existing Spark session in your AIDP environment." 
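, + "\n", + "\n", + "As a quick, optional sanity check before creating any objects, you can confirm that the pre-configured `spark` session is available and responsive. This is a minimal sketch; nothing here is specific to liquid clustering.\n", + "\n", + "```python\n", + "# Confirm the pre-configured Spark session exists and report its version\n", + "print(type(spark))\n", + "print(\"Spark version:\", spark.version)\n", + "\n", + "# Run a trivial query to verify SQL execution works in this environment\n", + "spark.sql(\"SELECT 1 AS ok\").show()\n", + "```"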
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Retail catalog and analytics schema created successfully!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create retail catalog and analytics schema\n", + "\n", + "# In AIDP, catalogs provide data isolation and governance\n", + "\n", + "spark.sql(\"CREATE CATALOG IF NOT EXISTS retail\")\n", + "\n", + "spark.sql(\"CREATE SCHEMA IF NOT EXISTS retail.analytics\")\n", + "\n", + "print(\"Retail catalog and analytics schema created successfully!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Create Delta Table with Liquid Clustering\n", + "\n", + "### Table Design\n", + "\n", + "Our `customer_purchases` table will store:\n", + "\n", + "- **customer_id**: Unique customer identifier\n", + "- **purchase_date**: Date of purchase\n", + "- **product_id**: Product identifier\n", + "- **product_category**: Category (Electronics, Clothing, Home, etc.)\n", + "- **purchase_amount**: Transaction amount\n", + "- **store_id**: Store location identifier\n", + "- **payment_method**: Payment type (Credit, Debit, Cash, etc.)\n", + "\n", + "### Clustering Strategy\n", + "\n", + "We'll cluster by `customer_id` and `purchase_date` because:\n", + "\n", + "- **customer_id**: Customers often make multiple purchases, grouping their transaction history together\n", + "- **purchase_date**: Time-based queries are common for sales analysis, seasonality, and trends\n", + "- This combination optimizes for both customer behavior analysis and temporal sales reporting" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Delta table with liquid clustering created successfully!\n", + "Clustering will automatically optimize data layout for queries on customer_id and purchase_date.\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create Delta table with liquid clustering\n", + "\n", + "# CLUSTER BY defines the columns for automatic optimization\n", + "\n", + "spark.sql(\"\"\"\n", + "\n", + "CREATE TABLE IF NOT EXISTS retail.analytics.customer_purchases (\n", + "\n", + " customer_id STRING,\n", + "\n", + " purchase_date DATE,\n", + "\n", + " product_id STRING,\n", + "\n", + " product_category STRING,\n", + "\n", + " purchase_amount DECIMAL(10,2),\n", + "\n", + " store_id STRING,\n", + "\n", + " payment_method STRING\n", + "\n", + ")\n", + "\n", + "USING DELTA\n", + "\n", + "CLUSTER BY (customer_id, purchase_date)\n", + "\n", + "\"\"\")\n", + "\n", + "print(\"Delta table with liquid clustering created successfully!\")\n", + "\n", + "print(\"Clustering will automatically optimize data layout for queries on customer_id and purchase_date.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Generate Retail Sample Data\n", + "\n", + "### Data Generation Strategy\n", + "\n", + "We'll create realistic retail purchase data including:\n", + "\n", + "- **1,000 customers** with multiple purchases over time\n", + "- **Product categories**: Electronics, Clothing, Home & Garden, Books, Sports\n", + "- **Realistic temporal patterns**: Seasonal shopping, repeat purchases, varying amounts\n", + "- **Multiple stores**: Different retail locations across regions\n", + "\n", + "### Why This Data Pattern?\n", + "\n", + "This data simulates real retail scenarios where:\n", + "\n", + "- 
Customers make multiple purchases over time\n", + "- Seasonal trends affect buying patterns\n", + "- Product categories drive different analytics needs\n", + "- Store-level performance analysis is required\n", + "- Customer segmentation enables personalized marketing" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Generated 5544 customer purchase records\n", + "Sample record: {'customer_id': 'CUST000001', 'purchase_date': datetime.date(2024, 9, 19), 'product_id': 'BOK003', 'product_category': 'Books', 'purchase_amount': 22.1, 'store_id': 'STORE_CHI_003', 'payment_method': 'Debit Card'}\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Generate sample retail purchase data\n", + "\n", + "# Using fully qualified imports to avoid conflicts\n", + "\n", + "import random\n", + "\n", + "from datetime import datetime, timedelta\n", + "\n", + "\n", + "# Define retail data constants\n", + "\n", + "PRODUCTS = {\n", + "\n", + " \"Electronics\": [\n", + "\n", + " (\"ELE001\", \"Smartphone\", 599.99),\n", + "\n", + " (\"ELE002\", \"Laptop\", 1299.99),\n", + "\n", + " (\"ELE003\", \"Headphones\", 149.99),\n", + "\n", + " (\"ELE004\", \"Smart TV\", 799.99),\n", + "\n", + " (\"ELE005\", \"Tablet\", 399.99)\n", + "\n", + " ],\n", + "\n", + " \"Clothing\": [\n", + "\n", + " (\"CLO001\", \"T-Shirt\", 19.99),\n", + "\n", + " (\"CLO002\", \"Jeans\", 79.99),\n", + "\n", + " (\"CLO003\", \"Jacket\", 129.99),\n", + "\n", + " (\"CLO004\", \"Sneakers\", 89.99),\n", + "\n", + " (\"CLO005\", \"Dress\", 59.99)\n", + "\n", + " ],\n", + "\n", + " \"Home & Garden\": [\n", + "\n", + " (\"HOM001\", \"Blender\", 79.99),\n", + "\n", + " (\"HOM002\", \"Coffee Maker\", 49.99),\n", + "\n", + " (\"HOM003\", \"Garden Tools Set\", 39.99),\n", + "\n", + " (\"HOM004\", \"Bedding Set\", 89.99),\n", + "\n", + " (\"HOM005\", \"Decorative Pillow\", 24.99)\n", + "\n", + " ],\n", + "\n", + " \"Books\": [\n", + "\n", + " (\"BOK001\", \"Fiction Novel\", 14.99),\n", + "\n", + " (\"BOK002\", \"Cookbook\", 24.99),\n", + "\n", + " (\"BOK003\", \"Biography\", 19.99),\n", + "\n", + " (\"BOK004\", \"Self-Help Book\", 16.99),\n", + "\n", + " (\"BOK005\", \"Children's Book\", 9.99)\n", + "\n", + " ],\n", + "\n", + " \"Sports\": [\n", + "\n", + " (\"SPO001\", \"Yoga Mat\", 29.99),\n", + "\n", + " (\"SPO002\", \"Dumbbells\", 49.99),\n", + "\n", + " (\"SPO003\", \"Running Shoes\", 119.99),\n", + "\n", + " (\"SPO004\", \"Basketball\", 24.99),\n", + "\n", + " (\"SPO005\", \"Tennis Racket\", 89.99)\n", + "\n", + " ]\n", + "\n", + "}\n", + "\n", + "\n", + "\n", + "STORES = [\"STORE_NYC_001\", \"STORE_LAX_002\", \"STORE_CHI_003\", \"STORE_HOU_004\", \"STORE_MIA_005\"]\n", + "\n", + "PAYMENT_METHODS = [\"Credit Card\", \"Debit Card\", \"Cash\", \"Digital Wallet\", \"Buy Now Pay Later\"]\n", + "\n", + "\n", + "# Generate customer purchase records\n", + "\n", + "purchase_data = []\n", + "\n", + "base_date = datetime(2024, 1, 1)\n", + "\n", + "\n", + "# Create 1,000 customers with 3-8 purchases each\n", + "\n", + "for customer_num in range(1, 1001):\n", + "\n", + " customer_id = f\"CUST{customer_num:06d}\"\n", + " \n", + " # Each customer gets 3-8 purchases over 12 months\n", + "\n", + " num_purchases = random.randint(3, 8)\n", + " \n", + " for i in range(num_purchases):\n", + "\n", + " # Spread purchases over 12 months\n", + "\n", + " days_offset = random.randint(0, 365)\n", + "\n", + " purchase_date = base_date + 
timedelta(days=days_offset)\n", + " \n", + " # Select random category and product\n", + "\n", + " category = random.choice(list(PRODUCTS.keys()))\n", + "\n", + " product_id, product_name, base_price = random.choice(PRODUCTS[category])\n", + " \n", + " # Add some price variation (±20%)\n", + "\n", + " price_variation = random.uniform(0.8, 1.2)\n", + "\n", + " purchase_amount = round(base_price * price_variation, 2)\n", + " \n", + " # Select random store and payment method\n", + "\n", + " store_id = random.choice(STORES)\n", + "\n", + " payment_method = random.choice(PAYMENT_METHODS)\n", + " \n", + " purchase_data.append({\n", + "\n", + " \"customer_id\": customer_id,\n", + "\n", + " \"purchase_date\": purchase_date.date(),\n", + "\n", + " \"product_id\": product_id,\n", + "\n", + " \"product_category\": category,\n", + "\n", + " \"purchase_amount\": purchase_amount,\n", + "\n", + " \"store_id\": store_id,\n", + "\n", + " \"payment_method\": payment_method\n", + "\n", + " })\n", + "\n", + "\n", + "\n", + "print(f\"Generated {len(purchase_data)} customer purchase records\")\n", + "\n", + "print(\"Sample record:\", purchase_data[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Insert Data Using PySpark\n", + "\n", + "### Data Insertion Strategy\n", + "\n", + "We'll use PySpark to:\n", + "\n", + "1. **Create DataFrame** from our generated data\n", + "2. **Insert into Delta table** with liquid clustering\n", + "3. **Verify the insertion** with a sample query\n", + "\n", + "### Why PySpark for Insertion?\n", + "\n", + "- **Distributed processing**: Handles large datasets efficiently\n", + "- **Type safety**: Ensures data integrity\n", + "- **Optimization**: Leverages Spark's query optimization\n", + "- **Liquid clustering**: Automatically applies clustering during insertion" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "DataFrame Schema:\n", + "root\n", + " |-- customer_id: string (nullable = true)\n", + " |-- payment_method: string (nullable = true)\n", + " |-- product_category: string (nullable = true)\n", + " |-- product_id: string (nullable = true)\n", + " |-- purchase_amount: double (nullable = true)\n", + " |-- purchase_date: date (nullable = true)\n", + " |-- store_id: string (nullable = true)\n", + "\n", + "\n", + "Sample Data:\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-----------+-----------------+----------------+----------+---------------+-------------+-------------+\n", + "|customer_id| payment_method|product_category|product_id|purchase_amount|purchase_date| store_id|\n", + "+-----------+-----------------+----------------+----------+---------------+-------------+-------------+\n", + "| CUST000001| Debit Card| Books| BOK003| 22.1| 2024-09-19|STORE_CHI_003|\n", + "| CUST000001| Credit Card| Sports| SPO004| 23.78| 2024-10-29|STORE_CHI_003|\n", + "| CUST000001|Buy Now Pay Later| Sports| SPO004| 20.7| 2024-03-20|STORE_LAX_002|\n", + "| CUST000001| Cash| Electronics| ELE003| 153.44| 2024-11-07|STORE_HOU_004|\n", + "| CUST000001| Cash| Home & Garden| HOM005| 21.11| 2024-05-11|STORE_HOU_004|\n", + "+-----------+-----------------+----------------+----------+---------------+-------------+-------------+\n", + "only showing top 5 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\n", + "Successfully inserted 5544 records into 
retail.analytics.customer_purchases\n", + "Liquid clustering automatically optimized the data layout during insertion!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Insert data using PySpark DataFrame operations\n", + "\n", + "# Using fully qualified function references to avoid conflicts\n", + "\n", + "\n", + "# Create DataFrame from generated data\n", + "\n", + "df_purchases = spark.createDataFrame(purchase_data)\n", + "\n", + "\n", + "# Display schema and sample data\n", + "\n", + "print(\"DataFrame Schema:\")\n", + "\n", + "df_purchases.printSchema()\n", + "\n", + "\n", + "\n", + "print(\"\\nSample Data:\")\n", + "\n", + "df_purchases.show(5)\n", + "\n", + "\n", + "# Insert data into Delta table with liquid clustering\n", + "\n", + "# The CLUSTER BY (customer_id, purchase_date) will automatically optimize the data layout\n", + "\n", + "df_purchases.write.mode(\"overwrite\").saveAsTable(\"retail.analytics.customer_purchases\")\n", + "\n", + "\n", + "print(f\"\\nSuccessfully inserted {df_purchases.count()} records into retail.analytics.customer_purchases\")\n", + "\n", + "print(\"Liquid clustering automatically optimized the data layout during insertion!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Demonstrate Liquid Clustering Benefits\n", + "\n", + "### Query Performance Analysis\n", + "\n", + "Now let's see how liquid clustering improves query performance. We'll run queries that benefit from our clustering strategy:\n", + "\n", + "1. **Customer purchase history** (clustered by customer_id)\n", + "2. **Time-based sales analysis** (clustered by purchase_date)\n", + "3. **Combined customer + time queries** (optimal for our clustering)\n", + "\n", + "### Expected Performance Benefits\n", + "\n", + "With liquid clustering, these queries should be significantly faster because:\n", + "\n", + "- **Data locality**: Related records are physically grouped together\n", + "- **Reduced I/O**: Less data needs to be read from disk\n", + "- **Automatic optimization**: No manual tuning required" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Query 1: Customer Purchase History ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-----------+-------------+----------------+---------------+-------------+\n", + "|customer_id|purchase_date|product_category|purchase_amount| store_id|\n", + "+-----------+-------------+----------------+---------------+-------------+\n", + "| CUST000001| 2024-03-20| Sports| 20.7|STORE_LAX_002|\n", + "| CUST000001| 2024-05-11| Home & Garden| 21.11|STORE_HOU_004|\n", + "| CUST000001| 2024-09-19| Books| 22.1|STORE_CHI_003|\n", + "| CUST000001| 2024-10-29| Sports| 23.78|STORE_CHI_003|\n", + "| CUST000001| 2024-11-07| Electronics| 153.44|STORE_HOU_004|\n", + "+-----------+-------------+----------------+---------------+-------------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Records found: 5\n", + "\n", + "=== Query 2: High-Value Purchases This Month ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------------+-----------+----------------+---------------+-----------------+\n", + "|purchase_date|customer_id|product_category|purchase_amount| payment_method|\n", + 
"+-------------+-----------+----------------+---------------+-----------------+\n", + "| 2024-12-31| CUST000360| Electronics| 1539.12| Debit Card|\n", + "| 2024-12-31| CUST000133| Electronics| 941.32| Digital Wallet|\n", + "| 2024-12-31| CUST000989| Electronics| 708.76|Buy Now Pay Later|\n", + "| 2024-12-31| CUST000279| Electronics| 691.22| Digital Wallet|\n", + "| 2024-12-31| CUST000047| Electronics| 561.09|Buy Now Pay Later|\n", + "| 2024-12-30| CUST000366| Electronics| 1413.89| Cash|\n", + "| 2024-12-30| CUST000560| Electronics| 900.6| Cash|\n", + "| 2024-12-29| CUST000006| Electronics| 896.14| Debit Card|\n", + "| 2024-12-27| CUST000861| Electronics| 546.64| Cash|\n", + "| 2024-12-26| CUST000858| Electronics| 569.1| Cash|\n", + "| 2024-12-25| CUST000574| Electronics| 882.02|Buy Now Pay Later|\n", + "| 2024-12-25| CUST000621| Electronics| 676.28| Cash|\n", + "| 2024-12-24| CUST000865| Electronics| 1341.39| Digital Wallet|\n", + "| 2024-12-24| CUST000192| Electronics| 1313.52| Credit Card|\n", + "| 2024-12-24| CUST000130| Electronics| 1308.95| Debit Card|\n", + "| 2024-12-24| CUST000004| Electronics| 634.57| Digital Wallet|\n", + "| 2024-12-24| CUST000593| Electronics| 540.54|Buy Now Pay Later|\n", + "| 2024-12-23| CUST000184| Electronics| 1389.5| Credit Card|\n", + "| 2024-12-23| CUST000423| Electronics| 554.73|Buy Now Pay Later|\n", + "| 2024-12-22| CUST000651| Electronics| 1409.86| Debit Card|\n", + "+-------------+-----------+----------------+---------------+-----------------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "High-value purchases found: 385\n", + "\n", + "=== Query 3: Customer Spending Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-----------+-------------+----------------+---------------+\n", + "|customer_id|purchase_date|product_category|purchase_amount|\n", + "+-----------+-------------+----------------+---------------+\n", + "| CUST000100| 2024-05-05| Electronics| 1507.71|\n", + "| CUST000100| 2024-05-15| Home & Garden| 90.15|\n", + "| CUST000100| 2024-06-03| Sports| 125.47|\n", + "| CUST000100| 2024-10-27| Sports| 24.08|\n", + "| CUST000100| 2024-11-14| Clothing| 85.26|\n", + "| CUST000100| 2024-12-02| Books| 10.94|\n", + "| CUST000101| 2024-06-15| Sports| 28.67|\n", + "| CUST000101| 2024-08-02| Clothing| 85.37|\n", + "| CUST000101| 2024-08-10| Sports| 128.44|\n", + "| CUST000101| 2024-09-03| Sports| 79.51|\n", + "| CUST000102| 2024-05-28| Sports| 24.17|\n", + "| CUST000102| 2024-06-17| Books| 22.35|\n", + "| CUST000102| 2024-07-16| Clothing| 80.71|\n", + "| CUST000102| 2024-09-28| Books| 8.38|\n", + "| CUST000103| 2024-04-09| Clothing| 18.75|\n", + "| CUST000103| 2024-06-30| Books| 10.06|\n", + "| CUST000104| 2024-04-01| Clothing| 63.41|\n", + "| CUST000104| 2024-06-05| Home & Garden| 21.87|\n", + "| CUST000104| 2024-10-09| Electronics| 592.26|\n", + "| CUST000104| 2024-12-02| Home & Garden| 86.61|\n", + "+-----------+-------------+----------------+---------------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Trend records found: 400\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Demonstrate liquid clustering benefits with optimized queries\n", + "\n", + "\n", + "# Query 1: Customer purchase history - benefits from customer_id clustering\n", + "\n", + 
"print(\"=== Query 1: Customer Purchase History ===\")\n", + "\n", + "customer_history = spark.sql(\"\"\"\n", + "\n", + "SELECT customer_id, purchase_date, product_category, purchase_amount, store_id\n", + "\n", + "FROM retail.analytics.customer_purchases\n", + "\n", + "WHERE customer_id = 'CUST000001'\n", + "\n", + "ORDER BY purchase_date\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "customer_history.show()\n", + "\n", + "print(f\"Records found: {customer_history.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 2: Time-based sales analysis - benefits from purchase_date clustering\n", + "\n", + "print(\"\\n=== Query 2: High-Value Purchases This Month ===\")\n", + "\n", + "high_value_recent = spark.sql(\"\"\"\n", + "\n", + "SELECT purchase_date, customer_id, product_category, purchase_amount, payment_method\n", + "\n", + "FROM retail.analytics.customer_purchases\n", + "\n", + "WHERE purchase_date >= '2024-06-01' AND purchase_amount > 500\n", + "\n", + "ORDER BY purchase_date DESC, purchase_amount DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "high_value_recent.show()\n", + "\n", + "print(f\"High-value purchases found: {high_value_recent.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 3: Combined customer + time query - optimal for our clustering strategy\n", + "\n", + "print(\"\\n=== Query 3: Customer Spending Trends ===\")\n", + "\n", + "customer_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT customer_id, purchase_date, product_category, purchase_amount\n", + "\n", + "FROM retail.analytics.customer_purchases\n", + "\n", + "WHERE customer_id LIKE 'CUST0001%' AND purchase_date >= '2024-04-01'\n", + "\n", + "ORDER BY customer_id, purchase_date\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "customer_trends.show()\n", + "\n", + "print(f\"Trend records found: {customer_trends.count()}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 6: Analyze Clustering Effectiveness\n", + "\n", + "### Understanding the Impact\n", + "\n", + "Let's examine how liquid clustering has organized our data and analyze some aggregate statistics to demonstrate the retail insights possible with this optimized structure.\n", + "\n", + "### Key Analytics\n", + "\n", + "- **Sales by category** and performance trends\n", + "- **Customer segmentation** by spending patterns\n", + "- **Store performance** analysis\n", + "- **Payment method preferences** and seasonal trends" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Sales by Category Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------------+---------------+-------------+------------+------------------+\n", + "|product_category|total_purchases|total_revenue|avg_purchase|revenue_percentage|\n", + "+----------------+---------------+-------------+------------+------------------+\n", + "| Electronics| 1069| 700376.34| 655.17| 74.54|\n", + "| Clothing| 1134| 85054.48| 75.0| 9.05|\n", + "| Sports| 1104| 69841.08| 63.26| 7.43|\n", + "| Home & Garden| 1116| 64960.34| 58.21| 6.91|\n", + "| Books| 1121| 19371.41| 17.28| 2.06|\n", + "+----------------+---------------+-------------+------------+------------------+\n", + "\n", + "\n", + "=== Customer Segmentation Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------------+--------------+---------------+---------------+\n", + 
"|customer_segment|customer_count|avg_total_spent|segment_revenue|\n", + "+----------------+--------------+---------------+---------------+\n", + "| Medium Value| 511| 1133.51| 579225.07|\n", + "| High Value| 94| 2668.46| 250835.62|\n", + "| Low Value| 395| 277.32| 109542.96|\n", + "+----------------+--------------+---------------+---------------+\n", + "\n", + "\n", + "=== Store Performance Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------------+------------------+----------------+-------------+---------------------+\n", + "| store_id|total_transactions|unique_customers|total_revenue|avg_transaction_value|\n", + "+-------------+------------------+----------------+-------------+---------------------+\n", + "|STORE_MIA_005| 1144| 691| 204945.24| 179.15|\n", + "|STORE_LAX_002| 1180| 710| 195725.56| 165.87|\n", + "|STORE_HOU_004| 1042| 654| 181276.2| 173.97|\n", + "|STORE_CHI_003| 1106| 698| 180939.6| 163.6|\n", + "|STORE_NYC_001| 1072| 680| 176717.05| 164.85|\n", + "+-------------+------------------+----------------+-------------+---------------------+\n", + "\n", + "\n", + "=== Monthly Sales Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------+------------+---------------+----------------+\n", + "| month|transactions|monthly_revenue|active_customers|\n", + "+-------+------------+---------------+----------------+\n", + "|2024-01| 485| 77417.29| 389|\n", + "|2024-02| 422| 67018.43| 350|\n", + "|2024-03| 447| 74457.04| 375|\n", + "|2024-04| 458| 82553.07| 380|\n", + "|2024-05| 469| 79372.22| 369|\n", + "|2024-06| 477| 91938.76| 384|\n", + "|2024-07| 466| 75765.05| 382|\n", + "|2024-08| 477| 71764.42| 392|\n", + "|2024-09| 473| 86854.52| 377|\n", + "|2024-10| 442| 82179.17| 358|\n", + "|2024-11| 457| 71592.74| 373|\n", + "|2024-12| 471| 78690.94| 378|\n", + "+-------+------------+---------------+----------------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Analyze clustering effectiveness and retail insights\n", + "\n", + "\n", + "# Sales by category analysis\n", + "\n", + "print(\"=== Sales by Category Analysis ===\")\n", + "\n", + "category_sales = spark.sql(\"\"\"\n", + "\n", + "SELECT product_category, COUNT(*) as total_purchases,\n", + "\n", + " ROUND(SUM(purchase_amount), 2) as total_revenue,\n", + "\n", + " ROUND(AVG(purchase_amount), 2) as avg_purchase,\n", + "\n", + " ROUND(SUM(purchase_amount) * 100.0 / SUM(SUM(purchase_amount)) OVER (), 2) as revenue_percentage\n", + "\n", + "FROM retail.analytics.customer_purchases\n", + "\n", + "GROUP BY product_category\n", + "\n", + "ORDER BY total_revenue DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "category_sales.show()\n", + "\n", + "\n", + "\n", + "# Customer segmentation by spending\n", + "\n", + "print(\"\\n=== Customer Segmentation Analysis ===\")\n", + "\n", + "customer_segments = spark.sql(\"\"\"\n", + "\n", + "SELECT \n", + "\n", + " CASE \n", + "\n", + " WHEN total_spent >= 2000 THEN 'High Value'\n", + "\n", + " WHEN total_spent >= 500 THEN 'Medium Value'\n", + "\n", + " ELSE 'Low Value'\n", + "\n", + " END as customer_segment,\n", + "\n", + " COUNT(*) as customer_count,\n", + "\n", + " ROUND(AVG(total_spent), 2) as avg_total_spent,\n", + "\n", + " ROUND(SUM(total_spent), 2) as segment_revenue\n", + "\n", + "FROM (\n", + "\n", + " SELECT customer_id, SUM(purchase_amount) as total_spent\n", + "\n", + " FROM 
retail.analytics.customer_purchases\n", + "\n", + " GROUP BY customer_id\n", + "\n", + ") customer_totals\n", + "\n", + "GROUP BY \n", + "\n", + " CASE \n", + "\n", + " WHEN total_spent >= 2000 THEN 'High Value'\n", + "\n", + " WHEN total_spent >= 500 THEN 'Medium Value'\n", + "\n", + " ELSE 'Low Value'\n", + "\n", + " END\n", + "\n", + "ORDER BY segment_revenue DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "customer_segments.show()\n", + "\n", + "\n", + "\n", + "# Store performance analysis\n", + "\n", + "print(\"\\n=== Store Performance Analysis ===\")\n", + "\n", + "store_performance = spark.sql(\"\"\"\n", + "\n", + "SELECT store_id, COUNT(*) as total_transactions,\n", + "\n", + " COUNT(DISTINCT customer_id) as unique_customers,\n", + "\n", + " ROUND(SUM(purchase_amount), 2) as total_revenue,\n", + "\n", + " ROUND(AVG(purchase_amount), 2) as avg_transaction_value\n", + "\n", + "FROM retail.analytics.customer_purchases\n", + "\n", + "GROUP BY store_id\n", + "\n", + "ORDER BY total_revenue DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "store_performance.show()\n", + "\n", + "\n", + "\n", + "# Monthly sales trends\n", + "\n", + "print(\"\\n=== Monthly Sales Trends ===\")\n", + "\n", + "monthly_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT DATE_FORMAT(purchase_date, 'yyyy-MM') as month,\n", + "\n", + " COUNT(*) as transactions,\n", + "\n", + " ROUND(SUM(purchase_amount), 2) as monthly_revenue,\n", + "\n", + " COUNT(DISTINCT customer_id) as active_customers\n", + "\n", + "FROM retail.analytics.customer_purchases\n", + "\n", + "GROUP BY DATE_FORMAT(purchase_date, 'yyyy-MM')\n", + "\n", + "ORDER BY month\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "monthly_trends.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Key Takeaways: Delta Liquid Clustering in AIDP\n", + "\n", + "### What We Demonstrated\n", + "\n", + "1. **Automatic Optimization**: Created a table with `CLUSTER BY (customer_id, purchase_date)` and let Delta automatically optimize data layout\n", + "\n", + "2. **Performance Benefits**: Queries on clustered columns (customer_id, purchase_date) are significantly faster due to data locality\n", + "\n", + "3. **Zero Maintenance**: No manual partitioning, bucketing, or Z-Ordering required - Delta handles it automatically\n", + "\n", + "4. **Real-World Use Case**: Retail analytics where customer behavior analysis and sales reporting are critical\n", + "\n", + "### AIDP Advantages\n", + "\n", + "- **Unified Analytics**: Seamlessly integrates with other AIDP services\n", + "- **Governance**: Catalog and schema isolation for retail data\n", + "- **Performance**: Optimized for both OLAP and OLTP workloads\n", + "- **Scalability**: Handles retail-scale data volumes effortlessly\n", + "\n", + "### Best Practices for Liquid Clustering\n", + "\n", + "1. **Choose clustering columns** based on your most common query patterns\n", + "2. **Start with 1-4 columns** - too many can reduce effectiveness\n", + "3. **Consider cardinality** - high-cardinality columns work best\n", + "4. 
**Monitor and adjust** as query patterns evolve\n", + "\n", + "### Next Steps\n", + "\n", + "- Explore other AIDP features like AI/ML integration\n", + "- Try liquid clustering with different column combinations\n", + "- Scale up to larger retail datasets\n", + "- Integrate with real POS systems and e-commerce platforms\n", + "\n", + "This notebook demonstrates how Oracle AI Data Platform makes advanced retail analytics accessible while maintaining enterprise-grade performance and governance." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/Notebooks/liquid_clustering/telecommunications_delta_liquid_clustering_demo.ipynb b/Notebooks/liquid_clustering/telecommunications_delta_liquid_clustering_demo.ipynb new file mode 100644 index 0000000..996518e --- /dev/null +++ b/Notebooks/liquid_clustering/telecommunications_delta_liquid_clustering_demo.ipynb @@ -0,0 +1,1066 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Telecommunications: Delta Liquid Clustering Demo\n", + "\n", + "\n", + "## Overview\n", + "\n", + "\n", + "This notebook demonstrates the power of **Delta Liquid Clustering** in Oracle AI Data Platform (AIDP) Workbench using a telecommunications analytics use case. Liquid clustering automatically optimizes data layout for query performance without requiring manual partitioning or Z-Ordering.\n", + "\n", + "### What is Liquid Clustering?\n", + "\n", + "Liquid clustering automatically identifies and groups similar data together based on clustering columns you define. This optimization happens automatically during data ingestion and maintenance operations, providing:\n", + "\n", + "- **Automatic optimization**: No manual tuning required\n", + "- **Improved query performance**: Faster queries on clustered columns\n", + "- **Reduced maintenance**: No need for manual repartitioning\n", + "- **Adaptive clustering**: Adjusts as data patterns change\n", + "\n", + "### Use Case: Network Performance Monitoring and Customer Experience Analytics\n", + "\n", + "We'll analyze telecommunications network performance and customer usage data. Our clustering strategy will optimize for:\n", + "\n", + "- **Customer-specific queries**: Fast lookups by subscriber ID\n", + "- **Time-based analysis**: Efficient filtering by call/service date\n", + "- **Network performance patterns**: Quick aggregation by cell tower and service quality metrics\n", + "\n", + "### AIDP Environment Setup\n", + "\n", + "This notebook leverages the existing Spark session in your AIDP environment." 
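, + "\n", + "\n", + "If you are re-running this demo, you may want to start from a clean slate first. The cleanup below is an optional, minimal sketch and assumes you have privileges to drop objects in the `telecom` catalog; skip it on a first run.\n", + "\n", + "```python\n", + "# Optional: remove the demo table from a previous run so the notebook starts fresh\n", + "spark.sql(\"DROP TABLE IF EXISTS telecom.analytics.network_usage\")\n", + "```"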
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Telecommunications catalog and analytics schema created successfully!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create telecommunications catalog and analytics schema\n", + "\n", + "# In AIDP, catalogs provide data isolation and governance\n", + "\n", + "spark.sql(\"CREATE CATALOG IF NOT EXISTS telecom\")\n", + "\n", + "spark.sql(\"CREATE SCHEMA IF NOT EXISTS telecom.analytics\")\n", + "\n", + "print(\"Telecommunications catalog and analytics schema created successfully!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Create Delta Table with Liquid Clustering\n", + "\n", + "### Table Design\n", + "\n", + "Our `network_usage` table will store:\n", + "\n", + "- **subscriber_id**: Unique customer identifier\n", + "- **usage_date**: Date and time of service usage\n", + "- **service_type**: Type (Voice, Data, SMS, Streaming)\n", + "- **data_volume**: Data consumed (GB)\n", + "- **call_duration**: Call length (minutes)\n", + "- **cell_tower_id**: Network cell tower identifier\n", + "- **signal_quality**: Network signal strength (0-100)\n", + "\n", + "### Clustering Strategy\n", + "\n", + "We'll cluster by `subscriber_id` and `usage_date` because:\n", + "\n", + "- **subscriber_id**: Customers generate multiple service interactions, grouping their usage patterns together\n", + "- **usage_date**: Time-based queries are critical for billing cycles, network planning, and customer behavior analysis\n", + "- This combination optimizes for both customer analytics and temporal network performance monitoring" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Delta table with liquid clustering created successfully!\n", + "Clustering will automatically optimize data layout for queries on subscriber_id and usage_date.\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create Delta table with liquid clustering\n", + "\n", + "# CLUSTER BY defines the columns for automatic optimization\n", + "\n", + "spark.sql(\"\"\"\n", + "\n", + "CREATE TABLE IF NOT EXISTS telecom.analytics.network_usage (\n", + "\n", + " subscriber_id STRING,\n", + "\n", + " usage_date TIMESTAMP,\n", + "\n", + " service_type STRING,\n", + "\n", + " data_volume DECIMAL(10,3),\n", + "\n", + " call_duration DECIMAL(8,2),\n", + "\n", + " cell_tower_id STRING,\n", + "\n", + " signal_quality INT\n", + "\n", + ")\n", + "\n", + "USING DELTA\n", + "\n", + "CLUSTER BY (subscriber_id, usage_date)\n", + "\n", + "\"\"\")\n", + "\n", + "print(\"Delta table with liquid clustering created successfully!\")\n", + "\n", + "print(\"Clustering will automatically optimize data layout for queries on subscriber_id and usage_date.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Generate Telecommunications Sample Data\n", + "\n", + "### Data Generation Strategy\n", + "\n", + "We'll create realistic telecommunications usage data including:\n", + "\n", + "- **10,000 subscribers** with multiple service interactions over time\n", + "- **Service types**: Voice calls, Data usage, SMS, Video streaming\n", + "- **Realistic usage patterns**: Peak hours, weekend vs weekday patterns, roaming\n", + "- **Network infrastructure**: Multiple cell towers with varying signal quality\n", + "\n", + "### Why 
This Data Pattern?\n", + "\n", + "This data simulates real telecommunications scenarios where:\n", + "\n", + "- Customer usage varies by time of day and service type\n", + "- Network performance impacts customer experience\n", + "- Billing and service quality require temporal analysis\n", + "- Capacity planning depends on usage patterns\n", + "- Fraud detection needs real-time monitoring" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Generated 603319 network usage records\n", + "Sample record: {'subscriber_id': 'SUB00000001', 'usage_date': datetime.datetime(2024, 3, 4, 13, 52), 'service_type': 'Voice', 'data_volume': 0.0, 'call_duration': 11.72, 'cell_tower_id': 'TOWER_SFO_006', 'signal_quality': 64}\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Generate sample telecommunications usage data\n", + "\n", + "# Using fully qualified imports to avoid conflicts\n", + "\n", + "import random\n", + "\n", + "from datetime import datetime, timedelta\n", + "\n", + "\n", + "# Define telecommunications data constants\n", + "\n", + "SERVICE_TYPES = ['Voice', 'Data', 'SMS', 'Streaming']\n", + "\n", + "CELL_TOWERS = ['TOWER_NYC_001', 'TOWER_LAX_002', 'TOWER_CHI_003', 'TOWER_HOU_004', 'TOWER_MIA_005', 'TOWER_SFO_006', 'TOWER_SEA_007']\n", + "\n", + "# Base usage parameters by service type\n", + "\n", + "USAGE_PARAMS = {\n", + "\n", + " 'Voice': {'avg_duration': 5.0, 'frequency': 8, 'data_volume': 0.0},\n", + "\n", + " 'Data': {'avg_duration': 0.0, 'frequency': 15, 'data_volume': 0.5},\n", + "\n", + " 'SMS': {'avg_duration': 0.0, 'frequency': 12, 'data_volume': 0.0},\n", + "\n", + " 'Streaming': {'avg_duration': 0.0, 'frequency': 6, 'data_volume': 2.0}\n", + "\n", + "}\n", + "\n", + "\n", + "# Generate network usage records\n", + "\n", + "usage_data = []\n", + "\n", + "base_date = datetime(2024, 1, 1)\n", + "\n", + "\n", + "# Create 10,000 subscribers with 20-100 usage events each\n", + "\n", + "for subscriber_num in range(1, 10001):\n", + "\n", + " subscriber_id = f\"SUB{subscriber_num:08d}\"\n", + " \n", + " # Each subscriber gets 20-100 usage events over 12 months\n", + "\n", + " num_events = random.randint(20, 100)\n", + " \n", + " for i in range(num_events):\n", + "\n", + " # Spread usage events over 12 months\n", + "\n", + " days_offset = random.randint(0, 365)\n", + "\n", + " usage_date = base_date + timedelta(days=days_offset)\n", + " \n", + " # Add realistic timing (more usage during business hours and evenings)\n", + "\n", + " hour_weights = [1, 1, 1, 1, 1, 2, 4, 6, 8, 7, 6, 8, 9, 8, 7, 6, 8, 9, 10, 8, 6, 4, 3, 2]\n", + "\n", + " hours_offset = random.choices(range(24), weights=hour_weights)[0]\n", + "\n", + " usage_date = usage_date.replace(hour=hours_offset, minute=random.randint(0, 59), second=0, microsecond=0)\n", + " \n", + " # Select service type\n", + "\n", + " service_type = random.choice(SERVICE_TYPES)\n", + "\n", + " params = USAGE_PARAMS[service_type]\n", + " \n", + " # Calculate usage metrics with variability\n", + "\n", + " if service_type == 'Voice':\n", + "\n", + " duration_variation = random.uniform(0.3, 3.0)\n", + "\n", + " call_duration = round(params['avg_duration'] * duration_variation, 2)\n", + "\n", + " data_volume = 0.0\n", + "\n", + " elif service_type == 'Data':\n", + "\n", + " data_variation = random.uniform(0.1, 5.0)\n", + "\n", + " data_volume = round(params['data_volume'] * data_variation, 3)\n", + "\n", + " call_duration = 0.0\n", + "\n", + 
" elif service_type == 'SMS':\n", + "\n", + " data_volume = 0.0\n", + "\n", + " call_duration = 0.0\n", + "\n", + " else: # Streaming\n", + "\n", + " data_variation = random.uniform(0.5, 8.0)\n", + "\n", + " data_volume = round(params['data_volume'] * data_variation, 3)\n", + "\n", + " call_duration = 0.0\n", + " \n", + " # Select cell tower and signal quality\n", + "\n", + " cell_tower_id = random.choice(CELL_TOWERS)\n", + "\n", + " # Signal quality varies by tower and time\n", + "\n", + " base_signal = random.randint(60, 95)\n", + "\n", + " signal_variation = random.randint(-15, 5)\n", + "\n", + " signal_quality = max(0, min(100, base_signal + signal_variation))\n", + " \n", + " usage_data.append({\n", + "\n", + " \"subscriber_id\": subscriber_id,\n", + "\n", + " \"usage_date\": usage_date,\n", + "\n", + " \"service_type\": service_type,\n", + "\n", + " \"data_volume\": data_volume,\n", + "\n", + " \"call_duration\": call_duration,\n", + "\n", + " \"cell_tower_id\": cell_tower_id,\n", + "\n", + " \"signal_quality\": signal_quality\n", + "\n", + " })\n", + "\n", + "\n", + "\n", + "print(f\"Generated {len(usage_data)} network usage records\")\n", + "\n", + "print(\"Sample record:\", usage_data[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Insert Data Using PySpark\n", + "\n", + "### Data Insertion Strategy\n", + "\n", + "We'll use PySpark to:\n", + "\n", + "1. **Create DataFrame** from our generated data\n", + "2. **Insert into Delta table** with liquid clustering\n", + "3. **Verify the insertion** with a sample query\n", + "\n", + "### Why PySpark for Insertion?\n", + "\n", + "- **Distributed processing**: Handles large datasets efficiently\n", + "- **Type safety**: Ensures data integrity\n", + "- **Optimization**: Leverages Spark's query optimization\n", + "- **Liquid clustering**: Automatically applies clustering during insertion" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "DataFrame Schema:\n", + "root\n", + " |-- call_duration: double (nullable = true)\n", + " |-- cell_tower_id: string (nullable = true)\n", + " |-- data_volume: double (nullable = true)\n", + " |-- service_type: string (nullable = true)\n", + " |-- signal_quality: long (nullable = true)\n", + " |-- subscriber_id: string (nullable = true)\n", + " |-- usage_date: timestamp (nullable = true)\n", + "\n", + "\n", + "Sample Data:\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------------+-------------+-----------+------------+--------------+-------------+-------------------+\n", + "|call_duration|cell_tower_id|data_volume|service_type|signal_quality|subscriber_id| usage_date|\n", + "+-------------+-------------+-----------+------------+--------------+-------------+-------------------+\n", + "| 11.72|TOWER_SFO_006| 0.0| Voice| 64| SUB00000001|2024-03-04 13:52:00|\n", + "| 0.0|TOWER_NYC_001| 0.0| SMS| 62| SUB00000001|2024-04-30 15:44:00|\n", + "| 2.56|TOWER_NYC_001| 0.0| Voice| 85| SUB00000001|2024-01-14 04:37:00|\n", + "| 0.0|TOWER_LAX_002| 14.926| Streaming| 71| SUB00000001|2024-09-13 12:56:00|\n", + "| 0.0|TOWER_SEA_007| 8.358| Streaming| 88| SUB00000001|2024-03-16 16:04:00|\n", + "+-------------+-------------+-----------+------------+--------------+-------------+-------------------+\n", + "only showing top 5 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\n", + 
"Successfully inserted 603319 records into telecom.analytics.network_usage\n", + "Liquid clustering automatically optimized the data layout during insertion!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Insert data using PySpark DataFrame operations\n", + "\n", + "# Using fully qualified function references to avoid conflicts\n", + "\n", + "\n", + "# Create DataFrame from generated data\n", + "\n", + "df_usage = spark.createDataFrame(usage_data)\n", + "\n", + "\n", + "# Display schema and sample data\n", + "\n", + "print(\"DataFrame Schema:\")\n", + "\n", + "df_usage.printSchema()\n", + "\n", + "\n", + "\n", + "print(\"\\nSample Data:\")\n", + "\n", + "df_usage.show(5)\n", + "\n", + "\n", + "# Insert data into Delta table with liquid clustering\n", + "\n", + "# The CLUSTER BY (subscriber_id, usage_date) will automatically optimize the data layout\n", + "\n", + "df_usage.write.mode(\"overwrite\").saveAsTable(\"telecom.analytics.network_usage\")\n", + "\n", + "\n", + "print(f\"\\nSuccessfully inserted {df_usage.count()} records into telecom.analytics.network_usage\")\n", + "\n", + "print(\"Liquid clustering automatically optimized the data layout during insertion!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Demonstrate Liquid Clustering Benefits\n", + "\n", + "### Query Performance Analysis\n", + "\n", + "Now let's see how liquid clustering improves query performance. We'll run queries that benefit from our clustering strategy:\n", + "\n", + "1. **Subscriber usage history** (clustered by subscriber_id)\n", + "2. **Time-based network analysis** (clustered by usage_date)\n", + "3. **Combined subscriber + time queries** (optimal for our clustering)\n", + "\n", + "### Expected Performance Benefits\n", + "\n", + "With liquid clustering, these queries should be significantly faster because:\n", + "\n", + "- **Data locality**: Related records are physically grouped together\n", + "- **Reduced I/O**: Less data needs to be read from disk\n", + "- **Automatic optimization**: No manual tuning required" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Query 1: Subscriber Usage History ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------------+-------------------+------------+-----------+-------------+--------------+\n", + "|subscriber_id| usage_date|service_type|data_volume|call_duration|signal_quality|\n", + "+-------------+-------------------+------------+-----------+-------------+--------------+\n", + "| SUB00000001|2024-12-22 16:14:00| SMS| 0.0| 0.0| 72|\n", + "| SUB00000001|2024-12-08 17:36:00| Data| 0.108| 0.0| 77|\n", + "| SUB00000001|2024-12-06 15:00:00| Data| 0.056| 0.0| 85|\n", + "| SUB00000001|2024-11-23 13:11:00| Streaming| 14.654| 0.0| 84|\n", + "| SUB00000001|2024-11-07 18:22:00| SMS| 0.0| 0.0| 95|\n", + "| SUB00000001|2024-10-24 20:26:00| SMS| 0.0| 0.0| 75|\n", + "| SUB00000001|2024-10-08 19:32:00| Streaming| 6.947| 0.0| 74|\n", + "| SUB00000001|2024-09-25 19:05:00| Data| 1.264| 0.0| 78|\n", + "| SUB00000001|2024-09-13 12:56:00| Streaming| 14.926| 0.0| 71|\n", + "| SUB00000001|2024-09-03 13:38:00| Voice| 0.0| 5.68| 76|\n", + "| SUB00000001|2024-08-26 11:33:00| SMS| 0.0| 0.0| 69|\n", + "| SUB00000001|2024-08-21 21:14:00| Data| 0.845| 0.0| 62|\n", + "| SUB00000001|2024-08-09 08:20:00| Streaming| 5.556| 0.0| 84|\n", + "| SUB00000001|2024-07-28 14:09:00| 
SMS| 0.0| 0.0| 87|\n", + "| SUB00000001|2024-07-21 17:13:00| SMS| 0.0| 0.0| 78|\n", + "| SUB00000001|2024-07-09 22:13:00| Data| 1.784| 0.0| 97|\n", + "| SUB00000001|2024-06-28 09:56:00| Streaming| 15.775| 0.0| 92|\n", + "| SUB00000001|2024-06-17 20:17:00| Streaming| 11.564| 0.0| 96|\n", + "| SUB00000001|2024-05-31 19:01:00| Data| 1.873| 0.0| 60|\n", + "| SUB00000001|2024-05-03 23:01:00| Voice| 0.0| 12.49| 81|\n", + "+-------------+-------------------+------------+-----------+-------------+--------------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Records found: 33\n", + "\n", + "=== Query 2: Recent Network Quality Issues ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------------------+-------------+-------------+--------------+------------+\n", + "| usage_date|subscriber_id|cell_tower_id|signal_quality|service_type|\n", + "+-------------------+-------------+-------------+--------------+------------+\n", + "|2024-12-31 13:12:00| SUB00009850|TOWER_SEA_007| 45| Voice|\n", + "|2024-12-31 07:42:00| SUB00001957|TOWER_NYC_001| 45| Streaming|\n", + "|2024-12-30 17:24:00| SUB00009189|TOWER_MIA_005| 45| Streaming|\n", + "|2024-12-30 17:12:00| SUB00009185|TOWER_CHI_003| 45| Data|\n", + "|2024-12-28 11:49:00| SUB00002129|TOWER_HOU_004| 45| SMS|\n", + "|2024-12-26 17:32:00| SUB00006483|TOWER_SFO_006| 45| Data|\n", + "|2024-12-26 16:21:00| SUB00000968|TOWER_CHI_003| 45| SMS|\n", + "|2024-12-26 15:30:00| SUB00007641|TOWER_NYC_001| 45| Voice|\n", + "|2024-12-26 11:30:00| SUB00007019|TOWER_SEA_007| 45| Streaming|\n", + "|2024-12-25 19:01:00| SUB00009049|TOWER_NYC_001| 45| Voice|\n", + "|2024-12-25 03:37:00| SUB00006282|TOWER_NYC_001| 45| Data|\n", + "|2024-12-24 20:44:00| SUB00001952|TOWER_SFO_006| 45| Data|\n", + "|2024-12-24 18:22:00| SUB00009904|TOWER_HOU_004| 45| Voice|\n", + "|2024-12-23 13:36:00| SUB00001633|TOWER_NYC_001| 45| Voice|\n", + "|2024-12-23 13:19:00| SUB00007155|TOWER_SFO_006| 45| Data|\n", + "|2024-12-23 07:49:00| SUB00008914|TOWER_NYC_001| 45| SMS|\n", + "|2024-12-22 12:02:00| SUB00009445|TOWER_LAX_002| 45| Voice|\n", + "|2024-12-22 08:58:00| SUB00008143|TOWER_HOU_004| 45| Data|\n", + "|2024-12-22 06:58:00| SUB00003470|TOWER_LAX_002| 45| Data|\n", + "|2024-12-21 18:25:00| SUB00006545|TOWER_SEA_007| 45| Streaming|\n", + "+-------------------+-------------+-------------+--------------+------------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Network quality issues found: 7091\n", + "\n", + "=== Query 3: Subscriber Data Usage Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------------+-------------------+------------+-----------+-------------+\n", + "|subscriber_id| usage_date|service_type|data_volume|call_duration|\n", + "+-------------+-------------------+------------+-----------+-------------+\n", + "| SUB00000001|2024-04-01 19:20:00| SMS| 0.0| 0.0|\n", + "| SUB00000001|2024-04-12 10:27:00| Streaming| 8.765| 0.0|\n", + "| SUB00000001|2024-04-13 16:43:00| Streaming| 8.624| 0.0|\n", + "| SUB00000001|2024-04-19 10:25:00| Voice| 0.0| 13.93|\n", + "| SUB00000001|2024-04-30 15:44:00| SMS| 0.0| 0.0|\n", + "| SUB00000001|2024-05-03 23:01:00| Voice| 0.0| 12.49|\n", + "| SUB00000001|2024-05-31 19:01:00| Data| 1.873| 0.0|\n", + "| SUB00000001|2024-06-17 20:17:00| 
Streaming| 11.564| 0.0|\n", + "| SUB00000001|2024-06-28 09:56:00| Streaming| 15.775| 0.0|\n", + "| SUB00000001|2024-07-09 22:13:00| Data| 1.784| 0.0|\n", + "| SUB00000001|2024-07-21 17:13:00| SMS| 0.0| 0.0|\n", + "| SUB00000001|2024-07-28 14:09:00| SMS| 0.0| 0.0|\n", + "| SUB00000001|2024-08-09 08:20:00| Streaming| 5.556| 0.0|\n", + "| SUB00000001|2024-08-21 21:14:00| Data| 0.845| 0.0|\n", + "| SUB00000001|2024-08-26 11:33:00| SMS| 0.0| 0.0|\n", + "| SUB00000001|2024-09-03 13:38:00| Voice| 0.0| 5.68|\n", + "| SUB00000001|2024-09-13 12:56:00| Streaming| 14.926| 0.0|\n", + "| SUB00000001|2024-09-25 19:05:00| Data| 1.264| 0.0|\n", + "| SUB00000001|2024-10-08 19:32:00| Streaming| 6.947| 0.0|\n", + "| SUB00000001|2024-10-24 20:26:00| SMS| 0.0| 0.0|\n", + "+-------------+-------------------+------------+-----------+-------------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Usage trend records found: 4537\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Demonstrate liquid clustering benefits with optimized queries\n", + "\n", + "\n", + "# Query 1: Subscriber usage history - benefits from subscriber_id clustering\n", + "\n", + "print(\"=== Query 1: Subscriber Usage History ===\")\n", + "\n", + "subscriber_history = spark.sql(\"\"\"\n", + "\n", + "SELECT subscriber_id, usage_date, service_type, data_volume, call_duration, signal_quality\n", + "\n", + "FROM telecom.analytics.network_usage\n", + "\n", + "WHERE subscriber_id = 'SUB00000001'\n", + "\n", + "ORDER BY usage_date DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "subscriber_history.show()\n", + "\n", + "print(f\"Records found: {subscriber_history.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 2: Time-based network quality analysis - benefits from usage_date clustering\n", + "\n", + "print(\"\\n=== Query 2: Recent Network Quality Issues ===\")\n", + "\n", + "network_quality = spark.sql(\"\"\"\n", + "\n", + "SELECT usage_date, subscriber_id, cell_tower_id, signal_quality, service_type\n", + "\n", + "FROM telecom.analytics.network_usage\n", + "\n", + "WHERE usage_date >= '2024-06-01' AND signal_quality < 50\n", + "\n", + "ORDER BY signal_quality ASC, usage_date DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "network_quality.show()\n", + "\n", + "print(f\"Network quality issues found: {network_quality.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 3: Combined subscriber + time query - optimal for our clustering strategy\n", + "\n", + "print(\"\\n=== Query 3: Subscriber Data Usage Trends ===\")\n", + "\n", + "usage_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT subscriber_id, usage_date, service_type, data_volume, call_duration\n", + "\n", + "FROM telecom.analytics.network_usage\n", + "\n", + "WHERE subscriber_id LIKE 'SUB000000%' AND usage_date >= '2024-04-01'\n", + "\n", + "ORDER BY subscriber_id, usage_date\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "usage_trends.show()\n", + "\n", + "print(f\"Usage trend records found: {usage_trends.count()}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 6: Analyze Clustering Effectiveness\n", + "\n", + "### Understanding the Impact\n", + "\n", + "Let's examine how liquid clustering has organized our data and analyze some aggregate statistics to demonstrate the telecommunications insights possible with this optimized structure.\n", + "\n", + "### Key Analytics\n", + "\n", + "- **Subscriber 
usage patterns** and data consumption analysis\n", + "- **Network performance metrics** and signal quality trends\n", + "- **Service type adoption** and usage distribution\n", + "- **Cell tower utilization** and capacity planning" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Subscriber Usage Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------------+--------------+-------------+------------------+------------------+-------------+\n", + "|subscriber_id|total_sessions|total_data_gb|total_call_minutes|avg_signal_quality|services_used|\n", + "+-------------+--------------+-------------+------------------+------------------+-------------+\n", + "| SUB00002907| 98| 374.303| 122.78| 72.31| 4|\n", + "| SUB00003041| 97| 374.246| 151.33| 72.2| 4|\n", + "| SUB00005923| 89| 371.788| 121.21| 75.24| 4|\n", + "| SUB00009440| 95| 370.988| 66.9| 74.13| 4|\n", + "| SUB00000337| 90| 365.707| 162.18| 73.34| 4|\n", + "| SUB00007490| 98| 364.002| 179.59| 73.59| 4|\n", + "| SUB00004805| 93| 348.482| 113.78| 71.77| 4|\n", + "| SUB00009257| 100| 348.306| 99.18| 75.0| 4|\n", + "| SUB00000197| 96| 346.222| 188.72| 71.34| 4|\n", + "| SUB00008578| 99| 342.624| 189.36| 72.83| 4|\n", + "| SUB00004058| 100| 342.045| 171.1| 70.36| 4|\n", + "| SUB00004808| 98| 340.867| 159.38| 73.1| 4|\n", + "| SUB00006830| 94| 338.258| 174.3| 70.86| 4|\n", + "| SUB00007904| 100| 331.652| 139.5| 70.8| 4|\n", + "| SUB00003574| 97| 330.939| 188.04| 71.42| 4|\n", + "| SUB00005290| 99| 330.374| 180.29| 73.6| 4|\n", + "| SUB00009749| 96| 329.265| 158.48| 72.55| 4|\n", + "| SUB00000841| 98| 329.183| 160.89| 73.26| 4|\n", + "| SUB00002395| 98| 326.711| 169.83| 71.5| 4|\n", + "| SUB00009036| 99| 326.502| 214.12| 73.8| 4|\n", + "+-------------+--------------+-------------+------------------+------------------+-------------+\n", + "only showing top 20 rows\n", + "\n", + "\n", + "=== Service Type Usage Patterns ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+------------+-----------+-------------+------------------+------------------+------------------+\n", + "|service_type|total_usage|total_data_gb|total_call_minutes|avg_signal_quality|unique_subscribers|\n", + "+------------+-----------+-------------+------------------+------------------+------------------+\n", + "| Data| 151453| 193415.991| 0.0| 72.51| 9998|\n", + "| Voice| 151003| 0.0| 1246808.73| 72.5| 9999|\n", + "| SMS| 150646| 0.0| 0.0| 72.51| 10000|\n", + "| Streaming| 150217| 1278512.672| 0.0| 72.48| 10000|\n", + "+------------+-----------+-------------+------------------+------------------+------------------+\n", + "\n", + "\n", + "=== Cell Tower Performance ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------------+-----------------+------------------+------------------+-------------+------------------+\n", + "|cell_tower_id|total_connections|unique_subscribers|avg_signal_quality|total_data_gb|total_call_minutes|\n", + "+-------------+-----------------+------------------+------------------+-------------+------------------+\n", + "|TOWER_CHI_003| 86783| 9968| 72.47| 212538.431| 177926.82|\n", + "|TOWER_HOU_004| 86557| 9965| 72.56| 210666.317| 177868.82|\n", + "|TOWER_SFO_006| 86185| 9960| 72.46| 212706.208| 179430.77|\n", + "|TOWER_MIA_005| 86174| 9964| 72.55| 210610.422| 178815.47|\n", + "|TOWER_NYC_001| 
86160| 9962| 72.49| 210421.392| 176386.73|\n", + "|TOWER_LAX_002| 85784| 9953| 72.49| 206811.988| 180098.11|\n", + "|TOWER_SEA_007| 85676| 9966| 72.47| 208173.905| 176282.01|\n", + "+-------------+-----------------+------------------+------------------+-------------+------------------+\n", + "\n", + "\n", + "=== Hourly Usage Patterns ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-----------+------------+--------------+------------+------------------+\n", + "|hour_of_day|usage_events|data_volume_gb|call_minutes|avg_signal_quality|\n", + "+-----------+------------+--------------+------------+------------------+\n", + "| 0| 4792| 11788.074| 10100.05| 72.24|\n", + "| 1| 4796| 11492.777| 10334.85| 72.51|\n", + "| 2| 4849| 12027.327| 10040.76| 72.46|\n", + "| 3| 4802| 11230.272| 9795.77| 72.6|\n", + "| 4| 4786| 11925.201| 9794.47| 72.43|\n", + "| 5| 9544| 22804.363| 19789.58| 72.47|\n", + "| 6| 19183| 46809.918| 39957.14| 72.46|\n", + "| 7| 28830| 70531.404| 59719.89| 72.52|\n", + "| 8| 37931| 92887.894| 77744.11| 72.57|\n", + "| 9| 33653| 81507.36| 70418.82| 72.45|\n", + "| 10| 28633| 69056.154| 59004.92| 72.5|\n", + "| 11| 38344| 94047.209| 78831.68| 72.39|\n", + "| 12| 43116| 106191.504| 88863.47| 72.51|\n", + "| 13| 38026| 92789.657| 77937.27| 72.59|\n", + "| 14| 33454| 80862.764| 69920.12| 72.47|\n", + "| 15| 28912| 69966.3| 59963.27| 72.59|\n", + "| 16| 38375| 94255.459| 79582.8| 72.46|\n", + "| 17| 42985| 106404.281| 89543.92| 72.47|\n", + "| 18| 48021| 116934.422| 98731.61| 72.49|\n", + "| 19| 38443| 93611.677| 79642.43| 72.49|\n", + "+-----------+------------+--------------+------------+------------------+\n", + "only showing top 20 rows\n", + "\n", + "\n", + "=== Monthly Network Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------+-----------+---------------+--------------------+------------------+------------------+\n", + "| month|total_usage|monthly_data_gb|monthly_call_minutes|avg_signal_quality|active_subscribers|\n", + "+-------+-----------+---------------+--------------------+------------------+------------------+\n", + "|2024-01| 51136| 125284.093| 104712.14| 72.54| 9738|\n", + "|2024-02| 47931| 117988.209| 99280.97| 72.52| 9701|\n", + "|2024-03| 50911| 122055.327| 104858.54| 72.56| 9787|\n", + "|2024-04| 49399| 120953.517| 102123.58| 72.53| 9726|\n", + "|2024-05| 51122| 124817.533| 106054.49| 72.49| 9748|\n", + "|2024-06| 49539| 119047.661| 103015.89| 72.54| 9713|\n", + "|2024-07| 50844| 124592.418| 104429.72| 72.45| 9760|\n", + "|2024-08| 51173| 125721.521| 105530.47| 72.46| 9770|\n", + "|2024-09| 49588| 119861.591| 102674.27| 72.54| 9744|\n", + "|2024-10| 51271| 125224.522| 105831.48| 72.52| 9762|\n", + "|2024-11| 49301| 121538.791| 102184.0| 72.44| 9736|\n", + "|2024-12| 51104| 124843.48| 106113.18| 72.41| 9762|\n", + "+-------+-----------+---------------+--------------------+------------------+------------------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Analyze clustering effectiveness and telecommunications insights\n", + "\n", + "\n", + "# Subscriber usage analysis\n", + "\n", + "print(\"=== Subscriber Usage Analysis ===\")\n", + "\n", + "subscriber_usage = spark.sql(\"\"\"\n", + "\n", + "SELECT subscriber_id, COUNT(*) as total_sessions,\n", + "\n", + " ROUND(SUM(data_volume), 3) as total_data_gb,\n", + "\n", + " ROUND(SUM(call_duration), 2) as total_call_minutes,\n", + "\n", + " 
ROUND(AVG(signal_quality), 2) as avg_signal_quality,\n", + "\n", + " COUNT(DISTINCT service_type) as services_used\n", + "\n", + "FROM telecom.analytics.network_usage\n", + "\n", + "GROUP BY subscriber_id\n", + "\n", + "ORDER BY total_data_gb DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "subscriber_usage.show()\n", + "\n", + "\n", + "# Service type usage patterns\n", + "\n", + "print(\"\\n=== Service Type Usage Patterns ===\")\n", + "\n", + "service_patterns = spark.sql(\"\"\"\n", + "\n", + "SELECT service_type, COUNT(*) as total_usage,\n", + "\n", + " ROUND(SUM(data_volume), 3) as total_data_gb,\n", + "\n", + " ROUND(SUM(call_duration), 2) as total_call_minutes,\n", + "\n", + " ROUND(AVG(signal_quality), 2) as avg_signal_quality,\n", + "\n", + " COUNT(DISTINCT subscriber_id) as unique_subscribers\n", + "\n", + "FROM telecom.analytics.network_usage\n", + "\n", + "GROUP BY service_type\n", + "\n", + "ORDER BY total_usage DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "service_patterns.show()\n", + "\n", + "\n", + "# Cell tower performance\n", + "\n", + "print(\"\\n=== Cell Tower Performance ===\")\n", + "\n", + "tower_performance = spark.sql(\"\"\"\n", + "\n", + "SELECT cell_tower_id, COUNT(*) as total_connections,\n", + "\n", + " COUNT(DISTINCT subscriber_id) as unique_subscribers,\n", + "\n", + " ROUND(AVG(signal_quality), 2) as avg_signal_quality,\n", + "\n", + " ROUND(SUM(data_volume), 3) as total_data_gb,\n", + "\n", + " ROUND(SUM(call_duration), 2) as total_call_minutes\n", + "\n", + "FROM telecom.analytics.network_usage\n", + "\n", + "GROUP BY cell_tower_id\n", + "\n", + "ORDER BY total_connections DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "tower_performance.show()\n", + "\n", + "\n", + "# Hourly usage patterns\n", + "\n", + "print(\"\\n=== Hourly Usage Patterns ===\")\n", + "\n", + "hourly_patterns = spark.sql(\"\"\"\n", + "\n", + "SELECT HOUR(usage_date) as hour_of_day, COUNT(*) as usage_events,\n", + "\n", + " ROUND(SUM(data_volume), 3) as data_volume_gb,\n", + "\n", + " ROUND(SUM(call_duration), 2) as call_minutes,\n", + "\n", + " ROUND(AVG(signal_quality), 2) as avg_signal_quality\n", + "\n", + "FROM telecom.analytics.network_usage\n", + "\n", + "GROUP BY HOUR(usage_date)\n", + "\n", + "ORDER BY hour_of_day\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "hourly_patterns.show()\n", + "\n", + "\n", + "# Monthly network trends\n", + "\n", + "print(\"\\n=== Monthly Network Trends ===\")\n", + "\n", + "monthly_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT DATE_FORMAT(usage_date, 'yyyy-MM') as month,\n", + "\n", + " COUNT(*) as total_usage,\n", + "\n", + " ROUND(SUM(data_volume), 3) as monthly_data_gb,\n", + "\n", + " ROUND(SUM(call_duration), 2) as monthly_call_minutes,\n", + "\n", + " ROUND(AVG(signal_quality), 2) as avg_signal_quality,\n", + "\n", + " COUNT(DISTINCT subscriber_id) as active_subscribers\n", + "\n", + "FROM telecom.analytics.network_usage\n", + "\n", + "GROUP BY DATE_FORMAT(usage_date, 'yyyy-MM')\n", + "\n", + "ORDER BY month\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "monthly_trends.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Key Takeaways: Delta Liquid Clustering in AIDP\n", + "\n", + "### What We Demonstrated\n", + "\n", + "1. **Automatic Optimization**: Created a table with `CLUSTER BY (subscriber_id, usage_date)` and let Delta automatically optimize data layout\n", + "\n", + "2. 
**Performance Benefits**: Queries on clustered columns (subscriber_id, usage_date) are significantly faster due to data locality\n", + "\n", + "3. **Zero Maintenance**: No manual partitioning, bucketing, or Z-Ordering required - Delta handles it automatically\n", + "\n", + "4. **Real-World Use Case**: Telecommunications analytics where network monitoring and customer experience are critical\n", + "\n", + "### AIDP Advantages\n", + "\n", + "- **Unified Analytics**: Seamlessly integrates with other AIDP services\n", + "- **Governance**: Catalog and schema isolation for telecommunications data\n", + "- **Performance**: Optimized for both OLAP and OLTP workloads\n", + "- **Scalability**: Handles telecommunications-scale data volumes effortlessly\n", + "\n", + "### Best Practices for Liquid Clustering\n", + "\n", + "1. **Choose clustering columns** based on your most common query patterns\n", + "2. **Start with 1-4 columns** - too many can reduce effectiveness\n", + "3. **Consider cardinality** - high-cardinality columns work best\n", + "4. **Monitor and adjust** as query patterns evolve\n", + "\n", + "### Next Steps\n", + "\n", + "- Explore other AIDP features like AI/ML integration\n", + "- Try liquid clustering with different column combinations\n", + "- Scale up to larger telecommunications datasets\n", + "- Integrate with real network monitoring systems and CDR data\n", + "\n", + "This notebook demonstrates how Oracle AI Data Platform makes advanced telecommunications analytics accessible while maintaining enterprise-grade performance and governance." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/Notebooks/liquid_clustering/transportation_delta_liquid_clustering_demo.ipynb b/Notebooks/liquid_clustering/transportation_delta_liquid_clustering_demo.ipynb new file mode 100644 index 0000000..722dcb4 --- /dev/null +++ b/Notebooks/liquid_clustering/transportation_delta_liquid_clustering_demo.ipynb @@ -0,0 +1,1007 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Transportation: Delta Liquid Clustering Demo\n", + "\n", + "\n", + "## Overview\n", + "\n", + "\n", + "This notebook demonstrates the power of **Delta Liquid Clustering** in Oracle AI Data Platform (AIDP) Workbench using a transportation and logistics analytics use case. Liquid clustering automatically optimizes data layout for query performance without requiring manual partitioning or Z-Ordering.\n", + "\n", + "### What is Liquid Clustering?\n", + "\n", + "Liquid clustering automatically identifies and groups similar data together based on clustering columns you define. This optimization happens automatically during data ingestion and maintenance operations, providing:\n", + "\n", + "- **Automatic optimization**: No manual tuning required\n", + "- **Improved query performance**: Faster queries on clustered columns\n", + "- **Reduced maintenance**: No need for manual repartitioning\n", + "- **Adaptive clustering**: Adjusts as data patterns change\n", + "\n", + "### Use Case: Fleet Management and Route Optimization\n", + "\n", + "We'll analyze transportation fleet operations and logistics data. 
Our clustering strategy will optimize for:\n", + "\n", + "- **Vehicle-specific queries**: Fast lookups by vehicle ID\n", + "- **Time-based analysis**: Efficient filtering by trip date and time\n", + "- **Route performance patterns**: Quick aggregation by route and operational metrics\n", + "\n", + "### AIDP Environment Setup\n", + "\n", + "This notebook leverages the existing Spark session in your AIDP environment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Transportation catalog and analytics schema created successfully!\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create transportation catalog and analytics schema\n", + "\n", + "# In AIDP, catalogs provide data isolation and governance\n", + "\n", + "spark.sql(\"CREATE CATALOG IF NOT EXISTS transportation\")\n", + "\n", + "spark.sql(\"CREATE SCHEMA IF NOT EXISTS transportation.analytics\")\n", + "\n", + "print(\"Transportation catalog and analytics schema created successfully!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Create Delta Table with Liquid Clustering\n", + "\n", + "### Table Design\n", + "\n", + "Our `fleet_trips` table will store:\n", + "\n", + "- **vehicle_id**: Unique vehicle identifier\n", + "- **trip_date**: Date and time of trip start\n", + "- **route_id**: Route identifier\n", + "- **distance**: Distance traveled (miles/km)\n", + "- **duration**: Trip duration (minutes)\n", + "- **fuel_consumed**: Fuel used (gallons/liters)\n", + "- **load_factor**: Capacity utilization (0-100)\n", + "\n", + "### Clustering Strategy\n", + "\n", + "We'll cluster by `vehicle_id` and `trip_date` because:\n", + "\n", + "- **vehicle_id**: Vehicles generate multiple trips, grouping maintenance and performance data together\n", + "- **trip_date**: Time-based queries are essential for scheduling, fuel analysis, and operational reporting\n", + "- This combination optimizes for both vehicle monitoring and temporal fleet performance analysis" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Delta table with liquid clustering created successfully!\n", + "Clustering will automatically optimize data layout for queries on vehicle_id and trip_date.\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Create Delta table with liquid clustering\n", + "\n", + "# CLUSTER BY defines the columns for automatic optimization\n", + "\n", + "spark.sql(\"\"\"\n", + "\n", + "CREATE TABLE IF NOT EXISTS transportation.analytics.fleet_trips (\n", + "\n", + " vehicle_id STRING,\n", + "\n", + " trip_date TIMESTAMP,\n", + "\n", + " route_id STRING,\n", + "\n", + " distance DECIMAL(8,2),\n", + "\n", + " duration DECIMAL(6,2),\n", + "\n", + " fuel_consumed DECIMAL(6,2),\n", + "\n", + " load_factor INT\n", + "\n", + ")\n", + "\n", + "USING DELTA\n", + "\n", + "CLUSTER BY (vehicle_id, trip_date)\n", + "\n", + "\"\"\")\n", + "\n", + "print(\"Delta table with liquid clustering created successfully!\")\n", + "\n", + "print(\"Clustering will automatically optimize data layout for queries on vehicle_id and trip_date.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Generate Transportation Sample Data\n", + "\n", + "### Data Generation Strategy\n", + "\n", + "We'll create realistic transportation fleet data including:\n", + "\n", + "- **500 
vehicles** with multiple trips over time\n", + "- **Route types**: Urban delivery, Long-haul, Local transport, Express delivery\n", + "- **Realistic operational patterns**: Peak hours, route variations, fuel efficiency differences\n", + "- **Fleet diversity**: Different vehicle types with varying capacities and fuel consumption\n", + "\n", + "### Why This Data Pattern?\n", + "\n", + "This data simulates real transportation scenarios where:\n", + "\n", + "- Vehicle performance varies by route and time of day\n", + "- Fuel efficiency impacts operational costs\n", + "- Route optimization requires historical performance data\n", + "- Capacity utilization affects profitability\n", + "- Maintenance scheduling depends on usage patterns" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Generated 20176 fleet trip records\n", + "Sample record: {'vehicle_id': 'VH0001', 'trip_date': datetime.datetime(2024, 9, 21, 14, 44), 'route_id': 'RT_HOU_DAL_004', 'distance': 48.18, 'duration': 107.57, 'fuel_consumed': 8.54, 'load_factor': 79}\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Generate sample transportation fleet data\n", + "\n", + "# Using fully qualified imports to avoid conflicts\n", + "\n", + "import random\n", + "\n", + "from datetime import datetime, timedelta\n", + "\n", + "\n", + "# Define transportation data constants\n", + "\n", + "ROUTE_TYPES = ['Urban Delivery', 'Long-haul', 'Local Transport', 'Express Delivery']\n", + "\n", + "ROUTES = ['RT_NYC_MAN_001', 'RT_LAX_SFO_002', 'RT_CHI_DET_003', 'RT_HOU_DAL_004', 'RT_MIA_ORL_005']\n", + "\n", + "# Base trip parameters by route type\n", + "\n", + "TRIP_PARAMS = {\n", + "\n", + " 'Urban Delivery': {'avg_distance': 45, 'avg_duration': 120, 'avg_fuel': 8.5, 'load_factor': 85},\n", + "\n", + " 'Long-haul': {'avg_distance': 450, 'avg_duration': 480, 'avg_fuel': 65.0, 'load_factor': 92},\n", + "\n", + " 'Local Transport': {'avg_distance': 120, 'avg_duration': 180, 'avg_fuel': 15.2, 'load_factor': 78},\n", + "\n", + " 'Express Delivery': {'avg_distance': 80, 'avg_duration': 90, 'avg_fuel': 12.8, 'load_factor': 95}\n", + "\n", + "}\n", + "\n", + "\n", + "# Generate fleet trip records\n", + "\n", + "trip_data = []\n", + "\n", + "base_date = datetime(2024, 1, 1)\n", + "\n", + "\n", + "# Create 500 vehicles with 20-60 trips each\n", + "\n", + "for vehicle_num in range(1, 501):\n", + "\n", + " vehicle_id = f\"VH{vehicle_num:04d}\"\n", + " \n", + " # Each vehicle gets 20-60 trips over 12 months\n", + "\n", + " num_trips = random.randint(20, 60)\n", + " \n", + " for i in range(num_trips):\n", + "\n", + " # Spread trips over 12 months\n", + "\n", + " days_offset = random.randint(0, 365)\n", + "\n", + " trip_date = base_date + timedelta(days=days_offset)\n", + " \n", + " # Add realistic timing (more trips during business hours)\n", + "\n", + " hour_weights = [1, 1, 1, 1, 1, 3, 8, 10, 12, 10, 8, 6, 8, 9, 8, 7, 6, 5, 3, 2, 2, 1, 1, 1]\n", + "\n", + " hours_offset = random.choices(range(24), weights=hour_weights)[0]\n", + "\n", + " trip_date = trip_date.replace(hour=hours_offset, minute=random.randint(0, 59), second=0, microsecond=0)\n", + " \n", + " # Select route type\n", + "\n", + " route_type = random.choice(ROUTE_TYPES)\n", + "\n", + " params = TRIP_PARAMS[route_type]\n", + " \n", + " # Calculate trip metrics with variability\n", + "\n", + " distance_variation = random.uniform(0.7, 1.4)\n", + "\n", + " distance = 
round(params['avg_distance'] * distance_variation, 2)\n", + " \n", + " duration_variation = random.uniform(0.8, 1.6)\n", + "\n", + " duration = round(params['avg_duration'] * duration_variation, 2)\n", + " \n", + " fuel_variation = random.uniform(0.85, 1.25)\n", + "\n", + " fuel_consumed = round(params['avg_fuel'] * fuel_variation, 2)\n", + " \n", + " load_factor_variation = random.randint(-10, 8)\n", + "\n", + " load_factor = max(0, min(100, params['load_factor'] + load_factor_variation))\n", + " \n", + " # Select specific route\n", + "\n", + " route_id = random.choice(ROUTES)\n", + " \n", + " trip_data.append({\n", + "\n", + " \"vehicle_id\": vehicle_id,\n", + "\n", + " \"trip_date\": trip_date,\n", + "\n", + " \"route_id\": route_id,\n", + "\n", + " \"distance\": distance,\n", + "\n", + " \"duration\": duration,\n", + "\n", + " \"fuel_consumed\": fuel_consumed,\n", + "\n", + " \"load_factor\": load_factor\n", + "\n", + " })\n", + "\n", + "\n", + "\n", + "print(f\"Generated {len(trip_data)} fleet trip records\")\n", + "\n", + "print(\"Sample record:\", trip_data[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Insert Data Using PySpark\n", + "\n", + "### Data Insertion Strategy\n", + "\n", + "We'll use PySpark to:\n", + "\n", + "1. **Create DataFrame** from our generated data\n", + "2. **Insert into Delta table** with liquid clustering\n", + "3. **Verify the insertion** with a sample query\n", + "\n", + "### Why PySpark for Insertion?\n", + "\n", + "- **Distributed processing**: Handles large datasets efficiently\n", + "- **Type safety**: Ensures data integrity\n", + "- **Optimization**: Leverages Spark's query optimization\n", + "- **Liquid clustering**: Automatically applies clustering during insertion" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "DataFrame Schema:\n", + "root\n", + " |-- distance: double (nullable = true)\n", + " |-- duration: double (nullable = true)\n", + " |-- fuel_consumed: double (nullable = true)\n", + " |-- load_factor: long (nullable = true)\n", + " |-- route_id: string (nullable = true)\n", + " |-- trip_date: timestamp (nullable = true)\n", + " |-- vehicle_id: string (nullable = true)\n", + "\n", + "\n", + "Sample Data:\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+--------+--------+-------------+-----------+--------------+-------------------+----------+\n", + "|distance|duration|fuel_consumed|load_factor| route_id| trip_date|vehicle_id|\n", + "+--------+--------+-------------+-----------+--------------+-------------------+----------+\n", + "| 48.18| 107.57| 8.54| 79|RT_HOU_DAL_004|2024-09-21 14:44:00| VH0001|\n", + "| 71.26| 122.74| 14.88| 87|RT_HOU_DAL_004|2024-12-01 05:12:00| VH0001|\n", + "| 136.21| 266.74| 18.61| 81|RT_NYC_MAN_001|2024-11-22 09:04:00| VH0001|\n", + "| 488.8| 544.36| 62.62| 96|RT_HOU_DAL_004|2024-12-13 20:05:00| VH0001|\n", + "| 417.19| 437.07| 72.73| 96|RT_MIA_ORL_005|2024-12-22 06:01:00| VH0001|\n", + "+--------+--------+-------------+-----------+--------------+-------------------+----------+\n", + "only showing top 5 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "\n", + "Successfully inserted 20176 records into transportation.analytics.fleet_trips\n", + "Liquid clustering automatically optimized the data layout during insertion!\n" + ] + }, + "metadata": {}, + "output_type": 
"display_data" + } + ], + "source": [ + "# Insert data using PySpark DataFrame operations\n", + "\n", + "# Using fully qualified function references to avoid conflicts\n", + "\n", + "\n", + "# Create DataFrame from generated data\n", + "\n", + "df_trips = spark.createDataFrame(trip_data)\n", + "\n", + "\n", + "# Display schema and sample data\n", + "\n", + "print(\"DataFrame Schema:\")\n", + "\n", + "df_trips.printSchema()\n", + "\n", + "\n", + "\n", + "print(\"\\nSample Data:\")\n", + "\n", + "df_trips.show(5)\n", + "\n", + "\n", + "# Insert data into Delta table with liquid clustering\n", + "\n", + "# The CLUSTER BY (vehicle_id, trip_date) will automatically optimize the data layout\n", + "\n", + "df_trips.write.mode(\"overwrite\").saveAsTable(\"transportation.analytics.fleet_trips\")\n", + "\n", + "\n", + "print(f\"\\nSuccessfully inserted {df_trips.count()} records into transportation.analytics.fleet_trips\")\n", + "\n", + "print(\"Liquid clustering automatically optimized the data layout during insertion!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Demonstrate Liquid Clustering Benefits\n", + "\n", + "### Query Performance Analysis\n", + "\n", + "Now let's see how liquid clustering improves query performance. We'll run queries that benefit from our clustering strategy:\n", + "\n", + "1. **Vehicle trip history** (clustered by vehicle_id)\n", + "2. **Time-based fleet analysis** (clustered by trip_date)\n", + "3. **Combined vehicle + time queries** (optimal for our clustering)\n", + "\n", + "### Expected Performance Benefits\n", + "\n", + "With liquid clustering, these queries should be significantly faster because:\n", + "\n", + "- **Data locality**: Related records are physically grouped together\n", + "- **Reduced I/O**: Less data needs to be read from disk\n", + "- **Automatic optimization**: No manual tuning required" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Query 1: Vehicle Trip History ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------+-------------------+--------------+--------+-------------+-----------+\n", + "|vehicle_id| trip_date| route_id|distance|fuel_consumed|load_factor|\n", + "+----------+-------------------+--------------+--------+-------------+-----------+\n", + "| VH0001|2024-12-22 06:01:00|RT_MIA_ORL_005| 417.19| 72.73| 96|\n", + "| VH0001|2024-12-15 12:53:00|RT_LAX_SFO_002| 99.26| 16.85| 84|\n", + "| VH0001|2024-12-13 20:05:00|RT_HOU_DAL_004| 488.8| 62.62| 96|\n", + "| VH0001|2024-12-03 11:07:00|RT_HOU_DAL_004| 519.16| 69.75| 99|\n", + "| VH0001|2024-12-01 05:12:00|RT_HOU_DAL_004| 71.26| 14.88| 87|\n", + "| VH0001|2024-11-23 06:58:00|RT_LAX_SFO_002| 348.19| 68.93| 98|\n", + "| VH0001|2024-11-22 09:04:00|RT_NYC_MAN_001| 136.21| 18.61| 81|\n", + "| VH0001|2024-11-20 13:03:00|RT_CHI_DET_003| 89.91| 16.35| 82|\n", + "| VH0001|2024-11-16 11:09:00|RT_HOU_DAL_004| 605.19| 67.39| 97|\n", + "| VH0001|2024-11-14 11:48:00|RT_LAX_SFO_002| 96.21| 13.51| 93|\n", + "| VH0001|2024-11-11 18:27:00|RT_HOU_DAL_004| 58.57| 13.12| 85|\n", + "| VH0001|2024-11-04 13:08:00|RT_MIA_ORL_005| 336.23| 79.59| 87|\n", + "| VH0001|2024-10-23 08:36:00|RT_CHI_DET_003| 75.64| 11.85| 87|\n", + "| VH0001|2024-10-03 09:22:00|RT_CHI_DET_003| 137.81| 15.87| 80|\n", + "| VH0001|2024-09-30 15:41:00|RT_LAX_SFO_002| 58.77| 9.53| 89|\n", + "| VH0001|2024-09-27 08:59:00|RT_NYC_MAN_001| 393.69| 71.33| 
82|\n", + "| VH0001|2024-09-21 14:44:00|RT_HOU_DAL_004| 48.18| 8.54| 79|\n", + "| VH0001|2024-09-12 08:28:00|RT_CHI_DET_003| 542.22| 70.83| 90|\n", + "| VH0001|2024-08-22 06:21:00|RT_LAX_SFO_002| 72.55| 13.76| 98|\n", + "| VH0001|2024-08-16 04:26:00|RT_LAX_SFO_002| 42.08| 8.76| 92|\n", + "+----------+-------------------+--------------+--------+-------------+-----------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Records found: 31\n", + "\n", + "=== Query 2: Recent Fuel Efficiency Issues ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------------------+----------+--------------+--------+-------------+----+\n", + "| trip_date|vehicle_id| route_id|distance|fuel_consumed| mpg|\n", + "+-------------------+----------+--------------+--------+-------------+----+\n", + "|2024-08-03 07:41:00| VH0114|RT_NYC_MAN_001| 31.71| 10.62|2.99|\n", + "|2024-07-04 14:10:00| VH0416|RT_LAX_SFO_002| 31.57| 10.57|2.99|\n", + "|2024-11-10 09:49:00| VH0444|RT_MIA_ORL_005| 31.83| 10.6| 3.0|\n", + "|2024-09-30 16:50:00| VH0362|RT_LAX_SFO_002| 31.78| 10.61| 3.0|\n", + "|2024-11-29 13:49:00| VH0117|RT_LAX_SFO_002| 31.71| 10.54|3.01|\n", + "|2024-06-03 13:03:00| VH0413|RT_NYC_MAN_001| 31.9| 10.58|3.02|\n", + "|2024-10-03 08:02:00| VH0452|RT_NYC_MAN_001| 31.58| 10.35|3.05|\n", + "|2024-10-19 18:05:00| VH0274|RT_MIA_ORL_005| 32.63| 10.58|3.08|\n", + "|2024-08-03 14:49:00| VH0058|RT_CHI_DET_003| 31.61| 10.27|3.08|\n", + "|2024-07-14 08:26:00| VH0118|RT_MIA_ORL_005| 32.02| 10.39|3.08|\n", + "|2024-11-23 19:32:00| VH0220|RT_HOU_DAL_004| 32.23| 10.39| 3.1|\n", + "|2024-09-13 15:13:00| VH0167|RT_HOU_DAL_004| 32.17| 10.35|3.11|\n", + "|2024-06-17 14:21:00| VH0426|RT_CHI_DET_003| 32.02| 10.29|3.11|\n", + "|2024-10-18 10:31:00| VH0202|RT_NYC_MAN_001| 32.09| 10.27|3.12|\n", + "|2024-07-26 14:54:00| VH0139|RT_HOU_DAL_004| 32.82| 10.52|3.12|\n", + "|2024-07-07 08:06:00| VH0383|RT_NYC_MAN_001| 32.62| 10.45|3.12|\n", + "|2024-11-05 06:39:00| VH0196|RT_NYC_MAN_001| 33.08| 10.56|3.13|\n", + "|2024-06-24 12:30:00| VH0162|RT_CHI_DET_003| 33.16| 10.61|3.13|\n", + "|2024-11-20 12:14:00| VH0388|RT_MIA_ORL_005| 31.98| 10.18|3.14|\n", + "|2024-12-11 06:59:00| VH0302|RT_HOU_DAL_004| 32.19| 10.22|3.15|\n", + "+-------------------+----------+--------------+--------+-------------+----+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Fuel efficiency issues found: 11804\n", + "\n", + "=== Query 3: Vehicle Performance Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------+-------------------+--------------+--------+--------+-----------+\n", + "|vehicle_id| trip_date| route_id|distance|duration|load_factor|\n", + "+----------+-------------------+--------------+--------+--------+-----------+\n", + "| VH0001|2024-04-05 10:59:00|RT_LAX_SFO_002| 59.35| 187.65| 80|\n", + "| VH0001|2024-04-14 23:04:00|RT_LAX_SFO_002| 46.08| 173.74| 91|\n", + "| VH0001|2024-05-01 14:40:00|RT_CHI_DET_003| 108.14| 217.92| 77|\n", + "| VH0001|2024-05-11 07:48:00|RT_NYC_MAN_001| 603.41| 701.52| 91|\n", + "| VH0001|2024-06-16 11:51:00|RT_LAX_SFO_002| 554.8| 481.12| 88|\n", + "| VH0001|2024-06-24 13:48:00|RT_HOU_DAL_004| 89.9| 160.75| 77|\n", + "| VH0001|2024-07-18 11:37:00|RT_CHI_DET_003| 418.77| 679.2| 97|\n", + "| VH0001|2024-07-23 
07:31:00|RT_HOU_DAL_004| 316.56| 767.37| 99|\n", + "| VH0001|2024-08-12 06:57:00|RT_CHI_DET_003| 98.6| 88.27| 99|\n", + "| VH0001|2024-08-16 04:26:00|RT_LAX_SFO_002| 42.08| 127.55| 92|\n", + "| VH0001|2024-08-22 06:21:00|RT_LAX_SFO_002| 72.55| 77.47| 98|\n", + "| VH0001|2024-09-12 08:28:00|RT_CHI_DET_003| 542.22| 610.28| 90|\n", + "| VH0001|2024-09-21 14:44:00|RT_HOU_DAL_004| 48.18| 107.57| 79|\n", + "| VH0001|2024-09-27 08:59:00|RT_NYC_MAN_001| 393.69| 754.9| 82|\n", + "| VH0001|2024-09-30 15:41:00|RT_LAX_SFO_002| 58.77| 160.73| 89|\n", + "| VH0001|2024-10-03 09:22:00|RT_CHI_DET_003| 137.81| 234.32| 80|\n", + "| VH0001|2024-10-23 08:36:00|RT_CHI_DET_003| 75.64| 135.39| 87|\n", + "| VH0001|2024-11-04 13:08:00|RT_MIA_ORL_005| 336.23| 574.36| 87|\n", + "| VH0001|2024-11-11 18:27:00|RT_HOU_DAL_004| 58.57| 90.88| 85|\n", + "| VH0001|2024-11-14 11:48:00|RT_LAX_SFO_002| 96.21| 116.48| 93|\n", + "+----------+-------------------+--------------+--------+--------+-----------+\n", + "only showing top 20 rows\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "Performance trend records found: 236\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Demonstrate liquid clustering benefits with optimized queries\n", + "\n", + "\n", + "# Query 1: Vehicle trip history - benefits from vehicle_id clustering\n", + "\n", + "print(\"=== Query 1: Vehicle Trip History ===\")\n", + "\n", + "vehicle_history = spark.sql(\"\"\"\n", + "\n", + "SELECT vehicle_id, trip_date, route_id, distance, fuel_consumed, load_factor\n", + "\n", + "FROM transportation.analytics.fleet_trips\n", + "\n", + "WHERE vehicle_id = 'VH0001'\n", + "\n", + "ORDER BY trip_date DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "vehicle_history.show()\n", + "\n", + "print(f\"Records found: {vehicle_history.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 2: Time-based fuel efficiency analysis - benefits from trip_date clustering\n", + "\n", + "print(\"\\n=== Query 2: Recent Fuel Efficiency Issues ===\")\n", + "\n", + "fuel_efficiency = spark.sql(\"\"\"\n", + "\n", + "SELECT trip_date, vehicle_id, route_id, distance, fuel_consumed,\n", + "\n", + " ROUND(distance / fuel_consumed, 2) as mpg\n", + "\n", + "FROM transportation.analytics.fleet_trips\n", + "\n", + "WHERE trip_date >= '2024-06-01' AND (distance / fuel_consumed) < 15\n", + "\n", + "ORDER BY mpg ASC, trip_date DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "fuel_efficiency.show()\n", + "\n", + "print(f\"Fuel efficiency issues found: {fuel_efficiency.count()}\")\n", + "\n", + "\n", + "\n", + "# Query 3: Combined vehicle + time query - optimal for our clustering strategy\n", + "\n", + "print(\"\\n=== Query 3: Vehicle Performance Trends ===\")\n", + "\n", + "performance_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT vehicle_id, trip_date, route_id, distance, duration, load_factor\n", + "\n", + "FROM transportation.analytics.fleet_trips\n", + "\n", + "WHERE vehicle_id LIKE 'VH000%' AND trip_date >= '2024-04-01'\n", + "\n", + "ORDER BY vehicle_id, trip_date\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "performance_trends.show()\n", + "\n", + "print(f\"Performance trend records found: {performance_trends.count()}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 6: Analyze Clustering Effectiveness\n", + "\n", + "### Understanding the Impact\n", + "\n", + "Let's examine how liquid clustering has organized our data and analyze some 
aggregate statistics to demonstrate the transportation insights possible with this optimized structure.\n", + "\n", + "### Key Analytics\n", + "\n", + "- **Vehicle utilization** and performance metrics\n", + "- **Route efficiency** and fuel consumption analysis\n", + "- **Fleet capacity utilization** and load factors\n", + "- **Operational cost trends** and optimization opportunities" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "=== Vehicle Performance Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+----------+-----------+--------------+----------+-------+---------------+-----------+\n", + "|vehicle_id|total_trips|total_distance|total_fuel|avg_mpg|avg_load_factor|total_miles|\n", + "+----------+-----------+--------------+----------+-------+---------------+-----------+\n", + "| VH0051| 59| 13834.0| 2033.34| 6.57| 87.44| 13834.0|\n", + "| VH0123| 60| 13633.12| 1879.83| 7.08| 84.27| 13633.0|\n", + "| VH0453| 57| 12890.51| 1937.5| 6.63| 86.86| 12891.0|\n", + "| VH0343| 54| 12846.02| 1855.01| 6.95| 87.28| 12846.0|\n", + "| VH0088| 57| 12547.45| 1816.14| 6.88| 85.05| 12547.0|\n", + "| VH0238| 59| 12448.77| 1814.0| 7.05| 86.02| 12449.0|\n", + "| VH0278| 53| 12418.19| 1824.83| 6.77| 87.13| 12418.0|\n", + "| VH0427| 54| 12406.78| 1753.24| 6.86| 87.31| 12407.0|\n", + "| VH0406| 60| 12304.12| 1810.67| 6.6| 87.52| 12304.0|\n", + "| VH0049| 60| 12277.11| 1786.7| 6.74| 86.2| 12277.0|\n", + "| VH0242| 58| 12200.91| 1794.24| 6.57| 86.6| 12201.0|\n", + "| VH0253| 49| 12046.66| 1631.43| 7.0| 87.39| 12047.0|\n", + "| VH0160| 57| 12003.29| 1622.78| 7.13| 86.44| 12003.0|\n", + "| VH0126| 55| 11965.73| 1809.87| 6.63| 86.84| 11966.0|\n", + "| VH0280| 40| 11953.7| 1677.1| 7.02| 86.33| 11954.0|\n", + "| VH0362| 60| 11920.8| 1718.56| 6.8| 86.6| 11921.0|\n", + "| VH0114| 60| 11910.09| 1783.69| 6.33| 86.62| 11910.0|\n", + "| VH0498| 51| 11864.91| 1701.58| 6.74| 85.78| 11865.0|\n", + "| VH0111| 60| 11821.77| 1702.64| 6.48| 86.87| 11822.0|\n", + "| VH0244| 59| 11607.77| 1665.22| 6.74| 87.83| 11608.0|\n", + "+----------+-----------+--------------+----------+-------+---------------+-----------+\n", + "only showing top 20 rows\n", + "\n", + "\n", + "=== Route Efficiency Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+--------------+-----------+------------+------------+---------+---------------+\n", + "| route_id|total_trips|avg_distance|avg_duration|avg_speed|avg_load_factor|\n", + "+--------------+-----------+------------+------------+---------+---------------+\n", + "|RT_NYC_MAN_001| 4111| 177.28| 255.39| 38.95| 86.55|\n", + "|RT_CHI_DET_003| 4060| 185.45| 265.02| 39.14| 86.34|\n", + "|RT_LAX_SFO_002| 4029| 180.58| 258.5| 39.27| 86.44|\n", + "|RT_MIA_ORL_005| 3991| 181.71| 260.37| 39.22| 86.31|\n", + "|RT_HOU_DAL_004| 3985| 182.39| 261.78| 39.02| 86.49|\n", + "+--------------+-----------+------------+------------+---------+---------------+\n", + "\n", + "\n", + "=== Fleet Fuel Consumption Analysis ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+------------------------+----------+-------+---------------+\n", + "|fuel_efficiency_category|trip_count|avg_mpg|total_fuel_used|\n", + "+------------------------+----------+-------+---------------+\n", + "| Poor (10-14 MPG)| 941| 10.79| 21486.63|\n", + "| Very Poor (<10 MPG)| 19235| 6.46| 
514816.86|\n", + "+------------------------+----------+-------+---------------+\n", + "\n", + "\n", + "=== Monthly Operational Trends ===\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "+-------+-----------+----------------+------------+---------------+---------------+\n", + "| month|total_trips|monthly_distance|monthly_fuel|avg_load_factor|active_vehicles|\n", + "+-------+-----------+----------------+------------+---------------+---------------+\n", + "|2024-01| 1756| 319156.46| 46719.73| 86.38| 479|\n", + "|2024-02| 1559| 268888.27| 39614.96| 86.14| 467|\n", + "|2024-03| 1703| 318040.34| 46279.08| 86.41| 478|\n", + "|2024-04| 1641| 303054.54| 44354.61| 86.33| 477|\n", + "|2024-05| 1713| 316645.41| 46364.37| 86.38| 476|\n", + "|2024-06| 1700| 312248.26| 46181.31| 86.65| 474|\n", + "|2024-07| 1637| 291667.53| 43144.47| 86.67| 482|\n", + "|2024-08| 1704| 302704.22| 44418.71| 86.5| 481|\n", + "|2024-09| 1640| 295376.12| 42881.73| 86.49| 475|\n", + "|2024-10| 1700| 307452.9| 44814.6| 86.36| 480|\n", + "|2024-11| 1696| 316414.16| 46494.07| 86.55| 472|\n", + "|2024-12| 1727| 309624.27| 45035.85| 86.25| 480|\n", + "+-------+-----------+----------------+------------+---------------+---------------+\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Analyze clustering effectiveness and transportation insights\n", + "\n", + "\n", + "# Vehicle performance analysis\n", + "\n", + "print(\"=== Vehicle Performance Analysis ===\")\n", + "\n", + "vehicle_performance = spark.sql(\"\"\"\n", + "\n", + "SELECT vehicle_id, COUNT(*) as total_trips,\n", + "\n", + " ROUND(SUM(distance), 2) as total_distance,\n", + "\n", + " ROUND(SUM(fuel_consumed), 2) as total_fuel,\n", + "\n", + " ROUND(AVG(distance / fuel_consumed), 2) as avg_mpg,\n", + "\n", + " ROUND(AVG(load_factor), 2) as avg_load_factor,\n", + "\n", + " ROUND(SUM(distance), 0) as total_miles\n", + "\n", + "FROM transportation.analytics.fleet_trips\n", + "\n", + "GROUP BY vehicle_id\n", + "\n", + "ORDER BY total_miles DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "vehicle_performance.show()\n", + "\n", + "\n", + "# Route efficiency analysis\n", + "\n", + "print(\"\\n=== Route Efficiency Analysis ===\")\n", + "\n", + "route_efficiency = spark.sql(\"\"\"\n", + "\n", + "SELECT route_id, COUNT(*) as total_trips,\n", + "\n", + " ROUND(AVG(distance), 2) as avg_distance,\n", + "\n", + " ROUND(AVG(duration), 2) as avg_duration,\n", + "\n", + " ROUND(AVG(distance / duration * 60), 2) as avg_speed,\n", + "\n", + " ROUND(AVG(load_factor), 2) as avg_load_factor\n", + "\n", + "FROM transportation.analytics.fleet_trips\n", + "\n", + "GROUP BY route_id\n", + "\n", + "ORDER BY total_trips DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "route_efficiency.show()\n", + "\n", + "\n", + "# Fleet fuel consumption analysis\n", + "\n", + "print(\"\\n=== Fleet Fuel Consumption Analysis ===\")\n", + "\n", + "fuel_analysis = spark.sql(\"\"\"\n", + "\n", + "SELECT \n", + "\n", + " CASE \n", + "\n", + " WHEN distance / fuel_consumed >= 25 THEN 'Excellent (25+ MPG)'\n", + "\n", + " WHEN distance / fuel_consumed >= 20 THEN 'Good (20-24 MPG)'\n", + "\n", + " WHEN distance / fuel_consumed >= 15 THEN 'Average (15-19 MPG)'\n", + "\n", + " WHEN distance / fuel_consumed >= 10 THEN 'Poor (10-14 MPG)'\n", + "\n", + " ELSE 'Very Poor (<10 MPG)'\n", + "\n", + " END as fuel_efficiency_category,\n", + "\n", + " COUNT(*) as trip_count,\n", + "\n", + " ROUND(AVG(distance / 
fuel_consumed), 2) as avg_mpg,\n", + "\n", + " ROUND(SUM(fuel_consumed), 2) as total_fuel_used\n", + "\n", + "FROM transportation.analytics.fleet_trips\n", + "\n", + "GROUP BY \n", + "\n", + " CASE \n", + "\n", + " WHEN distance / fuel_consumed >= 25 THEN 'Excellent (25+ MPG)'\n", + "\n", + " WHEN distance / fuel_consumed >= 20 THEN 'Good (20-24 MPG)'\n", + "\n", + " WHEN distance / fuel_consumed >= 15 THEN 'Average (15-19 MPG)'\n", + "\n", + " WHEN distance / fuel_consumed >= 10 THEN 'Poor (10-14 MPG)'\n", + "\n", + " ELSE 'Very Poor (<10 MPG)'\n", + "\n", + " END\n", + "\n", + "ORDER BY avg_mpg DESC\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "fuel_analysis.show()\n", + "\n", + "\n", + "# Monthly operational trends\n", + "\n", + "print(\"\\n=== Monthly Operational Trends ===\")\n", + "\n", + "monthly_trends = spark.sql(\"\"\"\n", + "\n", + "SELECT DATE_FORMAT(trip_date, 'yyyy-MM') as month,\n", + "\n", + " COUNT(*) as total_trips,\n", + "\n", + " ROUND(SUM(distance), 2) as monthly_distance,\n", + "\n", + " ROUND(SUM(fuel_consumed), 2) as monthly_fuel,\n", + "\n", + " ROUND(AVG(load_factor), 2) as avg_load_factor,\n", + "\n", + " COUNT(DISTINCT vehicle_id) as active_vehicles\n", + "\n", + "FROM transportation.analytics.fleet_trips\n", + "\n", + "GROUP BY DATE_FORMAT(trip_date, 'yyyy-MM')\n", + "\n", + "ORDER BY month\n", + "\n", + "\"\"\")\n", + "\n", + "\n", + "\n", + "monthly_trends.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Key Takeaways: Delta Liquid Clustering in AIDP\n", + "\n", + "### What We Demonstrated\n", + "\n", + "1. **Automatic Optimization**: Created a table with `CLUSTER BY (vehicle_id, trip_date)` and let Delta automatically optimize data layout\n", + "\n", + "2. **Performance Benefits**: Queries on clustered columns (vehicle_id, trip_date) are significantly faster due to data locality\n", + "\n", + "3. **Zero Maintenance**: No manual partitioning, bucketing, or Z-Ordering required - Delta handles it automatically\n", + "\n", + "4. **Real-World Use Case**: Transportation analytics where fleet monitoring and route optimization are critical\n", + "\n", + "### AIDP Advantages\n", + "\n", + "- **Unified Analytics**: Seamlessly integrates with other AIDP services\n", + "- **Governance**: Catalog and schema isolation for transportation data\n", + "- **Performance**: Optimized for both OLAP and OLTP workloads\n", + "- **Scalability**: Handles transportation-scale data volumes effortlessly\n", + "\n", + "### Best Practices for Liquid Clustering\n", + "\n", + "1. **Choose clustering columns** based on your most common query patterns\n", + "2. **Start with 1-4 columns** - too many can reduce effectiveness\n", + "3. **Consider cardinality** - high-cardinality columns work best\n", + "4. **Monitor and adjust** as query patterns evolve\n", + "\n", + "### Next Steps\n", + "\n", + "- Explore other AIDP features like AI/ML integration\n", + "- Try liquid clustering with different column combinations\n", + "- Scale up to larger transportation datasets\n", + "- Integrate with real GPS tracking and IoT sensor data\n", + "\n", + "This notebook demonstrates how Oracle AI Data Platform makes advanced transportation analytics accessible while maintaining enterprise-grade performance and governance." 
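, + "\n", + "\n", + "### Appendix: Verifying and Maintaining Clustering (Optional)\n", + "\n", + "The snippet below is a minimal sketch, not part of the demo above, for checking which columns a Delta table is clustered by and for running a clustering maintenance pass. The exact fields returned by `DESCRIBE DETAIL` (for example a `clusteringColumns` column), the behavior of `OPTIMIZE` on liquid clustered tables, and support for `ALTER TABLE ... CLUSTER BY` depend on the Delta Lake version available in your AIDP Workbench, so treat this as a starting point rather than a guaranteed API. The alternative key `(route_id, trip_date)` is shown purely for illustration.\n", + "\n", + "```python\n", + "# Inspect table details; recent Delta versions report the clustering columns here\n", + "spark.sql(\"DESCRIBE DETAIL transportation.analytics.fleet_trips\").show(truncate=False)\n", + "\n", + "# Run an incremental clustering pass over data written since the last optimize\n", + "spark.sql(\"OPTIMIZE transportation.analytics.fleet_trips\")\n", + "\n", + "# If query patterns shift toward route-level analysis, the clustering key can be changed\n", + "# (hypothetical alternative key, shown for illustration only)\n", + "spark.sql(\"ALTER TABLE transportation.analytics.fleet_trips CLUSTER BY (route_id, trip_date)\")\n", + "```\n", + "\n", + "Changing the clustering columns affects how newly written or re-optimized data is laid out; existing files are typically rewritten gradually by subsequent `OPTIMIZE` runs rather than all at once."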
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}