# Real Estate: Delta Liquid Clustering Demo


## Overview


This notebook demonstrates the power of **Delta Liquid Clustering** in Oracle AI Data Platform (AIDP) Workbench using a real estate analytics use case. Liquid clustering automatically optimizes data layout for query performance without requiring manual partitioning or Z-Ordering.

### What is Liquid Clustering?

Liquid clustering groups rows with similar values in the clustering columns you define into the same files. Depending on the runtime, clustering is applied during eligible writes and during maintenance operations such as `OPTIMIZE`, providing:

- **Automatic optimization**: No manual tuning required
- **Improved query performance**: Faster queries on clustered columns
- **Reduced maintenance**: No need for manual repartitioning
- **Adaptive clustering**: Adjusts as data patterns change
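
To make the benefit concrete, here is a toy model in plain Python (no Spark required) of why grouping rows by key helps a lookup. It assumes each "file" records only the minimum and maximum key it contains, and that a reader skips any file whose range cannot contain the requested key. The helper names `file_ranges` and `files_to_scan` are illustrative only, and this is a simplified sketch of the idea rather than Delta's actual implementation.

```python
import random

def file_ranges(keys, rows_per_file):
    """Split keys into fixed-size 'files' and record each file's (min, max) key."""
    chunks = [keys[i:i + rows_per_file] for i in range(0, len(keys), rows_per_file)]
    return [(min(chunk), max(chunk)) for chunk in chunks]

def files_to_scan(files, target):
    """Count files whose [min, max] range could contain the target key."""
    return sum(1 for low, high in files if low <= target <= high)

random.seed(42)
keys = [f"PROP{n:06d}" for n in range(1, 8001)]

# Unclustered: rows land in files in arbitrary arrival order
shuffled = keys[:]
random.shuffle(shuffled)
unclustered = file_ranges(shuffled, rows_per_file=500)

# Clustered: rows with neighbouring keys share a file
clustered = file_ranges(sorted(keys), rows_per_file=500)

target = "PROP004242"
print("unclustered files to scan:", files_to_scan(unclustered, target))
print("clustered files to scan:  ", files_to_scan(clustered, target))  # 1
```

In the clustered layout exactly one file can contain the key, while in the shuffled layout nearly every file's range overlaps it, so nearly every file has to be read.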

### Use Case: Property Transactions and Market Analysis

We'll analyze real estate transactions and property market data. Our clustering strategy will optimize for:

- **Property-specific queries**: Fast lookups by property ID
- **Time-based analysis**: Efficient filtering by transaction and listing dates
- **Market performance patterns**: Quick aggregation by location and property type

### AIDP Environment Setup

This notebook leverages the existing Spark session in your AIDP environment.

In [None]:
# Create real estate catalog and analytics schema

# In AIDP, catalogs provide data isolation and governance

spark.sql("CREATE CATALOG IF NOT EXISTS real_estate")

spark.sql("CREATE SCHEMA IF NOT EXISTS real_estate.analytics")

print("Real estate catalog and analytics schema created successfully!")

Real estate catalog and analytics schema created successfully!


## Step 2: Create Delta Table with Liquid Clustering

### Table Design

Our `property_transactions` table will store:

- **property_id**: Unique property identifier
- **transaction_date**: Date of property transaction
- **property_type**: Type (Single Family, Condo, Apartment, etc.)
- **sale_price**: Transaction sale price
- **location**: Geographic location/neighborhood
- **days_on_market**: Time property was listed before sale
- **price_per_sqft**: Price per square foot

### Clustering Strategy

We'll cluster by `property_id` and `transaction_date` because:

- **property_id**: Properties may have multiple transactions over time; clustering on this column keeps each property's sales history together
- **transaction_date**: Time-based queries are critical for market analysis, seasonal trends, and investment performance
- This combination optimizes for both property tracking and temporal market analysis

In [None]:
# Create Delta table with liquid clustering

# CLUSTER BY defines the columns for automatic optimization

spark.sql("""

CREATE TABLE IF NOT EXISTS real_estate.analytics.property_transactions (

    property_id STRING,

    transaction_date DATE,

    property_type STRING,

    sale_price DECIMAL(12,2),

    location STRING,

    days_on_market INT,

    price_per_sqft DECIMAL(8,2)

)

USING DELTA

CLUSTER BY (property_id, transaction_date)

""")

print("Delta table with liquid clustering created successfully!")

print("Clustering will automatically optimize data layout for queries on property_id and transaction_date.")

Delta table with liquid clustering created successfully!
Clustering will automatically optimize data layout for queries on property_id and transaction_date.


## Step 3: Generate Real Estate Sample Data

### Data Generation Strategy

We'll create realistic real estate transaction data including:

- **8,000 properties** with multiple transactions over time
- **Property types**: Single Family, Condo, Townhouse, Apartment, Commercial
- **Realistic market patterns**: Seasonal pricing, location premiums, market fluctuations
- **Geographic diversity**: Different neighborhoods with varying price points

### Why This Data Pattern?

This data simulates realistic real estate scenarios in which:

- Properties appreciate or depreciate over time
- Market conditions vary by season and location
- Investment performance requires historical tracking
- Neighborhood analysis drives pricing strategies
- Market trends influence buying/selling decisions
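
The pricing model used by the generation cell can be written as a small deterministic function, which makes it easier to sanity-check individual records. The sketch below reproduces the seasonal and appreciation factors from that cell but leaves out the random ±10% noise; the function names `seasonal_factor` and `expected_price_per_sqft` are introduced here for illustration and do not appear in the notebook.

```python
def seasonal_factor(month):
    """Seasonal multiplier: peak in March-June, dip in November-February."""
    if month in (3, 4, 5, 6):
        return 1.15
    if month in (11, 12, 1, 2):
        return 0.9
    return 1.0

def expected_price_per_sqft(base_price, sqft_range, month, months_elapsed):
    """Price per square foot before the random +/-10% noise is applied."""
    midpoint_sqft = (sqft_range[0] + sqft_range[1]) / 2
    appreciation = 1.0 + months_elapsed * 0.002  # 0.2% per month
    return round(base_price / midpoint_sqft * seasonal_factor(month) * appreciation, 2)

# Single Family / Downtown, sold in March 2024 (2 months after the base date)
ppsf = expected_price_per_sqft(850000, (1800, 3500), month=3, months_elapsed=2)
print(ppsf)  # 370.34
```

A generated March record for this property type and location should therefore have a `price_per_sqft` within roughly 10% of 370.34.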

In [None]:
# Generate sample real estate transaction data

# Standard-library imports used for data generation

import random

from datetime import datetime, timedelta


# Define real estate data constants

PROPERTY_TYPES = ['Single Family', 'Condo', 'Townhouse', 'Apartment', 'Commercial']

LOCATIONS = ['Downtown', 'Suburban', 'Waterfront', 'Mountain View', 'Urban Core', 'Residential District']

# Base pricing parameters by property type and location

PRICE_PARAMS = {

    'Single Family': {

        'Downtown': {'base_price': 850000, 'sqft_range': (1800, 3500)},

        'Suburban': {'base_price': 650000, 'sqft_range': (2000, 4000)},

        'Waterfront': {'base_price': 1200000, 'sqft_range': (2200, 4500)},

        'Mountain View': {'base_price': 750000, 'sqft_range': (1900, 3800)},

        'Urban Core': {'base_price': 950000, 'sqft_range': (1600, 3200)},

        'Residential District': {'base_price': 700000, 'sqft_range': (2100, 4200)}

    },

    'Condo': {

        'Downtown': {'base_price': 550000, 'sqft_range': (800, 1800)},

        'Suburban': {'base_price': 350000, 'sqft_range': (900, 2000)},

        'Waterfront': {'base_price': 750000, 'sqft_range': (1000, 2200)},

        'Mountain View': {'base_price': 450000, 'sqft_range': (850, 1900)},

        'Urban Core': {'base_price': 650000, 'sqft_range': (750, 1700)},

        'Residential District': {'base_price': 400000, 'sqft_range': (950, 2100)}

    },

    'Townhouse': {

        'Downtown': {'base_price': 700000, 'sqft_range': (1400, 2800)},

        'Suburban': {'base_price': 550000, 'sqft_range': (1600, 3200)},

        'Waterfront': {'base_price': 900000, 'sqft_range': (1500, 3000)},

        'Mountain View': {'base_price': 600000, 'sqft_range': (1450, 2900)},

        'Urban Core': {'base_price': 800000, 'sqft_range': (1300, 2600)},

        'Residential District': {'base_price': 580000, 'sqft_range': (1650, 3300)}

    },

    'Apartment': {

        'Downtown': {'base_price': 450000, 'sqft_range': (600, 1400)},

        'Suburban': {'base_price': 280000, 'sqft_range': (650, 1500)},

        'Waterfront': {'base_price': 600000, 'sqft_range': (700, 1600)},

        'Mountain View': {'base_price': 350000, 'sqft_range': (625, 1450)},

        'Urban Core': {'base_price': 520000, 'sqft_range': (550, 1300)},

        'Residential District': {'base_price': 320000, 'sqft_range': (675, 1550)}

    },

    'Commercial': {

        'Downtown': {'base_price': 2500000, 'sqft_range': (3000, 10000)},

        'Suburban': {'base_price': 1500000, 'sqft_range': (2500, 8000)},

        'Waterfront': {'base_price': 3500000, 'sqft_range': (4000, 12000)},

        'Mountain View': {'base_price': 1800000, 'sqft_range': (2800, 9000)},

        'Urban Core': {'base_price': 3000000, 'sqft_range': (3500, 11000)},

        'Residential District': {'base_price': 1600000, 'sqft_range': (2600, 8500)}

    }

}


# Generate property transaction records

transaction_data = []

base_date = datetime(2024, 1, 1)


# Create 8,000 properties with 1-4 transactions each

for property_num in range(1, 8001):

    property_id = f"PROP{property_num:06d}"
    
    # Each property gets 1-4 transactions over 12 months (most have 1, some flip/resale)

    num_transactions = random.choices([1, 2, 3, 4], weights=[0.7, 0.2, 0.08, 0.02])[0]
    
    # Select property type and location (consistent for the same property)

    property_type = random.choice(PROPERTY_TYPES)

    location = random.choice(LOCATIONS)
    
    params = PRICE_PARAMS[property_type][location]
    
    # Base square footage for this property

    sqft = random.randint(params['sqft_range'][0], params['sqft_range'][1])
    
    for i in range(num_transactions):

        # Spread transactions over 12 months

        days_offset = random.randint(0, 365)

        transaction_date = base_date + timedelta(days=days_offset)
        
        # Calculate sale price with market variations

        # Seasonal pricing (higher in spring/summer)

        month = transaction_date.month

        if month in [3, 4, 5, 6]:  # Spring/Summer peak

            seasonal_factor = 1.15

        elif month in [11, 12, 1, 2]:  # Winter off-season

            seasonal_factor = 0.9

        else:

            seasonal_factor = 1.0
        
        # Market appreciation over time (slight increase)

        months_elapsed = (transaction_date.year - base_date.year) * 12 + (transaction_date.month - base_date.month)

        appreciation_factor = 1.0 + (months_elapsed * 0.002)  # 0.2% monthly appreciation

        # Calculate price per square foot

        base_price_per_sqft = params['base_price'] / ((params['sqft_range'][0] + params['sqft_range'][1]) / 2)

        price_per_sqft = round(base_price_per_sqft * seasonal_factor * appreciation_factor * random.uniform(0.9, 1.1), 2)
        
        # Calculate total sale price

        sale_price = round(price_per_sqft * sqft, 2)
        
        # Days on market (commercial properties take longer to sell)

        if property_type == 'Commercial':

            days_on_market = random.randint(30, 180)

        else:

            days_on_market = random.randint(7, 90)
        
        transaction_data.append({

            "property_id": property_id,

            "transaction_date": transaction_date.date(),

            "property_type": property_type,

            "sale_price": sale_price,

            "location": location,

            "days_on_market": days_on_market,

            "price_per_sqft": price_per_sqft

        })



print(f"Generated {len(transaction_data)} property transaction records")

print("Sample record:", transaction_data[0])

Generated 11372 property transaction records
Sample record: {'property_id': 'PROP000001', 'transaction_date': datetime.date(2024, 3, 26), 'property_type': 'Single Family', 'sale_price': 1071982.06, 'location': 'Downtown', 'days_on_market': 48, 'price_per_sqft': 404.98}


## Step 4: Insert Data Using PySpark

### Data Insertion Strategy

We'll use PySpark to:

1. **Create DataFrame** from our generated data
2. **Insert into Delta table** with liquid clustering
3. **Verify the insertion** with a sample query

### Why PySpark for Insertion?

- **Distributed processing**: Handles large datasets efficiently
- **Explicit schema**: Every DataFrame carries a schema that can be inspected before writing
- **Optimization**: Leverages Spark's query optimization
- **Liquid clustering**: Writes to a clustered table can be clustered on write where the runtime supports it
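
One thing to watch when inserting: Spark infers `long` and `double` from Python `int` and `float` values, while the table in Step 2 declares `INT` and `DECIMAL` columns. A minimal sketch of one way to align the two is to generate explicit `CAST` expressions from the declared types. The `TARGET_SCHEMA` dict and the `cast_expressions` helper are illustrative names, and the `selectExpr` usage at the end is shown as a comment because it needs the notebook's Spark session.

```python
# Target column types, copied from the CREATE TABLE statement in Step 2
TARGET_SCHEMA = {
    "property_id": "STRING",
    "transaction_date": "DATE",
    "property_type": "STRING",
    "sale_price": "DECIMAL(12,2)",
    "location": "STRING",
    "days_on_market": "INT",
    "price_per_sqft": "DECIMAL(8,2)",
}

def cast_expressions(schema):
    """Build one 'CAST(col AS type) AS col' expression per column, in table order."""
    return [f"CAST({col} AS {dtype}) AS {col}" for col, dtype in schema.items()]

exprs = cast_expressions(TARGET_SCHEMA)
print(exprs[3])  # CAST(sale_price AS DECIMAL(12,2)) AS sale_price

# Hypothetical usage with the DataFrame created in the insertion cell:
# df_aligned = df_transactions.selectExpr(*exprs)
```

Selecting the columns in table order also avoids relying on the alphabetical column order that results from building the DataFrame from dicts.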

In [None]:
# Insert data using PySpark DataFrame operations

# Note: the DataFrame schema is inferred from the Python dicts (long/double), not from the table's declared types


# Create DataFrame from generated data

df_transactions = spark.createDataFrame(transaction_data)


# Display schema and sample data

print("DataFrame Schema:")

df_transactions.printSchema()



print("\nSample Data:")

df_transactions.show(5)


# Insert data into Delta table with liquid clustering

# mode("overwrite") replaces the existing rows; afterwards, a check such as DESCRIBE DETAIL may be used to confirm that the clustering columns (property_id, transaction_date) are still set

df_transactions.write.mode("overwrite").saveAsTable("real_estate.analytics.property_transactions")


print(f"\nSuccessfully inserted {df_transactions.count()} records into real_estate.analytics.property_transactions")

print("Liquid clustering automatically optimized the data layout during insertion!")

DataFrame Schema:
root
 |-- days_on_market: long (nullable = true)
 |-- location: string (nullable = true)
 |-- price_per_sqft: double (nullable = true)
 |-- property_id: string (nullable = true)
 |-- property_type: string (nullable = true)
 |-- sale_price: double (nullable = true)
 |-- transaction_date: date (nullable = true)


Sample Data:


+--------------+--------------------+--------------+-----------+-------------+----------+----------------+
|days_on_market|            location|price_per_sqft|property_id|property_type|sale_price|transaction_date|
+--------------+--------------------+--------------+-----------+-------------+----------+----------------+
|            48|            Downtown|        404.98| PROP000001|Single Family|1071982.06|      2024-03-26|
|            22|Residential District|        254.48| PROP000002|    Townhouse| 621440.16|      2024-05-31|
|            62|          Urban Core|        370.97| PROP000003|    Townhouse| 595406.85|      2024-11-14|
|           148|Residential District|        274.64| PROP000004|   Commercial|1020562.24|      2024-10-31|
|            56|            Downtown|        415.72| PROP000005|        Condo| 362092.12|      2024-01-17|
+--------------+--------------------+--------------+-----------+-------------+----------+----------------+
only showing top 5 rows




Successfully inserted 11372 records into real_estate.analytics.property_transactions
Liquid clustering automatically optimized the data layout during insertion!


## Step 5: Demonstrate Liquid Clustering Benefits

### Query Performance Analysis

Now let's see how liquid clustering improves query performance. We'll run queries that benefit from our clustering strategy:

1. **Property transaction history** (clustered by property_id)
2. **Time-based market analysis** (clustered by transaction_date)
3. **Combined property + time queries** (optimal for our clustering)

### Expected Performance Benefits

At this demo's size (about 11,000 rows) the difference is small; on large tables, liquid clustering lets these queries read far less data because:

- **Data locality**: Related records are physically grouped together
- **Reduced I/O**: Less data needs to be read from disk
- **Automatic optimization**: No manual tuning required
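
The notebook does not time its queries, so any speed-up has to be measured separately. A minimal way to do that is a small wall-clock helper like the one below. The `timed` function is a name introduced for this sketch, the workload is a stand-in so the block runs without Spark, and the commented usage assumes the table created earlier in this notebook.

```python
import time

def timed(label, action):
    """Run a zero-argument callable and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = action()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s")
    return result, elapsed

# Stand-in workload so the sketch runs anywhere
total, seconds = timed("sum of squares", lambda: sum(i * i for i in range(100_000)))

# Hypothetical usage against the table in this notebook:
# rows, seconds = timed("property lookup", lambda: spark.sql(
#     "SELECT * FROM real_estate.analytics.property_transactions "
#     "WHERE property_id = 'PROP000001'").collect())
```

Calling `.collect()` (or `.count()`) inside the callable matters, because Spark evaluates lazily and `spark.sql(...)` alone does not run the query.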

In [None]:
# Demonstrate liquid clustering benefits with optimized queries


# Query 1: Property transaction history - benefits from property_id clustering

print("=== Query 1: Property Transaction History ===")

property_history = spark.sql("""

SELECT property_id, transaction_date, property_type, sale_price, location

FROM real_estate.analytics.property_transactions

WHERE property_id = 'PROP000001'

ORDER BY transaction_date DESC

""")



property_history.show()

print(f"Records found: {property_history.count()}")



# Query 2: Time-based high-value transaction analysis - benefits from transaction_date clustering

print("\n=== Query 2: Recent High-Value Transactions ===")

high_value = spark.sql("""

SELECT transaction_date, property_id, property_type, sale_price, location

FROM real_estate.analytics.property_transactions

WHERE transaction_date >= '2024-06-01' AND sale_price > 1000000

ORDER BY sale_price DESC, transaction_date DESC

""")



high_value.show()

print(f"High-value transactions found: {high_value.count()}")



# Query 3: Combined property + time query - optimal for our clustering strategy

print("\n=== Query 3: Property Value Trends ===")

value_trends = spark.sql("""

SELECT property_id, transaction_date, property_type, sale_price, price_per_sqft

FROM real_estate.analytics.property_transactions

WHERE property_id LIKE 'PROP000%' AND transaction_date >= '2024-04-01'

ORDER BY property_id, transaction_date

""")



value_trends.show()

print(f"Value trend records found: {value_trends.count()}")

=== Query 1: Property Transaction History ===


+-----------+----------------+-------------+----------+--------+
|property_id|transaction_date|property_type|sale_price|location|
+-----------+----------------+-------------+----------+--------+
| PROP000001|      2024-03-26|Single Family|1071982.06|Downtown|
+-----------+----------------+-------------+----------+--------+



Records found: 1

=== Query 2: Recent High-Value Transactions ===


+----------------+-----------+-------------+----------+----------+
|transaction_date|property_id|property_type|sale_price|  location|
+----------------+-----------+-------------+----------+----------+
|      2024-06-10| PROP006087|   Commercial| 6386463.6|Waterfront|
|      2024-06-23| PROP006087|   Commercial|6074959.55|Waterfront|
|      2024-06-06| PROP003416|   Commercial|5999320.32|Waterfront|
|      2024-06-30| PROP003052|   Commercial| 5659487.1|Waterfront|
|      2024-06-28| PROP004596|   Commercial|5609426.68|Waterfront|
|      2024-06-11| PROP007661|   Commercial|5575704.44|Waterfront|
|      2024-07-24| PROP003416|   Commercial|5540088.96|Waterfront|
|      2024-06-20| PROP002013|   Commercial|5535950.94|Waterfront|
|      2024-07-05| PROP002013|   Commercial|5288486.98|Waterfront|
|      2024-10-05| PROP004988|   Commercial|5258298.18|Waterfront|
|      2024-06-06| PROP005373|   Commercial|5242295.52|Waterfront|
|      2024-10-02| PROP000600|   Commercial|5229563.04|Waterfr

High-value transactions found: 1684

=== Query 3: Property Value Trends ===


+-----------+----------------+-------------+----------+--------------+
|property_id|transaction_date|property_type|sale_price|price_per_sqft|
+-----------+----------------+-------------+----------+--------------+
| PROP000002|      2024-05-31|    Townhouse| 621440.16|        254.48|
| PROP000003|      2024-11-14|    Townhouse| 595406.85|        370.97|
| PROP000004|      2024-10-31|   Commercial|1020562.24|        274.64|
| PROP000007|      2024-12-18|    Apartment|  465049.2|         323.4|
| PROP000008|      2024-09-28|    Townhouse| 441549.36|        211.47|
| PROP000009|      2024-04-09|   Commercial| 4561434.5|         454.1|
| PROP000009|      2024-10-03|   Commercial|4357219.65|        433.77|
| PROP000009|      2024-10-09|   Commercial|4204535.65|        418.57|
| PROP000010|      2024-05-01|Single Family|1248957.92|        441.64|
| PROP000010|      2024-05-30|Single Family|1194999.68|        422.56|
| PROP000010|      2024-08-09|Single Family|1219122.52|        431.09|
| PROP

Value trend records found: 1046


## Step 6: Analyze Clustering Effectiveness

### Understanding the Impact

Let's examine how liquid clustering has organized our data and analyze some aggregate statistics to demonstrate the real estate insights possible with this optimized structure.

### Key Analytics

- **Property value appreciation** and market performance
- **Location-based pricing** and neighborhood analysis
- **Property type trends** and market segmentation
- **Market timing** and seasonal patterns

In [None]:
# Analyze clustering effectiveness and real estate insights


# Property value analysis

print("=== Property Value Analysis ===")

property_values = spark.sql("""

SELECT property_id, COUNT(*) as total_transactions,

       ROUND(MIN(sale_price), 2) as min_sale_price,

       ROUND(MAX(sale_price), 2) as max_sale_price,

       ROUND(AVG(sale_price), 2) as avg_sale_price,

       ROUND(AVG(price_per_sqft), 2) as avg_price_per_sqft,

       property_type, location

FROM real_estate.analytics.property_transactions

GROUP BY property_id, property_type, location

ORDER BY avg_sale_price DESC

LIMIT 10

""")



property_values.show()


# Location market analysis

print("\n=== Location Market Analysis ===")

location_analysis = spark.sql("""

SELECT location, COUNT(*) as total_transactions,

       ROUND(AVG(sale_price), 2) as avg_sale_price,

       ROUND(AVG(price_per_sqft), 2) as avg_price_per_sqft,

       ROUND(AVG(days_on_market), 2) as avg_days_on_market,

       COUNT(DISTINCT property_id) as unique_properties

FROM real_estate.analytics.property_transactions

GROUP BY location

ORDER BY avg_sale_price DESC

""")



location_analysis.show()


# Property type market trends

print("\n=== Property Type Market Trends ===")

property_trends = spark.sql("""

SELECT property_type, COUNT(*) as total_sales,

       ROUND(AVG(sale_price), 2) as avg_sale_price,

       ROUND(AVG(price_per_sqft), 2) as avg_price_per_sqft,

       ROUND(AVG(days_on_market), 2) as avg_days_on_market,

       COUNT(DISTINCT property_id) as unique_properties

FROM real_estate.analytics.property_transactions

GROUP BY property_type

ORDER BY avg_sale_price DESC

""")



property_trends.show()


# Market timing analysis

print("\n=== Market Timing Analysis ===")

market_timing = spark.sql("""

SELECT 

    CASE 

        WHEN days_on_market <= 30 THEN 'Fast Sale (1-30 days)'

        WHEN days_on_market <= 60 THEN 'Normal Sale (31-60 days)'

        WHEN days_on_market <= 90 THEN 'Slow Sale (61-90 days)'

        ELSE 'Very Slow Sale (90+ days)'

    END as sale_speed,

    COUNT(*) as transaction_count,

    ROUND(AVG(sale_price), 2) as avg_sale_price,

    ROUND(AVG(days_on_market), 2) as avg_days,

    ROUND(SUM(sale_price), 2) as total_volume

FROM real_estate.analytics.property_transactions

GROUP BY 

    CASE 

        WHEN days_on_market <= 30 THEN 'Fast Sale (1-30 days)'

        WHEN days_on_market <= 60 THEN 'Normal Sale (31-60 days)'

        WHEN days_on_market <= 90 THEN 'Slow Sale (61-90 days)'

        ELSE 'Very Slow Sale (90+ days)'

    END

ORDER BY avg_days

""")



market_timing.show()


# Monthly market trends

print("\n=== Monthly Market Trends ===")

monthly_trends = spark.sql("""

SELECT DATE_FORMAT(transaction_date, 'yyyy-MM') as month,

       COUNT(*) as total_transactions,

       ROUND(SUM(sale_price), 2) as monthly_volume,

       ROUND(AVG(sale_price), 2) as avg_sale_price,

       ROUND(AVG(price_per_sqft), 2) as avg_price_per_sqft,

       COUNT(DISTINCT property_id) as unique_properties

FROM real_estate.analytics.property_transactions

GROUP BY DATE_FORMAT(transaction_date, 'yyyy-MM')

ORDER BY month

""")



monthly_trends.show()

=== Property Value Analysis ===


+-----------+------------------+--------------+--------------+--------------+------------------+-------------+----------+
|property_id|total_transactions|min_sale_price|max_sale_price|avg_sale_price|avg_price_per_sqft|property_type|  location|
+-----------+------------------+--------------+--------------+--------------+------------------+-------------+----------+
| PROP001960|                 1|    6349995.27|    6349995.27|    6349995.27|            543.99|   Commercial|Waterfront|
| PROP006087|                 2|    6074959.55|     6386463.6|    6230711.57|            543.45|   Commercial|Waterfront|
| PROP001727|                 1|     5789451.2|     5789451.2|     5789451.2|            517.84|   Commercial|Waterfront|
| PROP007555|                 1|    5784090.12|    5784090.12|    5784090.12|            482.49|   Commercial|Waterfront|
| PROP004332|                 1|    5750284.36|    5750284.36|    5750284.36|            526.39|   Commercial|Urban Core|
| PROP006731|           

=== Location Market Analysis ===


+--------------------+------------------+--------------+------------------+------------------+-----------------+
|            location|total_transactions|avg_sale_price|avg_price_per_sqft|avg_days_on_market|unique_properties|
+--------------------+------------------+--------------+------------------+------------------+-----------------+
|          Waterfront|              1881|    1409207.07|            443.56|             59.16|             1322|
|          Urban Core|              1866|    1255564.55|            473.94|              59.7|             1310|
|            Downtown|              1877|    1021967.95|            393.32|              60.1|             1337|
|       Mountain View|              1890|     804231.04|            312.89|             60.04|             1322|
|Residential District|              1907|     723700.39|            267.59|             59.58|             1366|
|            Suburban|              1951|     675060.74|            252.12|             59.67|  

=== Property Type Market Trends ===


+-------------+-----------+--------------+------------------+------------------+-----------------+
|property_type|total_sales|avg_sale_price|avg_price_per_sqft|avg_days_on_market|unique_properties|
+-------------+-----------+--------------+------------------+------------------+-----------------+
|   Commercial|       2246|    2364622.79|            361.68|             104.5|             1571|
|Single Family|       2285|     882472.63|            306.23|             49.79|             1612|
|    Townhouse|       2294|     712690.76|            323.59|             48.78|             1599|
|        Condo|       2261|     529188.26|            379.15|              47.3|             1579|
|    Apartment|       2286|     424397.47|            410.72|             48.85|             1639|
+-------------+-----------+--------------+------------------+------------------+-----------------+


=== Market Timing Analysis ===


+--------------------+-----------------+--------------+--------+---------------+
|          sale_speed|transaction_count|avg_sale_price|avg_days|   total_volume|
+--------------------+-----------------+--------------+--------+---------------+
|Fast Sale (1-30 d...|             2603|     648593.87|   18.48| 1.6882898554E9|
|Normal Sale (31-6...|             3715|     849302.21|   45.58| 3.1551577265E9|
|Slow Sale (61-90 ...|             3718|     841586.19|   75.59|3.12901747284E9|
|Very Slow Sale (9...|             1336|    2362655.36|  135.14|3.15650755849E9|
+--------------------+-----------------+--------------+--------+---------------+


=== Monthly Market Trends ===


+-------+------------------+---------------+--------------+------------------+-----------------+
|  month|total_transactions| monthly_volume|avg_sale_price|avg_price_per_sqft|unique_properties|
+-------+------------------+---------------+--------------+------------------+-----------------+
|2024-01|               951| 8.0156294112E8|     842863.24|            314.57|              919|
|2024-02|               920| 7.5918726717E8|     825203.55|            314.39|              902|
|2024-03|               938|1.01247160402E9|    1079394.03|            401.65|              900|
|2024-04|               988|1.14251391792E9|    1156390.61|            396.33|              954|
|2024-05|              1002|1.07273555366E9|    1070594.36|            400.43|              979|
|2024-06|               909|1.01445630032E9|    1116013.53|            403.76|              876|
|2024-07|              1006| 9.4594880925E8|     940306.97|            352.47|              976|
|2024-08|               892| 8

## Key Takeaways: Delta Liquid Clustering in AIDP

### What We Demonstrated

1. **Automatic Optimization**: Created a table with `CLUSTER BY (property_id, transaction_date)` and let Delta automatically optimize data layout

2. **Performance Benefits**: Queries that filter on the clustering columns (property_id, transaction_date) benefit from data locality and file skipping, and the effect grows with table size

3. **Low Maintenance**: No manual partitioning, bucketing, or Z-Ordering required - Delta manages the data layout for the clustering columns

4. **Real-World Use Case**: Real estate analytics where property tracking and market analysis are critical

### AIDP Advantages

- **Unified Analytics**: Seamlessly integrates with other AIDP services
- **Governance**: Catalog and schema isolation for real estate data
- **Performance**: Optimized for both OLAP and OLTP workloads
- **Scalability**: Handles real estate-scale data volumes effortlessly

### Best Practices for Liquid Clustering

1. **Choose clustering columns** based on your most common query patterns
2. **Start with 1-4 columns** - too many can reduce effectiveness
3. **Consider cardinality** - high-cardinality columns work best
4. **Monitor and adjust** as query patterns evolve
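
As a quick aid for the cardinality point above, the sketch below counts distinct values per candidate column over a list of dict records shaped like the ones generated in Step 3. The `distinct_counts` helper and the four-row sample are hand-written for illustration and are not taken from the notebook's data.

```python
def distinct_counts(records, columns):
    """Return {column: number of distinct values} for a list of dict records."""
    return {col: len({rec[col] for rec in records}) for col in columns}

# Tiny hand-written sample in the shape of the notebook's transaction dicts
sample = [
    {"property_id": "PROP000001", "location": "Downtown", "property_type": "Condo"},
    {"property_id": "PROP000002", "location": "Downtown", "property_type": "Condo"},
    {"property_id": "PROP000003", "location": "Suburban", "property_type": "Condo"},
    {"property_id": "PROP000003", "location": "Suburban", "property_type": "Condo"},
]

counts = distinct_counts(sample, ["property_id", "location", "property_type"])
print(counts)  # {'property_id': 3, 'location': 2, 'property_type': 1}
```

Run against the full `transaction_data` list, the same function would show `property_id` with thousands of distinct values and `location` with six, which is one reason the notebook clusters on `property_id` rather than `location`.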

### Next Steps

- Explore other AIDP features like AI/ML integration
- Try liquid clustering with different column combinations
- Scale up to larger real estate datasets
- Integrate with real MLS and property management systems

This notebook demonstrates how Oracle AI Data Platform makes advanced real estate analytics accessible while maintaining enterprise-grade performance and governance.