# Energy: Streaming Delta Liquid Clustering Demo



## Overview



This notebook demonstrates **Streaming Delta Liquid Clustering** in Oracle AI Data Platform (AIDP) Workbench using an energy and utilities analytics use case. We use PySpark's built-in rate source to generate continuous streaming data and showcase real-time analytics with Delta Liquid Clustering.



### What is Streaming with Liquid Clustering?



Combining Structured Streaming with Delta Liquid Clustering provides:



- **Continuous data ingestion**: Real-time data processing with automatic clustering optimization

- **Optimized streaming queries**: Liquid clustering improves performance of streaming aggregations

- **Real-time insights**: Windowed operations for live analytics and monitoring

- **Automatic maintenance**: Delta handles optimization during streaming writes



### Use Case: Real-time Smart Grid Monitoring


We'll process streaming energy consumption data for:


- **Real-time meter monitoring**: Continuous tracking of consumption patterns

- **Live peak demand detection**: Streaming aggregations for demand management

- **Anomaly detection**: Real-time identification of unusual consumption patterns

- **Grid optimization**: Continuous data for operational decision-making


### AIDP Environment Setup


This notebook uses the existing Spark session in your AIDP environment.


In [None]:
# Create energy catalog and analytics schema
# In AIDP, catalogs provide data isolation and governance
spark.sql("CREATE CATALOG IF NOT EXISTS energy")
spark.sql("CREATE SCHEMA IF NOT EXISTS energy.analytics")

print("Energy catalog and analytics schema created successfully!")


In [1]:
# Create a volume to hold the streaming checkpoint files used in Step 4
spark.sql("CREATE VOLUME IF NOT EXISTS default.default.testdata")


DataFrame[status: string]

## Step 2: Create Delta Table with Liquid Clustering

### Table Design

Our `energy_readings_stream` table will store streaming energy consumption data with the same schema as the original demo:

- **meter_id**: Unique smart meter identifier
- **reading_date**: Timestamp of meter reading
- **energy_type**: Type (Electricity, Natural Gas, Water, Solar)
- **consumption**: Energy consumed (kWh, therms, gallons)
- **location**: Geographic location/region
- **peak_demand**: Peak usage during interval
- **efficiency_rating**: System efficiency (0-100)

### Clustering Strategy

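We'll cluster by `meter_id` and `reading_date`, the columns that per-meter and time-range lookups filter on, so that related rows are stored together and queries on those columns can skip unrelated files.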
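
To see why clustering on a key helps, consider a deliberately simplified, one-dimensional model of file pruning: each data file carries min/max statistics for the meter number, and a query reads only the files whose range could contain the requested meter. The file ranges and the `files_to_scan` helper below are hypothetical and chosen purely for illustration; the layout Delta actually produces is more involved.

```python
def files_to_scan(file_ranges, meter_num):
    """Return indices of files whose [min, max] meter range may contain meter_num."""
    return [i for i, (lo, hi) in enumerate(file_ranges) if lo <= meter_num <= hi]

# Hypothetical layouts for 2000 meters spread over four files.
clustered = [(1, 500), (501, 1000), (1001, 1500), (1501, 2000)]
unclustered = [(1, 1998), (2, 2000), (1, 1999), (3, 2000)]

print(len(files_to_scan(clustered, 123)))    # files read when data is clustered
print(len(files_to_scan(unclustered, 123)))  # files read when every file spans all meters
```

In this toy model the clustered layout touches a single file for meter 123, while the unclustered layout has to read all four.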


In [1]:
# Create Delta table with liquid clustering for streaming
# CLUSTER BY defines the columns for automatic optimization
spark.sql("""
CREATE TABLE IF NOT EXISTS energy.analytics.energy_readings_stream (
    meter_id STRING,
    reading_date TIMESTAMP,
    energy_type STRING,
    consumption DECIMAL(10,3),
    location STRING,
    peak_demand DECIMAL(8,2),
    efficiency_rating INT
)
USING DELTA
CLUSTER BY (meter_id, reading_date)
""")

print("Streaming Delta table with liquid clustering created successfully!")
print("Clustering will automatically optimize data layout during streaming writes.")


Streaming Delta table with liquid clustering created successfully!
Clustering will automatically optimize data layout during streaming writes.


## Step 3: Streaming Data Producer with PySpark Rate Source

### Streaming Data Generation Strategy

We'll use PySpark's built-in **rate source** to generate continuous streaming data:

- **Rate Source**: Generates rows at a specified rate with `timestamp` and `value` columns
- **Data Transformation**: Convert rate data into realistic energy meter readings
- **Continuous Processing**: Simulate real-time meter data ingestion

### Data Transformation Logic

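- **meter_id**: Derived from `(value % 2000) + 1` and zero-padded, giving 2,000 unique meters (`MTR000001` to `MTR002000`)
- **reading_date**: Uses the `timestamp` column from the rate source
- **energy_type/location**: Assigned deterministically from `value % 4` and `value % 5`, so each meter keeps a fixed type and location
- **consumption**: A base value per energy type and location, scaled by time-of-day and seasonal factors and a random multiplier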
- **Real-time Simulation**: Data flows continuously for streaming analytics
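
The mapping from the rate source's `value` counter to meter attributes can be checked without Spark. The sketch below is a plain-Python mirror of the modulo logic used in the transformation cell; the helper name `derive_meter_fields` is hypothetical and exists only for this illustration.

```python
ENERGY_TYPES = ["Electricity", "Natural Gas", "Water", "Solar"]
LOCATIONS = ["Residential_NYC", "Commercial_CHI", "Industrial_HOU",
             "Residential_LAX", "Commercial_SFO"]

def derive_meter_fields(value):
    """Mirror the notebook's mapping from the rate source's `value` counter."""
    meter_num = (value % 2000) + 1
    meter_id = "MTR" + str(meter_num).zfill(6)  # same effect as lpad(..., 6, '0')
    return meter_id, ENERGY_TYPES[value % 4], LOCATIONS[value % 5]

print(derive_meter_fields(0))
print(derive_meter_fields(122))
# The same meter reappears 2000 rows later with identical attributes.
print(derive_meter_fields(122) == derive_meter_fields(2122))
```

Because 2000 is divisible by both 4 and 5, a given meter always receives the same energy type and location. This is consistent with the later query output, which shows 500 meters per energy type and 400 meters per location.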


In [1]:
# Import the functions used by the streaming transformation
# (abs is not imported: it would shadow Python's built-in, and SQL abs() inside expr() needs no import)
from pyspark.sql.functions import col, expr, rand, when, hour, month

# Reference lists of the generated values
# (the transformation below hard-codes the same names in its when() chains)
ENERGY_TYPES = ['Electricity', 'Natural Gas', 'Water', 'Solar']
LOCATIONS = ['Residential_NYC', 'Commercial_CHI', 'Industrial_HOU', 'Residential_LAX', 'Commercial_SFO']

# Create streaming DataFrame using rate source
# This generates rows at 10 rows per second
streaming_rate = spark.readStream \
    .format("rate") \
    .option("rowsPerSecond", 10) \
    .load()

print("Rate streaming source created")
print("Schema:")
streaming_rate.printSchema()


Rate streaming source created
Schema:
root
 |-- timestamp: timestamp (nullable = true)
 |-- value: long (nullable = true)



In [1]:
# Transform rate data into energy readings schema
energy_stream = streaming_rate \
    .withColumn("meter_num", (col("value") % 2000) + 1) \
    .withColumn("meter_id", expr("concat('MTR', lpad(cast(meter_num as string), 6, '0'))")) \
    .withColumn("reading_date", col("timestamp")) \
    .withColumn("is_anomalous_meter", when(col("meter_num").isin([42, 123, 456, 789, 999, 1500, 1750]), True).otherwise(False)) \
    .withColumn("energy_type", 
                when((col("value") % 4) == 0, "Electricity")
                .when((col("value") % 4) == 1, "Natural Gas")
                .when((col("value") % 4) == 2, "Water")
                .otherwise("Solar")) \
    .withColumn("location",
                when((col("value") % 5) == 0, "Residential_NYC")
                .when((col("value") % 5) == 1, "Commercial_CHI")
                .when((col("value") % 5) == 2, "Industrial_HOU")
                .when((col("value") % 5) == 3, "Residential_LAX")
                .otherwise("Commercial_SFO")) \
    .withColumn("base_consumption",
                when(col("energy_type") == "Electricity", 
                     when(col("location") == "Residential_NYC", 15.0)
                     .when(col("location") == "Commercial_CHI", 150.0)
                     .when(col("location") == "Industrial_HOU", 500.0)
                     .when(col("location") == "Residential_LAX", 12.0)
                     .otherwise(180.0))
                .when(col("energy_type") == "Natural Gas",
                     when(col("location") == "Residential_NYC", 25.0)
                     .when(col("location") == "Commercial_CHI", 80.0)
                     .when(col("location") == "Industrial_HOU", 200.0)
                     .when(col("location") == "Residential_LAX", 20.0)
                     .otherwise(95.0))
                .when(col("energy_type") == "Water",
                     when(col("location") == "Residential_NYC", 180.0)
                     .when(col("location") == "Commercial_CHI", 450.0)
                     .when(col("location") == "Industrial_HOU", 1200.0)
                     .when(col("location") == "Residential_LAX", 160.0)
                     .otherwise(380.0))
                .otherwise(  # Solar
                     when(col("location") == "Residential_NYC", -8.0)
                     .when(col("location") == "Commercial_CHI", -75.0)
                     .when(col("location") == "Industrial_HOU", -250.0)
                     .when(col("location") == "Residential_LAX", -12.0)
                     .otherwise(-95.0))) \
    .withColumn("hour_factor",
                when(hour(col("reading_date")).isin([6,7,8,17,18,19]), 2.5)  # Peak hours
                .when(hour(col("reading_date")).isin([2,3,4,5]), 0.4)  # Off-peak
                .otherwise(1.0)) \
    .withColumn("seasonal_factor",
                when(month(col("reading_date")).isin([12,1,2]), 1.4)  # Winter
                .when(month(col("reading_date")).isin([6,7,8]), 1.3)  # Summer
                .otherwise(1.0)) \
    .withColumn("consumption_multiplier", 
                when(col("is_anomalous_meter"), rand() * 3.0)  # Anomalous meters: 0x to 3x the scaled base value (extreme range)
                .otherwise(0.8 + rand() * 0.4)) \
    .withColumn("consumption", 
                expr("round(base_consumption * hour_factor * seasonal_factor * consumption_multiplier, 3)").cast("decimal(10, 3)")) \
    .withColumn("peak_demand_multiplier",
                when(col("is_anomalous_meter"), 1.2 + rand() * 1.0)  # Anomalous meters: 120% to 220% of consumption
                .otherwise(1.1 + rand() * 0.4)) \
    .withColumn("peak_demand", expr("round(abs(consumption) * peak_demand_multiplier, 2)").cast("decimal(8, 2)")) \
    .withColumn("efficiency_rating",
                when(col("is_anomalous_meter"),
                     when(col("energy_type") == "Electricity", 45 + expr("cast(rand() * 20 as int)"))  # 45-64 range (very low)
                     .when(col("energy_type") == "Natural Gas", 50 + expr("cast(rand() * 20 as int)"))  # 50-69 range (very low)
                     .when(col("energy_type") == "Water", 48 + expr("cast(rand() * 20 as int)"))  # 48-67 range (very low)
                     .otherwise(35 + expr("cast(rand() * 20 as int)")))  # 35-54 range for Solar (very low)
                .otherwise(when(col("energy_type") == "Electricity", 85)
                          .when(col("energy_type") == "Natural Gas", 90)
                          .when(col("energy_type") == "Water", 88)
                          .otherwise(78)
                          + expr("cast(rand() * 8 - 4 as int)"))) \
    .select("meter_id", "reading_date", "energy_type", "consumption", 
            "location", "peak_demand", "efficiency_rating")

print("Streaming energy data transformation defined")
print("Sample transformed schema:")
energy_stream.printSchema()


Streaming energy data transformation defined
Sample transformed schema:
root
 |-- meter_id: string (nullable = true)
 |-- reading_date: timestamp (nullable = true)
 |-- energy_type: string (nullable = false)
 |-- consumption: decimal(10,3) (nullable = true)
 |-- location: string (nullable = false)
 |-- peak_demand: decimal(8,2) (nullable = true)
 |-- efficiency_rating: integer (nullable = true)
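
The consumption formula in the transformation above multiplies a base value by an hour factor, a seasonal factor, and a random multiplier. The plain-Python sketch below reproduces that arithmetic to show the range a single reading can fall in. The function name `consumption_range` is hypothetical, and the factor values are copied from the cell above.

```python
def consumption_range(base, hour, month, anomalous=False):
    """Return (low, high) bounds of the simulated consumption for one reading."""
    if hour in (6, 7, 8, 17, 18, 19):
        hour_factor = 2.5      # peak hours
    elif hour in (2, 3, 4, 5):
        hour_factor = 0.4      # off-peak hours
    else:
        hour_factor = 1.0
    if month in (12, 1, 2):
        seasonal_factor = 1.4  # winter
    elif month in (6, 7, 8):
        seasonal_factor = 1.3  # summer
    else:
        seasonal_factor = 1.0
    # rand() is uniform on [0, 1): normal meters scale by 0.8-1.2, anomalous by 0-3.
    low_mult, high_mult = (0.0, 3.0) if anomalous else (0.8, 1.2)
    scaled = base * hour_factor * seasonal_factor
    return scaled * low_mult, scaled * high_mult

# Water meter at Industrial_HOU (base 1200.0), read at 03:00 in December.
low, high = consumption_range(1200.0, hour=3, month=12)
print(round(low, 1), round(high, 1))
```

Under these assumptions a normal meter of that kind stays roughly between 537.6 and 806.4, while an anomalous one can reach just under 2016, which is in line with the 1989.147 reading reported for `MTR000123` in Step 6.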



## Step 4: Streaming Write to Delta Table

### Streaming Ingestion Strategy

We'll write the transformed streaming data to the Delta table with liquid clustering:

- **Append Mode**: Continuously add new readings as they arrive
- **Checkpointing**: Enable fault tolerance and exactly-once processing
- **Liquid Clustering**: Automatic optimization during streaming writes
- **Trigger**: Process micro-batches every 10 seconds for demo purposes

### Why Streaming Writes?

- **Real-time Ingestion**: Data becomes available for querying immediately
- **Optimized Layout**: Liquid clustering maintains optimal data organization
- **Scalability**: Handles continuous high-volume data streams
- **Consistency**: ACID transactions ensure data integrity


In [1]:
# Start the streaming write to the Delta table
# Note: toTable() starts the query immediately; in a real scenario it would run continuously
# For demo purposes, we stop it after about 30 seconds
QUERY_NAME="energy_stream"
checkpointLocation = "/Volumes/default/default/testdata/kafkaStreamingCheckpoint"

query_handle = energy_stream.writeStream \
    .outputMode("append") \
    .trigger(processingTime='10 seconds') \
    .queryName(QUERY_NAME) \
    .option("checkpointLocation", checkpointLocation) \
    .toTable("energy.analytics.energy_readings_stream") 

print("Streaming write query configured")
print("This will continuously write energy readings to the Delta table with liquid clustering")

# The query is already running at this point (toTable() started it)
# In production, it would run indefinitely
print("Starting streaming query...")

# Let it run for about 30 seconds, printing the query status every 10 seconds
import time
for _ in range(3):
    time.sleep(10)
    print(query_handle.status)

# Stop the streaming query
query_handle.stop()
print("Streaming query stopped after 30 seconds")
print("Data has been written to the Delta table with liquid clustering optimization")


Streaming write query configured
This will continuously write energy readings to the Delta table with liquid clustering
Starting streaming query...


{'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


{'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


{'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
Streaming query stopped after 30 seconds
Data has been written to the Delta table with liquid clustering optimization


## Step 5: Real-time Streaming Analytics

### Streaming Analytics Strategy

With data continuously flowing into the Delta table, we can perform real-time analytics:

- **Live Dashboards**: Query current state of the energy grid
- **Windowed Aggregations**: Time-based rolling statistics
- **Anomaly Detection**: Identify unusual consumption patterns
- **Peak Demand Monitoring**: Real-time load balancing insights
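
The queries in this step aggregate over the whole table. A windowed aggregation instead groups readings into fixed time intervals. The following is a minimal plain-Python sketch of a tumbling (non-overlapping) window average, assuming naive timestamps and a 10-second window; the helper `tumbling_window_avg` and the sample readings are hypothetical. In Spark, the analogous grouping would typically use the `window` function from `pyspark.sql.functions`.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def tumbling_window_avg(readings, window_seconds=10):
    """Average consumption per fixed, non-overlapping window.

    readings: iterable of (naive datetime, consumption) pairs.
    Returns {window_start: average consumption}, ordered by window start.
    """
    totals = defaultdict(lambda: [0.0, 0])  # window_start -> [sum, count]
    for ts, consumption in readings:
        elapsed = int((ts - EPOCH).total_seconds())
        start = EPOCH + timedelta(seconds=elapsed - elapsed % window_seconds)
        totals[start][0] += consumption
        totals[start][1] += 1
    return {start: total / count for start, (total, count) in sorted(totals.items())}

# Hypothetical readings: two fall in the 03:12:00 window, one in the 03:12:10 window.
sample = [
    (datetime(2025, 12, 15, 3, 12, 1), 10.0),
    (datetime(2025, 12, 15, 3, 12, 9), 20.0),
    (datetime(2025, 12, 15, 3, 12, 10), 30.0),
]
for start, avg in tumbling_window_avg(sample).items():
    print(start, avg)
```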

### Benefits of Streaming Analytics

- **Immediate Insights**: No waiting for batch processing
- **Optimized Queries**: Liquid clustering accelerates analytical queries
- **Continuous Monitoring**: Always up-to-date grid intelligence
- **Operational Efficiency**: Enable real-time decision making


In [1]:
# Analyze the streaming data that was ingested

print("=== Current Energy Grid Status ===")

current_status = spark.sql("""
SELECT COUNT(*) as total_readings,
       COUNT(DISTINCT meter_id) as active_meters,
       ROUND(AVG(consumption), 3) as avg_consumption,
       ROUND(MAX(peak_demand), 2) as current_peak_demand,
       ROUND(AVG(efficiency_rating), 2) as avg_efficiency
FROM energy.analytics.energy_readings_stream
""")

current_status.show()

print("\n=== Real-time Peak Demand by Location ===")
peak_by_location = spark.sql("""
SELECT location, 
       COUNT(*) as reading_count,
       ROUND(MAX(peak_demand), 2) as max_peak_demand,
       ROUND(AVG(peak_demand), 2) as avg_peak_demand,
       COUNT(DISTINCT meter_id) as active_meters
FROM energy.analytics.energy_readings_stream
GROUP BY location
ORDER BY max_peak_demand DESC
""")

peak_by_location.show()


=== Current Energy Grid Status ===


+--------------+-------------+---------------+-------------------+--------------+
|total_readings|active_meters|avg_consumption|current_peak_demand|avg_efficiency|
+--------------+-------------+---------------+-------------------+--------------+
|          3200|         2000|         89.873|            3290.76|         85.13|
+--------------+-------------+---------------+-------------------+--------------+


=== Real-time Peak Demand by Location ===


+---------------+-------------+---------------+---------------+-------------+
|       location|reading_count|max_peak_demand|avg_peak_demand|active_meters|
+---------------+-------------+---------------+---------------+-------------+
| Industrial_HOU|          640|        3290.76|         392.90|          400|
| Commercial_CHI|          640|         431.66|         139.68|          400|
| Commercial_SFO|          640|         377.52|         135.67|          400|
|Residential_LAX|          640|         276.64|          37.29|          400|
|Residential_NYC|          640|         175.76|          41.17|          400|
+---------------+-------------+---------------+---------------+-------------+



In [1]:
print("=== Energy Type Performance Analysis ===")
energy_analysis = spark.sql("""
SELECT energy_type,
       COUNT(*) as total_readings,
       ROUND(SUM(ABS(consumption)), 3) as total_consumption,
       ROUND(AVG(ABS(consumption)), 3) as avg_consumption,
       ROUND(MAX(peak_demand), 2) as max_peak_demand,
       ROUND(AVG(efficiency_rating), 2) as avg_efficiency,
       COUNT(DISTINCT meter_id) as unique_meters
FROM energy.analytics.energy_readings_stream
GROUP BY energy_type
ORDER BY total_consumption DESC
""")

energy_analysis.show()

print("\n=== Hourly Consumption Patterns ===")
hourly_patterns = spark.sql("""
SELECT HOUR(reading_date) as hour,
       COUNT(*) as readings_in_hour,
       ROUND(AVG(ABS(consumption)), 3) as avg_hourly_consumption,
       ROUND(MAX(peak_demand), 2) as max_hourly_peak
FROM energy.analytics.energy_readings_stream
GROUP BY HOUR(reading_date)
ORDER BY hour
""")

hourly_patterns.show()


=== Energy Type Performance Analysis ===


+-----------+--------------+-----------------+---------------+---------------+--------------+-------------+
|energy_type|total_readings|total_consumption|avg_consumption|max_peak_demand|avg_efficiency|unique_meters|
+-----------+--------------+-----------------+---------------+---------------+--------------+-------------+
|      Water|           800|       212207.684|        265.260|        3290.76|         87.95|          500|
|Electricity|           800|        76775.641|         95.970|         495.02|         84.92|          500|
|      Solar|           800|        39141.104|         48.926|         253.07|         77.76|          500|
|Natural Gas|           800|        37752.757|         47.191|         275.44|         89.88|          500|
+-----------+--------------+-----------------+---------------+---------------+--------------+-------------+


=== Hourly Consumption Patterns ===


+----+----------------+----------------------+---------------+
|hour|readings_in_hour|avg_hourly_consumption|max_hourly_peak|
+----+----------------+----------------------+---------------+
|   3|            3200|               114.337|        3290.76|
+----+----------------+----------------------+---------------+



## Step 6: Real-time Anomaly Detection

### Streaming Anomaly Detection Strategy

We'll implement simple statistical anomaly detection for real-time monitoring:

- **Statistical Thresholds**: Flag readings outside normal ranges
- **Efficiency Anomalies**: Identify meters with unusually low efficiency
- **Consumption Spikes**: Detect sudden increases in usage
- **Peak Demand Alerts**: Monitor for grid stability issues

### Real-time Monitoring Benefits

- **Proactive Maintenance**: Identify issues before they cause outages
- **Grid Stability**: Monitor for demand spikes that could cause blackouts
- **Efficiency Optimization**: Find meters needing maintenance or upgrades
- **Fraud Detection**: Identify unusual consumption patterns


In [1]:
from decimal import Decimal
# Calculate statistical baselines for anomaly detection
print("=== Statistical Baselines for Anomaly Detection ===")

baselines = spark.sql("""
SELECT energy_type,
       ROUND(AVG(consumption), 3) as mean_consumption,
       ROUND(STDDEV(consumption), 3) as stddev_consumption,
       ROUND(AVG(peak_demand), 2) as mean_peak_demand,
       ROUND(STDDEV(peak_demand), 2) as stddev_peak_demand,
       ROUND(AVG(efficiency_rating), 2) as mean_efficiency,
       ROUND(STDDEV(efficiency_rating), 2) as stddev_efficiency
FROM energy.analytics.energy_readings_stream
GROUP BY energy_type
""")

baselines.show()

# Convert to pandas for threshold calculations
baselines_pd = baselines.toPandas()

# Define anomaly thresholds (3 standard deviations)
anomaly_thresholds = {}
for _, row in baselines_pd.iterrows():
    energy_type = row['energy_type']
    anomaly_thresholds[energy_type] = {
        'consumption_high': row['mean_consumption'] + 3 * Decimal.from_float(row['stddev_consumption']),
        'consumption_low': row['mean_consumption'] - 3 * Decimal.from_float(row['stddev_consumption']),
        'peak_demand_high': row['mean_peak_demand'] + 3 * Decimal.from_float(row['stddev_peak_demand']),
        'efficiency_low': row['mean_efficiency'] - 3 * row['stddev_efficiency']
    }

print("\nAnomaly thresholds calculated for real-time monitoring")


=== Statistical Baselines for Anomaly Detection ===


+-----------+----------------+------------------+----------------+------------------+---------------+-----------------+
|energy_type|mean_consumption|stddev_consumption|mean_peak_demand|stddev_peak_demand|mean_efficiency|stddev_efficiency|
+-----------+----------------+------------------+----------------+------------------+---------------+-----------------+
|      Water|         265.260|           222.139|          346.55|            299.79|          87.95|             2.54|
|Electricity|          95.970|           100.706|          125.24|            132.36|          84.92|             2.46|
|Natural Gas|          47.191|             37.21|           61.81|             49.35|          89.88|             2.45|
|      Solar|         -48.926|            49.115|           63.78|             64.85|          77.76|             2.76|
+-----------+----------------+------------------+----------------+------------------+---------------+-----------------+




Anomaly thresholds calculated for real-time monitoring
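
As a worked check of the three-standard-deviation rule, the sketch below recomputes the Water consumption thresholds from the baseline output above and compares them with the 1989.147 reading that this step reports for `MTR000123`. The helper name `three_sigma_bounds` is hypothetical, and plain floats are used here instead of the `Decimal` values in the notebook cell.

```python
def three_sigma_bounds(mean, stddev, k=3):
    """Return (low, high) thresholds at k standard deviations from the mean."""
    return mean - k * stddev, mean + k * stddev

# Values copied from the Water row of the baseline output.
low, high = three_sigma_bounds(265.260, 222.139)
reading = 1989.147  # consumption reported for MTR000123

print(round(low, 3), round(high, 3))
print(reading > high)  # True means the reading is flagged as HIGH_CONSUMPTION
```

With these inputs the upper threshold comes out near 931.677, so the reading is well above it. The same arithmetic on the Natural Gas peak-demand baseline (61.81 + 3 × 49.35 = 209.86) explains why the 275.44 peak for `MTR001750` is also flagged.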


In [1]:
# Perform anomaly detection on the streaming data
print("=== Real-time Anomaly Detection Results ===")

# Register thresholds as a temporary view for SQL queries
thresholds_data = []
for energy_type, thresholds in anomaly_thresholds.items():
    thresholds_data.append({
        'energy_type': energy_type,
        'consumption_high': thresholds['consumption_high'],
        'consumption_low': thresholds['consumption_low'],
        'peak_demand_high': thresholds['peak_demand_high'],
        'efficiency_low': thresholds['efficiency_low']
    })

thresholds_df = spark.createDataFrame(thresholds_data)
thresholds_df.createOrReplaceTempView("anomaly_thresholds")

# Find consumption anomalies
consumption_anomalies = spark.sql("""
SELECT r.meter_id, r.reading_date, r.energy_type, r.consumption, r.location,
       CASE 
         WHEN r.consumption > t.consumption_high THEN 'HIGH_CONSUMPTION'
         WHEN r.consumption < t.consumption_low THEN 'LOW_CONSUMPTION'
         ELSE 'NORMAL'
       END as consumption_status
FROM energy.analytics.energy_readings_stream r
JOIN anomaly_thresholds t ON r.energy_type = t.energy_type
WHERE r.consumption > t.consumption_high OR r.consumption < t.consumption_low
ORDER BY ABS(r.consumption) DESC
LIMIT 10
""")

print("Consumption Anomalies (Top 10):")
consumption_anomalies.show()

# Find peak demand anomalies
peak_anomalies = spark.sql("""
SELECT r.meter_id, r.reading_date, r.energy_type, r.peak_demand, r.location,
       'HIGH_PEAK_DEMAND' as alert_type
FROM energy.analytics.energy_readings_stream r
JOIN anomaly_thresholds t ON r.energy_type = t.energy_type
WHERE r.peak_demand > t.peak_demand_high
ORDER BY r.peak_demand DESC
LIMIT 10
""")

print("\nPeak Demand Anomalies (Top 10):")
peak_anomalies.show()

# Find efficiency anomalies
efficiency_anomalies = spark.sql("""
SELECT r.meter_id, r.energy_type, r.location,
       ROUND(AVG(r.efficiency_rating), 2) as avg_efficiency,
       COUNT(*) as reading_count,
       'LOW_EFFICIENCY' as alert_type
FROM energy.analytics.energy_readings_stream r
JOIN anomaly_thresholds t ON r.energy_type = t.energy_type
GROUP BY r.meter_id, r.energy_type, r.location, t.efficiency_low
HAVING AVG(r.efficiency_rating) < t.efficiency_low
ORDER BY avg_efficiency ASC
LIMIT 10
""")

print("\nEfficiency Anomalies (Top 10):")
efficiency_anomalies.show()


=== Real-time Anomaly Detection Results ===


Consumption Anomalies (Top 10):


+---------+--------------------+-----------+-----------+--------------+------------------+
| meter_id|        reading_date|energy_type|consumption|      location|consumption_status|
+---------+--------------------+-----------+-----------+--------------+------------------+
|MTR000123|2025-12-15 03:12:...|      Water|   1989.147|Industrial_HOU|  HIGH_CONSUMPTION|
+---------+--------------------+-----------+-----------+--------------+------------------+




Peak Demand Anomalies (Top 10):


+---------+--------------------+-----------+-----------+--------------+----------------+
| meter_id|        reading_date|energy_type|peak_demand|      location|      alert_type|
+---------+--------------------+-----------+-----------+--------------+----------------+
|MTR000123|2025-12-15 03:12:...|      Water|    3290.76|Industrial_HOU|HIGH_PEAK_DEMAND|
|MTR001750|2025-12-15 03:12:...|Natural Gas|     275.44|Commercial_SFO|HIGH_PEAK_DEMAND|
+---------+--------------------+-----------+-----------+--------------+----------------+




Efficiency Anomalies (Top 10):


+---------+-----------+---------------+--------------+-------------+--------------+
| meter_id|energy_type|       location|avg_efficiency|reading_count|    alert_type|
+---------+-----------+---------------+--------------+-------------+--------------+
|MTR000456|      Solar|Residential_NYC|          43.5|            2|LOW_EFFICIENCY|
|MTR001500|      Solar| Commercial_SFO|          49.0|            1|LOW_EFFICIENCY|
|MTR000789|Electricity|Residential_LAX|          53.0|            2|LOW_EFFICIENCY|
|MTR000042|Natural Gas| Commercial_CHI|          55.0|            1|LOW_EFFICIENCY|
|MTR000999|      Water|Residential_LAX|          60.0|            2|LOW_EFFICIENCY|
|MTR001750|Natural Gas| Commercial_SFO|          60.0|            1|LOW_EFFICIENCY|
|MTR000123|      Water| Industrial_HOU|          61.0|            1|LOW_EFFICIENCY|
+---------+-----------+---------------+--------------+-------------+--------------+



## Key Takeaways: Streaming Delta Liquid Clustering

### What We Demonstrated

1. **Streaming Data Ingestion**: Used PySpark's rate source to generate continuous energy meter readings

2. **Real-time Processing**: Transformed and ingested data in micro-batches with fault tolerance

3. **Liquid Clustering Optimization**: Automatic data layout optimization during streaming writes

4. **Real-time Analytics**: Live queries and aggregations on streaming data

5. **Anomaly Detection**: Statistical monitoring for operational intelligence

### AIDP Streaming Advantages

- **Unified Platform**: Seamlessly combines streaming ingestion and analytical queries
- **Optimized Performance**: Liquid clustering accelerates real-time data access
- **Fault Tolerance**: Checkpointing, combined with Delta's transactional writes, supports exactly-once processing
- **Scalability**: Handles high-volume streaming data with automatic optimization

### Real-time Energy Insights

- **Grid Monitoring**: Continuous visibility into energy consumption patterns
- **Demand Management**: Real-time peak detection for load balancing
- **Operational Intelligence**: Anomaly detection enables proactive maintenance
- **Efficiency Optimization**: Identify underperforming meters and systems

### Next Steps for Production

- Deploy continuous streaming pipelines for 24/7 monitoring
- Integrate with grid control systems for automated responses
- Add predictive analytics using streaming ML models
- Implement real-time alerting and notification systems
- Scale to millions of meters with distributed processing

This notebook demonstrates how Oracle AI Data Platform enables real-time energy analytics through streaming Delta tables with liquid clustering, providing utilities with the tools for intelligent grid management and operational excellence.
