# 📘 Feature Store Registration
This notebook registers the engineered sensor features table from the Delta Live Tables (DLT) pipeline
into the Databricks Feature Store for easier discoverability, governance, and reuse during model training and inference.


In [0]:
# df = spark.read.table("arao.aerodemo.sensor_features").cache()
# df.count()  # Force materialization to avoid lazy read error
# df.write.format("delta").mode("overwrite").saveAsTable("arao.aerodemo.sensor_features_table")

In [0]:
# %sql
# ALTER TABLE arao.aerodemo.sensor_features_table 
# ALTER COLUMN aircraft_id SET NOT NULL;

# ALTER TABLE arao.aerodemo.sensor_features_table 
# ALTER COLUMN timestamp SET NOT NULL;

In [0]:
# %sql
# ALTER TABLE arao.aerodemo.sensor_features_table 
# ADD CONSTRAINT sensor_features_pk 
# PRIMARY KEY (aircraft_id, timestamp);

In [0]:
# %sql
# ALTER TABLE arao.aerodemo.sensor_features
# ADD CONSTRAINT sensor_features_pk PRIMARY KEY (aircraft_id, timestamp);

In [0]:
from pyspark.sql import functions as F

# 🧹 1. Drop the old table if it exists to avoid constraint conflicts
spark.sql("DROP TABLE IF EXISTS arao.aerodemo.sensor_features_table")

# 📥 2. Read from existing DLT materialized table
df_raw = spark.table("arao.aerodemo.sensor_features") \
    .filter("aircraft_id IS NOT NULL AND timestamp IS NOT NULL")

# 🧼 3. Cast PK columns to string (nulls were already filtered out in step 2)
df_clean = df_raw.withColumn("aircraft_id", F.col("aircraft_id").cast("string")) \
                 .withColumn("timestamp", F.col("timestamp").cast("string"))

# 💾 4. Save as Delta table with schema overwrite
df_clean.write.format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .saveAsTable("arao.aerodemo.sensor_features_table")

# 🔐 5. Enforce NOT NULL constraints
spark.sql("""
  ALTER TABLE arao.aerodemo.sensor_features_table 
  ALTER COLUMN aircraft_id SET NOT NULL
""")
spark.sql("""
  ALTER TABLE arao.aerodemo.sensor_features_table 
  ALTER COLUMN timestamp SET NOT NULL
""")

# 🛡️ 6. Add primary key constraint (required for Feature Store)
spark.sql("""
  ALTER TABLE arao.aerodemo.sensor_features_table 
  ADD CONSTRAINT sensor_features_pk 
  PRIMARY KEY (aircraft_id, timestamp)
""")

In [0]:
from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

fs.create_table(
    name="arao.aerodemo.sensor_features_table",
    primary_keys=["aircraft_id", "timestamp"],
    timestamp_keys=["timestamp"],
    description="Engineered features for anomaly prediction from sensor data",
    df=spark.read.table("arao.aerodemo.sensor_features_table")
)


# 🛠️ Feature Store Registration from DLT Materialized Tables

### Why do we need these extra steps after defining DLT tables?

While Delta Live Tables (`@dlt.table`) provides robust data engineering pipelines, **it does NOT persist schema-level constraints** such as `NOT NULL` or `PRIMARY KEY` in the Delta table metadata.

This is because:
- DLT focuses on **data flow orchestration and lineage**, not detailed physical table design.
- By default, DLT outputs are stored as **materialized views** (even when declared with `@dlt.table`), meaning they don’t expose all metadata properties directly to downstream systems.
- If you want to integrate these tables with systems like the **Databricks Feature Store**, you need:
  - ✅ A fully materialized, managed Delta table
  - ✅ Explicitly defined primary keys (PKs)
  - ✅ Cleaned and consistent column types matching Feature Store expectations

### Why explicitly set `pipelines.materialize = true`?

This ensures:
- The table is **materialized on disk** (backed by a Delta table) instead of being purely a logical view.
- The DLT system handles physical storage, compaction, and optimization, so the table is queryable like a native table.

However, even with `pipelines.materialize = true`, **DLT does not persist PK constraints** in the Delta metadata.

### Why these post-DLT steps?

1️⃣ **Drop + Rewrite as Managed Table**  
We drop any previous copy, read the DLT materialized table, and write it back as a clean managed Delta table to ensure:
- Schema stability
- Proper registration in the metastore

2️⃣ **Cast PK Columns**  
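We cast the key columns to `string` (`aircraft_id` and `timestamp` for the sensor table, `aircraft_id` and `event_timestamp` for the component tables) so they have a stable, consistent type when registered in the Feature Store.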

3️⃣ **Apply NOT NULL + PRIMARY KEY Constraints**  
Feature Store **requires primary key persistence** at the Delta table level, which DLT does not enforce natively.
We add:
- `ALTER COLUMN ... SET NOT NULL`  
- `ADD CONSTRAINT ... PRIMARY KEY (...)`

4️⃣ **Register in Feature Store**  
We register the cleaned, constraint-enforced table as a **Feature Store table**, enabling:
- Managed feature lookups  
- Model training and inference workflows  
- Consistent feature governance and tracking
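
Step 3 above can be sketched as a small helper that assembles the SQL statements in the order they need to run: `NOT NULL` on every key column first, then the `PRIMARY KEY` constraint. The function name `build_constraint_statements` is hypothetical and not part of any Databricks API. It only builds strings, which the notebook would then pass to `spark.sql` one at a time.

```python
def build_constraint_statements(table, primary_keys, constraint_name):
    """Return the ALTER TABLE statements for step 3, in execution order.

    Hypothetical helper: it only builds SQL strings and does not touch Spark.
    """
    # NOT NULL is set on every key column before the PRIMARY KEY is added.
    statements = [
        f"ALTER TABLE {table} ALTER COLUMN {key} SET NOT NULL"
        for key in primary_keys
    ]
    key_list = ", ".join(primary_keys)
    statements.append(
        f"ALTER TABLE {table} ADD CONSTRAINT {constraint_name} PRIMARY KEY ({key_list})"
    )
    return statements


statements = build_constraint_statements(
    table="arao.aerodemo.sensor_features_table",
    primary_keys=["aircraft_id", "timestamp"],
    constraint_name="sensor_features_pk",
)
for statement in statements:
    print(statement)  # in the notebook: spark.sql(statement)
```

Keeping the statement list separate from execution makes it easy to review or log the DDL before it runs against a table.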

---

✅ **Summary:**  
Even though DLT handles most of the data orchestration, for full Feature Store integration we must:
- Ensure physical table materialization  
- Enforce schema constraints  
- Register tables explicitly

This gives us the best of both worlds: DLT’s orchestration + Feature Store’s feature governance.


### 📦 Component-Level Feature Store Registration

This section registers the engineered feature tables for each aircraft component (engine, landing gear, avionics, cabin pressurization, and airframe) in the Databricks Feature Store.

✅ **What’s included:**
- **component_features_engine**  
  Engine-related lag features and rolling averages (e.g., temperature, vibration, oil pressure).
  
- **component_features_landing_gear**  
  Landing gear features such as brake wear, brake temperature, and shock absorber health.

- **component_features_avionics**  
  Avionics system features like signal integrity, error log counts, and system temperature.

- **component_features_cabin_pressurization**  
  Cabin pressurization features including cabin pressure, seal integrity, and airflow rates.

- **component_features_airframe**  
  Airframe structural features covering stress points, fatigue crack growth, and integrity measures.

✅ **Why register?**
Registering these tables ensures:
- Centralized management of feature sets.
- Versioning and reproducibility for ML models.
- Easy integration into model training, batch scoring, and online inference pipelines.

Each table is registered with:
- `aircraft_id` and `event_timestamp` as the composite primary key.
- `event_timestamp` as the timestamp key (time index).
- A short description for downstream consumers.
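
The registration cells for the five components differ only in the component name, so one possible refactoring is a single parameterized function. The sketch below is an illustration, not the notebook's actual implementation: `component_config` and `register_component` are hypothetical names, and `spark` and `fs` (a `FeatureStoreClient`) are passed in as arguments so the block carries no Databricks imports of its own.

```python
CATALOG_SCHEMA = "arao.aerodemo"
COMPONENTS = ["engine", "landing_gear", "avionics", "cabin_pressurization", "airframe"]
PRIMARY_KEYS = ["aircraft_id", "event_timestamp"]


def component_config(component):
    """Derive the per-component names used by the registration cells."""
    base = f"{CATALOG_SCHEMA}.component_features_{component}"
    label = component.replace("_", " ")
    return {
        "dlt_table": base,
        "managed_table": f"{base}_table",
        "constraint_name": f"component_features_{component}_pk",
        "description": f"Engineered features for {label} components",
    }


def register_component(spark, fs, component):
    """Run the seven registration steps for one component.

    `spark` is the notebook's SparkSession and `fs` a FeatureStoreClient.
    """
    cfg = component_config(component)
    managed = cfg["managed_table"]
    not_null_filter = " AND ".join(f"{key} IS NOT NULL" for key in PRIMARY_KEYS)

    # Steps 1-2: drop the old managed copy, then read the DLT output without null keys.
    spark.sql(f"DROP TABLE IF EXISTS {managed}")
    df = spark.read.table(cfg["dlt_table"]).filter(not_null_filter)

    # Step 3: cast the key columns to string.
    for key in PRIMARY_KEYS:
        df = df.withColumn(key, df[key].cast("string"))

    # Step 4: save as a managed Delta table.
    (df.write.format("delta")
        .mode("overwrite")
        .option("overwriteSchema", "true")
        .saveAsTable(managed))

    # Steps 5-6: NOT NULL on each key column, then the PRIMARY KEY constraint.
    for key in PRIMARY_KEYS:
        spark.sql(f"ALTER TABLE {managed} ALTER COLUMN {key} SET NOT NULL")
    spark.sql(
        f"ALTER TABLE {managed} ADD CONSTRAINT {cfg['constraint_name']} "
        f"PRIMARY KEY ({', '.join(PRIMARY_KEYS)})"
    )

    # Step 7: register in the Feature Store.
    fs.create_table(
        name=managed,
        primary_keys=PRIMARY_KEYS,
        timestamp_keys=["event_timestamp"],
        description=cfg["description"],
        df=spark.read.table(managed),
    )


# In the notebook, the five cells would reduce to:
# for component in COMPONENTS:
#     register_component(spark, fs, component)
```

Deriving every name from the component string keeps the table names, constraint names, and descriptions consistent across components.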




In [0]:
from databricks.feature_store import FeatureStoreClient
import pyspark.sql.functions as F

fs = FeatureStoreClient()

# 🔧 Define table names
dlt_table = "arao.aerodemo.component_features_engine"
managed_table = "arao.aerodemo.component_features_engine_table"
primary_keys = ["aircraft_id", "event_timestamp"]

# 1️⃣ Drop old managed table if exists
spark.sql(f"DROP TABLE IF EXISTS {managed_table}")

# 2️⃣ Read from DLT materialized table
df_raw = spark.read.table(dlt_table).filter("aircraft_id IS NOT NULL AND event_timestamp IS NOT NULL")

# 3️⃣ Cast PK columns to string (safe type)
df_clean = df_raw.withColumn("aircraft_id", F.col("aircraft_id").cast("string")) \
                 .withColumn("event_timestamp", F.col("event_timestamp").cast("string"))

# 4️⃣ Save as managed Delta table (overwrite + overwrite schema)
df_clean.write.format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .saveAsTable(managed_table)

# 5️⃣ Apply NOT NULL constraints
spark.sql(f"ALTER TABLE {managed_table} ALTER COLUMN aircraft_id SET NOT NULL")
spark.sql(f"ALTER TABLE {managed_table} ALTER COLUMN event_timestamp SET NOT NULL")

# 6️⃣ Add PRIMARY KEY constraint (required by Feature Store)
spark.sql(f"""
    ALTER TABLE {managed_table} 
    ADD CONSTRAINT component_features_engine_pk 
    PRIMARY KEY (aircraft_id, event_timestamp)
""")

# 7️⃣ Register in Feature Store
fs.create_table(
    name=managed_table,
    primary_keys=primary_keys,
    timestamp_keys=["event_timestamp"],
    description="Engineered features for engine components",
    df=spark.read.table(managed_table)
)

In [0]:
from databricks.feature_store import FeatureStoreClient
import pyspark.sql.functions as F

fs = FeatureStoreClient()

# 🔧 Define table names
dlt_table = "arao.aerodemo.component_features_landing_gear"
managed_table = "arao.aerodemo.component_features_landing_gear_table"
primary_keys = ["aircraft_id", "event_timestamp"]

# 1️⃣ Drop old managed table if exists
spark.sql(f"DROP TABLE IF EXISTS {managed_table}")

# 2️⃣ Read from DLT materialized table
df_raw = spark.read.table(dlt_table).filter("aircraft_id IS NOT NULL AND event_timestamp IS NOT NULL")

# 3️⃣ Cast PK columns to string (safe type)
df_clean = df_raw.withColumn("aircraft_id", F.col("aircraft_id").cast("string")) \
                 .withColumn("event_timestamp", F.col("event_timestamp").cast("string"))

# 4️⃣ Save as managed Delta table (overwrite + overwrite schema)
df_clean.write.format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .saveAsTable(managed_table)

# 5️⃣ Apply NOT NULL constraints
spark.sql(f"ALTER TABLE {managed_table} ALTER COLUMN aircraft_id SET NOT NULL")
spark.sql(f"ALTER TABLE {managed_table} ALTER COLUMN event_timestamp SET NOT NULL")

# 6️⃣ Add PRIMARY KEY constraint (required by Feature Store)
spark.sql(f"""
    ALTER TABLE {managed_table} 
    ADD CONSTRAINT component_features_landing_gear_pk 
    PRIMARY KEY (aircraft_id, event_timestamp)
""")

# 7️⃣ Register in Feature Store
fs.create_table(
    name=managed_table,
    primary_keys=primary_keys,
    timestamp_keys=["event_timestamp"],
    description="Engineered features for landing gear components",
    df=spark.read.table(managed_table)
)

In [0]:
from databricks.feature_store import FeatureStoreClient
import pyspark.sql.functions as F

fs = FeatureStoreClient()

# 🔧 Define table names
dlt_table = "arao.aerodemo.component_features_avionics"
managed_table = "arao.aerodemo.component_features_avionics_table"
primary_keys = ["aircraft_id", "event_timestamp"]

# 1️⃣ Drop old managed table if exists
spark.sql(f"DROP TABLE IF EXISTS {managed_table}")

# 2️⃣ Read from DLT materialized table
df_raw = spark.read.table(dlt_table).filter("aircraft_id IS NOT NULL AND event_timestamp IS NOT NULL")

# 3️⃣ Cast PK columns to string (safe type)
df_clean = df_raw.withColumn("aircraft_id", F.col("aircraft_id").cast("string")) \
                 .withColumn("event_timestamp", F.col("event_timestamp").cast("string"))

# 4️⃣ Save as managed Delta table (overwrite + overwrite schema)
df_clean.write.format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .saveAsTable(managed_table)

# 5️⃣ Apply NOT NULL constraints
spark.sql(f"ALTER TABLE {managed_table} ALTER COLUMN aircraft_id SET NOT NULL")
spark.sql(f"ALTER TABLE {managed_table} ALTER COLUMN event_timestamp SET NOT NULL")

# 6️⃣ Add PRIMARY KEY constraint (required by Feature Store)
spark.sql(f"""
    ALTER TABLE {managed_table} 
    ADD CONSTRAINT component_features_avionics_pk 
    PRIMARY KEY (aircraft_id, event_timestamp)
""")

# 7️⃣ Register in Feature Store
fs.create_table(
    name=managed_table,
    primary_keys=primary_keys,
    timestamp_keys=["event_timestamp"],
    description="Engineered features for avionics components",
    df=spark.read.table(managed_table)
)

In [0]:
from databricks.feature_store import FeatureStoreClient
import pyspark.sql.functions as F

fs = FeatureStoreClient()

# 🔧 Define table names
dlt_table = "arao.aerodemo.component_features_cabin_pressurization"
managed_table = "arao.aerodemo.component_features_cabin_pressurization_table"
primary_keys = ["aircraft_id", "event_timestamp"]

# 1️⃣ Drop old managed table if exists
spark.sql(f"DROP TABLE IF EXISTS {managed_table}")

# 2️⃣ Read from DLT materialized table
df_raw = spark.read.table(dlt_table).filter("aircraft_id IS NOT NULL AND event_timestamp IS NOT NULL")

# 3️⃣ Cast PK columns to string (safe type)
df_clean = df_raw.withColumn("aircraft_id", F.col("aircraft_id").cast("string")) \
                 .withColumn("event_timestamp", F.col("event_timestamp").cast("string"))

# 4️⃣ Save as managed Delta table (overwrite + overwrite schema)
df_clean.write.format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .saveAsTable(managed_table)

# 5️⃣ Apply NOT NULL constraints
spark.sql(f"ALTER TABLE {managed_table} ALTER COLUMN aircraft_id SET NOT NULL")
spark.sql(f"ALTER TABLE {managed_table} ALTER COLUMN event_timestamp SET NOT NULL")

# 6️⃣ Add PRIMARY KEY constraint (required by Feature Store)
spark.sql(f"""
    ALTER TABLE {managed_table} 
    ADD CONSTRAINT component_features_cabin_pressurization_pk 
    PRIMARY KEY (aircraft_id, event_timestamp)
""")

# 7️⃣ Register in Feature Store
fs.create_table(
    name=managed_table,
    primary_keys=primary_keys,
    timestamp_keys=["event_timestamp"],
    description="Engineered features for cabin pressurization components",
    df=spark.read.table(managed_table)
)

In [0]:
from databricks.feature_store import FeatureStoreClient
import pyspark.sql.functions as F

fs = FeatureStoreClient()

# 🔧 Define table names
dlt_table = "arao.aerodemo.component_features_airframe"
managed_table = "arao.aerodemo.component_features_airframe_table"
primary_keys = ["aircraft_id", "event_timestamp"]

# 1️⃣ Drop old managed table if exists
spark.sql(f"DROP TABLE IF EXISTS {managed_table}")

# 2️⃣ Read from DLT materialized table
df_raw = spark.read.table(dlt_table).filter("aircraft_id IS NOT NULL AND event_timestamp IS NOT NULL")

# 3️⃣ Cast PK columns to string (safe type)
df_clean = df_raw.withColumn("aircraft_id", F.col("aircraft_id").cast("string")) \
                 .withColumn("event_timestamp", F.col("event_timestamp").cast("string"))

# 4️⃣ Save as managed Delta table (overwrite + overwrite schema)
df_clean.write.format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .saveAsTable(managed_table)

# 5️⃣ Apply NOT NULL constraints
spark.sql(f"ALTER TABLE {managed_table} ALTER COLUMN aircraft_id SET NOT NULL")
spark.sql(f"ALTER TABLE {managed_table} ALTER COLUMN event_timestamp SET NOT NULL")

# 6️⃣ Add PRIMARY KEY constraint (required by Feature Store)
spark.sql(f"""
    ALTER TABLE {managed_table} 
    ADD CONSTRAINT component_features_airframe_pk 
    PRIMARY KEY (aircraft_id, event_timestamp)
""")

# 7️⃣ Register in Feature Store
fs.create_table(
    name=managed_table,
    primary_keys=primary_keys,
    timestamp_keys=["event_timestamp"],
    description="Engineered features for airframe components",
    df=spark.read.table(managed_table)
)

### ✅ Summary of Registered Feature Tables

Below is a consolidated overview of all feature tables registered in the Databricks Feature Store for the AeroDemo project, including their primary keys and schema details.

---

#### 🚀 1. `arao.aerodemo.component_features_engine_table`

**Primary Keys:**  
- `aircraft_id`  
- `event_timestamp`

**Schema:**  
- aircraft_id (string)  
- component_id (string)  
- event_timestamp (string)  
- thrust_level (double)  
- fuel_consumption_rate (double)  
- temperature_reading (double)  
- vibration_level (double)  
- oil_pressure (double)  
- health_status (string)  
- prev_temp (double)  
- prev_vibration (double)  
- avg_temp_7d (double)  
- avg_vibration_7d (double)  
- avg_oil_pressure_7d (double)

---

#### 🚀 2. `arao.aerodemo.component_features_landing_gear_table`

**Primary Keys:**  
- `aircraft_id`  
- `event_timestamp`

**Schema:**  
- aircraft_id (string)  
- component_id (string)  
- event_timestamp (string)  
- hydraulic_pressure (double)  
- strut_compression (double)  
- brake_wear (double)  
- brake_temperature (double)  
- shock_absorber_status (double)  
- health_status (string)  
- prev_brake_temp (double)  
- prev_brake_wear (double)  
- avg_brake_temp_7d (double)  
- avg_brake_wear_7d (double)

---

#### 🚀 3. `arao.aerodemo.component_features_airframe_table`

**Primary Keys:**  
- `aircraft_id`  
- `event_timestamp`

**Schema:**  
- aircraft_id (string)  
- component_id (string)  
- event_timestamp (string)  
- stress_points (double)  
- fatigue_crack_growth (double)  
- temperature_fluctuations (double)  
- structural_integrity (double)  
- health_status (string)  
- prev_stress_points (double)  
- prev_fatigue (double)  
- avg_stress_points_7d (double)  
- avg_fatigue_7d (double)

---

#### 🚀 4. `arao.aerodemo.component_features_avionics_table`

**Primary Keys:**  
- `aircraft_id`  
- `event_timestamp`

**Schema:**  
- aircraft_id (string)  
- component_id (string)  
- event_timestamp (string)  
- power_status (double)  
- signal_integrity (double)  
- data_transmission_rate (double)  
- system_temperature (double)  
- error_logs (int)  
- health_status (string)  
- prev_signal_integrity (double)  
- prev_error_logs (int)  
- avg_signal_integrity_7d (double)  
- avg_error_logs_7d (double)

---

#### 🚀 5. `arao.aerodemo.component_features_cabin_pressurization_table`

**Primary Keys:**  
- `aircraft_id`  
- `event_timestamp`

**Schema:**  
- aircraft_id (string)  
- component_id (string)  
- event_timestamp (string)  
- cabin_pressure (double)  
- seal_integrity (double)  
- airflow_rate (double)  
- temperature_control (double)  
- humidity_level (double)  
- health_status (string)  
- prev_cabin_pressure (double)  
- prev_seal_integrity (double)  
- avg_cabin_pressure_7d (double)  
- avg_seal_integrity_7d (double)

---

#### 🚀 6. `arao.aerodemo.sensor_features_table`

**Primary Keys:**  
- `aircraft_id`  
- `timestamp`

**Schema:**  
- timestamp (string)  
- aircraft_id (string)  
- model (string)  
- engine_temp (double)  
- fuel_efficiency (double)  
- vibration (double)  
- altitude (double)  
- airspeed (double)  
- oil_pressure (double)  
- engine_rpm (int)  
- battery_voltage (double)  
- anomaly_score (int)  
- event_type (string)  
- avg_engine_temp_7d (double)  
- avg_vibration_7d (double)  
- avg_rpm_7d (double)  
- prev_anomaly (double)  
- days_since_maint (int)  
- manufacturer (string)  
- engine_type (string)  
- capacity (int)  
- range_km (int)

---
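
The schemas listed above can drift from the actual tables as the pipeline evolves. As a minimal sketch, the hypothetical helper `schema_mismatches` below compares an expected `{column: type}` mapping with the `(name, type)` pairs that a Spark DataFrame's `.dtypes` attribute returns. The sample values are illustrative stand-ins, not output from the real tables.

```python
def schema_mismatches(expected, actual_dtypes):
    """Compare an expected {column: type} mapping with (name, type) pairs.

    `actual_dtypes` has the shape returned by a Spark DataFrame's `.dtypes`.
    Returns {column: (expected_type, actual_type)} for every difference;
    a column missing on either side is reported with None on that side.
    """
    actual = dict(actual_dtypes)
    mismatches = {}
    for column in sorted(set(expected) | set(actual)):
        want, got = expected.get(column), actual.get(column)
        if want != got:
            mismatches[column] = (want, got)
    return mismatches


# Subset of the engine table schema from the summary.
expected_engine = {
    "aircraft_id": "string",
    "event_timestamp": "string",
    "thrust_level": "double",
}

# Illustrative stand-in for:
# spark.table("arao.aerodemo.component_features_engine_table").dtypes
sample_dtypes = [
    ("aircraft_id", "string"),
    ("event_timestamp", "timestamp"),
    ("thrust_level", "double"),
]

print(schema_mismatches(expected_engine, sample_dtypes))
# {'event_timestamp': ('string', 'timestamp')}
```

An empty result means the documented schema and the table agree for the columns checked.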

### 💡 Why is this important?

Even though Delta Live Tables (`@dlt.table`) creates managed tables, it does **not** enforce or persist primary key constraints on them.  
The Databricks Feature Store **requires**:  
✅ Explicit primary key definition  
✅ Managed Delta tables (outside the `LIVE.` namespace)  
✅ Registered schemas and metadata  

This is why we:  
1️⃣ Extract DLT-generated tables  
2️⃣ Re-save them as managed Delta tables (using `overwrite` mode)  
3️⃣ Apply NOT NULL and PRIMARY KEY constraints  
4️⃣ Register them into the Feature Store  

This ensures the Feature Store can manage features, serve them for ML workflows, and enforce consistency.
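
To confirm that the constraints were actually persisted, one option is to query the catalog's `information_schema`. The sketch below assumes the tables live in Unity Catalog, where each catalog exposes an `information_schema.table_constraints` view. The helper name `primary_key_check_query` is hypothetical, and it only builds the query string.

```python
def primary_key_check_query(full_table_name):
    """Build a query listing PRIMARY KEY constraints for catalog.schema.table."""
    catalog, schema, table = full_table_name.split(".")
    return (
        f"SELECT constraint_name "
        f"FROM {catalog}.information_schema.table_constraints "
        f"WHERE table_schema = '{schema}' "
        f"AND table_name = '{table}' "
        f"AND constraint_type = 'PRIMARY KEY'"
    )


query = primary_key_check_query("arao.aerodemo.sensor_features_table")
print(query)  # in the notebook: spark.sql(query).show()
```

If registration succeeded, the result should contain one row per table naming its constraint, for example `sensor_features_pk`.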