# Gold Layer: Asset Creation

This notebook designs the gold-layer asset creation step for the RF Asset Discovery CMDB.

**Purpose:** Transform verified silver signals into business-ready CMDB assets.

**Transformations:**
- Silver → Gold: CMDB field mapping, Purdue level, security posture

## 1. Setup

In [None]:
import duckdb
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

# Connect to DuckDB
DB_PATH = Path('../data/unified.duckdb')
con = duckdb.connect(str(DB_PATH), read_only=True)

print(f"Connected to: {DB_PATH}")

## 2. Asset Criteria

Define what makes a signal worthy of becoming a CMDB asset.

In [None]:
# Asset eligibility criteria
ASSET_CRITERIA = {
    'min_power_db': 5,           # Must be above noise floor
    'min_detections': 2,         # Must be detected multiple times
    'require_known_protocol': True,  # Must have classified protocol
    'exclude_bands': ['unknown', 'gap'],  # Exclude unclassified
}

print("Asset Eligibility Criteria:")
print("=" * 40)
for key, value in ASSET_CRITERIA.items():
    print(f"{key:25}: {value}")

In [None]:
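# Apply criteria to signals (simulating silver.verified_signals).
# Thresholds come from ASSET_CRITERIA rather than being hardcoded.
# Note: min_detections and require_known_protocol are not enforced in this
# query; rf_protocol is only derived in section 4.
exclude_bands = ASSET_CRITERIA['exclude_bands']
placeholders = ', '.join('?' for _ in exclude_bands)

signals = con.execute(f"""
    SELECT 
        signal_id,
        frequency_hz,
        power_db,
        freq_band,
        detection_count,
        first_seen,
        last_seen,
        location_name
    FROM signals
    WHERE power_db >= ?
      AND freq_band NOT IN ({placeholders})
    ORDER BY power_db DESC
""", [ASSET_CRITERIA['min_power_db'], *exclude_bands]).df()

print(f"Asset candidates: {len(signals):,} signals")
print("\nBy band:")
print(signals.groupby('freq_band').size().sort_values(ascending=False).head(10))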
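
The eligibility rules can also be expressed as a pandas filter driven by the criteria dictionary. The following is a minimal sketch under stated assumptions: the helper name `apply_asset_criteria` and the sample rows are hypothetical, and unlike the query above it also enforces `min_detections`.

```python
import pandas as pd

# Copy of the criteria defined above. require_known_protocol is left out
# because rf_protocol is not derived until a later section.
ASSET_CRITERIA = {
    'min_power_db': 5,
    'min_detections': 2,
    'exclude_bands': ['unknown', 'gap'],
}

def apply_asset_criteria(df: pd.DataFrame, criteria: dict) -> pd.DataFrame:
    """Hypothetical helper: keep rows that satisfy every eligibility rule."""
    mask = (
        (df['power_db'] >= criteria['min_power_db'])
        & (df['detection_count'] >= criteria['min_detections'])
        & ~df['freq_band'].isin(criteria['exclude_bands'])
    )
    return df[mask].reset_index(drop=True)

# Made-up sample rows, for illustration only
sample = pd.DataFrame({
    'signal_id': [1, 2, 3, 4],
    'power_db': [12.0, 3.0, 8.0, 9.0],
    'freq_band': ['ism_433', 'fm_broadcast', 'gap', 'adsb'],
    'detection_count': [5, 4, 3, 1],
})

eligible = apply_asset_criteria(sample, ASSET_CRITERIA)
print(eligible['signal_id'].tolist())
```

In this sample, signal 2 fails the power threshold, signal 3 sits in an excluded band, and signal 4 was detected only once, so only signal 1 remains.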

## 3. CMDB Field Mapping

Map signal fields to CMDB asset schema.

In [None]:
# CMDB field mapping
CMDB_MAPPING = {
    # Signal → Asset field mapping
    'signal_id': 'source_signal_id',
    'frequency_hz': 'rf_frequency_hz',
    'power_db': 'rf_signal_strength_db',
    'freq_band': 'freq_band',
    'first_seen': 'first_seen',
    'last_seen': 'last_seen',
    'location_name': 'location_name',
    
    # Derived fields
    'rf_protocol': 'rf_protocol',  # From band mapping
    'cmdb_ci_class': 'cmdb_ci_class',  # Derived
    'purdue_level': 'purdue_level',  # Derived
    'security_posture': 'security_posture',  # Derived
    'risk_level': 'risk_level',  # Derived
}

print("Signal → Asset Field Mapping:")
print("=" * 50)
for src, dst in CMDB_MAPPING.items():
    print(f"{src:25} → {dst}")
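
As a small illustration of how such a mapping might be applied, the sketch below renames columns on a DataFrame with `DataFrame.rename`. The sample row is made up, and only an excerpt of the mapping is repeated here; keys that are not present as columns are simply ignored by `rename`.

```python
import pandas as pd

# Excerpt of the signal-to-asset mapping defined above
CMDB_MAPPING = {
    'signal_id': 'source_signal_id',
    'frequency_hz': 'rf_frequency_hz',
    'power_db': 'rf_signal_strength_db',
    'freq_band': 'freq_band',
    'purdue_level': 'purdue_level',  # derived later; absent from the sample
}

# Made-up sample row, for illustration only
sample = pd.DataFrame({
    'signal_id': [101],
    'frequency_hz': [433_920_000],
    'power_db': [14.2],
    'freq_band': ['ism_433'],
})

# rename() leaves the data untouched and only relabels matching columns
assets = sample.rename(columns=CMDB_MAPPING)
print(list(assets.columns))
```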

In [None]:
# CMDB CI Class mapping based on band/protocol
CMDB_CI_CLASS_MAP = {
    'fm_broadcast': 'RF_BROADCAST_TRANSMITTER',
    'aircraft': 'RF_AVIATION_TRANSPONDER',
    'adsb': 'RF_ADSB_TRANSPONDER',
    'ism_433': 'RF_IOT_DEVICE',
    'ism_315': 'RF_IOT_DEVICE',
    'ism_868': 'RF_IOT_DEVICE',
    'ism_900': 'RF_IOT_DEVICE',
    'frs_gmrs': 'RF_TWO_WAY_RADIO',
    'marine_vhf': 'RF_MARINE_RADIO',
    'noaa_weather': 'RF_WEATHER_STATION',
    'uhf_amateur': 'RF_AMATEUR_RADIO',
    'vhf_amateur': 'RF_AMATEUR_RADIO',
    'cellular_700': 'RF_CELLULAR_TOWER',
    'cellular_850': 'RF_CELLULAR_TOWER',
    'cellular_1900': 'RF_CELLULAR_TOWER',
    'gps': 'RF_NAVIGATION_SATELLITE',
}

print("\nCMDB CI Class Mapping:")
print("=" * 50)
for band, ci_class in CMDB_CI_CLASS_MAP.items():
    print(f"{band:20} → {ci_class}")

## 4. Protocol Distribution

In [None]:
# Protocol mapping
BAND_PROTOCOL_MAP = {
    'fm_broadcast': 'FM_BROADCAST',
    'aircraft': 'AM_VOICE',
    'adsb': 'ADS_B',
    'ism_433': 'OOK',
    'ism_315': 'OOK',
    'ism_868': 'FSK',
    'ism_900': 'FSK',
    'frs_gmrs': 'FM_VOICE',
    'marine_vhf': 'FM_VOICE',
    'noaa_weather': 'FM_VOICE',
    'uhf_amateur': 'MIXED',
    'vhf_amateur': 'MIXED',
}

# Apply to candidates
signals['rf_protocol'] = signals['freq_band'].map(BAND_PROTOCOL_MAP).fillna('UNKNOWN')
signals['cmdb_ci_class'] = signals['freq_band'].map(CMDB_CI_CLASS_MAP).fillna('RF_EMITTER')

protocol_dist = signals.groupby('rf_protocol').agg({
    'signal_id': 'count',
    'power_db': 'mean'
}).reset_index()
protocol_dist.columns = ['rf_protocol', 'count', 'avg_power']

print("Protocol Distribution:")
print(protocol_dist.sort_values('count', ascending=False).to_string(index=False))

## 5. Purdue Level Assignment

Assign each band a Purdue Model level (the hierarchy that ISA-95 builds on) for OT security.

In [None]:
# Purdue Level definitions
PURDUE_LEVELS = {
    0: 'Process (Sensors, Actuators)',
    1: 'Basic Control (PLCs, RTUs)',
    2: 'Area Supervisory (HMI, SCADA)',
    3: 'Site Operations (Historians, MES)',
    4: 'Business Planning (ERP, CRM)',
    5: 'Enterprise Network (DMZ, Cloud)',
}

# Band → Purdue Level mapping
BAND_PURDUE_MAP = {
    'ism_433': 0,  # IoT sensors
    'ism_315': 0,  # IoT sensors
    'ism_868': 0,  # IoT sensors
    'ism_900': 1,  # LoRa, industrial
    'fm_broadcast': 5,  # Public broadcast
    'aircraft': 5,  # Public aviation
    'adsb': 5,  # Public aviation
    'marine_vhf': 5,  # Public maritime
    'noaa_weather': 5,  # Public weather
    'frs_gmrs': 4,  # Business comms
    'uhf_amateur': 5,  # Amateur radio
    'cellular_700': 5,  # Public cellular
    'cellular_850': 5,  # Public cellular
}

print("Purdue Level Mapping:")
print("=" * 60)
for band, level in sorted(BAND_PURDUE_MAP.items(), key=lambda x: x[1]):
    desc = PURDUE_LEVELS.get(level, 'Unknown')
    print(f"{band:20} → Level {level}: {desc}")

In [None]:
# Apply Purdue levels. Bands without an entry in BAND_PURDUE_MAP default to
# level 5; cast back to int because the NaN placeholders make the column float.
signals['purdue_level'] = signals['freq_band'].map(BAND_PURDUE_MAP).fillna(5).astype(int)

purdue_dist = signals.groupby('purdue_level').agg({
    'signal_id': 'count'
}).reset_index()
purdue_dist.columns = ['purdue_level', 'count']

print("Asset Distribution by Purdue Level:")
print("=" * 50)
for _, row in purdue_dist.iterrows():
    level = int(row['purdue_level'])
    count = int(row['count'])
    desc = PURDUE_LEVELS.get(level, 'Unknown')
    print(f"Level {level}: {count:>6,} - {desc}")
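
The three band dictionaries above do not cover the same set of bands, so some bands silently fall through to the `UNKNOWN` protocol or the default Purdue level 5. A quick coverage check, sketched here with a hypothetical `unmapped` helper and the band keys copied from the dictionaries above, makes those fallbacks explicit.

```python
# Band keys copied from CMDB_CI_CLASS_MAP, BAND_PROTOCOL_MAP and BAND_PURDUE_MAP
CI_CLASS_BANDS = {
    'fm_broadcast', 'aircraft', 'adsb', 'ism_433', 'ism_315', 'ism_868',
    'ism_900', 'frs_gmrs', 'marine_vhf', 'noaa_weather', 'uhf_amateur',
    'vhf_amateur', 'cellular_700', 'cellular_850', 'cellular_1900', 'gps',
}
PROTOCOL_BANDS = {
    'fm_broadcast', 'aircraft', 'adsb', 'ism_433', 'ism_315', 'ism_868',
    'ism_900', 'frs_gmrs', 'marine_vhf', 'noaa_weather', 'uhf_amateur',
    'vhf_amateur',
}
PURDUE_BANDS = {
    'ism_433', 'ism_315', 'ism_868', 'ism_900', 'fm_broadcast', 'aircraft',
    'adsb', 'marine_vhf', 'noaa_weather', 'frs_gmrs', 'uhf_amateur',
    'cellular_700', 'cellular_850',
}

def unmapped(reference: set, mapped: set) -> list:
    """Hypothetical helper: bands in `reference` with no entry in `mapped`."""
    return sorted(reference - mapped)

no_protocol = unmapped(CI_CLASS_BANDS, PROTOCOL_BANDS)
no_purdue = unmapped(CI_CLASS_BANDS, PURDUE_BANDS)

print("Fall back to UNKNOWN protocol:", no_protocol)
print("Fall back to Purdue level 5:  ", no_purdue)
```

Bands that fall back to `UNKNOWN` matter downstream, because the security posture rules treat strong unknown-protocol signals as requiring review.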

## 6. Security Posture Assessment

In [None]:
# Security posture rules
def assess_security_posture(row):
    """Assess security posture based on signal characteristics."""
    # High risk: OT/IoT bands at lower Purdue levels
    if row['purdue_level'] <= 1:
        return 'REQUIRES_REVIEW'
    # Medium risk: Strong unknown signals
    if row['rf_protocol'] == 'UNKNOWN' and row['power_db'] >= 10:
        return 'REQUIRES_REVIEW'
    # Low risk: Known protocols at higher levels
    if row['rf_protocol'] != 'UNKNOWN':
        return 'COMPLIANT'
    return 'NOT_ASSESSED'

def assess_risk_level(row):
    """Assess risk level based on security posture and Purdue level."""
    if row['security_posture'] == 'REQUIRES_REVIEW':
        if row['purdue_level'] <= 1:
            return 'HIGH'
        return 'MEDIUM'
    return 'LOW'

signals['security_posture'] = signals.apply(assess_security_posture, axis=1)
signals['risk_level'] = signals.apply(assess_risk_level, axis=1)

print("Security Posture Distribution:")
print(signals.groupby('security_posture').size())
print("\nRisk Level Distribution:")
print(signals.groupby('risk_level').size())
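
To make the rule order concrete, the sketch below folds both rules into one hypothetical function, `classify`, and runs it on four made-up cases, one per outcome. It is meant as a worked illustration of the rules above, not a replacement for them.

```python
def classify(purdue_level: int, rf_protocol: str, power_db: float) -> tuple:
    """Hypothetical helper mirroring the posture and risk rules above.

    Returns (security_posture, risk_level). Rules are checked in order,
    so a low Purdue level wins over everything else.
    """
    if purdue_level <= 1:
        return ('REQUIRES_REVIEW', 'HIGH')
    if rf_protocol == 'UNKNOWN' and power_db >= 10:
        return ('REQUIRES_REVIEW', 'MEDIUM')
    if rf_protocol != 'UNKNOWN':
        return ('COMPLIANT', 'LOW')
    return ('NOT_ASSESSED', 'LOW')

# Made-up cases: (purdue_level, rf_protocol, power_db)
cases = [
    (0, 'OOK', 6.0),            # IoT sensor band at Purdue level 0
    (5, 'UNKNOWN', 12.0),       # strong signal with no protocol mapping
    (5, 'FM_BROADCAST', 20.0),  # known public broadcast
    (5, 'UNKNOWN', 7.0),        # weak signal with no protocol mapping
]

results = [classify(*case) for case in cases]
for case, (posture, risk) in zip(cases, results):
    print(f"{case} -> {posture}, {risk}")
```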

## 7. Asset Preview

In [None]:
# Preview gold.assets table
gold_preview = signals[[
    'signal_id', 'frequency_hz', 'power_db', 'freq_band',
    'rf_protocol', 'cmdb_ci_class', 'purdue_level',
    'security_posture', 'risk_level'
]].copy()

gold_preview['freq_mhz'] = gold_preview['frequency_hz'] / 1e6

print("Gold Layer Asset Preview (top 20):")
print("=" * 100)
preview = gold_preview.nlargest(20, 'power_db')[[
    'freq_mhz', 'power_db', 'freq_band', 'rf_protocol', 
    'cmdb_ci_class', 'purdue_level', 'risk_level'
]]
print(preview.to_string(index=False))

In [None]:
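# Gold layer CREATE TABLE SQL
gold_sql = """
-- Gold Layer: CMDB Assets
-- Transformation from silver.verified_signals
-- Assumes silver.verified_signals already carries rf_protocol,
-- purdue_level, and bandwidth_hz.

CREATE TABLE gold.assets AS
SELECT 
    gen_random_uuid() AS id,
    -- Name from frequency and band
    CONCAT(freq_band, '_', CAST(frequency_hz/1e6 AS VARCHAR), 'MHz') AS name,
    'RF_EMITTER' AS asset_type,
    first_seen,
    last_seen,
    1.0 AS correlation_confidence,
    
    -- RF fields
    frequency_hz AS rf_frequency_hz,
    power_db AS rf_signal_strength_db,
    bandwidth_hz AS rf_bandwidth_hz,
    rf_protocol,
    
    -- CMDB fields (kept in sync with CMDB_CI_CLASS_MAP)
    CASE freq_band
        WHEN 'fm_broadcast' THEN 'RF_BROADCAST_TRANSMITTER'
        WHEN 'aircraft' THEN 'RF_AVIATION_TRANSPONDER'
        WHEN 'adsb' THEN 'RF_ADSB_TRANSPONDER'
        WHEN 'ism_433' THEN 'RF_IOT_DEVICE'
        WHEN 'ism_315' THEN 'RF_IOT_DEVICE'
        WHEN 'ism_868' THEN 'RF_IOT_DEVICE'
        WHEN 'ism_900' THEN 'RF_IOT_DEVICE'
        WHEN 'frs_gmrs' THEN 'RF_TWO_WAY_RADIO'
        WHEN 'marine_vhf' THEN 'RF_MARINE_RADIO'
        WHEN 'noaa_weather' THEN 'RF_WEATHER_STATION'
        WHEN 'uhf_amateur' THEN 'RF_AMATEUR_RADIO'
        WHEN 'vhf_amateur' THEN 'RF_AMATEUR_RADIO'
        WHEN 'cellular_700' THEN 'RF_CELLULAR_TOWER'
        WHEN 'cellular_850' THEN 'RF_CELLULAR_TOWER'
        WHEN 'cellular_1900' THEN 'RF_CELLULAR_TOWER'
        WHEN 'gps' THEN 'RF_NAVIGATION_SATELLITE'
        ELSE 'RF_EMITTER'
    END AS cmdb_ci_class,
    
    -- Security fields (same rules as assess_security_posture; the WHERE
    -- clause keeps only power_db >= 10, so NOT_ASSESSED cannot occur here)
    CASE 
        WHEN purdue_level <= 1 THEN 'REQUIRES_REVIEW'
        WHEN rf_protocol = 'UNKNOWN' THEN 'REQUIRES_REVIEW'
        ELSE 'COMPLIANT'
    END AS security_posture,
    
    CASE 
        WHEN purdue_level <= 1 THEN 'HIGH'
        WHEN rf_protocol = 'UNKNOWN' THEN 'MEDIUM'
        ELSE 'LOW'
    END AS risk_level,
    
    purdue_level,
    
    -- Lineage
    signal_id AS source_signal_id,
    location_name
    
FROM silver.verified_signals
WHERE power_db >= 10  -- High-quality signals only
;
"""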

print(gold_sql)
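
The `CASE` expression in the SQL above repeats mappings that also live in `CMDB_CI_CLASS_MAP`. One way to keep the two from drifting apart is to generate the SQL fragment from the dictionary. The sketch below assumes a hypothetical helper, `ci_class_case_sql`, and uses only an excerpt of the mapping; the values are interpolated directly into the string, which is acceptable only because they are trusted constants defined in the notebook.

```python
# Excerpt of CMDB_CI_CLASS_MAP defined above
CMDB_CI_CLASS_MAP = {
    'fm_broadcast': 'RF_BROADCAST_TRANSMITTER',
    'adsb': 'RF_ADSB_TRANSPONDER',
    'ism_433': 'RF_IOT_DEVICE',
}

def ci_class_case_sql(mapping: dict, column: str = 'freq_band',
                      default: str = 'RF_EMITTER') -> str:
    """Hypothetical helper: render a band-to-class dict as a SQL CASE."""
    lines = [f"CASE {column}"]
    for band, ci_class in mapping.items():
        lines.append(f"    WHEN '{band}' THEN '{ci_class}'")
    lines.append(f"    ELSE '{default}'")
    lines.append("END AS cmdb_ci_class")
    return "\n".join(lines)

case_sql = ci_class_case_sql(CMDB_CI_CLASS_MAP)
print(case_sql)
```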

## 8. Cross-Layer Lineage

In [None]:
# Lineage query (simulated)
lineage_sql = """
-- Cross-layer lineage query
-- Traces an asset back through silver to bronze

SELECT 
    g.id AS asset_id,
    g.name AS asset_name,
    g.cmdb_ci_class,
    g.risk_level,
    s.detection_count,
    s.rf_protocol,
    b.first_seen AS bronze_first_seen,
    b.survey_id,
    b.segment_id
FROM gold.assets g
JOIN silver.verified_signals s ON g.source_signal_id = s.signal_id
JOIN bronze.signals b ON s.signal_id = b.signal_id
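ORDER BY
    -- Sort by severity; a plain DESC on the text column would put MEDIUM first
    CASE g.risk_level WHEN 'HIGH' THEN 0 WHEN 'MEDIUM' THEN 1 ELSE 2 END,
    b.first_seen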
LIMIT 20;
"""

print("Cross-Layer Lineage Query:")
print(lineage_sql)
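
Risk labels are plain strings, so sorting them alphabetically does not follow severity. The short sketch below shows the difference on made-up labels, using a hypothetical `RISK_RANK` lookup to impose the intended order.

```python
# Made-up risk labels, for illustration only
risk_levels = ['LOW', 'HIGH', 'MEDIUM', 'LOW']

# Alphabetical descending order puts MEDIUM ahead of HIGH
alphabetical_desc = sorted(risk_levels, reverse=True)

# Hypothetical rank lookup: smaller rank means more severe
RISK_RANK = {'HIGH': 0, 'MEDIUM': 1, 'LOW': 2}
by_severity = sorted(risk_levels, key=lambda level: RISK_RANK[level])

print("Alphabetical DESC:", alphabetical_desc)
print("By severity:      ", by_severity)
```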

In [None]:
# Visualize asset distribution
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# By CI Class
ci_class_counts = signals['cmdb_ci_class'].value_counts()
ci_class_counts.plot(kind='barh', ax=axes[0], color='steelblue')
axes[0].set_xlabel('Count')
axes[0].set_title('Assets by CI Class')

# By Purdue Level
purdue_counts = signals['purdue_level'].value_counts().sort_index()
purdue_counts.plot(kind='bar', ax=axes[1], color='coral')
axes[1].set_xlabel('Purdue Level')
axes[1].set_ylabel('Count')
axes[1].set_title('Assets by Purdue Level')

# By Risk Level
risk_counts = signals['risk_level'].value_counts()
colors = {'HIGH': 'red', 'MEDIUM': 'orange', 'LOW': 'green'}
risk_counts.plot(kind='pie', ax=axes[2], colors=[colors.get(x, 'gray') for x in risk_counts.index],
                  autopct='%1.1f%%')
axes[2].set_title('Assets by Risk Level')

plt.tight_layout()
plt.show()

## Summary

### Gold Layer Design

1. **Field Mapping:** Signal → Asset with CMDB enrichment
2. **CI Classification:** 16 frequency bands mapped to 10 CI classes, with RF_EMITTER as the fallback
3. **Purdue Levels:** 6 levels (0-5) for OT security
4. **Security Posture:** COMPLIANT, REQUIRES_REVIEW, NOT_ASSESSED
5. **Risk Assessment:** HIGH, MEDIUM, LOW based on posture + level

### Asset Candidates
- Total candidates: ~265 signals meeting criteria
- High-quality (power >= +10 dB): 97 signals
- OT-relevant (Purdue 0-1): ~10 signals

### Next Steps
1. Create bronze/silver/gold schemas
2. Run transformation pipeline
3. Verify cross-layer lineage

In [None]:
# Cleanup
con.close()
print("Connection closed.")