### **SATCAT Data Pipeline: Global Registry Standardization**

**Dataset:** CelesTrak Satellite Catalog (SATCAT)  
**Objective:** Transform raw orbital tracking data into a physics-ready, "Gold Standard" global registry.

### **The Engineering Challenge**
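The raw SATCAT is the premier source for orbital tracking (where objects are), but it presents a significant **Physics Transparency Gap**. It catalogs roughly 60,000 objects yet carries no mass field at all. Even after enrichment with the UCS (Union of Concerned Scientists) Satellite Database, about **83%** of in-orbit objects still lack mass data. Many fields also contain non-numeric placeholders or uncalibrated Radar Cross Section (RCS) values. The pipeline closes this gap in five stages: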

1.  **Ingestion & Schema Alignment:** Normalize headers to strict `snake_case` and implement the `in_orbit` logic.
2.  **Physics Sanitization:** Neutralize `0.0` placeholders and enforce numeric type safety.
3.  **Keplerian Reconstruction:** Derive missing orbital periods from perigee and apogee using Kepler's Third Law.
4.  **Categorical & Geopolitical Hardening:** Normalize object types, status codes, and owner codes for seamless relational integrity.
5.  **Tiered Mass Imputation:** After enriching with UCS launch mass, apply ESA-based proxies for Debris and Rocket Bodies to provide a kinetic baseline.

In [None]:
import pandas as pd
import numpy as np
from IPython.display import Markdown, display

### **Stage 1: Ingestion & Schema Alignment**
**The Problem:** The raw SATCAT headers use legacy uppercase formatting (e.g., `NORAD_CAT_ID`), which is inconsistent with our UCS snake_case convention. Furthermore, a "Gold Standard" dataset must preserve all original metadata to ensure no information is lost during the standardization process.

**The Solution:**
* **Global Renaming:** We map critical physical and relational headers to `snake_case` to match the UCS pipeline's "DNA".
* **Feature Preservation:** We retain 100% of the original columns, only renaming the essential ones for programmatic efficiency.
* **In-Orbit Logic:** We introduce the `in_orbit` flag (stored as `1`/`0`), a primary feature that separates active kinetic threats from historical decay records. An object is flagged `1` when it has no recorded `decay_date`.
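The sketch below shows the rename-then-flag pattern on a hypothetical three-row frame. The IDs, the decay date, and the two-entry `demo_map` are illustrative only. They are not real SATCAT records or the full Stage 1 map.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature SATCAT extract (illustrative IDs and dates, not real records)
raw = pd.DataFrame({
    "NORAD_CAT_ID": [10001, 10002, 10003],
    "DECAY_DATE": [np.nan, np.nan, "2002-05-20"],
})

# A two-entry subset of the Stage 1 rename map
demo_map = {"NORAD_CAT_ID": "norad_id", "DECAY_DATE": "decay_date"}
demo = raw.rename(columns=demo_map)

# No recorded decay date -> still in orbit (1); otherwise decayed (0)
demo["in_orbit"] = demo["decay_date"].isnull().astype(int)
print(demo)
```

The first two rows come out as `in_orbit = 1` and the decayed third row as `0`.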

In [None]:
raw_registry = pd.read_csv('../data/original/satcat.csv')

rename_map = {
    'OBJECT_NAME': 'object_name',
    'OBJECT_ID': 'cospar_id',            # Perfect Match with UCS
    'NORAD_CAT_ID': 'norad_id',          # Perfect Match with UCS
    'OBJECT_TYPE': 'object_type',        # Refined in Stage 4
    'OPS_STATUS_CODE': 'ops_status',   
    'OWNER': 'owner_code',               # To be enriched by UCS 'owner'
    'LAUNCH_DATE': 'launch_date',        # Standardized to datetime
    'LAUNCH_SITE': 'launch_site',        # Matches UCS
    'DECAY_DATE': 'decay_date',
    'PERIOD': 'period_minutes',          # Standardized Physics
    'INCLINATION': 'inclination_degrees',# Standardized Physics
    'APOGEE': 'apogee_km',               # Standardized Physics
    'PERIGEE': 'perigee_km',             # Standardized Physics
    'RCS': 'rcs',                        # Core for Kinetic Modeling
    'DATA_STATUS_CODE': 'data_status', 
    'ORBIT_CENTER': 'orbit_center',    
    'ORBIT_TYPE': 'orbit_type_code'      # Changed to prevent merge collision
}

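registry = raw_registry.rename(columns=rename_map).copy()

# Flag objects with no recorded decay date as still in orbit (1), before parsing dates
registry['in_orbit'] = registry['decay_date'].isnull().astype(int)

# Standardize date columns to datetime; unparseable entries become NaT
for date_col in ['launch_date', 'decay_date']:
    registry[date_col] = pd.to_datetime(registry[date_col], errors='coerce')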

print(f"--- Stage 1 Audit ---")
print(f"Total Columns Retained: {len(registry.columns)}")
print(f"Total Records:         {len(registry):,}")
print(f"In-Orbit Boolean:      Added (Density: {registry['in_orbit'].mean():.1%})")

### **Stage 1.1: Strategic Ghost Column Audit**
**The Problem:** Raw CSV exports often contain "Ghost Columns": unpopulated placeholders created by formatting artifacts or trailing delimiters in the original database. These columns inflate memory usage and add "Wide Data" noise without contributing any information.

**The Solution:** We implement a **Dynamic Artifact Filter** to identify and purge any columns matching the `Unnamed` pattern or those containing 100% null values. This ensures the dataset remains lean and focused on valid orbital attributes while preserving 100% of the legitimate SATCAT metadata.
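Here is a minimal sketch of the same filter on a hypothetical frame containing one trailing-delimiter artifact and one fully empty column. It departs from the cell below in one way: `dict.fromkeys` de-duplicates the column list while keeping column order, whereas `set()` does not guarantee any order.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'Unnamed: 17' mimics a trailing-delimiter artifact
df = pd.DataFrame({
    "norad_id": ["10001", "10002"],
    "rcs": [1.2, np.nan],
    "Unnamed: 17": [np.nan, np.nan],
    "empty_field": [None, None],
})

unnamed = [c for c in df.columns if str(c).startswith("Unnamed")]
all_null = [c for c in df.columns if df[c].isna().all()]
# Order-preserving de-duplication (an 'Unnamed' column is usually also all-null)
ghosts = list(dict.fromkeys(unnamed + all_null))

clean = df.drop(columns=ghosts)
print(ghosts)
print(clean.columns.tolist())
```

Note that `rcs` survives: it is only partially null, so it is a legitimate attribute with gaps, not an artifact.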

In [None]:
# Filter off unwanted columns dynamically to account for version updates
unnamed_columns = [col for col in registry.columns if 'Unnamed' in col]
null_columns = [col for col in registry.columns if registry[col].isnull().all()]
ghost_columns = list(set(unnamed_columns + null_columns))

print(f"--- Ghost Column Audit ---")
if ghost_columns:
    print(f"Found {len(ghost_columns)} artifact columns: {ghost_columns}")
    registry.drop(columns=ghost_columns, inplace=True)
    print(f"✅ SUCCESS: Artifacts purged. Remaining columns: {len(registry.columns)}")
else:
    print("✅ CLEAN: No ghost columns or 100% null artifacts detected.")

essential_keys = ['norad_id', 'object_name', 'in_orbit', 'period_minutes', 'perigee_km', 'apogee_km', 'inclination_degrees', 'rcs']
integrity_check = all(key in registry.columns for key in essential_keys)
print(f"Relational Integrity Check: {'PASSED' if integrity_check else 'FAILED'}")

### **Stage 1.2: Pre-Sanitization Health Audit**
**The Problem:** Before enforcing physics standardization, we must quantify the scale of the **Transparency Gap** in the raw registry. Treating `0.0` or string artifacts as valid data during initial EDA would lead to a "Ghost Population" that appears to have no mass or motion, skewing our baseline risk assessments.

**The Solution:** We execute a technical audit to identify "Non-Physical" values across our five core attributes. This report tracks:
* **String Artifacts:** Non-numeric entries that force columns into `object` types.
* **Placeholder Zeros:** Valid numeric `0.0` entries that actually represent missing data in the SATCAT.
* **Density Baseline:** The true percentage of "Physics-Ready" data available in the raw source.
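The sketch below shows the three audit counts on a single hypothetical column. Coercing with `pd.to_numeric(..., errors="coerce")` first yields one NaN count that covers both blanks and string artifacts. A numeric-looking string such as `"0"` is therefore counted once, as a zero, rather than as both a string and a zero.

```python
import numpy as np
import pandas as pd

# Hypothetical raw column: valid numbers, a placeholder zero, a string artifact and a blank
raw_rcs = pd.Series(["1.25", "0", "N/A", np.nan, "3.4"], dtype="object")

numeric = pd.to_numeric(raw_rcs, errors="coerce")   # unparseable strings -> NaN
non_numeric = int(numeric.isna().sum())             # blanks + string artifacts
zeros = int((numeric == 0).sum())                   # placeholder zeros
health_pct = 100 * (len(raw_rcs) - non_numeric - zeros) / len(raw_rcs)

print(non_numeric, zeros, health_pct)
```

Two of the five entries are physics-ready, so this toy column scores 40% health.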

In [None]:
# Core kinetic and geometric attributes
physics_cols = ['period_minutes', 'inclination_degrees', 'apogee_km', 'perigee_km', 'rcs']

print(f"{'--- RAW DATA HEALTH AUDIT ---':^55}")
print(f"{'Column':<20} | {'Non-Numeric':<12} | {'Zeros':<8} | {'Health %'}")
print("-" * 55)

total_records = len(registry)

for col in physics_cols:
    if col in registry.columns:
        # 1. Count Non-Numeric entries (blanks plus unparseable strings)
        # Coercing first means numeric strings such as "95.3" or "0" are not miscounted
        numeric_series = pd.to_numeric(registry[col], errors='coerce')
        non_numeric = numeric_series.isna().sum()
        
        # 2. Count Placeholder Zeros (Numeric 0.0)
        # Note: We check the numeric version for zeros
        zeros = (numeric_series == 0).sum()
        
        # 3. Calculate "Physics-Ready" Density
        # Valid = Not NaN AND Not Zero
        valid_count = total_records - (non_numeric + zeros)
        health_pct = (valid_count / total_records) * 100
        
        print(f"{col:<20} | {non_numeric:>12,} | {zeros:>8,} | {health_pct:>7.1f}%")

print("-" * 55)
print(f"Total Registry Records: {total_records:,}")
print("Note: 'Health %' represents records that are both numeric and non-zero.")

### **Stage 2: Universal Numeric & Identifier Sanitization**
**The Problem:** High-fidelity orbital modeling requires strict numeric types. However, fields like `rcs`, `period_minutes`, and `perigee_km` often contain `0.0` as a placeholder for "Unknown" data. Furthermore, primary identifiers like `norad_id` and `cospar_id` can contain decimal artifacts or hidden whitespace that prevent clean merging.

**The Solution:**
* **Numeric Coercion:** We apply `pd.to_numeric` across all core physics columns to ensure a stable `float64` foundation.
* **Placeholder Neutralization:** We convert `0.0` values to `NaN` to ensure missing data is handled correctly by statistical functions.
* **ID Normalization:** We force `norad_id` to a clean integer-string and apply a deep string-scrub to `cospar_id` to ensure 1:1 join-readiness with the UCS dataset.
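The following sketch applies the three Stage 2 rules to a hypothetical four-row frame. `25544` / `1998-067A` is the ISS; every other value is illustrative. The string scrub here uses `.str` directly, so real gaps stay `NaN` instead of round-tripping through the text `"NAN"`.

```python
import numpy as np
import pandas as pd

ids = pd.DataFrame({
    "norad_id": [25544.0, "10002", "bad", np.nan],
    "cospar_id": [" 1998-067a ", "2000-001b", np.nan, "2020-001b"],
    "perigee_km": [413.0, 0.0, 550.0, np.nan],
})

# Numeric identifier: coerce, drop unparseable rows, render as an integer string
ids["norad_id"] = pd.to_numeric(ids["norad_id"], errors="coerce")
ids = ids.dropna(subset=["norad_id"]).copy()
ids["norad_id"] = ids["norad_id"].astype("int64").astype(str)   # 25544.0 -> "25544"

# String identifier: trim whitespace and upper-case
ids["cospar_id"] = ids["cospar_id"].str.strip().str.upper()

# Placeholder zeros become NaN so statistics skip them
ids["perigee_km"] = ids["perigee_km"].replace(0, np.nan)
print(ids)
```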

In [None]:
# Expanded lists for universal sanitization
physics_cols = ['period_minutes', 'inclination_degrees', 'apogee_km', 'perigee_km', 'rcs']
id_numeric_cols = ['norad_id']
id_string_cols = ['cospar_id']

print(f"Sanitizing {len(physics_cols) + len(id_numeric_cols) + len(id_string_cols)} attributes...")

# 1. Standardize Numeric Identifiers (UCS Stage 3.3 Logic)
for col in id_numeric_cols:
    if col in registry.columns:
        # Strip decimal artifacts (e.g., 25544.0 -> 25544)
        registry[col] = pd.to_numeric(registry[col], errors='coerce')
        registry = registry.dropna(subset=[col])
        registry[col] = registry[col].astype(int).astype(str)

# 2. Standardize String Identifiers
for col in id_string_cols:
    if col in registry.columns:
        # Strip whitespace and normalize case to prevent "Invisible Bug" merge failures
        registry[col] = registry[col].astype(str).str.strip().str.upper()
        # Neutralize 'NAN' strings
        registry[col] = registry[col].replace('NAN', np.nan)

# 3. Sanitization Loop for Physics (UCS Stage 3.2 & 4.2 Logic)
for col in physics_cols:
    if col in registry.columns:
        # Force to numeric foundation
        registry[col] = pd.to_numeric(registry[col], errors='coerce')
        
        # Neutralize 0.0 placeholders representing 'Missing' data
        registry[col] = registry[col].replace(0, np.nan)
        
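        # Enforce Physical Constraints: geometry cannot be negative for a physical orbit.
        # Negative values are marked missing rather than clamped to 0, which would
        # re-introduce the 0.0 placeholders neutralized above.
        if col != 'rcs':
            registry.loc[registry[col] < 0, col] = np.nan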

# Verification Diagnostic
print("\n--- Final Sanitization Health Check ---")
for col in physics_cols + id_numeric_cols + id_string_cols:
    null_count = registry[col].isnull().sum()
    dtype = registry[col].dtype
    print(f"{col:<20} | Nulls: {null_count:>6,} | Type: {dtype}")

print("\n✅ Stage 2 Complete: Numeric and Identifier integrity enforced.")

### **Stage 3: Keplerian Reconstruction (The Density Engine)**
**The Problem:** Despite being a tracking catalog, many entries possess Perigee and Apogee data but are missing a recorded `period_minutes`. To achieve 100% density for our kinetic models, we cannot rely on incomplete records or "ghost" orbits that appear to have no motion.

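**The Solution:** We apply **Kepler's Third Law**. Earth's standard gravitational parameter ($\mu = GM_\oplus$) and equatorial radius ($R_\oplus$) are treated as constants. The semi-major axis then follows from the perigee and apogee altitudes as $a = R_\oplus + (h_p + h_a)/2$, and the period is $T = 2\pi\sqrt{a^3/\mu}$. This derives missing periods from the existing orbital geometry, so every `in_orbit` object with recorded altitudes is ready for downstream kinetic energy calculations.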
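Kepler's Third Law can also be applied column-wise instead of row-by-row with `apply`. Below is a minimal sketch that assumes the same constants as the cell that follows. `kepler_period_minutes` and the toy rows are illustrative and not part of the pipeline.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6378.137   # WGS-84 equatorial radius
MU_EARTH = 398600.4418       # km^3/s^2, Earth's standard gravitational parameter

def kepler_period_minutes(perigee_km, apogee_km):
    """Period in minutes: T = 2*pi*sqrt(a^3 / mu), with a = R + (h_p + h_a) / 2."""
    a = EARTH_RADIUS_KM + (np.asarray(perigee_km) + np.asarray(apogee_km)) / 2.0
    return 2.0 * np.pi * np.sqrt(a**3 / MU_EARTH) / 60.0

orbits = pd.DataFrame({
    "perigee_km": [413.0, 35786.0, np.nan],
    "apogee_km": [422.0, 35786.0, 800.0],
    "period_minutes": [np.nan, np.nan, np.nan],
})

# Only rows whose period is missing receive the derived value
derived = pd.Series(kepler_period_minutes(orbits["perigee_km"], orbits["apogee_km"]),
                    index=orbits.index)
orbits["period_minutes"] = orbits["period_minutes"].fillna(derived)
print(orbits)
```

The ISS-like first row comes out near 93 minutes. The geostationary-altitude second row comes out near 1436 minutes (one sidereal day). The third row stays `NaN` because its perigee is missing, which is exactly the population Stage 3.1 has to handle.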

In [None]:
# Earth Constants for Orbital Mechanics
earth_radius = 6378.137   # km (WGS-84 equatorial radius)
mu = 398600.4418          # km^3/s^2 (Earth's standard gravitational parameter, GM)

def calculate_kepler_period(row):
    """
    Derives orbital period from altitudes if the period is missing.
    Matches the derivation logic used in ucs_cleanup Stage 4.2.
    """
    # Only derive if Period is missing but we have altitudes
    if pd.isna(row['period_minutes']) and not pd.isna(row['perigee_km']) and not pd.isna(row['apogee_km']):
        # a = semi-major axis (Earth Radius + Average Altitude)
        a = earth_radius + ((row['perigee_km'] + row['apogee_km']) / 2)
        # T = 2 * pi * sqrt(a^3 / mu)
        period_seconds = 2 * np.pi * np.sqrt(a**3 / mu)
        return period_seconds / 60
    return row['period_minutes']

print("Executing Keplerian Reconstruction...")
registry['period_minutes'] = registry.apply(calculate_kepler_period, axis=1)

# Diagnostic Audit of the In-Orbit Population
in_orbit_mask = registry['in_orbit'] == 1
missing_p = registry[in_orbit_mask]['period_minutes'].isnull().sum()

print(f"\n--- Keplerian Audit ---")
print(f"Remaining Missing Periods (In-Orbit): {missing_p}")
print(f"Period Density (In-Orbit):           {registry[in_orbit_mask]['period_minutes'].notna().mean():.1%}")

if missing_p == 0:
    print("\n🚀 SUCCESS: 100% Period density achieved for the in-orbit population.")

### **Stage 3.1: Orbital Classification & Final Period Imputation**
**The Problem:** Our Keplerian reconstruction achieved 98.1% density, but 615 objects remain "Physics-Blind" because they lack both a recorded period and altitude data. In the UCS pipeline, we addressed similar gaps by leveraging **Grouped Median Imputation**.

**The Solution:**
* **Regime Classification:** We implement a `classify_orbit` function that groups objects into **LEO, MEO, GEO,** or **Elliptical** (periods above the GEO band) based on their orbital period. Objects without a valid period are labeled `UNKNOWN`.
* **Peer-Group Imputation:** We fill missing `period_minutes` with the median of each orbital class, mirroring the Stage 4.2 logic from the UCS cleanup. The final 615 objects have no period, so they all fall into the `UNKNOWN` class, which holds no usable period values. In practice, the global-median safety net is what fills them and brings the in-orbit population to 100% period density.
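Here is a vectorized sketch of the same period thresholds using `np.select`, where `classify_orbits` is a hypothetical helper rather than part of the pipeline. Its first condition makes the key point visible: any row without a period lands in `UNKNOWN`, so a median grouped by that label has no peer period to borrow.

```python
import numpy as np
import pandas as pd

def classify_orbits(period_minutes):
    """Vectorized counterpart of classify_orbit, using the same thresholds (minutes)."""
    p = pd.to_numeric(pd.Series(period_minutes), errors="coerce")
    conditions = [
        p.isna() | (p <= 0),            # no usable period
        p < 128,                        # LEO
        (p >= 1400) & (p <= 1460),      # GEO band around one sidereal day
        (p >= 128) & (p < 1400),        # MEO
    ]
    choices = ["UNKNOWN", "LEO", "GEO", "MEO"]
    # np.select takes the first matching condition; everything else is Elliptical
    return pd.Series(np.select(conditions, choices, default="Elliptical"), index=p.index)

classes = classify_orbits([92.9, 720.0, 1436.1, 2000.0, np.nan]).tolist()
print(classes)
```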

In [None]:
def classify_orbit(period):
    """
    Translates orbital period into standardized regimes.
    Logic based on SATCAT engineering standards.
    """
    if pd.isnull(period) or period <= 0:
        return 'UNKNOWN'
    elif period < 128:
        return 'LEO'
    elif 1400 <= period <= 1460:
        return 'GEO'
    elif 128 <= period < 1400:
        return 'MEO'
    else:
        return 'Elliptical'

# 1. Apply initial classification
registry['orbit_class'] = registry['period_minutes'].apply(classify_orbit)

# 2. Impute Remaining Periods (UCS Stage 4.2 Logic)
# Grouped medians by orbit_class; rows with no period are classed 'UNKNOWN',
# a group with no valid periods, so this step cannot fill them on its own
print(f"Imputing final {registry[registry['in_orbit'] == 1]['period_minutes'].isnull().sum()} periods...")

# Calculate medians by orbit class
orbit_medians = registry.groupby('orbit_class')['period_minutes'].transform('median')
registry['period_minutes'] = registry['period_minutes'].fillna(orbit_medians)

# 3. Global Safety Net: fills the 'UNKNOWN' rows left empty by the grouped step
global_median = registry['period_minutes'].median()
registry['period_minutes'] = registry['period_minutes'].fillna(global_median)

# 4. Final Physics Audit
in_orbit_mask = registry['in_orbit'] == 1
missing_final = registry[in_orbit_mask]['period_minutes'].isnull().sum()

print(f"\n--- Final Physics Density Audit ---")
print(f"Remaining Missing Periods (In-Orbit): {missing_final}")
print(f"Final Period Density:                {registry[in_orbit_mask]['period_minutes'].notna().mean():.1%}")

if missing_final == 0:
    print("\n🚀 SUCCESS: 100% Period density achieved via median imputation.")

### **Stage 3.2: Final Geometric Sweep**
**The Problem:** While our Period density is now at 100%, the underlying geometric attributes (`inclination_degrees`, `apogee_km`, and `perigee_km`) still contain the `NaN` values we neutralized in Stage 2. A "Gold Standard" dataset requires 100% density across the entire physical profile to support precise kinetic modeling.

**The Solution:** Perform a final median sweep across the remaining geometric features. We leverage the `orbit_class` we just engineered to fill these gaps with regime-specific medians, ensuring that an "Unknown LEO" object receives the physical characteristics typical of its neighbors.
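Imputing `perigee_km` and `apogee_km` independently has a side effect: it can produce an orbit whose perigee sits above its apogee. The sketch below reproduces the grouped fill on hypothetical rows and flags such inversions. `fill_grouped_median` is an illustrative helper, not a pipeline function.

```python
import numpy as np
import pandas as pd

def fill_grouped_median(df, col, group_col="orbit_class"):
    """Fill NaNs in `col` with its per-group median, then with the column-wide median."""
    group_median = df.groupby(group_col)[col].transform("median")
    return df[col].fillna(group_median).fillna(df[col].median())

# Hypothetical rows: the third LEO object has an apogee but no perigee
toy = pd.DataFrame({
    "orbit_class": ["LEO", "LEO", "LEO", "GEO", "GEO"],
    "perigee_km": [400.0, 600.0, np.nan, 35780.0, np.nan],
    "apogee_km": [420.0, 650.0, 450.0, 35790.0, 35795.0],
})

for col in ["perigee_km", "apogee_km"]:
    toy[col] = fill_grouped_median(toy, col)

# The LEO median perigee (500 km) exceeds that object's known apogee (450 km)
inverted = toy["perigee_km"] > toy["apogee_km"]
print(toy[inverted])
```

Flagged rows would produce a negative eccentricity downstream. Whether to cap, swap, or re-impute them is a modeling decision that this sketch leaves open.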

In [None]:
sweep_cols = ['inclination_degrees', 'apogee_km', 'perigee_km']

print("Executing Final Geometric Sweep...")

for col in sweep_cols:
    if col in registry.columns:
        # 1. Primary Fill: Grouped by the Orbit Class we just created
        regime_medians = registry.groupby('orbit_class')[col].transform('median')
        registry[col] = registry[col].fillna(regime_medians)
        
        # 2. Safety Fill: Global Median (for any 'UNKNOWN' regimes)
        registry[col] = registry[col].fillna(registry[col].median())

# Final Physics Quality Gate
print(f"\n{'--- FINAL GEOMETRY AUDIT ---':^45}")
print(f"{'Feature':<25} | {'Completeness'}")
print("-" * 45)

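all_complete = True
for col in ['period_minutes'] + sweep_cols:
    coverage = registry[registry['in_orbit'] == 1][col].notna().mean()
    all_complete = all_complete and coverage == 1.0
    print(f"{col:<25} | {coverage:>12.1%}")

print("-" * 45)
if all_complete:
    print("🚀 PASS: 100% Geometric density achieved for the in-orbit population.")
else:
    print("❌ FAIL: Geometric gaps remain in the in-orbit population.")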

### **Stage 3.3: Integrity Polish & Eccentricity Derivation**
**The Problem:** While our altitude and period density are at 100%, the SATCAT lacks an explicit **`eccentricity`** field, which is a core requirement for the kinetic models used in the UCS pipeline. It also has no **`geo_longitude`** field at all, so that column must be created before the two schemas can match.

**The Solution:**
* **Mathematical Derivation:** We derive eccentricity as $e = (r_a - r_p) / (r_a + r_p)$, where $r_a = R_\oplus + h_a$ and $r_p = R_\oplus + h_p$ are the apogee and perigee distances from Earth's center.
* **Schema Alignment:** We add `geo_longitude` as a placeholder column and include both features in the final "Gold Standard" sweep to reach column parity with the UCS dataset. SATCAT provides no longitude source, so every row ends up with a `0.0` placeholder. This value should not be read as a measured position.
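Below is a vectorized sketch of the eccentricity formula. `eccentricity_from_altitudes` is a hypothetical helper, and the two sample orbits (circular geostationary and a highly elliptical, Molniya-like orbit) are illustrative.

```python
import numpy as np

EARTH_RADIUS_KM = 6378.137

def eccentricity_from_altitudes(perigee_km, apogee_km):
    """e = (r_a - r_p) / (r_a + r_p), with radii measured from Earth's center."""
    r_p = np.asarray(perigee_km, dtype=float) + EARTH_RADIUS_KM
    r_a = np.asarray(apogee_km, dtype=float) + EARTH_RADIUS_KM
    return (r_a - r_p) / (r_a + r_p)

e = eccentricity_from_altitudes([35786.0, 500.0], [35786.0, 39900.0])
print(e)
```

The circular orbit gives $e = 0$ and the elliptical one gives roughly 0.74. A negative result would signal a row whose perigee exceeds its apogee, for example after independent median imputation.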

In [None]:
# 1. Derive Eccentricity
# Formula: e = (Apogee - Perigee) / (Apogee + Perigee + 2*Earth_Radius)
earth_radius = 6378.137

def derive_eccentricity(row):
    ra = row['apogee_km'] + earth_radius
    rp = row['perigee_km'] + earth_radius
    return (ra - rp) / (ra + rp)

print("Deriving orbital eccentricity...")
registry['eccentricity'] = registry.apply(derive_eccentricity, axis=1)

# 2. Initialize geo_longitude if it doesn't exist (to match UCS schema)
if 'geo_longitude' not in registry.columns:
    registry['geo_longitude'] = np.nan

# 3. Final Sweep for the "Missing Two"
final_sweep = ['eccentricity', 'geo_longitude']

for col in final_sweep:
    # Grouped Median Fill
    regime_medians = registry.groupby('orbit_class')[col].transform('median')
    registry[col] = registry[col].fillna(regime_medians)
    
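    # Global Safety Net: any value still missing becomes 0.0.
    # SATCAT has no longitude source, so geo_longitude is 0.0 for every row (a placeholder, not a position)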
    registry[col] = registry[col].fillna(0.0)

print(f"\n{'--- COMPLETE GEOMETRIC AUDIT ---':^45}")
print(f"{'Feature':<25} | {'Completeness'}")
print("-" * 45)

all_geo = ['period_minutes', 'inclination_degrees', 'apogee_km', 'perigee_km', 'eccentricity', 'geo_longitude']
for col in all_geo:
    coverage = registry[registry['in_orbit'] == 1][col].notna().mean()
    print(f"{col:<25} | {coverage:>12.1%}")

print("-" * 45)
print("🚀 PASS: UCS geometric schema parity achieved (geo_longitude is a 0.0 placeholder).")

### **Stage 4: Categorical Hardening & String Scrubbing**
**The Problem:** The raw SATCAT uses abbreviated codes (e.g., `PAY`, `R/B`) and inconsistent string formatting. Mixed-case entries and trailing whitespace (the "Invisible Bugs") can cause silent failures during categorical grouping or when merging with the UCS dataset later in the pipeline.

**The Solution:**
* **Object Mapping:** We translate raw codes into a controlled vocabulary (`PAYLOAD`, `ROCKET BODY`, `DEBRIS`) to ensure clarity.
* **Geopolitical Normalization:** We enforce a strict uppercase and `strip()` operation on `owner_code` and `launch_site` to ensure unique labels for geopolitical analysis.
* **Deep Scrub:** We iterate through all text-based columns to neutralize whitespace artifacts and ensure the registry meets our "Gold Standard" for data cleanliness.
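One variation on the deep scrub keeps genuine missing values as `NaN` throughout. This avoids converting them to the text `"NAN"` and replacing that string back across the whole frame. Here is a minimal sketch; `scrub_text` and the sample owner codes are illustrative.

```python
import numpy as np
import pandas as pd

def scrub_text(series):
    """Strip and upper-case text cells while leaving genuine missing values as NaN."""
    cleaned = series.astype(str).str.strip().str.upper()
    return cleaned.where(series.notna(), np.nan)

owners = pd.Series([" us ", "prc", np.nan, "Cis "])
scrubbed = scrub_text(owners)
print(scrubbed.tolist())
```

Because the mask comes from the original series, a cell whose real text happens to be "nan" would not be erased, unlike a global `replace('NAN', np.nan)`.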

In [None]:
# Map Object Types to human-readable vocabulary
type_map = {
    'PAY': 'PAYLOAD', 
    'R/B': 'ROCKET BODY', 
    'DEB': 'DEBRIS',
    'UNK': 'UNKNOWN'
}

print("Standardizing object classifications...")
registry['object_type'] = registry['object_type'].str.strip().str.upper().map(type_map).fillna('UNKNOWN')

# Global String Scrubbing
text_cols = registry.select_dtypes(include=['object']).columns

print(f"Executing Deep Scrub on {len(text_cols)} text columns...")
for col in text_cols:
    registry[col] = registry[col].astype(str).str.strip().str.upper()

# 3. Neutralize 'NAN' strings back to proper np.nan
registry = registry.replace('NAN', np.nan)

# 4. Final Categorical Audit
print("\n--- Categorical Distribution ---")
print(registry['object_type'].value_counts())

print(f"\n✅ Stage 4 Complete: {len(text_cols)} columns standardized and scrubbed.")

### **Stage 4.1: Operational Status Hardening**
**The Problem:** Raw tracking data utilizes legacy shorthand for satellite health (e.g., `+` for operational, `-` for non-operational). These codes are insufficient for high-level risk reporting or geopolitical audits.

**The Solution:** We map the single-character status codes to standardized, human-readable labels. This ensures that the registry's operational context is immediately accessible and ready for "Zombie" identification in the next phase of the project.

In [None]:
# Map legacy single-character status codes to human-readable labels
# Source: CelesTrak SATCAT Legend
status_map = {
    '+': 'OPERATIONAL',
    '-': 'NON-OPERATIONAL',
    'P': 'PARTIAL',
    'B': 'BACKUP/STANDBY',
    'S': 'SPARE',
    'X': 'EXTENDED MISSION',
    'D': 'DECAYED',
    '?': 'UNKNOWN'
}

print("Hardening operational status codes...")

# Apply mapping and handle missing values
registry['ops_status'] = registry['ops_status'].map(status_map).fillna('UNKNOWN')

print("\n--- Operational Status Distribution ---")
print(registry['ops_status'].value_counts())

print("\n✅ Stage 4.1 Complete: Legacy status codes standardized.")

### **Stage 4.2: High-Fidelity Data Enrichment (UCS Integration)**
**The Problem:** The raw SATCAT contains no physical mass data, creating a 100% Transparency Gap. Furthermore, standard `pd.read_csv` operations often infer ID columns as integers, which causes `ValueErrors` when merging against our sanitized string-based identifiers.

**The Solution:** We load `ucs_cleaned.csv` while explicitly forcing `norad_id` to a string type, then perform a targeted "Left Join" onto the registry. This reduces our mass gap from 100% to ~82.8%, a remainder composed predominantly of debris and rocket bodies, and ensures we use the best available data for active assets.
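A left join silently duplicates registry rows if a NORAD ID appears more than once on the UCS side. The sketch below uses pandas' `validate` and `indicator` options on hypothetical extracts to catch that case and to count matches. Whether keeping the first duplicate (`drop_duplicates`' default) is the right choice depends on the UCS data.

```python
import pandas as pd

# Hypothetical extracts; the UCS key "10001" is deliberately repeated
registry_demo = pd.DataFrame({
    "norad_id": ["10001", "10002", "10003"],
    "object_type": ["PAYLOAD", "DEBRIS", "ROCKET BODY"],
})
ucs_demo = pd.DataFrame({
    "norad_id": ["10001", "10001", "10004"],
    "launch_mass_kg": [500.0, 500.0, 1200.0],
})

# validate="many_to_one" raises MergeError when right-hand keys are not unique
duplicate_keys_detected = False
try:
    registry_demo.merge(ucs_demo, on="norad_id", how="left", validate="many_to_one")
except pd.errors.MergeError:
    duplicate_keys_detected = True

# One UCS row per NORAD ID, so the left join cannot duplicate registry rows
ucs_unique = ucs_demo.drop_duplicates(subset="norad_id")
enriched = registry_demo.merge(
    ucs_unique[["norad_id", "launch_mass_kg"]],
    on="norad_id",
    how="left",
    validate="many_to_one",
    indicator=True,  # adds '_merge': 'both' (matched) or 'left_only' (no UCS record)
)
print(enriched["_merge"].value_counts())
```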

In [None]:
ucs_path = '../data/clean/ucs_cleaned.csv'

# We use dtype={'norad_id': str} to prevent pandas from inferring it as an integer
ucs_clean = pd.read_csv(ucs_path, dtype={'norad_id': str})

print(f"Loading UCS reference data... Found {len(ucs_clean):,} high-fidelity records.")

# Targeted Enrichment Merge
registry['norad_id'] = registry['norad_id'].astype(str)

registry = registry.merge(
    ucs_clean[['norad_id', 'launch_mass_kg']], 
    on='norad_id', 
    how='left'
)

# Transparency Audit (The "Real" Gap Discovery)
in_orbit_mask = registry['in_orbit'] == 1
total_in_orbit = registry[in_orbit_mask].shape[0]

# Check how many we successfully enriched from UCS
known_mass_count = registry[in_orbit_mask]['launch_mass_kg'].notna().sum()
gap_count = total_in_orbit - known_mass_count
gap_percent = (gap_count / total_in_orbit) * 100

print(f"\n{'--- ENRICHED MASS TRANSPARENCY AUDIT ---':^50}")
print(f"Total In-Orbit Objects:     {total_in_orbit:,}")
print(f"UCS-Enriched (High-Fi):     {known_mass_count:,} ({100-gap_percent:.1f}%)")
print(f"Remaining 'Invisible' Gap:  {gap_count:,} ({gap_percent:.1f}%)")
print("-" * 50)

# Identify the types of objects remaining in the gap
gap_breakdown = registry[in_orbit_mask & registry['launch_mass_kg'].isna()]['object_type'].value_counts()
print("\n--- Distribution of Remaining Gap (By Object Type) ---")
print(gap_breakdown)

print(f"\n✅ Enrichment Complete. Remaining {gap_percent:.1f}% gap identified for Stage 5 proxies.")

### **Stage 5: Tiered Mass Imputation (The Proxy Engine)**
**The Problem:** Even after UCS enrichment, a ~82.8% gap remains, primarily composed of Debris and Rocket Bodies. To complete our kinetic model, these objects require a physical baseline.

**The Solution:** We initialize `proxy_mass_kg`. We preserve the high-fidelity UCS data where it exists, but fill the remaining `NaN` values using conservative averages from the **European Space Agency (ESA) Space Debris Report**.
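The tiered fill can also be written without a loop. `Series.map` turns each `object_type` into its proxy value, and `fillna` applies that value only where no reported mass exists. The sketch below reuses the proxy table from the cell that follows. The toy rows and the `mass_source` provenance column are hypothetical additions, included to show how reported and proxied mass could be kept distinguishable.

```python
import numpy as np
import pandas as pd

mass_proxies = {"ROCKET BODY": 2000.0, "PAYLOAD": 1000.0, "DEBRIS": 0.1, "UNKNOWN": 0.1}

objects = pd.DataFrame({
    "object_type": ["PAYLOAD", "PAYLOAD", "DEBRIS", "ROCKET BODY"],
    "launch_mass_kg": [850.0, np.nan, np.nan, np.nan],
})

# Reported mass wins; the category proxy only fills the gaps
objects["proxy_mass_kg"] = objects["launch_mass_kg"].fillna(
    objects["object_type"].map(mass_proxies)
)
# Hypothetical provenance flag separating reported from proxied mass
objects["mass_source"] = np.where(objects["launch_mass_kg"].notna(), "UCS", "PROXY")
print(objects)
```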

In [None]:
# Seed proxy mass with the UCS-reported launch mass where available
# Guard: if the Stage 4.2 merge was skipped, create an empty launch_mass_kg column
if 'launch_mass_kg' not in registry.columns:
    registry['launch_mass_kg'] = np.nan

registry['proxy_mass_kg'] = registry['launch_mass_kg']

# Define categorical averages based on ESA Space Debris Environment Reports
# These provide a conservative baseline for kinetic energy modeling
mass_proxies = {
    'ROCKET BODY': 2000.0,
    'PAYLOAD': 1000.0,
    'DEBRIS': 0.1,
    'UNKNOWN': 0.1
}

print("Applying Tiered Mass Imputation...")

for category, mass_val in mass_proxies.items():
    # Only fill where proxy_mass is currently missing/NaN
    mask = (registry['object_type'] == category) & (registry['proxy_mass_kg'].isna())
    registry.loc[mask, 'proxy_mass_kg'] = mass_val

proxy_density = registry[registry['in_orbit'] == 1]['proxy_mass_kg'].notna().mean()

print(f"\n--- Mass Transparency Audit ---")
print(f"Final Proxy Mass Density (In-Orbit): {proxy_density:.1%}")

if proxy_density == 1.0:
    print("\n🚀 SUCCESS: 100% Kinetic density achieved. Registry is physics-ready.")