This is a second dataset, the CelesTrak satellite catalog (SATCAT), that I would like to combine with the UCS dataset.

First, we need some imports and utility functions for classifying objects from this dataset, so that its object types at least loosely match the object types in the UCS dataset and we can compare apples to apples.

The classify_orbit function acts as a translator: it converts the time an object takes to orbit Earth (Period) into an altitude category (Class). The period thresholds follow from the laws of physics, specifically Kepler's Third Law, which ties an orbit's period to its size.
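
As a rough sanity check on those thresholds, Kepler's Third Law for a circular orbit, T = 2π·sqrt(a³/μ), can be evaluated directly. This is only a sketch: `period_minutes_from_altitude` is a hypothetical helper that is not part of this notebook, and it assumes a circular orbit, an Earth radius of 6378.137 km, and μ = 398600.4418 km³/s².

```python
import math

MU_EARTH_KM3_S2 = 398600.4418   # Earth's standard gravitational parameter (assumed constant)
EARTH_RADIUS_KM = 6378.137      # equatorial radius, used here as the reference radius (assumption)

def period_minutes_from_altitude(altitude_km: float) -> float:
    """Period in minutes of a circular orbit at the given altitude (hypothetical helper)."""
    semi_major_axis_km = EARTH_RADIUS_KM + altitude_km
    period_seconds = 2 * math.pi * math.sqrt(semi_major_axis_km ** 3 / MU_EARTH_KM3_S2)
    return period_seconds / 60

leo_ceiling = period_minutes_from_altitude(2000)    # upper edge of LEO
geo = period_minutes_from_altitude(35786)           # geostationary altitude
print(f"2,000 km altitude: {leo_ceiling:.1f} min")
print(f"35,786 km altitude: {geo:.1f} min")
```

Under these assumptions the 2,000 km LEO ceiling works out to roughly 127 minutes and the geostationary altitude to roughly 1,436 minutes, which is consistent with the 128-minute and 1400-1460-minute cut-offs used in classify_orbit.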

In [1]:
import pandas as pd
import numpy as np

ucs_data = pd.read_csv('../data/clean/ucs_cleaned.csv')
df_debris = pd.read_csv('../data/original/satcat.csv')

def classify_orbit(row):
    # look at a single row in the dataframe and grab the value in period_minutes
    period = row['period_minutes']
    
    # classify the object based on the period of time, in minutes, it takes to orbit the earth.
    if pd.isnull(period) or period == 0:
        # filter for missing/blank (nan) data. Period of 0 is physically impossible.
        return 'Unknown'
    elif period < 128:
        # Low Earth Orbit (LEO) - Crowded City
        # LEO extends up to about 2,000 km altitude; a circular orbit at that ceiling has a period of roughly 127 minutes.
        return 'LEO'
    elif 1400 <= period <= 1460:
        # Geostationary Orbit (GEO) - Single Precise Lane
        # A geostationary object sits about 35,786 km above the equator, with a period of about 1,436 minutes (one sidereal day).
        # To stay over the same spot on earth it must hold that altitude: too low and it circles faster than earth rotates,
        # too high and it drifts backwards. The 1400-1460 window leaves some tolerance around that period.
        return 'GEO'
    elif 128 <= period < 1400:
        # Medium Earth Orbit (MEO) - Wide Open Highway
        # An object in MEO sits between 2,000 km and 35,786 km and can operate at any altitude
        # in between. Popular uses for this orbit include navigation satellites.
        # Orbital period should be between roughly 2 hrs and 24 hrs.
        # Useful when a satellite can't be in LEO, because that is too low and would require larger constellations like Starlink,
        # and also can't be in GEO, because that is too high, weakens the signal, and leaves the satellite stuck over one spot.
        # MEO is just right for nav satellites: it allows a smaller constellation whose satellites circle the earth about every 12 hours.
        return 'MEO'
    elif period > 1460:
        # Periods longer than GEO: very high orbits, many of them extreme, long loops (ellipses) rather than circles.
        # An object on an elliptical orbit moves very fast near perigee and very slowly near apogee,
        # so it spends a long time over a single, high location and much less time elsewhere.
        # The classic example of this behaviour is the Molniya orbit, used by Soviet communication satellites to linger
        # over the USSR for most of each orbit. Note that a Molniya orbit has a period of about 12 hours, so this
        # period-only classifier files it under MEO rather than here.
        # This category is also a catch-all for satellites and debris boosted well above GEO to get them out of the way.
        # (A typical graveyard orbit only a few hundred km above GEO still has a period inside the 1400-1460 window.)
        # Items in LEO, by contrast, gradually lose altitude to atmospheric drag, deorbit naturally, and burn up in the atmosphere.
        return 'High Elliptical / Deep'
    else:
        # Defensive fallback: the branches above cover every numeric period, so this should not normally be reached.
        return 'Unknown'
    
def categorize_object(row):
    if row['source'] == 'both':
        # The object appears in both datasets: the active satellite catalog from UCS (the then) and the
        # SATCAT from CelesTrak, which lists ALL tracked satellites, debris, unknowns, etc. known to date (the now).
        # Appearing in both means it is treated as an active satellite.
        return 'Active Satellite'
    elif row['object_type'] == 'PAY':
        return 'Inactive Satellite' # Payload but not in UCS = Dead
    elif row['object_type'] == 'R/B':
        return 'Rocket Body' # Spent boosters
    elif row['object_type'] == 'DEB':
        return 'Debris' # Fragments/Shrapnel
    else:
        # Any other object type (for example, unidentified objects).
        return 'Unknown'
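
The row-by-row `apply` above is easy to read, but the same thresholds can also be expressed as a vectorized lookup. The sketch below is an alternative, not what this notebook runs: `classify_orbit_vectorized` is a hypothetical name, and it assumes the period column is numeric. It uses `np.select`, which picks the label of the first condition that is true for each element.

```python
import numpy as np
import pandas as pd

def classify_orbit_vectorized(period: pd.Series) -> pd.Series:
    """Vectorized equivalent of the period thresholds (hypothetical helper)."""
    conditions = [
        (period.isna() | (period == 0)).to_numpy(),        # missing or impossible period
        (period < 128).to_numpy(),                         # LEO
        period.between(1400, 1460).to_numpy(),             # GEO (inclusive on both ends)
        ((period >= 128) & (period < 1400)).to_numpy(),    # MEO
        (period > 1460).to_numpy(),                        # beyond GEO
    ]
    labels = ['Unknown', 'LEO', 'GEO', 'MEO', 'High Elliptical / Deep']
    # np.select returns the label for the first matching condition per element.
    return pd.Series(np.select(conditions, labels, default='Unknown'), index=period.index)

# Made-up periods chosen to sit on or near each threshold.
sample = pd.Series([np.nan, 0, 94.21, 127.9, 128, 718, 1436.07, 1460, 1460.5, 2000])
classes = classify_orbit_vectorized(sample)
print(pd.DataFrame({'period_minutes': sample, 'orbit_class': classes}))
```

Checking values that sit exactly on a boundary (128, 1400, 1460) is a quick way to confirm that no period falls between two categories.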

In [2]:
debris_mapping = {
    'OBJECT_NAME': 'object_name',
    'OBJECT_ID': 'object_id',          
    'NORAD_CAT_ID': 'norad_id',        
    'OBJECT_TYPE': 'object_type',      
    'OPS_STATUS_CODE': 'ops_status',   
    'OWNER': 'owner',
    'LAUNCH_DATE': 'launch_date',
    'LAUNCH_SITE': 'launch_site',
    'DECAY_DATE': 'decay_date',
    'PERIOD': 'period_minutes',
    'INCLINATION': 'inclination_degrees',
    'APOGEE': 'apogee_km',
    'PERIGEE': 'perigee_km',
    'RCS': 'rcs',                      
    'DATA_STATUS_CODE': 'data_status', 
    'ORBIT_CENTER': 'orbit_center',    
    'ORBIT_TYPE': 'orbit_type'         
}

df_debris.rename(columns=debris_mapping, inplace=True)

current_junk = df_debris[df_debris['decay_date'].isnull()].copy()
current_junk['orbit_class'] = current_junk.apply(classify_orbit, axis=1)
current_junk['launch_date'] = pd.to_datetime(current_junk['launch_date'], errors='coerce')
current_junk['launch_year'] = current_junk['launch_date'].dt.year

merged_data = current_junk.merge(
    ucs_data[['norad_id', 'users', 'purpose', 'launch_mass_kg']], 
    on='norad_id', 
    how='left', 
    indicator='source'
)

merged_data['category'] = merged_data.apply(categorize_object, axis=1)

merged_data['launch_mass_kg'] = merged_data['launch_mass_kg'].replace(0, np.nan)

print("Total Objects in Orbit:", len(merged_data))
print("\nComposition of our Skies:")
print(merged_data['category'].value_counts())
print("\nWhere is it located?")
print(merged_data['orbit_class'].value_counts())

Total Objects in Orbit: 32695

Composition of our Skies:
category
Debris                12662
Inactive Satellite    11978
Active Satellite       5610
Rocket Body            2397
Unknown                  48
Name: count, dtype: int64

Where is it located?
orbit_class
LEO                       26616
MEO                        3603
GEO                        1545
Unknown                     615
High Elliptical / Deep      316
Name: count, dtype: int64
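
The 'Active Satellite' count above depends on how the `indicator` column behaves in a left merge. The toy example below uses made-up frames (the IDs and values are placeholders, not real catalog rows) to show that rows found in both frames are tagged `both`, while rows found only in the left frame are tagged `left_only` and get `NaN` for the right-hand columns. The `validate='many_to_one'` argument is an optional extra, not used in the cell above, that raises an error if the right-hand keys are not unique.

```python
import pandas as pd

# Placeholder data standing in for the full catalog and the active-satellite list.
catalog = pd.DataFrame({'norad_id': [1, 2, 3], 'object_type': ['PAY', 'PAY', 'DEB']})
active = pd.DataFrame({'norad_id': [1], 'purpose': ['Communications']})

merged = catalog.merge(
    active,
    on='norad_id',
    how='left',               # keep every catalog row
    indicator='source',       # adds a column saying where each row was found
    validate='many_to_one',   # optional guard: fail if 'active' has duplicate norad_id values
)
source_values = merged['source'].astype(str).tolist()
print(merged)
```

With a left merge the indicator never takes the value `right_only`, so checking for `both` is enough to identify objects present in the second frame.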


In [3]:
# make a copy of the data to play around with
has_mass = merged_data.loc[merged_data['launch_mass_kg'] > 0].copy()

has_mass

Unnamed: 0,object_name,object_id,norad_id,object_type,ops_status,owner,launch_date,launch_site,decay_date,period_minutes,...,data_status,orbit_center,orbit_type,orbit_class,launch_year,users,purpose,launch_mass_kg,source,category
1674,OSCAR 7 (AO-7),1974-089B,7530,PAY,P,US,1974-11-15,AFWTR,,114.86,...,,EA,ORB,LEO,1974,Civil,Communications,29.0,both,Active Satellite
4729,TDRS 3,1988-091B,19548,PAY,+,US,1988-09-29,AFETR,,1436.23,...,,EA,ORB,GEO,1988,Government,Communications,3180.0,both,Active Satellite
4912,FLTSATCOM 8 (USA 46),1989-077A,20253,PAY,+,US,1989-09-25,AFETR,,1436.06,...,,EA,ORB,GEO,1989,Military,Communications,2310.0,both,Active Satellite
5027,HST,1990-037B,20580,PAY,+,US,1990-04-24,AFETR,,94.21,...,,EA,ORB,LEO,1990,Government,Space Science,11110.0,both,Active Satellite
5089,SKYNET 4C,1990-079A,20776,PAY,+,UK,1990-08-30,FRGUI,,1436.07,...,,EA,ORB,GEO,1990,Military,Communications,1474.0,both,Active Satellite
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22719,STARLINK-5549,2023-058AV,56360,PAY,+,US,2023-04-27,AFWTR,,95.64,...,,EA,ORB,LEO,2023,Commercial,Communications,300.0,both,Active Satellite
22720,STARLINK-5550,2023-058AW,56361,PAY,+,US,2023-04-27,AFWTR,,95.92,...,,EA,ORB,LEO,2023,Commercial,Communications,300.0,both,Active Satellite
22721,STARLINK-5972,2023-058AX,56362,PAY,+,US,2023-04-27,AFWTR,,95.64,...,,EA,ORB,LEO,2023,Commercial,Communications,300.0,both,Active Satellite
28104,FLOCK 4G-24,2025-009AE,62637,PAY,+,US,2025-01-14,AFWTR,,94.35,...,,EA,ORB,LEO,2025,Commercial,Communications,260.0,both,Active Satellite


### **Addressing the 'Mass Transparency Gap'**

During exploratory analysis, it was discovered that approximately 77% of the tracked objects in the SATCAT dataset are assigned a mass of `0.0 kg`. Since a zero-mass physical object is an impossibility in orbital mechanics, these values represent missing data rather than actual measurements.

**Our Two-Tiered Correction Strategy:**
* **Active Payloads:** For satellites present in the UCS dataset, we have already filled missing masses with medians grouped by **Class of Orbit** and **Purpose** to keep the mass attribution realistic.
* **Debris & Unlisted Objects:** For the remaining objects (primarily debris and rocket bodies), we are converting the impossible `0.0` values to `NaN` (Not a Number) to acknowledge them as missing data.

**The Impact of this Cleaning Step:**
* **Statistical Accuracy:** Standard calculations like `.mean()` or `.std()` will now automatically ignore these missing values instead of being skewed by physically impossible zeros.
* **Visualization Integrity:** Histograms and boxplots will accurately reflect the distribution of *known and estimated* masses without an artificial outlier spike at the zero line.
* **Physics Modeling:** This creates a clean starting point for future kinetic energy calculations and **RCS-Based Modeling**, where we will attempt to estimate the mass of these remaining `NaN` fragments using their radar cross-section.
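
The statistical effect described above can be seen on a tiny example. The masses below are made-up values for illustration, not rows from the dataset.

```python
import numpy as np
import pandas as pd

# Two placeholder zeros (missing data) and two real-looking masses; all values are invented.
masses = pd.Series([0.0, 0.0, 300.0, 1000.0], name='launch_mass_kg')

mean_with_zeros = masses.mean()            # zeros are averaged in as if they were measurements
cleaned = masses.replace(0, np.nan)        # same conversion the notebook applies
mean_without_zeros = cleaned.mean()        # pandas skips NaN by default

print("Mean with zeros:   ", mean_with_zeros)
print("Mean without zeros:", mean_without_zeros)
print("Known masses:      ", cleaned.count())
```

Here the mean doubles from 325.0 to 650.0 once the zeros are treated as missing, and `.count()` reports only the two known masses.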

In [4]:
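# Convert physically impossible zero masses to NaN so they are treated as missing data.
# This repeats the conversion already done in In [2]; running it again is harmless.
if 'launch_mass_kg' in merged_data.columns:
    merged_data['launch_mass_kg'] = merged_data['launch_mass_kg'].replace(0, np.nan)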

In [5]:
merged_data.to_csv('../data/clean/orbital_clutter_cleaned.csv', index=False)

print("File saved successfully!")

File saved successfully!


Future considerations: Consider finding orbital data for naturally occurring objects that did not originate from Earth.
This includes asteroids, comets, and other 'space rocks' that could be hazardous to satellites orbiting at high speeds.