# Input Data Processing

This notebook does the following:

1. Read the JSON catalog from Space-Track.org.
2. Read the SATCAT CSV catalog from CelesTrak.
3. From these data, estimate each object's mass and radius (0.5 * characteristic length) and determine its activity state.
4. Propagate all satellites to the same epoch.
5. Inspect the data, remove unwanted entries, and store the result.

## Input files
- CSV data from the [CelesTrak SATCAT catalog](https://celestrak.com/pub/satcat.csv) following this [format](https://celestrak.com/satcat/satcat-format.php)
- JSON data from the [Space-Track.org JSON Full catalog](https://www.space-track.org/basicspacedata/query/class/gp/decay_date/null-val/epoch/%3Enow-30/orderby/norad_cat_id/format/json) following their [format](https://www.space-track.org/documentation#/tle)

## Output files

- Satellite data in CSV format with, for each object, its ID, COSPAR ID, name, position, velocity, mass, radius (0.5 * characteristic length), BSTAR and activity state

In [None]:
### Imports
%load_ext autoreload
%autoreload 2

# Append main folder
import sys
sys.path.append("../")

import pykep as pk
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from tqdm import tqdm 

starting_t = pk.epoch_from_string('2022-01-01 00:00:00.000')
lower_cutoff_in_km = 6371 + 175   # Earth radius + 175 km minimum altitude
higher_cutoff_in_km = 6371 + 2000 # Earth radius + 2000 km maximum altitude
np.random.seed(42)

## 1. Read JSON data

In [None]:
import json
with open("../data/spacetrack.json", "r") as file:
    satellites = json.load(file)
print("Number of satellites in JSON:", len(satellites))

## 2. Read SATCAT data

In [None]:
# Read satcat data from celestrak
satcat = pd.read_csv("../data/satcat.csv")
satcat
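
The loop in step 3 filters the whole `satcat` DataFrame once per satellite, so its cost grows with the product of both catalog sizes. A minimal sketch, assuming that keeping the first SATCAT row per COSPAR ID is acceptable, builds a keyed index once so each lookup is a hash lookup instead of a full scan. The helper name `build_satcat_index` and the toy rows are hypothetical.

```python
import pandas as pd

def build_satcat_index(satcat: pd.DataFrame) -> pd.DataFrame:
    """Index the SATCAT table by COSPAR_ID, keeping the first row per ID."""
    return satcat.drop_duplicates(subset="COSPAR_ID").set_index("COSPAR_ID")

# Toy rows standing in for satcat.csv (hypothetical values)
toy_satcat = pd.DataFrame({
    "COSPAR_ID": ["2000-001A", "2000-001B", "2000-001A"],
    "OPS_STATUS_CODE": ["+", "D", "+"],
    "RCS": [12.0, 0.05, 12.0],
})
index = build_satcat_index(toy_satcat)

cospar_id = "2000-001B"
if cospar_id in index.index:
    row = index.loc[cospar_id]  # a Series with one value per SATCAT column
    print(row["OPS_STATUS_CODE"], row["RCS"])
```

Inside the loop, `cospar_id in index.index` would replace the `len(satcat_sat) == 0` check.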

## 3. Compute mass and radius (0.5 * characteristic length) and get activity status

This follows the formulas from:

N. L. Johnson, P. H. Krisko, J.-C. Liou, and P. D. Anz-Meador. NASA's new breakup model of EVOLVE 4.0. *Advances in Space Research*, 28(9):1377–1384, 2001.
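
The mass estimate used in the loop below treats each object as a sphere of radius r (half the characteristic length) with a size-dependent density: ρ = 92.937 · (2r)^(-0.74) kg/m³ above the 1 cm radius threshold applied in the code, and a constant 2698.9 kg/m³ (the density of aluminium) below it. A minimal sketch of that model as a standalone function, mirroring the constants and the `r > 0.01` threshold of the notebook code; the name `estimate_mass` is hypothetical.

```python
import math

def estimate_mass(radius_m: float) -> float:
    """Mass [kg] of a sphere of radius `radius_m` [m] with a size-dependent density."""
    volume = 4.0 / 3.0 * math.pi * radius_m**3
    if radius_m > 0.01:
        # density decreases with diameter d = 2r: 92.937 * d^-0.74 kg/m^3
        density = 92.937 * (2.0 * radius_m) ** (-0.74)
    else:
        # small objects: constant density of 2698.9 kg/m^3
        density = 2698.9
    return volume * density

print(estimate_mass(0.005), estimate_mass(0.5))
```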

According to Space-Track, the RCS size categories SMALL, MEDIUM and LARGE correspond to RCS < 0.1 m², 0.1 m² < RCS < 1.0 m² and RCS > 1.0 m², respectively. For simplicity, using the formula above, we convert these to 15 cm, 55 cm and 200 cm.
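
In the notebook code, a numeric SATCAT RCS is turned into a radius by treating the RCS as the area of a disk, r = sqrt(RCS / π). When no numeric RCS is available, an RCS is drawn from a normal distribution with the mean and standard deviation of the object's Space-Track size category and clipped from below before the conversion. A minimal sketch of that logic; the function names, the `rng` argument (a NumPy `Generator` rather than the global `np.random.seed` the notebook uses) and the example statistics are assumptions.

```python
import numpy as np

# Lower RCS bounds [m^2] per size category, matching the clipping in the loop
RCS_LOWER_BOUND = {"SMALL": np.pi * 0.005**2, "MEDIUM": 0.1, "LARGE": 1.0}

def radius_from_rcs(rcs_m2: float) -> float:
    """Radius [m] of a disk whose area equals the RCS [m^2]."""
    return float(np.sqrt(rcs_m2 / np.pi))

def sample_radius(size: str, mean: float, std: float, rng: np.random.Generator) -> float:
    """Sample an RCS for a size category, clip it from below, convert it to a radius."""
    rcs = max(RCS_LOWER_BOUND[size], rng.normal(mean, std))
    return radius_from_rcs(rcs)

rng = np.random.default_rng(42)
# Hypothetical category statistics, for illustration only
print(sample_radius("MEDIUM", mean=0.4, std=0.25, rng=rng))
```

Clipping in RCS space is equivalent to the notebook's clipping of r² at 0.005² for SMALL objects, since both bound the radius at 5 mm.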

The activity status is taken from the CelesTrak SATCAT operational status codes, following the [CelesTrak status documentation](https://celestrak.com/satcat/status.php).
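
The loop below labels an object `evasive` when its SATCAT operational status code is one of `+`, `P`, `B`, `S` or `X`, and `passive` otherwise; objects with code `D` (decayed) are skipped before this step. A minimal standalone version of that rule; the function name `classify_status` is hypothetical, and the meaning of each code should be checked against the CelesTrak status page.

```python
from typing import Optional

EVASIVE_CODES = {"+", "P", "B", "S", "X"}  # codes labelled "evasive" in the loop

def classify_status(ops_status_code: Optional[str]) -> Optional[str]:
    """Return 'evasive', 'passive', or None for decayed objects (code 'D')."""
    if ops_status_code == "D":
        return None  # decayed: excluded from the population
    return "evasive" if ops_status_code in EVASIVE_CODES else "passive"

for code in ["+", "-", "D", None]:
    print(repr(code), "->", classify_status(code))
```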

In [None]:
# Mean and standard deviation of the SATCAT RCS [m^2] in each Space-Track size category
small_RCS = satcat[satcat["RCS"] < 0.1]
small_mean = small_RCS["RCS"].mean()
small_std = small_RCS["RCS"].std()
print("Small RCS mean/std =", small_mean, "/", small_std)

# Boundaries use >= so that values of exactly 0.1 and 1.0 are not dropped
medium_RCS = satcat[(satcat["RCS"] >= 0.1) & (satcat["RCS"] < 1.0)]
medium_mean = medium_RCS["RCS"].mean()
medium_std = medium_RCS["RCS"].std()
print("Medium RCS mean/std =", medium_mean, "/", medium_std)

large_RCS = satcat[satcat["RCS"] >= 1.0]
large_mean = large_RCS["RCS"].mean()
large_std = large_RCS["RCS"].std()
print("Large RCS mean/std =", large_mean, "/", large_std)
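
A compact alternative computes the statistics of all three categories in one pass with `pd.cut` and `groupby`, assuming half-open bins [0, 0.1), [0.1, 1.0) and [1.0, ∞) m². This is a sketch rather than the notebook's method; the function name `rcs_stats_by_category` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

def rcs_stats_by_category(rcs: pd.Series) -> pd.DataFrame:
    """Mean and std of RCS [m^2] per size category (half-open bins, assumed)."""
    categories = pd.cut(
        rcs,
        bins=[0.0, 0.1, 1.0, np.inf],
        right=False,  # [0, 0.1), [0.1, 1.0), [1.0, inf)
        labels=["SMALL", "MEDIUM", "LARGE"],
    )
    # NaN RCS values get a NaN category and are dropped by groupby
    return rcs.groupby(categories, observed=True).agg(["mean", "std"])

toy = pd.Series([0.05, 0.07, 0.5, 0.7, 2.0, 4.0, np.nan], name="RCS")
print(rcs_stats_by_category(toy))
```

On the real catalog this would be called as `rcs_stats_by_category(satcat["RCS"])`.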

In [None]:
sats_with_info = []
use_TLE = True  # True: build orbits from the TLE lines; False: from the Keplerian mean elements
sampled_radii = []

for sat in tqdm(satellites):
    
    satcat_sat = satcat[satcat["COSPAR_ID"] == sat["COSPAR_ID"]]
    
    # Skip objects not in CelesTrak and those marked as decayed ("D")
    if len(satcat_sat) == 0 or satcat_sat["OPS_STATUS_CODE"].iloc[0] == "D":
        continue
    
    # Radius: derived from the SATCAT RCS if available; otherwise sample an RCS from a
    # normal distribution for the Space-Track size category, clipped to a lower bound.
    rcs = satcat_sat["RCS"].iloc[0]
    if not np.isnan(rcs):
        sat["RADIUS"] = np.sqrt(float(rcs) / np.pi)
    else:
        if sat["RCS_SIZE"] == "SMALL":
            sat["RADIUS"] = np.sqrt(np.maximum(0.005**2,np.random.normal(small_mean,small_std) / np.pi))
        elif sat["RCS_SIZE"] == "MEDIUM":
            sat["RADIUS"] = np.sqrt(np.maximum(0.1 / np.pi,np.random.normal(medium_mean,medium_std) / np.pi))
        elif sat["RCS_SIZE"] == "LARGE":
            sat["RADIUS"] = np.sqrt(np.maximum(1.0 / np.pi,np.random.normal(large_mean,large_std) / np.pi))
        else:
            # skip if no info was found
            continue
        sampled_radii.append(sat["RADIUS"])
            
    # Determine Mass
    if sat["RADIUS"] > 0.01:
        sat["MASS"] = 4 / 3 * np.pi *(sat["RADIUS"])**3 * 92.937 * (sat["RADIUS"]*2)**(-0.74)
    else:
        sat["MASS"] = 4 / 3 * np.pi *(sat["RADIUS"])**3 * 2698.9
        
        
    # Determine if active satellite
    if satcat_sat["OPS_STATUS_CODE"].iloc[0] in ["+","P","B","S","X"]:
        sat["TYPE"] = "evasive"
    else:
        sat["TYPE"] = "passive"
    
    # Build the pykep orbit model ("planet") for this object
    t0 = pk.epoch_from_string(sat["EPOCH"].replace("T"," "))
    if use_TLE:
        try:
            line1 = sat["TLE_LINE1"]
            line2 = sat["TLE_LINE2"]
            planet = pk.planet.tle(line1, line2)
        except RuntimeError:
            print("Error reading \n",line1,"\n",line2)
            continue  # skip: otherwise the previous object's planet would be reused
    else:
        elements = [float(sat["SEMIMAJOR_AXIS"]) * 1000.,
                    float(sat["ECCENTRICITY"]),
                    float(sat["INCLINATION"]) * pk.DEG2RAD,
                    float(sat["RA_OF_ASC_NODE"]) * pk.DEG2RAD,
                    float(sat["ARG_OF_PERICENTER"]) * pk.DEG2RAD,
                    float(sat["MEAN_ANOMALY"]) * pk.DEG2RAD,
                   ]
        planet = pk.planet.keplerian(t0,elements,pk.MU_EARTH,6.67430e-11*sat["MASS"],sat["RADIUS"] / 2,sat["RADIUS"] / 2)
    sat["PLANET"] = planet
    
    sats_with_info.append(sat)
    
print("Objects with complete information:", len(sats_with_info))
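
When `use_TLE` is `False`, the loop builds a Keplerian orbit from the mean elements in the Space-Track record, converting kilometres to metres and degrees to radians. A small sketch of just that conversion, assuming SEMIMAJOR_AXIS is in km and the angles in degrees (as the loop's conversion factors imply); the function `gp_to_elements` and the sample record are hypothetical.

```python
import math

def gp_to_elements(record: dict) -> list:
    """[a [m], e, i, RAAN, arg. of pericenter, mean anomaly [rad]] from a GP record."""
    return [
        float(record["SEMIMAJOR_AXIS"]) * 1000.0,         # km -> m
        float(record["ECCENTRICITY"]),
        math.radians(float(record["INCLINATION"])),
        math.radians(float(record["RA_OF_ASC_NODE"])),
        math.radians(float(record["ARG_OF_PERICENTER"])),
        math.radians(float(record["MEAN_ANOMALY"])),
    ]

# Hypothetical record; float() accepts both numbers and numeric strings
record = {"SEMIMAJOR_AXIS": "6878.137", "ECCENTRICITY": "0.0001",
          "INCLINATION": "97.5", "RA_OF_ASC_NODE": "180.0",
          "ARG_OF_PERICENTER": "90.0", "MEAN_ANOMALY": "45.0"}
print(gp_to_elements(record))
```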

### Plot the radius distribution and some example orbits

In [None]:
fig = plt.figure(figsize=(6,6),dpi=100)
satcat["RCS"].hist(log=True,bins=100)
# plt.xscale("log")
plt.xlabel("RCS [m^2]");
plt.ylabel("Count");

In [None]:
sampled_rcs = np.pi * np.array(sampled_radii) ** 2  # back to RCS [m^2]
fig = plt.figure(figsize=(6,6),dpi=100)
plt.hist(sampled_rcs,log=True,bins=100);
# plt.xscale("log")
plt.xlabel("RCS [m^2]");
plt.ylabel("Count");

In [None]:
# Log-spaced bin edges between the smallest and largest sampled radius (100 bins)
logbins = np.logspace(np.log10(np.min(sampled_radii)), np.log10(np.max(sampled_radii)), 101)
fig = plt.figure(figsize=(6,6),dpi=100)
plt.hist(sampled_radii,log=True,bins=logbins);
plt.xscale("log")
plt.xlabel("Radius [m]");
plt.ylabel("Count");
plt.xticks([0.005,0.1,1.0,10]);
print("Maximum sampled radius [m]:", np.max(sampled_radii))
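
The log-spaced bin edges can also be computed with `np.geomspace(a, b, n)`, which returns the same edges as `np.logspace(np.log10(a), np.log10(b), n)`. A small check with toy radii (hypothetical values):

```python
import numpy as np

radii = np.array([0.006, 0.02, 0.15, 0.4, 1.3, 5.0])  # toy radii [m]
n_edges = 101                                          # 100 bins

log_edges = np.geomspace(radii.min(), radii.max(), n_edges)
reference = np.logspace(np.log10(radii.min()), np.log10(radii.max()), n_edges)
print(np.allclose(log_edges, reference))
```

In the notebook this would read `plt.hist(sampled_radii, log=True, bins=np.geomspace(np.min(sampled_radii), np.max(sampled_radii), 101))`.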

In [None]:
fig = plt.figure(figsize=(6,6),dpi=100)
ax = plt.axes(projection='3d');
# Plot the orbits of the first 10 objects
for i in range(10):
    pk.orbit_plots.plot_planet(sats_with_info[i]["PLANET"],axes=ax)

## 4. Propagate all objects to `starting_t` and discard those that are too low or too high

In [None]:
objects = []
altitudes = []
count_too_low = 0
count_too_high = 0
count_decayed = 0
for sat in sats_with_info:
    try:
        planet = sat["PLANET"]
        pos,v = planet.eph(starting_t)
        
        # convert to km and numpy arrays
        pos = np.asarray(pos) / 1000.0
        v = np.asarray(v) / 1000.0
        sma,_,_,_,_,_ = pk.ic2par(pos * 1000,v *1000,mu=pk.MU_EARTH)

        # distance from Earth's centre [km] (converted to altitude in step 5)
        r_norm = np.linalg.norm(pos)
        if r_norm < lower_cutoff_in_km:
            count_too_low += 1
            continue
        # too high if the semi-major axis or the current distance exceeds the upper cutoff
        if sma / 1000. > higher_cutoff_in_km or r_norm > higher_cutoff_in_km:
            count_too_high += 1
            continue

        altitudes.append(r_norm)
        
        objects.append({"COSPAR_ID": sat["COSPAR_ID"],
                        "NAME": sat["OBJECT_NAME"],
                        "BSTAR[1 / Earth Radius]": sat["BSTAR"],
                        "R": tuple(pos),
                        "V": tuple(v),
                        "M[kg]": sat["MASS"],
                        "RADIUS[m]": sat["RADIUS"],
                        "TYPE": sat["TYPE"]
                       })
    except RuntimeError as e:
        count_decayed += 1
        print(e, " propagating ",planet.name)
        
print("Successfully propagated ",len(objects)," objects.")
print(count_decayed, "decayed.")
print(count_too_low, "were below the lower altitude cutoff.")
print(count_too_high, "were above the upper altitude cutoff.")
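
The altitude filter uses two quantities: the current distance from Earth's centre, |r|, and the semi-major axis a returned by `pk.ic2par`. For a bound orbit the latter follows from the vis-viva equation, a = 1 / (2/|r| - |v|²/μ). A minimal NumPy sketch in km and km/s, assuming μ_Earth ≈ 398600.4418 km³/s² (pykep's `pk.MU_EARTH` is in SI units, m³/s²); the function name `semi_major_axis_km` is hypothetical.

```python
import numpy as np

MU_EARTH_KM3_S2 = 398600.4418  # Earth's gravitational parameter [km^3/s^2]

def semi_major_axis_km(r_km, v_km_s, mu=MU_EARTH_KM3_S2):
    """Semi-major axis [km] from a Cartesian state via the vis-viva equation."""
    r = np.linalg.norm(r_km)
    v = np.linalg.norm(v_km_s)
    return 1.0 / (2.0 / r - v**2 / mu)

# Circular orbit at 7000 km: v = sqrt(mu / r), so a equals r
r_vec = np.array([7000.0, 0.0, 0.0])
v_vec = np.array([0.0, np.sqrt(MU_EARTH_KM3_S2 / 7000.0), 0.0])
print(semi_major_axis_km(r_vec, v_vec))
```

An object currently inside the band can still have a semi-major axis above the upper cutoff, which is why the notebook checks both values.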

## 5. Plot, clean up and store results

In [None]:
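altitudes = np.array(altitudes) - 6371  # distance from Earth's centre [km] -> altitude [km]
print("Minimum altitude [km]:", altitudes.min())
print("Maximum altitude [km]:", altitudes.max())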

In [None]:
fig = plt.figure(figsize=(6,6),dpi=100)
plt.hist(altitudes,bins=100);
plt.xlabel("Altitude [km]")
plt.ylabel("Counts")
plt.yscale("log")

In [None]:
fig = plt.figure(figsize=(6,6),dpi=100)
ax = plt.axes(projection='3d');

positions = np.array([obj["R"] for obj in objects])
velocities = np.array([obj["V"] for obj in objects])

ax.scatter(positions[:,0], positions[:,1], positions[:,2], marker=".", alpha=0.25)

In [None]:
# Convert to a pandas DataFrame; drop the ISS, Starlink and OneWeb satellites
# and any entries with duplicate positions
df = pd.DataFrame(objects)
df = df[df["NAME"] != "ISS (ZARYA)"]
df = df[~df["NAME"].str.startswith("STARLINK", na=False)]
df = df[~df["NAME"].str.startswith("ONEWEB", na=False)]
df = df.drop_duplicates(subset=["R"])
df = df.reset_index(drop=True)
df.index.name = "ID"
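
The exclusions above can also be written as a single boolean mask. A sketch, assuming the same excluded name and prefixes as the cell above; `na=False` treats missing names as non-matching, and a mask-based filter leaves the frame unchanged when none of the names occur. The helper `filter_objects` and the toy rows are hypothetical.

```python
import pandas as pd

EXCLUDED_NAMES = {"ISS (ZARYA)"}
EXCLUDED_PREFIXES = ("STARLINK", "ONEWEB")

def filter_objects(df: pd.DataFrame) -> pd.DataFrame:
    """Drop excluded names/prefixes and duplicate positions, then renumber rows."""
    names = df["NAME"]
    keep = ~names.isin(EXCLUDED_NAMES) & ~names.str.startswith(EXCLUDED_PREFIXES, na=False)
    out = df[keep].drop_duplicates(subset=["R"]).reset_index(drop=True)
    out.index.name = "ID"
    return out

toy = pd.DataFrame({
    "NAME": ["ISS (ZARYA)", "STARLINK-1007", "ONEWEB-0012", "DEB A", "DEB B"],
    "R": [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0),
          (4.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
})
print(filter_objects(toy))
```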

In [None]:
# Split the position and velocity tuples into scalar columns, keeping the row labels
split_df = pd.DataFrame(df['R'].tolist(), columns=['r_x[km]', 'r_y[km]', 'r_z[km]'], index=df.index)
df = pd.concat([df, split_df], axis=1)

split_df_v = pd.DataFrame(df['V'].tolist(), columns=['v_x[km/s]', 'v_y[km/s]', 'v_z[km/s]'], index=df.index)
df = pd.concat([df, split_df_v], axis=1)

df = df.drop(columns=["R", "V"])
df.index.name = "ID"

# display df
df
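
`pd.concat(..., axis=1)` aligns rows by index, so the split columns only line up with the right objects when both frames share the same row labels. A sketch that passes `index=df.index` explicitly, so the split also works on a frame with arbitrary row labels; `split_vector_column` is a hypothetical helper.

```python
import pandas as pd

def split_vector_column(df: pd.DataFrame, column: str, names: list) -> pd.DataFrame:
    """Replace a column of 3-tuples by three scalar columns, keeping row labels."""
    parts = pd.DataFrame(df[column].tolist(), columns=names, index=df.index)
    return pd.concat([df.drop(columns=column), parts], axis=1)

toy = pd.DataFrame({"NAME": ["A", "B"],
                    "R": [(7000.0, 0.0, 0.0), (0.0, 7100.0, 0.0)]},
                   index=[10, 42])  # non-default row labels
out = split_vector_column(toy, "R", ["r_x[km]", "r_y[km]", "r_z[km]"])
print(out)
```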

In [None]:
# Write to csv
df.to_csv("../data/initial_population.csv")
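
The stored file can be read back with `index_col="ID"` and the position and velocity columns regrouped into NumPy arrays. A sketch using an in-memory CSV with column names in the style written by this notebook; the two rows are hypothetical values, and the columns are selected by prefix so the code does not depend on the unit suffix in the header.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical two-row excerpt of initial_population.csv
csv_text = (
    "ID,NAME,r_x[km],r_y[km],r_z[km],v_x[km/s],v_y[km/s],v_z[km/s]\n"
    "0,OBJ A,7000.0,0.0,0.0,0.0,7.5,0.0\n"
    "1,OBJ B,0.0,7100.0,0.0,-7.4,0.0,0.0\n"
)
pop = pd.read_csv(io.StringIO(csv_text), index_col="ID")

positions = pop.filter(regex=r"^r_[xyz]").to_numpy()   # shape (N, 3), km
velocities = pop.filter(regex=r"^v_[xyz]").to_numpy()  # shape (N, 3)
print(positions.shape, np.linalg.norm(positions, axis=1))
```

With the real file, `io.StringIO(csv_text)` would be replaced by `"../data/initial_population.csv"`.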