# Data Pipeline Refresh Playbook

**Purpose:** Re-run source ingestion and keep derived datasets/docs in sync when upstream CSVs change.

**Checklist:**
1) Pull latest raw sources into `data/original` (SATCAT, UCS, countries.geojson).
2) Quick diff: row counts + schema drift; snapshot versions/date ranges.
3) Re-run cleaning: `01_ucs_cleanup` → `02_satcat_cleanup` → `03_orbital_risk_synthesis`.
4) Regenerate outputs: `ucs_cleaned.csv`, `satcat_cleaned.csv`, `kinetic_master.csv`.
5) Refresh visuals: rerun plotting cells to update `/images` exports.
6) Update docs: README stats (objects, mass, KE, zombies, velocity) + figure captions if changed.
7) Log run metadata (source dates, hashes) in this notebook for traceability.
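
For step 2, a minimal sketch of the quick diff, assuming both the previous and the freshly pulled file are UTF-8 CSVs with a single header row. The helper names `csv_profile` and `quick_diff` are hypothetical and not part of the existing notebooks; it uses the standard `csv` module so it runs without loading the full file into memory.

```python
import csv


def csv_profile(path):
    """Return (row_count, column_names) for a CSV file; the header row is not counted."""
    # Assumption: UTF-8 text with one header row.
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return row_count, header


def quick_diff(old_path, new_path):
    """Compare two versions of a CSV: row-count change plus added/removed columns."""
    old_rows, old_cols = csv_profile(old_path)
    new_rows, new_cols = csv_profile(new_path)
    return {
        "row_delta": new_rows - old_rows,
        "added_columns": [c for c in new_cols if c not in old_cols],
        "removed_columns": [c for c in old_cols if c not in new_cols],
    }
```

A non-empty `added_columns` or `removed_columns` list is a signal to review the cleaning notebooks before re-running them.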

**Next step:** wire a small automation cell here to run the above sequence end-to-end.
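
One possible shape for that automation cell, covering only the cleaning sequence from step 3. This is a sketch under assumptions: the three notebooks are taken to be `.ipynb` files in a single directory, `jupyter nbconvert` is taken to be on the PATH, and the names `build_commands` and `run_pipeline` are hypothetical. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Notebook order taken from step 3 of the checklist.
NOTEBOOKS = ["01_ucs_cleanup", "02_satcat_cleanup", "03_orbital_risk_synthesis"]


def build_commands(notebooks=NOTEBOOKS, notebook_dir="."):
    """Build one `jupyter nbconvert` command per notebook, preserving order."""
    # Assumption: each notebook lives at <notebook_dir>/<name>.ipynb.
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
         f"{notebook_dir}/{name}.ipynb"]
        for name in notebooks
    ]


def run_pipeline(dry_run=True, notebook_dir="."):
    """Print each command; execute it only when dry_run is False."""
    commands = build_commands(notebook_dir=notebook_dir)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing notebook.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_pipeline(dry_run=False)` would execute the notebooks in place, so later steps always read the outputs written by earlier ones.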

In [1]:
import os
import requests
import pandas as pd
from bs4 import BeautifulSoup
from urllib.parse import urljoin

DATA_DIR = "../data/original"

# create the data/original folder if it doesn't already exist
os.makedirs(DATA_DIR, exist_ok=True)

# print the resolved absolute path so it can be verified
print(f"📂 Saving data to: {os.path.abspath(DATA_DIR)}")

📂 Saving data to: d:\repos\orbital-debris-assessment\data\original


### Fetch CelesTrak

Download the **CelesTrak** satellite catalog (`satcat.csv`) into `data/original`.

In [2]:
def fetch_celestrak():
    """
    Download the CelesTrak satellite catalog and overwrite the local
    copy of satcat.csv in DATA_DIR.
    """
    print("--- Fetching CelesTrak (SATCAT) ---")
    url = "https://celestrak.org/pub/satcat.csv"

    # build the destination path inside DATA_DIR
    save_path = os.path.join(DATA_DIR, "satcat.csv")

    try:
        # stream the download so a large file is never held fully in memory;
        # the timeout keeps the cell from hanging on a stalled connection
        response = requests.get(url, stream=True, timeout=30)
        
        # raise an HTTPError if the server returned a 4xx/5xx status
        response.raise_for_status()
        
        # no error was raised, so it is safe to write the file to disk
        with open(save_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=1024):
                f.write(chunk)

        # report where the file was saved
        print(f"✅ Success! SATCAT saved to: {save_path}")
    except Exception as e:
        # report the error message
        print(f"❌ Error downloading CelesTrak: {e}")

In [3]:
fetch_celestrak()

--- Fetching CelesTrak (SATCAT) ---
✅ Success! SATCAT saved to: ../data/original\satcat.csv
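
For the traceability step in the checklist (logging source dates and hashes), a hedged sketch of how a downloaded file could be fingerprinted. The helper names `file_sha256` and `build_run_record` and the shape of the record are assumptions, not something the notebook already defines.

```python
import hashlib
import os
from datetime import datetime, timezone


def file_sha256(path, chunk_size=8192):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_run_record(paths):
    """Collect size and hash for each source file, plus a UTC timestamp."""
    return {
        "logged_at_utc": datetime.now(timezone.utc).isoformat(),
        "files": [
            {"path": p, "bytes": os.path.getsize(p), "sha256": file_sha256(p)}
            for p in paths
        ],
    }
```

In this notebook, something like `build_run_record([os.path.join(DATA_DIR, "satcat.csv")])` could be printed or saved after each fetch, so that two runs can be compared by hash rather than by eye.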
