# Launches Data Preprocessing Notebook 🚀👽🏰

<div class="alert alert-block alert-info">
⚠️ Download the data before running this notebook. Check the documentation.
</div>

## Data Loading 📥

In this cell, we load the raw launches dataset:

- **Launches Data:** Loaded from a JSON file.

In [16]:
import os
import json

# ============================================================
# SETUP: Define output directory relative to this script
# ============================================================
# Get the absolute path of the directory where this script is located
# In a notebook, __file__ is not defined so we use os.getcwd() as a fallback.
try:
    BASE_DIR = os.path.dirname(os.path.abspath(__file__))
except NameError:
    BASE_DIR = os.getcwd()

# Define the folder where raw data is stored (assumed to be "../data/raw")
RAW_DIR = os.path.join(BASE_DIR, "..", "data", "raw")

# Define the folder where processed data will be saved (assumed to be "../data/processed")
PROCESSED_DIR = os.path.join(BASE_DIR, "..", "data", "processed")
os.makedirs(PROCESSED_DIR, exist_ok=True)  # Create the folder if it doesn't exist
OUTPUT_FILE = os.path.join(PROCESSED_DIR, "spacedevs_launches_processed.json") # Build the absolute path for the output file

# Build the absolute path for the raw launches dataset
launches_path = os.path.join(RAW_DIR, "spacedevs_launches.json")

# Load the Space Devs launches data from JSON using the absolute path
with open(launches_path, "r", encoding="utf-8") as f:
    launches = json.load(f)

# Extract the list of launches; ignore the "offset" for cleaning purposes.
results = launches.get("results", [])
print(f"Loaded {len(results)} launches from raw data.")

print("✅ Datasets loaded!")

Loaded 2700 launches from raw data.
✅ Datasets loaded!


## Fixing Datetime Format for Launches Data 📆📍

In [17]:
from datetime import datetime

def convert_datetime(iso_str):
    """
    Convert an ISO datetime string like "2006-03-24T22:30:00.000Z"
    to the format "MM/DD/YYYY HH:MM".
    """
    try:
        dt = datetime.strptime(iso_str, "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        # Fallback if microseconds are not provided:
        dt = datetime.strptime(iso_str, "%Y-%m-%dT%H:%M:%SZ")
    return dt.strftime("%m/%d/%Y %H:%M")
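
As a quick check of the conversion above, the sketch below repeats the function so it runs on its own and calls it on two sample timestamps, one with fractional seconds and one without. The sample strings are illustrative, not taken from the dataset. Note that the output format drops both the seconds and the trailing `Z` (UTC) marker.

```python
from datetime import datetime

def convert_datetime(iso_str):
    """Convert an ISO datetime string to "MM/DD/YYYY HH:MM"."""
    try:
        dt = datetime.strptime(iso_str, "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        # Fallback if fractional seconds are not provided
        dt = datetime.strptime(iso_str, "%Y-%m-%dT%H:%M:%SZ")
    return dt.strftime("%m/%d/%Y %H:%M")

# Illustrative inputs (assumed, not read from the raw file)
with_fraction = convert_datetime("2006-03-24T22:30:00.000Z")
without_fraction = convert_datetime("2006-03-24T22:30:00Z")

print(with_fraction)     # 03/24/2006 22:30
print(without_fraction)  # 03/24/2006 22:30
```

Both input variants produce the same string, so downstream code does not need to know which form the raw record used.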

## Selecting Only Relevant Variables From Raw Data 📆📍

In [18]:
# Create list to append processed launches
processed_launches = []

for launch in results:
    processed = {}
    processed["id"] = launch.get("id")
    processed["name"] = launch.get("name")
    
    net = launch.get("net")
    processed["net"] = convert_datetime(net) if net else None
    
    # Extract the status abbreviation from the nested status object, using a fallback empty dict if needed.
    status = launch.get("status") or {}
    processed["status"] = status.get("abbrev")
    
    # Extract rocket id from the nested rocket object.
    rocket = launch.get("rocket") or {}
    processed["rocket_id"] = rocket.get("id")
    
    # Extract pad details: id, name, latitude, and longitude.
    pad = launch.get("pad") or {}
    processed["pad"] = {
        "id": pad.get("id"),
        "name": pad.get("name"),
        "latitude": pad.get("latitude"),
        "longitude": pad.get("longitude")
    }
    
    # Extract mission details: name, type, and description.
    mission = launch.get("mission") or {}
    processed["mission"] = {
        "name": mission.get("name"),
        "type": mission.get("type"),
        "description": mission.get("description")
    }
    
    # Extract launch service provider info.
    lsp = launch.get("launch_service_provider") or {}
    processed["launch_service_provider"] = {
        "id": lsp.get("id"),
        "name": lsp.get("name")
    }
    
    # Keep the URL for more info.
    processed["url"] = launch.get("url")
    
    processed_launches.append(processed)

print(f"✅ Preprocessed {len(processed_launches)} launches!")

✅ Preprocessed 2700 launches!
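
The loop above uses `launch.get("key") or {}` rather than `launch.get("key", {})`. A minimal sketch of why that matters, assuming a record can carry a nested field whose value is `null` in the raw JSON (the record below is hypothetical, not taken from the dataset):

```python
# Hypothetical record: the "mission" key exists but its value is null (None in Python)
launch = {"id": "hypothetical-1", "mission": None}

# A default passed to .get() only applies when the key is missing,
# so this still returns None and a later .get("name") would raise AttributeError.
mission_default = launch.get("mission", {})

# "or {}" also covers the case where the key is present but empty or None.
mission_safe = launch.get("mission") or {}
name = mission_safe.get("name")

print(mission_default)  # None
print(mission_safe)     # {}
print(name)             # None
```

With the `or {}` form, a missing or null nested object simply yields `None` for each extracted field instead of stopping the loop.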


## Saving the Processed Launches and Launchpads Data 💾

In [19]:
# Save the updated launches back to a JSON file (no spaces, no indentation)
with open(OUTPUT_FILE, "w") as f:
    json.dump(processed_launches, f, separators=(',', ':'))

print("💾 Saved processed launches dataset!")

💾 Saved processed launches dataset!
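
To illustrate what the compact `separators=(',', ':')` setting does, here is a small sketch on a made-up record (the sample data is an assumption for illustration only). It compares the default and compact encodings and confirms that the compact form loads back to the same structure.

```python
import json

# Made-up sample shaped loosely like one processed launch
sample = [{"id": "abc", "pad": {"latitude": "28.5", "longitude": "-80.5"}}]

default_text = json.dumps(sample)                         # default separators: ", " and ": "
compact_text = json.dumps(sample, separators=(",", ":"))  # no whitespace after separators

# The compact text parses back to an identical structure
round_trip = json.loads(compact_text)

print(len(default_text), len(compact_text))
print(round_trip == sample)  # True
```

The compact form is harder to read by eye but smaller on disk, which is a reasonable trade-off for a file meant to be consumed by other code.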


## Making Some Preliminary Analysis on the Processed Launches 📉