# Launches Data Preprocessing Notebook 🚀👽🏰

> [!WARNING]
> Download the data before running this notebook. Check the documentation.

## Data Loading 📥

In this cell, we load the raw launch datasets:

- **Launches Data:** Loaded from a JSON file.
- **Launchpads Data:** Loaded from a JSON file.

In [4]:
import pandas as pd
import numpy as np
import os
import json

# ============================================================
# SETUP: Define output directory relative to this script
# ============================================================
# Get the absolute path of the directory where this script is located
# In a notebook, __file__ is not defined so we use os.getcwd() as a fallback.
try:
    BASE_DIR = os.path.dirname(os.path.abspath(__file__))
except NameError:
    BASE_DIR = os.getcwd()

# Define the folder where raw data is stored (assumed to be "../data/raw")
RAW_DIR = os.path.join(BASE_DIR, "..", "data", "raw")

# Define the folder where processed data will be saved (assumed to be "../data/processed")
PROCESSED_DIR = os.path.join(BASE_DIR, "..", "data", "processed")
os.makedirs(PROCESSED_DIR, exist_ok=True)  # Create the folder if it doesn't exist


# Build the absolute paths for each dataset
spacex_launches_path = os.path.join(RAW_DIR, "spacex_launches.json")
spacex_launchpads_path = os.path.join(RAW_DIR, "spacex_launchpads.json")

# Load SpaceX launches data from JSON using the absolute path
with open(spacex_launches_path, "r") as f:
    launches = json.load(f)

# Load SpaceX launchpads data from JSON using the absolute path
with open(spacex_launchpads_path, "r") as f:
    launchpads = json.load(f)

print("✅ Datasets loaded!")

✅ Datasets loaded!
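
Since the notebook expects the raw files to be downloaded beforehand, a small check before loading can give a clearer message than a bare `FileNotFoundError`. The following is a minimal sketch; `find_missing_files` is a hypothetical helper and not part of the original notebook.

```python
import os

def find_missing_files(paths):
    """Return the paths that do not exist on disk (hypothetical helper)."""
    return [p for p in paths if not os.path.isfile(p)]

# Possible usage with the paths built in the cell above (assumed to be in scope):
# missing = find_missing_files([spacex_launches_path, spacex_launchpads_path])
# if missing:
#     raise FileNotFoundError(f"Download the raw data first; missing: {missing}")
```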


## Assigning Latitude and Longitude and Fixing Datetime Format for Launches and Launchpads Data 📆📍

The following cells will:

1. Manually define latitude and longitude data based on research and Google Maps.
2. Define a function to standardize datetime strings.
3. For each launch, assign a latitude and longitude based on its launchpad_id.
4. For each launchpad, assign a latitude and longitude based on its id.
5. Save the files.

In [5]:
# Since there are only 5 launchpads, we set their latitude and longitude manually
# Latitude and longitude were taken from Google Maps
# lat_lon = {launchpad_id: (lat, lon)}
lat_lon = {
    "5e9e4501f5090910d4566f83": (34.64039313749651, -120.58939191725148),
    "5e9e4501f509094ba4566f84": (28.562259557233922, -80.57734574817835),
    "5e9e4502f5090995de566f86": (9.047966704576721, 167.74304956933793),
    "5e9e4502f509092b78566f87": (34.63213232147929, -120.61065970375805),
    "5e9e4502f509094188566f88": (28.609361333259177, -80.60464369777084),
}
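
Because these coordinates are entered by hand, a quick range check can catch typos such as swapped latitude and longitude. This is only a sketch: `invalid_coordinates` is a hypothetical helper, and the sample entries below are illustrative, not taken from the dataset.

```python
def invalid_coordinates(coords):
    """Return {id: (lat, lon)} for entries outside the valid lat/lon ranges."""
    return {
        key: (lat, lon)
        for key, (lat, lon) in coords.items()
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0)
    }

# Illustrative data only: one plausible entry and one with lat/lon swapped.
sample = {
    "pad_ok": (28.56, -80.58),
    "pad_swapped": (-120.59, 34.64),
}
print(invalid_coordinates(sample))  # {'pad_swapped': (-120.59, 34.64)}
```

Calling `invalid_coordinates(lat_lon)` on the dictionary above should return an empty dict if every entry is within range.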

In [6]:
from datetime import datetime

def convert_datetime(iso_str):
    """
    Convert an ISO datetime string like "2006-03-24T22:30:00.000Z"
    to the format "MM/DD/YYYY HH:MM".
    """
    try:
        dt = datetime.strptime(iso_str, "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        # Fallback if microseconds are not provided:
        dt = datetime.strptime(iso_str, "%Y-%m-%dT%H:%M:%SZ")
    return dt.strftime("%m/%d/%Y %H:%M")
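
As a quick check of the conversion, the sketch below repeats the function so it runs on its own and feeds it both accepted input shapes, with and without fractional seconds. The input strings are examples only.

```python
from datetime import datetime

def convert_datetime(iso_str):
    """Same logic as the cell above, repeated so this example is self-contained."""
    try:
        dt = datetime.strptime(iso_str, "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        # Fallback when fractional seconds are absent
        dt = datetime.strptime(iso_str, "%Y-%m-%dT%H:%M:%SZ")
    return dt.strftime("%m/%d/%Y %H:%M")

print(convert_datetime("2006-03-24T22:30:00.000Z"))  # 03/24/2006 22:30
print(convert_datetime("2006-03-24T22:30:00Z"))      # 03/24/2006 22:30
```

Note that the output string carries no timezone marker. The trailing `Z` in the input denotes UTC, so the converted values are implicitly UTC.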

In [7]:
# Process each launch record
for launch in launches:
    # Convert the datetime from ISO to "MM/DD/YYYY HH:MM" format
    if "date" in launch:
        launch["date"] = convert_datetime(launch["date"])
    
    # Assign latitude and longitude based on launchpad_id; use None when the id is unknown
    launchpad_id = launch.get("launchpad_id")
    launch["lat"], launch["lon"] = lat_lon.get(launchpad_id, (None, None))

print("SpaceX launches updated with lat/lon! 📍")

SpaceX launches updated with lat/lon! 📍
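
When coordinates are looked up by id, it can be useful to list any launchpad ids that appear in the launch records but have no entry in the coordinate dictionary. A minimal sketch follows; `unknown_launchpad_ids` is a hypothetical helper and the records are made-up examples.

```python
def unknown_launchpad_ids(records, coords, key="launchpad_id"):
    """Return the sorted ids found in `records` that have no entry in `coords`."""
    seen = {r[key] for r in records if key in r}
    return sorted(seen - set(coords))

# Illustrative records only, not real launch data.
sample_records = [
    {"launchpad_id": "pad_a"},
    {"launchpad_id": "pad_b"},
    {"name": "record without a launchpad"},
]
sample_coords = {"pad_a": (0.0, 0.0)}

print(unknown_launchpad_ids(sample_records, sample_coords))  # ['pad_b']
```

With the notebook's own variables, `unknown_launchpad_ids(launches, lat_lon)` would be expected to return an empty list when every launch maps to a known launchpad.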


In [8]:
# Process each launchpad record
for launchpad in launchpads:
    lp_id = launchpad.get("id")
    if lp_id in lat_lon:
        launchpad["lat"], launchpad["lon"] = lat_lon[lp_id]
    else:
        # Assign None if there is no matching id
        launchpad["lat"], launchpad["lon"] = None, None
        
print("SpaceX launchpads updated with lat/lon! 📍")

SpaceX launchpads updated with lat/lon! 📍


## Saving the Processed Launches and Launchpads Data 💾

In [9]:
# Build the absolute paths for the output files
launches_processed_path = os.path.join(PROCESSED_DIR, "launches_processed.json")
launchpads_processed_path = os.path.join(PROCESSED_DIR, "launchpads_processed.json")

# Save the updated launches back to a JSON file (no spaces, no indentation)
with open(launches_processed_path, "w") as f:
    json.dump(launches, f, separators=(',', ':'))

print("💾 Saved processed launches dataset!")

# Save the updated launchpads back to a JSON file (no spaces, no indentation)
with open(launchpads_processed_path, "w") as f:
    json.dump(launchpads, f, separators=(',', ':'))

print("💾 Saved processed launchpads dataset!")

💾 Saved processed launches dataset!
💾 Saved processed launchpads dataset!
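
To illustrate what `separators=(',', ':')` does in the save step, the sketch below compares the default and compact encodings of a small record and confirms that the compact text parses back to the same object. The record is illustrative data, not part of the dataset.

```python
import json

record = {"name": "example", "lat": 28.5, "tags": ["a", "b"]}  # illustrative data

default_text = json.dumps(record)
compact_text = json.dumps(record, separators=(",", ":"))

print(default_text)  # {"name": "example", "lat": 28.5, "tags": ["a", "b"]}
print(compact_text)  # {"name":"example","lat":28.5,"tags":["a","b"]}

# The compact form still round-trips to the original object.
assert json.loads(compact_text) == record
```

The compact form only removes the whitespace after separators, which reduces file size without changing the data.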
