
# SpaceX API Calls — Data Collection & Normalization Notebook

**Author:** _Your Name_  
**Course:** IBM Data Science Professional Certificate — Capstone  
**Objective:** Collect Falcon 9/Heavy launch data from the SpaceX REST API, normalize to tabular form, enrich with reference endpoints, and persist to CSV/SQLite for downstream EDA and ML.

> This notebook contains both **code cells** and **output preview cells**; the outputs are generated when you run it. Commit the executed version to GitHub with outputs visible for peer review.


In [1]:

# Core libraries
import requests
import pandas as pd
import numpy as np
from pathlib import Path

# Optional: SQLite persistence
import sqlite3

pd.set_option('display.max_columns', 100)
print("Libraries imported.")

Libraries imported.


## 1) Define SpaceX API Endpoints

In [2]:

BASE = "https://api.spacexdata.com/v4"
ENDPOINTS = {
    "launches": f"{BASE}/launches",
    "launchpads": f"{BASE}/launchpads",
    "rockets": f"{BASE}/rockets",
    "payloads": f"{BASE}/payloads",
    "landpads": f"{BASE}/landpads"
}
ENDPOINTS

{'launches': 'https://api.spacexdata.com/v4/launches',
 'launchpads': 'https://api.spacexdata.com/v4/launchpads',
 'rockets': 'https://api.spacexdata.com/v4/rockets',
 'payloads': 'https://api.spacexdata.com/v4/payloads',
 'landpads': 'https://api.spacexdata.com/v4/landpads'}

## 2) Helper: GET function with status check

In [3]:

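def get_json(url, params=None):
    """GET a URL and return the parsed JSON body; raise on an HTTP error status."""
    r = requests.get(url, params=params, timeout=60)
    r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return r.json()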

print("Helper ready.")

Helper ready.
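
The helper above fails on the first network error. If transient failures are a concern, one option is to wrap the call in a small retry loop with exponential backoff. The sketch below is an assumption, not part of the original notebook: `get_with_retries`, its parameters, and the `flaky_fetch` stand-in are hypothetical names, and the demo uses a fake fetch function so it runs without network access.

```python
import time

def get_with_retries(fetch, url, retries=3, backoff=0.5, sleep=time.sleep):
    """Call fetch(url); on an exception, wait and retry.

    The last failure is re-raised once `retries` attempts are used up.
    Catching Exception is deliberately broad for this sketch; in the notebook
    you would likely narrow it to requests.exceptions.RequestException.
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise
            sleep(backoff * 2 ** attempt)  # waits 0.5s, 1.0s, 2.0s, ...

# Demo with a hypothetical stand-in that fails twice, then succeeds
calls = {"n": 0}

def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return {"url": url, "ok": True}

delays = []  # record the waits instead of actually sleeping
result = get_with_retries(
    flaky_fetch, "https://api.spacexdata.com/v4/launches", sleep=delays.append
)
print(result["ok"], calls["n"], delays)
```

In the notebook itself this would be called as `get_with_retries(get_json, ENDPOINTS["launches"])`.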


## 3) Fetch raw data from REST endpoints

In [4]:

launches_raw = get_json(ENDPOINTS["launches"])
launchpads_raw = get_json(ENDPOINTS["launchpads"])
rockets_raw = get_json(ENDPOINTS["rockets"])
payloads_raw = get_json(ENDPOINTS["payloads"])
landpads_raw = get_json(ENDPOINTS["landpads"])

print(len(launches_raw), "launch records")
print(len(launchpads_raw), "launchpads")
print(len(rockets_raw), "rockets")
print(len(payloads_raw), "payloads")
print(len(landpads_raw), "landpads")

205 launch records
6 launchpads
4 rockets
225 payloads
7 landpads


## 4) Normalize JSON to DataFrame

In [5]:

df_launches = pd.json_normalize(launches_raw)
df_launches.head(3)

Unnamed: 0,static_fire_date_utc,static_fire_date_unix,net,window,rocket,success,failures,details,crew,ships,capsules,payloads,launchpad,flight_number,name,date_utc,date_unix,date_local,date_precision,upcoming,cores,auto_update,tbd,launch_library_id,id,fairings.reused,fairings.recovery_attempt,fairings.recovered,fairings.ships,links.patch.small,links.patch.large,links.reddit.campaign,links.reddit.launch,links.reddit.media,links.reddit.recovery,links.flickr.small,links.flickr.original,links.presskit,links.webcast,links.youtube_id,links.article,links.wikipedia,fairings
0,2006-03-17T00:00:00.000Z,1142554000.0,False,0.0,5e9d0d95eda69955f709d1eb,False,"[{'time': 33, 'altitude': None, 'reason': 'mer...",Engine failure at 33 seconds and loss of vehicle,[],[],[],[5eb0e4b5b6c3bb0006eeb1e1],5e9e4502f5090995de566f86,1,FalconSat,2006-03-24T22:30:00.000Z,1143239400,2006-03-25T10:30:00+12:00,hour,False,"[{'core': '5e9e289df35918033d3b2623', 'flight'...",True,False,,5eb87cd9ffd86e000604b32a,False,False,False,[],https://images2.imgbox.com/94/f2/NN6Ph45r_o.png,https://images2.imgbox.com/5b/02/QcxHUb5V_o.png,,,,,[],[],,https://www.youtube.com/watch?v=0a_00nJ_Y88,0a_00nJ_Y88,https://www.space.com/2196-spacex-inaugural-fa...,https://en.wikipedia.org/wiki/DemoSat,
1,,,False,0.0,5e9d0d95eda69955f709d1eb,False,"[{'time': 301, 'altitude': 289, 'reason': 'har...",Successful first stage burn and transition to ...,[],[],[],[5eb0e4b6b6c3bb0006eeb1e2],5e9e4502f5090995de566f86,2,DemoSat,2007-03-21T01:10:00.000Z,1174439400,2007-03-21T13:10:00+12:00,hour,False,"[{'core': '5e9e289ef35918416a3b2624', 'flight'...",True,False,,5eb87cdaffd86e000604b32b,False,False,False,[],https://images2.imgbox.com/f9/4a/ZboXReNb_o.png,https://images2.imgbox.com/80/a2/bkWotCIS_o.png,,,,,[],[],,https://www.youtube.com/watch?v=Lk4zQ2wP-Nc,Lk4zQ2wP-Nc,https://www.space.com/3590-spacex-falcon-1-roc...,https://en.wikipedia.org/wiki/DemoSat,
2,,,False,0.0,5e9d0d95eda69955f709d1eb,False,"[{'time': 140, 'altitude': 35, 'reason': 'resi...",Residual stage 1 thrust led to collision betwe...,[],[],[],"[5eb0e4b6b6c3bb0006eeb1e3, 5eb0e4b6b6c3bb0006e...",5e9e4502f5090995de566f86,3,Trailblazer,2008-08-03T03:34:00.000Z,1217734440,2008-08-03T15:34:00+12:00,hour,False,"[{'core': '5e9e289ef3591814873b2625', 'flight'...",True,False,,5eb87cdbffd86e000604b32c,False,False,False,[],https://images2.imgbox.com/6c/cb/na1tzhHs_o.png,https://images2.imgbox.com/4a/80/k1oAkY0k_o.png,,,,,[],[],,https://www.youtube.com/watch?v=v0w9p3U8860,v0w9p3U8860,http://www.spacex.com/news/2013/02/11/falcon-1...,https://en.wikipedia.org/wiki/Trailblazer_(sat...,
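
The preview above shows two effects of `pd.json_normalize`: nested dictionaries become dotted column names (for example `links.patch.small`), while list-valued fields such as `payloads` stay as Python lists in a single cell. A minimal sketch on a made-up record (the values are illustrative, not real API data):

```python
import pandas as pd

# Hypothetical record shaped like a launch document
sample = [{
    "name": "Example Mission",
    "payloads": ["p1", "p2"],
    "links": {"patch": {"small": "small.png"}, "webcast": None},
}]

flat = pd.json_normalize(sample)
print(sorted(flat.columns))

# Lists are not expanded automatically; explode() gives one row per payload ID
long = flat.explode("payloads", ignore_index=True)
print(len(long))
```

This is why the next steps resolve `payloads` and `cores` by iterating over the raw records rather than relying on the flattened frame.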


## 5) Build reference maps (IDs → names/fields)

In [6]:

# Launchpads
df_launchpads = pd.json_normalize(launchpads_raw)
launchpad_name = df_launchpads.set_index('id')['name'].to_dict()
launchpad_fullname = df_launchpads.set_index('id')['full_name'].to_dict()

# Rockets
df_rockets = pd.json_normalize(rockets_raw)
rocket_name = df_rockets.set_index('id')['name'].to_dict()

# Payloads: mass_kg + orbit
df_payloads = pd.json_normalize(payloads_raw)
payload_mass = df_payloads.set_index('id')['mass_kg'].to_dict()
payload_orbit = df_payloads.set_index('id')['orbit'].to_dict()

# Landpads (landing zones)
df_landpads = pd.json_normalize(landpads_raw)
landpad_name = df_landpads.set_index('id')['name'].to_dict()

print("Reference maps built.")

Reference maps built.
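
Each reference map is a plain `dict` keyed by the API's `id` string, so unknown IDs can be handled with `.get()`, which returns `None` instead of raising. The sketch below builds equivalent maps without pandas and shows the name-then-full-name fallback; the records and the `resolve_site` helper are hypothetical.

```python
# Hypothetical launchpad records; real ones come from the /launchpads endpoint
launchpads = [
    {"id": "pad-a", "name": "Pad A", "full_name": "Example Pad A"},
    {"id": "pad-b", "name": None, "full_name": "Example Pad B"},
]

name_by_id = {rec["id"]: rec["name"] for rec in launchpads}
full_by_id = {rec["id"]: rec["full_name"] for rec in launchpads}

def resolve_site(pad_id):
    """Prefer the short name, fall back to the full name, else None."""
    return name_by_id.get(pad_id) or full_by_id.get(pad_id)

print(resolve_site("pad-a"))
print(resolve_site("pad-b"))
print(resolve_site("unknown"))
```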


## 6) Derive tidy tabular dataset (selected & engineered fields)

In [7]:

def first_or_none(lst):
    return lst[0] if isinstance(lst, list) and len(lst) > 0 else None

# Build rows
rows = []
for rec in launches_raw:
    # basic
    flight_number = rec.get('flight_number')
    date_utc = rec.get('date_utc')
    rocket_id = rec.get('rocket')
    launchpad_id = rec.get('launchpad')
    payload_ids = rec.get('payloads', [])
    cores = rec.get('cores', [])
    name = rec.get('name')
    
    # resolve names
    rocket = rocket_name.get(rocket_id)
    launch_site = launchpad_name.get(launchpad_id) or launchpad_fullname.get(launchpad_id)
    
    # payload mass & orbit: sum masses (if multiple payloads), pick first orbit (typical for this project)
    masses = [payload_mass.get(pid) for pid in payload_ids if pid in payload_mass]
    total_mass = float(np.nansum(masses)) if masses else np.nan
    orbits = [payload_orbit.get(pid) for pid in payload_ids if pid in payload_orbit]
    orbit = first_or_none(orbits)
    
    # landing outcome from the first core only (multi-core flights list more than one)
    core0 = first_or_none(cores) or {}
    landing_success = core0.get('landing_success')
    landing_type = core0.get('landing_type')
    landpad_id = core0.get('landpad')
    landpad = landpad_name.get(landpad_id) if landpad_id else None
    
    # Class label used in this project: 1 if the first core's landing succeeded,
    # 0 otherwise (this includes failed landings, no landing attempt, and unknown outcomes)
    _class = 1 if landing_success is True else 0
    
    rows.append({
        'FlightNumber': flight_number,
        'MissionName': name,
        'DateUTC': date_utc,
        'Rocket': rocket,
        'LaunchSite': launch_site,
        'PayloadMass': total_mass,
        'Orbit': orbit,
        'LandingType': landing_type,
        'LandingPad': landpad,
        'Class': _class
    })

df = pd.DataFrame(rows)
df.sort_values('FlightNumber', inplace=True, ignore_index=True)
df.head(10)

| | FlightNumber | MissionName | DateUTC | Rocket | LaunchSite | PayloadMass | Orbit | LandingType | LandingPad | Class |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | FalconSat | 2006-03-24T22:30:00.000Z | Falcon 1 | Kwajalein Atoll | 20.0 | LEO | | | 0 |
| 1 | 2 | DemoSat | 2007-03-21T01:10:00.000Z | Falcon 1 | Kwajalein Atoll | 0.0 | LEO | | | 0 |
| 2 | 3 | Trailblazer | 2008-08-03T03:34:00.000Z | Falcon 1 | Kwajalein Atoll | 0.0 | LEO | | | 0 |
| 3 | 4 | RatSat | 2008-09-28T23:15:00.000Z | Falcon 1 | Kwajalein Atoll | 165.0 | LEO | | | 0 |
| 4 | 5 | RazakSat | 2009-07-13T03:35:00.000Z | Falcon 1 | Kwajalein Atoll | 200.0 | LEO | | | 0 |
| 5 | 6 | Falcon 9 Test Flight | 2010-06-04T18:45:00.000Z | Falcon 9 | CCSFS SLC 40 | 0.0 | LEO | | | 0 |
| 6 | 7 | COTS 1 | 2010-12-08T15:43:00.000Z | Falcon 9 | CCSFS SLC 40 | 0.0 | LEO | | | 0 |
| 7 | 8 | COTS 2 | 2012-05-22T07:44:00.000Z | Falcon 9 | CCSFS SLC 40 | 525.0 | LEO | | | 0 |
| 8 | 9 | CRS-1 | 2012-10-08T00:35:00.000Z | Falcon 9 | CCSFS SLC 40 | 800.0 | ISS | | | 0 |
| 9 | 10 | CRS-2 | 2013-03-01T19:10:00.000Z | Falcon 9 | CCSFS SLC 40 | 677.0 | ISS | | | 0 |
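
One caveat on `PayloadMass`: `np.nansum` returns `0.0` when every value it receives is NaN, so a launch whose payload masses are all unknown ends up with a mass of `0.0` rather than a missing value. Some of the `0.0` entries in the preview may therefore mean "unknown" rather than "zero". If that distinction matters downstream, a helper along the following lines could replace the `np.nansum` call. This is a sketch; `total_mass` is a hypothetical name, not part of the original notebook.

```python
import math

def total_mass(masses):
    """Sum the known masses; return NaN when none of them is known."""
    known = [m for m in masses if m is not None and not math.isnan(m)]
    return float(sum(known)) if known else math.nan

print(total_mass([500.0, math.nan]))  # one known mass, one unknown
print(total_mass([math.nan]))         # every mass unknown
print(total_mass([]))                 # no payloads at all
```

With this variant, "unknown mass" and "no payloads" both stay missing, and the later `isna()` check counts them together.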


## 7) Quick quality checks

In [8]:

print("Shape:", df.shape)
print("\nMissing values per column:")
print(df.isna().sum())

print("\nLaunch sites value counts:")
print(df['LaunchSite'].value_counts(dropna=False).head())

print("\nOrbits value counts:")
print(df['Orbit'].value_counts(dropna=False).head())

Shape: (205, 10)

Missing values per column:
FlightNumber     0
MissionName      0
DateUTC          0
Rocket           0
LaunchSite       0
PayloadMass     12
Orbit           13
LandingType     47
LandingPad      54
Class            0
dtype: int64

Launch sites value counts:
LaunchSite
CCSFS SLC 40       112
KSC LC 39A          58
VAFB SLC 4E         30
Kwajalein Atoll      5
Name: count, dtype: int64

Orbits value counts:
Orbit
VLEO    59
GTO     36
ISS     33
LEO     20
PO      15
Name: count, dtype: int64
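
Two further checks that may be worth running here are whether `DateUTC` parses cleanly as a timestamp and whether any `FlightNumber` is repeated. The sketch below uses a small made-up frame, with a deliberate duplicate and a deliberately bad date, rather than the real `df`.

```python
import pandas as pd

# Hypothetical data: flight 2 is duplicated and the last date is invalid
demo = pd.DataFrame({
    "FlightNumber": [1, 2, 2, 4],
    "DateUTC": [
        "2006-03-24T22:30:00.000Z",
        "2007-03-21T01:10:00.000Z",
        "2007-03-21T01:10:00.000Z",
        "not a date",
    ],
})

# errors="coerce" turns unparseable strings into NaT instead of raising
demo["Date"] = pd.to_datetime(demo["DateUTC"], utc=True, errors="coerce")
bad_dates = int(demo["Date"].isna().sum())

# keep=False marks every member of a duplicated group, not just the repeats
dup_mask = demo["FlightNumber"].duplicated(keep=False)
dup_flights = demo.loc[dup_mask, "FlightNumber"].unique().tolist()

print("Unparseable dates:", bad_dates)
print("Duplicated flight numbers:", dup_flights)
```

Applied to the real dataset, the same two expressions would run against `df['DateUTC']` and `df['FlightNumber']`.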


## 8) Persist dataset to CSV & SQLite

In [9]:

out_dir = Path('data')
out_dir.mkdir(exist_ok=True, parents=True)
csv_path = out_dir / 'spacex_launches_clean.csv'
df.to_csv(csv_path, index=False)
print("Saved CSV ->", csv_path.resolve())

# SQLite (optional)
conn = sqlite3.connect('spacex.db')
try:
    df.to_sql('launches', conn, if_exists='replace', index=False)
finally:
    conn.close()  # close the connection even if the write fails
print("Saved table 'launches' to spacex.db")

Saved CSV -> C:\Users\USER\Downloads\data\spacex_launches_clean.csv
Saved table 'launches' to spacex.db
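
To confirm that the SQLite write worked, the table can be read back and compared with the source frame. The sketch below does this against an in-memory database with a tiny made-up frame, so it leaves no file behind; in the notebook the connection would point at `spacex.db` instead.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# Hypothetical two-row frame standing in for the real dataset
demo = pd.DataFrame({
    "FlightNumber": [1, 2],
    "LaunchSite": ["Kwajalein Atoll", "Kwajalein Atoll"],
    "Class": [0, 0],
})

# closing() guarantees conn.close() runs; a bare `with conn:` only manages transactions
with closing(sqlite3.connect(":memory:")) as conn:
    demo.to_sql("launches", conn, if_exists="replace", index=False)
    back = pd.read_sql_query("SELECT * FROM launches ORDER BY FlightNumber", conn)
    n_rows = conn.execute("SELECT COUNT(*) FROM launches").fetchone()[0]

print(n_rows, list(back.columns))
```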



## 9) Outcome Preview
- First 5 rows of the final dataset
- Column summary
- Basic distributions

In [10]:

display(df.head())
df.describe(include='all').T.head(20)

| | FlightNumber | MissionName | DateUTC | Rocket | LaunchSite | PayloadMass | Orbit | LandingType | LandingPad | Class |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | FalconSat | 2006-03-24T22:30:00.000Z | Falcon 1 | Kwajalein Atoll | 20.0 | LEO | | | 0 |
| 1 | 2 | DemoSat | 2007-03-21T01:10:00.000Z | Falcon 1 | Kwajalein Atoll | 0.0 | LEO | | | 0 |
| 2 | 3 | Trailblazer | 2008-08-03T03:34:00.000Z | Falcon 1 | Kwajalein Atoll | 0.0 | LEO | | | 0 |
| 3 | 4 | RatSat | 2008-09-28T23:15:00.000Z | Falcon 1 | Kwajalein Atoll | 165.0 | LEO | | | 0 |
| 4 | 5 | RazakSat | 2009-07-13T03:35:00.000Z | Falcon 1 | Kwajalein Atoll | 200.0 | LEO | | | 0 |


| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| FlightNumber | 205.0 | | | | 102.814634 | 59.029112 | 1.0 | 52.0 | 103.0 | 154.0 | 203.0 |
| MissionName | 205.0 | 205.0 | FalconSat | 1.0 | | | | | | | |
| DateUTC | 205.0 | 199.0 | 2022-12-01T00:00:00.000Z | 5.0 | | | | | | | |
| Rocket | 205.0 | 3.0 | Falcon 9 | 195.0 | | | | | | | |
| LaunchSite | 205.0 | 4.0 | CCSFS SLC 40 | 112.0 | | | | | | | |
| PayloadMass | 193.0 | | | | 6803.049482 | 5818.243746 | 0.0 | 1977.0 | 4707.0 | 13260.0 | 15712.0 |
| Orbit | 192.0 | 13.0 | VLEO | 59.0 | | | | | | | |
| LandingType | 158.0 | 3.0 | ASDS | 127.0 | | | | | | | |
| LandingPad | 151.0 | 6.0 | OCISLY | 64.0 | | | | | | | |
| Class | 205.0 | | | | 0.697561 | 0.460439 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |


## 10) Convenience queries

In [11]:

print("How many launches came from CCSFS SLC 40?")
print((df['LaunchSite'] == 'CCSFS SLC 40').sum())

print("\nSuccess rate by LaunchSite:")
print(df.groupby('LaunchSite')['Class'].mean().sort_values(ascending=False))

print("\nLandingPad missing values:")
print(df['LandingPad'].isna().sum())

print("\nOrbit counts:")
print(df['Orbit'].value_counts())

How many launches came from CCSFS SLC 40?
112

Success rate by LaunchSite:
LaunchSite
KSC LC 39A         0.827586
VAFB SLC 4E        0.766667
CCSFS SLC 40       0.642857
Kwajalein Atoll    0.000000
Name: Class, dtype: float64

LandingPad missing values:
54

Orbit counts:
Orbit
VLEO     59
GTO      36
ISS      33
LEO      20
PO       15
SSO      13
MEO       8
GEO       2
TLI       2
ES-L1     1
HCO       1
HEO       1
SO        1
Name: count, dtype: int64
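
A per-site mean of `Class` is easier to interpret next to the number of launches behind it. For instance, the 0.0 rate for Kwajalein Atoll rests on five Falcon 1 flights with no recorded `LandingType`, so it likely reflects the absence of landing attempts rather than failed landings. The sketch below reports both figures with a named aggregation on a made-up frame; the site names and values are illustrative.

```python
import pandas as pd

# Hypothetical data: three launches from Site A, one from Site B
demo = pd.DataFrame({
    "LaunchSite": ["Site A", "Site A", "Site A", "Site B"],
    "Class": [1, 1, 0, 0],
})

summary = (
    demo.groupby("LaunchSite")["Class"]
        .agg(success_rate="mean", launches="count")
        .sort_values("success_rate", ascending=False)
)
print(summary)
```

On the real data the same expression would run against `df`.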
