# Falcon 9 Landing Prediction  
## Step 1: API Data Collection

**Objective:**  
Collect launch and landing data from the SpaceX public API to build the initial dataset for landing success prediction.


In [1]:
import requests
import pandas as pd
from pathlib import Path

pd.set_option("display.max_columns", None)
pd.set_option("display.width", 120)


In [2]:
LAUNCHES_PAST_URL = "https://api.spacexdata.com/v4/launches/past"

response = requests.get(LAUNCHES_PAST_URL, timeout=60)
response.raise_for_status()
launch_data = response.json()

print("Launch records:", len(launch_data))

Launch records: 187
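
Re-running the cell above calls the live API every time. The sketch below caches the raw JSON to disk instead. It assumes a hypothetical helper `load_or_fetch` and a hypothetical cache file such as `data/launches_past.json`. The fetch step is passed in as a function, so the caching logic can be exercised without network access:

```python
import json
from pathlib import Path
from typing import Any, Callable


def load_or_fetch(cache_path: Path, fetch: Callable[[], Any]) -> Any:
    """Return cached JSON if the file exists; otherwise call fetch() and cache the result.

    Hypothetical helper (not part of this notebook): `fetch` is any zero-argument
    callable that returns JSON-serializable data.
    """
    if cache_path.exists():
        return json.loads(cache_path.read_text(encoding="utf-8"))
    data = fetch()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(data), encoding="utf-8")
    return data
```

In the notebook, you could wrap the existing `requests.get(...)`, `raise_for_status()`, `.json()` sequence in a small function and pass it as `fetch`. Keeping `raise_for_status()` inside the wrapper ensures that an HTTP error response is never written to the cache.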


In [3]:
launches = pd.json_normalize(launch_data)

keep_cols = [
    "name",
    "date_utc",
    "success",
    "cores",
    "rocket",
    "payloads",
    "launchpad"
]

launches_df = launches[keep_cols].copy()
launches_df.head(3)


| | name | date_utc | success | cores | rocket | payloads | launchpad |
|---|---|---|---|---|---|---|---|
| 0 | FalconSat | 2006-03-24T22:30:00.000Z | False | [{'core': '5e9e289df35918033d3b2623', 'flight'... | 5e9d0d95eda69955f709d1eb | [5eb0e4b5b6c3bb0006eeb1e1] | 5e9e4502f5090995de566f86 |
| 1 | DemoSat | 2007-03-21T01:10:00.000Z | False | [{'core': '5e9e289ef35918416a3b2624', 'flight'... | 5e9d0d95eda69955f709d1eb | [5eb0e4b6b6c3bb0006eeb1e2] | 5e9e4502f5090995de566f86 |
| 2 | Trailblazer | 2008-08-03T03:34:00.000Z | False | [{'core': '5e9e289ef3591814873b2625', 'flight'... | 5e9d0d95eda69955f709d1eb | [5eb0e4b6b6c3bb0006eeb1e3, 5eb0e4b6b6c3bb0006e... | 5e9e4502f5090995de566f86 |
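
`pd.json_normalize` expands nested dictionaries into dot-separated columns but leaves list values untouched. That is why `cores` and `payloads` above still hold lists. Here is a small sketch on made-up records (the field values are illustrative, not real API data):

```python
import pandas as pd

# Two illustrative launch records: "links" is a nested dict, "cores" is a list of dicts
records = [
    {"name": "A", "links": {"webcast": "url-a"}, "cores": [{"flight": 1, "landing_success": True}]},
    {"name": "B", "links": {"webcast": "url-b"}, "cores": [{"flight": 2, "landing_success": False}]},
]

flat = pd.json_normalize(records)

# The nested dict became a "links.webcast" column; "cores" is still a list in each row
print(sorted(flat.columns))          # ['cores', 'links.webcast', 'name']
print(type(flat.loc[0, "cores"]))    # <class 'list'>
```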


In [4]:
def get_first_core_value(cores, key):
    # cores is normally a list of dicts (one per booster); return the first core's value for key
    if not isinstance(cores, list) or len(cores) == 0:
        return None
    return cores[0].get(key)

# New column name -> key inside each core dict
core_fields = {
    "core_flight": "flight",
    "core_reused": "reused",
    "landing_attempt": "landing_attempt",
    "landing_success": "landing_success",
    "landing_type": "landing_type",
    "landpad": "landpad",
}

for col, key in core_fields.items():
    launches_df[col] = launches_df["cores"].apply(get_first_core_value, args=(key,))

launches_df[["name", "date_utc", "landing_attempt", "landing_success", "landing_type"]].head(10)


| | name | date_utc | landing_attempt | landing_success | landing_type |
|---|---|---|---|---|---|
| 0 | FalconSat | 2006-03-24T22:30:00.000Z | False | | |
| 1 | DemoSat | 2007-03-21T01:10:00.000Z | False | | |
| 2 | Trailblazer | 2008-08-03T03:34:00.000Z | False | | |
| 3 | RatSat | 2008-09-28T23:15:00.000Z | False | | |
| 4 | RazakSat | 2009-07-13T03:35:00.000Z | False | | |
| 5 | Falcon 9 Test Flight | 2010-06-04T18:45:00.000Z | False | | |
| 6 | COTS 1 | 2010-12-08T15:43:00.000Z | False | | |
| 7 | COTS 2 | 2012-05-22T07:44:00.000Z | False | | |
| 8 | CRS-1 | 2012-10-08T00:35:00.000Z | False | | |
| 9 | CRS-2 | 2013-03-01T19:10:00.000Z | False | | |
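
The helper above keeps only the first core. That works for single-core Falcon 9 launches, but it drops the side boosters of Falcon Heavy flights. An alternative, not the approach used in this notebook, is to have `pd.json_normalize` emit one row per core. You do this with `record_path`, and you carry launch-level fields along with `meta`. The records below are made up for illustration:

```python
import pandas as pd

# Illustrative records: a single-core launch and a three-core launch
records = [
    {"name": "Single", "date_utc": "2020-01-01T00:00:00.000Z",
     "cores": [{"core": "c1", "landing_attempt": True, "landing_success": True}]},
    {"name": "Triple", "date_utc": "2020-02-01T00:00:00.000Z",
     "cores": [
         {"core": "c2", "landing_attempt": True, "landing_success": True},
         {"core": "c3", "landing_attempt": True, "landing_success": True},
         {"core": "c4", "landing_attempt": True, "landing_success": False},
     ]},
]

# One output row per element of each record's "cores" list;
# "name" and "date_utc" are repeated from the parent launch record
cores_long = pd.json_normalize(records, record_path="cores", meta=["name", "date_utc"])

print(cores_long[["name", "core", "landing_success"]])
```

A launch whose `cores` list is empty contributes no rows in this layout. If the analysis stays at one row per launch, the first-core approach above is simpler.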


In [5]:
launches_df["rocket"].nunique(), launches_df["rocket"].head()


(3,
 0    5e9d0d95eda69955f709d1eb
 1    5e9d0d95eda69955f709d1eb
 2    5e9d0d95eda69955f709d1eb
 3    5e9d0d95eda69955f709d1eb
 4    5e9d0d95eda69955f709d1eb
 Name: rocket, dtype: object)

In [6]:
ROCKETS_URL = "https://api.spacexdata.com/v4/rockets/"

rocket_ids = launches_df["rocket"].unique()

rocket_map = {}

for rid in rocket_ids:
    r = requests.get(ROCKETS_URL + rid, timeout=60)
    r.raise_for_status()
    rocket_map[rid] = r.json()["name"]

rocket_map


{'5e9d0d95eda69955f709d1eb': 'Falcon 1',
 '5e9d0d95eda69973a809d1ec': 'Falcon 9',
 '5e9d0d95eda69974db09d1ed': 'Falcon Heavy'}
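
The loop above requests each unique rocket ID once. Below is a reusable sketch of the same idea. It assumes a hypothetical `resolve_names` helper whose `fetch_record` argument stands in for the `requests.get(ROCKETS_URL + rid, ...).json()` call, so the logic can be tested offline. Missing IDs are skipped rather than sent to the API:

```python
from typing import Callable, Dict, Iterable

import pandas as pd


def resolve_names(ids: Iterable, fetch_record: Callable[[str], dict]) -> Dict[str, str]:
    """Map each distinct, non-missing ID to the "name" field of its fetched record.

    Hypothetical helper: fetch_record(rid) must return a dict with a "name" key.
    """
    mapping: Dict[str, str] = {}
    # dropna() skips None/NaN IDs; unique() ensures each ID is fetched only once
    for rid in pd.Series(list(ids), dtype=object).dropna().unique():
        mapping[rid] = fetch_record(rid)["name"]
    return mapping
```

Passing `launches_df["rocket"]` and a small wrapper around the rockets endpoint would then produce a dictionary of the same shape as `rocket_map`.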

In [7]:
launches_df["rocket_name"] = launches_df["rocket"].map(rocket_map)

launches_df["rocket_name"].value_counts(dropna=False)


rocket_name
Falcon 9        179
Falcon 1          5
Falcon Heavy      3
Name: count, dtype: int64

In [8]:
falcon9_df = launches_df[launches_df["rocket_name"] == "Falcon 9"].copy()

falcon9_df.shape


(179, 14)

In [9]:
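# Class = 1 only when landing_success is exactly True; failures, no attempt,
# and missing outcomes all map to 0 (.eq also handles numpy bools, unlike `is True`)
falcon9_df["Class"] = falcon9_df["landing_success"].eq(True).astype(int)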

falcon9_df[["landing_attempt", "landing_success", "Class"]].value_counts(dropna=False)


landing_attempt  landing_success  Class
True             True             1        142
False            NaN              0         24
True             False            0         11
                 NaN              0          2
Name: count, dtype: int64
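
The counts above also give the class balance that later modeling steps have to beat. The worked check below uses only the numbers printed in this output. Overall, 142 of 179 Falcon 9 launches have `Class = 1`, so a model that always predicts a successful landing would already be about 79% accurate. Among the 155 launches that attempted a landing, the success rate is about 92%:

```python
# Counts copied from the value_counts() output above
counts = {
    ("attempt", "success"): 142,
    ("no attempt", "missing"): 24,
    ("attempt", "failure"): 11,
    ("attempt", "missing"): 2,
}

total = sum(counts.values())                      # all Falcon 9 launches
positives = counts[("attempt", "success")]        # rows with Class == 1
attempted = sum(n for (attempt, _), n in counts.items() if attempt == "attempt")

majority_baseline = positives / total             # accuracy of always predicting 1
success_given_attempt = positives / attempted     # landing success rate when attempted

print(total, attempted)                                          # 179 155
print(round(majority_baseline, 3), round(success_given_attempt, 3))  # 0.793 0.916
```

The labeling rule above assigns 0 to the two launches that attempted a landing but have no recorded outcome. Whether to keep or drop them is a modeling choice.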

In [10]:
data_dir = Path("data")
data_dir.mkdir(exist_ok=True)

falcon9_df.to_csv(data_dir / "falcon9_step1_api_data.csv", index=False)
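
`to_csv` writes the list-valued `cores` and `payloads` columns as their Python string representation. A downstream `pd.read_csv` therefore gets strings, not lists. The sketch below restores them with `ast.literal_eval`, using a one-row illustrative frame rather than the real file:

```python
import ast
import io

import pandas as pd

# One illustrative row with a list-of-dicts column, like "cores"
df = pd.DataFrame({
    "name": ["Demo"],
    "cores": [[{"core": "abc", "flight": 1, "reused": False, "landpad": None}]],
})

buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

restored = pd.read_csv(buffer)
print(type(restored.loc[0, "cores"]))   # <class 'str'>: read back as plain text

# Parse the string representation back into Python lists/dicts
restored["cores"] = restored["cores"].apply(ast.literal_eval)
print(restored.loc[0, "cores"][0]["flight"])   # 1
```

Another option is a format that keeps nested values, such as JSON Lines via `to_json(orient="records", lines=True)`. That avoids the extra parsing step, at the cost of a less spreadsheet-friendly file.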


## Step 1 Summary – API Data Collection

In this step, I collected historical SpaceX launch data using the public SpaceX REST API.

Key actions completed:
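- Retrieved all past SpaceX launches (187 records) via the `/v4/launches/past` endpoint
- Flattened the nested JSON with `pd.json_normalize`, then extracted fields from the first entry of each launch's `cores` list
- Extracted core- and landing-related features:
  - `landing_attempt`, `landing_success`, `landing_type`, `landpad`
  - `core_reused` (core reuse) and `core_flight` (the core's flight number)
- Resolved rocket IDs to names via the `/v4/rockets/{id}` endpoint and filtered the dataset to Falcon 9 launches only (179 of 187 records)
- Created a binary target variable (`Class`): 1 when `landing_success` is True, otherwise 0 (142 positives, 37 negatives)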
- Saved the resulting dataset for downstream data wrangling and modeling

**Output dataset:**  
`data/falcon9_step1_api_data.csv`
