# SpaceX Falcon 9 First Stage Landing Prediction: Data Collection

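This notebook collects and preprocesses SpaceX Falcon 9 launch data. It loads a static snapshot of SpaceX API launch records, enriches each launch with booster, launch site, payload, and core details from the SpaceX API v4, and exports the resulting DataFrame to CSV for further analysis.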

Key components:

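*   API data retrieval functions (booster version, launch site, payload, and core data)
*   Data cleaning and preprocessing (single-core/single-payload filtering, date cutoff, mean imputation of missing payload mass)
*   Construction of a Falcon 9-only DataFrame from the collected fields
*   Export to CSV file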

## 1. Setup

```python
import requests
import pandas as pd
import numpy as np
import datetime
from IPython.display import display

# Constants
SPACEX_API_URL = 'https://api.spacexdata.com/v4'
STATIC_JSON_URL = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/API_call_spacex_api.json'
DATE_CUTOFF = datetime.date(2020, 11, 13)
```

## 2. Define Helper Functions

```python
def get_booster_version(data):
    """
    Fetch the booster version for each rocket in the dataset.

    Args:
        data (pd.DataFrame): DataFrame containing a 'rocket' column with rocket IDs.

    Returns:
        list: Booster versions, one per row of `data` (None where the rocket ID is missing).
    """
    BoosterVersion = []
    for x in data['rocket']:
        if x:
            response = requests.get(f"{SPACEX_API_URL}/rockets/{x}").json()
            BoosterVersion.append(response['name'])
        else:
            # Append a placeholder so the list stays aligned with the rows of `data`
            BoosterVersion.append(None)

    return BoosterVersion


def get_launch_site(data):
    """
    Fetch launch site details for each launch in the dataset.

    Args:
        data (pd.DataFrame): DataFrame containing a 'launchpad' column with launchpad IDs.

    Returns:
        tuple: Three lists (longitude, latitude, launch site name), one entry per row of `data`.
    """
    Longitude, Latitude, LaunchSite = [], [], []
    for x in data['launchpad']:
        if x:
            response = requests.get(f"{SPACEX_API_URL}/launchpads/{x}").json()
            Longitude.append(response['longitude'])
            Latitude.append(response['latitude'])
            LaunchSite.append(response['name'])
        else:
            # Keep all three lists aligned with the rows of `data`
            Longitude.append(None)
            Latitude.append(None)
            LaunchSite.append(None)

    return Longitude, Latitude, LaunchSite


def get_payload_data(data):
    """
    Fetch payload details for each launch in the dataset.

    Args:
        data (pd.DataFrame): DataFrame containing a 'payloads' column with payload IDs.

    Returns:
        tuple: Two lists (payload mass in kg, orbit), one entry per row of `data`.
    """
    PayloadMass, Orbit = [], []
    for load in data['payloads']:
        if load:
            response = requests.get(f"{SPACEX_API_URL}/payloads/{load}").json()
            PayloadMass.append(response['mass_kg'])
            Orbit.append(response['orbit'])
        else:
            # Keep both lists aligned with the rows of `data`
            PayloadMass.append(None)
            Orbit.append(None)

    return PayloadMass, Orbit


def get_core_data(data):
    """
    Fetch core details for each launch in the dataset.

    Args:
        data (pd.DataFrame): DataFrame containing a 'cores' column, where each
            entry is a single core dict from the launch record.

    Returns:
        tuple: Nine lists (Outcome, Flights, GridFins, Reused, Legs, LandingPad,
            Block, ReusedCount, Serial), one entry per row of `data`.
    """
    Outcome, Flights, GridFins, Reused, Legs, LandingPad, Block, ReusedCount, Serial = [], [], [], [], [], [], [], [], []
    for core in data['cores']:
        if core['core']:
            # Look up block, reuse count, and serial number from the /cores endpoint
            response = requests.get(f"{SPACEX_API_URL}/cores/{core['core']}").json()
            Block.append(response['block'])
            ReusedCount.append(response['reuse_count'])
            Serial.append(response['serial'])
        else:
            Block.append(None)
            ReusedCount.append(None)
            Serial.append(None)
        # Landing outcome as "<landing_success> <landing_type>", e.g. "False Ocean" or "None None"
        Outcome.append(f"{core['landing_success']} {core['landing_type']}")
        Flights.append(core['flight'])
        GridFins.append(core['gridfins'])
        Reused.append(core['reused'])
        Legs.append(core['legs'])
        LandingPad.append(core['landpad'])

    return Outcome, Flights, GridFins, Reused, Legs, LandingPad, Block, ReusedCount, Serial
```
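
The `Outcome` column concatenates two fields of each `cores` entry, which is why values such as `None None` and `False Ocean` appear in the tables later in this notebook. The following minimal sketch reproduces that flattening step offline on hand-written records. The `flatten_core` name and the sample values are hypothetical; the dictionary keys are the ones `get_core_data` reads, and the `/cores` lookup for block, reuse count, and serial is left out.

```python
def flatten_core(core):
    """Flatten one `cores` entry into the per-launch fields used by get_core_data.

    Assumes `core` is a dict with the keys read in the notebook. The Block,
    ReusedCount, and Serial lookups (which need the /cores endpoint) are omitted.
    """
    return {
        # Same string format as the notebook: "<landing_success> <landing_type>"
        'Outcome': f"{core['landing_success']} {core['landing_type']}",
        'Flights': core['flight'],
        'GridFins': core['gridfins'],
        'Reused': core['reused'],
        'Legs': core['legs'],
        'LandingPad': core['landpad'],
    }


# Hypothetical sample records shaped like the `cores` entries
no_attempt = {'core': None, 'flight': 1, 'gridfins': False, 'reused': False,
              'legs': False, 'landpad': None,
              'landing_success': None, 'landing_type': None}
ocean = dict(no_attempt, landing_success=False, landing_type='Ocean')

print(flatten_core(no_attempt)['Outcome'])  # None None
print(flatten_core(ocean)['Outcome'])       # False Ocean
```

Because Python's f-strings render `None` as the text `None`, a launch with no landing attempt becomes the string `"None None"` rather than a missing value, so `Outcome` never shows up in the missing-value counts.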

## 3. Fetch and Preprocess SpaceX Launch Data

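```python
# Fetch launch records from the static JSON snapshot
response = requests.get(STATIC_JSON_URL)
data = pd.json_normalize(response.json())

# Keep only the columns needed downstream
columns_to_keep = ['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']
data = data[columns_to_keep]

# Drop launches with multiple cores (e.g. Falcon Heavy) or multiple payloads;
# .copy() avoids SettingWithCopyWarning on the column assignments below
data = data[data['cores'].map(len) == 1]
data = data[data['payloads'].map(len) == 1].copy()

# Unwrap the single-element lists into scalar values
data['cores'] = data['cores'].map(lambda x: x[0])
data['payloads'] = data['payloads'].map(lambda x: x[0])

# Convert the UTC timestamp to a date and keep launches up to the cutoff
data['date'] = pd.to_datetime(data['date_utc']).dt.date
data = data[data['date'] <= DATE_CUTOFF]

# Display the first few rows of the filtered data
display(data.head())
```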

| | rocket | payloads | launchpad | cores | flight_number | date_utc | date |
|---|---|---|---|---|---|---|---|
| 0 | 5e9d0d95eda69955f709d1eb | 5eb0e4b5b6c3bb0006eeb1e1 | 5e9e4502f5090995de566f86 | {'core': '5e9e289df35918033d3b2623', 'flight':... | 1 | 2006-03-24T22:30:00.000Z | 2006-03-24 |
| 1 | 5e9d0d95eda69955f709d1eb | 5eb0e4b6b6c3bb0006eeb1e2 | 5e9e4502f5090995de566f86 | {'core': '5e9e289ef35918416a3b2624', 'flight':... | 2 | 2007-03-21T01:10:00.000Z | 2007-03-21 |
| 3 | 5e9d0d95eda69955f709d1eb | 5eb0e4b7b6c3bb0006eeb1e5 | 5e9e4502f5090995de566f86 | {'core': '5e9e289ef3591855dc3b2626', 'flight':... | 4 | 2008-09-28T23:15:00.000Z | 2008-09-28 |
| 4 | 5e9d0d95eda69955f709d1eb | 5eb0e4b7b6c3bb0006eeb1e6 | 5e9e4502f5090995de566f86 | {'core': '5e9e289ef359184f103b2627', 'flight':... | 5 | 2009-07-13T03:35:00.000Z | 2009-07-13 |
| 5 | 5e9d0d95eda69973a809d1ec | 5eb0e4b7b6c3bb0006eeb1e7 | 5e9e4501f509094ba4566f84 | {'core': '5e9e289ef359185f2b3b2628', 'flight':... | 6 | 2010-06-04T18:45:00.000Z | 2010-06-04 |
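
The filtering and unwrapping steps above can be checked offline on a small hand-built frame. In this sketch the launch IDs, payload IDs, and timestamps are hypothetical placeholders rather than real API identifiers; only the column names and `DATE_CUTOFF` come from the notebook. It combines the two length filters into one boolean mask, which selects the same rows as applying them one after the other.

```python
import datetime

import pandas as pd

DATE_CUTOFF = datetime.date(2020, 11, 13)

# Hypothetical toy launches; IDs are placeholders, not real API identifiers
raw = pd.DataFrame({
    'flight_number': [1, 2, 3, 4, 5],
    'cores': [[{'core': 'c1'}], [{'core': 'c2'}, {'core': 'c3'}],
              [{'core': 'c4'}], [{'core': 'c5'}], [{'core': 'c6'}]],
    'payloads': [['p1'], ['p2'], ['p3'], ['p4', 'p5'], ['p6']],
    'date_utc': ['2010-06-04T18:45:00.000Z', '2012-05-22T07:44:00.000Z',
                 '2021-01-20T13:02:00.000Z', '2013-03-01T15:10:00.000Z',
                 '2013-12-03T22:41:00.000Z'],
})

# Keep only single-core, single-payload launches (flights 2 and 4 are dropped)
mask = (raw['cores'].map(len) == 1) & (raw['payloads'].map(len) == 1)
clean = raw[mask].copy()

# Unwrap the one-element lists into scalars
clean['cores'] = clean['cores'].map(lambda x: x[0])
clean['payloads'] = clean['payloads'].map(lambda x: x[0])

# Parse the ISO timestamps and apply the cutoff on calendar dates (flight 3 is dropped)
clean['date'] = pd.to_datetime(clean['date_utc']).dt.date
clean = clean[clean['date'] <= DATE_CUTOFF]

print(clean['flight_number'].tolist())  # [1, 5]
```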


## 4. Collect Additional Data Using Helper Functions

```python
# Use helper functions to gather more detailed information (one API request per row)
BoosterVersion = get_booster_version(data)
Longitude, Latitude, LaunchSite = get_launch_site(data)
PayloadMass, Orbit = get_payload_data(data)
Outcome, Flights, GridFins, Reused, Legs, LandingPad, Block, ReusedCount, Serial = get_core_data(data)
```
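
Each helper issues one HTTP request per row, even though many launches share the same rocket and launchpad IDs. One possible optimization, sketched here under the assumption that API responses do not change during a run, is to memoize lookups by URL with `functools.lru_cache`. The `make_cached_fetcher` and `fake_fetch_json` names and the rocket IDs are hypothetical; the fake fetcher stands in for `requests.get(url).json()` so the sketch runs offline.

```python
from functools import lru_cache

SPACEX_API_URL = 'https://api.spacexdata.com/v4'


def make_cached_fetcher(fetch_json):
    """Return a memoized version of `fetch_json(url) -> dict`.

    In the notebook, `fetch_json` could be `lambda url: requests.get(url).json()`.
    Cached dicts are shared between calls, so treat them as read-only.
    """
    return lru_cache(maxsize=None)(fetch_json)


# Offline demo: a fake fetcher that records every real call it receives
calls = []

def fake_fetch_json(url):
    calls.append(url)
    return {'name': 'Falcon 9'}

fetch = make_cached_fetcher(fake_fetch_json)

rocket_ids = ['rocket-a', 'rocket-a', 'rocket-a', 'rocket-b']  # hypothetical IDs
names = [fetch(f"{SPACEX_API_URL}/rockets/{rid}")['name'] for rid in rocket_ids]

print(len(names), len(calls))  # 4 2
```

The cache keys on the full URL string, so four lookups covering two distinct IDs trigger only two underlying fetches. This matters most for `get_booster_version` and `get_launch_site`, where the same few IDs repeat across many launches.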

## 5. Create and Process the Final DataFrame

```python
# Combine all collected data into a dictionary
launch_dict = {
    'FlightNumber': list(data['flight_number']),
    'Date': list(data['date']),
    'BoosterVersion': BoosterVersion,
    'PayloadMass': PayloadMass,
    'Orbit': Orbit,
    'LaunchSite': LaunchSite,
    'Outcome': Outcome,
    'Flights': Flights,
    'GridFins': GridFins,
    'Reused': Reused,
    'Legs': Legs,
    'LandingPad': LandingPad,
    'Block': Block,
    'ReusedCount': ReusedCount,
    'Serial': Serial,
    'Longitude': Longitude,
    'Latitude': Latitude
}

# Create DataFrame
df = pd.DataFrame(launch_dict)

# Filter for Falcon 9 launches; .copy() makes an independent frame and avoids SettingWithCopyWarning
data_falcon9 = df[df['BoosterVersion'] != 'Falcon 1'].copy()
# Renumber flights consecutively starting at 1
data_falcon9.loc[:, 'FlightNumber'] = list(range(1, data_falcon9.shape[0] + 1))

# Display first few rows of Falcon 9 data
display(data_falcon9.head())
```

| | FlightNumber | Date | BoosterVersion | PayloadMass | Orbit | LaunchSite | Outcome | Flights | GridFins | Reused | Legs | LandingPad | Block | ReusedCount | Serial | Longitude | Latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 1 | 2010-06-04 | Falcon 9 | | LEO | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B0003 | -80.577366 | 28.561857 |
| 5 | 2 | 2012-05-22 | Falcon 9 | 525.0 | LEO | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B0005 | -80.577366 | 28.561857 |
| 6 | 3 | 2013-03-01 | Falcon 9 | 677.0 | ISS | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B0007 | -80.577366 | 28.561857 |
| 7 | 4 | 2013-09-29 | Falcon 9 | 500.0 | PO | VAFB SLC 4E | False Ocean | 1 | False | False | False | | 1.0 | 0 | B1003 | -120.610829 | 34.632093 |
| 8 | 5 | 2013-12-03 | Falcon 9 | 3170.0 | GTO | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B1004 | -80.577366 | 28.561857 |
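
The Falcon 9 filter keeps the row labels of `df`, which is why the index in the table above starts at 4 while `FlightNumber` restarts at 1. A minimal sketch on a hypothetical toy frame (the flight numbers and booster labels below are made up for illustration):

```python
import pandas as pd

# Hypothetical toy frame mixing Falcon 1 and Falcon 9 rows
df = pd.DataFrame({
    'FlightNumber': [1, 2, 5, 6, 9],
    'BoosterVersion': ['Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 9', 'Falcon 9'],
})

# .copy() makes data_falcon9 independent of df, so the assignment below
# modifies only the filtered frame
data_falcon9 = df[df['BoosterVersion'] != 'Falcon 1'].copy()
data_falcon9['FlightNumber'] = list(range(1, len(data_falcon9) + 1))

print(data_falcon9['FlightNumber'].tolist())  # [1, 2]
print(data_falcon9.index.tolist())            # [3, 4]
```

If a 0-based index is preferred, `data_falcon9.reset_index(drop=True)` would discard the old labels; the notebook keeps them, and `to_csv(..., index=False)` drops them at export time anyway.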


## 6. Handle Missing Values

```python
# Check for missing values
print("Missing value count before imputation:")
display(data_falcon9.isnull().sum())

# Calculate mean payload mass
mean_payload_mass = data_falcon9['PayloadMass'].mean()

# Replace NaN values in PayloadMass with the mean
data_falcon9 = data_falcon9.assign(PayloadMass=data_falcon9['PayloadMass'].fillna(mean_payload_mass))

# Verify that PayloadMass no longer has missing values
print("\nMissing value count after imputation:")
display(data_falcon9.isnull().sum())

# Specific check for the PayloadMass column
payload_mass_nulls = data_falcon9['PayloadMass'].isnull().sum()
print(f"\nNumber of missing values in PayloadMass after imputation: {payload_mass_nulls}")
if payload_mass_nulls == 0:
    print("All missing values in PayloadMass have been successfully imputed.")
else:
    print("Warning: There are still missing values in PayloadMass after imputation.")

# Additional check for any remaining missing values in other columns
missing_values = data_falcon9.isnull().sum()
columns_with_missing = missing_values[missing_values > 0]

if not columns_with_missing.empty:
    print("\nWarning: There are still missing values in the dataset.")
    print("Columns with missing values:")
    for column, count in columns_with_missing.items():
        print(f"  {column}: {count} missing values")

    # Optional: display a more detailed view of rows with missing values
    print("\nSample of rows with missing values:")
    display(data_falcon9[data_falcon9.isnull().any(axis=1)].head())
else:
    print("\nAll missing values have been handled successfully.")

# Display the first few rows of the updated DataFrame
print("\nUpdated DataFrame (first few rows):")
display(data_falcon9.head())
```

```
Missing value count before imputation:
```

| Column | Missing values |
|---|---|
| FlightNumber | 0 |
| Date | 0 |
| BoosterVersion | 0 |
| PayloadMass | 5 |
| Orbit | 0 |
| LaunchSite | 0 |
| Outcome | 0 |
| Flights | 0 |
| GridFins | 0 |
| Reused | 0 |

```
Missing value count after imputation:
```

| Column | Missing values |
|---|---|
| FlightNumber | 0 |
| Date | 0 |
| BoosterVersion | 0 |
| PayloadMass | 0 |
| Orbit | 0 |
| LaunchSite | 0 |
| Outcome | 0 |
| Flights | 0 |
| GridFins | 0 |
| Reused | 0 |

```
Number of missing values in PayloadMass after imputation: 0
All missing values in PayloadMass have been successfully imputed.

Columns with missing values:
  LandingPad: 26 missing values

Sample of rows with missing values:
```

| | FlightNumber | Date | BoosterVersion | PayloadMass | Orbit | LaunchSite | Outcome | Flights | GridFins | Reused | Legs | LandingPad | Block | ReusedCount | Serial | Longitude | Latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 1 | 2010-06-04 | Falcon 9 | 6123.547647 | LEO | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B0003 | -80.577366 | 28.561857 |
| 5 | 2 | 2012-05-22 | Falcon 9 | 525.0 | LEO | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B0005 | -80.577366 | 28.561857 |
| 6 | 3 | 2013-03-01 | Falcon 9 | 677.0 | ISS | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B0007 | -80.577366 | 28.561857 |
| 7 | 4 | 2013-09-29 | Falcon 9 | 500.0 | PO | VAFB SLC 4E | False Ocean | 1 | False | False | False | | 1.0 | 0 | B1003 | -120.610829 | 34.632093 |
| 8 | 5 | 2013-12-03 | Falcon 9 | 3170.0 | GTO | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B1004 | -80.577366 | 28.561857 |

```
Updated DataFrame (first few rows):
```

| | FlightNumber | Date | BoosterVersion | PayloadMass | Orbit | LaunchSite | Outcome | Flights | GridFins | Reused | Legs | LandingPad | Block | ReusedCount | Serial | Longitude | Latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 1 | 2010-06-04 | Falcon 9 | 6123.547647 | LEO | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B0003 | -80.577366 | 28.561857 |
| 5 | 2 | 2012-05-22 | Falcon 9 | 525.0 | LEO | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B0005 | -80.577366 | 28.561857 |
| 6 | 3 | 2013-03-01 | Falcon 9 | 677.0 | ISS | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B0007 | -80.577366 | 28.561857 |
| 7 | 4 | 2013-09-29 | Falcon 9 | 500.0 | PO | VAFB SLC 4E | False Ocean | 1 | False | False | False | | 1.0 | 0 | B1003 | -120.610829 | 34.632093 |
| 8 | 5 | 2013-12-03 | Falcon 9 | 3170.0 | GTO | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B1004 | -80.577366 | 28.561857 |
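
The value 6123.547647 in the first row is therefore an imputed mean over the Falcon 9 launches with a known payload mass, not a measured value. The mechanics of the imputation can be seen on a hypothetical five-value series (the masses below are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical payload masses in kg; NaN marks launches with no recorded mass
payload = pd.Series([np.nan, 525.0, 677.0, np.nan, 500.0], name='PayloadMass')

# Series.mean() skips NaN by default, so only the three known values are averaged
mean_mass = payload.mean()
filled = payload.fillna(mean_mass)

print(round(mean_mass, 2))    # 567.33
print(filled.isnull().sum())  # 0
```

Mean imputation preserves the column mean but shrinks its variance, since every filled row gets the same value; the notebook applies it only to `PayloadMass` and leaves `LandingPad` missing, where `None` reflects that no landing pad was used.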


## 7. Export Processed Data

```python
# Save processed DataFrame to CSV file
data_falcon9.to_csv('dataset_part_1.csv', index=False)
```
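
A quick way to sanity-check the export is to write and read it back. This sketch uses an in-memory buffer and a hypothetical two-row frame in place of `data_falcon9`, so it runs without touching the file system.

```python
import io

import pandas as pd

# Hypothetical two-row frame standing in for data_falcon9
frame = pd.DataFrame({
    'FlightNumber': [1, 2],
    'PayloadMass': [6123.547647, 525.0],
    'LandingPad': [None, None],
})

buffer = io.StringIO()
frame.to_csv(buffer, index=False)  # index=False: row labels are not written
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(list(reloaded.columns))                 # ['FlightNumber', 'PayloadMass', 'LandingPad']
print(reloaded['LandingPad'].isnull().all())  # True
```

Passing `index=False` is what keeps an extra `Unnamed: 0` column from appearing when the CSV is read back: pandas gives that name to an unlabeled index column written by `to_csv`. Missing values such as `None` are written as empty fields and come back as `NaN`.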