<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo">
    </a>
</p>


# **SpaceX Falcon 9 First Stage Landing Prediction**


# Data Collection: SpaceX API


Date completed: 8 September 2024


In this capstone, I predict whether the Falcon 9 first stage will land successfully. SpaceX advertises Falcon 9 launches on its website at a cost of 62 million dollars, while other providers charge upward of 165 million dollars per launch. Much of the saving comes from SpaceX's ability to reuse the first stage.

Therefore, if I can determine whether the first stage will land, I can estimate the cost of a launch.

This information is useful to a competing company that wants to bid against SpaceX for a rocket launch.

In this phase, I collect the data from an API and make sure it is in the correct format.

The following is an example of a successful landing.


![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/lab_v2/images/landing_1.gif)


Most unsuccessful landings are planned: SpaceX performs a controlled landing in the ocean.


## Objectives


- Send a GET request to the SpaceX API.
- Data wrangling: Clean and format the retrieved data.


----


## Import Libraries


In [55]:
# Requests allows HTTP requests to get data from an API
import requests
# Pandas for data manipulation and analysis.
import pandas as pd
# NumPy adds support for large, multi-dimensional arrays and matrices
import numpy as np
# Datetime represents dates
import datetime

# Print all columns of a dataframe
pd.set_option('display.max_columns', None)

# Print all of the data in a feature
pd.set_option('display.max_colwidth', None)

print("Libraries successfully imported")

Libraries successfully imported


## Define Auxiliary Functions


Helper functions for extracting information using identification numbers in the launch data.


In [56]:
# Takes the dataset and uses the rocket column to call the API and append the data to the list
def getBoosterVersion(data):
    for x in data['rocket']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/rockets/"+str(x)).json()
            BoosterVersion.append(response['name'])

From the <code>launchpad</code> I would like to know the name of the launch site being used, the longitude, and the latitude.


In [57]:
# Takes the dataset and uses the launchpad column to call the API and append the data to the list
def getLaunchSite(data):
    for x in data['launchpad']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/launchpads/"+str(x)).json()
            Longitude.append(response['longitude'])
            Latitude.append(response['latitude'])
            LaunchSite.append(response['name'])
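
The helpers above send one HTTP request per row, even though many launches share the same rocket and launchpad IDs. One way to reduce the number of calls is to cache each lookup by ID. The sketch below is only an illustration: `make_cached_lookup` and `fake_fetch` are hypothetical names that do not appear in this notebook, and `fake_fetch` stands in for a real call such as `requests.get(...).json()` so that the example runs offline.

```python
from typing import Callable, Dict


def make_cached_lookup(fetch: Callable[[str], dict]) -> Callable[[str], dict]:
    """Wrap a fetch function so that each distinct ID is fetched only once."""
    cache: Dict[str, dict] = {}

    def lookup(item_id: str) -> dict:
        if item_id not in cache:
            cache[item_id] = fetch(item_id)
        return cache[item_id]

    return lookup


# Hypothetical stand-in for the API call; it records every ID it is asked for.
calls = []


def fake_fetch(item_id: str) -> dict:
    calls.append(item_id)
    return {"name": f"rocket-{item_id}"}


lookup = make_cached_lookup(fake_fetch)
names = [lookup(x)["name"] for x in ["a", "a", "b", "a"]]
print(names)
print(calls)  # only the distinct IDs reach fake_fetch
```

With four rows and two distinct IDs, the wrapped function is called twice instead of four times.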

From the <code>payloads</code> column I would like to learn the mass of the payload and the orbit that it is going to.


In [58]:
# Takes the dataset and uses the payloads column to call the API and append the payload mass and orbit to the lists
def getPayloadData(data):
    for load in data['payloads']:
        if load:
            response = requests.get("https://api.spacexdata.com/v4/payloads/"+load).json()
            PayloadMass.append(response['mass_kg'])
            Orbit.append(response['orbit'])

From <code>cores</code> I would like to learn the outcome of the landing, the type of the landing, the number of flights with that core, whether gridfins were used, whether the core is reused, whether legs were used, the landing pad used, the block of the core (a number used to separate versions of cores), the number of times this specific core has been reused, and the serial of the core.


In [59]:
# Takes the dataset and uses the cores column to call the API and append the data to the lists
def getCoreData(data):
    for core in data['cores']:
        if core['core'] is not None:
            response = requests.get("https://api.spacexdata.com/v4/cores/"+core['core']).json()
            Block.append(response['block'])
            ReusedCount.append(response['reuse_count'])
            Serial.append(response['serial'])
        else:
            # No core ID: keep the lists aligned with placeholder values
            Block.append(None)
            ReusedCount.append(None)
            Serial.append(None)
        Outcome.append(str(core['landing_success'])+' '+str(core['landing_type']))
        Flights.append(core['flight'])
        GridFins.append(core['gridfins'])
        Reused.append(core['reused'])
        Legs.append(core['legs'])
        LandingPad.append(core['landpad'])
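
`getCoreData` builds each `Outcome` label by joining `landing_success` and `landing_type` as strings, so a launch with no landing attempt is recorded as `None None`. The short illustration below uses two made-up core records: `outcome_label` is a hypothetical helper and the values in `drone_ship` are illustrative, not taken from the API response.

```python
def outcome_label(core: dict) -> str:
    """Join landing_success and landing_type the same way getCoreData does."""
    return str(core["landing_success"]) + " " + str(core["landing_type"])


no_attempt = {"landing_success": None, "landing_type": None}
drone_ship = {"landing_success": True, "landing_type": "ASDS"}  # illustrative values

labels = [outcome_label(no_attempt), outcome_label(drone_ship)]
print(labels)
```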

## Task 1: Request and parse the SpaceX launch data using the GET request


In [60]:
# Request and parse the SpaceX launch data using a GET request
static_json_url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/API_call_spacex_api.json'
response = requests.get(static_json_url)

# Convert response to JSON
response_json = response.json()

# Normalize the JSON data
data_initial = pd.json_normalize(response_json)

# Display the first row
data_initial.head(1)

Unnamed: 0,static_fire_date_utc,static_fire_date_unix,tbd,net,window,rocket,success,details,crew,ships,capsules,payloads,launchpad,auto_update,failures,flight_number,name,date_utc,date_unix,date_local,date_precision,upcoming,cores,id,fairings.reused,fairings.recovery_attempt,fairings.recovered,fairings.ships,links.patch.small,links.patch.large,links.reddit.campaign,links.reddit.launch,links.reddit.media,links.reddit.recovery,links.flickr.small,links.flickr.original,links.presskit,links.webcast,links.youtube_id,links.article,links.wikipedia,fairings
0,2006-03-17T00:00:00.000Z,1142554000.0,False,False,0.0,5e9d0d95eda69955f709d1eb,False,Engine failure at 33 seconds and loss of vehicle,[],[],[],[5eb0e4b5b6c3bb0006eeb1e1],5e9e4502f5090995de566f86,True,"[{'time': 33, 'altitude': None, 'reason': 'merlin engine failure'}]",1,FalconSat,2006-03-24T22:30:00.000Z,1143239400,2006-03-25T10:30:00+12:00,hour,False,"[{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cd9ffd86e000604b32a,False,False,False,[],https://images2.imgbox.com/3c/0e/T8iJcSN3_o.png,https://images2.imgbox.com/40/e3/GypSkayF_o.png,,,,,[],[],,https://www.youtube.com/watch?v=0a_00nJ_Y88,0a_00nJ_Y88,https://www.space.com/2196-spacex-inaugural-falcon-1-rocket-lost-launch.html,https://en.wikipedia.org/wiki/DemoSat,
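
The dotted column names in this output, such as `fairings.reused` and `links.patch.small`, come from `pd.json_normalize`, which flattens nested dictionaries but leaves lists such as `cores` and `payloads` as single cells. A minimal sketch on a hand-written record (the values are placeholders, not API data):

```python
import pandas as pd

# One hand-written record with nested dictionaries and a list
sample = [
    {
        "flight_number": 1,
        "payloads": ["p1"],
        "links": {"patch": {"small": "small.png"}, "webcast": "video"},
        "fairings": {"reused": False},
    }
]

flat = pd.json_normalize(sample)
columns = sorted(flat.columns)
print(columns)
print(flat.loc[0, "payloads"])  # the list is kept intact in one cell
```

This is why the cleaning step that follows still has to unpack the `cores` and `payloads` lists by hand.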


## Data cleaning



Many of the columns contain only IDs. For example, the <code>rocket</code> column holds an identification number rather than any information about the rocket.

I will now use the API again to look up details for each launch from these IDs. Specifically, I will use the columns <code>rocket</code>, <code>payloads</code>, <code>launchpad</code>, and <code>cores</code>.

In [61]:
# Take a subset of the dataframe, keeping only the features I want plus flight_number and date_utc.
data = data_initial[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']]

# Remove rows with multiple cores (Falcon rockets with 2 extra boosters) and rows with multiple payloads in a single rocket.
data = data[data['cores'].map(len)==1]
data = data[data['payloads'].map(len)==1]

# Since payloads and cores are now lists of size 1, extract the single value in each list and replace the feature.
data['cores'] = data['cores'].map(lambda x : x[0])
data['payloads'] = data['payloads'].map(lambda x : x[0])

# Convert date_utc to a datetime datatype, then keep only the date and drop the time
data['date'] = pd.to_datetime(data['date_utc']).dt.date

# Using the date, restrict the launches to those on or before 13 November 2020
data = data[data['date'] <= datetime.date(2020, 11, 13)]
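
The effect of these filters is easier to see on a small example. The sketch below applies the same steps to a made-up four-row frame (`toy` and its values are illustrative only). It also takes an explicit `.copy()` after filtering, which is my own addition to keep the later column assignments unambiguous.

```python
import datetime

import pandas as pd

toy = pd.DataFrame(
    {
        "cores": [[{"core": "c1"}], [{"core": "c2"}, {"core": "c3"}], [{"core": "c4"}], [{"core": "c5"}]],
        "payloads": [["p1"], ["p2"], ["p3", "p4"], ["p5"]],
        "date_utc": [
            "2020-01-01T10:00:00.000Z",
            "2020-06-01T10:00:00.000Z",
            "2020-07-01T10:00:00.000Z",
            "2021-01-01T10:00:00.000Z",
        ],
    }
)

# Keep rows with exactly one core and one payload
single = toy[(toy["cores"].map(len) == 1) & (toy["payloads"].map(len) == 1)].copy()

# Unwrap the single-element lists
single["cores"] = single["cores"].map(lambda x: x[0])
single["payloads"] = single["payloads"].map(lambda x: x[0])

# Keep only the calendar date and apply the cutoff
single["date"] = pd.to_datetime(single["date_utc"]).dt.date
single = single[single["date"] <= datetime.date(2020, 11, 13)]
print(single[["cores", "payloads", "date"]])
```

Only the first row survives: the second has two cores, the third has two payloads, and the fourth is dated after the cutoff.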

* From the <code>rocket</code> I would like to learn the booster name.

* From the <code>payloads</code> I would like to learn the mass of the payload and the orbit that it is going to.

* From the <code>launchpad</code> I would like to know the name of the launch site being used, the longitude, and the latitude.

* From <code>cores</code> I would like to learn the outcome of the landing, the type of the landing, the number of flights with that core, whether gridfins were used, whether the core is reused, whether legs were used, the landing pad used, the block of the core (a number used to separate versions of cores), the number of times this specific core has been reused, and the serial of the core.

The data from these requests will be stored in lists and will be used to create a new dataframe.


In [62]:
# Setting global variables as empty lists

BoosterVersion = []
PayloadMass = []
Orbit = []
LaunchSite = []
Outcome = []
Flights = []
GridFins = []
Reused = []
Legs = []
LandingPad = []
Block = []
ReusedCount = []
Serial = []
Longitude = []
Latitude = []

These functions append their results to the global lists above. Let's take a look at the <code>BoosterVersion</code> variable. Before I apply <code>getBoosterVersion</code>, the list is empty:


In [63]:
# Confirm that first list is empty
BoosterVersion

[]

Now, let's apply the <code>getBoosterVersion</code> function to get the booster versions.


In [64]:
# Call getBoosterVersion
getBoosterVersion(data)

The list has now been updated.


I can apply the rest of the functions here:


In [65]:
# Call getLaunchSite
getLaunchSite(data)

In [66]:
# Call getPayloadData
getPayloadData(data)

In [67]:
# Call getCoreData
getCoreData(data)

In [68]:
BoosterVersion[0:5]

['Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 9']

In [69]:
# Construct the dataset from the obtained data by combining the columns into a dictionary.
launch_dict = {'FlightNumber': list(data['flight_number']),
                'Date': list(data['date']),
                'BoosterVersion':BoosterVersion,
                'PayloadMass':PayloadMass,
                'Orbit':Orbit,
                'LaunchSite':LaunchSite,
                'Outcome':Outcome,
                'Flights':Flights,
                'GridFins':GridFins,
                'Reused':Reused,
                'Legs':Legs,
                'LandingPad':LandingPad,
                'Block':Block,
                'ReusedCount':ReusedCount,
                'Serial':Serial,
                'Longitude': Longitude,
                'Latitude': Latitude}
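
`pd.DataFrame` raises a `ValueError` when the lists in a dictionary have different lengths. That could happen here if one helper skipped a row, for example through the `if x:` guards, which append nothing for an empty ID. A quick check before building the frame can point to the column at fault. `mismatched_columns` is a hypothetical helper written for this sketch, and the two small dictionaries are made up.

```python
def mismatched_columns(columns: dict) -> dict:
    """Return {name: length} for columns whose length differs from the first column's."""
    lengths = {name: len(values) for name, values in columns.items()}
    expected = next(iter(lengths.values()), 0)
    return {name: n for name, n in lengths.items() if n != expected}


ok = {"FlightNumber": [1, 2], "Orbit": ["LEO", "GTO"]}
bad = {"FlightNumber": [1, 2], "Orbit": ["LEO"]}

print(mismatched_columns(ok))
print(mismatched_columns(bad))
```

Calling `mismatched_columns(launch_dict)` and confirming that it returns an empty dictionary would be one way to apply the same check to the notebook's data.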

In [70]:
# Creating a Pandas DataFrame from the dictionary launch_dict
launch_df = pd.DataFrame(launch_dict)
launch_df.head(3)

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
0,1,2006-03-24,Falcon 1,20.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin1A,167.743129,9.047721
1,2,2007-03-21,Falcon 1,,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin2A,167.743129,9.047721
2,4,2008-09-28,Falcon 1,165.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin2C,167.743129,9.047721


### Task 2: Filter the dataframe to only include `Falcon 9` launches


In [71]:
# Count types of booster versions.
launch_df['BoosterVersion'].value_counts()

BoosterVersion,count
Falcon 9,90
Falcon 1,4


In [72]:
# Filter out all launches except those with the Falcon 9 booster.
data_falcon_9 = launch_df.loc[launch_df['BoosterVersion'].isin(['Falcon 9'])]
data_falcon_9.head(2)

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
4,6,2010-06-04,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
5,8,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857


Now that I have removed some rows, I will reset the FlightNumber column.


In [73]:
# Confirming that only the Falcon 9 booster is included.
data_falcon_9['BoosterVersion'].value_counts()

BoosterVersion,count
Falcon 9,90


In [74]:
# Reset the FlightNumber column
data_falcon_9.loc[:,'FlightNumber'] = list(range(1, data_falcon_9.shape[0]+1))
data_falcon_9.head(2)

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
4,1,2010-06-04,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
5,2,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857
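
The filter-and-renumber step can be reproduced on a small made-up frame. This sketch takes an explicit `.copy()` of the filtered rows, which is an assumption of mine rather than what the cells above do, so that later assignments clearly modify an independent frame. Note that only the `FlightNumber` values are renumbered; the index labels keep their original values, as in the output above.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "FlightNumber": [1, 2, 4, 6, 8],
        "BoosterVersion": ["Falcon 1", "Falcon 1", "Falcon 1", "Falcon 9", "Falcon 9"],
    }
)

# Keep only Falcon 9 rows as an independent copy
falcon_9 = toy[toy["BoosterVersion"] == "Falcon 9"].copy()

# Renumber the flights starting from 1
falcon_9["FlightNumber"] = list(range(1, len(falcon_9) + 1))
print(falcon_9)
```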


In [75]:
# Summary statistics
data_falcon_9.describe()

Unnamed: 0,FlightNumber,PayloadMass,Flights,Block,ReusedCount,Longitude,Latitude
count,90.0,85.0,90.0,90.0,90.0,90.0,90.0
mean,45.5,6123.547647,1.788889,3.5,3.188889,-86.366477,29.449963
std,26.124701,4870.916417,1.213172,1.595288,4.194417,14.149518,2.141306
min,1.0,350.0,1.0,1.0,0.0,-120.610829,28.561857
25%,23.25,2482.0,1.0,2.0,0.0,-80.603956,28.561857
50%,45.5,4535.0,1.0,4.0,1.0,-80.577366,28.561857
75%,67.75,9600.0,2.0,5.0,4.0,-80.577366,28.608058
max,90.0,15600.0,6.0,5.0,13.0,-80.577366,34.632093


## Data Wrangling


In [76]:
# Identify missing values in the dataset
data_falcon_9.isnull().sum()

Unnamed: 0,0
FlightNumber,0
Date,0
BoosterVersion,0
PayloadMass,5
Orbit,0
LaunchSite,0
Outcome,0
Flights,0
GridFins,0
Reused,0


Before I can continue, I must deal with these missing values. The <code>LandingPad</code> column will retain <code>None</code> values to represent launches where no landing pad was used.


### Task 3: Dealing with Missing Values


I calculate the mean of <code>PayloadMass</code> using <code>.mean()</code>, then use <code>.replace()</code> to replace the `np.nan` values in that column with the calculated mean.

In [77]:
# Calculate the mean value of PayloadMass column
meanval = data_falcon_9['PayloadMass'].mean()

# Work on an explicit copy so the assignment below does not trigger SettingWithCopyWarning
data_falcon_9 = data_falcon_9.copy()

# Replace the np.nan values with the mean value
data_falcon_9['PayloadMass'] = data_falcon_9['PayloadMass'].replace(np.nan, meanval)
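
Mean imputation is easy to verify on a small series. In the sketch below (the numbers are made up), `.mean()` skips the missing value, so the mean of 500 and 1500 is 1000 and the gap is filled with 1000. `fillna` is shown alongside `replace` as an equivalent way to write the same step.

```python
import numpy as np
import pandas as pd

mass = pd.Series([500.0, np.nan, 1500.0])

mean_mass = mass.mean()  # NaN is skipped by default
filled_replace = mass.replace(np.nan, mean_mass)
filled_fillna = mass.fillna(mean_mass)

print(mean_mass)
print(filled_replace.tolist())
```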


In [78]:
data_falcon_9.isnull().sum()

Unnamed: 0,0
FlightNumber,0
Date,0
BoosterVersion,0
PayloadMass,0
Orbit,0
LaunchSite,0
Outcome,0
Flights,0
GridFins,0
Reused,0


The number of missing values in <code>PayloadMass</code> is now zero.


The dataset now has no missing values except in <code>LandingPad</code>.


I can now export it to a <b>CSV</b> for the next section.


In [79]:
# Export DataFrame as .csv
data_falcon_9.to_csv('dataset_part_1.csv', index=False)
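
When the CSV is read back in the next section, the `Date` column arrives as plain text unless it is parsed explicitly. The sketch below shows this on a two-row made-up frame, writing to an in-memory buffer instead of a file so it can run anywhere. The use of `parse_dates` is a suggestion of mine, not something the notebook does.

```python
import datetime
import io

import pandas as pd

toy = pd.DataFrame(
    {
        "FlightNumber": [1, 2],
        "Date": [datetime.date(2010, 6, 4), datetime.date(2012, 5, 22)],
    }
)

# Write to an in-memory buffer rather than to disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Read back without and with date parsing
buffer.seek(0)
plain = pd.read_csv(buffer)
buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates=["Date"])

print(plain["Date"].iloc[0])
print(parsed["Date"].iloc[0])
```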

## Authors


<a href="https://www.linkedin.com/in/joseph-s-50398b136/">Joseph Santarcangelo</a> has a PhD in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.




Copyright © 2021 IBM Corporation. All rights reserved.
