# **SpaceX Falcon 9 First Stage Landing Prediction**


# Collecting the data


## Overview

In this project, the goal is to predict whether the Falcon 9 first stage will land successfully. SpaceX offers Falcon 9 rocket launches at a cost of **\$62 million**, significantly lower than competitors, whose prices start at **\$165 million**. This cost reduction is primarily due to SpaceX's ability to reuse the first stage. By predicting whether the first stage will land, we can estimate the cost of a launch. This information could be valuable for companies looking to bid against SpaceX for rocket launches. In this section, data will be collected from an API and checked for proper formatting. Below is an example of a successful landing.

![](./images/landing_1.gif)


Several examples of unsuccessful landings are shown here:


![](./images/crash.gif)


Most unsuccessful landings are planned: SpaceX performs a controlled landing in the ocean.


## Objectives


In this section, a `get` request will be made to the SpaceX API, followed by basic data wrangling and formatting to ensure the data is in the correct structure.

*   Request to the SpaceX API
*   Clean the requested data


***


## Import Libraries and Define Auxiliary Functions


In [1]:
import requests
import pandas as pd
import numpy as np
import datetime
import os
import spacex

from spacex.config import RAW_DATA_DIR, INTERIM_DATA_DIR, PROCESSED_DATA_DIR

pd.set_option('display.max_columns', None)    # Show all columns of a dataframe when printing
pd.set_option('display.max_colwidth', None)   # Show the full contents of each cell without truncation

Below, a series of helper functions will be defined to facilitate the use of the API for extracting information using identification numbers from the launch data.

The booster name will be retrieved from the `rocket` column.

In [2]:
# Takes the dataset and uses the rocket column to call the API and append the booster name to the list
def getBoosterVersion(data):
    for x in data['rocket']:
        if x:
            response = requests.get(f"https://api.spacexdata.com/v4/rockets/{x}").json()
            BoosterVersion.append(response['name'])
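
Many launches share the same rocket ID, and the function above silently skips a row with a missing ID, which could leave the result list shorter than the dataframe. The following is a minimal sketch of one way to handle both cases. It assumes a hypothetical `fetch` callable that maps an ID to the decoded JSON dictionary (for example, a thin wrapper around `requests.get(...).json()`). The names `lookup_names` and `fake_fetch` are illustrative and not part of the original notebook.

```python
def lookup_names(ids, fetch):
    """Return one name per ID, reusing results for repeated IDs.

    `fetch` is a hypothetical callable: ID -> dict with a 'name' key.
    Missing IDs yield None so the output stays aligned with the input.
    """
    cache = {}
    names = []
    for item_id in ids:
        if not item_id:
            names.append(None)  # keep row alignment for missing IDs
            continue
        if item_id not in cache:
            cache[item_id] = fetch(item_id)["name"]
        names.append(cache[item_id])
    return names


# Stand-in for the real API so the sketch runs offline.
calls = []

def fake_fetch(item_id):
    calls.append(item_id)
    return {"name": f"Rocket-{item_id}"}


result = lookup_names(["a", "a", None, "b"], fake_fetch)
print(result)      # ['Rocket-a', 'Rocket-a', None, 'Rocket-b']
print(len(calls))  # 2
```

Returning a list instead of appending to a global also makes the helper safe to re-run without accumulating duplicate entries.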


From the `launchpad` column, the name of the launch site, its longitude, and its latitude will be retrieved.


In [3]:
# Takes the dataset and uses the launchpad column to call the API and append the data to the lists
def getLaunchSite(data):
    for x in data['launchpad']:
        if x:
            response = requests.get(f"https://api.spacexdata.com/v4/launchpads/{x}").json()
            Longitude.append(response['longitude'])
            Latitude.append(response['latitude'])
            LaunchSite.append(response['name'])

The mass of the payload, its intended orbit, and the customer are derived from the <code>payloads</code> column.


In [4]:
# Takes the dataset and uses the payloads column to call the API and append the data to the lists
def getPayloadData(data):
    for load in data['payloads']:
        if load:
            response = requests.get(f"https://api.spacexdata.com/v4/payloads/{load}").json()
            PayloadMass.append(response['mass_kg'])
            Orbit.append(response['orbit'])
            # Fall back to 'Unknown' when the customers field is missing or empty
            customers = response.get('customers') or ['Unknown']
            Customer.append(customers[0])

The following information will be retrieved from the `cores` column:

- The outcome of the landing
- The type of the landing
- The number of flights with that core
- Whether gridfins were used
- Whether the core is reused
- Whether legs were used
- The landing pad used
- The block of the core (a number used to separate versions of cores)
- The number of times this specific core has been reused
- The serial of the core

In [5]:
# Takes the dataset and uses the cores column to call the API and append the data to the lists
def getCoreData(data):
    for core in data['cores']:
        if core['core'] is not None:
            response = requests.get(f"https://api.spacexdata.com/v4/cores/{core['core']}").json()
            Block.append(response['block'])
            ReusedCount.append(response['reuse_count'])
            Serial.append(response['serial'])
        else:
            # No core ID: keep the lists aligned with placeholder values
            Block.append(None)
            ReusedCount.append(None)
            Serial.append(None)
        # These fields come directly from the launch record, not from the cores endpoint
        Outcome.append(str(core['landing_success'])+' '+str(core['landing_type']))
        Flights.append(core['flight'])
        GridFins.append(core['gridfins'])
        Reused.append(core['reused'])
        Legs.append(core['legs'])
        LandingPad.append(core['landpad'])
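
The `Outcome` value is built by joining two fields as text, so a launch with no landing attempt ends up with the label `None None`. A small sketch of that encoding follows, using hand-written dictionaries in place of API records. The helper name `outcome_label` is illustrative only.

```python
def outcome_label(core):
    """Combine landing_success and landing_type the same way getCoreData does."""
    return str(core["landing_success"]) + " " + str(core["landing_type"])


examples = [
    {"landing_success": True, "landing_type": "ASDS"},
    {"landing_success": False, "landing_type": "Ocean"},
    {"landing_success": None, "landing_type": None},
]
labels = [outcome_label(c) for c in examples]
print(labels)  # ['True ASDS', 'False Ocean', 'None None']
```

Because the label is a plain string, any later step that needs a success flag has to parse it again or compare against the full set of labels.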

Rocket launch data will now be requested from the SpaceX API using the following URL:


In [6]:
spacex_url="https://api.spacexdata.com/v4/launches/past"

In [7]:
response = requests.get(spacex_url)

### Task 1: Request and parse the SpaceX launch data using the GET request


The request should have succeeded, which is indicated by a 200 status code:


In [8]:
response.status_code

200
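
Inspecting the status code by eye works in a notebook, but a script would usually fail fast instead. The sketch below assumes a `get` callable with the same interface as `requests.get` (accepting a `timeout` keyword and returning an object with `raise_for_status()` and `json()`). `fetch_json`, `FakeResponse`, and `fake_get` are hypothetical names used so the example runs offline.

```python
def fetch_json(url, get, timeout=10):
    """Fetch `url` with a requests-style `get` callable and return decoded JSON.

    Raises on an HTTP error status instead of returning an error payload.
    """
    response = get(url, timeout=timeout)
    response.raise_for_status()  # does nothing on success, raises on 4xx/5xx
    return response.json()


class FakeResponse:
    """Offline stand-in mimicking the parts of requests.Response used above."""
    status_code = 200

    def raise_for_status(self):
        if self.status_code >= 400:
            raise RuntimeError(f"HTTP {self.status_code}")

    def json(self):
        return [{"flight_number": 1}]


def fake_get(url, timeout=None):
    return FakeResponse()


launches = fetch_json("https://api.spacexdata.com/v4/launches/past", fake_get)
print(launches)  # [{'flight_number': 1}]
```

With the real library, the call would be `fetch_json(spacex_url, requests.get)`.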

Now we decode the response content as JSON using <code>.json()</code> and turn it into a pandas dataframe using <code>pd.json_normalize()</code>.


In [9]:
# Use the json_normalize method to convert the JSON result into a dataframe

data = pd.json_normalize(response.json())
display(data.head())

Unnamed: 0,static_fire_date_utc,static_fire_date_unix,net,window,rocket,success,failures,details,crew,ships,capsules,payloads,launchpad,flight_number,name,date_utc,date_unix,date_local,date_precision,upcoming,cores,auto_update,tbd,launch_library_id,id,fairings.reused,fairings.recovery_attempt,fairings.recovered,fairings.ships,links.patch.small,links.patch.large,links.reddit.campaign,links.reddit.launch,links.reddit.media,links.reddit.recovery,links.flickr.small,links.flickr.original,links.presskit,links.webcast,links.youtube_id,links.article,links.wikipedia,fairings
0,2006-03-17T00:00:00.000Z,1142554000.0,False,0.0,5e9d0d95eda69955f709d1eb,False,"[{'time': 33, 'altitude': None, 'reason': 'merlin engine failure'}]",Engine failure at 33 seconds and loss of vehicle,[],[],[],[5eb0e4b5b6c3bb0006eeb1e1],5e9e4502f5090995de566f86,1,FalconSat,2006-03-24T22:30:00.000Z,1143239400,2006-03-25T10:30:00+12:00,hour,False,"[{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cd9ffd86e000604b32a,False,False,False,[],https://images2.imgbox.com/94/f2/NN6Ph45r_o.png,https://images2.imgbox.com/5b/02/QcxHUb5V_o.png,,,,,[],[],,https://www.youtube.com/watch?v=0a_00nJ_Y88,0a_00nJ_Y88,https://www.space.com/2196-spacex-inaugural-falcon-1-rocket-lost-launch.html,https://en.wikipedia.org/wiki/DemoSat,
1,,,False,0.0,5e9d0d95eda69955f709d1eb,False,"[{'time': 301, 'altitude': 289, 'reason': 'harmonic oscillation leading to premature engine shutdown'}]","Successful first stage burn and transition to second stage, maximum altitude 289 km, Premature engine shutdown at T+7 min 30 s, Failed to reach orbit, Failed to recover first stage",[],[],[],[5eb0e4b6b6c3bb0006eeb1e2],5e9e4502f5090995de566f86,2,DemoSat,2007-03-21T01:10:00.000Z,1174439400,2007-03-21T13:10:00+12:00,hour,False,"[{'core': '5e9e289ef35918416a3b2624', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cdaffd86e000604b32b,False,False,False,[],https://images2.imgbox.com/f9/4a/ZboXReNb_o.png,https://images2.imgbox.com/80/a2/bkWotCIS_o.png,,,,,[],[],,https://www.youtube.com/watch?v=Lk4zQ2wP-Nc,Lk4zQ2wP-Nc,https://www.space.com/3590-spacex-falcon-1-rocket-fails-reach-orbit.html,https://en.wikipedia.org/wiki/DemoSat,
2,,,False,0.0,5e9d0d95eda69955f709d1eb,False,"[{'time': 140, 'altitude': 35, 'reason': 'residual stage-1 thrust led to collision between stage 1 and stage 2'}]",Residual stage 1 thrust led to collision between stage 1 and stage 2,[],[],[],"[5eb0e4b6b6c3bb0006eeb1e3, 5eb0e4b6b6c3bb0006eeb1e4]",5e9e4502f5090995de566f86,3,Trailblazer,2008-08-03T03:34:00.000Z,1217734440,2008-08-03T15:34:00+12:00,hour,False,"[{'core': '5e9e289ef3591814873b2625', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cdbffd86e000604b32c,False,False,False,[],https://images2.imgbox.com/6c/cb/na1tzhHs_o.png,https://images2.imgbox.com/4a/80/k1oAkY0k_o.png,,,,,[],[],,https://www.youtube.com/watch?v=v0w9p3U8860,v0w9p3U8860,http://www.spacex.com/news/2013/02/11/falcon-1-flight-3-mission-summary,https://en.wikipedia.org/wiki/Trailblazer_(satellite),
3,2008-09-20T00:00:00.000Z,1221869000.0,False,0.0,5e9d0d95eda69955f709d1eb,True,[],"Ratsat was carried to orbit on the first successful orbital launch of any privately funded and developed, liquid-propelled carrier rocket, the SpaceX Falcon 1",[],[],[],[5eb0e4b7b6c3bb0006eeb1e5],5e9e4502f5090995de566f86,4,RatSat,2008-09-28T23:15:00.000Z,1222643700,2008-09-28T11:15:00+12:00,hour,False,"[{'core': '5e9e289ef3591855dc3b2626', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cdbffd86e000604b32d,False,False,False,[],https://images2.imgbox.com/95/39/sRqN7rsv_o.png,https://images2.imgbox.com/a3/99/qswRYzE8_o.png,,,,,[],[],,https://www.youtube.com/watch?v=dLQ2tZEH6G0,dLQ2tZEH6G0,https://en.wikipedia.org/wiki/Ratsat,https://en.wikipedia.org/wiki/Ratsat,
4,,,False,0.0,5e9d0d95eda69955f709d1eb,True,[],,[],[],[],[5eb0e4b7b6c3bb0006eeb1e6],5e9e4502f5090995de566f86,5,RazakSat,2009-07-13T03:35:00.000Z,1247456100,2009-07-13T15:35:00+12:00,hour,False,"[{'core': '5e9e289ef359184f103b2627', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cdcffd86e000604b32e,False,False,False,[],https://images2.imgbox.com/ab/5a/Pequxd5d_o.png,https://images2.imgbox.com/92/e4/7Cf6MLY0_o.png,,,,,[],[],http://www.spacex.com/press/2012/12/19/spacexs-falcon-1-successfully-delivers-razaksat-satellite-orbit,https://www.youtube.com/watch?v=yTaIDooc8Og,yTaIDooc8Og,http://www.spacex.com/news/2013/02/12/falcon-1-flight-5,https://en.wikipedia.org/wiki/RazakSAT,
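
The dotted column names in the output above (such as `links.patch.small`) come from how `pd.json_normalize` flattens nested dictionaries, while list-valued fields such as `cores` are left as Python lists. A minimal illustration on a hand-made record (the values are invented for the example):

```python
import pandas as pd

records = [
    {
        "name": "Demo",
        "links": {"webcast": "https://example.com/w", "patch": {"small": "s.png"}},
        "cores": [{"flight": 1, "reused": False}],
    }
]

flat = pd.json_normalize(records)
print(sorted(flat.columns))
# ['cores', 'links.patch.small', 'links.webcast', 'name']
print(type(flat.loc[0, "cores"]))  # <class 'list'>
```

This is why the `cores` and `payloads` columns still need to be unpacked by hand in the next step.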


It should be noted that much of the data consists of IDs. For instance, the rocket column contains only an identification number, with no further details regarding the rocket itself.

The API will now be utilised once again to retrieve information about the launches using the provided IDs for each launch. Specifically, the columns <code>rocket</code>, <code>payloads</code>, <code>launchpad</code>, and <code>cores</code> will be used.


In [10]:
# Take a subset of the dataframe, keeping only the features we want plus flight_number and date_utc.
data = data[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']]

# Remove rows with multiple cores (Falcon rockets with two extra boosters, i.e. Falcon Heavy) and rows with multiple payloads on a single rocket.
data = data[data['cores'].map(len)==1]
data = data[data['payloads'].map(len)==1]

# Since payloads and cores are lists of size 1, extract the single value in the list and replace the feature.
data['cores'] = data['cores'].map(lambda x : x[0])
data['payloads'] = data['payloads'].map(lambda x : x[0])

# Convert date_utc to a datetime dtype, then split it into separate date and time columns
data['date_utc'] = pd.to_datetime(data['date_utc'])

data['Date'] = data['date_utc'].dt.date
data['Time'] = data['date_utc'].dt.time

# Using the date, restrict the dates of the launches
data = data[data['date_utc'] <= pd.Timestamp('2020-11-13', tz='UTC')]
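
The effect of the three filters in this cell (single core, single payload, date cutoff) can be checked on a toy dataframe. The rows below are invented for illustration, and `.copy()` is used in the sketch only to avoid modifying a slice of `toy`.

```python
import pandas as pd

toy = pd.DataFrame({
    "cores": [[{"core": "a"}], [{"core": "b"}, {"core": "c"}], [{"core": "d"}], [{"core": "e"}]],
    "payloads": [["p1"], ["p2"], ["p3", "p4"], ["p5"]],
    "date_utc": [
        "2019-01-01T00:00:00.000Z",
        "2019-02-01T00:00:00.000Z",
        "2019-03-01T00:00:00.000Z",
        "2021-01-01T00:00:00.000Z",
    ],
})

# Keep rows with exactly one core and exactly one payload
single = toy[(toy["cores"].map(len) == 1) & (toy["payloads"].map(len) == 1)].copy()

# Unwrap the single-element lists
single["cores"] = single["cores"].map(lambda x: x[0])
single["payloads"] = single["payloads"].map(lambda x: x[0])

# Parse the timestamps (timezone-aware, UTC) and apply the cutoff
single["date_utc"] = pd.to_datetime(single["date_utc"])
cutoff = pd.Timestamp("2020-11-13", tz="UTC")
kept = single[single["date_utc"] <= cutoff]

print(kept["payloads"].tolist())  # ['p1']
```

Only the first row survives: the second has two cores, the third has two payloads, and the fourth falls after the cutoff date.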

- The booster name will be obtained from the `rocket` column.

- The mass of the payload, its intended orbit, and the customer will be obtained from the <code>payloads</code> column.

- The name of the launch site, along with its longitude and latitude, will be obtained from the <code>launchpad</code> column.

- The outcome of the landing, the type of landing, the number of flights with that core, whether gridfins were used, whether the core is reused, whether legs were used, the landing pad used, the block of the core (a number used to separate versions of cores), the number of times this specific core has been reused, and the serial of the core will be obtained from the <code>cores</code> column.

The data from these requests will be stored in lists and will subsequently be used to create a new dataframe.



In [11]:
#Global variables 
BoosterVersion = []
PayloadMass = []
Orbit = []
LaunchSite = []
Outcome = []
Flights = []
GridFins = []
Reused = []
Date=[]
Time = []
Legs = []
LandingPad = []
Block = []
Customer=[]
ReusedCount = []
Serial = []
Longitude = []
Latitude = []

The helper functions append their results to the global lists defined above. Taking `BoosterVersion` as an example: before the <code>getBoosterVersion</code> function is called, the list is empty. Calling the function populates it with the booster names.


In [12]:
# Call getBoosterVersion
getBoosterVersion(data)

The list has now been populated:


In [13]:
BoosterVersion[0:5]

['Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 9']

The remaining functions can now be applied:


In [14]:
# Call getLaunchSite
getLaunchSite(data)

In [15]:
# Call getPayloadData
getPayloadData(data)

In [16]:
# Call getCoreData
getCoreData(data)

Finally, the dataset will be constructed using the data that has been obtained. The columns will be combined into a dictionary.

In [17]:
launch_dict = {'Flight_Number': list(data['flight_number']),
'date_utc':list(data['date_utc']),
'Date': list(data['Date']),
'Time': list(data['Time']),
'Booster_Version':BoosterVersion,
'Payload_Mass':PayloadMass,
'Orbit':Orbit,
'Launch_Site':LaunchSite,
'Mission_Outcome':Outcome,
'Flights':Flights,
'GridFins':GridFins,
'Reused':Reused,
'Legs':Legs,
'Customer':Customer,
'Landing_Pad':LandingPad,
'Block':Block,
'Reused_Count':ReusedCount,
'Serial':Serial,
'Longitude': Longitude,
'Latitude': Latitude}


Subsequently, a Pandas DataFrame will be created from the dictionary `launch_dict`.



In [18]:
# Create a dataframe from launch_dict
df = pd.DataFrame(launch_dict)
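
`pd.DataFrame` raises a `ValueError` when the lists in the dictionary have different lengths, which can happen here if one helper skipped a row or a cell was run twice. A small pre-check, sketched below with invented data, reports which columns are out of step. The function name `mismatched` is hypothetical.

```python
def mismatched(columns):
    """Return {column: length} for columns whose length differs from the most common one."""
    lengths = {name: len(values) for name, values in columns.items()}
    if not lengths:
        return {}
    counts = {}
    for n in lengths.values():
        counts[n] = counts.get(n, 0) + 1
    expected = max(counts, key=counts.get)  # most frequent length
    return {name: n for name, n in lengths.items() if n != expected}


toy = {
    "Flights": [1, 2, 3],
    "Orbit": ["LEO", "GTO", "ISS"],
    "Customer": ["A", "B"],  # one entry short
}
print(mismatched(toy))  # {'Customer': 2}
```

Running `mismatched(launch_dict)` before building the dataframe would point at the offending list rather than a generic length error.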

Display a random sample of ten rows from the dataframe:


In [19]:
# Show ten randomly sampled rows of the dataframe
df.sample(10)

Unnamed: 0,Flight_Number,date_utc,Date,Time,Booster_Version,Payload_Mass,Orbit,Launch_Site,Mission_Outcome,Flights,GridFins,Reused,Legs,Customer,Landing_Pad,Block,Reused_Count,Serial,Longitude,Latitude
79,91,2020-03-07 04:50:31+00:00,2020-03-07,04:50:31,Falcon 9,1977.0,ISS,CCSFS SLC 40,True RTLS,2,True,True,True,NASA (CRS),5e9e3032383ecb267a34e7c7,5.0,5,B1059,-80.577366,28.561857
1,2,2007-03-21 01:10:00+00:00,2007-03-21,01:10:00,Falcon 1,,LEO,Kwajalein Atoll,None None,1,False,False,False,DARPA,,,0,Merlin2A,167.743129,9.047721
93,106,2020-11-05 23:24:00+00:00,2020-11-05,23:24:00,Falcon 9,3681.0,MEO,CCSFS SLC 40,True ASDS,1,True,False,True,United States Space Force,5e9e3032383ecb6bb234e7ca,5.0,8,B1062,-80.577366,28.561857
84,96,2020-06-13 09:21:00+00:00,2020-06-13,09:21:00,Falcon 9,15600.0,VLEO,CCSFS SLC 40,True ASDS,3,True,True,True,SpaceX,5e9e3032383ecb6bb234e7ca,5.0,5,B1059,-80.577366,28.561857
78,90,2020-02-17 15:05:55+00:00,2020-02-17,15:05:55,Falcon 9,15600.0,VLEO,CCSFS SLC 40,False ASDS,4,True,True,True,SpaceX,5e9e3032383ecb6bb234e7ca,5.0,3,B1056,-80.577366,28.561857
69,80,2019-06-12 14:17:00+00:00,2019-06-12,14:17:00,Falcon 9,1425.0,SSO,VAFB SLC 4E,True RTLS,2,True,True,True,CSA,5e9e3032383ecb554034e7c9,5.0,12,B1051,-120.610829,34.632093
71,83,2019-08-06 22:52:00+00:00,2019-08-06,22:52:00,Falcon 9,6500.0,GTO,CCSFS SLC 40,None None,3,False,True,False,Spacecom,,5.0,2,B1047,-80.577366,28.561857
19,24,2015-06-28 14:21:00+00:00,2015-06-28,14:21:00,Falcon 9,2477.0,ISS,CCSFS SLC 40,None ASDS,1,True,False,True,NASA (CRS),5e9e3032383ecb6bb234e7ca,1.0,0,B1018,-80.577366,28.561857
81,93,2020-04-22 19:30:00+00:00,2020-04-22,19:30:00,Falcon 9,15600.0,VLEO,KSC LC 39A,True ASDS,4,True,True,True,SpaceX,5e9e3032383ecb6bb234e7ca,5.0,12,B1051,-80.603956,28.608058
36,42,2017-06-23 19:10:00+00:00,2017-06-23,19:10:00,Falcon 9,3669.0,GTO,KSC LC 39A,True ASDS,2,True,True,True,Bulgaria Sat,5e9e3032383ecb6bb234e7ca,3.0,1,B1029,-80.603956,28.608058


### Task 2: Filter the dataframe to only include `Falcon 9` launches


Next, the Falcon 1 launches will be removed so that only the Falcon 9 launches are retained. The data will be filtered on the <code>Booster_Version</code> column, and the result will be saved to a new dataframe called <code>data_falcon9</code>.



In [20]:
# Keep every row whose booster is not 'Falcon 1'; .copy() avoids a SettingWithCopyWarning in the next cell
data_falcon9 = df[df['Booster_Version']!='Falcon 1'].copy()
data_falcon9.head()

Unnamed: 0,Flight_Number,date_utc,Date,Time,Booster_Version,Payload_Mass,Orbit,Launch_Site,Mission_Outcome,Flights,GridFins,Reused,Legs,Customer,Landing_Pad,Block,Reused_Count,Serial,Longitude,Latitude
4,6,2010-06-04 18:45:00+00:00,2010-06-04,18:45:00,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,SpaceX,,1.0,0,B0003,-80.577366,28.561857
5,8,2012-05-22 07:44:00+00:00,2012-05-22,07:44:00,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,NASA(COTS),,1.0,0,B0005,-80.577366,28.561857
6,10,2013-03-01 19:10:00+00:00,2013-03-01,19:10:00,Falcon 9,677.0,ISS,CCSFS SLC 40,None None,1,False,False,False,NASA (CRS),,1.0,0,B0007,-80.577366,28.561857
7,11,2013-09-29 16:00:00+00:00,2013-09-29,16:00:00,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,MDA,,1.0,0,B1003,-120.610829,34.632093
8,12,2013-12-03 22:41:00+00:00,2013-12-03,22:41:00,Falcon 9,3170.0,GTO,CCSFS SLC 40,None None,1,False,False,False,SES,,1.0,0,B1004,-80.577366,28.561857


Now that some rows have been removed, the `Flight_Number` column will be renumbered from 1.



In [21]:
data_falcon9.loc[:,'Flight_Number'] = list(range(1, data_falcon9.shape[0]+1))
data_falcon9

Unnamed: 0,Flight_Number,date_utc,Date,Time,Booster_Version,Payload_Mass,Orbit,Launch_Site,Mission_Outcome,Flights,GridFins,Reused,Legs,Customer,Landing_Pad,Block,Reused_Count,Serial,Longitude,Latitude
4,1,2010-06-04 18:45:00+00:00,2010-06-04,18:45:00,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,SpaceX,,1.0,0,B0003,-80.577366,28.561857
5,2,2012-05-22 07:44:00+00:00,2012-05-22,07:44:00,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,NASA(COTS),,1.0,0,B0005,-80.577366,28.561857
6,3,2013-03-01 19:10:00+00:00,2013-03-01,19:10:00,Falcon 9,677.0,ISS,CCSFS SLC 40,None None,1,False,False,False,NASA (CRS),,1.0,0,B0007,-80.577366,28.561857
7,4,2013-09-29 16:00:00+00:00,2013-09-29,16:00:00,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,MDA,,1.0,0,B1003,-120.610829,34.632093
8,5,2013-12-03 22:41:00+00:00,2013-12-03,22:41:00,Falcon 9,3170.0,GTO,CCSFS SLC 40,None None,1,False,False,False,SES,,1.0,0,B1004,-80.577366,28.561857
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
89,86,2020-09-03 12:46:00+00:00,2020-09-03,12:46:00,Falcon 9,15600.0,VLEO,KSC LC 39A,True ASDS,2,True,True,True,SpaceX,5e9e3032383ecb6bb234e7ca,5.0,12,B1060,-80.603956,28.608058
90,87,2020-10-06 11:29:00+00:00,2020-10-06,11:29:00,Falcon 9,15600.0,VLEO,KSC LC 39A,True ASDS,3,True,True,True,SpaceX,5e9e3032383ecb6bb234e7ca,5.0,13,B1058,-80.603956,28.608058
91,88,2020-10-18 12:25:00+00:00,2020-10-18,12:25:00,Falcon 9,15600.0,VLEO,KSC LC 39A,True ASDS,6,True,True,True,SpaceX,5e9e3032383ecb6bb234e7ca,5.0,12,B1051,-80.603956,28.608058
92,89,2020-10-24 15:31:00+00:00,2020-10-24,15:31:00,Falcon 9,15600.0,VLEO,CCSFS SLC 40,True ASDS,3,True,True,True,SpaceX,5e9e3033383ecbb9e534e7cc,5.0,12,B1060,-80.577366,28.561857
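
In the output above, `Flight_Number` restarts at 1 but the row index still begins at 4, because filtering keeps the original index labels. The sketch below repeats the filter-and-renumber step on invented data and adds an optional `reset_index(drop=True)`, which the original notebook does not do.

```python
import pandas as pd

toy = pd.DataFrame({
    "Flight_Number": [1, 4, 6, 8],
    "Booster_Version": ["Falcon 1", "Falcon 1", "Falcon 9", "Falcon 9"],
})

# Filter, then copy so later assignments do not touch a slice of `toy`
falcon9 = toy[toy["Booster_Version"] != "Falcon 1"].copy()

# Renumber flights consecutively from 1
falcon9["Flight_Number"] = list(range(1, len(falcon9) + 1))

# Optional extra step: make the row index consecutive as well
falcon9 = falcon9.reset_index(drop=True)

print(falcon9["Flight_Number"].tolist())  # [1, 2]
print(falcon9.index.tolist())             # [0, 1]
```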


## Data Wrangling


It can be observed below that some rows in the dataset contain missing values.



In [22]:
data_falcon9.isnull().sum()

Flight_Number       0
date_utc            0
Date                0
Time                0
Booster_Version     0
Payload_Mass        5
Orbit               0
Launch_Site         0
Mission_Outcome     0
Flights             0
GridFins            0
Reused              0
Legs                0
Customer            0
Landing_Pad        26
Block               0
Reused_Count        0
Serial              0
Longitude           0
Latitude            0
dtype: int64

Before proceeding, it is necessary to address these missing values. The `Landing_Pad` column will retain its missing values, which indicate launches where no landing pad was used.



### Task 3: Dealing with Missing Values


The mean of `Payload_Mass` will be calculated using the `.mean()` function. This mean value will then be used with the `.replace()` function to replace the `np.nan` values in that column.


In [23]:
# Calculate the mean value of the Payload_Mass column
Payload_Mass_mean = data_falcon9['Payload_Mass'].mean()
# Replace the np.nan values with its mean value
data_falcon9 = data_falcon9.replace({'Payload_Mass' : np.nan}, Payload_Mass_mean)

The number of missing values in `Payload_Mass` should now be zero.

At this stage, no missing values should remain in the dataset, except for those in `Landing_Pad`.
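
Mean imputation can be verified on a few invented values. The sketch below uses `fillna` on the single column, a common alternative to the `.replace()` call in the cell above, and leaves the `Landing_Pad` gaps untouched.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Payload_Mass": [100.0, np.nan, 300.0],
    "Landing_Pad": [None, "pad-1", None],
})

mean_mass = toy["Payload_Mass"].mean()  # NaN entries are skipped by default
toy["Payload_Mass"] = toy["Payload_Mass"].fillna(mean_mass)

print(mean_mass)  # 200.0
missing = toy.isnull().sum()
print(missing["Payload_Mass"], missing["Landing_Pad"])  # 0 2
```

Note that the mean is computed from the observed values only, so the imputed rows do not shift it.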

The dataset can now be exported as both **CSV** and **pickle** files for the next section.



In [24]:
csv_file = os.path.join(INTERIM_DATA_DIR,'dataset_part_1.csv')
pickle_file = os.path.join(INTERIM_DATA_DIR, 'dataset_part_1.pkl')

# Save as CSV
data_falcon9.to_csv(csv_file, index=False)
# Save as Pickle
data_falcon9.to_pickle(pickle_file)
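
One practical difference between the two formats is that pickle preserves column dtypes, whereas CSV stores everything as text, so datetime columns come back as strings unless they are parsed again on load. A short round-trip on an invented one-row dataframe, written to a temporary directory, shows this:

```python
import os
import tempfile

import pandas as pd

toy = pd.DataFrame({
    "date_utc": pd.to_datetime(["2020-03-07T04:50:31Z"]),
    "Payload_Mass": [1977.0],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "toy.csv")
    pkl_path = os.path.join(tmp, "toy.pkl")
    toy.to_csv(csv_path, index=False)
    toy.to_pickle(pkl_path)
    from_csv = pd.read_csv(csv_path)
    from_pkl = pd.read_pickle(pkl_path)

is_datetime = pd.api.types.is_datetime64_any_dtype
print(is_datetime(from_csv["date_utc"]))  # False: read back as text
print(is_datetime(from_pkl["date_utc"]))  # True: dtype preserved
```

When loading the CSV in the next section, passing `parse_dates=['date_utc']` to `pd.read_csv` restores the datetime dtype.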

Copyright © 2021 IBM Corporation. All rights reserved.
