<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo">
    </a>
</p>


# <center>**SpaceX Falcon 9 First Stage Landing Prediction**</center>


# <center>Practical laboratory 1:</center>

## ``Data Collection`` from the SpaceX API


In this capstone project, we will predict whether the Falcon 9 first stage will land successfully. SpaceX advertises Falcon 9 rocket launches on its website at a cost of $62 million, while other providers charge upwards of $165 million per launch. Much of the savings comes from SpaceX's ability to reuse the first stage. Therefore, if we can determine whether the first stage will land, we can estimate the cost of a launch. This information can be useful to another company that wants to bid against SpaceX for a rocket launch. In this lab, we will collect data from an API and make sure it is in the correct format. The following is an example of a successful landing.

![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/lab_v2/images/landing_1.gif)


Below are several examples of failed landings:

![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/lab_v2/images/crash.gif)


Most failed landings are planned: in those cases SpaceX performs a controlled landing in the ocean.


## Objectives


In this lab, we'll make a GET request to the SpaceX API. We'll also perform some basic data manipulation and formatting.

- SpaceX API Request
- Cleanup of Requested Data

----


## Import libraries and define helper functions


We will import the following libraries into the lab:


In [19]:
# Requests allows us to make HTTP requests that we will use to obtain data from an API.
import requests
# Pandas is a software library written for the Python programming language for data manipulation and analysis.
import pandas as pd
# NumPy is a library for the Python programming language, which adds support for large multidimensional arrays and matrices, along with a large collection of high-level mathematical functions for operating on these arrays.
import numpy as np
# Datetime is a library that allows us to represent dates.
import datetime

# This library is used to suppress warnings.
import warnings
warnings.filterwarnings('ignore')

# Setting this option will print all columns in a data frame.
pd.set_option('display.max_columns', None)
# Setting this option will print the full content of each column without truncating it.
pd.set_option('display.max_colwidth', None)

Next, we'll define a series of helper functions that will allow us to use the API to extract information using ID numbers in the launch data.

From the <code>rocket</code> column, we want to get the name of the booster.


In [20]:
# Take the dataset and use the rocket column to call the API and add the data to the list
def getBoosterVersion(data):
    for x in data['rocket']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/rockets/"+str(x)).json()
            BoosterVersion.append(response['name'])
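
Because the same rocket ID appears in many rows, a helper like the one above requests the same resource repeatedly. The following is a minimal sketch of one way to avoid that; it is not part of the original lab. The names `make_cached_lookup` and `fake_fetch` are hypothetical, and `fake_fetch` stands in for the real `requests.get(...).json()` call so the example runs offline.

```python
def make_cached_lookup(fetch):
    """Wrap a fetch(item_id) -> dict function so each ID is fetched only once."""
    cache = {}

    def lookup(item_id):
        if item_id not in cache:
            cache[item_id] = fetch(item_id)
        return cache[item_id]

    return lookup


# Hypothetical stand-in for the API call; it records every ID it is asked for.
calls = []

def fake_fetch(rocket_id):
    calls.append(rocket_id)
    return {"name": "Falcon 9" if rocket_id == "id_f9" else "Falcon 1"}


lookup = make_cached_lookup(fake_fetch)
names = [lookup(x)["name"] for x in ["id_f1", "id_f9", "id_f9", "id_f1"]]
print(names)
print(len(calls))  # number of fetches actually performed
```

With four rows but only two distinct IDs, the stand-in is called twice. The same wrapper could be placed around a real request function, assuming the API returns the same data for the same ID during a run.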

From the <code>launchpad</code> we would like to know the name of the launch site being used, along with its longitude and latitude.


In [21]:
# Take the dataset and use the launchpad column to call the API and add the data to the list
def getLaunchSite(data):
    for x in data['launchpad']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/launchpads/"+str(x)).json()
            Longitude.append(response['longitude'])
            Latitude.append(response['latitude'])
            LaunchSite.append(response['name'])

From the <code>payload</code> we would like to know the mass of the payload and the orbit it is heading to.

In [22]:
# Take the dataset and use the payloads column to call the API and add the data to the lists
def getPayloadData(data):
    for load in data['payloads']:
        if load:
            response = requests.get("https://api.spacexdata.com/v4/payloads/"+load).json()
            PayloadMass.append(response['mass_kg'])
            Orbit.append(response['orbit'])

From <code>cores</code> we would like to know the landing result, the landing type, the number of flights with that core, if grid fins were used, if the core was reused, if legs were used, the landing platform used, the core block which is a number used to separate the version of cores, the number of times this specific core was reused and the core serial number.


In [None]:
# Take the dataset and use the cores column to call the API and add the data to the lists
def getCoreData(data):
    for core in data['cores']:
        # Block, reuse count and serial require a lookup by core ID;
        # when the launch lists no core, store None instead.
        if core['core'] is not None:
            response = requests.get("https://api.spacexdata.com/v4/cores/"+core['core']).json()
            Block.append(response['block'])
            ReusedCount.append(response['reuse_count'])
            Serial.append(response['serial'])
        else:
            Block.append(None)
            ReusedCount.append(None)
            Serial.append(None)
        # These fields come from the launch record itself, so they are
        # appended for every row to keep all lists the same length.
        Outcome.append(str(core['landing_success'])+' '+str(core['landing_type']))
        Flights.append(core['flight'])
        GridFins.append(core['gridfins'])
        Reused.append(core['reused'])
        Legs.append(core['legs'])
        LandingPad.append(core['landpad'])
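
To make the format of the `Outcome` values concrete, here is a small sketch that applies the same string concatenation used in `getCoreData` to three hand-written core records. The function name `landing_outcome` and the sample dictionaries are illustrative assumptions, not part of the lab.

```python
def landing_outcome(core):
    """Join landing_success and landing_type the same way getCoreData does."""
    return str(core['landing_success']) + ' ' + str(core['landing_type'])


# Hand-written records that mimic the fields of one entry in the cores column.
no_attempt = {'landing_success': None, 'landing_type': None}
ocean_fail = {'landing_success': False, 'landing_type': 'Ocean'}
drone_ship = {'landing_success': True, 'landing_type': 'ASDS'}

outcomes = [landing_outcome(c) for c in (no_attempt, ocean_fail, drone_ship)]
print(outcomes)  # ['None None', 'False Ocean', 'True ASDS']
```

Note that a launch with no landing attempt produces the literal string `'None None'`, so the `Outcome` column never contains missing values even when no landing took place.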

Now let's start requesting rocket launch data from the SpaceX API with the following URL:


In [24]:
spacex_url="https://api.spacexdata.com/v4/launches/past"

In [25]:
response = requests.get(spacex_url)

We check the content of the response:

In [26]:
print(f"Status code: {response.status_code}") # 200 means the request was successful

# We verify that it is the content we are looking for
print(f"The first 100 characters of the response content: {response.content[0:100]}")

Status code: 200
The first 100 characters of the response content: b'[{"fairings":{"reused":false,"recovery_attempt":false,"recovered":false,"ships":[]},"links":{"patch"'
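
Checking the status code by eye works in a notebook, but the check can also be made part of the request itself. The sketch below is one possible wrapper, assuming the `requests` library; `fetch_json`, `FakeResponse` and `fake_get` are hypothetical names, and the fake objects only exist so the example runs without network access.

```python
def fetch_json(url, get=None, timeout=10):
    """GET a URL and return its decoded JSON body, raising on HTTP errors."""
    if get is None:
        import requests  # imported lazily so a stand-in `get` can be used offline
        get = requests.get
    response = get(url, timeout=timeout)
    response.raise_for_status()  # raises an exception for 4xx/5xx responses
    return response.json()


class FakeResponse:
    """Minimal stand-in exposing the two methods fetch_json relies on."""
    status_code = 200

    def raise_for_status(self):
        pass  # a 200 response raises nothing

    def json(self):
        return [{"flight_number": 1}]


def fake_get(url, timeout=None):
    return FakeResponse()


launches = fetch_json("https://api.spacexdata.com/v4/launches/past", get=fake_get)
print(launches)
```

Calling `fetch_json(spacex_url)` without the `get` argument would perform the real request; passing a `timeout` keeps the cell from hanging if the server does not answer.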


We see that the answer contains a lot of information about SpaceX launches. Next, let's try to uncover more information relevant to this project.


### Task 1: We will request and analyze SpaceX launch data using the GET request


To make the requested JSON results more consistent, we will use the following static response object for this project:


In [27]:
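static_json_url='https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/API_call_spacex_api.json'
# Request the static copy so that later cells work with the same data on every run
response = requests.get(static_json_url)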

We should see that the request was successful with status response code 200.


In [28]:
response.status_code

200

Now we decode the response content as JSON using <code>.json()</code> and convert it to a Pandas DataFrame using <code>pd.json_normalize()</code>.


In [29]:
# Use the json_normalize method to convert the json result into a data frame
spacex_df = pd.json_normalize(response.json())
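
As a small illustration of what flattening does, the sketch below runs `pd.json_normalize` on two hand-written records. The records are made up for the example; only the key names `flight_number` and `fairings` mirror the real response.

```python
import pandas as pd

# Two made-up launch records with a nested "fairings" object.
records = [
    {"flight_number": 1, "fairings": {"reused": False, "recovered": False}},
    {"flight_number": 2, "fairings": {"reused": True, "recovered": False}},
]

flat = pd.json_normalize(records)

# Nested keys become dotted column names such as "fairings.reused".
print(list(flat.columns))
print(flat.shape)
```

This is why the data frame built from the API response has columns like `fairings.reused` and `links.patch.small`. Lists, such as the values in `cores` and `payloads`, are not expanded and stay as Python lists inside a single cell.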

Using the data frame <code>spacex_df</code>, we print the first 5 rows:


In [30]:
# Get the head of the dataframe

spacex_df.head(5)

Unnamed: 0,static_fire_date_utc,static_fire_date_unix,net,window,rocket,success,failures,details,crew,ships,capsules,payloads,launchpad,flight_number,name,date_utc,date_unix,date_local,date_precision,upcoming,cores,auto_update,tbd,launch_library_id,id,fairings.reused,fairings.recovery_attempt,fairings.recovered,fairings.ships,links.patch.small,links.patch.large,links.reddit.campaign,links.reddit.launch,links.reddit.media,links.reddit.recovery,links.flickr.small,links.flickr.original,links.presskit,links.webcast,links.youtube_id,links.article,links.wikipedia,fairings
0,2006-03-17T00:00:00.000Z,1142554000.0,False,0.0,5e9d0d95eda69955f709d1eb,False,"[{'time': 33, 'altitude': None, 'reason': 'merlin engine failure'}]",Engine failure at 33 seconds and loss of vehicle,[],[],[],[5eb0e4b5b6c3bb0006eeb1e1],5e9e4502f5090995de566f86,1,FalconSat,2006-03-24T22:30:00.000Z,1143239400,2006-03-25T10:30:00+12:00,hour,False,"[{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cd9ffd86e000604b32a,False,False,False,[],https://images2.imgbox.com/94/f2/NN6Ph45r_o.png,https://images2.imgbox.com/5b/02/QcxHUb5V_o.png,,,,,[],[],,https://www.youtube.com/watch?v=0a_00nJ_Y88,0a_00nJ_Y88,https://www.space.com/2196-spacex-inaugural-falcon-1-rocket-lost-launch.html,https://en.wikipedia.org/wiki/DemoSat,
1,,,False,0.0,5e9d0d95eda69955f709d1eb,False,"[{'time': 301, 'altitude': 289, 'reason': 'harmonic oscillation leading to premature engine shutdown'}]","Successful first stage burn and transition to second stage, maximum altitude 289 km, Premature engine shutdown at T+7 min 30 s, Failed to reach orbit, Failed to recover first stage",[],[],[],[5eb0e4b6b6c3bb0006eeb1e2],5e9e4502f5090995de566f86,2,DemoSat,2007-03-21T01:10:00.000Z,1174439400,2007-03-21T13:10:00+12:00,hour,False,"[{'core': '5e9e289ef35918416a3b2624', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cdaffd86e000604b32b,False,False,False,[],https://images2.imgbox.com/f9/4a/ZboXReNb_o.png,https://images2.imgbox.com/80/a2/bkWotCIS_o.png,,,,,[],[],,https://www.youtube.com/watch?v=Lk4zQ2wP-Nc,Lk4zQ2wP-Nc,https://www.space.com/3590-spacex-falcon-1-rocket-fails-reach-orbit.html,https://en.wikipedia.org/wiki/DemoSat,
2,,,False,0.0,5e9d0d95eda69955f709d1eb,False,"[{'time': 140, 'altitude': 35, 'reason': 'residual stage-1 thrust led to collision between stage 1 and stage 2'}]",Residual stage 1 thrust led to collision between stage 1 and stage 2,[],[],[],"[5eb0e4b6b6c3bb0006eeb1e3, 5eb0e4b6b6c3bb0006eeb1e4]",5e9e4502f5090995de566f86,3,Trailblazer,2008-08-03T03:34:00.000Z,1217734440,2008-08-03T15:34:00+12:00,hour,False,"[{'core': '5e9e289ef3591814873b2625', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cdbffd86e000604b32c,False,False,False,[],https://images2.imgbox.com/6c/cb/na1tzhHs_o.png,https://images2.imgbox.com/4a/80/k1oAkY0k_o.png,,,,,[],[],,https://www.youtube.com/watch?v=v0w9p3U8860,v0w9p3U8860,http://www.spacex.com/news/2013/02/11/falcon-1-flight-3-mission-summary,https://en.wikipedia.org/wiki/Trailblazer_(satellite),
3,2008-09-20T00:00:00.000Z,1221869000.0,False,0.0,5e9d0d95eda69955f709d1eb,True,[],"Ratsat was carried to orbit on the first successful orbital launch of any privately funded and developed, liquid-propelled carrier rocket, the SpaceX Falcon 1",[],[],[],[5eb0e4b7b6c3bb0006eeb1e5],5e9e4502f5090995de566f86,4,RatSat,2008-09-28T23:15:00.000Z,1222643700,2008-09-28T11:15:00+12:00,hour,False,"[{'core': '5e9e289ef3591855dc3b2626', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cdbffd86e000604b32d,False,False,False,[],https://images2.imgbox.com/95/39/sRqN7rsv_o.png,https://images2.imgbox.com/a3/99/qswRYzE8_o.png,,,,,[],[],,https://www.youtube.com/watch?v=dLQ2tZEH6G0,dLQ2tZEH6G0,https://en.wikipedia.org/wiki/Ratsat,https://en.wikipedia.org/wiki/Ratsat,
4,,,False,0.0,5e9d0d95eda69955f709d1eb,True,[],,[],[],[],[5eb0e4b7b6c3bb0006eeb1e6],5e9e4502f5090995de566f86,5,RazakSat,2009-07-13T03:35:00.000Z,1247456100,2009-07-13T15:35:00+12:00,hour,False,"[{'core': '5e9e289ef359184f103b2627', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cdcffd86e000604b32e,False,False,False,[],https://images2.imgbox.com/ab/5a/Pequxd5d_o.png,https://images2.imgbox.com/92/e4/7Cf6MLY0_o.png,,,,,[],[],http://www.spacex.com/press/2012/12/19/spacexs-falcon-1-successfully-delivers-razaksat-satellite-orbit,https://www.youtube.com/watch?v=yTaIDooc8Og,yTaIDooc8Og,http://www.spacex.com/news/2013/02/12/falcon-1-flight-5,https://en.wikipedia.org/wiki/RazakSAT,


You'll notice that much of the data is identifiers. For example, the "rocket" column doesn't contain information about the rocket, just an ID number.

Now we'll use the API again to get information about the launches using the identifiers provided for each one. Specifically, we'll use the <code>rocket</code>, <code>payloads</code>, <code>launchpad</code>, and <code>cores</code> columns.


In [31]:
# Let's take a subset of our data frame, keeping only the features we want, including the flight number and date_utc.
data = spacex_df[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']]

# We'll remove rows with multiple cores because they are Falcon rockets with two additional rocket boosters, and rows that have multiple payloads on a single rocket.
data = data[data['cores'].map(len)==1]
data = data[data['payloads'].map(len)==1]

# Since payloads and cores are lists of size 1, we'll also extract the unique value in the list and replace the feature.
data['cores'] = data['cores'].map(lambda x : x[0])
data['payloads'] = data['payloads'].map(lambda x : x[0])

# We also want to convert date_utc to a datetime data type and then keep only the date, dropping the time.
data['date'] = pd.to_datetime(data['date_utc']).dt.date

# Using the date, we'll restrict the launches to those on or before 2020-11-13.
data = data[data['date'] <= datetime.date(2020, 11, 13)]
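
The filtering on list length can be seen more easily on a toy data frame. This is a sketch under assumed sample values; the names `toy`, `mask` and `single` are hypothetical.

```python
import pandas as pd

# Three made-up launches: only the first has exactly one core and one payload.
toy = pd.DataFrame({
    'cores': [[{'core': 'a'}], [{'core': 'b'}, {'core': 'c'}], [{'core': 'd'}]],
    'payloads': [['p1'], ['p2'], ['p3', 'p4']],
})

# Keep rows whose lists both have length 1; copy so later assignments are safe.
mask = (toy['cores'].map(len) == 1) & (toy['payloads'].map(len) == 1)
single = toy[mask].copy()

# Replace each one-element list with the element it contains.
single['cores'] = single['cores'].map(lambda x: x[0])
single['payloads'] = single['payloads'].map(lambda x: x[0])

print(single)
```

After this step the `cores` cell holds a dictionary and the `payloads` cell holds a plain ID string, which is the shape the helper functions expect.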

* For the rocket, we want to know the name of the booster.

* For the payload, we want to know its mass and the orbit it is targeting.

* For the launch pad, we want to know the launch site name, longitude, and latitude.

* For the cores, we want to know the landing result, the type of landing, the number of flights with that core, whether grid fins were used, whether the core is reused, whether legs were used, the landing pad used, the core block (a number that separates core versions), the number of times this specific core has been reused, and its serial number.

The data from these requests will be stored in lists and used to create a new data frame.


In [32]:
#Global variables 
BoosterVersion = []
PayloadMass = []
Orbit = []
LaunchSite = []
Outcome = []
Flights = []
GridFins = []
Reused = []
Legs = []
LandingPad = []
Block = []
ReusedCount = []
Serial = []
Longitude = []
Latitude = []

These functions append their results to the global lists defined above. Let's look at the <code>BoosterVersion</code> variable. Before applying <code>getBoosterVersion</code>, the list is empty:


In [33]:
BoosterVersion

[]

Now, let's apply the <code>getBoosterVersion</code> function to get the booster version.


In [34]:
# Call getBoosterVersion
getBoosterVersion(data)

The list has now been updated:


In [35]:
BoosterVersion[0:5]

['Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 9']

We can apply the rest of the functions here:


In [36]:
# Call getLaunchSite
getLaunchSite(data)

In [37]:
# Call getPayloadData
getPayloadData(data)

In [38]:
# Call getCoreData
getCoreData(data)

Finally, let's build our dataset with the data we obtained. We combine the columns into a dictionary.


In [39]:
launch_dict = {
    'FlightNumber': list(data['flight_number']),
    'Date': list(data['date']),
    'BoosterVersion':BoosterVersion,
    'PayloadMass':PayloadMass,
    'Orbit':Orbit,
    'LaunchSite':LaunchSite,
    'Outcome':Outcome,
    'Flights':Flights,
    'GridFins':GridFins,
    'Reused':Reused,
    'Legs':Legs,
    'LandingPad':LandingPad,
    'Block':Block,
    'ReusedCount':ReusedCount,
    'Serial':Serial,
    'Longitude': Longitude,
    'Latitude': Latitude
}

Next we need to create a Pandas dataframe from the launch dictionary.


In [40]:
launch_df = pd.DataFrame(launch_dict, columns = ['FlightNumber', 'Date', 'BoosterVersion', 'PayloadMass', 'Orbit', 'LaunchSite', 'Outcome', 'Flights', 'GridFins', 'Reused', 'Legs', 'LandingPad', 'Block', 'ReusedCount', 'Serial', 'Longitude', 'Latitude'])
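
`pd.DataFrame` raises a `ValueError` when the lists in the dictionary have different lengths, which can happen if one of the helper functions skips a row. A quick check before building the data frame makes such a mismatch easier to locate. The helpers below are a hypothetical sketch using made-up values, not part of the lab.

```python
def column_lengths(columns):
    """Map each column name to the number of values collected for it."""
    return {name: len(values) for name, values in columns.items()}


def has_equal_lengths(columns):
    """Return True when every column holds the same number of values."""
    return len(set(column_lengths(columns).values())) <= 1


# Made-up dictionaries in the same shape as launch_dict.
good = {'FlightNumber': [1, 2, 3], 'Outcome': ['None None', 'True ASDS', 'False Ocean']}
bad = {'FlightNumber': [1, 2, 3], 'Outcome': ['None None', 'True ASDS']}

print(has_equal_lengths(good))
print(has_equal_lengths(bad))
print(column_lengths(bad))  # shows which column is short
```

Running `column_lengths(launch_dict)` on the real dictionary would show at a glance whether all seventeen columns have the same number of entries.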

We check the summary statistics of the data frame:


In [41]:
launch_df.describe()

| |FlightNumber|PayloadMass|Flights|Block|ReusedCount|Longitude|Latitude|
|-|-|-|-|-|-|-|-|
|count|94.0|88.0|94.0|90.0|94.0|94.0|94.0|
|mean|54.202128|5919.165341|1.755319|3.5|3.053191|-75.553302|28.581782|
|std|30.589048|4909.689575|1.197544|1.595288|4.153938|53.39188|4.639981|
|min|1.0|20.0|1.0|1.0|0.0|-120.610829|9.047721|
|25%|28.25|2406.25|1.0|2.0|0.0|-80.603956|28.561857|
|50%|52.5|4414.0|1.0|4.0|1.0|-80.577366|28.561857|
|75%|81.5|9543.75|2.0|5.0|4.0|-80.577366|28.608058|
|max|106.0|15600.0|6.0|5.0|13.0|167.743129|34.632093|


### Task 2: We filter the data frame to include ONLY the "Falcon 9" launches


Finally, we'll remove the Falcon 1 launches and keep only the Falcon 9 launches. We'll filter the data frame with the <code>BoosterVersion</code> column to keep only the Falcon 9 launches. We'll save the filtered data to a new data frame called <code>data_falcon9</code>.


In [42]:
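# Take an explicit copy so that later assignments modify data_falcon9 and not a view of launch_df
data_falcon9 = launch_df[launch_df['BoosterVersion'] != 'Falcon 1'].copy()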
data_falcon9.describe()

| |FlightNumber|PayloadMass|Flights|Block|ReusedCount|Longitude|Latitude|
|-|-|-|-|-|-|-|-|
|count|90.0|85.0|90.0|90.0|90.0|90.0|90.0|
|mean|56.477778|6123.547647|1.788889|3.5|3.188889|-86.366477|29.449963|
|std|29.232977|4870.916417|1.213172|1.595288|4.194417|14.149518|2.141306|
|min|6.0|350.0|1.0|1.0|0.0|-120.610829|28.561857|
|25%|32.25|2482.0|1.0|2.0|0.0|-80.603956|28.561857|
|50%|55.5|4535.0|1.0|4.0|1.0|-80.577366|28.561857|
|75%|82.75|9600.0|2.0|5.0|4.0|-80.577366|28.608058|
|max|106.0|15600.0|6.0|5.0|13.0|-80.577366|34.632093|


Now that we have removed some rows, we need to reset the <code>FlightNumber</code> column so that it runs consecutively from 1.


In [43]:
data_falcon9.loc[:,'FlightNumber'] = list(range(1, data_falcon9.shape[0]+1))
data_falcon9

| |FlightNumber|Date|BoosterVersion|PayloadMass|Orbit|LaunchSite|Outcome|Flights|GridFins|Reused|Legs|LandingPad|Block|ReusedCount|Serial|Longitude|Latitude|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|4|1|2010-06-04|Falcon 9| |LEO|CCSFS SLC 40|None None|1|False|False|False| |1.0|0|B0003|-80.577366|28.561857|
|5|2|2012-05-22|Falcon 9|525.0|LEO|CCSFS SLC 40|None None|1|False|False|False| |1.0|0|B0005|-80.577366|28.561857|
|6|3|2013-03-01|Falcon 9|677.0|ISS|CCSFS SLC 40|None None|1|False|False|False| |1.0|0|B0007|-80.577366|28.561857|
|7|4|2013-09-29|Falcon 9|500.0|PO|VAFB SLC 4E|False Ocean|1|False|False|False| |1.0|0|B1003|-120.610829|34.632093|
|8|5|2013-12-03|Falcon 9|3170.0|GTO|CCSFS SLC 40|None None|1|False|False|False| |1.0|0|B1004|-80.577366|28.561857|
|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|
|89|86|2020-09-03|Falcon 9|15600.0|VLEO|KSC LC 39A|True ASDS|2|True|True|True|5e9e3032383ecb6bb234e7ca|5.0|12|B1060|-80.603956|28.608058|
|90|87|2020-10-06|Falcon 9|15600.0|VLEO|KSC LC 39A|True ASDS|3|True|True|True|5e9e3032383ecb6bb234e7ca|5.0|13|B1058|-80.603956|28.608058|
|91|88|2020-10-18|Falcon 9|15600.0|VLEO|KSC LC 39A|True ASDS|6|True|True|True|5e9e3032383ecb6bb234e7ca|5.0|12|B1051|-80.603956|28.608058|
|92|89|2020-10-24|Falcon 9|15600.0|VLEO|CCSFS SLC 40|True ASDS|3|True|True|True|5e9e3033383ecbb9e534e7cc|5.0|12|B1060|-80.577366|28.561857|
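
The filter-then-renumber step can be tried on a toy data frame. The values below are made up for illustration, and `toy` and `falcon9` are hypothetical names.

```python
import pandas as pd

# Four made-up launches; the first two use the Falcon 1 booster.
toy = pd.DataFrame({
    'FlightNumber': [1, 2, 6, 7],
    'BoosterVersion': ['Falcon 1', 'Falcon 1', 'Falcon 9', 'Falcon 9'],
})

# Keep the Falcon 9 rows and copy them so the assignment below is unambiguous.
falcon9 = toy[toy['BoosterVersion'] == 'Falcon 9'].copy()

# Renumber the remaining flights consecutively, starting at 1.
falcon9['FlightNumber'] = list(range(1, len(falcon9) + 1))

print(falcon9)
```

Only the `FlightNumber` column is renumbered; the row index still carries the original positions (2 and 3 here), which is why the index in the lab output starts at 4 rather than 0.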


## ``Data manipulation``


We can see below that some rows in our dataset are missing values.


In [44]:
data_falcon9.isnull().sum()

FlightNumber       0
Date               0
BoosterVersion     0
PayloadMass        5
Orbit              0
LaunchSite         0
Outcome            0
Flights            0
GridFins           0
Reused             0
Legs               0
LandingPad        26
Block              0
ReusedCount        0
Serial             0
Longitude          0
Latitude           0
dtype: int64

Before continuing, we need to deal with these missing values. The <code>LandingPad</code> column will keep its <code>None</code> values, which indicate that no landing pad was used.


### Task 3: How do we deal with missing values?


Next, we calculate ``the mean of PayloadMass`` using <code>.mean()</code>. Then we use the mean and the <code>.replace()</code> function to replace the `np.nan` values in the data with the calculated mean.

In [46]:
# Calculate the mean value of PayloadMass column
payloadmass_mean = data_falcon9['PayloadMass'].mean()

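# Replace the np.nan values with the mean and assign the result back to the column
# (calling replace with inplace=True on a selected column may not update the data frame in recent pandas versions)
data_falcon9['PayloadMass'] = data_falcon9['PayloadMass'].replace(np.nan, payloadmass_mean)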
data_falcon9.isnull().sum()

FlightNumber       0
Date               0
BoosterVersion     0
PayloadMass        0
Orbit              0
LaunchSite         0
Outcome            0
Flights            0
GridFins           0
Reused             0
Legs               0
LandingPad        26
Block              0
ReusedCount        0
Serial             0
Longitude          0
Latitude           0
dtype: int64
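
The arithmetic behind mean imputation is easy to verify on a short series. This sketch uses made-up values and `fillna`, which is an alternative to the `.replace()` call used in the lab; `masses`, `mean_mass` and `filled` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Five made-up payload masses in kg, two of them missing.
masses = pd.Series([500.0, np.nan, 3170.0, np.nan, 677.0])

# .mean() skips NaN by default: (500 + 3170 + 677) / 3 = 1449.0
mean_mass = masses.mean()

# Fill every missing entry with that mean.
filled = masses.fillna(mean_mass)

print(mean_mass)
print(filled.tolist())
print(filled.isnull().sum())
```

Because the mean is computed only from the observed values, filling the gaps with it leaves the column mean unchanged while slightly reducing its variance.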

The number of missing values for <code>PayloadMass</code> should now be zero, leaving <code>LandingPad</code> as the only column with missing values.


Now we can export the data frame to a <b>CSV</b> file for use in the next section.


In [28]:
data_falcon9.to_csv('dataset_part_1.csv', index=False)
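
To see what `index=False` changes and how missing values survive the export, the sketch below writes a toy data frame to an in-memory buffer and reads it back. The values are made up, and `toy`, `buffer` and `restored` are hypothetical names; an `io.StringIO` buffer is used in place of a file on disk.

```python
import io

import pandas as pd

# Two made-up rows; the first has no landing pad.
toy = pd.DataFrame({
    'FlightNumber': [1, 2],
    'LandingPad': [None, '5e9e3032383ecb6bb234e7ca'],
})

# Write without the row index, then rewind the buffer and read it back.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(list(restored.columns))
print(restored['LandingPad'].isnull().sum())
```

With `index=False` the file contains only the named columns. The missing landing pad is written as an empty field and comes back as a missing value when the file is read again.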

<br>

-----------------------------

## Author


<a href="https://www.linkedin.com/in/flavio-aguirre-12784a252/">Flavio Aguirre</a><br>
Data Scientist


<!--

|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2020-09-20|1.1|Joseph|get result each time you run|
|2020-09-20|1.1|Azim |Created Part 1 Lab using SpaceX API|
|2020-09-20|1.0|Joseph |Modified Multiple Areas|
-->


Copyright © 2021 IBM Corporation. All rights reserved.
