# **SpaceX  Falcon 9 first stage Landing Prediction**


## Data Collection API


**Context:** 
In this capstone, it will predict if the Falcon 9 first stage will land successfully. SpaceX advertises Falcon 9 rocket launches on its website with a cost of 62 million dollars, while other providers cost upward of 165 million dollars each, much of the savings is because SpaceX can reuse the first stage. Therefore if we can determine if the first stage will land, we can determine the cost of a launch. This information can be used if an alternate company wants to bid against SpaceX for a rocket launch. 

**Objects:** 
- Get Request to the SpaceX API to collect df
- Clean the requested df


### Import Libraries, Global Variables, Define Auxiliary Functions

In [1]:
import pandas as pd
import numpy as np
import requests
import datetime

# Setting this option will print all collumns of a dfframe
pd.set_option('display.max_columns', None)
# Setting this option will print all of the df in a feature
pd.set_option('display.max_colwidth', None)

In [2]:
#Global variables
# These functions will apply the outputs globally to the above variables.

BoosterVersion = []
PayloadMass = []
Orbit = []
LaunchSite = []
Outcome = []
Flights = []
GridFins = []
Reused = []
Legs = []
LandingPad = []
Block = []
ReusedCount = []
Serial = []
Longitude = []
Latitude = []

In [8]:
# AUXILIARY FUNCTIONS
# Takes the dfset and uses the column to call the API and append the df to the list

def getBoosterVersion(data):
    """
     From the rocket column we would like to learn the booster name.
       """
    
    for x in data['rocket']:
       if x:
        response = requests.get("https://api.spacexdata.com/v4/rockets/"+str(x)).json()
        BoosterVersion.append(response['name'])


def getLaunchSite(data):
    """
        From the launchpad we would like to know the name of the launch site being used, the logitude, and the latitude.
        """
    for x in data['launchpad']:
       if x:
         response = requests.get("https://api.spacexdata.com/v4/launchpads/"+str(x)).json()
         Longitude.append(response['longitude'])
         Latitude.append(response['latitude'])
         LaunchSite.append(response['name'])


def getPayloadData(data):
    """
    From the payload we would like to learn the mass of the payload and the orbit that it is going to.
    """
    for load in data['payloads']:
       if load:
        response = requests.get("https://api.spacexdata.com/v4/payloads/"+load).json()
        PayloadMass.append(response['mass_kg'])
        Orbit.append(response['orbit'])


def getCoreData(data):
    """
    From the cores we would like to learn the outcome of the landing, the type of the landing, 
    number of flights with that core, whether gridfins were used, wheter the core is reused, wheter legs were used, 
    the landing pad used, the block of the core which is a number used to seperate version of cores, 
    the number of times this specific core has been reused, and the serial of the core.
    """
    for core in data['cores']:
            if core['core'] != None:
                response = requests.get("https://api.spacexdata.com/v4/cores/"+core['core']).json()
                Block.append(response['block'])
                ReusedCount.append(response['reuse_count'])
                Serial.append(response['serial'])
            else:
                Block.append(None)
                ReusedCount.append(None)
                Serial.append(None)
                
            Outcome.append(str(core['landing_success'])+' '+str(core['landing_type']))
            Flights.append(core['flight'])
            GridFins.append(core['gridfins'])
            Reused.append(core['reused'])
            Legs.append(core['legs'])
            LandingPad.append(core['landpad'])


###  1. Request and parse the SpaceX launch data using the GET request

In [9]:
# SPACE X API
url="https://api.spacexdata.com/v4/launches/past"

# Get Request to the SpaceX API to collect data
try:
    response = requests.get(url)
    print("Successfully retrieved data from SpaceX API. Status Code: ", response.status_code)
except Exception as e:
    print("Error:", response.status_code, e)

Successfully retrieved data from SpaceX API. Status Code:  200


In [10]:
# Use json_normalize meethod to convert the json result into a dataframe
df = pd.json_normalize(response.json())

# Print the total number of rows and columns
print(f"Total Rows: {df.shape[0]}  |    Total Columns: {df.shape[1]}")

# Print the first 5 rows of the dataframe
#df.head()

Total Rows: 187  |    Total Columns: 43


#### 2. Get information about `rocket`, `payloads`, `launchpad`, `cores`, `flight_number`, `data_utc` by ID's 
- We will now use the API again to get information about the launches using the IDs given for each launch.

In [11]:
# Subset the columns
df = df[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']]

# We will remove rows with multiple cores because those are falcon rockets with 2 extra rocket boosters and rows that have multiple payloads in a single rocket.
df = df[df['cores'].map(len)==1]
df = df[df['payloads'].map(len)==1]

# Since payloads and cores are lists of size 1 we will also extract the single value in the list and replace the feature.
df['cores'] = df['cores'].map(lambda x : x[0])
df['payloads'] = df['payloads'].map(lambda x : x[0])

# We also want to convert the date_utc to a datetime datatype and then extracting the date leaving the time
df['date'] = pd.to_datetime(df['date_utc']).dt.date

# Using the date we will restrict the dates of the launches
df = df[df['date'] <= datetime.date(2020, 11, 13)]

print(f"Total Rows: {df.shape[0]}  |    Total Columns: {df.shape[1]}")
df.head()

Total Rows: 94  |    Total Columns: 7


Unnamed: 0,rocket,payloads,launchpad,cores,flight_number,date_utc,date
0,5e9d0d95eda69955f709d1eb,5eb0e4b5b6c3bb0006eeb1e1,5e9e4502f5090995de566f86,"{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}",1,2006-03-24T22:30:00.000Z,2006-03-24
1,5e9d0d95eda69955f709d1eb,5eb0e4b6b6c3bb0006eeb1e2,5e9e4502f5090995de566f86,"{'core': '5e9e289ef35918416a3b2624', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}",2,2007-03-21T01:10:00.000Z,2007-03-21
3,5e9d0d95eda69955f709d1eb,5eb0e4b7b6c3bb0006eeb1e5,5e9e4502f5090995de566f86,"{'core': '5e9e289ef3591855dc3b2626', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}",4,2008-09-28T23:15:00.000Z,2008-09-28
4,5e9d0d95eda69955f709d1eb,5eb0e4b7b6c3bb0006eeb1e6,5e9e4502f5090995de566f86,"{'core': '5e9e289ef359184f103b2627', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}",5,2009-07-13T03:35:00.000Z,2009-07-13
5,5e9d0d95eda69973a809d1ec,5eb0e4b7b6c3bb0006eeb1e7,5e9e4501f509094ba4566f84,"{'core': '5e9e289ef359185f2b3b2628', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}",6,2010-06-04T18:45:00.000Z,2010-06-04


**Note:**
* From the <code>rocket</code> we would like to learn the booster name

* From the <code>payload</code> we would like to learn the mass of the payload and the orbit that it is going to

* From the <code>launchpad</code> we would like to know the name of the launch site being used, the longitude, and the latitude.

* **From <code>cores</code> we would like to learn the outcome of the landing, the type of the landing, number of flights with that core, whether gridfins were used, whether the core is reused, whether legs were used, the landing pad used, the block of the core which is a number used to seperate version of cores, the number of times this specific core has been reused, and the serial of the core.**

The data from these requests will be stored in lists and will be used to create a new dataframe.

In [12]:
# Calling the Functions to extract data from API

# To get the booster name
getBoosterVersion(df)
BoosterVersion[0:5]

['Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 9']

In [13]:
# To get the launch site name, longitude, and latitude
getLaunchSite(df)

In [14]:
# To get the payload mass and orbit
getPayloadData(df)

In [15]:
# To get the outcome of the landing, the type of the landing, number of flights with that core, 
# whether gridfins were used, whether the core is reused, whether legs were used, the landing pad used, 
# the block of the core which is a number used to seperate version of cores, the number of times 
# this specific core has been reused, and the serial of the core.

getCoreData(df)

In [16]:
# Construct the dataset using the data we have obtained. Combine the columns into a dictionary.
launch_dict = {'FlightNumber': list(df['flight_number']),
'Date': list(df['date']),
'BoosterVersion':BoosterVersion,
'PayloadMass':PayloadMass,
'Orbit':Orbit,
'LaunchSite':LaunchSite,
'Outcome':Outcome,
'Flights':Flights,
'GridFins':GridFins,
'Reused':Reused,
'Legs':Legs,
'LandingPad':LandingPad,
'Block':Block,
'ReusedCount':ReusedCount,
'Serial':Serial,
'Longitude': Longitude,
'Latitude': Latitude}

# Create a data from launch_dict
df_launch = pd.DataFrame(launch_dict)

# Shape of the dataframe
print(f"Total Rows: {df_launch.shape[0]}  |    Total Columns: {df_launch.shape[1]}")

# Show the head of the dataframe
df_launch.head()

### 2. Subset the dataframe to only include `Falcon 9` launches
Keeping only the Falcon 9 launches. Filter the data dataframe using the <code>BoosterVersion</code> column to only keep the Falcon 9 launches and save the filtered data to a new dataframe called <code>df_falcon9</code>.


In [18]:
### 2. Subset of `Falcon 9` launches
df_falcon9 = df_launch[df_launch['BoosterVersion']!='Falcon 9']

# Reset the FlightNumber column
df_falcon9.loc[:,'FlightNumber'] = list(range(1, df_falcon9.shape[0]+1))
df_falcon9

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
0,1,2006-03-24,Falcon 1,20.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin1A,167.743129,9.047721
1,2,2007-03-21,Falcon 1,,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin2A,167.743129,9.047721
2,3,2008-09-28,Falcon 1,165.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin2C,167.743129,9.047721
3,4,2009-07-13,Falcon 1,200.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin3C,167.743129,9.047721


### 3. Data Wrangling


#### Missing Values

In [19]:
# Check the missing values
df_falcon9.isnull().sum()

FlightNumber      0
Date              0
BoosterVersion    0
PayloadMass       1
Orbit             0
LaunchSite        0
Outcome           0
Flights           0
GridFins          0
Reused            0
Legs              0
LandingPad        4
Block             4
ReusedCount       0
Serial            0
Longitude         0
Latitude          0
dtype: int64

Before we can continue we must deal with these missing values. The <code>LandingPad</code> column will retain None values to represent when landing pads were not used.


In [20]:
# Replace the missing values
df_falcon9['PayloadMass'].replace(np.nan, df_falcon9['PayloadMass'].mean(), inplace=True)

df_falcon9.isnull().sum()

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df_falcon9['PayloadMass'].replace(np.nan, df_falcon9['PayloadMass'].mean(), inplace=True)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_falcon9['PayloadMass'].replace(np.nan, df_falcon9['PayloadMass'].mean(), inplace=True)


FlightNumber      0
Date              0
BoosterVersion    0
PayloadMass       0
Orbit             0
LaunchSite        0
Outcome           0
Flights           0
GridFins          0
Reused            0
Legs              0
LandingPad        4
Block             4
ReusedCount       0
Serial            0
Longitude         0
Latitude          0
dtype: int64

Now we should have no missing values in our dataset except for in <code>LandingPad</code>.


In [21]:
### 4. Save the dataset

df_falcon9.to_csv('../data/processed/df_part_1.csv', index=False)