<a href="https://colab.research.google.com/github/shuv50/IBM_Data_Science/blob/main/SpaceX_API_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

In this capstone, we will predict whether the Falcon 9 first stage will land successfully.

SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62 million dollars, while other providers charge upward of 165 million dollars per launch. Much of the savings comes from SpaceX being able to reuse the first stage.

Therefore, if we can determine whether the first stage will land, we can estimate the cost of a launch.
<br>This information can be used if an alternate company wants to bid against SpaceX for a rocket launch.

Here, we will collect the data from an API and make sure it is in the correct format.

In [None]:
from IPython.display import display, HTML

# GIF URLs
gif_url_1 = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/lab_v2/images/landing_1.gif"
gif_url_2 = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/lab_v2/images/crash.gif"

# HTML to display GIFs side by side
html_code = f"""
<div style="display: flex; justify-content: center; gap: 20px; margin-top: 20px;">
    <img src="{gif_url_1}" alt="GIF 1" style="max-width: 100%; height: auto; border-radius: 10px;">
    <img src="{gif_url_2}" alt="GIF 2" style="max-width: 100%; height: auto; border-radius: 10px;">
</div>
"""

# Display the HTML
display(HTML(html_code))


# Objectives

*   Request data from the SpaceX API
*   Clean the requested data

# Import Libraries

In [None]:
# Import libraries
import requests
import pandas as pd
import numpy as np
import datetime

# Dataframe settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

# Define Auxiliary Functions
These helper functions use the identification numbers in the launch data to extract further information from the API.

## Rocket data

In [None]:
# Use the rocket column to call the API and append each rocket name to the BoosterVersion list
def getBoosterVersion(data):
    for x in data['rocket']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/rockets/"+str(x)).json()
            BoosterVersion.append(response['name'])
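
The helper above sends one HTTP request per row, even though many launches share the same rocket ID. As a minimal sketch of one way to reduce repeated calls, the variant below caches each ID and returns a list instead of appending to a global. The names `lookup_names`, `fake_fetch`, and the short IDs are hypothetical; `fetch` stands in for a call such as `requests.get(...).json()` so the sketch runs offline.

```python
from typing import Callable, Dict, Iterable, List


def lookup_names(ids: Iterable[str], fetch: Callable[[str], dict]) -> List[str]:
    """Return the 'name' field for each ID, calling `fetch` once per distinct ID."""
    cache: Dict[str, str] = {}
    names: List[str] = []
    for item_id in ids:
        if item_id not in cache:
            cache[item_id] = fetch(item_id)["name"]
        names.append(cache[item_id])
    return names


# Hypothetical stand-in for the API; it records every simulated request.
calls: List[str] = []


def fake_fetch(item_id: str) -> dict:
    calls.append(item_id)
    return {"name": {"a1": "Falcon 1", "b2": "Falcon 9"}[item_id]}


names = lookup_names(["a1", "a1", "b2", "b2", "b2"], fake_fetch)
print(names)       # one name per input row
print(len(calls))  # number of simulated requests
```

Returning a list also makes the function safe to call more than once, since nothing accumulates between calls.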

## Launchpad Data

In [None]:
# Use the launchpad column to call the API and append the site name and coordinates to the lists
def getLaunchSite(data):
    for x in data['launchpad']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/launchpads/"+str(x)).json()
            Longitude.append(response['longitude'])
            Latitude.append(response['latitude'])
            LaunchSite.append(response['name'])

## Payload Data

In [None]:
# Use the payloads column to call the API and append the payload mass and orbit to the lists
def getPayloadData(data):
    for load in data['payloads']:
        if load:
            response = requests.get("https://api.spacexdata.com/v4/payloads/"+load).json()
            PayloadMass.append(response['mass_kg'])
            Orbit.append(response['orbit'])

## Core Data

From cores we would like to learn the outcome of the landing, the type of the landing, the number of flights with that core, whether gridfins were used, whether the core is reused, whether legs were used, the landing pad used, the block of the core (a number used to separate versions of cores), the number of times this specific core has been reused, and the serial of the core.

In [None]:
# Use the cores column to call the API and append the core details to the lists
def getCoreData(data):
    for core in data['cores']:
        if core['core'] is not None:
            response = requests.get("https://api.spacexdata.com/v4/cores/"+core['core']).json()
            Block.append(response['block'])
            ReusedCount.append(response['reuse_count'])
            Serial.append(response['serial'])
        else:
            Block.append(None)
            ReusedCount.append(None)
            Serial.append(None)
        # These fields come from the launch record itself, so no API call is needed
        Outcome.append(str(core['landing_success'])+' '+str(core['landing_type']))
        Flights.append(core['flight'])
        GridFins.append(core['gridfins'])
        Reused.append(core['reused'])
        Legs.append(core['legs'])
        LandingPad.append(core['landpad'])
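
The Outcome value is built by joining two fields as strings, so a launch with no landing attempt ends up labelled `None None`. The sketch below isolates that step. `outcome_label` is a hypothetical name, and the two records are illustrative rather than taken from the API.

```python
def outcome_label(core: dict) -> str:
    """Join landing_success and landing_type the same way getCoreData does."""
    return str(core["landing_success"]) + " " + str(core["landing_type"])


# Illustrative records: one without a landing attempt, one with a made-up success.
no_attempt = {"landing_success": None, "landing_type": None}
drone_ship = {"landing_success": True, "landing_type": "ASDS"}

print(outcome_label(no_attempt))  # None None
print(outcome_label(drone_ship))  # True ASDS
```

Because the result is a plain string, any later analysis of landing success has to parse or map these labels rather than test for a boolean.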

# Collect Data

Now let's request rocket launch data from the SpaceX API using the URL below.

The response contains a large amount of information about SpaceX launches. <br>Next, let's try to discover some more relevant information for this project.

In [None]:
# Full data
spacex_url = "https://api.spacexdata.com/v4/launches/past"
response = requests.get(spacex_url)

# Check the content of the response
# print(response.content)

## Relevant SpaceX launch data

To make the requested JSON results more consistent, we will use the following static response object for this project.
<br>We should see that the request was successful, with a 200 status response code.

In [None]:
# Relevant data
static_json_url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/API_call_spacex_api.json'
response = requests.get(static_json_url)
response.status_code

200

## Decode/Normalize response
Now we decode the response content as JSON using .json() and turn it into a Pandas dataframe using pd.json_normalize().

In [None]:
# Save JSON data to dataframe
data = pd.json_normalize(response.json())
data.head(1)

Unnamed: 0,static_fire_date_utc,static_fire_date_unix,tbd,net,window,rocket,success,details,crew,ships,capsules,payloads,launchpad,auto_update,failures,flight_number,name,date_utc,date_unix,date_local,date_precision,upcoming,cores,id,fairings.reused,fairings.recovery_attempt,fairings.recovered,fairings.ships,links.patch.small,links.patch.large,links.reddit.campaign,links.reddit.launch,links.reddit.media,links.reddit.recovery,links.flickr.small,links.flickr.original,links.presskit,links.webcast,links.youtube_id,links.article,links.wikipedia,fairings
0,2006-03-17T00:00:00.000Z,1142554000.0,False,False,0.0,5e9d0d95eda69955f709d1eb,False,Engine failure at 33 seconds and loss of vehicle,[],[],[],[5eb0e4b5b6c3bb0006eeb1e1],5e9e4502f5090995de566f86,True,"[{'time': 33, 'altitude': None, 'reason': 'merlin engine failure'}]",1,FalconSat,2006-03-24T22:30:00.000Z,1143239400,2006-03-25T10:30:00+12:00,hour,False,"[{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cd9ffd86e000604b32a,False,False,False,[],https://images2.imgbox.com/3c/0e/T8iJcSN3_o.png,https://images2.imgbox.com/40/e3/GypSkayF_o.png,,,,,[],[],,https://www.youtube.com/watch?v=0a_00nJ_Y88,0a_00nJ_Y88,https://www.space.com/2196-spacex-inaugural-falcon-1-rocket-lost-launch.html,https://en.wikipedia.org/wiki/DemoSat,
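
Dotted column names such as `fairings.reused` and `links.patch.small` come from `pd.json_normalize()` flattening nested dictionaries, while list values such as `payloads` are kept as they are. A small sketch on a made-up record (the values are illustrative, not real launch data):

```python
import pandas as pd

# Made-up record with the same kind of nesting as one launch entry.
records = [
    {
        "name": "Demo",
        "payloads": ["p1"],
        "fairings": {"reused": False, "ships": []},
        "links": {"patch": {"small": "small.png"}},
    }
]

# Nested dicts become dotted columns; lists stay as single cell values.
flat = pd.json_normalize(records)
print(sorted(flat.columns))
print(flat.loc[0, "payloads"])
```

This is why the payloads and cores columns still hold lists after normalization and need separate handling.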


# Feature Selection

You will notice that much of the data consists of IDs.
<br>For example, the rocket column has no information about the rocket, just an identification number.

We will now use the API again to get information about the launches using the IDs given for each launch.
<br>Specifically, we will use the columns rocket, payloads, launchpad, and cores.

In [None]:
# Select required columns
data = data[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']]
data.head(1)

Unnamed: 0,rocket,payloads,launchpad,cores,flight_number,date_utc
0,5e9d0d95eda69955f709d1eb,[5eb0e4b5b6c3bb0006eeb1e1],5e9e4502f5090995de566f86,"[{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",1,2006-03-24T22:30:00.000Z


# Data Pre-processing

## Filter data

We will remove rows with multiple cores, because those are Falcon rockets with two extra boosters, and rows that have multiple payloads in a single rocket.

In [None]:
# Filter cores and payloads
data = data[data['cores'].map(len)==1]
data = data[data['payloads'].map(len)==1]

## Data mapping

Since payloads and cores are now lists of size 1, we will extract the single value from each list and replace the feature with it.

In [None]:
# Extract and replace single list value
data['cores'] = data['cores'].map(lambda x : x[0])
data['payloads'] = data['payloads'].map(lambda x : x[0])
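
The filter and the unwrapping step can be tried on a toy frame. This is a sketch with placeholder IDs; the variable names `df` and `single` are not from the notebook.

```python
import pandas as pd

# Toy frame: the IDs are placeholders, not real API identifiers.
df = pd.DataFrame(
    {
        "cores": [[{"core": "c1"}], [{"core": "c2"}, {"core": "c3"}], [{"core": "c4"}]],
        "payloads": [["p1"], ["p2"], ["p3", "p4"]],
    }
)

# Keep rows whose lists hold exactly one element, then unwrap that element.
single = df[(df["cores"].map(len) == 1) & (df["payloads"].map(len) == 1)].copy()
single["cores"] = single["cores"].map(lambda x: x[0])
single["payloads"] = single["payloads"].map(lambda x: x[0])

print(single)
```

Only the first row survives, since the second has two cores and the third has two payloads.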

## Data conversion

We also want to convert date_utc to a datetime datatype and then extract the date, dropping the time.
<br>Using the date, we will restrict the launches to those on or before 13 November 2020.

In [None]:
# Convert date_utc to a date and keep launches up to 2020-11-13
data['date'] = pd.to_datetime(data['date_utc']).dt.date
data = data[data['date'] <= datetime.date(2020, 11, 13)]
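
As a small sketch of this conversion, the example below parses two illustrative timestamps in the same format as date_utc and applies the same cutoff. The second timestamp is made up to show a row being dropped.

```python
import datetime

import pandas as pd

# Two illustrative timestamps in the same ISO 8601 format as date_utc.
df = pd.DataFrame({"date_utc": ["2006-03-24T22:30:00.000Z", "2021-01-08T02:15:00.000Z"]})

# Parse to datetimes, then keep only the calendar date.
df["date"] = pd.to_datetime(df["date_utc"]).dt.date

# Keep launches on or before the cutoff used in this notebook.
cutoff = datetime.date(2020, 11, 13)
kept = df[df["date"] <= cutoff]
print(kept["date"].tolist())
```

After `.dt.date` the column holds plain `datetime.date` objects, which is why the comparison uses `datetime.date` rather than a timestamp.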

## Segregate data

From the rocket we would like to learn the booster name.

From the payload we would like to learn the mass of the payload and the orbit that it is going to.

From the launchpad we would like to know the name of the launch site being used, the longitude, and the latitude.

From cores we would like to learn the outcome of the landing, the type of the landing, the number of flights with that core, whether gridfins were used, whether the core is reused, whether legs were used, the landing pad used, the block of the core (a number used to separate versions of cores), the number of times this specific core has been reused, and the serial of the core.

### Global variables

The data from these requests will be stored in lists and will be used to create a new dataframe.

In [None]:
#Global variables
BoosterVersion = []
PayloadMass = []
Orbit = []
LaunchSite = []
Outcome = []
Flights = []
GridFins = []
Reused = []
Legs = []
LandingPad = []
Block = []
ReusedCount = []
Serial = []
Longitude = []
Latitude = []

### Save required data

In [None]:
# Save required data in respective lists
getBoosterVersion(data)
getLaunchSite(data)
getPayloadData(data)
getCoreData(data)
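
Building a dataframe from a dictionary of lists requires every list to have the same length. Because the helpers append to global lists, running this cell twice without resetting the lists, or a lookup skipping a row with an empty ID, would leave the lists out of step. A minimal sketch of a check that could be run before building the dataframe; `check_lengths` is a hypothetical helper and the values are illustrative.

```python
from typing import Dict, List


def check_lengths(columns: Dict[str, List], expected: int) -> List[str]:
    """Return the names of columns whose list length differs from `expected`."""
    return [name for name, values in columns.items() if len(values) != expected]


# Illustrative values only: three launches, but one lookup skipped a row.
columns = {
    "BoosterVersion": ["Falcon 1", "Falcon 9", "Falcon 9"],
    "LaunchSite": ["Kwajalein Atoll", "CCSFS SLC 40", "CCSFS SLC 40"],
    "PayloadMass": [20.0, 525.0],
}

mismatched = check_lengths(columns, expected=3)
print(mismatched)  # ['PayloadMass']
```

In the notebook, `expected` would be the number of rows in data, for example `len(data)`.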

### Reconstruct dataframe

Finally, let's construct our dataset using the data we have obtained.
<br>We combine the columns into a dictionary.


Then, we create a Pandas dataframe from the dictionary launch_dict.

In [None]:
# Create data structure dictionary
launch_dict = {
    'FlightNumber': list(data['flight_number']),
    'Date': list(data['date']),
    'BoosterVersion':BoosterVersion,
    'PayloadMass':PayloadMass,
    'Orbit':Orbit,
    'LaunchSite':LaunchSite,
    'Outcome':Outcome,
    'Flights':Flights,
    'GridFins':GridFins,
    'Reused':Reused,
    'Legs':Legs,
    'LandingPad':LandingPad,
    'Block':Block,
    'ReusedCount':ReusedCount,
    'Serial':Serial,
    'Longitude': Longitude,
    'Latitude': Latitude
}

# Create final dataframe from dictionary
data = pd.DataFrame(launch_dict)
data.head()

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
0,1,2006-03-24,Falcon 1,20.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin1A,167.743129,9.047721
1,2,2007-03-21,Falcon 1,,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin2A,167.743129,9.047721
2,4,2008-09-28,Falcon 1,165.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin2C,167.743129,9.047721
3,5,2009-07-13,Falcon 1,200.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin3C,167.743129,9.047721
4,6,2010-06-04,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857


## Filter data

Finally, we will remove the Falcon 1 launches, keeping only the Falcon 9 launches.

<br>We filter the data dataframe using the BoosterVersion column to keep only the Falcon 9 launches.
<br>The filtered data is saved to a new dataframe called data_falcon9.

In [None]:
# Get Falcon-9 data only
data_falcon9 = data[data['BoosterVersion'] != 'Falcon 1'].copy()  # copy so later assignments do not act on a slice of data
data_falcon9.head(2)

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
4,6,2010-06-04,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
5,8,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857


## Reset column

Now that we have removed some rows, we should reset the FlightNumber column.

In [None]:
# Reset 'FlightNumber' so it runs from 1 to the number of remaining rows
data_falcon9.loc[:,'FlightNumber'] = list(range(1, data_falcon9.shape[0]+1))
data_falcon9.head(2)

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
4,1,2010-06-04,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
5,2,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857
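
The filter and renumbering steps can be sketched together on a toy frame. The values are illustrative and the name `falcon9` is not from the notebook; taking an explicit `.copy()` is an assumption made here so that the assignment acts on an independent frame.

```python
import pandas as pd

# Toy frame with illustrative values.
df = pd.DataFrame(
    {
        "FlightNumber": [1, 2, 6, 8],
        "BoosterVersion": ["Falcon 1", "Falcon 1", "Falcon 9", "Falcon 9"],
    }
)

# .copy() gives an independent frame, so the assignment below does not touch df.
falcon9 = df[df["BoosterVersion"] != "Falcon 1"].copy()
falcon9["FlightNumber"] = range(1, len(falcon9) + 1)

print(falcon9)
```

Only FlightNumber is renumbered. The index labels keep their original values, which matches the 4 and 5 shown in the output above.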


## Clean Data

### Missing values

We can see below that some of the rows in our dataset are missing values.

Before we can continue, we must deal with these missing values.
<br>The LandingPad column will retain None values to represent when landing pads were not used.

Below, we calculate the mean of PayloadMass using .mean().
<br>We then use that mean with the .fillna() function to replace the np.nan values in the column.

After this, the number of missing values in PayloadMass drops to zero.
<br>We should then have no missing values in our dataset except in LandingPad.

In [None]:
# Check missing values
data_falcon9.isnull().sum()

Unnamed: 0,0
FlightNumber,0
Date,0
BoosterVersion,0
PayloadMass,5
Orbit,0
LaunchSite,0
Outcome,0
Flights,0
GridFins,0
Reused,0
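
The mean imputation described in this section can be sketched on a toy frame. The values are illustrative, and `"pad-a"` is a placeholder rather than a real landing pad ID.

```python
import pandas as pd

# Toy values; None becomes NaN in the float column.
df = pd.DataFrame({"PayloadMass": [500.0, None, 700.0], "LandingPad": [None, "pad-a", None]})

# Series.mean() skips NaN by default, so the mean covers the known masses only.
mean_mass = df["PayloadMass"].mean()

# Passing a dict limits the fill to PayloadMass and leaves LandingPad untouched.
df = df.fillna({"PayloadMass": mean_mass})

print(mean_mass)
print(df.isnull().sum())
```

Filling with a per-column dictionary is what keeps the LandingPad gaps in place while PayloadMass is completed.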


In [None]:
# Calculate the mean PayloadMass
mean_value = data_falcon9['PayloadMass'].mean()
# Replace null values; fillna with a dict returns a new dataframe, avoiding chained in-place assignment
data_falcon9 = data_falcon9.fillna({'PayloadMass': mean_value})


# Export API Data

We can now export the transformed and cleaned data to a CSV file.

In [None]:
# Falcon9 data
data_falcon9.head()

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
4,1,2010-06-04,Falcon 9,6123.547647,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
5,2,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857
6,3,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857
7,4,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,,1.0,0,B1003,-120.610829,34.632093
8,5,2013-12-03,Falcon 9,3170.0,GTO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B1004,-80.577366,28.561857


In [None]:
# Save data as CSV file
data_falcon9.to_csv('data_falcon9.csv', index=False)
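
To see what `index=False` does and how missing values survive the export, here is a sketch that writes a toy frame to an in-memory buffer and reads it back. The buffer stands in for the CSV file, and the values are illustrative.

```python
import io

import pandas as pd

# Toy frame with illustrative values and a non-default index.
df = pd.DataFrame({"FlightNumber": [1, 2], "LandingPad": [None, "pad-a"]}, index=[4, 5])

# index=False drops the index labels (4, 5) from the output.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
print(buffer.getvalue())

# Reading the text back yields a fresh 0-based index; the missing pad comes back as NaN.
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

Missing LandingPad entries are written as empty fields, so they are read back as NaN rather than None.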