# **SpaceX Falcon 9: Landing Prediction**

# Exploring the data

In [1]:
# import libraries 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# Load the data
df = pd.read_csv('Launch_Data.csv')
df.head()

Unnamed: 0,FlightNumber,Date,Booster,PayloadMass,Orbit,Site,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
0,1,2006-03-24,Falcon 1,20.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin1A,167.743129,9.047721
1,2,2007-03-21,Falcon 1,,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin2A,167.743129,9.047721
2,4,2008-09-28,Falcon 1,165.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin2C,167.743129,9.047721
3,5,2009-07-13,Falcon 1,200.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin3C,167.743129,9.047721
4,6,2010-06-04,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857


In [3]:
# How many different Boosters are there
df['Booster'].value_counts()

Booster
Falcon 9    168
Falcon 1      4
Name: count, dtype: int64

Let's focus on the Falcon 9 boosters, since they make up the vast majority of the data frame (168 of 172 launches).

In [4]:
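# Keep only Falcon 9 launches; .copy() makes df an independent frame so later
# column assignments do not trigger pandas' SettingWithCopyWarning
df = df[df['Booster'] == 'Falcon 9'].copy()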
df['Booster'].value_counts()

Booster
Falcon 9    168
Name: count, dtype: int64

In [5]:
df['FlightNumber'].tail()

167    181
168    182
169    185
170    186
171    187
Name: FlightNumber, dtype: int64

In [6]:
# Now that some flights have been removed, renumber FlightNumber so it runs
# consecutively from 1 to the number of flights left in the dataset
df.loc[:,'FlightNumber'] = list(range(1, df.shape[0]+1))
df['FlightNumber'].tail()

167    164
168    165
169    166
170    167
171    168
Name: FlightNumber, dtype: int64

## Data Wrangling

In [7]:
# Identify any Missing Values
df.isnull().sum()

FlightNumber     0
Date             0
Booster          0
PayloadMass     22
Orbit            1
Site             0
Outcome          0
Flights          0
GridFins         0
Reused           0
Legs             0
LandingPad      26
Block            0
ReusedCount      0
Serial           0
Longitude        0
Latitude         0
dtype: int64

The only null values are in the Orbit, PayloadMass, and LandingPad columns. The nulls in the LandingPad column make sense because not all launches need or use a landing pad. The null PayloadMass values, however, will not be helpful for our numeric analysis. I could drop the 22 launches, but that would be a significant loss (22 of 168 rows, about 13%). I could also change the null PayloadMass values to 0, but that would pull the payload mass figures toward zero. Instead, I'll replace the null values with the column mean, which leaves the mean of PayloadMass unchanged.

In [8]:
payload_mean=df['PayloadMass'].mean()
# Replace the missing PayloadMass values with the column mean
df['PayloadMass'] = df['PayloadMass'].fillna(payload_mean)
df.isnull().sum()

FlightNumber     0
Date             0
Booster          0
PayloadMass      0
Orbit            1
Site             0
Outcome          0
Flights          0
GridFins         0
Reused           0
Legs             0
LandingPad      26
Block            0
ReusedCount      0
Serial           0
Longitude        0
Latitude         0
dtype: int64
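
As a side check on the choice of fill value, here is a minimal sketch comparing mean and median imputation. It uses a small made-up series (`toy`, not taken from `Launch_Data.csv`), so the numbers are illustrative only. The point is that filling with the mean keeps the mean where it was but shrinks the spread, which may matter for later modelling.

```python
import numpy as np
import pandas as pd

# Toy payload masses in kg (made-up values, NOT from Launch_Data.csv)
toy = pd.Series([500.0, np.nan, 3000.0, 5500.0, np.nan, 15000.0],
                name="PayloadMass")

# Two candidate fills: the column mean and the column median
mean_filled = toy.fillna(toy.mean())
median_filled = toy.fillna(toy.median())

# Compare summary statistics before and after imputation
summary = pd.DataFrame({
    "observed": toy.describe(),
    "mean_filled": mean_filled.describe(),
    "median_filled": median_filled.describe(),
})
print(summary.loc[["count", "mean", "50%", "std"]])
```

On skewed data like this toy series, the median fill sits closer to a typical value, while the mean fill is pulled up by the one heavy payload. Whether that matters for the real PayloadMass column would need to be checked against its actual distribution.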

In [9]:
df[df['Orbit'].isna()]

Unnamed: 0,FlightNumber,Date,Booster,PayloadMass,Orbit,Site,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
99,96,2020-12-19,Falcon 9,8191.07911,,KSC LC 39A,True RTLS,5,True,True,True,5e9e3032383ecb267a34e7c7,5.0,5,B1059,-80.603956,28.608058


It looks like the null Orbit may have been an oversight, since the landing for that launch was recorded as successful. We will keep the row for now, but may consider replacing the null with the mode Orbit value for launches from the <code>KSC LC 39A</code> site.
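
If we do go the mode route later, it could look roughly like the sketch below. It runs on a small made-up frame (`toy`, with invented Orbit values) rather than on `df`, and the names `site_mask` and `site_mode` are only for illustration. Taking the first value returned by `mode()` is an arbitrary tie-break, assumed here for simplicity.

```python
import numpy as np
import pandas as pd

# Toy rows (made-up values, NOT from Launch_Data.csv)
toy = pd.DataFrame({
    "Site": ["KSC LC 39A", "KSC LC 39A", "KSC LC 39A",
             "CCSFS SLC 40", "KSC LC 39A"],
    "Orbit": ["VLEO", "ISS", "VLEO", "GTO", np.nan],
})

# Rows launched from the site we want to base the fill on
site_mask = toy["Site"] == "KSC LC 39A"

# Series.mode() ignores NaN by default and may return several values on a
# tie; .iloc[0] simply takes the first one
site_mode = toy.loc[site_mask, "Orbit"].mode().iloc[0]

# Fill only the missing Orbit values that belong to that site
toy.loc[site_mask & toy["Orbit"].isna(), "Orbit"] = site_mode
print(toy)
```

Restricting the fill to rows from the same site keeps orbits from other launch sites from influencing the replacement value.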