# SpaceX Falcon 9 First Stage Landing Prediction
### **Data wrangling**
![Falcon 9](../images/falcon9.webp)
### Objective:
* Perform Exploratory Data Analysis (EDA).
* Determine training labels.
### Scenario:
* The dataset contains several cases in which the booster did not land successfully.
* Sometimes a landing was attempted but failed due to an accident.
* In this lab, I convert those outcomes into training labels: **1** means the booster landed successfully, and **0** means it did not.

![Space X](https://camo.githubusercontent.com/a0184ea6ee7174857c755b964345c82ec9556f17e74fd5f6a49b5937711fd60f/68747470733a2f2f63662d636f75727365732d646174612e73332e75732e636c6f75642d6f626a6563742d73746f726167652e617070646f6d61696e2e636c6f75642f49424d446576656c6f706572536b696c6c734e6574776f726b2d445330373031454e2d536b696c6c734e6574776f726b2f6170692f496d616765732f6c616e64696e675f312e676966 "Falcon 9 first stage will land successfully")



# Data Analysis

In [18]:
import pandas as pd
import numpy as np

In [19]:
df = pd.read_csv('../csv/dataset_part_1.csv')
df.head(10)

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
0,1.0,2010-06-04,Falcon 9,6123.547647,LEO,CCSFS SLC 40,None None,1.0,0.0,0.0,0.0,,1.0,0.0,B0003,-80.577366,28.561857
1,2.0,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1.0,0.0,0.0,0.0,,1.0,0.0,B0005,-80.577366,28.561857
2,3.0,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,None None,1.0,0.0,0.0,0.0,,1.0,0.0,B0007,-80.577366,28.561857
3,4.0,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1.0,0.0,0.0,0.0,,1.0,0.0,B1003,-120.610829,34.632093
4,5.0,2013-12-03,Falcon 9,3170.0,GTO,CCSFS SLC 40,None None,1.0,0.0,0.0,0.0,,1.0,0.0,B1004,-80.577366,28.561857
5,6.0,2014-01-06,Falcon 9,3325.0,GTO,CCSFS SLC 40,None None,1.0,0.0,0.0,0.0,,1.0,0.0,B1005,-80.577366,28.561857
6,7.0,2014-04-18,Falcon 9,2296.0,ISS,CCSFS SLC 40,True Ocean,1.0,0.0,0.0,1.0,,1.0,0.0,B1006,-80.577366,28.561857
7,8.0,2014-07-14,Falcon 9,1316.0,LEO,CCSFS SLC 40,True Ocean,1.0,0.0,0.0,1.0,,1.0,0.0,B1007,-80.577366,28.561857
8,9.0,2014-08-05,Falcon 9,4535.0,GTO,CCSFS SLC 40,None None,1.0,0.0,0.0,0.0,,1.0,0.0,B1008,-80.577366,28.561857
9,10.0,2014-09-07,Falcon 9,4428.0,GTO,CCSFS SLC 40,None None,1.0,0.0,0.0,0.0,,1.0,0.0,B1011,-80.577366,28.561857


In [20]:
# Missing values in each attribute, expressed as a percentage of the non-null count (df.count()).
# Dividing by len(df) instead would give the share of all rows that are missing.
df.isnull().sum()/df.count()*100

FlightNumber       1.111111
Date               1.111111
BoosterVersion     1.111111
PayloadMass        0.000000
Orbit              1.111111
LaunchSite         1.111111
Outcome            1.111111
Flights            1.111111
GridFins           1.111111
Reused             1.111111
Legs               1.111111
LandingPad        42.187500
Block              1.111111
ReusedCount        1.111111
Serial             1.111111
Longitude          1.111111
Latitude           1.111111
dtype: float64
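The ratio above divides the number of missing values by the number of non-null values, which is not the same as the share of rows that are missing. The following sketch contrasts the two on a small made-up frame; `toy` and its values are hypothetical and only serve to illustrate the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 4 rows, one missing value in "a", two in "b"
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, "x", "y"],
})

# Ratio used in the cell above: missing count divided by non-null count
pct_vs_non_null = toy.isnull().sum() / toy.count() * 100

# Share of all rows that are missing: the mean of the boolean mask
pct_of_rows = toy.isnull().mean() * 100

print(pct_vs_non_null)
print(pct_of_rows)
```

For column `a`, the first ratio is 1/3 (about 33.3%), while the share of rows is 1/4 (25%). The gap grows as more values are missing.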

In [21]:
# Identifying numerical and categorical columns:
df.dtypes

FlightNumber      float64
Date               object
BoosterVersion     object
PayloadMass       float64
Orbit              object
LaunchSite         object
Outcome            object
Flights           float64
GridFins          float64
Reused            float64
Legs              float64
LandingPad         object
Block             float64
ReusedCount       float64
Serial             object
Longitude         float64
Latitude          float64
dtype: object
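The dtype listing can also be turned into two explicit column lists. This is a minimal sketch using `select_dtypes` on a hypothetical four-column frame that mimics part of the dataset; the variable names `numerical_cols` and `categorical_cols` are illustrative.

```python
import pandas as pd

# Hypothetical subset of the dataset's columns
toy = pd.DataFrame({
    "FlightNumber": [1.0, 2.0],
    "Orbit": ["LEO", "GTO"],
    "PayloadMass": [525.0, 677.0],
    "LaunchSite": ["CCSFS SLC 40", "VAFB SLC 4E"],
})

# Columns with a numeric dtype (here float64)
numerical_cols = toy.select_dtypes(include="number").columns.tolist()

# Everything else is treated as categorical in this sketch
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(numerical_cols)
print(categorical_cols)
```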

## 1. Number of launches on each site
#### SpaceX launch facilities:
 * [Cape Canaveral Space Launch Complex 40](https://en.wikipedia.org/wiki/List_of_Cape_Canaveral_and_Merritt_Island_launch_sites?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2021-01-01) (**CCSFS SLC 40**),
 * Vandenberg Air Force Base Space Launch Complex 4E (**VAFB SLC 4E**),
 * Kennedy Space Center Launch Complex 39A (**KSC LC 39A**).

In [22]:
# Counting the launches at each site; the LaunchSite column holds the location of each launch
df['LaunchSite'].value_counts()

LaunchSite
CCSFS SLC 40    55
KSC LC 39A      22
VAFB SLC 4E     13
Name: count, dtype: int64
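The counts are easier to compare as shares of all launches. A small sketch, assuming the three counts shown above are copied into a Series by hand (`site_counts` is a stand-in for the `value_counts()` result, not a variable from the notebook):

```python
import pandas as pd

# Counts copied from the output above
site_counts = pd.Series(
    {"CCSFS SLC 40": 55, "KSC LC 39A": 22, "VAFB SLC 4E": 13},
    name="count",
)

# Percentage of launches per site, rounded to one decimal place
site_share = (site_counts / site_counts.sum() * 100).round(1)
print(site_share)
```

On the real DataFrame, `df['LaunchSite'].value_counts(normalize=True)` returns the same proportions as fractions.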

💡 **Note:** Each launch targets a specific orbit. Some common orbit types are:
* LEO
* VLEO
* GTO
* SSO (or SO)
* ES-L1
* HEO
* ISS
* MEO
* GEO
* PO

Some are shown in the following plot:

![Orbit](../images/orbit.png)

## 2. Number and occurrence of each orbit

In [23]:
# determining the number and occurrence of each orbit in the column Orbit
df['Orbit'].value_counts()

Orbit
GTO      27
ISS      21
VLEO     14
PO        9
LEO       7
SSO       5
MEO       3
ES-L1     1
HEO       1
SO        1
GEO       1
Name: count, dtype: int64

## 3. Number and occurrence of each landing outcome
* `landing_outcomes` holds the number of launches for each value of the `Outcome` column

In [24]:
# Counting how often each value of the Outcome column occurs
landing_outcomes = df['Outcome'].value_counts()
landing_outcomes

Outcome
True ASDS      41
None None      19
True RTLS      14
False ASDS      6
True Ocean      5
False Ocean     2
None ASDS       2
False RTLS      1
Name: count, dtype: int64

💡 **Note:** Each `Outcome` value combines the landing result with the landing type:
* **True Ocean** means the booster landed successfully in a specific region of the ocean, while **False Ocean** means it failed to land in a specific region of the ocean.
* **True RTLS** means the booster landed successfully on a ground pad, while **False RTLS** means it failed to land on a ground pad.
* **True ASDS** means the booster landed successfully on a drone ship, while **False ASDS** means it failed to land on a drone ship.
* **None ASDS** and **None None** represent a failure to land.
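Because each label has the form `<result> <landing type>`, it can be split into two separate columns. The sketch below does this on a few sample labels; the column names `Landed` and `LandingType` are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# A few sample labels taken from the Outcome column
outcomes = pd.Series(["True ASDS", "None None", "False Ocean", "True RTLS"])

# Split once on the space: the left part is the result, the right part the landing type
parts = outcomes.str.split(" ", n=1, expand=True)
parts.columns = ["Landed", "LandingType"]  # hypothetical column names

print(parts)
```

Both parts remain strings after the split, so `"True"` still needs an explicit comparison before it can be used as a boolean.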

In [25]:
for i,outcome in enumerate(landing_outcomes.keys()):
    print(i,outcome)

0 True ASDS
1 None None
2 True RTLS
3 False ASDS
4 True Ocean
5 False Ocean
6 None ASDS
7 False RTLS


Creating a set of outcomes where the first stage did not land successfully:

In [26]:
# Positions 1, 3, 5, 6 and 7 in the enumeration above are the unsuccessful outcomes
bad_outcomes = set(landing_outcomes.keys()[[1, 3, 5, 6, 7]])
bad_outcomes

{'False ASDS', 'False Ocean', 'False RTLS', 'None ASDS', 'None None'}
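Selecting outcomes by position depends on the sort order of `value_counts()`, which can change if the data changes. As an alternative sketch, the same set can be built from the labels themselves by keeping every outcome that does not start with `True`. Here `outcome_counts` is a hypothetical stand-in for `landing_outcomes`, filled with the counts shown earlier.

```python
import pandas as pd

# Stand-in for landing_outcomes, using the counts from the output above
outcome_counts = pd.Series(
    [41, 19, 14, 6, 5, 2, 2, 1],
    index=["True ASDS", "None None", "True RTLS", "False ASDS",
           "True Ocean", "False Ocean", "None ASDS", "False RTLS"],
)

# Any label that does not start with "True" counts as an unsuccessful landing
bad_outcomes_by_name = {o for o in outcome_counts.index if not o.startswith("True")}
print(sorted(bad_outcomes_by_name))
```

This yields the same five labels as the positional selection, without relying on the row order.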

## 4. Landing outcome label from Outcome column
* landing_class = 0 if bad_outcome
* landing_class = 1 otherwise

In [27]:
landing_class = df['Outcome'].map(lambda x: 0 if x in bad_outcomes else 1)
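The same labels can be produced without a Python-level lambda. This is a minimal sketch using `Series.isin` on a hypothetical four-row frame; it is an equivalent formulation of the mapping above, not a replacement for it.

```python
import pandas as pd

bad_outcomes = {"False ASDS", "False Ocean", "False RTLS", "None ASDS", "None None"}

# Hypothetical sample of the Outcome column
toy = pd.DataFrame({"Outcome": ["None None", "True Ocean", "False ASDS", "True RTLS"]})

# isin is True for bad outcomes; negating and casting to int makes good outcomes 1
toy["Class"] = (~toy["Outcome"].isin(bad_outcomes)).astype(int)

# The lambda-based mapping gives the same result on this sample
via_map = toy["Outcome"].map(lambda x: 0 if x in bad_outcomes else 1)
print(toy)
```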

In [28]:
df['Class']=landing_class
df[['Class']].head(8)

Unnamed: 0,Class
0,0
1,0
2,0
3,0
4,0
5,0
6,1
7,1


In [29]:
df.head(8)

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude,Class
0,1.0,2010-06-04,Falcon 9,6123.547647,LEO,CCSFS SLC 40,None None,1.0,0.0,0.0,0.0,,1.0,0.0,B0003,-80.577366,28.561857,0
1,2.0,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1.0,0.0,0.0,0.0,,1.0,0.0,B0005,-80.577366,28.561857,0
2,3.0,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,None None,1.0,0.0,0.0,0.0,,1.0,0.0,B0007,-80.577366,28.561857,0
3,4.0,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1.0,0.0,0.0,0.0,,1.0,0.0,B1003,-120.610829,34.632093,0
4,5.0,2013-12-03,Falcon 9,3170.0,GTO,CCSFS SLC 40,None None,1.0,0.0,0.0,0.0,,1.0,0.0,B1004,-80.577366,28.561857,0
5,6.0,2014-01-06,Falcon 9,3325.0,GTO,CCSFS SLC 40,None None,1.0,0.0,0.0,0.0,,1.0,0.0,B1005,-80.577366,28.561857,0
6,7.0,2014-04-18,Falcon 9,2296.0,ISS,CCSFS SLC 40,True Ocean,1.0,0.0,0.0,1.0,,1.0,0.0,B1006,-80.577366,28.561857,1
7,8.0,2014-07-14,Falcon 9,1316.0,LEO,CCSFS SLC 40,True Ocean,1.0,0.0,0.0,1.0,,1.0,0.0,B1007,-80.577366,28.561857,1


## 5. Success Rate

In [30]:
df["Class"].mean()

0.6703296703296703
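The mean of a 0/1 column is the number of ones divided by the number of rows. The outcome counts above list 41 + 14 + 5 = 60 successful landings among 90 labelled launches, which would give about 0.667, whereas 0.6703 equals 61/91. This is consistent with the one row whose `Outcome` is missing being labelled 1, since a missing value is not in `bad_outcomes`. The sketch below reproduces that effect on hypothetical toy data and shows one possible way to exclude such rows.

```python
import numpy as np
import pandas as pd

bad_outcomes = {"False ASDS", "False Ocean", "False RTLS", "None ASDS", "None None"}

# Hypothetical sample with one missing Outcome
toy = pd.DataFrame({"Outcome": ["True ASDS", "None None", np.nan, "True RTLS"]})

# Same mapping as in the notebook: anything not in bad_outcomes becomes 1,
# including the missing value
labels = toy["Outcome"].map(lambda x: 0 if x in bad_outcomes else 1)
rate_all_rows = labels.mean()

# Success rate restricted to rows where Outcome is known
known = toy["Outcome"].notna()
rate_known = labels[known].mean()

print(rate_all_rows, rate_known)
```

Whether rows with a missing outcome should be dropped or labelled 0 is a modelling decision; the sketch only shows how the choice changes the rate.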

In [31]:
df.to_csv("dataset_part_2.csv", index=False)

# Author 
[Helena Pedro](https://www.linkedin.com/in/helena-mbeua-pedro/) is a Data Scientist at Millennium Atlantic Bank in Angola. She is a creative big thinker who is passionate about using data and optimization tools to guide decision-making and solve complex, large-scale challenges.
- **Email:** mbeua94@gmail.com


© 2024 