# **SpaceX Falcon 9 First Stage Landing Prediction**


## Data wrangling


In this lab, some Exploratory Data Analysis (EDA) will be performed in order to identify patterns within the data and determine the appropriate label for training supervised models.

Within the data set, several instances in which the booster did not land successfully are observed. In some cases, a landing was attempted but ultimately failed due to an accident. For example, the indicator <code>True Ocean</code> signifies that the mission outcome involved a successful landing in a specific region of the ocean, whereas <code>False Ocean</code> signifies an unsuccessful landing in the same region. Similarly, <code>True RTLS</code> indicates that the mission outcome involved a successful landing on a ground pad, while <code>False RTLS</code> indicates an unsuccessful landing on a ground pad. Likewise, <code>True ASDS</code> indicates that the mission outcome involved a successful landing on a drone ship, whereas <code>False ASDS</code> indicates an unsuccessful landing on a drone ship.

In this lab, these outcomes are primarily converted into training labels, where a value of `1` signifies that the booster landed successfully, and a value of `0` signifies that it did not.

The Falcon 9 first stage is expected to land successfully.


![](./images/landing_1.gif)



Several examples of an unsuccessful landing are shown here:


![](./images/crash.gif)

## Objectives

Exploratory data analysis was carried out, and training labels were determined.

* Exploratory data analysis was conducted.
* Training labels were established.



***


In [1]:
import pandas as pd
import numpy as np
import os
import space_x
import pickle

from space_x.config import INTERIM_DATA_DIR

### Data Analysis


The SpaceX dataset produced in the previous sections is loaded.


In [2]:
pickle_file = os.path.join(INTERIM_DATA_DIR, 'dataset_part_1.pkl')
with open(pickle_file, 'rb') as file:
    df = pickle.load(file)
#print(len(df))
display(df.head())
df.info()

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
4,1,2010-06-04,Falcon 9,6123.547647,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
5,2,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857
6,3,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857
7,4,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,,1.0,0,B1003,-120.610829,34.632093
8,5,2013-12-03,Falcon 9,3170.0,GTO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B1004,-80.577366,28.561857


<class 'pandas.core.frame.DataFrame'>
Index: 90 entries, 4 to 93
Data columns (total 17 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   FlightNumber    90 non-null     int64  
 1   Date            90 non-null     object 
 2   BoosterVersion  90 non-null     object 
 3   PayloadMass     90 non-null     float64
 4   Orbit           90 non-null     object 
 5   LaunchSite      90 non-null     object 
 6   Outcome         90 non-null     object 
 7   Flights         90 non-null     int64  
 8   GridFins        90 non-null     bool   
 9   Reused          90 non-null     bool   
 10  Legs            90 non-null     bool   
 11  LandingPad      64 non-null     object 
 12  Block           90 non-null     float64
 13  ReusedCount     90 non-null     int64  
 14  Serial          90 non-null     object 
 15  Longitude       90 non-null     float64
 16  Latitude        90 non-null     float64
dtypes: bool(3), float64(4), int64(3), object(7)

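The number of missing values in each attribute is calculated relative to the number of non-null values. Because the expression divides by `df.count()` (non-null entries) rather than by the total number of rows, `LandingPad` is reported as 26/64 = 40.625%; as a share of all 90 rows, the 26 missing values amount to about 28.9%.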


In [3]:
df.isnull().sum()/df.count()*100

FlightNumber       0.000
Date               0.000
BoosterVersion     0.000
PayloadMass        0.000
Orbit              0.000
LaunchSite         0.000
Outcome            0.000
Flights            0.000
GridFins           0.000
Reused             0.000
Legs               0.000
LandingPad        40.625
Block              0.000
ReusedCount        0.000
Serial             0.000
Longitude          0.000
Latitude           0.000
dtype: float64
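
The difference between the two ratios can be illustrated with a minimal sketch. The frame `toy` and its values are hypothetical and stand in for `df`; `isnull().mean()` is one common way to obtain the share of missing rows per column.

```python
import pandas as pd

# Hypothetical toy frame standing in for df: 4 rows, 1 missing LandingPad.
toy = pd.DataFrame({
    "FlightNumber": [1, 2, 3, 4],
    "LandingPad": ["pad_a", None, "pad_b", "pad_a"],
})

# Share of all rows that are missing, per column (missing / total rows).
missing_pct = toy.isnull().mean() * 100

# Ratio used in the cell above (missing / non-missing), for comparison.
missing_ratio = toy.isnull().sum() / toy.count() * 100

print(missing_pct)
print(missing_ratio)
```

With one missing value in four rows, the first expression gives 25% and the second gives about 33.3%, so the two should not be used interchangeably.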

Identify which columns are numerical and categorical:


In [4]:
df.dtypes

FlightNumber        int64
Date               object
BoosterVersion     object
PayloadMass       float64
Orbit              object
LaunchSite         object
Outcome            object
Flights             int64
GridFins             bool
Reused               bool
Legs                 bool
LandingPad         object
Block             float64
ReusedCount         int64
Serial             object
Longitude         float64
Latitude          float64
dtype: object
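
The split into numerical and categorical columns can also be obtained programmatically. The following is a sketch on a hypothetical frame `toy` with a few of the columns listed above; treating every non-numeric column (including booleans) as categorical is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical toy frame with a subset of the columns shown above.
toy = pd.DataFrame({
    "FlightNumber": [1, 2],
    "PayloadMass": [525.0, 677.0],
    "Orbit": ["LEO", "ISS"],
    "GridFins": [False, True],
})

# Integer and float columns are selected as numerical.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()

# Everything else (strings, booleans) is treated as categorical here.
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(categorical_cols)
```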

### TASK 1: Calculation of the number of launches on each site

The dataset covers three SpaceX launch facilities: <a href='https://en.wikipedia.org/wiki/List_of_Cape_Canaveral_and_Merritt_Island_launch_sites?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2022-01-01'>Cape Canaveral Space Launch Complex 40</a> (<b>CCSFS SLC 40</b>), Vandenberg Air Force Base Space Launch Complex 4E (<b>VAFB SLC 4E</b>), and Kennedy Space Center Launch Complex 39A (<b>KSC LC 39A</b>). The location of each launch is recorded in the `LaunchSite` column.


Next, the number of launches at each site is determined by applying the <code>value_counts()</code> method to the <code>LaunchSite</code> column:



In [5]:
# Apply value_counts() on column LaunchSite
df['LaunchSite'].value_counts()

LaunchSite
CCSFS SLC 40    55
KSC LC 39A      22
VAFB SLC 4E     13
Name: count, dtype: int64
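
The same method can report proportions instead of raw counts. The sketch below rebuilds a series from the counts shown above (55, 22 and 13) rather than loading the dataset; the variable names are illustrative only.

```python
import pandas as pd

# Series rebuilt from the per-site counts shown above.
sites = pd.Series(
    ["CCSFS SLC 40"] * 55 + ["KSC LC 39A"] * 22 + ["VAFB SLC 4E"] * 13,
    name="LaunchSite",
)

counts = sites.value_counts()                # absolute number of launches
shares = sites.value_counts(normalize=True)  # fraction of all launches

print(counts)
print(shares.round(3))
```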

Each launch targets a dedicated orbit. Some common orbit types are:


*   <b>LEO</b>: Low Earth orbit (LEO) is an Earth-centred orbit with an altitude of 2,000 km (1,200 mi) or less (approximately one-third of the radius of Earth), or with at least 11.25 periods per day (an orbital period of 128 minutes or less) and an eccentricity less than 0.25. Most of the manmade objects in outer space are in LEO <a href='https://en.wikipedia.org/wiki/Low_Earth_orbit?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2022-01-01'>\[1]</a>.

*   <b>VLEO</b>: Very Low Earth Orbits (VLEO) can be defined as the orbits with a mean altitude below 450 km. Operating in these orbits can provide a number of benefits to Earth observation spacecraft as the spacecraft operates closer to the observation<a href='https://www.researchgate.net/publication/271499606_Very_Low_Earth_Orbit_mission_concepts_for_Earth_Observation_Benefits_and_challenges?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2022-01-01'>\[2]</a>.

*   <b>GTO</b>: A geosynchronous transfer orbit is the elliptical intermediate orbit from which a satellite is moved into geosynchronous orbit. A geosynchronous orbit is a high Earth orbit that allows satellites to match Earth's rotation. Located at 22,236 miles (35,786 kilometers) above Earth's equator, this position is a valuable spot for monitoring weather, communications and surveillance. “Because the satellite orbits at the same speed that the Earth is turning, the satellite seems to stay in place over a single longitude, though it may drift north to south,” NASA wrote on its Earth Observatory website <a  href="https://www.space.com/29222-geosynchronous-orbit.html?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2022-01-01" >\[3] </a>.

*   <b>SSO (or SO)</b>: A Sun-synchronous orbit, also called a heliosynchronous orbit, is a nearly polar orbit around a planet in which the satellite passes over any given point of the planet's surface at the same local mean solar time <a href="https://en.wikipedia.org/wiki/Sun-synchronous_orbit?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2022-01-01">\[4]</a>.

*   <b>ES-L1</b>: At the Lagrange points, the gravitational forces of the two large bodies cancel out in such a way that a small object placed in orbit there is in equilibrium relative to the center of mass of the large bodies. L1 is one such point between the Sun and the Earth <a href="https://en.wikipedia.org/wiki/Lagrange_point?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2022-01-01#L1_point">\[5]</a>.

*   <b>HEO</b>: A highly elliptical orbit is an elliptic orbit with high eccentricity, usually referring to one around Earth <a href="https://en.wikipedia.org/wiki/Highly_elliptical_orbit?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2022-01-01">\[6]</a>.

*   <b> ISS </b> A modular space station (habitable artificial satellite) in low Earth orbit. It is a multinational collaborative project between five participating space agencies: NASA (United States), Roscosmos (Russia), JAXA (Japan), ESA (Europe), and CSA (Canada)<a href="https://en.wikipedia.org/wiki/International_Space_Station?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2022-01-01"> \[7] </a>

*   <b> MEO </b> Geocentric orbits ranging in altitude from 2,000 km (1,200 mi) to just below geosynchronous orbit at 35,786 kilometers (22,236 mi). Also known as an intermediate circular orbit. These are most commonly at 20,200 kilometers (12,600 mi), or 20,650 kilometers (12,830 mi), with an orbital period of 12 hours <a href="https://en.wikipedia.org/wiki/List_of_orbits?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2022-01-01"> \[8] </a>

*   <b> HEO </b> High Earth orbit (the abbreviation is shared with the highly elliptical orbit): geocentric orbits above the altitude of geosynchronous orbit (35,786 km or 22,236 mi) <a href="https://en.wikipedia.org/wiki/List_of_orbits?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2022-01-01"> \[9] </a>

*   <b> GEO </b> It is a circular geosynchronous orbit 35,786 kilometres (22,236 miles) above Earth's equator and following the direction of Earth's rotation <a href="https://en.wikipedia.org/wiki/Geostationary_orbit?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2022-01-01"> \[10] </a>

*   <b> PO </b> A polar orbit is one in which a satellite passes above or nearly above both poles of the body being orbited (usually a planet such as the Earth) <a href="https://en.wikipedia.org/wiki/Polar_orbit?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2022-01-01"> \[11] </a>

Some of these orbits are shown in the following plot:


![](./images/Orbits.png)


### TASK 2: Calculation of the number and occurrence of each orbit


The method  <code>.value_counts()</code> was used to determine the number and occurrence of each orbit in the  column <code>Orbit</code>.


In [6]:
# Apply value_counts on Orbit column
df['Orbit'].value_counts()

Orbit
GTO      27
ISS      21
VLEO     14
PO        9
LEO       7
SSO       5
MEO       3
HEO       1
ES-L1     1
SO        1
GEO       1
Name: count, dtype: int64

### TASK 3: Calculation of the number and occurrence of mission outcome per orbit type


The occurrences of each mission outcome per orbit type can be calculated as follows:

In [7]:
df.value_counts(['Orbit','Outcome'],sort=False)

Orbit  Outcome    
ES-L1  True Ocean      1
GEO    True ASDS       1
GTO    False ASDS      1
       None ASDS       1
       None None      11
       True ASDS      13
       True Ocean      1
HEO    True ASDS       1
ISS    False ASDS      2
       False Ocean     1
       False RTLS      1
       None ASDS       1
       None None       3
       True ASDS       5
       True Ocean      1
       True RTLS       7
LEO    None None       2
       True Ocean      1
       True RTLS       4
MEO    None None       1
       True ASDS       2
PO     False ASDS      1
       False Ocean     1
       None None       1
       True ASDS       5
       True Ocean      1
SO     None None       1
SSO    True ASDS       2
       True RTLS       3
VLEO   False ASDS      2
       True ASDS      12
Name: count, dtype: int64
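
The long-format counts above can be easier to read as a two-way table. A possible sketch using `pd.crosstab` is given below; the frame `toy` and its rows are hypothetical and only mimic the `Orbit` and `Outcome` columns.

```python
import pandas as pd

# Hypothetical rows mimicking the Orbit and Outcome columns.
toy = pd.DataFrame({
    "Orbit": ["GTO", "GTO", "ISS", "ISS", "ISS"],
    "Outcome": ["True ASDS", "None None", "True RTLS", "True RTLS", "False ASDS"],
})

# Rows are orbit types, columns are outcomes, cells are launch counts.
table = pd.crosstab(toy["Orbit"], toy["Outcome"])

print(table)
```

Combinations that never occur appear as zeros in the table, whereas `value_counts` on two columns simply omits them.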

The method <code>.value_counts()</code> is applied to the <code>Outcome</code> column to count the occurrences of each landing outcome, and the result is assigned to the variable <code>landing_outcomes</code>.

In [8]:
# landing_outcomes = values on Outcome column
landing_outcomes = df['Outcome'].value_counts()
landing_outcomes

Outcome
True ASDS      41
None None      19
True RTLS      14
False ASDS      6
True Ocean      5
False Ocean     2
None ASDS       2
False RTLS      1
Name: count, dtype: int64

The designations used to represent the outcomes of mission landings are as follows:

- **`True Ocean`** represents a mission outcome in which a successful landing was achieved in a designated region of the ocean.
- **`False Ocean`** represents a mission outcome in which the landing in a designated region of the ocean was unsuccessful.
- **`True RTLS`** represents a mission outcome in which a successful landing was accomplished on a ground pad.
- **`False RTLS`** represents a mission outcome in which the landing on a ground pad was unsuccessful.
- **`True ASDS`** represents a mission outcome in which a successful landing was accomplished on a drone ship.
- **`False ASDS`** represents a mission outcome in which the landing on a drone ship was unsuccessful.
- **`None ASDS`** and **`None None`** denote a failure to land.
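
Each designation combines a landing result with a landing type, separated by a space. As an illustration only, the two parts can be split into separate columns; the column names `Landed`, `LandingType` and `Success` are hypothetical and do not appear in the dataset.

```python
import pandas as pd

# A few outcome strings of the form "<result> <landing type>".
outcomes = pd.Series(
    ["True ASDS", "False Ocean", "None None", "True RTLS", "None ASDS"],
    name="Outcome",
)

# Split once on the first space into two columns.
parts = outcomes.str.split(" ", n=1, expand=True)
parts.columns = ["Landed", "LandingType"]

# Only the literal string "True" counts as a successful landing.
parts["Success"] = parts["Landed"].eq("True")

print(parts)
```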



In [9]:
for i,outcome in enumerate(landing_outcomes.keys()):
    print(i,outcome)

0 True ASDS
1 None None
2 True RTLS
3 False ASDS
4 True Ocean
5 False Ocean
6 None ASDS
7 False RTLS


A set of outcomes where the first stage did not land successfully will be created:


In [10]:
bad_outcomes=set(landing_outcomes.keys()[[1,3,5,6,7]])
bad_outcomes

{'False ASDS', 'False Ocean', 'False RTLS', 'None ASDS', 'None None'}
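
Selecting outcomes by position depends on the ordering returned by `value_counts()`, which can change if the data changes. A less order-sensitive sketch is shown below, assuming that every successful outcome string starts with `True`; the series is rebuilt from the counts shown above.

```python
import pandas as pd

# Outcome counts rebuilt from the value_counts() output above.
landing_outcomes = pd.Series(
    [41, 19, 14, 6, 5, 2, 2, 1],
    index=pd.Index(
        ["True ASDS", "None None", "True RTLS", "False ASDS",
         "True Ocean", "False Ocean", "None ASDS", "False RTLS"],
        name="Outcome",
    ),
)

# Assumption: an outcome is "bad" unless its label starts with "True".
bad_outcomes = {o for o in landing_outcomes.index if not o.startswith("True")}

print(sorted(bad_outcomes))
```

On these labels the result is the same set as the one obtained with positional indexing.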

### TASK 4: Creation of a landing outcome label from Outcome column


Using the <code>Outcome</code> column, a list is created in which an element is zero if the corresponding row in <code>Outcome</code> is in the set <code>bad_outcomes</code>, and one otherwise. The list is then assigned to the variable <code>landing_class</code>:


In [11]:
landing_class=[]
length = len(df['Outcome'])
for i in range(length):
    if df['Outcome'].iloc[i] in bad_outcomes:
        landing_class.append(0)
    else:
        landing_class.append(1)
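
The explicit loop can also be written as a vectorized expression. The following is an equivalent sketch using `Series.isin`; the frame `toy` is hypothetical and stands in for `df`.

```python
import pandas as pd

bad_outcomes = {"False ASDS", "False Ocean", "False RTLS", "None ASDS", "None None"}

# Hypothetical rows standing in for df['Outcome'].
toy = pd.DataFrame({"Outcome": ["True ASDS", "None None", "False Ocean", "True RTLS"]})

# True where the outcome is bad; invert and cast so that bad -> 0, good -> 1.
landing_class = (~toy["Outcome"].isin(bad_outcomes)).astype(int).tolist()

print(landing_class)
```

Both forms produce the same labels; the vectorized form avoids indexing the column row by row.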


This variable is used as the classification variable representing the outcome of each launch. A value of zero indicates that the first stage did **not land successfully**, whereas a value of one means that the first stage landed **successfully**.

In [12]:
df['Class']=landing_class

In [13]:
df.sample(5)

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude,Class
13,10,2014-09-07,Falcon 9,4428.0,GTO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B1011,-80.577366,28.561857,0
61,58,2018-11-15,Falcon 9,3000.0,GTO,KSC LC 39A,True ASDS,2,True,True,True,5e9e3032383ecb6bb234e7ca,5.0,2,B1047,-80.603956,28.608058,1
52,49,2018-04-18,Falcon 9,350.0,HEO,CCSFS SLC 40,True ASDS,1,True,False,True,5e9e3032383ecb6bb234e7ca,4.0,1,B1045,-80.577366,28.561857,1
73,70,2019-12-05,Falcon 9,5000.0,ISS,CCSFS SLC 40,True ASDS,1,True,False,True,5e9e3032383ecb6bb234e7ca,5.0,5,B1059,-80.577366,28.561857,1
56,53,2018-07-22,Falcon 9,7076.0,GTO,CCSFS SLC 40,True ASDS,1,True,False,True,5e9e3032383ecb6bb234e7ca,5.0,2,B1047,-80.577366,28.561857,1


The following line of code determines the success rate:


In [14]:
float(df["Class"].mean())

0.6666666666666666
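
Because `Class` contains only zeros and ones, its mean equals the fraction of successful landings. The sketch below checks this against the outcome counts reported earlier, where the three `True` outcomes total 41 + 14 + 5 = 60 of 90 launches; the series is rebuilt from those counts rather than from `df`.

```python
import pandas as pd

# 60 successful landings and 30 unsuccessful ones, as implied by the counts above.
classes = pd.Series([1] * 60 + [0] * 30, name="Class")

success_rate = float(classes.mean())  # mean of a 0/1 column = share of ones
class_counts = classes.value_counts()

print(success_rate)
print(class_counts)
```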

The number of occurrences of each landing class per orbit type can be calculated as follows:

In [15]:
df.value_counts(['Orbit','Class'],sort=False)

Orbit  Class
ES-L1  1         1
GEO    1         1
GTO    0        13
       1        14
HEO    1         1
ISS    0         8
       1        13
LEO    0         2
       1         5
MEO    0         1
       1         2
PO     0         3
       1         6
SO     0         1
SSO    1         5
VLEO   0         2
       1        12
Name: count, dtype: int64
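
The per-orbit counts can be condensed into a success rate per orbit by grouping on `Orbit` and averaging `Class`. The frame `toy` below is hypothetical; reporting the launch count next to the mean is a design choice that helps flag orbits with very few launches.

```python
import pandas as pd

# Hypothetical rows mimicking the Orbit and Class columns.
toy = pd.DataFrame({
    "Orbit": ["GTO", "GTO", "ISS", "ISS", "ISS", "SSO"],
    "Class": [0, 1, 1, 1, 0, 1],
})

# Mean of Class is the success rate; count is the number of launches.
rate_by_orbit = toy.groupby("Orbit")["Class"].agg(["mean", "count"])

print(rate_by_orbit)
```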

The dataset is exported to **CSV** and **Pickle** files for the subsequent sections.



In [16]:
# CSV file export
csv_file = os.path.join(INTERIM_DATA_DIR, "dataset_part_2.csv")
df.to_csv(csv_file, index=False)

# Pickle file export
pickle_file= os.path.join(INTERIM_DATA_DIR, 'dataset_part_2.pkl')
df.to_pickle(pickle_file)
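
A quick round-trip check can confirm that both files load as expected. The sketch below writes a hypothetical frame `toy` to a temporary directory instead of `INTERIM_DATA_DIR`, so the file names are illustrative only.

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame standing in for df.
toy = pd.DataFrame({
    "FlightNumber": [1, 2],
    "Outcome": ["None None", "True ASDS"],
    "Class": [0, 1],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_file = os.path.join(tmp_dir, "toy.csv")
    pickle_file = os.path.join(tmp_dir, "toy.pkl")

    toy.to_csv(csv_file, index=False)  # index is dropped, as in the cell above
    toy.to_pickle(pickle_file)         # index and dtypes are preserved

    from_csv = pd.read_csv(csv_file)
    from_pickle = pd.read_pickle(pickle_file)

print(from_pickle.equals(toy))
print(list(from_csv.columns))
```

Since the CSV is written with `index=False`, the original row index (4 to 93 in `df`) is not stored in that file; the Pickle file keeps it.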

Copyright © 2021 IBM Corporation. All rights reserved.
