In [1]:
import pandas as pd
import numpy as np

The data set records several landing outcomes, and not every landing attempt succeeded. <code>True Ocean</code> means the booster landed successfully in a specific region of the ocean, while <code>False Ocean</code> means the ocean landing attempt failed. <code>True RTLS</code> means the booster landed successfully on a ground pad, while <code>False RTLS</code> means the ground-pad landing failed. <code>True ASDS</code> means the booster landed successfully on a drone ship, while <code>False ASDS</code> means the drone-ship landing failed.

In [2]:
df= pd.read_csv('falcon9_data.csv')

In [3]:
df.head()

,FlightNumber,Date,RocketName,Longitude,Latitude,LaunchSite,PayloadMass,Orbit,Block,SerialNumber,ReusedCount,LandingOutcome,Flights,GridFins,Reused,Legs,LandingPad
0,1,2010-06-04,Falcon 9,-80.577366,28.561857,CCSFS SLC 40,7341.2,LEO,1.0,B0003,0,None None,1,False,False,False,
1,2,2012-05-22,Falcon 9,-80.577366,28.561857,CCSFS SLC 40,525.0,LEO,1.0,B0005,0,None None,1,False,False,False,
2,3,2013-03-01,Falcon 9,-80.577366,28.561857,CCSFS SLC 40,677.0,ISS,1.0,B0007,0,None None,1,False,False,False,
3,4,2013-09-29,Falcon 9,-120.610829,34.632093,VAFB SLC 4E,500.0,PO,1.0,B1003,0,False Ocean,1,False,False,False,
4,5,2013-12-03,Falcon 9,-80.577366,28.561857,CCSFS SLC 40,3170.0,GTO,1.0,B1004,0,None None,1,False,False,False,


### Exploring the data

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 122 entries, 0 to 121
Data columns (total 17 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   FlightNumber    122 non-null    int64  
 1   Date            122 non-null    object 
 2   RocketName      122 non-null    object 
 3   Longitude       122 non-null    float64
 4   Latitude        122 non-null    float64
 5   LaunchSite      122 non-null    object 
 6   PayloadMass     122 non-null    float64
 7   Orbit           121 non-null    object 
 8   Block           122 non-null    float64
 9   SerialNumber    122 non-null    object 
 10  ReusedCount     122 non-null    int64  
 11  LandingOutcome  122 non-null    object 
 12  Flights         122 non-null    int64  
 13  GridFins        122 non-null    bool   
 14  Reused          122 non-null    bool   
 15  Legs            122 non-null    bool   
 16  LandingPad      96 non-null     object 
dtypes: bool(3), float64(4), int64(3), object(7)

In [5]:
df.isna().sum()

FlightNumber       0
Date               0
RocketName         0
Longitude          0
Latitude           0
LaunchSite         0
PayloadMass        0
Orbit              1
Block              0
SerialNumber       0
ReusedCount        0
LandingOutcome     0
Flights            0
GridFins           0
Reused             0
Legs               0
LandingPad        26
dtype: int64

Since the Orbit column has only one NA value, we can just delete that record.

In [15]:
df = df[df['Orbit'].notna()] 
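
The same row can also be dropped with `dropna`. The sketch below uses a small made-up frame (`toy` and its values are illustrative, not taken from `falcon9_data.csv`) and assumes that renumbering the index afterwards is acceptable:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame with one missing Orbit value
toy = pd.DataFrame({
    'FlightNumber': [1, 2, 3],
    'Orbit': ['LEO', np.nan, 'GTO'],
})

# Drop only the rows where Orbit is missing, then renumber the index
cleaned = toy.dropna(subset=['Orbit']).reset_index(drop=True)

print(len(toy), len(cleaned))
print(cleaned['Orbit'].isna().sum())
```

Restricting `subset` to `Orbit` matters here: a plain `dropna()` would also discard every row with a missing `LandingPad`.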

In [16]:
df.isna().sum()

FlightNumber       0
Date               0
RocketName         0
Longitude          0
Latitude           0
LaunchSite         0
PayloadMass        0
Orbit              0
Block              0
SerialNumber       0
ReusedCount        0
LandingOutcome     0
Flights            0
GridFins           0
Reused             0
Legs               0
LandingPad        26
dtype: int64

In [17]:
df.dtypes

FlightNumber        int64
Date               object
RocketName         object
Longitude         float64
Latitude          float64
LaunchSite         object
PayloadMass       float64
Orbit              object
Block             float64
SerialNumber       object
ReusedCount         int64
LandingOutcome     object
Flights             int64
GridFins             bool
Reused               bool
Legs                 bool
LandingPad         object
dtype: object

Count the number of launches at each launch site:

In [18]:
df['LaunchSite'].value_counts()

CCSFS SLC 40    72
KSC LC 39A      33
VAFB SLC 4E     16
Name: LaunchSite, dtype: int64
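
If shares are easier to read than raw counts, `value_counts(normalize=True)` returns fractions instead. A minimal sketch that rebuilds a stand-in column from the counts printed above (the `sites` Series is constructed for illustration only):

```python
import pandas as pd

# Stand-in for df['LaunchSite'], rebuilt from the counts shown above
sites = pd.Series(
    ['CCSFS SLC 40'] * 72 + ['KSC LC 39A'] * 33 + ['VAFB SLC 4E'] * 16,
    name='LaunchSite',
)

# normalize=True divides each count by the total number of rows
shares = sites.value_counts(normalize=True).round(3)
print(shares)
```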

Each launch targets a specific orbit. Some of the orbit types are:

*   <b>LEO</b>: Low Earth orbit (LEO) is an Earth-centred orbit with an altitude of 2,000 km (1,200 mi) or less (approximately one-third of the radius of Earth), or with at least 11.25 periods per day (an orbital period of 128 minutes or less) and an eccentricity less than 0.25. Most man-made objects in outer space are in LEO <a href='https://en.wikipedia.org/wiki/Low_Earth_orbit?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2021-01-01'>\[1]</a>.

*   <b>VLEO</b>: Very low Earth orbits (VLEO) can be defined as orbits with a mean altitude below 450 km. Operating in these orbits can benefit Earth observation spacecraft, because the spacecraft operates closer to what it observes <a href='https://www.researchgate.net/publication/271499606_Very_Low_Earth_Orbit_mission_concepts_for_Earth_Observation_Benefits_and_challenges?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2021-01-01'>\[2]</a>.

*   <b>GTO</b>: A geosynchronous transfer orbit (GTO) is an elliptical orbit used to carry a satellite toward geosynchronous orbit. A geosynchronous orbit is a high Earth orbit that allows satellites to match Earth's rotation. Located at 22,236 miles (35,786 kilometers) above Earth's equator, this position is a valuable spot for monitoring weather, communications and surveillance. “Because the satellite orbits at the same speed that the Earth is turning, the satellite seems to stay in place over a single longitude, though it may drift north to south,” NASA wrote on its Earth Observatory website <a href="https://www.space.com/29222-geosynchronous-orbit.html?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2021-01-01">\[3]</a>.

*   <b>SSO (or SO)</b>: A Sun-synchronous orbit, also called a heliosynchronous orbit, is a nearly polar orbit around a planet in which the satellite passes over any given point of the planet's surface at the same local mean solar time <a href="https://en.wikipedia.org/wiki/Sun-synchronous_orbit?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2021-01-01">\[4]</a>.

*   <b>ES-L1</b>: At the Lagrange points, the gravitational forces of the two large bodies cancel out in such a way that a small object placed in orbit there is in equilibrium relative to the center of mass of the large bodies. L1 is one such point, located between the Sun and the Earth <a href="https://en.wikipedia.org/wiki/Lagrange_point?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2021-01-01#L1_point">\[5]</a>.

*   <b>HEO</b>: A highly elliptical orbit is an elliptic orbit with high eccentricity, usually referring to one around Earth <a href="https://en.wikipedia.org/wiki/Highly_elliptical_orbit?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2021-01-01">\[6]</a>.

*   <b>ISS</b>: The International Space Station is a modular space station (habitable artificial satellite) in low Earth orbit. It is a multinational collaborative project between five participating space agencies: NASA (United States), Roscosmos (Russia), JAXA (Japan), ESA (Europe), and CSA (Canada) <a href="https://en.wikipedia.org/wiki/International_Space_Station?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2021-01-01">\[7]</a>.

*   <b>MEO</b>: Medium Earth orbits are geocentric orbits ranging in altitude from 2,000 km (1,200 mi) to just below geosynchronous orbit at 35,786 kilometers (22,236 mi). Also known as intermediate circular orbits, they are most commonly at 20,200 kilometers (12,600 mi) or 20,650 kilometers (12,830 mi), with an orbital period of 12 hours <a href="https://en.wikipedia.org/wiki/List_of_orbits?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2021-01-01">\[8]</a>.

*   <b>HEO</b>: The same abbreviation is also used for high Earth orbit: geocentric orbits above the altitude of geosynchronous orbit (35,786 km or 22,236 mi) <a href="https://en.wikipedia.org/wiki/List_of_orbits?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2021-01-01">\[9]</a>.

*   <b>GEO</b>: A geostationary orbit is a circular geosynchronous orbit 35,786 kilometres (22,236 miles) above Earth's equator that follows the direction of Earth's rotation <a href="https://en.wikipedia.org/wiki/Geostationary_orbit?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2021-01-01">\[10]</a>.

*   <b>PO</b>: A polar orbit is one in which a satellite passes above or nearly above both poles of the body being orbited (usually a planet such as the Earth) <a href="https://en.wikipedia.org/wiki/Polar_orbit?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2021-01-01">\[11]</a>.
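
The altitude bounds quoted in the list can be summarised in a small helper. This is only a rough sketch: `classify_altitude` is a hypothetical function, it assumes a circular geocentric orbit described by a single altitude, and it ignores eccentricity and inclination, so it cannot identify GTO, SSO, PO or highly elliptical orbits.

```python
# Altitude of geosynchronous orbit quoted in the list above, in kilometres
GEO_ALTITUDE_KM = 35_786


def classify_altitude(altitude_km: float) -> str:
    """Map a circular-orbit altitude to the regime names used in the list."""
    if altitude_km < 0:
        raise ValueError("altitude must be non-negative")
    if altitude_km < 450:
        return 'VLEO'  # mean altitude below 450 km
    if altitude_km <= 2_000:
        return 'LEO'   # 2,000 km or less
    if altitude_km < GEO_ALTITUDE_KM:
        return 'MEO'   # from 2,000 km to just below geosynchronous altitude
    if altitude_km == GEO_ALTITUDE_KM:
        return 'GEO'   # at geosynchronous altitude
    return 'HEO'       # high Earth orbit: above geosynchronous altitude


for altitude in (400, 550, 20_200, 35_786, 40_000):
    print(altitude, classify_altitude(altitude))
```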


Count the number of launches for each orbit:

In [19]:
df['Orbit'].value_counts()

GTO      30
ISS      27
VLEO     27
PO       12
LEO       9
SSO       7
MEO       4
ES-L1     1
HEO       1
SO        1
GEO       1
TLI       1
Name: Orbit, dtype: int64

Count the number of occurrences of each landing outcome:

In [20]:
df['LandingOutcome'].value_counts()

True ASDS      69
None None      19
True RTLS      16
False ASDS      7
True Ocean      5
False Ocean     2
None ASDS       2
False RTLS      1
Name: LandingOutcome, dtype: int64
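
Each label combines a result (`True`, `False` or `None`) with a landing target (`ASDS`, `RTLS`, `Ocean` or `None`). A minimal sketch of splitting the two parts apart, using a hand-written sample of labels; the column names `Result`, `Target` and `Landed` are illustrative and do not exist in the data set:

```python
import pandas as pd

# Hand-written sample of labels in the same "<result> <target>" format
outcomes = pd.Series(['True ASDS', 'None None', 'False Ocean', 'True RTLS', 'None ASDS'])

# Split each label once on the space into a result part and a target part
parts = outcomes.str.split(' ', n=1, expand=True)
parts.columns = ['Result', 'Target']

# Only a result of 'True' marks a successful landing
parts['Landed'] = parts['Result'].eq('True')
print(parts)
```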


In [21]:

landing_outcomes = df['LandingOutcome'].value_counts()

In [22]:
for i,outcome in enumerate(landing_outcomes.keys()):
    print(i,outcome)

0 True ASDS
1 None None
2 True RTLS
3 False ASDS
4 True Ocean
5 False Ocean
6 None ASDS
7 False RTLS


Create a set, bad_outcomes, of the outcomes that count as unsuccessful landings:

In [23]:
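# Positions 1, 3, 5, 6 and 7 in the listing above are the outcomes that do not start with 'True'
bad_outcomes = set(landing_outcomes.keys()[[1, 3, 5, 6, 7]])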
bad_outcomes

{'False ASDS', 'False Ocean', 'False RTLS', 'None ASDS', 'None None'}
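
Selecting by position depends on the order returned by `value_counts()`, which can change if the data changes. A sketch of an alternative, assuming that every label not starting with `True` counts as an unsuccessful landing (the counts are copied from the output above):

```python
import pandas as pd

# Stand-in for df['LandingOutcome'].value_counts(), copied from the output above
landing_outcomes = pd.Series({
    'True ASDS': 69, 'None None': 19, 'True RTLS': 16, 'False ASDS': 7,
    'True Ocean': 5, 'False Ocean': 2, 'None ASDS': 2, 'False RTLS': 1,
})

# Keep every label whose result part is not 'True', regardless of its position
bad_outcomes = {label for label in landing_outcomes.index if not label.startswith('True')}
print(sorted(bad_outcomes))
```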

Create a binary label for good and bad landing outcomes

In [24]:
# If the landing outcome is in the bad_outcomes set, set the class to 0; otherwise set it to 1
landing_class = []

for x in df['LandingOutcome']:
    if x in bad_outcomes:
        landing_class.append(0)
    else:
        landing_class.append(1)
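
The loop can also be written as a single vectorised expression with `isin`. A minimal sketch on a hand-written sample of labels (the `outcomes` Series stands in for `df['LandingOutcome']`):

```python
import pandas as pd

# Hand-written sample standing in for df['LandingOutcome']
outcomes = pd.Series(['True ASDS', 'None None', 'False RTLS', 'True Ocean'])
bad_outcomes = {'False ASDS', 'False Ocean', 'False RTLS', 'None ASDS', 'None None'}

# isin() flags the bad outcomes; negating and casting gives 1 for good, 0 for bad
landing_class = (~outcomes.isin(bad_outcomes)).astype(int)
print(landing_class.tolist())
```

Unlike the list built by the loop, the result here is a Series that keeps the index of the input.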

In [25]:
len(landing_class)

121

Add the Class feature to the dataset with the landing_class list

In [26]:
# assign() returns a new DataFrame, which avoids the SettingWithCopyWarning
# that pandas can raise when a column is set directly on the filtered frame
df = df.assign(Class=landing_class)


In [27]:
df['Class'].value_counts()

1    90
0    31
Name: Class, dtype: int64

In [28]:
df['Class'].mean()

0.743801652892562
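
Because `Class` holds only 0 and 1, its mean is the share of launches labelled 1, that is, the landing success rate. A quick check using the counts reported above:

```python
# Counts taken from the df['Class'].value_counts() output above
successes = 90
failures = 31

# The mean of a 0/1 column equals the number of ones divided by the number of rows
success_rate = successes / (successes + failures)
print(f"{success_rate:.4f}")
```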

In [29]:
df.to_csv('falcon9_data_class.csv', index=False)