# SpaceX Data Wrangling 


# Description

In this notebook, we perform some Exploratory Data Analysis (EDA) to find patterns in the data and determine the label for training supervised models. We also try to combine the two datasets that we collected by scraping the Wikipedia web page and by querying the SpaceX API.

In the data set, there are several different cases where the booster did not land successfully. Sometimes a landing was attempted but failed due to an accident.

In this lab we will mainly convert those outcomes into training labels, where `1` means the booster landed successfully and `0` means the landing was unsuccessful.


Falcon 9 first stage will land successfully  
  
![](../../Files/landing_1.gif)



Several examples of an unsuccessful landing are shown here  
  
![](../../Files/crash.gif)



# Objectives

Perform Exploratory Data Analysis and determine training labels:

*   Exploratory Data Analysis
*   Determine Training Labels


# Setup

We will import the following libraries.


In [9]:
import pandas as pd
import numpy as np

In [2]:
# paths to the two datasets we collected (SpaceX API and web scraping)
path_1 = "../CollectData_Using_API/dataset_part_1.csv"
path_2 = "../webscraping/spacex_web_scraped.csv"

Load the SpaceX datasets from the previous sections.


In [13]:
df = pd.read_csv(path_1)
df_web = pd.read_csv(path_2)

In [4]:
df.head()

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
0,1,2010-06-04,Falcon 9,6123.547647,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
1,2,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857
2,3,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857
3,4,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,,1.0,0,B1003,-120.610829,34.632093
4,5,2013-12-03,Falcon 9,3170.0,GTO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B1004,-80.577366,28.561857


In [14]:
df_web.head()

Unnamed: 0,Flight No.,Launch site,Payload,Payload mass,Orbit,Customer,Launch outcome,Version Booster,Booster landing,Date,Time
0,1,CCAFS,Dragon Spacecraft Qualification Unit,0,LEO,SpaceX,Success\n,F9 v1.0B0003.1,Failure,4 June 2010,18:45
1,2,CCAFS,Dragon,0,LEO,NASA,Success,F9 v1.0B0004.1,Failure,8 December 2010,15:43
2,3,CCAFS,Dragon,525 kg,LEO,NASA,Success,F9 v1.0B0005.1,Not attempted\n,22 May 2012,07:44
3,4,CCAFS,SpaceX CRS-1,"4,700 kg",LEO,NASA,Success\n,F9 v1.0B0006.1,No attempt,8 October 2012,00:35
4,5,CCAFS,SpaceX CRS-2,"4,877 kg",LEO,NASA,Success\n,F9 v1.0B0007.1,Not attempted\n,1 March 2013,15:10


In [15]:
df.shape, df_web.shape

((90, 17), (121, 11))

# Exploratory Data Analysis

First we need to combine the two datasets. Here is our approach:  
1. Find a `unique` column that is `common` to both datasets.
2. Get the `difference between the two datasets` using that unique column.
3. Add the `records` from the previous step to the other dataset.
4. `Rearrange` the flight number column.  
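
The four steps above can be sketched on toy data with a single outer merge. This is only an illustration under assumptions: the frames `api` and `web` and their values are made up, and the notebook itself does not merge the datasets this way.

```python
import pandas as pd

# Toy stand-ins for the API and scraped tables (illustrative values only).
api = pd.DataFrame({
    "Date": ["2010-06-04", "2012-05-22", "2013-03-01"],
    "Orbit": ["LEO", "LEO", "ISS"],
})
web = pd.DataFrame({
    "Date": ["2010-06-04", "2010-12-08", "2013-03-01"],
    "Customer": ["SpaceX", "NASA", "NASA"],
})

# Steps 1-2: join on the shared key and record which source each row came from.
merged = api.merge(web, on="Date", how="outer", indicator=True)

# Rows present in only one source are the "difference" between the datasets.
only_web = merged[merged["_merge"] == "right_only"]
only_api = merged[merged["_merge"] == "left_only"]

# Steps 3-4: keep every record and renumber the flights in date order.
combined = merged.sort_values("Date").reset_index(drop=True)
combined["FlightNumber"] = range(1, len(combined) + 1)
print(combined)
```

Rows that exist in only one source carry `NaN` in the other source's columns, which is the trade-off discussed later in this notebook.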

First we double-check that the collected data covers only Falcon 9.

In [7]:
df.BoosterVersion.unique()

array(['Falcon 9'], dtype=object)

Based on the data published by SpaceX, the first two characters of the booster version identify the type of vehicle that carries cargo and crew into Earth orbit:

In [16]:
booster_v = list(map(lambda x: x[0:2], df_web['Version Booster'].unique()))
np.unique(booster_v)

array(['F9'], dtype='<U2')

We can see that all the boosters in the scraped data are Falcon 9 as well.

Next we try to find one or more unique identifier columns that are common to both datasets.

`Date` is a good common identifier. First we need to make the format of the two `Date` columns comparable.

In [17]:
df_web['Date'] = pd.to_datetime(df_web['Date'], format='%d %B %Y').dt.strftime('%Y-%m-%d')
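
A small sketch of the same conversion on made-up strings, assuming English month names. Passing `errors="coerce"` is our addition, not something the notebook does: it turns unparseable values into `NaT` so they can be counted instead of raising an error.

```python
import pandas as pd

# Illustrative raw dates in the scraped format, plus one bad value.
raw = pd.Series(["4 June 2010", "8 December 2010", "not a date"])

# Parse day / full month name / year; invalid entries become NaT.
parsed = pd.to_datetime(raw, format="%d %B %Y", errors="coerce")

# Render in the ISO format used by the API dataset.
iso = parsed.dt.strftime("%Y-%m-%d")

print(iso)
print("unparseable rows:", parsed.isna().sum())
```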

Let's see the differences between the two sets of dates:

In [18]:
print(len(set(df_web['Date']) - set(df['Date'])))
print(len(set(df['Date']) - set(df_web['Date'])))

32
1


We have 32 dates that are in the `scraped data` but not in the `API collected data`.  
We have 1 date that is in the `API collected data` but not in the `scraped data`.  

> **Note**: our dataset is small, so adding records would help our predictive analysis. The problem is that we collected data both from the **web** and from the **SpaceX API**, so the two sources are **slightly different in features**, and the **Wikipedia** data probably contains some wrong values.  
  
One approach is to assume that there was never more than one launch on the same date. If we match records by the `Date` column we can augment the data, but we will still face problems such as mismatches between the scraped records and the API records.
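
The one-launch-per-date assumption can be checked before relying on `Date` as a key. A minimal sketch on made-up dates (the series `dates` is hypothetical):

```python
import pandas as pd

# Illustrative dates; the last two collide on purpose.
dates = pd.Series(["2018-03-06", "2018-03-30", "2018-03-30"], name="Date")

# True only if every date occurs exactly once.
is_unique_key = dates.is_unique

# All rows whose date is shared with at least one other row.
repeated = dates[dates.duplicated(keep=False)]

print(is_unique_key)
print(repeated)
```

Applied to a real column, an empty `repeated` result would support using `Date` as the join key for that dataset.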

---

We add some features from the scraped data to the API dataset for records that appear in both, because adding new records to the API dataset would introduce lots of `NaN` values.

In [19]:
df_web[df_web.Date == '2013-03-01']

Unnamed: 0,Flight No.,Launch site,Payload,Payload mass,Orbit,Customer,Launch outcome,Version Booster,Booster landing,Date,Time
4,5,CCAFS,SpaceX CRS-2,"4,877 kg",LEO,NASA,Success\n,F9 v1.0B0007.1,Not attempted\n,2013-03-01,15:10


In [20]:
df[df.Date == '2013-03-01']

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
2,3,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857


We can see that the two records for the same date have different attributes (for example, payload mass and orbit).

We use `df` (the API data) as the base dataframe and combine all the data there.

In [28]:
df.Outcome.unique()

array(['None None', 'False Ocean', 'True Ocean', 'False ASDS',
       'None ASDS', 'True RTLS', 'True ASDS', 'False RTLS'], dtype=object)

In [29]:
df_web['Booster landing'].unique()

array(['Failure', 'Not attempted\n', 'No attempt', 'Uncontrolled',
       'Not attempted', 'Controlled', 'Failure ', 'Precluded', 'Success'],
      dtype=object)

In [30]:
df_web['Launch outcome'].unique() 

array(['Success\n', 'Success', 'Failure'], dtype=object)
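
The outputs above contain labels that differ only by trailing whitespace, such as `'Success\n'` and `'Failure '`. A hedged sketch of how they could be normalised; the notebook does not perform this step, and the series below is a made-up sample:

```python
import pandas as pd

# Sample of labels with stray newlines and spaces.
landing = pd.Series(["Failure", "Not attempted\n", "Failure ", "Success"])

# Strip leading and trailing whitespace so equal labels compare equal.
cleaned = landing.str.strip()

print(cleaned.unique())
```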

In [42]:
df[df.Outcome=='None ASDS'].Date

15    2015-06-28
24    2016-09-01
Name: Date, dtype: object

In [51]:
df_web[df_web.Date.isin(['2015-06-28', '2016-09-01'])]

Unnamed: 0,Flight No.,Launch site,Payload,Payload mass,Orbit,Customer,Launch outcome,Version Booster,Booster landing,Date,Time
18,19,Cape Canaveral,SpaceX CRS-7,"1,952 kg",LEO,NASA,Failure,F9 v1.1,Precluded,2015-06-28,14:21


In [62]:
# df[df.Date == '2015-06-28'].Outcome = 'False ASDS'
# df.loc[18, 'Outcome'] 

'False ASDS'

In [54]:
df[df.Outcome=='None None'].Date

0     2010-06-04
1     2012-05-22
2     2013-03-01
4     2013-12-03
5     2014-01-06
8     2014-08-05
9     2014-09-07
14    2015-04-27
27    2017-03-16
30    2017-05-15
34    2017-07-05
45    2018-03-06
46    2018-03-30
47    2018-04-02
50    2018-06-04
51    2018-06-29
60    2018-12-23
67    2019-08-06
72    2020-01-19
Name: Date, dtype: object

In [58]:
df_web[df_web.Date.isin(df[df.Outcome=='None None'].Date)][['Booster landing', 'Launch outcome']]

Unnamed: 0,Booster landing,Launch outcome
0,Failure,Success\n
2,Not attempted\n,Success
4,Not attempted\n,Success\n
6,Not attempted,Success
7,Not attempted,Success
10,Not attempted,Success
11,Not attempted\n,Success
17,Not attempted,Success\n
30,Not attempted,Success\n
33,Not attempted,Success\n


For these dates, the scraped data reports every launch outcome as a success, while the booster landing was mostly not attempted, so we relabel `None None` as `True None`.

In [65]:
# relabel every 'None None' outcome in one vectorized assignment
df.loc[df.Outcome == 'None None', 'Outcome'] = 'True None'

In [66]:
df.Outcome.unique()

array(['True None', 'False Ocean', 'True Ocean', 'False ASDS',
       'None ASDS', 'True RTLS', 'True ASDS', 'False RTLS'], dtype=object)

In [80]:
customers = []
payload = []
indx = []
# for each API record, look for scraped records with the same date
for i, date in df['Date'].items():
    same_date = df_web[df_web['Date'] == date]
    # only use dates that match exactly one scraped record
    if len(same_date) == 1:
        customers.append(same_date.Customer)
        payload.append(same_date.Payload)
        indx.append(i)

In [114]:
# create placeholder string columns ('0.0'); matched rows are filled in below
df['Payload'] = np.zeros(df.shape[0])
df['Payload'] = df['Payload'].astype(dtype=str)

df['Customer'] = np.zeros(df.shape[0])
df['Customer'] = df['Customer'].astype(dtype=str)


In [115]:
for i, p, c in zip(indx, payload, customers):
    df.loc[i, 'Customer'] = list(c)[0]
    df.loc[i, 'Payload'] = list(p)[0]
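
The same lookup can be written without explicit loops. This is a sketch on toy frames, not the notebook's code: `api` and `web` are hypothetical, and unlike the cells above, unmatched rows end up as `NaN` rather than the `'0.0'` placeholder.

```python
import pandas as pd

api = pd.DataFrame({"Date": ["2010-06-04", "2012-05-22", "2020-01-01"]})
web = pd.DataFrame({
    "Date": ["2010-06-04", "2012-05-22", "2012-05-22"],
    "Customer": ["SpaceX", "NASA", "Other"],
})

# Keep only dates that occur exactly once in the scraped table,
# mirroring the "exactly one match" condition of the loop.
unique_web = web.drop_duplicates("Date", keep=False).set_index("Date")

# Look up each API date in the scraped table; no match gives NaN.
api["Customer"] = api["Date"].map(unique_web["Customer"])

print(api)
```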

In [121]:
df.Payload[df.Payload == '0.0']

24    0.0
Name: Payload, dtype: object

In [128]:
df[df.isna().Customer]

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude,Customer,Payload
14,15,2015-04-27,Falcon 9,4707.0,GTO,CCSFS SLC 40,True None,1,False,False,False,,1.0,0,B1016,-80.577366,28.561857,,TürkmenÄlem 52°E / MonacoSAT


In [129]:
# df.drop(axis=0, index=24, inplace=True)
df.drop(axis=0, index=14, inplace=True)

In [130]:
df.isna().sum()

FlightNumber       0
Date               0
BoosterVersion     0
PayloadMass        0
Orbit              0
LaunchSite         0
Outcome            0
Flights            0
GridFins           0
Reused             0
Legs               0
LandingPad        25
Block              0
ReusedCount        0
Serial             0
Longitude          0
Latitude           0
Customer           0
Payload            0
dtype: int64

In [132]:
df.LandingPad.unique()

array([nan, '5e9e3032383ecb761634e7cb', '5e9e3032383ecb6bb234e7ca',
       '5e9e3032383ecb267a34e7c7', '5e9e3033383ecbb9e534e7cc',
       '5e9e3032383ecb554034e7c9'], dtype=object)

In [133]:
df.drop(axis=1, columns=['LandingPad'], inplace=True)

In [134]:
df.isna().sum()

FlightNumber      0
Date              0
BoosterVersion    0
PayloadMass       0
Orbit             0
LaunchSite        0
Outcome           0
Flights           0
GridFins          0
Reused            0
Legs              0
Block             0
ReusedCount       0
Serial            0
Longitude         0
Latitude          0
Customer          0
Payload           0
dtype: int64

Identify and calculate the percentage of the missing values in each attribute


In [135]:
(df.isnull().sum()/df.shape[0])*100

FlightNumber      0.0
Date              0.0
BoosterVersion    0.0
PayloadMass       0.0
Orbit             0.0
LaunchSite        0.0
Outcome           0.0
Flights           0.0
GridFins          0.0
Reused            0.0
Legs              0.0
Block             0.0
ReusedCount       0.0
Serial            0.0
Longitude         0.0
Latitude          0.0
Customer          0.0
Payload           0.0
dtype: float64

In [136]:
df.dtypes

FlightNumber        int64
Date               object
BoosterVersion     object
PayloadMass       float64
Orbit              object
LaunchSite         object
Outcome            object
Flights             int64
GridFins             bool
Reused               bool
Legs                 bool
Block             float64
ReusedCount         int64
Serial             object
Longitude         float64
Latitude          float64
Customer           object
Payload            object
dtype: object

### TASK 1: Calculate the number of launches on each site

The data contains several SpaceX launch facilities: <a href='https://en.wikipedia.org/wiki/List_of_Cape_Canaveral_and_Merritt_Island_launch_sites?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555'>Cape Canaveral Space</a> Launch Complex 40 <b>CCSFS SLC 40</b>, Vandenberg Air Force Base Space Launch Complex 4E <b>VAFB SLC 4E</b>, and Kennedy Space Center Launch Complex 39A <b>KSC LC 39A</b>. The location of each launch is stored in the column <code>LaunchSite</code>.


Next, let's see the number of launches for each site.

Use the method  <code>value_counts()</code> on the column <code>LaunchSite</code> to determine the number of launches  on each site:


In [137]:
df['LaunchSite'].value_counts()

LaunchSite
CCSFS SLC 40    53
KSC LC 39A      22
VAFB SLC 4E     13
Name: count, dtype: int64

Each launch targets a dedicated orbit. Here are some common orbit types:


*   <b>LEO</b>: Low Earth orbit (LEO) is an Earth-centred orbit with an altitude of 2,000 km (1,200 mi) or less (approximately one-third of the radius of Earth), or with at least 11.25 periods per day (an orbital period of 128 minutes or less) and an eccentricity less than 0.25. Most of the man-made objects in outer space are in LEO <a href='https://en.wikipedia.org/wiki/Low_Earth_orbit?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555'>\[1]</a>.

*   <b>VLEO</b>: Very Low Earth Orbits (VLEO) can be defined as the orbits with a mean altitude below 450 km. Operating in these orbits can provide a number of benefits to Earth observation spacecraft as the spacecraft operates closer to the observation<a href='https://www.researchgate.net/publication/271499606_Very_Low_Earth_Orbit_mission_concepts_for_Earth_Observation_Benefits_and_challenges?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555'>\[2]</a>.

*   <b>GTO</b>: A geosynchronous transfer orbit is an elliptical orbit used to move a satellite up to geosynchronous orbit. A geosynchronous orbit is a high Earth orbit that allows satellites to match Earth's rotation. Located at 22,236 miles (35,786 kilometers) above Earth's equator, this position is a valuable spot for monitoring weather, communications and surveillance. “Because the satellite orbits at the same speed that the Earth is turning, the satellite seems to stay in place over a single longitude, though it may drift north to south,” NASA wrote on its Earth Observatory website <a  href="https://www.space.com/29222-geosynchronous-orbit.html?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555" >\[3] </a>.

*   <b>SSO (or SO)</b>: A Sun-synchronous orbit, also called a heliosynchronous orbit, is a nearly polar orbit around a planet in which the satellite passes over any given point of the planet's surface at the same local mean solar time <a href="https://en.wikipedia.org/wiki/Sun-synchronous_orbit?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555">\[4] </a>.

*   <b>ES-L1</b>: At the Lagrange points the gravitational forces of the two large bodies cancel out in such a way that a small object placed in orbit there is in equilibrium relative to the center of mass of the large bodies. L1 is one such point between the Sun and the Earth <a href="https://en.wikipedia.org/wiki/Lagrange_point?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555">\[5]</a>.

*   <b>HEO</b>: A highly elliptical orbit is an elliptic orbit with high eccentricity, usually referring to one around Earth <a href="https://en.wikipedia.org/wiki/Highly_elliptical_orbit?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555">\[6]</a>.

*   <b> ISS </b> A modular space station (habitable artificial satellite) in low Earth orbit. It is a multinational collaborative project between five participating space agencies: NASA (United States), Roscosmos (Russia), JAXA (Japan), ESA (Europe), and CSA (Canada)<a href="https://en.wikipedia.org/wiki/International_Space_Station?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555"> \[7] </a>

*   <b> MEO </b> Geocentric orbits ranging in altitude from 2,000 km (1,200 mi) to just below geosynchronous orbit at 35,786 kilometers (22,236 mi). Also known as an intermediate circular orbit. These are most commonly at 20,200 kilometers (12,600 mi), or 20,650 kilometers (12,830 mi), with an orbital period of 12 hours <a href="https://en.wikipedia.org/wiki/List_of_orbits?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555"> \[8] </a>

*   <b> HEO </b> The same abbreviation is also used for high Earth orbit: geocentric orbits above the altitude of geosynchronous orbit (35,786 km or 22,236 mi) <a href="https://en.wikipedia.org/wiki/List_of_orbits?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555"> \[9] </a>

*   <b> GEO </b> It is a circular geosynchronous orbit 35,786 kilometres (22,236 miles) above Earth's equator and following the direction of Earth's rotation <a href="https://en.wikipedia.org/wiki/Geostationary_orbit?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555"> \[10] </a>

*   <b> PO </b> A polar orbit is one in which a satellite passes above or nearly above both poles of the body being orbited (usually a planet such as the Earth) <a href="https://en.wikipedia.org/wiki/Polar_orbit?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555"> \[11] </a>

Some are shown in the following plot:


![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/api/Images/Orbits.png)
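
The periods quoted in the list (128 minutes or less for LEO, about 12 hours for MEO) can be roughly checked with Kepler's third law for a circular orbit, T = 2π·sqrt(a³/μ). This is a sketch under assumptions: the function name is ours, the orbit is treated as perfectly circular, and the constants are the standard values for Earth's gravitational parameter and equatorial radius.

```python
import math

MU_EARTH = 398600.4418      # km^3 / s^2, Earth's standard gravitational parameter
EARTH_RADIUS_KM = 6378.137  # equatorial radius

def circular_period_seconds(altitude_km: float) -> float:
    """Period of a circular orbit at the given altitude above the equator."""
    a = EARTH_RADIUS_KM + altitude_km  # semi-major axis in km
    return 2 * math.pi * math.sqrt(a ** 3 / MU_EARTH)

leo_minutes = circular_period_seconds(2000) / 60
meo_hours = circular_period_seconds(20200) / 3600
geo_hours = circular_period_seconds(35786) / 3600

print(f"LEO upper bound: {leo_minutes:.1f} min")
print(f"MEO at 20,200 km: {meo_hours:.2f} h")
print(f"GEO altitude: {geo_hours:.2f} h")
```

The GEO value comes out slightly under 24 hours because a geosynchronous orbit matches one sidereal day rather than one solar day.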


### TASK 2: Calculate the number and occurrence of each orbit


Use the method  <code>.value_counts()</code> to determine the number and occurrence of each orbit in the  column <code>Orbit</code>


In [138]:
df['Orbit'].value_counts()

Orbit
GTO      25
ISS      21
VLEO     14
PO        9
LEO       7
SSO       5
MEO       3
ES-L1     1
HEO       1
SO        1
GEO       1
Name: count, dtype: int64

### TASK 3: Calculate the number and occurrence of mission outcome per orbit type


Use the method <code>.value_counts()</code> on the column <code>Outcome</code> to determine the number of landing outcomes, then assign it to the variable <code>landing_outcomes</code>.


In [141]:
# first we create a pivot table of orbit and outcome to count the landing outcomes in each orbit
gp = df[['Orbit', 'Outcome']].groupby('Orbit', as_index=False).value_counts()
# gp
pivot = gp.pivot(index='Orbit', columns='Outcome')
pivot

Unnamed: 0_level_0,count,count,count,count,count,count,count,count
Outcome,False ASDS,False Ocean,False RTLS,None ASDS,True ASDS,True None,True Ocean,True RTLS
Orbit,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2
ES-L1,,,,,,,1.0,
GEO,,,,,1.0,,,
GTO,1.0,,,,13.0,10.0,1.0,
HEO,,,,,1.0,,,
ISS,2.0,1.0,1.0,1.0,5.0,3.0,1.0,7.0
LEO,,,,,,2.0,1.0,4.0
MEO,,,,,2.0,1.0,,
PO,1.0,1.0,,,5.0,1.0,1.0,
SO,,,,,,1.0,,
SSO,,,,,2.0,,,3.0
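
An alternative way to build an orbit-by-outcome table is `pd.crosstab`, which fills missing combinations with `0` instead of `NaN`. A sketch on made-up rows (the frame `toy` is hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    "Orbit": ["GTO", "GTO", "ISS", "ISS", "LEO"],
    "Outcome": ["True ASDS", "True None", "True RTLS", "False ASDS", "True RTLS"],
})

# Count how often each (Orbit, Outcome) pair occurs.
table = pd.crosstab(toy["Orbit"], toy["Outcome"])

print(table)
```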


In [142]:
landing_outcomes = df['Outcome'].value_counts()
landing_outcomes

Outcome
True ASDS      41
True None      18
True RTLS      14
False ASDS      6
True Ocean      5
False Ocean     2
None ASDS       1
False RTLS      1
Name: count, dtype: int64

<code>True Ocean</code> means the booster successfully landed in a specific region of the ocean, while <code>False Ocean</code> means it unsuccessfully landed in a specific region of the ocean. <code>True RTLS</code> means the booster successfully landed on a ground pad, while <code>False RTLS</code> means it unsuccessfully landed on a ground pad. <code>True ASDS</code> means the booster successfully landed on a drone ship, while <code>False ASDS</code> means it unsuccessfully landed on a drone ship. <code>True None</code> is the label we assigned above to the former <code>None None</code> records, for which this dataset has no landing information. <code>None ASDS</code> represents an unknown outcome.
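
Each `Outcome` value combines a result flag and a landing type separated by a space. A minimal sketch of splitting them into two columns; the column names `Landed` and `LandingType` are our own and do not appear in the dataset:

```python
import pandas as pd

outcomes = pd.Series(["True ASDS", "False Ocean", "True None", "None ASDS"])

# Split on the first space only: left part is the flag, right part the type.
parts = outcomes.str.split(" ", n=1, expand=True)
parts.columns = ["Landed", "LandingType"]

print(parts)
```

Both columns remain strings here, so `"None"` is text rather than a missing value.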


We have 1 `None ASDS` so we can remove that record

In [144]:
df[df['Outcome'] == 'None ASDS'].index

Index([15], dtype='int64')

In [145]:
df.drop(index=15, axis=0, inplace=True)

We create a set of outcomes where the first stage did not land successfully:


In [146]:
landing_outcomes = df['Outcome'].value_counts()
landing_outcomes

Outcome
True ASDS      41
True None      18
True RTLS      14
False ASDS      6
True Ocean      5
False Ocean     2
False RTLS      1
Name: count, dtype: int64

In [147]:
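# bad landing outcomes: select every label that starts with 'False'
# (avoids depending on the position of each label in value_counts())
bad_landing_outcome = landing_outcomes.index[landing_outcomes.index.str.startswith('False')]
print(bad_landing_outcome)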

Index(['False ASDS', 'False Ocean', 'False RTLS'], dtype='object', name='Outcome')


In [148]:
bad_outcomes=set(bad_landing_outcome)
bad_outcomes

{'False ASDS', 'False Ocean', 'False RTLS'}

### TASK 4: Create a landing outcome label from Outcome column


Using the <code>Outcome</code> column, create a list where the element is zero if the corresponding row in <code>Outcome</code> is in the set <code>bad_outcomes</code>; otherwise, it's one. Then assign it to the variable <code>landing_class</code>:


In [149]:
# landing_class = 0 if bad_outcome
# landing_class = 1 otherwise

landing_class = []
for i in df['Outcome']:
    if i in bad_outcomes:
        landing_class.append(0)
    else:
        landing_class.append(1)
# landing_class
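
The same labels can be produced without a loop by using `isin`. A sketch on a small sample, assuming the same set of bad outcomes as above:

```python
import pandas as pd

outcomes = pd.Series(["True ASDS", "False Ocean", "True None", "False RTLS"])
bad = {"False ASDS", "False Ocean", "False RTLS"}

# isin marks the bad outcomes; negating and casting gives 1 for good, 0 for bad.
labels = (~outcomes.isin(bad)).astype(int)

print(labels.tolist())
```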

In [150]:
df['Outcome'].count()

87

In [151]:
len(landing_class)

87

Both `landing_class` and `df['Outcome']` have the same number of rows.

This variable is the classification label for the outcome of each launch. If the value is zero, the first stage did not land successfully; one means the first stage landed successfully.


In [152]:
df['Class']=landing_class
df[['Class']].head(10)

Unnamed: 0,Class
0,1
1,1
2,1
3,0
4,1
5,1
6,1
7,1
8,1
9,1


In [153]:
df.head(10)

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,Block,ReusedCount,Serial,Longitude,Latitude,Customer,Payload,Class
0,1,2010-06-04,Falcon 9,6123.547647,LEO,CCSFS SLC 40,True None,1,False,False,False,1.0,0,B0003,-80.577366,28.561857,SpaceX,Dragon Spacecraft Qualification Unit,1
1,2,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,True None,1,False,False,False,1.0,0,B0005,-80.577366,28.561857,NASA,Dragon,1
2,3,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,True None,1,False,False,False,1.0,0,B0007,-80.577366,28.561857,NASA,SpaceX CRS-2,1
3,4,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,1.0,0,B1003,-120.610829,34.632093,MDA,CASSIOPE,0
4,5,2013-12-03,Falcon 9,3170.0,GTO,CCSFS SLC 40,True None,1,False,False,False,1.0,0,B1004,-80.577366,28.561857,SES,SES-8,1
5,6,2014-01-06,Falcon 9,3325.0,GTO,CCSFS SLC 40,True None,1,False,False,False,1.0,0,B1005,-80.577366,28.561857,Thaicom,Thaicom 6,1
6,7,2014-04-18,Falcon 9,2296.0,ISS,CCSFS SLC 40,True Ocean,1,False,False,True,1.0,0,B1006,-80.577366,28.561857,NASA,SpaceX CRS-3,1
7,8,2014-07-14,Falcon 9,1316.0,LEO,CCSFS SLC 40,True Ocean,1,False,False,True,1.0,0,B1007,-80.577366,28.561857,Orbcomm,Orbcomm-OG2,1
8,9,2014-08-05,Falcon 9,4535.0,GTO,CCSFS SLC 40,True None,1,False,False,False,1.0,0,B1008,-80.577366,28.561857,AsiaSat,AsiaSat 8,1
9,10,2014-09-07,Falcon 9,4428.0,GTO,CCSFS SLC 40,True None,1,False,False,False,1.0,0,B1011,-80.577366,28.561857,AsiaSat,AsiaSat 6,1


The rows dropped above leave gaps in `FlightNumber` (15, 16 and 25 are missing), as the first cell below shows. The mean of `Class` then gives the success rate:


In [156]:
df.FlightNumber.unique()

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 17, 18, 19,
       20, 21, 22, 23, 24, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
       38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54,
       55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71,
       72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88,
       89, 90])

In [154]:
df["Class"].mean()

0.896551724137931
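
As a sanity check, the rate can be recomputed by hand from the `landing_outcomes` counts shown earlier. The variable names below are ours. The second figure is an illustration only: it assumes we leave out the 18 `True None` launches, which have no landing information but still count as `1` in `Class`.

```python
# Counts copied from the landing_outcomes output after dropping 'None ASDS'.
good = 41 + 18 + 14 + 5   # True ASDS, True None, True RTLS, True Ocean
bad = 6 + 2 + 1           # False ASDS, False Ocean, False RTLS

success_rate = good / (good + bad)
print(success_rate)

# Illustration: exclude the 18 'True None' launches to see how much
# they lift the overall rate.
attempted_rate = (good - 18) / (good - 18 + bad)
print(attempted_rate)
```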

We can now export it to a CSV for the next section


In [155]:
df.to_csv('dataset_part_2.csv', index=False)

## Author


<center>Moein (mrpintime)</center>
