# Data Wrangling

In [1]:
import pandas as pd
import numpy as np
print("Imported Libraries")

Imported Libraries


## Read Data

In [2]:
df=pd.read_csv("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/dataset_part_1.csv")
print(f"Read Data ({df.shape})")

Read Data ((90, 17))


In [3]:
for column in df.columns:
    nullPercent = np.round(df[column].isnull().sum()/len(df),3)
    if nullPercent > 0:
        print(f'{column} is {nullPercent*100}% null.')

LandingPad is 28.9% null.
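
The same per-column check can be written without an explicit loop. The sketch below is only an illustration on a small made-up frame (`toy` and its values are hypothetical, not the launch data); it relies on the mean of a boolean mask being the fraction of `True` values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, for illustration only (not the launch dataset)
toy = pd.DataFrame({
    "FlightNumber": [1, 2, 3, 4],
    "LandingPad": ["pad-a", np.nan, np.nan, np.nan],
})

# isnull() yields a boolean frame; mean() turns it into a null fraction per column
null_fraction = toy.isnull().mean()

# Keep only the columns that actually contain nulls
null_fraction = null_fraction[null_fraction > 0]

for column, fraction in null_fraction.items():
    print(f"{column} is {fraction:.1%} null.")
```

Formatting with `:.1%` also avoids multiplying by 100 by hand.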


In [4]:
df.drop('LandingPad', axis=1, inplace=True)

## TASK 1: Calculate the number of launches on each site

In [5]:
# Count launches per site: group by LaunchSite, then apply value_counts()
df[['LaunchSite']].groupby(['LaunchSite']).value_counts()

LaunchSite
CCAFS SLC 40    55
KSC LC 39A      22
VAFB SLC 4E     13
Name: count, dtype: int64

## TASK 2: Calculate the number and occurrence of each orbit

Use the method <code>.value_counts()</code> to determine the number and occurrence of each orbit in the column <code>Orbit</code>.

In [6]:
df[['Orbit']].groupby(['Orbit']).value_counts()

Orbit
ES-L1     1
GEO       1
GTO      27
HEO       1
ISS      21
LEO       7
MEO       3
PO        9
SO        1
SSO       5
VLEO     14
Name: count, dtype: int64
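
The counts above are listed alphabetically by orbit, which is consistent with going through `groupby`. As a minimal sketch on a made-up column (the `orbits` values are hypothetical), calling `.value_counts()` directly on a Series orders by count instead, and `.sort_index()` restores the alphabetical order:

```python
import pandas as pd

# Hypothetical toy column, for illustration only
orbits = pd.Series(["ISS", "GTO", "ISS", "LEO", "ISS", "GTO"], name="Orbit")

# value_counts() orders by count, largest first
by_count = orbits.value_counts()

# sort_index() reorders the same counts alphabetically by label
by_label = by_count.sort_index()

print(by_count)
print(by_label)
```

Which ordering is preferable depends on whether the table is read for the most common orbit or looked up by name.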

In [7]:
# Outcome Totals for Each Orbit
print(pd.DataFrame(df[['Orbit','Outcome']].groupby(['Orbit']).value_counts()))

                   count
Orbit Outcome           
ES-L1 True Ocean       1
GEO   True ASDS        1
GTO   True ASDS       13
      None None       11
      False ASDS       1
      None ASDS        1
      True Ocean       1
HEO   True ASDS        1
ISS   True RTLS        7
      True ASDS        5
      None None        3
      False ASDS       2
      False Ocean      1
      False RTLS       1
      None ASDS        1
      True Ocean       1
LEO   True RTLS        4
      None None        2
      True Ocean       1
MEO   True ASDS        2
      None None        1
PO    True ASDS        5
      False ASDS       1
      False Ocean      1
      None None        1
      True Ocean       1
SO    None None        1
SSO   True RTLS        3
      True ASDS        2
VLEO  True ASDS       12
      False ASDS       2


## TASK 3: Calculate the number and occurrence of mission outcomes per orbit

Use the method <code>.value_counts()</code> on the column <code>Outcome</code> to count each landing outcome, then assign the result to the variable <code>landing_outcomes</code>.

In [18]:
df_orbit_outcomes = pd.DataFrame(df[['Orbit','Outcome']].groupby(['Orbit']).value_counts())
df_orbit_outcomes

                   count
Orbit Outcome           
ES-L1 True Ocean       1
GEO   True ASDS        1
GTO   True ASDS       13
      None None       11
      False ASDS       1
      None ASDS        1
      True Ocean       1
HEO   True ASDS        1
ISS   True RTLS        7
      True ASDS        5


In [8]:
landing_outcomes = df[['Outcome']].groupby(['Outcome']).value_counts(); landing_outcomes

Outcome
False ASDS      6
False Ocean     2
False RTLS      1
None ASDS       2
None None      19
True ASDS      41
True Ocean      5
True RTLS      14
Name: count, dtype: int64

<code>True Ocean</code> means the landing in a specific region of the ocean succeeded, while <code>False Ocean</code> means the ocean landing failed. <code>True RTLS</code> means the landing on a ground pad succeeded, while <code>False RTLS</code> means the ground-pad landing failed. <code>True ASDS</code> means the landing on a drone ship succeeded, while <code>False ASDS</code> means the drone-ship landing failed. <code>None ASDS</code> and <code>None None</code> represent a failure to land.
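
Each label is therefore a pair: a result followed by a landing target. As a rough sketch (the column names `landed`, `target`, and `success` are made up for illustration), the two parts can be separated with `.str.split()`:

```python
import pandas as pd

# A few of the outcome labels described above
outcomes = pd.Series(["True Ocean", "False RTLS", "True ASDS", "None ASDS", "None None"])

# Split each label once on the space: result first, landing target second
parts = outcomes.str.split(" ", n=1, expand=True)
parts.columns = ["landed", "target"]  # hypothetical column names

# Only a literal "True" counts as a successful landing
parts["success"] = parts["landed"].eq("True")

print(parts)
```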


In [9]:
for i,outcome in enumerate(landing_outcomes.keys()):
    print(i,outcome)

0 False ASDS
1 False Ocean
2 False RTLS
3 None ASDS
4 None None
5 True ASDS
6 True Ocean
7 True RTLS


We create a set of outcomes where the first stage did not land successfully:


In [10]:
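# Positions 1, 3, 5, 6, 7 do not match the alphabetical order printed above,
# so this selection wrongly includes successful outcomes; the next cell corrects it.
bad_outcomes = set(landing_outcomes.keys()[[1,3,5,6,7]]);   bad_outcomes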

{'False Ocean', 'None ASDS', 'True ASDS', 'True Ocean', 'True RTLS'}

In [11]:
bad_outcomes = set(landing_outcomes.keys()[[0,1,2,3,4]]);   bad_outcomes

{'False ASDS', 'False Ocean', 'False RTLS', 'None ASDS', 'None None'}

In [12]:
# Select labels by content rather than by position: anything containing "False" or "None"
is_bad = landing_outcomes.keys().str.contains("False|None")
bad_outcomes = set(landing_outcomes.keys()[is_bad]);   bad_outcomes

{'False ASDS', 'False Ocean', 'False RTLS', 'None ASDS', 'None None'}

## TASK 4: Create a landing outcome label from Outcome column

A more direct alternative to the lab's approach below:\
``df = df.assign(Class = df['Outcome'].str.contains("True").map({True: 1, False: 0}))``

Using the <code>Outcome</code> column, create a list where the element is zero if the corresponding row in <code>Outcome</code> is in the set <code>bad_outcomes</code>; otherwise, it's one. Then assign it to the variable <code>landing_class</code>:


In [13]:
landing_class = [int(not x) for x in df['Outcome'].isin(bad_outcomes)]

This variable is the classification label for each launch: zero means the first stage did not land successfully, and one means it landed successfully.


In [14]:
df['Class']=landing_class
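
As a sanity check, the set-membership labeling can be compared with a prefix test on the eight outcome labels listed earlier. This is a sketch on the labels alone, not on the full dataset, and the variable names are illustrative:

```python
import pandas as pd

# The eight outcome labels listed earlier in this notebook
labels = pd.Series([
    "False ASDS", "False Ocean", "False RTLS", "None ASDS",
    "None None", "True ASDS", "True Ocean", "True RTLS",
], name="Outcome")

bad = {"False ASDS", "False Ocean", "False RTLS", "None ASDS", "None None"}

# Approach 1: membership in the set of bad outcomes, inverted and cast to 0/1
class_from_set = (~labels.isin(bad)).astype(int)

# Approach 2: a label is a success exactly when it starts with "True"
class_from_prefix = labels.str.startswith("True").astype(int)

print(class_from_set.tolist())
print((class_from_set == class_from_prefix).all())
```

Both approaches agree on these labels; the prefix test has the advantage of not depending on a hand-built set.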

We can use the following line of code to determine  the success rate:

In [15]:
print(f'Successful {df["Class"].mean()*100:.1f}%')

Successful 66.7%
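
The 66.7% figure can be cross-checked against the outcome counts printed earlier, since the mean of a 0/1 column is simply successes divided by total. A small sketch that re-enters those counts by hand:

```python
import pandas as pd

# Outcome counts as printed earlier in this notebook
counts = pd.Series({
    "False ASDS": 6, "False Ocean": 2, "False RTLS": 1, "None ASDS": 2,
    "None None": 19, "True ASDS": 41, "True Ocean": 5, "True RTLS": 14,
})

# Successful landings are the labels starting with "True": 41 + 5 + 14
successes = counts[counts.index.str.startswith("True")].sum()
total = counts.sum()

print(f"{successes} of {total} launches had a successful landing: {successes / total:.1%}")
```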


In [21]:
df_orbit_outcomes = pd.DataFrame(df[['Orbit','Class']].groupby(['Orbit','Class']).value_counts())
df_orbit_outcomes

             count
Orbit Class       
ES-L1 1          1
GEO   1          1
GTO   0         13
      1         14
HEO   1          1
ISS   0          8
      1         13
LEO   0          2
      1          5
MEO   0          1


## Save Data to CSV 

In [16]:
df.to_csv("dataset_part_2.csv", index=False)