# SpaceX Falcon 9 First Stage Landing Prediction: Data Wrangling

This notebook wrangles and explores the SpaceX Falcon 9 launch data, then derives a binary label indicating whether the first stage landed successfully.

Key components:

* Loading and initial exploration of the dataset
* Analysis of missing values
* Examination of launch sites and orbits
* Creation of a landing outcome label
* Calculation of mission success rate
* Data export for further analysis

## 1. Setup

In [9]:
import pandas as pd
import numpy as np
from IPython.display import display

# Constants
DATA_URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/dataset_part_1.csv"

## 2. Data Loading

In [10]:
# Load dataset
df = pd.read_csv(DATA_URL)

# Display first few rows of raw data
display(df.head())

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
0,1,2010-06-04,Falcon 9,6104.959412,LEO,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
1,2,2012-05-22,Falcon 9,525.0,LEO,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857
2,3,2013-03-01,Falcon 9,677.0,ISS,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857
3,4,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,,1.0,0,B1003,-120.610829,34.632093
4,5,2013-12-03,Falcon 9,3170.0,GTO,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B1004,-80.577366,28.561857


## 3. Data Exploration

In [11]:
# Calculate percentage of missing values
print("\nPercentage of missing values in each column:")
display((df.isnull().sum() / len(df) * 100).round(2))

# Show the data type of each column to tell numerical from categorical columns
print("\nData types of each column:")
display(df.dtypes)


Percentage of missing values in each column:


Column,Missing (%)
FlightNumber,0.0
Date,0.0
BoosterVersion,0.0
PayloadMass,0.0
Orbit,0.0
LaunchSite,0.0
Outcome,0.0
Flights,0.0
GridFins,0.0
Reused,0.0



Data types of each column:


Column,dtype
FlightNumber,int64
Date,object
BoosterVersion,object
PayloadMass,float64
Orbit,object
LaunchSite,object
Outcome,object
Flights,int64
GridFins,bool
Reused,bool
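
The dtype listing shows `Date` stored as `object`, meaning plain strings. If later analysis needs date arithmetic, one option is to convert the column with `pd.to_datetime`. A minimal sketch on a small hand-built frame (the `sample` frame and the `Year` column are illustrative, with values copied from the first rows displayed earlier; this is not the notebook's `df`):

```python
import pandas as pd

# Illustrative frame; values copied from the first rows displayed earlier.
sample = pd.DataFrame({
    "FlightNumber": [1, 2, 3],
    "Date": ["2010-06-04", "2012-05-22", "2013-03-01"],
})

# Convert ISO-formatted strings to datetime values.
sample["Date"] = pd.to_datetime(sample["Date"], format="%Y-%m-%d")

# Date components are then available through the .dt accessor.
sample["Year"] = sample["Date"].dt.year

print(sample.dtypes)
print(sample)
```

The same call should apply to `df["Date"]`, assuming every value follows the `YYYY-MM-DD` pattern seen in the preview.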


### Analysis of Launch Sites and Orbits

In [12]:
# TASK 1: Calculate the number of launches on each site
print("\nNumber of launches on each site:")
display(df['LaunchSite'].value_counts())

# TASK 2: Calculate the number and occurrence of each orbit
print("\nNumber and occurrence of each orbit:")
display(df['Orbit'].value_counts())


Number of launches on each site:


LaunchSite,count
CCAFS SLC 40,55
KSC LC 39A,22
VAFB SLC 4E,13



Number and occurrence of each orbit:


Orbit,count
GTO,27
ISS,21
VLEO,14
PO,9
LEO,7
SSO,5
MEO,3
ES-L1,1
HEO,1
SO,1
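
Raw counts are easier to compare as shares of the total. `value_counts` accepts `normalize=True` for this. The sketch below rebuilds a `LaunchSite` column from the counts printed above; the `launch_sites` Series is an illustrative stand-in for `df['LaunchSite']`, not the notebook's data frame:

```python
import pandas as pd

# Rebuild a LaunchSite column from the counts printed above
# (illustrative stand-in for df['LaunchSite']).
launch_sites = pd.Series(
    ["CCAFS SLC 40"] * 55 + ["KSC LC 39A"] * 22 + ["VAFB SLC 4E"] * 13,
    name="LaunchSite",
)

# normalize=True returns fractions of the total instead of raw counts.
site_share = launch_sites.value_counts(normalize=True)

# Express the shares as percentages.
print((site_share * 100).round(2))
```

The same call works on the `Orbit` column.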


### Mission Outcome Analysis

In [13]:
# TASK 3: Calculate the number and occurrence of mission outcome
landing_outcomes = df['Outcome'].value_counts()
print("\nNumber and occurrence of mission outcomes:")
display(landing_outcomes)

# Print each outcome
print("\nOutcomes:")
for i, outcome in enumerate(landing_outcomes.keys()):
    print(i, outcome)

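# Define bad outcomes by their position in the listing printed above:
# positions 1, 3, 5, 6, 7 are the outcomes whose label does not start with "True".
# Note: this relies on the ordering returned by value_counts().
bad_outcomes = set(landing_outcomes.keys()[[1, 3, 5, 6, 7]])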
print("\nBad outcomes:")
print(bad_outcomes)


Number and occurrence of mission outcomes:


Outcome,count
True ASDS,41
None None,19
True RTLS,14
False ASDS,6
True Ocean,5
False Ocean,2
None ASDS,2
False RTLS,1



Outcomes:
0 True ASDS
1 None None
2 True RTLS
3 False ASDS
4 True Ocean
5 False Ocean
6 None ASDS
7 False RTLS

Bad outcomes:
{'False RTLS', 'False ASDS', 'None ASDS', 'False Ocean', 'None None'}
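
Selecting bad outcomes by position depends on the order in which `value_counts()` returns the labels, which would change if the counts changed. A possible alternative is to select by label content. This sketch assumes every label has the form `<landed> <landing type>` and that only a first token of `True` counts as a successful landing; the name `bad_outcomes_by_prefix` is hypothetical:

```python
# Outcome labels copied from the listing above.
outcomes = [
    "True ASDS", "None None", "True RTLS", "False ASDS",
    "True Ocean", "False Ocean", "None ASDS", "False RTLS",
]

# Treat any label whose first token is not "True" as a bad outcome
# (assumption about the label format, not confirmed by the dataset docs).
bad_outcomes_by_prefix = {o for o in outcomes if o.split()[0] != "True"}

print(sorted(bad_outcomes_by_prefix))
```

For the eight labels listed above, this yields the same five outcomes as the positional selection.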


### Creation of Landing Outcome Label

In [14]:
# TASK 4: Create a landing outcome label from Outcome column
landing_class = [0 if outcome in bad_outcomes else 1 for outcome in df['Outcome']]
df['Class'] = landing_class

print("\nFirst 8 rows of the new Class column:")
display(df[['Class']].head(8))

print("\nFirst 5 rows of the updated dataset:")
display(df.head(5))


First 8 rows of the new Class column:


Unnamed: 0,Class
0,0
1,0
2,0
3,0
4,0
5,0
6,1
7,1



First 5 rows of the updated dataset:


Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude,Class
0,1,2010-06-04,Falcon 9,6104.959412,LEO,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857,0
1,2,2012-05-22,Falcon 9,525.0,LEO,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857,0
2,3,2013-03-01,Falcon 9,677.0,ISS,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857,0
3,4,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,,1.0,0,B1003,-120.610829,34.632093,0
4,5,2013-12-03,Falcon 9,3170.0,GTO,CCAFS SLC 40,None None,1,False,False,False,,1.0,0,B1004,-80.577366,28.561857,0
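
The list comprehension can also be written as a vectorized pandas expression. A minimal sketch using `Series.isin` on a small illustrative frame (`sample` is not the notebook's `df`; the `bad_outcomes` set is copied from the output above):

```python
import pandas as pd

# Bad outcomes copied from the output printed earlier.
bad_outcomes = {"False RTLS", "False ASDS", "None ASDS", "False Ocean", "None None"}

# Illustrative frame with a mix of good and bad outcomes.
sample = pd.DataFrame(
    {"Outcome": ["None None", "False Ocean", "True ASDS", "True RTLS"]}
)

# isin() marks bad outcomes as True; negate and cast so that
# 1 = successful landing and 0 = anything else.
sample["Class"] = (~sample["Outcome"].isin(list(bad_outcomes))).astype(int)

print(sample)
```

Both forms should give the same labels. The vectorized one avoids a Python-level loop, which matters little at 90 rows but scales better.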


### Success Rate Calculation

In [15]:
# Calculate success rate
success_rate = df["Class"].mean()
print(f"\nSuccess rate: {success_rate:.2%}")


Success rate: 66.67%
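
Because `Class` is 0 or 1, its mean equals the fraction of successful landings. The figure can be cross-checked against the outcome counts shown earlier. The sketch below copies those counts by hand and assumes that labels starting with `True` are the successes:

```python
# Outcome counts copied from the value_counts() output above.
outcome_counts = {
    "True ASDS": 41, "None None": 19, "True RTLS": 14, "False ASDS": 6,
    "True Ocean": 5, "False Ocean": 2, "None ASDS": 2, "False RTLS": 1,
}

# Successful landings are the outcomes labelled "True ...".
successes = sum(n for label, n in outcome_counts.items() if label.startswith("True"))
total = sum(outcome_counts.values())

print(f"{successes}/{total} = {successes / total:.2%}")  # 60/90 = 66.67%
```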


## 4. Data Export

In [16]:
# Export to CSV
df.to_csv("dataset_part_2.csv", index=False)
print("\nData exported to 'dataset_part_2.csv'")


Data exported to 'dataset_part_2.csv'
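
A quick way to check an export is to read the file back and compare it with what was written. The sketch below does this in a temporary directory; the `sample` frame and the file name `dataset_part_2_sample.csv` are illustrative, not the notebook's actual data or output path:

```python
import os
import tempfile

import pandas as pd

# Illustrative frame standing in for the labelled dataset.
sample = pd.DataFrame({
    "FlightNumber": [1, 2],
    "Outcome": ["False Ocean", "True ASDS"],
    "Class": [0, 1],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset_part_2_sample.csv")

    # index=False keeps the row index out of the file, so no extra
    # unnamed column appears when the file is read back.
    sample.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(list(reloaded.columns))
print(reloaded.shape)
```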
