# **SpaceX Falcon9 launch prediction project**
### **<span style="color:#ff9933">Part 1: Data collection & cleaning</span>** 
**Mason Phung**   
Last edited: December 2024

*Space Exploration Technologies Corp., or SpaceX, is an American spacecraft manufacturer known for successfully sending spacecraft and astronauts to the International Space Station. The company is also well known for its [VTVL](https://en.wikipedia.org/wiki/VTVL) rocket launches, in which rockets land and are reused, saving the company a large share of its launch costs.*

*One of SpaceX's most popular rockets, the Falcon 9, has landed and reflown [more than 200 times](https://en.wikipedia.org/wiki/List_of_Falcon_9_and_Falcon_Heavy_launches). SpaceX advertises Falcon 9 launches on its website at a cost of 62 million dollars; other providers charge upward of 165 million dollars each. Much of the savings comes from SpaceX's ability to reuse the first stage. Therefore, if we can determine whether the first stage will land, we can estimate the cost of a launch.*

*We work at a business in the aerospace industry that is developing a space rocket and researching different rocket technologies and its competitors.*

*In this project, we collect and analyze data on past Falcon 9 launches, then try to predict the outcome of future launches using different Machine Learning models. The main purpose is to find which factors contribute to the success of each flight and to build a Machine Learning model that can predict the outcome of a rocket launch.*

**Questions**
1. What are the important features that contribute to the success of each launch?
2. Does geography play an essential role in the success of each launch?
3. What can be suggested to increase the success rate?
4. What can be suggested to improve the reliability of the prediction model?

# **<span style="color:#ff9933">Process description</span>**

<span style="color:#ff9933">

**Part 1: Data collection by requesting API and web scraping**
- Request data from the SpaceX API using `requests` and scrape Wikipedia using `BeautifulSoup`
- Clean & format data after collecting

</span>

**Part 2: Descriptive analysis:**
- Write Python & SQL queries to explore the datasets
- Set up a local SQL database (server)

**Part 3: Visualization:**
- Plotting with `matplotlib` and `seaborn`   
- Geographical visualization with `folium`   
- Build an Interactive dashboard with `dash` and `plotly.express` (in a separate dash app)

**Part 4: Machine Learning with `sklearn` (Classification):**
- Apply different techniques to enhance the models' accuracy & correctness, including:
    - Select features based on correlation strength + multicollinearity
    - Feature engineering: convert categorical non-numerical data into numerical format
    - Train/test split with stratification to ensure data balance
    - Normalize data to ensure the variables have a standard scale
- Models: Logistic Regression, Support Vector Machine, Decision Trees, K-Nearest Neighbors, XGBoost, Neural Networks
- After fitting the default models, conduct hyperparameter tuning with `GridSearchCV` to improve the models' performance

**Part 5: Discussion**
- Notable observations gained when analyzing the data
- The performance of the Machine Learning (ML) models
- The limitations of the project and the dataset
- Improvements & suggestions

----


# **<span style="color:#ff9933">Libraries</span>**


In [None]:
# Basics & cores for our work + data manipulation
import pandas as pd
import numpy as np
import datetime

# Data collection
import requests
from bs4 import BeautifulSoup
import re
import unicodedata

# Suppress warnings
import warnings
warnings.filterwarnings("ignore")

# Print all columns of a dataframe
pd.set_option('display.max_columns', None)
# Print all of the data in a feature
pd.set_option('display.max_colwidth', None)


In [None]:
# utils
import sys
from pathlib import Path

# Add the src directory to the system path
src_path = Path("../src").resolve()
if str(src_path) not in sys.path:
    sys.path.append(str(src_path))
    
from src import mysql_init, utils
import src.utils


# **<span style="color:#ff9933">Part 1: Data collection</span>**


In this project, we collect two datasets using two different methods: API requests with `requests` and web scraping with `BeautifulSoup`.
- Dataset 1: `falcon9_technical` - Technical data of Falcon 9 launches
    - Data source: *api.spacexdata.com*
    - The data focuses on the technical aspects of each launch.
    - There is a focus on the core of the rocket and the landing pads, which may contribute to the outcome of each launch.
    - Flight data documentation can be found at https://docs.spacexdata.com. In this project, we use the V4 API.

- Dataset 2: `falcon9_general` - General data of Falcon 9 launches
    - Data source: *en.wikipedia.org*
    - The data provides general information about Falcon 9 rocket launches, including time, booster version, launch site, payload, target orbit, customer and outcome.

## **A. Technical launch data collection by requesting API**


**Summary**
- We use `requests` to request data from the SpaceX API.
- Many variables in the response are given only as IDs, so we query the corresponding SpaceX API endpoints to resolve these IDs into actual values.
- Convert the data into a DataFrame and clean it.
- Export the data as `.csv`.

### <span style="color:#ff9933">I. Request rocket launch data from SpaceX API</span>


#### a. Request rocket launch data from SpaceX API

- The live SpaceX API endpoint is stored in `spacex_url` and can be requested directly.
- To keep the requested JSON results consistent (at the time of this project), the request below uses an alternative static response object instead.
- The alternative static response object was provided by the IBM course.


In [2]:
# SpaceX API
spacex_url="https://api.spacexdata.com/v4/launches/past"

# Alternative 1: Use this as alternative for a more consistent response
static_json_url='https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/API_call_spacex_api.json'

# Alternative 2: Load data from local directory

# Request for data with static url
response = requests.get(static_json_url)

# Check response status
response.status_code

200

*Status response code 200 = successfully connected*
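As a minimal sketch of the "live API or static copy" idea above, the helper below tries a list of URLs in order and returns the first JSON payload that loads. The names `fetch_first_available` and `get_json_with_requests`, and the injectable `get_json` argument, are assumptions made for illustration and are not part of the original notebook; `requests.get`, `raise_for_status()` and `.json()` are the real calls being wrapped.

```python
from typing import Any, Callable, Iterable


def get_json_with_requests(url: str, timeout: float = 10.0) -> Any:
    """Fetch a URL with requests and decode JSON, raising on HTTP errors."""
    import requests  # imported lazily so the sketch can be exercised offline

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turns 4xx/5xx responses into exceptions
    return response.json()


def fetch_first_available(
    urls: Iterable[str],
    get_json: Callable[[str], Any] = get_json_with_requests,
) -> Any:
    """Return the JSON payload from the first URL that can be fetched."""
    errors = []
    for url in urls:
        try:
            return get_json(url)
        except Exception as exc:  # assumed policy: any failure moves to the next URL
            errors.append(f"{url}: {exc}")
    raise RuntimeError("No source could be loaded:\n" + "\n".join(errors))
```

With the notebook's variables, a call such as `fetch_first_available([static_json_url, spacex_url])` would prefer the static copy and fall back to the live endpoint; the order of the list decides the preference.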

#### b. Convert requested data to pandas dataframe

Decode the response content as JSON using `.json()` and turn it into a pandas dataframe using `pd.json_normalize()`.


In [3]:
# Use the json_normalize method to convert the JSON result into a dataframe
data = pd.json_normalize(response.json())

# Get the head of the dataframe
data.head()


Unnamed: 0,static_fire_date_utc,static_fire_date_unix,tbd,net,window,rocket,success,details,crew,ships,capsules,payloads,launchpad,auto_update,failures,flight_number,name,date_utc,date_unix,date_local,date_precision,upcoming,cores,id,fairings.reused,fairings.recovery_attempt,fairings.recovered,fairings.ships,links.patch.small,links.patch.large,links.reddit.campaign,links.reddit.launch,links.reddit.media,links.reddit.recovery,links.flickr.small,links.flickr.original,links.presskit,links.webcast,links.youtube_id,links.article,links.wikipedia,fairings
0,2006-03-17T00:00:00.000Z,1142554000.0,False,False,0.0,5e9d0d95eda69955f709d1eb,False,Engine failure at 33 seconds and loss of vehicle,[],[],[],[5eb0e4b5b6c3bb0006eeb1e1],5e9e4502f5090995de566f86,True,"[{'time': 33, 'altitude': None, 'reason': 'merlin engine failure'}]",1,FalconSat,2006-03-24T22:30:00.000Z,1143239400,2006-03-25T10:30:00+12:00,hour,False,"[{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cd9ffd86e000604b32a,False,False,False,[],https://images2.imgbox.com/3c/0e/T8iJcSN3_o.png,https://images2.imgbox.com/40/e3/GypSkayF_o.png,,,,,[],[],,https://www.youtube.com/watch?v=0a_00nJ_Y88,0a_00nJ_Y88,https://www.space.com/2196-spacex-inaugural-falcon-1-rocket-lost-launch.html,https://en.wikipedia.org/wiki/DemoSat,
1,,,False,False,0.0,5e9d0d95eda69955f709d1eb,False,"Successful first stage burn and transition to second stage, maximum altitude 289 km, Premature engine shutdown at T+7 min 30 s, Failed to reach orbit, Failed to recover first stage",[],[],[],[5eb0e4b6b6c3bb0006eeb1e2],5e9e4502f5090995de566f86,True,"[{'time': 301, 'altitude': 289, 'reason': 'harmonic oscillation leading to premature engine shutdown'}]",2,DemoSat,2007-03-21T01:10:00.000Z,1174439400,2007-03-21T13:10:00+12:00,hour,False,"[{'core': '5e9e289ef35918416a3b2624', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cdaffd86e000604b32b,False,False,False,[],https://images2.imgbox.com/4f/e3/I0lkuJ2e_o.png,https://images2.imgbox.com/be/e7/iNqsqVYM_o.png,,,,,[],[],,https://www.youtube.com/watch?v=Lk4zQ2wP-Nc,Lk4zQ2wP-Nc,https://www.space.com/3590-spacex-falcon-1-rocket-fails-reach-orbit.html,https://en.wikipedia.org/wiki/DemoSat,
2,,,False,False,0.0,5e9d0d95eda69955f709d1eb,False,Residual stage 1 thrust led to collision between stage 1 and stage 2,[],[],[],"[5eb0e4b6b6c3bb0006eeb1e3, 5eb0e4b6b6c3bb0006eeb1e4]",5e9e4502f5090995de566f86,True,"[{'time': 140, 'altitude': 35, 'reason': 'residual stage-1 thrust led to collision between stage 1 and stage 2'}]",3,Trailblazer,2008-08-03T03:34:00.000Z,1217734440,2008-08-03T15:34:00+12:00,hour,False,"[{'core': '5e9e289ef3591814873b2625', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cdbffd86e000604b32c,False,False,False,[],https://images2.imgbox.com/3d/86/cnu0pan8_o.png,https://images2.imgbox.com/4b/bd/d8UxLh4q_o.png,,,,,[],[],,https://www.youtube.com/watch?v=v0w9p3U8860,v0w9p3U8860,http://www.spacex.com/news/2013/02/11/falcon-1-flight-3-mission-summary,https://en.wikipedia.org/wiki/Trailblazer_(satellite),
3,2008-09-20T00:00:00.000Z,1221869000.0,False,False,0.0,5e9d0d95eda69955f709d1eb,True,"Ratsat was carried to orbit on the first successful orbital launch of any privately funded and developed, liquid-propelled carrier rocket, the SpaceX Falcon 1",[],[],[],[5eb0e4b7b6c3bb0006eeb1e5],5e9e4502f5090995de566f86,True,[],4,RatSat,2008-09-28T23:15:00.000Z,1222643700,2008-09-28T11:15:00+12:00,hour,False,"[{'core': '5e9e289ef3591855dc3b2626', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cdbffd86e000604b32d,False,False,False,[],https://images2.imgbox.com/e9/c9/T8CfiSYb_o.png,https://images2.imgbox.com/e0/a7/FNjvKlXW_o.png,,,,,[],[],,https://www.youtube.com/watch?v=dLQ2tZEH6G0,dLQ2tZEH6G0,https://en.wikipedia.org/wiki/Ratsat,https://en.wikipedia.org/wiki/Ratsat,
4,,,False,False,0.0,5e9d0d95eda69955f709d1eb,True,,[],[],[],[5eb0e4b7b6c3bb0006eeb1e6],5e9e4502f5090995de566f86,True,[],5,RazakSat,2009-07-13T03:35:00.000Z,1247456100,2009-07-13T15:35:00+12:00,hour,False,"[{'core': '5e9e289ef359184f103b2627', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",5eb87cdcffd86e000604b32e,False,False,False,[],https://images2.imgbox.com/a7/ba/NBZSw3Ho_o.png,https://images2.imgbox.com/8d/fc/0qdZMWWx_o.png,,,,,[],[],http://www.spacex.com/press/2012/12/19/spacexs-falcon-1-successfully-delivers-razaksat-satellite-orbit,https://www.youtube.com/watch?v=yTaIDooc8Og,yTaIDooc8Og,http://www.spacex.com/news/2013/02/12/falcon-1-flight-5,https://en.wikipedia.org/wiki/RazakSAT,
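To illustrate why the dataframe above has dotted column names such as `fairings.reused` while `cores` and `payloads` still hold lists, here is a small sketch of `pd.json_normalize()` on a made-up record. The record is an assumption shaped loosely like one launch; its values are invented.

```python
import pandas as pd

# A made-up record shaped like one launch from the API (values are illustrative).
sample = [
    {
        "flight_number": 1,
        "rocket": "rocket-id-1",
        "fairings": {"reused": False, "recovered": False},
        "links": {"patch": {"small": "small.png"}},
        "cores": [{"core": "core-id-1", "flight": 1}],
    }
]

flat = pd.json_normalize(sample)

# Nested dictionaries become dotted column names; lists stay as Python lists.
print(sorted(flat.columns))
print(type(flat.loc[0, "cores"]))
```

Because list-valued fields are left untouched, they need the separate unpacking step that the notebook applies later to `cores` and `payloads`.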


In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 107 entries, 0 to 106
Data columns (total 42 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   static_fire_date_utc       100 non-null    object 
 1   static_fire_date_unix      100 non-null    float64
 2   tbd                        107 non-null    bool   
 3   net                        107 non-null    bool   
 4   window                     100 non-null    float64
 5   rocket                     107 non-null    object 
 6   success                    107 non-null    bool   
 7   details                    100 non-null    object 
 8   crew                       107 non-null    object 
 9   ships                      107 non-null    object 
 10  capsules                   107 non-null    object 
 11  payloads                   107 non-null    object 
 12  launchpad                  107 non-null    object 
 13  auto_update                107 non-null    bool   

*The launch data was collected successfully. However, some of the variables contain ID strings as their values rather than the actual values we expected.*

#### c. Problem: Requested data were ID numbers, not exact values

- To fix this problem, we make separate requests to the API endpoint of each category to get the data for each variable.
- Each observation is identified by its current value (the ID).

- Reuse the API to get information using the IDs given for each launch for these variables:
    - `rocket`
    - `payloads`
    - `launchpad`
    - `cores`


**Define a series of helper functions that will help us use the API to extract information using identification numbers in the launch data.**

In [None]:
def getBoosterVersion(data):
    """
    Takes the dataset and uses the `rocket` column (contains rocket IDs) to call the API and append the data to the list
    
    Parameters:
    data (DataFrame): The dataset to be used to receive data

    Returns:
    None
        Updates the `BoosterVersion` list by appending rocket names to it
    """
    for x in data['rocket']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/rockets/"+str(x)).json()
            BoosterVersion.append(response['name'])


def getLaunchSite(data):
    """
    Takes the dataset and uses the `launchpad` (contains launchpad IDs) column to call the API and append the data to the list.
    For each ID, longitude, latitude and launch site data will be requested and then appended to the list.
    
    Parameters:
    data (DataFrame): The dataset to be used to receive data

    Returns:
    None
        Updates the `Longitude`, `Latitude`, and `LaunchSite` lists with the respective launchpad details.
    """
    for x in data['launchpad']:
        if x:
            response = requests.get("https://api.spacexdata.com/v4/launchpads/"+str(x)).json()
            Longitude.append(response['longitude'])
            Latitude.append(response['latitude'])
            LaunchSite.append(response['name'])
  
       
def getPayloadData(data):
    """
    Takes the dataset and uses the `payloads` (contains payload IDs) column to call the API and append the data to the list.
    For each ID, `Payload Mass` and `Orbit` data will be requested and then appended to the list.
    
    Parameters:
    data (DataFrame): The dataset to be used to receive data

    Returns:
    None
        Updates the `PayloadMass` and `Orbit` lists with the respective payload details.
    """
    for load in data['payloads']:
        if load:
            response = requests.get("https://api.spacexdata.com/v4/payloads/"+load).json()
            PayloadMass.append(response['mass_kg'])
            Orbit.append(response['orbit'])

        
def getCoreData(data):
    """
    Takes the dataset with a 'cores' column containing dictionaries of core details. 
    - Each dictionary must include a 'core' key for the core ID.
    The function calls the SpaceX API for each core ID to fetch the core's data, 
    appending specific attributes (block number, reuse count, and serial number)  to the respective global lists. 
    - For cores without an ID, it appends `None` values. 
    Then, it appends landing success, landing type, flight number, grid fins presence, reuse status,  legs presence, and landing pad 
    to their respective global lists.
    
    Parameters:
    data (DataFrame): The dataset to be used to receive data

    Returns:
    None
        Updates `Block`, `ReusedCount`, `Serial`, `Outcome`, `Flights`, `GridFins`, `CoreReused`, `Legs`, `LandingPad` lists 
        with the respective core details.
    """
    for core in data['cores']:
        if core['core'] is not None:
            response = requests.get("https://api.spacexdata.com/v4/cores/"+core['core']).json()
            Block.append(response['block'])
            ReusedCount.append(response['reuse_count'])
            Serial.append(response['serial'])
        else:
            Block.append(None)
            ReusedCount.append(None)
            Serial.append(None)
        # Append additional core landing and flight details
        Outcome.append(str(core['landing_success'])+' '+str(core['landing_type']))
        Flights.append(core['flight'])
        GridFins.append(core['gridfins'])
        CoreReused.append(core['reused'])
        Legs.append(core['legs'])
        LandingPad.append(core['landpad'])
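The helpers above send one request per row, even though many launches share the same rocket or launchpad ID. A possible refinement, sketched here under assumptions, is to cache each response by endpoint and ID. The names `make_cached_lookup`, `lookup` and `fetch` are hypothetical and do not appear in the notebook; only the base URL is taken from the helpers.

```python
from typing import Any, Callable, Dict, Tuple

API_ROOT = "https://api.spacexdata.com/v4"  # same base URL as the helpers above


def make_cached_lookup(fetch: Callable[[str], Dict[str, Any]]):
    """Wrap a fetch(url) -> dict callable so each URL is requested only once."""
    cache: Dict[Tuple[str, str], Dict[str, Any]] = {}

    def lookup(endpoint: str, item_id: str) -> Dict[str, Any]:
        key = (endpoint, item_id)
        if key not in cache:
            # First time this ID is seen: call the API and remember the answer.
            cache[key] = fetch(f"{API_ROOT}/{endpoint}/{item_id}")
        return cache[key]

    return lookup
```

In the notebook this could be wired up as `lookup = make_cached_lookup(lambda url: requests.get(url).json())`, after which `lookup("rockets", x)["name"]` would replace the direct request inside `getBoosterVersion`.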

**Prepare data dataframe to request data**

In [6]:
# Take a subset of the dataframe keeping only the features we want and the flight number, and date_utc.
data = data[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']]

# Remove rows with multiple cores because those are falcon rockets with 2 extra rocket boosters 
# and rows that have multiple payloads in a single rocket.
data = data[data['cores'].map(len)==1]
data = data[data['payloads'].map(len)==1]

# Since payloads and cores are lists of size 1 we will also extract the single value in the list and replace the feature.
data['cores'] = data['cores'].map(lambda x : x[0])
data['payloads'] = data['payloads'].map(lambda x : x[0])

# We also want to convert the date_utc to a datetime datatype and then extracting the date leaving the time
data['date'] = pd.to_datetime(data['date_utc']).dt.date

# Using the date we will restrict the dates of the launches
data = data[data['date'] <= datetime.date(2020, 11, 13)]
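The effect of these filtering steps is easier to see on a tiny example. The sketch below applies the same operations to three invented rows (the IDs and dates are made up); only the first row has a single core, a single payload and a date before the cutoff.

```python
import datetime
import pandas as pd

# Toy rows: the IDs and dates are invented for illustration.
toy = pd.DataFrame(
    {
        "cores": [[{"core": "c1"}], [{"core": "c2"}, {"core": "c3"}], [{"core": "c4"}]],
        "payloads": [["p1"], ["p2"], ["p3", "p4"]],
        "date_utc": [
            "2010-06-04T18:45:00.000Z",
            "2018-02-06T20:45:00.000Z",
            "2020-12-06T16:17:00.000Z",
        ],
    }
)

# Keep only rows with exactly one core and exactly one payload.
single = toy[(toy["cores"].map(len) == 1) & (toy["payloads"].map(len) == 1)].copy()

# Unwrap the one-element lists so each cell holds the value itself.
single["cores"] = single["cores"].map(lambda x: x[0])
single["payloads"] = single["payloads"].map(lambda x: x[0])

# Reduce the timestamp to a calendar date and apply the cutoff.
single["date"] = pd.to_datetime(single["date_utc"]).dt.date
single = single[single["date"] <= datetime.date(2020, 11, 13)]
```

After these steps `single["cores"]` holds one dictionary per row, which is the shape `getCoreData` expects when it reads `core['core']`.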


* From the `rocket` we would like to learn the booster name

* From the `payloads` we would like to learn the mass of the payload and the orbit that it is going to

* From the `launchpad` we would like to know the name of the launch site being used, the longitude, and the latitude.

* From `cores` we would like to learn 
    - The outcome of the landing
    - The type of the landing
    - Number of flights with that core
    - Whether gridfins were used
    - Whether the core is reused
    - Whether legs were used
    - The landing pad used
    - The block of the core, a number used to separate versions of cores
    - The number of times this specific core has been reused, and the serial of the core.

The data from these requests will be stored in lists and will be used to create a new dataframe.


In [7]:
# Global variables 
BoosterVersion = []
PayloadMass = []
Orbit = []
LaunchSite = []
Outcome = []
Flights = []
GridFins = []
CoreReused = []
Legs = []
LandingPad = []
Block = []
ReusedCount = []
Serial = []
Longitude = []
Latitude = []

**Use defined auxiliary functions to get data**

In [None]:
# Call auxiliary functions to get data
getBoosterVersion(data)
getLaunchSite(data)
getPayloadData(data)
getCoreData(data)

Recheck if the data was successfully updated

- *If the data has been successfully updated, the name of the Booster version can be seen, such as 'Falcon 1', 'Falcon 9',...*

- *If the data hasn't been updated, the list will stay empty*

In [10]:
# Recheck if the data was updated
BoosterVersion[0:5]

['Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 9']

**Construct the final dataset using the data we have obtained by combining the columns into a dictionary.**


In [11]:
launch_dict = {
    'FlightNumber': list(data['flight_number']),
    'Date': list(data['date']),
    'BoosterVersion':BoosterVersion,
    'PayloadMass':PayloadMass,
    'Orbit':Orbit,
    'LaunchSite':LaunchSite,
    'Outcome':Outcome,
    'Flights':Flights,
    'GridFins':GridFins,
    'CoreReused':CoreReused,
    'Legs':Legs,
    'LandingPad':LandingPad,
    'Block':Block,
    'ReusedCount':ReusedCount,
    'Serial':Serial,
    'Longitude': Longitude,
    'Latitude': Latitude
}
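`pd.DataFrame` raises a `ValueError` when the columns of a dictionary have different lengths, and the helper functions skip rows whose ID is empty, so a quick length check before building the dataframe can make a mismatch easier to diagnose. This is a small sketch; `column_lengths` and `assert_equal_lengths` are hypothetical helper names, not part of the notebook.

```python
from typing import Dict, Sequence


def column_lengths(columns: Dict[str, Sequence]) -> Dict[str, int]:
    """Map each column name to the number of values collected for it."""
    return {name: len(values) for name, values in columns.items()}


def assert_equal_lengths(columns: Dict[str, Sequence]) -> None:
    """Raise ValueError if the columns do not all have the same length."""
    lengths = column_lengths(columns)
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
```

Calling `assert_equal_lengths(launch_dict)` before `pd.DataFrame(launch_dict)` would report the length of every list by name if any of them came out short.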

Create a data frame from the dictionary launch_dict.


In [12]:
# Create a data frame from launch_dict
df = pd.DataFrame(launch_dict)

In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 94 entries, 0 to 93
Data columns (total 17 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   FlightNumber    94 non-null     int64  
 1   Date            94 non-null     object 
 2   BoosterVersion  94 non-null     object 
 3   PayloadMass     88 non-null     float64
 4   Orbit           94 non-null     object 
 5   LaunchSite      94 non-null     object 
 6   Outcome         94 non-null     object 
 7   Flights         94 non-null     int64  
 8   GridFins        94 non-null     bool   
 9   CoreReused      94 non-null     bool   
 10  Legs            94 non-null     bool   
 11  LandingPad      64 non-null     object 
 12  Block           90 non-null     float64
 13  ReusedCount     94 non-null     int64  
 14  Serial          94 non-null     object 
 15  Longitude       94 non-null     float64
 16  Latitude        94 non-null     float64
dtypes: bool(3), float64(4), int64(3), obj

Show the first rows of the dataframe


In [14]:
# Show the head of the dataframe
df.head()

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,CoreReused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
0,1,2006-03-24,Falcon 1,20.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin1A,167.743129,9.047721
1,2,2007-03-21,Falcon 1,,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin2A,167.743129,9.047721
2,4,2008-09-28,Falcon 1,165.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin2C,167.743129,9.047721
3,5,2009-07-13,Falcon 1,200.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin3C,167.743129,9.047721
4,6,2010-06-04,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857


### <span style="color:#ff9933">II. Data Wrangling</span>


#### a. Filter the dataframe to only include `Falcon 9` launches


Observing the data, we can see that it includes both Falcon 1 and Falcon 9 launches. Filter the dataframe using the <code>BoosterVersion</code> column to keep only the Falcon 9 launches, and save the filtered data to a new dataframe called <code>falcon9_technical</code>.


In [15]:
# Filter: remove Falcon 1 launches to only keep the Falcon 9
# .copy() makes the result independent of df, so later column assignments are safe
falcon9_technical = df[df['BoosterVersion'] != 'Falcon 1'].copy()

Now that we have removed some rows, we should reset the `FlightNumber` column.


In [16]:
falcon9_technical.loc[:,'FlightNumber'] = list(range(1, falcon9_technical.shape[0]+1))
falcon9_technical

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,CoreReused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
4,1,2010-06-04,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
5,2,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857
6,3,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857
7,4,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,,1.0,0,B1003,-120.610829,34.632093
8,5,2013-12-03,Falcon 9,3170.0,GTO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B1004,-80.577366,28.561857
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
89,86,2020-09-03,Falcon 9,15600.0,VLEO,KSC LC 39A,True ASDS,2,True,True,True,5e9e3032383ecb6bb234e7ca,5.0,12,B1060,-80.603956,28.608058
90,87,2020-10-06,Falcon 9,15600.0,VLEO,KSC LC 39A,True ASDS,3,True,True,True,5e9e3032383ecb6bb234e7ca,5.0,13,B1058,-80.603956,28.608058
91,88,2020-10-18,Falcon 9,15600.0,VLEO,KSC LC 39A,True ASDS,6,True,True,True,5e9e3032383ecb6bb234e7ca,5.0,12,B1051,-80.603956,28.608058
92,89,2020-10-24,Falcon 9,15600.0,VLEO,CCSFS SLC 40,True ASDS,3,True,True,True,5e9e3033383ecbb9e534e7cc,5.0,12,B1060,-80.577366,28.561857


In [17]:
falcon9_technical.shape[0]

90
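The filter-and-renumber step can be tried on a toy frame. This sketch uses invented rows and a hypothetical variable name `f9`; it also resets the row index, which the notebook itself leaves as it was after filtering.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "FlightNumber": [1, 2, 4, 6],
        "BoosterVersion": ["Falcon 1", "Falcon 1", "Falcon 9", "Falcon 9"],
    }
)

# .copy() gives an independent frame, so the assignment below does not
# trigger pandas' SettingWithCopyWarning.
f9 = toy[toy["BoosterVersion"] != "Falcon 1"].copy()

# Renumber flights 1..n and rebuild the row index to match.
f9["FlightNumber"] = list(range(1, len(f9) + 1))
f9 = f9.reset_index(drop=True)
```

Resetting the index is optional; it only matters if later code relies on positional labels starting at zero.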

#### b. Deal with missing values

##### **Check for nulls & missing values**

Call a function to report missing values

In [None]:
utils.report_missing(falcon9_technical)

Unnamed: 0,null_count,blank_count,total_missing,null_percent,blank_percent,total_missing_percent
LandingPad,26,0,26,28.89,0.0,28.89
PayloadMass,5,0,5,5.56,0.0,5.56
FlightNumber,0,0,0,0.0,0.0,0.0
CoreReused,0,0,0,0.0,0.0,0.0
Longitude,0,0,0,0.0,0.0,0.0
Serial,0,0,0,0.0,0.0,0.0
ReusedCount,0,0,0,0.0,0.0,0.0
Block,0,0,0,0.0,0.0,0.0
Legs,0,0,0,0.0,0.0,0.0
GridFins,0,0,0,0.0,0.0,0.0


*The results show two variables with missing values: `PayloadMass` and `LandingPad`.*
- The `LandingPad` column will retain None values to represent launches where no landing pad was used.
- Because the missing values make up only a small proportion (5.56%), `PayloadMass` null values will be replaced by the mean of `PayloadMass`.
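The implementation of `utils.report_missing` lives in the project's `src` folder and is not shown in this notebook. As a rough idea of how such a report could be produced, the sketch below builds a table with the same column names as the output above. The function name `report_missing_sketch` and the definition of a "blank" (a string that is empty after stripping whitespace) are assumptions and may differ from the real helper.

```python
import pandas as pd


def report_missing_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Count nulls and blank strings per column, sorted by total missing."""
    n_rows = len(df)
    null_count = df.isna().sum()
    # Assumption: a "blank" is a string that is empty after stripping whitespace.
    blank_count = df.apply(
        lambda col: col.map(lambda v: isinstance(v, str) and v.strip() == "").sum()
    )
    report = pd.DataFrame({"null_count": null_count, "blank_count": blank_count})
    report["total_missing"] = report["null_count"] + report["blank_count"]
    for name in ["null", "blank", "total_missing"]:
        source = "total_missing" if name == "total_missing" else f"{name}_count"
        report[f"{name}_percent"] = (report[source] / n_rows * 100).round(2)
    return report.sort_values("total_missing", ascending=False)
```

Applied to a frame with one null in four rows, the `null_percent` for that column would be 25.0.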


##### **Replace null values of `PayloadMass`**


Calculate the mean of <code>PayloadMass</code> using <code>.mean()</code>, then use <code>.replace()</code> to replace the `np.nan` values in the column with the calculated mean.


In [None]:
# Calculate the mean value of PayloadMass column
PayloadMass_mean = falcon9_technical['PayloadMass'].mean()

PayloadMass_mean

# Replace the np.nan values with its mean value
falcon9_technical['PayloadMass'] = falcon9_technical['PayloadMass'].replace(
    np.nan, PayloadMass_mean
)

# Recheck
utils.report_missing(falcon9_technical)

Unnamed: 0,null_count,blank_count,total_missing,null_percent,blank_percent,total_missing_percent
LandingPad,26,0,26,28.89,0.0,28.89
FlightNumber,0,0,0,0.0,0.0,0.0
CoreReused,0,0,0,0.0,0.0,0.0
Longitude,0,0,0,0.0,0.0,0.0
Serial,0,0,0,0.0,0.0,0.0
ReusedCount,0,0,0,0.0,0.0,0.0
Block,0,0,0,0.0,0.0,0.0
Legs,0,0,0,0.0,0.0,0.0
GridFins,0,0,0,0.0,0.0,0.0
Date,0,0,0,0.0,0.0,0.0


*`PayloadMass` no longer has any missing values.*
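For reference, the same mean imputation can be written with `fillna`, which targets missing values directly. This is a small sketch on invented payload masses, not the notebook's data.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"PayloadMass": [500.0, np.nan, 3170.0, np.nan]})

# Series.mean() skips NaN by default, so the mean uses only observed values.
mean_mass = toy["PayloadMass"].mean()

# fillna replaces only the missing entries; observed values are unchanged.
toy["PayloadMass"] = toy["PayloadMass"].fillna(mean_mass)
```

Here the two observed values average to 1835.0, and both gaps receive that value.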


#### c. Create a landing outcome label from `Outcome` column

**List the distinct landing outcomes found in the `Outcome` column**


In [21]:
landing_outcomes = falcon9_technical['Outcome'].unique()
landing_outcomes

array(['None None', 'False Ocean', 'True Ocean', 'False ASDS',
       'None ASDS', 'True RTLS', 'True ASDS', 'False RTLS'], dtype=object)

In [22]:
for i,outcome in enumerate(landing_outcomes):
    print(i,outcome)

0 None None
1 False Ocean
2 True Ocean
3 False ASDS
4 None ASDS
5 True RTLS
6 True ASDS
7 False RTLS


- <code>True Ocean</code>: the first stage successfully landed in a specific region of the ocean.
- <code>False Ocean</code>: the first stage attempted but failed to land in a specific region of the ocean.
- <code>True RTLS</code>: the first stage successfully landed on a ground pad.
- <code>False RTLS</code>: the first stage attempted but failed to land on a ground pad.
- <code>True ASDS</code>: the first stage successfully landed on a drone ship.
- <code>False ASDS</code>: the first stage attempted but failed to land on a drone ship.
- <code>None ASDS</code> and <code>None None</code>: a failure to land.


**Create a set of outcomes where the first stage did not land successfully (bad outcomes)**


In [23]:
bad_outcomes = set(landing_outcomes[[0,1,3,4,7]])
bad_outcomes

{'False ASDS', 'False Ocean', 'False RTLS', 'None ASDS', 'None None'}

Using the <code>Outcome</code> column, create a list where the element is zero if the corresponding row in <code>Outcome</code> is in the set <code>bad_outcomes</code>; otherwise, it's one. Then assign it to the variable <code>landing_class</code>:


In [24]:
# landing_class = 0 if bad_outcome
# landing_class = 1 otherwise

landing_class = []
for badoutcome in falcon9_technical['Outcome']:
    if badoutcome in bad_outcomes:
        landing_class.append(0)
    else:
        landing_class.append(1)
        
falcon9_technical['Class'] = landing_class
falcon9_technical[['Class']].head(8)

Unnamed: 0,Class
4,0
5,0
6,0
7,0
8,0
9,0
10,1
11,1


This variable is the classification label that represents the outcome of each launch: zero means the first stage did not land successfully; one means the first stage landed successfully.
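Selecting bad outcomes by position (`[0,1,3,4,7]`) depends on the order returned by `unique()`. A sketch of an order-independent alternative is shown below; it assumes that a landing counts as successful only when the string starts with `True`, which gives the same labels as the loop above for the eight outcome strings listed.

```python
import pandas as pd

outcomes = pd.Series(
    ["None None", "False Ocean", "True Ocean", "False ASDS",
     "None ASDS", "True RTLS", "True ASDS", "False RTLS"],
    name="Outcome",
)

# Assumption: an outcome is a successful landing only when the
# landing_success part of the string is "True".
landing_class = outcomes.str.startswith("True ").astype(int)
```

In the notebook this would read `falcon9_technical['Class'] = falcon9_technical['Outcome'].str.startswith('True ').astype(int)`.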


#### d. Export data

We can now export the data to a <b>CSV</b> file for the next part of the project.


In [25]:
falcon9_technical.to_csv(
    'dataset/Falcon9_technical.csv', 
    index = False
)
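`to_csv` does not create missing folders, so the export fails if `dataset/` does not exist yet. The sketch below shows one way to guard against that and to confirm the file reads back correctly; it writes to a temporary folder and uses an invented two-row frame rather than the real data.

```python
import tempfile
from pathlib import Path

import pandas as pd

toy = pd.DataFrame({"FlightNumber": [1, 2], "Class": [0, 1]})

# A temporary folder stands in for the project directory in this sketch.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "dataset" / "Falcon9_technical.csv"
    # Create the target folder first, since to_csv will not do it.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    toy.to_csv(out_path, index=False)

    # Read the file back to confirm the round trip.
    restored = pd.read_csv(out_path)
```

In the notebook, the equivalent guard would be `Path('dataset').mkdir(parents=True, exist_ok=True)` before the export cell.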

### <span style="color:#ff9933">III. Data Description</span>


| Variable          | Description                                   |
|-------------------|-----------------------------------------------|
| FlightNumber      | Number of the launch in order of date         |
| Date              | Date of the launch                            |
| BoosterVersion    | Version of the booster used in the rocket     |
| PayloadMass       | Total mass of the payload carried             |
| Orbit             | Type of payload orbit to be launched into     |
| LaunchSite        | Launching site used                           |
| Outcome           | Landing success flag and landing type         |
| Flights           | Flight count of the core, incl. this launch   |
| GridFins          | If grid fins were used                        |
| CoreReused        | If the core was reused                        |
| Legs              | If legs were used in landing                  |
| LandingPad        | ID of the landing pad used (null if none)     |
| Block             | Core block number                             |
| ReusedCount       | Number of times the core was reused           |
| Serial            | Core serial number                            |
| Longitude         | Longitude of the launch site                  |
| Latitude          | Latitude of the launch site                   |
| Class             | Landing label: 1 = landed, 0 = did not land   |


## **B. Falcon 9 general launch data collection by HTML webscraping**


**Summary**
- We will use `requests` to request data from Wikipedia.
- Create a BeautifulSoup object from the response, then extract data from the object into dictionaries
- Then convert the dictionaries into a data frame, export in `.csv`

### <span style="color:#ff9933">I. Request the Falcon9 Launch Wiki page from its URL</span>


To retain consistency, we scrape the data from a snapshot of the `List of Falcon 9 and Falcon Heavy launches` Wikipedia page as of `9th June 2021`.


**Conduct HTTP GET method to request the Falcon9 Launch HTML page, as an HTTP response.**


In [None]:
static_url = "https://en.wikipedia.org/w/index.php?title=List_of_Falcon_9_and_Falcon_Heavy_launches&oldid=1027686922"

# Use the requests.get() method with the provided static_url
# Keep the response text (the page's HTML) in a variable
response  = requests.get(static_url).text

**Create a `BeautifulSoup` object from the HTML `response`**


In [27]:
# Use BeautifulSoup() to create a BeautifulSoup object from a response text content
soup = BeautifulSoup(response, 'html5lib')

# Print the page title to verify if the `BeautifulSoup` object was created properly 
soup.title

<title>List of Falcon 9 and Falcon Heavy launches - Wikipedia</title>

### <span style="color:#ff9933">II. Extract all column/variable names from the HTML table header</span>


Collect all relevant column names from the HTML table header


**Find all tables on the wiki page using `BeautifulSoup`. The third table is the first one that contains actual launch records.**


In [None]:
# Use the find_all function in the BeautifulSoup object, with element type `table`
# Assign the result to a list called `html_tables`
html_tables = soup.find_all('table')
first_launch_table = html_tables[2]

**Iterate through the `<th>` elements and apply the provided `extract_column_from_header()` to extract the column names one by one**


In [None]:
column_names = []

# Find every `th` element in first_launch_table
# Apply the provided extract_column_from_header() to each one to get a column name
# Append each non-empty name (`column is not None and len(column) > 0`) to column_names

for table_header in first_launch_table.find_all('th'):
    column = utils.extract_column_from_header(table_header)
    if column is not None and len(column) > 0:
        column_names.append(column)
        
# Check the extracted column names
print(column_names)


['Flight No.', 'Date and time ( )', 'Launch site', 'Payload', 'Payload mass', 'Orbit', 'Customer', 'Launch outcome']


### <span style="color:#ff9933">III. Create a data frame by parsing the launch HTML tables</span>


**Create an empty dictionary with keys from the extracted column names. Later, this dictionary will be converted into a Pandas dataframe**


In [30]:
launch_dict = dict.fromkeys(column_names)

# Remove the combined date/time column; separate 'Date' and 'Time' columns are added below
del launch_dict['Date and time ( )']

# Initialize launch_dict so that each value is an empty list
launch_dict['Flight No.'] = []
launch_dict['Launch site'] = []
launch_dict['Payload'] = []
launch_dict['Payload mass'] = []
launch_dict['Orbit'] = []
launch_dict['Customer'] = []
launch_dict['Launch outcome'] = []
# Add some new required columns
launch_dict['Version Booster']=[]
launch_dict['Booster landing']=[]
launch_dict['Date']=[]
launch_dict['Time']=[]

*We now have an empty `launch_dict` dictionary with required keys. Fill up the `launch_dict` with launch records extracted from table rows.*
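
Because every key maps to its own list, each parsed row has to append exactly one value to every key; a row that skips a key would leave the columns misaligned. A minimal sketch of a sanity check that could be run after parsing is shown below. The helper names `column_lengths` and `is_rectangular` and the toy data are assumptions for illustration only.

```python
def column_lengths(columns):
    """Return {column name: number of values} for a dict of lists."""
    return {name: len(values) for name, values in columns.items()}


def is_rectangular(columns):
    """Return True when every column holds the same number of values."""
    return len(set(column_lengths(columns).values())) <= 1


# Toy data standing in for a partially filled launch_dict
toy = {
    "Flight No.": ["1", "2"],
    "Orbit": ["LEO", "LEO"],
    "Date": ["4 June 2010"],  # one value short on purpose
}
print(column_lengths(toy))
print(is_rectangular(toy))
```

If the check fails, the per-column lengths show which key missed a value.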


**Define some functions to extract string data from a requested HTML table cell**

In [31]:
def date_time(table_cells):
    """
    Return the date and time from an HTML table cell.
    
    Parameter: 
    table_cells: The table data cell (`<td>`) element to read
    
    Return:
    list of str: Two strings, the date first and the time second
    """
    return [text.strip() for text in table_cells.strings][0:2]

def booster_version(table_cells):
    """
    Return the booster version from an HTML table cell.
    
    Parameter: 
    table_cells: The table data cell (`<td>`) element to read
    
    Return:
    out: A string formed by concatenating alternate strings found in the table cell, excluding the last.
    """
    out = ''.join([text for i, text in enumerate(table_cells.strings) if i % 2 == 0][0:-1])
    return out

def landing_status(table_cells):
    """
    Return the landing status from an HTML table cell.
    
    Parameter: 
    table_cells: The table data cell (`<td>`) element to read
    
    Return:
    out: The first string found in the table cell, assumed to be the landing status of the booster.
    """
    out = list(table_cells.strings)[0]
    return out


def get_mass(table_cells):
    """
    Return the payload mass (with its "kg" suffix) from an HTML table cell.
    
    Parameter: 
    table_cells: The table data cell (`<td>`) element to read
    
    Return:
    new_mass: The text up to and including "kg", or 0 if the cell is empty.
    If the cell has text but no "kg", only its first character is returned,
    because str.find() returns -1 in that case.
    """
    mass = unicodedata.normalize("NFKD", table_cells.text).strip()
    if mass:
        new_mass = mass[0:mass.find("kg") + 2]
    else:
        new_mass = 0
    return new_mass


HTML tables on Wikipedia pages often contain unexpected annotations and other noise, such as reference links (`B0004.1[8]`), missing values (`N/A [e]`) and inconsistent formatting.
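
As a rough illustration of how this kind of noise can be handled on plain strings, the sketch below removes bracketed footnote markers and normalizes whitespace. The helper `strip_annotations` is hypothetical and is not used elsewhere in this notebook; note that its pattern removes any bracketed text, which is an assumption that suits footnote markers only.

```python
import re
import unicodedata


def strip_annotations(text):
    """Drop bracketed footnote markers such as [8] or [e], then tidy whitespace."""
    # NFKD turns compatibility characters such as a non-breaking space into plain ones
    text = unicodedata.normalize("NFKD", text)
    # Remove anything between square brackets, brackets included
    text = re.sub(r"\[[^\]]*\]", "", text)
    # Collapse runs of whitespace (including newlines) and trim both ends
    return " ".join(text.split())


print(strip_annotations("B0004.1[8]"))     # B0004.1
print(strip_annotations("N/A [e]"))        # N/A
print(strip_annotations("4,700\xa0kg\n"))  # 4,700 kg
```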


**Parse the data from Wikipedia to fill up `launch_dict`**

In [32]:
extracted_row = 0
# Go through each launch table
for table_number, table in enumerate(soup.find_all('table', "wikitable plainrowheaders collapsible")):
    # Go through each table row
    for rows in table.find_all("tr"):
        # A launch row starts with a header cell that holds the flight number
        flag = False
        if rows.th and rows.th.string:
            flight_number = rows.th.string.strip()
            flag = flight_number.isdigit()
        # Get the data cells of the row
        row = rows.find_all('td')
        # If the row is a launch record, save its cells in the dictionary
        if flag:
            extracted_row += 1
            # Flight Number value
            # Append the flight_number into launch_dict with key `Flight No.`
            launch_dict['Flight No.'].append(flight_number)
            #print(flight_number)
            datatimelist=date_time(row[0])
            
            # Date value
            # Append the date into launch_dict with key `Date`
            date = datatimelist[0].strip(',')
            launch_dict['Date'].append(date)
            #print(date)
            
            # Time value
            # Append the time into launch_dict with key `Time`
            time = datatimelist[1]
            launch_dict['Time'].append(time)
            #print(time)
              
            # Booster version
            # Append the bv into launch_dict with key `Version Booster`
            bv = booster_version(row[1])
            if not(bv):
                bv=row[1].a.string
            launch_dict['Version Booster'].append(bv)
            #print(bv)
            
            # Launch Site
            # Append the launch_site into launch_dict with key `Launch site`
            launch_site = row[2].a.string
            launch_dict['Launch site'].append(launch_site)
            #print(launch_site)
            
            # Payload
            # Append the payload into launch_dict with key `Payload`
            payload = row[3].a.string
            launch_dict['Payload'].append(payload)
            #print(payload)
            
            # Payload Mass
            # Append the payload_mass into launch_dict with key `Payload mass`
            payload_mass = get_mass(row[4])
            launch_dict['Payload mass'].append(payload_mass)
            #print(payload)
            
            # Orbit
            # Append the orbit into launch_dict with key `Orbit`
            orbit = row[5].a.string
            launch_dict['Orbit'].append(orbit)
            #print(orbit)
            
            # Customer
            # Append the customer into launch_dict with key `Customer`
            customer = row[6].text.strip()
            launch_dict['Customer'].append(customer)
            #print(customer)
            
            # Launch outcome
            # Append the launch_outcome into launch_dict with key `Launch outcome`
            launch_outcome = list(row[7].strings)[0]
            # Trailing newlines are kept here and removed later, in the data cleaning step
            launch_dict['Launch outcome'].append(launch_outcome)

            #print(launch_outcome)
            
            # Booster landing
            # Append the booster_landing into launch_dict with key `Booster landing`
            booster_landing = landing_status(row[8])
            launch_dict['Booster landing'].append(booster_landing)
            #print(booster_landing)
            

**Create a pandas dataframe from the parsed launch record dictionary**

In [33]:
falcon9_general = pd.DataFrame({ key:pd.Series(value) for key, value in launch_dict.items() })
falcon9_general

Unnamed: 0,Flight No.,Launch site,Payload,Payload mass,Orbit,Customer,Launch outcome,Version Booster,Booster landing,Date,Time
0,1,CCAFS,Dragon Spacecraft Qualification Unit,0,LEO,SpaceX,Success\n,F9 v1.0B0003.1,Failure,4 June 2010,18:45
1,2,CCAFS,Dragon,0,LEO,".mw-parser-output .plainlist ol,.mw-parser-output .plainlist ul{line-height:inherit;list-style:none;margin:0;padding:0}.mw-parser-output .plainlist ol li,.mw-parser-output .plainlist ul li{margin-bottom:0}\nNASA (COTS)\nNRO",Success,F9 v1.0B0004.1,Failure,8 December 2010,15:43
2,3,CCAFS,Dragon,525 kg,LEO,NASA (COTS),Success,F9 v1.0B0005.1,No attempt\n,22 May 2012,07:44
3,4,CCAFS,SpaceX CRS-1,"4,700 kg",LEO,NASA (CRS),Success\n,F9 v1.0B0006.1,No attempt,8 October 2012,00:35
4,5,CCAFS,SpaceX CRS-2,"4,877 kg",LEO,NASA (CRS),Success\n,F9 v1.0B0007.1,No attempt\n,1 March 2013,15:10
...,...,...,...,...,...,...,...,...,...,...,...
116,117,CCSFS,Starlink,"15,600 kg",LEO,SpaceX,Success\n,F9 B5B1051.10,Success,9 May 2021,06:42
117,118,KSC,Starlink,"~14,000 kg",LEO,SpaceX Capella Space and Tyvak,Success\n,F9 B5B1058.8,Success,15 May 2021,22:56
118,119,CCSFS,Starlink,"15,600 kg",LEO,SpaceX,Success\n,F9 B5B1063.2,Success,26 May 2021,18:59
119,120,KSC,SpaceX CRS-22,"3,328 kg",LEO,NASA (CRS),Success\n,F9 B5B1067.1,Success,3 June 2021,17:29


*Observing the data, it's clear that it needs to be cleaned*.   
We can see that:
- One `Customer` observation still contains CSS style rules.
- Many observations in `Launch outcome` and `Booster landing` have `\n` in their values.
- In the `Launch site` variable:
    - Launches from Cape Canaveral Air Force Station appear under both 'Cape Canaveral' and 'CCAFS', although they refer to the same site.
    - From 2020, Cape Canaveral Air Force Station changed its name to Cape Canaveral Space Force Station (CCSFS).
    - We fix this by converting all 'Cape Canaveral' and 'CCAFS' observations to 'CCSFS'.
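
The label fixes listed above can be sketched on plain strings before they are applied to the data frame. This is only an illustration: the names `SITE_ALIASES`, `normalise_site` and `tidy_label` are assumptions, and the alias table contains just the spellings mentioned in this section.

```python
# Assumed alias table: both older spellings collapse to the current site code
SITE_ALIASES = {"CCAFS": "CCSFS", "Cape Canaveral": "CCSFS"}


def normalise_site(name):
    """Trim whitespace and map known aliases to a single site code."""
    name = name.strip()
    return SITE_ALIASES.get(name, name)


def tidy_label(value):
    """Remove stray newlines and surrounding spaces from an outcome label."""
    return value.replace("\n", " ").strip()


print([normalise_site(s) for s in ["CCAFS", "Cape Canaveral", "CCSFS", "KSC"]])
print(repr(tidy_label("Success\n")))
```

Sites that are not in the alias table, such as 'KSC', pass through unchanged.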

**Start cleaning the data frame**

### <span style="color:#ff9933">IV. Data cleaning</span>


**Look for missing values in the general data with the pre-defined function**

In [None]:
utils.report_missing(falcon9_general)

Unnamed: 0,null_count,blank_count,total_missing,null_percent,blank_percent,total_missing_percent
Flight No.,0,0,0,0.0,0.0,0.0
Launch site,0,0,0,0.0,0.0,0.0
Payload,0,0,0,0.0,0.0,0.0
Payload mass,0,0,0,0.0,0.0,0.0
Orbit,0,0,0,0.0,0.0,0.0
Customer,0,0,0,0.0,0.0,0.0
Launch outcome,0,0,0,0.0,0.0,0.0
Version Booster,0,0,0,0.0,0.0,0.0
Booster landing,0,0,0,0.0,0.0,0.0
Date,0,0,0,0.0,0.0,0.0


*No missing data was found.*
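
The implementation of `utils.report_missing` is not shown in this notebook. As a hedged sketch of the kind of check such a report performs, the hypothetical function below counts `None` values and blank strings per column in a plain dictionary of lists; it is not the author's helper.

```python
def count_missing(columns):
    """Count None values and blank (empty or whitespace-only) strings per column."""
    report = {}
    for name, values in columns.items():
        nulls = sum(v is None for v in values)
        blanks = sum(isinstance(v, str) and v.strip() == "" for v in values)
        total = len(values)
        percent = round(100 * (nulls + blanks) / total, 1) if total else 0.0
        report[name] = {
            "null_count": nulls,
            "blank_count": blanks,
            "total_missing_percent": percent,
        }
    return report


# Toy data for illustration only
toy = {"Orbit": ["LEO", None, " "], "Customer": ["SpaceX", "NASA (CRS)", "SpaceX"]}
print(count_missing(toy))
```

A check like this only detects nulls and blanks. Placeholder values, such as the `0` that `get_mass` returns for an empty cell, are not counted as missing and still need to be reviewed by eye.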

**Start cleaning the dataset based on what we observed**

In [35]:
# Replace the one Customer value that still contains CSS style rules
falcon9_general.iloc[1,5] = 'NASA (COTS) NRO'

# Get rid of the '\n' in observations, then trim the leftover whitespace
cleaning_cols = ['Launch outcome', 'Booster landing']
for column in cleaning_cols:
    falcon9_general[column] = falcon9_general[column].str.replace('\n', ' ').str.strip()

# Convert 'Date' variable from strings to date
falcon9_general['Date'] = pd.to_datetime(falcon9_general['Date'])

# Convert 'Cape Canaveral', 'CCAFS' launch site name into code 'CCSFS'
falcon9_general['Launch site'] = falcon9_general['Launch site'].replace(['CCAFS', 'Cape Canaveral'], 'CCSFS')
falcon9_general

Unnamed: 0,Flight No.,Launch site,Payload,Payload mass,Orbit,Customer,Launch outcome,Version Booster,Booster landing,Date,Time
0,1,CCSFS,Dragon Spacecraft Qualification Unit,0,LEO,SpaceX,Success,F9 v1.0B0003.1,Failure,2010-06-04,18:45
1,2,CCSFS,Dragon,0,LEO,NASA (COTS) NRO,Success,F9 v1.0B0004.1,Failure,2010-12-08,15:43
2,3,CCSFS,Dragon,525 kg,LEO,NASA (COTS),Success,F9 v1.0B0005.1,No attempt,2012-05-22,07:44
3,4,CCSFS,SpaceX CRS-1,"4,700 kg",LEO,NASA (CRS),Success,F9 v1.0B0006.1,No attempt,2012-10-08,00:35
4,5,CCSFS,SpaceX CRS-2,"4,877 kg",LEO,NASA (CRS),Success,F9 v1.0B0007.1,No attempt,2013-03-01,15:10
...,...,...,...,...,...,...,...,...,...,...,...
116,117,CCSFS,Starlink,"15,600 kg",LEO,SpaceX,Success,F9 B5B1051.10,Success,2021-05-09,06:42
117,118,KSC,Starlink,"~14,000 kg",LEO,SpaceX Capella Space and Tyvak,Success,F9 B5B1058.8,Success,2021-05-15,22:56
118,119,CCSFS,Starlink,"15,600 kg",LEO,SpaceX,Success,F9 B5B1063.2,Success,2021-05-26,18:59
119,120,KSC,SpaceX CRS-22,"3,328 kg",LEO,NASA (CRS),Success,F9 B5B1067.1,Success,2021-06-03,17:29


In the `Payload mass` variable:
- Values carry the suffix 'kg'.
- Many observations are estimates, for example '5,000-6,000 kg' or '~12,500 kg'.
- Some values are unknown, for example 'C'.

**We will apply these fixes:**
- Remove observations whose value is 'C'.
- Convert values given as a range `A-B kg` into a single value, the mean of A and B.
- Remove the ',' separators and the 'kg' suffix from each observation.
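
To make the intended result of these fixes concrete, here is a small alternative sketch that works on a single string. It is not the approach used in the next cell: the hypothetical `parse_mass_kg` extracts each number before removing the separators, so a range does not need both bounds to have four digits, and it returns `None` for unknown values instead of dropping the row.

```python
import re


def parse_mass_kg(text):
    """Return the mass in kg as an int, the mean of a range, or None if no number is found."""
    # A number is a digit followed by more digits or thousands separators
    numbers = [int(n.replace(",", "")) for n in re.findall(r"\d[\d,]*", str(text))]
    if not numbers:
        return None  # unknown value such as 'C'
    if len(numbers) >= 2:
        return (numbers[0] + numbers[1]) // 2  # range 'A-B kg' -> integer mean
    return numbers[0]


for raw in ["525 kg", "4,700 kg", "~12,500 kg", "5,000-6,000 kg", "C", 0]:
    print(repr(raw), "->", parse_mass_kg(raw))
```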


In [36]:
# Function to calculate the mean of two numbers given as strings with thousands separators
def calc_mean(num1, num2):
    """
    Convert two numeric strings to integers (ignoring any ',') and calculate their mean.
    
    Parameters:
    - num1 (string): The first input number
    - num2 (string): The second input number
    
    Returns:
        int: Mean of num1 and num2, rounded down
    """
    # Convert numbers with commas into integers
    num1 = int(num1.replace(',', ''))
    num2 = int(num2.replace(',', ''))
    # Calculate the mean (floor division keeps the result an integer)
    return (num1 + num2) // 2

# Function to replace ranges with their mean in a text
def replace_range_with_mean(var):
    """
    Replace a numeric range in a string with its mean. The function looks for two
    consecutive four-digit numbers (a range whose separators were already removed,
    e.g. '50006000') and replaces them with their mean.

    Parameters:
    var (str): The string containing a numeric range.

    Returns:
    int if a range was found and replaced, otherwise the input as an unchanged string.
    
    Note:
    This function assumes that a range consists of exactly two four-digit numbers.
    """
    var = str(var)
    # Regex pattern to find numeric ranges
    pattern = r'(\d{4})(\d{4})'
    matches = re.findall(pattern, var)
    for match in matches:
        mean = calc_mean(match[0], match[1])  # Calculate mean
        range_string = f"{match[0]}{match[1]}"  # Form the original matched string
        var = re.sub(range_string, str(mean), var)  # Replace the original string with mean
        var = int(var)
    return var

def payload_mass_clean(df, col):
    """
    Clean and transform the 'Payload mass' column. 
    Involves removing invalid rows and converting string ranges to numeric means.
    
    Parameters:
    - df (DataFrame): The DataFrame containing the 'Payload mass' column.
    - col (str): The name of the column to clean ('Payload mass').

    Returns:
    DataFrame: The modified DataFrame with the 'Payload mass' column cleaned.
    
    Details:
    - Rows where 'Payload mass' equals 'C' are removed.
    - Non-digit characters, including commas and 'kg', are stripped from the 'Payload mass' values.
    - Numeric ranges are converted to their mean values.
    """
    # Remove rows that have Payload mass = 'C'; copy so that later assignments do not act on a view
    df = df[df[col] != 'C'].copy()
    # Strip every non-digit character (',', '~', '-', 'kg', spaces) from the values
    df[col] = df[col].replace(
        r'[^\d]',
        '',
        regex = True
    )
    # Replace a value in range into the range's mean
    df[col] = df[col].apply(replace_range_with_mean)
    # Final data
    return df

falcon9_general = payload_mass_clean(falcon9_general, 'Payload mass')
falcon9_general

Unnamed: 0,Flight No.,Launch site,Payload,Payload mass,Orbit,Customer,Launch outcome,Version Booster,Booster landing,Date,Time
0,1,CCSFS,Dragon Spacecraft Qualification Unit,0,LEO,SpaceX,Success,F9 v1.0B0003.1,Failure,2010-06-04,18:45
1,2,CCSFS,Dragon,0,LEO,NASA (COTS) NRO,Success,F9 v1.0B0004.1,Failure,2010-12-08,15:43
2,3,CCSFS,Dragon,525,LEO,NASA (COTS),Success,F9 v1.0B0005.1,No attempt,2012-05-22,07:44
3,4,CCSFS,SpaceX CRS-1,4700,LEO,NASA (CRS),Success,F9 v1.0B0006.1,No attempt,2012-10-08,00:35
4,5,CCSFS,SpaceX CRS-2,4877,LEO,NASA (CRS),Success,F9 v1.0B0007.1,No attempt,2013-03-01,15:10
...,...,...,...,...,...,...,...,...,...,...,...
116,117,CCSFS,Starlink,15600,LEO,SpaceX,Success,F9 B5B1051.10,Success,2021-05-09,06:42
117,118,KSC,Starlink,14000,LEO,SpaceX Capella Space and Tyvak,Success,F9 B5B1058.8,Success,2021-05-15,22:56
118,119,CCSFS,Starlink,15600,LEO,SpaceX,Success,F9 B5B1063.2,Success,2021-05-26,18:59
119,120,KSC,SpaceX CRS-22,3328,LEO,NASA (CRS),Success,F9 B5B1067.1,Success,2021-06-03,17:29


**Since we will use the data for analysis, it is better to clean up the variable names so that they are easier to read and to query, especially in SQL.**

In [37]:
falcon9_general.rename(
    columns = {
        'Flight No.': 'flight_no',
        'Launch site': 'launch_site',
        'Payload': 'payload',
        'Payload mass': 'payload_mass',
        'Orbit': 'orbit',
        'Customer': 'customer',
        'Launch outcome': 'mission_outcome',
        'Version Booster': 'booster_version',
        'Booster landing': 'landing_outcome',
        'Date': 'date',
        'Time': 'time'
    },
    inplace = True
)
falcon9_general.columns

Index(['flight_no', 'launch_site', 'payload', 'payload_mass', 'orbit',
       'customer', 'mission_outcome', 'booster_version', 'landing_outcome',
       'date', 'time'],
      dtype='object')
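
The mapping above is written by hand. For labels that only need mechanical normalization, a helper along the following lines could generate SQL-friendly names; `to_snake_case` is a hypothetical function and is not used in this notebook.

```python
import re


def to_snake_case(name):
    """Lower-case a column label and join its alphanumeric words with underscores."""
    words = re.findall(r"[A-Za-z0-9]+", name)
    return "_".join(word.lower() for word in words)


original = ["Flight No.", "Launch site", "Payload mass", "Version Booster"]
print({name: to_snake_case(name) for name in original})
```

A mechanical rule cannot reorder or reword a label: it turns 'Version Booster' into `version_booster`, not `booster_version`, and it would not rename 'Launch outcome' to `mission_outcome`. The explicit mapping is still needed for those columns.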

**Export the dataframe**

In [38]:
falcon9_general.to_csv('dataset/Falcon9_general.csv', index = False)

### <span style="color:#ff9933">V. Data Description</span>



| Variable          | Description                                                               |
|-------------------|---------------------------------------------------------------------------|
| flight_no         | Number of the flight                                                      |
| launch_site       | Site where the rocket was launched                                        |
| payload           | Objects carried by the rocket                                             |
| payload_mass      | Total mass of the carried objects (kg)                                    |
| orbit             | Payload target orbit                                                      |
| customer          | Payload customer                                                          |
| mission_outcome   | The result of the launch                                                  |
| booster_version   | Version of the Falcon 9 booster used                                      |
| landing_outcome   | Whether a booster landing was attempted and, if so, its outcome           |
| date              | Date of the launch                                                        |
| time              | Time of the launch                                                        |