This dataset was scraped from [nextspaceflight.com](https://nextspaceflight.com/launches/past/?page=1) and includes all space missions from the beginning of the Space Race between the USA and the Soviet Union in 1957 up to August 7th, 2020.

In [1]:
%pip install iso3166

Note: you may need to restart the kernel to use updated packages.


### Import Statements

In [2]:
import numpy as np
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns

from iso3166 import countries
from datetime import datetime, timedelta

### Notebook Presentation

In [3]:
pd.options.display.float_format = '{:,.2f}'.format

### Load the Data

In [4]:
df_data = pd.read_csv('mission_launches.csv')

## Preliminary Data Exploration

Shape of the dataframe:

In [12]:
shape = df_data.shape
print(f"rows: {shape[0]}")
print(f"columns: {shape[1]}")

rows: 4324
columns: 9


Columns of the dataframe:

In [13]:
df_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4324 entries, 0 to 4323
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Unnamed: 0.1    4324 non-null   int64 
 1   Unnamed: 0      4324 non-null   int64 
 2   Organisation    4324 non-null   object
 3   Location        4324 non-null   object
 4   Date            4324 non-null   object
 5   Detail          4324 non-null   object
 6   Rocket_Status   4324 non-null   object
 7   Price           964 non-null    object
 8   Mission_Status  4324 non-null   object
dtypes: int64(2), object(7)
memory usage: 304.2+ KB


## Data Cleaning - Check for Missing Values and Duplicates


In [23]:
print(df_data.head())
print(df_data.tail())

   Unnamed: 0.1  Unnamed: 0 Organisation  \
0             0           0       SpaceX   
1             1           1         CASC   
2             2           2       SpaceX   
3             3           3    Roscosmos   
4             4           4          ULA   

                                            Location  \
0         LC-39A, Kennedy Space Center, Florida, USA   
1  Site 9401 (SLS-2), Jiuquan Satellite Launch Ce...   
2                      Pad A, Boca Chica, Texas, USA   
3       Site 200/39, Baikonur Cosmodrome, Kazakhstan   
4           SLC-41, Cape Canaveral AFS, Florida, USA   

                         Date                                        Detail  \
0  Fri Aug 07, 2020 05:12 UTC  Falcon 9 Block 5 | Starlink V1 L9 & BlackSky   
1  Thu Aug 06, 2020 04:01 UTC           Long March 2D | Gaofen-9 04 & Q-SAT   
2  Tue Aug 04, 2020 23:57 UTC            Starship Prototype | 150 Meter Hop   
3  Thu Jul 30, 2020 21:25 UTC  Proton-M/Briz-M | Ekspress-80 & Ekspress-103   
4  

In [16]:
print(f"missing values: {df_data.isna().values.any()}")
print(f"duplicates: {df_data.duplicated().values.any()}")

missing values: True
duplicates: False


In [22]:
print(f"number of missing values: {df_data.isna().value_counts()}")

number of missing values: Unnamed: 0.1  Unnamed: 0  Organisation  Location  Date   Detail  Rocket_Status  Price  Mission_Status
False         False       False         False     False  False   False          True   False             3360
                                                                                False  False              964
dtype: int64


Cleaned dataframe:

In [29]:
cleaned_df = df_data.dropna()
print(cleaned_df.head())

   Unnamed: 0.1  Unnamed: 0 Organisation  \
0             0           0       SpaceX   
1             1           1         CASC   
3             3           3    Roscosmos   
4             4           4          ULA   
5             5           5         CASC   

                                            Location  \
0         LC-39A, Kennedy Space Center, Florida, USA   
1  Site 9401 (SLS-2), Jiuquan Satellite Launch Ce...   
3       Site 200/39, Baikonur Cosmodrome, Kazakhstan   
4           SLC-41, Cape Canaveral AFS, Florida, USA   
5       LC-9, Taiyuan Satellite Launch Center, China   

                         Date  \
0  Fri Aug 07, 2020 05:12 UTC   
1  Thu Aug 06, 2020 04:01 UTC   
3  Thu Jul 30, 2020 21:25 UTC   
4  Thu Jul 30, 2020 11:50 UTC   
5  Sat Jul 25, 2020 03:13 UTC   

                                              Detail Rocket_Status  Price  \
0       Falcon 9 Block 5 | Starlink V1 L9 & BlackSky  StatusActive   50.0   
1                Long March 2D | Gaofen-9 04 

*All rows containing NaN values have been dropped. Only the Price column had NaN values, so this keeps the 964 rows with a recorded price and removes the other 3,360.*
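Because `dropna()` discards every launch without a price, it may be worth keeping two frames: one with all launches for counting, and one restricted to priced launches. The sketch below is only an illustration under assumptions: `sample` is a small stand-in for `df_data` with made-up rows, and the names `tidy` and `priced` are hypothetical.

```python
import pandas as pd

# Stand-in for df_data; the rows are illustrative, not copied from the CSV.
sample = pd.DataFrame({
    "Unnamed: 0.1": [0, 1, 2],
    "Unnamed: 0": [0, 1, 2],
    "Organisation": ["SpaceX", "CASC", "SpaceX"],
    "Price": ["50.0", "29.75", None],
})

# The two "Unnamed" columns appear to repeat the row index, so drop them.
tidy = sample.drop(columns=["Unnamed: 0.1", "Unnamed: 0"])

# Restrict to priced launches only where the analysis needs a price.
priced = tidy.dropna(subset=["Price"])

print(len(tidy), len(priced))
```

With this split, launch counts can use `tidy` while price statistics use `priced`.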

## Descriptive Statistics

# Number of Launches per Company

Chart that shows the number of space mission launches by organisation.
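One possible way to prepare the data for this chart is to count rows per organisation. This is a minimal sketch on a made-up `sample` frame; `launches_per_org` and the `Launches` column are hypothetical names.

```python
import pandas as pd

# Illustrative organisations only; the real column has 4,324 entries.
sample = pd.DataFrame({
    "Organisation": ["SpaceX", "CASC", "SpaceX", "Roscosmos", "ULA", "CASC", "SpaceX"]
})

# Count rows per organisation and turn the result into a two-column frame.
launches_per_org = (
    sample["Organisation"]
    .value_counts()
    .rename_axis("Organisation")
    .reset_index(name="Launches")
)

print(launches_per_org)
```

The resulting frame could then be passed to something like `px.bar(launches_per_org, x="Organisation", y="Launches")`.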

# Number of Active versus Retired Rockets

How many rockets are active compared to those that are decommissioned?
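Each row is a launch rather than a rocket, so counting `Rocket_Status` directly gives launches per status. The sketch below shows both views. It assumes that the rocket name is the part of `Detail` before the `|`, and that the second status value is `StatusRetired` (only `StatusActive` is visible in the output above). The sample rows, and the `Rocket` and `Status` columns, are made up for illustration.

```python
import pandas as pd

# Illustrative rows; statuses and the last two Detail values are invented.
sample = pd.DataFrame({
    "Detail": [
        "Falcon 9 Block 5 | Starlink V1 L9 & BlackSky",
        "Long March 2D | Gaofen-9 04 & Q-SAT",
        "Falcon 9 Block 5 | hypothetical payload",
        "Hypothetical Rocket | hypothetical payload",
    ],
    "Rocket_Status": ["StatusActive", "StatusActive", "StatusActive", "StatusRetired"],
})

# Rocket name: text before the "|" separator (a single-character split is literal).
sample["Rocket"] = sample["Detail"].str.split("|").str[0].str.strip()

# Remove the "Status" prefix to get cleaner labels.
sample["Status"] = sample["Rocket_Status"].str.replace("Status", "", regex=False)

# Launches per status versus distinct rockets per status.
launches_by_status = sample["Status"].value_counts()
rockets_by_status = sample.drop_duplicates("Rocket")["Status"].value_counts()

print(launches_by_status)
print(rockets_by_status)
```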

# Distribution of Mission Status

How many missions were successful?
How many missions failed?

# How Expensive are the Launches? 

Create a histogram to visualise the distribution of launch prices (the Price column is in USD millions).
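`df_data.info()` reports `Price` as an `object` column, so it probably needs converting to numbers before plotting. The sketch below assumes the prices are stored as text and that large values may contain a thousands separator; the sample values and bin edges are invented for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative price strings; assumption: large values use a thousands separator.
raw_price = pd.Series(["50.0", "29.75", "1,160.0", "65.0", None], name="Price")

# Strip separators, convert to float, and drop rows without a price.
price = pd.to_numeric(
    raw_price.str.replace(",", "", regex=False), errors="coerce"
).dropna()

# Bin edges in USD millions, chosen arbitrarily for this sketch.
counts, edges = np.histogram(price.to_numpy(dtype=float), bins=[0, 50, 100, 500, 5000])

print(counts)
```

The cleaned `price` series could equally be handed to `sns.histplot(price)` or `px.histogram(x=price)` to draw the chart.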

# Choropleth Map to Show the Number of Launches by Country
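The `Location` strings shown earlier end with the country, so one way to get per-country counts is to take the last comma-separated part. This is a sketch on five locations from the `head()` output; the `Country` and `Launches` column names are hypothetical.

```python
import pandas as pd

sample = pd.DataFrame({
    "Location": [
        "LC-39A, Kennedy Space Center, Florida, USA",
        "Pad A, Boca Chica, Texas, USA",
        "Site 200/39, Baikonur Cosmodrome, Kazakhstan",
        "SLC-41, Cape Canaveral AFS, Florida, USA",
        "LC-9, Taiyuan Satellite Launch Center, China",
    ]
})

# Assumption: the country is always the final comma-separated token.
sample["Country"] = sample["Location"].str.split(",").str[-1].str.strip()

launches_by_country = (
    sample["Country"]
    .value_counts()
    .rename_axis("Country")
    .reset_index(name="Launches")
)

print(launches_by_country)
```

For a Plotly choropleth the names would still need converting to ISO alpha-3 codes, for example with `countries.get(name).alpha3` from `iso3166`. Names that do not match an ISO 3166 entry would need a manual mapping first.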

# Choropleth Map to Show the Number of Failures by Country


# Plotly Sunburst Chart of the countries, organisations, and mission status.
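A sunburst needs one row per combination of the nested levels, with a count for each. The following sketch assumes a `Country` column has already been derived from `Location` (a hypothetical column), and uses made-up rows and status labels.

```python
import pandas as pd

# Illustrative rows; "Country" is assumed to have been derived beforehand.
sample = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "China"],
    "Organisation": ["SpaceX", "SpaceX", "SpaceX", "CASC"],
    "Mission_Status": ["Success", "Success", "Failure", "Success"],
})

# One row per (country, organisation, status) with the number of launches.
sunburst_data = (
    sample.groupby(["Country", "Organisation", "Mission_Status"])
    .size()
    .reset_index(name="Launches")
)

print(sunburst_data)
```

This frame could then be drawn with `px.sunburst(sunburst_data, path=["Country", "Organisation", "Mission_Status"], values="Launches")`.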

# Total Amount of Money Spent by Organisation on Space Missions

# Amount of Money Spent by Organisation per Launch
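Total spend and spend per launch can come from a single grouped aggregation over the priced launches. This is a sketch that assumes `Price` has already been converted to floats; the numbers and the output column names (`Total`, `Per_Launch`, `Launches`) are invented for illustration.

```python
import pandas as pd

# Illustrative priced launches (USD millions); not real figures.
priced = pd.DataFrame({
    "Organisation": ["SpaceX", "SpaceX", "SpaceX", "CASC", "CASC"],
    "Price": [50.0, 50.0, 62.0, 30.0, 30.0],
})

# Sum, mean, and count of prices per organisation in one pass.
spend = (
    priced.groupby("Organisation")["Price"]
    .agg(Total="sum", Per_Launch="mean", Launches="count")
    .sort_values("Total", ascending=False)
)

print(spend)
```

Both figures only cover launches with a recorded price, so organisations with few priced launches may be under-represented.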

# Number of Launches per Year
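The `Date` column is text such as `Fri Aug 07, 2020 05:12 UTC`, so it needs parsing before grouping by year. The sketch below extracts only the calendar date and parses it with an explicit format. It assumes some rows might lack the time part (the last sample row is a made-up example of that), and `Launch_Date` is a hypothetical column name.

```python
import pandas as pd

sample = pd.DataFrame({
    "Date": [
        "Fri Aug 07, 2020 05:12 UTC",
        "Thu Aug 06, 2020 04:01 UTC",
        "Thu Jul 30, 2020 21:25 UTC",
        "Wed Jan 09, 2019",  # hypothetical row without a time
    ]
})

# Pull out "Mon DD, YYYY" and ignore the weekday and any time component.
date_part = sample["Date"].str.extract(r"(\w{3} \d{2}, \d{4})")[0]
sample["Launch_Date"] = pd.to_datetime(date_part, format="%b %d, %Y")

# Launches per calendar year, in chronological order.
launches_per_year = sample["Launch_Date"].dt.year.value_counts().sort_index()

print(launches_per_year)
```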

# Number of Launches Month-on-Month until the Present

Which month has seen the highest number of launches of all time?
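One way to answer this is to convert each launch date to a year-month period and count. This sketch assumes the dates are already parsed into a datetime series (named `Launch_Date` here, a hypothetical name) and uses the five dates visible in the `head()` output.

```python
import pandas as pd

# Parsed launch dates taken from the first five rows shown earlier.
dates = pd.Series(
    pd.to_datetime(["2020-08-07", "2020-08-06", "2020-08-04", "2020-07-30", "2020-07-30"]),
    name="Launch_Date",
)

# Count launches per year-month and keep the series in time order.
per_month = dates.dt.to_period("M").value_counts().sort_index()

# The period with the most launches.
busiest = per_month.idxmax()

print(busiest, per_month.max())
```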

# Launches per Month: Which months are most popular and least popular for launches?

Some months have better weather than others. Which time of year seems to be best for space missions?
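To compare calendar months regardless of year, the month number can be counted across the whole dataset. The sketch below uses invented dates and assumes they are already parsed. Reindexing to 1 through 12 keeps months with no launches visible as zeros.

```python
import pandas as pd

# Illustrative parsed dates spread over several years.
dates = pd.Series(pd.to_datetime([
    "1970-12-01", "1985-12-10", "2001-12-20",
    "1999-06-15", "2010-06-01", "2015-01-05",
]))

# Launches per calendar month (1 = January ... 12 = December).
by_month = dates.dt.month.value_counts().reindex(range(1, 13), fill_value=0)

most_popular = by_month.idxmax()
least_popular = by_month.idxmin()  # first month with the lowest count

print(most_popular, least_popular)
```

Note that `idxmin` returns only the first month when several share the lowest count.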

# How has the Launch Price varied Over Time? 

Line chart that shows the average price of rocket launches over time.
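The data behind such a line chart could be the mean price per launch year. This is a minimal sketch, assuming a parsed `Launch_Date` column (hypothetical name) and a numeric `Price` column; the values are made up.

```python
import pandas as pd

# Illustrative priced launches (USD millions); not real figures.
priced = pd.DataFrame({
    "Launch_Date": pd.to_datetime(["2019-01-09", "2019-06-01", "2020-07-30", "2020-08-07"]),
    "Price": [60.0, 40.0, 65.0, 50.0],
})

# Mean price per calendar year.
avg_price_per_year = priced.groupby(priced["Launch_Date"].dt.year)["Price"].mean()

print(avg_price_per_year)
```

Years with very few priced launches will give noisy averages, so it may help to show the count alongside the mean.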

# Number of Launches over Time by the Top 10 Organisations.

How has the dominance of launches changed over time between the different players? 
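A possible approach is to pick the organisations with the most launches overall, then count their launches per year. The sketch assumes a `Year` column has already been derived from the parsed date (a hypothetical column) and uses made-up rows. `TOP_N` is set to 2 only so the tiny sample shows the filtering.

```python
import pandas as pd

sample = pd.DataFrame({
    "Year": [2018, 2018, 2019, 2019, 2019, 2020, 2020, 2020],
    "Organisation": ["SpaceX", "CASC", "SpaceX", "CASC", "ULA", "SpaceX", "SpaceX", "CASC"],
})

TOP_N = 2  # would be 10 on the full dataset

# Organisations with the most launches overall.
top_orgs = sample["Organisation"].value_counts().head(TOP_N).index

# Yearly launch counts for those organisations only.
yearly = (
    sample[sample["Organisation"].isin(top_orgs)]
    .groupby(["Year", "Organisation"])
    .size()
    .reset_index(name="Launches")
)

print(yearly)
```

The long-format result suits `px.line(yearly, x="Year", y="Launches", color="Organisation")`.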

# Cold War Space Race: USA vs USSR

The Cold War lasted from the start of the dataset (1957) until 1991.
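To compare the two superpowers, launches up to 1991 need to be attributed to either the USA or the USSR. The sketch below assumes launch sites are labelled with present-day country names, so Soviet-era launches would appear under names such as Russia or Kazakhstan. The `Year`, `Country`, and `Superpower` columns and the sample rows are hypothetical.

```python
import pandas as pd

sample = pd.DataFrame({
    "Year": [1957, 1958, 1969, 1975, 1991, 2020],
    "Country": ["Kazakhstan", "USA", "USA", "Russia", "Kazakhstan", "USA"],
})

# Assumption: these present-day labels cover the Soviet launch sites.
superpower_of = {"USA": "USA", "Russia": "USSR", "Kazakhstan": "USSR"}

# Keep Cold War years only, then map each country to a superpower.
cold_war = sample[sample["Year"] <= 1991].copy()
cold_war["Superpower"] = cold_war["Country"].map(superpower_of)
cold_war = cold_war.dropna(subset=["Superpower"])

totals = cold_war["Superpower"].value_counts()

print(totals)
```

The totals could feed a pie chart via `px.pie(names=totals.index, values=totals.values)`, and grouping `cold_war` by `Year` and `Superpower` would give the year-on-year view.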

## Plotly Pie Chart comparing the total number of launches of the USSR and the USA

## Chart that Shows the Total Number of Launches Year-On-Year by the Two Superpowers

## Total Number of Mission Failures Year on Year.

## Percentage of Failures over Time

Did failures go up or down over time? Did the countries get better at minimising risk and improving their chances of success over time? 
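The failure percentage per year can be computed as the mean of a boolean "failed" flag. This sketch treats every status other than `Success` as a failure, which is an assumption about how partial failures should be counted. The status labels and rows are made up for illustration.

```python
import pandas as pd

sample = pd.DataFrame({
    "Year": [1960, 1960, 1960, 1960, 2020, 2020, 2020, 2020],
    "Mission_Status": [
        "Success", "Failure", "Failure", "Success",
        "Success", "Success", "Partial Failure", "Success",
    ],
})

# Assumption: anything that is not a full success counts as a failure.
failed = sample["Mission_Status"] != "Success"

# Mean of a boolean flag is the failure share; scale to a percentage.
failure_pct = failed.groupby(sample["Year"]).mean() * 100

print(failure_pct)
```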

# Which Country was in the Lead in terms of Total Number of Launches up to and including 2020?

Do the results change if we only look at the number of successful launches? 

# Year-on-Year Chart Showing the Organisation with the Most Launches

Which organisation was dominant in the 1970s and 1980s? Which organisation was dominant in 2018, 2019 and 2020? 
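One way to find the leading organisation for each year is to count launches per year and organisation, then keep the row with the highest count within each year. The sketch assumes a derived `Year` column (hypothetical) and uses invented rows.

```python
import pandas as pd

sample = pd.DataFrame({
    "Year": [1975, 1975, 1975, 2019, 2019, 2019],
    "Organisation": ["RVSN USSR", "RVSN USSR", "NASA", "CASC", "CASC", "SpaceX"],
})

# Launch counts per (year, organisation).
counts = (
    sample.groupby(["Year", "Organisation"])
    .size()
    .reset_index(name="Launches")
)

# For each year, keep the row with the most launches (ties keep the first row).
leaders = counts.loc[counts.groupby("Year")["Launches"].idxmax()].reset_index(drop=True)

print(leaders)
```

Filtering `leaders` by year range would then answer the questions about specific decades.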