<a href="https://colab.research.google.com/github/indigorose/Analysing_the_Space_Race/blob/main/Space_Missions_Analysis_(start).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

<center><img src="https://i.imgur.com/9hLRsjZ.jpg" height=400></center>

This dataset was scraped from [nextspaceflight.com](https://nextspaceflight.com/launches/past/?page=1) and includes all the space missions since the beginning of the Space Race between the USA and the Soviet Union in 1957!

### Install Package with Country Codes

In [2]:
%pip install iso3166

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting iso3166
  Downloading iso3166-2.1.1-py3-none-any.whl (9.8 kB)
Installing collected packages: iso3166
Successfully installed iso3166-2.1.1


### Upgrade Plotly

Run the cell below if you are working with Google Colab.

In [3]:
%pip install --upgrade plotly

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting plotly
  Downloading plotly-5.11.0-py2.py3-none-any.whl (15.3 MB)
Installing collected packages: plotly
  Attempting uninstall: plotly
    Found existing installation: plotly 5.5.0
    Uninstalling plotly-5.5.0:
      Successfully uninstalled plotly-5.5.0
Successfully installed plotly-5.11.0


### Import Statements

In [4]:
import numpy as np
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns

# These might be helpful:
from iso3166 import countries
from datetime import datetime, timedelta

### Notebook Presentation

In [5]:
pd.options.display.float_format = '{:,.2f}'.format

### Load the Data

In [7]:
df_data = pd.read_csv('mission_launches.csv')

# Preliminary Data Exploration

* What is the shape of `df_data`? 
* How many rows and columns does it have?
* What are the column names?
* Are there any NaN values or duplicates?

In [8]:
# Use the .shape attribute to show the rows and columns of the data file
print(f'The shape of the data is {df_data.shape}')

The shape of the data is (4324, 9)


In [9]:
# Following on from the shape call above, we can see that there are 4324 rows and 9 columns.
# Let's get details of those columns.
print(f'The columns are as follows: {df_data.columns}')

The columns are as follows: Index(['Unnamed: 0', 'Unnamed: 0.1', 'Organisation', 'Location', 'Date',
       'Detail', 'Rocket_Status', 'Price', 'Mission_Status'],
      dtype='object')


In [10]:
# To see if there are any duplicate rows, we run the following check.
# NaN values are checked in the data cleaning section below.
print(f'Are there any duplicates? - {df_data.duplicated().values.any()}')


Are there any duplicates? - False


## Data Cleaning - Check for Missing Values and Duplicates

Consider removing columns containing junk data. 

In [13]:
df_data_clean = df_data.drop(columns=['Unnamed: 0', 'Unnamed: 0.1'])


In [14]:
df_data_clean.isna()

Unnamed: 0,Organisation,Location,Date,Detail,Rocket_Status,Price,Mission_Status
0,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False
2,False,False,False,False,False,True,False
3,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...
4319,False,False,False,False,False,True,False
4320,False,False,False,False,False,True,False
4321,False,False,False,False,False,True,False
4322,False,False,False,False,False,True,False
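
The full boolean frame above is hard to read at a glance. A minimal sketch of a more compact check, assuming a small made-up frame in place of `df_data_clean` (the rows below are illustrative, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Made-up sample rows standing in for df_data_clean (illustrative only).
sample = pd.DataFrame({
    'Organisation': ['SpaceX', 'CASC', 'NASA'],
    'Price': ['50.0', np.nan, '450.0'],
    'Mission_Status': ['Success', 'Success', 'Failure'],
})

# Count the missing values in each column instead of printing every cell.
missing_per_column = sample.isna().sum()

# True if any cell anywhere in the frame is missing.
any_missing = sample.isna().values.any()

print(missing_per_column)
print(f'Are there any NaN values? - {any_missing}')
```

Running the same two lines on `df_data_clean` should show which columns are responsible for the `True` entries above.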


In [16]:
# dropna() removes every row that contains a missing (NaN) value.
# The shape call confirms that 964 rows and 7 columns remain.
# The two unnecessary 'Unnamed' columns were already dropped above.
df_data_clean = df_data_clean.dropna()
df_data_clean.shape




(964, 7)

## Descriptive Statistics

In [17]:
df_data_clean.describe()
# This will give us some details about the clean data

Unnamed: 0,Organisation,Location,Date,Detail,Rocket_Status,Price,Mission_Status
count,964,964,964,964,964,964.0,964
unique,25,56,963,962,2,56.0,4
top,CASC,"LC-39A, Kennedy Space Center, Florida, USA","Wed Nov 05, 2008 00:15 UTC",Long March 2D | Shiyan-3 & Chuangxin-1(02),StatusActive,450.0,Success
freq,158,120,2,2,586,136.0,910


In [18]:
df_data_clean.groupby('Mission_Status').count()
# This shows us that 910 of the 964 missions in the cleaned dataset were successful.

Mission_Status,Organisation,Location,Date,Detail,Rocket_Status,Price
Failure,36,36,36,36,36,36
Partial Failure,17,17,17,17,17,17
Prelaunch Failure,1,1,1,1,1,1
Success,910,910,910,910,910,910


# Number of Launches per Company

Create a chart that shows the number of space mission launches by organisation.

In [43]:
"""The above shows us the most launches for a company. In this dataset, CASC
has the most launches at 158. We should see this reflected in the bar chart."""
launches = df_data_clean.groupby('Organisation').count()
launches.sort_values('Mission_Status', ascending=False).head()


Organisation,Location,Date,Detail,Rocket_Status,Price,Mission_Status
CASC,158,158,158,158,158,158
NASA,149,149,149,149,149,149
SpaceX,99,99,99,99,99,99
ULA,98,98,98,98,98,98
Arianespace,96,96,96,96,96,96


In [61]:
"""Whilst the above code gives an overview, what we really need is below.
This is a count of the values within the dataset."""
companies_count = df_data_clean['Organisation'].value_counts()
companies_count[:5]

CASC           158
NASA           149
SpaceX          99
ULA             98
Arianespace     96
Name: Organisation, dtype: int64

In [62]:
c_count = pd.DataFrame({'id':companies_count.index, 'values':companies_count.values})
c_count.head()

Unnamed: 0,id,values
0,CASC,158
1,NASA,149
2,SpaceX,99
3,ULA,98
4,Arianespace,96


In [76]:
"""Even though we converted the data into a dataframe, it was one step too far.
Go back and use the original data and then convert to a bar graph."""
# TODO - add colours to the bar chart.
c_bar = px.bar(x = companies_count.index, y = companies_count.values, 
             title="Launches per Company")
c_bar.update_layout(xaxis_title="Companies", yaxis_title="No. of Launches")
c_bar.show()

# Number of Active versus Retired Rockets

How many rockets are active compared to those that are decommissioned? 

In [79]:
rocket_count = df_data_clean['Rocket_Status'].value_counts()
rocket_count

StatusActive     586
StatusRetired    378
Name: Rocket_Status, dtype: int64

In [80]:
r_bar = px.bar(x = rocket_count.index, y = rocket_count.values, 
             title="Number of Active Vs Retired Rockets")
r_bar.update_layout(xaxis_title="Status", yaxis_title="No. of Rockets")
r_bar.show()

# Distribution of Mission Status

How many missions were successful?
How many missions failed?

In [81]:
mission_count = df_data_clean['Mission_Status'].value_counts()
mission_count

Success              910
Failure               36
Partial Failure       17
Prelaunch Failure      1
Name: Mission_Status, dtype: int64

In [82]:
m_bar = px.bar(x = mission_count.index, y = mission_count.values, 
             title="Mission Status")
m_bar.update_layout(xaxis_title="Status", yaxis_title="No. of Missions")
m_bar.show()

# How Expensive are the Launches? 

Create a histogram and visualise the distribution. The price column is given in USD millions (careful of missing values). 

In [84]:
cost_count = df_data_clean['Price'].value_counts()
cost_count

450.0      136
200.0       75
40.0        55
62.0        41
30.8        38
109.0       37
50.0        34
64.68       34
29.75       33
90.0        32
41.8        31
48.5        26
29.15       25
31.0        22
29.0        22
59.0        22
69.7        17
21.0        16
65.0        16
35.0        16
56.5        15
37.0        15
164.0       15
7.5         14
1,160.0     13
47.0        13
25.0        12
350.0       11
153.0       11
45.0        10
112.5        9
5.3          9
123.0        8
145.0        7
85.0         7
120.0        7
80.0         7
115.0        6
59.5         5
7.0          5
46.0         5
136.6        4
63.23        4
140.0        3
133.0        3
190.0        3
130.0        3
135.0        2
5,000.0      2
39.0         2
55.0         1
15.0         1
20.14        1
20.0         1
12.0         1
28.3         1
Name: Price, dtype: int64
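
The thousands separators in values such as `1,160.0` suggest that `Price` is stored as text, so it probably needs converting before a histogram can be drawn. A minimal sketch of that conversion, assuming a short made-up series in the same string form as the column:

```python
import pandas as pd

# Made-up sample prices in the same string form as the Price column.
prices = pd.Series(['450.0', '1,160.0', '62.0', '5,000.0', '62.0'], name='Price')

# Strip the thousands separators, then convert to float (USD millions).
price_musd = pd.to_numeric(prices.str.replace(',', '', regex=False))

print(price_musd.describe())
```

The converted series can then be passed to `px.histogram(x=price_musd)`. Because a few launches cost far more than the rest, restricting the x-axis range or using a log scale may make the distribution easier to read.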

# Use a Choropleth Map to Show the Number of Launches by Country

* Create a choropleth map using [the plotly documentation](https://plotly.com/python/choropleth-maps/)
* Experiment with [plotly's available colours](https://plotly.com/python/builtin-colorscales/). I quite like the sequential colour `matter` on this map. 
* You'll need to extract a `country` feature as well as change the country names that no longer exist.

### Wrangle the Country Names

You'll need to use a 3 letter country code for each country. You might have to change some country names.

* Russia is the Russian Federation
* New Mexico should be USA
* Yellow Sea refers to China
* Shahrud Missile Test Site should be Iran
* Pacific Missile Range Facility should be USA
* Barents Sea should be Russian Federation
* Gran Canaria should be USA


You can use the iso3166 package to convert the country names to Alpha3 format.
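
The wrangling steps above can be sketched as follows. This is one possible approach, assuming the country is always the text after the last comma in `Location`. Apart from the Kennedy Space Center entry, the sample locations are invented for illustration.

```python
import pandas as pd

# Sample locations; only the first is taken from the dataset, the rest are made up.
locations = pd.Series([
    'LC-39A, Kennedy Space Center, Florida, USA',
    'Pad 1, Example Cosmodrome, Russia',
    'Example Spaceport, New Mexico',
    'Example Launch Platform, Yellow Sea',
])

# Renames taken from the list above.
RENAMES = {
    'Russia': 'Russian Federation',
    'New Mexico': 'USA',
    'Yellow Sea': 'China',
    'Shahrud Missile Test Site': 'Iran',
    'Pacific Missile Range Facility': 'USA',
    'Barents Sea': 'Russian Federation',
    'Gran Canaria': 'USA',
}

# Assumption: the country is whatever follows the last comma.
country = locations.str.split(',').str[-1].str.strip().replace(RENAMES)

print(country.value_counts())
```

With the package installed, `countries.get(name).alpha3` should return the three-letter code for names that iso3166 recognises, such as `Russian Federation` or `USA`. Some countries are listed under a longer official name (Iran is one), so a lookup may need a further rename before it succeeds.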

# Use a Choropleth Map to Show the Number of Failures by Country


# Create a Plotly Sunburst Chart of the countries, organisations, and mission status. 

# Analyse the Total Amount of Money Spent by Organisation on Space Missions

# Analyse the Amount of Money Spent by Organisation per Launch
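
Both the total spend and the spend per launch can come from one grouped aggregation. A minimal sketch, assuming a made-up frame with the same `Organisation` and `Price` columns (the pairings of organisation and price are invented):

```python
import pandas as pd

# Made-up sample rows; Price is text in USD millions, as in the dataset.
sample = pd.DataFrame({
    'Organisation': ['NASA', 'NASA', 'SpaceX', 'SpaceX', 'SpaceX'],
    'Price': ['450.0', '1,160.0', '50.0', '62.0', '62.0'],
})

# Convert Price to a number before aggregating.
sample['Price'] = pd.to_numeric(sample['Price'].str.replace(',', '', regex=False))

# Total spend, mean spend per launch, and number of priced launches.
spend = (sample.groupby('Organisation')['Price']
               .agg(total='sum', per_launch='mean', launches='count')
               .sort_values('total', ascending=False))

print(spend)
```

Only launches with a known price contribute, so the totals are lower bounds for organisations whose prices are mostly missing.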

# Chart the Number of Launches per Year

# Chart the Number of Launches Month-on-Month until the Present

Which month has seen the highest number of launches of all time? Superimpose a rolling average on the month-on-month time series chart. 
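
A minimal sketch of the monthly count and rolling average, assuming every `Date` value starts with a three-letter weekday followed by a `Mon DD, YYYY` date. The sample dates are made up, and the 3-month window is only chosen to suit the toy data.

```python
import pandas as pd

# Made-up sample dates in the format of the Date column (one without a time).
dates = pd.Series([
    'Sat Jul 25, 2020 03:13 UTC',
    'Tue Aug 04, 2020 23:57 UTC',
    'Thu Aug 06, 2020 04:01 UTC',
    'Fri Aug 07, 2020 05:12 UTC',
    'Thu Oct 01, 2020',
    'Tue Oct 06, 2020 11:29 UTC',
])

# Keep only the "Mon DD, YYYY" part, dropping the weekday and any time of day.
parsed = pd.to_datetime(dates.str[4:16], format='%b %d, %Y')

# Count launches in each year-month period.
per_month = parsed.groupby(parsed.dt.to_period('M')).size()

# Months without any launch are absent from the count, so add them back as 0.
all_months = pd.period_range(per_month.index.min(), per_month.index.max(), freq='M')
per_month = per_month.reindex(all_months, fill_value=0)

# Rolling mean over 3 months (a 12-month window may suit the real data better).
rolling = per_month.rolling(window=3).mean()

busiest = per_month.idxmax()
print(per_month)
print(f'Busiest month: {busiest}')
```

Plotly may not accept a `PeriodIndex` directly, so converting with `per_month.index.to_timestamp()` before charting is a reasonable precaution. Grouping by `parsed.dt.year` instead gives the launches per year.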

# Launches per Month: Which months are most popular and least popular for launches?

Some months have better weather than others. Which time of year seems to be best for space missions?
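
One way to compare calendar months is to group by month number across all years. A minimal sketch, assuming the same date format as above and using made-up sample dates:

```python
import pandas as pd

# Made-up sample dates in the format of the Date column.
dates = pd.Series([
    'Fri Oct 04, 1957 19:28 UTC',
    'Wed Nov 05, 2008 00:15 UTC',
    'Sat Jul 25, 2020 03:13 UTC',
    'Thu Aug 06, 2020 04:01 UTC',
    'Fri Aug 07, 2020 05:12 UTC',
])

# Keep only the "Mon DD, YYYY" part, dropping the weekday and any time of day.
parsed = pd.to_datetime(dates.str[4:16], format='%b %d, %Y')

# Group by calendar month (1-12) across all years; empty months get 0.
by_month = (parsed.groupby(parsed.dt.month).size()
                  .reindex(range(1, 13), fill_value=0))

print(by_month)
print(f'Most launches in month number {by_month.idxmax()}')
```

`by_month.idxmin()` gives the least popular month in the same way, although ties are resolved by taking the first month with that count.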

# How has the Launch Price varied Over Time? 

Create a line chart that shows the average price of rocket launches over time. 

# Chart the Number of Launches over Time by the Top 10 Organisations. 

How has the dominance of launches changed over time between the different players? 
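
A minimal sketch of one approach: find the organisations with the most launches overall, keep only their rows, and cross-tabulate year against organisation. The sample rows are made up (the pairings of organisation and date are invented), and `TOP_N` is a hypothetical constant set to 2 here so the toy output stays readable.

```python
import pandas as pd

# Made-up sample rows; organisation and date pairings are invented.
sample = pd.DataFrame({
    'Organisation': ['CASC', 'NASA', 'CASC', 'SpaceX', 'SpaceX', 'ULA'],
    'Date': [
        'Wed Nov 05, 2008 00:15 UTC',
        'Fri Nov 14, 2008 19:55 UTC',
        'Sat Jul 25, 2020 03:13 UTC',
        'Thu Aug 06, 2020 04:01 UTC',
        'Fri Aug 07, 2020 05:12 UTC',
        'Tue Aug 04, 2020 23:57 UTC',
    ],
})

# Extract the launch year from the "Mon DD, YYYY" part of the date string.
sample['year'] = pd.to_datetime(sample['Date'].str[4:16], format='%b %d, %Y').dt.year

TOP_N = 2  # the task asks for 10; 2 suits the toy data

# Organisations with the most launches overall.
top_orgs = sample['Organisation'].value_counts().head(TOP_N).index

# Launches per year for those organisations only (rows: year, columns: organisation).
top = sample[sample['Organisation'].isin(top_orgs)]
per_year = pd.crosstab(top['year'], top['Organisation'])

print(per_year)
```

Each column of `per_year` can then be drawn as its own line, which makes shifts in dominance between organisations visible.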

# Cold War Space Race: USA vs USSR

The Cold War lasted from the start of the dataset up until 1991. 

## Create a Plotly Pie Chart comparing the total number of launches of the USSR and the USA

Hint: Remember to include former Soviet Republics like Kazakhstan when analysing the total number of launches. 
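
A minimal sketch of the grouping behind the pie chart. It assumes hypothetical `country` and `year` columns that have already been derived from `Location` and `Date`, and the rows are made up.

```python
import pandas as pd

# Hypothetical derived columns with made-up rows.
sample = pd.DataFrame({
    'country': ['USA', 'Kazakhstan', 'Russian Federation', 'Kazakhstan',
                'USA', 'France', 'Kazakhstan'],
    'year': [1961, 1961, 1975, 1970, 1980, 1985, 1995],
})

# Launch sites in Russia and Kazakhstan are counted as USSR for this period.
BLOC = {'USA': 'USA', 'Russian Federation': 'USSR', 'Kazakhstan': 'USSR'}

# Keep launches up to and including 1991, then label each with its superpower.
cold_war = sample[sample['year'] <= 1991].copy()
cold_war['bloc'] = cold_war['country'].map(BLOC)

# Countries outside the mapping become NaN and are dropped before counting.
totals = cold_war.dropna(subset=['bloc']).groupby('bloc').size()

print(totals)
```

`px.pie(values=totals.values, names=totals.index)` should then draw the comparison. Grouping by `['year', 'bloc']` instead gives year-on-year totals.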

## Create a Chart that Shows the Total Number of Launches Year-On-Year by the Two Superpowers

## Chart the Total Number of Mission Failures Year on Year.

## Chart the Percentage of Failures over Time

Did failures go up or down over time? Did the countries get better at minimising risk and improving their chances of success over time? 
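
A minimal sketch of the failure percentage per year, assuming a hypothetical `year` column derived from `Date` and treating every status other than `Success` as a failure. That choice is an assumption, since partial failures could reasonably be counted separately. The rows are made up.

```python
import pandas as pd

# Made-up sample rows with a hypothetical derived 'year' column.
sample = pd.DataFrame({
    'year': [1960, 1960, 1960, 1960, 2020, 2020, 2020, 2020, 2020],
    'Mission_Status': ['Failure', 'Success', 'Partial Failure', 'Success',
                       'Success', 'Success', 'Success', 'Success', 'Failure'],
})

# Assumption: anything that is not 'Success' counts as a failure.
is_failure = sample['Mission_Status'] != 'Success'

# The mean of a boolean column is the share of True values, so this is a percentage.
failure_pct = is_failure.groupby(sample['year']).mean() * 100

print(failure_pct)
```

Using `.sum()` instead of `.mean()` gives the absolute number of failures per year.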

# For Every Year Show which Country was in the Lead in terms of Total Number of Launches (up to and including 2020)

Do the results change if we only look at the number of successful launches? 

# Create a Year-on-Year Chart Showing the Organisation Doing the Most Number of Launches

Which organisation was dominant in the 1970s and 1980s? Which organisation was dominant in 2018, 2019 and 2020? 
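
A minimal sketch of finding the leading organisation for each year, assuming a hypothetical `year` column derived from `Date`. The rows are made up, and ties are broken arbitrarily.

```python
import pandas as pd

# Made-up sample rows with a hypothetical derived 'year' column.
sample = pd.DataFrame({
    'Organisation': ['CASC', 'CASC', 'NASA', 'SpaceX', 'SpaceX', 'ULA'],
    'year': [2019, 2019, 2019, 2020, 2020, 2020],
})

# Number of launches for every (year, organisation) pair.
counts = (sample.groupby(['year', 'Organisation']).size()
                .reset_index(name='launches'))

# Sort by launches so the first row seen for each year is that year's leader.
leaders = (counts.sort_values('launches', ascending=False)
                 .drop_duplicates('year')
                 .sort_values('year')
                 .reset_index(drop=True))

print(leaders)
```

The same pattern with a country column in place of `Organisation` answers the country question above, and filtering to `Mission_Status == 'Success'` first restricts it to successful launches.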