# Introduction

<center><img src="https://i.imgur.com/9hLRsjZ.jpg" height=400></center>

This dataset was scraped from [nextspaceflight.com](https://nextspaceflight.com/launches/past/?page=1) and includes all the space missions since the beginning of the Space Race between the USA and the Soviet Union in 1957!

### Install Package with Country Codes

In [544]:
# %pip install iso3166

### Upgrade Plotly

Run the cell below if you are working with Google Colab.

In [545]:
# %pip install --upgrade plotly
# %pip install seaborn


### Import Statements

In [546]:
import numpy as np
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt

# These might be helpful:
from iso3166 import countries
from datetime import datetime, timedelta

### Notebook Presentation

In [547]:
pd.options.display.float_format = '{:,.2f}'.format

### Load the Data

In [548]:
df_data = pd.read_csv('mission_launches.csv')

# Preliminary Data Exploration

* What is the shape of `df_data`? 
* How many rows and columns does it have?
* What are the column names?
* Are there any NaN values or duplicates?

In [549]:
print("data shape:{}".format(df_data.shape))
print('number of rows:{}'.format(df_data.shape[0]))
print('number of columns:{}'.format(df_data.shape[1]))
if df_data.isna().values.any():
    print('number of "NAN" values:{}'.format(df_data.isna().values.sum()))

data shape:(4324, 9)
number of rows:4324
number of columns:9
number of "NAN" values:3360
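
The cell above reports the shape and the total NaN count, but not the column names or the duplicates. A minimal sketch of the remaining checks, run on a small made-up frame (`sample` and its values are assumptions standing in for `df_data`):

```python
import pandas as pd

# Hypothetical miniature frame standing in for df_data
sample = pd.DataFrame({
    "Organisation": ["SpaceX", "CASC", "SpaceX", "NASA"],
    "Price": ["50.0", None, "50.0", None],
})

column_names = list(sample.columns)               # column names
nan_per_column = sample.isna().sum()              # NaN count for each column
duplicate_rows = int(sample.duplicated().sum())   # rows identical to an earlier row

print(column_names)
print(nan_per_column)
print(duplicate_rows)
```

Counting NaN values per column, rather than in total, shows which column is responsible for most of the missing data.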


## Checking for Missing Values and Duplicates

Consider removing columns containing junk data. 

In [550]:
# Eliminating rows with missing (NaN) values, then duplicate rows
df_data=df_data.dropna().drop_duplicates()
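
Calling `dropna()` without arguments discards every row that has a NaN in any column. The descriptive statistics in the next section report a count of 964, down from the 4324 rows loaded. The sketch below, on a hypothetical three-row frame, shows how to see which column drives the loss and how `subset` makes the intent explicit. The frame and the assumption that `Price` is the sparse column are illustrative only.

```python
import pandas as pd

# Hypothetical rows; only the column names follow the dataset
sample = pd.DataFrame({
    "Detail": ["Rocket A | Sat 1", "Rocket B | Sat 2", "Rocket C | Sat 3"],
    "Price": ["50.0", None, "90.0"],
    "Mission_Status": ["Success", "Failure", "Success"],
})

# Which columns contain the missing values?
print(sample.isna().sum())

# No arguments: drop a row if any column is NaN
complete_rows = sample.dropna()

# subset: drop a row only if the listed column is NaN
priced_rows = sample.dropna(subset=["Price"])

print(len(sample), len(complete_rows), len(priced_rows))
```

If only one column is sparse, both calls keep the same rows. The `subset` form leaves rows intact when an unrelated column happens to be empty.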

## Descriptive Statistics of the data 

In [551]:
df_data.describe()

Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0
count,964.0,964.0
mean,858.49,858.49
std,784.21,784.21
min,0.0,0.0
25%,324.75,324.75
50%,660.5,660.5
75%,1112.0,1112.0
max,4020.0,4020.0


In [552]:
# Dropping useless columns
df_data=df_data.drop(columns=['Unnamed: 0.1', 'Unnamed: 0'])
# Price is still stored as text here, so this sort is lexicographic rather than numeric
df_data.sort_values(by='Price')

Unnamed: 0,Organisation,Location,Date,Detail,Rocket_Status,Price,Mission_Status
3683,NASA,"LC-39A, Kennedy Space Center, Florida, USA","Thu Apr 04, 1968 12:00 UTC",Saturn V | Apollo 6,StatusRetired,1160.0,Partial Failure
3149,NASA,"LC-39A, Kennedy Space Center, Florida, USA","Mon May 14, 1973 17:30 UTC",Saturn V | Skylab 1,StatusRetired,1160.0,Success
3180,NASA,"LC-39A, Kennedy Space Center, Florida, USA","Tue Dec 19, 1972 19:24 UTC",Saturn V | Apollo 17,StatusRetired,1160.0,Success
3243,NASA,"LC-39A, Kennedy Space Center, Florida, USA","Sun Apr 16, 1972 17:54 UTC",Saturn V | Apollo 16,StatusRetired,1160.0,Success
3384,NASA,"LC-39A, Kennedy Space Center, Florida, USA","Sun Jan 31, 1971 21:03 UTC",Saturn V | Apollo 14,StatusRetired,1160.0,Success
...,...,...,...,...,...,...,...
510,MHI,"LA-Y1, Tanegashima Space Center, Japan","Thu Mar 26, 2015 01:21 UTC",H-IIA 202 | IGS-Optical 5,StatusActive,90.0,Success
365,MHI,"LA-Y1, Tanegashima Space Center, Japan","Fri Mar 17, 2017 01:20 UTC",H-IIA 202 | IGS-Radar 5,StatusActive,90.0,Success
146,SpaceX,"LC-39A, Kennedy Space Center, Florida, USA","Thu Apr 11, 2019 22:35 UTC",Falcon Heavy | ArabSat 6A,StatusActive,90.0,Success
236,MHI,"LA-Y1, Tanegashima Space Center, Japan","Tue Jun 12, 2018 04:20 UTC",H-IIA 202 | IGS Radar-6,StatusActive,90.0,Success


# Number of Launches per Company

Create a chart that shows the number of space mission launches by organisation.

In [553]:
df_lunch=df_data.groupby(by=df_data.Organisation,as_index=True).agg({'Detail':pd.Series.count})
df_lunch.head()

Organisation,Detail
Arianespace,96
Boeing,7
CASC,158
EER,1
ESA,1


In [554]:
# df_lunch_success=df_data.groupby(['Organisation','Mission_Status']).count()
# df_lunch_success

In [555]:

fig=px.bar(df_lunch,color=df_lunch.Detail.values, title='Number of Space Launch Attempts by Organisation',height=500, width=800, template='plotly_dark')
fig.update_layout(xaxis_title='Organisation',
                yaxis_title='Number of launch attempts',
                xaxis={'categoryorder':'total descending'},
                )
fig.show()

# Number of Active versus Retired Rockets

How many rockets are active compared to those that are decommissioned? 

In [556]:
# Count launches per organisation, split by the status of the rocket used
df_status=df_data.groupby(['Organisation','Rocket_Status'], as_index=False).agg({'Detail':pd.Series.count})
fig=px.bar(df_status, x='Organisation', y='Detail', color='Rocket_Status',barmode='group', title='Launches on active vs retired rockets per organisation', template='plotly_dark')
fig.update_layout(xaxis_title='Organisation',
                    yaxis_title='Number of launches (log scale)',
                    xaxis={'categoryorder':'total descending'},
                    yaxis=dict(type='log'),
                    )
fig.show()
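
The chart above counts launches per rocket status. To count distinct rockets instead, one option is to take the rocket name from `Detail`, assuming the name is the text before the `|` separator (as in `Saturn V | Apollo 6`). A rough sketch on four rows copied from the table shown earlier:

```python
import pandas as pd

sample = pd.DataFrame({
    "Detail": ["Saturn V | Apollo 6", "Saturn V | Skylab 1",
               "H-IIA 202 | IGS-Optical 5", "Falcon Heavy | ArabSat 6A"],
    "Rocket_Status": ["StatusRetired", "StatusRetired",
                      "StatusActive", "StatusActive"],
})

# Assumption: the rocket name is everything before the first "|"
sample["Rocket"] = sample["Detail"].str.split("|").str[0].str.strip()

# Keep one row per rocket, then count rockets per status
rockets = sample.drop_duplicates(subset="Rocket")
rockets_per_status = rockets["Rocket_Status"].value_counts()
print(rockets_per_status)
```

Here `Saturn V` appears in two launches but is counted once.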

# Distribution of Mission Status

How many missions were successful?
How many missions failed?

In [557]:
mission_stats=df_data.groupby(by=['Organisation','Mission_Status'],as_index=False).agg({'Detail': pd.Series.count})
mission_stats.head()

Unnamed: 0,Organisation,Mission_Status,Detail
0,Arianespace,Failure,2
1,Arianespace,Partial Failure,1
2,Arianespace,Success,93
3,Boeing,Partial Failure,1
4,Boeing,Success,6


In [558]:
f=px.bar(mission_stats,x='Organisation', y='Detail', barmode='group', color='Mission_Status', title='Mission outcomes per organisation', template='plotly_dark')
f.update_layout(xaxis_title='Organisation',
                    yaxis_title='Number of launches (log scale)',
                    xaxis={'categoryorder':'total descending'},
                    yaxis=dict(type='log'),
                    )
f.show()
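
The two questions above ask for overall totals, which the per-organisation chart does not state directly. A small sketch of one way to get them, plus a success rate per organisation. The rows are made up; only the column names and status labels follow the dataset.

```python
import pandas as pd

# Hypothetical rows for illustration
sample = pd.DataFrame({
    "Organisation": ["Arianespace"] * 4 + ["Boeing"] * 2,
    "Mission_Status": ["Success", "Success", "Failure", "Partial Failure",
                       "Success", "Partial Failure"],
})

# Overall number of missions per outcome
status_counts = sample["Mission_Status"].value_counts()

# Share of launches with status 'Success', per organisation
success_rate = (sample["Mission_Status"] == "Success").groupby(sample["Organisation"]).mean()

print(status_counts)
print(success_rate)
```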


# How Expensive are the Launches? 

Create a histogram and visualise the distribution. The price column is given in USD millions (careful of missing values). 

In [559]:
# Converting the Price column to integer.
# NOTE: removing '.' also removes the decimal point, so '90.0' becomes 900 and a
# two-decimal value such as '29.75' would become 2975. The resulting integers are
# no longer in USD millions and are not on a common scale.
df_data.Price=df_data.Price.str.replace(',','',regex=False)
df_data.Price=df_data.Price.str.replace('.','',regex=False)

df_data.Price=df_data.Price.astype(int)

# Parse the launch date as UTC, then drop the timezone information
df_data.Date=pd.to_datetime(df_data.Date,utc=True).dt.tz_localize(None)
df_data.head()

Unnamed: 0,Organisation,Location,Date,Detail,Rocket_Status,Price,Mission_Status
0,SpaceX,"LC-39A, Kennedy Space Center, Florida, USA",2020-08-07 05:12:00,Falcon 9 Block 5 | Starlink V1 L9 & BlackSky,StatusActive,500,Success
1,CASC,"Site 9401 (SLS-2), Jiuquan Satellite Launch Ce...",2020-08-06 04:01:00,Long March 2D | Gaofen-9 04 & Q-SAT,StatusActive,2975,Success
3,Roscosmos,"Site 200/39, Baikonur Cosmodrome, Kazakhstan",2020-07-30 21:25:00,Proton-M/Briz-M | Ekspress-80 & Ekspress-103,StatusActive,650,Success
4,ULA,"SLC-41, Cape Canaveral AFS, Florida, USA",2020-07-30 11:50:00,Atlas V 541 | Perseverance,StatusActive,1450,Success
5,CASC,"LC-9, Taiyuan Satellite Launch Center, China",2020-07-25 03:13:00,"Long March 4B | Ziyuan-3 03, Apocalypse-10 & N...",StatusActive,6468,Success
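
Stripping the `.` character, as the cell above does, also strips the decimal point. A minimal sketch of an alternative that keeps the prices in USD millions, assuming the raw values are strings such as `1,160.0` (the sample values are illustrative):

```python
import pandas as pd

# Illustrative raw values, including one missing price
raw_prices = pd.Series(["50.0", "29.75", "1,160.0", None])

# Remove only the thousands separator; the decimal point stays in place
cleaned = raw_prices.str.replace(",", "", regex=False)

# errors="coerce" turns anything that cannot be parsed into NaN
prices = pd.to_numeric(cleaned, errors="coerce")
print(prices.tolist())
```

Keeping the column as float preserves both the unit and the relative scale between launches, which matters for every later sum or average of `Price`.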


In [560]:
# With a categorical x and a numeric y, px.histogram sums y within each category
f=px.histogram(df_data,x='Organisation', y='Price',title='Total launch price per organisation', template='plotly_dark')
f.update_layout(xaxis_title='Organisation',
                    yaxis_title='Sum of Price (log scale)',
                    xaxis={'categoryorder':'total descending'},
                    yaxis=dict(type='log'),
                    )
f.show()

# A Choropleth Map to Show the Number of Launches by Country

<!-- * Create a choropleth map using [the plotly documentation](https://plotly.com/python/choropleth-maps/)
* Experiment with [plotly's available colours](https://plotly.com/python/builtin-colorscales/). I quite like the sequential colour `matter` on this map. 
* You'll need to extract a `country` feature as well as change the country names that no longer exist.

Wrangle the Country Names

You'll need to use a 3 letter country code for each country. You might have to change some country names.

* Russia is the Russian Federation
* New Mexico should be USA
* Yellow Sea refers to China
* Shahrud Missile Test Site should be Iran
* Pacific Missile Range Facility should be USA
* Barents Sea should be Russian Federation
* Gran Canaria should be USA -->


You can use the iso3166 package to convert the country names to Alpha3 format.

In [561]:

def get_country_name(location):
    # The country is the last comma-separated part of the launch site
    return location.split(",")[-1].strip()

# Map country and launch-site names to ISO 3166-1 alpha-3 codes ('USA' is already one)
alpha3_codes={'India':'IND','France':'FRA','Kazakhstan':'KAZ','Iran':'IRN','New Zealand':'NZL',
              'Russia':'RUS','Japan':'JPN','China':'CHN','Gran Canaria':'ESP','Yellow Sea':'CHN',
              'Pacific Missile Range Facility':'USA','New Mexico':'USA',
              'Shahrud Missile Test Site':'IRN','Barents Sea':'RUS'}
df_data['Location'] = df_data['Location'].apply(get_country_name).replace(alpha3_codes)
df_location=df_data.groupby(['Location'],as_index=False).agg({'Detail':pd.Series.count})
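
The wrangling can also be split into two lookups: first map launch sites that are not countries to a country, then map the country to its alpha-3 code. The sketch below uses plain dictionaries so it runs without extra packages. `ALPHA3` is a hypothetical hand-written table covering only the demo rows, and `"Yellow Sea"` is an assumed location string. With the `iso3166` package installed, `countries.get(name).alpha3` could replace that table for the names the package recognises.

```python
# Launch-site names that are not country names
SITE_TO_COUNTRY = {
    "New Mexico": "USA",
    "Yellow Sea": "China",
    "Shahrud Missile Test Site": "Iran",
    "Pacific Missile Range Facility": "USA",
    "Barents Sea": "Russia",
}

# Hypothetical lookup table, limited to the countries used below
ALPHA3 = {"USA": "USA", "China": "CHN", "Iran": "IRN",
          "Russia": "RUS", "Japan": "JPN", "Kazakhstan": "KAZ"}


def location_to_alpha3(location):
    """Return the alpha-3 code for the last comma-separated part of a location."""
    place = location.split(",")[-1].strip()
    country = SITE_TO_COUNTRY.get(place, place)
    return ALPHA3.get(country)  # None when the country is not in the table


codes = [location_to_alpha3(loc) for loc in [
    "LC-39A, Kennedy Space Center, Florida, USA",
    "Site 200/39, Baikonur Cosmodrome, Kazakhstan",
    "LA-Y1, Tanegashima Space Center, Japan",
    "Yellow Sea",  # assumed location string, for illustration
]]
print(codes)
```

Returning `None` for unknown names makes unmapped locations easy to spot before they reach the map.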

In [562]:
px.choropleth(df_location, locations='Location', locationmode='ISO-3', color='Detail', color_continuous_scale='matter', hover_name='Location', title='Number of space launches per country')

# A Choropleth Map to Show the Number of Failures by Country


In [563]:
# Every status other than 'Success' (including partial failures) counts as a failure here
df_failure=df_data[df_data['Mission_Status'] !='Success']
df_failure=df_failure.groupby('Location',as_index=False).agg({'Mission_Status': pd.Series.count})
px.choropleth(df_failure, locations='Location',locationmode='ISO-3', color='Mission_Status',title='Number of launch failures per country')

# A Plotly Sunburst Chart of the countries, organisations, and mission status. 

In [564]:
# Sector sizes are proportional to the summed Price, not to the number of launches
fig=px.sunburst(df_data, path=['Location','Organisation', 'Mission_Status'], values='Price',title='Mission status by country and organisation (sized by total price)')
fig.show()

# Analyses of the Total Amount of Money Spent by Organisation on Space Missions

In [565]:

df_tamount=df_data.groupby('Organisation',as_index=False).agg({'Price':pd.Series.sum})
px.sunburst(df_tamount, path=['Organisation', 'Price'], values='Price',)

# Analyses of the Amount of Money Spent by Organisation per Launch

In [566]:
df_data.groupby(['Organisation','Detail'],as_index=False)['Price'].sum().head()

Unnamed: 0,Organisation,Detail,Price
0,Arianespace,"Ariane 5 ECA | ABS-2, Athena-Fidus",2000
1,Arianespace,"Ariane 5 ECA | Alphasat I-XL, INSAT-3D",2000
2,Arianespace,Ariane 5 ECA | Amazonas 2 & COMSATBw-1,2000
3,Arianespace,"Ariane 5 ECA | Amazonas-3, Azerspace-1 (Africa...",2000
4,Arianespace,"Ariane 5 ECA | Arabsat 6B, GSAT-15",2000
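
Grouping by both `Organisation` and `Detail` returns roughly one row per launch. For a single figure per organisation, a hedged alternative is to aggregate the total, the launch count and the mean in one pass. The prices below are illustrative values in USD millions, not taken from the converted column above.

```python
import pandas as pd

# Illustrative prices in USD millions
sample = pd.DataFrame({
    "Organisation": ["MHI", "MHI", "NASA", "NASA"],
    "Price": [90.0, 90.0, 1160.0, 450.0],
})

# Named aggregation: one output column per statistic
spend = sample.groupby("Organisation")["Price"].agg(
    total="sum", launches="count", per_launch="mean"
)
print(spend)
```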


# Chart of the Number of Launches per Year

In [567]:
df_data['Year']=df_data.Date.dt.year
df_yeardata=df_data.groupby('Year', as_index=False)['Detail'].count()
fig=px.line(df_yeardata,x='Year',y='Detail', title='Number of Space Launches per Year')
fig.update_layout(xaxis_title='Year',
                    yaxis_title='Number of space launches',
                    yaxis=dict(type='linear'),
                    )
fig.show()

In [568]:
# df_yeardata

# Chart of the Number of Launches Month-on-Month until the Present

Which month has seen the highest number of launches of all time? Superimpose a rolling average on the month-on-month time series chart.

In [569]:
# Note: from here on the 'Year' column holds monthly periods (e.g. 2020-08), not calendar years
df_data['Year']=df_data.Date.dt.to_period('M')
df_monthdata=df_data.groupby('Year',as_index=False).agg({'Detail':pd.Series.count})
df_monthdata['Year']=df_monthdata['Year'].astype(str)
fig=px.line(df_monthdata,x='Year',y='Detail', title='Number of Space Launches per Month')
fig.update_layout(xaxis_title='Month',
                    yaxis_title='Number of space launches',
                    yaxis=dict(type='linear'),
                    )
fig.show()
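
The chart above plots the raw monthly counts only. A minimal sketch of the rolling average and the busiest month follows, using made-up launch dates. The 3-month window is an arbitrary choice for the demo; a longer window such as 12 months may suit the full dataset better.

```python
import pandas as pd

# Hypothetical launch dates, one entry per launch
dates = pd.to_datetime(["2020-01-05", "2020-01-20", "2020-02-11",
                        "2020-04-02", "2020-04-15", "2020-04-30"])

# Launches per calendar month; months without launches are counted as 0
monthly = pd.Series(1, index=dates).resample("MS").sum()

# Rolling mean over 3 months (NaN until the window is full)
rolling = monthly.rolling(window=3).mean()

# Month with the highest number of launches
busiest_month = monthly.idxmax()

print(monthly)
print(rolling)
print(busiest_month)
```

Both series share the same index, so they can be drawn on one figure as two lines.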

# Launches per Month: Which months are most popular and least popular for launches?

Some months have better weather than others. Which time of year seems to be best for space missions?

In [570]:
df_datamonth=df_data.copy()
# Calendar month (1-12) of each launch, taken directly from the launch date
df_datamonth['month']=df_datamonth.Date.dt.month
df_datamonth=df_datamonth.groupby(['month'],as_index=False)['Detail'].count()
fig=px.bar(df_datamonth,x='month',y='Detail', color='month', title='Number of Space Launches per Calendar Month')
fig.update_layout(xaxis_title='Month',
                    yaxis_title='Number of space launches (log scale)',
                    yaxis=dict(type='log'),
                    )
fig.show()
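
To read the most popular month off as a number rather than from the bars, the counts can be ranked directly. A small sketch on six launch dates taken from the table shown earlier; reindexing to months 1 to 12 is an added step so that months without any launch show up as 0.

```python
import pandas as pd

# Launch dates copied from rows shown earlier in this notebook
dates = pd.Series(pd.to_datetime([
    "1968-04-04", "1972-04-16", "1972-12-19",
    "1973-05-14", "2019-04-11", "2017-03-17",
]))

# Launches per calendar month, with empty months filled in as 0
launches_by_month = dates.dt.month.value_counts().reindex(range(1, 13), fill_value=0)

most_popular = launches_by_month.idxmax()
print(launches_by_month)
print(most_popular)
```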

# How has the Launch Price varied Over Time? 

Create a line chart that shows the average price of rocket launches over time. 

In [571]:
# 'Year' holds monthly periods at this point, so this is the mean price per month
price_time=df_data.groupby('Year', as_index=False).agg({'Price':'mean'})
price_time.Year=price_time.Year.astype(str)
fig=px.line(price_time,x='Year', y='Price', title='Average Launch Price over Time')
fig.update_layout(xaxis_title='Time',
                    yaxis_title='Average launch price (log scale)',
                    yaxis=dict(type='log'),
                    xaxis=dict(type='date'),
                    )
fig.show()

# Chart of the Number of Launches over Time by the Top 10 Organisations. 

How has the dominance of launches changed over time between the different players? 

In [572]:
# Group the data by organization and year
grouped = df_data.groupby(['Organisation', df_data['Date'].dt.year])['Detail'].count().reset_index(name='Launches')

# Find the top 10 organizations by total number of launches
top_10 = grouped.groupby('Organisation')['Launches'].sum().nlargest(10).index

# Filter the grouped data to only include the top 10 organizations
grouped = grouped[grouped['Organisation'].isin(top_10)]


fig = px.line(grouped, x='Date', y='Launches', color='Organisation', title='Number of Launches over Time by Top 10 Organisations')
fig.show()
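
Absolute counts rise and fall with overall launch activity, so dominance can be easier to read as each organisation's share of that year's launches. A hedged sketch on a hypothetical frame with the same columns as `grouped` (the launch numbers are made up):

```python
import pandas as pd

# Hypothetical counts in the same layout as `grouped`
grouped_demo = pd.DataFrame({
    "Organisation": ["SpaceX", "CASC", "SpaceX", "CASC"],
    "Date": [2019, 2019, 2020, 2020],
    "Launches": [1, 3, 3, 3],
})

# Total launches in each year, aligned row by row with the original frame
yearly_total = grouped_demo.groupby("Date")["Launches"].transform("sum")

# Share of the year's launches held by each organisation
grouped_demo["Share"] = grouped_demo["Launches"] / yearly_total
print(grouped_demo)
```

Plotting `Share` instead of `Launches` on the y-axis would show how the balance between organisations shifts from year to year. Note that the shares here are relative to the organisations present in the frame.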