# THE SPACE RACE with Plotly Express & Seaborn

![](https://cdn.techexplorist.com/wp-content/uploads/2019/07/chandrayan2-696x593.jpg)
[Img source](https://www.techexplorist.com/chandrayan-2-mission-put-hold-citing-technical-issues/24835/)

Since the success of Sputnik-1 in 1957, we have come a long way, with manned missions to the Moon and unmanned missions that have gone beyond the limits of our solar system.

We have an interesting dataset that gives insight into the missions flown by various space agencies and companies around the world.

In this notebook I have attempted to understand the missions by country, their success rates, any topographical or weather impacts, the missions by company, and how the number of missions has moved over time for the various countries and companies.

Please do upvote if you learn something from this notebook!

In [None]:
!pip install chart_studio
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
import chart_studio.plotly as py
import seaborn as sns
import plotly.express as px
import warnings
warnings.filterwarnings("ignore")

We import the data file into a pandas DataFrame and drop the leftover index columns.
After that we have a look at the data.

In [None]:
dataset = pd.read_csv('../input/all-space-missions-from-1957/Space_Corrected.csv')
dataset = dataset.drop(['Unnamed: 0','Unnamed: 0.1'],axis=1).reset_index(drop=True)
dataset.head()

We perform three operations below:
1. We extract the country of launch from the Location column of the data
2. Next we convert the launch date to the pandas datetime format
3. Lastly we extract the year of launch from the date, which will form a part of our analysis

In [None]:
dataset['Country'] = [str(val).split(',')[-1].lstrip() for val in dataset.loc[:]['Location']]
dataset['Launch_Date'] =  pd.to_datetime(dataset['Datum'],utc=True)
dataset['Launch_Year'] = pd.DatetimeIndex(dataset['Launch_Date']).year
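
As a rough sketch of the three steps above, here is the same idea on two made-up rows. The column names follow the dataset, but the rows are illustrative and the explicit `format` string is my assumption about how the `Datum` values look (a time plus a literal "UTC" suffix); rows that deviate from it would need separate handling.

```python
import pandas as pd

# Toy rows shaped like the dataset's Location and Datum columns (values are illustrative).
toy = pd.DataFrame({
    "Location": [
        "LC-39A, Kennedy Space Center, Florida, USA",
        "Site 1/5, Baikonur Cosmodrome, Kazakhstan",
    ],
    "Datum": ["Fri Aug 07, 2020 05:12 UTC", "Fri Oct 04, 1957 19:28 UTC"],
})

# Step 1: the country is the last comma-separated token of the location string.
toy["Country"] = toy["Location"].str.split(",").str[-1].str.strip()

# Step 2: parse the date. Assumed format: weekday, month, day, year, time, literal "UTC".
toy["Launch_Date"] = pd.to_datetime(
    toy["Datum"], format="%a %b %d, %Y %H:%M UTC", utc=True
)

# Step 3: pull the year out of the parsed timestamp.
toy["Launch_Year"] = toy["Launch_Date"].dt.year

print(toy[["Country", "Launch_Year"]])
```

The vectorised `.str` accessor does the same job as the list comprehension in the cell above, and `.dt.year` is an alternative to wrapping the column in a `DatetimeIndex`.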

Next I did some googling to find the countries in which these companies are based. We will do our country-level launch analysis based on this.
First I create a dictionary from that research and then map it onto the company names.

In [None]:
dict_comp = {'SpaceX':'USA','CASC':'China', 'Roscosmos':'Russia', 'ULA':'USA', 'JAXA':'Japan', 'Northrop':'USA', 'ExPace':'China',
'IAI':'Israel', 'Rocket Lab':'NZ', 'Virgin Orbit':'USA', 'VKS RF':'Russia', 'MHI':'Japan', 'IRGC':'Iran',
'Arianespace':'Europe', 'ISA':'Iran', 'Blue Origin':'USA', 'ISRO':'India', 'Exos':'USA', 'ILS':'USA',
'i-Space':'China', 'OneSpace':'China', 'Landspace':'China' ,'Eurockot':'Russia', 'Land Launch':'Russia',
'CASIC':'China', 'KCST':'DPRK', 'Sandia':'USA', 'Kosmotras':'Russia', 'Khrunichev':'Russia', 'Sea Launch':'Russia',
'KARI':'ROK', 'ESA':'Europe', 'NASA':'USA', 'Boeing':'USA', 'ISAS':'Japan', 'SRC':'Russia', 'MITT':'Russia',
 'Lockheed':'USA','AEB':'Brazil', 'Starsem':'Europe', 'RVSN USSR':'Russia', 'EER':'USA', 'General Dynamics':'USA',
 'Martin Marietta':'USA', 'Yuzhmash':'Russia', 'Douglas':'USA', 'ASI':'Europe', 'US Air Force':'USA','CNES':'Europe',
  'CECLES':'Europe', 'RAE':'Europe', 'UT':'Japan', 'OKB-586':'Russia', 'AMBA':'USA',"Armée de l'Air":'Europe', 'US Navy':'USA'}

dataset['Sat_country'] = dataset['Company Name'].map(dict_comp)
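
Since the dictionary is typed by hand, it may be worth checking that no company was left out. The sketch below uses a hypothetical three-entry dictionary and a made-up company ("Acme Rockets") to show the idea: `Series.map` leaves `NaN` for any key it cannot find, so the unmapped names can be listed directly.

```python
import pandas as pd

# Hypothetical subset of the company-to-country dictionary.
company_to_country = {"SpaceX": "USA", "CASC": "China", "ISRO": "India"}

# "Acme Rockets" is an invented name that is deliberately missing from the dictionary.
launches = pd.DataFrame({"Company Name": ["SpaceX", "ISRO", "Acme Rockets", "CASC"]})
launches["Sat_country"] = launches["Company Name"].map(company_to_country)

# Series.map leaves NaN wherever the key is missing from the dictionary.
unmapped = sorted(
    launches.loc[launches["Sat_country"].isna(), "Company Name"].unique()
)
print(unmapped)
```

Running the same check against `dataset` and `dict_comp` would show whether any company name (for example one with accented characters) fails to match.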

In [None]:
dataset["StatusMission"] = np.where(dataset["Status Mission"]=="Success","Success","Failure")

Let us check the total number of launches carried out by various countries (including private companies). We have grouped all European countries together, as most of their launches are carried out by the European Space Agency, which is a multi-governmental agency.

Russia and the USA are far ahead of the other countries. There are only 6 countries with more than 50 launches/attempts.

### We have used animation to show how the number of launches progressed over time. Click the play button to view the animation.

In [None]:
data_gp = dataset.groupby(["Sat_country","Launch_Year","StatusMission"])["Country"].count().reset_index()
data_gp = pd.pivot_table(data_gp,index=["Sat_country","Launch_Year"],columns=["StatusMission"],values="Country",aggfunc=np.sum).reset_index()
data_gp.columns.name=""
data_gp.fillna(0,inplace=True)
data_gp["Total_Launch"] = data_gp["Failure"]+data_gp["Success"]


dat=(data_gp.groupby(["Sat_country","Launch_Year"])["Total_Launch"].sum()).reset_index()
idx = pd.MultiIndex.from_product([dat.Launch_Year.unique(), 
                                  dat.Sat_country.unique()], names=['Launch_Year', 'Sat_country'])                                  
dat2 = dat.set_index(['Launch_Year', 'Sat_country']).reindex(idx).fillna(0).sort_values(ascending=True,by=["Sat_country","Launch_Year"])
dat2 = pd.concat([dat2, dat2.groupby(level=1).cumsum().add_prefix('Cum_')], axis=1).sort_index(level=1).reset_index()
dat2.sort_values(ascending=[True,False],by=["Launch_Year","Sat_country"],inplace=True)
dat2.rename(columns={"Launch_Year":"Year","Cum_Total_Launch":"Cummulative_Launches","Sat_country":"Country"},inplace=True)
px.bar(dat2,x="Country",y="Cummulative_Launches",color="Country",animation_group="Country",animation_frame="Year",range_y=[0,2200])
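
The cell above packs two ideas into a few lines: filling in every year/country combination so that each animation frame has a bar for every country, and taking a running total per country. A minimal sketch of those two steps on toy numbers (the countries "A" and "B" and their counts are made up) might look like this:

```python
import pandas as pd

# Toy yearly launch counts; note that B has no row for 2001.
yearly = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B"],
    "Year": [2000, 2001, 2002, 2000, 2002],
    "Total_Launch": [2, 1, 3, 1, 4],
})

# Build every (Year, Country) pair so each frame has a value for each country.
grid = pd.MultiIndex.from_product(
    [sorted(yearly["Year"].unique()), sorted(yearly["Country"].unique())],
    names=["Year", "Country"],
)
filled = (
    yearly.set_index(["Year", "Country"])
    .reindex(grid, fill_value=0)   # missing combinations become 0 launches
    .reset_index()
    .sort_values(["Country", "Year"])
)

# Running total per country, in year order.
filled["Cumulative_Launches"] = filled.groupby("Country")["Total_Launch"].cumsum()
print(filled)
```

Without the reindex step, a country with no launches in a given year would simply vanish from that frame instead of holding its previous cumulative value.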

Let's get a geographical sense of the countries that have launched spacecraft. Since Europe is not available in the list of country names used for plotting, I have plotted it under France, since Ariane and ESA are headquartered in Paris.

Plotly Express lets us plot this in a single line of code. Hover over a country to see its total launches.

In [None]:
px.choropleth(dat2[dat2["Year"]==2020].replace({'Country': {"NZ": "New Zealand", "ROK": "South Korea","Europe":"France"}}),locations="Country",color="Cummulative_Launches",locationmode="country names",hover_name="Cummulative_Launches")

Let us focus on the top 6 countries in terms of missions.

We start off by checking the success % of the missions. Among the top 6 countries, Europe has the highest success % and India the lowest.

In [None]:
data_gp_new = data_gp[data_gp["Sat_country"].isin(["Russia","USA","Europe","China","Japan","India"])]
plot_data = (data_gp_new.groupby("Sat_country")["Success"].sum()/  data_gp_new.groupby("Sat_country")["Total_Launch"].sum()).sort_values(ascending=False)
plot_data=plot_data.reset_index().rename(columns={0:"Success %"})
plot_data["Success %"]*=100
px.bar(plot_data,x="Sat_country",y="Success %",color="Sat_country")
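
For reference, the success % can also be computed straight from the per-launch rows, without the pivot. This is only a sketch on made-up rows, and it relies on the fact that the mean of a boolean column is the share of `True` values:

```python
import pandas as pd

# Made-up launches: 3 of 4 succeed for USA, 1 of 2 for India.
toy = pd.DataFrame({
    "Sat_country": ["USA", "USA", "USA", "USA", "India", "India"],
    "StatusMission": ["Success", "Success", "Success", "Failure", "Success", "Failure"],
})

# A boolean column averages to the share of True values, i.e. the success rate.
success_pct = (
    toy["StatusMission"].eq("Success")
    .groupby(toy["Sat_country"])
    .mean()
    .mul(100)
    .sort_values(ascending=False)
    .rename("Success %")
    .reset_index()
)
print(success_pct)
```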

Now let us check the number of launches by these countries and their successes over time.

***The blue line indicates the total number of missions and the red line indicates the successful launches.***

**USA** - We can see the USA had hiccups initially in the late 50s when their program started up. There was a single year in the 1960s where only 50 missions out of 60 succeeded. Things became relatively stable after that.

**China** - China started their missions in 1970 and have had an excellent success %. The number of launches has grown exponentially since ~2014.

**Russia** - The USSR had a tremendous space program, with almost 100 launches a year in the 1970s. They had some issues in the latter half of the 1960s. The number of launches dropped significantly in the 1980s and fell off a cliff in the 1990s after the Soviet split. The number of launches has recovered slightly in the second half of the current decade.

**Japan** - Japan started its missions in the late 1960s with no success for a few years. 1979 and 2000 were wipeouts for Japan. They have also started increasing their launches in the current decade with a good success rate.

**Europe** - Apart from 2 failed missions in 1970, Europe has a successful record.

**India** - India had a patchy 3 decades from 1979 to around 2010. Since then both the number of missions and the success rate have grown.


In [None]:
dataset_new = dataset[dataset["Sat_country"].isin(["Russia","USA","Europe","China","Japan","India"])]

f,ax=plt.subplots(2,3,figsize=(12,15))
uniq_country = dataset_new["Sat_country"].unique()
ind=0
for i in range(2):
  for j in range(3):
    # Rows for the current country (one per launch year)
    country_rows=data_gp_new.loc[data_gp_new["Sat_country"]==uniq_country[ind]]
    y=country_rows["Total_Launch"]
    y_success=country_rows["Success"]
    # Blue: total launches; red: successful launches
    ax[i][j].plot(country_rows["Launch_Year"],y,color="b")
    ax[i][j].plot(country_rows["Launch_Year"],y_success,color="r")
    yint = []
    #Uncomment the lines below to get only integer values for the y-axis. They are commented out to reduce runtime.
    #locs=ax[i][j].get_yticks()
    #for each in locs:
    #  yint.append(int(each))
    #ax[i][j].set_yticks(yint)
    ax[i][j].set_title(uniq_country[ind])
    ind+=1

Now let us compare the journeys of the USA and USSR/Russia, since they were locked in a contest to be the pioneers of space during the Cold War.

First let us compare the number of missions across the decades since 1950.

From 1950-60, the USA led in terms of number of missions, although most of them were failures. Then from 1960 to 1990 USSR/Russia was way ahead of the USA in terms of missions. Since the fall of the USSR, the USA has been dominating the number of missions.

In [None]:
dataset_new["Launch_Year_Bin"] = pd.cut(dataset_new["Launch_Year"],[1950,1960,1970,1980,1990,2000,2010,2020])
data_gp_new["Launch_Year_Bin"] = pd.cut(data_gp_new["Launch_Year"],[1950,1960,1970,1980,1990,2000,2010,2020])
data_gp_new["Launch_Year_Bin"]=data_gp_new["Launch_Year_Bin"].astype("str")


df_total_plot = pd.DataFrame(data_gp_new[data_gp_new["Sat_country"].isin(["USA","Russia"])].
groupby(["Sat_country","Launch_Year_Bin"])["Total_Launch"].sum()).reset_index(level=["Launch_Year_Bin","Sat_country"])

df_total_plot
df_total_plot["Launch_Year_Bin"] = df_total_plot["Launch_Year_Bin"].replace(["(1950, 1960]","(1960, 1970]",
                                                                             "(1970, 1980]","(1980, 1990]",
                                                                             "(1990, 2000]","(2000, 2010]","(2010, 2020]"]
                                                                            ,["1950s-1960s","1960s-1970s","1970s-1980s",
                                                                              "1980s-1990s","1990s-2000s","2000s-2010s","2010s-2020s"])

px.bar(df_total_plot,y="Launch_Year_Bin",x="Total_Launch",color="Sat_country",orientation="h")
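
The decade labels are produced above by cutting the years into intervals, converting them to strings and then replacing each string. As a possible shortcut, `pd.cut` accepts a `labels` argument, so the readable names can be attached in one step. The sketch below uses a handful of example years and the same bin edges:

```python
import pandas as pd

years = pd.Series([1957, 1960, 1961, 1999, 2020], name="Launch_Year")
edges = [1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020]
labels = ["1950s-1960s", "1960s-1970s", "1970s-1980s", "1980s-1990s",
          "1990s-2000s", "2000s-2010s", "2010s-2020s"]

# Intervals are right-closed by default, so 1960 falls in (1950, 1960].
bins = pd.cut(years, bins=edges, labels=labels).astype(str)
print(bins.tolist())
```

One thing to keep in mind with these edges is that a boundary year such as 1960 or 1970 is counted in the earlier bin.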

Since the USSR/Russia mission success percentage is comparable to that of the USA, the number of successful missions shows a similar pattern to that of the total missions.

In [None]:
df_successn_plot = pd.DataFrame(data_gp_new[data_gp_new["Sat_country"].isin(["USA","Russia"])].groupby(["Sat_country","Launch_Year_Bin"])["Success"].sum()).reset_index(level=["Launch_Year_Bin","Sat_country"])

df_successn_plot["Launch_Year_Bin"] = df_successn_plot["Launch_Year_Bin"].replace(["(1950, 1960]","(1960, 1970]",
                                                                             "(1970, 1980]","(1980, 1990]",
                                                                             "(1990, 2000]","(2000, 2010]","(2010, 2020]"]
                                                                            ,["1950s-1960s","1960s-1970s","1970s-1980s",
                                                                              "1980s-1990s","1990s-2000s","2000s-2010s","2010s-2020s"])

px.bar(df_successn_plot,y="Launch_Year_Bin",x="Success",color="Sat_country",orientation="h")

As far as the success rate across decades is concerned, we can see that USSR/Russia was always slightly ahead of the USA until 1990. After that, the USA has had a higher rate for the last two decades, at around 95% compared to Russia's 92-93%.

In [None]:
f,ax = plt.subplots(figsize=(12,7))

df_success_plot = pd.DataFrame(data_gp_new[data_gp_new["Sat_country"].isin(["USA","Russia"])].groupby(["Sat_country","Launch_Year_Bin"])["Success"].sum()/  data_gp_new[data_gp_new["Sat_country"].isin(["USA","Russia"])].groupby(["Sat_country","Launch_Year_Bin"])["Total_Launch"].sum()).reset_index(level=["Launch_Year_Bin","Sat_country"])

df_success_plot.rename(columns={0:"S_Perc"},inplace=True)

sns.barplot(x="Launch_Year_Bin",y="S_Perc",data=df_success_plot,hue="Sat_country",ax=ax)

ax.set_title("% of Successful Missions")

for patch in ax.patches:
  height = patch.get_height()
  width = patch.get_x()+patch.get_width()
  ax.text(width-.35,height+.01,"{:.1%}".format(height))

# Relation between the month, country of launch and the success rate

This is one area I found to be particularly interesting. Due to geographical factors such as weather or other topographical features of the launch area, the success rate might vary.

We have plotted the total number of launches and the success rate across the months of a year for various launch countries.
Months are numbered from 1 to 12 (Jan to Dec).

Please note that here the country is the country from which the launch was done/planned. The country we used earlier was the country out of which the company operated.

We have also plotted the total number of launches, so that a month with very few launches is not mistaken for a real pattern.

1. China - Significantly low success rate in July
2. France - Low success rate in May
3. India - Low success in August, but the number of missions is also too low to identify a pattern
4. Japan - Low success in April, but the sample size in April is too small
5. Kazakhstan - Low success in February and April
6. Russia - Low success in June and December
7. USA - Low success in August

If I were a part of any of the space programs I would avoid a combination of these months and areas. :)
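
Because several of the observations above rest on very few launches, one option is to hide the success rate for months below a minimum sample size before reading anything into it. This is only a sketch: the counts are made up and `MIN_LAUNCHES` is a hypothetical cut-off, not a value used anywhere in this notebook.

```python
import pandas as pd

# Toy per-month counts for one launch country (numbers are made up).
monthly = pd.DataFrame({
    "Launch_Month": [1, 2, 3, 4],
    "Success": [18, 1, 9, 0],
    "Failure": [2, 1, 1, 2],
})
monthly["Total_Launch"] = monthly["Success"] + monthly["Failure"]
monthly["SuccessRate"] = monthly["Success"] / monthly["Total_Launch"]

MIN_LAUNCHES = 5  # hypothetical cut-off, not a value from the notebook

# Keep the rate only where there are enough launches; the rest becomes NaN.
monthly["SuccessRate_masked"] = monthly["SuccessRate"].where(
    monthly["Total_Launch"] >= MIN_LAUNCHES
)
print(monthly)
```

Here months 2 and 4 have only two launches each, so their rates are masked rather than shown as 50% and 0%.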

### The chart below also shows how to plot with a secondary axis

In [None]:
dataset['Launch_Month'] = pd.DatetimeIndex(dataset['Launch_Date']).month
data_gp_month= pd.pivot_table(dataset,index=["Country","Launch_Month"],columns="StatusMission",values="Company Name",aggfunc="count").reset_index().fillna(0)
data_gp_month["SuccessRate"] = round(data_gp_month["Success"]/(data_gp_month["Success"]+data_gp_month["Failure"]),2) 
data_gp_month["Total_Launch"] = (data_gp_month["Success"]+data_gp_month["Failure"]) 
data_gp_month_sr= pd.pivot_table(data_gp_month,index=["Country"],columns=["Launch_Month"],values="SuccessRate",aggfunc=np.sum).fillna(0).reset_index()
data_gp_month_t =  pd.pivot_table(data_gp_month,index=["Country"],columns=["Launch_Month"],values="Total_Launch",aggfunc=np.sum).fillna(0).reset_index()
data_gp_month_sr.columns.name = ""
data_gp_month_sr=data_gp_month_sr[data_gp_month_sr["Country"].isin(["China","France","India","Japan","Kazakhstan","Russia","USA"])].reset_index(drop=True)
data_gp_month_t=data_gp_month_t[data_gp_month_t["Country"].isin(["China","France","India","Japan","Kazakhstan","Russia","USA"])].reset_index(drop=True)

f,ax = plt.subplots(7,1,figsize=(15,30))
row=0
for i in range(7):
    ax[i].plot(data_gp_month_t.columns.values[1:], data_gp_month_t.iloc[i][1:])
    ax[i].set_title(data_gp_month_t.loc[row]["Country"])
    if i ==6:
        ax[i].set_xlabel('Months (Jan-Dec)')
    ax[i].set_ylabel('No of Launches')
    ax2 = ax[i].twinx()
    ax2.plot(data_gp_month_sr.columns.values[1:], data_gp_month_sr.iloc[i][1:],color="g")

    ax2.set_ylabel('Success rate')
    row+=1
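
Stripped of the loop and the pivots, the secondary-axis technique used above comes down to `twinx()`. Here is a minimal standalone sketch with made-up monthly numbers:

```python
import matplotlib.pyplot as plt

months = list(range(1, 13))
# Made-up launch counts and success rates, one value per month.
launches = [30, 25, 28, 27, 22, 35, 31, 29, 33, 30, 26, 40]
success_rate = [0.90, 0.88, 0.93, 0.85, 0.91, 0.89, 0.80, 0.92, 0.94, 0.90, 0.87, 0.95]

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(months, launches, color="b")
ax.set_xlabel("Month")
ax.set_ylabel("No of launches")

# twinx() creates a second y-axis that shares the same x-axis.
ax2 = ax.twinx()
ax2.plot(months, success_rate, color="g")
ax2.set_ylabel("Success rate")
ax2.set_ylim(0, 1)
```

The left axis carries the launch counts and the right axis the success rate, so two series with very different scales can share one panel.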

Let us check the top 10 companies in terms of number of missions.

RVSN USSR is way ahead of all the others; as we saw earlier, USSR/Russia had exponential growth in the number of launches in the 1960s and 1970s.

In [None]:
data_gp_c = dataset.groupby(["Company Name","Launch_Year","StatusMission"])["Country"].count().reset_index()
data_gp_c = pd.pivot_table(data_gp_c,index=["Company Name","Launch_Year"],columns=["StatusMission"],values="Country",aggfunc=np.sum).reset_index()
data_gp_c.columns.name=""
data_gp_c.fillna(0,inplace=True)
data_gp_c["Total_Launch"] = data_gp_c["Failure"]+data_gp_c["Success"]

#Count of Launches
data_gp_ct = data_gp_c.groupby("Company Name")["Total_Launch"].sum().sort_values(ascending=False)[:10]
data_gp_ct = data_gp_ct.reset_index()
px.bar(data_gp_ct,x="Company Name",y="Total_Launch",color="Company Name")
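
If only the launch counts per company are needed (and not the success/failure split), a shorter route is `value_counts`. The sketch below uses made-up rows; the column name follows the dataset.

```python
import pandas as pd

# Made-up launch rows, one per launch.
toy = pd.DataFrame(
    {"Company Name": ["RVSN USSR"] * 5 + ["NASA"] * 3 + ["CASC"] * 2 + ["ISRO"]}
)

# value_counts sorts by frequency, so head(n) gives the n busiest companies.
top = (
    toy["Company Name"].value_counts().head(3)
    .rename_axis("Company Name")
    .reset_index(name="Total_Launch")
)
print(top)
```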

Let us check the % success for these companies. ULA has been the most successful while US Air Force has the lowest success rate in this lot.

In [None]:
data_gp_cn = data_gp_c[data_gp_c["Company Name"].isin(data_gp_ct["Company Name"])]

#f,ax = plt.subplots(figsize=(8,7))
data_g = (data_gp_cn.groupby("Company Name")["Success"].sum()/  data_gp_cn.groupby("Company Name")["Total_Launch"].sum()).sort_values(ascending=False)
data_g= data_g.reset_index().rename(columns={0:"Success %"})
data_g["Success %"]*=100
px.bar(data_g,x="Company Name",y="Success %",color="Company Name")
#ax.set_title("% of Successful Missions")
#for patch in ax.patches:
#  height = patch.get_height()
#  width = patch.get_x()+patch.get_width()
#  ax.text(width-.5,height+.01,"{:.1%}".format(height))

We have plotted the total number of missions and the number of successful missions against time for the top 6 companies.

NASA and General Dynamics had some initial issues, after which they stabilized.

RVSN USSR had some issues in the second half of the 1960s when they were ramping up their number of missions.

In [None]:
f,ax=plt.subplots(2,3,figsize=(12,15))
uniq_comp = ["Arianespace","CASC","General Dynamics","NASA","RVSN USSR","VKS RF"]
ind=0
for i in range(2):
  for j in range(3):
    y=data_gp_c.loc[data_gp_c["Company Name"]==uniq_comp[ind]]["Total_Launch"]
    y_success=data_gp_c.loc[data_gp_c["Company Name"]==uniq_comp[ind]]["Success"]
    ax[i][j].plot(data_gp_c.loc[data_gp_c["Company Name"]==uniq_comp[ind]]["Launch_Year"],
    y,color="b")
    ax[i][j].plot(data_gp_c.loc[data_gp_c["Company Name"]==uniq_comp[ind]]["Launch_Year"],
    y_success,color="r")
    yint = []
    #Uncomment the lines below to get only integer values for the y-axis. They are commented out to reduce runtime.
    #locs=ax[i][j].get_yticks()
    #for each in locs:
    #  yint.append(int(each))
    #ax[i][j].set_yticks(yint)
    ax[i][j].set_title(uniq_comp[ind])
    ind+=1