<h1>Space Mission Analysis from 1957</h1>
<center><img src="https://i.imgur.com/9hLRsjZ.jpg" height=400></center>

<h2>Import statements</h2>

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from iso3166 import countries
from datetime import datetime, timedelta

In [None]:
path = "../input/all-space-missions-from-1957/Space_Corrected.csv"

df_data = pd.read_csv(path)

<h2>Data Exploration</h2>

In [None]:
df_data.head()

In [None]:
df_data.shape

In [None]:
df_data.describe()

In [None]:
df_data.dtypes

<h2>Data Cleaning - Check for any NULL values or duplicates</h2>

In [None]:
df_data.isna().any()

In [None]:
df_data.duplicated().any()
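
`isna().any()` and `duplicated().any()` only report whether a problem exists, not how large it is. As a minimal sketch on a toy frame (the values below are invented for illustration and are not taken from the dataset), the counts could be inspected like this:

```python
import pandas as pd

# Toy data, not the real dataset: one missing price and one repeated row.
toy = pd.DataFrame({
    "Company Name": ["A", "B", "A", "C"],
    "Price": ["50.0", None, "50.0", "1,160.0"],
})

# Number of missing values in each column.
missing_per_column = toy.isna().sum()

# Number of rows that exactly repeat an earlier row.
duplicate_rows = int(toy.duplicated().sum())

print(missing_per_column)
print(duplicate_rows)
```

Running the same two calls on `df_data` would show which columns contain the missing values before deciding how to handle them.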

<h2>Number of Launches per Company</h2>
   

In [None]:
organisation = df_data["Company Name"].value_counts()[:10]
organisation.head()

In [None]:
fig = px.bar(df_data, x=organisation.index, y=organisation.values, color=organisation.values, color_continuous_scale='turbo', title="Number of launches of top 10 organisations")
fig.update_layout(xaxis_title="Name of the organisation", yaxis_title="Space Mission Launches")
fig.show()

<h2>Number of Active versus Retired Rockets</h2>

In [None]:
racket = df_data["Status Rocket"].value_counts()
racket

In [None]:
# Take the labels from the counts themselves so they cannot be swapped by a hard-coded order.
rack = px.pie(df_data, values=racket.values, names=racket.index, title="Status of rockets", color_discrete_sequence=["darkblue", "yellow"])
rack.update_traces(textfont_size=20, hoverinfo='label+percent')
rack.show()

<h2>Distribution of Mission Status</h2>

In [None]:
mission_status = df_data["Status Mission"].value_counts()
mission_status

In [None]:
mis = px.pie(df_data, names=mission_status.index, values=mission_status.values, hole=0.5, title="Mission status of rockets")
mis.update_traces(textfont_size=15, textposition="outside")
mis.show()

<h2>Expenses on Launches</h2>

In [None]:
df_data = df_data.rename(columns={" Rocket": "Price"})
df_data.columns

In [None]:
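# Drops every row with a missing value in any column.
# All later cells therefore work on this reduced frame.
df_data = df_data.dropna()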


In [None]:
# Work on an explicit copy so that later column assignments do not touch df_data.
price = df_data[df_data.Price != "nan"].copy()
price.shape

In [None]:
# Remove thousands separators (e.g. "1,160.0") before converting to numbers.
price["Price"] = price["Price"].astype(str).str.replace(',', '')
price["Price"] = pd.to_numeric(price["Price"])

In [None]:
price.dtypes

In [None]:
fig = sns.displot(price, x="Price", bins=100, aspect=2)
fig.set(xlim=(0,5000))
plt.show()

In [None]:
fig = sns.displot(price, x="Price", bins=250, aspect=2, color="red")
plt.title("Launch prices up to 500 (million USD)")
fig.set(xlim=(0,500))
plt.show()

The histograms show that most launches cost up to 500 (the dataset reports prices in millions of USD). A few far more expensive launches (about 5000) exist, but they are rare. The most common launch cost is approximately 50.
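
The reading of the histogram can be checked numerically. The following is a rough sketch on invented prices (the values, the 500 threshold and the 50-wide bands are assumptions for illustration); the same two expressions could be applied to `price["Price"]`:

```python
import pandas as pd

# Toy prices, invented for illustration only.
toy_prices = pd.Series([30.0, 45.0, 50.0, 50.0, 62.0, 450.0, 5000.0])

# Fraction of launches at or below the 500 threshold.
share_up_to_500 = (toy_prices <= 500).mean()

# Most frequent price band, using 50-wide bins over the 0-500 range.
# Values above 500 fall outside every bin and are ignored here.
bands = pd.cut(toy_prices, bins=list(range(0, 501, 50)))
most_common_band = bands.value_counts().idxmax()

print(f"{share_up_to_500:.2f}")
print(most_common_band)
```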

<h2>Choropleth Map Shows the Number of Launches by Country</h2>

In [None]:
df_data["Country"] = df_data.Location.str.split(",").str[-1].str.strip()

In [None]:
# Map launch-site names and short country names to the names used by iso3166.
list_countries = {
    'Gran Canaria': 'USA',
    'Barents Sea': 'Russian Federation',
    'Russia': 'Russian Federation',
    'Pacific Missile Range Facility': 'USA',
    'Shahrud Missile Test Site': 'Iran, Islamic Republic of',
    'Yellow Sea': 'China',
    'New Mexico': 'USA',
    'Iran': 'Iran, Islamic Republic of',
    'North Korea': 'Korea, Democratic People\'s Republic of',
    'Pacific Ocean': 'United States Minor Outlying Islands',
    'South Korea': 'Korea, Republic of',
}
df_data["Country"] = df_data["Country"].replace(list_countries)

In [None]:
def convert_iso(country):
    # Look up the ISO 3166 alpha-3 code for a country name.
    return countries.get(country).alpha3

df_data['ISO'] = df_data.Country.apply(convert_iso)

In [None]:
iso = df_data.ISO.value_counts()

In [None]:
maps = px.choropleth(df_data, locations=iso.index, color=iso.values, hover_name=iso.index, title='Number of Launches', color_continuous_scale="Viridis")
maps.show()

<h2>Sunburst Chart of the countries, organisations, and mission status</h2>

In [None]:
df_sunburst = df_data.groupby(
    ['Country','Company Name', 'Status Mission'], 
    as_index=False).agg(
    {'Status Rocket': pd.Series.count})


In [None]:
sunburst = px.sunburst(df_sunburst, path=['Country','Company Name', 'Status Mission'], values="Status Rocket", title='Global Mission Status')
sunburst.show()

<h2>Analysis of the Total Amount of Money Spent by Organisation on Space Missions</h2>

In [None]:
df_money = price.groupby("Company Name", as_index=False).agg({"Price": pd.Series.sum})
df_money = df_money.sort_values("Price", ascending=False)


In [None]:
fig = px.bar(df_money, x="Company Name", y="Price", title="Total Amount of Money Spent by Organisation on Space Missions")
fig.show()

NASA, a US organisation, has spent the most on its launches.

In [None]:
fig = px.bar(x=df_money["Company Name"][1:11], y=df_money["Price"][1:11])
fig.update_layout(xaxis_title="Name of Organisation", yaxis_title="Price", title="Total Amount of Money Spent on Missions<br>by the Top 10 Organisations (after NASA)")
fig.show()

<h2>Analysis of the Amount of Money Spent by Organisation per Launch</h2>

In [None]:
df_moneyavr = price.groupby("Company Name", as_index=False).agg({"Price": pd.Series.mean})
df_moneyavr = df_moneyavr.sort_values("Price", ascending=False)


In [None]:
fig = px.bar(df_moneyavr, x="Company Name", y="Price", title="Average Amount of Money Spent by Organisation per Launch")
fig.show()
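
The total and the per-launch average can also be computed in a single pass, which makes it easier to see whether a large total comes from many launches or from expensive ones. This is only a sketch on a toy frame; the companies, prices and the result column names (`total`, `per_launch`, `launches`) are made up for illustration:

```python
import pandas as pd

# Toy data, not the real dataset.
toy = pd.DataFrame({
    "Company Name": ["A", "A", "B", "B", "B"],
    "Price": [100.0, 300.0, 50.0, 50.0, 80.0],
})

# Named aggregation: one output column per statistic.
spend = (
    toy.groupby("Company Name")["Price"]
    .agg(total="sum", per_launch="mean", launches="count")
    .sort_values("total", ascending=False)
)

print(spend)
```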

<h2>Number of Launches per Year</h2>

In [None]:
df_data["Date"] = pd.to_datetime(df_data["Datum"])

In [None]:
# The lambda argument is named d so it does not shadow the imported datetime class.
df_data['Year'] = df_data['Date'].apply(lambda d: d.year)

In [None]:
sns.histplot(df_data, x="Year", kde=True, bins=30)
plt.show()

The distribution shows that the number of launches increases over time.
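
A histogram with 30 bins groups several years together, so the trend can also be checked on exact yearly counts. A minimal sketch, using invented years rather than the real `Year` column:

```python
import pandas as pd

# Toy launch years, invented for illustration only.
toy_years = pd.Series(
    [2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002, 2002], name="Year"
)

# Exact number of launches in each year, in chronological order.
launches_per_year = toy_years.value_counts().sort_index()

# Change from one year to the next; the first year has no predecessor.
yearly_change = launches_per_year.diff().dropna()

# True only if every year has more launches than the year before it.
is_increasing = bool((yearly_change > 0).all())

print(launches_per_year)
print(is_increasing)
```

On real data a strict year-by-year increase is unlikely, so the sign and size of `yearly_change` are usually more informative than the single boolean.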

In [None]:
# nunique() counts the distinct years in which each company launched, not the launches themselves.
lanches_year = df_data.groupby(["Company Name"])["Year"].nunique().reset_index()
lanches_year.columns = ["Company Name", "Count"]
lanches_year = lanches_year.sort_values("Count", ascending=False)


In [None]:
fig = px.bar(lanches_year, x="Company Name", y="Count", title="Number of Years with Launches by Company", color="Count", color_continuous_scale=px.colors.sequential.Sunsetdark)
fig.show()

In [None]:
# nunique() counts the distinct years in which each country launched, not the launches themselves.
lanches_con = df_data.groupby(["Country"])["Year"].nunique().reset_index()
lanches_con.columns = ["Country", "Count"]
lanches_con = lanches_con.sort_values("Count", ascending=False)

In [None]:
fig = px.bar(lanches_con, x="Country", y="Count", title="Number of Years with Launches by Country", color="Count", color_continuous_scale=px.colors.sequential.Sunset)
fig.show()

<h2>Year-on-Year Chart Showing Which Country Performs the Most Launches</h2>

In [None]:
year_board = df_data.groupby(["Year", "Country"], as_index=False).agg({"Status Mission": pd.Series.count})


In [None]:
fig = px.line(x=year_board["Year"], y=year_board["Status Mission"], color=year_board["Country"])
fig.update_layout(xaxis_title="Year", yaxis_title="Number of Launches", yaxis_range=(0,30))
fig.show()

The chart shows that the USA and China currently lead in the number of space missions.
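
With many overlapping lines, the leader in a given year can be hard to read off the chart. One possible way to extract it directly is sketched below on a toy frame (the years and counts are invented); it assumes one row per launch, as in `df_data`:

```python
import pandas as pd

# Toy data: one row per launch, invented for illustration only.
toy = pd.DataFrame({
    "Year": [2018, 2018, 2018, 2019, 2019, 2019],
    "Country": ["USA", "USA", "China", "China", "China", "USA"],
})

# Rows are years, columns are countries, cells are launch counts.
counts = toy.groupby(["Year", "Country"]).size().unstack(fill_value=0)

# Country with the highest count in each year.
# In case of a tie, idxmax returns the first matching column.
leader_per_year = counts.idxmax(axis=1)

print(leader_per_year)
```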

<h2>Number of Launches Month-on-Month</h2>

In [None]:
# The lambda argument is named d so it does not shadow the imported datetime class.
df_data['Month'] = df_data['Date'].apply(lambda d: d.month)

In [None]:
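# Note: nunique() counts the distinct companies that launched in each month, not the individual launches.
month = df_data.groupby(["Month"])["Company Name"].nunique().reset_index()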
month.columns = ["Month", "Count"]
month = month.sort_values("Month", ascending=True)


In [None]:
fig = px.bar(month, x="Month", y="Count", title="Number of Launches Month-on-Month")
fig.update_xaxes(tickmode="linear")
fig.show()

There is no visible relationship between the month and the number of launches: the missions are spread more or less evenly over the year. The highest counts, however, appear in May and June.
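
Note that `nunique()` on "Company Name" counts how many different companies launched in a month, which is not the same as the number of launches. A small sketch on invented data shows the difference, with `size()` as one way to count every launch (the result column names `Launches` and `Companies` are made up for illustration):

```python
import pandas as pd

# Toy data: one row per launch, invented for illustration only.
toy = pd.DataFrame({
    "Month": [5, 5, 5, 6, 6, 12],
    "Company Name": ["A", "A", "B", "A", "A", "C"],
})

# size() counts rows, i.e. individual launches, in each month.
launches = toy.groupby("Month").size().rename("Launches")

# nunique() counts distinct companies in each month.
companies = toy.groupby("Month")["Company Name"].nunique().rename("Companies")

per_month = pd.concat([launches, companies], axis=1).reset_index()
print(per_month)
```

In this toy frame May has three launches but only two companies, so the two measures can rank months differently.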

<h2>Total Number of Mission Failures</h2>

In [None]:
fail_df = df_data[df_data['Status Mission'] == 'Failure']
fail_df = fail_df.groupby("Country", as_index=False).agg({"Status Mission": pd.Series.count})
fail_df = fail_df.sort_values("Status Mission", ascending=False)


In [None]:
fail_df["Country"] = fail_df.Country.str.split(",").str[0].str.strip()

In [None]:
fig = px.bar(fail_df, x="Country", y="Status Mission", title="Total Number of Mission Failures by Country", color="Status Mission", color_continuous_scale=px.colors.sequential.Inferno_r)
fig.show()

In [None]:
fail_df_com = df_data[df_data['Status Mission'] == 'Failure']
fail_df_com = fail_df_com.groupby("Company Name", as_index=False).agg({"Status Mission": pd.Series.count})
fail_df_com = fail_df_com.sort_values("Status Mission", ascending=False)

In [None]:
fig = px.bar(fail_df_com, x="Company Name", y="Status Mission", title="Total Number of Mission Failures by Organisation", color="Status Mission",color_continuous_scale=px.colors.sequential.Inferno_r)
fig.show()

<h2>Total Number of Successful Missions</h2>

In [None]:
suc_df = df_data[df_data['Status Mission'] == 'Success']
suc_df = suc_df.groupby("Country", as_index=False).agg({"Status Mission": pd.Series.count})
suc_df = suc_df.sort_values("Status Mission", ascending=False)


In [None]:
fig = px.bar(suc_df, x="Country", y="Status Mission", title="Total Number of Successful Missions by Country", color="Status Mission", color_continuous_scale=px.colors.sequential.Inferno_r)
fig.show()

Comparing failed and successful missions worldwide, the successful missions outnumber the failures many times over.
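
Absolute counts favour countries with many launches, so a success rate per country may give a fairer comparison. The following is a sketch on invented data; it treats only the exact status "Success" as a success, which is an assumption about how partial failures should be counted:

```python
import pandas as pd

# Toy data: one row per launch, invented for illustration only.
toy = pd.DataFrame({
    "Country": ["X", "X", "X", "X", "Y", "Y"],
    "Status Mission": [
        "Success", "Success", "Success", "Failure",
        "Success", "Partial Failure",
    ],
})

# True where the mission succeeded; every other status counts as not successful.
is_success = toy["Status Mission"].eq("Success")

# The mean of a boolean column is the share of True values per country.
success_rate = is_success.groupby(toy["Country"]).mean()

print(success_rate)
```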

<h2>Origin of the companies</h2>

In [None]:
company_origin = df_data.groupby(["Country"])["Company Name"].nunique().reset_index()
company_origin.columns = ["Country", "Numbers of companies"]
company_origin = company_origin.sort_values("Numbers of companies", ascending=False)

In [None]:
company_origin["Country"] = company_origin.Country.str.split(",").str[0].str.strip()

In [None]:
org = px.bar(company_origin, x="Country", y="Numbers of companies", color="Numbers of companies", color_continuous_scale=px.colors.sequential.Magma, title="Number of companies per country")
org.update_traces(marker_coloraxis=None)
org.show()

The USA has the largest number of companies, which may help explain why it also has the highest number of launches.

*Thank you for checking my notebook. Please leave some comments and feedback*