# Data Exploration of the Space-Missions-From-1957 Data File

By **Jared Chung**

________________________________________________________________________________________________________________________________________________________

## Introduction

An exploratory analysis of the Space-Missions-From-1957 data file. This notebook walks through data cleaning, manipulation, and visualisation for several analyses.


### Acknowledgements

All credit for collecting the data (from Kaggle) goes to the original author.

### Table of contents
- [Data mining](#s2)
- [Data modification](#s3)
- [Data Analysis](#s4)
- [Conclusions](#s5)

<a id="s2"></a>
### Data mining
________________________________________________________________________________________________________________________________________________________________________
Importing the necessary libraries

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
# data graphing
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import datetime 

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


In [None]:
# Reading in data file
ds = pd.read_csv("../input/all-space-missions-from-1957/Space_Corrected.csv")

Checking that the file was read in without errors

In [None]:
ds.head()

In [None]:
# Checking number of rows and cols in the data file
ds.shape

The data file contains 4324 records (rows) and 9 columns.

Next, look at what the columns mean.

In [None]:
# Label header of column
ds.columns

Both 'Unnamed: 0' and 'Unnamed: 0.1' serve the same purpose of identifying individual space missions, so one of them should be dropped.

The remaining column acts as an ID, so it should be renamed accordingly.
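
Before dropping one of the two columns, it can be worth confirming that they really are duplicates. The snippet below is a minimal sketch on a hypothetical toy frame (`toy` and its values are made up for illustration, not taken from the CSV); the same check could be run on `ds`.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real CSV
toy = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "Unnamed: 0.1": [0, 1, 2],
    "Company Name": ["A", "B", "C"],
})

# True only if both columns hold the same values row by row
columns_match = toy["Unnamed: 0"].equals(toy["Unnamed: 0.1"])
print(columns_match)
```

If the check returned `False`, the two columns would need a closer look before either is removed.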

In [None]:
# Drop one of the columns 
ds = ds.drop(columns = ["Unnamed: 0"])

# Giving better column name
ds = ds.rename(columns={"Unnamed: 0.1" : "ID"})

In [None]:
# Checking Null values
ds.isna().sum()

Only 'Rocket' has null values. This may be because some launches were test missions whose rocket had no name. It does not affect the analyses below.
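
To judge how much of a column is missing, the null count can be expressed as a percentage. This is a small sketch on hypothetical data (the `toy` frame and its values are assumptions for illustration only).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; half of the "Rocket" entries are missing
toy = pd.DataFrame({
    "Company Name": ["A", "B", "C", "D"],
    "Rocket": ["50.0", np.nan, np.nan, "29.75"],
})

# Number of missing values per column
missing_count = toy.isna().sum()

# Share of missing values per column, in percent
missing_share = toy.isna().mean() * 100

print(missing_count)
print(missing_share)
```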

<a id="s3"></a>
### Data Modification
________________________________________________________________________________________________________________________________________________________________________

As the **Datum** column holds both the date and the time in a single field, the data needs to be separated for further analysis.
This can be achieved by deriving two new columns from it, namely **Month** and **Year**.

Furthermore, **Location** gives a full address, which is not very useful as it stands. However, the country it contains can be extracted for further analysis.

In [None]:
# Changing data type to datetime
ds.Datum = ds.Datum.apply(pd.to_datetime)

In [None]:
# Create a new column 'Date' to separate the date from the time
ds['Date'] = [d.date() for d in ds['Datum']]

In [None]:
# Extracting year and month from 'Date' to new columns
ds['Year'] = pd.DatetimeIndex(ds['Date']).year
ds['Month'] = pd.DatetimeIndex(ds['Date']).month

In [None]:
# Checking the data with the new columns
ds.head()

In [None]:
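# Getting the country: it is the last comma-separated part of the location
df = ds["Location"].apply(lambda x: x.split(","))
# strip() keeps multi-word names whole (e.g. "New Zealand" rather than "New")
ds["country"] = df.apply(lambda x: x[-1].strip())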

In [None]:
ds.head()

All the required columns have now been generated.
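
One detail worth checking when extracting the country is how multi-word names are handled. The sketch below compares two ways of reading the last comma-separated segment, using pandas string methods on a hypothetical toy frame (the location strings are illustrative assumptions in the same style as the data file).

```python
import pandas as pd

# Hypothetical locations in the same "site, ..., country" style
toy = pd.DataFrame({
    "Location": [
        "LC-39A, Kennedy Space Center, Florida, USA",
        "Site 31/6, Baikonur Cosmodrome, Kazakhstan",
        "Rocket Lab LC-1A, Mahia Peninsula, New Zealand",
    ]
})

# Last comma-separated segment of each location
last_segment = toy["Location"].str.split(",").str[-1]

# Option 1: first word only, which truncates multi-word names
first_word = last_segment.str.split().str[0]

# Option 2: strip surrounding whitespace, keeping the full name
full_name = last_segment.str.strip()

print(first_word.tolist())
print(full_name.tolist())
```

Taking only the first word is fine for single-word countries but shortens names such as "New Zealand", so stripping whitespace is the safer choice when such names occur.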

<a id="s4"></a>
## Data analysis
________________________________________________________________________________________________________________________________________________________________________

- Number of space missions by country since 1957
- Number of space missions by company since 1957
- Number of space missions in the last decade (2010-2020)
- Percentage of successful missions
- Percentage of missions still active

In [None]:
# Gathering the data by Country
Country_list = ds.groupby("country")["ID"].count().sort_values()

In [None]:
Country_list.plot.barh()

In [None]:
# Getting the top 2
ds.groupby("country")["ID"].count().nlargest(2)

Russia and the USA are the top two countries by number of space missions since 1957, with Russia leading at 1395 missions and the USA following with 1344.
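
The same ranking can also be obtained with `value_counts`, which counts rows per category and sorts them in descending order. The sketch below uses a hypothetical toy column (the values are made up for illustration and do not reproduce the figures above).

```python
import pandas as pd

# Hypothetical toy data: one row per mission
toy = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4, 5],
    "country": ["Russia", "USA", "Russia", "China", "USA", "Russia"],
})

# Count missions per country, largest first
counts = toy["country"].value_counts()

# Keep the two countries with the most missions
top2 = counts.nlargest(2)
print(top2)
```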



In [None]:
# Top-10 Companies with space missions from 1957
ds.groupby("Company Name")["ID"].count().nlargest(10).sort_values().plot.barh()

In [None]:
# Company with the most missions and its mission count
company_counts = ds.groupby("Company Name")["ID"].count()
name = company_counts.idxmax()
value = company_counts.max()
print(name, 'has launched the most space missions:', value)

RVSN is owned by the Russian government, so the two results are consistent: Russia is the country and RVSN the company with the most space missions from 1957 to the present (2020).

In [None]:
# Numbers of space missions by year
ds.groupby("Year")["ID"].count().plot()

The graph above shows that space missions became frequent during the 1960s and 1970s, then declined from the 1980s until 2010. Mission numbers rise sharply again from 2015, but the count drops in 2020. This drop arises because 2020 was still in progress when the data was collected, so more missions may be added to the dataset by the end of the year.
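
Because the final year is incomplete, one option is to leave it out when comparing yearly totals. This is a minimal sketch on hypothetical toy data (the `toy` frame and its values are assumptions; the cutoff year 2020 comes from the note above).

```python
import pandas as pd

# Hypothetical toy data: one row per mission with its launch year
toy = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4, 5],
    "Year": [2018, 2018, 2019, 2019, 2019, 2020],
})

# Missions per year, including the partial final year
per_year = toy.groupby("Year")["ID"].count()

# Keep only years that were complete when the data was collected
complete_years = per_year[per_year.index < 2020]
print(complete_years)
```

Plotting `complete_years` instead of `per_year` would avoid the artificial dip at the end of the line.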

In [None]:
# Keep only launches after 2010 (2011 onwards), then plot the top-10 companies
ds_filtered = ds.query('Year > 2010')
ds_filtered.groupby("Company Name")["ID"].count().nlargest(10).sort_values().plot.barh()

In [None]:
ds_filtered.groupby("country")["ID"].count().nlargest(10).sort_values().plot.barh()

In [None]:
ds_filtered.groupby("country")["ID"].count().nlargest(10)

During the last decade (launches after 2010), the USA and China prove to be the powerhouses of space missions, with a combined total of 371 missions. The USA contributed 229 of them.
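
The boundary of the year filter affects which launches count towards "the last decade". The sketch below, on hypothetical toy data (values made up for illustration), shows how a strict `>` comparison in `query` differs from `>=`.

```python
import pandas as pd

# Hypothetical toy data: one row per mission with its launch year
toy = pd.DataFrame({
    "ID": [0, 1, 2, 3],
    "Year": [2009, 2010, 2011, 2020],
})

# Strictly after 2010: launches from 2010 itself are excluded
after_2010 = toy.query("Year > 2010")

# From 2010 onwards: launches from 2010 are included
from_2010 = toy.query("Year >= 2010")

print(len(after_2010), len(from_2010))
```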

In [None]:
ds.groupby("Status Mission")["ID"].count().plot.pie(autopct='%1.2f%%')

In [None]:
ds_filtered.groupby("Status Mission")["ID"].count().plot.pie(autopct='%1.2f%%')

The all-time success rate since 1957 was 89.71%.

The success rate for the last decade was 93.45%.

The increase may be due to projects placing more emphasis on safety and having access to more advanced technology and resources in recent years.
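
The percentages shown in the pie charts can also be computed directly, which makes them easier to quote. This is a minimal sketch on hypothetical toy data; the status labels used here ("Success", "Failure", "Partial Failure") are assumed for illustration, and the resulting rate is not a figure from the dataset.

```python
import pandas as pd

# Hypothetical toy data: 8 successes out of 10 missions
toy = pd.DataFrame({
    "Status Mission": ["Success"] * 8 + ["Failure", "Partial Failure"],
})

# Share of each status, in percent
share = toy["Status Mission"].value_counts(normalize=True) * 100

# Success rate rounded to two decimals, as in the pie chart labels
success_rate = round(share["Success"], 2)
print(success_rate)
```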

In [None]:
ds.groupby("Status Rocket")["ID"].count().plot.pie(autopct='%1.2f%%')

Only 18.27% of the launched space missions remain active to this day.

<a id="s5"></a>
## Conclusion
________________________________________________________________________________________________________________________________________________________________________

This notebook has walked through data mining, cleaning, and manipulation, and finally an analysis of the space-missions-from-1957 data.
The findings are:
- Russia has the most space missions since 1957, with 1395 missions, and 'RVSN USSR' launched 1777 missions.
- During the last decade the USA leads with 229 missions, but CASC, a company owned by China, launched the most missions of any company.
- Space missions were popular during the 1960s and only became popular again in the 2010s.
- The all-time success rate of space missions is 89.71%, and the success rate for the last decade is 93.45%.
- Of all missions launched, only 18.27% remain active.