In [None]:
import os
import datetime
from collections import Counter

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# List the input files available in the Kaggle environment
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

Here we are given a dataset of all the rocket launches (space missions) since 1957. It would be very nice to see which countries are ahead in the race, in which year we launched the most rockets, and so on.

In this notebook I will also try to combine several variables in a single plot to surface more insights at once.

Let's begin by reading the data.

In [None]:
file = pd.read_csv("../input/all-space-missions-from-1957/Space_Corrected.csv")
file.head()

Whenever a dataset contains a date, it is worth extracting as much information from it as possible, since the individual components (year, month, day and so on) can reveal how events are distributed over time.

## Processing the Datum column

In [None]:
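def trim(s):
    # e.g. "Fri Aug 07, 2020 05:12 UTC" -> "Aug 07 2020 05:12"
    # Remove the comma, the trailing " UTC" (if present) and the leading weekday,
    # so strings with or without a time component keep their full year
    s = s.replace(",", "").replace(" UTC", "")
    return s.split(" ", 1)[1]
file["Datum"] = file["Datum"].apply(trim)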

In [None]:
# Helper function to add new date-derived columns from our date column

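def add_datepart(df, field_name, prefix=None, drop=True, time=False):
    "Helper function that adds columns relevant to a date in the column `field_name` of `df`."
    # `time` is accepted for compatibility with existing calls but is not used
    prefix = field_name + "_" if prefix is None else prefix
    dates = pd.DatetimeIndex(df[field_name])
    attr = ['year', 'month', 'day', 'hour', 'minute', 'second', 'is_month_start']
    for n in attr:
        df[prefix + n] = getattr(dates, n)
    # DatetimeIndex.week was removed in pandas 2.0; use the ISO calendar week instead
    df[prefix + 'week'] = dates.isocalendar().week.to_numpy()
    if drop: df.drop(field_name, axis=1, inplace=True)
    return df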
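The same columns can also be derived with the Series `.dt` accessor after an explicit `pd.to_datetime` call. This is a minimal sketch on two hypothetical, already-cleaned date strings; the explicit `format` assumes every string has the same layout, which may not hold for the real `Datum` column.

```python
import pandas as pd

# Hypothetical cleaned date strings (weekday, comma and " UTC" already removed)
df = pd.DataFrame({"Datum": ["Oct 04 1957 19:28", "Aug 07 2020 05:12"]})
dates = pd.to_datetime(df["Datum"], format="%b %d %Y %H:%M")

parts = pd.DataFrame({
    "date_year": dates.dt.year,
    "date_month": dates.dt.month,
    "date_week": dates.dt.isocalendar().week,  # ISO week; Series.dt.week was removed in pandas 2.0
    "date_day": dates.dt.day,
    "date_hour": dates.dt.hour,
    "date_minute": dates.dt.minute,
    "date_is_month_start": dates.dt.is_month_start,
})
print(parts)
```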

In [None]:
file = add_datepart(file, "Datum",prefix = "date_", time = True)

## Analyzing the Data with the **Date** variable

In [None]:
plt.rcParams["figure.figsize"] = 16 , 12
df = file.groupby(["date_year", "Status Mission"]).agg({"Status Mission": "count"}).unstack()
df.plot(kind = "bar", stacked= True)

plt.show()

The plot above reveals some interesting insights.

- The highest number of missions was launched in 2001, which also has the highest failure rate. This spike (and the 2002 exceptions noted below) may be a date-parsing artifact: slicing a date string that has no time component with `s[3:-3]` turns, for example, `"Sat Jul 26 1958"` into `" Jul 26 1"`, which can then be parsed as 2001 (and a 20xx date as 2002). It is worth verifying the year counts with the dates parsed robustly.
- Only three years had prelaunch failures: 1966, 2002 and 2016.
- The period 1965-1978 saw consistently high numbers of launches.
- Launches dropped noticeably during 2006-2015 but picked up again afterwards.
- Looking closely at the early years, when the failure rate was very high, 1962 stands out: the number of launches increased while the failure rate dropped sharply.
- The failure rate has clearly decreased over time. The exceptions are 2001 and 2002, although both years also had high launch counts.
- 2020 was disrupted by COVID-19 and a financial crisis, yet it still saw a substantial number of rocket launches. Its failure rate is comparatively high, though.
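The stacked bars show counts, so comparing failure *rates* across years means eyeballing proportions. A row-normalised crosstab turns the counts into per-year shares directly. This is a sketch on a toy frame; on the real data the columns would be `file["date_year"]` and `file["Status Mission"]`.

```python
import pandas as pd

# Toy stand-in for file[["date_year", "Status Mission"]]
toy = pd.DataFrame({
    "date_year": [1962, 1962, 1962, 1963, 1963],
    "Status Mission": ["Success", "Failure", "Success", "Success", "Success"],
})

# normalize="index" divides each row by its total, so each year's shares sum to 1
rates = pd.crosstab(toy["date_year"], toy["Status Mission"], normalize="index")
print(rates)
```

Calling `rates.plot(kind="bar", stacked=True)` would then draw every year as a full-height bar, making the failure share comparable regardless of how many launches a year had.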

In [None]:
plt.rcParams["figure.figsize"] = 12 , 8
df2 = file.groupby(["date_month", "Status Mission"]).agg({"date_month": "count"}).unstack()
df2.plot(kind = "bar", stacked = True)
plt.show()

- Of all launches since 1957, December is the month companies prefer most, with more than 400 launches. January, which follows this busy month, has the fewest. The remaining months are fairly evenly distributed.
- Failures are fairly distributed across all months.
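Instead of reading the busiest and quietest months off the chart, they can be picked out numerically. A minimal sketch on hypothetical month numbers standing in for `file["date_month"]`; it assumes every month appears at least once, since months with zero launches would not show up in `value_counts`.

```python
import pandas as pd

# Hypothetical month numbers standing in for file["date_month"]
months = pd.Series([12, 12, 12, 3, 3, 1], name="date_month")

# Launch counts per month, ordered January..December
counts = months.value_counts().sort_index()

busiest = counts.idxmax()   # month with the most launches
quietest = counts.idxmin()  # month with the fewest launches (among months that appear)
print(busiest, quietest)    # 12 1
```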

In [None]:
plt.rcParams["figure.figsize"] = 12 , 8
df5 = file.groupby(["date_day", "Status Mission"]).agg({"date_day": "count"}).unstack()
df5.plot(kind = "bar", stacked = True)
plt.show()

- There is no clear preference for any particular day of the month when launching rockets.

In [None]:
plt.rcParams["figure.figsize"] = 16 , 12
df3 = file.groupby(["date_year", "Status Rocket"]).agg({"Status Rocket": "count"}).unstack()
df3.plot(kind = "bar", stacked= True)
plt.title(" Rocket status according to the Year")
plt.show()

- As expected, most of the rockets from the early period are now retired, but some launches from as early as 1982 used rockets that are still active, which is a remarkable feat.
- Although 2001 had the most launches of any year, a fairly small share of them used rockets that are still active. 2002, by comparison, has a much better active-rocket share.
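Both observations can be checked with a boolean mask: the earliest year with an active-rocket launch, and the share of each year's launches that used an active rocket. A sketch on a toy frame; the `"StatusActive"` / `"StatusRetired"` labels are an assumption about how the `Status Rocket` column is encoded.

```python
import pandas as pd

# Toy stand-in for file[["date_year", "Status Rocket"]]; the labels are assumed
toy = pd.DataFrame({
    "date_year": [1970, 1982, 1990, 2001, 2001],
    "Status Rocket": ["StatusRetired", "StatusActive", "StatusActive",
                      "StatusRetired", "StatusRetired"],
})

is_active = toy["Status Rocket"] == "StatusActive"

# Earliest launch year whose rocket is still active
first_active_year = toy.loc[is_active, "date_year"].min()

# Mean of a boolean mask per year = share of that year's launches on active rockets
active_share = is_active.groupby(toy["date_year"]).mean()
print(first_active_year)
print(active_share)
```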

## Analyzing the Companies and their Rocket launches

In [None]:
df4 = file.groupby(["Company Name", "Status Rocket"]).agg({"Status Rocket": "count"}).unstack()
df4.plot(kind = "bar", stacked = True)
plt.show()

- Wow, the plot above shows that RVSN USSR (the Soviet Union's Strategic Rocket Forces) is the clear winner in terms of launches. Despite this high launch count, all of its rockets are now retired.
- Excluding this outlier, Arianespace, a French commercial launch provider, also has a large number of launches along with a decent number of active rockets.
- Government-backed space organizations such as NASA, CASC, ISRO and the US Air Force are also in the launch race.
- Apart from RVSN USSR, General Dynamics and the US Air Force also have substantial launch counts, and all of their rockets are retired too.
- Looking at the plot, the companies with the most active rockets are CASC, Arianespace, ULA, ISRO and Northrop.
- Some companies stand out, such as Sea Launch, Land Launch, ExPace and Exos, which have all their rockets in active status. Perhaps they are newer players in the market.
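With dozens of companies on the x-axis, the bars are easier to compare when sorted by total launches and limited to the largest players. A minimal sketch; `top_companies` is a hypothetical helper and the toy company names are placeholders.

```python
import pandas as pd

def top_companies(df, n=10):
    # Count launches per (company, rocket status); missing combinations become 0
    counts = df.groupby(["Company Name", "Status Rocket"]).size().unstack(fill_value=0)
    # Order companies by total launches and keep the n largest
    order = counts.sum(axis=1).sort_values(ascending=False).index
    return counts.loc[order[:n]]

toy = pd.DataFrame({
    "Company Name": ["A", "A", "A", "B", "C", "C"],
    "Status Rocket": ["StatusRetired", "StatusRetired", "StatusActive",
                      "StatusActive", "StatusActive", "StatusRetired"],
})
result = top_companies(toy, n=2)
print(result)
```

On the real data, `top_companies(file, n=15).plot(kind="bar", stacked=True)` would draw the same kind of chart as above, restricted to the 15 busiest companies.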

## Analyzing the Countries involved in the launches

In [None]:
# The country is the last comma-separated field of Location; strip the leading space
file['Location_Country'] = file['Location'].str.split(',').str[-1].str.strip()
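Splitting on `","` leaves a leading space on the last field. Grouping still works because every label carries the same space, but equality filters such as `== "USA"` would silently match nothing. A quick check on two illustrative location strings:

```python
import pandas as pd

locations = pd.Series([
    "LC-39A, Kennedy Space Center, Florida, USA",
    "Site 1/5, Baikonur Cosmodrome, Kazakhstan",
])

raw = locations.apply(lambda x: x.split(",")[-1])          # keeps the leading space
clean = locations.str.split(",").str[-1].str.strip()        # whitespace removed

print(raw.tolist())    # [' USA', ' Kazakhstan']
print(clean.tolist())  # ['USA', 'Kazakhstan']
print((raw == "USA").sum(), (clean == "USA").sum())  # 0 1
```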

In [None]:
df5 = file.groupby(["Location_Country", "Status Mission"]).agg({"Status Mission": "count"}).unstack()
df5.plot(kind = "bar", stacked = True)
plt.show()

The plot above shows that most launches have taken place in a handful of countries: USA, Russia, China, France, Kazakhstan, India and Japan. All other locations have very few launches.
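To put a number on how concentrated launches are, the normalised value counts of the country column can be summed over the top few entries. A sketch on hypothetical values standing in for `file["Location_Country"]`; `k` is an illustrative parameter.

```python
import pandas as pd

# Hypothetical country labels standing in for file["Location_Country"]
countries = pd.Series(["USA", "USA", "Russia", "China", "Kazakhstan", "Other"])

# Fraction of all launches per country, largest first
shares = countries.value_counts(normalize=True)

k = 2
top_share = shares.head(k).sum()  # share of launches from the k busiest countries
print(round(top_share, 3))        # 0.5
```

On the real data, `k = 7` would measure the combined share of the seven countries listed above.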

This notebook is a good starting point for EDA that combines several variables to get a sense of the data all at once. In the future I would like to use the Detail column to learn more about the specific rockets these companies prefer to use.

Any feedback and suggestions would be highly appreciated.