# **Exploratory Data Analysis of all the space missions conducted from 1957 to present by all entities (government and commercial)**

**This notebook presents an exploratory data analysis of a dataset scraped from https://nextspaceflight.com/launches/past/?page=1, covering all space missions since the beginning of the Space Race (1957). Spaceflight has developed tremendously since its beginnings in the 1950s. Early missions were driven largely by human curiosity and the Cold War; after the fall of the USSR, the main focus shifted to what it arguably should have been all along: the search for a new home and the betterment of people through a deeper understanding of how our universe works. So let's dive right in!**

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import wordcloud as wc
import matplotlib.pyplot as plt
%matplotlib inline
sns.set(style='dark')
import plotly.express as px

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


**Loading the data from CSV**

In [None]:
df = pd.read_csv('../input/all-space-missions-from-1957/Space_Corrected.csv')
df.head(10)

**Visually checking approximately how many null values we might have in our dataset**

In [None]:
sns.heatmap(df.isnull(),yticklabels=False,cbar=False,cmap='coolwarm')

**Data Pre-processing and Cleaning**

In [None]:
df.shape

Our dataset contains 4324 rows and 9 columns

**Now let's find out how many null values the Rocket column has**

In [None]:
df.isnull().sum()

**Dropping the Rocket column, as roughly three quarters of its values are null. Dropping the first two columns too, as they only duplicate the index**

In [None]:
# Note: the Rocket column name carries a leading space in the CSV header
columns = [' Rocket','Unnamed: 0','Unnamed: 0.1']
df.drop(columns, inplace=True, axis=1)
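The decision to drop a sparse column can also be made from numbers rather than from the heatmap alone. The following is a minimal sketch on a made-up toy frame (the values, the `toy` name and the 70% cut-off are assumptions for illustration, not part of the dataset):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real CSV (values are made up for illustration)
toy = pd.DataFrame({
    'Company Name': ['A', 'B', 'C', 'D'],
    ' Rocket': ['50.0', np.nan, np.nan, np.nan],  # leading space mirrors the real header
    'Status Mission': ['Success', 'Failure', 'Success', 'Success'],
})

# Share of missing values per column, as a percentage
null_pct = toy.isnull().mean() * 100

# Hypothetical cut-off: drop any column that is more than 70% empty
threshold = 70
sparse_cols = null_pct[null_pct > threshold].index.tolist()
toy_clean = toy.drop(columns=sparse_cols)

print(null_pct.round(1))
print(sparse_cols)
```

Running `df.isnull().mean() * 100` on the real dataframe before the drop would give the exact percentage for the Rocket column.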

In [None]:
df.shape

**The dataframe now has 6 columns. Let's create a few more by extracting Year from Datum and Country from Location**

In [None]:
# Converting Datum to a datetime column
df['DateTime'] = pd.to_datetime(df['Datum'])
# Extracting Year from the parsed date
df['Year'] = df['DateTime'].apply(lambda x: x.year)
# Extracting Country (the last comma-separated part of Location) and trimming the space before the country name
df['Country'] = df['Location'].apply(lambda x: x.split(',')[-1])
df['Country'] = df['Country'].str.strip()
# Dropping the Datum column as it is redundant
# (newer pandas versions no longer accept axis as a positional argument)
df = df.drop(columns='Datum')
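The same extraction can be written with vectorised string methods, which also gives a quick cross-check of the parsed year. This is only a sketch: the three rows below are illustrative strings in the style of the `Location` and `Datum` columns (the date-only row is an assumption about how entries without a time look), and `YearFromText` is a hypothetical column name.

```python
import pandas as pd

# Illustrative rows; the date-only entry is an assumed format
toy = pd.DataFrame({
    'Location': ['LC-39A, Kennedy Space Center, Florida, USA',
                 'Site 1/5, Baikonur Cosmodrome, Kazakhstan',
                 'Site 9401 (SLS-2), Jiuquan Satellite Launch Center, China'],
    'Datum': ['Fri Aug 07, 2020 05:12 UTC',
              'Fri Oct 04, 1957 19:28 UTC',
              'Sat Feb 03, 1962'],
})

# Country: split once from the right, keep the last part, trim whitespace
toy['Country'] = toy['Location'].str.rsplit(',', n=1).str[-1].str.strip()

# Year: take the first standalone four-digit number from the raw text,
# which does not depend on how the rest of the date is formatted
toy['YearFromText'] = (
    toy['Datum'].str.extract(r'\b(\d{4})\b', expand=False).astype(int)
)

print(toy[['Country', 'YearFromText']])
```

Comparing `YearFromText` with the `Year` column derived from `DateTime` is a cheap way to spot rows where date parsing went wrong.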

In [None]:
df.head(10)

**Now let's see the percentage distribution of the top 10 countries involved in all the space missions**

In [None]:
df['Country'].value_counts().head(10).plot.pie(autopct='%1.0f%%', pctdistance=1.1, labeldistance=1.2,figsize=(15,15),title='% Distribution Country-wise')
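One thing to keep in mind when reading the pie chart: matplotlib computes the `autopct` percentages from the wedges that are plotted, so the figures are shares among the top 10 countries, not shares of all missions. A small sketch with made-up counts (the `countries` series and the top-2 cut are assumptions for illustration) shows the difference:

```python
import pandas as pd

# Made-up launch counts: 5 + 3 + 1 + 1 = 10 missions
countries = pd.Series(['USA'] * 5 + ['Russia'] * 3 + ['China'] + ['France'],
                      name='Country')

counts = countries.value_counts()
top = counts.head(2)  # stand-in for head(10) on the real data

# Share relative to every mission in the data
share_of_all = (top / counts.sum() * 100).round(1)

# Share relative to the plotted slice only (what the pie chart displays)
share_of_top = (top / top.sum() * 100).round(1)

print(share_of_all)
print(share_of_top)
```

On the real data, `df['Country'].value_counts(normalize=True).head(10) * 100` would give the shares relative to all missions.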

In [None]:
df.head()

**Let's have a look at the rocket statuses of the companies**

In [None]:
df.groupby(['Company Name','Status Rocket'])['Status Rocket'].count().unstack().plot(kind='bar', stacked=True, figsize=(16,16),title='Rocket Status Company-wise')

**Let's have a look at the mission statuses by country**

In [None]:
grouped = df.groupby(['Country','Status Mission'])['Country'].count().unstack()
# sort_index returns a new dataframe, so the result has to be assigned back
grouped = grouped.sort_index(ascending=False)
grouped.plot(kind='bar', stacked=True, figsize=(16,16),title='Mission Status Country-wise')
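Stacked counts make it hard to compare countries with very different launch totals. A possible complement is a table of row-wise percentages, sketched here with `pd.crosstab` on made-up data (the `toy` frame and its values are assumptions for illustration):

```python
import pandas as pd

# Made-up missions: 4 for one country, 2 for another
toy = pd.DataFrame({
    'Country': ['USA', 'USA', 'USA', 'USA', 'China', 'China'],
    'Status Mission': ['Success', 'Success', 'Success', 'Failure',
                       'Success', 'Failure'],
})

# Raw counts per country and mission status (missing combinations become 0)
counts = pd.crosstab(toy['Country'], toy['Status Mission'])

# Same table, with each row normalised to percentages of that country's missions
rates = pd.crosstab(toy['Country'], toy['Status Mission'],
                    normalize='index') * 100

print(counts)
print(rates.round(1))
```

The `rates` table can be plotted the same way as `grouped`, with `rates.plot(kind='bar', stacked=True)`, so that every bar sums to 100.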

**Most active companies in the last decade, shown as a wordcloud**

In [None]:
from wordcloud import WordCloud, STOPWORDS

df2 = df.query('Year > 2010')
company = " ".join(df2['Company Name'])
country = " ".join(df2['Country'])
#Defining a function to plot wordclouds
def plot_cloud(wordcloud):
    # Set figure size
    plt.figure(figsize=(30, 20))
    # Display image
    plt.imshow(wordcloud) 
    # No axis details
    plt.axis("off");
    
wordcloud = WordCloud(width = 1600, height = 1200, random_state=1, background_color='black', colormap='Pastel1', collocations=False,
                      stopwords = STOPWORDS).generate(company)
    
    
plot_cloud(wordcloud)

**The wordcloud suggests that commercial organizations were more active than government agencies over the last decade. Now let's see which countries were most active with their space missions throughout the last decade.**

In [None]:
wordcloud = WordCloud(width = 1600, height = 1200, random_state=1, background_color='black', colormap='Pastel1', collocations=False,
                      stopwords = STOPWORDS).generate(country)
    
    
plot_cloud(wordcloud)
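A caveat on both wordclouds: joining the names with spaces and calling `generate` makes the wordcloud count individual words, so a multi-word name such as "Rocket Lab" is split into "Rocket" and "Lab". One possible alternative, sketched below on made-up names (the `names` series is an assumption for illustration), is to count whole names first and hand the counts to `WordCloud.generate_from_frequencies`:

```python
import pandas as pd

# Made-up company names, including multi-word ones
names = pd.Series(['SpaceX', 'Rocket Lab', 'Rocket Lab',
                   'SpaceX', 'SpaceX', 'Virgin Orbit'])

# Count each full name once per mission, keeping multi-word names intact
freqs = names.value_counts().to_dict()

# What the space-joined text looks like after word splitting
split_words = " ".join(names).split()

print(freqs)
print(split_words)

# With the wordcloud package installed, the dict can be used like this:
# WordCloud(width=1600, height=1200).generate_from_frequencies(freqs)
```

On the real data, `df2['Company Name'].value_counts().to_dict()` would produce the frequency dictionary for the last decade.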

**Please upvote the notebook if you like it and do give suggestions/feedback in the comments! Have a nice day!**