# Explore the Space

![maxresdefault.jpg](attachment:maxresdefault.jpg)

Image credit: Google Images

# What we will cover in this notebook

* Basic Information of dataset

* Data pre-processing

* Data Visualisation
  1. Maximum number of Rocket Missions year by year
  2. Top 15 Countries Mission Status
  3. Top 15 Space Organisations Mission Status
  4. Total Budget year by year
  5. Total number of Launches every year
  6. Which Space Organisation has the highest number of Rocket Missions
  7. Countries and their Rocket Launch Pad
  8. Countries and their Space Organisations
  9. USA vs Russia
  10. Plotting Countries location on Global Map

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px
import datetime as dt
import re
from iso3166 import countries

import warnings
warnings.filterwarnings('ignore')

In [2]:
data = pd.read_csv("../input/all-space-missions-from-1957/Space_Corrected.csv")
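# Drop the two leftover index columns; the positional axis argument is not accepted by pandas 2.x
data.drop(columns=['Unnamed: 0','Unnamed: 0.1'],inplace = True)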
data.shape

(4324, 7)

# Basic Information

In [3]:
data.head(10)

,Company Name,Location,Datum,Detail,Status Rocket,Rocket,Status Mission
0,SpaceX,"LC-39A, Kennedy Space Center, Florida, USA","Fri Aug 07, 2020 05:12 UTC",Falcon 9 Block 5 | Starlink V1 L9 & BlackSky,StatusActive,50.0,Success
1,CASC,"Site 9401 (SLS-2), Jiuquan Satellite Launch Ce...","Thu Aug 06, 2020 04:01 UTC",Long March 2D | Gaofen-9 04 & Q-SAT,StatusActive,29.75,Success
2,SpaceX,"Pad A, Boca Chica, Texas, USA","Tue Aug 04, 2020 23:57 UTC",Starship Prototype | 150 Meter Hop,StatusActive,,Success
3,Roscosmos,"Site 200/39, Baikonur Cosmodrome, Kazakhstan","Thu Jul 30, 2020 21:25 UTC",Proton-M/Briz-M | Ekspress-80 & Ekspress-103,StatusActive,65.0,Success
4,ULA,"SLC-41, Cape Canaveral AFS, Florida, USA","Thu Jul 30, 2020 11:50 UTC",Atlas V 541 | Perseverance,StatusActive,145.0,Success
5,CASC,"LC-9, Taiyuan Satellite Launch Center, China","Sat Jul 25, 2020 03:13 UTC","Long March 4B | Ziyuan-3 03, Apocalypse-10 & N...",StatusActive,64.68,Success
6,Roscosmos,"Site 31/6, Baikonur Cosmodrome, Kazakhstan","Thu Jul 23, 2020 14:26 UTC",Soyuz 2.1a | Progress MS-15,StatusActive,48.5,Success
7,CASC,"LC-101, Wenchang Satellite Launch Center, China","Thu Jul 23, 2020 04:41 UTC",Long March 5 | Tianwen-1,StatusActive,,Success
8,SpaceX,"SLC-40, Cape Canaveral AFS, Florida, USA","Mon Jul 20, 2020 21:30 UTC",Falcon 9 Block 5 | ANASIS-II,StatusActive,50.0,Success
9,JAXA,"LA-Y1, Tanegashima Space Center, Japan","Sun Jul 19, 2020 21:58 UTC",H-IIA 202 | Hope Mars Mission,StatusActive,90.0,Success


In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4324 entries, 0 to 4323
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Company Name    4324 non-null   object
 1   Location        4324 non-null   object
 2   Datum           4324 non-null   object
 3   Detail          4324 non-null   object
 4   Status Rocket   4324 non-null   object
 5    Rocket         964 non-null    object
 6   Status Mission  4324 non-null   object
dtypes: object(7)
memory usage: 236.6+ KB


In [5]:
data.describe().transpose()

,count,unique,top,freq
Company Name,4324,56,RVSN USSR,1777
Location,4324,137,"Site 31/6, Baikonur Cosmodrome, Kazakhstan",235
Datum,4324,4319,"Wed Nov 05, 2008 00:15 UTC",2
Detail,4324,4278,Cosmos-3MRB (65MRB) | BOR-5 Shuttle,6
Status Rocket,4324,2,StatusRetired,3534
Rocket,964,56,450.0,136
Status Mission,4324,4,Success,3879


# Data pre-processing

> The ` Rocket` column (note the leading space in its name) holds the cost of the rocket in $ million.

> So we will convert it from object to float and rename it to `Rocket`.

In [6]:
data.rename(columns={' Rocket':'Rocket','Datum':'Date'},inplace=True)

# Strip thousands separators and convert the cost strings to float; NaN values pass through unchanged
data['Rocket'] = data['Rocket'].apply(lambda x: x if isinstance(x,(int,float)) else float(x.replace(',','')))
print('NaN values in Rocket column = {0:.2f}%'.format(data['Rocket'].isna().sum()/len(data)*100))

NaN values in Rocket column = 77.71%


> So about 78% of the values in the Rocket column are NaN.

> There are multiple imputation methods to fill NaN values, such as

> mean, median, mode, KNN imputer, predictive imputer, etc.

> But only ~22% of the rows in this column have a value,

> so imputing the rest would not be a good idea.
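The cleaning and the missing-value check can be tried on a small sample first. This is a minimal sketch on hypothetical cost strings (the values below are illustrative, not taken from the dataset), using `pd.to_numeric` instead of the lambda above:

```python
import numpy as np
import pandas as pd

# Hypothetical sample of raw cost strings (in $ million); not taken from the dataset.
raw_cost = pd.Series(["50.0", "1,160.0", np.nan, "29.75", np.nan], name="Rocket")

# Remove thousands separators, then convert; NaN entries stay NaN.
cost = pd.to_numeric(raw_cost.str.replace(",", "", regex=False))

# Share of missing values, as a percentage.
missing_pct = cost.isna().mean() * 100

# Statistics computed only from the rows that report a cost.
known_cost_mean = cost.mean()
known_cost_count = int(cost.notna().sum())

print(f"missing: {missing_pct:.1f}% | rows with a cost: {known_cost_count}")
```

Leaving the gaps as NaN keeps later sums and means honest: they describe only the launches that report a cost.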

**Converting the Date column from object to pandas datetime**

**and also making separate columns for Time, Days and Date**

In [7]:
data['Date'] = pd.to_datetime(data['Date'],utc = True)
data['Time'] = data['Date'].dt.time
data['Days'] = data['Date'].dt.day_name()
data['Date'] = data['Date'].dt.date
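To see what each new column holds, here is a small sketch on two timestamps copied from the preview above. It passes an explicit format string, which is an assumption on my part: it only fits rows that include a time, so rows without one would need separate handling.

```python
import pandas as pd

# Two timestamps copied from the preview above.
raw = pd.Series(["Fri Aug 07, 2020 05:12 UTC", "Thu Aug 06, 2020 04:01 UTC"])

# An explicit format avoids relying on pandas' format inference.
# Assumption: every row carries a time; rows without one would need separate handling.
stamp = pd.to_datetime(raw, format="%a %b %d, %Y %H:%M UTC", utc=True)

parts = pd.DataFrame({
    "Date": stamp.dt.date,        # calendar date only
    "Time": stamp.dt.time,        # time of day
    "Days": stamp.dt.day_name(),  # weekday name
})
print(parts)
```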

**Extracting Country into a separate column for better understanding**

In [8]:
# Location is a comma-separated string; the last part is the country, the one before it the state or launch site
data['Region'] = data['Location'].apply(lambda x: ','.join(x.split(',')[-2:]).strip())
data['Country'] = data['Location'].apply(lambda x: x.split(',')[-1].strip())
data['States'] = data['Location'].apply(lambda x: x.split(',')[-2].strip())
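The same split can be written with pandas string methods. The sketch below runs on three locations copied from the preview above. It also shows that the second-to-last part is a state only for US sites; elsewhere it is the launch site.

```python
import pandas as pd

# Location strings copied from the preview above.
location = pd.Series([
    "LC-39A, Kennedy Space Center, Florida, USA",
    "Site 200/39, Baikonur Cosmodrome, Kazakhstan",
    "LA-Y1, Tanegashima Space Center, Japan",
])

parts = location.str.split(",")
country = parts.str[-1].str.strip()  # last component
states = parts.str[-2].str.strip()   # second-to-last component

print(pd.DataFrame({"Country": country, "States": states}))
```

The last component is not always a country name, which is why a few values are corrected by hand before the map is plotted.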

# Data Visualization

# Maximum number of Rocket Missions year by year

**Top ten companies/organisations with the maximum number of rocket missions**

In [9]:
# Ten companies with the most missions
topCompanies = data['Company Name'].value_counts().head(10)

fig = px.pie(values = topCompanies.values,
             names = topCompanies.index
             ,hole=.4,height = 700
            )

fig.update_traces(textinfo='label+percent',textposition='inside')

fig.update_layout(title='Top 10 Companies',
                 annotations=[dict(x=0.5,y=0.5,text='Companies',showarrow=False,font_size=20)])

fig.show()

**Top 10 Countries with Maximum Missions**

In [10]:
# Ten countries with the most missions
topCountries = data['Country'].value_counts().head(10)

fig = px.pie(values=topCountries.values,
                 names = topCountries.index,
                 title='Top 10 Countries with Maximum Missions',
                 hole=.4,height = 700
                )

fig.update_traces(textposition='inside',textinfo='label+percent')

fig.update_layout(annotations=[dict(x=.5,y=.5,text='Countries',showarrow=False,font_size=20)])

fig.show()

In [11]:
def findTopTen(col1,col2,val):
    # Despite its name, this returns status counts for the 15 most frequent values of col1
    statuses = ['Success','Failure','Partial Failure','Prelaunch Failure']
    n = val[col1].value_counts().index[:15].values

    rows = []
    for i in n:
        # Start every group from zero so counts do not carry over from the previous group
        unique = val[val[col1]==i][col2].value_counts()
        final = {status: int(unique.get(status,0)) for status in statuses}
        final[col1] = i
        rows.append(final)

    # Build the frame once; DataFrame.append was removed in pandas 2.0
    return pd.DataFrame(rows,columns=statuses+[col1])
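The same status table can also be built without a loop. This is a sketch of an alternative using `pd.crosstab`, not the notebook's own method. The toy rows are hypothetical and only the column names mirror the dataset.

```python
import pandas as pd

# Hypothetical toy data; the column names mirror the notebook's.
toy = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "China", "China", "Japan"],
    "Status Mission": ["Success", "Failure", "Success",
                       "Success", "Partial Failure", "Success"],
})

statuses = ["Success", "Failure", "Partial Failure", "Prelaunch Failure"]
top = toy["Country"].value_counts().index[:2]  # top 2 here; the notebook uses 15

table = (
    pd.crosstab(toy["Country"], toy["Status Mission"])
    # Keep only the top groups and make sure every status column exists.
    .reindex(index=top, columns=statuses, fill_value=0)
    .rename_axis("Country")
    .reset_index()
)
print(table)
```

Reindexing with `fill_value=0` guarantees a column for every status, even one that never occurs in the selected groups.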

# Top 15 Countries Mission Status

In [12]:

topcountry = findTopTen('Country','Status Mission',data)
fig = px.bar(topcountry,x='Country',y=['Success','Failure','Partial Failure','Prelaunch Failure'],
             title='Top 15 Countries Mission Status',height = 700)

fig.show()

# Top 15 Space Organisations Mission Status

In [13]:
topCompany = findTopTen('Company Name','Status Mission',data)

fig = px.bar(topCompany,x='Company Name',y=['Success','Failure','Partial Failure','Prelaunch Failure'],
            title = 'Top 15 Space Organisations Mission Status',height = 700)

fig.show()

# Total Budget year by year

In [14]:
data['Rocket']  = data['Rocket'].values*1000000

# Work on a copy so the assignment below does not write into a slice of data
budget = data[['Date','Rocket']].copy()
budget['Date'] = pd.to_datetime(budget['Date'])
budget.set_index('Date',inplace = True)

Yearlybudget = budget.resample('Y').sum().ffill()
monthlyBudget = budget.resample('M').sum().ffill()
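Because most costs are missing, a yearly total only covers the launches that report one. The sketch below uses hypothetical dates and costs to show this. It groups by calendar year with `groupby` rather than `resample`, a choice made here to avoid resample frequency aliases, which differ between pandas versions.

```python
import pandas as pd

# Hypothetical launch dates and costs in dollars; None marks a missing cost.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2019-03-02", "2019-11-11", "2020-05-30", "2020-08-07"]),
    "Rocket": [50e6, None, 50e6, 29.75e6],
})

year = toy["Date"].dt.year

# sum() skips NaN, so a year's total only covers launches with a reported cost.
yearly_total = toy.groupby(year)["Rocket"].sum()
# Number of launches per year that actually contribute to the total.
yearly_known = toy.groupby(year)["Rocket"].count()

print(pd.DataFrame({"total": yearly_total, "launches_with_cost": yearly_known}))
```

Reading the total next to the count of launches with a known cost makes it clearer how much of each year the budget line represents.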

In [15]:
fig  = px.line(Yearlybudget,x=Yearlybudget.index,y='Rocket',title='Total Budget year by year',height = 600)

fig.update_traces(hoverinfo='text+name', mode='lines+markers')

fig.show()

# Total number of Launches every year

In [16]:
launch = data['Date'].value_counts()
launch = pd.DataFrame(launch.values,columns=['Count'],index=launch.index)
launch.index = pd.to_datetime(launch.index)
yearLaunch = launch.resample('Y').sum()


#launch
fig = px.bar(yearLaunch , x=yearLaunch.index,y='Count',title='Total Number of Rocket Launch year by year',height = 600,
            color='Count')
fig.show()

# Which Space Organisation has the highest number of Rocket Missions

In [17]:
org = data['Company Name'].value_counts()
org = pd.DataFrame(org.values,columns = ['Count'],index = org.index)

fig = px.bar(org,x = org.index, y = 'Count',title = 'Organisation with Highest Rocket Missions over Years',color='Count',
            height = 650)
fig.show()

# Countries and their Rocket Launch Pad

In [18]:
fig = px.sunburst(data,path = ['Country','States'],height = 700,title = 'Countries and their Rocket Launch Pad')

fig.show()

# Countries and their Space Organisations

In [19]:
fig = px.sunburst(data, path = ['Country','Company Name'], title = 'Countries and their Space Organisations',height = 700)

fig.show()

# Total Missions and Status of Russia and USA

In [20]:
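# 'Russia' is renamed to 'Russian Federation' only in the map section further down, so match both spellings
newDf = data[data['Country'].isin(['USA','Russia','Russian Federation'])]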

fig = px.bar(newDf , x='Country', facet_col = "Status Mission",color = 'Country'
             ,color_continuous_scale=px.colors.sequential.Cividis_r, title = 'USA and Russia and their Mission Status'
            )
fig.show()


# Total Money Spent by USA and Russia year by year

In [21]:
fig = px.histogram(newDf , x = 'Date',y='Rocket',color = 'Country',title = 'Total Money Spent by USA and Russia year by year')

fig.show()

# Mission Cost and Status of USA and Russia year by year

In [22]:
fig = px.histogram(newDf , x = 'Date',y='Rocket',color = 'Status Mission'
                   ,title = 'Mission Cost and Status of USA and Russia year by year',
                  height = 650)

fig.show()

# Plotting Countries location on Global Map

In [23]:
data.loc[data['Country'] == 'Russia', 'Country'] = 'Russian Federation'
data.loc[data['Country'] == 'New Mexico', 'Country'] = 'USA'
data.loc[data['Country'] == "Yellow Sea", 'Country'] = "China"
data.loc[data['Country'] == "Shahrud Missile Test Site", 'Country'] = "Iran"
data.loc[data['Country'] == "Pacific Missile Range Facility", 'Country'] = "USA"
data.loc[data['Country'] == "Barents Sea", 'Country'] = 'Russian Federation'
data.loc[data['Country'] == "Gran Canaria", 'Country'] = 'USA'

allCountries = {}

for c in countries:
    allCountries[c.name] = c.alpha3

allCountries

data['AlphaNames'] = data['Country']

data = data.replace({'AlphaNames':allCountries})

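# These names do not match the iso3166 entries (Iran is listed as "Iran, Islamic Republic of"), so set the codes by hand
data.loc[data['Country'] == "North Korea", 'AlphaNames'] = "PRK"
data.loc[data['Country'] == "South Korea", 'AlphaNames'] = "KOR"
data.loc[data['Country'] == "Iran", 'AlphaNames'] = "IRN"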
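One way to catch names that never receive a code is to use `map`, which returns NaN for unknown keys, whereas `replace` leaves them unchanged. The sketch below is only an illustration: the lookup table is a hypothetical, deliberately incomplete stand-in for the iso3166 dictionary, and the sample names are made up.

```python
import pandas as pd

# Hypothetical, deliberately incomplete name-to-code table standing in for iso3166.
name_to_alpha3 = {"China": "CHN", "Japan": "JPN", "Kazakhstan": "KAZ"}
# Manual overrides for names that do not match the table.
overrides = {"USA": "USA", "Iran": "IRN"}

# Illustrative country values, including one that is not a country.
country = pd.Series(["USA", "China", "Iran", "Pacific Ocean"])

# map() yields NaN for unknown names, unlike replace(), which leaves them unchanged.
alpha3 = country.map({**name_to_alpha3, **overrides})
unmapped = sorted(country[alpha3.isna()].unique())

print(unmapped)
```

Any name listed in `unmapped` would be missing from the choropleth, so it is worth checking this list before plotting.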

# Countries on Global Map

In [24]:
mapdf = data.groupby(['Country', 'AlphaNames'])['Status Mission'].count().reset_index()
fig = px.choropleth(mapdf,locations = 'AlphaNames',color='Country',projection="equirectangular",height = 600,width = 1200)
fig.show()

# Year by year Mission took place and Mission Status

In [25]:
data['Date'] = pd.to_datetime(data['Date'], errors='coerce')
data.sort_values('Date',inplace = True)
data['Year'] = data['Date'].dt.year

fig = px.choropleth(data,locations = 'AlphaNames',animation_frame = 'Year',color = 'Status Mission',title = 'Countries Rocket Mission Status year by year',
                   height = 650)
fig.show()

* If you find this useful, please upvote.
* If you have any suggestions or find any mistakes, please comment below and I will try to improve it.

**Thank You**