# Python Data Analysis of Major Spaceflight Milestones
## A Python Project by Ian Takaoka

## Introduction

*Since 1961, **600 humans** from **37 countries** have achieved spaceflight, and this number increases yearly.*
*This project aims to use Python with Wikipedia integration to provide information and useful statistics on some of these astronauts.*

## Data Import
Credits: Mariya Stavnichuk and Tatsuya Corlett (original database), Georgios Karamanis via Kaggle (for csv)

From the author: 
>*"This database contains publically available information about all astronauts who participated in space missions before 15 January 2020 collected from NASA, Roscosmos, and fun-made websites. The provided information includes full astronaut name, sex, date of birth, nationality, military status, a title and year of a selection program, and information about each mission completed by a particular astronaut such as a year, ascend and descend shuttle names, mission and extravehicular activity (EVAs) durations."*

Limitations (from me): 
This does not include spaceflight statistics after January 2020. As the Soviet Union did not distinguish between nationalities within its borders, the USSR/Russia field may include cosmonauts from other Soviet Republics and not specifically from the Russian SFSR. Additionally, no commercial astronauts are listed, as SpaceX's Crew Dragon Demo-2, the first manned commercial spaceflight, did not launch until May 2020. 


In [None]:
import pandas as pd
import numpy as np

##Will return info from Wikipedia about specific individuals. Make sure to pip install wikipedia before proceeding with this program
import wikipedia as wiki
##Runs the Cyrtranslit module as data cleaning for Cyrillic names. Make sure to pip install cyrtranslit before proceeding
##import cyrtranslit as cyr

from matplotlib import pyplot as plt
##Under the hood to allow multiple Wikipedia summaries per cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

pd.set_option('chained_assignment',None)


astronaut_raw = pd.read_csv('astronauts.csv')
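##Preview one row per astronaut. The result is not assigned, so astronaut_raw keeps one row per mission for the later sections
astronaut_raw.drop_duplicates(subset=['name', 'original_name'], keep='first')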
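
A note on the cell above: `drop_duplicates` returns a new DataFrame and leaves the original unchanged unless the result is assigned. A minimal sketch on a made-up three-row frame (the rows are illustrative and not taken from `astronauts.csv`):

```python
import pandas as pd

# Hypothetical toy data: astronaut 'A' flew twice, astronaut 'B' once.
toy = pd.DataFrame({
    'name': ['A', 'A', 'B'],
    'year_of_mission': [1965, 1969, 1971],
})

# The deduplicated frame is a separate object; `toy` itself is not modified.
deduped = toy.drop_duplicates(subset=['name'], keep='first')

print(len(toy))      # still one row per mission
print(len(deduped))  # one row per astronaut
```

This matters here because the later age-at-mission cells rely on every mission row still being present in `astronaut_raw`.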

## Astronaut Nationality

The Soviet Union was the first nation to achieve manned spaceflight in 1961. Since then, 37 countries have sent humans to space. 

In [None]:
##This dataframe needs to remove duplicates, as many astronauts went to space more than once. 
df1 = astronaut_raw[['name', 'nationality', 'year_of_mission', 'mission_title']]
nations=df1.drop_duplicates(subset ='name', keep = 'first') 
nations_counts=nations['nationality'].value_counts()
print(nations_counts);
nations_counts.plot(kind = 'pie');
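
With several dozen nationalities, most pie slices end up too thin to read. One possible approach, sketched here on made-up counts rather than the real data, is to fold every nationality below a threshold into a single "Other" slice. The helper name `collapse_small` and the threshold value are assumptions for illustration.

```python
import pandas as pd

def collapse_small(counts: pd.Series, min_count: int) -> pd.Series:
    """Fold entries smaller than min_count into one 'Other' entry."""
    large = counts[counts >= min_count]
    other_total = counts[counts < min_count].sum()
    if other_total > 0:
        large = pd.concat([large, pd.Series({'Other': other_total})])
    return large

# Hypothetical counts, not the real dataset values.
toy_counts = pd.Series({'X': 50, 'Y': 30, 'Z': 2, 'W': 1})
collapsed = collapse_small(toy_counts, min_count=5)
print(collapsed)
```

On the notebook's data this could be called as `collapse_small(nations_counts, min_count=5).plot(kind='pie')`.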

In [None]:
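##regex=False matches the literal text "U.S." (the default treats each '.' as a wildcard); na=False skips missing values
usrussia = nations[nations['nationality'].str.contains("U.S.", regex=False, na=False)]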
usrussia_counts = usrussia['nationality'].value_counts()
print(usrussia_counts);
usrussia_counts.plot(kind = 'pie');
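
`Series.str.contains` interprets its pattern as a regular expression by default, so each `.` in `"U.S."` matches any character. The sketch below uses made-up strings to show the difference between the default behaviour and a literal match with `regex=False`; `na=False` makes missing values count as non-matches.

```python
import pandas as pd

# Hypothetical labels, including one that only matches when '.' acts as a wildcard.
labels = pd.Series(['U.S.', 'U.S.S.R/Russia', 'UXSX', None])

as_regex = labels.str.contains('U.S.', na=False)
as_literal = labels.str.contains('U.S.', regex=False, na=False)

print(as_regex.tolist())    # 'UXSX' matches because '.' is a wildcard
print(as_literal.tolist())  # only strings containing the exact text 'U.S.' match
```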

## Astronaut Age
During the Space Race, the first astronauts and cosmonauts were drawn primarily from the military and government sectors. Because early manned spaceflight was highly experimental, the first astronauts were almost exclusively test pilots from the armed forces, and as such were well into their military careers. Civilians were eventually drawn into the space program, typically scientists with advanced degrees serving as mission specialists. Currently, NASA requires that Astronaut Candidates (ASCANs) hold at least a Master's degree in a Science, Technology, Engineering, and Mathematics (STEM) field and have extensive professional experience, or 1,000 hours of pilot-in-command time in jet aircraft. This tends to put the average age at selection in the early 30s, and crew assignments are set years in advance of launch to allow for adequate training and mission control experience. Thus, many astronauts are in their early 40s at mission launch.
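
The lag between selection and launch described above can be measured directly. The sketch below uses a small made-up frame with the same column names as the dataset (`name`, `year_of_selection`, `year_of_mission`) and takes each astronaut's earliest mission; the values are illustrative only.

```python
import pandas as pd

# Hypothetical rows; one row per mission, as in the dataset.
toy = pd.DataFrame({
    'name': ['A', 'A', 'B'],
    'year_of_selection': [1990, 1990, 1996],
    'year_of_mission': [1995, 1999, 2000],
})

# Earliest mission per astronaut, then years elapsed since selection.
first_mission = toy.groupby('name').agg(
    year_of_selection=('year_of_selection', 'first'),
    first_mission_year=('year_of_mission', 'min'),
)
first_mission['years_to_first_flight'] = (
    first_mission['first_mission_year'] - first_mission['year_of_selection']
)
print(first_mission)
```

Swapping `toy` for `astronaut_raw` should give the same table for the real data, assuming the column names match those used elsewhere in this notebook.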

The youngest astronauts at time of selection were the Soviet cosmonauts **Pyotr Klimuk**, **Gennady Sarafanov**, and **Vyacheslav Zudov**, who were each 23 years old. The youngest humans to achieve spaceflight were Soviet cosmonauts **Gherman Titov** and **Valentina Tereshkova**, each at 26 years old. At 32 years old, **Sally Ride** is the youngest American to achieve spaceflight. Ages here are computed as a difference of calendar years, so they may differ from exact ages by one year.

Private citizen **Dennis Tito**, at 60 years old, holds the record in this dataset for oldest astronaut at selection, while American career astronaut **John Glenn** is the oldest astronaut at time of launch, at 77 years old\*.

<sub><sup>\*In October 2021, 90-year-old actor William Shatner surpassed John Glenn as the oldest human in space on a Blue Origin flight. As mentioned previously, this is beyond the scope of the dataset.</sup></sub>

### Age At Selection

In [None]:
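##.copy() gives an independent frame, so adding a column does not touch astronaut_raw
df2 = astronaut_raw[['name', 'year_of_birth', 'year_of_selection', 'nationality', 'selection']].copy()

df2['age_at_selection'] = df2['year_of_selection'] - df2['year_of_birth']

##Keep one row per astronaut so repeat flyers are not counted more than once
df2 = df2.drop_duplicates(subset=['name'], keep='first')
df2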

In [None]:
selection_year = df2[['year_of_selection']].to_numpy()
selection_age = df2[['age_at_selection']].to_numpy()
plt.figure(figsize = (12, 6))
plt.scatter(selection_year, selection_age, marker= '+', color = 'red')
plt.xlabel('Selection Year')
plt.ylabel('Age at First Selection')
plt.title('Distribution of Astronaut Candidate Age')
plt.show();

In [None]:
##Mean Age of Astronaut Candidates
df2_mean = (df2.age_at_selection.mean())
print('The average age of all astronaut candidates prior to 2020 is ' + str(round(df2_mean)) + ' ' + 'years old.' )

In [None]:
##Youngest Astronauts at Selection
df2_youngest = (df2[df2.age_at_selection == df2.age_at_selection.min()])
df2_youngest.drop_duplicates(subset=['name'], keep= 'first')

In [None]:
##Oldest Astronaut at Selection 
df2_oldest = (df2[df2.age_at_selection == df2.age_at_selection.max()])
df2_oldest.drop_duplicates(subset=['name'], keep= 'first')

In [None]:
for name in df2_oldest['name'].unique():  ##wiki.summary takes one title string, not a Series
    print(wiki.summary(name))
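
`wiki.summary` takes a single page title as a string. If the `name` column stores names as `"Surname, Given"` (an assumption about the CSV that should be checked against the data), a small helper can reorder them before the lookup. `to_wiki_title` is a hypothetical helper and not part of the `wikipedia` package.

```python
def to_wiki_title(name: str) -> str:
    """Turn 'Surname, Given' into 'Given Surname'; leave other strings unchanged."""
    if ',' not in name:
        return name.strip()
    surname, given = name.split(',', 1)
    return f"{given.strip()} {surname.strip()}"

# Example input in the assumed 'Surname, Given' format.
print(to_wiki_title('Tito, Dennis'))

# Possible use in the notebook (needs network access and the wikipedia package):
# for name in df2_oldest['name'].unique():
#     print(wiki.summary(to_wiki_title(name), sentences=3))
```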

### Age at Spaceflight

In [None]:
##.copy() gives an independent frame, so adding a column does not touch astronaut_raw
df3 = astronaut_raw[['name', 'year_of_birth', 'year_of_mission', 'nationality', 'mission_title']].copy()

df3['age_at_mission'] = df3['year_of_mission'] - df3['year_of_birth']

df3
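
Because `age_at_mission` is a difference of calendar years, it can overstate an astronaut's exact age by one year when the launch falls before that year's birthday. A short sketch using Gherman Titov's dates (born 11 September 1935, launched on Vostok 2 on 6 August 1961; these dates come from general knowledge, not from the CSV, which only carries years). The helper `exact_age` is introduced here for illustration.

```python
from datetime import date

def exact_age(born: date, on: date) -> int:
    """Age in whole years on a given date."""
    had_birthday = (on.month, on.day) >= (born.month, born.day)
    return on.year - born.year - (0 if had_birthday else 1)

born = date(1935, 9, 11)   # Titov's birth date
launch = date(1961, 8, 6)  # Vostok 2 launch

# Year-based age, as computed in the notebook, next to the exact age.
year_based = launch.year - born.year
print(year_based, exact_age(born, launch))
```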

In [None]:
mission_year = df3[['year_of_mission']].to_numpy()
mission_age = df3[['age_at_mission']].to_numpy()
plt.figure(figsize = (12, 6))
plt.scatter(mission_year, mission_age, marker= '+', color = 'red')
plt.xlabel('Mission Year')
plt.ylabel('Age at Mission Launch')
plt.title('Distribution of Astronaut Age at Time of Spaceflight')
plt.show();

In [None]:
##Mean Age of Astronauts at Mission Launch
df3_mean = (df3.age_at_mission.mean())
print('The average age of all astronauts at mission launch prior to 2020 is ' + str(round(df3_mean)) + ' ' + 'years old.' )

In [None]:
##Youngest Astronauts at Mission Launch
df3_youngest = (df3[df3.age_at_mission == df3.age_at_mission.min()])
df3_youngest.drop_duplicates(subset=['name'], keep= 'first')

In [None]:
for name in df3_youngest['name'].unique():  ##wiki.summary takes one title string, not a Series
    print(wiki.summary(name, sentences=3))

In [None]:
##Oldest Astronauts at Mission Launch
df3_oldest = (df3[df3.age_at_mission == df3.age_at_mission.max()])
df3_oldest.drop_duplicates(subset=['name'], keep= 'first')

In [None]:
for name in df3_oldest['name'].unique():  ##wiki.summary takes one title string, not a Series
    print(wiki.summary(name, sentences=4))