# <div align="center"> Data Analysis and Visualization of Astronauts from 1959 to 2013 </div>

In this notebook, we will analyze a NASA dataset that contains information about astronauts from 1959 to 2013. We want to find out whether NASA astronauts have anything in common, visualize those commonalities or trends, and answer a couple of questions:

### Is there an “ideal” astronaut? 
- In this case, "ideal" means the best candidate that can be selected as an astronaut

### Can the “best path” to being an astronaut be mapped out?
- "Best path" is defined as the best actions to take or activities to participate in (majors, military service, etc.) prior to being selected as an astronaut. 

## **Environment Set-Up**
Setting up pandas/numpy/plotly.express and the dataset we will be using

In [21]:
import pandas as pd
import numpy as np
import plotly.express as px

In [22]:
#Astronaut Dataset
astro = pd.read_csv("https://raw.githubusercontent.com/ishaandey/node/master/archives/data-practice/nasa_astronauts.csv")

Take a quick peek at the data... It's always good practice to get a feel for what your data looks like
- Useful Functions: `head()`, `info()`

In [23]:
astro.head()

Unnamed: 0,Name,Year,Group,Status,Birth Date,Birth Place,Gender,Alma Mater,Undergraduate Major,Graduate Major,Military Branch,Space Flight (hr),Space Walks (hr),Missions,Death Date,Death Mission
0,Joseph M. Acaba,2004.0,19.0,Active,5/17/1967,"Inglewood, CA",Male,University of California-Santa Barbara; Univer...,Geology,Geology,,3307,13,"STS-119 (Discovery), ISS-31/32 (Soyuz)",,
1,James C. Adamson,1984.0,10.0,Retired,3/3/1946,"Warsaw, NY",Male,US Military Academy; Princeton University,Engineering,Aerospace Engineering,US Army,334,0,"STS-28 (Columbia), STS-43 (Atlantis)",,
2,Thomas D. Akers,1987.0,12.0,Retired,5/20/1951,"St. Louis, MO",Male,University of Missouri-Rolla,Applied Mathematics,Applied Mathematics,US Air Force,814,29,"STS-41 (Discovery), STS-49 (Endeavor), STS-61 ...",,
3,Buzz Aldrin,1963.0,3.0,Retired,1/20/1930,"Montclair, NJ",Male,US Military Academy; MIT,Mechanical Engineering,Astronautics,US Air Force,289,8,"Gemini 12, Apollo 11",,
4,Andrew M. Allen,1987.0,12.0,Retired,8/4/1955,"Philadelphia, PA",Male,Villanova University; University of Florida,Mechanical Engineering,Business Administration,US Marine Corps,906,0,"STS-46 (Atlantis), STS-62 (Columbia), STS-75 (...",,


In [24]:
astro.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 329 entries, 0 to 328
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Name                 329 non-null    object 
 1   Year                 329 non-null    float64
 2   Group                329 non-null    float64
 3   Status               329 non-null    object 
 4   Birth Date           329 non-null    object 
 5   Birth Place          329 non-null    object 
 6   Gender               329 non-null    object 
 7   Alma Mater           329 non-null    object 
 8   Undergraduate Major  309 non-null    object 
 9   Graduate Major       275 non-null    object 
 10  Military Branch      206 non-null    object 
 11  Space Flight (hr)    329 non-null    int64  
 12  Space Walks (hr)     329 non-null    int64  
 13  Missions             306 non-null    object 
 14  Death Date           48 non-null     object 
 15  Death Mission        14 non-null     object 
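
The non-null counts above can also be turned into a direct count of missing values per column. The sketch below uses a small made-up frame (`sample`, which is not part of the notebook) so that it runs on its own; on the real data the same call would be `astro.isna().sum()`.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for `astro`, with a few gaps
sample = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Military Branch": ["US Army", np.nan, np.nan],
    "Death Date": [np.nan, np.nan, "1/28/1986"],
})

# isna() marks missing cells as True; sum() then counts them per column
missing = sample.isna().sum()
print(missing)
```

Columns with a count above zero are the ones that need attention during cleaning.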

## **Cleaning Up the Data**

### Death Date
It looks like there are lots of NaNs in the Death Date column (not everyone in this dataset is deceased). Let's fill the NaNs with an arbitrary far-future date (01/01/2262), which still fits inside the range pandas can store as a nanosecond timestamp.

- Useful Functions: `pd.Timestamp()`, `.fillna()`

In [25]:
# Change the NaNs in the Death Date column to an arbitrary date (01/01/2262)
arb_date = pd.Timestamp(year=2262, month=1, day=1) #Setting up a variable for the arbitrary date

astro['Death Date'] = astro['Death Date'].fillna(arb_date) #Sets all NaN values to the date 01/01/2262
astro.head()


Unnamed: 0,Name,Year,Group,Status,Birth Date,Birth Place,Gender,Alma Mater,Undergraduate Major,Graduate Major,Military Branch,Space Flight (hr),Space Walks (hr),Missions,Death Date,Death Mission
0,Joseph M. Acaba,2004.0,19.0,Active,5/17/1967,"Inglewood, CA",Male,University of California-Santa Barbara; Univer...,Geology,Geology,,3307,13,"STS-119 (Discovery), ISS-31/32 (Soyuz)",2262-01-01 00:00:00,
1,James C. Adamson,1984.0,10.0,Retired,3/3/1946,"Warsaw, NY",Male,US Military Academy; Princeton University,Engineering,Aerospace Engineering,US Army,334,0,"STS-28 (Columbia), STS-43 (Atlantis)",2262-01-01 00:00:00,
2,Thomas D. Akers,1987.0,12.0,Retired,5/20/1951,"St. Louis, MO",Male,University of Missouri-Rolla,Applied Mathematics,Applied Mathematics,US Air Force,814,29,"STS-41 (Discovery), STS-49 (Endeavor), STS-61 ...",2262-01-01 00:00:00,
3,Buzz Aldrin,1963.0,3.0,Retired,1/20/1930,"Montclair, NJ",Male,US Military Academy; MIT,Mechanical Engineering,Astronautics,US Air Force,289,8,"Gemini 12, Apollo 11",2262-01-01 00:00:00,
4,Andrew M. Allen,1987.0,12.0,Retired,8/4/1955,"Philadelphia, PA",Male,Villanova University; University of Florida,Mechanical Engineering,Business Administration,US Marine Corps,906,0,"STS-46 (Atlantis), STS-62 (Columbia), STS-75 (...",2262-01-01 00:00:00,


Now, let's convert the column to a datetime dtype so it's easier to work with later on
- Useful Functions: `pd.to_datetime()`

In [26]:
# Change Death Date to a datetime object
astro['Death Date'] = pd.to_datetime(astro['Death Date']) #Changes the dtype from object to datetime
astro.head()

Unnamed: 0,Name,Year,Group,Status,Birth Date,Birth Place,Gender,Alma Mater,Undergraduate Major,Graduate Major,Military Branch,Space Flight (hr),Space Walks (hr),Missions,Death Date,Death Mission
0,Joseph M. Acaba,2004.0,19.0,Active,5/17/1967,"Inglewood, CA",Male,University of California-Santa Barbara; Univer...,Geology,Geology,,3307,13,"STS-119 (Discovery), ISS-31/32 (Soyuz)",2262-01-01,
1,James C. Adamson,1984.0,10.0,Retired,3/3/1946,"Warsaw, NY",Male,US Military Academy; Princeton University,Engineering,Aerospace Engineering,US Army,334,0,"STS-28 (Columbia), STS-43 (Atlantis)",2262-01-01,
2,Thomas D. Akers,1987.0,12.0,Retired,5/20/1951,"St. Louis, MO",Male,University of Missouri-Rolla,Applied Mathematics,Applied Mathematics,US Air Force,814,29,"STS-41 (Discovery), STS-49 (Endeavor), STS-61 ...",2262-01-01,
3,Buzz Aldrin,1963.0,3.0,Retired,1/20/1930,"Montclair, NJ",Male,US Military Academy; MIT,Mechanical Engineering,Astronautics,US Air Force,289,8,"Gemini 12, Apollo 11",2262-01-01,
4,Andrew M. Allen,1987.0,12.0,Retired,8/4/1955,"Philadelphia, PA",Male,Villanova University; University of Florida,Mechanical Engineering,Business Administration,US Marine Corps,906,0,"STS-46 (Atlantis), STS-62 (Columbia), STS-75 (...",2262-01-01,
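
The fill-then-convert steps above can also be done in the opposite order: parse first, so missing values become `NaT`, then fill. This is a minimal sketch on a made-up Series (`death`, with hypothetical values), not the notebook's own method. It also prints `pd.Timestamp.max` to show why a year-2262 placeholder is about as late as a nanosecond timestamp can go.

```python
import numpy as np
import pandas as pd

arb_date = pd.Timestamp(year=2262, month=1, day=1)

# Hypothetical stand-in for astro['Death Date']: one real date, two missing
death = pd.Series(["1/28/1986", np.nan, np.nan])

# Parse first (NaN becomes NaT), then replace NaT with the placeholder
death = pd.to_datetime(death, format="%m/%d/%Y").fillna(arb_date)

print(death)
print(pd.Timestamp.max)  # upper bound of nanosecond timestamps
```

Parsing before filling keeps the column a single type throughout, instead of briefly mixing strings and `Timestamp` objects.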


### Death Mission
Similar to replacing the NaNs in Death Date, let's fill the NaNs in this column with a placeholder.

There seem to be two main "categories" of NaNs...
1. Astronauts that do not have a death date -> Let's fill these with "Alive"
2. Astronauts that passed away but not directly from a space mission -> Let's fill these with "Unrelated Death"
- Useful Functions:  `.loc[]`, masking and subsetting!

In [27]:
# Replacing the NaNs of astronauts that have an unrelated death
astro.loc[(astro['Death Date'] != arb_date) & (astro['Death Mission'].isna()), 'Death Mission'] = 'Unrelated Death'
# In the code above, the arb_date was set earlier to be 01/01/2262

In [28]:
# Replacing the NaNs of astronauts that are alive
astro.loc[astro['Death Mission'].isna(), 'Death Mission'] = "Alive"
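
To see why the two `.loc` assignments have to run in this order, here is a sketch on a tiny made-up frame (`sample`; the names, dates, and "Mission X" are all hypothetical).

```python
import numpy as np
import pandas as pd

arb_date = pd.Timestamp(year=2262, month=1, day=1)

# Hypothetical stand-in for `astro` after the Death Date clean-up
sample = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Death Date": [arb_date, pd.Timestamp("1986-01-28"), pd.Timestamp("1991-04-05")],
    "Death Mission": [np.nan, "Mission X", np.nan],
})

# Step 1: a real death date but no mission recorded -> "Unrelated Death"
died = sample["Death Date"] != arb_date
sample.loc[died & sample["Death Mission"].isna(), "Death Mission"] = "Unrelated Death"

# Step 2: anything still NaN has no real death date -> "Alive"
sample.loc[sample["Death Mission"].isna(), "Death Mission"] = "Alive"

print(sample[["Name", "Death Mission"]])
```

If step 2 ran first, every NaN would become "Alive" and step 1 would have nothing left to relabel.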

### Missions Column
The NaNs in this column seem to stem from astronauts that haven't had any missions yet. Let's fill these with "None"

In [29]:
# Replace NaN in Missions column with "None"
astro['Missions'] = astro['Missions'].fillna('None')

### Military Branch

Again, lots of NaNs in this column because not every astronaut served in the military. Let's go ahead and replace these with "Civilian"

In [32]:
# Replace NaN in Military Branch with "Civilian"
astro['Military Branch'] = astro['Military Branch'].fillna('Civilian')

### Majors

Some astronauts didn't go to graduate or even undergraduate school! Let's go ahead and replace the NaNs with "No Degree" (you're probably already a pro at this point :D ).

In [35]:
# Replace Undergraduate Major NaNs with "No Degree"
astro.loc[astro['Undergraduate Major'].isna(), 'Undergraduate Major'] = 'No Degree'

In [39]:
# Replace Graduate Major NaNs with "No Degree"
astro.loc[astro['Graduate Major'].isna(), 'Graduate Major'] = 'No Degree'
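
As an alternative to filling one column at a time, `fillna()` also accepts a dictionary that maps column names to replacement values. The sketch below shows this on a made-up two-row frame (`sample` and `fill_values` are hypothetical names); it is one possible shortcut, not a replacement for the steps above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the four text columns cleaned above
sample = pd.DataFrame({
    "Missions": ["Gemini 12, Apollo 11", np.nan],
    "Military Branch": ["US Air Force", np.nan],
    "Undergraduate Major": [np.nan, "Geology"],
    "Graduate Major": ["Astronautics", np.nan],
})

# One placeholder per column, matching the choices made in this section
fill_values = {
    "Missions": "None",
    "Military Branch": "Civilian",
    "Undergraduate Major": "No Degree",
    "Graduate Major": "No Degree",
}
sample = sample.fillna(fill_values)

print(sample)
print(sample.isna().sum().sum())  # total number of missing cells left
```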

That's it for cleaning! No more NaN values, which makes the dataset much easier to work with :)

## **The Power of DateTime!**

Let's look at a cool thing we can do with datetimes! First, let's convert the Birth Date column to `datetime`, just as we did with Death Date.

In [45]:
# Set Birth Date to datetime
astro['Birth Date'] = pd.to_datetime(astro['Birth Date'])

Now, let's find the age at which each astronaut was selected. Can you figure out how to create a new column holding that age? (Name the new column "Age Selected".)
- Hint: the Year column in the dataset shows the year in which they were selected to be an astronaut. Think of a way to combine that with their birth year

In [49]:
# Finding the age that each astronaut was selected
astro['Age Selected'] = astro['Year'] - astro['Birth Date'].dt.year
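
As a quick check of the calculation, here it is on two rows taken from the `head()` output earlier in the notebook. The frame name `sample` and the cast to `int` are additions for illustration.

```python
import pandas as pd

# Two rows copied from the head() output shown earlier
sample = pd.DataFrame({
    "Name": ["Joseph M. Acaba", "Buzz Aldrin"],
    "Year": [2004.0, 1963.0],
    "Birth Date": ["5/17/1967", "1/20/1930"],
})
sample["Birth Date"] = pd.to_datetime(sample["Birth Date"], format="%m/%d/%Y")

# Selection year minus birth year; cast to int because Year is stored as a float
sample["Age Selected"] = (sample["Year"] - sample["Birth Date"].dt.year).astype(int)

print(sample[["Name", "Age Selected"]])
```

Because only the selection year is known, the result can be off by one year for anyone selected before their birthday that year.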

## **Data Viz!**
Let's check out all the interesting trends within this dataset to answer our ultimate question: "Is there an ideal path to becoming an astronaut?"

### **What's the Age Distribution for Astronauts?**
Plot the age distribution as a *histogram* using plotly.express (px). At what age are most astronauts selected?

In [54]:
# Plotting Age Selected as a Histogram

# This is the most basic form to answer this question
# Are there ways to make the viz better?
px.histogram(astro['Age Selected'])

### **How many astronauts are men? How many are women?**

Let's look at the distribution of men vs. women in the astronaut dataset. A simple pie chart could work here (look at the documentation!), OR you could even incorporate gender into the histogram you made earlier...

In [60]:
# Again, the most basic form to answer this question with a pie chart
# .show() is called explicitly because a cell only auto-displays its last expression
px.pie(astro, names='Gender').show()

# Incorporating the histogram...
px.histogram(astro, x='Age Selected', color='Gender').show()
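
If you want the actual numbers behind the pie chart, `value_counts()` gives the counts and `normalize=True` gives the shares. The sketch below uses a made-up Series (`gender`, with hypothetical values); on the real data you would use `astro['Gender']` instead.

```python
import pandas as pd

# Hypothetical stand-in for astro['Gender']
gender = pd.Series(["Male", "Male", "Female", "Male"], name="Gender")

counts = gender.value_counts()                                   # raw counts
shares = gender.value_counts(normalize=True).mul(100).round(1)   # percentages

print(counts)
print(shares)
```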

### **Most common undergraduate/graduate degree?**

Let's look at the degrees that these astronauts got! Bar charts are good here (you may need two charts, one for undergrad, one for grad). Try to sort the values so the most common is on one side of the chart(s)
- Useful tip: use the `orientation` parameter in `px.bar` to make the bar chart horizontal
    - If you're unsure what this means, try Googling it!

In [77]:
# Visualizing the most common undergraduate degrees
top10_udeg = astro['Undergraduate Major'].value_counts().head(10) #This sorts and takes the top 10 majors
px.bar(top10_udeg, orientation='h')

In [78]:
# Visualizing the most common graduate degrees
top10_gdeg = astro['Graduate Major'].value_counts().head(10) #This sorts and takes the top 10 majors
px.bar(top10_gdeg, orientation='h')

### **Military vs. Civilian**

Let's see if having any military experience increases your odds of becoming an astronaut...

Plot the Military Branches/Civilian status using whichever chart(s) you like

In [86]:
# I used a bar chart here, but you could also use a pie chart
px.bar(astro['Military Branch'].value_counts())
# It looks like all the military branches combined outweigh the civilians, with Air Force on top
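
To compare the military branches combined against civilians directly, you can collapse the column into two groups before counting. This is a minimal sketch on made-up values; `branch`, `background`, and the label "Military" are hypothetical, and it assumes NaNs were already replaced with "Civilian" as in the cleaning section.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for astro['Military Branch'] after cleaning
branch = pd.Series(
    ["US Air Force", "Civilian", "US Navy", "Civilian", "US Army"],
    name="Military Branch",
)

# Anything that is not "Civilian" counts as military service
background = pd.Series(
    np.where(branch == "Civilian", "Civilian", "Military"),
    name="Background",
)
totals = background.value_counts()

print(totals)
```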

## **Answering the Question...**

Now, after your research, what would you say the "best path" to becoming an astronaut is, based on Age? Higher Education? Military Service?

- Just a heads up: The trends here don't necessarily show the "best path" to being an astronaut. Every year the requirements/needs of NASA change, and the application process is quite extensive. This is just a fun activity to look at trends among astronauts who have already been selected :)

Feel free to edit this cell for your response!

