# <div align = "center"> Data Analysis and Visualization of Astronauts from 1959 to 2013 </div>

In this notebook, we will analyze a NASA dataset that contains information about astronauts from 1959 to 2013. The goal is to find out whether NASA astronauts have anything in common, to visualize those commonalities or trends, and to answer a couple of questions:

### Is there an “ideal” astronaut? 
- In this case, "ideal" means the best candidate that can be selected as an astronaut

### Can the “best path” to being an astronaut be mapped out?
- "Best path" is defined as the best actions to take or activities to participate in (majors, military service, etc.) prior to being selected as an astronaut. 

## **Environment Set-Up**
Setting up pandas/numpy/plotly.express and the dataset we will be using

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px

In [None]:
#Astronaut Dataset
astro = pd.read_csv("https://raw.githubusercontent.com/ishaandey/node/master/week-5/practice/nasa_astronauts.csv")

Take a quick peek at the data... It's always good practice to get a quick feel for what your data looks like
- Useful Functions: `head()`, `info()`
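
As a minimal sketch of this first look, the snippet below builds a small made-up frame so that it runs on its own. The column names follow the ones this notebook refers to, but the rows and the variable name `toy` are invented for illustration; in the notebook you would call the same methods on `astro`. The `isna().sum()` line is an extra that is not listed above, added because it shows which columns will need cleaning.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; the real data lives in `astro`.
toy = pd.DataFrame({
    "Name": ["Astronaut A", "Astronaut B", "Astronaut C"],
    "Year": [1996, 1978, 2009],
    "Military Branch": ["US Navy", np.nan, np.nan],
    "Death Date": ["4/23/2001", np.nan, np.nan],
})

first_rows = toy.head(2)  # first two rows (head() defaults to five)
print(first_rows)

toy.info()  # prints column dtypes and non-null counts

missing_per_column = toy.isna().sum()  # number of NaNs in each column
print(missing_per_column)
```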

## **Cleaning Up the Data**

### Death Date
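It looks like there are lots of NaNs in the Death Date column (not everyone in this dataset is deceased). Let's fill the NaNs with an arbitrary far-future date (01/01/2262), which still fits within the range of nanosecond-resolution pandas timestamps (that range ends in April 2262).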

- Useful Functions: `pd.Timestamp()`, `.fillna()`
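
One possible way to do the fill is sketched below on a tiny made-up frame. The rows, the variable names `toy` and `placeholder`, and the date string format are assumptions for illustration; the column name `Death Date` comes from this notebook. The column is built with `dtype=object` to mimic a text column that contains NaNs.

```python
import numpy as np
import pandas as pd

# Made-up rows; dtype=object mimics a text column that contains NaNs.
toy = pd.DataFrame({
    "Name": ["Astronaut A", "Astronaut B"],
    "Death Date": pd.Series(["4/23/2001", np.nan], dtype=object),
})

placeholder = pd.Timestamp("2262-01-01")  # arbitrary far-future date

# fillna only touches the missing entries; existing values stay as they are.
toy["Death Date"] = toy["Death Date"].fillna(placeholder)
print(toy)
```

After this step the column holds a mix of text and one `Timestamp`, which is why the next step converts the whole column to datetime.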

In [None]:
# Change the NaNs in the Death Date column to an arbitrary date (01/01/2262)


Now, let's convert the column to datetime so that it is easier to work with later on
- Useful Functions: `pd.to_datetime()`

In [None]:
# Change Death Date to a datetime object


### Death Mission
Similar to replacing the NaNs in Death Date, let's fill the NaNs in this column with a placeholder.

There seem to be two main "categories" of NaNs...
1. Astronauts that do not have a death date -> Let's fill these with "Alive"
2. Astronauts that passed away but not directly from a space mission -> Let's fill these with "Unrelated Death"
- Useful Functions:  `.loc[]`, masking and subsetting!
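
The two categories above can be sketched with boolean masks and `.loc[]`. This is only one way to do it, shown on made-up rows: it assumes `Death Date` has already been filled with the 2262 placeholder and converted to datetime, and the names `toy`, `missing_mission`, and `is_alive` are hypothetical.

```python
import numpy as np
import pandas as pd

placeholder = pd.Timestamp("2262-01-01")

# Made-up rows: one living astronaut, one unrelated death, one mission death.
toy = pd.DataFrame({
    "Name": ["Astronaut A", "Astronaut B", "Astronaut C"],
    "Death Date": pd.to_datetime(["2262-01-01", "2001-04-23", "1990-06-15"]),
    "Death Mission": pd.Series([np.nan, np.nan, "Mission X"], dtype=object),
})

# Build both masks before changing anything, so they describe the original NaNs.
missing_mission = toy["Death Mission"].isna()
is_alive = toy["Death Date"] == placeholder

# Deceased (real death date) but no mission recorded.
toy.loc[missing_mission & ~is_alive, "Death Mission"] = "Unrelated Death"
# Placeholder death date means the astronaut is still alive.
toy.loc[missing_mission & is_alive, "Death Mission"] = "Alive"
print(toy)
```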

In [None]:
# Replacing the NaNs of astronauts that have an unrelated death


In [None]:
# Replacing the NaNs of astronauts that are alive


### Missions Column
The NaNs in this column seem to stem from astronauts that haven't had any missions yet. Let's fill these with "None"
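
A minimal sketch of a single-column fill, using made-up rows and a hypothetical frame named `toy`:

```python
import numpy as np
import pandas as pd

# Made-up rows; the second astronaut has no missions recorded.
toy = pd.DataFrame({
    "Name": ["Astronaut A", "Astronaut B"],
    "Missions": pd.Series(["Mission X, Mission Y", np.nan], dtype=object),
})

# Replace missing entries with the text "None".
toy["Missions"] = toy["Missions"].fillna("None")
print(toy)
```

Note that the text `"None"` is an ordinary string, not Python's `None`, so `isna()` no longer flags these rows afterwards.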

In [None]:
# Replace NaN in Missions column with "None"


### Military Branch

Again, lots of NaNs in this column because not every astronaut served in the military. Let's go ahead and replace these with "Civilian"

In [None]:
# Replace NaN in Military Branch with "Civilian"


### Majors

Some astronauts didn't go to graduate or even undergraduate school! Let's go ahead and replace the NaNs with "No Degree" (you're probably already a pro at this point :D ).
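
When several columns need a placeholder, one option is to pass a dictionary to `DataFrame.fillna`, mapping each column name to its fill value. The sketch below uses made-up rows and the hypothetical names `toy` and `fill_values`; filling each column separately, as the two cells below suggest, works just as well.

```python
import numpy as np
import pandas as pd

# Made-up rows with gaps in both major columns.
toy = pd.DataFrame({
    "Undergraduate Major": pd.Series(
        ["Physics", np.nan, "Mechanical Engineering"], dtype=object
    ),
    "Graduate Major": pd.Series(
        [np.nan, np.nan, "Aerospace Engineering"], dtype=object
    ),
})

# Map each column to the value that should replace its NaNs.
fill_values = {
    "Undergraduate Major": "No Degree",
    "Graduate Major": "No Degree",
}
toy = toy.fillna(value=fill_values)
print(toy)
```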

In [None]:
# Replace Undergraduate Major NaNs with "No Degree"


In [None]:
# Replace Graduate Major NaNs with "No Degree"


That's it for cleaning! The columns we worked on no longer contain NaN values, which makes the dataset much easier to work with :)

## **The Power of DateTime!**

Let's look at something cool we can do with datetimes! First, let's convert the Birth Date column to `datetime`, just as we did with Death Date.

In [None]:
# Set Birth Date to datetime


Now, let's find the age at which each astronaut was selected. Could you figure out how to create a new column with that age? (Name the new column "Age Selected".)
- Hint: the Year column in the dataset gives the year in which each astronaut was selected. Think of a way to use that with their birth year
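
One way to follow the hint is sketched below: convert `Birth Date` to datetime, pull out the year with the `.dt.year` accessor, and subtract it from `Year`. The rows are made up, `toy` is a hypothetical name, and the `%m/%d/%Y` date format is an assumption about how the dates are written, so check it against the real column first.

```python
import pandas as pd

# Made-up rows; the real values come from `astro`.
toy = pd.DataFrame({
    "Name": ["Astronaut A", "Astronaut B"],
    "Year": [1996, 1978],                     # year of selection
    "Birth Date": ["5/17/1967", "10/26/1946"],
})

# Assumed month/day/year text format.
toy["Birth Date"] = pd.to_datetime(toy["Birth Date"], format="%m/%d/%Y")

# Selection year minus birth year.
toy["Age Selected"] = toy["Year"] - toy["Birth Date"].dt.year
print(toy)
```

Because only the years are compared, the result can be off by one for astronauts whose birthday falls after the selection date in that year.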

In [None]:
# Finding the age that each astronaut was selected


## **Data Viz!**
Let's check out the interesting trends within this dataset to answer our ultimate question: "Is there an ideal path to becoming an astronaut?"

### **What's the Age Distribution for Astronauts?**
Plot the age distribution as a *histogram* using plotly.express (px). At what age are most astronauts selected?

In [None]:
# Plotting Age Selected as a Histogram


### **How many astronauts are men? How many are women?**

Let's look at the distribution of men vs. women in the astronaut dataset. A simple pie chart could work here (look at the documentation!), OR you could even incorporate gender into the histogram you made earlier...

In [None]:
# Plotting gender distributions within NASA Astronauts

### **Most common undergraduate/graduate degree?**

Let's look at the degrees that these astronauts got! Bar charts are good here (you may need two charts, one for undergrad, one for grad). Try to sort the values so the most common is on one side of the chart(s)
- Useful tip: use the `orientation` parameter in `px.bar` to make the bar chart horizontal
    - If you're unsure what this means, try Googling it!

In [None]:
# Visualizing the most common undergraduate degrees


In [None]:
# Visualizing the most common graduate degrees


### **Military vs. Civilian**

Let's see how common military experience is among selected astronauts, and whether it looks like it could help your odds of becoming one...

Plot the Military Branches/Civilian status using whichever chart(s) you like
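
Before plotting, it can help to compute the numbers the chart will show. The sketch below assumes the NaNs in `Military Branch` were already replaced with "Civilian" during cleaning; the rows, the branch labels, and the names `toy`, `branch_share`, `served`, and `status_counts` are made up for illustration.

```python
import pandas as pd

# Made-up rows; NaNs are assumed to be filled with "Civilian" already.
toy = pd.DataFrame({
    "Military Branch": [
        "US Navy", "Civilian", "US Air Force", "Civilian",
        "US Navy", "Civilian", "US Navy", "Civilian",
    ]
})

# Share of astronauts in each branch (fractions that sum to 1).
branch_share = toy["Military Branch"].value_counts(normalize=True)
print(branch_share)

# Collapse every branch into a simple Military / Civilian split.
served = toy["Military Branch"] != "Civilian"
status_counts = served.map({True: "Military", False: "Civilian"}).value_counts()
print(status_counts)
```

Either result can be handed to a `px.bar` or `px.pie` call. Keep in mind that these shares describe astronauts who were already selected, not everyone who applied.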

In [None]:
# Visualize military/civilian distributions


## **Answering the Question...**

Now, after your research, what would you say the "best path" to becoming an astronaut is, based on age, higher education, and military service?

- Just a heads up: the trends here don't necessarily show the "best path" to becoming an astronaut. NASA's requirements and needs change every year, and the application process is quite extensive. This is just a fun activity to look at trends among astronauts who were already selected :)

Feel free to edit this cell for your response!

