# Is there life after graduate school?

The goal of this blog post is to explain the process of creating a dashboard visualization using *streamlit*. This dashboard visualizes data about doctorate recipients, including demographic information, field of study, and postgraduation plans. Data were collected from the National Center for Science and Engineering Statistics (NCSES), and the datasets can be found here: https://ncses.nsf.gov/pubs/nsf19301/data. Analyses were performed using the pandas library, and visualizations were created using the matplotlib and plotly libraries.

**View dashboard visualization: https://share.streamlit.io/saahithirao/bios-823-blog/hw4.py** 

**View code: https://github.com/saahithirao/bios-823-blog/blob/master/hw4.py** To run the dashboard locally, download this file and run `streamlit run hw4.py` from the command line.

**Doctorate recipients by gender & race from 2008-2016**  
This visualization displays an interactive data table of doctorate recipients by gender and race from 2008 to 2016. The user can click on a year in the sidebar to display data for that year and can choose to view data for a single gender or for all genders at the same time. Since no single dataset contained all of this information, I combined two datasets: one that contained data on females and one that contained data on males. I extracted the necessary information from each, as shown below, and merged the two dataframes. Then I created a sidebar widget that allows the user to select the year they want to see data for and to filter the data by gender. Code for this is shown below and linked above.

In [2]:
import pandas as pd
df = pd.read_excel("https://ncses.nsf.gov/pubs/nsf19301/assets/data/tables/sed17-sr-tab021.xlsx", header=3)
df = df.rename(columns={'Ethnicity, race, and citizenship status':'Race'})
df_female = (
        df.
        drop(df[df['Race'].str.contains('citizen')].index.tolist()).
        drop(df[df['Race'].str.contains('visa')].index.tolist()).
        drop(df[df['Race'].str.contains('Hispanic')].index.tolist()).
        drop(df[df['Race'].str.contains('Ethnicity')].index.tolist()).
        reset_index().
        drop(columns = ['index'])
    )
df_female["Gender"] = "Female"
df_female

| | Race | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | Gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | All doctorate recipients | 22494 | 23187 | 22488.0 | 22700 | 23527 | 24366 | 24816 | 25354 | 25256 | 25495 | Female |
| 1 | American Indian or Alaska Native | 70 | 78 | 65.0 | 79 | 65 | 64 | 55 | 77 | 73 | 62 | Female |
| 2 | Asian | 4795 | 4741 | 4662.0 | 4904 | 5117 | 5392 | 5465 | 5598 | 5521 | 5605 | Female |
| 3 | Black or African American | 1396 | 1552 | 1403.0 | 1396 | 1481 | 1554 | 1503 | 1660 | 1662 | 1756 | Female |
| 4 | White | 12704 | 13058 | 12525.0 | 12572 | 12941 | 13264 | 13324 | 13523 | 13623 | 13413 | Female |
| 5 | More than one race | 298 | 370 | 418.0 | 399 | 484 | 497 | 488 | 535 | 605 | 594 | Female |
| 6 | Other race or race not reported | 222 | 238 | 205.0 | 191 | 202 | 204 | 162 | 167 | 193 | 321 | Female |
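
The cleaning and merging steps described in this section can be sketched on a toy example. This is only an illustration under stated assumptions, not the code from `hw4.py`: the two small tables and their numbers are made up, the helper name `clean` is hypothetical, and the four separate `drop` calls are replaced here by a single regular-expression mask that matches the same four keywords.

```python
import pandas as pd

# Hypothetical miniature tables standing in for the two NCSES spreadsheets
female_raw = pd.DataFrame({
    "Race": ["All doctorate recipients", "U.S. citizen or permanent resident", "Asian"],
    2008: [100, 80, 20],
    2009: [110, 85, 25],
})
male_raw = pd.DataFrame({
    "Race": ["All doctorate recipients", "Temporary visa holder", "Asian"],
    2008: [120, 30, 40],
    2009: [125, 35, 45],
})

def clean(df, gender):
    """Drop citizenship/ethnicity summary rows and tag the remaining rows with a gender."""
    # One regex mask covers all four keywords dropped in the cell above
    unwanted = df["Race"].str.contains("citizen|visa|Hispanic|Ethnicity")
    out = df[~unwanted].reset_index(drop=True)
    out["Gender"] = gender
    return out

# Stack the two cleaned tables into one dataframe with a Gender column
combined = pd.concat(
    [clean(female_raw, "Female"), clean(male_raw, "Male")],
    ignore_index=True,
)
print(combined)
```

Because both tables share the same columns, `pd.concat` stacks them row-wise, and the `Gender` column keeps the two sources distinguishable.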


In [None]:
# Creating widget for selecting data by year and gender (separately)
import streamlit as st

st.sidebar.header("User Input")
selected_year = st.sidebar.selectbox('Year', list(reversed(range(2008, 2017))))  # 2016 down to 2008
phds = load_data(selected_year)  # load_data is not shown here; see the full code linked above
unique_gender = phds.Gender.unique()
# All genders are selected by default
select_gender = st.sidebar.multiselect('Gender', unique_gender, unique_gender)
df_selected = phds[(phds.Gender.isin(select_gender))]
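
Since `load_data` is not shown in this post, the following is a minimal sketch of what the year and gender selection might look like in plain pandas. Everything here is an assumption for illustration: the table `phds_wide`, its numbers, the helpers `select_year` and `filter_gender`, and the column name `Doctorates` are hypothetical and may differ from the code in `hw4.py`.

```python
import pandas as pd

# Hypothetical combined table: one row per race/gender pair, one column per year
phds_wide = pd.DataFrame({
    "Race": ["Asian", "White", "Asian", "White"],
    "Gender": ["Female", "Female", "Male", "Male"],
    2015: [10, 30, 12, 28],
    2016: [11, 31, 13, 29],
})

def select_year(df, year):
    """Return Race, Gender, and the counts for one year (a stand-in for load_data)."""
    return df[["Race", "Gender", year]].rename(columns={year: "Doctorates"})

def filter_gender(df, genders):
    """Keep only the rows whose Gender is among the selected values."""
    return df[df["Gender"].isin(genders)]

phds = select_year(phds_wide, 2016)            # value that would come from the selectbox
df_selected = filter_gender(phds, ["Female"])  # values that would come from the multiselect
print(df_selected)
```

In the dashboard, the year and the list of genders would come from the sidebar widgets rather than being hard-coded. If the user deselects every gender, `isin` matches nothing and the table is simply empty.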

**Number of doctorate recipients by gender over time**  

This visualization displays an interactive plot of the number of doctorate recipients over time, separated by gender. The user can hover over points to view data for that year.

In [None]:
import plotly.express as px

# Keep only the totals row, then reshape from wide (one column per year) to long format
df_select = df_female[df_female["Race"] == "All doctorate recipients"]
df2 = df_select.drop(['Gender'], axis=1)
df_long = pd.melt(df2, id_vars=['Race'], var_name='Year', value_name='phds')
df_long['Gender'] = 'Female'

# df_male is built the same way as df_female, from the dataset on male recipients
df_select_male = df_male[df_male["Race"] == "All doctorate recipients"]
df3 = df_select_male.drop(['Gender'], axis=1)
df_long2 = pd.melt(df3, id_vars=['Race'], var_name='Year', value_name='phds')
df_long2['Gender'] = 'Male'

# Stack the female and male rows into a single long dataframe
df_comb = pd.concat([df_long, df_long2], ignore_index=True)

# Line plot of doctorate counts per year, one line per gender
fig = px.line(df_comb,
                x='Year', y='phds', color='Gender',
                labels = {
                    "phds": "Number of PhDs"
                })
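
The reshaping step that feeds this plot is easier to follow on a tiny example. The sketch below uses made-up numbers in a hypothetical one-row table; it only illustrates how `pd.melt` turns one column per year into one row per year, which is the shape `px.line` needs for its `x`, `y`, and `color` arguments.

```python
import pandas as pd

# Hypothetical totals row in wide format (one column per year)
wide = pd.DataFrame({
    "Race": ["All doctorate recipients"],
    2008: [100],
    2009: [110],
    2010: [105],
})

# Each year column becomes its own row, with the count stored in "phds"
long = pd.melt(wide, id_vars=["Race"], var_name="Year", value_name="phds")
long["Gender"] = "Female"  # a scalar is broadcast to every row
print(long)
```

Assigning a scalar to the new column fills every row, so the labeling keeps working if the number of year columns changes.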