# Is There Life After Graduate School?
By Jasmine Young

### The Task

In this blog post we use the [Science and Engineering PhDs awarded in the US](https://ncses.nsf.gov/pubs/nsf19301/data) data. We analyze it with `pandas` and build a dashboard that visualizes a few interesting aspects of the data.

### Plotly & Dash
I chose Plotly for my visualizations and Dash to power my dashboard because both integrate easily into a Jupyter notebook. Plotly lets you create interactive visualizations in Python with little code, and I found Dash straightforward and easy to customize when working with Plotly figures.

### Data Analysis In Pandas
The data transformations and analysis necessary to create the dashboard can be viewed below.

In [182]:
#collapse
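import pandas as pd

# Yearly totals, used for the first dashboard graph; the header sits on the fourth row of the sheet.
table2 = pd.read_excel('tab001.xlsx', index_col=0, header=3)
# Replace '-' placeholder cells with 0.
table2 = table2.replace('-', 0)
# Turn the index back into a regular column.
table2.reset_index(level=0, inplace=True)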
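
The cleaning steps in the cell above can be tried on a small stand-in frame. This is only a sketch: the rows are hypothetical, and the explicit `pd.to_numeric` call is my addition to make sure the cleaned column ends up numeric.

```python
import pandas as pd

# Hypothetical rows standing in for the spreadsheet; the real values differ.
raw = pd.DataFrame(
    {
        "Doctorate recipients": [100, 110, 121],
        "% change from previous year": ["-", 10.0, 10.0],
    },
    index=pd.Index([1958, 1959, 1960], name="Year"),
)

# Swap the '-' placeholder for 0, as in the cell above.
clean = raw.replace("-", 0)
# Added step (an assumption, not in the original): force a numeric dtype.
clean["% change from previous year"] = pd.to_numeric(clean["% change from previous year"])
# Move the 'Year' index into an ordinary column.
clean = clean.reset_index()
```

After `reset_index`, `Year` is a regular column, which is what lets the dashboard callback read it as `df.Year`.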

In [None]:
#collapse
table4 = pd.read_excel('tab006.xlsx', index_col=0, header=[3,4])

In [75]:
#collapse
us_state_to_abbrev = {
    "Alabama": "AL",
    "Alaska": "AK",
    "Arizona": "AZ",
    "Arkansas": "AR",
    "California": "CA",
    "Colorado": "CO",
    "Connecticut": "CT",
    "Delaware": "DE",
    "Florida": "FL",
    "Georgia": "GA",
    "Hawaii": "HI",
    "Idaho": "ID",
    "Illinois": "IL",
    "Indiana": "IN",
    "Iowa": "IA",
    "Kansas": "KS",
    "Kentucky": "KY",
    "Louisiana": "LA",
    "Maine": "ME",
    "Maryland": "MD",
    "Massachusetts": "MA",
    "Michigan": "MI",
    "Minnesota": "MN",
    "Mississippi": "MS",
    "Missouri": "MO",
    "Montana": "MT",
    "Nebraska": "NE",
    "Nevada": "NV",
    "New Hampshire": "NH",
    "New Jersey": "NJ",
    "New Mexico": "NM",
    "New York": "NY",
    "North Carolina": "NC",
    "North Dakota": "ND",
    "Ohio": "OH",
    "Oklahoma": "OK",
    "Oregon": "OR",
    "Pennsylvania": "PA",
    "Rhode Island": "RI",
    "South Carolina": "SC",
    "South Dakota": "SD",
    "Tennessee": "TN",
    "Texas": "TX",
    "Utah": "UT",
    "Vermont": "VT",
    "Virginia": "VA",
    "Washington": "WA",
    "West Virginia": "WV",
    "Wisconsin": "WI",
    "Wyoming": "WY",
    "District of Columbia": "DC",
    "American Samoa": "AS",
    "Guam": "GU",
    "Northern Mariana Islands": "MP",
    "Puerto Rico": "PR",
    "United States Minor Outlying Islands": "UM",
    "U.S. Virgin Islands": "VI",
}

In [183]:
#collapse
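# Sum the sub-field columns within each top-level subject category.
t401 = table4.groupby(level=0, axis=1).sum()
# Treat 'DD' cells as 0.
t401 = t401.replace('DD',0)
# Labels such as 'United Statesd' or 'Life sciencesb' appear to carry footnote letters from the spreadsheet.
t401['abbrev'] = t401.index.map(us_state_to_abbrev)
t401 = t401.drop(['United Statesd'])  # drop the national total row
# Cast every column to string so the hover text can be built by concatenation.
t401 = t401.astype(str)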
t401['Text'] = ' Education: ' + t401['Education'] + ' Engineering: ' + t401['Engineering'] + '<br>' + \
    ' Humanities & Arts: ' + t401['Humanities and arts'] + ' Life Sciences: ' + \
    t401['Life sciencesb'] + '<br>' + ' Math & Computer Sciences: ' + \
    t401['Mathematics and computer sciences'] + ' Other: ' + t401['Otherc'] + \
    '<br>' + ' Physical & Earth Sciences: ' + t401['Physical sciences and earth sciences '] + \
    ' Psychology & Social Sciences: ' + t401['Psychology and social sciences']
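
The cell above relies on two ideas: collapsing a two-level column header by summing over its top level, and mapping state names in the index to two-letter codes. Below is a minimal sketch on a toy frame. The field names and numbers are hypothetical, and the transpose-based form is a swapped-in alternative to `groupby(..., axis=1)`, which recent pandas releases deprecate.

```python
import pandas as pd

# Toy two-level header: (broad field, sub-field). Values are made up.
columns = pd.MultiIndex.from_tuples([
    ("Engineering", "Chemical"),
    ("Engineering", "Civil"),
    ("Education", "Teaching"),
])
toy = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=["Ohio", "Texas"], columns=columns)

# Sum sub-fields within each broad field.
# Transposing, grouping rows, and transposing back avoids groupby(axis=1).
by_field = toy.T.groupby(level=0).sum().T

# Map the state names in the index to two-letter abbreviations.
by_field["abbrev"] = by_field.index.map({"Ohio": "OH", "Texas": "TX"})
```

Names missing from the mapping come back as `NaN`, so it is worth checking the `abbrev` column before plotting.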

In [None]:
#collapse
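t402 = table4.groupby(level=0, axis=1).sum()
t402 = t402.iloc[: , :-1]  # drop the last column
# Keep only the national row, reshaped into a two-column frame for the bar chart.
t402 = pd.DataFrame(t402.loc['United Statesd']).reset_index()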

In [None]:
#collapse
table5 = pd.read_excel('tab015.xlsx', index_col=0, header=3)
table5 = table5.iloc[: , :-1]

### Dashboard

The collapsed code below shows how I created the dashboard. The dashboard highlights a few key points about the doctoral data:
- The increase in doctorate recipients over time, and how that increase has slowed in recent decades.
- A comparison of the number of doctorate recipients in each subject category. Life Sciences still produces the most doctorate recipients.
- How the number of male and female doctorate recipients has changed over time.
- How the number of doctoral graduates varies by state (specifically in the year 2017).
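
The first graph can switch between raw counts and the year-over-year percent change. The spreadsheet already provides that column, but as a sketch of what it represents, the same quantity can be derived with pandas. The counts below are hypothetical toy values.

```python
import pandas as pd

# Hypothetical yearly counts, NOT the real NSF numbers.
recipients = pd.Series(
    [100, 110, 121],
    index=[1958, 1959, 1960],
    name="Doctorate recipients",
)

# Percent change relative to the previous year; the first year has no predecessor.
pct = (recipients.pct_change() * 100).round(1)
```

The first entry is `NaN` because there is no earlier year to compare against.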

In [179]:
#collapse
from dash.dependencies import Input, Output
from dash import dcc
from dash import html
from jupyter_dash import JupyterDash
import pandas as pd
import plotly.graph_objects as go

app = JupyterDash(__name__)

def stock_prices():
    # Despite its name, this builds the bar chart of doctorate recipients per subject category.
    fig = go.Figure([go.Bar(x=t402['State or location'], y=t402['United Statesd'])])
    fig.update_layout(title='Doctorate Recipients by Subject Category',
                      xaxis_title='Subject',
                      yaxis_title='Number of Doctorate Recipients',
                      xaxis_tickangle=-45)
    return fig

def other_graph():
    # Two line traces: female and male doctorate recipients per year.
    fig = go.Figure([go.Scatter(x=table5.columns, y=table5.loc['Female'], name='Female')])
    fig.add_trace(go.Scatter(x=table5.columns, y=table5.loc['Male'], name='Male'))
    fig.update_layout(title='Doctorate Recipients By Sex Over Time',
                      xaxis_title='Year',
                      yaxis_title='Number of Doctorate Recipients')
    return fig

def make_map():
    # Choropleth of doctoral graduates per state, with the per-subject breakdown as hover text.
    fig = go.Figure(data=go.Choropleth(
        locations=t401['abbrev'],  # two-letter state codes
        z=t401['Totala'].astype(float),  # data to be color-coded
        locationmode='USA-states',  # interpret `locations` as US state codes
        colorscale='Reds',
        text=t401['Text'],
        colorbar_title="Number of Doctoral Graduates",
    ))
    fig.update_layout(
        title_text='Doctoral Graduates By State & Subject Area (2017)',
        geo_scope='usa',  # limit map scope to the USA
    )
    return fig

app.layout = html.Div([
    # Top row, left: dropdown-driven line chart.
    html.Div([
        dcc.Dropdown(
            id='my-dropdown',
            options=[
                {'label': 'Doctorate Recipients Over Time', 'value': 'Doctorate recipients'},
                {'label': 'Percent Change in Doctorate Recipients Over Time', 'value': '% change from previous year'}
            ],
            value='Doctorate recipients'
        ),
        dcc.Graph(id='my-graph')
    ], style={'width': '49%', 'display': 'inline-block'}),
    # Top row, right: bar chart by subject category.
    html.Div([
        dcc.Graph(id='line_plot', figure=stock_prices())
    ], style={'width': '49%', 'display': 'inline-block'}),
    # Full-width rows: recipients by sex, then the state map.
    html.Div([
        dcc.Graph(id='other_line_plot', figure=other_graph())
    ], style={'display': 'block'}),
    html.Div([
        dcc.Graph(id='map', figure=make_map())
    ], style={'display': 'block'})
])

@app.callback(Output('my-graph', 'figure'), [Input('my-dropdown', 'value')])
def update_graph(selected_dropdown_value):
    df = table2
    return {
        'data': [{
            'x': df.Year,
            'y': df[str(selected_dropdown_value)]
        }],
        'layout': {'margin': {'l': 40, 'r': 0, 't': 20, 'b': 30}}
    }
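
The callback above maps the dropdown value directly to a column of `table2`. As a sketch, the same logic can live in a plain function, which makes it easy to check without a running Dash app. The name `build_line_figure` and the toy frame are hypothetical; only the column names come from the code above.

```python
import pandas as pd

def build_line_figure(df, column):
    """Return a Plotly-style figure dict plotting `column` against 'Year'."""
    if column not in df.columns:
        # Fail early with a clear message instead of a bare KeyError deep in the callback.
        raise KeyError(f"Unknown column: {column!r}")
    return {
        "data": [{
            "x": df["Year"].tolist(),
            "y": df[column].tolist(),
            "type": "scatter",
            "mode": "lines",
        }],
        "layout": {"margin": {"l": 40, "r": 0, "t": 20, "b": 30}},
    }

# Hypothetical stand-in for table2; the real values differ.
toy = pd.DataFrame({
    "Year": [1958, 1959, 1960],
    "Doctorate recipients": [100, 110, 121],
    "% change from previous year": [0.0, 10.0, 10.0],
})
figure = build_line_figure(toy, "% change from previous year")
```

Inside the app, the callback body would then reduce to returning `build_line_figure(table2, selected_dropdown_value)`.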


In [180]:
app.run_server(mode='inline')