# UMD Salary Guide Data

## Contents
1. [Introduction](#intro)
2. [About the Data](#about)
3. [Scraping](#scrape)
4. [Tidying](#tidy)
5. [Exploratory Data Analysis and Data Visualization](#eda)
6. [Hypothesis Testing and Analysis](#hypothesis)
6. [Conclusion](#conclusion)
7. [Future Work](#future)
8. [References](#references)

## Required Tools

<a id='intro'></a>

## 1. Introduction
For the University of Maryland community, the Diamondback’s yearly salary guides are an often browsed dataset. (If you are not from UMD, we still welcome you to take a look!) After these salary guides are released, many basic observations can be made about the highest paying faculty, your favorite professors’ salaries, etc. However, any more complicated observations would require the power and knowledge of computational analysis and data science techniques. More generally, given a dataset, it’s not always clear what next steps to take when jumping from initial observations to more complex conclusions and data visualizations. Therefore, we aim to leave the audience with a better understanding of data science protocols through analyzing trends in UMD faculty and staff salaries. 

<a id='about'></a>

## 2. About the Data
The first step to data analysis is to choose and obtain a dataset. In our case, this is __[UMD Salary Data from the Diamondback](http://salaryguide.diamondbacklab.com/)__. They state on their website, "Each year, the university provides this public data to The Diamondback in a basic Excel spreadsheet." However, we received no reponse to our multiple requests for these spreadsheets, which was even more surprising as this data is public information. Thus, we had no choice but to scrape the salary data from their website. If you are interested in how we scraped this data, we will explain this process in the next section. On the other hand, if you are not too interested in the scraping process, that is okay too since we've scraped the yearly data into csv files for you! You can access the scraped data from __[our repository](https://github.com/rosegaray/UMDSalaryGuideData)__ and skip the next section.

Note: Another dataset we could have used comes from the __[Baltimore Sun](http://www.baltimoresun.com/news/data/bal-public-salaries-archive-20150415-htmlstory.html)__, who publish salary data from the entire state of Maryland. However, we decided to use the data from the Diamondback because we want to focus on UMD and the information in their salary guides are a bit more specific, as it includes both the school and department that a UMD employee belongs to.

<a id='scrape'></a>

## 3. Scraping

<a id='another_cell'></a>

<a id='tidy'></a>

## 4. Tidying

In [721]:
# !pip install plotly
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly 
plotly.tools.set_credentials_file(username='achang96', api_key='MuchOgtjBpKostjwu4RF')

import plotly.plotly as py
import plotly.graph_objs as go
import seaborn as sns

In [722]:
pd.set_option('max_rows', 50)
# Grabbing each year's data and storing it in a pandas frame
df_17 = pd.read_csv("2017_data.csv")
# Creating a Year column to each dataset
df_17["Year"] = "2017"
# Dropping all nans
df_17.dropna(axis=0, how='any', inplace=True)

# Dropped a non academic dept - necessary in order to split by '-'
df_17 = df_17[df_17.Department != "Office of Inst Research, Planning & Assessment"]

#Split Department column into 2 columns - School and Department 
df_17 = df_17.assign(School= df_17.Department.apply(lambda x: x.split("-",1)[0]))
df_17 = df_17.assign(Dept= df_17.Department.apply(lambda x: x.split("-",1)[1]))

#Dropping department column
df_17.drop('Department',axis=1,inplace=True)

#Converting salary to float
df_17 = df_17.assign(Salary = df_17.Salary.apply(lambda x: float(x[1:].replace(',',''))))
df_17.reset_index(drop=True, inplace=True)

# View all unique departments
all_departments = df_17.School.unique()
all_depts = np.sort(all_departments)
all_depts


array(['AGNR', 'ARCH', 'ARHU', 'BMGT', 'BSOS', 'CMNS', 'DIT', 'EDUC',
       'ENGR', 'EXST', 'GRAD', 'INFO', 'JOUR', 'LIBR', 'PLCY', 'PRES',
       'SPHL', 'SVPAAP', 'UGST', 'USG', 'VPAF', 'VPR', 'VPSA', 'VPUR'], dtype=object)

In [723]:
#Keep academic departments only 
#Drop DIT, EXST (extended studies), GRAD, LIBR, PRES, SVPAAP, UGST, USG, VPAF, VPR, VPSA, VPUR 
to_drop = ["DIT", "EXST" ,"GRAD" ,"LIBR" ,"PRES","SVPAAP" ,"UGST" ,"USG" ,"VPAF" , "VPR" ,"VPSA", "VPUR"]
df_17 = df_17[~df_17.School.isin(to_drop)]
depts = np.sort((df_17.School).unique())
depts

array(['AGNR', 'ARCH', 'ARHU', 'BMGT', 'BSOS', 'CMNS', 'EDUC', 'ENGR',
       'INFO', 'JOUR', 'PLCY', 'SPHL'], dtype=object)

In [724]:
#Get rid of duplicates
df_17 = df_17.groupby(['Name']).agg({'Title':'first',"Salary":np.sum,\
                                     "Year":'first',"School":'first',"Dept":'first'}).reset_index()

In [725]:
#sort dataframe by school
df_17.sort_values('School', inplace=True)
df_17.reset_index(drop=True, inplace=True)
df_17

Unnamed: 0,Name,Title,Salary,Year,School,Dept
0,"Perise, Michael Frederick",Agric Tech Lead,36178.00,2017,AGNR,AES-Agriculture Experiment Station
1,"Calder, Vivian A.",Agric Tech,45151.76,2017,AGNR,AES-Agriculture Experiment Station
2,"Caldwell, Crystal",Coordinator,50307.57,2017,AGNR,Animal & Avian Sciences
3,"Justice, David M.",Manager,60358.49,2017,AGNR,AES-Agriculture Experiment Station
4,"Callahan, Mary Theresa Louise",Fac Asst,43865.00,2017,AGNR,Plant Science & Landscape Architecture
5,"Sherrard, Ann Carroll",Principal Agent,93827.28,2017,AGNR,UME-West Region
6,"Shi, Meiqing",Asst Prof,85497.54,2017,AGNR,Veterinary Medicine Program
7,"Funk, David H.",Manager,62659.81,2017,AGNR,AES-Agriculture Experiment Station
8,"Korir, Robert C.",Lab Res Tech,41701.60,2017,AGNR,Plant Science & Landscape Architecture
9,"Ma, Li",Asst Prof,132027.42,2017,AGNR,Animal & Avian Sciences


In [726]:
#Viewing all titles
depts = np.sort((df_17.Title).unique())
depts

array(['Acad Adv', 'Acad Prog Spec', 'Accompanist', 'Account Clerk I',
       'Account Clerk II', 'Account Clerk III', 'Accountant',
       'Accountant I', 'Accounting Assoc', 'Adjunct Assoc Prof',
       'Adjunct Asst Prof', 'Adjunct Prof', 'Admin Asst I',
       'Admin Asst II', 'Administrator', 'Adv Consul', 'Advisor', 'Agent',
       'Agent Assoc', 'Agric Tech', 'Agric Tech Lead', 'Agric Tech Supv',
       'Agric Worker I', 'Agric Worker II', 'Analyst', 'Arch Tech I',
       'Asoc Prof &Assoc Dir', 'Assoc Clin Prof', 'Assoc Dean',
       'Assoc Dir', 'Assoc Prof', 'Assoc Prof & Assoc Dean',
       'Assoc Prof & Dir', 'Assoc Prof &Chair', 'Assoc Prof Assoc Chair',
       'Assoc Prof Emeritus', 'Assoc Res Eng', 'Assoc Res Prof',
       'Assoc Res Scholar', 'Assoc Res Sci', 'Asst', 'Asst Art-In-Res',
       'Asst Clin Prof', 'Asst Dean', 'Asst Dean & Dir', 'Asst Dir',
       'Asst Editor', 'Asst Inst', 'Asst Mgr', 'Asst Prof',
       'Asst Prog Dir', 'Asst Res Eng', 'Asst Res Schl', '

In [727]:
# Keep only teaching faculty 
to_keep = ['Adjunct Assoc Prof','Adjunct Asst Prof','Adjunct Prof',"Asoc Prof &Assoc Dir",
"Assoc Clin Prof","Assoc Dean","Assoc Dean & Assoc Director""Assoc Dir","Assoc Prof","Assoc Prof & Assoc Dean","Assoc Prof & Dir","Assoc Prof &Chair",
"Assoc Prof Assoc Chair","Assoc Prof Emeritus","Assoc Res Eng","Assoc Res Prof","Assoc Res Scholar","Assoc Res Sci","Asst Rsch Prof",
"Clin Prof","College Park Professor","Dist Univ Prof","Dist Univ Prof & Dir","Dist Univ Prof Chair","Dist Univ Prof Emerita","Dist Unv Prof, Rgnts Prof, Dir","Jr Lecturer",
"Lect & Dir","Lecturer","Post-Doc Assoc","Prin Lecturer","Prof","Prof & Act Assoc Dean","Prof & Act Chair","Prof & Act Dir","Prof & Area Chair","Prof & Assoc Chair",
"Prof & Assoc Dean","Prof & Assoc Dir","Prof & Chair","Prof & Dir","Prof And Dean","Prof Emerita","Prof Emeritus","Prof Of Practice",
"Res Prof","Res Prof & Dir","Res Prof Emeritus","Senior Lecturer","Visit Assoc Prof","Visit Asst Prof","Visit Lecturer","Visit Prof",
"Visit Res Prof","Visiting Assoc Res Prof","Visiting Asst Rsch Prof", "Asst Prof"]
# Drop row if the title is not in to keep
df_17 = df_17[df_17.Title.isin(to_keep)]
df_17 = df_17.reset_index(drop=True)
# Show depts kept
depts = np.sort((df_17.Title).unique())
depts

array(['Adjunct Assoc Prof', 'Adjunct Asst Prof', 'Adjunct Prof',
       'Asoc Prof &Assoc Dir', 'Assoc Clin Prof', 'Assoc Dean',
       'Assoc Prof', 'Assoc Prof & Assoc Dean', 'Assoc Prof & Dir',
       'Assoc Prof &Chair', 'Assoc Prof Assoc Chair',
       'Assoc Prof Emeritus', 'Assoc Res Eng', 'Assoc Res Prof',
       'Assoc Res Scholar', 'Assoc Res Sci', 'Asst Prof', 'Asst Rsch Prof',
       'Clin Prof', 'College Park Professor', 'Dist Univ Prof',
       'Dist Univ Prof & Dir', 'Dist Univ Prof Chair',
       'Dist Univ Prof Emerita', 'Dist Unv Prof, Rgnts Prof, Dir',
       'Jr Lecturer', 'Lect & Dir', 'Lecturer', 'Post-Doc Assoc',
       'Prin Lecturer', 'Prof', 'Prof & Act Assoc Dean',
       'Prof & Act Chair', 'Prof & Act Dir', 'Prof & Area Chair',
       'Prof & Assoc Chair', 'Prof & Assoc Dean', 'Prof & Assoc Dir',
       'Prof & Chair', 'Prof & Dir', 'Prof And Dean', 'Prof Emerita',
       'Prof Emeritus', 'Prof Of Practice', 'Res Prof', 'Res Prof & Dir',
       'Res Prof E

In [728]:
# Sort by Schol, then Salary
df_17.sort_values(['School','Salary'], ascending=[True,False],inplace=True)
df_17.reset_index(drop=True, inplace=True)
df_17

Unnamed: 0,Name,Title,Salary,Year,School,Dept
0,"Beyrouty, Craig",Prof And Dean,343374.97,2017,AGNR,College of Agriculture & Natural Resources
1,"Wei, Cheng-I",Prof & Dir,282197.66,2017,AGNR,College of Agriculture & Natural Resources
2,"Chambers, Robert G.",Prof,245190.00,2017,AGNR,Agricultural & Resource Economics
3,"Samal, Siba K.",Prof & Chair,230668.13,2017,AGNR,Veterinary Medicine Program
4,"Hanson, James C.",Prof & Chair,218461.98,2017,AGNR,Agricultural & Resource Economics
5,"Meng, Jianghong",Prof & Dir,212255.84,2017,AGNR,Nutrition and Food Science
6,"Bowerman, William W",Prof & Chair,210197.57,2017,AGNR,Environmental Science & Technology
7,"Williams, Roberton C III",Prof,197733.22,2017,AGNR,Agricultural & Resource Economics
8,"Murphy, Angus",Prof & Chair,197681.46,2017,AGNR,Plant Science & Landscape Architecture
9,"Shirmohammadi, Adel",Prof & Assoc Dean,196289.96,2017,AGNR,AES-Agriculture Experiment Station


In [729]:
#Highest paid people overall
df_17.sort_values(['Salary'], ascending=[False ],inplace=True)
df_17.reset_index(drop=True, inplace=True)
df_17.head(20)

Unnamed: 0,Name,Title,Salary,Year,School,Dept
0,"Das Sarma, Sankar",Dist Univ Prof & Dir,411978.92,2017,CMNS,Physics
1,"Maksimovic, Vojislav",Prof & Area Chair,408038.69,2017,BMGT,Finance
2,"Wedel, Michel",Dist Univ Prof,398688.68,2017,BMGT,Marketing
3,"Triantis, Alexander J.",Prof And Dean,394017.29,2017,BMGT,Robert H. Smith School of Business
4,"Fox, Nathan A.",Dist Univ Prof,393091.0,2017,EDUC,Human Development and Quantitative Methodology
5,"Rust, Roland T.",Dist Univ Prof,391907.75,2017,BMGT,Marketing
6,"Tronetti, Rajshree Agarwal",Prof,387850.0,2017,BMGT,Management & Organization
7,"Monroe, Christopher",Dist Univ Prof,387730.33,2017,CMNS,Physics
8,"Banavar, Jayanth R.",Prof And Dean,378750.0,2017,CMNS,"College of Computer, Math & Natural Sciences"
9,"Tadmor, Eitan",Dist Univ Prof,362900.95,2017,CMNS,Ctr for Scientific Computation and Math Modeling


In [755]:
# Function to similarly wrangle the data for years 2013-2016. 
# Could have just used this function for 2017 as well, but wanted to walk you through the steps above :)
def wrangle(year): 
    csv = str(year) + "_data.csv"
    df = pd.read_csv(csv)
    

    df["Year"] = str(year)
    df.dropna(axis=0, how='any', inplace=True)
    df = df[df.Department != "Office of Inst Research, Planning & Assessment"]
    
    df = df.assign(School= df.Department.apply(lambda x: x.split("-",1)[0]))
    df = df.assign(Dept= df.Department.apply(lambda x: x.split("-",1)[1]))

    df.drop('Department',axis=1,inplace=True)
    df = df.assign(Salary = df.Salary.apply(lambda x: float(x[1:].replace(',',''))))
    df.reset_index(drop=True, inplace=True)

    all_departments = df.School.unique()
    all_depts = np.sort(all_departments)
    
    to_drop = ["DIT", "EXST" ,"GRAD" ,"LIBR" ,"PRES","SVPAAP" ,"UGST" ,"USG" ,"VPAF" , "VPR" ,"VPSA", "VPUR"]
    df = df[~df.School.isin(to_drop)]
    
    df = df.groupby(['Name']).agg({'Title':'first',"Salary":np.sum,\
                                     "Year":'first',"School":'first',"Dept":'first'}).reset_index()
    
    df.sort_values('School', inplace=True)
    df.reset_index(drop=True, inplace=True)
    
    to_keep = ['Adjunct Assoc Prof','Adjunct Asst Prof','Adjunct Prof',"Asoc Prof &Assoc Dir",
        "Assoc Clin Prof","Assoc Dean","Assoc Dean & Assoc Director""Assoc Dir","Assoc Prof","Assoc Prof & Assoc Dean","Assoc Prof & Dir","Assoc Prof &Chair",
        "Assoc Prof Assoc Chair","Assoc Prof Emeritus","Assoc Res Eng","Assoc Res Prof","Assoc Res Scholar","Assoc Res Sci","Asst Rsch Prof",
        "Clin Prof","College Park Professor","Dist Univ Prof","Dist Univ Prof & Dir","Dist Univ Prof Chair","Dist Univ Prof Emerita","Dist Unv Prof, Rgnts Prof, Dir","Jr Lecturer",
        "Lect & Dir","Lecturer","Post-Doc Assoc","Prin Lecturer","Prof","Prof & Act Assoc Dean","Prof & Act Chair","Prof & Act Dir","Prof & Area Chair","Prof & Assoc Chair",
        "Prof & Assoc Dean","Prof & Assoc Dir","Prof & Chair","Prof & Dir","Prof And Dean","Prof Emerita","Prof Emeritus","Prof Of Practice",
        "Res Prof","Res Prof & Dir","Res Prof Emeritus","Senior Lecturer","Visit Assoc Prof","Visit Asst Prof","Visit Lecturer","Visit Prof",
        "Visit Res Prof","Visiting Assoc Res Prof","Visiting Asst Rsch Prof", "Asst Prof"]
    df = df[df.Title.isin(to_keep)]
    df = df.reset_index(drop=True)
    
    df.sort_values(['School','Salary'], ascending=[True,False ],inplace=True)
    df.reset_index(drop=True, inplace=True)
    
    return df

In [731]:
df_16 = wrangle(2016)
df_15 = wrangle(2015)

<a id='eda'></a>

## 5. Exploratory Data Analysis and Data Visualization

### 5.1 Grouping by School

In [732]:
means = df_17[['Salary', 'School']].groupby('School').mean()
means = means.rename(columns = {'Salary' : 'mean_salary'}).reset_index()
means['faculty_count'] = 0

for idx,row in means.iterrows():
    means.set_value(idx, 'faculty_count', df_17['School'].value_counts()[row['School']])

means 

Unnamed: 0,School,mean_salary,faculty_count
0,AGNR,91099.678649,185
1,ARCH,67424.647414,58
2,ARHU,67736.839321,604
3,BMGT,145143.28843,223
4,BSOS,101356.38274,365
5,CMNS,102176.440052,774
6,EDUC,82323.662422,161
7,ENGR,104636.849607,433
8,INFO,82015.205682,44
9,JOUR,55382.590429,70


In [733]:
# Sorting by highest mean salary

means1 = means.copy()
means1.sort_values(['mean_salary'], ascending=[False],inplace=True)
means1 = means1.reset_index(drop=True)
means1

Unnamed: 0,School,mean_salary,faculty_count
0,BMGT,145143.28843,223
1,ENGR,104636.849607,433
2,CMNS,102176.440052,774
3,PLCY,101421.090625,48
4,BSOS,101356.38274,365
5,AGNR,91099.678649,185
6,EDUC,82323.662422,161
7,INFO,82015.205682,44
8,SPHL,77301.812761,163
9,ARHU,67736.839321,604


In [734]:
# Sorting by highest faculty count

means2 = means.copy()
means2.sort_values(['faculty_count'], ascending=[False],inplace=True)
means2 = means2.reset_index(drop=True)
means2

Unnamed: 0,School,mean_salary,faculty_count
0,CMNS,102176.440052,774
1,ARHU,67736.839321,604
2,ENGR,104636.849607,433
3,BSOS,101356.38274,365
4,BMGT,145143.28843,223
5,AGNR,91099.678649,185
6,SPHL,77301.812761,163
7,EDUC,82323.662422,161
8,JOUR,55382.590429,70
9,ARCH,67424.647414,58


In [735]:
# function below courtesy of stackoverflow user @ntg
# from IPython.display import display_html
# def display_side_by_side(*args):
#     html_str=''
#     for df in args:
#         html_str+=df.to_html()
#     display_html(html_str.replace('table','table style="display:inline"'),raw=True)
    
# display_side_by_side(means,means1,means2)

In [736]:
# print(means['faculty_count'].sum())

In [737]:
#explain departments by getting how big they are lawl

In [738]:
# total = len(df_17.index)
# schools = means2['School'].values
# counts = means2['faculty_count'].values
# counts = counts/total
# print(schools)
# print(counts)

In [739]:
# schools

In [740]:
labels = means2['School'].values
values = means2['faculty_count'].values

trace = go.Pie(labels=labels, values=values,
               hoverinfo='label+percent', textinfo='value', 
               textfont=dict(size=20),
               marker=dict(line=dict(color='#000000', width=2)))
fig = {'data' : [trace],
       'layout' : {'title': 'Number of Faculty by School'}
      }
py.iplot(fig, filename='styled_pie_chart')

In [741]:
len(df_17.School.unique())

12

In [742]:
df_17.sort_values(['Salary'], ascending=[True],inplace=True)
df_17[df_17.School == 'INFO']

Unnamed: 0,Name,Title,Salary,Year,School,Dept
2987,"Rhoads, Vera T",Lecturer,8701.48,2017,INFO,College of Information Studies
2989,"Kolowitz, Brian J",Lecturer,8701.48,2017,INFO,College of Information Studies
2988,"Seyed, A Patrice",Lecturer,8701.48,2017,INFO,College of Information Studies
2986,"Sahasrabudhe, Vikas M",Lecturer,8701.5,2017,INFO,College of Information Studies
2985,"Killam, Howard William JR",Lecturer,8701.5,2017,INFO,College of Information Studies
2984,"Butler, Michelle Markey",Lecturer,8701.5,2017,INFO,College of Information Studies
2983,"Brewer, Laurence Neil",Lecturer,8701.5,2017,INFO,College of Information Studies
2982,"Dearstyne, Bruce W.",Lecturer,8701.5,2017,INFO,College of Information Studies
2981,"Taylor, Deborah D.",Lecturer,8701.5,2017,INFO,College of Information Studies
2789,"Haspo, Beatriz",Lecturer,17403.0,2017,INFO,College of Information Studies


In [743]:
for school in sorted(df_17.School.unique()):
    globals()["trace" + school] = go.Box(
        x=(df_17[df_17.School == school])['Salary'].tolist(),
        name = str(school))
    
data=[traceSPHL, tracePLCY, traceJOUR, traceINFO, traceENGR, traceEDUC,
      traceCMNS, traceBSOS, traceBMGT, traceARHU, traceARCH, traceAGNR]

layout=go.Layout(
    title="Salaries by School",
    xaxis=dict(title='Salary'),
    yaxis=dict(title='School')
)

fig=go.Figure(data=data, layout=layout)

py.iplot(fig)

In [744]:
# len(df_17.Dept.unique())

In [745]:
# len(df_17.loc[df_17['School'] == 'ENGR'].Dept.unique())
# len(df_17.loc[df_17['School'] == 'BMGT'].Dept.unique())
# len(df_17.loc[df_17['School'] == 'CMNS'].Dept.unique())
# df_17.loc[df_17['School'] == 'BMGT']
# df_17.loc[df_17['School'] == 'CMNS']

# df_17.groupby('Dept').Dept.count()
# df_17.groupby('Dept').Dept.count().get('Art')

In [746]:
# counts = df_17['School'].value_counts()
# c2017 = counts.to_frame('count').reset_index()
# c2017.sort_values(['index'], ascending=[True],inplace=True)
# c2017 = c2017.reset_index(drop=True)
# c2017['year'] = 2017
# c2017


In [747]:
c2017 = df_17[['School', 'Year']]
c2017 = pd.DataFrame(data={'Count': c2017.groupby(['School', 'Year']).size()}).reset_index()
c2016 = df_16[['School', 'Year']]
c2016 = pd.DataFrame(data={'Count': c2016.groupby(['School', 'Year']).size()}).reset_index()
c2015 = df_15[['School', 'Year']]
c2015 = pd.DataFrame(data={'Count': c2015.groupby(['School', 'Year']).size()}).reset_index()
result = pd.concat([c2017,c2016,c2015])
result.reset_index(drop=True)

Unnamed: 0,School,Year,Count
0,AGNR,2017,185
1,ARCH,2017,58
2,ARHU,2017,604
3,BMGT,2017,223
4,BSOS,2017,365
5,CMNS,2017,774
6,EDUC,2017,161
7,ENGR,2017,433
8,INFO,2017,44
9,JOUR,2017,70


In [748]:
#correct?
to_drop = ['Astronomy', 'Atmospheric & Oceanic Science', \
           'Biology','Cell Biology & Molecular Genetics', 'Chemistry & Biochemistry', \
           'Earth System Science Interdisciplinary Center', 'Entomology','Geology','Mathematics', 'Physics']

cs_df = cs_df[~cs_df.Dept.isin(to_drop)]
depts = np.sort((cs_df.Dept).unique())
depts


array(['College of Computer, Math & Natural Sciences', 'Computer Science',
       'Ctr for Scientific Computation and Math Modeling',
       'Inst for Physical Science & Technology',
       'Inst for Research in Electronics & Applied Physics',
       'Institute for Advanced Computer Studies'], dtype=object)

In [749]:
cs_df = df_17[df_17.School == "CMNS"]
depts = np.sort((cs_df.Dept).unique())
depts

array(['Astronomy', 'Atmospheric & Oceanic Science', 'Biology',
       'Cell Biology & Molecular Genetics', 'Chemistry & Biochemistry',
       'College of Computer, Math & Natural Sciences', 'Computer Science',
       'Ctr for Scientific Computation and Math Modeling',
       'Earth System Science Interdisciplinary Center', 'Entomology',
       'Geology', 'Inst for Physical Science & Technology',
       'Inst for Research in Electronics & Applied Physics',
       'Institute for Advanced Computer Studies', 'Mathematics', 'Physics'], dtype=object)

In [750]:
cs_df.head(20)

Unnamed: 0,Name,Title,Salary,Year,School,Dept
3127,"Lipsman, Ronald L.",Prof Emeritus,2000.0,2017,CMNS,Mathematics
3012,"Walters, Barbara S.",Lecturer,7868.8,2017,CMNS,Chemistry & Biochemistry
2995,"Fuhrer, Michael Sears",Res Prof,8511.64,2017,CMNS,Inst for Research in Electronics & Applied Phy...
2895,"Collier, Brian P",Post-Doc Assoc,12535.0,2017,CMNS,Mathematics
2892,"Miller, Benjamin Hatch",Lecturer,12726.0,2017,CMNS,Astronomy
2887,"Gebremariam, Hailu Bantu",Lecturer,13395.15,2017,CMNS,Physics
2884,"Macasieb, Melissa",Visit Asst Prof,13592.0,2017,CMNS,Mathematics
2885,"Weber, Franziska",Visit Lecturer,13592.0,2017,CMNS,Mathematics
2861,"Coplan, Michael A.",Res Prof,14664.55,2017,CMNS,Physics
2855,"Xu, Jian Lun",Adjunct Prof,15000.0,2017,CMNS,Mathematics


In [751]:
df_16 = pd.read_csv("2016_data.csv")
df_16["Year"] = "2016"
df_16.dropna(axis=0, how='any',inplace=True)

df_15 = pd.read_csv("2015_data.csv")
df_15["Year"] = "2015"
df_15.dropna(axis=0, how='any',inplace=True)

df_14 = pd.read_csv("2014_data.csv")
df_14["Year"] = "2014"
df_14.dropna(axis=0, how='any',inplace=True)

df_13 = pd.read_csv("2013_data.csv")
df_13["Year"] = "2013"
df_13.dropna(axis=0, how='any',inplace=True)

<a id='hypothesis'></a>

## 6. Hypothesis Testing and Analysis

<a id='conclusion'></a>

## 7. Conclusion

<a id='future'></a>

## 8. Future Work

<a id='references'></a>

## 9. References
#### Setup
-  Github Cheat Sheet (versioning): https://services.github.com/on-demand/downloads/github-git-cheat-sheet.pdf
-  Anaconda (package management): https://docs.anaconda.com/
-  Docker (containerization): https://docs.docker.com/

#### Scraping
-  Selenium: http://www.seleniumhq.org/docs/
-  Beautiful Soup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
-  PhantomJS: http://phantomjs.org/documentation/

#### Plotting
-  Plotly: https://plot.ly/python/
-  Matplotlib: http://matplotlib.org/contents.html
-  Seaborn: https://seaborn.pydata.org/