## Data Science Degrees

This workbook investigates pay data for the degrees typically required for data science and related positions.

I pulled this blurb from a job posting. 

"Degree must be in Mathematics, Applied Mathematics, Statistics, Applied Statistics, Machine Learning, Data Science, Operations Research, or Computer Science".

This job posting is more specific than most about degree requirements: for example, it doesn't include Engineering, and it specifically mentions Operations Research. I'll stick with this list for the analysis here, but there's no clear rule for what to include or exclude when identifying "data science" degrees. Many engineering graduates (not included) work as data scientists, and many computer science graduates (included) don't. In fact, most graduates of the degree programs listed above don't work as data scientists.

I am also assuming that the pay distribution for degree recipients in these fields who enter data science is similar enough to that of recipients who enter other fields (software engineering, statistics, etc.) that the median for each overall cohort is a useful metric for graduates who specifically enter data science as a job category.

Another factor is that, for privacy reasons, data is not provided when the size of the graduating cohort falls below a certain threshold. This has a particularly notable influence on doctoral-level programs, which tend to be much smaller than master's or bachelor's level programs. It also means that programs with lower enrollment and smaller departments, such as operations research, may be less likely to report data than larger departments with higher enrollment, like computer science.
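To get a feel for how much the suppression matters at each level, one option is to count the share of suppressed rows per credential level. The sketch below assumes suppressed cells carry the literal string `PrivacySuppressed` (the value the queries in this workbook filter on) and uses a tiny made-up frame named `toy` in place of the real file; the earnings values in it are invented purely for illustration.

```python
import pandas as pd

# Made-up rows standing in for the real file; only the column names match it.
toy = pd.DataFrame({
    "CREDLEV": [3, 3, 3, 5, 5, 6, 6],
    "EARN_MDN_HI_1YR": [
        "52000", "PrivacySuppressed", "61000",
        "PrivacySuppressed", "88000",
        "PrivacySuppressed", "PrivacySuppressed",
    ],
})

# Flag suppressed rows, then average the flag within each credential level.
# The mean of a boolean column is the fraction of True values.
suppressed = toy["EARN_MDN_HI_1YR"].eq("PrivacySuppressed")
share_suppressed = suppressed.groupby(toy["CREDLEV"]).mean()
print(share_suppressed)
```

Swapping `toy` for the `df` loaded from the CSV should give the real shares, assuming the column names are the same there.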

With those disclaimers, let's take a look at the numbers at the bachelor's, master's, and doctoral levels.


In [11]:
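import pandas as pd
import qgrid
from pandasql import sqldf

# Show full cell contents and every row when displaying DataFrames.
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)

# Run SQL against the DataFrames defined in the notebook's global namespace.
pysqldf = lambda q: sqldf(q, globals())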

A quick lookup table mapping CREDLEV to degree level:

In [12]:
df = pd.read_csv('data/Most-Recent-Cohorts-Field-of-Study.csv')

In [6]:
pysqldf("""
SELECT 
    DISTINCT CREDLEV, CREDDESC
FROM
    df
""")

|   | CREDLEV | CREDDESC |
|---|---------|----------|
| 0 | 3 | Bachelors Degree |
| 1 | 5 | Master's Degree |
| 2 | 6 | Doctoral Degree |
| 3 | 8 | Graduate/Professional Certificate |
| 4 | 7 | First Professional Degree |
| 5 | 1 | Undergraduate Certificate or Diploma |
| 6 | 2 | Associate's Degree |
| 7 | 4 | Post-baccalaureate Certificate |


### Bachelor's Level

In [8]:
qgrid.show_grid(pysqldf("""
SELECT 
    INSTNM, CIPDESC, CREDDESC, EARN_MDN_HI_1YR, EARN_MDN_HI_2YR
FROM 
    df 
WHERE
    (CIPDESC LIKE '%Mathematics%' 
    OR 
    CIPDESC LIKE '%Computer Science%'
    OR
    CIPDESC LIKE '%Statistics%'
    OR
    CIPDESC LIKE '%Operations Research%'
    )
AND 
    -- bachelors degree level
    CREDLEV = 3 
AND
    EARN_MDN_HI_1YR <> 'PrivacySuppressed'
ORDER BY 
    (EARN_MDN_HI_1YR * 1) DESC
"""))

QgridWidget(grid_options={'fullWidthRows': True, 'syncColumnCellResize': True, 'forceFitColumns': True, 'defau…
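A note on the `<> 'PrivacySuppressed'` filter and the `* 1` in the `ORDER BY`: the earnings column mixes numbers with the suppression marker, so it presumably reaches SQLite as text, and multiplying by 1 forces a numeric sort instead of an alphabetical one. A rough pandas equivalent is sketched below using `pd.to_numeric` with `errors="coerce"`. The frame `toy` and its school names and values are made up for illustration.

```python
import pandas as pd

# Made-up rows; earnings are stored as text, as they would be in a mixed column.
toy = pd.DataFrame({
    "INSTNM": ["School A", "School B", "School C", "School D"],
    "EARN_MDN_HI_1YR": ["9500", "PrivacySuppressed", "102000", "61000"],
})

# Non-numeric entries such as 'PrivacySuppressed' become NaN.
earnings = pd.to_numeric(toy["EARN_MDN_HI_1YR"], errors="coerce")

ranked = (
    toy.assign(EARN_MDN_HI_1YR=earnings)       # replace text with numbers
       .dropna(subset=["EARN_MDN_HI_1YR"])     # drop suppressed rows
       .sort_values("EARN_MDN_HI_1YR", ascending=False)
)
print(ranked)
```

Sorting the raw text column would instead put "9500" ahead of "61000" and "102000", which is the problem the numeric conversion avoids.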

### Master's Level

In [10]:
qgrid.show_grid(pysqldf("""
SELECT 
    INSTNM, CIPDESC, CREDDESC, EARN_MDN_HI_1YR, EARN_MDN_HI_2YR
FROM 
    df 
WHERE
    (CIPDESC LIKE '%Mathematics%' 
    OR 
    CIPDESC LIKE '%Computer Science%'
    OR
    CIPDESC LIKE '%Statistics%'
    OR
    CIPDESC LIKE '%Operations Research%'
    )
AND 
    -- masters degree level
    CREDLEV = 5
AND
    EARN_MDN_HI_1YR <> 'PrivacySuppressed'
ORDER BY 
    (EARN_MDN_HI_1YR * 1) DESC
"""))

QgridWidget(grid_options={'fullWidthRows': True, 'syncColumnCellResize': True, 'forceFitColumns': True, 'defau…

### Doctoral Level

In [9]:
qgrid.show_grid(pysqldf("""
SELECT 
    INSTNM, CIPDESC, CREDDESC, EARN_MDN_HI_1YR, EARN_MDN_HI_2YR
FROM 
    df 
WHERE
    (CIPDESC LIKE '%Mathematics%' 
    OR 
    CIPDESC LIKE '%Computer Science%'
    OR
    CIPDESC LIKE '%Statistics%'
    OR
    CIPDESC LIKE '%Operations Research%'
    )
AND 
    -- doctoral degree level
    CREDLEV = 6
AND
    EARN_MDN_HI_1YR <> 'PrivacySuppressed'
ORDER BY 
    (EARN_MDN_HI_1YR * 1) DESC
"""))

QgridWidget(grid_options={'fullWidthRows': True, 'syncColumnCellResize': True, 'forceFitColumns': True, 'defau…
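The three queries above differ only in the `CREDLEV` value, so they could be generated by a single helper. The sketch below is one way to do that; `earnings_query` and `KEYWORDS` are hypothetical names that are not part of this notebook, and the function only builds the query text rather than running it.

```python
# Hypothetical helper: builds the earnings query for one credential level.
KEYWORDS = ["Mathematics", "Computer Science", "Statistics", "Operations Research"]

def earnings_query(credlev, keywords=KEYWORDS):
    """Return the SQL text used above, parameterized by CREDLEV."""
    # One LIKE clause per degree keyword, OR-ed together.
    like_clauses = " OR ".join(f"CIPDESC LIKE '%{kw}%'" for kw in keywords)
    return f"""
SELECT
    INSTNM, CIPDESC, CREDDESC, EARN_MDN_HI_1YR, EARN_MDN_HI_2YR
FROM
    df
WHERE
    ({like_clauses})
AND
    CREDLEV = {int(credlev)}
AND
    EARN_MDN_HI_1YR <> 'PrivacySuppressed'
ORDER BY
    (EARN_MDN_HI_1YR * 1) DESC
"""

print(earnings_query(5))
```

With this in place, the bachelor's table would be `qgrid.show_grid(pysqldf(earnings_query(3)))`, and likewise with 5 and 6 for the other two levels. The keywords are pasted straight into the SQL string, which is fine for a fixed list like this one but would not be safe for untrusted input.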