# California's Most Selective Schools #

I want to look at the 10 schools in California with the lowest admission rates. I will use the [College Scorecard data from data.gov](https://catalog.data.gov/dataset/college-scorecard).

I am using the 2017-2018 file from CollegeScorecard_Raw_Data.zip.

In [33]:
import pandas as pd

df = pd.read_csv("Data/CollegeScorecard_Raw_Data/MERGED2017_18_PP.csv", low_memory=False)

In [None]:
df_old = pd.read_csv("Data/CollegeScorecard_Raw_Data/MERGED1996_97_PP.csv", low_memory=False)

In [38]:
import plotly.plotly as py
import plotly.figure_factory as ff

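# Keep California schools that report an admission rate; the lowest rates are the most selective
ca = df[df['STABBR'] == 'CA']
selective = ca.dropna(subset=['ADM_RATE']).sort_values('ADM_RATE').head(10)

# Show the ten schools and their admission rates as a table
table = ff.create_table(selective[['INSTNM', 'ADM_RATE']])
py.iplot(table, filename='Selective CA Schools')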





Those are the schools, but I want to see some other details.

In [27]:
import plotly.graph_objs as go

data = [
    go.Bar(
        x=selective['INSTNM'],
        y=selective['TUITIONFEE_IN'],
    )
]

py.iplot(data, filename='In-State Tuition of Selective CA Schools')

In [48]:
selective_old = df_old[df_old.INSTNM.isin(selective.INSTNM.tolist())]
print(selective_old[['INSTNM', 'MD_EARN_WNE_P10']])

                                    INSTNM  MD_EARN_WNE_P10
295     California Institute of Technology              NaN
315      University of California-Berkeley              NaN
318   University of California-Los Angeles              NaN
361              Claremont McKenna College              NaN
447                    Harvey Mudd College              NaN
611                         Pitzer College              NaN
614                         Pomona College              NaN
693      University of Southern California              NaN
5381                   Stanford University              NaN


I ran into a problem: the 1996-97 dataset does not include earnings data for these schools (maybe not for any school?), so I downloaded another dataset that has just earnings data from [here](https://collegescorecard.ed.gov/data/).
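
A quick way to settle the "maybe not any school?" question is to measure how much of a column is usable at all. The sketch below is only an illustration: `non_null_share` is a helper name I made up, the toy frame stands in for `df_old`, and I am assuming that unusable entries show up either as NaN or as text such as `PrivacySuppressed`.

```python
import numpy as np
import pandas as pd


def non_null_share(frame: pd.DataFrame, column: str) -> float:
    """Return the fraction of rows where `column` holds a usable number."""
    # Text entries that are not numbers become NaN instead of raising an error
    values = pd.to_numeric(frame[column], errors="coerce")
    return float(values.notna().mean())


# Toy stand-in for df_old; the school names and values are made up
toy = pd.DataFrame({
    "INSTNM": ["School A", "School B", "School C", "School D"],
    "MD_EARN_WNE_P10": [np.nan, "PrivacySuppressed", "50000", np.nan],
})

share = non_null_share(toy, "MD_EARN_WNE_P10")
print(f"{share:.0%} of rows have earnings data")
```

Running the same helper on the real frame, for example `non_null_share(df_old, 'MD_EARN_WNE_P10')`, would show whether the column is empty for every school or only for these ten.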

In [49]:
df_earnings = pd.read_csv("Data/earnings.csv", low_memory=False)

In [51]:
selective_earnings = df_earnings[df_earnings.INSTNM.isin(selective.INSTNM.tolist())]
print(selective_earnings[['INSTNM', 'MD_EARN_WNE_P10']])

                                    INSTNM MD_EARN_WNE_P10
211     California Institute of Technology           85900
228      University of California-Berkeley           64700
231   University of California-Los Angeles           60700
267              Claremont McKenna College           72900
324                    Harvey Mudd College           88800
422                         Pitzer College           48700
425                         Pomona College           58100
492      University of Southern California           74000
3953                   Stanford University           94000
6076              Grace Mission University             NaN


Success! I'll plot that with the tuition data.

In [70]:
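# Work on a copy and convert earnings to numbers so the sort is numeric, not alphabetical
selective_earnings = selective_earnings.copy()
selective_earnings['MD_EARN_WNE_P10'] = pd.to_numeric(selective_earnings['MD_EARN_WNE_P10'], errors='coerce')
selective_earnings = selective_earnings.sort_values('MD_EARN_WNE_P10', ascending=False)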

earn = go.Bar(
    x=selective_earnings['INSTNM'],
    y=selective_earnings['MD_EARN_WNE_P10'],
    name = 'Median Earnings 10 years after entry')

tuit = go.Bar(
    x=selective['INSTNM'],
    y=selective['TUITIONFEE_IN'],
    name = 'In-state Tuition')

data = [earn, tuit]
layout = go.Layout(barmode='group')

fig = go.Figure(data=data, layout=layout)

py.iplot(fig, filename='SelectiveCATuitionEarnings')


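The two bar series in that chart come from two different data frames, each in its own row order. One way to keep tuition and earnings lined up per school is to join them into a single frame before plotting. This is a minimal sketch on made-up data, not the code that produced the chart: `align_for_plot` is a hypothetical helper, and it joins on `INSTNM` only because that is the column used above (a unique ID column, if the earnings file has one, would be a safer key).

```python
import pandas as pd


def align_for_plot(tuition: pd.DataFrame, earnings: pd.DataFrame) -> pd.DataFrame:
    """Join tuition and earnings on INSTNM and sort by earnings, highest first."""
    earnings = earnings.copy()
    # Earnings may be stored as text; anything non-numeric becomes NaN
    earnings["MD_EARN_WNE_P10"] = pd.to_numeric(
        earnings["MD_EARN_WNE_P10"], errors="coerce"
    )
    merged = tuition.merge(earnings, on="INSTNM", how="inner")
    # Rows with missing earnings are placed last by sort_values
    return merged.sort_values("MD_EARN_WNE_P10", ascending=False).reset_index(drop=True)


# Toy stand-ins for `selective` and `selective_earnings`; all values are made up
tuition = pd.DataFrame({
    "INSTNM": ["School A", "School B", "School C"],
    "TUITIONFEE_IN": [10000, 20000, 30000],
})
earnings = pd.DataFrame({
    "INSTNM": ["School C", "School A", "School B"],
    "MD_EARN_WNE_P10": ["60000", "PrivacySuppressed", "90000"],
})

combined = align_for_plot(tuition, earnings)
print(combined)
```

Both `go.Bar` traces could then take `x=combined['INSTNM']`, so each school's two bars share one position on the axis.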



None of these factors is a great predictor of a school's quality, the quality of the education you will receive, or the income you will earn, and certainly not of the happiness you would experience during or after attending any of these schools.

In fact, I suspect that earnings after 10 years have more to do with the ratio of graduates with STEM degrees to graduates with liberal arts degrees than with the "quality" of the school (whatever that might mean). That is not to suggest that liberal arts majors live any less fulfilling lives. Of course... liberal arts majors are a lot less likely to analyze datasets using Python, Jupyter notebooks, pandas, and plotly...
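
The STEM hunch could be checked roughly like this. Everything here is an assumption on my part: I believe the Scorecard files carry `PCIP*` columns with the share of degrees awarded per field, the choice of which codes count as STEM is mine, `stem_share` is a made-up helper, and the toy numbers are invented purely so the code runs.

```python
import pandas as pd

# Assumed STEM fields: computer science (11), engineering (14),
# mathematics (27), physical sciences (40). The selection is illustrative only.
STEM_COLUMNS = ["PCIP11", "PCIP14", "PCIP27", "PCIP40"]


def stem_share(frame: pd.DataFrame, columns=STEM_COLUMNS) -> pd.Series:
    """Sum the per-field degree shares into one STEM share per school."""
    numeric = frame[columns].apply(pd.to_numeric, errors="coerce")
    # min_count=1 keeps the result NaN when every field is missing
    return numeric.sum(axis=1, min_count=1)


# Toy data; school names, shares, and earnings are made up
toy = pd.DataFrame({
    "INSTNM": ["School A", "School B", "School C"],
    "PCIP11": [0.25, 0.0, 0.125],
    "PCIP14": [0.5, 0.0, 0.125],
    "PCIP27": [0.125, 0.0, 0.0],
    "PCIP40": [0.0, 0.125, 0.0],
    "MD_EARN_WNE_P10": [90000, 50000, 60000],
})

toy["STEM_SHARE"] = stem_share(toy)
# Pearson correlation between STEM share and median earnings
corr = toy["STEM_SHARE"].corr(toy["MD_EARN_WNE_P10"])
print(toy[["INSTNM", "STEM_SHARE", "MD_EARN_WNE_P10"]])
print(f"correlation: {corr:.2f}")
```

With only a handful of schools a correlation means very little, so this would be more convincing run across every school in the file rather than just the ten above.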