## Bridging the Gap Between Tier-1 and Tier-2 Universities

Variables we have taken:
<br>
1: UNITID: UnitId for an Institution<br>
2: INSTNM: Institution Name<br>
3: MAIN: Flag for the Main Campus ( 0: Branch Campus, 1: Main Campus)
<br>
4: OPEFLAG: Title IV Eligibility Type {
   1: Participates in Title IV federal financial aid programs,
   <br>
      2: Branch campus of a main campus that participates in Title IV,
   <br>
      3: Deferment only - limited participation,
   <br>
      5: Not currently participating in Title IV, has an OPE ID number
   }
<br><br>
5: ICLEVEL: Level of An Insitution(1: 4-yr, 2: 2-yr, 3: Less than 2-yr)
<br>
6: UGDS: Enrollment of all undergraduate students<br>
7: UGDS_WHITE: Fraction of students who are white<br>
8: UGDS_ASIAN: Fraction of students who are asian<br>
9: MD_EARN_WNE_P6: Median earnings of students working and not enrolled 6 years after entry<br>
10: GRAD_DEBT_MDN: The median debt for students who have completed<br>
11: ADM_RATE_ALL: Admission rate for all campuses rolled up to the 6-digit OPEID<br>
12: C100_4: Completion rate for first-time, full-time students at four-year institutions (100% of expected time to completion)<br>
13: SAT_AVG_ALL: Average SAT equivalent score of students admitted for all campuses rolled up to the 6-digit OPEID<br>
14: NPT4_PRIV: Average net price(Cost-Aid) for Title IV institutions (private for-profit and nonprofit institutions)<br>
15: NPT4_PUB: Average net price(Cost-Aid) for Title IV institutions (public institutions)<br>
16: NPT41_PRIV: Average net price for $0-$30,000 family income (private for-profit and nonprofit institutions)<br>
17: NPT41_PUB: Average net price for $0-$30,000 family income (public institutions)<br>
18: NPT43_PRIV: Average net price for $48,001-$75,000 family income (public institutions)<br>
19: NPT43_PUB: Average net price for $48,001-$75,000 family income (private for-profit and nonprofit institutions)<br>
20: PCIP11: Percentage of degrees awarded in Computer And Information Sciences And Support Services.<br>
21: PCIP26: Percentage of degrees awarded in Biological And Biomedical Sciences.<br>
22: PCIP27: Percentage of degrees awarded in Mathematics And Statistics.<br>
23: PCIP52: Percentage of degrees awarded in Business, Management, Marketing, And Related Support Services.<br>


In [1]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 20)

## Loading the Data ##

In [2]:
str1 = 'MERGED20' #fileName: MERGED2004_05_PP
str2 = '_PP.csv'
reqCols = ['UNITID', 'OPEID', 'OPEID6', 'INSTNM', 'STABBR', 'CITY', 'ZIP', 'ACCREDAGENCY', 'CURROPER', 'MAIN', 'NUMBRANCH',
           'CONTROL', 'CIPCODE1', 'CIPCODE2', 'CIPCODE3', 'OPEFLAG', 'ICLEVEL', 'HIGHDEG', 'PREDDEG', 'DISTANCEONLY', 'ADM_RATE',
           'ADM_RATE_ALL', 'C100_4_POOLED_SUPP', 'UGDS', 'UGDS_WHITE', 'UGDS_BLACK', 'UGDS_ASIAN', 'UGDS_HISP']
availCourseCode = [1,3,5,9,10,11,12,13,14,15,16,19,22,23,24,25,26,27,29,30,31,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,54]

for num in availCourseCode:
    reqCols.append('PCIP' + str(f"{num:02d}"))

arr1 = ['C100_4','NPT4_PUB', 'NPT4_PRIV', 'NPT41_PUB', 'NPT42_PUB', 'NPT43_PUB',
        'NPT41_PRIV', 'NPT42_PRIV', 'NPT43_PRIV', 'FTFTPCTFLOAN', 'FTFTPCTPELL', 'DEBT_MDN', 'GRAD_DEBT_MDN', 'PELL_DEBT_MDN', 'NOPELL_DEBT_MDN',
        'TUITFTE', 'HCM2', 'MD_FAMINC', 'MD_EARN_WNE_P8', 'MD_EARN_WNE_P6', 'MD_EARN_WNE_INC1_P6', 'MD_EARN_WNE_INC1_P6', 'MD_EARN_WNE_INC2_P6','MD_EARN_WNE_INC3_P6',
        'MD_EARN_WNE_1YR', 'MD_EARN_WNE_4YR', 'BBRR1_FED_UG_MAKEPROG', 'BBRR1_FED_UG_PAIDINFULL', 'CDR2', 'CDR3', 'PCTFLOAN', 'SAT_AVG_ALL',
        'COSTT4_A', 'STUFACR', 'PLUS_DEBT_INST_MD', 'NUM41_PUB', 'NUM42_PUB', 'NUM43_PUB', 'NUM44_PUB', 'NUM45_PUB',  'NUM41_PRIV', 'NUM42_PRIV',
        'NUM43_PRIV', 'NUM44_PRIV', 'NUM45_PRIV', 'DEATH_YR2_RT', 'LO_INC_DEATH_YR2_RT', 'MD_INC_DEATH_YR2_RT', 'HI_INC_DEATH_YR2_RT']
reqCols.extend(arr1)

df1 = pd.read_csv('../Project1_Data/data/MERGED2000_01_PP.csv', usecols=reqCols)
df2 = pd.read_csv('../Project1_Data/data/MERGED2001_02_PP.csv', usecols=reqCols)
df3 = pd.read_csv('../Project1_Data/data/MERGED2002_03_PP.csv', usecols=reqCols)
df4 = pd.read_csv('../Project1_Data/data/MERGED2003_04_PP.csv', usecols=reqCols)
df5 = pd.read_csv('../Project1_Data/data/MERGED2004_05_PP.csv', usecols=reqCols)
df6 = pd.read_csv('../Project1_Data/data/MERGED2005_06_PP.csv', usecols=reqCols)
df7 = pd.read_csv('../Project1_Data/data/MERGED2006_07_PP.csv', usecols=reqCols)
df8 = pd.read_csv('../Project1_Data/data/MERGED2007_08_PP.csv', usecols=reqCols)
df9 = pd.read_csv('../Project1_Data/data/MERGED2008_09_PP.csv', usecols=reqCols)
df10 = pd.read_csv('../Project1_Data/data/MERGED2009_10_PP.csv',usecols=reqCols)
df11 = pd.read_csv('../Project1_Data/data/MERGED2010_11_PP.csv',usecols=reqCols)
df12 = pd.read_csv('../Project1_Data/data/MERGED2011_12_PP.csv',usecols=reqCols)
df13 = pd.read_csv('../Project1_Data/data/MERGED2012_13_PP.csv',usecols=reqCols)
df14 = pd.read_csv('../Project1_Data/data/MERGED2013_14_PP.csv',usecols=reqCols)
df15 = pd.read_csv('../Project1_Data/data/MERGED2014_15_PP.csv',usecols=reqCols)
df16 = pd.read_csv('../Project1_Data/data/MERGED2015_16_PP.csv',usecols=reqCols)
df17 = pd.read_csv('../Project1_Data/data/MERGED2016_17_PP.csv',usecols=reqCols)
df18 = pd.read_csv('../Project1_Data/data/MERGED2017_18_PP.csv',usecols=reqCols)
df19 = pd.read_csv('../Project1_Data/data/MERGED2018_19_PP.csv',usecols=reqCols)
df20 = pd.read_csv('../Project1_Data/data/MERGED2019_20_PP.csv',usecols=reqCols)
df21 = pd.read_csv('../Project1_Data/data/MERGED2020_21_PP.csv',usecols=reqCols)
df22 = pd.read_csv('../Project1_Data/data/MERGED2021_22_PP.csv',usecols=reqCols)


  df1 = pd.read_csv('../Project1_Data/data/MERGED2000_01_PP.csv', usecols=reqCols)
  df2 = pd.read_csv('../Project1_Data/data/MERGED2001_02_PP.csv', usecols=reqCols)
  df3 = pd.read_csv('../Project1_Data/data/MERGED2002_03_PP.csv', usecols=reqCols)
  df4 = pd.read_csv('../Project1_Data/data/MERGED2003_04_PP.csv', usecols=reqCols)
  df5 = pd.read_csv('../Project1_Data/data/MERGED2004_05_PP.csv', usecols=reqCols)
  df6 = pd.read_csv('../Project1_Data/data/MERGED2005_06_PP.csv', usecols=reqCols)
  df7 = pd.read_csv('../Project1_Data/data/MERGED2006_07_PP.csv', usecols=reqCols)
  df8 = pd.read_csv('../Project1_Data/data/MERGED2007_08_PP.csv', usecols=reqCols)
  df9 = pd.read_csv('../Project1_Data/data/MERGED2008_09_PP.csv', usecols=reqCols)
  df10 = pd.read_csv('../Project1_Data/data/MERGED2009_10_PP.csv',usecols=reqCols)
  df11 = pd.read_csv('../Project1_Data/data/MERGED2010_11_PP.csv',usecols=reqCols)
  df12 = pd.read_csv('../Project1_Data/data/MERGED2011_12_PP.csv',usecols=reqCols)
  df

In [3]:
years = [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021]
df = pd.concat([df1, df2, df3, df4, df5, df6, df7, df8, df9, df10, df10, df11, df12, df13, df14, df15, df16, df17, df18, df19, df20, df21, df22], keys=years, axis=0)
shape = df.shape
nullPercent = df['MAIN'].isna().mean() * 100

print("[+] dfShape: ", shape, " mainNull: ", nullPercent)
df.index.names = ['year', 'rowNum']
df.head()

[+] dfShape:  (156672, 113)  mainNull:  0.0


Unnamed: 0_level_0,Unnamed: 1_level_0,UNITID,OPEID,OPEID6,INSTNM,CITY,STABBR,ZIP,ACCREDAGENCY,HCM2,MAIN,NUMBRANCH,PREDDEG,HIGHDEG,CONTROL,ADM_RATE,ADM_RATE_ALL,SAT_AVG_ALL,PCIP01,PCIP03,PCIP05,PCIP09,PCIP10,PCIP11,PCIP12,PCIP13,PCIP14,PCIP15,PCIP16,PCIP19,PCIP22,PCIP23,PCIP24,PCIP25,PCIP26,PCIP27,PCIP29,PCIP30,PCIP31,PCIP38,PCIP39,PCIP40,PCIP41,PCIP42,PCIP43,PCIP44,PCIP45,PCIP46,PCIP47,PCIP48,PCIP49,PCIP50,PCIP51,PCIP52,PCIP54,DISTANCEONLY,UGDS,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,CURROPER,NPT4_PUB,NPT4_PRIV,NPT41_PUB,NPT42_PUB,NPT43_PUB,NPT41_PRIV,NPT42_PRIV,NPT43_PRIV,NUM41_PUB,NUM42_PUB,NUM43_PUB,NUM44_PUB,NUM45_PUB,NUM41_PRIV,NUM42_PRIV,NUM43_PRIV,NUM44_PRIV,NUM45_PRIV,COSTT4_A,TUITFTE,PCTFLOAN,CDR2,CDR3,DEATH_YR2_RT,LO_INC_DEATH_YR2_RT,MD_INC_DEATH_YR2_RT,HI_INC_DEATH_YR2_RT,DEBT_MDN,GRAD_DEBT_MDN,PELL_DEBT_MDN,NOPELL_DEBT_MDN,MD_FAMINC,MD_EARN_WNE_P6,MD_EARN_WNE_P8,C100_4,ICLEVEL,C100_4_POOLED_SUPP,OPEFLAG,CIPCODE1,CIPCODE2,CIPCODE3,FTFTPCTPELL,FTFTPCTFLOAN,PLUS_DEBT_INST_MD,BBRR1_FED_UG_MAKEPROG,BBRR1_FED_UG_PAIDINFULL,MD_EARN_WNE_INC1_P6,MD_EARN_WNE_INC2_P6,MD_EARN_WNE_INC3_P6,STUFACR,MD_EARN_WNE_1YR,MD_EARN_WNE_4YR
year,rowNum,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1,Unnamed: 88_level_1,Unnamed: 89_level_1,Unnamed: 90_level_1,Unnamed: 91_level_1,Unnamed: 92_level_1,Unnamed: 93_level_1,Unnamed: 94_level_1,Unnamed: 95_level_1,Unnamed: 96_level_1,Unnamed: 97_level_1,Unnamed: 98_level_1,Unnamed: 99_level_1,Unnamed: 100_level_1,Unnamed: 101_level_1,Unnamed: 102_level_1,Unnamed: 103_level_1,Unnamed: 104_level_1,Unnamed: 105_level_1,Unnamed: 106_level_1,Unnamed: 107_level_1,Unnamed: 108_level_1,Unnamed: 109_level_1,Unnamed: 110_level_1,Unnamed: 111_level_1,Unnamed: 112_level_1,Unnamed: 113_level_1,Unnamed: 114_level_1
2000,0,100636,1230800,12308.0,Community College of the Air Force,Montgomery,AL,36114-3011,,,1,1,2,2,1.0,,,,0.0,0.0,0.0,0.0024,0.042,0.0115,0.0,0.0888,0.0,0.137,0.0,0.002,0.0065,0.0,0.0,0.0,0.0,0.0,0.0583,0.0,0.0077,0.0,0.0,0.0084,0.0019,0.0,0.0746,0.0032,0.0,0.0,0.1973,0.0,0.0517,0.0064,0.0777,0.2225,0.0003,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,,,,,,,,,,,,,,,,2.0,,3,,,,,,,,,,,,,,
2000,1,100654,100200,1002.0,Alabama A & M University,Normal,AL,35762,,,1,1,3,4,1.0,,,,0.0624,0.0183,0.0,0.0,0.0239,0.0349,0.0,0.2569,0.0183,0.1083,0.0,0.0294,0.0,0.0092,0.0,0.0,0.0661,0.0202,0.0,0.0,0.0,0.0,0.0,0.0183,0.0,0.0514,0.0,0.044,0.0422,0.0,0.0,0.0,0.0,0.0037,0.0,0.1798,0.0018,,,,,,,,,,,,,,,,,,,,,,,,,,,3486.0,,0.119,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,4625.0,15374.0,4617.5,4625.5,18979.0,,,,1.0,,1,,,,,,,,,,,,,,
2000,2,100663,105200,1052.0,University of Alabama at Birmingham,Birmingham,AL,35294-0110,,,1,1,3,4,1.0,,,,0.0,0.0,0.0,0.0352,0.0,0.0224,0.0,0.1223,0.0525,0.0,0.0051,0.0,0.0,0.0166,0.0,0.0,0.0743,0.0038,0.0,0.0019,0.0,0.0083,0.0,0.0134,0.0,0.0794,0.0467,0.0173,0.0551,0.0,0.0,0.0,0.0,0.0262,0.1895,0.1914,0.0384,,,,,,,,,,,,,,,,,,,,,,,,,,,6348.0,,0.054,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,5000.0,4125.0,4766.5,5250.0,22336.0,,,,1.0,,1,,,,,,,,,,,,,,
2000,3,100690,2503400,25034.0,Amridge University,Montgomery,AL,36117-3553,,,1,1,3,4,2.0,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,5397.0,,0.016,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,5655.5,9900.0,5500.0,8087.0,24892.0,,,,1.0,,1,,,,,,,,,,,,,,
2000,4,100706,105500,1055.0,University of Alabama in Huntsville,Huntsville,AL,35899,,,1,1,3,4,1.0,,,,0.0,0.0,0.0015,0.0,0.0,0.055,0.0,0.0267,0.2719,0.0,0.0208,0.0,0.0,0.0416,0.0,0.0,0.0475,0.0134,0.0,0.0,0.0,0.0045,0.0,0.0208,0.0,0.0208,0.0,0.0,0.0238,0.0,0.0,0.0,0.0,0.0223,0.1694,0.2392,0.0208,,,,,,,,,,,,,,,,,,,,,,,,,,,4200.0,,0.048,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,5500.0,12001.5,5500.0,5250.0,27289.5,,,,1.0,,1,,,,,,,,,,,,,,


## Filtering the Institutes ##
1: First Only select the Main Campus, as many institutes have branch campuses

2: Next Select only the institutes that are eligible for Title IV Aid  by  Federal Government

3: Now filter those institutes that are  4 year institutions

In [4]:
main = df[df['MAIN'] == 1]
shape = main.shape
nullPercent = main['OPEFLAG'].isna().mean() * 100

print("[+] mainShape: ", shape, " titleIVNull: ", nullPercent)
main.head()

[+] mainShape:  (123172, 113)  titleIVNull:  0.0


Unnamed: 0_level_0,Unnamed: 1_level_0,UNITID,OPEID,OPEID6,INSTNM,CITY,STABBR,ZIP,ACCREDAGENCY,HCM2,MAIN,NUMBRANCH,PREDDEG,HIGHDEG,CONTROL,ADM_RATE,ADM_RATE_ALL,SAT_AVG_ALL,PCIP01,PCIP03,PCIP05,PCIP09,PCIP10,PCIP11,PCIP12,PCIP13,PCIP14,PCIP15,PCIP16,PCIP19,PCIP22,PCIP23,PCIP24,PCIP25,PCIP26,PCIP27,PCIP29,PCIP30,PCIP31,PCIP38,PCIP39,PCIP40,PCIP41,PCIP42,PCIP43,PCIP44,PCIP45,PCIP46,PCIP47,PCIP48,PCIP49,PCIP50,PCIP51,PCIP52,PCIP54,DISTANCEONLY,UGDS,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,CURROPER,NPT4_PUB,NPT4_PRIV,NPT41_PUB,NPT42_PUB,NPT43_PUB,NPT41_PRIV,NPT42_PRIV,NPT43_PRIV,NUM41_PUB,NUM42_PUB,NUM43_PUB,NUM44_PUB,NUM45_PUB,NUM41_PRIV,NUM42_PRIV,NUM43_PRIV,NUM44_PRIV,NUM45_PRIV,COSTT4_A,TUITFTE,PCTFLOAN,CDR2,CDR3,DEATH_YR2_RT,LO_INC_DEATH_YR2_RT,MD_INC_DEATH_YR2_RT,HI_INC_DEATH_YR2_RT,DEBT_MDN,GRAD_DEBT_MDN,PELL_DEBT_MDN,NOPELL_DEBT_MDN,MD_FAMINC,MD_EARN_WNE_P6,MD_EARN_WNE_P8,C100_4,ICLEVEL,C100_4_POOLED_SUPP,OPEFLAG,CIPCODE1,CIPCODE2,CIPCODE3,FTFTPCTPELL,FTFTPCTFLOAN,PLUS_DEBT_INST_MD,BBRR1_FED_UG_MAKEPROG,BBRR1_FED_UG_PAIDINFULL,MD_EARN_WNE_INC1_P6,MD_EARN_WNE_INC2_P6,MD_EARN_WNE_INC3_P6,STUFACR,MD_EARN_WNE_1YR,MD_EARN_WNE_4YR
year,rowNum,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1,Unnamed: 88_level_1,Unnamed: 89_level_1,Unnamed: 90_level_1,Unnamed: 91_level_1,Unnamed: 92_level_1,Unnamed: 93_level_1,Unnamed: 94_level_1,Unnamed: 95_level_1,Unnamed: 96_level_1,Unnamed: 97_level_1,Unnamed: 98_level_1,Unnamed: 99_level_1,Unnamed: 100_level_1,Unnamed: 101_level_1,Unnamed: 102_level_1,Unnamed: 103_level_1,Unnamed: 104_level_1,Unnamed: 105_level_1,Unnamed: 106_level_1,Unnamed: 107_level_1,Unnamed: 108_level_1,Unnamed: 109_level_1,Unnamed: 110_level_1,Unnamed: 111_level_1,Unnamed: 112_level_1,Unnamed: 113_level_1,Unnamed: 114_level_1
2000,0,100636,1230800,12308.0,Community College of the Air Force,Montgomery,AL,36114-3011,,,1,1,2,2,1.0,,,,0.0,0.0,0.0,0.0024,0.042,0.0115,0.0,0.0888,0.0,0.137,0.0,0.002,0.0065,0.0,0.0,0.0,0.0,0.0,0.0583,0.0,0.0077,0.0,0.0,0.0084,0.0019,0.0,0.0746,0.0032,0.0,0.0,0.1973,0.0,0.0517,0.0064,0.0777,0.2225,0.0003,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,,,,,,,,,,,,,,,,2.0,,3,,,,,,,,,,,,,,
2000,1,100654,100200,1002.0,Alabama A & M University,Normal,AL,35762,,,1,1,3,4,1.0,,,,0.0624,0.0183,0.0,0.0,0.0239,0.0349,0.0,0.2569,0.0183,0.1083,0.0,0.0294,0.0,0.0092,0.0,0.0,0.0661,0.0202,0.0,0.0,0.0,0.0,0.0,0.0183,0.0,0.0514,0.0,0.044,0.0422,0.0,0.0,0.0,0.0,0.0037,0.0,0.1798,0.0018,,,,,,,,,,,,,,,,,,,,,,,,,,,3486.0,,0.119,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,4625.0,15374.0,4617.5,4625.5,18979.0,,,,1.0,,1,,,,,,,,,,,,,,
2000,2,100663,105200,1052.0,University of Alabama at Birmingham,Birmingham,AL,35294-0110,,,1,1,3,4,1.0,,,,0.0,0.0,0.0,0.0352,0.0,0.0224,0.0,0.1223,0.0525,0.0,0.0051,0.0,0.0,0.0166,0.0,0.0,0.0743,0.0038,0.0,0.0019,0.0,0.0083,0.0,0.0134,0.0,0.0794,0.0467,0.0173,0.0551,0.0,0.0,0.0,0.0,0.0262,0.1895,0.1914,0.0384,,,,,,,,,,,,,,,,,,,,,,,,,,,6348.0,,0.054,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,5000.0,4125.0,4766.5,5250.0,22336.0,,,,1.0,,1,,,,,,,,,,,,,,
2000,3,100690,2503400,25034.0,Amridge University,Montgomery,AL,36117-3553,,,1,1,3,4,2.0,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,5397.0,,0.016,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,5655.5,9900.0,5500.0,8087.0,24892.0,,,,1.0,,1,,,,,,,,,,,,,,
2000,4,100706,105500,1055.0,University of Alabama in Huntsville,Huntsville,AL,35899,,,1,1,3,4,1.0,,,,0.0,0.0,0.0015,0.0,0.0,0.055,0.0,0.0267,0.2719,0.0,0.0208,0.0,0.0,0.0416,0.0,0.0,0.0475,0.0134,0.0,0.0,0.0,0.0045,0.0,0.0208,0.0,0.0208,0.0,0.0,0.0238,0.0,0.0,0.0,0.0,0.0223,0.1694,0.2392,0.0208,,,,,,,,,,,,,,,,,,,,,,,,,,,4200.0,,0.048,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,5500.0,12001.5,5500.0,5250.0,27289.5,,,,1.0,,1,,,,,,,,,,,,,,


In [5]:
titleIV = main[main['OPEFLAG'] == 1]
shape = titleIV.shape
nullPercent = titleIV['ICLEVEL'].isna().mean() * 100

print("[+] titleIVShape: ", shape, " IclevelNull: ", nullPercent)
titleIV.head()

[+] titleIVShape:  (121696, 113)  IclevelNull:  0.0


Unnamed: 0_level_0,Unnamed: 1_level_0,UNITID,OPEID,OPEID6,INSTNM,CITY,STABBR,ZIP,ACCREDAGENCY,HCM2,MAIN,NUMBRANCH,PREDDEG,HIGHDEG,CONTROL,ADM_RATE,ADM_RATE_ALL,SAT_AVG_ALL,PCIP01,PCIP03,PCIP05,PCIP09,PCIP10,PCIP11,PCIP12,PCIP13,PCIP14,PCIP15,PCIP16,PCIP19,PCIP22,PCIP23,PCIP24,PCIP25,PCIP26,PCIP27,PCIP29,PCIP30,PCIP31,PCIP38,PCIP39,PCIP40,PCIP41,PCIP42,PCIP43,PCIP44,PCIP45,PCIP46,PCIP47,PCIP48,PCIP49,PCIP50,PCIP51,PCIP52,PCIP54,DISTANCEONLY,UGDS,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,CURROPER,NPT4_PUB,NPT4_PRIV,NPT41_PUB,NPT42_PUB,NPT43_PUB,NPT41_PRIV,NPT42_PRIV,NPT43_PRIV,NUM41_PUB,NUM42_PUB,NUM43_PUB,NUM44_PUB,NUM45_PUB,NUM41_PRIV,NUM42_PRIV,NUM43_PRIV,NUM44_PRIV,NUM45_PRIV,COSTT4_A,TUITFTE,PCTFLOAN,CDR2,CDR3,DEATH_YR2_RT,LO_INC_DEATH_YR2_RT,MD_INC_DEATH_YR2_RT,HI_INC_DEATH_YR2_RT,DEBT_MDN,GRAD_DEBT_MDN,PELL_DEBT_MDN,NOPELL_DEBT_MDN,MD_FAMINC,MD_EARN_WNE_P6,MD_EARN_WNE_P8,C100_4,ICLEVEL,C100_4_POOLED_SUPP,OPEFLAG,CIPCODE1,CIPCODE2,CIPCODE3,FTFTPCTPELL,FTFTPCTFLOAN,PLUS_DEBT_INST_MD,BBRR1_FED_UG_MAKEPROG,BBRR1_FED_UG_PAIDINFULL,MD_EARN_WNE_INC1_P6,MD_EARN_WNE_INC2_P6,MD_EARN_WNE_INC3_P6,STUFACR,MD_EARN_WNE_1YR,MD_EARN_WNE_4YR
year,rowNum,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1,Unnamed: 88_level_1,Unnamed: 89_level_1,Unnamed: 90_level_1,Unnamed: 91_level_1,Unnamed: 92_level_1,Unnamed: 93_level_1,Unnamed: 94_level_1,Unnamed: 95_level_1,Unnamed: 96_level_1,Unnamed: 97_level_1,Unnamed: 98_level_1,Unnamed: 99_level_1,Unnamed: 100_level_1,Unnamed: 101_level_1,Unnamed: 102_level_1,Unnamed: 103_level_1,Unnamed: 104_level_1,Unnamed: 105_level_1,Unnamed: 106_level_1,Unnamed: 107_level_1,Unnamed: 108_level_1,Unnamed: 109_level_1,Unnamed: 110_level_1,Unnamed: 111_level_1,Unnamed: 112_level_1,Unnamed: 113_level_1,Unnamed: 114_level_1
2000,1,100654,100200,1002.0,Alabama A & M University,Normal,AL,35762,,,1,1,3,4,1.0,,,,0.0624,0.0183,0.0,0.0,0.0239,0.0349,0.0,0.2569,0.0183,0.1083,0.0,0.0294,0.0,0.0092,0.0,0.0,0.0661,0.0202,0.0,0.0,0.0,0.0,0.0,0.0183,0.0,0.0514,0.0,0.044,0.0422,0.0,0.0,0.0,0.0,0.0037,0.0,0.1798,0.0018,,,,,,,,,,,,,,,,,,,,,,,,,,,3486.0,,0.119,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,4625.0,15374.0,4617.5,4625.5,18979.0,,,,1.0,,1,,,,,,,,,,,,,,
2000,2,100663,105200,1052.0,University of Alabama at Birmingham,Birmingham,AL,35294-0110,,,1,1,3,4,1.0,,,,0.0,0.0,0.0,0.0352,0.0,0.0224,0.0,0.1223,0.0525,0.0,0.0051,0.0,0.0,0.0166,0.0,0.0,0.0743,0.0038,0.0,0.0019,0.0,0.0083,0.0,0.0134,0.0,0.0794,0.0467,0.0173,0.0551,0.0,0.0,0.0,0.0,0.0262,0.1895,0.1914,0.0384,,,,,,,,,,,,,,,,,,,,,,,,,,,6348.0,,0.054,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,5000.0,4125.0,4766.5,5250.0,22336.0,,,,1.0,,1,,,,,,,,,,,,,,
2000,3,100690,2503400,25034.0,Amridge University,Montgomery,AL,36117-3553,,,1,1,3,4,2.0,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,5397.0,,0.016,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,5655.5,9900.0,5500.0,8087.0,24892.0,,,,1.0,,1,,,,,,,,,,,,,,
2000,4,100706,105500,1055.0,University of Alabama in Huntsville,Huntsville,AL,35899,,,1,1,3,4,1.0,,,,0.0,0.0,0.0015,0.0,0.0,0.055,0.0,0.0267,0.2719,0.0,0.0208,0.0,0.0,0.0416,0.0,0.0,0.0475,0.0134,0.0,0.0,0.0,0.0045,0.0,0.0208,0.0,0.0208,0.0,0.0,0.0238,0.0,0.0,0.0,0.0,0.0223,0.1694,0.2392,0.0208,,,,,,,,,,,,,,,,,,,,,,,,,,,4200.0,,0.048,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,5500.0,12001.5,5500.0,5250.0,27289.5,,,,1.0,,1,,,,,,,,,,,,,,
2000,5,100724,100500,1005.0,Alabama State University,Montgomery,AL,36104-0271,,,1,1,3,4,1.0,,,,0.0,0.0,0.0,0.0509,0.0,0.0959,0.0,0.3738,0.0,0.0,0.0,0.0,0.0,0.0117,0.0,0.0,0.0411,0.0157,0.0,0.0,0.0254,0.0,0.0,0.0059,0.0,0.0411,0.09,0.0528,0.0176,0.0,0.0,0.0,0.0,0.0196,0.002,0.1526,0.0039,,,,,,,,,,,,,,,,,,,,,,,,,,,68868.0,,0.188,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,4000.0,16814.0,3938.0,4768.0,16584.5,,,,1.0,,1,,,,,,,,,,,,,,


In [6]:
fourYear = titleIV[titleIV['ICLEVEL'] == 1]
shape = fourYear.shape

print("[+]  fourYearShape: ", shape)
fourYear.head()

[+]  fourYearShape:  (51404, 113)


Unnamed: 0_level_0,Unnamed: 1_level_0,UNITID,OPEID,OPEID6,INSTNM,CITY,STABBR,ZIP,ACCREDAGENCY,HCM2,MAIN,NUMBRANCH,PREDDEG,HIGHDEG,CONTROL,ADM_RATE,ADM_RATE_ALL,SAT_AVG_ALL,PCIP01,PCIP03,PCIP05,PCIP09,PCIP10,PCIP11,PCIP12,PCIP13,PCIP14,PCIP15,PCIP16,PCIP19,PCIP22,PCIP23,PCIP24,PCIP25,PCIP26,PCIP27,PCIP29,PCIP30,PCIP31,PCIP38,PCIP39,PCIP40,PCIP41,PCIP42,PCIP43,PCIP44,PCIP45,PCIP46,PCIP47,PCIP48,PCIP49,PCIP50,PCIP51,PCIP52,PCIP54,DISTANCEONLY,UGDS,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,CURROPER,NPT4_PUB,NPT4_PRIV,NPT41_PUB,NPT42_PUB,NPT43_PUB,NPT41_PRIV,NPT42_PRIV,NPT43_PRIV,NUM41_PUB,NUM42_PUB,NUM43_PUB,NUM44_PUB,NUM45_PUB,NUM41_PRIV,NUM42_PRIV,NUM43_PRIV,NUM44_PRIV,NUM45_PRIV,COSTT4_A,TUITFTE,PCTFLOAN,CDR2,CDR3,DEATH_YR2_RT,LO_INC_DEATH_YR2_RT,MD_INC_DEATH_YR2_RT,HI_INC_DEATH_YR2_RT,DEBT_MDN,GRAD_DEBT_MDN,PELL_DEBT_MDN,NOPELL_DEBT_MDN,MD_FAMINC,MD_EARN_WNE_P6,MD_EARN_WNE_P8,C100_4,ICLEVEL,C100_4_POOLED_SUPP,OPEFLAG,CIPCODE1,CIPCODE2,CIPCODE3,FTFTPCTPELL,FTFTPCTFLOAN,PLUS_DEBT_INST_MD,BBRR1_FED_UG_MAKEPROG,BBRR1_FED_UG_PAIDINFULL,MD_EARN_WNE_INC1_P6,MD_EARN_WNE_INC2_P6,MD_EARN_WNE_INC3_P6,STUFACR,MD_EARN_WNE_1YR,MD_EARN_WNE_4YR
year,rowNum,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1,Unnamed: 88_level_1,Unnamed: 89_level_1,Unnamed: 90_level_1,Unnamed: 91_level_1,Unnamed: 92_level_1,Unnamed: 93_level_1,Unnamed: 94_level_1,Unnamed: 95_level_1,Unnamed: 96_level_1,Unnamed: 97_level_1,Unnamed: 98_level_1,Unnamed: 99_level_1,Unnamed: 100_level_1,Unnamed: 101_level_1,Unnamed: 102_level_1,Unnamed: 103_level_1,Unnamed: 104_level_1,Unnamed: 105_level_1,Unnamed: 106_level_1,Unnamed: 107_level_1,Unnamed: 108_level_1,Unnamed: 109_level_1,Unnamed: 110_level_1,Unnamed: 111_level_1,Unnamed: 112_level_1,Unnamed: 113_level_1,Unnamed: 114_level_1
2000,1,100654,100200,1002.0,Alabama A & M University,Normal,AL,35762,,,1,1,3,4,1.0,,,,0.0624,0.0183,0.0,0.0,0.0239,0.0349,0.0,0.2569,0.0183,0.1083,0.0,0.0294,0.0,0.0092,0.0,0.0,0.0661,0.0202,0.0,0.0,0.0,0.0,0.0,0.0183,0.0,0.0514,0.0,0.044,0.0422,0.0,0.0,0.0,0.0,0.0037,0.0,0.1798,0.0018,,,,,,,,,,,,,,,,,,,,,,,,,,,3486.0,,0.119,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,4625.0,15374.0,4617.5,4625.5,18979.0,,,,1.0,,1,,,,,,,,,,,,,,
2000,2,100663,105200,1052.0,University of Alabama at Birmingham,Birmingham,AL,35294-0110,,,1,1,3,4,1.0,,,,0.0,0.0,0.0,0.0352,0.0,0.0224,0.0,0.1223,0.0525,0.0,0.0051,0.0,0.0,0.0166,0.0,0.0,0.0743,0.0038,0.0,0.0019,0.0,0.0083,0.0,0.0134,0.0,0.0794,0.0467,0.0173,0.0551,0.0,0.0,0.0,0.0,0.0262,0.1895,0.1914,0.0384,,,,,,,,,,,,,,,,,,,,,,,,,,,6348.0,,0.054,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,5000.0,4125.0,4766.5,5250.0,22336.0,,,,1.0,,1,,,,,,,,,,,,,,
2000,3,100690,2503400,25034.0,Amridge University,Montgomery,AL,36117-3553,,,1,1,3,4,2.0,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,5397.0,,0.016,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,5655.5,9900.0,5500.0,8087.0,24892.0,,,,1.0,,1,,,,,,,,,,,,,,
2000,4,100706,105500,1055.0,University of Alabama in Huntsville,Huntsville,AL,35899,,,1,1,3,4,1.0,,,,0.0,0.0,0.0015,0.0,0.0,0.055,0.0,0.0267,0.2719,0.0,0.0208,0.0,0.0,0.0416,0.0,0.0,0.0475,0.0134,0.0,0.0,0.0,0.0045,0.0,0.0208,0.0,0.0208,0.0,0.0,0.0238,0.0,0.0,0.0,0.0,0.0223,0.1694,0.2392,0.0208,,,,,,,,,,,,,,,,,,,,,,,,,,,4200.0,,0.048,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,5500.0,12001.5,5500.0,5250.0,27289.5,,,,1.0,,1,,,,,,,,,,,,,,
2000,5,100724,100500,1005.0,Alabama State University,Montgomery,AL,36104-0271,,,1,1,3,4,1.0,,,,0.0,0.0,0.0,0.0509,0.0,0.0959,0.0,0.3738,0.0,0.0,0.0,0.0,0.0,0.0117,0.0,0.0,0.0411,0.0157,0.0,0.0,0.0254,0.0,0.0,0.0059,0.0,0.0411,0.09,0.0528,0.0176,0.0,0.0,0.0,0.0,0.0196,0.002,0.1526,0.0039,,,,,,,,,,,,,,,,,,,,,,,,,,,68868.0,,0.188,,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,PrivacySuppressed,4000.0,16814.0,3938.0,4768.0,16584.5,,,,1.0,,1,,,,,,,,,,,,,,


In [7]:
def ffillByYear(df, colName, flag=False, start=2000, end=2021):
    for yr in range(start, end):
        x = df.loc[yr]
        arr1 = x['UNITID']
        idx1 = x.index
        
        for i, id in enumerate(arr1):
            num1 = x[x['UNITID'] == id][colName].iloc[0]
            if (np.isnan(num1) or num1 == 0.0):
                for nextYr in range(yr + 1, 2021):
                    y = df.loc[nextYr]
                    num2 = y[y['UNITID'] == id][colName]

                    if ( (len(num2) > 0) and (not np.isnan(num2.iloc[0])) and (num2.iloc[0] > 0)):
                        if (not flag):
                            df.loc[yr].loc[idx1[i], colName] = num2.iloc[0]
                        else:
                            df.loc[yr].loc[idx1[i], colName] = num2.iloc[0] - 0.05 * num2.iloc[0]
                        break 

In [8]:
pd.set_option('display.max_rows', 10)
filtered_public_institutions = fourYear[(fourYear['UGDS_BLACK'].notna()) & (fourYear['UGDS_BLACK'] != 0.0)]
grpInstitutions = filtered_public_institutions.groupby('UNITID')
stat = grpInstitutions['UGDS_BLACK'].describe()
stat

Unnamed: 0_level_0,count,mean,std,min,25%,50%,75%,max
UNITID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
100654,11.0,0.929336,0.023026,0.9022,0.909650,0.92160,0.952650,0.9617
100663,11.0,0.257464,0.009440,0.2401,0.254700,0.25900,0.264000,0.2684
100690,11.0,0.469755,0.206927,0.1778,0.342050,0.41920,0.703200,0.7288
100706,11.0,0.119100,0.021636,0.0879,0.102100,0.12300,0.135900,0.1495
100724,14.0,0.944443,0.020141,0.9208,0.927750,0.93655,0.955200,0.9776
...,...,...,...,...,...,...,...,...
493822,2.0,0.227200,0.001273,0.2263,0.226750,0.22720,0.227650,0.2281
494621,1.0,0.210500,,0.2105,0.210500,0.21050,0.210500,0.2105
494685,2.0,0.051950,0.012657,0.0430,0.047475,0.05195,0.056425,0.0609
495059,1.0,0.083300,,0.0833,0.083300,0.08330,0.083300,0.0833


In [9]:
fourYear[fourYear['UNITID'] == 102322.0]['UGDS_WHITE'].value_counts()
# Some institutes have NaN all across in UGDS_WHITE

Series([], Name: count, dtype: int64)

UGDS_WHITE: Fraction of undergrads that are white
UGDS_BLACK: Fraction of undergrads that are black
UGDS_ASIAN: Fraction of undergrads that are asian

Below Code handles the NaNs in the UGDS_WHITE, UGDS_BLACK and UGDS_ASIAN columns
As we  can see in the above table of stats, there is not much variablity in the UGDS_BLACK column(max and min are very-close and deviation is very
low) for a given institution.

The code replaces the NaNs and 0s with mean values, and if an institution got established only in that year it replaces the values with minimum valid value in that institution over the next years.

In [10]:
filtered_institutions = 0 # This code may take upto 10 sec to run.
stat = 0

def createRaceStat(df, race):
    global filtered_institutions
    filtered_institutions = df[(df[race].notna()) & (df[race] != 0.0)]
    grpInstitutions = filtered_institutions.groupby('UNITID')
    stat = grpInstitutions[race].describe()
    return stat

def updateRace(val, race):
    if (val['UNITID'] not in stat.index):
        return filtered_institutions[race].min()
    elif ( (np.isnan(val[race])) or (val[race] == 0.0) ):
        return stat['mean'].loc[val['UNITID']]
    return val[race]

stat = createRaceStat(fourYear, 'UGDS_WHITE')
fourYear.loc[:, 'UGDS_WHITE'] = fourYear[['UNITID', 'UGDS_WHITE']].apply(updateRace, axis=1, race='UGDS_WHITE')

stat = createRaceStat(fourYear, 'UGDS_BLACK')
fourYear.loc[:, 'UGDS_BLACK'] = fourYear[['UNITID', 'UGDS_BLACK']].apply(updateRace, axis=1, race='UGDS_BLACK')

stat = createRaceStat(fourYear, 'UGDS_ASIAN')
fourYear.loc[:, 'UGDS_ASIAN'] = fourYear[['UNITID', 'UGDS_ASIAN']].apply(updateRace, axis=1, race='UGDS_ASIAN')

UGDS => No. of undergraduates enrolled in a given year.

Below code handles the NaNs in UGDS column, by filling it with the next non-null, non-zero value found in the next year that has a valid value in the UGDS column of the same institution.

Since number of undergrads enrolled won't change signifincatly between a few years this method is mostly correct

In [11]:
ffillByYear(fourYear, 'UGDS') # This code may take upto 40 seconds to run
fourYear['UGDS'].dtype

dtype('float64')

In [12]:
salaryDtype = fourYear['MD_EARN_WNE_P6'].dtype
print("[+] DataType: ", salaryDtype)

[+] DataType:  object


Changing the data type to float and replacing the NaNs and PrivacySuppressed values with 0 in the Median Earnings columns

In [13]:
#Clean the MD_EARN_WNE_P6
def updateEarnings(val):
    if (val == 'PrivacySuppressed' or pd.isna(val)):
        return 0.0
    return float(val)

fourYear['MD_EARN_WNE_P6'] = fourYear['MD_EARN_WNE_P6'].apply(updateEarnings)
fourYear['MD_EARN_WNE_P6'] = fourYear['MD_EARN_WNE_P6'].astype(float)

fourYear['GRAD_DEBT_MDN'] = fourYear['GRAD_DEBT_MDN'].apply(updateEarnings)
fourYear['GRAD_DEBT_MDN'] = fourYear['GRAD_DEBT_MDN'].astype(float)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  fourYear['MD_EARN_WNE_P6'] = fourYear['MD_EARN_WNE_P6'].apply(updateEarnings)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  fourYear['MD_EARN_WNE_P6'] = fourYear['MD_EARN_WNE_P6'].astype(float)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  fourYear['GRAD_DEBT_MDN'] = fourYear['GRAD_DEBT_MDN'].app

In [14]:
# Generating the required columns
reqCols = ['UNITID', 'INSTNM', 'CONTROL', 'ADM_RATE_ALL', 'C100_4', 'SAT_AVG_ALL', 'UGDS', 'UGDS_WHITE', 'UGDS_BLACK', 'UGDS_ASIAN', 'UGDS_HISP', 'MD_EARN_WNE_P6', 'GRAD_DEBT_MDN', 'NPT4_PRIV', 'NPT4_PUB', 'NPT41_PUB', 'NPT41_PRIV', 'NPT42_PUB', 'NPT42_PRIV', 'NPT43_PUB', 'NPT43_PRIV', 'COSTT4_A', 'STUFACR', 'FTFTPCTFLOAN', 'FTFTPCTPELL', 'PLUS_DEBT_INST_MD',
           'NUM41_PUB', 'NUM42_PUB', 'NUM43_PUB', 'NUM44_PUB', 'NUM45_PUB', 'NUM41_PRIV', 'NUM42_PRIV', 'NUM43_PRIV', 'NUM44_PRIV', 'NUM45_PRIV',
           ]

for num in availCourseCode:
    reqCols.append('PCIP' + str(f"{num:02d}"))

In [15]:
# fourYear[fourYear['INSTNM'].str.contains('University of Texas at Austin')]

## University Selection
<br>
There are two types of institutes that have been selected:<br><br>
1: Type-1(Ivy/Ivy League Plus  referred to as elite-colleges in the story):<br>
'Massachusetts Institute of Technology', 'Stanford University', 'University of Pennsylvania', 'Princeton University', 'Harvard University', <br>
'Northwestern University', 'Johns Hopkins University', 'Brown University', 'Washington University in St Louis', 'Duke University'

2: Type-2(Non-Ivy League Plus Colleges referred to as non-elite colleges in the story):<br>
'University of Nevada-Las Vegas', 'University of North Texas', 'University of Nevada-Reno','University of South Florida', 'University of Arizona',<br>
'Wichita State University', 'Louisiana Tech University', 'Ball State University', 'Cleveland State University', 'University of Arkansas'


In [16]:
pd.set_option('display.max_rows', None)
tier1Institutes = fourYear[fourYear['INSTNM'].isin(['Massachusetts Institute of Technology', 'Stanford University', 'University of Pennsylvania', 'Princeton University', 'Harvard University',
                                                    'Northwestern University', 'Johns Hopkins University', 'Brown University', 'Washington University in St Louis', 'Duke University'])][reqCols]
tier1Institutes = tier1Institutes.loc[2010:2020]
tier1Institutes.loc[[2010,2020]][['INSTNM', 'ADM_RATE_ALL', 'UGDS', 'COSTT4_A', 'STUFACR', 'FTFTPCTFLOAN', 'FTFTPCTPELL', 'NUM41_PRIV', 'NUM42_PRIV', 'NUM43_PRIV', 'NUM44_PRIV', 'NUM45_PRIV']]

Unnamed: 0_level_0,Unnamed: 1_level_0,INSTNM,ADM_RATE_ALL,UGDS,COSTT4_A,STUFACR,FTFTPCTFLOAN,FTFTPCTPELL,NUM41_PRIV,NUM42_PRIV,NUM43_PRIV,NUM44_PRIV,NUM45_PRIV
year,rowNum,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
2010,1188,Northwestern University,0.262,8905.0,52080.0,7.0,0.3056,0.0664,68.0,73.0,122.0,211.0,328.0
2010,1738,Johns Hopkins University,0.2782,5801.0,51478.0,11.0,0.3402,0.1063,63.0,67.0,80.0,120.0,254.0
2010,1833,Harvard University,0.0719,7181.0,50250.0,7.0,0.027,0.1273,65.0,72.0,51.0,16.0,39.0
2010,1857,Massachusetts Institute of Technology,0.107,4218.0,50100.0,8.0,0.2252,0.1679,66.0,63.0,53.0,68.0,179.0
2010,2297,Washington University in St Louis,0.2219,6436.0,52464.0,7.0,0.2781,0.0621,37.0,52.0,80.0,129.0,227.0
2010,2486,Princeton University,0.1006,5029.0,49830.0,6.0,0.041,0.0981,48.0,47.0,42.0,10.0,27.0
2010,2905,Duke University,0.2238,6416.0,50974.0,8.0,0.249,0.0877,73.0,68.0,78.0,110.0,227.0
2010,3592,University of Pennsylvania,0.1771,10842.0,51299.0,6.0,0.2938,0.1017,87.0,101.0,152.0,185.0,436.0
2010,3678,Brown University,0.1117,6013.0,50560.0,9.0,0.2119,0.1195,71.0,64.0,42.0,84.0,263.0
2010,4548,Stanford University,0.0797,6564.0,51760.0,10.0,0.1122,0.1503,115.0,95.0,55.0,34.0,102.0


In [17]:
# pd.set_option('display.max_rows', None)
# fourYear[fourYear['INSTNM'].str.contains('University of Nevada-Las Vegas')][['INSTNM', 'ADM_RATE_ALL', 'UGDS', 'SAT_AVG_ALL']]
# x = fourYear.loc[2020]
# y = x[(x['UGDS'] > 18000) & (x['UGDS'] < 28000) & (x['ADM_RATE_ALL'] > 0.5)]
# y[['INSTNM', 'ADM_RATE_ALL', 'UGDS']]

In [18]:
pd.set_option('display.max_rows', None)
tier2Institutes = fourYear[fourYear['INSTNM'].isin(['University of Nevada-Las Vegas', 'University of North Texas', 'University of Nevada-Reno','University of South Florida', 'University of Arizona',
                                                    'Wichita State University', 'Louisiana Tech University', 'Ball State University', 'Cleveland State University', 'University of Arkansas'])][reqCols]
tier2Institutes = tier2Institutes.loc[2010:2020]
tier2Institutes.loc[[2010,2020]][['INSTNM', 'ADM_RATE_ALL', 'UGDS', 'COSTT4_A', 'STUFACR', 'FTFTPCTFLOAN', 'FTFTPCTPELL', 'NUM41_PRIV', 'NUM42_PRIV', 'NUM43_PRIV', 'NUM44_PRIV', 'NUM45_PRIV']]

Unnamed: 0_level_0,Unnamed: 1_level_0,INSTNM,ADM_RATE_ALL,UGDS,COSTT4_A,STUFACR,FTFTPCTFLOAN,FTFTPCTPELL,NUM41_PRIV,NUM42_PRIV,NUM43_PRIV,NUM44_PRIV,NUM45_PRIV
year,rowNum,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
2010,83,University of Arizona,0.78,30036.0,18008.0,20.0,0.2875,0.1948,,,,,
2010,143,University of Arkansas,0.5609,15347.0,18461.0,18.0,0.3948,0.1796,,,,,
2010,888,University of South Florida,0.43,29911.0,17671.0,27.0,0.3466,0.2643,,,,,
2010,1248,Ball State University,0.7368,17403.0,18700.0,18.0,0.5901,0.2111,,,,,
2010,1496,Wichita State University,0.8846,10930.0,14475.0,20.0,0.3947,0.254,,,,,
2010,1624,Louisiana Tech University,0.628,7101.0,15034.0,20.0,0.417,0.2875,,,,,
2010,2375,University of Nevada-Las Vegas,0.7765,21679.0,12733.0,21.0,0.2622,0.2298,,,,,
2010,2376,University of Nevada-Reno,0.8693,12891.0,18119.0,23.0,0.2678,0.1306,,,,,
2010,3086,Cleveland State University,0.6402,9864.0,18555.0,16.0,0.5968,0.4429,,,,,
2010,4012,University of North Texas,0.638,27468.0,16373.0,23.0,0.5672,0.2322,,,,,


In [19]:
year = [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]

In [20]:
# Handling Admission Rates
ffillByYear(tier2Institutes, 'ADM_RATE_ALL', False, 2010) # 12% null values
admRateMean = tier2Institutes['ADM_RATE_ALL'].mean()
tier2Institutes['ADM_RATE_ALL'].fillna(admRateMean, inplace=True)


In [21]:
def calcApplications(x):
    return x['UGDS'] / x['ADM_RATE_ALL']

# Application Rates and Enrollment:
tier1Institutes['APPLICATIONS'] = tier1Institutes[['ADM_RATE_ALL', 'UGDS']].apply(calcApplications, axis=1)
tier1Institutes['APPLICATIONS'] = tier1Institutes['APPLICATIONS'].astype(int)
tier2Institutes['APPLICATIONS'] = tier2Institutes[['ADM_RATE_ALL', 'UGDS']].apply(calcApplications, axis=1)
tier2Institutes['APPLICATIONS'] = tier2Institutes['APPLICATIONS'].astype(int)

In [22]:
# tier2Institutes[['INSTNM', 'UGDS', 'APPLICATIONS']]

In [23]:
applicationTier1 = tier1Institutes['APPLICATIONS'].groupby(level='year')
applicationTier1 = applicationTier1.quantile(0.5) # value under which 60% of data

In [24]:
applicationTier2 = tier2Institutes['APPLICATIONS'].groupby(level='year')
applicationTier2 = applicationTier2.quantile(0.5) # value under which 60% of data lies

In [25]:
# Normalise the values
def normalise(arr, start):
    for i in range(len(arr)):
        arr.iloc[i] = arr.iloc[i] / start
    return arr

In [26]:
applicationTier1 = normalise(applicationTier1, applicationTier1.iloc[0])
applicationTier2 = normalise(applicationTier2, applicationTier2.iloc[0])

In [27]:
# Plot the application rates
trace1 = go.Scatter(x=year, y=applicationTier1, mode="lines", name="Elite Colleges")
trace2 = go.Scatter(x=year, y=applicationTier2, mode="lines", name="Non-Elite Colleges", line=dict(dash='dash'))

fig = go.Figure(data=[trace1, trace2])
fig.update_layout(
    width= 600,
    xaxis=dict(title="Year"),
    yaxis=dict(title=""),
    title="Trends in application rates over a decade<br>(Applications are normalised, i.e, number of applications<br>in 2010 are represented as 1.0)"
)
fig.show()

In [28]:
# Analyse trends in enrolled students
enrolledTier1 = tier1Institutes['UGDS'].groupby(level='year').quantile(0.5)
enrolledTier2 = tier2Institutes['UGDS'].groupby(level='year').quantile(0.5)

In [29]:
enrolledTier1 = normalise(enrolledTier1, enrolledTier1.iloc[0])
enrolledTier2 = normalise(enrolledTier2, enrolledTier2.iloc[0])

In [30]:
# Plot the application rates
trace1 = go.Scatter(x=year, y=enrolledTier1, mode="lines", name="Elite")
trace2 = go.Scatter(x=year, y=enrolledTier2, mode="lines", name="Non-Elite", line=dict(dash='dash'))

fig = go.Figure(data=[trace1, trace2])
fig.update_layout(
    width=600,
    xaxis=dict(title="Year"),
    yaxis=dict(title=""),
    title="Trends in enrollment rates over a decade<br>(Enrollments are normalised, i.e, number of enrollments in<br>2010 are represented as 1.0)"
)
fig.show()

In [31]:
# Analysing the cost of attending a Elite Vs Non-Elite college
costTier1 = tier1Institutes['COSTT4_A'].groupby(level='year').mean() # very less deviations check using  costTier1.describe()
costTier2 = tier2Institutes['COSTT4_A'].groupby(level='year').mean() # very less deviations check using  costTier2.describe()

In [32]:
# Plot the application rates
trace1 = go.Scatter(x=year, y=costTier1, mode="lines", name="Elite Colleges")
trace2 = go.Scatter(x=year, y=costTier2, mode="lines", name="Non-Elite Colleges", line=dict(dash='dash'))

fig = go.Figure(data=[trace1, trace2])
fig.update_layout(
    width=600,
    xaxis=dict(title="Year"),
    yaxis=dict(title=""),
    title="Trends in tuition fees over a decade"
)
fig.show()

In [33]:
# Prepare MD_EARN_WNE_P6 for Tier1
pd.set_option('display.max_rows', 10)
ffillByYear(tier1Institutes, 'MD_EARN_WNE_P6', True, 2010)
# tier1Institutes[['INSTNM', 'MD_EARN_WNE_P6']]

In [34]:
stat = tier1Institutes.loc[2010:2020]
grpTier1 = stat.groupby(level='year')

medianEarnTier1 = grpTier1['MD_EARN_WNE_P6'].median() # MIT has unsually large median earnings
medianEarnTier1 = medianEarnTier1[medianEarnTier1 > 0]

year = medianEarnTier1.index

medianEarnTier1 = np.array(medianEarnTier1)
len(medianEarnTier1)

11

In [35]:
meanDebtTier1 = grpTier1['GRAD_DEBT_MDN'].mean()
meanDebtTier1 = np.array(meanDebtTier1)
meanDebtTier1

array([10460.15, 11166.4 , 12151.4 , 12827.7 , 13074.8 , 13081.65,
       13211.2 , 13098.45, 14064.1 , 14208.2 , 13654.6 ])

In [36]:
# Prepare MD_EARN_WNE_P6 for Tier2
pd.set_option('display.max_rows', 10)
ffillByYear(tier2Institutes, 'MD_EARN_WNE_P6', True, 2010)
# tier2Institutes[['INSTNM', 'MD_EARN_WNE_P6']]

In [37]:
stat = tier2Institutes.loc[2010:2020]
grpTier2 = stat.groupby(level='year')
medianEarnTier2 = grpTier2['MD_EARN_WNE_P6'].median()
medianEarnTier2 = medianEarnTier2[medianEarnTier2 > 0]

medianEarnTier2 = np.array(medianEarnTier2)
len(medianEarnTier2)

11

In [38]:
meanDebtTier2 = grpTier2['GRAD_DEBT_MDN'].mean()
meanDebtTier2 = np.array(meanDebtTier2)
meanDebtTier2

array([15051.9 , 16299.55, 17507.  , 18622.2 , 19565.1 , 20089.35,
       20201.9 , 20322.4 , 20780.7 , 20739.9 , 20654.8 ])

## Analysing the disparity between Tier-1 and Tier-2 institutes ##

In [39]:
trace1 = go.Scatter(x=year, y=medianEarnTier1, mode="lines", name="Elite")
trace2 = go.Scatter(x=year, y=medianEarnTier2, mode="lines", name="Non-Elite", fill="tonextx",opacity=0)


fig = go.Figure(data=[trace1, trace2])
fig.update_layout(
    width= 600,
    height=400,
    xaxis=dict(title="Year"),
    yaxis=dict(title="Average_Median_Salary"),
    title="These salaries are calculated 6-years after enrollment and thus<br>represent the future prospects that a student has after<br>attending an institute"
)
fig.show()

In [40]:
year=[2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
trace1 = go.Scatter(x=year, y=meanDebtTier1, mode="lines", name="Elite")
trace2 = go.Scatter(x=year, y=meanDebtTier2, mode="lines", name="Non-Elite", fill="tonextx")


fig = go.Figure(data=[trace1, trace2])
fig.update_layout(
    width=600,
    height=400,
    xaxis=dict(title="Year"),
    yaxis=dict(title="Average_Mean_Debt_Grad"),
    title="Debts are calculated when a student graduates"
)
fig.show()

In [41]:
stufacrTier1 = tier1Institutes['STUFACR'].groupby(level='year').median() # 0% null values
stufacrTier2 = tier2Institutes['STUFACR'].groupby(level='year').median() # 0% null values


In [42]:
trace1 = go.Scatter(x=year, y=stufacrTier1, mode="lines", name="Elite")
trace2 = go.Scatter(x=year, y=stufacrTier2, mode="lines", name="Non-Elite", line=dict(dash='dash'))


fig = go.Figure(data=[trace1, trace2])
fig.update_layout(
    width=600,
    height=400,
    xaxis=dict(title="Year"),
    yaxis=dict(title="Student-to-Faculty Ratio"),
)
fig.show()

Inferences....

In [43]:
# Above we analysed the disparity between Tier-1 and Tier-2 institutes
# Disparity between Tier-1 and Tier-2:
# Tier-2 => Low Salary on graduation and burdened with high loans
# If graduates are struggling financially, it can send a negative message to prospective students, discouraging them from applying.
# News of graduates struggling with debt can lead to negative publicity for the institution, further impacting its reputation.

## Exploring the Reasons for the above disparity  between Tier-1 and Tier-2 ##

Analysing the Acceptance-Rate, Sat_Scores and Completion/Graduation-Rates for Tier-1 and Tier-2

In [44]:
# Now we'll analyse the characterisitics of the top-5 universities and compare them
# with other tier-2 colleges

# First lets analyse the acceptance-rate, sat-scores of universities and the completion

In [45]:
pd.set_option('display.max_rows', None)
admittanceTier1 = (tier1Institutes.loc[2010:2020])[['INSTNM', 'ADM_RATE_ALL', 'SAT_AVG_ALL']]
admittanceTier2 = (tier2Institutes.loc[2010:2020])[['INSTNM', 'ADM_RATE_ALL', 'SAT_AVG_ALL']]
year = admittanceTier1.index.get_level_values('year')
admittanceTier1.loc[[2010, 2020]]

Unnamed: 0_level_0,Unnamed: 1_level_0,INSTNM,ADM_RATE_ALL,SAT_AVG_ALL
year,rowNum,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2010,1188,Northwestern University,0.262,1427.0
2010,1738,Johns Hopkins University,0.2782,1395.0
2010,1833,Harvard University,0.0719,1468.0
2010,1857,Massachusetts Institute of Technology,0.107,1472.0
2010,2297,Washington University in St Louis,0.2219,1462.0
2010,2486,Princeton University,0.1006,1482.0
2010,2905,Duke University,0.2238,1440.0
2010,3592,University of Pennsylvania,0.1771,1436.0
2010,3678,Brown University,0.1117,1420.0
2010,4548,Stanford University,0.0797,1436.0


In [46]:
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=year,
    y=admittanceTier1['ADM_RATE_ALL'],
    marker_color="red",
    mode="markers",
    name="Elite"
))

fig.add_trace(go.Scatter(
    x=year,
    y=admittanceTier2['ADM_RATE_ALL'],
    marker_color="green",
    mode="markers",
    name="Non-Elite"
))
fig.update_layout(
    yaxis=dict(dtick=0.05, title="Admission_Rate (Fraction)"),
    xaxis=dict(title="Year"),
)
fig.layout.update(dict(width=600, height=600))
fig.show()

In [47]:
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=year,
    y=admittanceTier1['SAT_AVG_ALL'],
    marker_color="red",
    mode="markers",
    name="Elite"
))

fig.add_trace(go.Scatter(
    x=year,
    y=admittanceTier2['SAT_AVG_ALL'],
    marker_color="green",
    mode="markers",
    name="Non-Elite"
))
fig.update_layout(
    yaxis=dict(title="SAT_SCORE"),
    xaxis=dict(title="Year"),
)
fig.layout.update(dict(width=600, height=600))
fig.show()

In [48]:
ffillByYear(tier1Institutes, 'C100_4', False, 2010, 2021)
ffillByYear(tier2Institutes, 'C100_4', False, 2010, 2021)

In [49]:
completionTier1 = tier1Institutes.loc[2010:2020]['C100_4']
completionTier2 = tier2Institutes.loc[2010:2020]['C100_4']

In [50]:
# Completion rates
nullPercentTier1 = completionTier1.isna().mean() * 100
nullPercentTier2 = completionTier2.isna().mean() * 100

print("[+] NullCompT1: ", nullPercentTier1, " NullCompT2", nullPercentTier2)

[+] NullCompT1:  0.0  NullCompT2 0.0


In [51]:
year = completionTier1.index.get_level_values('year')

fig = go.Figure()
fig.add_trace(go.Scatter(
    x=year,
    y=completionTier1,
    marker_color="red",
    mode="markers",
    name="Elite"
))

fig.add_trace(go.Scatter(
    x=year,
    y=completionTier2,
    marker_color="green",
    mode="markers",
    name="Non-Elite"
))
fig.update_layout(
    yaxis=dict(dtick=0.05,title="Completion_Rate"),
    xaxis=dict(title="Year"),
    title="This graph reflects the percentage of students who complete<br>their course within the expected duration"
)
fig.layout.update(dict(width=600, height=600))
fig.show()

Inferences....

In [52]:
# From above 3 charts we can see that:
# Since Tier2 have very high acceptance rates as compared to Tier1 and accept students with considerably low SAT scores
# There is no-contention for the seats and thus they are accepting everybody who is good or bad, which is not good.
# Thus they first need to build a strong reputation by actively engaging in academic conferences, research collaborations with Tier1
# institutes.

# Tier2 have very low graduation rates as compared to Tier1, which is kind of unusual since they have very high enrollment
# Such low level of graduation rates suggest that the quality of education in these institutes is not upto mark, that's probably why students are not able to complete
# their course on time.
# Thus these institutes need to enhance their quality of education by reviewing and update curriculum regularly to ensure relevance to industry needs and
# contemporary knowledge, provide faculty development opportunities through workshops

Analysing the differences in the diversity of the students enrolled in Tier-1 and Tier-2 particularly Asians

In [53]:
# Analyse the diversity in these colleges
pd.set_option('display.max_rows', 10)
tier1Institutes.loc[2010:2020][['INSTNM', 'MD_EARN_WNE_P6', 'UGDS_WHITE', 'UGDS_BLACK', 'UGDS_ASIAN']]

Unnamed: 0_level_0,Unnamed: 1_level_0,INSTNM,MD_EARN_WNE_P6,UGDS_WHITE,UGDS_BLACK,UGDS_ASIAN
year,rowNum,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2010,1188,Northwestern University,52800.0,0.501118,0.057455,0.178409
2010,1738,Johns Hopkins University,61900.0,0.405309,0.062418,0.225018
2010,1833,Harvard University,71500.0,0.047500,0.005200,0.005600
2010,1857,Massachusetts Institute of Technology,81900.0,0.363700,0.080800,0.250600
2010,2297,Washington University in St Louis,53700.0,0.535627,0.070236,0.170036
...,...,...,...,...,...,...
2020,2004,Princeton University,88273.0,0.402400,0.078900,0.223100
2020,2365,Duke University,82232.0,0.412900,0.088300,0.210100
2020,2929,University of Pennsylvania,90173.0,0.399800,0.079900,0.215600
2020,3008,Brown University,69988.0,0.427700,0.067700,0.166800


In [54]:
asianByWhiteTier1 = (tier1Institutes['UGDS_ASIAN'])
asianByWhiteTier2 = (tier2Institutes['UGDS_ASIAN'])
asianByWhiteTier1 = asianByWhiteTier1.loc[2010:2020]
asianByWhiteTier2 = asianByWhiteTier2.loc[2010:2020]

In [55]:
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=asianByWhiteTier1,
    y=tier1Institutes['MD_EARN_WNE_P6'].loc[2010:2020],
    marker_color="red",
    mode="markers",
    name="Elite"
))

fig.add_trace(go.Scatter(
    x=asianByWhiteTier2,
    y=tier2Institutes['MD_EARN_WNE_P6'].loc[2010:2020],
    marker_color="green",
    mode="markers",
    name="Non-Elite"
))
fig.update_layout(
    yaxis=dict(title="Median_Income(2010-2020)"),
    xaxis=dict(title="Fraction of Asian Students"),
)
fig.layout.update(dict(width=600, height=600))
fig.show()


Inferences...

In [56]:
# We can see that as the number of asian-students increase the median-income of institutions have gradually increased
# Tier1 have high percentages of asian-students as compared to Tier2
# Tier2 need to increase the diversity of students in their colleges.
# Tier2 may engage on social media platforms that are popular in different countries, as well as websites, podcasts
# Create scholarship programs specifically targeting students from other countries.

In [57]:
# Analysing the loans-given and the cost of attendance for the students of various income brackets
loansTier1 = tier1Institutes['FTFTPCTFLOAN'].groupby(level='year').mean()
loansTier2 = tier2Institutes['FTFTPCTFLOAN'].groupby(level='year').mean()

In [58]:
year=[2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
trace1 = go.Scatter(x=year, y=loansTier1, mode="lines", name="Elite")
trace2 = go.Scatter(x=year, y=loansTier2, mode="lines", name="Non-Elite", line=dict(dash='dash'))


fig = go.Figure(data=[trace1, trace2])
fig.update_layout(
    width=600,
    height=400,
    xaxis=dict(title="Year"),
    yaxis=dict(title="Fraction"),
    title="Fraction of first-time degree-seeking undergrads who<br>received a federal loan"
)
fig.show()

In [59]:
grantsTier1 = tier1Institutes['FTFTPCTPELL'].groupby(level='year').mean()
grantsTier2 = tier2Institutes['FTFTPCTPELL'].groupby(level='year').mean()

In [60]:
year=[2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
trace1 = go.Scatter(x=year, y=grantsTier1, mode="lines", name="Elite")
trace2 = go.Scatter(x=year, y=grantsTier2, mode="lines", name="Non-Elite", line=dict(dash='dash'))


fig = go.Figure(data=[trace1, trace2])
fig.update_layout(
    width=600,
    height=400,
    xaxis=dict(title="Year"),
    yaxis=dict(title="Fraction"),
    title="Fraction of first-time degree-seeking undergrads who<br>received a federal grant"
)
fig.show()

In [61]:
# Now let's analyse the net average cost of attending these institutes
# This code may take upto 10sec to run
ffillByYear(tier1Institutes, 'NPT4_PRIV', True, 2010)
ffillByYear(tier1Institutes, 'NPT4_PUB', True, 2010)

ffillByYear(tier1Institutes, 'NPT41_PRIV', True, 2010)
ffillByYear(tier1Institutes, 'NPT41_PUB', True, 2010)

ffillByYear(tier1Institutes, 'NPT43_PRIV', True, 2010)
ffillByYear(tier1Institutes, 'NPT43_PUB', True, 2010)

ffillByYear(tier2Institutes, 'NPT4_PRIV', True, 2010)
ffillByYear(tier2Institutes, 'NPT4_PUB', True, 2010)

ffillByYear(tier2Institutes, 'NPT41_PRIV', True, 2010)
ffillByYear(tier2Institutes, 'NPT41_PUB', True, 2010)

ffillByYear(tier2Institutes, 'NPT43_PRIV', True, 2010)
ffillByYear(tier2Institutes, 'NPT43_PUB', True, 2010)

In [62]:
def calcMean(year, ser):
    sum, count = 0, 0
    subSeries = ser.loc[year]

    for val in subSeries:
        if (val > 1000):
            sum += val
            count += 1
    return sum / count

In [63]:
def correctNegatives(series):
    for i in range(0, 11):
        mean = calcMean(i + 2010, series)
        start, end = int(i * len(series) / 11), int(i * len(series) / 11 + len(series) / 11)
        for j in range(start, end):
            if (pd.isna(series.iloc[j]) or series.iloc[j] < 1000):
                series.iloc[j] = mean
    return
                

In [64]:
pd.set_option('display.max_rows', None)
# --------------
# Some Universities like stanford have anomaly in data, i.e, negative net avg-cost or less than $1000 fees
# --------------

avgNetPriceTier1 = tier1Institutes.loc[2010:2020]['NPT4_PRIV']
correctNegatives(avgNetPriceTier1)

avgNetPriceLowIncomeTier1 = tier1Institutes.loc[2010:2020]['NPT41_PRIV']
correctNegatives(avgNetPriceLowIncomeTier1)

avgNetPriceMedIncomeTier1 = tier1Institutes.loc[2010:2020]['NPT43_PRIV']
correctNegatives(avgNetPriceMedIncomeTier1)

# avgNetPriceTier1.describe()

avgNetPriceTier1 = avgNetPriceTier1.groupby(level='year').mean()
avgNetPriceLowIncomeTier1 = avgNetPriceLowIncomeTier1.groupby(level='year').mean()
avgNetPriceMedIncomeTier1 = avgNetPriceMedIncomeTier1.groupby(level='year').mean()




A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



In [65]:
def extractCost(val, colName1, colName2):
    if (np.isnan(val[colName1]) and np.isnan(val[colName2])):
        return 0.0
    elif (np.isnan(val[colName1])):
        return val[colName2]
    return val[colName1]

avgNetPriceTier2 = tier2Institutes.loc[2010:2020][['NPT4_PRIV', 'NPT4_PUB']].apply(extractCost, axis = 1, colName1='NPT4_PRIV', colName2='NPT4_PUB')
correctNegatives(avgNetPriceTier2)

avgNetPriceLowIncomeTier2 = tier2Institutes.loc[2010:2020][['NPT41_PRIV', 'NPT41_PUB']].apply(extractCost, axis=1, colName1='NPT41_PRIV', colName2='NPT41_PUB')
correctNegatives(avgNetPriceLowIncomeTier2)

avgNetPriceMedIncomeTier2 = tier2Institutes.loc[2010:2020][['NPT43_PRIV', 'NPT43_PUB']].apply(extractCost, axis=1, colName1='NPT43_PRIV', colName2='NPT43_PUB')
correctNegatives(avgNetPriceMedIncomeTier2)

# avgNetPriceTier2.describe()

avgNetPriceTier2 = avgNetPriceTier2.groupby(level='year').mean()
avgNetPriceLowIncomeTier2 = avgNetPriceLowIncomeTier2.groupby(level='year').mean()
avgNetPriceMedIncomeTier2 = avgNetPriceMedIncomeTier2.groupby(level='year').mean()

In [66]:
year = avgNetPriceTier1.loc[2010:2020].index.get_level_values('year')

Analysing the differences in the Net Avg Cost between the students of both types of institutes

In [67]:
pd.set_option('display.max_rows', None)
year = [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]

In [68]:
def drawBarGraph(year, yT1, yT2, xlabel, ylabel, title):
    tier1Color = "blue"
    tier2Color = "orange"

    fig = go.Figure()
    fig.add_trace(
        go.Bar(
            x=year,
            y=yT1,
            name="Elite",
            marker_color=tier1Color,
        )
    )
    fig.add_trace(
        go.Bar(
            x=year,
            y=yT2,
            name="Non-Elite",
            marker_color=tier2Color,
        )
    )
    fig.update_layout(
        yaxis=dict(title=ylabel),
        xaxis=dict(title=xlabel),
        title=title
    )
    fig.show()

In [69]:
drawBarGraph(year, avgNetPriceLowIncomeTier1, avgNetPriceLowIncomeTier2, 'Year', 'Net-Avg Cost(TotalCost - Aid)', 'Net-Avg Cost for Low Income Families(<=$30K)')

In [70]:
drawBarGraph(year, avgNetPriceMedIncomeTier1, avgNetPriceMedIncomeTier2, 'Year', 'Net-Avg Cost(TotalCost - Aid)', 'Net-Avg Cost for Medium Income Families($48K-$75K)')

In [71]:
drawBarGraph(year, avgNetPriceTier1, avgNetPriceTier2, 'Year', 'Net-Avg Cost(TotalCost - Aid)', 'Net-Avg Cost calculated by considering all income brackets')

Inferences...

In [72]:
# We can see that the net-average cost for attendance of Tier-1 institutes is less than that of Tier-2 institutes for
# low and medium income families.
# This may lead to students being burdened by high debt and financial worries may struggle to focus on their studies,
# leading to lower academic performance, higher dropout rates.

In [73]:
# Now lets analyse the Top-3 subjects in which degrees are awarded in Tier-1 and Tier-2 institutes.

In [74]:
pd.set_option('display.max_rows', 10)
tier1Institutes.head(20)

Unnamed: 0_level_0,Unnamed: 1_level_0,UNITID,INSTNM,CONTROL,ADM_RATE_ALL,C100_4,SAT_AVG_ALL,UGDS,UGDS_WHITE,UGDS_BLACK,UGDS_ASIAN,UGDS_HISP,MD_EARN_WNE_P6,GRAD_DEBT_MDN,NPT4_PRIV,NPT4_PUB,NPT41_PUB,NPT41_PRIV,NPT42_PUB,NPT42_PRIV,NPT43_PUB,NPT43_PRIV,COSTT4_A,STUFACR,FTFTPCTFLOAN,FTFTPCTPELL,PLUS_DEBT_INST_MD,NUM41_PUB,NUM42_PUB,NUM43_PUB,NUM44_PUB,NUM45_PUB,NUM41_PRIV,NUM42_PRIV,NUM43_PRIV,NUM44_PRIV,NUM45_PRIV,PCIP01,PCIP03,PCIP05,PCIP09,PCIP10,PCIP11,PCIP12,PCIP13,PCIP14,PCIP15,PCIP16,PCIP19,PCIP22,PCIP23,PCIP24,PCIP25,PCIP26,PCIP27,PCIP29,PCIP30,PCIP31,PCIP38,PCIP39,PCIP40,PCIP41,PCIP42,PCIP43,PCIP44,PCIP45,PCIP46,PCIP47,PCIP48,PCIP49,PCIP50,PCIP51,PCIP52,PCIP54,APPLICATIONS
year,rowNum,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1
2010,1188,147767,Northwestern University,2.0,0.2620,0.8727,1427.0,8905.0,0.501118,0.057455,0.178409,0.0000,52800.0,13000.0,29344.0,,,12783.0,,16065.0,,21062.0,52080.0,7.0,0.3056,0.0664,,,,,,,68.0,73.0,122.0,211.0,328.0,0.0,0.0042,0.0065,0.1616,0.0000,0.0083,0.0,0.0051,0.1510,0.0000,0.0269,0.0,0.0,0.0380,0.0023,0.0,0.0357,0.0167,0.0,0.0236,0.0,0.0139,0.0,0.0167,0.0,0.0885,0.0,0.0310,0.1950,0.0,0.0,0.0,0.0,0.0936,0.0204,0.0273,0.0338,33988
2010,1738,162928,Johns Hopkins University,2.0,0.2782,0.8191,1395.0,5801.0,0.405309,0.062418,0.225018,0.0000,61900.0,14500.0,28857.0,,,13688.0,,14383.0,,16425.0,51478.0,11.0,0.3402,0.1063,,,,,,,63.0,67.0,80.0,120.0,254.0,0.0,0.0000,0.0067,0.0000,0.0094,0.0134,0.0,0.0013,0.1717,0.0000,0.0114,0.0,0.0,0.0362,0.0107,0.0,0.0825,0.0241,0.0,0.0718,0.0,0.0054,0.0,0.0181,0.0,0.0376,0.0,0.0382,0.1482,0.0,0.0,0.0,0.0,0.0577,0.2213,0.0168,0.0174,20851
2010,1833,166027,Harvard University,2.0,0.0719,0.8775,1468.0,7181.0,0.047500,0.005200,0.005600,0.0053,71500.0,6000.0,7785.0,,,2170.0,,1413.0,,4570.0,50250.0,7.0,0.0270,0.1273,,,,,,,65.0,72.0,51.0,16.0,39.0,0.0,0.0111,0.0211,0.0000,0.0000,0.0139,0.0,0.0000,0.0173,0.0000,0.0284,0.0,0.0,0.0512,0.0768,0.0,0.1380,0.0462,0.0,0.0000,0.0,0.0139,0.0,0.0590,0.0,0.0540,0.0,0.0000,0.3406,0.0,0.0,0.0,0.0,0.0323,0.0000,0.0000,0.0963,99874
2010,1857,166683,Massachusetts Institute of Technology,2.0,0.1070,0.8271,1472.0,4218.0,0.363700,0.080800,0.250600,0.1330,81900.0,8853.5,20285.0,,,3400.0,,3834.0,,7874.0,50100.0,8.0,0.2252,0.1679,,,,,,,66.0,63.0,53.0,68.0,179.0,0.0,0.0000,0.0000,0.0035,0.0000,0.1466,0.0,0.0000,0.3997,0.0000,0.0017,0.0,0.0,0.0017,0.0087,0.0,0.0637,0.0785,0.0,0.0585,0.0,0.0026,0.0,0.1047,0.0,0.0000,0.0,0.0000,0.0462,0.0,0.0,0.0,0.0,0.0061,0.0000,0.0593,0.0009,39420
2010,2297,179867,Washington University in St Louis,2.0,0.2219,0.8436,1462.0,6436.0,0.535627,0.070236,0.170036,0.0000,53700.0,14875.0,29202.0,,,18549.0,,12208.0,,18652.0,52464.0,7.0,0.2781,0.0621,,,,,,,37.0,52.0,80.0,129.0,227.0,0.0,0.0163,0.0460,0.0018,0.0000,0.0242,0.0,0.0030,0.1416,0.0000,0.0345,0.0,0.0,0.0351,0.0042,0.0,0.0593,0.0176,0.0,0.0557,0.0,0.0121,0.0,0.0278,0.0,0.0866,0.0,0.0000,0.1701,0.0,0.0,0.0,0.0,0.0841,0.0018,0.1332,0.0236,29004
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2011,2444,186131,Princeton University,2.0,0.0880,0.9009,1485.0,5142.0,0.487900,0.075500,0.168800,0.0774,45885.0,7500.0,9214.0,,,4995.0,,5089.0,,6498.0,51260.0,6.0,0.0510,0.1042,,,,,,,65.0,51.0,40.0,25.0,29.0,0.0,0.0000,0.0194,0.0000,0.0000,0.0000,0.0,0.0000,0.1860,0.0000,0.0614,0.0,0.0,0.0488,0.0000,0.0,0.0783,0.0261,0.0,0.0017,0.0,0.0631,0.0,0.0606,0.0,0.0471,0.0,0.0640,0.2365,0.0,0.0,0.0,0.0,0.0269,0.0000,0.0000,0.0673,58431
2011,2861,198419,Duke University,2.0,0.1886,0.8682,1432.0,6545.0,0.470400,0.089700,0.214100,0.0666,63555.0,7363.0,23085.0,,,7596.0,,5304.0,,13912.0,53035.0,8.0,0.3151,0.1283,,,,,,,106.0,87.0,90.0,150.0,290.0,0.0,0.0111,0.0327,0.0000,0.0000,0.0074,0.0,0.0000,0.1491,0.0000,0.0240,0.0,0.0,0.0450,0.0000,0.0,0.0961,0.0203,0.0,0.0092,0.0,0.0166,0.0,0.0400,0.0,0.0992,0.0,0.0746,0.2526,0.0,0.0,0.0,0.0,0.0253,0.0419,0.0000,0.0548,34703
2011,3541,215062,University of Pennsylvania,2.0,0.1426,0.8860,1435.0,10882.0,0.457900,0.075300,0.176400,0.0652,64030.0,13500.0,24596.0,,,6295.0,,7142.0,,13205.0,53250.0,6.0,0.2579,0.1290,,,,,,,81.0,127.0,154.0,188.0,483.0,0.0,0.0043,0.0072,0.0400,0.0000,0.0086,0.0,0.0004,0.1116,0.0000,0.0274,0.0,0.0,0.0393,0.0058,0.0,0.0965,0.0140,0.0,0.0194,0.0,0.0439,0.0,0.0112,0.0,0.0295,0.0,0.0054,0.1570,0.0,0.0,0.0,0.0,0.0270,0.0810,0.2125,0.0497,76311
2011,3627,217156,Brown University,2.0,0.0934,0.8565,1415.0,6102.0,0.459000,0.059200,0.145700,0.0923,42370.0,12500.0,25281.0,,,5771.0,,4884.0,,12853.0,52030.0,9.0,0.2378,0.1326,,,,,,,61.0,61.0,70.0,72.0,231.0,0.0,0.0121,0.0465,0.0121,0.0000,0.0256,0.0,0.0101,0.0587,0.0000,0.0391,0.0,0.0,0.0553,0.0000,0.0,0.1133,0.0405,0.0,0.0647,0.0,0.0236,0.0,0.0378,0.0,0.0276,0.0,0.0148,0.2569,0.0,0.0,0.0,0.0,0.0472,0.0216,0.0418,0.0506,65331


In [75]:
computerTier1 = tier1Institutes.loc[2010:2020]['PCIP11'] # PCIP11 => Computing Subjects
computerTier2 = tier2Institutes.loc[2010:2020]['PCIP11']

mathsAndStatsTier1 = tier1Institutes.loc[2010:2020]['PCIP27'] # PCIP27 => Mathematics and Statistics Subjects
mathsAndStatsTier2 = tier2Institutes.loc[2010:2020]['PCIP27']

businessTier1 = tier1Institutes.loc[2010:2020]['PCIP52'] # PCIP52 => Business and Marketing Subjects
businessTier2 = tier2Institutes.loc[2010:2020]['PCIP52']

biomedicineTier1 = tier1Institutes.loc[2010:2020]['PCIP26'] # PCIP26 => Biology and Biomedicine Related Subjects
biomedicineTier2 = tier2Institutes.loc[2010:2020]['PCIP26']

computerTier1ByYr = computerTier1.groupby(level='year').mean()
computerTier2ByYr = computerTier2.groupby(level='year').mean()

mathsAndStatsTier1ByYr = mathsAndStatsTier1.groupby(level='year').mean()
mathsAndStatsTier2ByYr = mathsAndStatsTier2.groupby(level='year').mean()

businessTier1ByYr = businessTier1.groupby(level='year').mean()
businessTier2ByYr = businessTier2.groupby(level='year').mean()

biomedicineTier1ByYr = biomedicineTier1.groupby(level='year').mean()
biomedicineTier2ByYr = biomedicineTier2.groupby(level='year').mean()

In [76]:
year=[2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
trace1 = go.Scatter(x=year, y=computerTier1ByYr, mode="lines", name="Elite_Computer_Science")
trace2 = go.Scatter(x=year, y=computerTier2ByYr, mode="lines", name="Non-Elite_Computer_Science", fill="tonextx")


fig = go.Figure(data=[trace1, trace2])
fig.update_layout(
    width=700,
    height=400,
    xaxis=dict(title="Year"),
    yaxis=dict(title="Fraction of degrees awarded"),
    title="Trends in Computer_and_Information_Sciences_and_Support_Services<br>degree over the Years"
)
fig.show()

Analysing which subjects are a part of major foucs in the respective type of colleges

In [77]:
year=[2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
trace1 = go.Scatter(x=year, y=mathsAndStatsTier1ByYr, mode="lines", name="Elite_Maths_and_Stats")
trace2 = go.Scatter(x=year, y=mathsAndStatsTier2ByYr, mode="lines", name="Non-Elite_Maths_and_Stats", fill="tonextx")


fig = go.Figure(data=[trace1, trace2])
fig.update_layout(
    width=700,
    height=400,
    xaxis=dict(title="Year"),
    yaxis=dict(title="Fraction of degrees awarded"),
    title="Trends in Mathematics_and_Statistics degree over the Years"
)
fig.show()

In [78]:
trace1 = go.Scatter(x=year, y=biomedicineTier1ByYr, mode="lines", name="Elite_Biomedicine")
trace2 = go.Scatter(x=year, y=biomedicineTier2ByYr, mode="lines", name="Non-Elite_Biomedicine", fill="tonextx")


fig = go.Figure(data=[trace1, trace2])
fig.update_layout(
    width=700,
    height=400,
    xaxis=dict(title="Year"),
    yaxis=dict(title="Fraction of degrees awarded"),
    title="Trends in Biological_&_Biomedical_Sciences degrees over the Years"
)
fig.show()

In [79]:
trace1 = go.Scatter(x=year, y=businessTier1ByYr, mode="lines", name="Tier 1_Business")
trace2 = go.Scatter(x=year, y=businessTier2ByYr, mode="lines", name="Tier 2_Business", fill="tonextx")


fig = go.Figure(data=[trace1, trace2])
fig.update_layout(
    xaxis=dict(title="Year"),
    yaxis=dict(title="Fraction of degrees awarded"),
    title="Trends in Business, Management, Marketing, and Related Support Services degrees over the Years"
)
fig.show()

Conclusion....

In [80]:
# From above  graphs we can see that Tier1 Institutes have more focus on computer-science and core-maths and statistics degrees while Tier2 Institutes have more focus on
# biomedicines and business.

# Thus Tier-2 institutes should try to have more specialisations in computer-science and core-maths and statistics discplines which would attract a lot of intellectual people.
# Tier-1 should do more work on business related degrees.