# In this step, we'll assign money/award codes based on prior award letter analysis

_Definition for award codes shown in the "Out of Pocket Cost" column: average unmet need for 0 EFC students BEFORE loans_

- +++: Less than \$5,000 (i.e., no family contribution after Stafford loans)
- ++: Between \$5,000 and \$8,000 (i.e., no more than \$2,500 in need after Stafford loans)
- ++/-: Most awards ++, but some not as good
- +/--: Most awards are bad, but some are good
- +/---: Almost all awards are bad, but we had a few surprises
- --: \$12,000-\$15,000
- ---: More than \$15,000
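
The purely numeric definitions can be expressed as a small lookup. This is a sketch, not the workbook's actual formula: it covers only the pure codes (the mixed `++/-`, `+/--` and `+/---` codes describe the spread of individual awards and stay a judgment call), returns `None` for the \$8k-\$12k band that has no pure code above, and assumes a \$5,500 first-year Stafford loan, a figure inferred from the \$8,000 minus \$2,500 arithmetic in the `++` definition. `pure_award_code`, `need_after_stafford` and `ASSUMED_STAFFORD` are hypothetical names.

```python
# ASSUMPTION: first-year Stafford borrowing of $5,500, inferred from the
# "++" definition ($8,000 of unmet need leaves at most $2,500 after Stafford).
ASSUMED_STAFFORD = 5_500


def pure_award_code(mean_unmet_need: float):
    """Map average unmet need (0 EFC, before loans) to a pure award code."""
    if mean_unmet_need < 5_000:
        return "+++"
    if mean_unmet_need <= 8_000:
        return "++"
    if 12_000 <= mean_unmet_need <= 15_000:
        return "--"
    if mean_unmet_need > 15_000:
        return "---"
    return None  # $8k-$12k: no pure code is defined for this band


def need_after_stafford(mean_unmet_need: float) -> float:
    """Remaining family contribution once the assumed Stafford loan is applied."""
    return max(0.0, mean_unmet_need - ASSUMED_STAFFORD)
```

For example, a school averaging \$8,000 in unmet need gets `++`, and `need_after_stafford(8_000)` leaves \$2,500, matching the definition above.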

_Note that the year references below are from summer 2019; they should be updated each year_
### Here are the fields we're working with. Most were calculated in a separate workbook using student-level data
- Noble awards collected FY19 (most recent year)
- Noble mean unmet need FY19 (most recent year)
- Noble min unmet need FY19 (most recent year)
- Noble max unmet need FY19 (most recent year)
- Noble money range FY20 __(we will be calculating this value below)__
- Noble money range FY19 (import from last year--based on FY18 awards)
- Noble money range FY18 (import from prior year--based on FY17 awards)
- Noble money code FY20 (follow rules--see below)
- Noble money code FY19 (import from last year--based on FY18 awards)
- Noble Money FY20 (1 if the FY20 money code has two plus signs, 0 otherwise)
- Current Noble students (pulled from Salesforce: students currently attending plus those matriculating to the college)
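
The `Noble Money FY20` flag is a simple function of the money code. A minimal pandas sketch, assuming "two plus signs" means *two or more* (so `+++` also counts) and using made-up example codes rather than real data:

```python
import pandas as pd

# Made-up example codes; None stands for a school with no FY20 code yet.
codes = pd.Series(["+++", "++", "++/-", "+/--", "--", None],
                  name="Noble money code FY20")

# ASSUMPTION: "two plus signs" means two or more, so "+++" also counts.
# str.count gives NaN for a missing code, and NaN >= 2 is False, so it maps to 0.
noble_money_fy20 = codes.str.count(r"\+").ge(2).astype(int)
print(noble_money_fy20.tolist())  # [1, 1, 1, 0, 0, 0]
```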

### We're also going to grab the NetPrice0-30 number from sfa1617 and the state designation from base_dir; these will help us infer values where we don't have actual awards. (Net price is reported in separate public and private columns, so whichever one is populated tells us public vs. private.)
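
Because exactly one of the two net-price columns is filled in for each college, control (public vs. private) and a single NetPrice0-30 value can both be read off those columns. A sketch on a toy frame: the first two rows reuse values from the `df.head()` output below, while UNITID 999999 is a made-up college with no net price; the `Control` and `NetPrice0-30` column names are hypothetical.

```python
import numpy as np
import pandas as pd

net = pd.DataFrame(
    {"NetPricePublic0-30": [14719.0, np.nan, np.nan],
     "NetPricePrivate0-30": [np.nan, 9165.0, np.nan]},
    index=pd.Index([100654, 100690, 999999], name="UNITID"),
)

# Whichever column is populated tells us the control type.
net["Control"] = np.select(
    [net["NetPricePublic0-30"].notna(), net["NetPricePrivate0-30"].notna()],
    ["Public", "Private"],
    default="Unknown",
)
# Collapse the two mutually exclusive columns into one net price.
net["NetPrice0-30"] = net["NetPricePublic0-30"].fillna(net["NetPricePrivate0-30"])
```

After the left join in the loading step, colleges with no net-price row end up with both columns empty, which is what the `"Unknown"` default covers.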


```python
import pandas as pd
import numpy as np
import os

# Edit these to reflect any changes
work_location = 'inputs'
base_dir = 'base_dir.csv'
fin_aid = 'sfa1617.csv'  # INCREMENT
noble_awards = '../../raw_inputs/financial_aid_analysis_output.xlsx'
fin_aid_output = 'award_info_final.csv'
new_range = "Noble money range FY20"  # INCREMENT
```
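
Every `# INCREMENT` comment marks a name that changes each year. One way to make the annual update less error-prone is to derive the year-dependent column names from a single fiscal-year variable. `CURRENT_FY` and `year_columns` are hypothetical names, not part of the existing workflow, and the `sfa1617.csv` file name is left out because its relationship to the fiscal year isn't stated here.

```python
# Hypothetical: the fiscal year whose money range/code is being computed.
CURRENT_FY = 20


def year_columns(current_fy: int) -> dict:
    """Build the year-dependent column names from one fiscal year."""
    prior = current_fy - 1  # most recent year with collected award letters
    return {
        "awards": f"Noble awards collected FY{prior:02d}",
        "mean": f"Noble mean unmet need FY{prior:02d}",
        "min": f"Noble min unmet need FY{prior:02d}",
        "max": f"Noble max unmet need FY{prior:02d}",
        "new_range": f"Noble money range FY{current_fy:02d}",
        "new_code": f"Noble money code FY{current_fy:02d}",
    }


cols = year_columns(CURRENT_FY)
```

With this in place, `cols["new_range"]` could stand in for the hard-coded `new_range` string.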

```python
os.chdir(work_location)
```

```python
# First, we'll load the source files and combine them on UNITID
# (pd.concat with axis=1 does an outer join, so colleges from either file are kept)
df = pd.concat([pd.read_csv(base_dir, usecols=["UNITID", "INSTNM", "STABBR"], index_col="UNITID"),
                pd.read_excel(noble_awards, index_col="UNITID"),
               ], axis=1)
# We don't want any net price colleges that weren't already in base_dir or our award letter info,
# so we'll do a left join. Note that either the Public or Private column is populated, never both
a_df = pd.read_csv(fin_aid, index_col="UNITID",
                   usecols=["UNITID", "NPT412", "NPIS412"]).rename(
                        columns={"NPT412": "NetPricePrivate0-30", "NPIS412": "NetPricePublic0-30"})
# Keep only colleges that report a net price in at least one of the two columns
a_df = a_df[a_df["NetPricePublic0-30"].notna() | a_df["NetPricePrivate0-30"].notna()]
df = pd.merge(df, a_df, how="left", left_index=True, right_index=True)
df.head()
```

| UNITID | INSTNM | STABBR | Noble awards collected FY19 | Noble mean unmet need FY19 | Noble median unmet need FY19 | Noble min unmet need FY19 | Noble max unmet need FY19 | Noble money code FY19 | Noble money range FY19 | Noble money code FY18 | Noble money range FY18 | Current Noble Students | Name | NetPricePublic0-30 | NetPricePrivate0-30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100654 | Alabama A & M University | AL | 70.0 | 17880.0 | 18487.0 | 5623.0 | 27019.0 | ++/- | \$12k-\$15k, some <\$6k | ++/- | \$12k-\$15k, some <\$6k | 41.0 | Alabama A & M University | 14719.0 | NaN |
| 100663 | University of Alabama at Birmingham | AL | 1.0 | 27019.0 | 27019.0 | 27019.0 | 27019.0 | NaN | NaN | NaN | NaN | 1.0 | University of Alabama at Birmingham | 15280.0 | NaN |
| 100690 | Amridge University | AL | NaN | NaN | NaN | NaN | NaN | -- | \$10k-\$12k | ++/- | \$8k-\$10k | NaN | NaN | NaN | 9165.0 |
| 100706 | University of Alabama in Huntsville | AL | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 16047.0 | NaN |
| 100724 | Alabama State University | AL | NaN | NaN | NaN | NaN | NaN | --- | >\$15k | ++/- | \$10k-\$12k, some <\$6k | 6.0 | Alabama State University | 12290.0 | NaN |
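
The two joins in the loading step behave differently, and that difference is what keeps net-price-only colleges out of `df`. A sketch on hypothetical toy data (college 3 exists only in the net-price file):

```python
import pandas as pd

df_toy = pd.DataFrame({"INSTNM": ["A", "B"]},
                      index=pd.Index([1, 2], name="UNITID"))
net_toy = pd.DataFrame({"NetPricePublic0-30": [12000.0, 9000.0]},
                       index=pd.Index([2, 3], name="UNITID"))

# Left join: only df_toy's colleges survive; college 1 gets NaN, college 3 is dropped.
left = pd.merge(df_toy, net_toy, how="left", left_index=True, right_index=True)

# pd.concat(axis=1) defaults to an outer join on the index, so college 3 would be kept.
outer = pd.concat([df_toy, net_toy], axis=1)
```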


### Use the numbers we have to set the "money range FY20" using the following rules:
- If there are at least 3 award letters, use the mean unmet need to sort schools into these discrete categories:
    - <\$0
    - <\$5k
    - \$5k-\$8k

### Then, apply the same three categories to any Illinois schools with NetPrice0-30 in those ranges, or to any national private schools with NetPrice0-30 in the same ranges. These are all automatic "money" schools, so we don't have to worry about the outliers as much
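
The rules above can be sketched as a plain function. This is a hypothetical illustration, not the finished `_create_award_ranges`: it assumes the boundary handling noted in the comments, treats "national private schools" as any school whose private net-price column is populated, and only falls back on net price when there are fewer than 3 award letters. `money_range` and the toy rows are made up.

```python
import numpy as np
import pandas as pd


def money_range(num_awards, mean_unmet, state, public_np, private_np):
    """Hypothetical sketch of the FY20 money-range rules."""

    def bucket(value):
        # ASSUMPTION on boundaries: exactly $5,000 and exactly $8,000 both
        # land in "$5k-$8k"; anything above $8,000 is not a "money" band.
        if value < 0:
            return "<$0"
        if value < 5_000:
            return "<$5k"
        if value <= 8_000:
            return "$5k-$8k"
        return None

    # Rule 1: with at least 3 award letters, use the mean unmet need.
    if pd.notna(num_awards) and num_awards >= 3:
        return bucket(mean_unmet)

    # Rule 2: otherwise fall back on NetPrice0-30, but only for Illinois
    # schools (public or private) and private schools in any state.
    # ASSUMPTION: "private" means the private net-price column is populated.
    if state == "IL" and pd.notna(public_np):
        net_price = public_np
    else:
        net_price = private_np
    if pd.notna(net_price):
        return bucket(net_price)

    # Everything else is left for the trickier codes in the next step.
    return None


# Toy rows (made-up values) in the same order as the function arguments
demo = pd.DataFrame({
    "awards": [70, 1, np.nan],
    "mean_unmet": [4_200.0, 27_019.0, np.nan],
    "state": ["AL", "IL", "NY"],
    "public_np": [np.nan, 6_500.0, 5_000.0],
    "private_np": [np.nan, np.nan, np.nan],
})
demo["range"] = demo.apply(lambda row: money_range(*row), axis=1)
```

On the real frame, the same `apply` pattern would need a column list in argument order, e.g. awards collected, mean unmet need, `STABBR`, `NetPricePublic0-30`, `NetPricePrivate0-30`. The third toy row (a public school outside Illinois with fewer than 3 letters) comes back empty, as the rules intend.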

```python
passed_fields = [
    "Noble awards collected FY19",  # INCREMENT
    "Noble mean unmet need FY19",  # INCREMENT
    "Noble max unmet need FY19",  # INCREMENT
    "Noble min unmet need FY19",  # INCREMENT
    "STABBR",
    "NetPricePublic0-30",
    "NetPricePrivate0-30",
]

def _create_award_ranges(x):
    # Placeholder: the range rules above are not implemented yet; for now this
    # only flags whether a school has at least 3 award letters
    num_awards, mean_unmet, max_unmet, min_unmet, state, public_np, private_np = x
    if num_awards >= 3:
        return "POSSIBLE"
    else:
        return "IMPOSSIBLE"

df[new_range] = df[passed_fields].apply(_create_award_ranges, axis=1)
```

```python
# FINISH THE ABOVE
df.to_csv('foo.csv')
```

---
### After these, we have to get a little trickier. Here are the other codes:

```python
# $8k-$10k
# $8k-$10k, some <$5k
# $10k-$12k
# $10k-$12k, some <$6k
# $10k-$12k, some <$8k
# $12k-$15k
# $12k-$15k, some <$6k
# >$15k
# >$15k, some <$6k
# >$15k, some <$12k
```
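
All of these labels (and the three "money" ranges above) follow one pattern: a band, optionally followed by ", some <\$Xk". If later steps need to compare or sort them numerically, a sketch of a parser that splits a label into dollar bounds; `parse_range` is a hypothetical helper, not part of the existing workflow.

```python
import re

# A band ("<$5k", ">$15k" or "$10k-$12k") plus an optional ", some <$6k" qualifier.
# "<$0" has no "k", so the "k" is optional in the "<" branch.
_LABEL = re.compile(
    r"(?:<\$(?P<lt>\d+)k?|>\$(?P<gt>\d+)k|\$(?P<lo>\d+)k-\$(?P<hi>\d+)k)"
    r"(?:, some <\$(?P<some>\d+)k)?"
)


def parse_range(label: str):
    """Return (low, high, some_below) in dollars; None marks an open end or no qualifier."""
    m = _LABEL.fullmatch(label)
    if m is None:
        raise ValueError(f"unrecognized money range: {label!r}")

    def dollars(group):
        # Labels are in thousands ("$12k"); convert to dollars, None if absent.
        value = m.group(group)
        return int(value) * 1_000 if value is not None else None

    if m.group("lt") is not None:
        return None, dollars("lt"), dollars("some")
    if m.group("gt") is not None:
        return dollars("gt"), None, dollars("some")
    return dollars("lo"), dollars("hi"), dollars("some")
```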