# In this step, we'll assign money/award codes based on prior award letter analysis

_Definitions for the award codes shown in the "Out of Pocket Cost" column: average unmet need for 0 EFC students BEFORE loans_

- `+++`: Less than \$5,000 (i.e. no family contribution after Stafford loans)
- `++`: Between \$5,000-\$8,000 (i.e. no more than \$2,500 in need after Stafford)
- `++/-`: Most awards ++, but some not as good
- `+/--`: Most awards are bad, but some are good
- `+/---`: Almost all awards are bad, but we had a few surprises
- `--`: \$12,000-\$15,000
- `---`: >\$15,000

_Note that the year references below are from summer 2019; they should be updated each year_
### Here are the fields we're working with. Most were calculated in a separate workbook using student-level data
- Noble awards collected FY19 (most recent year)
- Noble mean unmet need FY19 (most recent year)
- Noble min unmet need FY19 (most recent year)
- Noble max unmet need FY19 (most recent year)
- Noble money range FY20 __(we will be calculating this value below)__
- Noble money range FY19 (import from last year--based on FY18 awards)
- Noble money range FY18 (import from prior year--based on FY17 awards)
- Noble money code FY20 (follow rules--see below)
- Noble money code FY19 (import from last year--based on FY18 awards)
- Noble Money FY20 (1 if two plus signs, 0 otherwise based on Money code FY20)
- Current Noble students (pulled from Salesforce: attending plus matriculating to)

### We're also going to grab the NetPrice0-30 number, along with the Public/Private and State designations, from the sfa1617 and base_dir files; these will help us infer values where we don't have actual awards (net price is reported in a way that lets us infer public vs. private)


In [1]:
import pandas as pd
import numpy as np
import os

# Edit these to reflect any changes
work_location = 'inputs'
base_dir = 'base_dir.csv'
fin_aid = 'sfa1617.csv'  # INCREMENT
noble_awards = '../../raw_inputs/financial_aid_analysis_output.xlsx'
fin_aid_output = 'award_info_final.xlsx'
new_range = "Noble money range FY20"  # INCREMENT
new_code = "Noble money code FY20"
old_range = "Noble money range FY19"
old_code = "Noble money code FY19"
new_money = "Noble Money FY20"
old_money = "Noble Money FY19"

In [2]:
os.chdir(work_location)

In [3]:
# First, we'll load the source files and combine them
df = pd.concat([pd.read_csv(base_dir, usecols=["UNITID", "INSTNM", "STABBR"], index_col="UNITID"),
                pd.read_excel(noble_awards, index_col="UNITID"),
               ], axis=1)
# Now, we don't want any net price colleges if they weren't already in base_dir or our award letter info
# so we'll do a left join. Note that either the Public or Private column is populated, never both
a_df = pd.read_csv(fin_aid, index_col="UNITID",
                   usecols=["UNITID", "NPT412", "NPIS412"]).rename(
                        columns={"NPT412":"NetPricePrivate0-30", "NPIS412":"NetPricePublic0-30"})
a_df = a_df[~pd.isnull(a_df["NetPricePublic0-30"]) | ~pd.isnull(a_df["NetPricePrivate0-30"])]
df=pd.merge(df, a_df, how="left", left_index=True, right_index=True)
df.head()

UNITID,INSTNM,STABBR,Noble awards collected FY19,Noble mean unmet need FY19,Noble median unmet need FY19,Noble min unmet need FY19,Noble max unmet need FY19,Noble money code FY19,Noble money range FY19,Noble money code FY18,Noble money range FY18,Current Noble Students,Name,NetPricePublic0-30,NetPricePrivate0-30
100654,Alabama A & M University,AL,70.0,17880.0,18487.0,5623.0,27019.0,++/-,"$12k-$15k, some <$6k",++/-,"$12k-$15k, some <$6k",41.0,Alabama A & M University,14719.0,
100663,University of Alabama at Birmingham,AL,1.0,27019.0,27019.0,27019.0,27019.0,,,,,1.0,University of Alabama at Birmingham,15280.0,
100690,Amridge University,AL,,,,,,--,$10k-$12k,++/-,$8k-$10k,,,,9165.0
100706,University of Alabama in Huntsville,AL,,,,,,,,,,,,16047.0,
100724,Alabama State University,AL,,,,,,---,>$15k,++/-,"$10k-$12k, some <$6k",6.0,Alabama State University,12290.0,
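
The comment in the cell above says that either the public or the private net price is populated for a college, never both. A minimal sketch of how that assumption could be checked on a frame shaped like `a_df`; the toy data and the helper name `rows_with_both_prices` are hypothetical and only here for illustration:

```python
import numpy as np
import pandas as pd

# Toy stand-in for a_df; the UNITIDs and prices are made up for illustration
toy = pd.DataFrame(
    {
        "NetPricePublic0-30": [14719.0, np.nan, np.nan],
        "NetPricePrivate0-30": [np.nan, 9165.0, np.nan],
    },
    index=pd.Index([1, 2, 3], name="UNITID"),
)

def rows_with_both_prices(frame):
    """Return rows where both net price columns are populated (hypothetical helper)."""
    both = frame["NetPricePublic0-30"].notna() & frame["NetPricePrivate0-30"].notna()
    return frame[both]

violations = rows_with_both_prices(toy)
print(len(violations))
```

If this returned any rows on the real data, the public-first ordering in the range assignment below would decide which price wins for Illinois schools, so it would be worth a look before moving on.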


### Use the numbers we have to set the "money range FY20" using the following rules:
- If there are at least 3 award letters, use the mean to set the ranges into these discrete categories:
    - <\$0
    - <\$5k
    - \$5k-\$8k

### Then, repeat those three categories for any Illinois schools with NetPrice0-30 in those ranges or any national private schools with the same ranges. These are all automatic "money" schools, so we don't have to worry about the outliers as much
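
As a rough illustration of the three automatic "money" categories, here is a vectorized sketch using `np.select`. The boundaries follow the `<=` comparisons used in `range_checker`; the function name `money_bucket` and the `"other"` placeholder for everything above \$8k (or missing) are assumptions made for this example, not part of the workflow:

```python
import numpy as np
import pandas as pd

def money_bucket(values):
    """Label values with the three automatic "money" ranges; anything else is "other"."""
    values = pd.Series(values, dtype="float64")
    # Conditions are checked in order, so the first match wins
    conditions = [values < 0, values <= 5000, values <= 8000]
    labels = ["<$0", "<$5k", "$5k-$8k"]
    return pd.Series(np.select(conditions, labels, default="other"), index=values.index)

sample = money_bucket([-250, 0, 5000, 7999, 8001, np.nan])
print(sample.tolist())
```

The same helper could be pointed at either mean unmet need or a NetPrice0-30 column, since the rules above reuse the same three ranges for both.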

In [4]:
# The code below will take a first pass at assigning values
passed_fields = [
    "Noble awards collected FY19",  # INCREMENT
    "Noble mean unmet need FY19",  # INCREMENT
    "Noble max unmet need FY19",  # INCREMENT
    "Noble min unmet need FY19",  # INCREMENT
    "STABBR",
    "NetPricePublic0-30",
    "NetPricePrivate0-30",
]
def range_checker(mean_unmet, min_unmet):
    """returns a number range, min could be None to handle net price data"""
    if mean_unmet < 0:
        return "<$0"
    elif mean_unmet <= 5000:
        return "<$5k"
    elif mean_unmet <= 8000:
        return "$5k-$8k"
    elif mean_unmet <= 10000:
        if min_unmet is not None and min_unmet <= 5000:
            return "$8k-$10k, some <$5k"
        else:
            return "$8k-$10k"
    elif mean_unmet <= 12000:
        if min_unmet is not None and min_unmet <= 6000:
            return "$10k-$12k, some <$6k"
        elif min_unmet is not None and min_unmet <= 8000:
            return "$10k-$12k, some <$8k"
        else:
            return "$10k-$12k"
    elif mean_unmet <= 15000:
        if min_unmet is not None and min_unmet <= 6000:
            return "$12k-$15k, some <$6k"
        else:
            return "$12k-$15k"
    elif mean_unmet > 15000:
        if min_unmet is not None and min_unmet <= 6000:
            return ">$15k, some <$6k"
        elif min_unmet is not None and min_unmet <= 12000:
            return ">$15k, some <$12k"
        else:
            return ">$15k"
    else:
        return "TBD"

def _create_award_ranges(x):
    num_awards, mean_unmet, max_unmet, min_unmet, state, public_np, private_np = x
    if num_awards >= 3:
        return range_checker(mean_unmet, min_unmet)
    else:
        if (not np.isnan(public_np)) and (state == "IL"):
            return range_checker(public_np, None)
        elif not np.isnan(private_np):
            return range_checker(private_np, None)
        else:
            if num_awards > 0:
                if mean_unmet > 15000:
                    return ">$15k"
                else:
                    return "CHECK BY HAND"
            return "?"
df[new_range] = df[passed_fields].apply(_create_award_ranges, axis=1)

In [5]:
# Now that we have ranges, assign money codes and money yes/no
def money_code(money_range):
    """Returns the money code for a given range"""
    translation = {
        '$10k-$12k': '--',
        '$10k-$12k, some <$6k': '++/-',
        '$10k-$12k, some <$8k': '++/-',
        '$12k-$15k': '--',
        '$12k-$15k, some <$6k': '++/-',
        '$5k-$8k': '++',
        '$8k-$10k': '++/-',
        '$8k-$10k, some <$5k': '++/-',
        '?': '?',
        '<$0': '+++',
        '<$5k': '+++',
        '>$15k': '---',
        '>$15k, some <$12k': '+/---',
        '>$15k, some <$6k': '++/-',
    }
    if money_range in translation:
        return translation[money_range]
    else:
        return '?'
def is_money(money_code):
    """Yields a zero or one for a binary impression of "Money" """
    if money_code in ['+++', '++', '++/-']:
        return 1
    else:
        return 0

In [6]:
df[new_code] = df[new_range].apply(money_code)
df[new_money] = df[new_code].apply(is_money)
df[old_money] = df[old_code].apply(is_money)

In [7]:
# We'll need to review the results above manually; this flag helps us target rows
def _check_needed(x):
    """Gives a single field flagging a row for review"""
    this_code, last_code, this_money, last_money, this_range = x
    if this_range == "CHECK BY HAND":
        return "CHECK BY HAND (small # of awards collected)"
    elif this_money > last_money:
        return "CHECK NEW MONEY"
    elif this_money < last_money:
        return "CHECK LOST MONEY"
    elif (this_money == 1) and ("some" in this_range):
        return "CHECK SIZE OF SOME"
    else:
        return "NO CHECK"
df["Check"] = df[[new_code, old_code, new_money, old_money, new_range]].apply(_check_needed, axis=1)
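
Before opening Excel, it can help to see how many rows fall under each review reason. A small sketch, assuming a toy stand-in for the `Check` column (the rows below are made up; only the flag names come from `_check_needed`):

```python
import pandas as pd

# Toy stand-in for the "Check" column; the values mirror the flags assigned above
toy_checks = pd.Series(
    ["NO CHECK", "CHECK LOST MONEY", "NO CHECK", "CHECK SIZE OF SOME", "CHECK LOST MONEY"],
    name="Check",
)

# Count rows per flag, leaving out the ones that need no review
to_review = toy_checks[toy_checks != "NO CHECK"].value_counts()
print(to_review)
```

On the real frame the equivalent would be to run the same filter and `value_counts()` on `df["Check"]`.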

In [8]:
# Save the first-pass ranges, codes, and money 0/1 flags for manual review
df.to_excel('temporary_fin_aid_output.xlsx')

### Now open the temporary Excel file just saved and filter on the "Check" column for each reason:
1. CHECK LOST MONEY--the school dropped out of Money compared to the prior year
2. CHECK BY HAND--small # of awards collected
3. CHECK SIZE OF SOME--"some" entries that are in the money, where one or two outliers might be sneaking the school into money

Edit the money range of any dubious entries; don't worry about the code or money yes/no--we'll update those here

### Once this is done, save the file as "edited_fin_aid_output.xlsx"
---
### We'll then load the file back and save what we need for the final directory

In [9]:
df = pd.read_excel('edited_fin_aid_output.xlsx', index_col="UNITID")
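
Once the edited file is loaded, one way to confirm which ranges were changed by hand is to compare it against the temporary file. The sketch below uses two toy frames in place of `temporary_fin_aid_output.xlsx` and `edited_fin_aid_output.xlsx`; the helper name `changed_ranges` and the sample rows are hypothetical:

```python
import pandas as pd

def changed_ranges(before, after, column):
    """Return the rows whose value in `column` differs between the two frames."""
    joined = before[[column]].join(after[[column]], lsuffix="_before", rsuffix="_after")
    old, new = joined[f"{column}_before"], joined[f"{column}_after"]
    # Treat two missing values as equal so untouched blanks are not reported
    differs = (old != new) & ~(old.isna() & new.isna())
    return joined[differs]

col = "Noble money range FY20"
before = pd.DataFrame({col: ["$8k-$10k", "CHECK BY HAND", ">$15k"]},
                      index=pd.Index([1, 2, 3], name="UNITID"))
after = pd.DataFrame({col: ["$8k-$10k", "$10k-$12k", ">$15k"]},
                     index=pd.Index([1, 2, 3], name="UNITID"))

edits = changed_ranges(before, after, col)
print(edits)
```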

In [10]:
df.head(3)

UNITID,INSTNM,STABBR,Noble awards collected FY19,Noble mean unmet need FY19,Noble median unmet need FY19,Noble min unmet need FY19,Noble max unmet need FY19,Noble money code FY19,Noble money range FY19,Noble money code FY18,Noble money range FY18,Current Noble Students,NetPricePublic0-30,NetPricePrivate0-30,Noble money range FY20,Noble money code FY20,Noble Money FY20,Noble Money FY19,Check
100654,Alabama A & M University,AL,70.0,17880.0,18487.0,5623.0,27019.0,++/-,"$12k-$15k, some <$6k",++/-,"$12k-$15k, some <$6k",41.0,14719.0,,">$15k, some <$6k",++/-,1,1,OK
100663,University of Alabama at Birmingham,AL,1.0,27019.0,27019.0,27019.0,27019.0,,,,,1.0,15280.0,,>$15k,---,0,0,OK
100690,Amridge University,AL,,,,,,--,$10k-$12k,++/-,$8k-$10k,,,9165.0,$8k-$10k,++/-,1,0,OK


In [11]:
# Fix the money code after any potential corrections:
df[new_code] = df[new_range].apply(money_code)
df[new_money] = df[new_code].apply(is_money)
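
Because `money_code` falls back to `'?'` for any range it doesn't recognize, a mistyped or leftover label from the hand edits would quietly end up as non-money. A minimal sketch of a check for that, assuming a hypothetical helper `find_unmapped` and a toy subset of the translation table:

```python
def find_unmapped(range_labels, translation):
    """Return the range labels that have no entry in the translation table."""
    return sorted(set(range_labels) - set(translation))

# Hypothetical subset of the translation table and of the labels in a range column
toy_translation = {"<$5k": "+++", "$5k-$8k": "++", ">$15k": "---"}
toy_labels = ["<$5k", ">$15k", "$5k-8k", "CHECK BY HAND", "<$5k"]

unmapped = find_unmapped(toy_labels, toy_translation)
print(unmapped)
```

An empty list would mean every range in the column maps to a known code; anything else points at rows to fix in the edited file.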

In [12]:
df.drop(columns=[
    "STABBR",
    "NetPricePublic0-30",
    "NetPricePrivate0-30",
    "Noble money range FY18",
    "Noble money code FY18",
    "Noble Money FY19",
    "Check",
], inplace=True)
df["Current Noble Students"] = df["Current Noble Students"].fillna(0)

In [13]:
df.head(3)

UNITID,INSTNM,Noble awards collected FY19,Noble mean unmet need FY19,Noble median unmet need FY19,Noble min unmet need FY19,Noble max unmet need FY19,Noble money code FY19,Noble money range FY19,Current Noble Students,Noble money range FY20,Noble money code FY20,Noble Money FY20
100654,Alabama A & M University,70.0,17880.0,18487.0,5623.0,27019.0,++/-,"$12k-$15k, some <$6k",41.0,">$15k, some <$6k",++/-,1
100663,University of Alabama at Birmingham,1.0,27019.0,27019.0,27019.0,27019.0,,,1.0,>$15k,---,0
100690,Amridge University,,,,,,--,$10k-$12k,0.0,$8k-$10k,++/-,1


In [14]:
# Now save this off. Eventually, we'll add this to the combine step in #4, but right now, we're adding it to
# the output of step #4 using a vlookup
df.to_excel(fin_aid_output, index_label="UNITID", na_rep="N/A")
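
Until this is folded into the combine step, the vlookup could also be done in pandas with a left join on `UNITID`. This is only a sketch under assumptions: `directory` stands in for the output of step #4 and `awards` for the file saved above, and all of the rows are made up:

```python
import pandas as pd

# Toy stand-ins: `directory` plays the role of the step #4 output and
# `awards` the role of the award info saved above; all values are made up
directory = pd.DataFrame({"UNITID": [1, 2, 3],
                          "INSTNM": ["College A", "College B", "College C"]})
awards = pd.DataFrame({"UNITID": [1, 3],
                       "Noble money code FY20": ["+++", "--"]})

# A left join keeps every directory row, like a VLOOKUP that finds no match for some rows
combined = directory.merge(awards, on="UNITID", how="left", validate="one_to_one")
print(combined)
```

The `validate="one_to_one"` argument makes the merge raise an error if a `UNITID` appears more than once on either side, which a vlookup would silently ignore.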