## MealsCount Algorithm (v2) Test
  
This notebook details the implementation of an algorithm to group schools (within a given school district) so as to maximize federal funds received through the [**C**ommunity **E**ligibility **P**rovision](https://www.fns.usda.gov/school-meals/community-eligibility-provision). The groupings generated by the algorithm are near-optimal, optimality being constrained by the need to minimize computational complexity.  
  
### Background  
  
Currently, the Federal government, through the [Food and Nutrition Service](https://www.usda.gov/topics/food-and-nutrition) of the US Dept of Agriculture, offers multiple programs to provide free and/or subsidized meals to school-going children. These programs are targeted at students from low-income families. The CEP is one such program.  
  
School districts apply to enrol their schools in CEP once every 4 years (or each year, under certain circumstances). **CEP eligibility criteria** are listed in detail [here](https://www.cde.ca.gov/ls/nu/sn/cepfactsheet.asp). The program allows schools within a school district to enrol individually or in groups (a minimum of 2 schools per group, up to all the schools in the district). There is no limit on the number of groups per school district. A school can only be part of one group (?). Further, groups may contain schools of different types (charter, non-charter and so on).  
  
### Problem Statement  
  
While schools enrolled in CEP **must** serve meals to **all** students, the percentage of such meals covered by federal funds is computed based on the Identified Student Percentage (ISP) of each school. Specifically, it is given by the below formula:  
> *__% Meals Covered__* = *__ISP__ X __1.6__*  
  
This implies that in order to be fully (100%) funded the school (or school group) must have an ISP of at least 62.5% (since *62.5 X 1.6 = 100*). For schools (or school groups) with less than 62.5% ISP, the percentage of meals covered by federal funds decreases on a sliding scale until it reaches a minimum of 64% (since a minimum ISP of 40% is required for CEP enrolment, and *40 X 1.6 = 64*). Any meals not funded by CEP will have to be paid for by the student, or by the school itself in case of the student's inability to pay for the same. The latter is more common than not and leaves schools burdened with debt from partially subsidized meals. It is therefore in the school's best interest to meet the 62.5% ISP threshold for full coverage, either by itself, or as part of a school group.  
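
The sliding scale described above can be sketched as a small function. This is a minimal illustration rather than part of the MealsCount code: the name `pct_meals_covered` and its default arguments are assumptions introduced here, based on the 1.6 multiplier and the 40% and 62.5% ISP levels stated in the text.

```python
def pct_meals_covered(isp, multiplier=1.6, min_isp=40.0):
    """Percentage of meals covered by federal funds for a given ISP (in %).

    Hypothetical helper for illustration only; the multiplier and the
    minimum ISP follow the figures quoted in the text above.
    """
    if isp < min_isp:
        # below the minimum ISP the school (or group) is not CEP eligible
        return 0.0
    # coverage grows linearly with ISP and is capped at 100%
    return min(isp * multiplier, 100.0)


for isp in (30.0, 40.0, 50.0, 62.5, 80.0):
    print(isp, pct_meals_covered(isp))
```

Any ISP at or above 62.5% is capped at full coverage, which is why a group's ISP far above that level brings no additional funding.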
  
Currently, school groups within a school district are generated manually (through school officials interacting with spreadsheet data). This often results in sub-optimal groupings leading to either many schools not qualifying for CEP entirely, or failing to get adequate funding for meals served.  
  
### The MealsCount Solution    
  
The MealsCount approach to address the sub-optimalities mentioned above is to use an algorithm to generate the school groups. The algorithm is designed with the following optimization criteria:    
  
1. Maximize the percentage of meals funded by CEP, on a per school basis
2. Maximize the number of schools (i.e.: number of students) enrolled in CEP, on a per district basis  
  
In concrete terms, (1) attempts to generate school groups that have an aggregated ISP close to 62.5%, neither much lower nor (for that matter) much higher than that. (2) attempts to increase the percentage of schools in a CEP eligible group (i.e.: the group's aggregated ISP is 40% or more, ideally no more than 62.5%) such that it is at or near 100% for the school district.

### Algorithm Design   
  
Generating sets of unique groups (i.e.: school groups in our case) from within a large set (i.e.: school district) is, at its core, a combinatorics problem. More specifically, it falls in the realm of [combinatorial optimization](https://en.wikipedia.org/wiki/Combinatorial_optimization). A set with *__n__* elements has *__2<sup>n</sup>__* unique subsets. A typical school district has anywhere from 15 to 30 schools, resulting in anywhere from 32K to 1B candidate groups that would have to be searched for the above optimization criteria. At this size the problem is not trivial but nevertheless manageable. However, it is not uncommon to find school districts with anywhere from 100 to 1000 schools (e.g.: LA Unified has 1000+ schools). This leads to an unimaginably large search space, rendering a brute-force solution infeasible. Any practical solution to the problem will only be **near-optimal**.   
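
To give a feel for how quickly the search space grows, the sketch below counts the subsets of a district for a few sizes. The helper name `num_subsets` is an assumption made for this illustration; it only evaluates 2<sup>n</sup> and does not enumerate anything.

```python
def num_subsets(n_schools):
    # each school is either in or out of a candidate group, giving 2**n subsets
    return 2 ** n_schools


for n in (15, 30, 100, 1000):
    count = num_subsets(n)
    # for large n the count itself is unwieldy, so report how many digits it has
    print(f"{n:>4} schools -> {len(str(count))}-digit number of subsets")
```

For 15 and 30 schools this gives 32,768 and 1,073,741,824 subsets respectively, matching the 32K to 1B range quoted above.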

In [1]:
import sys
import os
import pandas as pd
import numpy as np
import pprint

# display related
from IPython.display import display, HTML

In [2]:
import backend_utils as bu
import config_parser as cp

In [3]:
CWD = os.getcwd()

DATADIR = "data"
DATAFILE = "calpads_sample_data.xlsx"
#DATAFILE = "calpads_sample_data_large.xlsx"

CONFIG_FILE = "config.json"

##### Algorithm Inputs  
  
The algorithm takes the following inputs:  
  
* School District Input: this contains information needed to compute per-school ISP  
* Configuration: this contains school meal rates, ISP thresholds among other information

In [4]:
data_in = bu.mcXLSchoolDistInput(os.path.join(DATADIR,DATAFILE))
df = data_in.to_frame()
df.head(n=3)



Unnamed: 0,school_code,school_name,total_enrolled,frpm,foster,homeless,migrant,direct_cert,frpm_nodup,el,frpm_el_nodup,school_type
0,1000001,School NC01,37,4,27,0,0,6,29,5,30,non-charter
1,1000002,School NC02,1111,503,2,7,0,215,527,122,556,non-charter
2,1000003,School NC03,2332,897,2,14,0,440,979,169,1037,non-charter


In [5]:
cfg = cp.mcModelConfig(CONFIG_FILE)
cfg.show()



MealsCount Model Configuration
------------------------------
Version: 2.0
Model Variant: v2
Default ISP Width (%): 2.0
ISP Width Bundle  (%): [0.01, 2.0, 6.25, 12.5, 22.5]
Min CEP Threshold (%): 0.4
Max CEP Threshold (%): 0.625
CEP Rates Table:
         nslp_lunch_free_rate  nslp_lunch_paid_rate  sbp_bkfst_free_rate  \
default                  3.23                  0.31                 1.75   
AK                       5.24                  0.50                 2.79   
HI                       3.78                  0.36                 2.03   
PR                       3.78                  0.36                 2.03   

         sbp_bkfst_paid_rate  
default                 0.30  
AK                      0.45  
HI                      0.34  
PR                      0.34  


##### Compute ISP  
  
The ISP for each school in the school district is computed from the CALPADS school district input data as below:  
> ISP = (Foster + Homeless + Migrant + Direct Certification) / Total Enrollment

In [6]:
# convert fields to numeric as appropriate
NUMERIC_COLS = ['total_enrolled','frpm','foster','homeless','migrant','direct_cert']

df[NUMERIC_COLS] = df[NUMERIC_COLS].apply(pd.to_numeric)

In [7]:
# remove aggregated records
df = df[df['school_name']!='total']

In [11]:
# sum cols for homeless, migrant and foster students
df = df.assign(non_direct_cert=(df['foster'] + df['homeless'] + df['migrant']))
    
# compute total eligible and isp
total_eligible = (df['foster'] + df['homeless'] + df['migrant'] + df['direct_cert'])
isp = (total_eligible/df['total_enrolled']) * 100
df = df.assign(total_eligible=total_eligible)
df = df.assign(isp=isp)
df.loc[:,'isp'] = df['isp'].astype(np.double)
len(df)

33

###### Invalid Samples
  
Remove samples where the number of eligible students exceeds the total number of students enrolled in the school.  

In [12]:
df = df.loc[df['total_eligible'] <= df['total_enrolled']]
len(df)

33

In [13]:
df.head(n=3)

Unnamed: 0,school_code,school_name,total_enrolled,frpm,foster,homeless,migrant,direct_cert,frpm_nodup,el,frpm_el_nodup,school_type,non_direct_cert,total_eligible,isp
0,1000001,School NC01,37,4,27,0,0,6,29,5,30,non-charter,27,33,89.189189
1,1000002,School NC02,1111,503,2,7,0,215,527,122,556,non-charter,9,224,20.162016
2,1000003,School NC03,2332,897,2,14,0,440,979,169,1037,non-charter,16,456,19.554031


Sort schools within the district by their ISP in descending order (higher ISP schools appear earlier than lower ISP ones).

In [14]:
KEEP_COLS = ['school_code','total_enrolled','direct_cert','non_direct_cert','total_eligible','isp']

# remove cols not needed for further analysis
drop_cols = [s for s in df.columns.tolist() if s not in set(KEEP_COLS)]
# drop() with inplace=True returns None, so reassign the result instead of capturing it
df = df.drop(columns=drop_cols)

In [15]:
# sort by isp
df = df.sort_values('isp',ascending=False).reset_index(drop=True)

Compute cumulative ISPs for the entire district.

In [17]:
# compute cumulative isp
cum_isp = (df['total_eligible'].cumsum()/df['total_enrolled'].cumsum()).astype(np.double)*100
df = df.assign(cum_isp=cum_isp)
df

Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
0,1000001,37,6,27,33,89.189189,89.189189
1,1000027,24,12,1,13,54.166667,75.409836
2,1000022,366,171,19,190,51.912568,55.269321
3,1000017,792,374,33,407,51.388889,52.748154
4,1000029,507,240,18,258,50.887574,52.201622
5,1000020,131,58,8,66,50.381679,52.073236
6,1000025,643,281,20,301,46.81182,50.72
7,1000028,2649,1166,55,1221,46.092865,48.339483
8,2000002,420,191,2,193,45.952381,48.159454
9,1000011,967,435,7,442,45.708376,47.796818
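
The cumulative ISP is an enrollment-weighted running ratio, not an average of the per-school ISPs. As a minimal, pandas-free sketch, the snippet below recomputes the first three `cum_isp` values of the table above; the helper name `cumulative_isp` is an assumption made for this illustration.

```python
from itertools import accumulate

# (total_enrolled, total_eligible) for the three highest-ISP schools in the table above
schools = [(37, 33), (24, 13), (366, 190)]


def cumulative_isp(schools):
    """Running ISP (in %) over schools already sorted by ISP in descending order."""
    cum_enrolled = accumulate(enrolled for enrolled, _ in schools)
    cum_eligible = accumulate(eligible for _, eligible in schools)
    return [100.0 * el / en for en, el in zip(cum_enrolled, cum_eligible)]


print([round(v, 2) for v in cumulative_isp(schools)])  # [89.19, 75.41, 55.27]
```

Adding the third school, which has 366 students, pulls the running ISP down sharply because the first two schools together enrol only 61.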


###### Binning Strategies   
  
The code fragments below are currently unused; they are retained here just in case.

In [18]:
%%capture
'''
NUM_ISP_BINS_MAX = 10
NUM_ISP_BINS_MIN = 5
NUM_ISP_BINS_DEFAULT = NUM_ISP_BINS_MAX
'''
'''
# generating bins of fixed width
groups = df.groupby(pd.cut(df['isp'], NUM_ISP_BINS_DEFAULT))
grp_counts = pd.DataFrame(groups.size()).rename(columns={0:'count'})
ivals = ['{0:.2f}-{1:.2f}'.format(s.left,s.right) for s in grp_counts.index.values]
size = [round(s.right-s.left,2) for s in grp_counts.index.values]
grp_counts = grp_counts.assign(ival=ivals)
grp_counts = grp_counts.assign(size=size)
grp_counts.reset_index(inplace=True)
grp_counts.drop('isp',axis=1,inplace=True)
grp_counts.T
'''

'''
# generating bins with uniform distribution => variable bin width
def ival_str(x):
    s = '{}-{}'.format(x.min(),x.max())
    return s

def ival_size(x):
    return round(x.max()-x.min(),2)

groups = df.groupby(pd.cut(df.index, NUM_ISP_BINS_DEFAULT,precision=0))
grp_counts = pd.DataFrame(groups.size()).reset_index().drop(['index'],axis=1).rename(columns={0:'count'})
ivals = pd.Series(groups['isp'].agg([('isp',ival_str)])['isp'].values)
size = pd.Series(groups['isp'].agg([('isp',ival_size)])['isp'].values)
grp_counts = grp_counts.assign(ival=ivals)
grp_counts = grp_counts.assign(size=size)
grp_counts.T
'''

##### Binning Schools  
  
We first bin schools based on the combined ISP (i.e.:`cum_isp`) required for CEP eligibility at 100% funding level.

In [19]:
#
# Function to compute the aggregate ISP of the school group provided as input
#
def group_isp(x):    
    return (x.total_eligible.sum()/x.total_enrolled.sum())*100

In [20]:
#
# Function to generate aggregates for each group in the groups specified as input
#
def summarize_all(groups):
    group_df = pd.DataFrame(groups.size()).rename(columns={0:'count'})
    group_df = group_df.assign(grp_isp=groups.apply(group_isp).values)
    group_df = group_df.assign(grp_total_enrolled=groups['total_enrolled'].agg(['sum']).values)
    group_df = group_df.assign(grp_total_eligible=groups['total_eligible'].agg(['sum']).values)
    return group_df

In [21]:
#
# Function to generate summary data for the specified single group of schools
#
def summarize_group(group_df,cfg):
    
        # compute total eligible and total enrolled students across all schools in the group
        summary = group_df[['total_enrolled','direct_cert','non_direct_cert','total_eligible']].aggregate(['sum'])        
        # compute the group's ISP        
        summary = summary.assign(grp_isp=(summary['total_eligible']/summary['total_enrolled'])*100)            
        # count the number of schools in the group
        summary = summary.assign(size=group_df.shape[0])
        # compute the % of meals covered at the free and paid rate for the group's ISP
        grp_isp = summary.loc['sum','grp_isp']        
        free_rate = (grp_isp * 1.6) if grp_isp >= (cfg.min_cep_thold_pct()*100) else 0.0
        free_rate = 100. if free_rate > 100. else free_rate
        summary = summary.assign(free_rate=free_rate)
        paid_rate = (100.0 - free_rate)
        summary = summary.assign(paid_rate=paid_rate)
        
        return summary

##### High ISP Schools  
  
First group those schools that have the highest ISP among all such that we arrive at two groups:  
* schools with ISP equal to or above *CEP Max Threshold* (currently 62.5%)  
* all other schools (i.e.: schools with ISP under 62.5)  

In [23]:
bins = [0.,cfg.max_cep_thold_pct()*100,100.]

groups = df.groupby(pd.cut(df['cum_isp'], bins))
ivals = groups.size().index.tolist()

In [24]:
summarize_all(groups)

Unnamed: 0_level_0,count,grp_isp,grp_total_enrolled,grp_total_eligible
cum_isp,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
"(0.0, 62.5]",31,32.020965,40258,12891
"(62.5, 100.0]",2,75.409836,61,46


In [25]:
# select the group with the high ISPs i.e.: ival (62.5-100]
group_df = groups.get_group(ivals[-1]).apply(list).apply(pd.Series) 
group_df

Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
0,1000001,37,6,27,33,89.189189,89.189189
1,1000027,24,12,1,13,54.166667,75.409836


In [26]:
# generate a summary for this high ISP group
summary_df = summarize_group(group_df,cfg)
summary_df

Unnamed: 0,total_enrolled,direct_cert,non_direct_cert,total_eligible,grp_isp,size,free_rate,paid_rate
sum,61,18,28,46,75.409836,2,100.0,0.0


We then compute the impact of the rest of the schools, each one taken individually, if they were to be brought into this high ISP group. Our aim is to continue to maintain the aggregate ISP of this group at 62.5 (or higher) so as to secure 100% funding for all the schools in the group.
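
As a worked illustration of this impact calculation, the sketch below adds a single candidate school to the high ISP group summarized above and checks the resulting ISP against the 62.5% target. The helper name `isp_after_adding` is an assumption made for this example; the figures are taken from the tables in this notebook.

```python
def isp_after_adding(group, school):
    """ISP (in %) of `group` if `school` joined it; both are (enrolled, eligible) pairs."""
    enrolled = group[0] + school[0]
    eligible = group[1] + school[1]
    return 100.0 * eligible / enrolled


high_isp_group = (61, 46)   # totals of the high ISP group summarized above
candidate = (131, 66)       # school 1000020, whose own ISP is 50.38%

new_isp = isp_after_adding(high_isp_group, candidate)
print(round(new_isp, 2), new_isp >= 62.5)  # 58.33 False
```

Because the group is small (61 students), even the most favourable candidate drops its ISP below the target, so no school would be added in this case.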

In [27]:
#
# Function to select schools to add, from among all schools not already in the destination group (df), 
# to the destination group (whose summary is provided as input) based on the impact each school has on the 
# destination group's ISP. Target ISP specifies the desired ISP at which to maintain the destination group
#
def select_by_isp_impact(df,group_df,target_isp):
    
    schools_to_add = pd.DataFrame()
    
    dst_grp_total_enrolled = group_df.loc[:,'total_enrolled'].sum()
    dst_grp_total_eligible = group_df.loc[:,'total_eligible'].sum()

    new_total_enrolled = df.loc[:,'total_enrolled'] + dst_grp_total_enrolled            
    new_isp = (((df.loc[:,'total_eligible'] + dst_grp_total_eligible)/new_total_enrolled)*100).astype(np.double)    
    
    isp_impact = pd.DataFrame({'new_isp':new_isp})    
    isp_impact.sort_values('new_isp',ascending=False,inplace=True)
    
    # select all schools whose ISP impact is small enough to not bring down the new ISP 
    # to under the target ISP
    idx = isp_impact[isp_impact['new_isp'] >= target_isp].index
    if len(idx) > 0:
        # add them to the existing group temporarily
        tmp_group_df = pd.concat([group_df,df.loc[idx,:]],axis=0)
        # recompute cumulative isp
        cum_isp = (tmp_group_df['total_eligible'].cumsum()/tmp_group_df['total_enrolled'].cumsum()).astype(np.double)*100
        tmp_group_df.loc[:,'cum_isp'] = cum_isp
        # retain only those that make the cut
        bins = [0.,target_isp,100.]
        tmp_groups = tmp_group_df.groupby(pd.cut(tmp_group_df['cum_isp'], bins))
        ivals = tmp_groups.size().index.tolist()
        tmp_df = tmp_groups.get_group(ivals[-1]).apply(list).apply(pd.Series)
        # determine which subset of schools to actually add
        group_selections = set(tmp_df.index.tolist())
        actual_additions = [x for x in idx if x in group_selections]
        # generate schools to add
        if actual_additions:
            schools_to_add = df.loc[actual_additions,:]
        
    return isp_impact, schools_to_add

In [28]:
# purge schools already in the high ISP group from the rest of the dataframe so that the 
# dataframe represents all the remaining schools
df.drop(group_df.index.tolist(),axis=0,inplace=True)        

In [29]:
# from among remaining schools see if any qualify based on isp impact
isp_impact,schools_to_add = select_by_isp_impact(df,group_df,(cfg.max_cep_thold_pct()*100))
display(isp_impact.T)

Unnamed: 0,5,2,4,3,8,6,9,10,11,7,...,22,30,23,25,26,27,28,29,31,32
new_isp,58.333333,55.269321,53.521127,53.106682,49.68815,49.289773,47.470817,47.058824,46.892039,46.752768,...,38.25441,37.419355,33.023256,23.037543,20.977852,18.506494,17.147311,15.771622,14.720812,13.648124


In [30]:
# if we found more schools to add to the high ISP group ..
if schools_to_add.shape[0] > 0:
    # add them to the existing group
    group_df = pd.concat([group_df, schools_to_add],axis=0)    
    # remove them from the main dataframe
    df.drop(schools_to_add.index.tolist(),axis=0,inplace=True)        
        
# summarize again
summary_df = summarize_group(group_df,cfg)

In [31]:
display(HTML('<b>GRP 0 / HIGH ISP GROUP</b>'))
display(HTML(summary_df.to_html()))
display(HTML(group_df.to_html()))    

Unnamed: 0,total_enrolled,direct_cert,non_direct_cert,total_eligible,grp_isp,size,free_rate,paid_rate
sum,61,18,28,46,75.409836,2,100.0,0.0


Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
0,1000001,37,6,27,33,89.189189,89.189189
1,1000027,24,12,1,13,54.166667,75.409836


##### Low ISP Groups  
  
With the remaining schools the objective is no longer about getting to 100% funding, rather it has to do with maximizing CEP eligibility while continuing to achieve as high a funding rate as possible (on a per school basis). High ISP schools are still prioritized overall (as before). Except now, multiple school group combinations are generated, some favoring CEP coverage and others favoring a higher funding level for high ISP schools.

In [34]:
# drop schools that are already part of the high ISP group
try:
    df.drop(group_df.index.tolist(),axis=0,inplace=True)
except Exception as e:
    print(df,e)

    school_code  total_enrolled  direct_cert  non_direct_cert  total_eligible  \
2       1000022             366          171               19             190   
3       1000017             792          374               33             407   
4       1000029             507          240               18             258   
5       1000020             131           58                8              66   
6       1000025             643          281               20             301   
7       1000028            2649         1166               55            1221   
8       2000002             420          191                2             193   
9       1000011             967          435                7             442   
10      1000014             789          336               18             354   
11      1000006             856          359               25             384   
12      1000004             854          361               17             378   
13      1000024            2

In [35]:
# recompute cumulative ISPs of the remaining schools
df = df.assign(cum_isp=(df['total_eligible'].cumsum()/df['total_enrolled'].cumsum()).astype(np.double)*100)
df.head(n=3)

Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
2,1000022,366,171,19,190,51.912568,51.912568
3,1000017,792,374,33,407,51.388889,51.554404
4,1000029,507,240,18,258,50.887574,51.351351


In [36]:
df.tail(n=3)

Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
30,1000018,94,9,3,12,12.765957,34.633254
31,1000023,1712,209,6,215,12.558411,33.618073
32,1000009,3031,368,8,376,12.405147,32.020965


**Note**: With the first two schools removed (as part of the 62.5% grouping), the cumulative ISPs of the remaining schools have changed noticeably: for example, the cumulative ISP at school 1000022 is now 51.91% rather than 55.27%. The size of the change will vary from district to district, but it underlines the need to recalculate cumulative ISPs at each step in the iteration.

Moving forward, we group schools from among these such that each group includes schools up until the point that the group's ISP falls by a significant amount. This amount is configurable and is set to the 5% funding level here. 5% in funding level translates to 3.125% in ISP (since 3.125 *X* 1.6 = 5). As a result, the school groups generated will differ in the percentage of meals covered by at least 5%. We call this (3.125%) the ***ISP WIDTH*** (*Ref*: See note later in the document). 
  
Further, we can generate collections of school groups for a pre-determined set of ISP Widths.   
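
The grouping idea described above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the notebook's implementation: the names `isp_width_for_funding_step` and `group_by_isp_width` are hypothetical, schools are represented as `(total_enrolled, total_eligible)` pairs already sorted by ISP in descending order, and the additional step of pulling in extra schools based on their ISP impact is omitted.

```python
def isp_width_for_funding_step(funding_step_pct, multiplier=1.6):
    # a step of X% in funding corresponds to X / 1.6 percentage points of ISP
    return funding_step_pct / multiplier


def group_by_isp_width(schools, isp_width, min_isp=40.0):
    """Greedily split ISP-sorted schools into groups no wider than `isp_width`.

    Returns (groups, remaining), where `remaining` holds the schools left
    once the top ISP falls below `min_isp` (not CEP eligible).
    """
    remaining = list(schools)
    groups = []
    while remaining:
        top_isp = 100.0 * remaining[0][1] / remaining[0][0]
        if top_isp < min_isp:
            break
        # the group's cumulative ISP must stay above this cut-off
        cutoff = max(top_isp - isp_width, min_isp)
        enrolled = eligible = size = 0
        for n_enrolled, n_eligible in remaining:
            cum_isp = 100.0 * (eligible + n_eligible) / (enrolled + n_enrolled)
            # always keep the first school so that every pass makes progress
            if size > 0 and cum_isp <= cutoff:
                break
            enrolled += n_enrolled
            eligible += n_eligible
            size += 1
        groups.append(remaining[:size])
        remaining = remaining[size:]
    return groups, remaining


# a subset of the schools in this notebook, sorted by ISP (descending)
schools = [(366, 190), (792, 407), (507, 258), (131, 66), (643, 301),
           (2649, 1221), (420, 193), (1111, 224)]
groups, ineligible = group_by_isp_width(schools, isp_width_for_funding_step(5))
print([len(g) for g in groups], ineligible)  # [5, 2] [(1111, 224)]
```

With a 5% funding step (an ISP width of 3.125%), the first five schools form one group because adding the sixth would pull the cumulative ISP to about 48.01%, below the cut-off of about 48.79%.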

In [37]:
# width of ISP percentages allowed per school group
ISP_WIDTH = 5/1.6

In [38]:
#
# Function to take in school data and group them based on the ISP_WIDTH
#
def groupby_isp_width(df,cfg,target_isp_width=None):
    
    min_cep_thold = (cfg.min_cep_thold_pct()*100)    
    
    # use default ISP width if not specified as input
    isp_width = cfg.isp_width() if target_isp_width is None else target_isp_width
    
    # recalculate cumulative-isp    
    cum_isp=(df['total_eligible'].cumsum()/df['total_enrolled'].cumsum()).astype(np.double)*100
    df = df.assign(cum_isp=cum_isp)

    top_isp = df.iloc[0]['isp']
    
    # if the top ISP is less than that needed for CEP eligibility 
    # we have nothing more to do
    if top_isp < min_cep_thold:
        return None
    
    # determine the next cut-off point
    isp_thold = (top_isp - isp_width) if (top_isp-isp_width) >= min_cep_thold else min_cep_thold
   
    # group schools at the cut-off point
    # note that this will generate exactly 2 groups: one of length ISP_WIDTH and the other containing 
    # the rest of the schools     
    groups = df.groupby(pd.cut(df['cum_isp'], [0.,isp_thold,top_isp]))    
    
    return groups    

In [39]:
#
# Function that implements a strategy to group schools with ISPs lower than that needed for 
# 100% CEP funding.
#
def group_schools_lo_isp(df,cfg,isp_width=None):
          
    school_groups = []
    school_group_summaries = []    
    
    top_isp = df.iloc[0]['isp']
    
    # exit the loop if the highest ISP from among the remaining schools (which are sorted by ISP)
    # is lower than that needed for CEP eligibility; we have nothing more to do
    
    while top_isp >= (cfg.min_cep_thold_pct()*100):
    
        # get the next isp_width group that still qualifies for CEP
        groups = groupby_isp_width(df,cfg,isp_width)    
    
        if groups is not None:
            
            ivals = pd.DataFrame(groups.size()).index.tolist()
            
            # get the last group: this is the group of isp_width
            group_df = groups.get_group(ivals[-1]) 
            summary_df = summarize_group(group_df,cfg)
            
            # trim the school data to remove this group
            df.drop(group_df.index.tolist(),axis=0,inplace=True)                
            # from among remaining schools see if any qualify based on isp impact
            _,schools_to_add = select_by_isp_impact(df,group_df,(cfg.max_cep_thold_pct()*100))
    
            if schools_to_add.shape[0] > 0:
                group_df = pd.concat([group_df, schools_to_add],axis=0)            
                df.drop(schools_to_add.index.tolist(),axis=0,inplace=True)        
            
            school_groups.append(group_df)
            
            summary_df = summarize_group(group_df,cfg)   
            school_group_summaries.append(summary_df)            
            
            # get the top isp for the remaining schools; if none remain, use 0.0 so the loop ends
            top_isp = df.iloc[0]['isp'] if len(df) > 0 else 0.0

    # at this point all remaining schools are ineligible for CEP 
    # pass them along as a group of their own    
    cum_isp = (df['total_eligible'].cumsum()/df['total_enrolled'].cumsum()).astype(np.double)*100
    df = df.assign(cum_isp=cum_isp)        
    school_groups.append(df)
    
    summary_df = summarize_group(df,cfg)   
    school_group_summaries.append(summary_df)
    
    return school_groups,school_group_summaries

In [40]:
def show_results(groups,summaries):    
    
    n = len(groups)
    
    for i in range(n):
        display(HTML('<b>GRP {}</b>'.format(i+1)))
        if (i==n-1):
            display(HTML("NOT ELIGIBLE FOR CEP"))
        display(HTML(summaries[i].to_html()))        
        display(HTML(groups[i].to_html()))        
        
    return

In [41]:
g1,s1 = group_schools_lo_isp(df.copy(),cfg,ISP_WIDTH)

In [42]:
display(HTML("<b>BUNDLE: 1 ISP_WIDTH: 3.125%</b>"))
show_results(g1,s1)

Unnamed: 0,total_enrolled,direct_cert,non_direct_cert,total_eligible,grp_isp,size,free_rate,paid_rate
sum,2439,1124,98,1222,50.102501,5,80.164002,19.835998


Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
2,1000022,366,171,19,190,51.912568,51.912568
3,1000017,792,374,33,407,51.388889,51.554404
4,1000029,507,240,18,258,50.887574,51.351351
5,1000020,131,58,8,66,50.381679,51.280624
6,1000025,643,281,20,301,46.81182,50.102501


Unnamed: 0,total_enrolled,direct_cert,non_direct_cert,total_eligible,grp_isp,size,free_rate,paid_rate
sum,14794,6068,328,6396,43.233743,10,69.173989,30.826011


Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
7,1000028,2649,1166,55,1221,46.092865,46.092865
8,2000002,420,191,2,193,45.952381,46.07364
9,1000011,967,435,7,442,45.708376,45.986125
10,1000014,789,336,18,354,44.86692,45.803109
11,1000006,856,359,25,384,44.859813,45.660975
12,1000004,854,361,17,378,44.262295,45.478194
13,1000024,2442,956,72,1028,42.096642,44.558316
14,1000007,2377,939,51,990,41.649138,43.949269
15,1000026,1812,685,57,742,40.949227,43.536382
16,1000005,1628,640,24,664,40.786241,43.233743


Unnamed: 0,total_enrolled,direct_cert,non_direct_cert,total_eligible,grp_isp,size,free_rate,paid_rate
sum,23025,5069,204,5273,22.901194,16,0.0,100.0


Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
17,1000015,1588,599,35,634,39.924433,39.924433
18,2000001,460,182,0,182,39.565217,39.84375
19,2000003,246,85,11,96,39.02439,39.755885
20,1000016,858,314,15,329,38.344988,39.371827
21,1000016,1795,640,25,665,37.047354,38.528401
22,1000012,1016,352,14,366,36.023622,38.101627
23,1000013,2089,645,19,664,31.785543,36.462991
24,1000008,77,16,1,17,22.077922,36.326731
25,1000002,1111,215,9,224,20.162016,34.383117
26,1000003,2332,440,16,456,19.554031,31.394746


By choosing a different ISP width, one can favor either higher CEP coverage or a higher funding level for high ISP schools.
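
One way to make this trade-off concrete is to compare, for two ISP widths, the share of students that end up in a CEP eligible group against the best funding level achieved. The sketch below is an illustration only: the helper name `cep_coverage` is an assumption, and the `(total_enrolled, free_rate)` pairs are copied from the group summaries reported in this notebook for ISP widths of 3.125% and 12.5% (the high ISP group is excluded from both).

```python
def cep_coverage(groups):
    """Percentage of students in groups with a non-zero free rate.

    `groups` is a list of (total_enrolled, free_rate) pairs, one per school group.
    """
    total = sum(enrolled for enrolled, _ in groups)
    covered = sum(enrolled for enrolled, rate in groups if rate > 0)
    return 100.0 * covered / total


# (total_enrolled, free_rate) per group, from the summaries in this notebook
width_3_125 = [(2439, 80.164002), (14794, 69.173989), (23025, 0.0)]
width_12_5 = [(26473, 65.243833), (13785, 0.0)]

for label, groups in (("3.125%", width_3_125), ("12.5%", width_12_5)):
    best_rate = max(rate for _, rate in groups)
    print(label, round(cep_coverage(groups), 1), round(best_rate, 1))
```

The narrower width reaches a higher free rate for its top group, while the wider width brings a larger share of students into a CEP eligible group.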

In [43]:
ISP_WIDTH = (10/1.6)
g2,s2 = group_schools_lo_isp(df.copy(),cfg,ISP_WIDTH)
display(HTML("<b>BUNDLE: 2 ISP_WIDTH: {}%</b>".format(ISP_WIDTH)))
show_results(g2,s2)

Unnamed: 0,total_enrolled,direct_cert,non_direct_cert,total_eligible,grp_isp,size,free_rate,paid_rate
sum,11416,4928,294,5222,45.742817,12,73.188507,26.811493


Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
2,1000022,366,171,19,190,51.912568,51.912568
3,1000017,792,374,33,407,51.388889,51.554404
4,1000029,507,240,18,258,50.887574,51.351351
5,1000020,131,58,8,66,50.381679,51.280624
6,1000025,643,281,20,301,46.81182,50.102501
7,1000028,2649,1166,55,1221,46.092865,48.014937
8,2000002,420,191,2,193,45.952381,47.857662
9,1000011,967,435,7,442,45.708376,47.53668
10,1000014,789,336,18,354,44.86692,47.246696
11,1000006,856,359,25,384,44.859813,46.995074


Unnamed: 0,total_enrolled,direct_cert,non_direct_cert,total_eligible,grp_isp,size,free_rate,paid_rate
sum,8969,3444,193,3637,40.550786,7,64.881258,35.118742


Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
14,1000007,2377,939,51,990,41.649138,41.649138
15,1000026,1812,685,57,742,40.949227,41.346383
16,1000005,1628,640,24,664,40.786241,41.189617
17,1000015,1588,599,35,634,39.924433,40.918298
18,2000001,460,182,0,182,39.565217,40.839161
19,2000003,246,85,11,96,39.02439,40.78412
20,1000016,858,314,15,329,38.344988,40.550786


Unnamed: 0,total_enrolled,direct_cert,non_direct_cert,total_eligible,grp_isp,size,free_rate,paid_rate
sum,19873,3889,143,4032,20.288834,12,0.0,100.0


Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
21,1000016,1795,640,25,665,37.047354,37.047354
22,1000012,1016,352,14,366,36.023622,36.677339
23,1000013,2089,645,19,664,31.785543,34.591837
24,1000008,77,16,1,17,22.077922,34.398232
25,1000002,1111,215,9,224,20.162016,31.800263
26,1000003,2332,440,16,456,19.554031,28.408551
27,1000021,2403,390,20,410,17.062006,25.88931
28,1000019,2505,377,17,394,15.728543,23.979592
29,1000010,1708,228,5,233,13.641686,22.805267
30,1000018,94,9,3,12,12.765957,22.742895


In [44]:
ISP_WIDTH = (20/1.6)
g3,s3 = group_schools_lo_isp(df.copy(),cfg,ISP_WIDTH)
display(HTML("<b>BUNDLE: 3 ISP_WIDTH: {}%</b>".format(ISP_WIDTH)))
show_results(g3,s3)

Unnamed: 0,total_enrolled,direct_cert,non_direct_cert,total_eligible,grp_isp,size,free_rate,paid_rate
sum,26473,10240,555,10795,40.777396,24,65.243833,34.756167


Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
2,1000022,366,171,19,190,51.912568,51.912568
3,1000017,792,374,33,407,51.388889,51.554404
4,1000029,507,240,18,258,50.887574,51.351351
5,1000020,131,58,8,66,50.381679,51.280624
6,1000025,643,281,20,301,46.81182,50.102501
7,1000028,2649,1166,55,1221,46.092865,48.014937
8,2000002,420,191,2,193,45.952381,47.857662
9,1000011,967,435,7,442,45.708376,47.53668
10,1000014,789,336,18,354,44.86692,47.246696
11,1000006,856,359,25,384,44.859813,46.995074


Unnamed: 0,total_enrolled,direct_cert,non_direct_cert,total_eligible,grp_isp,size,free_rate,paid_rate
sum,13785,2021,75,2096,15.204933,7,0.0,100.0


Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
26,1000003,2332,440,16,456,19.554031,19.554031
27,1000021,2403,390,20,410,17.062006,18.289335
28,1000019,2505,377,17,394,15.728543,17.403315
29,1000010,1708,228,5,233,13.641686,16.685293
30,1000018,94,9,3,12,12.765957,16.644548
31,1000023,1712,209,6,215,12.558411,15.994049
32,1000009,3031,368,8,376,12.405147,15.204933


In [45]:
ISP_WIDTH = (1/1.6)
g4,s4 = group_schools_lo_isp(df.copy(),cfg,ISP_WIDTH)
display(HTML("<b>BUNDLE: 4 ISP_WIDTH: {}%</b>".format(ISP_WIDTH)))
show_results(g4,s4)

Unnamed: 0,total_enrolled,direct_cert,non_direct_cert,total_eligible,grp_isp,size,free_rate,paid_rate
sum,1665,785,70,855,51.351351,3,82.162162,17.837838


Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
2,1000022,366,171,19,190,51.912568,51.912568
3,1000017,792,374,33,407,51.388889,51.554404
4,1000029,507,240,18,258,50.887574,51.351351


Unnamed: 0,total_enrolled,direct_cert,non_direct_cert,total_eligible,grp_isp,size,free_rate,paid_rate
sum,131,58,8,66,50.381679,1,80.610687,19.389313


Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
5,1000020,131,58,8,66,50.381679,50.381679


Unnamed: 0,total_enrolled,direct_cert,non_direct_cert,total_eligible,grp_isp,size,free_rate,paid_rate
sum,3712,1638,77,1715,46.201509,3,73.922414,26.077586


Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
6,1000025,643,281,20,301,46.81182,46.81182
7,1000028,2649,1166,55,1221,46.092865,46.233293
8,2000002,420,191,2,193,45.952381,46.201509


Unnamed: 0,total_enrolled,direct_cert,non_direct_cert,total_eligible,grp_isp,size,free_rate,paid_rate
sum,2612,1130,50,1180,45.17611,3,72.281776,27.718224


Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
9,1000011,967,435,7,442,45.708376,45.708376
10,1000014,789,336,18,354,44.86692,45.330296
11,1000006,856,359,25,384,44.859813,45.17611


Unnamed: 0,total_enrolled,direct_cert,non_direct_cert,total_eligible,grp_isp,size,free_rate,paid_rate
sum,854,361,17,378,44.262295,1,70.819672,29.180328


Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
12,1000004,854,361,17,378,44.262295,44.262295


Unnamed: 0,total_enrolled,direct_cert,non_direct_cert,total_eligible,grp_isp,size,free_rate,paid_rate
sum,6631,2580,180,2760,41.622681,3,66.59629,33.40371


Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
13,1000024,2442,956,72,1028,42.096642,42.096642
14,1000007,2377,939,51,990,41.649138,41.875908
15,1000026,1812,685,57,742,40.949227,41.622681


Unnamed: 0,total_enrolled,direct_cert,non_direct_cert,total_eligible,grp_isp,size,free_rate,paid_rate
sum,3922,1506,70,1576,40.18358,4,64.293728,35.706272


Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
16,1000005,1628,640,24,664,40.786241,40.786241
17,1000015,1588,599,35,634,39.924433,40.360697
18,2000001,460,182,0,182,39.565217,40.261153
19,2000003,246,85,11,96,39.02439,40.18358


Unnamed: 0,total_enrolled,direct_cert,non_direct_cert,total_eligible,grp_isp,size,free_rate,paid_rate
sum,20731,4203,158,4361,21.036129,13,0.0,100.0


Unnamed: 0,school_code,total_enrolled,direct_cert,non_direct_cert,total_eligible,isp,cum_isp
20,1000016,858,314,15,329,38.344988,38.344988
21,1000016,1795,640,25,665,37.047354,37.467018
22,1000012,1016,352,14,366,36.023622,37.067321
23,1000013,2089,645,19,664,31.785543,35.151094
24,1000008,77,16,1,17,22.077922,34.978578
25,1000002,1111,215,9,224,20.162016,32.608696
26,1000003,2332,440,16,456,19.554031,29.327441
27,1000021,2403,390,20,410,17.062006,26.804212
28,1000019,2505,377,17,394,15.728543,24.848442
29,1000010,1708,228,5,233,13.641686,23.644142


##### ISP Width
  
The ISP width, which we have been using to control the granularity of the generated groups (those with less than 100% funding), creates a trade-off between the number of schools that become eligible for CEP and the funding level of high-ISP schools: the two are inversely related. In other words, the **more schools** we add to a single group to bring them into CEP, the lower the group ISP and, consequently, the **lower the funding level**. We can counter this by creating many smaller groups with higher ISP, as we did when the CEP funding step size was set to 1 and 5 (as opposed to 10 and 20). When the step size was set to 10 and 20, only 12 and 7 schools, respectively, did not qualify for CEP; when set to 1 and 5, 13 and 16 schools were ineligible.  
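
The trade-off can be illustrated with a small sketch that uses enrolment figures taken from the output tables above. The helper names (`group_isp`, `free_rate`) are hypothetical and are not the notebook's own functions; the 40% floor and the 100% cap are assumptions based on the problem statement and on the `free_rate` values shown in the group summaries.

```python
CEP_MULTIPLIER = 1.6   # % meals covered = ISP x 1.6
MIN_CEP_ISP = 40.0     # assumed minimum group ISP (percent) for CEP enrolment

def group_isp(schools):
    """Group ISP (percent) from (total_enrolled, total_eligible) pairs."""
    enrolled = sum(e for e, _ in schools)
    eligible = sum(x for _, x in schools)
    return 100.0 * eligible / enrolled

def free_rate(isp):
    """Percentage of meals covered for a given ISP (percent)."""
    if isp < MIN_CEP_ISP:
        return 0.0
    return min(100.0, isp * CEP_MULTIPLIER)

# Schools 1000022, 1000017 and 1000029 from the tables above
small_group = [(366, 190), (792, 407), (507, 258)]
# The same group with school 1000020 (ISP 50.38%) added
larger_group = small_group + [(131, 66)]

small_rate = free_rate(group_isp(small_group))
larger_rate = free_rate(group_isp(larger_group))
print("3 schools: %.6f%% covered" % small_rate)
print("4 schools: %.6f%% covered" % larger_rate)
```

Adding the lower-ISP school brings one more school into the group but pulls the group's coverage rate down slightly, which is the inverse relationship described above.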
  
Essentially, we control the group size by controlling the CEP funding level for each school group; this is denoted by *CEP Funding Step Size* in the equation below. The larger the step size, the larger the group, the lower the group ISP and, consequently, the lower the funding level for the group.
  
> *__ISP Width__* = *CEP Funding Step Size* / *1.6*    
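
As a quick worked example of the equation above, the sketch below converts the four funding step sizes mentioned in this section into ISP widths. The function name `isp_width` is an illustrative assumption, not part of the notebook's code.

```python
CEP_MULTIPLIER = 1.6  # from: % meals covered = ISP x 1.6

def isp_width(funding_step_size, multiplier=CEP_MULTIPLIER):
    """ISP width (in ISP percentage points) for a given CEP funding step size."""
    return funding_step_size / multiplier

for step in (1, 5, 10, 20):
    print("step size %2d -> ISP width %.3f" % (step, isp_width(step)))
```

A step size of 1 gives the `ISP_WIDTH = (1/1.6)` value used in the cell above.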

In [73]:
# Generate naive coverage rates: every school forms its own single-school group
naive_groups = [df[i:i+1] for i in range(len(df))]
naive_summaries = [summarize_group(ng,cfg) for ng in naive_groups]




In [80]:
# Compare each grouping strategy against the naive (one school per group) baseline
baseline = (0,0)
for i,sg in enumerate([naive_summaries,s1,s2,s3,s4]):
    total_free,total_paid = 0,0
    display(HTML("<b>Values for s%i</b>" % i))
    for s in sg:
        # Students covered (free) and not covered (paid) in this group
        total_free += s["total_enrolled"] * s["free_rate"]/100.0
        total_paid += s["total_enrolled"] * s["paid_rate"]/100.0
    print(i,int(total_free),int(total_paid))
    if i == 0:
        # Index 0 is the naive grouping; use it as the baseline
        baseline = (int(total_free),int(total_paid))
    else:
        # Improvement: additional students covered, as a share of all students
        improvement = ((total_free - baseline[0])/ (total_free + total_paid))
        print("Improvement: %0.2f%%, %i more students covered" % ((improvement*100.0),total_free - baseline[0]) )
 

0 12188 28069


1 12188 28069
Improvement: 0.00%, 0 more students covered


2 14174 26083
Improvement: 4.93%, 1986 more students covered


3 17272 22986
Improvement: 12.63%, 5084 more students covered


4 13648 26610
Improvement: 3.63%, 1460 more students covered
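
The improvement figures printed above can be checked by hand. The sketch below is a minimal, hypothetical helper (`improvement_over` is not defined in the notebook) that applies the same formula to the printed totals: additional students covered relative to the naive baseline, divided by the total number of students.

```python
def improvement_over(baseline_free, total_free, total_paid):
    """Share of all students newly covered compared with the baseline."""
    return (total_free - baseline_free) / (total_free + total_paid)

# Totals taken from the printed output: baseline (index 0) and grouping 2
baseline_free = 12188
gain = improvement_over(baseline_free, 14174, 26083)
print("Improvement: %0.2f%%" % (gain * 100.0))
```

Because the printed totals are truncated to integers, a check like this can differ from the notebook's own figures in the last decimal place for some groupings.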
