# FY20 United Way of Southeast Louisiana (UWSELA) Interim Report Data

In [14]:
import pandas as pd
from pathlib import Path
from datetime import datetime
import numpy as np



in_file = Path.cwd() / "data" / "processed" / "processed_data.pkl"
df = pd.read_pickle(in_file)

In [15]:
df["Income"] = (
    df["Annual household income"]
    .replace({r"\$": "", ",": "", "-": 0}, regex=True)  # raw string avoids an invalid "\$" escape
    .astype(float)
)

In [16]:
def create_income_bucket(value):
    if value < 11872:
        return "Less than or equal to 30% of Median Income"
    elif value < 19392:
        return "Between 30% of 50% of Median Income"
    elif value < 31265:
        return "Between 51% and 80% of Median Income"
    else:
        return "Greater than 80% of Median Income"

In [17]:
df["Income Bucket"] = df["Income"].apply(create_income_bucket)

In [18]:
df["GPA (Prev Term CGPA)"] = pd.to_numeric(df["GPA (Prev Term CGPA)"], errors="coerce")

In [19]:
df["gpa"] = pd.to_numeric(df["gpa"], errors="coerce")

In [20]:
senior_df = df[df["High School Class"] == 2020]

In [21]:
hs_df = df[df["Contact Record Type"] == "Student: High School"]

## Follow up Questions
- Questions that were added on after the ticket was originally submitted.

### *% of freshmen and sophomores required to attend MathBlast who complete the 3-5 weeks


Out of 46 students who were enrolled in Math Blast over the 2019 Summer, 10 didn't attend more than 80% of their classes.

10 / 46 ≈ 22% fell short of that attendance mark, so the remaining 36 / 46 ≈ 78% attended more than 80% of their classes.



### *% of juniors who successfully complete recommended dual enrollment classes

I confirmed that we don't currently have a method to track this.

### *% of rising seniors who complete "College Prep Institute" to prepare for matriculation

This workshop is completed during the spring for rising seniors. We will have updated data for class of 2021 students at the end of the Spring 2019-20 semester.


### *average # of four-year colleges to which high school seniors apply


In [22]:
np.mean(senior_df["# Four Year College Applications"])

7.16

### *# of community service hours high school seniors complete throughout high school

Can you give me the average # of hours and the cumulative # of hours class of 2020 seniors have completed as a group?

**Note: since this was last generated, we've developed a distinction in community service hours. Students now have one record for Bank Book Eligible community service hours and another for total community service hours. The number I gave you before was closer to the total community service hours, so I'm providing that again now. Let me know if it should be something else.**

#### Total Hours for Class of 2020

In [23]:
sum(senior_df["Total Community Service Hours Completed"])

7915.049999999999

#### Average Hours for Class of 2020

In [24]:
np.mean(senior_df["Total Community Service Hours Completed"])

158.301

### *% and # of NOLA high school students with at least a 3.0 GPA during the reporting period
--you gave me % of NOLA high school and college students with 3.0+ GPAs

The first table is the count; the second table is the percent.

In [25]:
(hs_df["gpa"] >= 3).value_counts(normalize=False)

True     172
False     53
Name: gpa, dtype: int64

In [26]:
(hs_df["gpa"] >= 3).value_counts(normalize=True)

True     0.764444
False    0.235556
Name: gpa, dtype: float64

### *# of class of 2020 NOLA seniors who have received at least one four-year college acceptance
--you gave me the % already

In [27]:
(senior_df["# Four Year College Acceptances"] > 0).value_counts(normalize=False)

True     39
False    11
Name: # Four Year College Acceptances, dtype: int64

### *# of class of 2020 NOLA seniors who have submitted, completed, or are in the "review" phase with FAFSA

In [28]:
senior_df["FA Req: FAFSA/Alternative Financial Aid"].value_counts(normalize=False)

Submitted    46
-             4
Name: FA Req: FAFSA/Alternative Financial Aid, dtype: int64

## Original Questions

### Count of unduplicated participants during the above timeframe

This number is lower than the FY19 number because NOLA is a 9th grade recruitment site. Thus, this count only encompasses 10th grade through college-aged students who were active in Fall 2019-20.


In [29]:
len(df)

420

### Breakdown of above unduplicated participants by age and sex (example: Male - Youth 6-17).

In [30]:
df.pivot_table(
    index=["Age"],
    aggfunc="count",
    columns="Gender",
    values="18 Digit ID",
    fill_value=0,
    margins=True,
)

| Age | Decline to State | Female | Male | Other | All |
|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 0 | 1 |
| 14 | 0 | 11 | 3 | 0 | 14 |
| 15 | 1 | 42 | 15 | 0 | 58 |
| 16 | 0 | 41 | 23 | 1 | 65 |
| 17 | 0 | 33 | 23 | 0 | 56 |
| 18 | 0 | 40 | 13 | 0 | 53 |
| 19 | 0 | 26 | 16 | 0 | 42 |
| 20 | 0 | 29 | 9 | 0 | 38 |
| 21 | 0 | 36 | 9 | 0 | 45 |
| 22 | 0 | 13 | 9 | 0 | 22 |


### Count of unduplicated participants by sex and parish of residence

In [31]:
df.pivot_table(
    index="Parish",
    aggfunc="count",
    columns="Gender",
    values="18 Digit ID",
    fill_value=0,
    margins=True,
)

| Parish | Decline to State | Female | Male | Other | All |
|---|---|---|---|---|---|
| Jefferson Parish | 0 | 18 | 12 | 0 | 30 |
| Orleans Parish | 1 | 260 | 114 | 1 | 376 |
| Plaquemines Parish | 0 | 1 | 0 | 0 | 1 |
| St. Bernard Parish | 0 | 1 | 1 | 0 | 2 |
| St. John the Baptist Parish | 0 | 1 | 0 | 0 | 1 |
| St. Tammany Parish | 0 | 2 | 1 | 0 | 3 |
| All | 1 | 283 | 128 | 1 | 413 |


### Count of unduplicated participants by sex and race/ethnicity (example: Male - African American)

In [32]:
df.pivot_table(
    index=["Ethnic background"],
    aggfunc="count",
    columns="Gender",
    values="18 Digit ID",
    fill_value=0,
    margins=True,
)

| Ethnic background | Decline to State | Female | Male | Other | All |
|---|---|---|---|---|---|
| African-American | 1 | 252 | 111 | 0 | 364 |
| Asian-American | 0 | 7 | 6 | 1 | 14 |
| Decline to State | 0 | 2 | 0 | 0 | 2 |
| Latino / Chicano | 0 | 19 | 8 | 0 | 27 |
| Multiracial | 0 | 4 | 3 | 0 | 7 |
| Native American | 0 | 2 | 0 | 0 | 2 |
| Other | 0 | 2 | 1 | 0 | 3 |
| White / Caucasian | 0 | 1 | 0 | 0 | 1 |
| All | 1 | 289 | 129 | 1 | 420 |


### They also require a breakdown of "Client Type of Household,"

We don't track whether a student is head of a household, so by default all options are "Other."

### Count of unduplicated participants by sex and Employment Status



In [33]:
df.pivot_table(
    index=["Employment Status"],
    aggfunc="count",
    columns="Gender",
    values="18 Digit ID",
    fill_value=0,
    margins=True,
)

| Employment Status | Decline to State | Female | Male | Other | All |
|---|---|---|---|---|---|
| - | 1 | 218 | 102 | 1 | 322 |
| Employed full time | 0 | 2 | 1 | 0 | 3 |
| Employed part time | 0 | 6 | 3 | 0 | 9 |
| No information available | 0 | 63 | 23 | 0 | 86 |
| All | 1 | 289 | 129 | 1 | 420 |


### Count of unduplicated participants based on household income
For median household income I used $39,576, taken from:

https://www.census.gov/quickfacts/fact/table/neworleanscitylouisiana/INC110218

Using that number would create the following buckets:

* Less than or equal to 30% of median income: up to $11,872.80
* 30% to 49%: up to $19,392.24
* 50% to 79%: up to $31,265.04
* Greater than 80%: anything above $31,265.04
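
As a rough check on the dollar cutoffs listed above, the sketch below derives them from the $39,576 median using 30%, 49%, and 79% shares. The names `MEDIAN_INCOME`, `SHARES`, `LABELS`, and `income_bucket` are illustrative, and treating each upper bound as inclusive is an assumption; the notebook's own `create_income_bucket` uses truncated integer cutoffs with a strict `<` comparison, so results can differ for incomes that fall exactly on a boundary.

```python
from bisect import bisect_left

MEDIAN_INCOME = 39_576  # median household income quoted above

# Upper bound of each bucket as a share of the median.
# 49% and 79% reproduce the dollar figures in the list above.
SHARES = (0.30, 0.49, 0.79)

# Shorthand labels taken from the list above (illustrative only).
LABELS = (
    "Less than or equal to 30%",
    "30% to 49%",
    "50% to 79%",
    "Greater than 80%",
)

# Dollar cutoffs, rounded to cents.
thresholds = [round(MEDIAN_INCOME * share, 2) for share in SHARES]


def income_bucket(income):
    """Return the bucket label for an income (upper bounds inclusive, by assumption)."""
    # bisect_left places an income equal to a cutoff in the lower bucket.
    return LABELS[bisect_left(thresholds, income)]


for cutoff, label in zip(thresholds, LABELS):
    print(f"{label}: up to ${cutoff:,.2f}")
```

Using `bisect_left` keeps the cutoffs in one sorted list, so changing the median or the shares does not require touching the comparison logic.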



In [34]:
df["Income Bucket"].value_counts()

Between 51% and 80% of Median Income          140
Greater than 80% of Median Income             134
Less than or equal to 30% of Median Income     98
Between 30% of 50% of Median Income            48
Name: Income Bucket, dtype: int64

### % of seniors (class of 2020) who have completed at least 100 hours of community service over their high school careers to date

The second table shows students with at least 80 hours, used as a proxy for whether they are "on track." The 80-hour cutoff was chosen somewhat arbitrarily; I can replace it with any number if you think there is a better one.

In [36]:
(senior_df["Total Community Service Hours Completed"] >= 100).value_counts(normalize=True)

True     0.68
False    0.32
Name: Total Community Service Hours Completed, dtype: float64

In [38]:
(senior_df["Total Community Service Hours Completed"] >= 80).value_counts(normalize=True)

True     0.8
False    0.2
Name: Total Community Service Hours Completed, dtype: float64

### % of students with at least 3.0 cumulative GPA (according to most recent data available)

Note, this includes college students

In [40]:
(df["gpa"] >= 3).value_counts(normalize=True)

True     0.554762
False    0.445238
Name: gpa, dtype: float64

### four-year college acceptance rate
--do we have the ability to estimate four-year eligibility for class of 2020 seniors?

I don't know of a way to do this accurately. In theory it should be at or near 100%, but I don't think we have a good way to determine that.

The table below shows the percent of current seniors who have already been accepted into a 4 year program

In [41]:
(senior_df["# Four Year College Acceptances"] > 0).value_counts(normalize=True)

True     0.78
False    0.22
Name: # Four Year College Acceptances, dtype: float64

### four-year college matriculation rate
--since fall 2019 is the beginning of the 2019-20 academic year, I think I can use the DDT Fall 2019 for this. Please confirm whether you think this is correct.

Yep, that sounds right

### FAFSA completion rate (for seniors eligible for federal student aid)
--do we have a way to estimate whether class of 2020 seniors are "on track" for this by the end of the school year?

Sort of. The table below shows the percent of seniors with each FAFSA status. Students with Complete or Reviewed statuses would be "on track" to complete it, but they haven't submitted yet.

In [42]:
senior_df["FA Req: FAFSA/Alternative Financial Aid"].value_counts(normalize=True)

Submitted    0.92
-            0.08
Name: FA Req: FAFSA/Alternative Financial Aid, dtype: float64
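
One possible way to turn the statuses into a single "on track" share is sketched below. The status labels `Complete` and `Reviewed` are taken from the note above and may not match the exact values stored in the field; `ON_TRACK_STATUSES` and the sample list are hypothetical, with the sample built to mirror the 46 / 4 split in the output above.

```python
from collections import Counter

# Assumption: these are the statuses that count as "on track".
# Only "Submitted" and "-" actually appear in the output above.
ON_TRACK_STATUSES = {"Submitted", "Complete", "Reviewed"}

# Hypothetical sample mirroring the counts shown above (46 Submitted, 4 blank).
statuses = ["Submitted"] * 46 + ["-"] * 4

counts = Counter(statuses)
on_track_count = sum(n for status, n in counts.items() if status in ON_TRACK_STATUSES)
on_track_share = on_track_count / len(statuses)

print(f"On track: {on_track_count} of {len(statuses)} ({on_track_share:.0%})")
```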

### % of high school seniors (class of 2020) that have applied to at least 6 colleges to date

In [43]:
(senior_df["# Four Year College Applications"] >= 6).value_counts(normalize=True)

True     0.62
False    0.38
Name: # Four Year College Applications, dtype: float64

### % of high school seniors (class of 2020) on track to graduate high school

This should be 100%

### % of 2019-20 high school students promoted to the next grade (from the 2018-19 academic year)

This is also 100%

### % of seniors (class of 2020) on track for TOPS Opportunity eligibility

Using a minimum ACT score of 20 and a minimum GPA of 2.5:


In [45]:
(
    (senior_df["gpa"] >= 2.5)
    & (senior_df["ACT Superscore (highest official)"] >= 20)
).value_counts(normalize=True)

True     0.84
False    0.16
dtype: float64

In [47]:
# senior_df["Region Specific Funding Eligibility"].value_counts()

### *avg. # of outside scholarship applications high school seniors (class of 2020) have submitted so far

Three students have submitted one application each. None of them have been awarded anything yet.

### % of high school seniors (class of 2020) who attended at least one college affordability workshop

As I think we discussed on a previous ticket, we don't have a college affordability category. However, every senior has attended a college completion workshop, which often includes a college affordability component.


### *% of students receiving at least 5 hours of mentoring, enrichment and/or tutoring instruction per week.

We don't have a quick way to track hours per week, so I summed each student's hours over the semester and divided by 12 program weeks. Only one student is even close to the 5 hours / week mark; most students are at around 20 total hours. This makes sense: under our program model, only in the most extreme case would a student be required to attend 5-6 sessions a week, and those students often have a high absenteeism rate.
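
The weekly-hours estimate described above can be sketched roughly as follows. The session records, student names, and variable names are all hypothetical; only the 12 program weeks and the 5 hours / week target come from the text.

```python
from collections import defaultdict

PROGRAM_WEEKS = 12          # program weeks in the semester, per the note above
TARGET_HOURS_PER_WEEK = 5   # threshold from the question

# Hypothetical (student, hours) session records for one semester.
sessions = [
    ("student_a", 30.0),
    ("student_a", 28.0),
    ("student_b", 12.0),
    ("student_b", 8.0),
    ("student_c", 24.5),
]

# Sum each student's hours over the semester.
semester_hours = defaultdict(float)
for student, hours in sessions:
    semester_hours[student] += hours

# Average hours per week is the semester total divided by the program weeks.
weekly_average = {s: total / PROGRAM_WEEKS for s, total in semester_hours.items()}

# Share of students at or above the weekly target.
meets_target = [avg >= TARGET_HOURS_PER_WEEK for avg in weekly_average.values()]
share_meeting_target = sum(meets_target) / len(meets_target)

for student, avg in weekly_average.items():
    print(f"{student}: {avg:.2f} hours / week")
print(f"Share at or above target: {share_meeting_target:.0%}")
```

Under this approach a student needs 60 total hours over the 12 weeks to reach the target, which is why a student with roughly 20 total hours averages well under 2 hours a week.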


In [49]:
%%html
<script src="https://cdn.rawgit.com/parente/4c3e6936d0d7a46fd071/raw/65b816fb9bdd3c28b4ddf3af602bfd6015486383/code_toggle.js"></script>


In [48]:
%%html

<style>
div.prompt {display:none}


h1, .h1 {
    font-size: 33px;
    font-family: "Trebuchet MS";
    font-size: 2.5em !important;
    color: #2a7bbd;
}

h2, .h2 {
    font-size: 10px;
    font-family: "Trebuchet MS";
    color: #2a7bbd; 
    
}


h3, .h3 {
    font-size: 10px;
    font-family: "Trebuchet MS";
    color: #5d6063; 
    
}

.rendered_html table {

    font-size: 14px;
}

.output_png {
  display: flex;
  justify-content: center;
}



</style>