# FY20 United Way of Southeast Louisiana (UWSELA) Interim Report Data

In [11]:
import pandas as pd
from pathlib import Path
from datetime import datetime
import numpy as np



in_file = Path.cwd() / "data" / "processed" / "processed_data.pkl"
df = pd.read_pickle(in_file)

In [12]:
df["Income"] = (
    df["Annual household income"]
    .replace({r"\$": "", ",": "", "-": 0}, regex=True)  # raw string avoids an invalid "\$" escape
    .astype(float)
)

In [13]:
def create_income_bucket(value):
    if value < 11872:
        return "Less than or equal to 30% of Median Income"
    elif value < 19392:
        return "Between 30% of 50% of Median Income"
    elif value < 31265:
        return "Between 51% and 80% of Median Income"
    else:
        return "Greater than 80% of Median Income"

In [14]:
df["Income Bucket"] = df["Income"].apply(create_income_bucket)

In [15]:
df["GPA (Prev Term CGPA)"] = pd.to_numeric(df["GPA (Prev Term CGPA)"], errors="coerce")

In [16]:
senior_df = df[df["High School Class"] == 2020]

In [29]:
hs_df = df[df["Contact Record Type"] == "Student: High School"]

## Follow up Questions
- Questions that were added on after the ticket was originally submitted.

### *% of freshmen and sophomores required to attend MathBlast who complete the 3-5 weeks


10 of the 46 students enrolled in Math Blast over Summer 2019 didn't attend more than 80% of their classes.

10 / 46 ≈ 22% fell short of that attendance mark, so 36 / 46 ≈ 78% met it.
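
The same arithmetic as a quick check, using only the two counts stated above. The variable names are illustrative and do not come from the notebook:

```python
enrolled = 46       # students enrolled in Math Blast, Summer 2019
below_mark = 10     # students who did not attend more than 80% of classes

met_mark = enrolled - below_mark
share_below = below_mark / enrolled
share_met = met_mark / enrolled

print(f"Below the attendance mark: {share_below:.1%}")
print(f"Met the attendance mark:   {share_met:.1%}")
```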



### *% of juniors who successfully complete recommended dual enrollment classes

I confirmed that we don't currently have a method to track this.

### *% of rising seniors who complete "College Prep Institute" to prepare for matriculation

This workshop is completed during the spring for rising seniors. We will have updated data for class of 2021 students at the end of the Spring 2019-20 semester.


### *average # of four-year colleges to which high school seniors apply


In [19]:
np.mean(senior_df["# Four Year College Applications"])

6.314814814814815

### *# of community service hours high school seniors complete throughout high school

Can you give me the average # of hours and the cumulative # of hours class of 2020 seniors have completed as a group?

#### Total Hours for Class of 2020

In [20]:
sum(senior_df["Community Service Hours"])

7804.55

#### Average Hours for Class of 2020

In [21]:
np.mean(senior_df["Community Service Hours"])

144.5287037037037

### *% and # of NOLA high school students with at least a 3.0 GPA during the reporting period
--you gave me % of NOLA high school and college students with 3.0+ GPAs

The first table is the count; the second table is the percent.

In [34]:
(hs_df["GPA (Prev Term CGPA)"] >= 3).value_counts(normalize=False)

True     149
False      7
Name: GPA (Prev Term CGPA), dtype: int64

In [32]:
(hs_df["GPA (Prev Term CGPA)"] >= 3).value_counts(normalize=True)

True     0.955128
False    0.044872
Name: GPA (Prev Term CGPA), dtype: float64

### *# of class of 2020 NOLA seniors who have received at least one four-year college acceptance
--you gave me the % already

In [35]:
(senior_df["# Four Year College Acceptances"] > 0).value_counts(normalize=False)

True     32
False    22
Name: # Four Year College Acceptances, dtype: int64

### *# of class of 2020 NOLA seniors who have submitted, completed, or are in the "review" phase with FAFSA

In [36]:
senior_df["FA Req: FAFSA/Alternative Financial Aid"].value_counts(normalize=False)

Submitted    31
-            16
Complete      6
Reviewed      1
Name: FA Req: FAFSA/Alternative Financial Aid, dtype: int64

## Original Questions

### Count of unduplicated participants during the above timeframe

This number is lower than the FY19 number because NOLA is a 9th grade recruitment site. Thus this only encompasses 10th grade through college-aged students who were active in Fall 2019-20.


In [7]:
len(df)

356

### Breakdown of above unduplicated participants by age and sex (example: Male - Youth 6-17).

In [9]:
df.pivot_table(
    index=["Age"],
    aggfunc="count",
    columns="Gender",
    values="18 Digit ID",
    fill_value=0,
    margins=True,
)

| Age | Female | Male | Other | All |
|---|---|---|---|---|
| 15 | 24 | 10 | 0 | 34 |
| 16 | 30 | 14 | 1 | 45 |
| 17 | 33 | 24 | 0 | 57 |
| 18 | 36 | 14 | 0 | 50 |
| 19 | 32 | 16 | 0 | 48 |
| 20 | 26 | 9 | 0 | 35 |
| 21 | 34 | 8 | 0 | 42 |
| 22 | 13 | 8 | 0 | 21 |
| 23 | 8 | 3 | 0 | 11 |
| 24 | 7 | 3 | 0 | 10 |


### Count of unduplicated participants by sex and parish of residence

In [10]:
df.pivot_table(
    index="Parish",
    aggfunc="count",
    columns="Gender",
    values="18 Digit ID",
    fill_value=0,
    margins=True,
)

| Parish | Female | Male | Other | All |
|---|---|---|---|---|
| Jefferson Parish | 15 | 10 | 0 | 25 |
| Orleans Parish | 216 | 98 | 1 | 315 |
| Plaquemines Parish | 1 | 0 | 0 | 1 |
| St. Bernard Parish | 2 | 1 | 0 | 3 |
| St. John the Baptist Parish | 1 | 0 | 0 | 1 |
| St. Tammany Parish | 2 | 1 | 0 | 3 |
| All | 237 | 110 | 1 | 348 |


### Count of unduplicated participants by sex and race/ethnicity (example: Male - African American)

In [11]:
df.pivot_table(
    index=["Ethnic background"],
    aggfunc="count",
    columns="Gender",
    values="18 Digit ID",
    fill_value=0,
    margins=True,
)

| Ethnic background | Female | Male | Other | All |
|---|---|---|---|---|
| African-American | 216 | 98 | 0 | 314 |
| Asian-American | 6 | 4 | 1 | 11 |
| Decline to State | 1 | 0 | 0 | 1 |
| Latino / Chicano | 16 | 5 | 0 | 21 |
| Multiracial | 3 | 3 | 0 | 6 |
| Native American | 1 | 0 | 0 | 1 |
| Other | 1 | 1 | 0 | 2 |
| All | 244 | 111 | 1 | 356 |


### They also require a breakdown of "Client Type of Household,"

We don't track whether a student is head of a household, so by default all options are "Other."

### Count of unduplicated participants by sex and Employment Status



In [12]:
df.pivot_table(
    index=["Employment Status"],
    aggfunc="count",
    columns="Gender",
    values="18 Digit ID",
    fill_value=0,
    margins=True,
)

| Employment Status | Female | Male | Other | All |
|---|---|---|---|---|
| - | 169 | 83 | 1 | 253 |
| Employed full time | 2 | 1 | 0 | 3 |
| Employed part time | 7 | 4 | 0 | 11 |
| No information available | 66 | 23 | 0 | 89 |
| All | 244 | 111 | 1 | 356 |


### Count of unduplicated participants based on household income
For median household income I used: $39,576 taken from:

https://www.census.gov/quickfacts/fact/table/neworleanscitylouisiana/INC110218

Using that number would create the following buckets:

* Less than or equal to 30% of median income: up to $11,872.80 (30% of the median)

* 30% to 49%: up to $19,392.24 (49% of the median)

* 50% to 79%: up to $31,265.04 (79% of the median)

* Greater than 80%: anything above $31,265.04
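
The dollar cutoffs above can be reproduced from the median, and the bucketing sketched with `pd.cut`. This is a minimal illustration, not the notebook's method: the sample incomes and short labels are hypothetical, and it treats each cutoff as an inclusive upper bound, whereas `create_income_bucket` uses strict `<` comparisons against whole-dollar cutoffs.

```python
import pandas as pd

MEDIAN_INCOME = 39_576  # Census QuickFacts figure cited above

# Upper bounds implied by the listed dollar amounts (30%, 49%, 79% of median).
cutoffs = {pct: round(MEDIAN_INCOME * pct / 100, 2) for pct in (30, 49, 79)}
print(cutoffs)

# Hypothetical incomes, for illustration only.
incomes = pd.Series([0, 11_872.80, 15_000, 25_000, 31_265.04, 60_000])

# right=True makes each cutoff the inclusive upper edge of its bucket.
buckets = pd.cut(
    incomes,
    bins=[-float("inf"), cutoffs[30], cutoffs[49], cutoffs[79], float("inf")],
    labels=["<=30%", "30-49%", "50-79%", "above 79%"],
    right=True,
)
print(buckets.value_counts(sort=False))
```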



In [15]:
df["Income Bucket"].value_counts()

Greater than 80% of Median Income             120
Between 51% and 80% of Median Income          111
Less than or equal to 30% of Median Income     82
Between 30% of 50% of Median Income            43
Name: Income Bucket, dtype: int64

### % of seniors (class of 2020) who have completed at least 100 hours of community service over their high school careers to date

The second table shows students who are at or above 80 hours, used as a proxy to see if they are "on track." The 80-hour cutoff was chosen somewhat arbitrarily; I can replace it with any number if you think there is a better one.

In [18]:
(senior_df["Community Service Hours"] >= 100).value_counts(normalize=True)

True     0.592593
False    0.407407
Name: Community Service Hours, dtype: float64

In [19]:
(senior_df["Community Service Hours"] >= 80).value_counts(normalize=True)

True     0.722222
False    0.277778
Name: Community Service Hours, dtype: float64
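
Since the cutoff is adjustable, the comparison can be wrapped in a small helper so other thresholds are easy to try. This is a sketch only: `share_at_or_above` and the sample hours are hypothetical, not part of the notebook.

```python
import pandas as pd


def share_at_or_above(hours: pd.Series, threshold: float) -> float:
    """Fraction of rows with hours >= threshold (missing values count as not meeting it)."""
    return float((hours >= threshold).mean())


# Hypothetical hours, for illustration only.
sample_hours = pd.Series([40.0, 80.0, 95.5, 100.0, 150.0])

for threshold in (80, 100, 120):
    print(threshold, share_at_or_above(sample_hours, threshold))
```

Against the real data, the call would presumably be `share_at_or_above(senior_df["Community Service Hours"], 80)`.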

### % of students with at least 3.0 cumulative GPA (according to most recent data available)

Note: this includes college students.

In [24]:
(df["GPA (Prev Term CGPA)"] >= 3).value_counts(normalize=True)

True     0.640449
False    0.359551
Name: GPA (Prev Term CGPA), dtype: float64

### four-year college acceptance rate
--do we have the ability to estimate four-year eligibility for class of 2020 seniors?

Not in any way I know of that would be accurate. In theory it should be at or near 100%, but I don't think we have a good way to determine that.

The table below shows the percent of current seniors who have already been accepted into a four-year program.

In [23]:
(senior_df["# Four Year College Acceptances"] > 0).value_counts(normalize=True)

True     0.592593
False    0.407407
Name: # Four Year College Acceptances, dtype: float64

### four-year college matriculation rate
--since fall 2019 is the beginning of the 2019-20 academic year, I think I can use the DDT Fall 2019 for this. Please confirm whether you think this is correct.

Yep, that sounds right

### FAFSA completion rate (for seniors eligible for federal student aid)
--do we have a way to estimate whether class of 2020 seniors are "on track" for this by the end of the school year?

Sort of. The table below shows the percent of seniors with each FAFSA status. Students marked Complete or Reviewed would be "on track" to complete it, but they haven't submitted yet.

In [24]:
senior_df["FA Req: FAFSA/Alternative Financial Aid"].value_counts(normalize=True)

Submitted    0.574074
-            0.296296
Complete     0.111111
Reviewed     0.018519
Name: FA Req: FAFSA/Alternative Financial Aid, dtype: float64
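
One way to collapse these statuses into a single figure is to group them. The sketch below uses the senior counts from the FAFSA count table in the follow-up section. Treating every status other than `-` as "started" is an assumption for illustration, not an agreed definition.

```python
# Class of 2020 seniors by FAFSA status, copied from the count table.
status_counts = {"Submitted": 31, "-": 16, "Complete": 6, "Reviewed": 1}

# Assumed grouping: any status other than "-" means the student has started.
started_statuses = {"Submitted", "Complete", "Reviewed"}

total = sum(status_counts.values())
submitted_share = status_counts["Submitted"] / total
started_share = (
    sum(n for status, n in status_counts.items() if status in started_statuses) / total
)

print(f"Submitted: {submitted_share:.1%}")
print(f"Started (submitted, complete, or reviewed): {started_share:.1%}")
```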

### % of high school seniors (class of 2020) that have applied to at least 6 colleges to date

In [25]:
(senior_df["# Four Year College Applications"] >= 6).value_counts(normalize=True)

True     0.518519
False    0.481481
Name: # Four Year College Applications, dtype: float64

### % of high school seniors (class of 2020) on track to graduate high school

This should be 100%

### % of 2019-20 high school students promoted to the next grade (from the 2018-19 academic year)

This is also 100%

### % of seniors (class of 2020) on track for TOPS Opportunity eligibility

Using a minimum ACT of 20 and GPA of 2.5


In [12]:
(
    (senior_df["GPA (Prev Term CGPA)"] >= 2.5)
    & (senior_df["ACT Superscore (highest official)"] >= 20)
).value_counts(normalize=True)

True     0.796296
False    0.203704
dtype: float64

In [13]:
senior_df["Region Specific Funding Eligibility"].value_counts()

-    54
Name: Region Specific Funding Eligibility, dtype: int64

### *avg. # of outside scholarship applications high school seniors (class of 2020) have submitted so far

Sadly, this is 0, for all class of 2020 students

### % of high school seniors (class of 2020) who attended at least one college affordability workshop

As I think we discussed on a previous ticket, we don't have a college affordability category. However, every senior has attended a college completion workshop, which often includes a college affordability component.


### *% of students receiving at least 5 hours of mentoring, enrichment and/or tutoring instruction per week.

We don't have a quick way to track hours per week, so I summed each student's hours over the semester and divided by 12 program weeks. Only one student is even close to hitting the 5 hours / week mark, with most students at around 20 total hours. This makes sense: given our program model, only in the most extreme case would a student be required to attend 5-6 sessions a week, and those students often have a high absenteeism rate.
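
The per-week estimate described above can be sketched as follows. The column names and hour totals are hypothetical placeholders, since the notebook does not show the attendance data used.

```python
import pandas as pd

PROGRAM_WEEKS = 12  # program weeks in the semester, per the note above

# Hypothetical semester totals; the real columns and values will differ.
hours = pd.DataFrame(
    {
        "student": ["A", "B", "C", "D"],
        "semester_hours": [20.0, 18.5, 58.0, 24.0],
    }
)

# Average weekly hours, then flag anyone at or above 5 hours per week.
hours["hours_per_week"] = hours["semester_hours"] / PROGRAM_WEEKS
hours["meets_5_per_week"] = hours["hours_per_week"] >= 5

share_meeting = hours["meets_5_per_week"].mean()
print(hours)
print(f"Share at 5+ hours per week: {share_meeting:.0%}")
```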
