# FY20 United Way of Southeast Louisiana (UWSELA) Interim Report Data

In [82]:
import pandas as pd
from pathlib import Path
from datetime import datetime
import numpy as np



in_file = Path.cwd() / "data" / "processed" / "processed_data.pkl"
df = pd.read_pickle(in_file)

in_file2 = Path.cwd() / "data" / "processed" / "processed_data_2.pkl"
df2 = pd.read_pickle(in_file2)

In [83]:
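df["Income"] = (
    df["Annual household income"]
    # Strip "$" and "," from the currency strings. With regex=True and a
    # non-string replacement, any cell containing "-" (e.g. the "-" placeholder)
    # is replaced wholesale with 0.
    .replace({r"\$": "", ",": "", "-": 0}, regex=True)
    .astype(float)
)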

In [84]:
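def create_income_bucket(value):
    # Cutoffs are 30%, 49%, and 79% of the $39,576 median household income
    # used in the household-income section, rounded down to whole dollars.
    if value < 11872:
        return "Less than or equal to 30% of Median Income"
    elif value < 19392:
        return "Between 30% of 50% of Median Income"
    elif value < 31265:
        return "Between 51% and 80% of Median Income"
    else:
        return "Greater than 80% of Median Income"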

In [85]:
df["Income Bucket"] = df["Income"].apply(create_income_bucket)

In [86]:
df["GPA (Prev Term CGPA)"] = pd.to_numeric(df["GPA (Prev Term CGPA)"], errors="coerce")

df["ACT Superscore (highest official)"] = pd.to_numeric(df["ACT Superscore (highest official)"], errors="coerce")


In [87]:
df["gpa"] = pd.to_numeric(df["gpa"], errors="coerce")

In [88]:
senior_df = df[df["High School Class"] == 2020]

In [89]:
junior_df = df[df["High School Class"] == 2021]

In [90]:
hs_df = df[df["Contact Record Type"] == "Student: High School"]

## Follow-up Questions
- Questions that were added after the ticket was originally submitted.

### *% of freshmen and sophomores required to attend MathBlast who complete the 3-5 weeks


*Original response for Summer 2019*

10 of the 46 students enrolled in Math Blast over Summer 2019 did not attend more than 80% of their classes.

10 / 46 ≈ 21.7% did not complete, so about 78% (36 / 46) did.


*For Summer 2020*

The completion rate dropped considerably: only 25% of students (23 of 92) attended 80% or more of the Math Blast sessions.


In [91]:
df2[df2.workshop == "Math Blast"]['above_80'].value_counts(normalize=False)

False    69
True     23
Name: above_80, dtype: int64

In [92]:
df2[df2.workshop == "Math Blast"]['above_80'].value_counts(normalize=True)

False    0.75
True     0.25
Name: above_80, dtype: float64
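To put the two summers side by side, here is a minimal sketch of the completion-rate arithmetic. It assumes the Summer 2019 figure of 10 students counts those who did *not* reach the attendance bar, and it takes the Summer 2020 counts from the `above_80` output (23 of 92). `completion_rate` is a hypothetical helper, not part of the notebook.

```python
def completion_rate(completed: int, enrolled: int) -> float:
    """Share of enrolled students who met the attendance bar."""
    if enrolled <= 0:
        raise ValueError("enrolled must be positive")
    return completed / enrolled


# Summer 2019 (assumed reading): 10 of 46 fell short, so 36 completed.
rate_2019 = completion_rate(46 - 10, 46)
# Summer 2020: 23 of 92 students attended 80% or more of the sessions.
rate_2020 = completion_rate(23, 23 + 69)

print(f"2019: {rate_2019:.1%}")  # 2019: 78.3%
print(f"2020: {rate_2020:.1%}")  # 2020: 25.0%
```

Note that 2019 used "more than 80%" and 2020 uses "80% or more", so the two rates are close but not identical definitions.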

### *% of juniors who successfully complete recommended dual enrollment classes

I confirmed that we don't currently have a method to track this.

### *% of rising seniors who complete "College Prep Institute" to prepare for matriculation

I'm not sure we have any data on this workshop, or that this workshop took place at NOLA.



### *average # of four-year colleges to which high school seniors apply


In [93]:
np.mean(senior_df["# Four Year College Applications"])

7.44

### *# of community service hours high school seniors complete throughout high school

Can you give me the average # of hours and the cumulative # of hours class of 2020 seniors have completed as a group?

**Note: since this was last generated, we've started distinguishing between two kinds of community service hours. Each student now has a record of Bank Book Eligible community service hours and of total community service hours. The number I gave you before was closer to the total community service hours, so I'm providing that again now. Let me know if it should be something else.**

#### Total Hours for Class of 2020

In [94]:
senior_df["Total Community Service Hours Completed"].sum()

9226.05

#### Average Hours for Class of 2020

In [95]:
np.mean(senior_df["Total Community Service Hours Completed"])

184.521
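The total and the average agree with each other: the class of 2020 has 50 seniors (46 + 4 in the FAFSA counts), and 9,226.05 / 50 = 184.521. Since each student now has both a Bank Book Eligible and a total hours figure, here is a minimal sketch of computing the sum, mean, and count for both columns at once. The toy values are made up; only the column names come from this dataset.

```python
import pandas as pd

# Hypothetical toy data; the column names match senior_df.
seniors = pd.DataFrame({
    "BB Eligible Community Service Hours": [40.0, 55.5, 120.0],
    "Total Community Service Hours Completed": [110.0, 150.5, 210.0],
})

hour_cols = [
    "BB Eligible Community Service Hours",
    "Total Community Service Hours Completed",
]

# sum/mean skip missing values by default; count reports how many were present.
summary = seniors[hour_cols].agg(["sum", "mean", "count"]).T
print(summary)
```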

### *% and # of NOLA high school students with at least a 3.0 GPA during the reporting period
--you gave me % of NOLA high school and college students with 3.0+ GPAs

The first table shows the counts; the second shows the percentages.

In [96]:
(hs_df["gpa"] >= 3).value_counts(normalize=False)

True     197
False     28
Name: gpa, dtype: int64

In [97]:
(hs_df["gpa"] >= 3).value_counts(normalize=True)

True     0.875556
False    0.124444
Name: gpa, dtype: float64
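One caveat applies to comparisons like `hs_df["gpa"] >= 3`: a missing GPA (`NaN`, which `pd.to_numeric(..., errors="coerce")` produces for unparseable values) compares as `False`, so students without a GPA land in the below-3.0 group. A minimal sketch, on made-up values, of reporting those students separately:

```python
import numpy as np
import pandas as pd

# Hypothetical GPAs; NaN stands in for a value that could not be parsed.
gpa = pd.Series([3.4, 2.8, np.nan, 3.0, np.nan])

# NaN >= 3 evaluates to False, so missing GPAs are lumped into "False".
naive = (gpa >= 3).value_counts()

# Classify explicitly, keeping missing GPAs as their own category.
status = np.select(
    [gpa.isna(), gpa >= 3],
    ["No GPA on file", "3.0 or higher"],
    default="Below 3.0",
)
explicit = pd.Series(status).value_counts()

# Share with 3.0+ among students who actually have a GPA on file.
share_with_gpa = (gpa.dropna() >= 3).mean()
print(naive, explicit, share_with_gpa, sep="\n\n")
```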

### *# of class of 2020 NOLA seniors who have received at least one four-year college acceptance
--you gave me the % already

In [98]:
(senior_df["# Four Year College Acceptances"] > 0).value_counts(normalize=False)

True    50
Name: # Four Year College Acceptances, dtype: int64

### *# of class of 2020 NOLA seniors who have submitted, completed, or are in the "review" phase with FAFSA

In [99]:
senior_df["FA Req: FAFSA/Alternative Financial Aid"].value_counts(normalize=False)

Submitted    46
-             4
Name: FA Req: FAFSA/Alternative Financial Aid, dtype: int64

## Original Questions

### Count of unduplicated participants during the above timeframe

This number is lower than the FY19 number because NOLA is a 9th-grade recruitment site, so this count only covers students from 10th grade through college age who were active in Fall 2019-20.


In [100]:
len(df)

398
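`len(df)` equals the unduplicated count only if each participant appears in exactly one row. A minimal sketch of checking that against the `18 Digit ID` column; the IDs below are made up.

```python
import pandas as pd

# Hypothetical rows; one ID appears twice.
participants = pd.DataFrame({"18 Digit ID": ["A1", "B2", "B2", "C3"]})

row_count = len(participants)
unique_count = participants["18 Digit ID"].nunique()

# keep=False flags every occurrence of a repeated ID, not just the later ones.
dup_mask = participants["18 Digit ID"].duplicated(keep=False)
duplicated_ids = participants.loc[dup_mask, "18 Digit ID"].unique()

print(row_count, unique_count, list(duplicated_ids))
```

If the two counts match on the real data, `len(df)` can be reported as the unduplicated total.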

### Breakdown of above unduplicated participants by age and sex (example: Male - Youth 6-17).

In [101]:
df.pivot_table(
    index=["Age"],
    aggfunc="count",
    columns="Gender",
    values="18 Digit ID",
    fill_value=0,
    margins=True,
)

| Age | Decline to State | Female | Male | Other | All |
|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 0 | 1 |
| 14 | 0 | 4 | 2 | 0 | 6 |
| 15 | 1 | 33 | 11 | 0 | 45 |
| 16 | 0 | 45 | 19 | 1 | 65 |
| 17 | 0 | 39 | 24 | 0 | 63 |
| 18 | 0 | 33 | 16 | 0 | 49 |
| 19 | 0 | 31 | 12 | 0 | 43 |
| 20 | 0 | 24 | 14 | 0 | 38 |
| 21 | 0 | 41 | 6 | 0 | 47 |
| 22 | 0 | 11 | 7 | 0 | 18 |


### Count of unduplicated participants by sex and parish of residence

In [102]:
df.pivot_table(
    index="Parish",
    aggfunc="count",
    columns="Gender",
    values="18 Digit ID",
    fill_value=0,
    margins=True,
)

| Parish | Decline to State | Female | Male | Other | All |
|---|---|---|---|---|---|
| East Baton Rouge Parish | 0 | 0 | 1 | 0 | 1 |
| Jefferson Parish | 0 | 23 | 9 | 0 | 32 |
| Lafayette Parish | 0 | 0 | 1 | 0 | 1 |
| Orleans Parish | 1 | 245 | 103 | 1 | 350 |
| Plaquemines Parish | 0 | 1 | 0 | 0 | 1 |
| St. Bernard Parish | 0 | 1 | 1 | 0 | 2 |
| St. John the Baptist Parish | 0 | 1 | 0 | 0 | 1 |
| St. Tammany Parish | 0 | 3 | 0 | 0 | 3 |
| All | 1 | 274 | 115 | 1 | 391 |
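The parish table totals 391 rather than the 398 participants in the other breakdowns, most likely because `pivot_table` (with its default `dropna=True`) leaves out rows whose `Parish` is missing. A minimal sketch, on made-up rows, of labeling missing parishes as "Unknown" so every participant is counted:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; participant "c" has no parish recorded.
people = pd.DataFrame({
    "18 Digit ID": ["a", "b", "c", "d"],
    "Gender": ["Female", "Male", "Female", "Female"],
    "Parish": ["Orleans Parish", "Jefferson Parish", np.nan, "Orleans Parish"],
})

# Replace missing parishes with an explicit label before pivoting.
table = people.assign(Parish=people["Parish"].fillna("Unknown")).pivot_table(
    index="Parish",
    columns="Gender",
    values="18 Digit ID",
    aggfunc="count",
    fill_value=0,
    margins=True,
)
print(table)
```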


### Count of unduplicated participants by sex and race/ethnicity (example: Male - African American)

In [103]:
df.pivot_table(
    index=["Ethnic background"],
    aggfunc="count",
    columns="Gender",
    values="18 Digit ID",
    fill_value=0,
    margins=True,
)

| Ethnic background | Decline to State | Female | Male | Other | All |
|---|---|---|---|---|---|
| African-American | 1 | 245 | 99 | 0 | 345 |
| Asian-American | 0 | 7 | 6 | 1 | 14 |
| Decline to State | 0 | 2 | 0 | 0 | 2 |
| Latino / Chicano | 0 | 17 | 7 | 0 | 24 |
| Multiracial | 0 | 4 | 3 | 0 | 7 |
| Native American | 0 | 2 | 0 | 0 | 2 |
| Other | 0 | 2 | 1 | 0 | 3 |
| White / Caucasian | 0 | 1 | 0 | 0 | 1 |
| All | 1 | 280 | 116 | 1 | 398 |


### They also require a breakdown of "Client Type of Household"

We don't track whether a student is the head of a household, so by default every participant falls under "Other".

### Count of unduplicated participants by sex and Employment Status



In [104]:
df.pivot_table(
    index=["Employment Status"],
    aggfunc="count",
    columns="Gender",
    values="18 Digit ID",
    fill_value=0,
    margins=True,
)

| Employment Status | Decline to State | Female | Male | Other | All |
|---|---|---|---|---|---|
| - | 1 | 212 | 98 | 1 | 312 |
| Employed full time | 0 | 1 | 0 | 0 | 1 |
| Employed part time | 0 | 5 | 1 | 0 | 6 |
| No information available | 0 | 62 | 17 | 0 | 79 |
| All | 1 | 280 | 116 | 1 | 398 |


### Count of unduplicated participants based on household income
For median household income I used $39,576, taken from:

https://www.census.gov/quickfacts/fact/table/neworleanscitylouisiana/INC110218

Using that number gives the following buckets:

* Less than or equal to 30% of median income: up to $11,872.80

* 30% to 49%: up to $19,392.24

* 50% to 79%: up to $31,265.04

* Greater than 80%: anything above $31,265.04
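The three cutoffs above are 30%, 49%, and 79% of $39,576. A minimal sketch of deriving them from the median and bucketing with `pd.cut` instead of hard-coding them; the labels are hypothetical and follow the list above, and the incomes are made up. `right=False` makes each bin include its lower edge, which mirrors the `value < cutoff` checks in `create_income_bucket` (up to that function's rounding of the cutoffs to whole dollars).

```python
import numpy as np
import pandas as pd

MEDIAN_INCOME = 39_576  # median household income used above

# Upper edges of the first three buckets, as fractions of the median.
cutoffs = [0.30 * MEDIAN_INCOME, 0.49 * MEDIAN_INCOME, 0.79 * MEDIAN_INCOME]
labels = [
    "Less than or equal to 30% of median income",
    "30% to 49% of median income",
    "50% to 79% of median income",
    "Greater than 80% of median income",
]

incomes = pd.Series([0, 11_000, 15_000, 25_000, 60_000])  # hypothetical

# right=False: each bin is [lower, upper), i.e. "value < cutoff" moves down a bucket.
buckets = pd.cut(incomes, bins=[-np.inf, *cutoffs, np.inf], labels=labels, right=False)

print([round(c, 2) for c in cutoffs])
print(buckets.value_counts(sort=False))
```

Deriving the edges from `MEDIAN_INCOME` means only one number changes if a newer median is used.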



In [105]:
df["Income Bucket"].value_counts()

Between 51% and 80% of Median Income          134
Greater than 80% of Median Income             127
Less than or equal to 30% of Median Income     90
Between 30% of 50% of Median Income            47
Name: Income Bucket, dtype: int64

### % of seniors (class of 2020) who have completed at least 100 hours of community service over their high school careers to date

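The second table shows the share of students at or above 80 hours, used as a proxy for whether they are "on track." The 80-hour cutoff was chosen somewhat arbitrarily; I can replace it with any number if you think there is a better one. Both cutoffs currently give the same result (90%), so no senior falls between 80 and 100 hours.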

In [106]:
(senior_df["Total Community Service Hours Completed"] >= 100).value_counts(normalize=True)

True     0.9
False    0.1
Name: Total Community Service Hours Completed, dtype: float64

In [107]:
(senior_df["Total Community Service Hours Completed"] >= 80).value_counts(normalize=True)

True     0.9
False    0.1
Name: Total Community Service Hours Completed, dtype: float64
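Since the on-track cutoff is meant to be adjustable, here is a minimal sketch of a hypothetical helper, `share_at_or_above`, that takes the threshold as a parameter, plus a count of students who clear the proxy cutoff but not the 100-hour goal. The hour values are made up; `Series.between(..., inclusive="left")` needs pandas 1.3 or later.

```python
import pandas as pd


def share_at_or_above(hours: pd.Series, threshold: float) -> float:
    """Fraction of students whose hours are at or above `threshold`.

    Missing values compare as False, so they count as below the threshold.
    """
    return float((hours >= threshold).mean())


hours = pd.Series([120.0, 95.0, 300.0, 40.0])  # hypothetical

for t in (80, 100):
    print(t, share_at_or_above(hours, t))

# Students in [80, 100): on track by the proxy, not yet at 100 hours.
between = int(hours.between(80, 100, inclusive="left").sum())
print(between)
```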

### % of students with at least 3.0 cumulative GPA (according to most recent data available)

Note: this includes college students.

In [108]:
(df["gpa"] >= 3).value_counts(normalize=True)

True     0.645729
False    0.354271
Name: gpa, dtype: float64

### four-year college acceptance rate
--do we have the ability to estimate four-year eligibility for class of 2020 seniors?

Not accurately, as far as I know. In theory it should be at or near 100%, but I don't think we have a good way to determine that.

The table below shows the percent of current seniors who have already been accepted into a four-year program.

In [109]:
(senior_df["# Four Year College Acceptances"] > 0).value_counts(normalize=True)

True    1.0
Name: # Four Year College Acceptances, dtype: float64

### four-year college matriculation rate
--since fall 2019 is the beginning of the 2019-20 academic year, I think I can use the DDT Fall 2019 for this. Please confirm whether you think this is correct.

Yep, that sounds right.

### FAFSA completion rate (for seniors eligible for federal student aid)
--do we have a way to estimate whether class of 2020 seniors are "on track" for this by the end of the school year?

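Sort of. The tables below show seniors' FAFSA statuses, including a breakdown by citizenship status. Students marked Complete or Reviewed would be "on track" to complete it, even though they haven't submitted yet. In the current data, every senior is either "Submitted" (92%) or has no status recorded ("-", 8%).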

In [110]:
senior_df['Citizenship Status'].value_counts()

US Citizen            42
-                      4
Other                  3
Permanent Resident     1
Name: Citizenship Status, dtype: int64

In [111]:
pd.crosstab(junior_df['Citizenship Status'], junior_df["FA Req: FAFSA/Alternative Financial Aid"], margins=True)

FA Req: FAFSA/Alternative Financial Aid,-,All
Citizenship Status,Unnamed: 1_level_1,Unnamed: 2_level_1
Other,3,3
Permanent Resident,2,2
US Citizen,51,51
All,56,56


In [112]:
pd.crosstab(senior_df['Citizenship Status'], senior_df["FA Req: FAFSA/Alternative Financial Aid"], margins=True, normalize='index')

FA Req: FAFSA/Alternative Financial Aid,-,Submitted
Citizenship Status,Unnamed: 1_level_1,Unnamed: 2_level_1
-,0.25,0.75
Other,1.0,0.0
Permanent Resident,0.0,1.0
US Citizen,0.0,1.0
All,0.08,0.92


In [113]:
senior_df["FA Req: FAFSA/Alternative Financial Aid"].value_counts(normalize=True)

Submitted    0.92
-            0.08
Name: FA Req: FAFSA/Alternative Financial Aid, dtype: float64
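Because the question is limited to seniors eligible for federal student aid, one option is to compute the rate only over students whose citizenship status generally makes them eligible. Using the counts above, that would be 43 of 43 (42 US citizens and 1 permanent resident, all "Submitted"). A minimal sketch, assuming "US Citizen" and "Permanent Resident" are the eligible values in this field (some students marked "Other" or "-" may also be eligible); the rows are made up.

```python
import pandas as pd

# Assumption: these citizenship values are treated as aid-eligible.
ELIGIBLE_STATUSES = {"US Citizen", "Permanent Resident"}

# Hypothetical rows using this dataset's column names.
seniors = pd.DataFrame({
    "Citizenship Status": ["US Citizen", "US Citizen", "Other", "Permanent Resident", "-"],
    "FA Req: FAFSA/Alternative Financial Aid": ["Submitted", "-", "-", "Submitted", "Submitted"],
})

eligible = seniors[seniors["Citizenship Status"].isin(ELIGIBLE_STATUSES)]
submitted = eligible["FA Req: FAFSA/Alternative Financial Aid"].eq("Submitted")

print(f"{submitted.sum()} of {len(eligible)} eligible seniors submitted ({submitted.mean():.0%})")
```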

### % of high school seniors (class of 2020) that have applied to at least 6 colleges to date

In [114]:
(senior_df["# Four Year College Applications"] >= 6).value_counts(normalize=True)

True     0.64
False    0.36
Name: # Four Year College Applications, dtype: float64

### % of high school seniors (class of 2020) on track to graduate high school

This should be 100%

### % of 2019-20 high school students promoted to the next grade (from the 2018-19 academic year)

This is also 100%

### % of seniors (class of 2020) on track for TOPS Opportunity eligibility

Using a minimum ACT of 20 and GPA of 2.5


In [116]:
(
    (senior_df["gpa"] >= 2.5)
    & (senior_df["ACT Superscore (highest official)"] >= 20)
).value_counts(normalize=True)

True     0.88
False    0.12
dtype: float64
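The TOPS check can be wrapped in a small function with the cutoffs as parameters, which makes it easy to rerun if the thresholds change. A minimal sketch; `tops_on_track` is a hypothetical helper, its defaults are the ACT 20 / GPA 2.5 minimums used above, and the scores are made up. As with the GPA counts, a student missing either score is counted as not on track.

```python
import numpy as np
import pandas as pd


def tops_on_track(df: pd.DataFrame, min_gpa: float = 2.5, min_act: float = 20) -> pd.Series:
    """True where both the GPA and the ACT superscore meet the minimums.

    NaN in either column compares as False, so incomplete records are not on track.
    """
    return (df["gpa"] >= min_gpa) & (df["ACT Superscore (highest official)"] >= min_act)


seniors = pd.DataFrame({
    "gpa": [3.1, 2.4, 2.9, np.nan],
    "ACT Superscore (highest official)": [22, 25, 19, 24],
})

flags = tops_on_track(seniors)
print(flags.tolist())  # [True, False, False, False]
print(flags.value_counts(normalize=True))

# How many "False" results come from a missing score rather than a low one.
missing_scores = int(seniors[["gpa", "ACT Superscore (highest official)"]].isna().any(axis=1).sum())
print(missing_scores)
```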

In [117]:
# senior_df["Region Specific Funding Eligibility"].value_counts()

In [118]:
senior_df.columns

Index(['18 Digit ID', 'Full Name', 'High School Class', 'Age',
       'Ethnic background', 'Gender', 'Mailing Zip/Postal Code',
       'Employment Status', 'Employment Description',
       'Annual household income', 'GPA (Prev Term CGPA)',
       'BB Eligible Community Service Hours',
       '# Four Year College Acceptances',
       'FA Req: FAFSA/Alternative Financial Aid',
       '# Four Year College Applications', 'Contact Record Type',
       'Region Specific Funding Eligibility', 'GPA (prev prev term CGPA)',
       'ACT Superscore (highest official)', 'Global Academic Term',
       'Total Community Service Hours Completed', 'Citizenship Status',
       'GPA (Running Cumulative)', 'GPA (HS cumulative)', 'Parish', 'gpa',
       'Income', 'Income Bucket'],
      dtype='object')

### *avg. # of outside scholarship applications high school seniors (class of 2020) have submitted so far

Six students have submitted external scholarship applications, and all of those scholarships were won, for a total of $5,500.

### % of high school seniors (class of 2020) who attended at least one college affordability workshop

As I think we discussed on a previous ticket, we don't have a college affordability category. However, every senior has attended a college completion workshop, which often includes a college affordability component.


### *% of students receiving at least 5 hours of mentoring, enrichment and/or tutoring instruction per week.

We don't have a quick way to track hours per week, and because of COVID the approach I used in the past no longer makes sense, so I don't think we can report on this.


In [64]:
%%html
<script src="https://cdn.rawgit.com/parente/4c3e6936d0d7a46fd071/raw/65b816fb9bdd3c28b4ddf3af602bfd6015486383/code_toggle.js"></script>


In [65]:
%%html

<style>
div.prompt {display: none}

h1, .h1 {
    font-family: "Trebuchet MS";
    font-size: 2.5em !important;
    color: #2a7bbd;
}

h2, .h2 {
    font-size: 10px;
    font-family: "Trebuchet MS";
    color: #2a7bbd;
}

h3, .h3 {
    font-size: 10px;
    font-family: "Trebuchet MS";
    color: #5d6063;
}

.rendered_html table {
    font-size: 14px;
}

.output_png {
    display: flex;
    justify-content: center;
}
</style>