# Analyzing enrollment and projects

Let's first group projects together into "super-types" and remove the project types we're not interested in. From there, we can categorize each row in enrollment (people) by its project super-type, and then check how often people come back.
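
As a rough sketch of where this is heading, the snippet below tags toy enrollment rows with a super-type and counts repeat enrollments per person. The enrollment column names (`Personal ID`, `Entry Date`) and all of the data are assumptions made up for illustration; they are not taken from the real enrollment file.

```python
import pandas as pd

# Toy stand-ins; the real column names and values may differ.
projects = pd.DataFrame({
    "Project ID": [1, 1, 2],
    "Super Project": ["Nightly Housing", "Nightly Housing", "Long Stay"],
})
enrollment = pd.DataFrame({
    "Personal ID": ["a", "a", "b", "a"],
    "Project ID": [1, 1, 2, 2],
    "Entry Date": ["2015-01-01", "2015-03-01", "2015-02-01", "2015-06-01"],
})

# One row per project, so the merge does not duplicate enrollment rows.
lookup = projects[["Project ID", "Super Project"]].drop_duplicates()

# Attach each enrollment row's super-type via its project.
tagged = enrollment.merge(lookup, on="Project ID", how="inner")

# Count enrollments per person and super-type; a count above 1 means a return.
visits = tagged.groupby(["Personal ID", "Super Project"]).size()
returns = visits[visits > 1]
```

The de-duplication step matters because the project table can list the same Project ID on several rows (once per address, for example).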

# First off, Project Preprocessing

In [1]:
import numpy as np
import pandas as pd

In [2]:
project = pd.read_csv("data/raw/Project.csv")

In [3]:
project["Project Type Code"].value_counts(dropna=False)

toss_out = {"Other (HUD)", "Services Only (HUD)", "Street Outreach (HUD)", "RETIRED (HUD)"}  # NaN values are also tossed out, in a separate step
ext_funding = {"Homelessness Prevention (HUD)"}
temp_housing = {"Transitional housing (HUD)"}
nightly_housing = {"Emergency Shelter (HUD)"}
long_stay = {"PH - Permanent Supportive Housing (disability required for entry) (HUD)", "PH - Rapid Re-Housing (HUD)"}

## Toss out unnecessary data

In [4]:
# Boolean mask: True where the project type is one we want to discard
in_toss_out = project["Project Type Code"].isin(toss_out)
filtered_project = project[~in_toss_out]

In [5]:
# Drop rows whose project type is missing (NaN)
filtered_project = filtered_project.dropna(subset=["Project Type Code"])

In [6]:
assert all(type(project_type) is str for project_type in filtered_project["Project Type Code"])
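
The two filtering steps are separate because a set-membership test never matches a missing value. A minimal sketch on a toy frame (the data and the names `toy` and `toy_toss_out` are made up for illustration):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {"Project Type Code": ["Other (HUD)", "Emergency Shelter (HUD)", np.nan]}
)
toy_toss_out = {"Other (HUD)"}

# Step 1: remove the unwanted types; the NaN row survives this step.
in_toss = toy["Project Type Code"].isin(toy_toss_out)
kept = toy[~in_toss]

# Step 2: remove rows with a missing project type.
kept_no_nan = kept.dropna(subset=["Project Type Code"])
```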

## Examine data

In [7]:
filtered_project["Project Type Code"].value_counts()

Homelessness Prevention (HUD)                                              36
PH - Permanent Supportive Housing (disability required for entry) (HUD)    35
PH - Rapid Re-Housing (HUD)                                                32
Emergency Shelter (HUD)                                                    27
Transitional housing (HUD)                                                 25
Name: Project Type Code, dtype: int64

# Create the "super projects"

In [8]:
assert ext_funding
assert temp_housing
assert long_stay
assert nightly_housing

In [9]:
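def assign_super_project(row):
    """Return the super-type label for a project row, based on its type code."""
    code = row["Project Type Code"]
    if code in ext_funding: return "External Funding"
    if code in temp_housing: return "Temporary Housing"
    if code in long_stay: return "Long Stay"
    if code in nightly_housing: return "Nightly Housing"
    # Fail loudly so a new or misspelled type code cannot slip through unlabeled
    raise ValueError(f"Project Type was not accounted for: {code!r}")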

In [10]:
super_projects = filtered_project.apply(assign_super_project, axis=1)

In [11]:
filtered_project["Super Project"] = super_projects
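
An alternative to the row-wise `apply` is a single lookup table combined with `Series.map`. The sketch below uses the same groupings as the sets defined earlier; the names `super_project_of` and `toy` are hypothetical, and the frame is toy data.

```python
import pandas as pd

# Same groupings as the sets above, flattened into one lookup table.
super_project_of = {
    "Homelessness Prevention (HUD)": "External Funding",
    "Transitional housing (HUD)": "Temporary Housing",
    "Emergency Shelter (HUD)": "Nightly Housing",
    "PH - Permanent Supportive Housing (disability required for entry) (HUD)": "Long Stay",
    "PH - Rapid Re-Housing (HUD)": "Long Stay",
}

toy = pd.DataFrame(
    {"Project Type Code": ["Emergency Shelter (HUD)", "PH - Rapid Re-Housing (HUD)"]}
)
mapped = toy["Project Type Code"].map(super_project_of)

# map() yields NaN for codes missing from the table, so check explicitly
# rather than relying on an exception from inside the mapping.
if mapped.isna().any():
    raise ValueError("Project Type was not accounted for")
```

The trade-off is that `map` does not raise on unknown codes by itself, which is why the explicit NaN check is kept.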

In [12]:
filtered_project.head()

| Unnamed: 0 | Project Name | Project ID | Organization Name | CoC Code | Project Type Code | Method for Tracking ES Utilization | Address City | Address Postal Code | Funder | Grant Start Date | Grant End Date | Super Project |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MOSBE CHS - Elm House | 2142 | MOSBE Community Human Services (CHS) | CA-506 | Transitional housing (HUD) | | | 93942 | | | | Temporary Housing |
| 1 | MOSBE CHS - Elm House | 2142 | MOSBE Community Human Services (CHS) | CA-506 | Transitional housing (HUD) | | | 93955 | | | | Temporary Housing |
| 2 | MOSBE CHS - RHY - BCP ES | 3417 | MOSBE Community Human Services (CHS) | CA-506 | Emergency Shelter (HUD) | | Monterey | 93942 | | | | Nightly Housing |
| 3 | MOSBE CHS - RHY - BCP ES | 3417 | MOSBE Community Human Services (CHS) | CA-506 | Emergency Shelter (HUD) | | Seaside | 93955 | | | | Nightly Housing |
| 4 | MOSBE CHS - RHY - BCP - HP | 3418 | MOSBE Community Human Services (CHS) | CA-506 | Homelessness Prevention (HUD) | | Monterey | 93942 | | | | External Funding |


In [13]:
filtered_project.to_csv("data/preprocessed/projects.csv", sep=",")
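
Because `to_csv` writes the index by default, the saved file gains an unnamed first column. A small sketch of the round trip, using an in-memory buffer and toy data in place of the real file:

```python
import io
import pandas as pd

toy = pd.DataFrame({
    "Project ID": [2142, 3417],
    "Super Project": ["Temporary Housing", "Nightly Housing"],
})

buffer = io.StringIO()
toy.to_csv(buffer)  # the index is written as an unnamed first column

# Reading naively turns the saved index into an "Unnamed: 0" column.
buffer.seek(0)
naive = pd.read_csv(buffer)

# Passing index_col=0 restores it as the index instead.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)
```

Any later notebook that loads the preprocessed file can pass `index_col=0` to avoid carrying the extra column around.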