In [106]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import folium

# Explore public HUD REAC data in Philadelphia

Author: Jack Vandeleuv

Data files:
* [Public Housing Physical Inspection Scores (2016-2021)](https://www.huduser.gov/portal/datasets/pis.html#2021_data-collapse)
* [Multifamily Physical Inspection Scores (2016-2021)](https://www.huduser.gov/portal/datasets/pis.html#2021_data-collapse)

This notebook explores the publicly released scores from the U.S. Department of Housing and Urban Development's Real Estate Assessment Center (HUD REAC). These scores are derived from physical inspections of HUD properties.

# Loading and cleaning

Place the data files in the data directory and load into pandas.

In [62]:
WORKING_DIR = '../data/'
MULTIFAMILY_FILES = [
    'multifamily_physical_inspection_scores_0321.xlsx',
    'multifamily_physical_inspection_scores_0620.xlsx',
    'multifamily-physical-inspection-scores-2016.xlsx',
    'multifamily-physical-inspection-scores-2018.xlsx',
    'multifamily-physical-inspection-scores-2019.xlsx'
]

PUBLIC_HOUSING_FILES = [
    'public_housing_physical_inspection_scores_0321.xlsx',
    'public_housing_physical_inspection_scores_0620.xlsx',
    'public-housing-physical-inspection-scores-2016.xlsx',
    'public-housing-physical-inspection-scores-2018.xlsx',
    'public-housing-physical-inspection-scores-2019.xlsx'
]

In [63]:
multi_dfs = []
public_dfs = []

# Both lists have the same length, so read them in step.
for public_file, multi_file in zip(PUBLIC_HOUSING_FILES, MULTIFAMILY_FILES):
    public_dfs.append(pd.read_excel(WORKING_DIR + public_file))
    multi_dfs.append(pd.read_excel(WORKING_DIR + multi_file))

multi = pd.concat(multi_dfs, axis=0)
public = pd.concat(public_dfs, axis=0)

In [64]:
multi.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 137649 entries, 0 to 27877
Data columns (total 19 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   INSPECTION_ID     137649 non-null  int64  
 1   PROPERTY_ID       137649 non-null  int64  
 2   PROPERTY_NAME     137649 non-null  object 
 3   ADDRESS           136278 non-null  object 
 4   CITY              137649 non-null  object 
 5   CBSA_NAME         126260 non-null  object 
 6   CBSA_CODE         137585 non-null  float64
 7   COUNTY_NAME       137597 non-null  object 
 8   COUNTY_CODE       137599 non-null  float64
 9   STATE_NAME        82016 non-null   object 
 10  STATE_CODE        137649 non-null  object 
 11  ZIPCODE           55335 non-null   float64
 12  LATITUDE          137599 non-null  float64
 13  LONGITUDE         137599 non-null  float64
 14  LOCATION_QUALITY  137599 non-null  object 
 15  INSPECTION_SCORE  137649 non-null  int64  
 16  INSPECTION_DATE   137

In [65]:
public.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 34125 entries, 0 to 6782
Data columns (total 22 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   INSPECTION_ID     34125 non-null  int64  
 1   DEVELOPMENT_ID    34125 non-null  object 
 2   DEVELOPMENT_NAME  34125 non-null  object 
 3   ADDRESS           13162 non-null  object 
 4   CITY              34125 non-null  object 
 5   CBSA_NAME         29207 non-null  object 
 6   CBSA_CODE         34108 non-null  float64
 7   COUNTY_NAME       34108 non-null  object 
 8   COUNTY_CODE       34108 non-null  float64
 9   STATE_NAME        20419 non-null  object 
 10  STATE_CODE        34117 non-null  object 
 11  ZIPCODE           13162 non-null  float64
 12  LATITUDE          34108 non-null  float64
 13  LONGITUDE         34108 non-null  float64
 14  LOCATION_QUALITY  34116 non-null  object 
 15  PHA_CODE          34125 non-null  object 
 16  PHA_NAME          34125 non-null  object 

It looks like ZIP codes are formatted differently from year to year. We'll drop FIPS_STATE_CODE, which we don't need for this analysis.

In [66]:
multi = multi.drop(['FIPS_STATE_CODE'], axis=1)
public = public.drop(['FIPS_STATE_CODE'], axis=1)
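
The sample rows further down suggest that some yearly files store the ZIP code in `ZIPCODE` and others in `ZIP` (likewise `ADDRESS` and `ADDRESS.1`). Here is a minimal sketch of one way to merge such a pair, using a toy frame rather than the real data. The merged column name `ZIP_CLEAN` is hypothetical.

```python
import numpy as np
import pandas as pd

# Toy rows mimicking the two layouts seen across yearly files.
toy = pd.DataFrame({
    'ZIPCODE': [19141.0, np.nan, np.nan],
    'ZIP': [np.nan, 19123.0, 19145.0],
})

# Take ZIPCODE where present, otherwise fall back to ZIP.
merged = toy['ZIPCODE'].combine_first(toy['ZIP'])

# Store as zero-padded 5-character strings so leading zeros survive.
toy['ZIP_CLEAN'] = merged.astype('Int64').astype('string').str.zfill(5)
print(toy)
```

Keeping ZIP codes as strings avoids the float formatting (for example `19141.0`) that appears when a numeric column contains missing values.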

We'll also convert INSPECTION_DATE to a standard datetime format.

In [67]:
public['INSPECTION_DATE'] = pd.to_datetime(public['INSPECTION_DATE'], infer_datetime_format=True)
multi['INSPECTION_DATE'] = pd.to_datetime(multi['INSPECTION_DATE'], infer_datetime_format=True)

In [68]:
print(public.INSPECTION_DATE.dtype, multi.INSPECTION_DATE.dtype)

datetime64[ns] datetime64[ns]


Let's also get rid of redundant inspections. We'll define these as inspections that share the same INSPECTION_ID.

In [80]:
multi = multi[~multi.INSPECTION_ID.duplicated()]
public = public[~public.INSPECTION_ID.duplicated()]

For this analysis, we're interested in Philadelphia specifically, so we'll limit ourselves to those inspections.

In [84]:
public = public[public.CITY == 'Philadelphia']
multi = multi[multi.CITY == 'Philadelphia']

In [85]:
public.sample(3)

Unnamed: 0,INSPECTION_ID,DEVELOPMENT_ID,DEVELOPMENT_NAME,ADDRESS,CITY,CBSA_NAME,CBSA_CODE,COUNTY_NAME,COUNTY_CODE,STATE_NAME,...,ZIPCODE,LATITUDE,LONGITUDE,LOCATION_QUALITY,PHA_CODE,PHA_NAME,INSPECTION_SCORE,INSPECTION_DATE,ADDRESS.1,ZIP
11,523879,PA002000132,Suffolk Manor,1416 Clearview St,Philadelphia,"Philadelphia-Camden-Wilmington, PA-NJ-DE-MD Me...",37980.0,Philadelphia,101.0,PA,...,19141.0,40.040464,-75.145217,R,PA002,Philadelphia Housing Authority,98,2014-06-03 09:04:17,,
4808,625664,PA002000147,Cambridge Phase III,,Philadelphia,"Philadelphia-Camden-Wilmington, PA-NJ-DE-MD Me...",37980.0,Philadelphia,101.0,,...,,39.9668,-75.1549,B,PA002,Philadelphia Housing Authority,77,2018-08-28 00:00:00,905 N Marvine St,19123.0
3813,547665,PA002000013,WILSON PARK,,Philadelphia,"Philadelphia, PA Metropolitan Division",37964.0,Philadelphia,101.0,PA,...,,39.92545,-75.189028,R,PA002,Philadelphia Housing Authority,91,2015-05-28 00:00:00,2508 Jackson St,19145.0


# Basic exploration

We'll pull the overall summary statistics for this dataset.

In [88]:
print(public.describe())
print(f'Share with failing score: {(public.INSPECTION_SCORE < 60).mean():.0%}')

       INSPECTION_ID     CBSA_CODE  COUNTY_CODE       ZIPCODE    LATITUDE  \
count     227.000000    227.000000        227.0     78.000000  227.000000   
mean   593718.145374  37971.400881        101.0  19129.320513   39.979040   
std     43194.941983      7.995164          0.0     12.848259    0.031648   
min    523098.000000  37964.000000        101.0  19104.000000   39.917289   
25%    567871.000000  37964.000000        101.0  19121.000000   39.964239   
50%    589742.000000  37964.000000        101.0  19129.500000   39.973101   
75%    625671.500000  37980.000000        101.0  19140.500000   39.994740   
max    667178.000000  37980.000000        101.0  19151.000000   40.049387   

        LONGITUDE  INSPECTION_SCORE           ZIP  
count  227.000000        227.000000    149.000000  
mean   -75.174367         73.986784  19130.013423  
std      0.036523         18.412789     12.761315  
min    -75.244380         22.000000  19104.000000  
25%    -75.197136         62.500000  19121.000

In [89]:
print(multi.describe())
print(f'Share with failing score: {(multi.INSPECTION_SCORE < 60).mean():.0%}')

       INSPECTION_ID   PROPERTY_ID     CBSA_CODE  COUNTY_CODE       ZIPCODE  \
count     346.000000  3.460000e+02    346.000000   346.000000    163.000000   
mean   581375.274566  8.000861e+08  38778.231214   100.491329  19343.030675   
std     47026.695875  9.287763e+04   6696.271255     5.201581   2277.255681   
min    500931.000000  8.000121e+08  37964.000000    45.000000  13673.000000   
25%    543138.750000  8.000186e+08  37964.000000   101.000000  19116.000000   
50%    584890.000000  8.000189e+08  37980.000000   101.000000  19131.000000   
75%    621644.500000  8.002148e+08  37980.000000   101.000000  19141.000000   
max    672303.000000  8.002452e+08  99999.000000   101.000000  39350.000000   

         LATITUDE   LONGITUDE  INSPECTION_SCORE           ZIP  
count  346.000000  346.000000        346.000000    183.000000  
mean    39.944159  -75.325541         85.072254  19289.530055  
std      0.869622    1.494309         13.909771   2189.680063  
min     32.754848  -89.111520   

We have 227 public housing inspections and 346 multifamily inspections. For public housing, 22% of inspections were assigned a failing score (below 60), whereas only 6% of multifamily inspections were.
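
The multifamily summary above has a minimum latitude near 32.75 and a minimum longitude near -89.11, which suggests that filtering on `CITY == 'Philadelphia'` alone also keeps properties in other cities of the same name. One possible guard, sketched here on toy rows, is an additional latitude/longitude bounding box. The bounds below are rough assumptions for Philadelphia, PA, not official limits, and `in_philly_bbox` is a hypothetical helper.

```python
import pandas as pd

# Rough, assumed bounding box around Philadelphia, PA (not an official boundary).
LAT_MIN, LAT_MAX = 39.85, 40.15
LON_MIN, LON_MAX = -75.30, -74.95

def in_philly_bbox(df):
    """Return a boolean mask for rows whose coordinates fall inside the box."""
    return (df['LATITUDE'].between(LAT_MIN, LAT_MAX)
            & df['LONGITUDE'].between(LON_MIN, LON_MAX))

# First row: a site from this notebook. Second row: an illustrative out-of-area point.
toy = pd.DataFrame({
    'CITY': ['Philadelphia', 'Philadelphia'],
    'LATITUDE': [39.975961, 32.754848],
    'LONGITUDE': [-75.180237, -89.111520],
})

filtered = toy[(toy['CITY'] == 'Philadelphia') & in_philly_bbox(toy)]
print(len(filtered))
```

Applying a mask like this to `multi` would change the counts and percentages reported above, so the summary statistics would need to be rerun afterwards.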

Now let's get inspection count by year.

In [96]:
multi_counts = (
    multi.INSPECTION_DATE.dt.year
    .value_counts()
    .sort_index()
)
print(multi_counts)
print(multi_counts.mean())

2013    27
2014    53
2015    47
2016    47
2017    41
2018    69
2019    46
2020    16
Name: INSPECTION_DATE, dtype: int64
43.25


In [95]:
public_counts = (
    public.INSPECTION_DATE.dt.year
    .value_counts()
    .sort_index()
)
print(public_counts)
print(public_counts.mean())

2014    29
2015    27
2016    25
2017    57
2018    50
2019    30
2020     9
Name: INSPECTION_DATE, dtype: int64
32.42857142857143


On average, there are about 43 multifamily inspections and 32 public housing inspections per year.

# Quick exploratory mapping

Here are the locations of all the inspections on a map. We'll color the points based on inspection score.

In [108]:
def inspection_score_color(score):
    if score >= 80:
        return 'green'
    elif score >= 60:
        return 'orange'
    else:
        return 'red'
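
The same thresholds can also be applied to a whole column at once. Here is a minimal sketch using `pd.cut`, assuming the cut-offs of 60 and 80 from the function above. The `COLOR` column name and the sample scores are illustrative.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({'INSPECTION_SCORE': [46, 60, 69, 80, 98]})

# right=False makes each bin closed on the left: [-inf, 60), [60, 80), [80, inf).
scores['COLOR'] = pd.cut(
    scores['INSPECTION_SCORE'],
    bins=[-np.inf, 60, 80, np.inf],
    labels=['red', 'orange', 'green'],
    right=False,
).astype(str)
print(scores)
```

Setting `right=False` matters here: it keeps a score of exactly 60 in the orange band and exactly 80 in the green band, matching the `>=` comparisons in the function.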

In [122]:
philly_map = folium.Map(location=[39.9526, -75.1652], zoom_start=13)

for index, row in public.iterrows():
    folium.Marker(
        location=[row['LATITUDE'], row['LONGITUDE']],
        popup=f"{row['DEVELOPMENT_NAME']} - Score: {row['INSPECTION_SCORE']} - Coord: {row['LATITUDE']}, {row['LONGITUDE']}",
        icon=folium.Icon(color=inspection_score_color(row['INSPECTION_SCORE']), icon='info-sign')
    ).add_to(philly_map)

for index, row in multi.iterrows():
    folium.Marker(
        location=[row['LATITUDE'], row['LONGITUDE']],
        popup=f"{row['PROPERTY_NAME']} - Score: {row['INSPECTION_SCORE']} - Coord: {row['LATITUDE']}, {row['LONGITUDE']}",
        icon=folium.Icon(color=inspection_score_color(row['INSPECTION_SCORE']), icon='info-sign')
    ).add_to(philly_map)

philly_map


The 2022 Philadelphia fire in a HUD-assisted property occurred at North 23rd Street and Ogden Street.

The nearest REAC inspection site in our dataset is across South College Avenue. The development is listed as 'SCATTERED SITES' and received a failing score of 46 in 2017. [39.974478, -75.177185]

The next closest is COLLEGEVIEW HOMES, which received a low score of 69 in 2018. [39.975961, -75.180237]
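
Proximity here was judged from the map. Here is a minimal sketch of how distances could be checked numerically with the haversine formula. `haversine_km` is a hypothetical helper, and the fire location itself is left out because its coordinates are not in the dataset.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# The two sites named above, with the coordinates quoted in the text.
scattered_sites = (39.974478, -75.177185)
collegeview = (39.975961, -75.180237)

distance = haversine_km(*scattered_sites, *collegeview)
print(f'{distance:.2f} km')
```

To rank inspection sites by distance from the fire, the same function could be called with the fire's coordinates as the first point and each row's `LATITUDE` and `LONGITUDE` as the second.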

In [124]:
pd.set_option('display.max_columns', None)
public[public.DEVELOPMENT_NAME.str.contains('COLLEGEVIEW HOMES')].sort_values(by='INSPECTION_DATE')

Unnamed: 0,INSPECTION_ID,DEVELOPMENT_ID,DEVELOPMENT_NAME,ADDRESS,CITY,CBSA_NAME,CBSA_CODE,COUNTY_NAME,COUNTY_CODE,STATE_NAME,STATE_CODE,ZIPCODE,LATITUDE,LONGITUDE,LOCATION_QUALITY,PHA_CODE,PHA_NAME,INSPECTION_SCORE,INSPECTION_DATE,ADDRESS.1,ZIP
6535,567891,PA002000065,COLLEGEVIEW HOMES,,Philadelphia,"Philadelphia, PA Metropolitan Division",37964.0,Philadelphia,101.0,PA,42,,39.975961,-75.180237,R,PA002,Philadelphia Housing Authority,72,2016-03-29 00:00:00,1251 Marston Ct,19121.0
4340,589708,PA002000065,COLLEGEVIEW HOMES,,Philadelphia,"Philadelphia, PA",37964.0,Philadelphia,101.0,,PA,,39.975961,-75.180237,R,PA002,Philadelphia Housing Authority,74,2017-08-02 00:00:00,1251 Marston Ct,19121.0
4786,625634,PA002000065,COLLEGEVIEW HOMES,,Philadelphia,"Philadelphia-Camden-Wilmington, PA-NJ-DE-MD Me...",37980.0,Philadelphia,101.0,,PA,,39.975961,-75.180237,R,PA002,Philadelphia Housing Authority,69,2018-08-09 00:00:00,1251 Marston Ct,19121.0
4512,651085,PA002000065,COLLEGEVIEW HOMES,1251 Marston Ct,Philadelphia,"Philadelphia-Camden-Wilmington, PA-NJ-DE-MD Me...",37980.0,Philadelphia,101.0,PA,42.0,19121.0,39.975961,-75.180237,R,PA002,Philadelphia Housing Authority,67,2019-08-12 09:17:17,,


In [129]:
public[public.LATITUDE == 39.974478].sort_values(by='INSPECTION_DATE')

Unnamed: 0,INSPECTION_ID,DEVELOPMENT_ID,DEVELOPMENT_NAME,ADDRESS,CITY,CBSA_NAME,CBSA_CODE,COUNTY_NAME,COUNTY_CODE,STATE_NAME,STATE_CODE,ZIPCODE,LATITUDE,LONGITUDE,LOCATION_QUALITY,PHA_CODE,PHA_NAME,INSPECTION_SCORE,INSPECTION_DATE,ADDRESS.1,ZIP
6556,567919,PA002000910,SCATTERED SITES,,Philadelphia,"Philadelphia, PA Metropolitan Division",37964.0,Philadelphia,101.0,PA,42,,39.974478,-75.177185,R,PA002,Philadelphia Housing Authority,58,2016-02-22 00:00:00,2501 N College Ave,19121.0
4366,589741,PA002000910,SCATTERED SITES,,Philadelphia,"Philadelphia, PA",37964.0,Philadelphia,101.0,,PA,,39.974478,-75.177185,R,PA002,Philadelphia Housing Authority,46,2017-07-10 00:00:00,2501 N College Ave,19121.0
3298,631392,PA002000910,SCATTERED SITES,2501 N College Ave,Philadelphia,"Philadelphia-Camden-Wilmington, PA-NJ-DE-MD Me...",37980.0,Philadelphia,101.0,PA,42.0,19121.0,39.974478,-75.177185,R,PA002,Philadelphia Housing Authority,84,2018-11-05 08:52:59,,


This development received failing scores in 2016 (58) and 2017 (46), followed by a much better score of 84 in 2018.
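
The lookup above matches `LATITUDE` by exact float equality, which works only when the stored value is exactly the literal typed. Here is a more tolerant sketch on toy rows, assuming a tolerance of 1e-6 degrees (roughly 0.1 m) is acceptable. The slightly perturbed latitude in the first row is made up to show the difference.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'DEVELOPMENT_NAME': ['SCATTERED SITES', 'COLLEGEVIEW HOMES'],
    'LATITUDE': [39.9744780001, 39.975961],  # first value perturbed on purpose
    'LONGITUDE': [-75.177185, -75.180237],
})

# rtol=0 so only the absolute tolerance (in degrees) applies.
mask = np.isclose(toy['LATITUDE'], 39.974478, rtol=0, atol=1e-6)
match = toy[mask]
print(match)
```

Filtering on an identifier such as `DEVELOPMENT_ID` is another option, though a single development may cover more than one location.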