LIHTC Data Prep Notebook

<b>Author</b>: Phu Dang

<b>Mentor</b>: Dr. Feiyang Sun

<b>Date</b>: January 16, 2024

**Purpose**: Apply ML to see which factors seem to influence the success/longevity of LIHTC projects, then use the weights to develop 
a metric that measures project success to develop a user-friendly visualization tool/dashboard that shows where these projects are. 

**Context**:

Analysis approach based on the desired criteria of Qualified Allocation Plans (QAPs)

The QAP is a document that states, and a few local agencies, must develop in order to distribute federal Low
Income Housing Tax Credits (LIHTCs), which can be awarded only to a building that fits the QAP’s priorities
and criteria. Each QAP must spell out a housing finance agency’s (HFA’s) priorities and specify the criteria it
will use to select projects competing for tax credits. The priorities must be appropriate to local conditions.
The QAP must also give preference to projects:
- Serving residents with the lowest income;
- Serving income-eligible residents for the longest period of time; and,
- Located in qualified census tracts (QCTs) or difficult development areas (DDAs), as long as the project
contributes to a concerted community revitalization plan. QCTs are census tracts with a poverty rate of
25% or in which 50% of the households have incomes below 60% of the area median income (AMI). DDAs
are areas in which construction, land, and utility costs are high relative to incomes.

The QAP selection criteria must address 10 items: (1) location; (2) housing needs; (3) public housing waiting
lists; (4) individuals with children; (5) special needs populations; (6) whether a project includes the use of
existing housing as part of a community revitalization plan; (7) project sponsor characteristics; (8) projects
intended for eventual tenant ownership; (9) energy efficiency; and (10) historic nature.

(Source: https://nlihc.org/sites/default/files/2014AG-259.pdf)

In [2]:
import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings("ignore")

In [6]:
pd.set_option('display.max_columns', None)

In [79]:
def viewAll(x, status=False):

    pd.set_option('display.max_rows', len(x))
    print(x)
    pd.set_option('display.max_rows', 11)

    return None

In [12]:
# Import LIHTC dataset

df = pd.read_csv("data/LIHTCPUB.csv")

In [13]:
df.head()

Unnamed: 0,hud_id,project,proj_add,proj_cty,proj_st,proj_zip,state_id,latitude,longitude,place1990,place2000,place2010,fips1990,fips2000,fips2010,st2010,cnty2010,scattered_site_cd,resyndication_cd,allocamt,n_units,li_units,n_0br,n_1br,n_2br,n_3br,n_4br,inc_ceil,low_ceil,ceilunit,yr_pis,yr_alloc,non_prof,basis,bond,mff_ra,fmha_514,fmha_515,fmha_538,home,home_amt,tcap,tcap_amt,cdbg,cdbg_amt,htf,htf_amt,fha,hopevi,hpvi_amt,tcep,tcep_amt,rad,qozf,qozf_amt,rentassist,trgt_pop,trgt_fam,trgt_eld,trgt_dis,trgt_hml,trgt_other,trgt_spc,type,credit,n_unitsr,li_unitr,metro,dda,qct,nonprog,nlm_reason,nlm_spc,datanote,record_stat
0,AKA0000X018,"GATEWAY-SEWARD ASSOCIATES, LTD PTN",1810 PHOENIX ROAD,SEWARD,AK,99664,AK-99-99,60.125469,-149.44606,,,68560.0,02XXXXXXXXX,02XXXXXXXXX,2122001300,2,122,,,,20.0,,,,,,,9.0,,,9999,9999.0,,,,,,,,,0.0,,0.0,,0.0,,0.0,,,0.0,,0.0,,,0.0,,,,,,,,,,1.0,20.0,20.0,,,,,,,,X
1,AKA0000X034,YENLO PHASE I AND II,402-451 NORTH YENLO STREET,WASILLA,AK,99654,AK-99-99,61.583096,-149.437637,,,83080.0,02XXXXXXXXX,02XXXXXXXXX,2170000800,2,170,,,,37.0,,,,,,,9.0,,,9999,9999.0,,,,,,,,,0.0,,0.0,,0.0,,0.0,,,0.0,,0.0,,,0.0,,,,,,,,,,2.0,37.0,37.0,,,,,,,,U
2,AKA19890010,PARK WEST APTS,2012 SANDVIK ST,FAIRBANKS,AK,99709,AK-89-00001,64.851646,-147.803421,1080.0,16750.0,16750.0,02090000600,02090000600,2090000600,2,90,2.0,,,83.0,81.0,0.0,41.0,42.0,0.0,0.0,,,,1989,1989.0,2.0,2.0,2.0,,,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,4.0,1.0,83.0,81.0,1.0,,2.0,,,,,X
3,AKA19900005,TYSON'S TERRACE,103 BURKHART DR,SITKA,AK,99835,AK-90-00001,57.048874,-135.303024,3040.0,70540.0,70540.0,02220967500,02220000100,2220000100,2,220,2.0,,,16.0,16.0,0.0,16.0,0.0,0.0,0.0,,,,1990,1990.0,2.0,2.0,2.0,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,1.0,16.0,16.0,1.0,,2.0,,,,,X
4,AKA19910005,NORTHWOOD APTS,190 PARKWOOD CIR,SOLDOTNA,AK,99669,AK-91-00001,60.489147,-151.073853,2810.0,65345.0,71640.0,02122953200,02122000500,2122000500,2,122,2.0,,,23.0,22.0,0.0,23.0,0.0,0.0,0.0,,,,1991,1991.0,2.0,2.0,2.0,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,1.0,23.0,22.0,1.0,,2.0,,,,,X


In [14]:
df.columns

Index(['hud_id', 'project', 'proj_add', 'proj_cty', 'proj_st', 'proj_zip',
       'state_id', 'latitude', 'longitude', 'place1990', 'place2000',
       'place2010', 'fips1990', 'fips2000', 'fips2010', 'st2010', 'cnty2010',
       'scattered_site_cd', 'resyndication_cd', 'allocamt', 'n_units',
       'li_units', 'n_0br', 'n_1br', 'n_2br', 'n_3br', 'n_4br', 'inc_ceil',
       'low_ceil', 'ceilunit', 'yr_pis', 'yr_alloc', 'non_prof', 'basis',
       'bond', 'mff_ra', 'fmha_514', 'fmha_515', 'fmha_538', 'home',
       'home_amt', 'tcap', 'tcap_amt', 'cdbg', 'cdbg_amt', 'htf', 'htf_amt',
       'fha', 'hopevi', 'hpvi_amt', 'tcep', 'tcep_amt', 'rad', 'qozf',
       'qozf_amt', 'rentassist', 'trgt_pop', 'trgt_fam', 'trgt_eld',
       'trgt_dis', 'trgt_hml', 'trgt_other', 'trgt_spc', 'type', 'credit',
       'n_unitsr', 'li_unitr', 'metro', 'dda', 'qct', 'nonprog', 'nlm_reason',
       'nlm_spc', 'datanote', 'record_stat'],
      dtype='object')

In [16]:
# Remove unnecessary columns

keepCols = ['hud_id', 'proj_cty', 'proj_st', 'proj_zip',
       'state_id', 'latitude', 'longitude', 'place1990', 'place2000',
       'place2010', 'fips1990', 'fips2000', 'fips2010', 'st2010', 'cnty2010',
       'scattered_site_cd', 'resyndication_cd', 'allocamt', 'n_units',
       'li_units', 'n_0br', 'n_1br', 'n_2br', 'n_3br', 'n_4br', 'inc_ceil',
       'low_ceil', 'ceilunit', 'yr_pis', 'yr_alloc', 'non_prof', 'basis',
       'bond', 'mff_ra', 'fmha_514', 'fmha_515', 'fmha_538', 'home',
       'home_amt', 'tcap', 'tcap_amt', 'cdbg', 'cdbg_amt', 'htf', 'htf_amt',
       'fha', 'hopevi', 'hpvi_amt', 'tcep', 'tcep_amt', 'rad', 'qozf',
       'qozf_amt', 'rentassist', 'trgt_pop', 'trgt_fam', 'trgt_eld',
       'trgt_dis', 'trgt_hml', 'trgt_other', 'trgt_spc', 'type', 'credit',
       'n_unitsr', 'li_unitr', 'metro', 'dda', 'qct']

df.drop(columns=[col for col in keepCols if col not in keepCols], inplace=True)

In [17]:
df.head()

Unnamed: 0,hud_id,project,proj_add,proj_cty,proj_st,proj_zip,state_id,latitude,longitude,place1990,place2000,place2010,fips1990,fips2000,fips2010,st2010,cnty2010,scattered_site_cd,resyndication_cd,allocamt,n_units,li_units,n_0br,n_1br,n_2br,n_3br,n_4br,inc_ceil,low_ceil,ceilunit,yr_pis,yr_alloc,non_prof,basis,bond,mff_ra,fmha_514,fmha_515,fmha_538,home,home_amt,tcap,tcap_amt,cdbg,cdbg_amt,htf,htf_amt,fha,hopevi,hpvi_amt,tcep,tcep_amt,rad,qozf,qozf_amt,rentassist,trgt_pop,trgt_fam,trgt_eld,trgt_dis,trgt_hml,trgt_other,trgt_spc,type,credit,n_unitsr,li_unitr,metro,dda,qct,nonprog,nlm_reason,nlm_spc,datanote,record_stat
0,AKA0000X018,"GATEWAY-SEWARD ASSOCIATES, LTD PTN",1810 PHOENIX ROAD,SEWARD,AK,99664,AK-99-99,60.125469,-149.44606,,,68560.0,02XXXXXXXXX,02XXXXXXXXX,2122001300,2,122,,,,20.0,,,,,,,9.0,,,9999,9999.0,,,,,,,,,0.0,,0.0,,0.0,,0.0,,,0.0,,0.0,,,0.0,,,,,,,,,,1.0,20.0,20.0,,,,,,,,X
1,AKA0000X034,YENLO PHASE I AND II,402-451 NORTH YENLO STREET,WASILLA,AK,99654,AK-99-99,61.583096,-149.437637,,,83080.0,02XXXXXXXXX,02XXXXXXXXX,2170000800,2,170,,,,37.0,,,,,,,9.0,,,9999,9999.0,,,,,,,,,0.0,,0.0,,0.0,,0.0,,,0.0,,0.0,,,0.0,,,,,,,,,,2.0,37.0,37.0,,,,,,,,U
2,AKA19890010,PARK WEST APTS,2012 SANDVIK ST,FAIRBANKS,AK,99709,AK-89-00001,64.851646,-147.803421,1080.0,16750.0,16750.0,02090000600,02090000600,2090000600,2,90,2.0,,,83.0,81.0,0.0,41.0,42.0,0.0,0.0,,,,1989,1989.0,2.0,2.0,2.0,,,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,4.0,1.0,83.0,81.0,1.0,,2.0,,,,,X
3,AKA19900005,TYSON'S TERRACE,103 BURKHART DR,SITKA,AK,99835,AK-90-00001,57.048874,-135.303024,3040.0,70540.0,70540.0,02220967500,02220000100,2220000100,2,220,2.0,,,16.0,16.0,0.0,16.0,0.0,0.0,0.0,,,,1990,1990.0,2.0,2.0,2.0,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,1.0,16.0,16.0,1.0,,2.0,,,,,X
4,AKA19910005,NORTHWOOD APTS,190 PARKWOOD CIR,SOLDOTNA,AK,99669,AK-91-00001,60.489147,-151.073853,2810.0,65345.0,71640.0,02122953200,02122000500,2122000500,2,122,2.0,,,23.0,22.0,0.0,23.0,0.0,0.0,0.0,,,,1991,1991.0,2.0,2.0,2.0,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,1.0,23.0,22.0,1.0,,2.0,,,,,X


In [59]:
# Assess missingness

for k, v in dict(np.round((np.sum(df.isnull(), axis=0).sort_values(ascending=False) / df.shape[0]) * 100, 2)).items():
    print(f"{k}: {v}")

nlm_spc: 99.95
nlm_reason: 99.28
trgt_spc: 94.13
qozf: 92.03
rad: 86.97
htf: 86.04
resyndication_cd: 83.98
nonprog: 82.37
htf_amt: 77.9
qozf_amt: 77.9
datanote: 77.7
tcap: 72.32
mff_ra: 69.04
ceilunit: 68.84
dda: 65.96
fmha_538: 63.49
tcep: 62.87
tcap_amt: 62.14
hopevi: 55.86
fmha_514: 54.34
tcep_amt: 52.89
trgt_hml: 52.64
rentassist: 51.76
low_ceil: 51.5
trgt_other: 50.4
trgt_dis: 49.2
trgt_eld: 48.41
hpvi_amt: 48.41
fha: 48.0
trgt_fam: 44.67
cdbg: 43.76
home: 40.18
place1990: 38.96
cdbg_amt: 38.13
allocamt: 36.6
inc_ceil: 34.76
home_amt: 32.86
trgt_pop: 30.97
bond: 26.47
place2000: 25.43
credit: 23.28
basis: 20.6
n_4br: 20.53
n_0br: 20.27
fmha_515: 20.01
non_prof: 19.86
n_3br: 19.41
n_1br: 18.84
n_2br: 18.76
type: 12.69
scattered_site_cd: 12.0
qct: 7.21
proj_zip: 5.98
li_units: 5.37
place2010: 4.82
longitude: 4.64
latitude: 4.64
metro: 2.2
proj_add: 1.81
state_id: 1.08
n_units: 0.67
n_unitsr: 0.26
li_unitr: 0.26
proj_cty: 0.08
yr_alloc: 0.0
hud_id: 0.0
project: 0.0
yr_pis: 0.0
cnty20

In [61]:
df[df['proj_st'] == 'CA'].shape[0]  # Number of projects in CA

4602

In [80]:
viewAll(df[df['proj_st'] == 'CA']['proj_cty'].value_counts())

LOS ANGELES                  586
SAN FRANCISCO                196
SAN JOSE                     161
SAN DIEGO                    157
SACRAMENTO                   133
OAKLAND                      130
FRESNO                        79
BAKERSFIELD                   59
ANAHEIM                       47
SANTA ROSA                    44
STOCKTON                      43
LONG BEACH                    40
SANTA MONICA                  30
SALINAS                       29
HAYWARD                       29
LANCASTER                     28
SANTA BARBARA                 27
OXNARD                        26
BERKELEY                      26
FREMONT                       26
SANTA ANA                     24
VENTURA                       22
RICHMOND                      22
ESCONDIDO                     22
ELK GROVE                     21
SAN MARCOS                    21
MERCED                        21
RIVERSIDE                     20
CHULA VISTA                   19
IRVINE                        19
MORGAN HIL