LIHTC Data Prep Notebook

<b>Author</b>: Phu Dang

<b>Mentor</b>: Dr. Feiyang Sun

<b>Date</b>: January 16, 2024

**Purpose**: Apply ML to see which factors seem to influence the success/longevity of LIHTC projects, then use the weights to develop 
a metric that measures project success to develop a user-friendly visualization tool/dashboard that shows where these projects are. 

**Context**:

Analysis approach based on the desired criteria of Qualified Allocation Plans (QAPs)

The QAP is a document that states, and a few local agencies, must develop in order to distribute federal Low
Income Housing Tax Credits (LIHTCs), which can be awarded only to a building that fits the QAP’s priorities
and criteria. Each QAP must spell out a housing finance agency’s (HFA’s) priorities and specify the criteria it
will use to select projects competing for tax credits. The priorities must be appropriate to local conditions.
The QAP must also give preference to projects:
- Serving residents with the lowest income;
- Serving income-eligible residents for the longest period of time; and,
- Located in qualified census tracts (QCTs) or difficult development areas (DDAs), as long as the project
contributes to a concerted community revitalization plan. QCTs are census tracts with a poverty rate of
25% or in which 50% of the households have incomes below 60% of the area median income (AMI). DDAs
are areas in which construction, land, and utility costs are high relative to incomes.

The QAP selection criteria must address 10 items: (1) location; (2) housing needs; (3) public housing waiting
lists; (4) individuals with children; (5) special needs populations; (6) whether a project includes the use of
existing housing as part of a community revitalization plan; (7) project sponsor characteristics; (8) projects
intended for eventual tenant ownership; (9) energy efficiency; and (10) historic nature.

(Source: https://nlihc.org/sites/default/files/2014AG-259.pdf)

In [268]:
import pandas as pd
import numpy as np
from collections import defaultdict

import warnings
warnings.filterwarnings("ignore")

In [269]:
pd.set_option('display.max_columns', None)

In [270]:
def viewAll(status=False):

    if status:
        pd.set_option('display.max_rows', None)
    else:
        pd.set_option('display.max_rows', 11)

    return None

In [271]:
# Import LIHTC dataset

df = pd.read_csv("data/LIHTCPUB.csv")

In [272]:
df.head()

Unnamed: 0,hud_id,project,proj_add,proj_cty,proj_st,proj_zip,state_id,latitude,longitude,place1990,place2000,place2010,fips1990,fips2000,fips2010,st2010,cnty2010,scattered_site_cd,resyndication_cd,allocamt,n_units,li_units,n_0br,n_1br,n_2br,n_3br,n_4br,inc_ceil,low_ceil,ceilunit,yr_pis,yr_alloc,non_prof,basis,bond,mff_ra,fmha_514,fmha_515,fmha_538,home,home_amt,tcap,tcap_amt,cdbg,cdbg_amt,htf,htf_amt,fha,hopevi,hpvi_amt,tcep,tcep_amt,rad,qozf,qozf_amt,rentassist,trgt_pop,trgt_fam,trgt_eld,trgt_dis,trgt_hml,trgt_other,trgt_spc,type,credit,n_unitsr,li_unitr,metro,dda,qct,nonprog,nlm_reason,nlm_spc,datanote,record_stat
0,AKA0000X018,"GATEWAY-SEWARD ASSOCIATES, LTD PTN",1810 PHOENIX ROAD,SEWARD,AK,99664,AK-99-99,60.125469,-149.44606,,,68560.0,02XXXXXXXXX,02XXXXXXXXX,2122001300,2,122,,,,20.0,,,,,,,9.0,,,9999,9999.0,,,,,,,,,0.0,,0.0,,0.0,,0.0,,,0.0,,0.0,,,0.0,,,,,,,,,,1.0,20.0,20.0,,,,,,,,X
1,AKA0000X034,YENLO PHASE I AND II,402-451 NORTH YENLO STREET,WASILLA,AK,99654,AK-99-99,61.583096,-149.437637,,,83080.0,02XXXXXXXXX,02XXXXXXXXX,2170000800,2,170,,,,37.0,,,,,,,9.0,,,9999,9999.0,,,,,,,,,0.0,,0.0,,0.0,,0.0,,,0.0,,0.0,,,0.0,,,,,,,,,,2.0,37.0,37.0,,,,,,,,U
2,AKA19890010,PARK WEST APTS,2012 SANDVIK ST,FAIRBANKS,AK,99709,AK-89-00001,64.851646,-147.803421,1080.0,16750.0,16750.0,02090000600,02090000600,2090000600,2,90,2.0,,,83.0,81.0,0.0,41.0,42.0,0.0,0.0,,,,1989,1989.0,2.0,2.0,2.0,,,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,4.0,1.0,83.0,81.0,1.0,,2.0,,,,,X
3,AKA19900005,TYSON'S TERRACE,103 BURKHART DR,SITKA,AK,99835,AK-90-00001,57.048874,-135.303024,3040.0,70540.0,70540.0,02220967500,02220000100,2220000100,2,220,2.0,,,16.0,16.0,0.0,16.0,0.0,0.0,0.0,,,,1990,1990.0,2.0,2.0,2.0,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,1.0,16.0,16.0,1.0,,2.0,,,,,X
4,AKA19910005,NORTHWOOD APTS,190 PARKWOOD CIR,SOLDOTNA,AK,99669,AK-91-00001,60.489147,-151.073853,2810.0,65345.0,71640.0,02122953200,02122000500,2122000500,2,122,2.0,,,23.0,22.0,0.0,23.0,0.0,0.0,0.0,,,,1991,1991.0,2.0,2.0,2.0,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,1.0,23.0,22.0,1.0,,2.0,,,,,X


In [273]:
df.columns

Index(['hud_id', 'project', 'proj_add', 'proj_cty', 'proj_st', 'proj_zip',
       'state_id', 'latitude', 'longitude', 'place1990', 'place2000',
       'place2010', 'fips1990', 'fips2000', 'fips2010', 'st2010', 'cnty2010',
       'scattered_site_cd', 'resyndication_cd', 'allocamt', 'n_units',
       'li_units', 'n_0br', 'n_1br', 'n_2br', 'n_3br', 'n_4br', 'inc_ceil',
       'low_ceil', 'ceilunit', 'yr_pis', 'yr_alloc', 'non_prof', 'basis',
       'bond', 'mff_ra', 'fmha_514', 'fmha_515', 'fmha_538', 'home',
       'home_amt', 'tcap', 'tcap_amt', 'cdbg', 'cdbg_amt', 'htf', 'htf_amt',
       'fha', 'hopevi', 'hpvi_amt', 'tcep', 'tcep_amt', 'rad', 'qozf',
       'qozf_amt', 'rentassist', 'trgt_pop', 'trgt_fam', 'trgt_eld',
       'trgt_dis', 'trgt_hml', 'trgt_other', 'trgt_spc', 'type', 'credit',
       'n_unitsr', 'li_unitr', 'metro', 'dda', 'qct', 'nonprog', 'nlm_reason',
       'nlm_spc', 'datanote', 'record_stat'],
      dtype='object')

In [274]:
# Remove unnecessary columns

keepCols = ['hud_id', 'proj_cty', 'proj_st', 'proj_zip',
       'state_id', 'latitude', 'longitude', 'place1990', 'place2000',
       'place2010', 'fips1990', 'fips2000', 'fips2010', 'st2010', 'cnty2010',
       'scattered_site_cd', 'resyndication_cd', 'allocamt', 'n_units',
       'li_units', 'n_0br', 'n_1br', 'n_2br', 'n_3br', 'n_4br', 'inc_ceil',
       'low_ceil', 'ceilunit', 'yr_pis', 'yr_alloc', 'non_prof', 'basis',
       'bond', 'mff_ra', 'fmha_514', 'fmha_515', 'fmha_538', 'home',
       'home_amt', 'tcap', 'tcap_amt', 'cdbg', 'cdbg_amt', 'htf', 'htf_amt',
       'fha', 'hopevi', 'hpvi_amt', 'tcep', 'tcep_amt', 'rad', 'qozf',
       'qozf_amt', 'rentassist', 'trgt_pop', 'trgt_fam', 'trgt_eld',
       'trgt_dis', 'trgt_hml', 'trgt_other', 'trgt_spc', 'type', 'credit',
       'n_unitsr', 'li_unitr', 'metro', 'dda', 'qct']

df.drop(columns=[col for col in keepCols if col not in keepCols], inplace=True)

In [275]:
df.head()

Unnamed: 0,hud_id,project,proj_add,proj_cty,proj_st,proj_zip,state_id,latitude,longitude,place1990,place2000,place2010,fips1990,fips2000,fips2010,st2010,cnty2010,scattered_site_cd,resyndication_cd,allocamt,n_units,li_units,n_0br,n_1br,n_2br,n_3br,n_4br,inc_ceil,low_ceil,ceilunit,yr_pis,yr_alloc,non_prof,basis,bond,mff_ra,fmha_514,fmha_515,fmha_538,home,home_amt,tcap,tcap_amt,cdbg,cdbg_amt,htf,htf_amt,fha,hopevi,hpvi_amt,tcep,tcep_amt,rad,qozf,qozf_amt,rentassist,trgt_pop,trgt_fam,trgt_eld,trgt_dis,trgt_hml,trgt_other,trgt_spc,type,credit,n_unitsr,li_unitr,metro,dda,qct,nonprog,nlm_reason,nlm_spc,datanote,record_stat
0,AKA0000X018,"GATEWAY-SEWARD ASSOCIATES, LTD PTN",1810 PHOENIX ROAD,SEWARD,AK,99664,AK-99-99,60.125469,-149.44606,,,68560.0,02XXXXXXXXX,02XXXXXXXXX,2122001300,2,122,,,,20.0,,,,,,,9.0,,,9999,9999.0,,,,,,,,,0.0,,0.0,,0.0,,0.0,,,0.0,,0.0,,,0.0,,,,,,,,,,1.0,20.0,20.0,,,,,,,,X
1,AKA0000X034,YENLO PHASE I AND II,402-451 NORTH YENLO STREET,WASILLA,AK,99654,AK-99-99,61.583096,-149.437637,,,83080.0,02XXXXXXXXX,02XXXXXXXXX,2170000800,2,170,,,,37.0,,,,,,,9.0,,,9999,9999.0,,,,,,,,,0.0,,0.0,,0.0,,0.0,,,0.0,,0.0,,,0.0,,,,,,,,,,2.0,37.0,37.0,,,,,,,,U
2,AKA19890010,PARK WEST APTS,2012 SANDVIK ST,FAIRBANKS,AK,99709,AK-89-00001,64.851646,-147.803421,1080.0,16750.0,16750.0,02090000600,02090000600,2090000600,2,90,2.0,,,83.0,81.0,0.0,41.0,42.0,0.0,0.0,,,,1989,1989.0,2.0,2.0,2.0,,,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,4.0,1.0,83.0,81.0,1.0,,2.0,,,,,X
3,AKA19900005,TYSON'S TERRACE,103 BURKHART DR,SITKA,AK,99835,AK-90-00001,57.048874,-135.303024,3040.0,70540.0,70540.0,02220967500,02220000100,2220000100,2,220,2.0,,,16.0,16.0,0.0,16.0,0.0,0.0,0.0,,,,1990,1990.0,2.0,2.0,2.0,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,1.0,16.0,16.0,1.0,,2.0,,,,,X
4,AKA19910005,NORTHWOOD APTS,190 PARKWOOD CIR,SOLDOTNA,AK,99669,AK-91-00001,60.489147,-151.073853,2810.0,65345.0,71640.0,02122953200,02122000500,2122000500,2,122,2.0,,,23.0,22.0,0.0,23.0,0.0,0.0,0.0,,,,1991,1991.0,2.0,2.0,2.0,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,1.0,23.0,22.0,1.0,,2.0,,,,,X


In [276]:
# Assess missingness and get attributes with >= 70% missing

discardAttr = []
for k, v in dict(np.round((np.sum(df.isnull(), axis=0).sort_values(ascending=False) / df.shape[0]) * 100, 2)).items():
    print(f"{k}: {v}")
    if v >= 70: 
        discardAttr.append(k)

nlm_spc: 99.95
nlm_reason: 99.28
trgt_spc: 94.13
qozf: 92.03
rad: 86.97
htf: 86.04
resyndication_cd: 83.98
nonprog: 82.37
htf_amt: 77.9
qozf_amt: 77.9
datanote: 77.7
tcap: 72.32
mff_ra: 69.04
ceilunit: 68.84
dda: 65.96
fmha_538: 63.49
tcep: 62.87
tcap_amt: 62.14
hopevi: 55.86
fmha_514: 54.34
tcep_amt: 52.89
trgt_hml: 52.64
rentassist: 51.76
low_ceil: 51.5
trgt_other: 50.4
trgt_dis: 49.2
trgt_eld: 48.41
hpvi_amt: 48.41
fha: 48.0
trgt_fam: 44.67
cdbg: 43.76
home: 40.18
place1990: 38.96
cdbg_amt: 38.13
allocamt: 36.6
inc_ceil: 34.76
home_amt: 32.86
trgt_pop: 30.97
bond: 26.47
place2000: 25.43
credit: 23.28
basis: 20.6
n_4br: 20.53
n_0br: 20.27
fmha_515: 20.01
non_prof: 19.86
n_3br: 19.41
n_1br: 18.84
n_2br: 18.76
type: 12.69
scattered_site_cd: 12.0
qct: 7.21
proj_zip: 5.98
li_units: 5.37
place2010: 4.82
longitude: 4.64
latitude: 4.64
metro: 2.2
proj_add: 1.81
state_id: 1.08
n_units: 0.67
n_unitsr: 0.26
li_unitr: 0.26
proj_cty: 0.08
yr_alloc: 0.0
hud_id: 0.0
project: 0.0
yr_pis: 0.0
cnty20

In [277]:
# Remove attrs with over 70% missing values

print(discardAttr)
if discardAttr[0] in df.columns:
    df.drop(columns=discardAttr, inplace=True)

['nlm_spc', 'nlm_reason', 'trgt_spc', 'qozf', 'rad', 'htf', 'resyndication_cd', 'nonprog', 'htf_amt', 'qozf_amt', 'datanote', 'tcap']


In [278]:
df[df['proj_st'] == 'CA'].shape[0]  # Number of projects in CA

4602

In [279]:
df[df['proj_st'] == 'CA']['proj_cty'].value_counts()

LOS ANGELES                  586
SAN FRANCISCO                196
SAN JOSE                     161
SAN DIEGO                    157
SACRAMENTO                   133
OAKLAND                      130
FRESNO                        79
BAKERSFIELD                   59
ANAHEIM                       47
SANTA ROSA                    44
STOCKTON                      43
LONG BEACH                    40
SANTA MONICA                  30
SALINAS                       29
HAYWARD                       29
LANCASTER                     28
SANTA BARBARA                 27
OXNARD                        26
BERKELEY                      26
FREMONT                       26
SANTA ANA                     24
VENTURA                       22
RICHMOND                      22
ESCONDIDO                     22
ELK GROVE                     21
SAN MARCOS                    21
MERCED                        21
RIVERSIDE                     20
CHULA VISTA                   19
IRVINE                        19
MORGAN HIL

#### Calculate attribute proportions at the CA and San Diego Metro levels

In [280]:
attrs = ['allocamt', 'li_units', 'n_0br', 'n_1br', 'n_2br', 'n_3br', 'n_4br', \
    'inc_ceil', 'home_amt', 'trgt', 'type', 'credit', 'dda', 'yr_alloc']

In [281]:
# Turn 9s under inc_ceil to NaNs
df['inc_ceil'] = df['inc_ceil'].replace(9, np.nan)

# Replace all zeros in trgt columns with NaNs
for attr in df.columns:
    if 'trgt' in attr:
        df[attr] = df[attr].replace(0, np.nan)

df['yr_alloc'] = [str(x)[:4] if not np.isnan(x) else x for x in df['yr_alloc']]
df['yr_alloc'] = df['yr_alloc'].replace(['8888', '9999'], np.nan)

df = df.fillna('missing')

In [282]:
# Get CA and SD sub-datasets

ca = df[df['proj_st'] == 'CA']

sdCities = ['del mar', 'chula vista', 'coronado', 'carlsbad', 'el cajon', 'encinitas', \
    'escondido', 'imperial beach', 'la mesa', 'lemon grove', 'national city', \
    'oceanside', 'poway', 'san diego', 'san marcos', 'santee', 'solana beach', 'vista']

sd = df[df['proj_cty'].str.lower().isin(sdCities)]

In [283]:
# Get proportions for categorical/ordinal variables using double indexes

# Get lists of indexes to be paired up
outerIndexes = []
innerIndexes = []
for attr in attrs:

    if attr not in ['inc_ceil', 'trgt', 'type', 'credit', 'dda', 'yr_alloc']:
        continue

    elif 'trgt' in attr:
        cols = df.loc[:,df.columns.str.contains('trgt')].columns
        for col in cols:
            count = 0
            vals = df[col].unique().tolist()
            vals.remove('missing')
            for val in sorted(vals):
                if np.isnan(val): continue
                innerIndexes.append(val)
                count += 1
            if 'missing' in df[col].unique():
                innerIndexes.append('missing')
                count += 1
            outerIndexes += [col]*count
        continue

    count = 0
    vals = df[attr].unique().tolist()
    vals.remove('missing')
    for val in sorted(vals):
        try:
            if np.isnan(val): continue
        except:
            continue
        innerIndexes.append(val)
        count += 1

    if 'missing' in df[attr].unique():
        innerIndexes.append('missing')
        count += 1

    if attr == 'yr_alloc':
        yearVals = sorted(df['yr_alloc'].unique().tolist())
        for yr in yearVals:
            if yr == 'missing': continue
            else: innerIndexes.append(yr)
        count += len(yearVals)-1

    outerIndexes += [attr]*count


In [284]:
# sanity check

len(outerIndexes) == len(innerIndexes)

True

In [285]:
outerIndexes[:19]

['inc_ceil',
 'inc_ceil',
 'inc_ceil',
 'inc_ceil',
 'trgt_pop',
 'trgt_pop',
 'trgt_pop',
 'trgt_fam',
 'trgt_fam',
 'trgt_fam',
 'trgt_eld',
 'trgt_eld',
 'trgt_eld',
 'trgt_dis',
 'trgt_dis',
 'trgt_dis',
 'trgt_hml',
 'trgt_hml',
 'trgt_hml']

In [286]:
# sanity check

pd.Series(outerIndexes).value_counts()

yr_alloc      37
dda            6
type           5
credit         5
inc_ceil       4
trgt_pop       3
trgt_fam       3
trgt_eld       3
trgt_dis       3
trgt_hml       3
trgt_other     3
dtype: int64

In [302]:
# Get counts at CA and SD levels

sdCounts, caCounts, usCounts = [], [], []
for idx in range(len(outerIndexes)):
    pair = (outerIndexes[idx], innerIndexes[idx])

    try:
        sdCount = sd[pair[0]].value_counts()[pair[1]]
        sdCounts.append(int(sdCount))
    except: sdCounts.append(np.NaN)

    try:
        caCount = ca[pair[0]].value_counts()[pair[1]]
        caCounts.append(int(caCount))
    except: caCounts.append(np.NaN)

    try:
        usCount = df[pair[0]].value_counts()[pair[1]]
        usCounts.append(int(usCount))
    except: usCounts.append(np.NaN)

In [308]:
# Put everything into a table

arrays = [np.array(outerIndexes), np.array(innerIndexes)]
catTable = pd.DataFrame(index=arrays, data={'sd_cnts': sdCounts, \
                                            'ca_cnts': caCounts, 
                                            'us_cnts': usCounts})

catTable['sd_%_of_ca'] = np.round((catTable['sd_cnts'] / catTable['ca_cnts'])*100,2)
catTable['ca_%_of_us'] = np.round((catTable['ca_cnts'] / catTable['us_cnts'])*100,2)

In [305]:
pd.set_option('display.max_rows', None)

In [306]:
catTable

Unnamed: 0,Unnamed: 1,sd_cnts,ca_cnts,us_cnts,sd_%_of_ca,ca_%_of_us
inc_ceil,1.0,12.0,194.0,3803,6.19,5.1
inc_ceil,2.0,282.0,4189.0,26535,6.73,15.79
inc_ceil,3.0,2.0,40.0,283,5.0,14.13
inc_ceil,missing,24.0,179.0,21386,13.41,0.84
trgt_pop,1.0,218.0,3617.0,26108,6.03,13.85
trgt_pop,2.0,77.0,800.0,9790,9.62,8.17
trgt_pop,missing,25.0,185.0,16109,13.51,1.15
trgt_fam,1.0,151.0,2154.0,16579,7.01,12.99
trgt_fam,2.0,41.0,955.0,10309,4.29,9.26
trgt_fam,missing,128.0,1493.0,25119,8.57,5.94


In [291]:
# Change variable categorical/ordinal numbering to real meaning

incCeilMapping = {1: '50% AMGI', 2: '60% AMGI', 3: 'Income Average'}
trgtMappings = {1: 'Yes', 2: 'No', 0: 'blank'}
creditMappings = {1: '30% PV', 2: '70% PV', 3: "both", 4: 'TCEP only'}
ddaMappings = {0: 'Not in DDA', 1: 'In Metro DDA', 2: 'In Non-Metro DDA', \
    3: 'In Metro GO Zone DDA', 4: 'In Non-Metro GO Zone DDA'}
constrTypeMappings = {1: 'New construction', 2: 'Acquisition and Rehab', \
    3: 'Both NC and A/R', 4: 'Existing'}

newOuterIndexes = []
newInnerIndexes = []
for pair in catTable.index:

    if pair[1] == 'missing': 
        newOuterIndexes.append(pair[0])
        newInnerIndexes.append('missing')
        continue

    if pair[0] == 'inc_ceil':
        index = (pair[0], incCeilMapping[float(pair[1])])
    elif 'trgt' in pair[0]:
        index = (pair[0], trgtMappings[float(pair[1])])
    elif pair[0] == 'credit':
        index = (pair[0], creditMappings[float(pair[1])])
    elif pair[0] == 'dda':
        index = (pair[0], ddaMappings[float(pair[1])])
    elif pair[0] == 'type':
        index = (pair[0], constrTypeMappings[float(pair[1])])
    else:
        if pair[1] == 9999:
            index = (pair[0], 'missing')
        elif pair[1] == 8888:
            index = (pair[0], 'unconfirmed')
        else:
            index = (pair[0], str(pair[1])[:4])
    
    newOuterIndexes.append(index[0])
    newInnerIndexes.append(index[1])
    
catTable.index = [np.array(newOuterIndexes), np.array(newInnerIndexes)]


In [292]:
viewAll(True)
catTable

Unnamed: 0,Unnamed: 1,sd_cnts,ca_cnts,us_cnts,sd_%_of_ca,ca_%_of_us
inc_ceil,50% AMGI,12.0,194.0,3803,6.19,5.1
inc_ceil,60% AMGI,282.0,4189.0,26535,6.73,15.79
inc_ceil,Income Average,2.0,40.0,283,5.0,14.13
inc_ceil,missing,24.0,179.0,21386,13.41,0.84
trgt_pop,Yes,218.0,3617.0,26108,6.03,13.85
trgt_pop,No,77.0,800.0,9790,9.62,8.17
trgt_pop,missing,25.0,185.0,16109,13.51,1.15
trgt_fam,Yes,151.0,2154.0,16579,7.01,12.99
trgt_fam,No,41.0,955.0,10309,4.29,9.26
trgt_fam,missing,128.0,1493.0,25119,8.57,5.94


In [209]:
viewAll(False)

In [213]:
# Get proportions for continuous variables using double indexes

sd.replace('missing', np.NaN, inplace=True)
ca.replace('missing', np.NaN, inplace=True)
df.replace('missing', np.NaN, inplace=True)

attrs = ['allocamt', 'li_units', 'n_0br', 'n_1br', 'n_2br', 'n_3br', 'n_4br', 'home_amt']

# Get counts at CA and SD levels

sdCounts, caCounts, usCounts = [], [], []
for attr in attrs:
    sdCounts.append(np.round(np.sum(sd[attr]),0))
    caCounts.append(np.round(np.sum(ca[attr]),0))
    usCounts.append(np.round(np.sum(df[attr]),0))


In [214]:
conTable = pd.DataFrame(index=attrs, data={'sd_amts': sdCounts, \
                                            'ca_amts': caCounts, 
                                            'us_amts': usCounts})

In [215]:
conTable['sd_%_of_ca'] = np.round((conTable['sd_amts'] / conTable['ca_amts'])*100,2)
conTable['ca_%_of_us'] = np.round((conTable['ca_amts'] / conTable['us_amts'])*100,2)

In [216]:
conTable

Unnamed: 0,sd_amts,ca_amts,us_amts,sd_%_of_ca,ca_%_of_us
allocamt,248487534.0,3306434000.0,19459160000.0,7.52,16.99
li_units,30706.0,357483.0,2995641.0,8.59,11.93
n_0br,2900.0,34302.0,152387.0,8.45,22.51
n_1br,7223.0,114156.0,938306.0,6.33,12.17
n_2br,10876.0,103644.0,1053935.0,10.49,9.83
n_3br,6222.0,61382.0,475306.0,10.14,12.91
n_4br,474.0,10530.0,68088.0,4.5,15.47
home_amt,34409358.0,1035542000.0,4387805000.0,3.32,23.6


In [309]:
# sanity check

34409358.0 / 1.035542e+09 * 100

3.3228355778906122

In [313]:
wp = "C:/Users/phuro/UCSD/ULI/H2H/ULI-UCSD_H2H/LIHTC_analysis/data/LIHTC/sumTableCategorical.csv"

catTable.to_csv(wp)

In [314]:
wp = "C:/Users/phuro/UCSD/ULI/H2H/ULI-UCSD_H2H/LIHTC_analysis/data/LIHTC/sumTableNumerical.csv"

conTable.to_csv(wp)

In [319]:
viewAll(False)

In [322]:
df.replace('missing', np.NaN, inplace=True)

In [None]:
wp = "C:/Users/phuro/UCSD/ULI/H2H/ULI-UCSD_H2H/LIHTC_analysis/data/cleanLIHTC.csv"

df.to_csv(wp)

#### Unused code archive

In [2]:
# # Get proportions for categorical/ordinal variables using double indexes

# # Get lists of indexes to be paired up
# outerIndexes = []
# innerIndexes = []
# for attr in attrs:

#     if (attr not in ['inc_ceil', 'yr_alloc', 'credit', 'dda']) or ('trgt' not in attr):
#         continue

# # Cast yr_alloc (year allocation) from float to string
# df['yr_alloc'] = [str(y)[:4] for y in df['yr_alloc']]
