# Data Exploration: Los Angeles Affordable Housing Projects

## Introduction

The California Tax Credit Allocation Committee (TCAC) keeps track of all the multifamily housing projects developed using Low Income Housing Tax Credits (LIHTC) in California ([Source](https://www.treasurer.ca.gov/ctcac/projects.asp)). From this data set I pulled out the Los Angeles data to begin exploring. I've uploaded the file to the data folder, but we need to import the pandas library first to be able to read it.

In [1]:
# import pandas so we can process the csv file more easily.
import pandas as pd

Now we can load the data into a variable so we can use it later.

In [5]:
# create a variable called projects and fill it with the data from the csv file.
projects = pd.read_csv('data/LAProjects.csv')

## Look at the data
To get a peek at the data, we can look at the "info" of the file for a preview of what it contains.

In [15]:
# give me the overall info about this file
projects.info

<bound method DataFrame.info of      Application Number  Unnamed: 1 Type of tax credit funding  \
0           CA-2000-858         NaN                         4%   
1           CA-2005-863         NaN                         4%   
2           CA-1992-901         NaN                         4%   
3           CA-2003-819         NaN                         4%   
4           CA-2010-830         NaN                         4%   
...                 ...         ...                        ...   
1301        CA-2009-856         NaN                         4%   
1302        CA-2013-825         NaN                         4%   
1303        CA-1995-009         NaN                         9%   
1304        CA-2014-876         NaN                         4%   
1305        CA-2016-875         NaN                         4%   

                       Project Name  \
0                 Main Street Plaza   
1         Wysong Village Apartments   
2        Altadena Vistas Apartments   
3          Heritage
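
The output above is the DataFrame's own printed form rather than a summary, because `projects.info` without parentheses only refers to the method and does not call it. A minimal sketch of the difference, using a tiny made-up table (`demo` is a hypothetical stand-in for `projects`, not the real data):

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for the real projects table.
demo = pd.DataFrame({
    "Project Name": ["Main Street Plaza", "Wysong Village Apartments"],
    "Total Units": [110.0, 95.0],
})

# Without parentheses this is only a reference to the method; nothing is summarized.
method_ref = demo.info
print(callable(method_ref))

# With parentheses the method runs and reports column names, non-null counts and dtypes.
# The buffer is used here only so the summary text can be captured and inspected.
buffer = io.StringIO()
demo.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```

In the notebook itself, `projects.info()` on its own line should be enough to print the summary.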

Now let's see what its "shape" is.

In [14]:
# show the (rows, columns)
projects.shape

(1306, 68)

And finally, we can take a peek at the first few rows of the data, its "head".

In [12]:
# show me the first 5 rows
projects.head()

Unnamed: 0,Application Number,Unnamed: 1,Type of tax credit funding,Project Name,Project Address,Project City,Project Zip Code,Project Phone Number,Project County,California Assembly District,...,Management Company Fax,Developer,Annual Federal Award,Total State Award,Unnamed: 62,Unnamed: 63,Unnamed: 64,Unnamed: 65,Unnamed: 66,Unnamed: 67
0,CA-2000-858,,4%,Main Street Plaza,"333 West Main Street, Alhambra, CA 918017427",Alhambra,91801,626-289-5800,Los Angeles,49.0,...,310-432-0888,,"$486,869",$0,,,,,,
1,CA-2005-863,,4%,Wysong Village Apartments,"111 North Chapel Avenue, Alhambra, CA 918010000",Alhambra,91801,626-284-3956,Los Angeles,49.0,...,614-273-2154,National Church Residences,"$293,106",$0,,,,,,
2,CA-1992-901,,4%,Altadena Vistas Apartments,"815 E. Calaveras Street, Altadena, CA 91001",Altadena,91001,323-734-2111,Los Angeles,41.0,...,310-358-3494,L.A. Community Development Commission,"$74,027",$0,,,,,,
3,CA-2003-819,,4%,Heritage Park at Arcadia,"150 West Las Tunas Drive, Arcadia, CA 91007",Arcadia,91007,626-821-9048,Los Angeles,49.0,...,9167730529,"American Senior Living, Inc.","$295,337",$0,,,,,,
4,CA-2010-830,,4%,Campus Commons,"16 Campus Drive, Arcadia, CA 91007",Arcadia,91007,626.445.7017,Los Angeles,49.0,...,559.476.5538,"Ashwood Construction, Inc","$383,390",$0,,,,,,
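
The `Unnamed:` columns in this preview hold only missing values in the rows shown. Assuming they are empty all the way down (which five rows cannot confirm), one way to drop them could look like the sketch below. The `demo` table is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the table, with two completely empty columns.
demo = pd.DataFrame({
    "Application Number": ["CA-2000-858", "CA-2005-863"],
    "Unnamed: 1": [np.nan, np.nan],
    "Project City": ["Alhambra", "Alhambra"],
    "Unnamed: 62": [np.nan, np.nan],
})

# Drop only the columns in which every single value is missing.
cleaned = demo.dropna(axis="columns", how="all")
print(cleaned.columns.to_list())
```

Because `how="all"` only removes columns with no values at all, a column that holds even one real entry would be kept.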


## Data Exploration
Now, let's try to explore this data a little bit.

In [13]:
# Just show me the columns
projects.columns.to_list()

['Application Number',
 'Unnamed: 1',
 'Type of tax credit funding',
 'Project Name',
 'Project Address',
 'Project City',
 'Project Zip Code',
 'Project Phone Number',
 'Project County',
 'California Assembly District',
 'California Senate District',
 'Federal Congressional District',
 'Census Tract',
 "Assessor's Parcel Number (APN)",
 'Application Stage',
 'Placed in Service (PIS) Date ',
 'Last Building PIS Date',
 'Construction Type',
 'Housing Type',
 'Total Units',
 'Low Income Units',
 'Number of SRO/Studio Units',
 'Number of 1 Bedroom Units',
 'Number of 2 Bedroom Units',
 'Number of 3 Bedroom Units',
 'Number of 4 Bedroom Units',
 'Number of 5 Bedroom Units',
 'Number of 6 Bedroom Units',
 'Units at or below 20% AMI',
 'Units at or below 30% AMI',
 'Units at 35% AMI',
 'Units at 40% AMI',
 'Units at 45% AMI',
 'Units at 50% AMI',
 'Units at 55% AMI',
 'Units at 60% AMI',
 'Units at 70% AMI',
 'Units at 80% AMI',
 'Owner or Applicant Name',
 'Owner/Applicant Contact',
 'Owner

Let's start taking a look at what we have in here.

In [18]:
projects['Project City'].value_counts()

Los Angeles       806
Long Beach         48
Santa Monica       37
Lancaster          25
Pasadena           19
                 ... 
Woodland Hills      1
Santa Clarita       1
Sunland             1
Highland Park       1
Marina Del Rey      1
Name: Project City, Length: 100, dtype: int64

OK, this is a pretty big file. Let's trim the data down to something easier to manage and then take a look at the new set.

In [24]:
# create a new variable to store a subset of the columns from the original data set.
projects_trimmed = projects[['Type of tax credit funding',
 'Project Name',
 'Project Address',
 'Project City',
 'Project Zip Code',
 'Project County',
 'Census Tract',
 'Construction Type',
 'Housing Type',
 'Total Units',
 'Low Income Units',
 'Owner or Applicant Name',
 'Developer',
 'Annual Federal Award',
 'Total State Award',
]]
# show a preview of the first 5 rows.
projects_trimmed.head()

Unnamed: 0,Type of tax credit funding,Project Name,Project Address,Project City,Project Zip Code,Project County,Census Tract,Construction Type,Housing Type,Total Units,Low Income Units,Owner or Applicant Name,Developer,Annual Federal Award,Total State Award
0,4%,Main Street Plaza,"333 West Main Street, Alhambra, CA 918017427",Alhambra,91801,Los Angeles,4803.04,New Construction,Senior,110.0,109.0,Main Street Plaza/4th St. L.P.,,"$486,869",$0
1,4%,Wysong Village Apartments,"111 North Chapel Avenue, Alhambra, CA 918010000",Alhambra,91801,Los Angeles,4810.02,Acquisition/Rehab,Senior,95.0,94.0,Wysong Village Apartments LP,National Church Residences,"$293,106",$0
2,4%,Altadena Vistas Apartments,"815 E. Calaveras Street, Altadena, CA 91001",Altadena,91001,Los Angeles,4611.0,New Construction,Senior,22.0,22.0,Altadena Vistas Apartments Limited Part,L.A. Community Development Commission,"$74,027",$0
3,4%,Heritage Park at Arcadia,"150 West Las Tunas Drive, Arcadia, CA 91007",Arcadia,91007,Los Angeles,4316.0,New Construction,Senior,54.0,53.0,"Arcadia Heritage Park, L.P., a CA LP","American Senior Living, Inc.","$295,337",$0
4,4%,Campus Commons,"16 Campus Drive, Arcadia, CA 91007",Arcadia,91007,Los Angeles,4307.21,New Construction,Senior,43.0,42.0,"Arcadia Campus Commons Associates, a CA LP","Ashwood Construction, Inc","$383,390",$0


Let's start doing some summaries of the data. How many 9% vs 4% projects are there?

In [26]:
# Count the number of projects for each type of tax credit funding.
projects['Type of tax credit funding'].value_counts()

9%         606
4%         489
9% ARRA     16
4% ARRA      9
Name: Type of tax credit funding, dtype: int64
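
The four counts above add up to 1,120, while `projects.shape` reported 1,306 rows. One likely explanation (an assumption, not checked against the file) is that the remaining rows have no funding type recorded, since `value_counts()` leaves out missing values by default. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical funding column with two missing entries.
funding = pd.Series(
    ["9%", "4%", np.nan, "9%", "4% ARRA", np.nan],
    name="Type of tax credit funding",
)

# Default behaviour: missing values are silently skipped.
default_counts = funding.value_counts()

# dropna=False adds a NaN row so the counts cover every row.
all_counts = funding.value_counts(dropna=False)

# Direct count of the missing entries.
missing = funding.isna().sum()
print(all_counts)
```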

I'm not sure what these ARRA ones are all about (presumably the American Recovery and Reinvestment Act of 2009), but let's move on to Construction Type.

In [29]:
# Count the number of projects for each type of construction.
projects['Construction Type'].value_counts()

New Construction                                     689
Acquisition/Rehab                                    218
Acquisition/Rehabilitation                           181
Rehabilitation                                       109
Acquisition & Rehabilitation                          26
New Construction                                      19
Acquisition and Rehabilitation                         3
New Construction and Acquisition & Rehabilitation      2
New Construction / Adaptive Reuse                      1
Name: Construction Type, dtype: int64

Ugh, this column is a mess and needs to be cleaned up: the same construction type has been entered under several different spellings. I don't know how to do that yet, so I will just move on for now. What about the type of housing?

In [30]:
# Count the number of projects for each type of housing.
projects['Housing Type'].value_counts()

Large Family                  502
Senior                        184
Special Needs                 178
Non Targeted                  130
Non-Targeted                  109
Seniors                        68
At-Risk                        67
SRO                            63
Special Needs/SRO               2
New Construction                1
Special Needs/Large Family      1
non-targeted                    1
Name: Housing Type, dtype: int64
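
Both this column and Construction Type show the same pattern: one category entered under several spellings. A minimal sketch of one way such labels could be merged is below. The `canonical` mapping is hypothetical and the sample values are copied from the output above. Which spelling should win is a judgment call.

```python
import pandas as pd

# Sample labels copied from the value_counts output above.
housing = pd.Series(
    ["Non Targeted", "Non-Targeted", "non-targeted",
     "Senior", "Seniors", "Large Family", "Large Family"],
    name="Housing Type",
)

# Hypothetical mapping from lowercased variants to one chosen label.
canonical = {
    "non targeted": "Non-Targeted",
    "non-targeted": "Non-Targeted",
    "senior": "Senior",
    "seniors": "Senior",
}

# Strip stray whitespace and lowercase so the lookup ignores padding and case.
stripped = housing.str.strip()
lookup_key = stripped.str.lower()

# Map known variants; labels missing from the mapping keep their stripped original.
cleaned = lookup_key.map(canonical).fillna(stripped)
print(cleaned.value_counts())
```

The `str.strip()` step is there because padding is one possible reason for "New Construction" appearing twice in the Construction Type counts, though that is a guess from the output alone.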

Housing Type looks a little better but also needs some categories merged and cleaned up; there are a lot of near-duplicates (Senior and Seniors, for example). At the very least, Large Family seems to be a mostly consistent category, so let's look at that one in particular. (I tried several different ways to filter on it, but this was the only one that worked, I think because the column name has a space in it.)

In [46]:
# Create a new variable to store only the large family housing type projects.
# The filter is built from projects_trimmed itself so the mask and the table always match.
LF_Fam = projects_trimmed.loc[projects_trimmed['Housing Type'] == 'Large Family']

# Show the new table
LF_Fam


Unnamed: 0,Type of tax credit funding,Project Name,Project Address,Project City,Project Zip Code,Project County,Census Tract,Construction Type,Housing Type,Total Units,Low Income Units,Owner or Applicant Name,Developer,Annual Federal Award,Total State Award
7,9%,Cantamar Villas,"309 Beacon Street, Avalon, CA 90704",Avalon,90704,Los Angeles,5990,New Construction,Large Family,38.0,36.0,"Catalina Avalon Limited, a California LP",,"$232,245",$0
10,4%,Villa Ramona,"13030 Ramona Blvd., Baldwin Park, CA 91706",Baldwin Park,91706,Los Angeles,4048.01,New Construction,Large Family,71.0,70.0,Baldwin Park Family Housing Limited Partnershi...,Thomas Safran & Associates,"$522,176",$0
11,9%,Baldwin Park Transit Center Apartments,"Ramona Boulevard and Maine Avenue, Baldwin Par...",Baldwin Park,91706,Los Angeles,,New Construction,Large Family,70.0,69.0,ROEM Development Corporation,ROEM Development Corporation,"$1,522,725",$0
12,9%,Villa Florentina,"4576 Florence Avenue, Bell, CA 90201",Bell,90201,Los Angeles,5338.03,New Construction,Large Family,13.0,12.0,"Villa Florentina, LLC",MICH Development Company,"$153,218",$0
23,9%,Alabama Court,"7440 Alabama Avenue, Canoga Park, CA 91303",Canoga Park,91303,Los Angeles,1345.22,,Large Family,43.0,42.0,"Alabama Court, LP",LA Family Housing Corp.,"$367,104",$0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1292,9%,Sunshine Terrace,"10800 Laurel Avenue, Whittier, CA 90605",Whittier,90605,Los Angeles,5029.02,New Construction,Large Family,50.0,49.0,Sunshine Terrace Apartments Limited Partnership,Abode Communities,"$667,938",$0
1295,9%,Mosaic Gardens at Whittier,"12524 Philadelphia Street, Whittier, CA 90602 ...",Whittier,90601,Los Angeles,5014,New Construction,Large Family,21.0,20.0,"Whittier Family Apartments, L.P.",LINC Housing Corporation,"$413,565",$0
1299,9%,New Dana Strand Phase 1 Garden Apartments,"326 N. King Avenue, Wilmington, CA 90744",Wilmington,90744,Los Angeles,2949,New Construction,Large Family,120.0,118.0,"New Dana Strand Partners I, L.P.",Abode Communities,"$1,629,992","$6,145,706"
1300,4%,New Dana Strand Town Homes,"450 N. King Avenue, Wilmington, CA 907440000",Wilmington,90744,Los Angeles,2949,New Construction,Large Family,116.0,114.0,"New Dana Strand Town Homes, a CA LP",Mercy Housing California,"$1,739,269",$0
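
On the column-name-with-a-space issue mentioned earlier: attribute-style access cannot work for a name like `Housing Type`, because it is not a valid Python identifier. Bracket notation always works, and `query()` accepts such names when they are wrapped in backticks. A small sketch on a made-up table (`demo` is hypothetical):

```python
import pandas as pd

# Hypothetical three-row stand-in for projects_trimmed.
demo = pd.DataFrame({
    "Project Name": ["Cantamar Villas", "Main Street Plaza", "Villa Ramona"],
    "Housing Type": ["Large Family", "Senior", "Large Family"],
})

# Bracket notation works for any column name, including ones with spaces.
by_mask = demo.loc[demo["Housing Type"] == "Large Family"]

# query() needs backticks around names that are not valid Python identifiers.
by_query = demo.query("`Housing Type` == 'Large Family'")

# Both approaches select the same rows.
print(by_mask.equals(by_query))
```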


## Focus in on one particular developer

So this is starting to look a little interesting.

In [54]:
# Count the number of projects by developer
LF_Fam['Developer'].value_counts()

Abode Communities                        22
Meta Housing Corporation                 20
Community Corporation of Santa Monica    19
AMCAL Enterprises, Inc.                  11
American Communities, LLC                 9
                                         ..
Arlington-Rodeo Properties Inc.           1
Spruce Dev LA, LLC                        1
ABS Properties, Inc.                      1
Decro Corp. / Veloce Partners             1
Esperanza Community Housing Corp.         1
Name: Developer, Length: 219, dtype: int64

Now we have the top 5 developers of Large Family affordable housing at the top of the list! I would like it to show more than the top 5, maybe the top 10, but I'm not sure how to do that yet.
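
Since `value_counts()` returns its result sorted from most to least frequent, one way to see a longer leaderboard is to take the first rows with `head()`. A minimal sketch with made-up developer names (the notebook equivalent would presumably be `LF_Fam['Developer'].value_counts().head(10)`):

```python
import pandas as pd

# Hypothetical developer column: "Dev A" appears 3 times, "Dev B" twice, "Dev C" once.
developers = pd.Series(
    ["Dev A"] * 3 + ["Dev B"] * 2 + ["Dev C"],
    name="Developer",
)

# Counts come back sorted in descending order.
counts = developers.value_counts()

# head(n) keeps the n most frequent entries; n=2 here because the sample is tiny.
top_two = counts.head(2)
print(top_two)
```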

In [78]:
# Create a new variable with just the Abode Communities projects
topdev_trimmed = LF_Fam.loc[LF_Fam['Developer'] == 'Abode Communities']
topdev_trimmed

Unnamed: 0,Type of tax credit funding,Project Name,Project Address,Project City,Project Zip Code,Project County,Census Tract,Construction Type,Housing Type,Total Units,Low Income Units,Owner or Applicant Name,Developer,Annual Federal Award,Total State Award
80,9%,Casa Dominguez,"15729 S. Atlantic Avenue, East Rancho Domingue...",East Rancho Dominguez,90221,Los Angeles,5421.06,New Construction,Large Family,70.0,69.0,"Casa Dominguez, L.P.",Abode Communities,"$1,754,750",$0
188,9%,Osage Apartments,"11128 Osage Avenue, Lennox, CA 90304",Lennox,90304,Los Angeles,6017.0,New Construction,Large Family,21.0,21.0,Osage Apartments Limited Partnership,Abode Communities,"$199,121",$0
196,4%,Grisham Community Housing,"4901 Ruth Ave., Long Beach, CA 908050000",Long Beach,90805,Los Angeles,5717.01,Acquisition/Rehab,Large Family,96.0,94.0,Grisham Community Housing Limited Partnership,Abode Communities,"$607,458",$0
275,9%,Villa Esperanza,"255 East 28th Street, Los Angeles, CA 90011",Los Angeles,90011,Los Angeles,2246.0,New Construction,Large Family,33.0,33.0,Villa Esperanza Limited Partnership,Abode Communities,"$605,423",$0
319,9%,Park Place Apartments,"2500 West 4th Street, Los Angeles, CA 90057",Los Angeles,90057,Los Angeles,2088.02,New Construction,Large Family,49.0,49.0,"Park Place Apartments, LP",Abode Communities,"$897,493",$0
338,9%,Oxnard Villa,"14045 Oxnard Street, Los Angeles, CA 91401",Los Angeles,91401,Los Angeles,1286.01,,Large Family,40.0,39.0,Oxnard Villa Limited Partnership,Abode Communities,"$243,064",$0
339,9%,Parthenia Court,"14825 Parthenia Street, Los Angeles, CA 91402 ...",Los Angeles,91402,Los Angeles,1201.08,New Construction,Large Family,25.0,24.0,"Parthenia Housing Associates, Limited Partnership",Abode Communities,"$333,921",$0
340,9%,Reseda Village,"7939 Reseda Boulevard, Los Angeles, CA 91335",Los Angeles,91335,Los Angeles,1310.1,,Large Family,42.0,41.0,"Reseda Village, LP",Abode Communities,"$327,928",$0
355,9%,Astoria Place Townhomes,"13230 Bromont Avenue, Los Angeles, CA 91342",Los Angeles,91342,Los Angeles,1064.07,New Construction,Large Family,18.0,17.0,Astoria Place Limited Partnership,Abode Communities,"$164,167",$0
575,9%,Hart Village,"6941 Owensmouth Avenue, Los Angeles, CA 91303-...",Los Angeles,91303,Los Angeles,1345.2,New Construction,Large Family,47.0,46.0,"Hart Village, LP",Abode Communities,"$1,106,574",$0


And finally, we find out how many of their projects use 9% vs 4% funding.

In [77]:
topdev_trimmed['Type of tax credit funding'].value_counts()

9%    16
4%     6
Name: Type of tax credit funding, dtype: int64

I would like to make some graphs or plot these projects on a map, but I don't know how to do that with this data yet. I imagine there is a way to plot them using the address or census tract, but I'm not sure, so we'll leave it like this for now.
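
As a possible first step toward graphs, the award columns would need to become numbers, since values such as "$486,869" are stored as text. The sketch below is one way to do that and then total the awards by funding type. The `demo` table is a hypothetical sample built from three rows shown earlier.

```python
import pandas as pd

# Hypothetical sample using award strings in the same format as the data.
demo = pd.DataFrame({
    "Type of tax credit funding": ["9%", "9%", "4%"],
    "Annual Federal Award": ["$1,754,750", "$199,121", "$607,458"],
})

# Remove the dollar sign and thousands separators, then convert text to numbers.
award = pd.to_numeric(
    demo["Annual Federal Award"].str.replace(r"[$,]", "", regex=True)
)

# Total the numeric awards for each funding type.
totals = award.groupby(demo["Type of tax credit funding"]).sum()
print(totals)
```

With matplotlib installed, calling `totals.plot.bar()` in the notebook should draw these totals as a bar chart. Plotting on a map would presumably need coordinates or census tract boundaries from another source.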