# Where is affordable housing urgently needed in San Diego?

Our final project identifies locations where more affordable (Section 8) housing can be developed. To do this, we first look at existing Section 8 housing within San Diego County. We feel this is an important issue because of the shortage of affordable housing in San Diego. For example, a city report claimed that 33 middle-income units were built over seven years, while reporting by Voice of San Diego found that the real number was four.

For our current existing affordable housing dataset, we scraped https://www.sdhc.org/housing-opportunities/affordable-rentals/rent-from-sdhc/sdhc-owned-properties/ for their backend dataset of housing units at https://www.sdhc.org/wp-content/themes/sdhc/properties.php?type=sdhc-owned-properties%2Chousing-development-partners&query=, which returns a JSON file with all the current SDHC affordable housing units in San Diego. We then downloaded this for use as the file "properties.json".

# Background and Literature

### [A senior on California’s streets with little chance of a home](https://www.thereporter.com/2019/05/12/a-senior-on-californias-streets-with-little-chance-of-a-home/)
This article motivates our research question: homelessness is a major issue in San Diego and neighbouring Southern California counties. As public transportation links expand, affordable housing can be built in areas that were not previously accessible. San Diego contains one of the 22 ‘promise zones’ and receives a large workforce coming in across the Mexico-US border, so housing supply needs to keep up with increased demand and with country-wide inflation.

### [The Housing Crisis Is Obvious – but the Data Detailing Its Scope Is Not](https://www.voiceofsandiego.org/topics/land-use/the-housing-crisis-is-obvious-but-the-data-detailing-its-scope-is-not/)
> “A city report says just 33 units for middle-income units were built in seven years. The real number is actually four”

Expanding on the point above, the underlying problem is that these housing units are not adequately budgeted, and when they are, the proposals end up being rejected by lobbyists pushing for the welfare and safety of current residents.
With the public transport expansion (planned and current), it is possible to place these units away from the city center. This creates neighbourhoods and communities and increases trickle-down benefits by adding jobs and infrastructure to different parts of the city.

### [SDHC - Affordable Housing Crisis Action Plan](https://www.sdhc.org/wp-content/uploads/2019/01/2016-01-04_SDHC-Housing-Affordability-Crisis-Action-Plan_web.pdf)
Action plan from the San Diego Housing Commission about the current housing crisis in San Diego and what can be done to address it. Points include having annual goals of housing production, introducing tax rebates for housing, opening more vacant land for development, and allocating more resources towards low-income housing.

### [Chicago Affordable Rental Housing](https://www.kaggle.com/chicago/chicago-affordable-rental-housing-developments)
This project was inspired by the Chicago Affordable Rental Housing project. The idea is to create a similar dataset for San Diego County by merging already available housing data with San Diego County socio-economic and transportation patterns, to add efficiency to this pre-existing model.

# Python Libraries and ArcGIS modules used
#### Python modules used
1. geopandas
2. pandas
3. Shapely
4. numpy
5. arcgis

#### GIS operations used
1. dissolve_boundaries
2. overlay_layers
3. geoenrichment
4. SpatialDataFrame & related functions
5. create_buffers
6. gis.map
7. spatial.join
8. Other functions on ArcGIS Online

This list evolved as we realized the limitations of the Zillow dataset. We had to use alternative datasets that were more detailed than the Zillow dataset. We needed geopandas, pandas, Shapely, and numpy to preprocess data into shapefiles and feature layers for ease of use, as well as to add and drop columns as part of data cleaning.

# Data Sources
1. [SDHC existing properties](https://www.sdhc.org/wp-content/themes/sdhc/properties.php?type=sdhc-owned-properties%2Chousing-development-partners&query=): JSON file traced from SDHC homepage AJAX requests that loaded pins on Google Maps for existing SDHC properties
2. [San Diego County ZIP codes](https://ucsdonline.maps.arcgis.com/home/item.html?id=81cf0db99f754a0ebc6f5ddaefd7bcad): Map of San Diego County ZIP codes
3. [ZIP Code mean and median incomes](https://www.psc.isr.umich.edu/dis/census/Features/tract2zip/): University of Michigan dataset derived from the ACS survey that details mean and median incomes by ZIP code
4. [Transit Stops GTFS](https://sdgis-sandag.opendata.arcgis.com/datasets/transit-stops-gtfs?geometry=-117.248%2C32.867%2C-117.166%2C32.88): Dataset of stop IDs and location
5. [Transit Stop Routes](https://sdgis-sandag.opendata.arcgis.com/datasets/transit-stop-routes): Stop IDs and routes that serve every stop
6. [Transit Routes GTFS](https://sdgis-sandag.opendata.arcgis.com/datasets/transit-routes-gtfs): Route geometry for SDMTS and NCTD
7. [City-owned Land](http://rdw.sandag.org/Account/GetFSFile.aspx?dir=Land+Use&Name=CITY_OWNED_LAND.zip): City-owned land for all parcels either owned by the city or leased to city.

In our project proposal, we planned to use homeless spot counts for analysis. However, we realized that rent as a percentage of income, together with area median income, is a much better indicator of the need for affordable housing, because the homeless population is very small compared to the rent-burdened population. We also removed the Zillow dataset as a principal dataset because it did not give us fine enough data at the ZIP code level, so we used the UMich dataset as a substitute.

Many of our data sources used were from SANDAG or the open-data warehouse from the city for city resources (public transit, land). We could not obtain the existing SDHC properties through conventional means, so we had to resort to digging around on a web browser console and listening to web requests in order to gather our data.

# Data Cleaning
The first data cleaning step was to move our existing SDHC properties data from a simple JSON file into a GIS-compatible format. To do this, we used pandas, geopandas, and Shapely to convert the data and then saved it with geopandas as a .shp file, which we could then upload onto ArcGIS Online. We had expected most of our data to be available already in some form of GIS warehouse, so when we were confronted with this issue we reused a solution from previous MPs.
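The conversion step can be illustrated with a minimal sketch. The notebook itself used pandas, geopandas, and Shapely to write a .shp file; the version below swaps in plain GeoJSON built with the standard library, purely to show the shape of the transformation. The field names `lat` and `lng` and the sample records are hypothetical, since the actual schema of `properties.json` is not shown here.

```python
import json


def records_to_geojson(records, lat_key="lat", lon_key="lng"):
    """Turn a list of dicts with coordinate fields into a GeoJSON FeatureCollection.

    lat_key and lon_key are assumed field names, not the real SDHC schema.
    """
    features = []
    for rec in records:
        try:
            lat = float(rec[lat_key])
            lon = float(rec[lon_key])
        except (KeyError, TypeError, ValueError):
            continue  # skip records without usable coordinates
        # Everything that is not a coordinate becomes an attribute of the point
        props = {k: v for k, v in rec.items() if k not in (lat_key, lon_key)}
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates as [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample records standing in for the scraped JSON
sample = [
    {"name": "Example Apartments", "zip": "92102", "lat": "32.7157", "lng": "-117.1611"},
    {"name": "No Coordinates", "zip": "92113"},
]
collection = records_to_geojson(sample)
geojson_text = json.dumps(collection, indent=2)
```

Records without coordinates are dropped rather than guessed, which mirrors the kind of cleaning decision that has to be made before uploading point data to a GIS platform.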

The next cleaning step was joining the 3 transit datasets in order to get meaningful information (the routes that serve each stop, together with geometry). We had to fill NULL values for stop_ids and route_ids so that we could merge the datasets and then drop those rows.
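A minimal pandas sketch of this fill, merge, and drop pattern is shown here. The column names `stop_id`, `route_id`, and `stop_name`, the sentinel value, and all rows are assumptions for illustration; they are not taken from the SANDAG datasets.

```python
import pandas as pd

# Made-up stand-ins for the stops table and the stop-to-route table
stops = pd.DataFrame({
    "stop_id": [10, 11, None],
    "stop_name": ["Stop A Transit Center", "Stop B", "Unknown"],
})
stop_routes = pd.DataFrame({
    "stop_id": [10, 10, 11, 12],
    "route_id": [30, None, 201, 44],
})

SENTINEL = -1  # hypothetical placeholder for missing IDs

# Fill NULL IDs so the key columns can be cast to integers and merged cleanly
stops_clean = stops.fillna({"stop_id": SENTINEL}).astype({"stop_id": "int64"})
stop_routes_clean = (
    stop_routes
    .fillna({"stop_id": SENTINEL, "route_id": SENTINEL})
    .astype({"stop_id": "int64", "route_id": "int64"})
)

# Join routes onto stops, then drop the rows that only carry the placeholder
merged = stops_clean.merge(stop_routes_clean, on="stop_id", how="inner")
keep = (merged["stop_id"] != SENTINEL) & (merged["route_id"] != SENTINEL)
merged = merged[keep].reset_index(drop=True)
```

Filling with a sentinel before casting avoids float-typed ID columns, which otherwise make joins on integer keys unreliable.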

Finally, we used numpy to do calculations on an enriched dataframe, converting totals in an area to percentages.
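As a rough sketch of the total-to-percentage calculation, the snippet below divides a count by a population total while guarding against empty areas. The array names and values are made up for illustration and do not come from the enriched layer.

```python
import numpy as np

# Hypothetical per-area totals (not taken from the enriched dataset)
low_income_total = np.array([1200.0, 0.0, 450.0])
population_total = np.array([8000.0, 0.0, 3000.0])

# Start from NaN so areas with zero population stay undefined instead of raising
pct = np.full_like(population_total, np.nan)
np.divide(low_income_total, population_total, out=pct, where=population_total > 0)
pct *= 100  # convert the proportion to a percentage
```

Areas with no population keep a NaN percentage, which makes them easy to filter out before mapping.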

In our project proposal, we did not expect this amount of data cleaning, which soon became apparent as our first step was to preprocess the SDHC existing properties data into a usable format.

# Descriptive Statistics 
In the development of the project analysis, we first saw clusters of affordable housing scattered around maps. As we discussed in class, spatial autocorrelation can be used to find clusters of similar data (positive Moran's I). In relation to the current housing locations, our goal was to place the new housing locations farther away from the current sites. In some sense this could be considered a variation of negative spatial autocorrelation, since we look for sites where the availability of affordable housing is low.
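The notebook does not compute Moran's I, but a small sketch may help make the sign convention concrete. It uses the usual definition $I = \frac{n}{S_0} \cdot \frac{\sum_{i,j} w_{ij} z_i z_j}{\sum_i z_i^2}$, where $z_i$ are deviations from the mean and $S_0$ is the sum of all weights. The four-area chain and its adjacency weights are toy data chosen for illustration.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I for a 1-D array of values and an n x n spatial weights matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()            # deviations from the mean
    denom = z @ z
    if denom == 0 or w.sum() == 0:
        raise ValueError("Moran's I is undefined for constant values or empty weights")
    return (len(x) / w.sum()) * (z @ w @ z) / denom


# Toy example: four areas in a row, each adjacent only to its immediate neighbours
chain = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

clustered = morans_i([10, 10, 0, 0], chain)     # similar values sit next to each other
alternating = morans_i([10, 0, 10, 0], chain)   # neighbours always differ
```

In this toy setup the clustered arrangement gives a positive value and the alternating arrangement gives a negative one, which is the distinction the paragraph above relies on.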

For the most part, the housing sites and the developable and vacant land plots show random point patterns dispersed around the city based on land vacancy. 
Additionally, the results we came up with can be considered random point patterns, as they are not spatially correlated and depend heavily on city-owned sites.

The random point patterns in the data also explain the shortage of spatial statistical descriptions. The diversity of datasets and criteria developed in the analysis does not allow for extensive statistical significance testing.

# Analysis and Code

All Raw NBConvert cells are code cells that were run only once for this project and that either
1. create shapefiles for upload onto ArcGIS Online
2. create feature layers that can be pulled later to save processing time between notebook launches
3. correspond to feature layers created by web operations on ArcGIS Online

In [1]:
from arcgis import GIS
from arcgis.features.analysis import dissolve_boundaries
from IPython.display import display
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point
from arcgis.geoenrichment import *
gis = GIS(username="nes008_UCSDOnline8") 

Enter password: ········


Loading our above saved existing SDHC properties into our notebook (id: 9f5c612ca7154dfab7072023a4fb3920)

In [2]:
# visualize where current existing housing is
existing = gis.content.get('9f5c612ca7154dfab7072023a4fb3920').layers[0]
map2 = gis.map('San Diego')
map2.add_layer(existing, {'renderer': 'ClassedColorRenderer', 'field_name': 'zip'})
map2

MapView(layout=Layout(height='400px', width='100%'))

As we can see, the majority of current housing is centered around north of Downtown, in Normal Heights/Midcity, along with another concentration south in Otay Mesa. New affordable housing should be concentrated in places where

1. There is currently little or no affordable housing
2. There is a job hub or many jobs nearby
3. There is easy access to mass transit
4. There is a good community fit with regard to demand/need for affordable housing
5. There is a high concentration of homeless residents

To start off, we will overlay a layer of low-income neighborhoods in San Diego with current affordable housing locations to identify the neighborhoods that need development. We will use median income per ZIP code.

Loading a dissolve between a spatial join of ZIP codes and the University of Michigan processed ACS dataset of mean and median income by ZIP code (id: f9ca0f98d4254ec2b3c17a744ade8b9c)

In [3]:
low_income = gis.content.get('f9ca0f98d4254ec2b3c17a744ade8b9c').layers[0]
map3 = gis.map('San Diego')
map3.add_layer(low_income)
map3.add_layer(existing, {'renderer': 'ClassedColorRenderer', 'field_name': 'zip'})
map3

MapView(layout=Layout(height='400px', width='100%'))

As we can see, a large portion of Downtown is underserved by affordable housing, along with Imperial Beach and other parts of South San Diego. In addition, a large portion of Grantville (east of Qualcomm Stadium) and North Clairemont is also underserved.

Our next step in the analysis is to determine the median incomes, population, and homeless count in these areas in order to find the most efficient place to develop new Section 8 housing. In order to do so, we will first define what it means to be low-income for this project, which for our purposes is below 1.5x the poverty line.

Loading dataset on total population per ZIP code that is 150% or below of the poverty line that we obtained through geoenrichment (id: c8752e6f0837405fa87646547c5b54b8)

In [4]:
pop_low_income = gis.content.get('c8752e6f0837405fa87646547c5b54b8').layers[0]
map4 = gis.map('San Diego')
map4.add_layer(existing)
map4.add_layer(pop_low_income, {'renderer': 'ClassedColorRenderer', 'field_name': 'totpot', 'opacity': .7})
map4

MapView(layout=Layout(height='400px', width='100%'))

Because of the current lack or shortage of affordable housing in those areas, we will dedicate our search to the portions of the map that appear to need affordable housing, namely the following ZIP codes:
1. 92114
2. 92113
3. 92102
4. 92117

Now that we have identified several areas that are in dire need of more housing, we can add in other variables such as transportation and job hubs to further narrow down our search.

Loading our processed transit routes dataset with routes that serve every single stop along with stop ID and geometry. In addition, we added the transit route geometry. (id: ce767793deec4d3a96f632c2b1477acd & 5b9d7189e107452bb6abf991bd20151e)

In [5]:
route_geometry = gis.content.get('5b9d7189e107452bb6abf991bd20151e')
route_geometry_df = route_geometry.layers[0].query().df

Loading our route dataset that we subsetted from our whole route dataset that only serves transit centers (id: ce767793deec4d3a96f632c2b1477acd)

We determined if a route served a transit center by checking to see if any of the stop names a route served included the string "transit center." This method is not perfect, as transit centers such as the Gilman Transit Center are actually labeled by this dataset as simply "Gilman St and Myers Ave," but should be good enough for our analysis.
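The name-matching rule described above can be sketched in pandas as follows. The column names and rows are hypothetical placeholders, and the match is a plain case-insensitive substring test, so it shares the limitation noted above for stops whose names do not contain the phrase.

```python
import pandas as pd

# Made-up rows standing in for the joined stop/route table
stop_routes = pd.DataFrame({
    "route_id": [1, 1, 2, 3],
    "stop_name": [
        "Example Transit Center",
        "1st St & A St",
        "2nd St & B St",
        "example TRANSIT CENTER Bay 2",
    ],
})

# Case-insensitive literal match on the stop name; missing names count as no match
is_center = stop_routes["stop_name"].str.contains(
    "transit center", case=False, na=False, regex=False
)

# A route qualifies if at least one of its stops matched
routes_serving_center = sorted(stop_routes.loc[is_center, "route_id"].unique().tolist())
```

Route 2 is excluded in this toy data because none of its stop names contain the phrase, even if one of them were in reality a transit center.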

In [6]:
transit_routes = gis.content.get('ce767793deec4d3a96f632c2b1477acd').layers[0]
from arcgis.features.manage_data import dissolve_boundaries
transit_routes_dissolved = dissolve_boundaries(transit_routes, dissolve_fields=['route_id'])

In [7]:
map5 = gis.map('San Diego')
map5.add_layer(transit_routes_dissolved)
map5

MapView(layout=Layout(height='400px', width='100%'))

This map shows all of the routes serving a transit center, with all shapes for each route dissolved for easier analysis.

Next, we will use a spatial join of the identified ZIP codes that need housing with bus routes, then create a buffer around the bus routes inside that community in order to identify corridors near which we should build affordable housing. We did this in ArcGIS Online using filtering, find_existing_locations, and a spatial join.

Loading our dataset of an intersection join between our identified ZIP code geometry along with the bus routes that serve transit center dataset (id: 2293760eb2a44f34aa56f56cc89e298e)

We did this intersection overlay on ArcGIS Online, as the UI was much more intuitive.

In [8]:
# Intersection of the identified ZIP codes with routes serving transit centers
# output id: 2293760eb2a44f34aa56f56cc89e298e
from arcgis.features.use_proximity import create_buffers
intersecting_routes = gis.content.get('2293760eb2a44f34aa56f56cc89e298e')
# Merge all segments of each route into a single feature
intersecting_routes_dissolved = dissolve_boundaries(intersecting_routes, dissolve_fields=['route_id'])
# Buffer each dissolved route by a quarter mile
intersecting_routes_buffer = create_buffers(intersecting_routes_dissolved, dissolve_type=None, distances=[0.25], ring_type='Rings', units='Miles')
map6 = gis.map('San Diego')
map6.add_layer(intersecting_routes_buffer)
map6

MapView(layout=Layout(height='400px', width='100%'))

From this dissolved bus route map inside the 4 ZIP codes that we identified, we have finally found potential areas where we can build affordable housing near a transit route that serves a transit center. Next, we need to find locations within this buffer that are suitable for housing.

In [9]:
from arcgis.features.manage_data import overlay_layers
overlayed = overlay_layers(intersecting_routes_buffer, existing, tolerance=0, context={})
map6.add_layer(overlayed)
map6

MapView(layout=Layout(height='400px', width='100%'))

With the bus route buffer and existing housing points, we created a buffer of 1 mile around existing housing points, so as to find areas currently not close to existing affordable housing. Then, we used an erase overlay between the two layers to further narrow down our search, the result of which can be viewed at https://ucsdonline.maps.arcgis.com/home/webmap/viewer.html?webmap=92af2b047323454a92e8844f36b2bc85
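The idea behind the 1-mile erase can be approximated with a simple point-based check. This sketch is not the polygon buffer and erase overlay that was run on ArcGIS Online; it substitutes a great-circle (haversine) distance test between candidate points and existing housing points, and all coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def far_from_existing(candidates, existing, min_miles=1.0):
    """Keep candidate (lat, lon) points at least min_miles from every existing point."""
    return [
        c for c in candidates
        if all(haversine_miles(c[0], c[1], e[0], e[1]) >= min_miles for e in existing)
    ]


# Made-up coordinates: one existing property and two candidate sites
existing_sites = [(32.7000, -117.1000)]
candidate_sites = [(32.7050, -117.1000), (32.7300, -117.1000)]
remaining = far_from_existing(candidate_sites, existing_sites)
```

The first candidate sits roughly a third of a mile from the existing property and is dropped, while the second is about two miles away and is kept.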

From the little sliver near the Naval Base in National City, we can see that there is an empty lot at 4152 Nordica Ave, well suited to development of new affordable housing. In addition, there is a massive empty stretch of land at 311 N Highland Ave. Farther north, areas such as 32.715448, -117.048834 are still empty lots waiting to be developed. However, purely eyeballing for empty lots to build on is not enough, so we will now incorporate an empty-land dataset along with rent-to-income ratios.

According to SDHC, people who qualify for their rent voucher program need to spend between 30% and 40% of their income on rent. Therefore, we enriched our undissolved buffers to obtain several different study areas with data on the number of people who spend over 30% of their income on rent as a proportion of total population. All of this can be viewed on the webmap.

Loading an enriched overlay of the buffered regions along bus routes (id: 6f9a6299e967411fac47123e26656c6b)

In [10]:
enriched = gis.content.get('6f9a6299e967411fac47123e26656c6b')
enriched_df = enriched.layers[0].query().df

# Where can affordable housing be built?

Now, every single polygon buffer has a proportion of people that need affordable housing per square mile, which will allow us to identify high-impact areas that need more immediate attention. Using San Diego's open parcel dataset of developable land will then allow us to finally get candidates for suitable housing locations.

Loading our SANDAG city-owned parcels dataset and our enriched dataset of suitable locations after an erase overlay between our bus buffers and existing SDHC properties buffer (id: 9b69c6f97490426f83a4a7eae6d0b61d & 7734752ceb2941ae84681cbc5d27ca1d)

In [11]:
developable_land = gis.content.get('9b69c6f97490426f83a4a7eae6d0b61d')
process_enriched = gis.content.get('7734752ceb2941ae84681cbc5d27ca1d')
overlay_land_enrich = overlay_layers(developable_land, process_enriched)
map7 = gis.map('San Diego')
map7.add_layer(overlay_land_enrich)
map7

MapView(layout=Layout(height='400px', width='100%'))

For every vacant parcel, total_in_n specifies how urgent the need for new affordable housing is at that ZIP code, with higher meaning more urgency. This intersection overlay displays all the land the city owns and can be developed in the area that we identified after an erase overlay between the bus route buffer and existing affordable housing buffer. 

Some parcels are too small, while others are irrelevant to our analysis (road right of way). We believe that the parcels that we have found here after filtering for relevant categories would be best suited towards developing more affordable housing for San Diego in the short-term due to current housing needs in the city.

Cross-checking our suitable parcels with parcels that San Diego recently announced for more affordable housing, we can see that we have overlaps, especially in National City near the Naval Base, where the proposed site from the City of San Diego is straddled by parcels in our analysis. However, other housing developments the city proposed are within 1 mile of a currently existing affordable housing property. Such sites were not the focus of this project, which was finding the best places to build housing now to alleviate the biggest housing crises in San Diego, so they did not line up.

In [12]:
# Pull the overlay result into a dataframe
overlay_enrich_df = overlay_land_enrich.query().df
# Planned land-use categories that are relevant for housing
categories = ['Mixed Use', 'Single Family Residential', 'Spaced Rural Residential', 'Multi-Family Residential', 'Single Family Detached']
# Keep only relevant parcels, with one row per developable-land parcel
suitable_best_location = overlay_enrich_df[overlay_enrich_df['plannedlu'].isin(categories)].drop_duplicates(subset='FID_DEVELOPABLE_LAND_DEVELOPABLE_LAND')
suitable_best_location

| | AnalysisAr | AnalysisArea | Analysis_1 | BUFF_DIST | Count_ | ENRICH_FID | FID_A17C2E_A17C2E | FID_DEVELOPABLE_LAND_DEVELOPABLE_LAND | HasData | ID | ... | apportionm | dev_code | devtype | plannedlu | plu | population | route_id | sourceCoun | total_in_n | SHAPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.551946 | 0.000022 | 1.551946 | 0.25 | 35 | 4 | 4 | 2840 | 1 | 3 | ... | 2.576 | 3 | Vacant Developable | Spaced Rural Residential | 1000 | 2.191 | 105 | US | 860.565973 | `{'rings': [[[-13047659.7077, 3871514.057800002...` |
| 2 | 1.872066 | 0.008374 | 1.872066 | 0.25 | 14 | 6 | 5 | 2841 | 1 | 5 | ... | 2.576 | 3 | Vacant Developable | Spaced Rural Residential | 1000 | 2.191 | 50 | US | 815.608019 | `{'rings': [[[-13048111.1316, 3873302.8191], [-...` |
| 3 | 1.872066 | 0.000347 | 1.872066 | 0.25 | 14 | 6 | 5 | 2887 | 1 | 5 | ... | 2.576 | 3 | Vacant Developable | Single Family Residential | 1100 | 2.191 | 50 | US | 815.608019 | `{'rings': [[[-13048755.629, 3872562.831699997]...` |
| 7 | 5.125949 | 0.000784 | 2.133321 | 0.25 | 96 | 12 | 7 | 3463 | 1 | 11 | ... | 2.576 | 3 | Vacant Developable | Single Family Residential | 1100 | 2.191 | 4 | US | 617.878604 | `{'rings': [[[-13027998.1213, 3857639.077500000...` |
| 8 | 5.125949 | 0.000303 | 2.133321 | 0.25 | 96 | 12 | 7 | 3464 | 1 | 11 | ... | 2.576 | 3 | Vacant Developable | Single Family Residential | 1100 | 2.191 | 4 | US | 617.878604 | `{'rings': [[[-13028152.6571, 3857881.5713], [-...` |
| 9 | 3.357977 | 0.001779 | 0.491037 | 0.25 | 59 | 10 | 10 | 3465 | 1 | 9 | ... | 2.576 | 3 | Vacant Developable | Single Family Residential | 1100 | 2.191 | 520 | US | 504.017217 | `{'rings': [[[-13029631.397300001, 3857565.3250...` |
| 10 | 1.786745 | 0.000338 | 0.287566 | 0.25 | 26 | 5 | 3 | 3629 | 1 | 4 | ... | 2.576 | 3 | Vacant Developable | Single Family Residential | 1100 | 2.191 | 929 | US | 995.554743 | `{'rings': [[[-13036340.8283, 3853050.285400003...` |
| 11 | 2.848266 | 0.000221 | 0.116861 | 0.25 | 108 | 7 | 2 | 3635 | 1 | 6 | ... | 2.576 | 3 | Vacant Developable | Single Family Residential | 1100 | 2.191 | 955 | US | 1079.207342 | `{'rings': [[[-13035909.5878, 3853427.618199996...` |
| 12 | 2.848266 | 0.000753 | 0.116861 | 0.25 | 108 | 7 | 2 | 3636 | 1 | 6 | ... | 2.576 | 3 | Vacant Developable | Single Family Residential | 1100 | 2.191 | 955 | US | 1079.207342 | `{'rings': [[[-13035493.515, 3853438.4970000014...` |
| 14 | 1.786745 | 0.000576 | 0.287566 | 0.25 | 26 | 5 | 3 | 3743 | 1 | 4 | ... | 2.576 | 16 | Vacant Developable Mixed Use | Mixed Use | 9700 | 2.191 | 929 | US | 995.554743 | `{'rings': [[[-13036549.0214, 3852932.928999997...` |


Plotting these parcels on a map will show location and size of land designated for residential use

In [13]:
map8 = gis.map('San Diego, CA', zoomlevel=11)
suitable_best_location.spatial.plot(kind='map', map_widget=map8)
map8

MapView(layout=Layout(height='400px', width='100%'), zoom=11.0)

As seen on the map, these are all the vacant city-owned parcels designated for residential use that are within 1/4 mile of a transit route serving a transit center and at least 1 mile away from other affordable housing, in an area that currently has very high demand for affordable housing.

# Conclusions and Future Work 
### Conclusion
This project was successful in identifying appropriate land. The land parcels we selected in our final analysis have also recently been identified by the San Diego Housing Commission as sites suitable for developing affordable housing in the near future. 
In fact, we feel that our analysis looked at transit routes and proximity to employment opportunities more closely than the analysis put forth by the SDHC. 

While we consider this project successful in its purpose, it is important to point out the various logistical and analytical issues in our analysis. These range from not considering all types of transportation (people who need Section 8 housing may have cars, bicycles, etc.) to buffering by flat distance in miles rather than using the other options available to us through ArcGIS (e.g. driving time). 

In conclusion, we would appreciate having our results critiqued and taken into consideration by both SDHC and SANDAG, as both are in charge of city planning and the healthy urbanization of San Diego.
### Future Work
The project proposal and the research question we had formulated were extremely solution-based. Our analysis and conclusions, as described earlier in the notebook, are very precise. 

The final dataset that was incorporated into our analysis was the `developable and vacant land` dataset. This allowed us to provide exact locations that were developable while being owned by the city. The research question focused on the development of affordable housing on an urgent timeline, thus streamlining the process to looking at city-owned sites seemed the most efficient. 

Additionally, if this project were not looking at housing development on an urgent timescale, a model put forward by one of the San Diego city supervisor candidates would best describe our desired future improvements. These would address the shortage of developable areas by developing not just housing units but also the infrastructure that comes along with them. This allows the problem to be tackled both at the root and at scale. 

Below is the article describing the plan of the supervisor candidate referenced above. 
https://timesofsandiego.com/politics/2019/05/28/supervisor-candidate-castellanos-offers-1-billion-affordable-housing-plan/

## Summary of Products and Results

As seen below, we have a final map layer derived from the following steps: 
1. Current units of Section 8 housing in San Diego: Scraped from the SDHC website 

2. Transit routes, transit stops and transit route stops from the MTS dataset representing the entire map of the San Diego County Transit system

3. Derived study area from zip codes representing Area Median Income (AMI). This was used to create our preliminary study area

4. `Steps 1 & 2` clipped to the study area defined in `Step 3`

5. A buffer of 1/4 mile was created around the transport (bus/trolley/rail) routes serving transit centers

6. An erase overlay of a 1-mile buffer around existing affordable housing units, to ensure we weren't proposing new and urgent housing too close to existing units. Building too close to existing units would not adequately represent the diversity in demand. 

7. The final derived study area with the vacant and developable land parcels dataset clipped onto it. 

8. A deep dive into the final developable plots to look at the individual parcels to identify specific sites. 

Additionally, the results are shown through the code in the notebook along with the map views with the respective layers, to provide the most holistic view of the solution addressing the research question. 

An example of the product obtained can be viewed below:

Map View            |  Satellite view
:-------------------------:|:-------------------------:
![map view](map_view.png) |  ![satellite view](satelite.png)

## Discussion

The most important problem this project addresses is that of low-income populations who pay a large proportion of their paycheck on rent and where affordable housing can be built near where they live to alleviate their burden.

While this could be dealt with in a very linear manner, we understood that tackling the issue at its crux would be more efficient: giving people near or under the poverty line more access to employment and public transportation, while rent assistance lets them keep more money in their pockets every month for saving or healthier living.

In the process of working on this problem, we realized that there wasn't a lot of land available within the city limits that matched the criteria we had applied. 
This led us to realize that the housing crisis could be addressed on a short-term basis, but for a permanent fix that addresses the root of the issue, neighborhoods and communities have to be developed as a whole in places that are currently underdeveloped.

This led us to analyze the potential expansion of transit lines and urban accessibility in areas far off from city centers. Building more transit centres may also be a solution to this transit problem. 

From a more technical point of view, we had to make a number of informed estimations while applying buffers at two points. First, applying buffers around the transit lines was a bit tricky, as we had to make sure we were considering the type of distance parameter that would best represent travel time. We used a flat distance in our preliminary findings and should probably use walking time for further analysis. The second buffer was around the current sites of affordable housing, where the question was whether we needed the buffer in the first place, because the demand for housing is so high that even sites near current Section 8 housing would be an asset. However, we decided on a 1-mile buffer in order to address the issue stated in our problem statement, which revolved around the urgent need for housing in areas currently not served by Section 8 housing. 
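To relate the flat-distance buffers to walking time, a back-of-the-envelope conversion can help. The sketch below assumes a walking speed of 3 mph, which is an illustrative assumption rather than a value used in the notebook, and it treats the buffer distance as a straight line. Real walks follow the street network, so the result is only a lower bound on actual walking time.

```python
def straight_line_walk_minutes(distance_miles, speed_mph=3.0):
    """Minutes needed to walk a straight-line distance at an assumed constant speed."""
    if speed_mph <= 0:
        raise ValueError("speed_mph must be positive")
    return distance_miles / speed_mph * 60.0


# The two buffer distances used in the analysis
quarter_mile_minutes = straight_line_walk_minutes(0.25)  # transit route buffer
one_mile_minutes = straight_line_walk_minutes(1.0)       # existing housing buffer
```

Under this assumption the quarter-mile transit buffer corresponds to roughly a five-minute walk at best, which is one way to judge whether a walking-time service area would select noticeably different parcels.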

