# SF Data Project
## By Kavish Harjai
## Analysis

### Synopsis

The San Francisco Rent Ordinance created the San Francisco Rent Board in 1979. The Board receives petitions from the city's landlords and tenants. In some cases, the Rent Board will oversee a mediation or arbitration session between the complainant and the alleged wrongdoer. The nature of the petition varies depending on who the filing party is, but the Rent Board can only adjudicate issues that are covered in the rent ordinance.

I was specifically interested in investigating tenant petitions. In a petition, a tenant can allege one or more complaints, including wrongful eviction, unfair rent increase, a decrease in housing services (necessary repairs and replacements), etc.

The city of San Francisco keeps a CSV of the petitions sent to the Board going back to January 1997. 

https://data.sfgov.org/Housing-and-Buildings/Petitions-to-the-Rent-Board/6swy-cmkq

The data is "wide," meaning each possible complaint is its own column, and each column contains boolean values: 'True' if the particular complaint is one of the reasons for the petition, and 'False' if not. In addition to those complaint columns, there are several columns important to this analysis, including: 

* petition_id: unique identifier for each petition
* date_filed: when the petition was filed
* filing_party: did a tenant or a landlord file the petition?
* neighborhoods: in which neighborhood does the petitioner live? 

One thing that's critical to understand is that each row refers to one petition. But a single petition can include several complaints (according to the SF Rent Board website: "The tenant may combine more than one type of claim in a Tenant Petition").
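
To make the petition-versus-complaint distinction concrete, here is a minimal sketch of reshaping wide data into one row per complaint. The toy petition IDs and values are made up, and the preparation notebook may have done this differently; the `reason` and `value` columns in the processed file loaded below are at least consistent with a reshape along these lines.

```python
import pandas as pd

# Toy "wide" data: one row per petition, one boolean column per complaint type.
# The petition IDs and values are hypothetical.
wide = pd.DataFrame({
    "petition_id": ["T001", "T002"],
    "tenant_rent_increase": [True, False],
    "tenant_wrongful_eviction": [True, True],
})

# Reshape to "long": one row per (petition, complaint type) pair.
long_df = wide.melt(id_vars="petition_id", var_name="reason", value_name="value")

# Keep only the complaints that were actually alleged.
complaints = long_df[long_df["value"]].reset_index(drop=True)

print(len(wide), "petitions ->", len(complaints), "complaints")
```

After a reshape like this, counting rows counts complaints rather than petitions, so a petition with two claims contributes two rows.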

I will analyze petitions submitted by tenants from 2010 to present. I ask the following questions: 

* In which year did tenants file the most petitions?
* What kinds of complaints are the most popular? 
* From which neighborhoods did most complaints arise? And how does the number of complaints compare to the population of renters in each neighborhood? 

This project is divided into two notebooks: one focused on data preparation and another focused on analysis. Both can be found in the notebooks subfolder. My findings are located in the analysis notebook.

I prepare and analyze the data in the order of the questions above.


In [80]:
import pandas as pd
from datetime import datetime as dt
import os
import numpy as np
import altair as alt

#pd.set_option('display.max_columns', None)
#Optional command to see all columns

#pd.set_option('display.max_rows', None)
#Optional command to see all rows

data_dir = os.environ["DATA_DIR"]
raw_data = data_dir + "/raw/"
processed_data = data_dir + '/processed/'
petitions_file = os.path.join(processed_data, 'tenant_petitions_2010.csv')

tenant_petitions = pd.read_csv(petitions_file, low_memory=False)

In [2]:
tenant_petitions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17269 entries, 0 to 17268
Data columns (total 15 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   petition_id          17269 non-null  object 
 1   date_filed           17269 non-null  object 
 2   Priority             17269 non-null  bool   
 3   appeal_id            0 non-null      float64
 4   filing_party         17269 non-null  object 
 5   prop_i_petition      0 non-null      float64
 6   Address              17269 non-null  object 
 7   petition_zip_code    17262 non-null  object 
 8   supervisor_district  17269 non-null  float64
 9   neighborhoods        17269 non-null  object 
 10  shape                17269 non-null  object 
 11  year                 17269 non-null  int64  
 12  month                17269 non-null  int64  
 13  reason               17269 non-null  object 
 14  value                17269 non-null  bool   
dtypes: bool(2), float64(3), int64(2), ob

#### Since I prepared the data in the previous notebook, there's not much to see here ^. 

### 1. In which year did tenants file the most petitions?

In this portion of the analysis, we'll group by year and count the rows in each year.

In [3]:
pop_years = tenant_petitions.groupby('year').petition_id.count().reset_index().sort_values('petition_id', ascending=False)
pop_years

Unnamed: 0,year,petition_id
5,2015,1912
4,2014,1881
2,2012,1598
8,2018,1511
3,2013,1510
7,2017,1499
6,2016,1473
1,2011,1427
9,2019,1383
0,2010,1255


In [4]:
pop_years_chart = alt.Chart(pop_years).mark_bar().encode(
    x=alt.X('year:O', title='Year'),
    y=alt.Y('petition_id', title='Amount of Complaints'),
    tooltip=[alt.Tooltip('petition_id', title='Amount of Complaints')]
).properties(
    title='Complaints by Year',
    width=800,
    height=300
).configure_bar(
    opacity=0.8,
    color='gold'
)
pop_years_chart

### Finding 1: 
* More than 1,900 complaints were leveled at landlords in 2015, making it the year that saw the highest number of complaints. 2014 was not far behind, with 31 fewer complaints. In 2014, there were several news reports about record-breaking rent prices for one-bedrooms in San Francisco. According to Business Insider, which cites an analysis conducted by real estate website Zumper, rent for a one-bedroom in 2014 rose 13.5% city-wide. In some neighborhoods, rent for a one-bedroom rose as much as 30%. It makes sense that increasing rent would lead to more complaints: Tenants might have been surprised by the amount they'd have to spend in 2014 versus 2013, and they might have seen a quality of apartment maintenance that didn't correspond with the new, higher rent they were paying.

SOURCES: https://www.businessinsider.com/chart-shows-how-expensive-san-francisco-rent-became-in-2014-2014-12 ; https://sf.curbed.com/2014/4/14/10115060/are-you-sitting-down-sfs-median-rent-rate-is-3200-month
* 2020 and 2021 saw the fewest complaints. There are several factors that could explain why years characterized by financial strife and longer hours at home didn't correlate with more complaints from tenants. According to data from the California Policy Lab, San Francisco saw a decrease in population in 2020. That's the first time that's happened since 2016, when CA Policy Lab started collecting data. According to an SF Chronicle analysis of the data: "the number of Californians leaving the state for other parts of the country has increased, while the number of people moving into the state from the rest of the U.S. has decreased." Fewer people means less competition for apartments and lower rents. According to data from Apartment List cited by SFGate, overall rents in San Francisco dropped more than 26% between March 2020 and January 2021.

SOURCES:
https://www.capolicylab.org/pandemic-patterns-california-is-seeing-fewer-entrances-and-more-exits-updated/
https://www.sfchronicle.com/projects/2021/california-bay-area-migration-maps/
https://www.sfgate.com/realestate/article/Can-your-landlord-raise-the-rent-pandemic-16302349.php

### 2. What kinds of complaints are the most popular? 

In [5]:
pop_petitions = tenant_petitions.groupby('reason').petition_id.count().reset_index().sort_values('petition_id', ascending=False)
pop_petitions

Unnamed: 0,reason,petition_id
2,tenant_decrease_housing_services,5789
13,tenant_wrongful_eviction,4457
7,tenant_rent_increase,2723
9,tenant_section_six,813
1,tenant_capital_improvement,812
12,tenant_water_revenue_hardship,608
5,tenant_other,491
3,tenant_failure_repair_maintain,384
0,tenant_alt_dispute,365
6,tenant_passthrough_challenge,264


In [6]:
alt.Chart(pop_petitions).mark_bar().encode(
    x=alt.X('reason', sort='-y', title='Complaint Type'),
    y=alt.Y('petition_id', title='Amount of Complaints'),
    tooltip=[alt.Tooltip('petition_id', title='Amount of Complaints')]
).properties(
    title='Top Complaints',
    width=800,
    height=300
).configure_bar(
    opacity=0.8,
    color='green'
)

### Finding 2:

* The most popular tenant complaint is a decrease in housing services. This means the tenant alleges their landlord decreased necessary maintenance, including repairs and painting, without decreasing their base rent. Petitions alleging a decrease in housing services are essentially pleas to the Rent Board to compel the landlord to decrease the tenant's rent. In order for that to happen, the tenant must prove that they brought the issue to the landlord and the landlord did not change their practices. 
* The other two prominent types of complaints were those that alleged a wrongful eviction or unlawful rent increase. It's worth noting that while the Rent Board will investigate complaints alleging a wrongful eviction, it cannot decide whether the eviction is legal or not—that's up to the court.   

#### To dig into this finding, I will see how the top three complaints changed over time. 

In [7]:
# Keep only rows whose complaint type is one of the three most common
top_three_reasons = [
    'tenant_decrease_housing_services',
    'tenant_rent_increase',
    'tenant_wrongful_eviction',
]
three_most_popular = tenant_petitions[tenant_petitions['reason'].isin(top_three_reasons)]

In [8]:
most_pop_by_year = three_most_popular.groupby(
    ["reason", "year"]
).petition_id.count().reset_index()

In [78]:
most_pop_by_year_chart = alt.Chart(most_pop_by_year).mark_line().encode(
    x=alt.X('year:O', title='Year'),
    y=alt.Y('petition_id', title='Amount of Complaints'),
    color=alt.Color('reason', title='Complaint Type'),
    strokeDash='reason',
    tooltip=['year', 'petition_id', 'reason']
).properties(
    title='Top Three Complaints Over Time',
    width=800,
    height=300
).interactive()

most_pop_by_year_chart.configure_legend(
    strokeColor='gray',
    fillColor='#FFFFFF',
    padding=10,
    cornerRadius=0,
)

#### This chart is helpful, but I think it will be even more helpful to see how these three complaints changed over time along with the trend of all complaints, which I showed earlier in the pop_years_chart. 

The cell below demonstrates how to create a layered chart in altair. I copied my code for the chart under question one (pop_years_chart) and named it 'base,' since it will be the background chart. Next, I copied my code for the chart immediately above this text (most_pop_by_year_chart) and named it 'second_layer,' since it will be in the foreground.

I combine the two charts using the alt.layer() function.

In [79]:
base = alt.Chart(pop_years).encode(
    x=alt.X('year:O', title='Year'),
    y=alt.Y('petition_id', title='Amount of Complaints')
)

second_layer = alt.Chart(most_pop_by_year).encode(
    x=alt.X('year:O', title='Year'),
    y=alt.Y('petition_id', title='Amount of Complaints'),
    color=alt.Color('reason', title='Complaint Type'),
    strokeDash='reason',
    tooltip=['year', 'petition_id', 'reason']
)

alt.layer(
    base.mark_bar().encode(),
    second_layer.mark_line().encode()
).properties(
    title='Top Three Complaints Over Time Versus All Complaints',
    width=800,
    height=300
).configure_bar(
    opacity=0.4,
    color='gold'
)


### Finding 2 (cont.):
* It's interesting that complaints alleging an unlawful rent increase and wrongful eviction have decreased since 2015, mirroring the decrease in *all complaints* since 2015, while complaints alleging a decrease in housing services remained steady. This raises a few questions that could guide further reporting on the trend: Could the stay-at-home orders be responsible, in that people were spending more time at home and noticed or demanded housing services? Did complaints about housing services consistently come from the same neighborhood? Are there any similarities between the buildings where complaints about a decrease in housing services came from? For example, do they share an owner?

### 3. From which neighborhoods did most complaints arise? And how does the number of complaints compare to the population of renters in each neighborhood?  

In this portion of the analysis, I'll group by neighborhoods and count the number of complaints. 

In [11]:
pop_neighborhoods = tenant_petitions.groupby('neighborhoods').petition_id.count().reset_index().sort_values('petition_id', ascending=False)

In [12]:
pop_neighborhoods.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 41 entries, 18 to 29
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   neighborhoods  41 non-null     object
 1   petition_id    41 non-null     int64 
dtypes: int64(1), object(1)
memory usage: 984.0+ bytes


In [13]:
alt.Chart(pop_neighborhoods).mark_bar().encode(
    x=alt.X('neighborhoods', sort='-y', title='Neighborhood'),
    y=alt.Y('petition_id', title='Amount of Complaints'),
    tooltip=[alt.Tooltip('petition_id', title='Amount of Complaints')]
).properties(
    title='Amount of Complaints by Neighborhood',
    width=800,
    height=300
).configure_bar(
    opacity=0.8,
    color='pink'
)

### Finding 3:
* The main takeaway from the above graph is that most tenant complaints since 2010 came from people living in the Mission. Tenderloin and Sunset/Parkside saw the second and third most complaints. 

#### The raw number of complaints per neighborhood is good, but it's not normalized. That means the above graph doesn't take into account how many people live in any given neighborhood, and even more specifically, how many people in that neighborhood rent, rather than own, their home. In the last bit of analysis that follows, I use additional data from the city of San Francisco and the American Community Survey five-year estimates to find a rate of complaints per 100 renters in each neighborhood in two different years: 2015 and 2019.

#### Broadly, these are the next steps:
1. Bring in 2010 census tracts assigned to SF neighborhoods
2. Use Census API to get renters per tract in 2015 and 2019
3. Merge dataframes from steps one and two, resulting in two new dataframes: merged_2015 and merged_2019
4. Add up renters by neighborhood in 2015 and 2019, and sum the complaints by neighborhood in 2015 and 2019 using my original dataframe
5. Merge the group-by objects in step 4, and find rate of complaints per 100 renters in the neighborhood
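
The steps above can be sketched end to end on toy data. Everything in this block (tract codes, neighborhood names, renter counts, petition IDs) is hypothetical; the real tables are built in the sections that follow.

```python
import pandas as pd

# Hypothetical toy inputs standing in for the real tables.
tracts = pd.DataFrame({
    "tract": ["000100", "000200", "000300"],
    "neighborhoods": ["A", "A", "B"],
})
renters = pd.DataFrame({
    "tract": ["000100", "000200", "000300"],
    "renter": [400, 600, 500],
})
complaints = pd.DataFrame({
    "neighborhoods": ["A", "A", "A", "B"],
    "petition_id": ["p1", "p2", "p3", "p4"],
})

# Steps 1-3: attach a neighborhood to every tract's renter estimate.
merged = tracts.merge(renters, on="tract")

# Step 4: total renters and total complaints per neighborhood.
renters_by_hood = merged.groupby("neighborhoods").renter.sum().reset_index()
complaints_by_hood = complaints.groupby("neighborhoods").petition_id.count().reset_index()

# Step 5: complaints per 100 renters.
rates = renters_by_hood.merge(complaints_by_hood, on="neighborhoods")
rates["rate"] = (rates["petition_id"] / rates["renter"] * 100).round(2)
print(rates)
```

In this toy case neighborhood A has more complaints than B in raw terms, and the rate column shows how that compares once the number of renters is taken into account.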

#### 1. Bring in 2010 census tracts assigned to SF neighborhoods

San Francisco has data categorizing 2010 census tracts by neighborhood: https://data.sfgov.org/Geographic-Locations-and-Boundaries/Analysis-Neighborhoods-2010-census-tracts-assigned/bwbp-wk3r

I pull that data in here using the relative path I defined at the beginning of this notebook. I drop columns that are not important to this analysis, including the shape information used to create a map. I retain and rename columns regarding the tract number and what neighborhood it corresponds to. 


In [53]:
tracts_file = os.path.join(processed_data, 'sf_tracts_neighborhood_EDITED.csv')
tracts_file = pd.read_csv(tracts_file, low_memory=False, dtype=str)

In [54]:
tracts_file = tracts_file.drop(columns=['the_geom', 
                                        'GEOID', 
                                        'SHAPE_AREA', 
                                        'SHAPE_LEN',
                                       'Unnamed: 6',
                                       'Unnamed: 7']
                              ).rename(columns={'NHOOD': 'neighborhoods',
                                                'TRACTCE10': 'tract'})

In [55]:
tracts_file.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 214 entries, 0 to 213
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   neighborhoods  196 non-null    object
 1   tract          196 non-null    object
dtypes: object(2)
memory usage: 3.5+ KB


In [56]:
tracts_file

Unnamed: 0,neighborhoods,tract
0,Bayview Hunters Point,61200.0
1,Bayview Hunters Point,23003.0
2,Bayview Hunters Point,23001.0
3,Bayview Hunters Point,23400.0
4,Bayview Hunters Point,23300.0
5,Bayview Hunters Point,23102.0
6,Bayview Hunters Point,23103.0
7,Bayview Hunters Point,23200.0
8,Bayview Hunters Point,61000.0
9,Bayview Hunters Point,980600.0


Here I notice that there are multiple rows that are N/A. I know from looking at the CSV that these are the rows dedicated to shape information, so I will drop them using their index.  

In [57]:
tracts_file.drop(tracts_file.index[196:214], axis=0, inplace=True)
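
Dropping by hard-coded positions works for this file, but it would silently drop the wrong rows if the CSV ever changed. As a sketch of one alternative, `dropna` with a `subset` removes rows that are missing the fields I care about, wherever they sit. The toy frame below is made up for illustration.

```python
import pandas as pd

# Toy stand-in for tracts_file: the last two rows are empty, like the
# trailing shape-only rows described above (values here are made up).
toy_tracts = pd.DataFrame({
    "neighborhoods": ["A", "B", None, None],
    "tract": ["000100", "000200", None, None],
})

# Drop any row that is missing the neighborhood or the tract.
cleaned = toy_tracts.dropna(subset=["neighborhoods", "tract"]).reset_index(drop=True)
print(len(cleaned))
```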

In [58]:
tracts_file

Unnamed: 0,neighborhoods,tract
0,Bayview Hunters Point,61200
1,Bayview Hunters Point,23003
2,Bayview Hunters Point,23001
3,Bayview Hunters Point,23400
4,Bayview Hunters Point,23300
5,Bayview Hunters Point,23102
6,Bayview Hunters Point,23103
7,Bayview Hunters Point,23200
8,Bayview Hunters Point,61000
9,Bayview Hunters Point,980600


#### 2. Use Census API to get renters per tract in 2015 and 2019

To work with the Census API, I need the requests library. 

In [19]:
import requests
from pprint import pprint

I'll create a function for the Census API call.

In [20]:
def get_ACS(year):
    dsource = 'acs'
    dname = 'acs5'
    base_url = f'https://api.census.gov/data/{year}/{dsource}/{dname}'
    chart = 'B25003'
    state = '06'
    county = '075'
    api_key = os.environ["CENSUS_API_KEY"]
    data_url = f'{base_url}?get=group({chart})&for=tract:*&in=state:{state}&in=county:{county}&key={api_key}'
    response=requests.get(data_url)
    json=response.json()
    return json

Ok, let's break down the above API call so that future Kavish or someone else using this script can figure out how to change the fields to get the data they need.

* The function takes a year as an argument.
* dsource is the Census product you need information from. I'm using the ACS but one could use this to specify that they want information from the decennial census, for example. 
* dname is the name of the survey I want: ACS five year estimates. 
* **The above forms the base url.**
* chart refers to the specific table I want. In this case, table B25003 is called 'Tenure' and lists the number of owner-occupied and renter-occupied units in the specified geography.
* state and county are self-explanatory. Here, 06 = California and 075 = San Francisco County.
* My api_key is stored on my computer. 
* **The base url plus the extra chart and geographical specifications forms the final url that tells the API exactly what data we want.**

Pro-tip: If you want to check out what kind of data you can expect to get before sending the API call, you can put the data_url minus the API key in your browser. For example: https://api.census.gov/data/2015/acs/acs5?get=group(B25003)&for=tract:*&in=state:06&in=county:075

Note: The url construction here will return all tracts in the specified county (hence 'for=tract:(asterisk)'). If you wanted a specific tract, or a completely different geography altogether, check out the examples of how to construct the url here: https://api.census.gov/data/2015/acs/acs5/examples.html
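
To make the URL pieces easier to experiment with, here is a small sketch that builds the key-free URL from the same parts. `build_acs_url` is a hypothetical helper name of my own, not part of any library, and the single-tract form is an assumption based on the `for=tract:` pattern above.

```python
def build_acs_url(year, table="B25003", state="06", county="075", tract="*"):
    """Return the ACS five-year URL for one table, without an API key.

    build_acs_url is a hypothetical helper, not part of any library.
    """
    base_url = f"https://api.census.gov/data/{year}/acs/acs5"
    return (
        f"{base_url}?get=group({table})"
        f"&for=tract:{tract}&in=state:{state}&in=county:{county}"
    )

# All tracts in San Francisco County for 2015 (same URL as the pro-tip above).
print(build_acs_url(2015))

# A single tract instead of all of them.
print(build_acs_url(2019, tract="980401"))
```

Pasting either printed URL into a browser is a quick way to preview the response before sending the keyed request.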

#### Alrighty, let's use the function to call data for 2015 and 2019. 

In [60]:
renters_per_tract_2015 = get_ACS(2015)
renters_2015 =pd.DataFrame(renters_per_tract_2015[1:], columns=renters_per_tract_2015[0] #first argument specified rows, second specifies columns
                         ).rename(columns={"GEO_ID":"geo_id",
                                           "B25003_001E":"total",
                                           "B25003_001M":"total_moe", #moe is margin of error
                                           "B25003_002E":"owner",
                                           "B25003_002M":"owner_moe",
                                           "B25003_003E":"renter",
                                          "B25003_003M":"renter_moe"} #these readable column names came from investigating the chart on data.census.gov
                                 ).drop(columns=['B25003_001EA', 
                                                 'B25003_001MA', 
                                                 'B25003_002EA',
                                                 'B25003_002MA',
                                                 'B25003_003EA',
                                                 'B25003_003MA',
                                                 'geo_id',]
                                       )

In [61]:
renters_2015

Unnamed: 0,total,total_moe,owner,owner_moe,renter,renter_moe,NAME,state,county,tract
0,1240,72,680,110,560,131,"Census Tract 260.04, San Francisco County, Cal...",6,75,26004
1,2096,108,593,126,1503,158,"Census Tract 301.01, San Francisco County, Cal...",6,75,30101
2,2689,112,1549,203,1140,212,"Census Tract 330, San Francisco County, Califo...",6,75,33000
3,1533,106,782,95,751,133,"Census Tract 254.03, San Francisco County, Cal...",6,75,25403
4,1062,57,723,82,339,93,"Census Tract 264.01, San Francisco County, Cal...",6,75,26401
5,2109,133,459,112,1650,144,"Census Tract 302.02, San Francisco County, Cal...",6,75,30202
6,2589,69,1777,189,812,185,"Census Tract 353, San Francisco County, Califo...",6,75,35300
7,2197,116,673,153,1524,173,"Census Tract 476, San Francisco County, Califo...",6,75,47600
8,1872,126,236,94,1636,134,"Census Tract 106, San Francisco County, Califo...",6,75,10600
9,1545,101,934,116,611,126,"Census Tract 610, San Francisco County, Califo...",6,75,61000


In [59]:
renters_per_tract_2019 = get_ACS(2019)
renters_2019=pd.DataFrame(renters_per_tract_2019[1:], columns=renters_per_tract_2019[0] #first argument specified rows, second specifies columns
                         ).rename(columns={"GEO_ID":"geo_id",
                                           "B25003_001E":"total",
                                           "B25003_001M":"total_moe", #moe is margin of error
                                           "B25003_002E":"owner",
                                           "B25003_002M":"owner_moe",
                                           "B25003_003E":"renter",
                                          "B25003_003M":"renter_moe"} #these readable column names came from investigating the chart on data.census.gov
                                 ).drop(columns=['B25003_001EA', 
                                                 'B25003_001MA', 
                                                 'B25003_002EA',
                                                 'B25003_002MA',
                                                 'B25003_003EA',
                                                 'B25003_003MA',
                                                 'geo_id',]
                                       )
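
The 2015 and 2019 cells above are identical apart from the year, so the cleanup could be wrapped in one function. This is a sketch only: `acs_json_to_frame` is a hypothetical name, and the payload below is made up to mimic the list-of-lists shape the API returns.

```python
import pandas as pd

# Readable names for the estimate (E) and margin-of-error (M) columns.
RENAMES = {
    "B25003_001E": "total", "B25003_001M": "total_moe",
    "B25003_002E": "owner", "B25003_002M": "owner_moe",
    "B25003_003E": "renter", "B25003_003M": "renter_moe",
}

def acs_json_to_frame(payload):
    """Turn the API's list-of-lists into a DataFrame with readable names.

    acs_json_to_frame is a hypothetical helper. The first inner list
    holds the column names and the remaining lists hold the rows.
    """
    frame = pd.DataFrame(payload[1:], columns=payload[0])
    # Annotation columns end in "EA" or "MA"; drop them along with GEO_ID.
    unwanted = [c for c in frame.columns if c.endswith(("EA", "MA")) or c == "GEO_ID"]
    return frame.drop(columns=unwanted).rename(columns=RENAMES)

# Made-up payload in the same shape as the API response.
fake_payload = [
    ["GEO_ID", "B25003_001E", "B25003_001EA", "B25003_003E", "B25003_003MA", "tract"],
    ["1400000US06075000000", "10", None, "4", None, "000000"],
]
print(acs_json_to_frame(fake_payload).columns.tolist())
```

With a helper like this, each year would reduce to one line, for example `renters_2015 = acs_json_to_frame(get_ACS(2015))`.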

#### 3. Merge dataframes from steps one and two, resulting in two new dataframes: merged_2015 and merged_2019

* merged_2015 = Census tracts in SF and 2015 estimates of renters
* merged_2019 = Census tracts in SF and 2019 estimates of renters

Before I can do the merge, I should see if there are any discrepancies between the data I got from SF Data (which is every tract in SF and its corresponding neighborhood) and the data I got from the API (which shows the number of renters in each tract).

In [25]:
renter_tracts = set(renters_2015['tract'].unique())
tract_file_tracts = set(tracts_file['tract'].unique())
renter_tracts - tract_file_tracts #figure out where those tracts went from sf data

{'980401', '990100'}

#### Ok so these two tracts are in the Census data but not in the SF data. 

##### I looked up the above census tracts, which are in the renter_tracts set but not in the tract_file_tracts set, on Census Reporter, and it looks like they have a population of 0. So I feel comfortable moving on with my analysis. These tracts will be dropped when I do the merge.
* 980401: https://censusreporter.org/profiles/14000US06075980401-census-tract-980401-san-francisco-ca/
* 990100: https://censusreporter.org/profiles/14000US06075990100-census-tract-9901-san-francisco-ca/
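
As a cross-check on the set difference, pandas can also label unmatched rows during the merge itself. This is a sketch on made-up toy tables, not the real data: an outer merge with `indicator=True` adds a `_merge` column that says which side each row came from.

```python
import pandas as pd

# Toy tables: tract "C" exists only in the census-style table (made-up values).
toy_sf = pd.DataFrame({"tract": ["A", "B"], "neighborhoods": ["X", "Y"]})
toy_census = pd.DataFrame({"tract": ["A", "B", "C"], "renter": ["10", "20", "0"]})

# An outer merge with indicator=True labels each row by where it was found.
check = toy_sf.merge(toy_census, on="tract", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["tract", "_merge"]])
```

Rows marked `right_only` here play the role of the two census tracts that have no neighborhood in the SF file; the default inner merge used below simply leaves them out.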

In [26]:
merged_2015 = pd.merge(tracts_file,renters_2015,on='tract')

In [70]:
merged_2015.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 196 entries, 0 to 195
Data columns (total 11 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   neighborhoods  196 non-null    object
 1   tract          196 non-null    object
 2   total          196 non-null    object
 3   total_moe      196 non-null    object
 4   owner          196 non-null    object
 5   owner_moe      196 non-null    object
 6   renter         196 non-null    int64 
 7   renter_moe     196 non-null    object
 8   NAME           196 non-null    object
 9   state          196 non-null    object
 10  county         196 non-null    object
dtypes: int64(1), object(10)
memory usage: 18.4+ KB


In [66]:
merged_2015

Unnamed: 0,neighborhoods,tract,total,total_moe,owner,owner_moe,renter,renter_moe,NAME,state,county
0,Bayview Hunters Point,61200,1166,74,465,90,701,89,"Census Tract 612, San Francisco County, Califo...",6,75
1,Bayview Hunters Point,23003,1191,92,922,130,269,111,"Census Tract 230.03, San Francisco County, Cal...",6,75
2,Bayview Hunters Point,23001,1335,78,919,107,416,99,"Census Tract 230.01, San Francisco County, Cal...",6,75
3,Bayview Hunters Point,23400,887,92,382,81,505,102,"Census Tract 234, San Francisco County, Califo...",6,75
4,Bayview Hunters Point,23300,905,86,590,70,315,93,"Census Tract 233, San Francisco County, Califo...",6,75
5,Bayview Hunters Point,23102,1547,82,520,85,1027,120,"Census Tract 231.02, San Francisco County, Cal...",6,75
6,Bayview Hunters Point,23103,1039,92,62,39,977,94,"Census Tract 231.03, San Francisco County, Cal...",6,75
7,Bayview Hunters Point,23200,1249,107,721,115,528,133,"Census Tract 232, San Francisco County, Califo...",6,75
8,Bayview Hunters Point,61000,1545,101,934,116,611,126,"Census Tract 610, San Francisco County, Califo...",6,75
9,Bayview Hunters Point,980600,164,40,103,30,61,34,"Census Tract 9806, San Francisco County, Calif...",6,75


I'll want to convert the renter column to an integer, because I plan on adding the values up by neighborhood.

In [68]:
merged_2015['renter'] = merged_2015.renter.astype(int)
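
`astype(int)` raises an error if any value is not a clean integer string. If that ever happened, one way to find the offending rows is `pd.to_numeric` with `errors="coerce"`. This is a sketch on a toy column; the non-numeric entry is an assumption for illustration, not something I saw in the API response.

```python
import pandas as pd

# Toy column of string counts; the last entry is a made-up bad value.
toy = pd.DataFrame({"renter": ["560", "1503", "not available"]})

# errors="coerce" turns anything non-numeric into NaN instead of raising.
toy["renter_num"] = pd.to_numeric(toy["renter"], errors="coerce")
print(toy["renter_num"].isna().sum(), "value(s) could not be converted")
```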

#### Gonna do the same process above for 2019 data.

In [71]:
renter_tracts = set(renters_2019['tract'].unique())
tract_file_tracts = set(tracts_file['tract'].unique())
renter_tracts - tract_file_tracts #figure out where those tracts went from sf data

{'980401', '990100'}

#### Same deal for 2019. Will proceed with the merge. 

In [30]:
merged_2019 = pd.merge(tracts_file,renters_2019,on='tract')

In [31]:
merged_2019['renter'] = merged_2019.renter.astype(int)

#### 4. Add up renters by neighborhood in 2015 and 2019, and sum the complaints by neighborhood in 2015 and 2019 using my original dataframe

#### Merged groupbys:
* renters_2015_neighborhood = Sum of renters grouped by neighborhood in 2015
* renters_2019_neighborhood = Sum of renters grouped by neighborhood in 2019

#### Tenant_petitions groupbys:
* grouped_complaints_in_2015 = Sum of complaints grouped by neighborhood in 2015
* grouped_complaints_in_2019 = Sum of complaints grouped by neighborhood in 2019

In [72]:
renters_2015_neighborhood = merged_2015.groupby('neighborhoods').renter.sum().reset_index()

In [73]:
renters_2015_neighborhood

Unnamed: 0,neighborhoods,renter
0,Bayview Hunters Point,5672
1,Bernal Heights,4020
2,Castro/Upper Market,6539
3,Chinatown,6204
4,Excelsior,4135
5,Financial District/South Beach,6640
6,Glen Park,1402
7,Golden Gate Park,65
8,Haight Ashbury,5891
9,Hayes Valley,7239


In [74]:
mask_2015_complaints = (tenant_petitions['year'] == 2015)
complaints_in_2015 = tenant_petitions.loc[mask_2015_complaints]

In [75]:
grouped_complaints_in_2015 = complaints_in_2015.groupby('neighborhoods').petition_id.count().reset_index()

In [76]:
grouped_complaints_in_2015

Unnamed: 0,neighborhoods,petition_id
0,Bayview Hunters Point,64
1,Bernal Heights,54
2,Castro/Upper Market,39
3,Chinatown,157
4,Excelsior,82
5,Financial District/South Beach,29
6,Glen Park,7
7,Golden Gate Park,6
8,Haight Ashbury,46
9,Hayes Valley,62


In [34]:
renters_2019_neighborhood = merged_2019.groupby('neighborhoods').renter.sum().reset_index()

In [38]:
mask_2019_complaints = (tenant_petitions['year'] == 2019)
complaints_in_2019 = tenant_petitions.loc[mask_2019_complaints]

In [39]:
grouped_complaints_in_2019 = complaints_in_2019.groupby('neighborhoods').petition_id.count().reset_index()

#### 5. Merge the groupby objects in step 4, and find rate of complaints per 100 renters in the neighborhood
* **complaints_by_renters_2015** = renters_2015_neighborhood, which has the amount of renters per neighborhood in **2015**, and grouped_complaints_in_2015, which has the amount of complaints by neighborhood in **2015**
* **complaints_by_renters_2019** = renters_2019_neighborhood, which has the amount of renters per neighborhood in **2019**, and grouped_complaints_in_2019, which has the amount of complaints by neighborhood in **2019**

#### For each merged dataframe, I will create a column to calculate the rate of complaints for every 100 renters. I chose a rate of per 100 because some neighborhoods have fewer than 1,000 renters, according to the ACS data.
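
As a quick worked example of the rate, here is the same arithmetic in plain Python, using two pairs of numbers from the 2015 renter and complaint tables above. `complaints_per_100` is just an illustrative helper name.

```python
def complaints_per_100(complaints, renters):
    """Complaints for every 100 renters, rounded to two decimals."""
    return round(complaints / renters * 100, 2)

# Figures taken from the 2015 tables earlier in this notebook.
print(complaints_per_100(157, 6204))  # Chinatown
print(complaints_per_100(6, 65))      # Golden Gate Park
```

The second call shows how a very small denominator can produce a large rate from only a handful of complaints.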

In [40]:
complaints_by_renters_2015 = pd.merge(renters_2015_neighborhood, grouped_complaints_in_2015, on='neighborhoods')

In [41]:
complaints_by_renters_2015['rate'] = ((complaints_by_renters_2015['petition_id']/complaints_by_renters_2015['renter'])*100).round(2)

In [42]:
complaints_by_renters_2015

Unnamed: 0,neighborhoods,renter,petition_id,rate
0,Bayview Hunters Point,5672,64,1.13
1,Bernal Heights,4020,54,1.34
2,Castro/Upper Market,6539,39,0.6
3,Chinatown,6204,157,2.53
4,Excelsior,4135,82,1.98
5,Financial District/South Beach,6640,29,0.44
6,Glen Park,1402,7,0.5
7,Golden Gate Park,65,6,9.23
8,Haight Ashbury,5891,46,0.78
9,Hayes Valley,7239,62,0.86


In [43]:
complaints_by_renters_2019 = pd.merge(renters_2019_neighborhood, grouped_complaints_in_2019, on='neighborhoods')
complaints_by_renters_2019['rate'] = ((complaints_by_renters_2019['petition_id']/complaints_by_renters_2019['renter'])*100).round(2)

In [44]:
complaints_by_renters_2019

Unnamed: 0,neighborhoods,renter,petition_id,rate
0,Bayview Hunters Point,5849,33,0.56
1,Bernal Heights,3888,38,0.98
2,Castro/Upper Market,6901,78,1.13
3,Chinatown,6486,59,0.91
4,Excelsior,3587,40,1.12
5,Financial District/South Beach,7656,9,0.12
6,Glen Park,1201,2,0.17
7,Golden Gate Park,48,2,4.17
8,Haight Ashbury,5562,59,1.06
9,Hayes Valley,7696,60,0.78


In [77]:
alt.Chart(complaints_by_renters_2015).mark_circle(size=60).encode(
    x=alt.X('petition_id',title='Amount of Complaints'),
    y=alt.Y('rate',title='Complaints for Every 100 Renters'),
    tooltip=['neighborhoods', 'rate', 'petition_id']
).properties(
    title='Complaints by Neighborhood in 2015',
    width=800,
    height=300
).interactive() 

In [46]:
alt.Chart(complaints_by_renters_2019).mark_circle(size=60).encode(
    x=alt.X('petition_id',title='Amount of Complaints'),
    y=alt.Y('rate',title='Complaints for Every 100 Renters'),
    tooltip=['neighborhoods', 'rate', 'petition_id']
).properties(
    title='Amount of Complaints by Neighborhood in 2019',
    width=800,
    height=300
).interactive()

### Finding 3 (cont.):
* In 2015, Golden Gate Park had the most complaints per 100 renters. However, the number of complaints from that neighborhood in that year was only 6. What's more interesting is that Chinatown, which had the third highest number of complaints in 2015, had a rate of 2.5 complaints per 100 renters. The rate of complaints per 100 renters in Chinatown *decreased* in 2019 to less than one (but keep in mind that the number of complaints in 2019 was lower than in 2015).
* The main finding here is that none of the neighborhoods with a high number of complaints stick out. They all have relatively low rates of complaints per 100 renters. 

## Conclusion 😅

**Findings**

Since 2010, tenants filed the most complaints in 2014 and 2015. This correlates with an increase in rent that the city hadn't seen before. Complaints overall decreased to their lowest levels in 2020 and 2021, coinciding with the pandemic and increased housing protections. However, complaints alleging that the landlord decreased housing services stayed relatively stable, while the number of all complaints decreased over time. This could be related to the fact that more people were home and perhaps noticed or demanded more housing services. Overall, most of the tenant complaints since 2010 came from the Mission, but when normalized for the number of renters there in 2015 and 2019, Mission didn't stand out. 

**Further analysis/reporting**

I'd be interested in seeing complaints/number of renters for each year in this interval. Is there a year where a neighborhood with a high number of complaints stood out for its rate? Are the complaints all similar? Do they all come from a specific building? 

Why did the number of housing services complaints stay steady when the overall number of complaints decreased? How did the Rent Board handle these complaints, and what was the outcome? Did tenants win lower rents for what they saw as decreased housing services? I'd be especially interested in focusing on housing service complaints that occurred during the pandemic, since, as mentioned before, people were home for longer periods of time.

Are the trends similar for complaints from landlords? Do landlord complaints rise in 2015 and decrease after? 