#Analysing Eviction Notices in San Francisco  

This notebook is my final assignment for Stanford University's Programming in Journalism class. The course repository can be found [here](https://github.com/stanfordjournalism/stanford-progj-2020). 
For this task, I will analyse data on eviction notices filed in San Francisco between 2010 and 2020, using the following tools:
* [Jupyter Notebook](https://jupyter.org/)
* [Pandas](https://pandas.pydata.org/)

TO DO----
A narrative summary at top briefly describing the story idea and highlighting important background information about the data. This should include a description of the key fields used in your analysis, along with any significant data anomalies or issues and an explanation of how you worked around these issues.

At least 2 "Findings". These should be summarized in narrative -- e.g. "Arrests were highest in September" -- and demonstrated clearly using code and/or visualization.


The first step is to import the libraries used in this notebook: pandas, NumPy, and sodapy, a Python client for the Socrata API.

In [2]:
import pandas as pd
import numpy as np
from sodapy import Socrata

#Loading the data
The eviction notice data is taken from [DataSF](https://datasf.org/), as required by the assignment (details can be found [here](https://github.com/stanfordjournalism/stanford-progj-2020/blob/master/projects/sf_data_analysis.md)).

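DataSF serves its datasets through the Socrata Open Data API. The sodapy client queries that API from Python and returns the rows as a list of dictionaries, which pandas can turn into a dataframe.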

In [5]:
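# Unauthenticated client: None means no app token is supplied
client = Socrata("data.sfgov.org", None)
# "5cei-gny5" is the dataset ID for eviction notices; fetch up to 17,600 rows
results = client.get("5cei-gny5", limit=17600)
df = pd.DataFrame.from_records(results)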



#Getting to know the data


In [7]:
df.head()

Unnamed: 0,eviction_id,address,city,state,zip,file_date,non_payment,breach,nuisance,illegal_use,...,:@computed_region_p5aj_wyqh,:@computed_region_rxqg_mtj9,:@computed_region_yftq_j783,:@computed_region_bh8s_q3mv,:@computed_region_6pnf_4xz7,:@computed_region_9jxd_iqea,:@computed_region_6ezc_tdp2,:@computed_region_h4ep_8xdi,:@computed_region_pigm_ib2e,constraints_date
0,M200680,900 Block Of Divisadero Street,San Francisco,CA,94115,2020-05-06T00:00:00.000,False,False,True,False,...,5,11,15,29490,1,,,,,
1,M200681,1100 Block Of Mission Street,San Francisco,CA,94103,2020-05-06T00:00:00.000,False,True,False,False,...,2,9,8,28853,2,7.0,1.0,1.0,6.0,
2,M200682,300 Block Of Executive Park Boulevard,San Francisco,CA,94134,2020-05-06T00:00:00.000,False,False,True,False,...,3,8,10,58,1,,,,,
3,M200679,500 Block Of Jessie Street,San Francisco,CA,94103,2020-05-06T00:00:00.000,False,False,True,False,...,2,9,14,28853,2,7.0,1.0,1.0,,
4,M200678,900 Block Of Sutter Street,San Francisco,CA,94109,2020-05-06T00:00:00.000,False,False,True,False,...,1,10,13,28858,2,,,,,


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17600 entries, 0 to 17599
Data columns (total 43 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   eviction_id                  17600 non-null  object
 1   address                      17600 non-null  object
 2   city                         17600 non-null  object
 3   state                        17599 non-null  object
 4   zip                          17598 non-null  object
 5   file_date                    17600 non-null  object
 6   non_payment                  17600 non-null  bool  
 7   breach                       17600 non-null  bool  
 8   nuisance                     17600 non-null  bool  
 9   illegal_use                  17600 non-null  bool  
 10  failure_to_sign_renewal      17600 non-null  bool  
 11  access_denial                17600 non-null  bool  
 12  unapproved_subtenant         17600 non-null  bool  
 13  owner_move_in                17
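
The `Non-Null Count` column above shows that `state` and `zip` each have a few missing values. Here is a small sketch of how such gaps can be counted directly, using a made-up three-row frame rather than the real data (the `sample` frame and its values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical miniature of the eviction table, with gaps in state and zip.
sample = pd.DataFrame(
    {
        "eviction_id": ["A1", "A2", "A3"],
        "state": ["CA", None, "CA"],
        "zip": ["94115", None, None],
    }
)

# isna() marks missing cells; summing each column counts them.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Rows with at least one missing field, for closer inspection.
incomplete_rows = sample[sample.isna().any(axis=1)]
print(incomplete_rows)
```

The same two expressions could be run on `df` to see which notices lack a state or zip code before deciding whether to drop or keep them.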

We can convert file_date to a datetime type and then apply a boolean mask to the dataframe so that only notices filed between 1 January 2010 and 1 January 2020 are kept.

In [21]:
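# Parse the ISO-formatted strings into datetime64 values
df['file_date'] = pd.to_datetime(df['file_date'])
start_date = '2010-01-01'
end_date = '2020-01-01'
# Keep notices filed from start_date through end_date, both dates included
mask = (df['file_date'] >= start_date) & (df['file_date'] <= end_date)
df = df.loc[mask]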
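
Whether a notice filed exactly on a boundary date survives the filter depends on the comparison operators. The sketch below, built on a made-up four-row frame (the `toy` frame and its dates are assumptions for illustration), contrasts a strict `>` comparison on the start date with `Series.between`, which includes both endpoints by default:

```python
import pandas as pd

# Hypothetical filing dates, two of which fall exactly on the boundaries.
toy = pd.DataFrame(
    {"file_date": ["2009-12-31", "2010-01-01", "2015-06-15", "2020-01-01"]}
)
toy["file_date"] = pd.to_datetime(toy["file_date"])

start_date = pd.Timestamp("2010-01-01")
end_date = pd.Timestamp("2020-01-01")

# Strict ">" on the left drops a notice filed on the start date itself.
strict_mask = (toy["file_date"] > start_date) & (toy["file_date"] <= end_date)

# between() keeps both endpoints by default.
inclusive_mask = toy["file_date"].between(start_date, end_date)

print(strict_mask.sum(), inclusive_mask.sum())
```

On this toy data the strict mask keeps two rows and the inclusive one keeps three, so the choice matters whenever filings land on the first day of the window.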