# Los Angeles city guide for young thrifty professionals using Foursquare API

### A description of the problem and a discussion of the background. (15 marks)
> Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem. This submission will eventually become your Introduction/Business Problem section in your final report. So I recommend that you push the report (having your Introduction/Business Problem section only for now) to your Github repository and submit a link to it.

Whether you're a seasoned Angeleno, a recent transplant, or just someone contemplating a move to the City of Angels, searching for a young, fun, and affordable neighborhood in Los Angeles can be a daunting task. With 16 regions and 272 neighborhoods across Los Angeles, how do you figure out which one best fits your personality and priorities? To better understand the different neighborhoods and help you make an informed choice, in this exploratory data analysis I will leverage the power of data to cluster the neighborhoods of LA. Hopefully this can help young professionals find a comfortable and fun neighborhood to live in without spending a fortune. I can think of three main criteria for evaluating neighborhoods:

1. __Housing price__: Where you live must be affordable. If most of your hard-earned money goes to rent, little disposable income will be left for all the other fun stuff. A commonly recommended ceiling is to spend about 30% of your after-tax paycheck on rent.
2. __Safety__: Nobody wants to live somewhere that, although very affordable, has unspoken curfews at night because the neighborhood isn't safe. On the other hand, maybe it's not necessary to live in an expensive gated community with 24-hour security while having to shell out most of your paycheck just so you feel safe.
3. __Entertainment__: What do young and restless minds do when they are not working? They are out enjoying life! An ideal neighborhood should offer a wide array of affordable, highly rated entertainment options.
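The ~30% guideline in the first criterion is easy to turn into a quick affordability check. This is a minimal sketch; the function names and the $4,000 monthly take-home figure are hypothetical, chosen only to illustrate the arithmetic.

```python
def max_affordable_rent(monthly_take_home: float, ratio: float = 0.30) -> float:
    """Return the highest monthly rent that stays within `ratio` of take-home pay."""
    if monthly_take_home < 0:
        raise ValueError("take-home pay must be non-negative")
    return round(monthly_take_home * ratio, 2)


def is_affordable(rent: float, monthly_take_home: float, ratio: float = 0.30) -> bool:
    """True if `rent` is at or below the rule-of-thumb ceiling."""
    return rent <= max_affordable_rent(monthly_take_home, ratio)


# Hypothetical example: $4,000 take-home per month gives a rent ceiling of $1,200.
print(max_affordable_rent(4000))   # 1200.0
print(is_affordable(1500, 4000))   # False
```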

Just as with anything in life, there is no perfect choice; however, when we strike a balance among the things we consider important, the best option usually emerges. When it comes to finding a neighborhood to live in, I believe safety, housing prices, and entertainment options are three critical criteria that a good neighborhood should balance well. With that in mind, let's move on to the data requirements!
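The balance described above is what the clustering step is meant to capture. As a minimal sketch, assuming k-means (one common choice; no specific algorithm is fixed here) and one row per neighborhood with hypothetical columns `median_rent`, `arrests`, and `venues`, scikit-learn can group neighborhoods that make similar trade-offs. The numbers below are made up for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_neighborhoods(features: pd.DataFrame, n_clusters: int = 3, seed: int = 0) -> pd.Series:
    """Assign a k-means cluster label to each neighborhood (one row of `features`).

    Columns are assumed to be numeric per-neighborhood features, e.g. a rent
    level, an arrest count, and a venue count (hypothetical names).
    """
    # Standardize so that rent in dollars does not dominate the count features.
    scaled = StandardScaler().fit_transform(features.values)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(scaled)
    return pd.Series(labels, index=features.index, name="cluster")


# Made-up numbers, for illustration only.
toy = pd.DataFrame(
    {
        "median_rent": [1500, 1550, 3200, 3300],
        "arrests": [900, 950, 200, 180],
        "venues": [40, 45, 80, 85],
    },
    index=["Neighborhood A", "Neighborhood B", "Neighborhood C", "Neighborhood D"],
)
print(cluster_neighborhoods(toy, n_clusters=2))
```

Standardizing first keeps each criterion on a comparable scale; the number of clusters is a tuning choice rather than something the data dictates.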


### A description of the data and how it will be used to solve the problem. (15 marks)

> Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data. This submission will eventually become your Data section in your final report. So I recommend that you push the report (having your Data section) to your Github repository and submit a link to it.

The criteria and data sources that will be used to cluster the neighborhoods are:
1. Crime rate, approximated by arrest records ([Data.gov](https://catalog.data.gov/dataset?organization=city-of-los-angeles))
2. Housing prices ([Zillow API](https://www.zillow.com/howto/api/APIOverview.htm))
3. Venue information ([Foursquare API](https://developer.foursquare.com/docs/api/endpoints))

#### Data.gov
Data.gov catalogs the City of Los Angeles's [Arrest Data from 2010 to Present](https://catalog.data.gov/dataset/arrest-data-from-2010-to-present) dataset, which is hosted on the city's open-data portal (data.lacity.org). Here's its description:
>This dataset reflects arrest incidents in the City of Los Angeles dating back to 2010. This data is transcribed from original arrest reports that are typed on paper and therefore there may be some inaccuracies within the data. Some location fields with missing data are noted as (0.0000°, 0.0000°). Address fields are only provided to the nearest hundred block in order to maintain privacy. This data is as accurate as the data in the database. Please note questions or concerns in the comments.

This can be used to evaluate the crime history of each neighborhood. The dataset contains the following features:
`Report ID`, `Arrest Date`, `Time`, `Area ID`, `Area Name`, `Reporting District`, `Age`, `Sex Code`, `Descent Code`, `Charge Group Code`, `Charge Group Description`, `Arrest Type Code`, `Charge`, `Charge Description`, `Address`, `Cross Street`, and `Location`.
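To evaluate crime history from these columns, a first step is to summarize arrests by area and charge group. A minimal sketch; `summarize_arrests` is a hypothetical helper, the toy rows are made up, and note that `Area Name` is an LAPD reporting area (coarser than a neighborhood) and the counts are raw totals rather than per-capita rates.

```python
import pandas as pd


def summarize_arrests(arrests: pd.DataFrame) -> pd.DataFrame:
    """Per-area arrest counts broken down by 'Charge Group Description', plus a total.

    Rows with a missing area or charge group are excluded by pd.crosstab.
    """
    table = pd.crosstab(arrests["Area Name"], arrests["Charge Group Description"])
    table["Total"] = table.sum(axis=1)
    return table.sort_values("Total", ascending=False)


# Made-up rows, for illustration only; in practice pass the DataFrame read from crime_data.csv.
toy = pd.DataFrame(
    {
        "Area Name": ["Pacific", "Pacific", "Hollywood"],
        "Charge Group Description": ["Larceny", "Liquor Laws", "Larceny"],
    }
)
print(summarize_arrests(toy))
```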

#### Zillow API
The [Zillow API](https://www.zillow.com/howto/api/APIOverview.htm) calls of interest, `GetRegionChildren` and `GetRegionChart`, provide neighborhood-level data, including:
* Neighborhood and city affordability statistics: Zillow Home Value Index, Zestimate distribution, median single-family home and condo values, average tax rates, and percentage of flips
* Demographic data at the city and neighborhood level
* Lists of counties, cities, ZIP codes, and neighborhoods, along with latitude and longitude data for these areas so they can be plotted on a map

Also, the [Python-Zillow](https://github.com/seme0021/python-zillow) library, a Python wrapper around the Zillow API, will be used to fetch data from Zillow.
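`GetRegionChildren` returns the child regions (here, neighborhoods) of a city. Below is a library-free sketch of building that request and reading its XML reply. The query parameters follow Zillow's legacy documentation as best we recall; the response element names are assumptions to verify against a real reply; the legacy endpoint may no longer be available; and the sample XML is made up.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional
from urllib.parse import urlencode

# Legacy Zillow endpoint; it may no longer respond, so treat this as illustrative.
GET_REGION_CHILDREN = "http://www.zillow.com/webservice/GetRegionChildren.htm"


def region_children_url(zws_id: str, state: str = "ca", city: str = "los angeles",
                        childtype: str = "neighborhood") -> str:
    """Build a GetRegionChildren request URL; `zws_id` is your Zillow API key."""
    params = {"zws-id": zws_id, "state": state, "city": city, "childtype": childtype}
    return f"{GET_REGION_CHILDREN}?{urlencode(params)}"


def _to_float(text: Optional[str]) -> Optional[float]:
    """Convert element text to float, or None when the element is missing or empty."""
    return float(text) if text else None


def parse_region_children(xml_text: str) -> List[Dict[str, object]]:
    """Return name, home value index and coordinates for each child region.

    The element names used here (list/region/name/zindex/latitude/longitude)
    are an assumption about the response layout.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for region in root.findall(".//list/region"):
        rows.append({
            "name": region.findtext("name"),
            "zindex": _to_float(region.findtext("zindex")),
            "latitude": _to_float(region.findtext("latitude")),
            "longitude": _to_float(region.findtext("longitude")),
        })
    return rows


# Made-up response fragment, for illustration only.
sample = """<regionchildren><response><list>
  <region><id>1</id><name>Example Heights</name><zindex currency="USD">650000</zindex>
    <latitude>34.05</latitude><longitude>-118.25</longitude></region>
</list></response></regionchildren>"""

print(region_children_url("YOUR-ZWSID"))
print(parse_region_children(sample))
```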
#### Foursquare API
The Foursquare API provides endpoints for fetching the following attributes of venues in LA:
* Price
* Like count
* Rating
* Category
* Postal code

Together, these attributes can be used to judge how vibrant a neighborhood is and whether it would be a fun place to live.
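A minimal sketch of requesting one venue's details and pulling out the attributes listed above. It assumes the Foursquare v2 venue-details endpoint and its JSON layout (`price.tier`, `likes.count`, `rating`, `categories`, `location.postalCode`); newer accounts may need the newer Places API instead. The version date, credentials, venue ID, and sample payload are hypothetical placeholders.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode

# Foursquare v2 venue-details endpoint (assumed; may be superseded for new accounts).
VENUE_DETAILS = "https://api.foursquare.com/v2/venues/{venue_id}"


def venue_details_url(venue_id: str, client_id: str, client_secret: str,
                      version: str = "20190410") -> str:
    """Build a v2 venue-details request URL; `version` is a YYYYMMDD date string."""
    params = {"client_id": client_id, "client_secret": client_secret, "v": version}
    return VENUE_DETAILS.format(venue_id=venue_id) + "?" + urlencode(params)


def venue_features(payload: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Extract price tier, like count, rating, category and postal code from a response.

    Missing fields come back as None, since many venues lack a price or rating.
    """
    venue = payload.get("response", {}).get("venue", {})
    categories = venue.get("categories") or [{}]
    return {
        "name": venue.get("name"),
        "price_tier": venue.get("price", {}).get("tier"),
        "likes": venue.get("likes", {}).get("count"),
        "rating": venue.get("rating"),
        "category": categories[0].get("name"),
        "postal_code": venue.get("location", {}).get("postalCode"),
    }


# Made-up payload, for illustration only.
sample = {"response": {"venue": {
    "name": "Example Taqueria",
    "price": {"tier": 1},
    "likes": {"count": 120},
    "rating": 8.4,
    "categories": [{"name": "Taco Place"}],
    "location": {"postalCode": "90012"},
}}}

print(venue_details_url("VENUE_ID", "CLIENT_ID", "CLIENT_SECRET"))
print(venue_features(sample))
```

Aggregating these per-venue rows by postal code (or by neighborhood) gives the entertainment features used for clustering.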


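```python
# Download the LA arrest data (2010 to present) from the City of Los Angeles open-data portal.
# `!wget` is IPython shell syntax, so this cell is meant to run inside Jupyter.
!wget -O 'crime_data.csv' 'https://data.lacity.org/api/views/yru6-6re4/rows.csv?accessType=DOWNLOAD'
print('Crime data downloaded!')
```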

```text
--2019-04-10 15:35:30--  https://data.lacity.org/api/views/yru6-6re4/rows.csv?accessType=DOWNLOAD
Resolving data.lacity.org (data.lacity.org)... 52.206.140.205, 52.206.68.26, 52.206.140.199
Connecting to data.lacity.org (data.lacity.org)|52.206.140.205|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
Saving to: ‘crime_data.csv’

    [   <=>                                 ] 211,367,554 3.11MB/s   in 65s    

Last-modified header invalid -- time-stamp ignored.
2019-04-10 15:36:35 (3.12 MB/s) - ‘crime_data.csv’ saved [211367554]

Crime data downloaded!
```


```python
import pandas as pd

# Load the downloaded arrest records and preview the first five rows.
df = pd.read_csv('crime_data.csv')
df.head()
```


| | Report ID | Arrest Date | Time | Area ID | Area Name | Reporting District | Age | Sex Code | Descent Code | Charge Group Code | Charge Group Description | Arrest Type Code | Charge | Charge Description | Address | Cross Street | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 111421932 | 07/02/2016 | 1845.0 | 14 | Pacific | 1411 | 50 | M | H | 24.0 | Miscellaneous Other Violations | M | LAMC | LOS ANGELES MUNICIPAL CODE | BROOKS | OCEAN FRONT | (33.9918, -118.4791) |
| 1 | 121801235 | 12/18/2016 | 940.0 | 18 | Southeast | 1822 | 38 | F | B | 24.0 | Miscellaneous Other Violations | M | LAMC | LOS ANGELES MUNICIPAL CODE | 300 W CENTURY BL | | (33.9456, -118.2784) |
| 2 | 150604240 | 01/03/2016 | 1315.0 | 6 | Hollywood | 669 | 43 | M | O | 24.0 | Miscellaneous Other Violations | M | 71.02LAMC | HIRE VEH W/O LIC | HOBART | SANTA MONICA | (34.0908, -118.3046) |
| 3 | 150704165 | 04/27/2016 | 2230.0 | 7 | Wilshire | 721 | 20 | F | B | 6.0 | Larceny | M | 484(A)PC | GRAND THEFT (OVER $400) | 8500 BEVERLY BL | | (34.0761, -118.3766) |
| 4 | 151405507 | 01/22/2016 | 1309.0 | 14 | Pacific | 1427 | 60 | M | W | 17.0 | Liquor Laws | M | 25620BP | OPEN ALCOHOLIC BEV IN PUBLIC PARK/PLACE | VENICE | MOTOR | (34.0237, -118.4246) |
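The `Location` column stores coordinates as text, and the dataset description notes that missing locations are recorded as zeros. Numeric coordinates are needed to place each arrest inside a neighborhood, so here is a minimal sketch that parses the strings and drops the placeholders. The helper names are hypothetical and the toy rows are made up.

```python
import re
from typing import Optional, Tuple

import pandas as pd

# Matches strings such as "(33.9918, -118.4791)".
_LOCATION = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_location(value: object) -> Tuple[Optional[float], Optional[float]]:
    """Turn a 'Location' string into (lat, lon); (0, 0) and unparseable values become (None, None)."""
    match = _LOCATION.match(str(value))
    if not match:
        return None, None
    lat, lon = float(match.group(1)), float(match.group(2))
    if lat == 0.0 and lon == 0.0:  # the dataset's placeholder for a missing location
        return None, None
    return lat, lon


def add_coordinates(arrests: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric 'lat'/'lon' columns, dropping rows without a usable location."""
    out = arrests.copy()
    coords = out["Location"].map(parse_location)
    out["lat"] = pd.to_numeric(coords.map(lambda c: c[0]), errors="coerce")
    out["lon"] = pd.to_numeric(coords.map(lambda c: c[1]), errors="coerce")
    return out.dropna(subset=["lat", "lon"])


# Made-up rows, for illustration only.
toy = pd.DataFrame({"Location": ["(33.9918, -118.4791)", "(0.0, 0.0)", None]})
cleaned = add_coordinates(toy)
print(cleaned)
```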


```python
# List the column names of the arrest data.
df.columns
```

```text
Index(['Report ID', 'Arrest Date', 'Time', 'Area ID', 'Area Name',
       'Reporting District', 'Age', 'Sex Code', 'Descent Code',
       'Charge Group Code', 'Charge Group Description', 'Arrest Type Code',
       'Charge', 'Charge Description', 'Address', 'Cross Street', 'Location'],
      dtype='object')
```
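The `Arrest Date` and `Time` columns need conversion before they support questions such as how recent or how late in the day arrests occur. This is a minimal sketch, assuming `Time` is a 24-hour HHMM value stored as a float. That reading is inferred from sample rows such as `1845.0` and is not documented. The helper name and toy rows are hypothetical.

```python
import pandas as pd


def add_time_features(arrests: pd.DataFrame) -> pd.DataFrame:
    """Add 'year' and 'hour' columns derived from 'Arrest Date' and 'Time'."""
    out = arrests.copy()
    # 'Arrest Date' is stored as MM/DD/YYYY text.
    out["Arrest Date"] = pd.to_datetime(out["Arrest Date"], format="%m/%d/%Y")
    out["year"] = out["Arrest Date"].dt.year
    # Assumed HHMM encoding: floor-divide by 100 to get the hour (missing times stay NaN).
    out["hour"] = out["Time"] // 100
    return out


# Made-up rows, for illustration only.
toy = pd.DataFrame(
    {
        "Arrest Date": ["07/02/2016", "12/18/2016"],
        "Time": [1845.0, 940.0],
        "Area Name": ["Pacific", "Southeast"],
    }
)
features = add_time_features(toy)
print(features[["Area Name", "year", "hour"]])
```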