##### Justin Valdez 

# The Battle of Neighborhoods 

## Week 1

❏ A description of the problem and a discussion of the background. 

❏ A description of the data and how it will be used to solve the problem.

## 1. Introduction Section:

### Discussion of the business problem and the audience who would be interested in this project.

#### 1.1 Scenario and Background:
A contractor named John is moving to Los Angeles, the largest city in California, and plans to open a restaurant there. He needs recommendations on where to open it. For readers unfamiliar with LA: it is a very large city, with a population of almost 4 million people at a density of over 8,000 people per square mile. Finding a good spot for John will not be easy, but I will explore ways to make sure the location I recommend is grounded in data and likely to pay off.

#### 1.2 Problem to be resolved:
The challenge is to find a location in Los Angeles, California, that is both available to John and easy for customers to visit. To set a basis for comparing candidate locations, I want to find a spot that satisfies the following conditions:
* Visibility and accessibility
* The competition of the area
* Within walking distance (≤ 1.0 mile, or 1.6 km) of a Metro Rail station in LA
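The walking-distance criterion above comes down to a great-circle distance check. The sketch below uses the haversine formula from Python's standard library instead of `geopy.distance`, so it runs without extra packages. The function names and the station coordinates in the usage lines are hypothetical; the 1.6 km cutoff is the 1.0-mile limit stated above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
WALKING_LIMIT_KM = 1.6    # ~1.0 mile, the walking-distance criterion


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(site, stations):
    """Return (station_name, distance_km) for the station closest to site.

    site: (lat, lon); stations: non-empty dict mapping name -> (lat, lon).
    """
    return min(
        ((name, haversine_km(*site, *coords)) for name, coords in stations.items()),
        key=lambda pair: pair[1],
    )


def is_walkable(site, stations, limit_km=WALKING_LIMIT_KM):
    """True if at least one station lies within limit_km of site."""
    _, distance = nearest_station(site, stations)
    return distance <= limit_km


# Usage with made-up station coordinates (hypothetical, for illustration only)
stations = {"Station A": (34.2300, -118.4500), "Station B": (33.9700, -118.3900)}
print(is_walkable((34.2253, -118.4502), stations))  # IHOP row from the restaurant data
```

In practice the station coordinates would come from the Metro Rail dataset once it has been geocoded.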

#### 1.3 Interested Audience
The audience for this project is anyone considering moving to a major city and starting something new. Combining Foursquare data and mapping techniques with data analysis will help answer the key questions raised above.

## 2. Data Section:

### A description of the data and how it will be used to solve the problem.

#### 2.1 Description of the Data:
Foursquare will be used to identify venues around the popular areas of LA. Hot areas in LA will serve as a reference for the desired future location.

#### 2.2 Data Required to Resolve the Problem:
* List of restaurants in LA with their geodata (latitude and longitude)
* List of Metro Rail stations in LA with their address locations
* Venues for each LA neighborhood (that can be clustered)
* Venues around each Metro Rail station
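Each restaurant record needs usable latitude and longitude. In the restaurant listings loaded further down, coordinates sit in a single `LOCATION` column formatted like `(34.2253, -118.4502)`. A minimal sketch for splitting that string into two floats, assuming every valid value follows that pattern (`parse_location` is a hypothetical helper name):

```python
import ast


def parse_location(value):
    """Split a LOCATION string such as "(34.2253, -118.4502)" into (lat, lon) floats.

    Returns None for blank, missing (e.g. NaN), or malformed values so that rows
    without coordinates can be dropped instead of crashing the pipeline.
    """
    if not isinstance(value, str) or not value.strip():
        return None
    try:
        # literal_eval safely parses the tuple literal without executing code
        lat, lon = ast.literal_eval(value.strip())
        return float(lat), float(lon)
    except (ValueError, SyntaxError, TypeError):
        return None


print(parse_location("(34.2253, -118.4502)"))
```

In the notebook this could be applied column-wise, e.g. `df["LOCATION"].apply(parse_location)`, dropping rows that come back as `None` before mapping.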

#### 2.3 How the Data Will be Used to Solve the Problem:
The data will be used as follows:

* Use Foursquare and geopy to map the venues in every LA neighborhood and cluster the neighborhoods into groups.
* Use Foursquare and geopy to map the Metro Rail stations, both on their own and on top of the clustered map, in order to identify the venues and amenities near each station or explore each station individually.
* Use Foursquare and geopy to map rental locations and link each one to the nearby metro stations.
* Create a map that shows, for instance, the average rental price per square foot within a 1.0-mile (1.6 km) radius of each metro station, or a similar metric. Clicking a station's popup will then show the relative price for that metro area.
* Convert rental addresses to geodata (latitude, longitude) with geopy's Nominatim geocoder, and measure distances with `geopy.distance`.
* Search for data in open data sources where available, on real estate sites whose terms allow reading, and from libraries or government agencies such as Los Angeles Metro (for rail station data).
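The rent-per-station map described above needs one number per station: the average price per square foot of listings within 1.0 mile (1.6 km). The sketch below computes that with a haversine distance and a plain loop. It assumes rental listings have already been geocoded into `(lat, lon, price_per_sqft)` tuples; all names and sample values are hypothetical, since no rental dataset has been chosen yet.

```python
import math
from statistics import mean

EARTH_RADIUS_KM = 6371.0
RADIUS_KM = 1.6  # ~1.0 mile


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) tuples in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def avg_rent_near_stations(stations, listings, radius_km=RADIUS_KM):
    """Mean rent per sq ft of the listings within radius_km of each station.

    stations: dict name -> (lat, lon)
    listings: iterable of (lat, lon, price_per_sqft)
    Returns dict name -> mean price, or None when no listing is in range.
    """
    listings = list(listings)
    averages = {}
    for name, coords in stations.items():
        prices = [p for lat, lon, p in listings
                  if haversine_km(coords, (lat, lon)) <= radius_km]
        averages[name] = mean(prices) if prices else None
    return averages


def rank_stations(averages):
    """Station names sorted from cheapest to most expensive, skipping those without data."""
    return sorted((name for name, avg in averages.items() if avg is not None),
                  key=averages.get)


# Hypothetical sample data, for illustration only
stations = {"A": (34.0, -118.0), "B": (34.1, -118.0), "C": (35.0, -118.0)}
listings = [(34.001, -118.0, 3.0), (34.002, -118.0, 5.0), (34.1005, -118.0, 2.5)]
averages = avg_rent_near_stations(stations, listings)
print(averages, rank_stations(averages))
```

The loop compares every listing with every station, which is acceptable for a city-sized station list. For many thousands of listings, a spatial index such as scikit-learn's `BallTree` with the haversine metric would likely scale better.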

Processing this data will help answer the key questions needed to make a decision:

* What is the cost of rent (per square foot) within a one-mile radius of each metro station?
* Which area of LA has the best rental pricing while meeting the established criteria?
* What venues surround the two best locations, and how do their prices compare?
* How are venues distributed among LA neighborhoods and around metro stations?
* Are there tradeoffs among size, price, and location?
* Are there any other interesting statistical findings in the real estate and overall data?
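One way to approach the venue-distribution question is to summarize, for each neighborhood, which venue categories make up the largest share of its Foursquare venues. The sketch below assumes venues have already been fetched and flattened into `(neighborhood, category)` pairs; the function name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def top_venue_categories(venues, n=3):
    """Top-n venue categories per neighborhood with their share of all venues there.

    venues: iterable of (neighborhood, category) pairs.
    Returns dict neighborhood -> list of (category, fraction), most common first.
    """
    counts = defaultdict(Counter)
    for hood, category in venues:
        counts[hood][category] += 1
    summary = {}
    for hood, counter in counts.items():
        total = sum(counter.values())
        summary[hood] = [(cat, k / total) for cat, k in counter.most_common(n)]
    return summary


# Hypothetical sample, for illustration only
venues = [
    ("Hollywood", "Bar"), ("Hollywood", "Bar"),
    ("Hollywood", "Taco Place"), ("Hollywood", "Coffee Shop"),
    ("Venice", "Beach"),
]
print(top_venue_categories(venues, n=2))
```

The full category-share vectors (not just the top entries) are the kind of per-neighborhood features that could then be passed to `sklearn.cluster.KMeans` for the clustering stage.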

### Installing and importing important libraries

```python
# Libraries
print("Installing libraries . . . ")
print(". . .")
print("")

# NumPy and pandas
import numpy as np  # vectorized numerical computing
import pandas as pd  # tabular data handling

# JSON and HTTP
import json  # read and write JSON
import requests  # send HTTP requests (e.g., to the Foursquare API)

# Flatten nested JSON into a pandas DataFrame
# (pandas.io.json.json_normalize is deprecated; the top-level import works in pandas >= 1.0)
from pandas import json_normalize

# Time
import time  # time-related functions

# Random
import random  # random number generation

# CSV
import csv  # read and write tabular data in CSV format

# Displaying images and HTML in the notebook
from IPython.display import Image, HTML

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# Seaborn
import seaborn as sns  # statistical data visualization built on matplotlib

# k-means for the clustering stage
from sklearn.cluster import KMeans

# Geopy
#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim  # converts an address into latitude and longitude values

# Folium (the %pip magic installs into the running kernel's environment)
%pip install folium
import folium  # interactive Leaflet maps
from folium import plugins  # wrappers for popular Leaflet plugins

# Success
print("")
print("Libraries imported")
```

```text
Installing libraries . . . 
. . .

Libraries imported
```


## Restaurant Data
The data below, from the City of Los Angeles open data portal, lists restaurant business locations and gives a picture of the competition in LA.

```python
# Restaurant listings from the City of Los Angeles open data portal
restaurants_in_la = 'https://data.lacity.org/api/views/ieer-tbdq/rows.csv?accessType=DOWNLOAD'

# Load the CSV into a DataFrame
df = pd.read_csv(restaurants_in_la)

df.head()
```

|  | LOCATION ACCOUNT # | BUSINESS NAME | DBA NAME | STREET ADDRESS | CITY | LOCATION DESCRIPTION | MAILING ADDRESS | MAILING CITY | MAILING ZIP CODE | NAICS | PRIMARY NAICS DESCRIPTION | COUNCIL DISTRICT | LOCATION START DATE | LOCATION END DATE | LOCATION |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0000410578-0001-0 | DOUBLE O TWO INC | INTERNATIONAL HOUSE OF PANCAKES #2 | 8555 VESPER AVENUE | PANORAMA CITY | 8555 VESPER 91402-2914 | 24801 PICO CANYON ROAD #200 | STEVENSON RANCH | 91381-1762 | 722110 | Full-service restaurants | 6 | 08/02/1982 |  | (34.2253, -118.4502) |
| 1 | 0002815295-0001-6 | MANNA LA LLC | PANERA BREAD | 8647 S SEPULVEDA BLVD | WESTCHESTER | 8647 SEPULVEDA 90045-4001 | 2339 11TH STREET | ENCINITAS | 92024-6604 | 722110 | Full-service restaurants | 11 | 03/15/2015 |  | (33.9592, -118.3963) |
| 2 | 0002537656-0014-1 | CALIFORNIA FOOD MANAGEMENT LLC | BURGER KING 11024 \| CALIFORNIA FOOD MANAGEMENT | 5609 W SUNSET BLVD | LOS ANGELES | 5609 SUNSET 90028-8534 | 8306 WILSHIRE BLVD SUITE #5002 | BEVERLY HILLS | 90211-2304 | 722110 | Full-service restaurants | 13 | 01/03/2011 |  | (34.098, -118.3117) |
| 3 | 0002798053-0001-3 | CHARCOAL VENICE PARTNERS LLC | CHARCOAL VENICE | 425 WASHINGTON BLVD | VENICE | 425 WASHINGTON 90292-5213 | 425 WASHINGTON BLVD | VENICE | 90292-5213 | 722110 | Full-service restaurants | 11 | 01/18/2015 |  | (33.9815, -118.4627) |
| 4 | 0002537656-0006-1 | CALIFORNIA FOOD MANAGEMENT LLC | BURGER KING 2554 \| CALIFORNIA FOOD MANAGEMENT | 181 S VERMONT AVENUE | LOS ANGELES | 181 VERMONT 90004-5904 | 8306 WILSHIRE BLVD SUITE #5002 | BEVERLY HILLS | 90211-2304 | 722110 | Full-service restaurants | 13 | 01/03/2011 |  | (34.0705, -118.2916) |
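A first, rough measure of competition is how many full-service restaurants (NAICS code 722110) each area contains. The sketch below counts them per `CITY` value, using the five rows shown above reduced to the columns it needs; `restaurants_per_area` is a hypothetical helper. On the full DataFrame `df`, pandas gives the same count with `df.loc[df["NAICS"] == 722110, "CITY"].value_counts()`.

```python
from collections import Counter

# The five rows shown above, reduced to the columns this sketch needs
sample_rows = [
    {"DBA NAME": "INTERNATIONAL HOUSE OF PANCAKES #2", "CITY": "PANORAMA CITY", "NAICS": 722110},
    {"DBA NAME": "PANERA BREAD", "CITY": "WESTCHESTER", "NAICS": 722110},
    {"DBA NAME": "BURGER KING 11024 | CALIFORNIA FOOD MANAGEMENT", "CITY": "LOS ANGELES", "NAICS": 722110},
    {"DBA NAME": "CHARCOAL VENICE", "CITY": "VENICE", "NAICS": 722110},
    {"DBA NAME": "BURGER KING 2554 | CALIFORNIA FOOD MANAGEMENT", "CITY": "LOS ANGELES", "NAICS": 722110},
]


def restaurants_per_area(rows, naics=722110):
    """Count restaurants with the given NAICS code per CITY value."""
    return Counter(row["CITY"] for row in rows if row["NAICS"] == naics)


print(restaurants_per_area(sample_rows).most_common())
```

`CITY` is a coarse unit: in the sample, both the Sunset Blvd and the Vermont Avenue rows are recorded as LOS ANGELES. Counting competitors within a fixed radius of each candidate site, using the `LOCATION` coordinates, would give a finer picture.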


## Metro Rail Stations Data
The data below lists the LA Metro Rail stations, which are needed to evaluate access by public transportation.

```python
# Data
metro_rail_stations = 'https://opendata.arcgis.com/datasets/6679d1ccc3744a7f87f7855e7ce33395_1.csv'

# DataFrame
df2 = pd.read_csv(metro_rail_stations)

df2.head()
```

|  | OBJECTID | MetroLine | Station | StopNumber | TOOLTIP | NLA_URL |
|---|---|---|---|---|---|---|
| 0 | 1 | Blue Line | Downtown Long Beach Station | 80101 | Stop: Downtown Long Beach Station\nStop No: 80... | http://www.metro.net/riding/maps/blue-line/?nl... |
| 1 | 2 | Blue Line | Pacific Ave Station | 80102 | Stop: Pacific Ave Station\nStop No: 80102\nBlu... | http://www.metro.net/riding/maps/blue-line/?nl... |
| 2 | 3 | Blue Line | Anaheim Street Station | 80105 | Stop: Anaheim Street Station\nStop No: 80105\n... | http://www.metro.net/riding/maps/blue-line/?nl... |
| 3 | 4 | Blue Line | Pacific Coast Hwy Station | 80106 | Stop: Pacific Coast Hwy Station\nStop No: 8010... | http://www.metro.net/riding/maps/blue-line/?nl... |
| 4 | 5 | Blue Line | Willow Street Station | 80107 | Stop: Willow Street Station\nStop No: 80107\nB... | http://www.metro.net/riding/maps/blue-line/?nl... |
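The station table above has no coordinate columns, so the stations must be geocoded before they can be mapped or used in distance checks. A sketch using geopy's `Nominatim` geocoder, which the notebook already imports. The query suffix, the `user_agent` string, and the one-second delay (following Nominatim's usage policy of at most one request per second) are assumptions; the geocoder is a parameter so it can be swapped out for testing.

```python
import time


def geocode_stations(station_names, geocode=None,
                     suffix=", Los Angeles County, CA", delay=1.0):
    """Look up (lat, lon) for each station name.

    geocode: callable(query) -> object with .latitude/.longitude, or None.
             Defaults to geopy's Nominatim geocoder.
    suffix:  appended to each name to disambiguate the query (an assumption;
             adjust it if lookups fail).
    delay:   seconds to wait between requests, to respect rate limits.
    Returns dict name -> (lat, lon), or None when the lookup fails.
    """
    if geocode is None:
        from geopy.geocoders import Nominatim
        # Hypothetical user_agent placeholder; Nominatim requires one.
        geocode = Nominatim(user_agent="la-restaurant-capstone").geocode
    coords = {}
    for i, name in enumerate(station_names):
        if i and delay:
            time.sleep(delay)  # pause between consecutive requests
        location = geocode(f"{name}{suffix}")
        coords[name] = (location.latitude, location.longitude) if location is not None else None
    return coords
```

In the notebook this might be called as `geocode_stations(df2["Station"].unique())`; using `unique()` avoids repeated lookups for stations that appear on more than one line. Names the geocoder cannot resolve come back as `None` and may need a manual lookup.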
