# Capstone Project - The Battle of Neighborhoods (Week 1)

Now that you have been equipped with the skills and the tools to use location data to explore a geographical location, over the course of two weeks, you will have the opportunity to be as creative as you want and come up with an idea to leverage the Foursquare location data to explore or compare neighborhoods or cities of your choice or to come up with a problem that you can use the Foursquare location data to solve. If you cannot think of an idea or a problem, here are some ideas to get you started:

1. In Module 3, we explored New York City and the city of Toronto and segmented and clustered their neighborhoods. Both cities are very diverse and are the financial capitals of their respective countries. One interesting idea would be to compare the neighborhoods of the two cities and determine **how similar or dissimilar they are**. **Is New York City more like Toronto or Paris or some other multicultural city?** I will leave it to you to refine this idea.
2. In a city of your choice, if someone is looking to **open a restaurant**, where would you recommend that they open it? Similarly, if **a contractor is trying to start their own business, where would you recommend that they set up their office?**

These are just a couple of many ideas and problems that can be solved using location data in addition to other datasets. No matter what you decide to do, make sure to provide sufficient justification of **why you think what you want to do or solve is important** and **why a client or a group of people would be interested in your project**.

## Review criteria

This capstone project will be graded by your peers. This capstone project is worth 70% of your total grade. The project will be completed over the course of 2 weeks. Week 1 submissions will be worth 30% whereas week 2 submissions will be worth 40% of your total grade.

For this week, you will be required to submit the following:

1. A description of the **problem** and a discussion of the **background**. (15 marks)
    * Clearly **define a problem or an idea of your choice**, **where you would need to leverage the Foursquare location data to solve or execute**. Remember that data science problems always target an audience and are meant to **help a group of stakeholders solve a problem**, so make sure that you explicitly **describe your audience and why they would care about your problem**.
    * This submission will eventually become your **Introduction/Business Problem section** in your final report. So I recommend that you push the report (containing only your Introduction/Business Problem section for now) to your GitHub repository and submit a link to it.
2. A description of the **data** and how it will be used to solve the problem. (15 marks)
    * **Describe the data that you will be using to solve the problem or execute your idea**. Remember that you will need to use the **Foursquare location data** to solve the problem or execute your idea. You can absolutely **use other datasets in combination with the Foursquare location data**. So make sure that you provide **adequate explanation and discussion**, **with examples, of the data that you will be using**, even if it is only Foursquare location data.
    * This submission will eventually become your **Data section** in your final report. So I recommend that you push the report (containing your Data section) to your GitHub repository and submit a link to it.

For the second week, the final deliverables of the project will be:

1. A link to your Notebook on your GitHub repository, showing your code. (15 marks)
2. A full report consisting of all of the following components (15 marks):
    * **Introduction** where you discuss the business problem and who would be interested in this project.
    * **Data** where you describe the data that will be used to solve the problem and the source of the data.
    * **Methodology** section which represents the main component of the report where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, and which machine learning techniques were used and why.
    * **Results** section where you discuss the results.
    * **Discussion** section where you discuss any observations you noted and any recommendations you can make based on the results.
    * **Conclusion** section where you conclude the report.
3. Your choice of a presentation or blogpost. (10 marks)

# Coursera Capstone Project

## The Battle of Neighborhoods (Week 1)

In [1]:
import numpy as np # library to handle data in a vectorized manner
import time
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0; older releases expose it under pandas.io.json)

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### Coursera Capstone - REPORT CONTENT

1. Introduction section: discussion of the business problem and the audience interested in this project.
2. Data section: description of the data that will be used to solve the problem, and its sources.
3. Methodology section: discussion and description of the exploratory data analysis carried out, any inferential statistical testing performed, and any machine learning techniques used, including the strategy and purpose behind them.
4. Results section: discussion of the results.
5. Discussion section: elaboration on any observations noted and any recommendations based on the results.
6. Conclusion section: report conclusion.


# A description of the problem and a discussion of the background. (15 marks)

## 1. Introduction Section

### Discussion of the business problem and the audience who would be interested in this project.

### Business Problem:

I have been renting in the San Jose area for more than ten years. The rent for a two-bedroom apartment has risen dramatically since the economic bubble burst in 2008: we now pay around 5,000 per month for our small townhouse, compared with 2,200 per month ten years ago. My family has lately been pushing me to look for a house to buy, provided the price is within our budget. I am excited to use this opportunity to practice what I have learned in this course and to answer the questions that come up along the way. The key question is: how can I find a convenient and enjoyable place similar to the area where I currently live? I can use the Foursquare API covered in this course together with a real estate API available on the market, such as the Zillow API. To compare and evaluate housing prices in the Bay Area, my requirements are as follows:

- The amenities in the selected neighborhood should be similar to those around my current residence
- The price is around 1.5M
- The house must have at least 3 bedrooms
- More than 2 bathrooms
- A 1-car garage
- The house must be larger than 1,500 square feet
- A supermarket within a 1-mile radius
- A shopping mall within a 5-mile radius
- Close to venues such as restaurants (Asian and Mexican food, etc.), parks, and coffee shops

Based on the requirements listed above, I define the business problem as:

**How can I find a suitable house that meets the requirements on price, features, location, and venues?**
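
As a minimal sketch, the hard requirements listed above could be encoded as a filter over candidate listings. The `Listing` fields, the 10% tolerance used to interpret "around 1.5M", and the sample records are all assumptions made for illustration; they do not come from any real estate API. The "similar amenities" requirement is not covered here because it depends on the venue data.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    """Hypothetical listing record; field names are illustrative only."""
    address: str
    price: float
    bedrooms: int
    bathrooms: float
    garage_spaces: int
    area_sqft: float
    miles_to_supermarket: float
    miles_to_mall: float


def meets_requirements(listing, budget=1_500_000, budget_tolerance=0.10):
    """Return True when a listing satisfies the numeric requirements."""
    # "Around 1.5M" is read as "at most 10% above budget" (an assumption).
    within_budget = listing.price <= budget * (1 + budget_tolerance)
    return (
        within_budget
        and listing.bedrooms >= 3
        and listing.bathrooms > 2
        and listing.garage_spaces >= 1
        and listing.area_sqft > 1500
        and listing.miles_to_supermarket <= 1.0
        and listing.miles_to_mall <= 5.0
    )


# Made-up sample data, not real listings.
candidates = [
    Listing("Sample A", 1_450_000, 3, 2.5, 1, 1_600, 0.6, 3.2),
    Listing("Sample B", 1_400_000, 2, 2.5, 1, 1_700, 0.5, 2.0),  # too few bedrooms
    Listing("Sample C", 1_900_000, 4, 3.0, 2, 2_200, 0.4, 1.5),  # over budget
]
matches = [c.address for c in candidates if meets_requirements(c)]
print(matches)
```

Keeping the thresholds as function parameters makes it easy to relax the budget later without touching the rest of the filter.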



### The audience who would be interested in this project:
This case is also applicable to anyone interested in exploring ways to search and analyze location and real estate data in order to find a suitable house to buy in the Bay Area.

# A description of the data and how it will be used to solve the problem. (15 marks)

## 2. Data Section

### Description of the data and its sources that will be used to solve the problem

### Description of the Data:

The following data is required to address the problem:
- List of boroughs and neighborhoods of Manhattan with their geodata (latitude and longitude)
- List of subway stations in Manhattan with their address locations
- List of apartments for rent in the Manhattan area with their addresses and prices
- Preferably, a list of apartments for rent with additional information, such as price, address, area, number of bedrooms, etc.
- Venues for each Manhattan neighborhood (which can be clustered)
- Venues for subway stations, as needed

### How the data will be used to solve the problem

The data will be used as follows:
- Use Foursquare and geopy data to map the top 10 venues for all Manhattan neighborhoods, clustered in groups (as in the course lab).
- Use Foursquare and geopy data to map the locations of subway stations, both separately and on top of the clustered map above, in order to identify the venues and amenities near each station or to explore each subway location separately.
- Use Foursquare and geopy data to map the locations of rental places, linked in some form to the subway locations.
- Create a map that depicts, for instance, the average rental price per square foot within a radius of 1.0 mile (1.6 km) of each subway station, or a similar metric. I will then be able to click the popups to see the relative price per subway area.
- Addresses of rental locations will be converted to geodata (latitude, longitude) using geopy's Nominatim geocoder.
- Data will be searched for in open data sources where available, on real estate sites if they are open to reading, and in libraries or government agencies such as the New York MTA.
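
The "average rental price per square foot within 1.0 mile of each subway station" metric could be computed along the following lines. This is a sketch under stated assumptions: the tuple layouts for stations and rentals, the function names, and the sample coordinates and prices are hypothetical, and the great-circle (haversine) distance is used as a stand-in for whatever distance helper the final notebook adopts.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def avg_rent_per_sqft_near(stations, rentals, radius_miles=1.0):
    """Map each station name to the mean rent per sq ft of rentals in range.

    stations: iterable of (name, lat, lon)
    rentals:  iterable of (lat, lon, monthly_rent, area_sqft)
    Stations with no rentals in range map to None.
    """
    result = {}
    for name, s_lat, s_lon in stations:
        rates = [
            rent / sqft
            for r_lat, r_lon, rent, sqft in rentals
            if sqft > 0
            and haversine_miles(s_lat, s_lon, r_lat, r_lon) <= radius_miles
        ]
        result[name] = sum(rates) / len(rates) if rates else None
    return result


# Made-up sample data for illustration only.
stations = [("Station X", 40.750, -73.980), ("Station Y", 40.600, -74.100)]
rentals = [
    (40.751, -73.981, 3000, 750),   # near Station X
    (40.752, -73.979, 5000, 1000),  # near Station X
    (40.850, -73.900, 2000, 1000),  # outside both radii
]
prices = avg_rent_per_sqft_near(stations, rentals)
print(prices)
```

The resulting dictionary can feed the map popups directly, one value per station.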

Processing this data will make it possible to answer the key questions behind the decision:

- What is the cost of rent (per square foot) within a one-mile radius of each subway station?
- Which area of Manhattan has the best rental pricing that meets the established criteria?
- What is the distance between the workplace (Park Ave and 53rd St) and the prospective home?
- What are the venues near the two best places to live? How do the prices compare?
- How are venues distributed among Manhattan neighborhoods and around subway stations?
- Are there trade-offs between size, price, and location?
- Are there any other interesting statistical findings in the real estate data and the data overall?
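
The question of how venues are distributed among neighborhoods can be explored by ranking venue categories per neighborhood. The sketch below assumes the venue data has already been reduced to `(neighborhood, category)` pairs; the function name `top_categories` and the sample pairs are made up for illustration.

```python
from collections import Counter, defaultdict


def top_categories(venues, n=10):
    """Rank venue categories per neighborhood.

    venues: iterable of (neighborhood, category) pairs.
    Returns {neighborhood: [(category, share_of_venues), ...]} with at most
    n entries per neighborhood, most frequent first.
    """
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1

    ranked = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        ranked[neighborhood] = [
            (category, count / total)
            for category, count in counter.most_common(n)
        ]
    return ranked


# Made-up sample pairs, not actual Foursquare results.
sample_venues = [
    ("Chelsea", "Coffee Shop"),
    ("Chelsea", "Coffee Shop"),
    ("Chelsea", "Park"),
    ("SoHo", "Boutique"),
]
ranking = top_categories(sample_venues, n=2)
print(ranking)
```

Using shares rather than raw counts keeps neighborhoods with many venues comparable to those with few, which matters if the shares are later used as clustering features.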

### Reference: venues around the current residence in Singapore, for comparison with the Manhattan location

In [2]:
# Shenton Way, District 01, Singapore
address = 'Mccallum Street, Singapore'

geolocator = Nominatim(user_agent="capstone_explorer") # recent geopy releases require a user_agent; this name is a placeholder
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of the Singapore home are {}, {}.'.format(latitude, longitude))

The geographical coordinates of the Singapore home are 1.2792423, 103.8481312.


In [3]:
neighborhood_latitude = 1.2792655
neighborhood_longitude = 103.8480938

In [4]:
# read the Foursquare credentials and close the file afterwards
with open('Foursquare_Credential.json') as f:
    credential = json.load(f)

In [5]:
CLIENT_ID = credential['client_id']
CLIENT_SECRET = credential['client_secret']
VERSION = '20180605' # Foursquare API version

In [7]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
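
As an alternative sketch, the same request URL can be assembled with `urllib.parse.urlencode`, which escapes each parameter value instead of relying on manual string formatting. The helper name `build_explore_url` and the `DEMO_ID` / `DEMO_SECRET` placeholders are assumptions for illustration; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL with a properly escaped query string."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # the comma is percent-encoded as %2C
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials; substitute the values loaded from the JSON file.
demo_url = build_explore_url(
    "DEMO_ID", "DEMO_SECRET", "20180605", 1.2792655, 103.8480938, 500, 100
)
print(demo_url)
```

With `requests`, the same dictionary could instead be passed through the `params` argument of `requests.get`, which performs the encoding internally.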

In [8]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d221a24d29cbb1d38c51f0e'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-51b49e49abd88dd0b4e7330f-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/winery_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d123941735',
         'name': 'Wine Bar',
         'pluralName': 'Wine Bars',
         'primary': True,
         'shortName': 'Wine Bar'}],
       'id': '51b49e49abd88dd0b4e7330f',
       'location': {'address': '206 Telok Ayer Street',
        'cc': 'SG',
        'city': 'Singapore',
        'country': 'Singapore',
        'distance': 112,
        'formattedAddress': ['206 Telok Ayer Street', '068641', 'Singapore'],
        'labeledLatLngs': [{'label': 'display',
          'lat': 1.2799249387439204,
          'lng'

In [9]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: # fall back to the flattened column name
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [10]:
venues = results['response']['groups'][0]['items']
    
SGnearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
SGnearby_venues = SGnearby_venues.loc[:, filtered_columns]

# filter the category for each row
SGnearby_venues['venue.categories'] = SGnearby_venues.apply(get_category_type, axis=1)

# clean columns
SGnearby_venues.columns = [col.split(".")[-1] for col in SGnearby_venues.columns]

SGnearby_venues.head(10)

|   | name | categories | lat | lng |
|---|------|------------|-----|-----|
| 0 | Napoleon Food & Wine Bar | Wine Bar | 1.279925 | 103.847333 |
| 1 | Pepper Bowl | Asian Restaurant | 1.279371 | 103.84671 |
| 2 | Native | Cocktail Bar | 1.280135 | 103.846844 |
| 3 | Park Bench Deli | Deli / Bodega | 1.279872 | 103.847287 |
| 4 | Sofitel So Singapore | Hotel | 1.280124 | 103.849867 |
| 5 | Freehouse | Beer Garden | 1.281254 | 103.848513 |
| 6 | PS.Cafe | Café | 1.280468 | 103.846264 |
| 7 | Coffee Break | Coffee Shop | 1.279529 | 103.846695 |
| 8 | Dumpling Darlings | Dumpling Restaurant | 1.280483 | 103.846942 |
| 9 | Muchachos | Burrito Place | 1.279072 | 103.847026 |
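
For readers who want to see what the flattening step does without pandas, the following sketch pulls the same four fields out of an explore-style response using plain dictionaries. The function name `extract_venues` is hypothetical, and the sample response is a heavily trimmed, hand-written stand-in that reuses values from the table above; it is not the full API payload.

```python
def extract_venues(response_json):
    """Return a list of dicts with name, categories, lat and lng per venue."""
    groups = response_json.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []

    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories", [])
        location = venue.get("location", {})
        rows.append({
            "name": venue.get("name"),
            # keep only the first category name, or None when there is none
            "categories": categories[0]["name"] if categories else None,
            "lat": location.get("lat"),
            "lng": location.get("lng"),
        })
    return rows


# Trimmed, hand-written sample in the same nested shape as the response.
sample_response = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Napoleon Food & Wine Bar",
                   "categories": [{"name": "Wine Bar"}],
                   "location": {"lat": 1.279925, "lng": 103.847333}}},
        {"venue": {"name": "Uncategorized Place",  # made-up entry
                   "categories": [],
                   "location": {"lat": 1.28, "lng": 103.85}}},
    ]}]}
}
venue_rows = extract_venues(sample_response)
print(venue_rows)
```

The `.get` calls with defaults mean an empty or malformed response yields an empty list rather than raising, which is convenient when looping over many locations.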


### Map of Singapore with venues near the residence (for reference)

In [11]:
# create a map of the Singapore location using latitude and longitude values
map_sg = folium.Map(location=[latitude, longitude], zoom_start=20)

# add markers to map
for lat, lng, label in zip(SGnearby_venues['lat'], SGnearby_venues['lng'], SGnearby_venues['name']):
    label = folium.Popup(label, parse_html=True)
    folium.RegularPolygonMarker(
        [lat, lng],
        number_of_sides=4,
        radius=10,
        popup=label,
        color='blue',
        fill_color='#0f0f0f',
        fill_opacity=0.7,
    ).add_to(map_sg)  
    
map_sg

## 3. Methodology section

## 4. Results section

## 5. Discussion section

## 6. Conclusion section


### 1. Introduction Section:

    1.1 Discussion of the "background situation" leading to the problem at hand

    1.2 Problem to be resolved

    1.3 Audience for this project

### 2. Data Section:

    2.1 Data on the current situation (current place of residence)

    2.2 Data required to resolve the problem

    2.3 Data sources and data manipulation

### 3. Methodology section:

    3.1 Process steps and strategy to resolve the problem

    3.2 Data science methods, machine learning, mapping tools, and exploratory data analysis

### 4. Results section

    Discussion of the results and how they help in making a decision.

### 5. Discussion section

    Elaboration and discussion on any observations and/or recommendations for improvement.

### 6. Conclusion section

    Decision taken and report conclusion.
