# Applied Capstone Project - The Battle of Neighborhoods

# Determine New Store Locations in Manhattan for a Cosmetic Shop

## 1. Introduction and Discussion of the Business Objective

#### Problem Background and Description
Manhattan is the most densely populated of the five boroughs of New York City. It serves as the city's economic and administrative center, cultural identifier, and historical birthplace. The borough consists mostly of Manhattan Island, bounded by the Hudson, East, and Harlem rivers, as well as several small adjacent islands. Manhattan also contains Marble Hill, a small neighborhood now on the U.S. mainland that was connected to the Bronx using landfill and is separated from the rest of Manhattan by the Harlem River. Manhattan Island is divided into three informally bounded components, each aligned with the borough's long axis: Lower, Midtown, and Upper Manhattan.

Manhattan has been described as the cultural, financial, media, and entertainment capital of the world, and the borough hosts the United Nations Headquarters. MEG Company produces cosmetic products. It has some stores in Europe and now also wants to open a store in Manhattan. The company has invested heavily in research and development and has developed an eyeshadow palette with a wide range of color tones; it also pays close attention to skin health and to regulations on chemicals. Before releasing the product to the market, MEG Company's manager assigned this project to me. The manager also asked me to build a system that can help recommend locations for the new Cosmetic Shop in Manhattan. This will be a major part of their decision-making process. MEG Company wants to position its new shop in high-traffic areas where consumers visit restaurants and cafés.

#### Criteria
Information provided by another company that has shops in Manhattan suggests that the best locations to open new Cosmetic Shop stores may not only be where other Cosmetic Shops are located. This data strongly suggests that the best places are in fact areas near Italian Restaurants, Cafés, Coffee Shops and Hotels. People who live in Manhattan are very social and frequent these places often.

The analysis and recommendations for the new store locations will focus on general districts with these establishments, not on specific store addresses. Narrowing down the best district options allows either further research to be conducted or on-the-ground searching for specific sites by the company's personnel.
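
One way to turn these criteria into a district-level ranking is to count, per neighborhood, how many venues fall into the target categories. The sketch below is illustrative only: the column names `Neighborhood` and `Venue Category`, the exact category spellings, and the sample rows are assumptions, not data from this project.

```python
import pandas as pd

# Target categories taken from the criteria above (spellings are assumed).
TARGET_CATEGORIES = ["Italian Restaurant", "Café", "Coffee Shop", "Hotel"]


def rank_districts(venues, targets=TARGET_CATEGORIES):
    """Count venues in the target categories per neighborhood, highest first."""
    matches = venues[venues["Venue Category"].isin(targets)]
    counts = matches.groupby("Neighborhood").size()
    # Neighborhoods without any matching venue get an explicit zero.
    all_names = venues["Neighborhood"].unique()
    return counts.reindex(all_names, fill_value=0).sort_values(ascending=False)


# Hypothetical sample rows, for illustration only (not real venue data).
sample = pd.DataFrame({
    "Neighborhood": ["Chinatown", "Chinatown", "Inwood", "Inwood", "Marble Hill"],
    "Venue Category": ["Coffee Shop", "Hotel", "Café", "Pharmacy", "Gym"],
})

ranking = rank_districts(sample)
print(ranking)
```

A plain count treats every target category equally; weighting categories differently would be a separate design decision that the criteria above do not specify.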


## Data Requirements

#### Data Collection 

Manhattan has 40 main districts. Data on these districts is required for this analysis. Before the exploratory analysis, the raw data is not in a usable form, so data wrangling and cleaning have to be performed first. The list of New York boroughs and neighborhoods is available from https://cocl.us/new_york_dataset. The resulting table is shown below.

In [11]:
neighborhoods.head(10)

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
5,Bronx,Kingsbridge,40.881687,-73.902818
6,Manhattan,Marble Hill,40.876551,-73.91066
7,Bronx,Woodlawn,40.898273,-73.867315
8,Bronx,Norwood,40.877224,-73.879391
9,Bronx,Williamsbridge,40.881039,-73.857446


New York has 5 boroughs and 306 neighborhoods. Next, we need only Manhattan's neighborhoods with their latitudes and longitudes. The data frame shown below gives us these values. The cleansed data will be used with Foursquare data.

In [12]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


#### Data Analysis and Location Data:

* Foursquare data will be useful to explore and compare districts around Manhattan.
* Foursquare data will give us the venues, their latitudes and longitudes, and their categories, which will help us determine possible areas for the Cosmetic Shop.
* Data manipulation and analysis will be used to derive subsets of the initial data.
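
As a rough illustration of the second point, nested venue records can be flattened into a table of names, coordinates and categories. The payload below is hypothetical: its nesting is an assumption modeled on the legacy Foursquare v2 `venues/explore` response and should be checked against the current API documentation. The helper names `first_category` and `venues_to_frame` are also made up for this sketch.

```python
import pandas as pd

# Hypothetical payload; the nesting is an assumed shape, not a verified contract.
sample_items = [
    {"venue": {"name": "Example Espresso Bar",
               "location": {"lat": 40.7157, "lng": -73.9943},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Example Trattoria",
               "location": {"lat": 40.7161, "lng": -73.9950},
               "categories": [{"name": "Italian Restaurant"}]}},
    {"venue": {"name": "Example Venue Without Category",
               "location": {"lat": 40.7170, "lng": -73.9961},
               "categories": []}},
]


def first_category(categories):
    """Return the first category name, or None when the list is empty."""
    return categories[0]["name"] if categories else None


def venues_to_frame(items):
    """Flatten nested venue records into a four-column DataFrame."""
    flat = pd.json_normalize(items)  # nested keys become dotted column names
    frame = flat[["venue.name", "venue.location.lat",
                  "venue.location.lng", "venue.categories"]].copy()
    frame.columns = ["Venue", "Latitude", "Longitude", "Venue Category"]
    frame["Venue Category"] = frame["Venue Category"].apply(first_category)
    return frame


venues = venues_to_frame(sample_items)
print(venues)
```

Keeping only the first category per venue is a simplification; a venue with several categories would need a different treatment.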

In [1]:
# Import libraries
import numpy as np # library to handle data in a vectorized manner
import json # library to handle JSON files
import pandas as pd

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

from bs4 import BeautifulSoup

# Import k-means for the clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Solving environment: - ^C
failed

CondaError: KeyboardInterrupt

Libraries imported.


#### Data Preparation

In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [4]:
neighborhoods_data = newyork_data['features']

In [5]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


In [6]:
# collect one dict per neighborhood, then build the dataframe once
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # coordinates are stored as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so the frame is created from the list
neighborhoods = pd.DataFrame(rows, columns=column_names)
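
To check the extraction logic without downloading the full file, the same steps can be run on a couple of hand-written feature records. This is a minimal sketch: the helper name `features_to_frame` is hypothetical, and the two sample features only mirror the keys the loop above reads (`properties.borough`, `properties.name`, `geometry.coordinates`), using coordinates from the table shown in this notebook.

```python
import pandas as pd


def features_to_frame(features):
    """Build the neighborhoods table from GeoJSON-style feature dicts."""
    rows = [
        {
            "Borough": f["properties"]["borough"],
            "Neighborhood": f["properties"]["name"],
            # coordinates are ordered [longitude, latitude]
            "Latitude": f["geometry"]["coordinates"][1],
            "Longitude": f["geometry"]["coordinates"][0],
        }
        for f in features
    ]
    return pd.DataFrame(
        rows, columns=["Borough", "Neighborhood", "Latitude", "Longitude"]
    )


# Two hand-written records for illustration; the real file has many more fields.
sample_features = [
    {"properties": {"borough": "Bronx", "name": "Wakefield"},
     "geometry": {"coordinates": [-73.847201, 40.894705]}},
    {"properties": {"borough": "Manhattan", "name": "Marble Hill"},
     "geometry": {"coordinates": [-73.91066, 40.876551]}},
]

frame = features_to_frame(sample_features)
manhattan_only = frame[frame["Borough"] == "Manhattan"].reset_index(drop=True)
print(manhattan_only)
```

Swapping the two coordinate indices is an easy mistake here, so comparing one known neighborhood against its expected latitude is a cheap safeguard.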

In [10]:
neighborhoods.head(10)

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
5,Bronx,Kingsbridge,40.881687,-73.902818
6,Manhattan,Marble Hill,40.876551,-73.91066
7,Bronx,Woodlawn,40.898273,-73.867315
8,Bronx,Norwood,40.877224,-73.879391
9,Bronx,Williamsbridge,40.881039,-73.857446


In [8]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


In [9]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688
