# 1. Introduction / Business Problem

New York City is a major United States hub. With a massive working population, multiple colleges and universities, and a huge social scene, it is a great place to open a coffee shop. But where in the city would be a good place to open one? This project investigates which areas of New York City would be good locations for a coffee shop. Opening a coffee shop in New York comes with both pros and cons. As one of the largest cities in the world, it offers many places where a coffee shop could be built. However, New York is also one of the most expensive and competitive markets, so finding the most profitable area is critical to avoid going out of business because of low demand or high costs. This project aims to find the most profitable places for a coffee shop.



# 2. Data

We will use New York neighborhood data (the dataset downloaded from https://cocl.us/new_york_dataset later in this section) combined with Foursquare location data to find where the fewest coffee shops are in New York City. We then plan to identify the area that appears to be the most profitable. The goal is to find the least coffee-shop-dense area in New York so that a new shop can claim the customer base there. With as little competition as possible, the shop should have the best chance of maximizing profit.
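
The density idea described above can be sketched as follows. This is only an illustration under stated assumptions: the `venues` dataframe, its column names (`Neighborhood`, `Venue Category`), the helper `coffee_shop_counts`, and the sample rows are all hypothetical and are not Foursquare results.

```python
import pandas as pd

# Hypothetical sample rows; real rows would come from the Foursquare API.
venues = pd.DataFrame({
    'Neighborhood': ['Wakefield', 'Wakefield', 'Co-op City',
                     'Eastchester', 'Eastchester', 'Eastchester'],
    'Venue Category': ['Coffee Shop', 'Pharmacy', 'Pizza Place',
                       'Coffee Shop', 'Coffee Shop', 'Diner'],
})


def coffee_shop_counts(venues, category='Coffee Shop'):
    """Count venues of `category` per neighborhood, fewest first."""
    is_coffee = venues['Venue Category'].eq(category)
    # Summing the boolean mask per neighborhood keeps neighborhoods with zero matches.
    counts = is_coffee.groupby(venues['Neighborhood']).sum().astype(int)
    return counts.sort_values(kind='mergesort').rename('Coffee Shops')


counts = coffee_shop_counts(venues)
print(counts)
```

Grouping the boolean mask, rather than filtering first, keeps neighborhoods with no coffee shops in the result, and those are exactly the candidates this project is looking for.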



In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

#import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium 0.5.0; comment this line out if it is already installed
import folium # map rendering library

print('Libraries imported.')

In [10]:
!pip install beautifulsoup4
from bs4 import BeautifulSoup # library to parse HTML and XML documents




In [16]:
conda install -c anaconda wget

In [17]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [18]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
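
Before using the loaded dictionary, it can help to confirm that it has the expected shape. The sketch below is a hedged example: the helper name `summarize_feature_collection` and the two-feature `sample_collection` are hypothetical, and `totalFeatures` is a field of this particular file rather than something every GeoJSON document carries.

```python
def summarize_feature_collection(collection):
    """Return (declared_count, actual_count) for a FeatureCollection dict."""
    if collection.get('type') != 'FeatureCollection':
        raise ValueError('expected a FeatureCollection')
    features = collection.get('features', [])
    # 'totalFeatures' may be absent in other files, in which case this is None.
    return collection.get('totalFeatures'), len(features)


# Hypothetical stand-in for newyork_data, trimmed to two features.
sample_collection = {
    'type': 'FeatureCollection',
    'totalFeatures': 2,
    'features': [
        {'type': 'Feature', 'properties': {'name': 'Wakefield', 'borough': 'Bronx'}},
        {'type': 'Feature', 'properties': {'name': 'Co-op City', 'borough': 'Bronx'}},
    ],
}

declared, actual = summarize_feature_collection(sample_collection)
print(declared, actual)
```

Calling the same helper on `newyork_data` would show whether the declared count and the number of loaded features agree.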

In [19]:
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

In [8]:
# send the GET request
data = requests.get('https://cocl.us/new_york_dataset').text
data

'{"type":"FeatureCollection","totalFeatures":306,"features":[{"type":"Feature","id":"nyu_2451_34572.1","geometry":{"type":"Point","coordinates":[-73.84720052054902,40.89470517661]},"geometry_name":"geom","properties":{"name":"Wakefield","stacked":1,"annoline1":"Wakefield","annoline2":null,"annoline3":null,"annoangle":0E-11,"borough":"Bronx","bbox":[-73.84720052054902,40.89470517661,-73.84720052054902,40.89470517661]}},{"type":"Feature","id":"nyu_2451_34572.2","geometry":{"type":"Point","coordinates":[-73.82993910812398,40.87429419303012]},"geometry_name":"geom","properties":{"name":"Co-op City","stacked":2,"annoline1":"Co-op","annoline2":"City","annoline3":null,"annoangle":0E-11,"borough":"Bronx","bbox":[-73.82993910812398,40.87429419303012,-73.82993910812398,40.87429419303012]}},{"type":"Feature","id":"nyu_2451_34572.3","geometry":{"type":"Point","coordinates":[-73.82780644716412,40.887555677350775]},"geometry_name":"geom","properties":{"name":"Eastchester","stacked":1,"annoline1":"Ea

In [11]:

soup = BeautifulSoup(data, 'html.parser')
soup

{"type":"FeatureCollection","totalFeatures":306,"features":[{"type":"Feature","id":"nyu_2451_34572.1","geometry":{"type":"Point","coordinates":[-73.84720052054902,40.89470517661]},"geometry_name":"geom","properties":{"name":"Wakefield","stacked":1,"annoline1":"Wakefield","annoline2":null,"annoline3":null,"annoangle":0E-11,"borough":"Bronx","bbox":[-73.84720052054902,40.89470517661,-73.84720052054902,40.89470517661]}},{"type":"Feature","id":"nyu_2451_34572.2","geometry":{"type":"Point","coordinates":[-73.82993910812398,40.87429419303012]},"geometry_name":"geom","properties":{"name":"Co-op City","stacked":2,"annoline1":"Co-op","annoline2":"City","annoline3":null,"annoangle":0E-11,"borough":"Bronx","bbox":[-73.82993910812398,40.87429419303012,-73.82993910812398,40.87429419303012]}},{"type":"Feature","id":"nyu_2451_34572.3","geometry":{"type":"Point","coordinates":[-73.82780644716412,40.887555677350775]},"geometry_name":"geom","properties":{"name":"Eastchester","stacked":1,"annoline1":"Eas

In [26]:
neighborhoods_data = newyork_data['features']

In [27]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [28]:
neighborhoods

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---------|--------------|----------|-----------|


In [29]:
# collect one record per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for feature in neighborhoods_data:
    borough = feature['properties']['borough']
    neighborhood_name = feature['properties']['name']

    # the coordinates are stored as [longitude, latitude]
    neighborhood_lon, neighborhood_lat = feature['geometry']['coordinates']

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)
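
As an alternative to the row-by-row construction above, the nested records can be flattened with `pd.json_normalize`. This is a minimal sketch, assuming pandas 1.0 or later; the helper name `features_to_frame` is hypothetical, and `sample_features` holds the first two features from the dataset output shown earlier, trimmed to the fields used here.

```python
import pandas as pd

# First two features from the dataset, trimmed to the fields used here.
sample_features = [
    {'geometry': {'coordinates': [-73.84720052054902, 40.89470517661]},
     'properties': {'name': 'Wakefield', 'borough': 'Bronx'}},
    {'geometry': {'coordinates': [-73.82993910812398, 40.87429419303012]},
     'properties': {'name': 'Co-op City', 'borough': 'Bronx'}},
]


def features_to_frame(features):
    """Flatten feature dicts into Borough/Neighborhood/Latitude/Longitude columns."""
    # Nested dict keys become dotted column names, e.g. 'properties.name'.
    flat = pd.json_normalize(features)
    # Each coordinate pair is stored as [longitude, latitude].
    coords = pd.DataFrame(flat['geometry.coordinates'].tolist(),
                          columns=['Longitude', 'Latitude'])
    return pd.DataFrame({
        'Borough': flat['properties.borough'],
        'Neighborhood': flat['properties.name'],
        'Latitude': coords['Latitude'],
        'Longitude': coords['Longitude'],
    })


sample_neighborhoods = features_to_frame(sample_features)
print(sample_neighborhoods)
```

Passing `newyork_data['features']` to the same helper should produce the full table without an explicit loop.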

In [30]:
neighborhoods.head()

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---------|--------------|----------|-----------|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


In [31]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.
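
A per-borough breakdown is a natural follow-up to the totals above. The sketch below uses only the five rows shown in the `head()` output, so the frame name `sample` is a hypothetical stand-in and the result reflects those rows only, not the citywide distribution.

```python
import pandas as pd

# The five rows shown by neighborhoods.head(); all of them are in the Bronx.
sample = pd.DataFrame({
    'Borough': ['Bronx'] * 5,
    'Neighborhood': ['Wakefield', 'Co-op City', 'Eastchester',
                     'Fieldston', 'Riverdale'],
})

# Number of distinct neighborhoods per borough.
per_borough = sample.groupby('Borough')['Neighborhood'].nunique()
print(per_borough)
```

Running the same `groupby` on the full `neighborhoods` dataframe would show how the 306 neighborhoods are split across the five boroughs.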
