# Justin Chu 


### A description of the problem and a discussion of the background.

A few colleagues and I are looking to open a Krispy Kreme donut shop in Suffolk County, New York. If you live on Long Island, you can relate to the ever-worsening shortage of donuts. Dunkin' Donuts can only stand in for a drive to New York City for a Krispy Kreme for so long before the masses crave something more delightful.








### A description of the data and how it will be used to solve the problem.

I will use the Foursquare API to survey the neighborhoods of Suffolk County and locate the area with the fewest venues that sell donuts, so that our Krispy Kreme opens where the need is greatest.

Installing and importing the necessary Python libraries:

In [1]:
!pip install geocoder
!pip install folium

import pandas as pd
import requests
import numpy as np
import geocoder
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
import json
import xml
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")

from pandas import json_normalize
from sklearn.neighbors import KNeighborsClassifier
from geopy.geocoders import Nominatim 
from bs4 import BeautifulSoup



In [2]:
# I spent my entire life on Long Island.  These popular towns came from my own knowledge:
addresses = ['Coram, New York', 'Middle Island, New York','Port Jefferson, New York', 'Patchogue, New York', 
            'Riverhead, New York', 'Central Islip, New York', 'Smithtown, New York', 'Mattituck, New York',
            'Mastic Beach, New York', 'Lake Grove, New York', 'Kings Park, New York', 'Shirley, New York', 'South Hampton, New York',
             'East Hampton, New York', 'Manorville, New York', 'Moriches, New York', 'Wading River, New York', 'West Hampton, New York',
            'Quogue, New York']
geolocator = Nominatim(user_agent='justinchu24@gmail.com')
addy_coordinates = {}
for address in addresses:
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    addy_coordinates[address] = [latitude, longitude]
    print('The Geographical Coordinates of {} are {}, {}.'.format(address, latitude, longitude))

The Geographical Coordinates of Coram, New York are 40.87941295, -73.00394259997734.
The Geographical Coordinates of Middle Island, New York are 40.879135, -72.94014075519021.
The Geographical Coordinates of Port Jefferson, New York are 40.9465118, -73.0691257.
The Geographical Coordinates of Patchogue, New York are 40.7656539, -73.0151084.
The Geographical Coordinates of Riverhead, New York are 40.9170435, -72.6620402.
The Geographical Coordinates of Central Islip, New York are 40.78102005, -73.19299521234569.
The Geographical Coordinates of Smithtown, New York are 40.8559314, -73.2006687.
The Geographical Coordinates of Mattituck, New York are 41.00715165, -72.55413708959088.
The Geographical Coordinates of Mastic Beach, New York are 40.76078865, -72.83542008960674.
The Geographical Coordinates of Lake Grove, New York are 40.8528761, -73.1151102.
The Geographical Coordinates of Kings Park, New York are 40.88373, -73.2544898.
The Geographical Coordinates of Shirley, New York are 40.79684165000000
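
`geolocator.geocode` returns `None` when Nominatim cannot match an address, in which case reading `location.latitude` in the loop above raises an `AttributeError`. A minimal sketch of a guarded version follows. `geocode_towns` is a hypothetical helper name, and it takes the geocoding function as an argument so it can be tried without network access.

```python
def geocode_towns(addresses, geocode):
    """Return ({address: [lat, lon]}, [addresses that could not be resolved]).

    `geocode` is any callable that behaves like geolocator.geocode: it returns
    an object with .latitude and .longitude, or None when nothing matches.
    """
    coordinates = {}
    unresolved = []
    for address in addresses:
        location = geocode(address)
        if location is None:
            unresolved.append(address)  # keep going instead of crashing
            continue
        coordinates[address] = [location.latitude, location.longitude]
    return coordinates, unresolved
```

In this notebook it could be called as `addy_coordinates, missing = geocode_towns(addresses, geolocator.geocode)`, after which `missing` lists any town names worth respelling.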

In [3]:
#create a dataframe for the towns
df = pd.DataFrame(addy_coordinates).transpose().reset_index()
df.columns = ['Town', 'Latitude', 'Longitude']
df

| | Town | Latitude | Longitude |
|---|---|---|---|
| 0 | Coram, New York | 40.879413 | -73.003943 |
| 1 | Middle Island, New York | 40.879135 | -72.940141 |
| 2 | Port Jefferson, New York | 40.946512 | -73.069126 |
| 3 | Patchogue, New York | 40.765654 | -73.015108 |
| 4 | Riverhead, New York | 40.917043 | -72.66204 |
| 5 | Central Islip, New York | 40.78102 | -73.192995 |
| 6 | Smithtown, New York | 40.855931 | -73.200669 |
| 7 | Mattituck, New York | 41.007152 | -72.554137 |
| 8 | Mastic Beach, New York | 40.760789 | -72.83542 |
| 9 | Lake Grove, New York | 40.852876 | -73.11511 |


In [4]:
# map the towns, centered on Middle Island
li_map = folium.Map(location=[40.879135, -72.94014075519021], zoom_start=9)

for lat, lng, town in zip(df['Latitude'], df['Longitude'], df['Town']):
    # the popup shows the town name when a marker is clicked
    label = folium.Popup(str(town), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(li_map)

li_map

In [18]:
# credentials and search parameters for the Foursquare API
radius = 15000  # search radius in meters
CLIENT_ID = 'IFL3GCM4TJC20R4ZPO15LTJKZX0OM2QZMGNHGC115GAWBZPN'
CLIENT_SECRET = 'XVXNQZLQFUW5B4VJBQQDLMK2LWT55XTETEEEBCGCAB03MVPZ'
VERSION = '20180604'
LIMIT = 30
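
Hard-coding the client ID and secret publishes them along with the notebook. One common alternative is to read them from environment variables. A minimal sketch, assuming the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` (my own choice, not something Foursquare prescribes) and a hypothetical helper `load_foursquare_credentials`:

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare client ID and secret from environment variables.

    The variable names are an assumption made for this sketch.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        # Name the missing variable so the fix is obvious.
        raise RuntimeError(
            'Set {} before running the notebook'.format(missing)) from None
```

With the variables set in the shell, the cell above would shrink to `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`.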

In [19]:
def GetNearbyVenues(towns, latitudes, longitudes):
    # Query the Foursquare explore endpoint once per town.
    # Returns a list with one list of venue tuples per town.
    url = 'https://api.foursquare.com/v2/venues/explore'
    venue_list = []
    for town, lat, lng in zip(towns, latitudes, longitudes):
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT}
        results = requests.get(url, params=params).json()['response']['groups'][0]['items']

        venue_list.append([(
            town,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    return venue_list
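
`GetNearbyVenues` assumes every response contains at least one group and every venue has at least one category, so an error response or an uncategorized venue would raise a `KeyError` or `IndexError`. The sketch below shows one way to parse a response more defensively. `parse_explore_items` is a hypothetical helper, and the payload layout it expects is simply the one the function above already indexes into.

```python
def parse_explore_items(town, lat, lng, payload):
    """Turn one explore-style JSON payload into a list of venue tuples.

    Returns an empty list when the payload has no groups, and labels
    venues without a category as 'Unknown' (a placeholder chosen here).
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((town, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows
```

Inside the loop, the list comprehension could then be replaced by `venue_list.append(parse_explore_items(town, lat, lng, response_json))`, where `response_json` stands for the decoded response of that town's request.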
                        
    


In [20]:
#Create a DataFrame from venues
li_venues = GetNearbyVenues(towns= df['Town'], latitudes= df['Latitude'], longitudes = df['Longitude'])

In [21]:
nearby_venues = pd.DataFrame([ven for venue_list in li_venues for ven in venue_list])
nearby_venues.columns = ['Town', 'TLat', 'TLong', 'Venue', 'VLat', 'VLong', 'Category']

In [22]:
nearby_venues
# 569 venues in this run (the index in the output runs from 0 to 568)

| | Town | TLat | TLong | Venue | VLat | VLong | Category |
|---|---|---|---|---|---|---|---|
| 0 | Coram, New York | 40.879413 | -73.003943 | Tequila Deli | 40.882688 | -73.011369 | Deli / Bodega |
| 1 | Coram, New York | 40.879413 | -73.003943 | Popei's Clam Bar | 40.869130 | -73.008327 | Seafood Restaurant |
| 2 | Coram, New York | 40.879413 | -73.003943 | Best Market | 40.869260 | -73.025057 | Fruit & Vegetable Store |
| 3 | Coram, New York | 40.879413 | -73.003943 | Panera Bread | 40.864738 | -73.049045 | Bakery |
| 4 | Coram, New York | 40.879413 | -73.003943 | Better On A Bagel | 40.894961 | -73.005985 | Bagel Shop |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 565 | Quogue, New York | 40.823177 | -72.609645 | Wild By Nature | 40.867107 | -72.549150 | Grocery Store |
| 566 | Quogue, New York | 40.823177 | -72.609645 | Dockers Waterside Marina & Restaurant | 40.819399 | -72.563184 | Seafood Restaurant |
| 567 | Quogue, New York | 40.823177 | -72.609645 | Joe's American Grill | 40.821505 | -72.641576 | American Restaurant |
| 568 | Quogue, New York | 40.823177 | -72.609645 | Baby Moon | 40.821609 | -72.641764 | Italian Restaurant |


In [23]:
# filtering venues based on category.  We only need venues that sell donuts.
nearby_venues['Category'].unique()

array(['Deli / Bodega', 'Seafood Restaurant', 'Fruit & Vegetable Store',
       'Bakery', 'Bagel Shop', 'Steakhouse', 'Pizza Place',
       'Grocery Store', 'Breakfast Spot', 'General Entertainment', 'Café',
       'Fast Food Restaurant', 'Asian Restaurant', 'Mexican Restaurant',
       'Gym', 'Bar', 'Japanese Restaurant', 'Chinese Restaurant', 'Food',
       'Liquor Store', 'Park', 'Turkish Restaurant', 'Ice Cream Shop',
       'Coffee Shop', 'Italian Restaurant', 'Donut Shop',
       'Baseball Field', 'American Restaurant', 'Science Museum',
       'Gift Shop', 'Brewery', 'Creperie', 'Resort', 'Candy Store',
       'Dive Bar', 'Beach', 'Martial Arts School', 'Beer Store',
       'New American Restaurant', 'Sandwich Place', 'College Stadium',
       'Bistro', 'Concert Hall', 'Caribbean Restaurant', 'Pub',
       'BBQ Joint', 'Pharmacy', 'Restaurant', 'Flea Market', 'Diner',
       'Aquarium', 'Performing Arts Venue', 'Sushi Restaurant',
       'Warehouse Store', 'Outdoor Sculpture', '

In [24]:
filtered_columns = ['Donut Shop', 'Coffee Shop', 'Bakery', 'Dessert Shop']

In [25]:
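# keep only the donut-related categories; drop() returns a new DataFrame,
# which avoids modifying a slice of nearby_venues in place
nearby_donuts = nearby_venues.loc[nearby_venues['Category'].isin(filtered_columns)].drop(columns=['TLat', 'TLong'])
nearby_donuts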

| | Town | Venue | VLat | VLong | Category |
|---|---|---|---|---|---|
| 3 | Coram, New York | Panera Bread | 40.864738 | -73.049045 | Bakery |
| 28 | Coram, New York | Starbucks | 40.85749 | -73.080955 | Coffee Shop |
| 40 | Middle Island, New York | Dunkin' | 40.94387 | -72.975369 | Donut Shop |
| 44 | Middle Island, New York | Starbucks | 40.94202 | -72.98737 | Coffee Shop |
| 50 | Middle Island, New York | Panera Bread | 40.864738 | -73.049045 | Bakery |
| 56 | Middle Island, New York | Starbucks | 40.82229 | -72.99688 | Coffee Shop |
| 96 | Patchogue, New York | Roast Coffee & Tea Trading Company | 40.765763 | -73.014122 | Coffee Shop |
| 103 | Patchogue, New York | Dunkin' | 40.774826 | -73.027982 | Donut Shop |
| 111 | Patchogue, New York | Starbucks | 40.77782 | -73.032559 | Coffee Shop |
| 112 | Patchogue, New York | Dolci Momenti Bakery | 40.792668 | -73.007331 | Bakery |


In [26]:
nearby_donuts.shape

(36, 5)

In [27]:
# add the donut-selling venues to the same map, outlined in red
for lat, lng, category in zip(nearby_donuts['VLat'], nearby_donuts['VLong'], nearby_donuts['Category']):
    label = folium.Popup(str(category), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(li_map)

In [28]:
li_map
# blue points are towns, red points are nearby venues that sell donuts

## By looking at the map above, we can determine which towns have the least competition nearby.
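
The visual comparison can be backed by a simple count of donut-selling venues per town. Below is a minimal sketch using only the standard library. `rank_towns_by_competition` is a hypothetical helper, and towns with no matching venue are counted as zero. Because the 15 km search radius makes neighboring towns share venues (the same Panera Bread is listed under both Coram and Middle Island), one shop can count toward more than one town.

```python
from collections import Counter


def rank_towns_by_competition(all_towns, donut_venue_towns):
    """Return (town, count) pairs sorted from least to most competition.

    all_towns: every town considered.
    donut_venue_towns: one town name per donut-selling venue found near it.
    """
    counts = Counter(donut_venue_towns)
    # Counter returns 0 for towns that never appear, so they sort first.
    return sorted(((town, counts[town]) for town in all_towns),
                  key=lambda pair: pair[1])
```

With the DataFrames built earlier, this could be called as `rank_towns_by_competition(df['Town'].tolist(), nearby_donuts['Town'].tolist())`; the towns at the top of the result are the candidates with the fewest nearby competitors.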