# Introduction/Business Problem
### A description of the problem and a discussion of the background.
Taking on the role of a restruarantier; I will use my new data science knowledge to help in the location planning phase for a new venture in a wealthy, fast growing city inside the United States.  

# Data Plan

### A description of the data and how it will be used to solve the problem.

To solve this challenge, the following preliminary plan is in place.
1. Determine the proper cities/areas which are both wealthy and fast growing.
2. Determine which eateries in the chosen area receive the most visitors.
3. Identify a location within the area with the largest gap between similar restaurants which still holds other venues with high foot traffic
4. Decide to either accept the location, or to iterate through these steps until a suitable location has been identified.


In support of that plan, the following data sources will be used.
Census data:
* The expected wealth of populations in a city
* The recent growth rate of the cities being reviewed

Foursquare:
* Review venues to identify the popularity of differing types of restaurants (comparing foot traffic to other restaurants)
* Foot traffic experienced by other businesses nearby potential locations

Other:
* Geo-spacial location data from cocl.us will be used for visualizations

All of the data will be compiled and appropriate visualizations will be produced and reviewed.  A final determination will be made by hand.

## Step 1: Determine the weathiest and fastest growing city for this effort

The list below was created by combining the data from the US Census Bureau with population data and sorted based on Population, growth rate, and income (with priorities in that order)


#### Massachusetts: Boston-Cambridge-Newton
- Approx Population: ***4,470k***
- 2010-2018 pop. growth: **6.8%**
- Median household income: **\\$85,691**

#### Connecticut: Bridgeport-Stamford-Norwalk
- Approx Population: **950k**
- 2010-2018 pop. growth: **2.6%**
- Median household income: **\\$91,198**

#### New Jersey: Trenton
- Approx Population: **85k**
- 2010-2018 pop. growth: **0.5%**
- Median household income: **\\$79,173**

#### New Hampshire: Manchester-Nashua
- Approx Population: **88k**
- 2010-2018 pop. growth: **3.5%**
- Median household income: **\\$78,769**

#### Hawaii: Kahului-Wailuku-Lahaina
- Approx Population: **12k**
- 2010-2018 pop. growth: **7.9%**
- Median household income: **\\$80,183**


Based on these numbers, combined with the statistics on growth, the **Boston-Cambridge-Newton, MA-NH Metro Area** area was selected.


(Original source: https://www.usatoday.com/story/money/2019/04/22/the-fastest-growing-city-in-each-state/39362311/)

### Get relevant zip codes.

To get the zip codes into a list that can be used for further queries; the data from https://www.bestplaces.net/find/zip.aspx?msa=14460&st=MA was placed into a text file and processed below.

In [1]:
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

In [39]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,town,near
0,Abington,"""Abington, MA"""
1,Acton,"""Acton, MA"""
2,Amesbury,"""Amesbury, MA"""
3,Andover,"""Andover, MA"""
4,Arlington,"""Arlington, MA"""


### Initialize the parts needed for the remainder of this project.

In [16]:
# The code was removed by Watson Studio for sharing.

In [17]:
!wget -O Geospatial_Coordinates.csv https://cocl.us/Geospatial_data/Geospatial_Coordinates.csv

--2019-11-10 00:49:09--  https://cocl.us/Geospatial_data/Geospatial_Coordinates.csv
Resolving cocl.us (cocl.us)... 159.8.72.228
Connecting to cocl.us (cocl.us)|159.8.72.228|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2019-11-10 00:49:12--  https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Resolving ibm.box.com (ibm.box.com)... 107.152.26.197, 107.152.27.197
Connecting to ibm.box.com (ibm.box.com)|107.152.26.197|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2019-11-10 00:49:12--  https://ibm.box.com/public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Reusing existing connection to ibm.box.com:443.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.ent.box.com/public/static/9afzr83pps4pwf2smjjcf1y

In [6]:
!conda install -c conda-forge folium=0.5.0 --yes # uncomment for the first run of this notebook as needed.
import folium # map rendering library

from pandas.io.json import json_normalize # tranforms JSON files into a pandas dataframes

import requests

#hard coding for simplicities sake
### toronto_lat=43.6532
### toronto_long=-79.3832

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    certifi-2019.9.11          |           py36_0         147 KB  conda-forge
    ca-certificates-2019.9.11  |       hecc5488_0         144 KB  conda-forge
    branca-0.3.1               |             py_0          25 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    altair-3.2.0               |           py36_0         770 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         3.2 MB

The following NEW packages will be 

In [10]:
# reuse the get_category_type and getNearbyVenues functions from the lab.

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
#This function returns a dataframe with venue information for a input
def getNearbyVenuesll(names, latitudes, longitudes, radius=500, limit=100):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name) #Rem'd since it is uneeded unless testing.

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENTID, 
            CLIENTSECRET, 
            CLIENTVERSION, 
            lat, 
            lng, 
            radius, 
            limit)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

        nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
        nearby_venues.columns = ['Neighbourhood', 
                      'Neighbourhood Latitude', 
                      'Neighbourhood Longitude', 
                      'Venue', 
                      'Venue Latitude', 
                      'Venue Longitude', 
                      'Venue Category']
    #Return the completed 
    return(nearby_venues)


def getNearbyVenuesNear(names, near, radius=500, limit=100):
    venues_list=[]
    for name, thisnear in zip(names, near):
        #print(name) #Rem'd since it is uneeded unless testing.

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&near={}&radius={}&limit={}'.format(
            CLIENTID, 
            CLIENTSECRET, 
            CLIENTVERSION, 
            thisnear, 
            radius, 
            limit)

        print(url)
        
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            thisnear, 
            v['venue']['name'],
            '"01431"',
            #v['venue']['location']['labeledLatLngs']['postalCode'],
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'], 
            v['venue']['categories'][0]['name']) for v in results])

        nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
        nearby_venues.columns = ['Neighbourhood', 
                      'Neighbourhood Latitude', 
                      'Neighbourhood Longitude', 
                      'Venue', 
                      'Venue Latitude', 
                      'Venue Longitude', 
                      'Venue Category']
    #Return the completed 
    return(nearby_venues)

# FROM HERE DOWN IS DEV
# ============================
# ============================
# ============================
# ============================
# ============================

### Build a profile of the existing restauraunts

Using the zip codes that were just read, foursquare is querried to find the most visited resturaunts in the area.

In [31]:
bcn_near.head()

Unnamed: 0,town,near
0,Abington,"""Abington, MA"""
1,Acton,"""Acton, MA"""
2,Amesbury,"""Amesbury, MA"""
3,Andover,"""Andover, MA"""
4,Arlington,"""Arlington, MA"""


In [40]:
df_bcn_venues=pd.DataFrame(getNearbyVenuesNear(bcn_near['town'],bcn_near['near']))
df_bcn_venues.head()


https://api.foursquare.com/v2/venues/explore?&client_id=K35LHC34SX1ZW2M4JOVPEFYOMBVVJOOZMCOTNYNQPBJN3OZ4&client_secret=AUQ4CADYTEURBZ3WC3KAM24WOG55HBWZHPRTLWS2AUXUENJG&v=20170511&near="Abington, MA"&radius=500&limit=100
https://api.foursquare.com/v2/venues/explore?&client_id=K35LHC34SX1ZW2M4JOVPEFYOMBVVJOOZMCOTNYNQPBJN3OZ4&client_secret=AUQ4CADYTEURBZ3WC3KAM24WOG55HBWZHPRTLWS2AUXUENJG&v=20170511&near="Acton, MA"&radius=500&limit=100
https://api.foursquare.com/v2/venues/explore?&client_id=K35LHC34SX1ZW2M4JOVPEFYOMBVVJOOZMCOTNYNQPBJN3OZ4&client_secret=AUQ4CADYTEURBZ3WC3KAM24WOG55HBWZHPRTLWS2AUXUENJG&v=20170511&near="Amesbury, MA"&radius=500&limit=100
https://api.foursquare.com/v2/venues/explore?&client_id=K35LHC34SX1ZW2M4JOVPEFYOMBVVJOOZMCOTNYNQPBJN3OZ4&client_secret=AUQ4CADYTEURBZ3WC3KAM24WOG55HBWZHPRTLWS2AUXUENJG&v=20170511&near="Andover, MA"&radius=500&limit=100
https://api.foursquare.com/v2/venues/explore?&client_id=K35LHC34SX1ZW2M4JOVPEFYOMBVVJOOZMCOTNYNQPBJN3OZ4&client_secret=AUQ4C

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Abington,"""Abington, MA""",Brother's Roast Beef and Pizza,"""01431""",42.106271,-70.945855,American Restaurant
1,Abington,"""Abington, MA""",Spencer's Pizza,"""01431""",42.106542,-70.939848,Pizza Place
2,Abington,"""Abington, MA""",Marylou's,"""01431""",42.107437,-70.944641,Coffee Shop
3,Abington,"""Abington, MA""",7-Eleven,"""01431""",42.105487,-70.947068,Convenience Store
4,Abington,"""Abington, MA""",Submarine Galley,"""01431""",42.105705,-70.949048,Sandwich Place


In [9]:
#populate the venue dataframe
df_toronto_venues=pd.DataFrame(getNearbyVenues(df_toronto['Neighbourhood'],df_toronto['Latitude'], df_toronto['Longitude']))


#See how many types of venues there are.
print('There are {} unique categories.'.format(len(df_toronto_venues['Venue Category'].unique())))
df_toronto_venues.head()




#Get  an overview of the number of venues per area
print('Venues per Neighbourhood:')
df_toronto_venues.groupby('Neighbourhood')[['Venue']].count()





#Analyze the neighbourhoods in the same manner as the New York lab

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]







#Prep the data for processing by knn by converting text values into numerical values.

# one hot encoding
toronto_onehot = pd.get_dummies(df_toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighbourhood column back to dataframe
toronto_onehot['Neighbourhood'] = df_toronto_venues['Neighbourhood'] 

# move neighbourhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

#Create a grouped version for knn use later.
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()




#Get the top venues for each area and visibly validate the quality of the data before processing and clustering

num_top_venues = 5

for hood in toronto_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')



NameError: name 'toronto_grouped' is not defined