## Introduction ##

The audience is a hospitality company that manages many restaurants around the US. The company wants to open a new restaurant and maximize its likelihood of success, and this analysis recommends a location for it. The project is broad in scope in that it considers all major cities in the country. I will first identify vibrant, up-and-coming cities as the initial filter for narrowing the candidates. Among those cities, I will identify which may be underserved from a gastronomic perspective. Next, I will choose one city and identify which types of restaurants might add an exciting new element to its restaurant scene. Finally, I will conduct location-based clustering and recommend a specific cluster, and its associated location, for the new restaurant in the target city.

## Data ##

Three data sources are used. First, data on the largest cities in the US is scraped from Wikipedia. Next, the zip codes for the filtered cities are downloaded from the Zip Code API web service. Once all the zip codes for each city have been obtained, each is paired with its corresponding latitude and longitude. Finally, the venues associated with these zip codes are extracted from Foursquare, and the Foursquare data is analyzed to make the final recommendation.

In [4]:
import pandas as pd
import numpy as np
import requests
import urllib
import re
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe
import json # library to handle JSON files
from geopy.geocoders import Nominatim
import geocoder

from IPython.display import display, clear_output
from bs4 import BeautifulSoup, SoupStrainer
pd.set_option('display.max_colwidth', 60)


The raw data from Wikipedia is as follows:

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population'
session = requests.Session()
response = session.get(url, allow_redirects=True)
soup = BeautifulSoup(response.content, 'html.parser')

df = pd.read_html(url)[4]  # index 4 selects the fifth table on the page, which holds the city list
df.head()

| Unnamed: 0 | 2018rank | City | State[c] | 2018estimate | 2010Census | Change | 2016 land area | 2016 land area.1 | 2016 population density | 2016 population density.1 | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | New York[d] | New York | 8398748 | 8175133 | +2.74% | 301.5 sq mi | 780.9 km2 | 28,317/sq mi | 10,933/km2 | 40°39′49″N 73°56′19″W / 40.6635°N 73.9387°W |
| 1 | 2 | Los Angeles | California | 3990456 | 3792621 | +5.22% | 468.7 sq mi | 1,213.9 km2 | 8,484/sq mi | 3,276/km2 | 34°01′10″N 118°24′39″W / 34.0194°N 118.4108°W |
| 2 | 3 | Chicago | Illinois | 2705994 | 2695598 | +0.39% | 227.3 sq mi | 588.7 km2 | 11,900/sq mi | 4,600/km2 | 41°50′15″N 87°40′54″W / 41.8376°N 87.6818°W |
| 3 | 4 | Houston[3] | Texas | 2325502 | 2100263 | +10.72% | 637.5 sq mi | 1,651.1 km2 | 3,613/sq mi | 1,395/km2 | 29°47′12″N 95°23′27″W / 29.7866°N 95.3909°W |
| 4 | 5 | Phoenix | Arizona | 1660272 | 1445632 | +14.85% | 517.6 sq mi | 1,340.6 km2 | 3,120/sq mi | 1,200/km2 | 33°34′20″N 112°05′24″W / 33.5722°N 112.0901°W |
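
The raw table carries footnote markers in the city names (for example `New York[d]`) and stores population change as text (for example `+2.74%`), so some cleaning is needed before it can be used to filter for growing cities. The following is a minimal sketch of that cleaning, not the exact code used in this analysis. The helper names and the 5% growth threshold are assumptions chosen for illustration.

```python
import re

# Matches bracketed footnote markers such as "[d]" or "[3]".
FOOTNOTE = re.compile(r"\[[^\]]*\]")

def strip_footnotes(name):
    """Remove bracketed footnote markers from a city name."""
    return FOOTNOTE.sub("", name).strip()

def parse_change(text):
    """Convert a string such as '+2.74%' to the float 2.74.

    Returns None when the text cannot be parsed as a number.
    """
    # Normalize a possible Unicode minus sign, then drop '%' and '+'.
    cleaned = text.replace("\u2212", "-").replace("%", "").replace("+", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None

# The five rows shown in the table above, reduced to (City, Change).
rows = [
    ("New York[d]", "+2.74%"),
    ("Los Angeles", "+5.22%"),
    ("Chicago", "+0.39%"),
    ("Houston[3]", "+10.72%"),
    ("Phoenix", "+14.85%"),
]

MIN_GROWTH_PCT = 5.0  # hypothetical threshold, not taken from the analysis

fast_growing = []
for city, change in rows:
    growth = parse_change(change)
    if growth is not None and growth >= MIN_GROWTH_PCT:
        fast_growing.append(strip_footnotes(city))

print(fast_growing)
```

On the full DataFrame, the same helpers could be applied column-wise, for example with `df['City'].map(strip_footnotes)` and `df['Change'].map(parse_change)`.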


The zip code data is extracted from the API as follows:

In [3]:
zip_api_key = 'YOUR_ZIP_CODE_API_KEY'  # placeholder: supply your own key rather than committing a real credential
call = 'https://www.zipcodeapi.com/rest/{}/city-zips.{}/{}/{}'.format(zip_api_key,'json','Chicago','Illinois')
zips = requests.get(call).json()
zips['zip_codes']

['60290',
 '60601',
 '60602',
 '60603',
 '60604',
 '60605',
 '60606',
 '60607',
 '60608',
 '60609',
 '60610',
 '60611',
 '60612',
 '60613',
 '60614',
 '60615',
 '60616',
 '60617',
 '60618',
 '60619',
 '60620',
 '60621',
 '60622',
 '60623',
 '60624',
 '60625',
 '60626',
 '60628',
 '60629',
 '60630',
 '60631',
 '60632',
 '60633',
 '60634',
 '60636',
 '60637',
 '60638',
 '60639',
 '60640',
 '60641',
 '60642',
 '60643',
 '60644',
 '60645',
 '60646',
 '60647',
 '60649',
 '60651',
 '60652',
 '60653',
 '60654',
 '60655',
 '60656',
 '60657',
 '60659',
 '60660',
 '60661',
 '60663',
 '60664',
 '60666',
 '60668',
 '60669',
 '60670',
 '60673',
 '60674',
 '60675',
 '60677',
 '60678',
 '60679',
 '60680',
 '60681',
 '60682',
 '60684',
 '60685',
 '60686',
 '60687',
 '60688',
 '60689',
 '60690',
 '60691',
 '60693',
 '60694',
 '60695',
 '60696',
 '60697',
 '60699',
 '60701',
 '60706',
 '60707',
 '60712',
 '60803',
 '60804',
 '60805',
 '60827']
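
The single-city request above has to be repeated for every city that passes the first filter. The sketch below shows one way to do that, reusing the URL pattern from the call above. The function names are hypothetical, and the HTTP call is passed in as a `get_json` argument so the logic can be exercised without network access; the stub used in the demonstration returns a truncated, made-up response rather than real API output.

```python
from urllib.parse import quote

# Same URL pattern as the single-city call above.
URL_TEMPLATE = "https://www.zipcodeapi.com/rest/{key}/city-zips.{fmt}/{city}/{state}"

def build_city_zips_url(api_key, city, state, fmt="json"):
    """Build the city-zips URL, percent-encoding names that contain spaces."""
    return URL_TEMPLATE.format(key=api_key, fmt=fmt, city=quote(city), state=quote(state))

def collect_zip_codes(api_key, cities, get_json):
    """Return {(city, state): [zip codes]} for each (city, state) pair.

    get_json is any callable that takes a URL and returns the decoded JSON
    payload as a dict.
    """
    result = {}
    for city, state in cities:
        payload = get_json(build_city_zips_url(api_key, city, state))
        # Fall back to an empty list if the payload has no 'zip_codes' key.
        result[(city, state)] = payload.get("zip_codes", [])
    return result

# Demonstration with a stub in place of the real HTTP call.
def fake_get_json(url):
    if url.endswith("/Chicago/Illinois"):
        return {"zip_codes": ["60290", "60601"]}
    return {}

demo = collect_zip_codes(
    "KEY", [("Chicago", "Illinois"), ("Los Angeles", "California")], fake_get_json
)
print(demo)
```

With the `requests` import from the first cell, a real fetcher could be passed as `lambda url: requests.get(url, timeout=10).json()`.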

Each zip code is then geocoded with geopy's Nominatim geocoder to obtain its latitude and longitude.

In [6]:
address = '60827, Chicago, Illinois'

# Nominatim requires a user_agent string identifying the application
geolocator = Nominatim(user_agent="my_application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Chicago Zip Code 60827 are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Chicago Zip Code 60827 are 41.73724645, -87.55118560173352.
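
Geocoding every zip code means many requests, and `geocode` returns `None` when an address cannot be resolved, so a loop needs both pacing and a missing-value check. The sketch below illustrates this under stated assumptions: the function name and the one-second default delay are illustrative choices (the public Nominatim service asks for at most one request per second), and the geocoder is passed in as a callable so a stub can stand in for the network call.

```python
import time
from collections import namedtuple

def geocode_zip_codes(zip_codes, city, state, geocode, delay_seconds=1.0):
    """Map each zip code to (latitude, longitude), or None if unresolved.

    geocode is any callable that takes an address string and returns an
    object with .latitude and .longitude attributes, or None on failure.
    """
    coords = {}
    for index, zip_code in enumerate(zip_codes):
        if index > 0:
            time.sleep(delay_seconds)  # pause between consecutive requests
        location = geocode("{}, {}, {}".format(zip_code, city, state))
        if location is None:
            coords[zip_code] = None
        else:
            coords[zip_code] = (location.latitude, location.longitude)
    return coords

# Demonstration with a stub geocoder that only knows one address.
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])

def fake_geocode(address):
    known = {"60827, Chicago, Illinois": FakeLocation(41.73724645, -87.55118560173352)}
    return known.get(address)

demo_coords = geocode_zip_codes(
    ["60827", "00000"], "Chicago", "Illinois", fake_geocode, delay_seconds=0
)
print(demo_coords)
```

In the notebook, the real geocoder would be passed as `geolocator.geocode`, keeping the default delay.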


Finally, these coordinates will be used to extract venue information from Foursquare, in a manner similar to that used in the labs.
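
The venue extraction is not shown in this section, so the following is only a sketch of what that step might look like. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its nested `response -> groups -> items -> venue` payload layout, and that endpoint may no longer be available in its original form. The function names, the version date, and the radius and limit defaults are placeholders rather than values taken from this analysis.

```python
from urllib.parse import urlencode

# Assumed legacy endpoint; check the current Foursquare documentation before use.
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, latitude, longitude,
                      version="20180605", radius=500, limit=100):
    """Build an explore request URL for venues around one coordinate pair."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,  # API version date (placeholder value)
        "ll": "{},{}".format(latitude, longitude),
        "radius": radius,  # search radius in meters
        "limit": limit,  # maximum number of venues to return
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

def flatten_venues(payload):
    """Flatten an explore payload into a list of one dict per venue."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item.get("venue", {})
            location = venue.get("location", {})
            # Use an empty dict when a venue has no categories.
            categories = venue.get("categories") or [{}]
            rows.append({
                "name": venue.get("name"),
                "latitude": location.get("lat"),
                "longitude": location.get("lng"),
                "category": categories[0].get("name"),
            })
    return rows

# Demonstration on a hand-written payload (not real Foursquare output).
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Example Diner",
                   "location": {"lat": 41.7, "lng": -87.5},
                   "categories": [{"name": "Diner"}]}},
        {"venue": {"name": "Unlabeled Spot",
                   "location": {"lat": 41.8, "lng": -87.6},
                   "categories": []}},
    ]}]}
}
venues = flatten_venues(sample_payload)
print(venues)
```

The resulting list of dicts can be passed directly to `pd.DataFrame(venues)` to build the per-zip-code venue table.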