# Determining Ideal Locations for New Restaurants in the Detroit Metropolitan Area
## Author: Neil Gurram

In [64]:
!pip install requests
!pip install beautifulsoup4
!pip install geopy
!pip install -U notebook-as-pdf
!pip install pyppeteer


In [51]:
import requests 
from bs4 import BeautifulSoup 
from geopy.geocoders import Nominatim
import numpy as np
import pandas as pd

## Introduction

The city of Detroit, Michigan appears to be on the upswing. As the city recovers, restaurant companies, especially those offering less common cuisines such as those from foreign countries, may look for locations in the Detroit Metropolitan Area (DMA) to take advantage of its large population and grow their revenue. More restaurants in the DMA would give Detroiters more variety and could create jobs for residents who are unemployed. Growth in the restaurant industry could also support growth in related industries in the DMA, such as tourism.

## Business Problem

The goal of this project is to find locations in the DMA that are suitable for opening a restaurant, both to strengthen the restaurant industry and improve Detroit and to give restaurants a chance to increase their revenue.

The target audience for this project is people interested in opening restaurants in the DMA, as well as people interested in opening other types of businesses, since a similar analysis could be applied to them.

## Data

The data needed for this project are 1) a list of cities and towns in the DMA, 2) the latitude and longitude of each of those cities and towns, and 3) venue data for the DMA.

#### Cities and Towns in DMA

For the purpose of this project, I limit the DMA to cities in Macomb, Oakland, and Wayne Counties. I scraped the city lists from the geographic.org pages for [Macomb](https://geographic.org/streetview/usa/mi/macomb/index.html), [Oakland](https://geographic.org/streetview/usa/mi/oakland/index.html), and [Wayne](https://geographic.org/streetview/usa/mi/wayne/index.html) Counties. The three pages are structured consistently, with the cities listed under an unordered list (ul) tag, so a single function can extract the cities for each county.

In [52]:
def get_cities(county_name):
    """Return the city names listed for a Michigan county on geographic.org.

    Assumes the cities are the anchor (a) elements inside the page's first unordered list (ul).
    """
    cities = []
    url = "https://geographic.org/streetview/usa/mi/" + county_name.lower() + "/index.html"
    # Fetch the page with a GET request
    resp = requests.get(url)

    # HTTP status 200 means the request succeeded
    if resp.status_code == 200:
        # Python's built-in HTML parser is sufficient here
        soup = BeautifulSoup(resp.text, 'html.parser')

        # city_list is the first ul element, which holds the city links
        city_list = soup.find("ul")

        # Each anchor (a) element in the list corresponds to one city
        if city_list is not None:
            for link in city_list.find_all("a"):
                cities.append(link.text)
    return cities

def get_all_cities(county_list):##This method returns a list of (city, county) tuples
    cities = []
    for county in county_list:
        for city in get_cities(county):
            cities.append((city,county))
    return cities
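
To see the scraping assumption in isolation, here is a minimal offline sketch. It swaps BeautifulSoup for Python's standard-library `html.parser` so it runs without a network connection, and it parses a made-up HTML snippet, not the real geographic.org markup. The names `FirstListLinkParser` and `SAMPLE_HTML` are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser


class FirstListLinkParser(HTMLParser):
    """Collect the text of <a> elements inside the first <ul> of a page."""

    def __init__(self):
        super().__init__()
        self.ul_depth = 0           # how many <ul> elements are currently open
        self.first_ul_done = False  # True once the first <ul> has been closed
        self.in_anchor = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if self.first_ul_done:
            return
        if tag == "ul":
            self.ul_depth += 1
        elif tag == "a" and self.ul_depth > 0:
            self.in_anchor = True
            self.links.append("")

    def handle_endtag(self, tag):
        if self.first_ul_done:
            return
        if tag == "ul" and self.ul_depth > 0:
            self.ul_depth -= 1
            if self.ul_depth == 0:
                self.first_ul_done = True
        elif tag == "a":
            self.in_anchor = False

    def handle_data(self, data):
        # Only text inside an anchor within the first list is kept
        if self.in_anchor:
            self.links[-1] += data


# Made-up page: one link outside any list, then two lists
SAMPLE_HTML = """
<html><body>
<a href="/">Home</a>
<ul>
  <li><a href="armada.html">Armada</a></li>
  <li><a href="fraser.html">Fraser</a></li>
</ul>
<ul><li><a href="other.html">Other</a></li></ul>
</body></html>
"""

parser = FirstListLinkParser()
parser.feed(SAMPLE_HTML)
cities = [name.strip() for name in parser.links]
print(cities)
```

Only the links in the first list are collected. The same assumption is built into `get_cities`, so a page that places another list before the city list would need a more specific selector.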

#### Longitudes and Latitudes for Cities and Towns in DMA

I will use the geopy module to get the latitude and longitude of each city and town in the DMA, and store the results in a DataFrame containing city, county, latitude, and longitude.

Upon initial inspection of the dataframe, some of the computed coordinates were clearly wrong, so I kept only the rows whose latitude is between 41 and 43 and whose longitude is between -84 and -82. A manual check of the remaining rows showed that their coordinates look correct. Alternatively, I could have manually replaced the wrong coordinates with the correct ones, but I chose to filter so that all data comes from the same source.
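
The bounds check described above can be sketched on plain tuples. This is only an illustration: the helper name `in_dma_bounds` is hypothetical, and the third record is a made-up example of a badly geocoded city, not a row from the actual data.

```python
# Bounding box taken from the text: latitude 41 to 43, longitude -84 to -82
LAT_MIN, LAT_MAX = 41.0, 43.0
LON_MIN, LON_MAX = -84.0, -82.0


def in_dma_bounds(latitude, longitude):
    """Return True when the point falls inside the bounding box."""
    return LAT_MIN <= latitude <= LAT_MAX and LON_MIN <= longitude <= LON_MAX


records = [
    ("Armada", 42.844196, -82.884372),
    ("Romeo", 42.802808, -83.012987),
    ("Hypothetical Mismatch", 46.5, -87.4),  # made-up coordinates far outside the DMA
]

kept = [name for name, lat, lon in records if in_dma_bounds(lat, lon)]
print(kept)
```

A box this coarse catches matches that land far from the DMA, but it would not catch a city geocoded to the wrong spot inside the box, which is why the manual check is still needed.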

The filtered dataframe is displayed below.

In [53]:
dma_tuples = get_all_cities(["Macomb","Oakland","Wayne"])

def create_location_dataframe(city_tuples):##Assumes each element is a (city, county) tuple from Michigan
    location_rows = []
    geolocator = Nominatim(user_agent="detroit_explorer")
    for city,county in city_tuples:
        city_augmented = city+", MI"
        coordinates = geolocator.geocode(city_augmented)
        if coordinates is None:##geocode returns None when no match is found, so skip this city
            continue
        latitude = coordinates.latitude
        longitude = coordinates.longitude
        location_rows.append([city.title(),county.title(),latitude,longitude])
    location_dataframe = pd.DataFrame(location_rows,columns=["City","County","City Latitude","City Longitude"])
    return location_dataframe
dma_dataframe = create_location_dataframe(dma_tuples)

In [54]:
dma_dataframe = dma_dataframe[(dma_dataframe["City Latitude"]>=41) & (dma_dataframe["City Latitude"] <= 43) 
                              & (dma_dataframe["City Longitude"]>=-84) & (dma_dataframe["City Longitude"]<=-82)]

In [55]:
dma_dataframe

Unnamed: 0,City,County,City Latitude,City Longitude
0,Armada,Macomb,42.844196,-82.884372
1,Center Line,Macomb,42.485036,-83.0277
2,Clinton Township,Macomb,42.584852,-82.934824
3,Eastpointe,Macomb,42.46837,-82.955475
4,Fraser,Macomb,42.539202,-82.949365
5,Harrison Township,Macomb,42.587337,-82.810484
6,Macomb,Macomb,42.674119,-82.902901
7,Mount Clemens,Macomb,42.597256,-82.877975
8,New Baltimore,Macomb,42.681144,-82.736862
12,Romeo,Macomb,42.802808,-83.012987


#### Venues in DMA

After getting latitude and longitude information for cities in the DMA, I will use the Foursquare API to get venue information for those cities. The venue data, which consists of the Venue Name, Venue Latitude, Venue Longitude, and Venue Category, will be appended to the city dataframe above. To get the venues tied to each city, I request all venues within one mile (a radius of 1610 meters in the code) of the city's latitude, longitude pair, with a limit of 200 venues per city.
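
As a sanity check on the one-mile radius, the distance between a city center and a venue can be estimated with the haversine formula. This is a minimal sketch, not part of the original analysis: the function name `haversine_m` is hypothetical, the Earth is treated as a sphere, and the Foursquare API may measure distance differently. The coordinates are those of Armada and of Achatz Handmade Pie Co. from the venue table.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)
RADIUS_M = 1610             # search radius used in the API call, roughly one mile


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


armada = (42.844196, -82.884372)   # city coordinates from the dataframe
pie_co = (42.854847, -82.884699)   # Achatz Handmade Pie Co.

distance_m = haversine_m(*armada, *pie_co)
print(round(distance_m), distance_m <= RADIUS_M)
```

The venue comes out a little under 1.2 km from the Armada coordinates, which is inside the 1610 meter radius.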

In [56]:
# The code was removed by Watson Studio for sharing.

In [57]:
def getNearbyVenues(cities, counties, city_latitudes, city_longitudes, radius=1610):
    
    venues_list=[]
    for city, county, city_latitude, city_longitude in zip(cities, counties, city_latitudes,city_longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            city_latitude, 
            city_longitude, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            city,
            county,
            city_latitude, 
            city_longitude, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City', 'County','City Latitude', 'City Longitude', 
                  'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
    
    return nearby_venues

In [58]:
dma_venues = getNearbyVenues(cities=dma_dataframe['City'],
                             counties=dma_dataframe['County'],
                             city_latitudes=dma_dataframe['City Latitude'],
                             city_longitudes=dma_dataframe['City Longitude']
                            )
dma_venues

Unnamed: 0,City,County,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Armada,Macomb,42.844196,-82.884372,Papa's,42.844017,-82.884001,Restaurant
1,Armada,Macomb,42.844196,-82.884372,SUBWAY,42.844247,-82.884455,Sandwich Place
2,Armada,Macomb,42.844196,-82.884372,Achatz Handmade Pie Co.,42.854847,-82.884699,Bakery
3,Armada,Macomb,42.844196,-82.884372,Tivoli's Pizza,42.844254,-82.883684,Pizza Place
4,Armada,Macomb,42.844196,-82.884372,Chap's Food & Spirits,42.844008,-82.883720,Restaurant
...,...,...,...,...,...,...,...,...
2720,Wyandotte,Wayne,42.200662,-83.151016,Henry's On The River,42.209764,-83.147036,Café
2721,Wyandotte,Wayne,42.200662,-83.151016,Vinewood Mexican Bakery,42.207565,-83.164956,Bakery
2722,Wyandotte,Wayne,42.200662,-83.151016,Kielbasa Joe’s,42.204145,-83.167687,Butcher
2723,Wyandotte,Wayne,42.200662,-83.151016,The Pasta House,42.207219,-83.167102,Italian Restaurant


In [59]:
unique_venue_categories=list(dma_venues['Venue Category'].unique())
unique_venue_categories

['Restaurant',
 'Sandwich Place',
 'Bakery',
 'Pizza Place',
 'Bookstore',
 'Food',
 'Bar',
 'Park',
 'Trail',
 'Hardware Store',
 'American Restaurant',
 'Post Office',
 'Dessert Shop',
 'Butcher',
 'Italian Restaurant',
 'Pharmacy',
 'Arts & Crafts Store',
 'Chinese Restaurant',
 'Salon / Barbershop',
 'Rental Car Location',
 'Thrift / Vintage Store',
 'Diner',
 'Indian Restaurant',
 'Deli / Bodega',
 'Ice Cream Shop',
 'Auto Dealership',
 'Coffee Shop',
 'Fast Food Restaurant',
 'Convenience Store',
 'Optical Shop',
 'Discount Store',
 'Donut Shop',
 'Medical Supply Store',
 'Grocery Store',
 'Gas Station',
 'Thai Restaurant',
 'Intersection',
 'Automotive Shop',
 'Eastern European Restaurant',
 'Shopping Mall',
 'Electronics Store',
 'Gym',
 'Board Shop',
 'Liquor Store',
 'Marijuana Dispensary',
 'Snack Place',
 'Golf Course',
 'Cosmetics Shop',
 'Food Truck',
 'Home Service',
 'Flower Shop',
 'Event Service',
 'Record Shop',
 'Burger Joint',
 'Wings Joint',
 'Pet Store',
 'Event 

After getting the dataframe, I found that there were 287 Venue Categories in total, many of which do not pertain to restaurants. I went through all the unique Venue Categories and kept only the ones that pertain to restaurants, using my own discretion as to what counts as a restaurant. The filtered restaurant data is shown below.

In [60]:
restaurant_venue_categories = set(['Restaurant','Sandwich Place','Pizza Place','Food','Bar','American Restaurant','Italian Restaurant','Chinese Restaurant','Diner','Indian Restaurant',
                                  'Deli / Bodega','Fast Food Restaurant','Thai Restaurant','Eastern European Restaurant','Food Truck','Burger Joint','Wings Joint','Middle Eastern Restaurant',
                                  'Irish Pub','Sports Bar','Mexican Restaurant','Fried Chicken Joint','Asian Restaurant','Breakfast Spot','BBQ Joint', 'Cajun / Creole Restaurant','Pub',
                                  'Hot Dog Joint','Gluten-free Restaurant','Café','Sushi Restaurant','Greek Restaurant','Mediterranean Restaurant','Steakhouse','Seafood Restaurant','Bistro',
                                  'Cocktail Bar','Vietnamese Restaurant','Beer Garden','Gastropub','Taco Place','French Restaurant','Beer Bar','Korean Restaurant','Noodle House',
                                  'Japanese Restaurant','Mongolian Restaurant','Soup Place','Ethiopian Restaurant','Theme Restaurant','Whisky Bar','Polish Restaurant','Salad Place',
                                  'Yemeni Restaurant','Latin American Restaurant','Falafel Restaurant','Cuban Restaurant','Fish & Chips Shop','Tex-Mex Restaurant','Hungarian Restaurant',
                                  'Buffet'])
for category in restaurant_venue_categories:
    total = (dma_venues['Venue Category']==category).sum()
    print("Category: ", category," has ",total," results.")

Category:  Diner  has  54  results.
Category:  Restaurant  has  30  results.
Category:  Fried Chicken Joint  has  22  results.
Category:  Asian Restaurant  has  9  results.
Category:  Indian Restaurant  has  3  results.
Category:  Salad Place  has  3  results.
Category:  Eastern European Restaurant  has  2  results.
Category:  Noodle House  has  2  results.
Category:  Wings Joint  has  9  results.
Category:  Japanese Restaurant  has  5  results.
Category:  Sports Bar  has  17  results.
Category:  Mexican Restaurant  has  44  results.
Category:  Ethiopian Restaurant  has  1  results.
Category:  Tex-Mex Restaurant  has  2  results.
Category:  Fast Food Restaurant  has  82  results.
Category:  Pub  has  14  results.
Category:  Mediterranean Restaurant  has  15  results.
Category:  Deli / Bodega  has  19  results.
Category:  Bistro  has  4  results.
Category:  Breakfast Spot  has  19  results.
Category:  Steakhouse  has  10  results.
Category:  Theme Restaurant  has  1  results.
Category: 

In [61]:
dma_restaurant_venues = dma_venues[dma_venues['Venue Category'].isin(restaurant_venue_categories)]
dma_restaurant_venues.groupby(['County','Venue','Venue Latitude','Venue Longitude']).ngroups

1068

In [62]:
dma_restaurant_venues = dma_restaurant_venues.reset_index(drop=True)
dma_restaurant_venues

Unnamed: 0,City,County,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Armada,Macomb,42.844196,-82.884372,Papa's,42.844017,-82.884001,Restaurant
1,Armada,Macomb,42.844196,-82.884372,SUBWAY,42.844247,-82.884455,Sandwich Place
2,Armada,Macomb,42.844196,-82.884372,Tivoli's Pizza,42.844254,-82.883684,Pizza Place
3,Armada,Macomb,42.844196,-82.884372,Chap's Food & Spirits,42.844008,-82.883720,Restaurant
4,Armada,Macomb,42.844196,-82.884372,Papa's of Armada Family Restaurant,42.843985,-82.883772,Food
...,...,...,...,...,...,...,...,...
1103,Wyandotte,Wayne,42.200662,-83.151016,Big Boy Restaurant,42.199199,-83.151626,American Restaurant
1104,Wyandotte,Wayne,42.200662,-83.151016,Brooklyn's,42.199880,-83.168524,Bar
1105,Wyandotte,Wayne,42.200662,-83.151016,Wendy’s,42.198147,-83.152044,Fast Food Restaurant
1106,Wyandotte,Wayne,42.200662,-83.151016,Henry's On The River,42.209764,-83.147036,Café


Note that the same restaurant can be returned by the API calls for two different cities when their search areas overlap. Such duplicates can be ignored as necessary during later processing. This happens around 50 times among the 1108 restaurant rows found in the DMA.
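
One way to ignore such duplicates is to keep the first row for each (County, Venue, Venue Latitude, Venue Longitude) combination, the same key used in the group count above. The sketch below uses plain tuples; the third row is a hypothetical duplicate added for illustration ("Neighboring City" is not a real row in the data).

```python
rows = [
    # (City, County, Venue, Venue Latitude, Venue Longitude)
    ("Wyandotte", "Wayne", "Henry's On The River", 42.209764, -83.147036),
    ("Wyandotte", "Wayne", "The Pasta House", 42.207219, -83.167102),
    # Hypothetical: the same venue returned again for an adjacent city
    ("Neighboring City", "Wayne", "Henry's On The River", 42.209764, -83.147036),
]

seen = set()
unique_rows = []
for city, county, venue, lat, lon in rows:
    key = (county, venue, lat, lon)  # the city is left out, since it differs between duplicates
    if key not in seen:
        seen.add(key)
        unique_rows.append((city, county, venue, lat, lon))

print(len(rows), len(unique_rows))
```

On the dataframe itself, `dma_restaurant_venues.drop_duplicates(subset=['County', 'Venue', 'Venue Latitude', 'Venue Longitude'])` should have a similar effect. Either way, each duplicated venue stays attributed to whichever city appears first.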

It is also possible that two rows represent the same restaurant without being identical in the dataframe. Because there is no obvious way to detect this from the data, I will keep the data as is.

We now have all the data needed to proceed with the project. Any further manipulation and presentation of data will be described in later sections, and all of it will originate from the three types of data presented here.

## Methodology

## Results

## Discussion

## Conclusion