# Introduction

##### In a city of your choice, if someone is looking to open a restaurant, where would you recommend that they open it? Similarly, if a contractor is trying to start their own business, where would you recommend that they set up their office?

For this assignment we'll take the first scenario in the question above: a client wants to open a restaurant in Detroit, Michigan, and wants to know the best location for it. We'll narrow the search to a radius of half a mile (~800 m) around the heart of downtown, known as Campus Martius. Let's say this will be a seafood restaurant. Detroit has a wide variety of restaurants downtown, and we'll do this research with the help of Foursquare and its extensive location data on the city and its venues. A new seafood restaurant would give diners a wider variety of choices downtown and would generate revenue for itself, for surrounding venues, and for the city as a whole.

# Data

We'll use the Foursquare API to request location data, much as we did in the New York and Toronto assignments. We'll start by specifying our initial location (Campus Martius) and an 800 m radius, and then explore the neighborhood for similar restaurants. We want an area that has a decent amount of foot traffic and is far enough from other seafood restaurants that the local market isn't over-saturated. Sports arenas, casinos, and other entertainment venues should be taken into consideration, since they generate the most foot traffic. Main streets are ideal for both passing cars *and* foot traffic, but will most likely be more expensive to rent. We'll try to find the fairest compromise for a neighborhood and see what the best option is.

Our relevant data needed to begin:
- `address`: `'Detroit, MI'`, although we'll specify Campus Martius by its latitude and longitude.
- `Campus_lat` and `Campus_lon`: the latitude and longitude of Campus Martius, marking the heart of the city in question.
- `radius`: set to 800 (meters), our target search radius.

###### Foursquare API variables
- `CLIENT_ID = 'client_id'` Foursquare ID
- `CLIENT_SECRET = 'client_secret'` Foursquare Secret key
- `VERSION = '20180605'` Foursquare API version
- `LIMIT = limit_num` Maximum number of venues returned by the Foursquare API

---------------------------

From the data above we can build the following URL and request venue information from Foursquare for our query.

```python
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    Campus_lat,
    Campus_lon,
    radius,
    LIMIT)
```
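Instead of formatting the query string by hand, the same explore request can be assembled with the standard library's `urllib.parse.urlencode`, which also percent-encodes each value. This is a minimal sketch: `build_explore_url` is a hypothetical helper, and its parameters simply mirror the URL template above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lon, radius, limit):
    """Return a Foursquare v2 explore URL for a point and radius (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date, e.g. '20180605'
        "ll": f"{lat},{lon}",    # 'lat,lng' pair; urlencode encodes the comma as %2C
        "radius": radius,        # search radius in meters
        "limit": limit,          # maximum number of venues to return
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"

url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605",
                        42.3318, -83.0466, 800, 500)
print(url)
```

Passing the same dictionary as `params=` to `requests.get` would have the same effect, leaving the encoding to the library.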

```python
import numpy as np  # library to handle data in a vectorized manner

import pandas as pd  # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json  # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

import requests  # library to handle requests
from pandas import json_normalize  # transform JSON into a pandas DataFrame (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium  # map rendering library

print('Libraries imported.')
```

```
Libraries imported.
```


```python
address = 'Detroit, MI'
Campus_lat = 42.3318   # Campus Martius latitude
Campus_lon = -83.0466  # Campus Martius longitude
radius = 800           # search radius in meters
LIMIT = 500            # maximum number of venues to request

CLIENT_ID = 'YOUR_CLIENT_ID'          # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # your Foursquare Secret (avoid publishing real credentials)
VERSION = '20180605'                  # Foursquare API version

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    Campus_lat,
    Campus_lon,
    radius,
    LIMIT)

print('URL created!')
```

```
URL created!
```


### Now let's take a look at some results from our query

```python
results = requests.get(url).json()  # send the GET request and parse the JSON response
# results
```

```python
# function that extracts the category of the venue
def get_category_type(row):
    """Return the name of a venue's first category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']
```

### Here's our list of venues from the JSON data from Foursquare

```python
venues = results['response']['groups'][0]['items']

nearby_venues = json_normalize(venues)  # flatten JSON

# keep only the name, category, and coordinate columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# replace each category list with the name of its primary category
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# strip the 'venue.' and 'venue.location.' prefixes from the column names
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues
```

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Avalon Cafe and Bakery | Café | 42.332834 | -83.047694 |
| 1 | Campus Martius | Park | 42.331575 | -83.046598 |
| 2 | Texas de Brazil | Steakhouse | 42.332293 | -83.046711 |
| 3 | Dime Store | American Restaurant | 42.331039 | -83.047734 |
| 4 | Chase Bank | Bank | 42.330676 | -83.046648 |
| 5 | Bon Bon Bon | Dessert Shop | 42.330548 | -83.047914 |
| 6 | Athens Souvlaki Detroit | Diner | 42.33036 | -83.048183 |
| 7 | Detroit Water Ice Factory | Ice Cream Shop | 42.332575 | -83.047436 |
| 8 | Lafayette Coney Island | Hot Dog Joint | 42.331683 | -83.04878 |
| 9 | Grand Trunk Pub | Pub | 42.330403 | -83.045943 |
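The explore response nests each venue under `response`, then `groups[0]`, then `items[i]['venue']`. The flattening done above with `json_normalize` and `get_category_type` can be sketched with plain dictionaries to make that structure visible. The helper names are hypothetical, and the small payload is illustrative only: it mimics the fields used here and is not a real API response.

```python
def primary_category(venue):
    """Name of the venue's first listed category, or None if it has none."""
    categories = venue.get("categories", [])
    return categories[0]["name"] if categories else None

def flatten_venues(response_json):
    """Turn an explore response into (name, category, lat, lng) tuples."""
    items = response_json["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        location = venue["location"]
        rows.append((venue["name"], primary_category(venue), location["lat"], location["lng"]))
    return rows

# Illustrative payload shaped like the fields used above (not real API output)
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Campus Martius", "categories": [{"name": "Park"}],
               "location": {"lat": 42.331575, "lng": -83.046598}}},
    {"venue": {"name": "Venue without categories", "categories": [],
               "location": {"lat": 42.33, "lng": -83.04}}},
]}]}}

rows = flatten_venues(sample)
print(rows)
```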


### There are lots of venues in that list: banks, cafés, and various restaurants. So let's narrow the list down to venues whose category contains the word "Restaurant".

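```python
# keep venues whose category mentions "Restaurant" (case-sensitive; na=False skips venues with no category)
restaurants = nearby_venues[nearby_venues['categories'].str.contains("Restaurant", na=False)]
restaurants
```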

| | name | categories | lat | lng |
|---|---|---|---|---|
| 3 | Dime Store | American Restaurant | 42.331039 | -83.047734 |
| 16 | The Standby | New American Restaurant | 42.334439 | -83.046009 |
| 19 | Parc | American Restaurant | 42.331564 | -83.0467 |
| 23 | Vicente's Cuban Cuisine | Cuban Restaurant | 42.334436 | -83.047193 |
| 24 | Orchid Thai | Thai Restaurant | 42.333579 | -83.04522 |
| 28 | Maru Sushi & Grill | Japanese Restaurant | 42.33036 | -83.048269 |
| 44 | Townhouse Detroit | New American Restaurant | 42.330305 | -83.045361 |
| 45 | Sweetwater Tavern | American Restaurant | 42.331861 | -83.041839 |
| 46 | Caucus Club | American Restaurant | 42.329436 | -83.047551 |
| 47 | Central Kitchen + Bar | American Restaurant | 42.331518 | -83.045962 |
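`str.contains` matches case-sensitively and, with `na=False`, treats venues that have no category as non-matches. The same filter on plain tuples is sketched below; `filter_by_category` is a hypothetical helper, it deliberately lower-cases both sides so "restaurant" and "Restaurant" match alike (a choice, not what the cell above does), and the rows reuse values from the tables above plus one made-up uncategorized venue.

```python
def filter_by_category(rows, keyword):
    """Keep (name, category, lat, lng) rows whose category contains keyword, ignoring case."""
    keyword = keyword.lower()
    return [row for row in rows if row[1] is not None and keyword in row[1].lower()]

rows = [
    ("Dime Store", "American Restaurant", 42.331039, -83.047734),
    ("Chase Bank", "Bank", 42.330676, -83.046648),
    ("Fishbone's Rhythm Kitchen Cafe", "Seafood Restaurant", 42.334376, -83.043068),
    ("(uncategorized venue)", None, 42.33, -83.04),  # made-up row with no category
]

restaurants = filter_by_category(rows, "restaurant")
seafood = filter_by_category(rows, "seafood")
print(len(restaurants), len(seafood))  # 2 1
```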


```python
restaurants.shape
```

```
(22, 4)
```

### So now we have a dataframe of 22 restaurants within 800 meters of Campus Martius. Let's see how many are seafood places.

```python
# same filter, this time for seafood places
seafood_restaurants = nearby_venues[nearby_venues['categories'].str.contains("Seafood", na=False)]
seafood_restaurants
```

| | name | categories | lat | lng |
|---|---|---|---|---|
| 96 | Fishbone's Rhythm Kitchen Cafe | Seafood Restaurant | 42.334376 | -83.043068 |


### Okay, so there's only one. That's not much competition, so we don't have to worry about saturating the market with seafood places. Let's see where it is on the map of Detroit in relation to Campus Martius.

```python
# coordinates of the seafood restaurant(s) as [lat, lng] pairs
locations = seafood_restaurants[['lat', 'lng']]
locationlist = locations.values.tolist()
len(locationlist)
```

```
1
```

```python
Detroit = folium.Map(location=[Campus_lat, Campus_lon], zoom_start=15)

# outline the radius the restaurant should be in (folium.Circle radius is in meters)
folium.Circle([Campus_lat, Campus_lon],
              radius=radius,
              color='black'
             ).add_to(Detroit)

# mark Campus Martius with a large green circle (CircleMarker radius is in pixels)
folium.CircleMarker(
        [Campus_lat, Campus_lon],
        radius=40,
        popup='Campus Martius',
        color='green',
        fill=True,
        fill_color='#228B22',
        fill_opacity=0.7).add_to(Detroit)

# mark Fishbone's, the only seafood restaurant found, with a smaller blue circle
folium.CircleMarker(
        locationlist[0],
        radius=10,
        popup='Fishbone Cafe',
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(Detroit)

Detroit
```

### Fishbone's turns out to be quite close to Campus Martius. It's right in Greektown, a fairly busy area with a casino around the corner. So let's look for a similar location within our limits.

### Looking at our list of restaurants and the map of Detroit, we can see Maru Sushi, Which Wich?, and Dime Store all on the same block and relatively close to Campus Martius. Since Maru is not listed as "seafood" in Foursquare's database, we can count it as serving *uncooked* seafood (sushi), while our restaurant will serve *cooked* seafood like Fishbone's does.

### Now let's find a building to rent near that area. On the site *42Floors* we can draw a region on a map and list the places for rent inside it. I drew roughly an 800 meter radius and got a list of available places. We'll scrape that data and turn it into a DataFrame.

```python
from bs4 import BeautifulSoup

# 42Floors search results for the region drawn on the map (the `within` parameter appears to encode the drawn shape)
floors_url = 'https://42floors.com/us/mi/detroit/downtown-detroit?regions%5B%5D=90816&within=0106000020e610000001000000010300000001000000090000002fd8209e22c354c03cbccc48af2b45405fd7209ea9c354c0f894cd26c62b454038da20c6ccc354c05c5f329a682b45406ad62086c1c354c03cfc203bdb2a454051d82066f1c254c02001f740a72a45405bda20cec5c254c02c98955d622b4540d1d920ce1fc354c03cbccc48af2b45402fd8209e22c354c03cbccc48af2b45402fd8209e22c354c03cbccc48af2b4540'

r = requests.get(floors_url)             # download the search-results page
soup = BeautifulSoup(r.content, 'lxml')  # parse it with the lxml parser (requires the lxml package)
print(soup.prettify())
```

```
<!DOCTYPE html>
<html lang="en">
 <head>
  <meta content="authenticity_token" name="csrf-param"/>
  <meta content="RmaRI7JeWBBgBq6EY4XARL4CvSwTZCmXFQnqivyWl+6Rrt/EKAvvCii/KoyuHgRHYrDgNpAWKffOtXBnyBxsYg==" name="csrf-token"/>
  <title>
   Downtown Detroit
    Commercial Real Estate | 42Floors
  </title>
  <meta content="Search 667 Downtown Detroit office spaces. Find all the most flexible and affordable office rentals and leases. See real photos of listings." name="description"/>
  <meta charset="utf-8"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
  <meta content="EN" http-equiv="Content-Language"/>
  <meta content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=0" name="viewport"/>
  <meta content="no-cache" name="turbolinks-cache-control"/>
  <link href="https://42floors.com/us/mi/detroit/downtown-detroit" rel="canonical"/>
  <script type="text/javascript">
   MAP_PROVIDER_STYLE = 
```

### That's a lot of HTML. We only need the listings table on the left side of the page, which looks something like this:

![The table we want](office-spaces.png)

```python
# each listing title (address plus unit) sits in a span with exactly these classes
table = soup.find_all('span', attrs={'class': 'title block text-overflow text-blue'})
table
```

```
[<span class="title block text-overflow text-blue">
                 1452 Brush St
                 <span class="text-light">
                     -
                         3 Spaces
                 </span>
 </span>, <span class="title block text-overflow text-blue">
                 1442 Brush St
                 <span class="text-light">
                     -
                         Floor 1
                 </span>
 </span>, <span class="title block text-overflow text-blue">
                 2310 Park Ave
                 <span class="text-light">
                     -
                         Floor 5
                 </span>
 </span>, <span class="title block text-overflow text-blue">
                 1323 Broadway St
                 <span class="text-light">
                     -
                         Unit 101 (Floor 1)
                 </span>
 </span>, <span class="title block text-overflow text-blue">
                 2233 Park Ave
                 <span class="text-lig
```
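For comparison, the same extraction can be sketched with the standard library's `html.parser` instead of BeautifulSoup. This is a swapped-in technique, not what the notebook uses: `SpanTextCollector` is a hypothetical class that matches spans whose `class` attribute equals the given string exactly, keeps the text of any nested spans, and collapses whitespace. The HTML snippet is copied from the output above.

```python
from html.parser import HTMLParser

class SpanTextCollector(HTMLParser):
    """Collect whitespace-normalized text of <span> elements whose class equals target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching span (counts nested spans too)
        self.buffer = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            self.depth += 1  # nested span inside a match: keep collecting its text
        elif dict(attrs).get("class") == self.target_class:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append(" ".join("".join(self.buffer).split()))

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)

sample_html = """
<span class="title block text-overflow text-blue">
    1452 Brush St
    <span class="text-light"> - 3 Spaces </span>
</span>
"""
parser = SpanTextCollector("title block text-overflow text-blue")
parser.feed(sample_html)
print(parser.results)  # ['1452 Brush St - 3 Spaces']
```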

```python
# extract the raw text of each listing title
addresses = [span.text for span in table]

addresses
```

```
['\n                1452 Brush St\n                \n                    -\n                        3 Spaces\n                \n',
 '\n                1442 Brush St\n                \n                    -\n                        Floor 1\n                \n',
 '\n                2310 Park Ave\n                \n                    -\n                        Floor 5\n                \n',
 '\n                1323 Broadway St\n                \n                    -\n                        Unit 101 (Floor 1)\n                \n',
 '\n                2233 Park Ave\n                \n                    -\n                        Space\n                \n',
 '\n                1529 Broadway St\n                \n                    -\n                        Unit 2\n                \n',
 '\n                28 W Adams Ave\n                \n                    -\n                        Unit 101\n                \n']
```

### So now we have our scraped addresses. Let's clean up each element that contains an address and turn it into a usable value.

```python
# collapse newlines and runs of spaces into single spaces
addresses = [" ".join(a.split()) for a in addresses]

addresses
```

```
['1452 Brush St - 3 Spaces',
 '1442 Brush St - Floor 1',
 '2310 Park Ave - Floor 5',
 '1323 Broadway St - Unit 101 (Floor 1)',
 '2233 Park Ave - Space',
 '1529 Broadway St - Unit 2',
 '28 W Adams Ave - Unit 101']
```

### Now we'll get rid of the extraneous information and whitespace after the address.

```python
# keep the text before the first '-' and drop the trailing space
addresses = [a.split('-', 1)[0].strip() for a in addresses]

addresses
```

```
['1452 Brush St',
 '1442 Brush St',
 '2310 Park Ave',
 '1323 Broadway St',
 '2233 Park Ave',
 '1529 Broadway St',
 '28 W Adams Ave']
```
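Splitting on the first `-` works for these listings, but it would cut any street name that itself contains a hyphen. A slightly stricter variant splits only on a hyphen surrounded by whitespace and keeps the unit text instead of discarding it. `split_listing` is a hypothetical helper, and the hyphenated street in the last example is made up to show the difference.

```python
import re

def split_listing(title):
    """Split a cleaned listing title like '1452 Brush St - 3 Spaces' into (address, unit)."""
    parts = re.split(r"\s+-\s+", title.strip(), maxsplit=1)
    address = parts[0]
    unit = parts[1] if len(parts) > 1 else None  # None when the title has no unit part
    return address, unit

titles = ['1452 Brush St - 3 Spaces', '1323 Broadway St - Unit 101 (Floor 1)', '28 W Adams Ave']
print([split_listing(t) for t in titles])
print(split_listing('10 Foo-Bar St - Floor 1'))  # made-up street with a hyphen in its name
```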

### Let's do the same for the type of use each property is listed for. This is shown in the right-most column of the table above.

```python
# the listed use types (Office, Retail, Restaurant, ...) for each listing
table = soup.find_all('span', attrs={'class': 'width-90-px-md col-4-sm col-center col-gutter-none-sm text-smaller text-subtle col-shrink-0'})
table
```

```
[<span class="width-90-px-md col-4-sm col-center col-gutter-none-sm text-smaller text-subtle col-shrink-0">
             Retail<br/>Restaurant<br/>Office
         </span>,
 <span class="width-90-px-md col-4-sm col-center col-gutter-none-sm text-smaller text-subtle col-shrink-0">
             Office<br/>Retail
         </span>,
 <span class="width-90-px-md col-4-sm col-center col-gutter-none-sm text-smaller text-subtle col-shrink-0">
             Office
         </span>,
 <span class="width-90-px-md col-4-sm col-center col-gutter-none-sm text-smaller text-subtle col-shrink-0">
             Office
         </span>,
 <span class="width-90-px-md col-4-sm col-center col-gutter-none-sm text-smaller text-subtle col-shrink-0">
             Retail<br/>Restaurant
         </span>,
 <span class="width-90-px-md col-4-sm col-center col-gutter-none-sm text-smaller text-subtle col-shrink-0">
             Office
         </span>,
 <span class="width-90-px-md col-4-sm col-center col-gutter-none-sm text
```

```python
# text of each span with whitespace collapsed; note that .text drops the <br/> separators
building_types = [" ".join(span.text.split()) for span in table]

building_types
```

```
['RetailRestaurantOffice',
 'OfficeRetail',
 'Office',
 'Office',
 'RetailRestaurant',
 'Office',
 'SubleaseOffice']
```

### We'll do a more efficient cleanup of the data once we make a dataframe.
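Because `.text` drops the `<br/>` separators, the types come out fused (`'RetailRestaurantOffice'`). In BeautifulSoup, `span.get_text(separator=' ', strip=True)` would keep them apart at scrape time. For the strings already collected, a regular expression that splits at capital letters recovers the individual types. This sketch assumes every type name is a single capitalized word, which holds for the values shown above; `split_types` is a hypothetical helper.

```python
import re

def split_types(fused):
    """Split a fused string such as 'RetailRestaurantOffice' into ['Retail', 'Restaurant', 'Office']."""
    return re.findall(r"[A-Z][a-z]*", fused)

building_types = ['RetailRestaurantOffice', 'OfficeRetail', 'Office', 'Office',
                  'RetailRestaurant', 'Office', 'SubleaseOffice']

# True where the listing explicitly allows restaurant use
restaurant_ok = ["Restaurant" in split_types(t) for t in building_types]
print(restaurant_ok)  # [True, False, False, False, True, False, False]
```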

```python
# pair each address with its listed building type
detroit_table = pd.DataFrame(list(zip(addresses, building_types)),
                             columns=['Address', 'Building Type'])

detroit_table
```

| | Address | Building Type |
|---|---|---|
| 0 | 1452 Brush St | RetailRestaurantOffice |
| 1 | 1442 Brush St | OfficeRetail |
| 2 | 2310 Park Ave | Office |
| 3 | 1323 Broadway St | Office |
| 4 | 2233 Park Ave | RetailRestaurant |
| 5 | 1529 Broadway St | Office |
| 6 | 28 W Adams Ave | SubleaseOffice |


### This looks good so far. Let's plot these points on a map so we can see which are closest to Campus Martius and a good fit for a restaurant.

```python
building_lats = []
building_lons = []

zipcode = '48226'
city = 'Detroit, MI'

geolocator = Nominatim(user_agent="detroit_app")

# geocode every address except the last one ('28 W Adams Ave'), which Nominatim could not find
for addr in addresses[:-1]:
    place = addr + ' ' + city + ' ' + zipcode
    location = geolocator.geocode(place)  # returns None when there is no match
    building_lats.append(location.latitude)
    building_lons.append(location.longitude)
    print(location.latitude, location.longitude)
```

```
42.3364310475167 -83.0449531912913
42.3363419274117 -83.044893667659
42.3383745102041 -83.0539696326531
42.3348050816327 -83.0460905714286
42.3378896 -83.0538867
42.3358776 -83.0488959
```


### For some reason geopy cannot find "28 W Adams Ave", which is why the loop above skips it. Looking at the table, that's okay: it's a subleased office space and we're looking for a space that can hold a restaurant, so we can drop that row altogether.
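`geocode` returns `None` when Nominatim finds no match, so calling `.latitude` on the result would raise an error for that address. A more defensive pattern records the miss instead of crashing and spaces out requests, since Nominatim's usage policy allows at most one request per second (geopy also provides `geopy.extra.rate_limiter.RateLimiter` for this). The sketch takes any geocoding callable so it can run offline; `geocode_all`, `Point`, and `fake_geocode` are hypothetical, and the stand-in geocoder only knows two coordinates copied from the output above.

```python
import time
from collections import namedtuple

Point = namedtuple("Point", ["latitude", "longitude"])

def geocode_all(addresses, geocode, suffix="", delay_seconds=1.0):
    """Geocode each address; map it to (lat, lon), or to None when there is no match."""
    results = {}
    for i, address in enumerate(addresses):
        if i and delay_seconds:
            time.sleep(delay_seconds)  # throttle between requests
        location = geocode(f"{address} {suffix}".strip())
        results[address] = (location.latitude, location.longitude) if location is not None else None
    return results

# Offline stand-in for geolocator.geocode (hypothetical)
known = {"1452 Brush St Detroit, MI 48226": Point(42.336431, -83.044953),
         "1323 Broadway St Detroit, MI 48226": Point(42.334805, -83.046091)}

def fake_geocode(query):
    return known.get(query)

coords = geocode_all(["1452 Brush St", "1323 Broadway St", "28 W Adams Ave"],
                     fake_geocode, suffix="Detroit, MI 48226", delay_seconds=0)
print(coords)
```

With a real geolocator the call would be `geocode_all(addresses, geolocator.geocode, suffix='Detroit, MI 48226')`, and rows whose value is `None` could then be dropped by value rather than by a fixed index.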

```python
detroit_table = detroit_table.drop([6])  # drop the sublease listing that could not be geocoded
detroit_table
```

| | Address | Building Type |
|---|---|---|
| 0 | 1452 Brush St | RetailRestaurantOffice |
| 1 | 1442 Brush St | OfficeRetail |
| 2 | 2310 Park Ave | Office |
| 3 | 1323 Broadway St | Office |
| 4 | 2233 Park Ave | RetailRestaurant |
| 5 | 1529 Broadway St | Office |


### Let's append our latitudes and longitudes to the `detroit_table` dataframe.

```python
detroit_table['Latitude'] = building_lats
detroit_table['Longitude'] = building_lons

detroit_table
```

| | Address | Building Type | Latitude | Longitude |
|---|---|---|---|---|
| 0 | 1452 Brush St | RetailRestaurantOffice | 42.336431 | -83.044953 |
| 1 | 1442 Brush St | OfficeRetail | 42.336342 | -83.044894 |
| 2 | 2310 Park Ave | Office | 42.338375 | -83.05397 |
| 3 | 1323 Broadway St | Office | 42.334805 | -83.046091 |
| 4 | 2233 Park Ave | RetailRestaurant | 42.33789 | -83.053887 |
| 5 | 1529 Broadway St | Office | 42.335878 | -83.048896 |


### Let's plot our building locations and add them to the map of Detroit we already have.

```python
# add a marker for each geocoded building, labeled with its address
for _, row in detroit_table.iterrows():
    folium.Marker([row['Latitude'], row['Longitude']], popup=row['Address']).add_to(Detroit)

Detroit
```

### Two places, 2310 Park Ave and 2233 Park Ave, fall outside our radius, so they're disqualified. 1452 and 1442 Brush St overlap each other on the map. Luckily, since 1452 Brush St is inside our radius *and* is listed as a restaurant space, we may have found our answer to this problem.
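Reading the radius off the map works here, but the same check can be made numerically with the haversine great-circle distance. This sketch assumes a spherical Earth with a mean radius of 6,371 km, which is accurate enough at this scale, and reuses the coordinates from `detroit_table` and the seafood table; `haversine_m` is a hypothetical helper.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

campus = (42.3318, -83.0466)
fishbones = (42.334376, -83.043068)
buildings = {
    "1452 Brush St": (42.336431, -83.044953),
    "1442 Brush St": (42.336342, -83.044894),
    "2310 Park Ave": (42.338375, -83.05397),
    "1323 Broadway St": (42.334805, -83.046091),
    "2233 Park Ave": (42.33789, -83.053887),
    "1529 Broadway St": (42.335878, -83.048896),
}

for name, (lat, lon) in buildings.items():
    d_campus = haversine_m(*campus, lat, lon)
    d_fish = haversine_m(*fishbones, lat, lon)
    print(f"{name:18s} {d_campus:5.0f} m from Campus Martius, "
          f"{d_fish:5.0f} m from Fishbone's, inside radius: {d_campus <= 800}")
```

The printed distances to Fishbone's also give a rough sense of how close each candidate would be to its only seafood competitor.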

# Results

Since 1452 Brush St, Detroit, MI 48226 is listed as a restaurant space and is within 800 meters of Campus Martius, it appears to be the best location for our client's new business. It's close to a high-foot-traffic area and would suit a new seafood restaurant, since there is only one competitor in the neighborhood.

# Discussion

There are a few different approaches one could take to solving this problem; this is the most "brute-force" one. We collected venue data around our starting location, Campus Martius, within our specified radius. From there we built a map from the venue data, including the seafood restaurants; to our surprise, there was only one in the radius to take into consideration. Once we built the map, we scraped building listings from a real estate website, geocoded the addresses with geopy to get their latitudes and longitudes, and mapped them with Folium. Finally, we narrowed the selection to buildings suitable for restaurants and made our location selection.

Another way to do this would be to use the Foursquare API to explore several areas within 800 meters of Campus Martius and one-hot encode the venue categories in each area to find the seafood-related ones. Once we knew which areas did and didn't have seafood restaurants, we could use k-means clustering to identify areas that lack seafood choices. We could then repeat the building search above to find a suitable space for our client's restaurant and make our decision from the results.
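The one-hot step in that alternative turns each venue's category into indicator columns, which are then averaged per area so that areas can be compared by category frequency (and, in a fuller version, clustered, for example with scikit-learn's `KMeans`). Below is a standard-library sketch of just the encoding and averaging; the area labels are hypothetical, the categories come from the venue tables above, and `one_hot` and `mean_by_area` are illustrative helpers.

```python
from collections import defaultdict

def one_hot(categories):
    """Return the sorted category vocabulary and one indicator vector per venue."""
    vocab = sorted(set(categories))
    return vocab, [[1 if c == v else 0 for v in vocab] for c in categories]

def mean_by_area(areas, vectors):
    """Average the one-hot vectors of the venues in each area (per-area category frequencies)."""
    sums, counts = {}, defaultdict(int)
    for area, vec in zip(areas, vectors):
        sums[area] = [s + x for s, x in zip(sums.get(area, [0] * len(vec)), vec)]
        counts[area] += 1
    return {area: [s / counts[area] for s in sums[area]] for area in sums}

# Hypothetical area labels; categories taken from the venue tables above
areas = ["area A", "area A", "area B", "area B"]
categories = ["Seafood Restaurant", "Pub", "American Restaurant", "Steakhouse"]

vocab, vectors = one_hot(categories)
freqs = mean_by_area(areas, vectors)
seafood_idx = vocab.index("Seafood Restaurant")
lacking = [area for area, f in freqs.items() if f[seafood_idx] == 0]
print(vocab)
print(lacking)  # ['area B']
```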

# Conclusion

This project as a whole was a great learning experience in how to scrape data and use it effectively to solve a real problem that business owners face. I would love to revisit this problem in the future and use more advanced techniques to solve it with relative ease. I would highly recommend that any aspiring data scientist practice with a problem like this to refine their skills.