# Capstone Project - The Battle of the Neighborhoods
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Opening an Asian Restaurant in Austin, Texas <a name="introduction"></a>

This project aims to find an optimal location for a restaurant. Specifically, this report is targeted at stakeholders interested in opening an **Asian restaurant** in **Austin, Texas**.

Opening a restaurant is no easy task because of the many factors involved, such as cost, demand, and location.

There are many ways to frame and solve this problem, so we'll start by saying that we'd like to find **locations that are not already crowded with restaurants**. Beyond that, it would be advantageous to find **areas with no or minimal Asian restaurants in the vicinity**. Finally, if those two conditions can be met, we'll assume that **being closer to the city center is better**.

Using data science, we'll narrow down Austin's neighborhoods based on these conditions. Then we can further analyze the candidates to find the best possible locations for stakeholders to choose from.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* number of existing restaurants in the neighborhood (any type of restaurant)
* number of and distance to Asian restaurants in the neighborhood, if any
* distance of neighborhood from city center

Since data breaking down specific neighborhoods in Austin is difficult to find and may not meet our needs anyway, we'll do it ourselves by creating equal chunks spaced from the center and call those our neighborhoods.

The following data sources will be needed to extract/generate the required information:
* Austin's center coordinates will be obtained using **Google Maps API geocoding**
* centers of candidate areas will be generated algorithmically and their approximate addresses will be obtained using **Google Maps API reverse geocoding**
* number of restaurants and their type and location in every neighborhood will be obtained using **Foursquare API**

First we'll organize the city into neighborhoods using the Google Maps API, then we'll use the Foursquare API to discover restaurant details about each of those neighborhoods. At that point, we'll be able to identify regions and neighborhoods that deserve further analysis. Once we find an area or neighborhood that seems optimal based on our problem criteria, we can use clusters to break it down further and finally use the Google Maps API to get the addresses of those clusters.

In the end, the hope is that these cluster addresses will represent ideal areas for stakeholders to use as a solution to the business problem.

### Defining Neighborhoods with the Google Maps API

The first step is to get coordinates and create our neighborhoods. We will build groups extending 6km in each direction from Austin's city center; in other words, the candidates should approximate a circle with a 6km radius around the center.

First we'll find the latitude & longitude of Austin's city center, using the Google Maps geocoding API. The hidden cell below contains credentials for this process.

In [1]:
# @hidden_cell

In [2]:
import requests

def get_coordinates(api_key, address, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&address={}'.format(api_key, address)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        geographical_data = results[0]['geometry']['location'] # get geographical coordinates
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except Exception:  # e.g. a network failure or an empty 'results' list
        return [None, None]
    
address = 'Austin, Texas'
austin_center = get_coordinates(google_api_key, address)
print('Coordinates of {}: {}'.format(address, austin_center))

Coordinates of Austin, Texas: [30.267153, -97.7430608]
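An address such as 'Austin, Texas' contains a comma and a space, so it is safer to percent-encode the query string than to paste the raw text into the URL. The following is a minimal sketch of that idea using only the standard library; `build_geocode_url` and the `'DUMMY_KEY'` value are illustrative names, not part of the original notebook, and no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Same geocoding endpoint that get_coordinates() calls
GEOCODE_ENDPOINT = 'https://maps.googleapis.com/maps/api/geocode/json'

def build_geocode_url(api_key, address):
    """Return a geocoding request URL with a percent-encoded query string."""
    query = urlencode({'key': api_key, 'address': address})
    return '{}?{}'.format(GEOCODE_ENDPOINT, query)

# 'DUMMY_KEY' is a placeholder, not a real credential
url = build_geocode_url('DUMMY_KEY', 'Austin, Texas')
print(url)

# Decoding the query string recovers the original address unchanged
decoded = parse_qs(urlsplit(url).query)
print(decoded['address'][0])
```

With `requests`, passing the same dictionary through the `params` argument of `requests.get` achieves the same encoding.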


Now let's create a grid of area candidates, equally spaced, centered around the city center and within ~6km. Our neighborhoods will be defined as circular areas with radii of 300 meters, so our neighborhood centers will be 600 meters apart.

To calculate distances accurately, we need to create our grid of locations in a 2D Cartesian coordinate system, which lets us measure distances in meters (not in latitude/longitude degrees). We'll then project those coordinates back to latitude/longitude degrees so they can be shown on a Folium map. So let's create functions to convert between the WGS84 spherical coordinate system (latitude/longitude degrees) and the UTM Cartesian coordinate system (X/Y coordinates in meters).

The converted coordinates may be negative, but that does not affect the distance calculations, which depend only on coordinate differences.
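UTM divides the globe into 60 zones, each 6 degrees of longitude wide, and a projection is most accurate when its zone matches the area being mapped. As a hedged sketch, the standard zone number for a longitude can be computed as below; `utm_zone_for_longitude` is a hypothetical helper, and it ignores the special-case zones around Norway and Svalbard.

```python
import math

def utm_zone_for_longitude(lon):
    """Standard UTM zone number (1-60) for a longitude in degrees.

    Assumption: the regular 6-degree zone layout; the irregular zones
    around Norway and Svalbard are not handled.
    """
    if not -180.0 <= lon <= 180.0:
        raise ValueError('longitude must be within [-180, 180]')
    # Zone 1 starts at -180 degrees; longitude +180 is folded into zone 60
    return min(int(math.floor((lon + 180.0) / 6.0)) + 1, 60)

austin_lon = -97.7430608  # longitude reported by the geocoding step
print('UTM zone for Austin:', utm_zone_for_longitude(austin_lon))
```

The resulting number could be passed as the `zone` argument when constructing the UTM projection.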

In [3]:
!pip install shapely
import shapely.geometry

!pip install pyproj
import pyproj

import math

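def lonlat_to_xy(lon, lat):
    # Convert WGS84 longitude/latitude (degrees) to UTM X/Y (meters).
    # NOTE: zone=33 is kept because the outputs shown in this notebook were
    # computed with it. Austin itself lies in UTM zone 14, so coordinates and
    # distances projected this far outside the zone are distorted.
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    # Inverse of lonlat_to_xy(): UTM X/Y (meters) back to longitude/latitude.
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]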

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Austin center longitude={}, latitude={}'.format(austin_center[1], austin_center[0]))
x, y = lonlat_to_xy(austin_center[1], austin_center[0])
print('Austin center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Austin center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Austin center longitude=-97.7430608, latitude=30.267153
Austin center UTM X=-6446199.9680359475, Y=13719627.309461577
Austin center longitude=-97.7430608000033, latitude=30.267152999998057


Rather than a plain square grid, we'll lay the cell centers out hexagonally: every other row is offset by half a step, and the vertical row spacing is adjusted so that **every cell center is equally distant from all its neighbors**.

In [4]:
austin_center_x, austin_center_y = lonlat_to_xy(austin_center[1], austin_center[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = austin_center_x - 6000
x_step = 600
y_min = austin_center_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(austin_center_x, austin_center_y, x, y)
        if (distance_from_center <= 6001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

364 candidate neighborhood centers generated.
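As a sanity check on the hexagonal layout, the sketch below rebuilds a small grid with the same 600 m step, half-step row offset, and `sqrt(3)/2` row spacing, then confirms that every point's nearest neighbor is 600 m away. The helper names `hex_grid`, `nearest_neighbor_distance`, and `neighbor_count` are illustrative, and the 5x5 grid size is an arbitrary choice.

```python
import math

def hex_grid(step, rows, cols):
    """Generate (x, y) centers of a hexagonal grid with the given spacing."""
    k = math.sqrt(3) / 2  # ratio of row spacing to column spacing
    points = []
    for i in range(rows):
        x_offset = step / 2 if i % 2 == 0 else 0  # shift every other row
        for j in range(cols):
            points.append((j * step + x_offset, i * step * k))
    return points

def nearest_neighbor_distance(p, points):
    """Distance from p to the closest other point in the grid."""
    return min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in points if q != p)

def neighbor_count(p, points, step, tol=1e-6):
    """Number of other points lying within one step of p."""
    return sum(1 for q in points
               if q != p and math.hypot(p[0] - q[0], p[1] - q[1]) <= step + tol)

points = hex_grid(600, 5, 5)
distances = [nearest_neighbor_distance(p, points) for p in points]
print('min/max nearest-neighbor distance:', min(distances), max(distances))

# The point in the middle of the grid (row 2, column 2) has six neighbors
center_point = points[2 * 5 + 2]
print('neighbors of the middle point:', neighbor_count(center_point, points, 600))
```

The row spacing follows from Pythagoras: a horizontal shift of 300 m and a vertical shift of 600 * sqrt(3)/2 m give a diagonal distance of exactly 600 m.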


Let's visualize the data we have so far: city center location and candidate neighborhood centers.

In [5]:
!pip install folium
import folium



In [6]:
map_austin = folium.Map(location=austin_center, zoom_start=13)
folium.Marker(austin_center, popup='City Center').add_to(map_austin)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=250, color='blue', fill=False).add_to(map_austin)
map_austin

OK, we now have the center coordinates of the neighborhoods/areas to be evaluated, equally spaced (the distance from every point to its neighbors is exactly the same) and within ~6km of Austin's center.

Let's now use the Google Maps API to get approximate addresses of those locations.

In [7]:
def get_address(api_key, latitude, longitude, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except Exception:  # e.g. a network failure or an empty 'results' list
        return None

addr = get_address(google_api_key, austin_center[0], austin_center[1])
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(austin_center[0], austin_center[1], addr))

Reverse geocoding check
-----------------------
Address of [30.267153, -97.7430608] is: 4320 Congress Ave, Austin, TX 78701, USA


In [8]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(google_api_key, lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    address = address.replace(', USA', '') # We don't need country part of address
    addresses.append(address)
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.


In [9]:
addresses[70:100]

['Etter-Harbin Alumni Center, 2110 San Jacinto Blvd, Austin, TX 78712',
 "Peter O'Donnell Jr Building, 201 E 24th St, Austin, TX 78712",
 'Moffett Molecular Biology Building, 2500 Speedway, Austin, TX 78712',
 'Guadalupe St & W 27th St, Austin, TX 78705, United States',
 '2813 Rio Grande St, Austin, TX 78705',
 '808 W 29th St, Austin, TX 78705',
 '1101 Belmont Pkwy, Austin, TX 78703',
 '3015 E 3rd St, Austin, TX 78702',
 '403 N Pleasant Valley Rd, Austin, TX 78702',
 '2511 E 6th St, Austin, TX 78702',
 '2321 E 7th St, Austin, TX 78702',
 '2106 E 9th St, Austin, TX 78702',
 '1800 E 11th St, Austin, TX 78702',
 '1132 Concho St, Austin, TX 78702',
 '1179 San Bernard St, Austin, TX 78702',
 '1010 E 13th St, Austin, TX 78702',
 '1617 1/2 I-35, Austin, TX 78702',
 '1810 Red River St, Austin, TX 78701',
 'Basketball Support Building, 301 Jester Cir, Austin, TX 78712',
 '2100 Speedway, Austin, TX 78705',
 'Peter T. Flawn Academic Center, 2304 Whitis Ave, Austin, TX 78705',
 'San Antonio Garage

The addresses in this range resolve to real locations, which is a good sign. Now we'll convert the data into a pandas dataframe to make it easier to work with.

In [10]:
import pandas as pd

df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center |
|---|---|---|---|---|---|---|
| 0 | 1610 Clifford Ave, Austin, TX 78702 | 30.279408 | -97.708109 | -6448000.0 | 13713910.0 | 5992.495307 |
| 1 | 1904 Alexander Ave, Austin, TX 78702 | 30.281937 | -97.710507 | -6447400.0 | 13713910.0 | 5840.3767 |
| 2 | 2712 E 22nd St, Austin, TX 78722 | 30.284467 | -97.712905 | -6446800.0 | 13713910.0 | 5747.173218 |
| 3 | 3211 Hemlock Ave, Austin, TX 78722 | 30.286997 | -97.715304 | -6446200.0 | 13713910.0 | 5715.767665 |
| 4 | 3306 French Pl, Austin, TX 78722 | 30.289527 | -97.717703 | -6445600.0 | 13713910.0 | 5747.173218 |
| 5 | 3414 Robinson Ave, Austin, TX 78722 | 30.292057 | -97.720102 | -6445000.0 | 13713910.0 | 5840.3767 |
| 6 | 916 E 37th St, Austin, TX 78705 | 30.294587 | -97.722502 | -6444400.0 | 13713910.0 | 5992.495307 |
| 7 | 1184 1/2 Sol Wilson Ave, Austin, TX 78702 | 30.273811 | -97.707036 | -6448900.0 | 13714430.0 | 5855.766389 |
| 8 | 2816 E 12th St, Austin, TX 78702 | 30.276341 | -97.709434 | -6448300.0 | 13714430.0 | 5604.462508 |
| 9 | 1609 Ulit Ave, Austin, TX 78702 | 30.27887 | -97.711832 | -6447700.0 | 13714430.0 | 5408.326913 |


We're going to move to collecting data with Foursquare, but first it's a good idea to save what we've worked on so we can easily recall it later for analysis.

In [11]:
df_locations.to_pickle('./locations.pkl') 

### Gathering Restaurant Data with the Foursquare API
Now that we have our location candidates, let's use the Foursquare API to get info on restaurants in each neighborhood.

We're interested in venues in the 'food' category, but only those that are proper restaurants. Fast food places and the like are not direct competitors, so we don't care about those. We will include only venues whose category name contains 'restaurant' (or a close equivalent such as 'diner', 'taverna', or 'steakhouse'), and we'll make sure to detect and include all the subcategories of the specific 'Asian restaurant' category, as we need info on Asian restaurants in the neighborhood.

**Note that Asian cuisine is diverse and includes potential outliers such as Indonesian or Mongolian restaurants.** We will not differentiate in our data collection, but depending on the subcategory of Asian restaurant that stakeholders are interested in, it might be prudent to refine data collection at this stage in the future.
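The filtering rule described above can be sketched on a few made-up venues. This is a simplified illustration, not the notebook's exact function: `classify_venue`, the sample venue categories, and the `'id-...'` category IDs are all hypothetical placeholders rather than real Foursquare values.

```python
RESTAURANT_WORDS = ('restaurant', 'diner', 'taverna', 'steakhouse')

def classify_venue(categories, specific_ids=frozenset()):
    """Classify a venue from its list of (category_name, category_id) tuples.

    Returns (is_restaurant, is_specific), where is_specific means that one
    of the venue's category IDs appears in specific_ids.
    """
    is_restaurant = False
    is_specific = False
    for name, category_id in categories:
        lowered = name.lower()
        if any(word in lowered for word in RESTAURANT_WORDS):
            is_restaurant = True
        if 'fast food' in lowered:
            is_restaurant = False  # fast food is not a direct competitor
        if category_id in specific_ids:
            is_specific = True
            is_restaurant = True   # a matching ID always counts as a restaurant
    return is_restaurant, is_specific

# Hypothetical category IDs, used only for this illustration
asian_ids = {'id-ramen', 'id-thai'}

samples = {
    'ramen place': [('Ramen Restaurant', 'id-ramen')],
    'burger chain': [('Fast Food Restaurant', 'id-fastfood')],
    'coffee shop': [('Coffee Shop', 'id-coffee')],
    'steakhouse': [('Steakhouse', 'id-steak')],
}

for label, categories in samples.items():
    print(label, '->', classify_venue(categories, asian_ids))
```

Matching on category IDs rather than names for the specific filter means a venue is still caught even when its category name does not contain one of the restaurant keywords.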

Foursquare credentials are defined in the hidden cell below.

In [12]:
# @hidden_cell

In [13]:
# Category IDs corresponding to Asian restaurants were taken from Foursquare web site (https://developer.foursquare.com/docs/resources/categories):

food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

asian_restaurant_categories = ['4bf58dd8d48988d142941735','56aa371be4b08b9a8d573568','52e81612bcbc57f1066b7a03',
                                 '4bf58dd8d48988d145941735','52af3a7c3cf9994f4e043bed','58daa1558bbb0b01f18ec1d3',
                                 '4bf58dd8d48988d1f5931735','52af3a9f3cf9994f4e043bef','52af3aaa3cf9994f4e043bf0',
                                 '52af3ac83cf9994f4e043bf3','52af3afc3cf9994f4e043bf8','52af3b463cf9994f4e043bfe',
                                 '52af3b593cf9994f4e043c00','52af3b773cf9994f4e043c03','52af3b813cf9994f4e043c04',
                                 '52af3b913cf9994f4e043c06','4eb1bd1c3b7b55596b4a748f','52e81612bcbc57f1066b79fb',
                                 '52af0bd33cf9994f4e043bdd','4deefc054765f83613cdba6f','4bf58dd8d48988d111941735',
                                 '55a59bace4b013909087cb30','55a59bace4b013909087cb24','55a59bace4b013909087cb15',
                                 '55a59bace4b013909087cb27','4bf58dd8d48988d1d2941735','55a59bace4b013909087cb2a',
                                 '4bf58dd8d48988d113941735','4bf58dd8d48988d156941735','4eb1d5724b900d56c88a45fe',
                                 '4bf58dd8d48988d1d1941735','56aa371be4b08b9a8d57350e','4bf58dd8d48988d149941735',
                                 '52af39fb3cf9994f4e043be9','4bf58dd8d48988d14a941735']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', United States', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except Exception:  # e.g. a network failure or an unexpected response shape
        venues = []
    return venues

In [14]:
# Let's now go over our neighborhood locations and get nearby restaurants; we'll also maintain a dictionary of all found restaurants and all found Asian restaurants

import pickle

def get_restaurants(lats, lons):
    restaurants = {}
    asian_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=350 to make sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, foursquare_client_id, foursquare_client_secret, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_asian = is_restaurant(venue_categories, specific_filter=asian_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_asian, x, y)
                if venue_distance<=300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_asian:
                    asian_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, asian_restaurants, location_restaurants

# Try to load from local file system in case we did this before
restaurants = {}
asian_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_350.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('asian_restaurants_350.pkl', 'rb') as f:
        asian_restaurants = pickle.load(f)
    with open('location_restaurants_350.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
except:
    pass

# If load failed use the Foursquare API to get the data
if not loaded:
    restaurants, asian_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)
    
    # Let's persist this in the local file system
    with open('restaurants_350.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('asian_restaurants_350.pkl', 'wb') as f:
        pickle.dump(asian_restaurants, f)
    with open('location_restaurants_350.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)
        

Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.


In [15]:
import numpy as np

print('Total number of restaurants:', len(restaurants))
print('Total number of Asian restaurants:', len(asian_restaurants))
print('Percentage of Asian restaurants: {:.2f}%'.format(len(asian_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 410
Total number of Asian restaurants: 91
Percentage of Asian restaurants: 22.20%
Average number of restaurants in neighborhood: 2.5961538461538463


On the bright side, the restaurants in the city are relatively well spread out, as indicated by there being only ~2.6 restaurants per neighborhood on average. However, 22% of all restaurants near Austin's city center are Asian restaurants, which means there's a fairly high density of competitors in the market.

In [16]:
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(restaurants))

List of all restaurants
-----------------------
('52d1b65711d28feb21a82899', 'King & Country Food Co.', 30.279188, -97.709907, '2921 E 17th St Bldg A (Alexander), Austin, TX 78702', 188, False, -6447849.404408395, 13714157.854252627)
('593712351543c7473c5b2775', 'Emojis Grilled Cheese Bar', 30.282999799981486, -97.71003395318984, '2830 Real St (martin luther king jr blvd), Austin, TX 78702', 126, False, -6447297.453089695, 13713729.833113099)
('4c13a89a7f7f2d7fee64df68', 'Dai Due Butcher Shop & Supper Club', 30.284906130396855, -97.71690416595273, '2406 Manor Rd, Austin, TX 78722', 267, False, -6446333.994918442, 13714351.196234656)
('4ba26ebcf964a52081f837e3', 'Cafe Hornitos', 30.294147490790813, -97.72052252219598, '3704 N. I-35 (at 38th St.), Austin, TX 78705', 196, False, -6444661.978846006, 13713719.853455022)
('5a58f9ebc47cf954ca472db0', 'Yoshi Ramen Austin', 30.291614146388497, -97.72315893725829, '3320 Harmon Ave (Duncan Ln), Austin, TX 78705', 65, True, -6444753.925698954, 137

In [17]:
print('List of Asian restaurants')
print('---------------------------')
for r in list(asian_restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(asian_restaurants))

List of Asian restaurants
---------------------------
('5a58f9ebc47cf954ca472db0', 'Yoshi Ramen Austin', 30.291614146388497, -97.72315893725829, '3320 Harmon Ave (Duncan Ln), Austin, TX 78705', 65, True, -6444753.925698954, 13714338.056531286)
('4cd9ffb3fc97370498adc505', "Cuauhtli's Ramen Stand", 30.277992994943272, -97.71543521368247, 'Austin, TX 78702', 39, True, -6447459.950883178, 13714975.517933454)
('4a9c6599f964a5200c3720e3', 'Thai Kitchen', 30.296772442633426, -97.74180623939364, '3009 Guadalupe St, Austin, TX 78705', 309, True, -6442141.892452672, 13716024.664604487)
('4beed9aa2c082d7f62fa3042', 'Pad Thai', 30.29958160670869, -97.74031572273279, '3208A Guadalupe St, Austin, TX 78705', 295, True, -6441895.522779486, 13715515.034727428)
('5b444891065ef500394c610c', 'Korean Komfort', 30.30093, -97.73895, '3423 Guadalupe St, Austin, TX 78705', 325, True, -6441842.907203354, 13715190.704929693)
('4a465a7cf964a520baa81fe3', 'Magic Wok', 30.293054810398377, -97.74168408535371, '2716

In [18]:
print('Restaurants around location')
print('---------------------------')
for i in range(100, 110):
    rs = location_restaurants[i][:8]
    names = ', '.join([r[1] for r in rs])
    print('Restaurants around location {}: {}'.format(i+1, names))

Restaurants around location
---------------------------
Restaurants around location 101: Suerte, East Side King Thai-kun, Tamale House East, It's Italian Cucina & Wine Bar, Nasha, Lefty's, Vixen’s Wedding, Thai-Kun at Whisler's
Restaurants around location 102: Nasha
Restaurants around location 103: Hillside Farmacy, Rosewood, Nissi Vegan, Aimee's Super Fantazmo, Tacos Deliciosos, Nice-n-Ful
Restaurants around location 104: Old Thousand, Nissi Vegan, Aimee's Super Fantazmo
Restaurants around location 105: Brick Oven Restaurant
Restaurants around location 106: 
Restaurants around location 107: 
Restaurants around location 108: The Carillon, Lavaca Teppan, El Mercado, Gabriel's Cafe, Tejas Restaraunt
Restaurants around location 109: Teji's Indian Restaurant, The Carillon, Chipotle Mexican Grill, K-Bop, Pho Thaison, Zarab's Kabobs, Thai, How Are You, Gabriel's Cafe
Restaurants around location 110: Don Japanese Kitchen, Teji's Indian Restaurant, Little Sheep Mongolian Hot Pot, Sushi Niichi,

To get a better handle on this data, we'll visualize all the collected restaurants on a folium map. To make it even easier to read, we'll display Asian restaurants in red and all other restaurants in blue.

In [19]:
map_austin = folium.Map(location=austin_center, zoom_start=13)
folium.Marker(austin_center, popup='City Center').add_to(map_austin)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_asian = res[6]
    color = 'red' if is_asian else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_austin)
map_austin

So now we have all the restaurants within a few kilometers of Austin's city center, and we know which ones are Asian restaurants. We also know exactly which restaurants are in the vicinity of every neighborhood candidate center.

With this data, we should be ready to move into the analysis phase. We'll break this down so that we can find the optimal location to open a new Asian restaurant!

## Methodology <a name="methodology"></a>

In this project we will direct our efforts toward detecting areas of Austin that have low restaurant density, particularly those with a low number of Asian restaurants. We will limit our analysis to the area within ~6km of the city center.

In the first step we collected the required **data: location and type (category) of every restaurant within 6km of Austin's city center**. We also **identified Asian restaurants**, not including fast food, as classified by Foursquare.

The second step in our analysis will be the calculation and exploration of '**restaurant density**' across different areas of Austin. We will use **heatmaps** to identify a few promising areas close to Austin's center with a low number of restaurants in general (and ideally with no Asian restaurants in the vicinity) and focus our attention on those areas.

In the third and final step we will focus on the most promising areas and within those create **clusters of locations that meet some basic requirements** established in our discussion with stakeholders: we will consider locations with **no more than two restaurants within a radius of 250 meters**, and we want locations **without Asian restaurants within a radius of 400 meters**.
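The two stakeholder requirements can be expressed as a simple filter over candidate points. The sketch below assumes all coordinates are already in meters in a Cartesian system; `count_within`, `is_good_location`, and the sample coordinates are made up for illustration and are not taken from the collected data.

```python
import math

def count_within(point, others, radius):
    """Count how many points in others lie within radius of point."""
    return sum(1 for o in others
               if math.hypot(point[0] - o[0], point[1] - o[1]) <= radius)

def is_good_location(point, restaurants_xy, asian_xy,
                     max_restaurants=2, restaurant_radius=250, asian_radius=400):
    """Apply both requirements: few restaurants nearby and no Asian ones."""
    few_restaurants = count_within(point, restaurants_xy, restaurant_radius) <= max_restaurants
    no_asian_nearby = count_within(point, asian_xy, asian_radius) == 0
    return few_restaurants and no_asian_nearby

# Hypothetical coordinates in meters; the Asian restaurant is also a restaurant
restaurants_xy = [(100, 0), (0, 200), (500, 500)]
asian_xy = [(500, 500)]

candidates = [(0, 0), (300, 300), (600, 600)]
good = [c for c in candidates if is_good_location(c, restaurants_xy, asian_xy)]
print('Good locations:', good)
```

Only the first candidate passes: it has two restaurants within 250 m and its closest Asian restaurant is about 707 m away, while the other two candidates both sit within 400 m of the Asian restaurant.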

We will present a map of all such locations and also group them into clusters (using **k-means clustering**) to identify general zones / neighborhoods / addresses that should serve as a starting point for the final 'street level' exploration and search for an optimal venue location by stakeholders.
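To make the clustering idea concrete, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm) on two made-up groups of points. It is an illustration only: a real analysis would more likely rely on a library implementation, and the `kmeans` function, the seed, and the sample points are all assumptions introduced here.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Cluster 2D points with Lloyd's algorithm; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its closest center
        labels = [min(range(k),
                      key=lambda c: math.hypot(p[0] - centers[c][0],
                                               p[1] - centers[c][1]))
                  for p in points]
        # Update step: move each center to the mean of its members
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: the assignments can no longer change
        centers = new_centers
    return centers, labels

# Two well-separated groups of hypothetical candidate locations (meters)
points = [(0, 0), (0, 10), (10, 0), (10, 10),
          (1000, 1000), (1000, 1010), (1010, 1000), (1010, 1010)]

centers, labels = kmeans(points, 2)
print('Cluster centers:', sorted(centers))
print('Labels:', labels)
```

Each resulting center is the mean position of one group, which is the kind of point that could then be reverse geocoded into an approximate address.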

## Analysis <a name="analysis"></a>

Let's perform some basic exploratory data analysis and derive some additional info from our raw data. First let's count the **number of restaurants in every area candidate**:

In [20]:
location_restaurants_count = [len(res) for res in location_restaurants]

df_locations['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=300m:', np.array(location_restaurants_count).mean())

df_locations.head(10)

Average number of restaurants in every area with radius=300m: 2.5961538461538463


| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area |
|---|---|---|---|---|---|---|---|
| 0 | 1610 Clifford Ave, Austin, TX 78702 | 30.279408 | -97.708109 | -6448000.0 | 13713910.0 | 5992.495307 | 1 |
| 1 | 1904 Alexander Ave, Austin, TX 78702 | 30.281937 | -97.710507 | -6447400.0 | 13713910.0 | 5840.3767 | 1 |
| 2 | 2712 E 22nd St, Austin, TX 78722 | 30.284467 | -97.712905 | -6446800.0 | 13713910.0 | 5747.173218 | 0 |
| 3 | 3211 Hemlock Ave, Austin, TX 78722 | 30.286997 | -97.715304 | -6446200.0 | 13713910.0 | 5715.767665 | 1 |
| 4 | 3306 French Pl, Austin, TX 78722 | 30.289527 | -97.717703 | -6445600.0 | 13713910.0 | 5747.173218 | 0 |
| 5 | 3414 Robinson Ave, Austin, TX 78722 | 30.292057 | -97.720102 | -6445000.0 | 13713910.0 | 5840.3767 | 2 |
| 6 | 916 E 37th St, Austin, TX 78705 | 30.294587 | -97.722502 | -6444400.0 | 13713910.0 | 5992.495307 | 2 |
| 7 | 1184 1/2 Sol Wilson Ave, Austin, TX 78702 | 30.273811 | -97.707036 | -6448900.0 | 13714430.0 | 5855.766389 | 0 |
| 8 | 2816 E 12th St, Austin, TX 78702 | 30.276341 | -97.709434 | -6448300.0 | 13714430.0 | 5604.462508 | 0 |
| 9 | 1609 Ulit Ave, Austin, TX 78702 | 30.27887 | -97.711832 | -6447700.0 | 13714430.0 | 5408.326913 | 2 |


OK, now let's calculate the **distance to nearest Asian restaurant from every area candidate center** (not only those within 300m - we want distance to closest one, regardless of how distant it is).
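The nearest-restaurant distance is simply the minimum of the distances to every known Asian restaurant. A compact way to sketch it, assuming Cartesian coordinates in meters, is shown below; `nearest_distance` is a hypothetical helper, and it returns `math.inf` when there are no targets, so the result never depends on a fixed starting value.

```python
import math

def nearest_distance(x, y, targets):
    """Distance from (x, y) to the closest (tx, ty) in targets.

    Returns math.inf when targets is empty.
    """
    return min((math.hypot(tx - x, ty - y) for tx, ty in targets),
               default=math.inf)

# Hypothetical restaurant coordinates in meters
asian_xy = [(3, 4), (6, 8), (-30, 40)]

print(nearest_distance(0, 0, asian_xy))  # closest one is (3, 4)
print(nearest_distance(0, 0, []))        # no restaurants at all
```

Applied to every area center, this yields one value per candidate that can be stored as a new dataframe column.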

In [21]:
distances_to_asian_restaurant = []

for area_x, area_y in zip(xs, ys):
    min_distance = 10000
    for res in asian_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_asian_restaurant.append(min_distance)

df_locations['Distance to Asian restaurant'] = distances_to_asian_restaurant

In [22]:
df_locations.head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area | Distance to Asian restaurant |
|---|---|---|---|---|---|---|---|---|
| 0 | 1610 Clifford Ave, Austin, TX 78702 | 30.279408 | -97.708109 | -6448000.0 | 13713910.0 | 5992.495307 | 1 | 1193.173812 |
| 1 | 1904 Alexander Ave, Austin, TX 78702 | 30.281937 | -97.710507 | -6447400.0 | 13713910.0 | 5840.3767 | 1 | 1065.665596 |
| 2 | 2712 E 22nd St, Austin, TX 78722 | 30.284467 | -97.712905 | -6446800.0 | 13713910.0 | 5747.173218 | 0 | 1252.047355 |
| 3 | 3211 Hemlock Ave, Austin, TX 78722 | 30.286997 | -97.715304 | -6446200.0 | 13713910.0 | 5715.767665 | 1 | 1507.631672 |
| 4 | 3306 French Pl, Austin, TX 78722 | 30.289527 | -97.717703 | -6445600.0 | 13713910.0 | 5747.173218 | 0 | 947.471612 |
| 5 | 3414 Robinson Ave, Austin, TX 78722 | 30.292057 | -97.720102 | -6445000.0 | 13713910.0 | 5840.3767 | 2 | 492.393796 |
| 6 | 916 E 37th St, Austin, TX 78705 | 30.294587 | -97.722502 | -6444400.0 | 13713910.0 | 5992.495307 | 2 | 554.257022 |
| 7 | 1184 1/2 Sol Wilson Ave, Austin, TX 78702 | 30.273811 | -97.707036 | -6448900.0 | 13714430.0 | 5855.766389 | 0 | 1539.473346 |
| 8 | 2816 E 12th St, Austin, TX 78702 | 30.276341 | -97.709434 | -6448300.0 | 13714430.0 | 5604.462508 | 0 | 1000.978322 |
| 9 | 1609 Ulit Ave, Austin, TX 78702 | 30.27887 | -97.711832 | -6447700.0 | 13714430.0 | 5408.326913 | 2 | 594.92606 |


In [23]:
print('Average distance to closest Asian restaurant from each area center:', df_locations['Distance to Asian restaurant'].mean())

Average distance to closest Asian restaurant from each area center: 921.3158331153576


OK, so **on average an Asian restaurant can be found within ~1 km** of each neighborhood center. That's not bad; there's definitely room to work in there!

Let's create a map showing a **heatmap of restaurant density** and try to extract some meaningful info from it. Also, let's show a few circles indicating distances of 1km, 2km and 3km from the city center to help put these distances in context.

In [24]:
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]

asian_latlons = [[res[2], res[3]] for res in asian_restaurants.values()]

In [25]:
from folium import plugins
from folium.plugins import HeatMap

map_austin = folium.Map(location=austin_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_austin) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_austin)
folium.Marker(austin_center).add_to(map_austin)
folium.Circle(austin_center, radius=1000, fill=False, color='white').add_to(map_austin)
folium.Circle(austin_center, radius=2000, fill=False, color='white').add_to(map_austin)
folium.Circle(austin_center, radius=3000, fill=False, color='white').add_to(map_austin)
map_austin

Looks like a few pockets of low restaurant density closest to city center can be found **south, south-east and north-east from the city center**. 

Let's create another heatmap showing **Asian restaurant density** only.

In [26]:
map_austin = folium.Map(location=austin_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_austin) #cartodbpositron cartodbdark_matter
HeatMap(asian_latlons).add_to(map_austin)
folium.Marker(austin_center).add_to(map_austin)
folium.Circle(austin_center, radius=1000, fill=False, color='white').add_to(map_austin)
folium.Circle(austin_center, radius=2000, fill=False, color='white').add_to(map_austin)
folium.Circle(austin_center, radius=3000, fill=False, color='white').add_to(map_austin)
map_austin

This map is 'cooler' than our previous map (due to Asian restaurants representing only a subset of all restaurants in Austin) but it also indicates higher density of existing Asian restaurants right around the city center as well as north and southwest. However, **there are openings with low Asian restaurant density all around, particularly south, south-east, north-east, and north-west of the city center**.

Based on this we will now focus our analysis on areas *south, south-east, north-east, and north-west of Austin's center* - we will move the center of our area of interest and reduce its size to have a radius of **2.5km**.

### University of Texas at Austin

A general overview of the city using maps and travel guides reveals that the north side of Austin, particularly north-east, is dominated by UT's campus. While this could provide opportunities to attract student customers and it would be possible for stakeholders to work with the university to establish a location on campus, it does also present additional challenges that likely wouldn't be present at another location. The cost of opening a restaurant on or near campus would on average be much higher, and available space would be extremely limited.

These factors would explain the relative lack of Asian restaurants (and restaurants in general) on and around the campus. For this reason, **we will shift our focus to the other open areas around the city's center - south and south-east**, in hopes that locations there will present fewer hurdles for stakeholders to overcome. At the very least, initial purchase costs should be lower.

### East Cesar Chavez and Travis Heights

East Cesar Chavez represents a district in the south-east portion of the city while Travis Heights is more southerly. These two districts seem like optimal candidates to focus our search around, but first we should do some outside research on the web to get a better idea of each area's culture and economy. This will let us know if the districts merit further evaluation.

*"This neighborhood is magical. Every day feels like a vacation. Thanks to the dense amount of trees in the neighborhood and lining the lake, birds singing act as the morning alarm. Close proximity to Lady Bird Lake provides peaceful walks or exercise at one of our city’s greatest amenities. There is no shortage of cafes or restaurants to meet friends at, and an easy meander home with locusts singing and crickets chirping are everything you need to remember you live in the best neighborhood in Austin."* - austin.curbed.com

*"Located just south and east of downtown Austin, the neighborhood offers a variety of bars, restaurants, time-honored taquerias, mom-and-pop shops, and family homes that make East Cesar Chavez a popular destination for locals and tourists alike."* - do512.com

*"Travis Heights is a very popular, historic neighborhood known for it's rich architecture, convenient location, stately homes, dog park, trails, community pool and eclectic residents. Known for: Tourist Attractions."* - airbnb.com

*"Travis Heights is located in a booming part of south central Austin. This isn’t boring suburbia. Travis Heights is known to be a liberal, eclectic area with a diverse population. People in this neighborhood tend to be the type who support the “Keep Austin Weird” attitude, and it’s not uncommon to see liberal political signs in yards or windows. There are plenty of families, but there are also young professionals, artists, and musicians."* - tripsavvy.com

Both locations seem to be popular with locals and tourists, to have economies that are stable if not booming, to have locations decently close to the city center, and to contain a great variety of potential customers. At the least, they appear to justify further analysis.

Let's define a new, narrower region of interest, which will include low-restaurant-count parts of East Cesar Chavez and Travis Heights.

In [27]:
roi_x_min = austin_center_x - 5000
roi_y_max = austin_center_y + 3000
roi_width = 5000
roi_height = 5000
roi_center_x = roi_x_min + 2500
roi_center_y = roi_y_max - 2500
roi_center_lon, roi_center_lat = xy_to_lonlat(roi_center_x, roi_center_y)
roi_center = [roi_center_lat, roi_center_lon]

map_austin = folium.Map(location=roi_center, zoom_start=14)
HeatMap(restaurant_latlons).add_to(map_austin)
folium.Marker(austin_center).add_to(map_austin)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_austin)
map_austin

Not bad - this nicely covers all the pockets of low restaurant density in East Cesar Chavez and Travis Heights closest to Austin's center.

Let's also create a new, denser grid of location candidates restricted to our new region of interest (let's make our location candidates 100m apart).

In [28]:
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 100
y_step = 100 * k 
roi_y_min = roi_center_y - 2500

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min + i * y_step
    x_offset = 50 if i%2==0 else 0
    for j in range(0, 51):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)
        if (d <= 2501):
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')

2261 candidate neighborhood centers generated.
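
The grid cell above relies on two facts about a hexagonal lattice: rows sit `spacing * sqrt(3) / 2` apart, and every other row is shifted by half the spacing, so each point ends up the same distance from its six neighbors. The following is a minimal, self-contained sketch of that idea, not the notebook's exact code; the function name `hex_grid` and its parameters are assumptions introduced for illustration.

```python
import math

def hex_grid(center_x, center_y, radius, spacing):
    """Return (x, y) points of a hexagonal lattice that fall inside a circle.

    Hypothetical helper for illustration; coordinates are assumed to be
    Cartesian and expressed in meters.
    """
    row_height = spacing * math.sqrt(3) / 2   # vertical distance between rows
    n_rows = int(2 * radius / row_height) + 1
    n_cols = int(2 * radius / spacing) + 1
    x_min = center_x - radius
    y_min = center_y - radius

    points = []
    for i in range(n_rows):
        y = y_min + i * row_height
        # Shift every other row by half a step to get the hexagonal pattern
        x_offset = spacing / 2 if i % 2 == 0 else 0.0
        for j in range(n_cols):
            x = x_min + j * spacing + x_offset
            # Keep only points inside the circular region of interest
            if math.hypot(x - center_x, y - center_y) <= radius:
                points.append((x, y))
    return points

# Small demo region: 500 m radius, points 100 m apart
points = hex_grid(0.0, 0.0, radius=500.0, spacing=100.0)
print(len(points), 'grid points generated.')
```

With `spacing=100.0`, the closest pair of points in the result is 100 m apart both within a row and across adjacent rows, which is the property that makes the candidate coverage uniform.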


OK. Now let's calculate the two most important things for each location candidate: **number of restaurants in vicinity** (we'll use radius of **250 meters**) and **distance to closest Asian restaurant**.

In [29]:
def count_restaurants_nearby(x, y, restaurants, radius=250):    
    count = 0
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=radius:
            count += 1
    return count

def find_nearest_restaurant(x, y, restaurants):
    # Start from infinity so that any real distance replaces it
    d_min = float('inf')
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d < d_min:
            d_min = d
    return d_min

roi_restaurant_counts = []
roi_asian_distances = []

print('Generating data on location candidates... ', end='')
for x, y in zip(roi_xs, roi_ys):
    count = count_restaurants_nearby(x, y, restaurants, radius=250)
    roi_restaurant_counts.append(count)
    distance = find_nearest_restaurant(x, y, asian_restaurants)
    roi_asian_distances.append(distance)
print('done.')


Generating data on location candidates... done.
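
The two helper functions above loop over every restaurant for every candidate in pure Python. As an optional alternative, the same two quantities can be computed in one pass with NumPy broadcasting. This is a sketch under assumptions: `nearby_counts_and_nearest` is a hypothetical name, the inputs are plain `(x, y)` pairs in meters (for example built with `[(res[7], res[8]) for res in restaurants.values()]`), and at least one restaurant is present.

```python
import numpy as np

def nearby_counts_and_nearest(candidates_xy, restaurants_xy, radius=250.0):
    """Vectorized count-within-radius and nearest-distance computation.

    Hypothetical helper for illustration; assumes both inputs are
    non-empty sequences of (x, y) pairs in the same Cartesian units.
    """
    candidates_xy = np.asarray(candidates_xy, dtype=float)    # shape (n, 2)
    restaurants_xy = np.asarray(restaurants_xy, dtype=float)  # shape (m, 2)

    # Pairwise coordinate differences via broadcasting: shape (n, m, 2)
    diff = candidates_xy[:, None, :] - restaurants_xy[None, :, :]
    dist = np.hypot(diff[..., 0], diff[..., 1])                # shape (n, m)

    counts = (dist <= radius).sum(axis=1)  # restaurants within radius
    nearest = dist.min(axis=1)             # distance to the closest one
    return counts, nearest

# Tiny made-up example: two candidates, three venues
candidates = [(0.0, 0.0), (1000.0, 0.0)]
venues = [(100.0, 0.0), (0.0, 200.0), (1000.0, 300.0)]
counts, nearest = nearby_counts_and_nearest(candidates, venues)
print(counts, nearest)
```

The intermediate distance matrix has one entry per candidate-restaurant pair, so this approach trades memory for speed; for a few thousand candidates and a few hundred restaurants it stays small.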


In [30]:
# Let's put this into dataframe
df_roi_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Restaurants nearby':roi_restaurant_counts,
                                 'Distance to Asian restaurant':roi_asian_distances})

df_roi_locations.head(10)

Unnamed: 0,Latitude,Longitude,X,Y,Restaurants nearby,Distance to Asian restaurant
0,30.263354,-97.723153,-6448750.0,13717630.0,1,254.706362
1,30.263775,-97.723553,-6448650.0,13717630.0,1,350.935206
2,30.260736,-97.721374,-6449300.0,13717710.0,0,290.310986
3,30.261157,-97.721774,-6449200.0,13717710.0,5,190.330057
4,30.261579,-97.722174,-6449100.0,13717710.0,6,90.391328
5,30.262,-97.722574,-6449000.0,13717710.0,6,10.130947
6,30.262421,-97.722973,-6448900.0,13717710.0,5,91.557764
7,30.262843,-97.723373,-6448800.0,13717710.0,5,191.475879
8,30.263264,-97.723773,-6448700.0,13717710.0,2,291.450168
9,30.263685,-97.724173,-6448600.0,13717710.0,1,327.802624


OK. Let us now **filter** those locations: we're interested only in **locations with no more than two restaurants in radius of 250 meters**, and **no Asian restaurants in radius of 400 meters**.

In [31]:
good_res_count = np.array((df_roi_locations['Restaurants nearby']<=2))
print('Locations with no more than two restaurants nearby:', good_res_count.sum())

good_asian_distance = np.array(df_roi_locations['Distance to Asian restaurant']>=400)
print('Locations with no Asian restaurants within 400m:', good_asian_distance.sum())

good_locations = np.logical_and(good_res_count, good_asian_distance)
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = df_roi_locations[good_locations]


Locations with no more than two restaurants nearby: 1910
Locations with no Asian restaurants within 400m: 1567
Locations with both conditions met: 1542


Let's see how this looks on a map.

In [32]:
good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_austin = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_austin)
HeatMap(restaurant_latlons).add_to(map_austin)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.6).add_to(map_austin)
folium.Marker(austin_center).add_to(map_austin)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_austin) 
map_austin

Now we have a large number of locations fairly close to Austin's city center. Each of those locations has no more than two restaurants within 250 meters, and no Asian restaurant closer than 400 meters. Every one of these locations could be perfect for stakeholders, based solely on nearby competition.

However, we need to go further. For now, let's visualize our new data with another heatmap.

In [33]:
map_austin = folium.Map(location=roi_center, zoom_start=14)
HeatMap(good_locations, radius=25).add_to(map_austin)
folium.Marker(austin_center).add_to(map_austin)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_austin)
map_austin

What we have now is a clear indication of zones with a low number of restaurants in the vicinity, and *no* Asian restaurants at all nearby.

It would be very difficult for stakeholders to go to every single address in these areas, so now we will **cluster** new zones based on the possible locations we've generated. The center of these clusters or zones will represent areas of very high interest to stakeholders, so the addresses of those centers will be our final deliverable.

Based on the data and location we have, we're going to use 10 clusters. This should provide us with 10 addresses that the stakeholders can assess.

In [34]:
from sklearn.cluster import KMeans

number_of_clusters = 10

good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

map_austin = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_austin)
HeatMap(restaurant_latlons).add_to(map_austin)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_austin)
folium.Marker(austin_center).add_to(map_austin)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.25).add_to(map_austin) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_austin)
map_austin

Our clusters represent groupings of most of the candidate locations and cluster centers are placed nicely in the middle of the zones 'rich' with location candidates.
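
For readers who want to see what the clustering step does conceptually, below is a simplified sketch of Lloyd's algorithm, the iteration at the heart of k-means. It is not scikit-learn's implementation (which, among other things, uses k-means++ initialization by default); `lloyd_kmeans` and its parameters are hypothetical names used only for illustration.

```python
import math
import random

def lloyd_kmeans(points, k, iterations=50, seed=0):
    """Simplified k-means on 2D points (illustrative sketch only)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen input points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        new_centers = []
        for c, group in enumerate(groups):
            if group:
                mean_x = sum(x for x, _ in group) / len(group)
                mean_y = sum(y for _, y in group) / len(group)
                new_centers.append((mean_x, mean_y))
            else:
                new_centers.append(centers[c])  # keep the center of an empty group
        if new_centers == centers:
            break  # centers stopped moving, so the assignment is stable
        centers = new_centers
    return centers

# Two well-separated made-up groups of candidate points (X, Y in meters)
demo_points = [(0, 0), (0, 10), (10, 0), (10, 10),
               (1000, 1000), (1000, 1010), (1010, 1000), (1010, 1010)]
centers = lloyd_kmeans(demo_points, k=2)
print(sorted(centers))
```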

Addresses of those cluster centers will be a good starting point for exploring the neighborhoods to find the best possible location based on neighborhood specifics.

Let's **reverse geocode those candidate area centers to get the addresses** which can be presented to stakeholders.

In [35]:
candidate_area_addresses = []
print('==============================================================')
print('Addresses of centers of areas recommended for further analysis')
print('==============================================================\n')
for lon, lat in cluster_centers:
    addr = get_address(google_api_key, lat, lon).replace(', USA', '')
    candidate_area_addresses.append(addr)    
    x, y = lonlat_to_xy(lon, lat)
    d = calc_xy_distance(x, y, austin_center_x, austin_center_y)
    print('{}{} => {:.1f}km from City Center'.format(addr, ' '*(50-len(addr)), d/1000))
    

Addresses of centers of areas recommended for further analysis

2010 Canterbury St, Austin, TX 78702               => 3.9km from City Center
1208 Newning Ave, Austin, TX 78704                 => 3.0km from City Center
43 Rainey St, Austin, TX 78701                     => 2.1km from City Center
1301 N Interstate 35 Frontage Rd, Austin, TX 78741 => 4.4km from City Center
Ann and Roy Butler Hike and Bike Trail, Austin, TX 78741 => 3.3km from City Center
1405 Holly St, Austin, TX 78702                    => 2.8km from City Center
300 S Congress Ave, Austin, TX 78704               => 1.6km from City Center
1100 E 3rd St, Austin, TX 78702                    => 1.9km from City Center
1505 Alta Vista Ave, Austin, TX 78704              => 4.0km from City Center
Edward Rendon Sr. Park at Festival Beach in Town Lake Metropolitan Park, 2101 Jesse E. Segovia St, Austin, TX 78702 => 4.3km from City Center


We have found 10 addresses representing the centers of zones with a low number of restaurants and no Asian restaurants nearby, all of which are fairly close to city center (less than 5 km, many being much less than that). With this deliverable now ready, our analysis is finished.
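
Since we assumed at the outset that being closer to the city center is better, one optional follow-up is to rank the ten centers by distance. The sketch below simply sorts the addresses and distances printed above (addresses shortened for readability); the variable names are illustrative.

```python
# (address, distance from city center in km), copied from the output above
candidates = [
    ("2010 Canterbury St", 3.9),
    ("1208 Newning Ave", 3.0),
    ("43 Rainey St", 2.1),
    ("1301 N Interstate 35 Frontage Rd", 4.4),
    ("Ann and Roy Butler Hike and Bike Trail", 3.3),
    ("1405 Holly St", 2.8),
    ("300 S Congress Ave", 1.6),
    ("1100 E 3rd St", 1.9),
    ("1505 Alta Vista Ave", 4.0),
    ("Edward Rendon Sr. Park at Festival Beach", 4.3),
]

# Closest to the city center first
ranked = sorted(candidates, key=lambda item: item[1])
for address, km in ranked:
    print(f"{km:.1f} km  {address}")
```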

#### Special Notes Regarding Cluster Addresses

Although zones are shown on map with a radius of ~500 meters (green circles), their shape is actually very irregular and their centers/addresses should be considered only as a starting point for exploring area neighborhoods in search for potential restaurant locations. Keep in mind our earlier analysis pointing out that these locations are centered around East Cesar Chavez and Travis Heights, which we noted due to their booming nature and close proximity to the city center. This should help give the search more narrow bounds.

Some of the locations, such as parks or stadiums, might not be directly suitable for an Asian restaurant. Again, we must stress that these are just starting points that serve as the center of key areas of interest. While the park itself is unlikely to let stakeholders open a restaurant within its grounds, the surrounding area likely contains many good locations, since it met our criteria.

In [36]:
map_austin = folium.Map(location=roi_center, zoom_start=14)
folium.Circle(austin_center, radius=50, color='red', fill=True, fill_color='red', fill_opacity=1).add_to(map_austin)
for lonlat, addr in zip(cluster_centers, candidate_area_addresses):
    folium.Marker([lonlat[1], lonlat[0]], popup=addr).add_to(map_austin) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.05).add_to(map_austin)
map_austin

## Results and Discussion <a name="results"></a>

Austin's city center is crowded with restaurants. However, due to how tightly packed the 400+ restaurants are in a 6km radius around the city center, there are numerous areas of opportunity that are still reasonably close to the city center. The highest concentration was, unsurprisingly, in the city center itself with other high density areas extending north, west and a bit directly east. This led us to focus our attention north-east, south, and south-east.

Due to the presence of the University of Texas on the north side of Austin, we decided it would be best to focus our efforts south and south-east. If stakeholders are particularly interested in working with a college campus, then further analysis should be directed north as we did identify potential locations in that direction.

In focusing on the south and south-east parts of the city, we found that many of our low-restaurant density areas fell in the communities of East Cesar Chavez and Travis Heights. Further analysis of those areas showed that they contained strong economies and diverse consumers, lending themselves to further analysis.

At this point, we essentially repeated our earlier process but with a smaller area of interest centered around East Cesar Chavez and Travis Heights. With a more detailed grid of location candidates (100m apart), we kept only locations that met both of our established criteria: no more than two restaurants within 250 meters and no Asian restaurant within 400 meters. Locations failing either condition were eliminated. This left us with 1542 locations that were promising for stakeholders, based solely on nearby competition.

Of course, searching 1542 locations would be a monumental task and doesn't serve us well as a final deliverable. As such, we then clustered these locations into 10 areas of interest, each containing a large number of potential quality locations. The addresses of the centers of these clusters were retrieved using reverse geocoding, giving us a much more manageable number of addresses: 10.

Again, it should be stressed that these 10 addresses are simply starting points for further analysis. They represent the center of areas that should be of high interest to stakeholders.

It should also be noted that this project solely looked at location data. While the provided addresses meet our criteria and do merit further investigation, it is very possible that these areas are relatively restaurant-free for a reason. Perhaps they are major residential or industrial zones or maybe the communities there do not like Asian food. Sadly, location data is only one part of the equation when determining the best possible spot to open a new restaurant. Other information should be collected such as community interests and average income in order to arm stakeholders with all the knowledge they'd need to open a successful restaurant.

That said, this study has hopefully provided a good starting point that stakeholders can use to narrow down their search.

## Conclusion <a name="conclusion"></a>

This project was conducted with the end goal of finding the best location to open an Asian restaurant in Austin, Texas. 

We determined that solving this problem would require identifying areas in Austin with a low number of restaurants, especially Asian restaurants, that are relatively close to the city center. Using the Google Maps API and the Foursquare API, we split the city into neighborhoods and determined restaurant density in each of those areas. We eventually narrowed this data down to areas in two promising districts in Austin, East Cesar Chavez and Travis Heights. Because there were so many potential locations of interest, we clustered the data to provide 10 major zones of interest and the addresses of their center points.

Based on this study alone, it can be concluded that there is ample opportunity to open an Asian restaurant in Austin at a prime location with minimal competition.

However, the final decision for the best restaurant location should be made by stakeholders using the information we provided alongside other criteria that they desire. Location is only a single factor, so additional information should be gathered on real estate availability, prices, average consumer income, community interest, and general location appeal.