# **What is the best Tucson neighborhood for Embassy Tire to expand to next?**
#### Authored by Jon Ingram

## Introduction
Embassy Tire & Wheel Company is a growing, family-owned business in the Tucson Metro Area. Starting from a single location in 2004, it is now expanding to a third location set to open in the summer of 2019. Together, the three locations provide decent coverage of the region. As a growing business, however, the company is unlikely to settle on just three locations. After the third location opens and business picks up, it is sure to look to expand once again. But where to?

This report seeks to answer the question: "Given the population and tire shop distribution of the Tucson Metro Area, which neighborhood(s) would be the best to expand into?"

## Data 
The types of data that are required to execute this idea are mostly population data and location data. Specifically, it requires data like the neighborhoods of Tucson, their populations, population densities, and the locations of all neighborhoods and tire shops in the Tucson metro area.

The data for the populations and population densities of the 50 most populated neighborhoods in Tucson are publicly available at the following Statistical Atlas website: https://statisticalatlas.com/place/Arizona/Tucson/Population. The names of these neighborhoods, their populations, their population ranks, their population densities, and their density ranks will be scraped directly from the website using the Beautiful Soup library.

The population data as well as the Statistical Atlas links for each neighborhood will be collected into a single DataFrame for ease of use. This DataFrame will also contain an Area (in square miles) column calculated by dividing Population (number of people) by Population Density (number of people per square mile) for each neighborhood.
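As a quick check of the Area calculation described above, here is a minimal sketch using the Casas Adobes figures that appear later in this report. The helper name `area_sq_miles` is hypothetical and is not part of the notebook.

```python
def area_sq_miles(population, density):
    """Area in square miles = people / (people per square mile)."""
    if density <= 0:
        raise ValueError("density must be positive")
    return population / density

# Casas Adobes: population and density as listed in the report's tables
casas_adobes_area = area_sq_miles(52552.0, 2437.98)
print(round(casas_adobes_area, 2))
```

The result agrees with the Area value shown for Casas Adobes in the merged DataFrame further down.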

The geographical locations of these neighborhoods will be retrieved using OpenStreetMap data obtained from the geocoder library. Locations which are not retrieved using this method will be found manually using the Google search engine.

The acquisition of tire shop location data is a multi-step process. First, the major zip codes in Tucson will be taken manually from the following Statistical Atlas website: https://statisticalatlas.com/place/Arizona/Tucson/Overview. Their geographical coordinates do not appear in OpenStreetMap, so they will be acquired manually using the Google search engine. Finally, these locations will be used as the basis for a series of Foursquare API search queries (with keyword 'tire') to collect a comprehensive list of tire shops in the Tucson Metro Area. The IDs, names, and coordinates will be taken from the query results and compiled into a pandas DataFrame.

## Data Collection
Step 1: 
- Import the required libraries (pandas, numpy, requests, and beautifulsoup4)

In [9]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

Step 2: 
- Run a simple html script to remove 'Run This Cell' buttons from the display of the Notebook
- Not necessary to do but these buttons tend to be visually distracting when editing

In [222]:
%%html
<style>
.code_cell .run_this_cell {
    display: none;
}
</style>

Step 3:
- Collect the initial data from the Statistical Atlas website
    - Retrieve the html data of the website
    - Create a Beautiful Soup object using the html text
    - Define a function to rip the data from the formatted tables on the site
    - Use the function to create a pandas DataFrame for the population counts
    - Repeat the process for the population densities
    - Merge the resulting DataFrames

In [10]:
tucson_raw = requests.get("https://statisticalatlas.com/place/Arizona/Tucson/Population").text

soup = BeautifulSoup(tucson_raw, 'lxml')

In [11]:
def getPopulationDataFrame(div_id):
    total_pop_raw = soup.find('div', id=div_id).find('div', class_='figure-contents')
    
    neighborhood_links = []
    neighborhood_names = []
    neighborhood_pops = []
    pop_ranks = []
    
    for raw_link in total_pop_raw.find_all('a'):
        neighborhood_links.append('{}{}'.format('https://statisticalatlas.com', raw_link['xlink:href']))
        neighborhood_names.append(raw_link.title.text.split(" Neighborhood")[0])
    
    for raw_g in total_pop_raw.find_all('g')[3:]:
        neighborhood_pops.append(float(raw_g.title.text[:-4].replace(',','')))
    
    for raw_text in total_pop_raw.find_all('text', {'fill-opacity':'0.500'}):
        pop_ranks.append(raw_text.text)
    
    df_pop = pd.DataFrame()
    df_pop['Link'] = neighborhood_links
    df_pop['Neighborhood'] = neighborhood_names
    df_pop['Population'] = neighborhood_pops
    df_pop['Population Rank'] = pop_ranks
    
    return df_pop

In [12]:
df_tuc_pop = getPopulationDataFrame(div_id='figure/neighborhood/total-population')
df_tuc_pop.head()

Unnamed: 0,Link,Neighborhood,Population,Population Rank
0,https://statisticalatlas.com/neighborhood/Ariz...,Casas Adobes,52552.0,1
1,https://statisticalatlas.com/neighborhood/Ariz...,Drexel Heights,28627.0,2
2,https://statisticalatlas.com/neighborhood/Ariz...,Tanque Verde,17718.0,3
3,https://statisticalatlas.com/neighborhood/Ariz...,Rita Ranch,16282.0,4
4,https://statisticalatlas.com/neighborhood/Ariz...,Flowing Wells,15667.0,5


In [13]:
df_tuc_dens = getPopulationDataFrame(div_id='figure/neighborhood/population-density')
df_tuc_dens.drop('Neighborhood', axis=1, inplace=True)
df_tuc_dens.columns = ['Link', 'Density', 'Density Rank']
df_tuc_dens.head()

Unnamed: 0,Link,Density,Density Rank
0,https://statisticalatlas.com/neighborhood/Ariz...,11246.94,1
1,https://statisticalatlas.com/neighborhood/Ariz...,9082.08,2
2,https://statisticalatlas.com/neighborhood/Ariz...,8598.79,3
3,https://statisticalatlas.com/neighborhood/Ariz...,7996.36,4
4,https://statisticalatlas.com/neighborhood/Ariz...,7575.59,5


In [14]:
#Create the merged DataFrame of all population data
tucson_pop_data = pd.merge(df_tuc_pop, df_tuc_dens, how='left', on='Link')

#There are two distinct neighborhoods both named 'Flowing Wells'
#Rename the westernmost 'Flowing Wells' to 'West Flowing Wells'
#(.loc with a boolean mask assigns in place and avoids pandas' SettingWithCopyWarning)
west_fw_link = 'https://statisticalatlas.com/neighborhood/Arizona/Tucson/Flowing-Wells/Population'
tucson_pop_data.loc[tucson_pop_data['Link'] == west_fw_link, 'Neighborhood'] = 'West Flowing Wells'

#Add 'Area' column
tucson_pop_data['Area'] = tucson_pop_data['Population']/tucson_pop_data['Density']

#Move the 'Link' column to the right-most side of the DataFrame
columns = tucson_pop_data.columns.values
columns = np.append(columns[1:], [columns[0]])
tucson_pop_data = tucson_pop_data[columns]

#Make 'Neighborhood' column the index
tucson_pop_data.set_index('Neighborhood', inplace=True)

tucson_pop_data.head()


Neighborhood,Population,Population Rank,Density,Density Rank,Area,Link
Casas Adobes,52552.0,1,2437.98,41,21.55555,https://statisticalatlas.com/neighborhood/Ariz...
Drexel Heights,28627.0,2,1452.47,46,19.709185,https://statisticalatlas.com/neighborhood/Ariz...
Tanque Verde,17718.0,3,541.25,50,32.735335,https://statisticalatlas.com/neighborhood/Ariz...
Rita Ranch,16282.0,4,1044.09,47,15.594441,https://statisticalatlas.com/neighborhood/Ariz...
Flowing Wells,15667.0,5,4493.78,28,3.486374,https://statisticalatlas.com/neighborhood/Ariz...


Step 4: 
- Collect neighborhood location data using geocoder
    - Import `geocoder` library
    - Use geocoder to collect lists of latitudes and longitudes
    - Add found latitudes and longitudes to main DataFrame
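The lookup described in this step can be sketched as a bounded retry that reports which neighborhoods still need manual coordinates. This is only an illustration: `geocode_with_retry` and `fake_lookup` are hypothetical names, and the stub stands in for a real geocoding call so that the example runs without network access. Returning `None` for a failed lookup, rather than a `0.0` placeholder, is a design choice of this sketch and not what the notebook does.

```python
from typing import Callable, Optional, Tuple

Coords = Tuple[float, float]

def geocode_with_retry(query: str,
                       lookup: Callable[[str], Optional[Coords]],
                       limit: int = 20) -> Optional[Coords]:
    """Call `lookup` up to `limit` times and return the first non-None result."""
    for _ in range(limit):
        coords = lookup(query)
        if coords is not None:
            return coords
    return None

# Hypothetical stand-in for a geocoding service; it only knows one neighborhood.
_FAKE_INDEX = {"Sunnyside, Tucson, AZ": (32.2218917, -110.9262353)}

def fake_lookup(query: str) -> Optional[Coords]:
    return _FAKE_INDEX.get(query)

results = {
    name: geocode_with_retry("{}, Tucson, AZ".format(name), fake_lookup, limit=3)
    for name in ["Sunnyside", "Amphi"]
}

# Neighborhoods that still need coordinates collected by hand
missing = [name for name, coords in results.items() if coords is None]
print(missing)
```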

In [15]:
import geocoder

In [8]:
lats = []
longs = []
limit = 20

for neigh in tucson_pop_data.index.values:
    i = 0
    coords = None
    while(coords is None and i < limit):
        g = geocoder.osm('{}, Tucson, AZ'.format(neigh))
        coords = g.latlng
        i += 1
    if coords is None:
        lats.append(0.0)
        longs.append(0.0)
        print('None')
    else:
        lats.append(coords[0])
        longs.append(coords[1])
        print(coords)

[32.2489081, -111.1166957]
[32.1412491, -111.0284428]
[32.264927, -110.736165323875]
[32.10366625, -110.769504742694]
[32.28612725, -111.026304046398]
[32.2218917, -110.9262353]
[32.1337024, -110.9989566]
[32.28612725, -111.026304046398]
[32.2305549, -110.9481365]
[32.1267081, -110.9775837]
None
[32.20009585, -110.831061637474]
[32.2804351, -110.952442712677]
[32.2318048, -110.9717815]
None
[32.21308835, -110.919030425972]
[32.2432093, -110.9180065]
[32.18634545, -110.817170419797]
None
None
None
[32.2022876, -110.884957218835]
[32.0670238, -110.9514796]
[32.2176772, -110.986486558684]
[32.228764, -110.935261723233]
[32.2572379, -110.966488967937]
[32.1847991, -110.9712029]
None
[32.23176725, -110.96272684997]
None
None
[32.236636, -110.8302939]
None
None
None
[32.26123755, -110.952414053388]
[32.28336425, -110.971896724784]
[32.1674295, -110.9958145]
[32.2325577, -110.9160918]
[32.0967117, -110.7727045]
[32.2103597, -110.9922659]
[32.1696124, -110.951467492503]
None
[32.2555313, -110.

In [16]:
tucson_pop_data['Latitude'] = lats
tucson_pop_data['Longitude'] = longs
tucson_pop_data

Neighborhood,Population,Population Rank,Density,Density Rank,Area,Link,Latitude,Longitude
Casas Adobes,52552.0,1,2437.98,41,21.55555,https://statisticalatlas.com/neighborhood/Ariz...,32.248908,-111.116696
Drexel Heights,28627.0,2,1452.47,46,19.709185,https://statisticalatlas.com/neighborhood/Ariz...,32.141249,-111.028443
Tanque Verde,17718.0,3,541.25,50,32.735335,https://statisticalatlas.com/neighborhood/Ariz...,32.264927,-110.736165
Rita Ranch,16282.0,4,1044.09,47,15.594441,https://statisticalatlas.com/neighborhood/Ariz...,32.103666,-110.769505
Flowing Wells,15667.0,5,4493.78,28,3.486374,https://statisticalatlas.com/neighborhood/Ariz...,32.286127,-111.026304
Sunnyside,15483.0,6,5784.51,20,2.676631,https://statisticalatlas.com/neighborhood/Ariz...,32.221892,-110.926235
Midvale Park,13635.0,7,3617.7,35,3.768969,https://statisticalatlas.com/neighborhood/Ariz...,32.133702,-110.998957
West Flowing Wells,9553.0,8,2080.07,43,4.592634,https://statisticalatlas.com/neighborhood/Ariz...,32.286127,-111.026304
Cherry Avenue,9453.0,9,7575.59,5,1.247824,https://statisticalatlas.com/neighborhood/Ariz...,32.230555,-110.948137
Elvira,8876.0,10,5594.56,21,1.586541,https://statisticalatlas.com/neighborhood/Ariz...,32.126708,-110.977584


Step 5:
- Find rows in DataFrame that are missing coordinates data
- Collect the missing data manually
- Import data into main DataFrame from temporary DataFrame of manually collected data

In [17]:
#take an explicit copy so the assignments below do not act on a slice of the main DataFrame
missing_loc_data = tucson_pop_data.loc[tucson_pop_data['Latitude']==0.0, 'Latitude':'Longitude'].copy()
missing_loc_data

Neighborhood,Latitude,Longitude
Amphi,0.0,0.0
South Harrison,0.0,0.0
Broadway Pantano East,0.0,0.0
Groves Lincoln Park,0.0,0.0
Julia Keen,0.0,0.0
Westside Development,0.0,0.0
Terra Del Sol,0.0,0.0
Stella Mann,0.0,0.0
Drexel-Alvernon,0.0,0.0
Rancho Buena,0.0,0.0


In [18]:
#amphi 32.272222, -110.972222
#south harrison 32.1869, -110.7867
#b p east 32.2208, -110.8233
#groves lincoln park 32.1699, -110.8225
#julia keen 32.19944, -110.918055
#westside dev 32.1903, -111.0192
#terra del sol 32.1984, -110.8464
#stella mann 32.1842438, -110.8453649
#drexel alvernon 32.1447, -110.8985
#rancho buena 32.1561, -110.9301
#harrison east-south 32.1988, -110.7808
#las vistas 32.1840, -110.9298
#dodge flower 32.2533, -110.9134

new_lats = [32.272222,32.1869,32.2208,32.1699,32.19944,32.1903,32.1984,32.1842438,32.1447,32.1561,32.1988,32.1840,32.2533]
new_longs = [-110.972222,-110.7867,-110.8233,-110.8225,-110.918055,-111.0192,-110.8464,-110.8453649,-110.8985,-110.9301,-110.7808,-110.9298,-110.9134]

missing_loc_data['Latitude'] = new_lats
missing_loc_data['Longitude'] = new_longs

for neigh in missing_loc_data.index.values:
    tucson_pop_data.at[neigh,'Latitude'] = missing_loc_data.at[neigh,'Latitude']
    tucson_pop_data.at[neigh,'Longitude'] = missing_loc_data.at[neigh,'Longitude']

tucson_pop_data

Neighborhood,Population,Population Rank,Density,Density Rank,Area,Link,Latitude,Longitude
Casas Adobes,52552.0,1,2437.98,41,21.55555,https://statisticalatlas.com/neighborhood/Ariz...,32.248908,-111.116696
Drexel Heights,28627.0,2,1452.47,46,19.709185,https://statisticalatlas.com/neighborhood/Ariz...,32.141249,-111.028443
Tanque Verde,17718.0,3,541.25,50,32.735335,https://statisticalatlas.com/neighborhood/Ariz...,32.264927,-110.736165
Rita Ranch,16282.0,4,1044.09,47,15.594441,https://statisticalatlas.com/neighborhood/Ariz...,32.103666,-110.769505
Flowing Wells,15667.0,5,4493.78,28,3.486374,https://statisticalatlas.com/neighborhood/Ariz...,32.286127,-111.026304
Sunnyside,15483.0,6,5784.51,20,2.676631,https://statisticalatlas.com/neighborhood/Ariz...,32.221892,-110.926235
Midvale Park,13635.0,7,3617.7,35,3.768969,https://statisticalatlas.com/neighborhood/Ariz...,32.133702,-110.998957
West Flowing Wells,9553.0,8,2080.07,43,4.592634,https://statisticalatlas.com/neighborhood/Ariz...,32.286127,-111.026304
Cherry Avenue,9453.0,9,7575.59,5,1.247824,https://statisticalatlas.com/neighborhood/Ariz...,32.230555,-110.948137
Elvira,8876.0,10,5594.56,21,1.586541,https://statisticalatlas.com/neighborhood/Ariz...,32.126708,-110.977584


Step 6:
- Collect tire shop location data
    - Collect zip code locations
        - Acquire list of zip codes from Statistical Atlas website
        - Search for coordinate data manually
        - Import data into intermediate DataFrame
    - Collect tire shop data using Foursquare API
        - Define Foursquare required variables such as `CLIENT_ID` and `CLIENT_SECRET`
        - Loop through zip code coordinates from intermediate DataFrame and put relevant data in lists
        - Merge data lists to single DataFrame
        - Use Foursquare unique venue IDs to remove all duplicates from the tire shop DataFrame
- Acquire Embassy Tire locations from Foursquare data for later use
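Because the search areas around neighboring zip codes overlap, the same shop can come back from several queries. The de-duplication idea in this step can be sketched in plain Python as follows; the venue records and the helper name `dedupe_by_id` are made up for illustration.

```python
def dedupe_by_id(venues):
    """Keep the first occurrence of each venue ID, preserving order."""
    seen = {}
    for venue in venues:
        # setdefault only stores the venue if its ID has not been seen yet
        seen.setdefault(venue["id"], venue)
    return list(seen.values())

# Two hypothetical query results whose search areas overlap
query_a = [{"id": "a1", "name": "Shop A"}, {"id": "b2", "name": "Shop B"}]
query_b = [{"id": "b2", "name": "Shop B"}, {"id": "c3", "name": "Shop C"}]

unique = dedupe_by_id(query_a + query_b)
print([v["id"] for v in unique])
```

Keying on the venue ID rather than the name matters here, since chains such as Discount Tire appear many times under the same name at different locations.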

In [24]:
#85701 = 32.2217558,-110.9761537
#85705 = 32.2885088,-110.9760925
#85745 = 32.2344873,-111.0347749
#85713 = 32.1888037,-110.9866533
#85719 = 32.2097405,-110.9455327
#85716 = 32.2098592,-110.922933
#85712 = 32.2548859,-110.9047567
#85715 = 32.251459,-110.8395991
#85711 = 32.2236711,-110.8848073
#85710 = 32.2165714,-110.8217179
#85748 = 32.2064291,-110.6905959
#85714 = 32.16151,-110.9142931
#85707 = 32.1781922,-110.8872836
#85730 = 32.1772587,-110.7978765
#85706 = 32.1485701,-110.9647741
#85747 = 32.0446072,-110.7856052

tucson_major_zips = [85701, 85705, 85745, 85713, 85719, 85716, 85712, 85715, 85711, 85710, 85748, 85714, 85707, 85730, 85706, 85747]
zip_lats = [32.2217558, 32.2885088, 32.2344873, 32.1888037, 32.2097405, 32.2098592, 32.2548859, 32.251459, 32.2236711, 32.2165714, 32.2064291, 32.16151, 32.1781922, 32.1772587, 32.1485701, 32.0446072]
zip_longs = [-110.9761537, -110.9760925, -111.0347749, -110.9866533, -110.9455327, -110.922933, -110.9047567, -110.8395991, -110.8848073, -110.8217179, -110.6905959, -110.9142931, -110.8872836, -110.7978765, -110.9647741, -110.7856052]

tucson_zip_data = pd.DataFrame({'Zip Code':tucson_major_zips, 'Latitude':zip_lats, 'Longitude':zip_longs})
tucson_zip_data.sort_values(by=['Zip Code'], ascending=True, inplace=True)
tucson_zip_data.reset_index(drop=True, inplace=True)
tucson_zip_data

Unnamed: 0,Zip Code,Latitude,Longitude
0,85701,32.221756,-110.976154
1,85705,32.288509,-110.976092
2,85706,32.14857,-110.964774
3,85707,32.178192,-110.887284
4,85710,32.216571,-110.821718
5,85711,32.223671,-110.884807
6,85712,32.254886,-110.904757
7,85713,32.188804,-110.986653
8,85714,32.16151,-110.914293
9,85715,32.251459,-110.839599


In [26]:
CLIENT_ID = '' #removed for security
CLIENT_SECRET = '' #removed for security
VERSION = '20180604'
FOUR_SQ_LIMIT = 50

In [62]:
#Fill rest of foursquare URI related variables
FOUR_SQ_RADIUS = 2000 #meters (about 1.24 miles); note that it is not passed to the URI below
QUERY = 'tire'

#Define lists to store relevant data
ids = []
names = []
shop_lats = []
shop_longs = []

#Iterate through zip code data and fill lists with relevant foursquare data
for index, row in tucson_zip_data.iterrows():
    #Generate URI
    uri = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&query={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, row['Latitude'], row['Longitude'], QUERY,FOUR_SQ_LIMIT)
    
    #Rip raw JSON file from URI
    raw_four_sq = requests.get(uri).json()
    
    #Iterate through all venues and then add the relevant data to the appropriate lists
    for venue in raw_four_sq['response']['venues']:
        ids.append(venue['id'])
        names.append(venue['name'])
        shop_lats.append(venue['location']['lat'])
        shop_longs.append(venue['location']['lng'])
        
#Add all lists to a single dataframe
tucson_shop_data = pd.DataFrame({'ID':ids, 'Name':names, 'Latitude':shop_lats, 'Longitude':shop_longs})

#Print the shape of the dataframe and the first 5 rows
print(tucson_shop_data.shape)
tucson_shop_data.head()

(800, 4)


Unnamed: 0,ID,Name,Latitude,Longitude
0,5555904d498e2f2c61b12d17,Phil's Fleet Tire Service Inc.,32.214113,-110.944973
1,5afe24bb345cbe002cd224f4,Discount Tire,32.187005,-110.943963
2,4c2d09bd260bc928b25119d3,Discount Tire,32.275145,-110.977893
3,57adfb19498e812df760d199,Jack Furrier Tire & Auto Care,32.261595,-110.960618
4,5334785811d2caf0f822559e,Discount Tire,32.153825,-110.987217


In [68]:
#Drop all recurrences of location data from the data set
#Uses foursquare venue ids as a primary key
tucson_shop_data.drop_duplicates(subset='ID', keep='first', inplace=True)

#Reset the indices of the dataframe
tucson_shop_data.reset_index(drop=True, inplace=True)

#Reprint the shape of the dataframe and the first 5 rows
print(tucson_shop_data.shape)
tucson_shop_data.head()

(97, 4)


Unnamed: 0,ID,Name,Latitude,Longitude
0,5555904d498e2f2c61b12d17,Phil's Fleet Tire Service Inc.,32.214113,-110.944973
1,5afe24bb345cbe002cd224f4,Discount Tire,32.187005,-110.943963
2,4c2d09bd260bc928b25119d3,Discount Tire,32.275145,-110.977893
3,57adfb19498e812df760d199,Jack Furrier Tire & Auto Care,32.261595,-110.960618
4,5334785811d2caf0f822559e,Discount Tire,32.153825,-110.987217


In [69]:
embassy_coords = tucson_shop_data.loc[tucson_shop_data['Name'].str.contains('Embassy'),:]
embassy_coords

Unnamed: 0,ID,Name,Latitude,Longitude
6,4e24a307d16474063225f037,Embassy Tire & Wheel Company,32.272466,-110.988764
62,4f83603ce4b03e850b0a0c6a,Embassy Tire & Wheel Co.,32.133841,-110.968153


## Data Analysis

Step 1: 
- Import Python's `math` library
- Define important constants
    - `LAT_CONSTANT`: latitude degrees per mile
    - `LONG_CONSTANT`: longitude degrees per mile
- Define important functions
    - `getNeighborhoodRadius`
        - Use the area of a neighborhood to return a radius suitable for checking whether a tire shop is in the neighborhood or not
        - The returned radius has a minimum value of 1.5
        - NOTE: Radius is in miles
    - `latDistanceInMiles`
        - Returns difference of latitudes 'a' and 'b' in miles
    - `longDistanceInMiles`
        - Returns difference of longitudes 'a' and 'b' in miles
    - `modifiedEuclidistance`
        - 'modified' because function accepts two pre-calculated differences into the formula
        - Returns euclidean distance of two points using pre-calculated differences
    - `getShopCount`
        - Given the name of a neighborhood get the search radius and coordinates of that neighborhood
        - Iterate through the list of tire shops
            - Calculate the euclidean distance of each shop with the given neighborhood
            - Increase shop count if shop is inside the search radius of the given neighborhood
        - Return the shop count for that neighborhood

In [109]:
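#1 mile of latitude is about .01456 degrees
#1 mile of longitude is about .01446 degrees at the equator
#NOTE: at Tucson's latitude (about 32.2 N) a mile of longitude is closer to .0171 degrees,
#so east-west distances computed with this constant are overstated by roughly 18%

#radius of neighborhood (miles): max(sqrt(area), 1.5)
#distance in miles: (a-b)/conversion_constant
#euclidean distance (2 points): sqrt(pow((a-b),2) + pow((c-d), 2))
#contains formula: if (euclidean distance of (distances of lats and longs in miles)) < radius_neighborhood, True, else, False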

import math

LAT_CONSTANT = .01456
LONG_CONSTANT = .01446

def getNeighborhoodRadius(neigh):
    #radius in miles: square root of the area, with a floor of 1.5
    return max(math.sqrt(tucson_pop_data.at[neigh, 'Area']), 1.5)

def latDistanceInMiles(a, b):
    return (a-b)/LAT_CONSTANT

def longDistanceInMiles(a, b):
    return (a-b)/LONG_CONSTANT

def modifiedEuclidistance(diffA, diffB): #'modified' because function accepts two pre-calculated differences into the formula
    return math.sqrt(math.pow(diffA, 2) + math.pow(diffB, 2))

def getShopCount(neigh):
    count = 0
    neigh_radius = getNeighborhoodRadius(neigh)
    
    #for each tire shop in the shop data dataframe
    for index, shop in tucson_shop_data.iterrows():
        #calculate the latitude and longitude raw distances in miles
        lat_dist = latDistanceInMiles(float(tucson_pop_data.at[neigh, 'Latitude']), shop['Latitude'])
        long_dist = longDistanceInMiles(tucson_pop_data.at[neigh, 'Longitude'], shop['Longitude'])
        
        #get the euclidean distance of the coordinates
        euclidistance = modifiedEuclidistance(lat_dist, long_dist)
        #print('{}: {}, {}'.format(index, euclidistance, neigh_radius))
        
        #check if euclidean distance is in radius or not, increase 'count' if True
        if euclidistance < neigh_radius:
            count += 1
    
    return count
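The conversion constants in the cell above use nearly the same degrees-per-mile value for latitude and longitude, which is only accurate near the equator. As a hedged alternative, the sketch below scales the east-west difference by the cosine of the latitude (an equirectangular approximation). The function name `approx_distance_miles` and the constant of 69 miles per degree of latitude are assumptions of this sketch, and using it in place of the notebook's functions would change the shop counts reported later.

```python
import math

MILES_PER_DEG_LAT = 69.0  # approximate value, assumed for this sketch

def approx_distance_miles(lat1, lon1, lat2, lon2):
    """Equirectangular approximation, adequate over city-scale distances."""
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    dy = (lat1 - lat2) * MILES_PER_DEG_LAT
    # a degree of longitude shrinks by cos(latitude) away from the equator
    dx = (lon1 - lon2) * MILES_PER_DEG_LAT * math.cos(mean_lat)
    return math.hypot(dx, dy)

# Coordinates taken from this report: Embassy Tire at Prince and the Amphi neighborhood
prince = (32.272466, -110.988764)
amphi = (32.272222, -110.972222)

d = approx_distance_miles(prince[0], prince[1], amphi[0], amphi[1])
print(round(d, 2))
```

With these inputs the distance comes out at a little under one mile.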

Step 2:
- Use the `getShopCount` function to calculate the number of shops in or close to each neighborhood
- Put those values into a new DataFrame

In [110]:
shop_counts = []

for index, row in tucson_pop_data.iterrows():
    shop_counts.append(getShopCount(index))
    
df_counts = pd.DataFrame({'Neighborhood':tucson_pop_data.index.values, 'Number of Shops':shop_counts})
df_counts.head()

Unnamed: 0,Neighborhood,Number of Shops
0,Casas Adobes,0
1,Drexel Heights,10
2,Tanque Verde,0
3,Rita Ranch,4
4,Flowing Wells,1


Step 3:
- Build a DataFrame that converts each shop count to a 'Shop Count Rank' using the unique values of the shop count DataFrame
- Add a column to the shop count DataFrame with the correct 'Shop Count Rank' for each neighborhood
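The ranking described in this step is a "dense" ranking: equal counts share a rank, and ranks have no gaps. A minimal pure-Python sketch of the idea follows; `dense_rank_desc` is a hypothetical helper and the input list is a small sample, so its ranks run from 1 to 4 rather than from 1 to 11 as in the full data set.

```python
def dense_rank_desc(counts):
    """Rank 1 goes to the largest count; ties share a rank; ranks have no gaps."""
    unique_desc = sorted(set(counts), reverse=True)
    rank_of = {count: rank for rank, count in enumerate(unique_desc, start=1)}
    return [rank_of[c] for c in counts]

sample_counts = [0, 10, 0, 4, 1]
sample_ranks = dense_rank_desc(sample_counts)
print(sample_ranks)  # [4, 1, 4, 2, 3]
```

In pandas, `Series.rank(method='dense', ascending=False)` should produce the same kind of ranking in a single call.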

In [130]:
#get the rank of number of tire shops per neighborhood
rank_values = pd.DataFrame({'Count':list(set(df_counts['Number of Shops'].values))}).sort_values(by='Count', ascending=False, axis=0)
rank_values.reset_index(drop=True, inplace=True)
rank_values.index.name = 'Rank'
rank_values.reset_index(drop=False, inplace=True)
rank_values['Rank'] = rank_values['Rank']+1
rank_values.set_index('Count', inplace=True)
print(rank_values)

       Rank
Count      
10        1
9         2
8         3
7         4
6         5
5         6
4         7
3         8
2         9
1        10
0        11


In [133]:
shop_ranks = []

for index, row in df_counts.iterrows():
    shop_ranks.append(rank_values.at[row['Number of Shops'], 'Rank'])
    
df_counts['Shop Count Rank'] = shop_ranks
df_counts.head(15)

Unnamed: 0,Neighborhood,Number of Shops,Shop Count Rank
0,Casas Adobes,0,11
1,Drexel Heights,10,1
2,Tanque Verde,0,11
3,Rita Ranch,4,7
4,Flowing Wells,1,10
5,Sunnyside,4,7
6,Midvale Park,3,8
7,West Flowing Wells,1,10
8,Cherry Avenue,1,10
9,Elvira,3,8


Step 4:
- Compile all of the statistical ranks for each neighborhood into a new DataFrame
- Append an 'Optimal' neighborhood row as a metric to measure the quality of each neighborhood
    - Optimally, a neighborhood would have a Population Rank and Density Rank of 1 and a Shop Count Rank of 11
    - This means that there is high population and low tire shop competition (perfect for expansion!)

In [180]:
tucson_ranks = tucson_pop_data.reset_index()
tucson_ranks = tucson_ranks[['Neighborhood', 'Population Rank', 'Density Rank', 'Population', 'Density']]
tucson_ranks = pd.merge(tucson_ranks, df_counts, how='left', on='Neighborhood')
tucson_ranks = tucson_ranks[['Neighborhood', 'Population Rank', 'Density Rank', 'Shop Count Rank']]
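#append the 'Optimal' reference row (DataFrame.append was removed in pandas 2.0, so pd.concat is used)
optimal_row = pd.DataFrame([{'Neighborhood':'Optimal', 'Population Rank':1, 'Density Rank':1, 'Shop Count Rank':11}])
tucson_ranks = pd.concat([tucson_ranks, optimal_row], ignore_index=True)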
tucson_ranks.tail(5)

Unnamed: 0,Neighborhood,Population Rank,Density Rank,Shop Count Rank
46,Arroyo Chico,47,33,8
47,Barrio Hollywood,48,16,11
48,Dodge Flower,49,1,6
49,Civano,50,49,10
50,Optimal,1,1,11


Step 5:
- Calculate the 'Overall Rank' for each neighborhood
    - Find the euclidean distance of each neighborhood to the Optimal neighborhood
    - Add all distances to the DataFrame of ranks
    - Sort all rows ascending by 'Distance From Optimal'
    - Add the 'Overall Rank' as a column to the final ranks DataFrame
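As a worked example of the distance calculation in this step, the sketch below computes the distance from the Optimal point (1, 1, 11) for two neighborhoods, using the ranks listed in this report. The helper name `distance_from_optimal` is hypothetical.

```python
import math

# (Population Rank, Density Rank, Shop Count Rank) of the 'Optimal' neighborhood
OPTIMAL = (1, 1, 11)

def distance_from_optimal(pop_rank, density_rank, shop_rank, optimal=OPTIMAL):
    """Euclidean distance between a neighborhood's ranks and the optimal ranks."""
    return math.sqrt(
        (pop_rank - optimal[0]) ** 2
        + (density_rank - optimal[1]) ** 2
        + (shop_rank - optimal[2]) ** 2
    )

cherry_avenue = distance_from_optimal(9, 5, 10)   # sqrt(64 + 16 + 1)
amphi = distance_from_optimal(11, 6, 1)           # sqrt(100 + 25 + 100)
print(cherry_avenue, amphi)  # 9.0 15.0
```

Note that all three ranks are weighted equally even though they have different ranges (1 to 50 for population and density, 1 to 11 for shop count), so the two population-based ranks dominate the distance.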

In [182]:
distances = []
optimal_rank = rank_values.at[0, 'Rank']

for index, row in tucson_ranks.iterrows():
    distances.append(math.sqrt(math.pow(float(row['Population Rank'])-1, 2) + math.pow(float(row['Density Rank'])-1, 2) + math.pow(float(row['Shop Count Rank'])-optimal_rank, 2)))
    
tucson_ranks['Distance From Optimal'] = distances
tucson_ranks.sort_values(by='Distance From Optimal', inplace=True)

#remove 'Optimal' row and clean up dataframe
tucson_ranks.reset_index(drop=True, inplace=True)
tucson_ranks.index.name = 'Overall Rank'
tucson_ranks.drop([0], inplace=True)
tucson_ranks.reset_index(inplace=True)
tucson_ranks.set_index('Neighborhood', inplace=True)
tucson_ranks.head(15)

Neighborhood,Overall Rank,Population Rank,Density Rank,Shop Count Rank,Distance From Optimal
Cherry Avenue,1,9,5,10,9.0
Amphi,2,11,6,1,15.0
Campus Farm,3,13,14,8,17.944358
Garden District,4,14,13,8,17.944358
Sunnyside,5,6,20,7,20.049938
Corbett,6,16,15,10,20.542639
Elvira,7,10,21,8,22.135944
Lakeside Park,8,18,17,9,23.430749
Myers,9,22,18,10,27.037012
Palo Verde,10,17,22,5,27.073973


## Results

Let's start by looking at the top ten neighborhoods in terms of their overall rank.

In [183]:
top10 = tucson_ranks.index.values[0:10]
top10

array(['Cherry Avenue', 'Amphi', 'Campus Farm', 'Garden District',
       'Sunnyside', 'Corbett', 'Elvira', 'Lakeside Park', 'Myers',
       'Palo Verde'], dtype=object)

Neat! Looks like Cherry Avenue, Amphi, and Campus Farm are some top contenders. For a better look at our results, though, let's plot out our top ten neighborhoods and compare them to the locations of Tucson's tire shops as well as the distance of the neighborhoods to the three existing Embassy Tire & Wheel locations.

In [184]:
import folium

Now that the `folium` library is imported, the coordinates of the three Embassy Tire locations need to be compiled. We will use the coordinates acquired from Foursquare earlier, as well as coordinates for the third location found on Google using its address of 1431 S Kolb.

In [239]:
#coordinates of the soon-to-be 3rd location at 1431 S Kolb: 32.2037278,-110.8427026
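#DataFrame.append was removed in pandas 2.0, so pd.concat is used to add the third location
kolb_row = pd.DataFrame([{'ID':'Not Available', 'Name':'Embassy Tire @ Kolb', 'Latitude':32.2037278, 'Longitude':-110.8427026}])
embassy_data = pd.concat([embassy_coords, kolb_row], ignore_index=True)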
embassy_data = embassy_data.drop('ID', axis=1)
embassy_data['Name'] = ['Embassy Tire @ Prince', 'Embassy Tire @ Valencia', 'Embassy Tire @ Kolb']
embassy_data

Unnamed: 0,Name,Latitude,Longitude
0,Embassy Tire @ Prince,32.272466,-110.988764
1,Embassy Tire @ Valencia,32.133841,-110.968153
2,Embassy Tire @ Kolb,32.203728,-110.842703


Finally, using all of the acquired location data, let's plot this on an interactive Leaflet map. The map will be centered on a moving services company whose location was found, by trial and error, to give the best view of all of the data. The map will also include three types of markers:
- Yellow Circle Markers representing tire shop locations
- Red and Black Circle Markers representing the Embassy Tire & Wheel locations
- Point Markers representing the ten best neighborhoods for expansion
    - These markers include popup labels with the neighborhood name and that neighborhood's Overall Rank

In [254]:
#coordinates of University of Arizona's Old Main Building as a reference: 32.2319° N, 110.9534° W
#coordinates of 'Moving Services Inc.' on 17th Street just South of Aviation Parkway: 32.2134379° N, 110.9578382° W

world_map = folium.Map(location=[32.2134379,-110.9578382], zoom_start=11, tiles="OpenStreetMap", height=500)

#Add the locations of all tire shops in data set as Circle Markers to the map
tire_shops = folium.map.FeatureGroup()
for lat, long in zip(tucson_shop_data.Latitude, tucson_shop_data.Longitude):
    tire_shops.add_child(
        folium.vector_layers.CircleMarker(
            [lat, long],
            radius=5,
            color='yellow',
            fill=True,
            fill_color='yellow',
            fill_opacity=0.5
        )
    )
world_map.add_child(tire_shops)

#Add the top 10 neighborhoods to the map as Markers that, when clicked,
#display the neighborhood name and that neighborhood's overall rank
for neigh in top10:
    lat = tucson_pop_data.at[neigh, 'Latitude']
    long = tucson_pop_data.at[neigh, 'Longitude']
    label = '{}\nRank: {}'.format(neigh, tucson_ranks.at[neigh, 'Overall Rank'])
    
    folium.Marker(
        [lat, long],
        popup=label
    ).add_to(world_map)
    
#Finally, add Circle Markers to represent the locations of the 3 current and upcoming Embassy Tire & Wheel locations
#(the markers go into their own 'embassys' group rather than being added to 'tire_shops' again)
embassys = folium.map.FeatureGroup()
for index, row in embassy_data.iterrows():
    embassys.add_child(
        folium.vector_layers.CircleMarker(
            [row['Latitude'], row['Longitude']],
            radius=5,
            color='black',
            fill=True,
            fill_color='red',
            fill_opacity=0.8,
            popup=row['Name']
        )
    )
world_map.add_child(embassys)

world_map

## Discussion

Of our top 10 neighborhoods, some have to be ruled out due to their proximity to the three existing Embassy Tire & Wheel locations and to other shops. For example, using the generated map as a reference, Amphi places 2nd overall and looks like an ideal candidate, but it is only about a mile away from the location at Prince. For a small yet growing business, opening a new shop a mile away from a current location is a poor idea, so Amphi must be ruled out as a potential neighborhood for expansion.

It is also important to note that population, population density, and tire shop count alone may not be enough to give the best objective suggestion when selecting Tucson neighborhoods for expansion. Other factors may include Average Individual Income, Average Household Income, Household Types, and Industries in each area. This list is not comprehensive, but all of the data in it, like the population data, is also available on Statistical Atlas.

With the limitations of the dataset in mind, an interesting observation is the proximity of some of the top ten neighborhoods to the locations at Prince and Valencia (the locations on the western side of the city). The success of the Prince and Valencia locations is what has allowed the company to continue to expand. While these nearby neighborhoods (namely Amphi, Campus Farm, and Elvira) need to be ruled out for new expansions, the fact that such high-ranking neighborhoods (including those ranked 2 and 3) are geographically close to already successful locations suggests that the ranking method used here may also point to future success in other top neighborhoods.

## Conclusion

Using the generated map as a reference, it is determined that Cherry Avenue, Sunnyside, and Corbett are the ideal Tucson neighborhoods to look to expand into. Placed near the center of town, they all rank high statistically (ranked 1, 5, and 6, respectively) and are effectively equidistant from the current three Embassy Tire & Wheel locations. The map also indicates that the center of town is fairly lacking in tire shops in general which would also make this region great for expansion.

For information on the location data, demographics and more on those neighborhoods, the links to their Statistical Atlas websites will be provided below.

Thank you for reading!

In [253]:
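#None removes the column width limit (the older value of -1 is no longer accepted by recent pandas versions)
pd.set_option('display.max_colwidth', None)
tucson_pop_data.loc[['Cherry Avenue', 'Sunnyside', 'Corbett'], 'Link']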

Neighborhood
Cherry Avenue    https://statisticalatlas.com/neighborhood/Arizona/Tucson/Cherry-Avenue/Population
Sunnyside        https://statisticalatlas.com/neighborhood/Arizona/Tucson/Sunnyside/Population    
Corbett          https://statisticalatlas.com/neighborhood/Arizona/Tucson/Corbett/Population      
Name: Link, dtype: object