# Michelle's Applied Data Science Capstone Project - The Battle of Neighborhoods

## Description of the problem and a discussion of the background.

### Opening of a new Pet Supply Store in or near Austin, Texas.

A friend is looking for the best place around Austin, Texas to open a new family-owned pet supply store.
Austin and the surrounding suburbs (Cedar Park, Georgetown, Round Rock, etc.) are generally known to be pet friendly. There
are several "big box" pet supply stores already, but the friend believes that by providing a more personalized experience and
carrying more unique items and foods alongside the most popular items sold at the big box stores, he can
bring in the customer base needed to do well.

The friend is looking for the best areas near Austin to place his store, where there is not already a high concentration of other pet stores.
We will limit the results to the top five candidate locations.

## Description of the data and how it will be used to solve the problem.

### Data Used

Austin is surrounded by several suburbs (Cedar Park, Round Rock, Georgetown, etc.). The focus will be limited to Travis and Williamson counties.

A suitable, usable source of data on the main city and its suburbs needs to be identified.
If the data is found but is not in a usable form, data cleaning and manipulation will need to be performed.

The cleansed data will then be used along with Foursquare data.
Foursquare location data will be leveraged to explore and compare the different areas around Austin, identifying
high-traffic areas that have fewer existing pet supply stores. The presence of nearby pet rescues and shelters will also be taken into consideration.

The Data Science Workflow for Week 2 will be:

* Data Analysis and Location Data:
Foursquare location data will be leveraged to explore and compare the cities around Austin.
Data manipulation and analysis to derive subsets of the initial data.
Identifying the high-traffic areas using data visualization and statistical analysis.

* Visualization:
Analysis and plotting visualizations.
Data visualization using various mapping libraries.

* Conclusions:
Recommendations and results based on the data analysis.
Discussion of any limitations and how the results can be used, and any conclusions that can be drawn.

## Data Analysis

Begin by importing all the required libraries

In [1]:
import numpy as np 
import json 
import pandas as pd
import re

import requests
from pandas.io.json import json_normalize

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

from bs4 import BeautifulSoup

from sklearn.cluster import KMeans

In [2]:
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim 


In [3]:
!conda install -c conda-forge folium=0.5.0 --yes 
import folium


In [4]:
import os

# Read the Foursquare credentials from environment variables instead of hard-coding them.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are assumptions; use whatever names you export.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '') # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
# Confirm that both values are set without printing the secrets themselves.
print('Credentials loaded: {}'.format(bool(CLIENT_ID and CLIENT_SECRET)))


Since we want to limit this to Travis and Williamson counties, we'll take the center point of each county and then the midpoint between the two to focus our search.

In [5]:
#Utilizing the data found at https://latitude.to/articles-by-country/us/united-states/8474/travis-county-texas
latitude1 = 30.33
longitude1 = -97.78

#Utilizing the data found at https://latitude.to/articles-by-country/us/united-states/15441/williamson-county-texas
latitude2 = 30.65551
longitude2 = -97.5839

a = np.array([[latitude1, longitude1], [latitude2, longitude2]])
loc = np.median(a, axis=0)
loc

array([ 30.492755, -97.68195 ])

Next we want to get a list of all the main cities within Travis and Williamson counties and see the population of each city, along with mapping them to show their distance from the center point of the two counties.

In [6]:
source1 = 'https://www.zip-codes.com/county/tx-travis.asp'
source1_get = requests.get(source1)
soup = BeautifulSoup(source1_get.content, 'lxml')
right_table1=soup.find('table', class_='statTable')

TZip=[]
TClass=[]
TCity=[]
TPop=[]
TTimezone=[]
TArea_codes=[]

for row in right_table1.findAll('tr'):
    cells=row.findAll('td')
    if len(cells)==6:
        TZip.append(cells[0].find(text=True))
        TClass.append(cells[1].find(text=True))
        TCity.append(cells[2].find(text=True))
        pop_text = re.sub(",","",cells[3].find(text=True))  # strip thousands separators; avoid shadowing the built-in str
        if pop_text.isdigit():
            TPop.append(int(pop_text))
        else:
            TPop.append(pop_text)
        TTimezone.append(cells[4].find(text=True))
        TArea_codes.append(cells[5].find(text=True))
    
df_tableT = pd.DataFrame(data=[TClass, TCity, TPop]).transpose()
df_tableT.columns = ['Classification', 'City', 'Population']

df_tableT.drop(df_tableT[df_tableT['Classification']=="Unique"].index, axis=0, inplace=True)
df_tableT.drop(df_tableT[df_tableT['Classification']=="P.O. Box"].index, axis=0, inplace=True)
df_tableT.drop(df_tableT[df_tableT['Classification']=="Classification"].index, axis=0, inplace=True)
df_tableT

Unnamed: 0,Classification,City,Population
3,General,Del Valle,22210
4,General,Leander,9773
6,General,Manchaca,4466
7,General,Manor,16375
8,General,Pflugerville,68789
9,General,Spicewood,8731
11,General,Austin,6841
12,General,Austin,21334
13,General,Austin,19690
14,General,Austin,42117


In [7]:
source2 = 'https://www.zip-codes.com/county/tx-williamson.asp'
source2_get = requests.get(source2)
soup = BeautifulSoup(source2_get.content, 'lxml')
right_table2=soup.find('table', class_='statTable')

WZip=[]
WClass=[]
WCity=[]
WPop=[]
WTimezone=[]
WArea_codes=[]

for row in right_table2.findAll('tr'):
    cells=row.findAll('td')
    if len(cells)==6:
        WZip.append(cells[0].find(text=True))
        WClass.append(cells[1].find(text=True))
        WCity.append(cells[2].find(text=True))
        pop_text = re.sub(",","",cells[3].find(text=True))  # strip thousands separators; avoid shadowing the built-in str
        if pop_text.isdigit():
            WPop.append(int(pop_text))
        else:
            WPop.append(pop_text)
        WTimezone.append(cells[4].find(text=True))
        WArea_codes.append(cells[5].find(text=True))
df_tableW = pd.DataFrame(data=[WClass, WCity, WPop]).transpose()
df_tableW.columns = ['Classification', 'City', 'Population']

df_tableW.drop(df_tableW[df_tableW['Classification']=="Unique"].index, axis=0, inplace=True)
df_tableW.drop(df_tableW[df_tableW['Classification']=="P.O. Box"].index, axis=0, inplace=True)
df_tableW.drop(df_tableW[df_tableW['Classification']=="Classification"].index, axis=0, inplace=True)
df_tableW

Unnamed: 0,Classification,City,Population
1,General,Florence,4058
2,General,Granger,2540
3,General,Jarrell,3870
5,General,Taylor,17661
6,General,Thrall,1766
7,General,Cedar Park,65099
8,General,Coupland,1290
9,General,Georgetown,25996
11,General,Georgetown,23727
13,General,Georgetown,19349


In [8]:
df_tableT = df_tableT.drop(['Classification'], axis=1)
dfTgroup=df_tableT.groupby('City').agg(lambda x : x.sum())
dfTgroup

Unnamed: 0_level_0,Population
City,Unnamed: 1_level_1
Austin,878632
Del Valle,22210
Leander,9773
Manchaca,4466
Manor,16375
Pflugerville,68789
Spicewood,8731


In [9]:
df_tableW = df_tableW.drop(['Classification'], axis=1)
dfWgroup=df_tableW.groupby('City').agg(lambda x : x.sum())
dfWgroup

Unnamed: 0_level_0,Population
City,Unnamed: 1_level_1
Austin,49646
Cedar Park,65099
Coupland,1290
Florence,4058
Georgetown,69072
Granger,2540
Hutto,22791
Jarrell,3870
Leander,44295
Liberty Hill,9467


In [10]:
dfTW = pd.concat((dfWgroup, dfTgroup))
dfTWmerge=dfTW.groupby('City').agg(lambda x : x.sum())
dfTWfinal = dfTWmerge.reset_index()
dfTWfinal

Unnamed: 0,City,Population
0,Austin,928278
1,Cedar Park,65099
2,Coupland,1290
3,Del Valle,22210
4,Florence,4058
5,Georgetown,69072
6,Granger,2540
7,Hutto,22791
8,Jarrell,3870
9,Leander,54068


In [11]:
# Latitude and Longitude data for complete city list, gathered from https://latitude.to/articles-by-country/us/united-states/ and uploaded to github
coordinates = pd.read_csv(r'https://raw.githubusercontent.com/smichellibm/coursea-data-capstone/master/Texas_Geospatial_Coordinates.csv')

# merge on the shared City column; cities without coordinates are dropped by the default inner join
locations=pd.merge(dfTWfinal, coordinates, on='City')
locations

Unnamed: 0,City,Population,Latitude,Longitude
0,Austin,928278,30.26715,-97.74306
1,Cedar Park,65099,30.5052,-97.8203
2,Coupland,1290,30.4596,-97.39
3,Florence,4058,30.8413,-97.7936
4,Georgetown,69072,30.63326,-97.67798
5,Granger,2540,30.7177,-97.4428
6,Hutto,22791,30.5427,-97.5467
7,Jarrell,3870,30.8246,-97.6044
8,Leander,54068,30.5788,-97.8531
9,Liberty Hill,9467,30.6649,-97.9225
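
The distance of each city from the two-county center point can also be computed directly rather than only read off the map. The sketch below is one way to do that with the haversine formula, using the center point found earlier and a few coordinates copied from the `locations` table. The helper names `haversine_km` and `distances_from_center` are illustrative assumptions and are not part of the original notebook.

```python
import math

# Center point computed earlier in this notebook (midpoint of the two county centers).
CENTER = (30.492755, -97.68195)

# A few city coordinates copied from the `locations` table.
CITIES = {
    "Austin": (30.26715, -97.74306),
    "Cedar Park": (30.5052, -97.8203),
    "Georgetown": (30.63326, -97.67798),
    "Hutto": (30.5427, -97.5467),
}

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(point_a, point_b):
    """Great-circle distance in kilometers between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, point_a)
    lat2, lon2 = map(math.radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def distances_from_center(cities, center=CENTER):
    """Return (city, distance_km) pairs sorted from nearest to farthest."""
    pairs = [(name, haversine_km(center, coords)) for name, coords in cities.items()]
    return sorted(pairs, key=lambda pair: pair[1])


for name, km in distances_from_center(CITIES):
    print("{}: {:.1f} km".format(name, km))
```

The same function could be applied to every row of `locations` to add a distance column before mapping.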


In [206]:
loc = folium.Map(location=[30.492755, -97.68195], zoom_start=9)

for lat, lng, city, pop in zip(locations['Latitude'], locations['Longitude'], locations['City'], locations['Population']):
        label = '{}'.format(city)
        label = folium.Popup(label, parse_html=True) 
        folium.CircleMarker([lat, lng],radius=40,popup=label,color='blue',fill=True,fill_color='#3186cc',fill_opacity=0.7,parse_html=False).add_to(loc)
    
loc

## Get up to 100 pet store venues within a radius of 40000 meters (about 25 miles) of central Austin

In [207]:
url = 'https://api.foursquare.com/v2/venues/explore'

params = dict(
    client_id=CLIENT_ID,
    client_secret=CLIENT_SECRET,
    v=VERSION,
    ll='30.26715, -97.74306',
    query='pet store',  #query for just pet stores
    limit=100,  # limit of number of venues returned by Foursquare API
    radius = 40000 # define radius, since we want all around Williamson and Travis, testing with 25 mile radius
)
resp = requests.get(url=url, params=params)
data = json.loads(resp.text)

In [138]:
# Send the GET request and examine the results
results = requests.get(url=url, params=params).json()
results

{'meta': {'code': 200, 'requestId': '5e66718f949393001b5f43c9'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Austin',
  'headerFullLocation': 'Austin',
  'headerLocationGranularity': 'city',
  'query': 'pet store',
  'totalResults': 155,
  'suggestedBounds': {'ne': {'lat': 30.62715036000036,
    'lng': -97.32701904363701},
   'sw': {'lat': 29.90714963999964, 'lng': -98.15910095636299}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b6db973f964a520d6892ce3',
       'name': 'Austin Urban Vet Center',
       'location': {'address': '710 W 5th St',
        'crossStreet': 'Rio Grande',
        'lat': 30.269250072507777,
        'lng': -97.75007380117995,
        'labeledLatLngs':

In [139]:
# define a function that extracts the category of the venue

def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the flattened column name produced by json_normalize
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# clean the json and structure it into a pandas dataframe.

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(100)

Unnamed: 0,name,categories,lat,lng
0,Austin Urban Vet Center,Pet Store,30.269250,-97.750074
1,Healthy Pet,Pet Store,30.221092,-97.840965
2,Great Outdogs,Pet Store,30.260965,-97.757509
3,Bark 'n' Purr,Pet Store,30.317129,-97.740927
4,Tomlinson's Feed & Pets,Pet Store,30.309915,-97.714979
5,Paws On Chicon,Pet Store,30.274751,-97.719591
6,Tomlinson's Feed & Pets,Pet Store,30.234249,-97.792593
7,Phydeaux & Friends,Pet Store,30.355245,-97.731963
8,Tomlinson's,Pet Store,30.277024,-97.751024
9,Mud Puppies - Riverside,Pet Store,30.240007,-97.728016


In [140]:
# Check how many venues there are in within a radius of 40000 meters (25 miles)

print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


Create a nearby venues function for all the major cities in Williamson and Travis counties

In [141]:
print(nearby_venues.shape)

(100, 4)
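
A per-city venue function along the lines described above might look like the following sketch. It queries each city once and tags every returned venue with the city that was queried. The names `build_explore_params`, `get_nearby_pet_stores` and `fake_fetch`, and the 8000 meter default radius, are assumptions made for illustration. The response layout (`response` → `groups` → `items` → `venue`) follows the JSON shown earlier in this notebook. The HTTP call is passed in as `fetch` so the sketch can run offline.

```python
def build_explore_params(lat, lng, client_id, client_secret,
                         version="20180604", query="pet store",
                         radius=8000, limit=100):
    """Build the query parameters for one venues/explore request."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "query": query,
        "radius": radius,  # assumed smaller per-city radius, in meters
        "limit": limit,
    }


def get_nearby_pet_stores(cities, fetch, client_id, client_secret, radius=8000):
    """Query each city once and return flat rows tagged with the queried city.

    `cities` is an iterable of (name, lat, lng) tuples.
    `fetch` is any callable that takes a params dict and returns decoded JSON,
    for example: lambda p: requests.get(url, params=p).json()
    """
    rows = []
    for name, lat, lng in cities:
        params = build_explore_params(lat, lng, client_id, client_secret, radius=radius)
        payload = fetch(params)
        groups = payload.get("response", {}).get("groups", [])
        items = groups[0].get("items", []) if groups else []
        for item in items:
            venue = item["venue"]
            rows.append({
                "City": name,
                "Venue": venue["name"],
                "Venue Latitude": venue["location"]["lat"],
                "Venue Longitude": venue["location"]["lng"],
            })
    return rows


def fake_fetch(params):
    """Stand-in for the real HTTP call; returns one made-up venue per request."""
    return {"response": {"groups": [{"items": [
        {"venue": {"name": "Example Pet Shop",
                   "location": {"lat": 30.5, "lng": -97.8}}}
    ]}]}}


cities = [("Cedar Park", 30.5052, -97.8203), ("Hutto", 30.5427, -97.5467)]
rows = get_nearby_pet_stores(cities, fake_fetch, "ID", "SECRET")
print(rows)
```

The resulting rows can be loaded with `pd.DataFrame(rows)`, which gives a table with a real `City` column for grouping.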


Analyze each of the Cities

In [149]:
# Analyze each of the Cities from the results

# one hot encoding
TW_onehot = pd.get_dummies(nearby_venues[['name']], prefix="", prefix_sep="")

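# add a City column; this aligns on the row index, so venue row i is paired with city row i (not with the city the venue is located in)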
TW_onehot['City'] = dfTWfinal['City'] 

# move city column to the first column
fixed_columns = [TW_onehot.columns[-1]] + list(TW_onehot.columns[:-1])
TW_onehot = TW_onehot[fixed_columns]

TW_onehot

Unnamed: 0,City,Action Pack Dog Center,All Around Austin Exotic Pets,All Around Austin Exotic Pets South Store,"Animals Staying Alive, Inc",Aquatek Tropical Fish,Austin Aqua-Dome,Austin Pets Alive SoCo Adoption Site,Austin Urban Vet Center,Bark 'n' Purr,...,Wag N' Wash,West End Grooming,Wild Birds Unlimited,Windsor Park Veterinary Clinic,Woof Gang Bakery,Woof Gang Bakery & Grooming,Zoo Keeper Exotic Pets,Zoom Room Dog Training,crookedtail,metrodog @ The Domain
0,Austin,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
1,Cedar Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Coupland,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Del Valle,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
4,Florence,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Georgetown,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Granger,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,Hutto,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,Jarrell,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Leander,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [150]:
TW_onehot.shape

(100, 70)
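
The assignment `TW_onehot['City'] = dfTWfinal['City']` pairs rows by index, so it does not reflect where a venue is actually located. One possible alternative, sketched here under stated assumptions, is to assign each venue to its nearest city by distance and then count stores per city. The helper names are illustrative, the distance uses an equirectangular approximation rather than an exact geodesic, and the sample rows are copied from the `nearby_venues` and `locations` tables.

```python
import math
from collections import Counter

# City coordinates copied from the `locations` table.
CITIES = {
    "Austin": (30.26715, -97.74306),
    "Cedar Park": (30.5052, -97.8203),
    "Georgetown": (30.63326, -97.67798),
    "Hutto": (30.5427, -97.5467),
}

# (name, lat, lng) rows copied from the `nearby_venues` table.
VENUES = [
    ("Austin Urban Vet Center", 30.269250, -97.750074),
    ("Healthy Pet", 30.221092, -97.840965),
    ("Bark 'n' Purr", 30.317129, -97.740927),
    ("Phydeaux & Friends", 30.355245, -97.731963),
]


def approx_distance_km(lat1, lng1, lat2, lng2):
    """Equirectangular approximation; adequate for ranking nearby points."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    x = math.radians(lng2 - lng1) * math.cos(mean_lat)
    y = math.radians(lat2 - lat1)
    return 6371.0 * math.hypot(x, y)


def nearest_city(lat, lng, cities):
    """Return the name of the city whose coordinates are closest to (lat, lng)."""
    return min(cities, key=lambda name: approx_distance_km(lat, lng, *cities[name]))


def count_stores_per_city(venues, cities):
    """Count venues per nearest city; cities with no venues keep a count of 0."""
    counts = Counter({name: 0 for name in cities})
    for _name, lat, lng in venues:
        counts[nearest_city(lat, lng, cities)] += 1
    return counts


counts = count_stores_per_city(VENUES, CITIES)
for city, n in counts.most_common():
    print(city, n)
```

With these sample rows every venue falls nearest to Austin, which suggests that a single query centered on Austin may under-represent stores in the suburbs.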

Group rows by city and take the mean of the frequency of occurrence of each venue

In [151]:
TW_grouped = TW_onehot.groupby('City').mean().reset_index()
TW_grouped

Unnamed: 0,City,Action Pack Dog Center,All Around Austin Exotic Pets,All Around Austin Exotic Pets South Store,"Animals Staying Alive, Inc",Aquatek Tropical Fish,Austin Aqua-Dome,Austin Pets Alive SoCo Adoption Site,Austin Urban Vet Center,Bark 'n' Purr,...,Wag N' Wash,West End Grooming,Wild Birds Unlimited,Windsor Park Veterinary Clinic,Woof Gang Bakery,Woof Gang Bakery & Grooming,Zoo Keeper Exotic Pets,Zoom Room Dog Training,crookedtail,metrodog @ The Domain
0,Austin,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
1,Cedar Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Coupland,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Del Valle,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
4,Florence,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Georgetown,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Granger,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,Hutto,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,Jarrell,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Leander,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [152]:
TW_grouped.shape

(18, 70)

Print each city with its top 10 most common pet stores

In [153]:
# Each  city with top 10 most common venues

num_top_venues = 10

for TWcity in TW_grouped['City']:
    print("----"+TWcity+"----")
    temp = TW_grouped[TW_grouped['City'] == TWcity].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Austin----
                         venue  freq
0      Austin Urban Vet Center   1.0
1       Action Pack Dog Center   0.0
2                        Petco   0.0
3                    Shampooch   0.0
4              SOCO Pet Lounge   0.0
5  Rivers and Reefs Pet Supply   0.0
6          River City Aquatics   0.0
7  Pride & Joy Canine Day Care   0.0
8           Phydeaux & Friends   0.0
9                     PetSmart   0.0


----Cedar Park----
                         venue  freq
0                  Healthy Pet   1.0
1       Action Pack Dog Center   0.0
2                        Petco   0.0
3              SOCO Pet Lounge   0.0
4  Rivers and Reefs Pet Supply   0.0
5          River City Aquatics   0.0
6  Pride & Joy Canine Day Care   0.0
7           Phydeaux & Friends   0.0
8                     PetSmart   0.0
9         Southpaws Playschool   0.0


----Coupland----
                         venue  freq
0                Great Outdogs   1.0
1       Action Pack Dog Center   0.0
2                   

Put that data into a pandas dataframe and sort the venues in descending order, then show the new dataframe for each city.

In [154]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [155]:
# create the new dataframe and display the top 10 venues for each city

num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top pet stores
columns = ['City']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions past 3rd use the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))
        
# create a new dataframe
TW_venues_sorted = pd.DataFrame(columns=columns)
TW_venues_sorted['City'] = TW_grouped['City']

for ind in np.arange(TW_grouped.shape[0]):
    TW_venues_sorted.iloc[ind, 1:] = return_most_common_venues(TW_grouped.iloc[ind, :], num_top_venues)

TW_venues_sorted.head(20)

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Austin,Austin Urban Vet Center,Hollywood Feed,Fish Gallery,Gallery of Pets,Great Outdogs,Groomingdale's Of Austin,Healthy Pet,Herpeton Exotic Pets,metrodog @ The Domain,Dorian's Bubbles N Paws
1,Cedar Park,Healthy Pet,metrodog @ The Domain,Herpeton Exotic Pets,Fetch,Fish Gallery,Gallery of Pets,Great Outdogs,Groomingdale's Of Austin,Hollywood Feed,Midtown Groom & Board
2,Coupland,Great Outdogs,metrodog @ The Domain,Herpeton Exotic Pets,Fetch,Fish Gallery,Gallery of Pets,Groomingdale's Of Austin,Healthy Pet,Hollywood Feed,Midtown Groom & Board
3,Del Valle,Bark 'n' Purr,metrodog @ The Domain,Fish Gallery,Gallery of Pets,Great Outdogs,Groomingdale's Of Austin,Healthy Pet,Herpeton Exotic Pets,Hollywood Feed,Dorian's Bubbles N Paws
4,Florence,Tomlinson's Feed & Pets,metrodog @ The Domain,Dirty Dog - South Lamar,Fetch,Fish Gallery,Gallery of Pets,Great Outdogs,Groomingdale's Of Austin,Healthy Pet,Herpeton Exotic Pets
5,Georgetown,Paws On Chicon,metrodog @ The Domain,Hollywood Feed,Fish Gallery,Gallery of Pets,Great Outdogs,Groomingdale's Of Austin,Healthy Pet,Herpeton Exotic Pets,Invisible Fence of Austin
6,Granger,Tomlinson's Feed & Pets,metrodog @ The Domain,Dirty Dog - South Lamar,Fetch,Fish Gallery,Gallery of Pets,Great Outdogs,Groomingdale's Of Austin,Healthy Pet,Herpeton Exotic Pets
7,Hutto,Phydeaux & Friends,metrodog @ The Domain,Herpeton Exotic Pets,Fish Gallery,Gallery of Pets,Great Outdogs,Groomingdale's Of Austin,Healthy Pet,Hollywood Feed,Dorian's Bubbles N Paws
8,Jarrell,Tomlinson's,metrodog @ The Domain,Dirty Dog - South Lamar,Fetch,Fish Gallery,Gallery of Pets,Great Outdogs,Groomingdale's Of Austin,Healthy Pet,Herpeton Exotic Pets
9,Leander,Mud Puppies - Riverside,metrodog @ The Domain,Hollywood Feed,Fish Gallery,Gallery of Pets,Great Outdogs,Groomingdale's Of Austin,Healthy Pet,Herpeton Exotic Pets,Invisible Fence of Austin


In [156]:
TW_venues_sorted.shape

(18, 11)

## The business criteria specified by the client: 'No other pet supply stores nearby'

Let's look at the frequency of Pet Stores for all the nearby cities.

These are the venues that the client does not want in abundance near the ideal store location. I've used k-means clustering to show where the greatest density of existing stores is located.

In [218]:
# Test to see which number of clusters (k) is the best option
from collections import Counter
TW_grouped_clustering = TW_grouped.drop('City', axis=1)

for i in range(1,14):
    kmeans = KMeans(init="k-means++", n_clusters=i, n_init=12).fit(TW_grouped_clustering)
    print(Counter(kmeans.labels_))

Counter({0: 18})
Counter({0: 15, 1: 3})
Counter({0: 13, 2: 3, 1: 2})
Counter({0: 12, 1: 3, 2: 2, 3: 1})
Counter({0: 11, 4: 3, 3: 2, 1: 1, 2: 1})
Counter({0: 10, 2: 3, 3: 2, 5: 1, 4: 1, 1: 1})
Counter({0: 9, 5: 3, 4: 2, 3: 1, 1: 1, 2: 1, 6: 1})
Counter({0: 8, 1: 3, 4: 2, 6: 1, 5: 1, 2: 1, 3: 1, 7: 1})
Counter({0: 7, 2: 3, 4: 2, 1: 1, 7: 1, 3: 1, 8: 1, 6: 1, 5: 1})
Counter({0: 6, 1: 3, 5: 2, 2: 1, 6: 1, 7: 1, 4: 1, 8: 1, 3: 1, 9: 1})
Counter({0: 5, 2: 3, 4: 2, 5: 1, 1: 1, 10: 1, 7: 1, 9: 1, 6: 1, 3: 1, 8: 1})
Counter({0: 4, 2: 3, 3: 2, 7: 1, 6: 1, 4: 1, 8: 1, 10: 1, 1: 1, 5: 1, 11: 1, 9: 1})
Counter({0: 3, 4: 3, 2: 2, 10: 1, 8: 1, 12: 1, 3: 1, 6: 1, 7: 1, 5: 1, 1: 1, 9: 1, 11: 1})


In [219]:
# set number of clusters to 11 based on outcome above.
kclusters = 11

TW_grouped_clustering = TW_grouped.drop('City', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(TW_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([ 8,  0,  9,  3,  2,  5,  2,  4, 10,  6], dtype=int32)

In [220]:
# Let's create a new dataframe that includes the cluster as well as the top Pet Stores for each City.
TW_merged = dfTWfinal.copy()  # copy so dfTWfinal is not modified in place

# add clustering labels
TW_merged['Cluster Labels'] = kmeans.labels_

# join TW_venues_sorted onto TW_merged to add the top venues for each city
TW_merged = TW_merged.join(TW_venues_sorted.set_index('City'), on='City')

TW_merged 

Unnamed: 0,City,Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Austin,928278,8,Austin Urban Vet Center,Hollywood Feed,Fish Gallery,Gallery of Pets,Great Outdogs,Groomingdale's Of Austin,Healthy Pet,Herpeton Exotic Pets,metrodog @ The Domain,Dorian's Bubbles N Paws
1,Cedar Park,65099,0,Healthy Pet,metrodog @ The Domain,Herpeton Exotic Pets,Fetch,Fish Gallery,Gallery of Pets,Great Outdogs,Groomingdale's Of Austin,Hollywood Feed,Midtown Groom & Board
2,Coupland,1290,9,Great Outdogs,metrodog @ The Domain,Herpeton Exotic Pets,Fetch,Fish Gallery,Gallery of Pets,Groomingdale's Of Austin,Healthy Pet,Hollywood Feed,Midtown Groom & Board
3,Del Valle,22210,3,Bark 'n' Purr,metrodog @ The Domain,Fish Gallery,Gallery of Pets,Great Outdogs,Groomingdale's Of Austin,Healthy Pet,Herpeton Exotic Pets,Hollywood Feed,Dorian's Bubbles N Paws
4,Florence,4058,2,Tomlinson's Feed & Pets,metrodog @ The Domain,Dirty Dog - South Lamar,Fetch,Fish Gallery,Gallery of Pets,Great Outdogs,Groomingdale's Of Austin,Healthy Pet,Herpeton Exotic Pets
5,Georgetown,69072,5,Paws On Chicon,metrodog @ The Domain,Hollywood Feed,Fish Gallery,Gallery of Pets,Great Outdogs,Groomingdale's Of Austin,Healthy Pet,Herpeton Exotic Pets,Invisible Fence of Austin
6,Granger,2540,2,Tomlinson's Feed & Pets,metrodog @ The Domain,Dirty Dog - South Lamar,Fetch,Fish Gallery,Gallery of Pets,Great Outdogs,Groomingdale's Of Austin,Healthy Pet,Herpeton Exotic Pets
7,Hutto,22791,4,Phydeaux & Friends,metrodog @ The Domain,Herpeton Exotic Pets,Fish Gallery,Gallery of Pets,Great Outdogs,Groomingdale's Of Austin,Healthy Pet,Hollywood Feed,Dorian's Bubbles N Paws
8,Jarrell,3870,10,Tomlinson's,metrodog @ The Domain,Dirty Dog - South Lamar,Fetch,Fish Gallery,Gallery of Pets,Great Outdogs,Groomingdale's Of Austin,Healthy Pet,Herpeton Exotic Pets
9,Leander,54068,6,Mud Puppies - Riverside,metrodog @ The Domain,Hollywood Feed,Fish Gallery,Gallery of Pets,Great Outdogs,Groomingdale's Of Austin,Healthy Pet,Herpeton Exotic Pets,Invisible Fence of Austin


In [224]:
TW_clusters = folium.Map(location=[30.492755, -97.68195], zoom_start=9.5)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map; join on City so coordinates and cluster labels stay aligned
map_data = locations.merge(TW_merged[['City', 'Cluster Labels']], on='City')
for lat, lon, city, cluster in zip(map_data['Latitude'], map_data['Longitude'], map_data['City'], map_data['Cluster Labels']):
    label = '{}-{}'.format(city, cluster)
    label = folium.Popup(label, parse_html=True) 
    folium.CircleMarker(
        [lat, lon],
        radius=10,  # fixed size: cluster ids are arbitrary, and 10*cluster hid cluster 0
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(TW_clusters)
       
TW_clusters

## Results

After clustering the data for the respective cities, we can see where existing pet stores are concentrated.

## Observations & Recommendations

Comparing the clustered locations of existing pet stores with city populations, the recommended location is one that is close to the largest cities yet on the outskirts of the areas already covered by existing pet stores: the Pflugerville area.
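
One way to make the comparison of store locations against population explicit is to rank cities by pet stores per 10,000 residents. The sketch below is a minimal illustration of that idea. The populations are copied from the tables earlier in this notebook, but the store counts are hypothetical placeholders, so the printed ranking carries no evidential weight. The function names and the 20,000 minimum-population cutoff are also assumptions.

```python
# Populations copied from the tables earlier in this notebook.
POPULATION = {
    "Austin": 928278,
    "Cedar Park": 65099,
    "Georgetown": 69072,
    "Hutto": 22791,
    "Pflugerville": 68789,
}

# HYPOTHETICAL store counts, for illustration only; replace with real per-city counts.
STORE_COUNTS = {
    "Austin": 90,
    "Cedar Park": 6,
    "Georgetown": 4,
    "Hutto": 2,
    "Pflugerville": 3,
}


def stores_per_10k(population, store_counts):
    """Existing pet stores per 10,000 residents for each city."""
    return {
        city: 10000 * store_counts.get(city, 0) / population[city]
        for city in population
    }


def rank_candidates(population, store_counts, min_population=20000, top_n=5):
    """Rank cities from least to most saturated, skipping very small markets."""
    density = stores_per_10k(population, store_counts)
    eligible = [city for city in population if population[city] >= min_population]
    # Lower density first; larger population breaks ties.
    ranked = sorted(eligible, key=lambda city: (density[city], -population[city]))
    return ranked[:top_n]


ranking = rank_candidates(POPULATION, STORE_COUNTS)
density = stores_per_10k(POPULATION, STORE_COUNTS)
for city in ranking:
    print("{}: {:.2f} stores per 10k residents".format(city, density[city]))
```

Limiting `top_n` to 5 matches the goal stated at the start of this notebook of narrowing the search to the top five locations.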

## Conclusion

Based on the location of the major city of Austin and the large nearby suburbs of Round Rock, Cedar Park and Georgetown, the recommended area to attract the most clientele for a new pet supply store is the Pflugerville area.