## Battle of The Neighborhoods Week 2

### Targeting Locations for Hair Salon Owners

### Table of Contents  

[Introduction](#introduction)  

[Data](#data)  

[Methodology / Analysis](#methodology)  

[Conclusion](#conclusion)  

[Discussion](#discussion)


### INTRODUCTION / PROBLEM<a name="introduction"></a>

WWFY, LLC is a technology firm launched in November 2018. We are dedicated to helping companies increase equity by improving operational performance, which we do by providing business insights through data analytics.

In my city of Greenville, SC, many hair salons rely on location alone to attract new walk-in clients. They open a shop on a busy street corner and hope that passers-by will notice the salon and walk in for service. When it comes to hair styles and beauty, however, salons in this city are known for catering to one specific demographic. For instance, there are salons for Caucasian, African-American, and Hispanic clients, among others. It is unfortunate, but true.

Therefore, a salon on a busy street with high foot traffic will not necessarily be successful. Locating a salon in a zip code whose residents match the demographic it serves is paramount for success.

I will utilize location data from Foursquare in combination with demographic data from the Census Bureau for Greenville, SC to target specific demographics of potential customers and provide that information to salon owners so that they can be successful.


#### PROBLEM

There are hair salons in my city that have historically relied on location to attract new walk-in customers and build their clientele.

The problem with this approach is that most salons in this city cater to specific demographics, so simply having a salon in a high-foot-traffic area may not generate many walk-in clients, which can leave the owner with a money-losing operation.
 
I will utilize location data from Foursquare in combination with other demographic data to help the salon owner make a more informed decision on where to locate her shop.

### DATA<a name="data"></a>

The data used in this effort combines the zip codes of Greenville, SC with Foursquare venue data.
 
The Census Bureau provides demographic data for specific zip codes. In this particular southern-state city, people of similar demographics generally patronize the same salons. 

As such, I will use a K-means algorithm to segment (cluster) those zip codes, since I expect each zip code to be closely correlated with the demographics of the people living in it.

This segmentation will help salon owners choose the ideal location for their salons, attracting walk-in customers who fit their demographic and increasing their clientele.

Examples of data that will be included in the final dataset:

- Zip codes in the city
- Foursquare venue data from a search query for hair salons or similar venues
- Geospatial data from geopy, used to generate maps that show the segmentation and clustering
- The data segmented by certain cuisines


Import libraries to be used

```python
import pandas as pd
import numpy as np
```

Load the zip code and neighborhood data from a CSV file into a pandas dataframe and print the first 5 rows.

```python
df = pd.read_csv('greenville_zip_codes.csv')
df.head()
```

| | zip | burough |
|---|---|---|
| 0 | 29617 | berea |
| 1 | 29635 | cleveland |
| 2 | 29644 | fountain_inn |
| 3 | 29605 | gantt |
| 4 | 29607 | greenville 1 |


Change the column heading from burough to neighborhood

```python
df.rename(columns={'burough': 'neighborhood'}, inplace=True)
df.head()
```

| | zip | neighborhood |
|---|---|---|
| 0 | 29617 | berea |
| 1 | 29635 | cleveland |
| 2 | 29644 | fountain_inn |
| 3 | 29605 | gantt |
| 4 | 29607 | greenville 1 |


Checking the size of my dataframe...

```python
df.shape
```

```
(21, 2)
```

Getting geographical coordinates of the zip codes

```python
from geopy.geocoders import Nominatim

nom = Nominatim(user_agent="my-locator")
```

Trying the method to see if it works.

```python
location = nom.geocode("29617")
print(location.latitude)
print(location.longitude)
```

```
34.8954139393294
-82.4470594531725
```


Success!

For clarity, I will modify my dataframe by creating a column, "coordinates", to hold the geospatial values.

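```python
# one geocoding request per zip code; Nominatim's usage policy allows at most
# one request per second, so larger lists may need geopy's RateLimiter
df['coordinates'] = df['zip'].astype(str).apply(nom.geocode)

df.head()
```

| | zip | burough | coordinates |
|---|---|---|---|
| 0 | 29617 | berea | (Berea, Greenville County, South Carolina, 296... |
| 1 | 29635 | cleveland | (Pickens County, South Carolina, 29635, United... |
| 2 | 29644 | fountain_inn | (Fountain Inn, Greenville County, South Caroli... |
| 3 | 29605 | gantt | (Gantt, Greenville County, South Carolina, 296... |
| 4 | 29607 | greenville 1 | (Greenville, Greenville County, South Carolina... |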


Now we have to extract the latitude and longitude values with a lambda expression. The conditional inside the lambda handles any row where the geocoder returned None.

```python
df['latitude'] = df.coordinates.apply(lambda x: x.latitude if x is not None else None)
df['longitude'] = df.coordinates.apply(lambda x: x.longitude if x is not None else None)
df
```

| | zip | burough | coordinates | latitude | longitude |
|---|---|---|---|---|---|
| 0 | 29617 | berea | (Berea, Greenville County, South Carolina, 296... | 34.895414 | -82.447059 |
| 1 | 29635 | cleveland | (Pickens County, South Carolina, 29635, United... | 35.069949 | -82.600574 |
| 2 | 29644 | fountain_inn | (Fountain Inn, Greenville County, South Caroli... | 34.687266 | -82.218567 |
| 3 | 29605 | gantt | (Gantt, Greenville County, South Carolina, 296... | 34.799591 | -82.394795 |
| 4 | 29607 | greenville 1 | (Greenville, Greenville County, South Carolina... | 34.828555 | -82.331346 |
| 5 | 29609 | greenville 2 | (Paris Point, Greenville County, South Carolin... | 34.940663 | -82.408719 |
| 6 | 29613 | greenville 3 | (Greenville County, South Carolina, 29613, Uni... | 34.939586 | -82.435416 |
| 7 | 29614 | greenville 4 | (Soltau, Heidekreis, Niedersachsen, 29614, Deu... | 52.981371 | 9.815011 |
| 8 | 29601 | greenville 5 | (Greenville, Greenville County, South Carolina... | 34.844651 | -82.400664 |
| 9 | 29650 | greer 1 | (Greer, Greenville County, South Carolina, 296... | 34.89817 | -82.256502 |


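Inspecting the dataframe shows that the geocoder had problems with some of the zip codes; for example, 29614 was matched to Soltau, Germany. Since the Greenville-area zip codes all have longitudes near -82, I will drop rows with longitude values of -80 or greater.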

```python
df.drop(df[df['longitude'] >= -80].index, inplace=True)

df.head()
```

| | zip | burough | coordinates | latitude | longitude |
|---|---|---|---|---|---|
| 0 | 29617 | berea | (Berea, Greenville County, South Carolina, 296... | 34.895414 | -82.447059 |
| 1 | 29635 | cleveland | (Pickens County, South Carolina, 29635, United... | 35.069949 | -82.600574 |
| 2 | 29644 | fountain_inn | (Fountain Inn, Greenville County, South Caroli... | 34.687266 | -82.218567 |
| 3 | 29605 | gantt | (Gantt, Greenville County, South Carolina, 296... | 34.799591 | -82.394795 |
| 4 | 29607 | greenville 1 | (Greenville, Greenville County, South Carolina... | 34.828555 | -82.331346 |
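The `>= -80` threshold removes the German match for 29614, but it would keep any mis-geocoded point that happens to land elsewhere in the western hemisphere. A slightly stricter option is a bounding box around the Greenville area. The sketch below runs on made-up sample rows, and the box edges are my own rough assumption, chosen so that every Greenville-area row shown above falls inside; they are not official county limits. Rows whose geocoding returned None end up with missing coordinates, and `Series.between` treats those as out of bounds, so they are dropped as well.

```python
import numpy as np
import pandas as pd

# Rough bounds around the Greenville, SC area (an assumption, not official county limits)
LAT_MIN, LAT_MAX = 34.4, 35.3
LON_MIN, LON_MAX = -82.8, -82.0


def keep_in_bounds(frame):
    """Return only the rows whose latitude and longitude both fall inside the box.

    NaN coordinates (from failed lookups) compare as False, so those rows are dropped too.
    """
    in_lat = frame["latitude"].between(LAT_MIN, LAT_MAX)
    in_lon = frame["longitude"].between(LON_MIN, LON_MAX)
    return frame[in_lat & in_lon]


# Sample rows: a good match, the German match for 29614, and a hypothetical failed lookup
sample = pd.DataFrame({
    "zip": [29617, 29614, 29999],
    "latitude": [34.895414, 52.981371, np.nan],
    "longitude": [-82.447059, 9.815011, np.nan],
})
print(keep_in_bounds(sample))
```

In the notebook this would replace the `drop` call with `df = keep_in_bounds(df)`.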


Drop column named "coordinates"

```python
df2 = df.drop(['coordinates'], axis=1)
df2.head()
```

| | zip | burough | latitude | longitude |
|---|---|---|---|---|
| 0 | 29617 | berea | 34.895414 | -82.447059 |
| 1 | 29635 | cleveland | 35.069949 | -82.600574 |
| 2 | 29644 | fountain_inn | 34.687266 | -82.218567 |
| 3 | 29605 | gantt | 34.799591 | -82.394795 |
| 4 | 29607 | greenville 1 | 34.828555 | -82.331346 |


Change column name "burough" to "neighborhood"

```python
df2.rename(columns={'burough': 'neighborhood'}, inplace=True)
df2.head()
```

| | zip | neighborhood | latitude | longitude |
|---|---|---|---|---|
| 0 | 29617 | berea | 34.895414 | -82.447059 |
| 1 | 29635 | cleveland | 35.069949 | -82.600574 |
| 2 | 29644 | fountain_inn | 34.687266 | -82.218567 |
| 3 | 29605 | gantt | 34.799591 | -82.394795 |
| 4 | 29607 | greenville 1 | 34.828555 | -82.331346 |


Now that we have a cleaned dataframe, I'll check its size and then move along.

```python
df2.shape
```

```
(17, 4)
```

Set pandas display options so that all columns and rows are shown, then import the matplotlib color libraries used for the maps.

```python
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import matplotlib.cm as cm
import matplotlib.colors as colors
```

Confirm that geopy is installed (Nominatim was already imported above).

```python
from geopy.geocoders import Nominatim
print('geopy installed')
```

```
geopy installed
```


Since the data has no labels, I need an algorithm that groups similar zip codes together and assigns each group a label. K-Means, an unsupervised clustering algorithm, is a common choice for this, so I will import it now along with folium for mapping.

```python
# import k-means from the scikit-learn clustering module
from sklearn.cluster import KMeans

import folium  # map rendering library

print('Libraries imported.')
```

```
Libraries imported.
```


Get the geographical coordinates of Greenville, SC to use as the center of the map.

```python
address = 'Greenville, SC'

geolocator = Nominatim(user_agent="gville_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Greenville, SC are {}, {}.'.format(latitude, longitude))
```

```
The geographical coordinates of Greenville, SC are 34.851354, -82.3984882.
```


Now, creating a map of Greenville County

```python
# create map of Greenville using latitude and longitude values
map_green = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df2['latitude'], df2['longitude'], df2['neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_green)

map_green
```

### Foursquare API

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

I have purposely hidden my credentials.

```python
#CLIENT_ID = 'O53FBKP4LDDTEE4EORKYZFZQF********************' # your Foursquare ID
#CLIENT_SECRET = 'U3WFZMYEPOVH4V4DKIZWXP**********************' # your Foursquare Secret
#VERSION = '20180605' # Foursquare API version

#print('My credentials:')
#print('CLIENT_ID: ' + CLIENT_ID)
#print('CLIENT_SECRET:' + CLIENT_SECRET)
```

### Explore Neighborhoods in Greenville County

But first, let's analyze just the first neighborhood.

```python
neighborhood_name = df2.loc[0, 'neighborhood']
print(neighborhood_name)
```

```
berea
```


Print the neighborhood's coordinates.

```python
neighborhood_latitude = df2.loc[0, 'latitude']  # neighborhood latitude value
neighborhood_longitude = df2.loc[0, 'longitude']  # neighborhood longitude value

neighborhood_name = df2.loc[0, 'neighborhood']  # neighborhood name

print('latitude and longitude values of {} are: {}, {}.'.format(neighborhood_name,
                                                               neighborhood_latitude,
                                                               neighborhood_longitude))
```

```
latitude and longitude values of berea are: 34.8954139393294, -82.4470594531725.
```


Restaurants in this rural southern city are few and far between. Therefore, I will search a wide radius around this neighborhood: 13,000 meters (roughly 8 miles), requesting up to 150 venues.
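Foursquare's `radius` parameter is given in meters, so it helps to check the conversion. The sketch below uses the exact definition of the international mile (1 mile = 1609.344 m); the helper names are my own. It confirms that the 13,000 m radius used in the next cell is about 8 miles, and also shows that the default `radius=1600` in `getNearbyVenues` further down covers only about 1 mile.

```python
METERS_PER_MILE = 1609.344  # the international mile is defined as exactly 1609.344 m


def miles_to_meters(miles):
    return miles * METERS_PER_MILE


def meters_to_miles(meters):
    return meters / METERS_PER_MILE


print(round(miles_to_meters(8)))          # 12875
print(round(meters_to_miles(13000), 2))   # 8.08
print(round(meters_to_miles(1600), 2))    # 0.99
```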

Searching for restaurants in Berea. Once again, I have hidden my API credentials.

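```python
radius = 13000  # search radius in meters (about 8 miles)
LIMIT = 150
query = 'restaurant'
# the query needs its own {} in the URL template; str.format() silently ignores extra arguments
#urlA = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}&query={}'.format(CLIENT_ID, CLIENT_SECRET, neighborhood_latitude, neighborhood_longitude, VERSION, radius, LIMIT, query)
#urlA
```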
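Because `str.format()` ignores surplus arguments, a URL template with a missing placeholder fails silently. One way to rule that out is to build the query string from a dictionary with the standard library's `urllib.parse.urlencode`, which emits every key it is given. The helper name `build_search_url` and the placeholder credentials below are hypothetical; the parameter names mirror the ones in the URL above.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_search_url(client_id, client_secret, version, lat, lng, radius, limit, query):
    """Build a Foursquare v2 venues/search URL from named parameters (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
        "query": query,
    }
    # urlencode() percent-encodes every key/value pair, so no parameter can be dropped
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)


# Placeholder credentials for illustration only
url = build_search_url("MY_ID", "MY_SECRET", "20180605",
                       34.8954139393294, -82.4470594531725, 13000, 150, "restaurant")
print(parse_qs(urlparse(url).query)["query"])
```

If you prefer, `requests.get(base_url, params=params)` performs the same encoding when you pass the dictionary to requests directly.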

I will now import the modules that send the request to Foursquare and turn the JSON it returns into a dataframe.

```python
import requests
from pandas import json_normalize  # transform JSON into a pandas dataframe
```

Let's send the GET request, then examine the results

```python
resultsA = requests.get(urlA).json()
resultsA
```

```
{'meta': {'code': 200, 'requestId': '5dde8c36542890001b5f7e63'},
 'response': {'venues': [{'id': '4c22828f99282d7fef3d67b0',
    'name': "Li'l Cricket / Marathon",
    'location': {'address': '400 Sulphur Springs Rd',
     'crossStreet': 'at Watkins Rd.',
     'lat': 34.893800663228326,
     'lng': -82.44535714387894,
     'labeledLatLngs': [{'label': 'display',
       'lat': 34.893800663228326,
       'lng': -82.44535714387894}],
     'distance': 237,
     'postalCode': '29617',
     'cc': 'US',
     'city': 'Greenville',
     'state': 'SC',
     'country': 'United States',
     'formattedAddress': ['400 Sulphur Springs Rd (at Watkins Rd.)',
      'Greenville, SC 29617',
      'United States']},
    'categories': [{'id': '4bf58dd8d48988d113951735',
      'name': 'Gas Station',
      'pluralName': 'Gas Stations',
      'shortName': 'Gas Station',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/gas_',
       'suffix': '.png'},
      'primary': True}],
    'referra
```

Cleaning up the dataframe and renaming it to "newdf"

```python
rests = resultsA['response']['venues']

nearby_rests = json_normalize(rests)  # flatten JSON

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in nearby_rests.columns if col.startswith('location.')] + ['id']
nearby_rests_filtered = nearby_rests.loc[:, filtered_columns].copy()

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
nearby_rests_filtered['categories'] = nearby_rests_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
nearby_rests_filtered.columns = [column.split('.')[-1] for column in nearby_rests_filtered.columns]

# drop columns we don't need (errors='ignore' skips any column absent from this response)
newdf = nearby_rests_filtered.drop(['labeledLatLngs', 'distance', 'cc', 'city', 'state', 'country', 'formattedAddress', 'crossStreet'], axis=1, errors='ignore')
newdf.head()
```

| | name | categories | address | lat | lng | postalCode | id |
|---|---|---|---|---|---|---|---|
| 0 | Li'l Cricket / Marathon | Gas Station | 400 Sulphur Springs Rd | 34.893801 | -82.445357 | 29617 | 4c22828f99282d7fef3d67b0 |
| 1 | Goodwill | Thrift / Vintage Store | 412 Sulphur Springs Rd #B | 34.892352 | -82.450551 | 29617 | 4c9a228578ffa09396bb6975 |
| 2 | The Hood | Flea Market | | 34.895046 | -82.444686 | 29617 | 50b3f329e4b04c369f49abb6 |
| 3 | Dollar Tree | Discount Store | 1950 Cedar Lane Rd | 34.895138 | -82.448425 | 29617 | 5dbb216d0a01bf0007d39349 |
| 4 | Swamp Rabbit Station | Other Great Outdoors | | 34.887607 | -82.442596 | 29617 | 5044b4b0e4b029b24a99e65a |


```python
newdf.shape
```

```
(145, 7)
```

The explore endpoint returns its venues under the `items` key (inside `response` → `groups`). I will create a function that collects the name, coordinates, and category of the nearby venues for every zip code in Greenville County.

```python
def getNearbyVenues(names, latitudes, longitudes, radius=1600):

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL (LIMIT is the global value set earlier)
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        # (a venue with an empty categories list gets None instead of raising IndexError)
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['zip',
                  'neighborhood latitude',
                  'neighborhood longitude',
                  'venue',
                  'venue latitude',
                  'venue longitude',
                  'venue category']

    return(nearby_venues)
```
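To make the `response` → `groups[0]` → `items` → `venue` path concrete, here is a minimal sketch on a tiny, made-up response that mimics only the fields the function above reads; real Foursquare responses contain many more keys. It flattens the items with `pd.json_normalize`, the same flattening step used on the search results earlier, and takes the first category name while tolerating a venue that has none.

```python
import pandas as pd

# A made-up response with the same nesting that getNearbyVenues reads
fake_response = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Sample Diner",
                           "location": {"lat": 34.89, "lng": -82.44},
                           "categories": [{"name": "Diner"}]}},
                {"venue": {"name": "Unlabeled Spot",
                           "location": {"lat": 34.90, "lng": -82.45},
                           "categories": []}},
            ]
        }]
    }
}

items = fake_response["response"]["groups"][0]["items"]

# json_normalize turns nested dicts into dotted column names such as 'venue.location.lat';
# lists (like 'venue.categories') stay as Python lists inside a single column
flat = pd.json_normalize(items)

# First category name, or None when a venue has no categories
flat["venue category"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)
print(flat[["venue.name", "venue.location.lat", "venue.location.lng", "venue category"]])
```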

Run the function on each zip code and store the results in a new dataframe called green_venues.

```python
green_venues = getNearbyVenues(names=df2['zip'],
                               latitudes=df2['latitude'],
                               longitudes=df2['longitude']
                              )
```

```
29617
29635
29644
29605
29607
29609
29613
29601
29650
29651
29661
29662
29611
29673
29681
29687
29615
```


```python
green_venues.shape
```

```
(300, 7)
```

Printing the first few rows

```python
green_venues.head()
```

| | zip | neighborhood latitude | neighborhood longitude | venue | venue latitude | venue longitude | venue category |
|---|---|---|---|---|---|---|---|
| 0 | 29617 | 34.895414 | -82.447059 | Celebrity's Hot Dogs | 34.887423 | -82.456889 | Hot Dog Joint |
| 1 | 29617 | 34.895414 | -82.447059 | Tomato Vine | 34.896697 | -82.432131 | Farmers Market |
| 2 | 29617 | 34.895414 | -82.447059 | CVS pharmacy | 34.894302 | -82.431767 | Pharmacy |
| 3 | 29617 | 34.895414 | -82.447059 | Goodwill | 34.892352 | -82.450551 | Thrift / Vintage Store |
| 4 | 29617 | 34.895414 | -82.447059 | Subway | 34.882473 | -82.454265 | Sandwich Place |


### Methodology / Analysis<a name="methodology"></a>

Some more analysis to see how many venues are returned for each neighborhood.

```python
green_venues.groupby('zip').count()
```

| zip | neighborhood latitude | neighborhood longitude | venue | venue latitude | venue longitude | venue category |
|---|---|---|---|---|---|---|
| 29601 | 100 | 100 | 100 | 100 | 100 | 100 |
| 29605 | 9 | 9 | 9 | 9 | 9 | 9 |
| 29607 | 43 | 43 | 43 | 43 | 43 | 43 |
| 29609 | 5 | 5 | 5 | 5 | 5 | 5 |
| 29611 | 21 | 21 | 21 | 21 | 21 | 21 |
| 29613 | 7 | 7 | 7 | 7 | 7 | 7 |
| 29615 | 7 | 7 | 7 | 7 | 7 | 7 |
| 29617 | 13 | 13 | 13 | 13 | 13 | 13 |
| 29635 | 2 | 2 | 2 | 2 | 2 | 2 |
| 29644 | 8 | 8 | 8 | 8 | 8 | 8 |
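`groupby('zip').count()` repeats the same number in every column, because each column counts its own non-null values within the group. When only the number of venues per zip code is needed, `value_counts()` returns a single Series instead. The rows below are hypothetical stand-ins for `green_venues`.

```python
import pandas as pd

# Hypothetical rows standing in for green_venues
venues = pd.DataFrame({
    "zip": [29617, 29617, 29635, 29617, 29635, 29605],
    "venue": ["A", "B", "C", "D", "E", "F"],
})

# One count per zip code, sorted from most to fewest venues
per_zip = venues["zip"].value_counts()
print(per_zip)
```

`venues.groupby('zip').size()` gives the same counts, ordered by zip code rather than by count.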


Let's have more fun and dig deeper to know how many unique categories of venues there are.

```python
print('Number of unique categories are: ', green_venues['venue category'].nunique())
```

```
Number of unique categories are:  135
```


#### Analysis of Restaurants in Each Neighborhood

Use one-hot encoding to change categorical values to numeric values for further processing.

```python
# one hot encoding. If venue is present, then 1 - else, 0
# (dtype=int keeps 0/1 values; recent pandas versions default to True/False)
green_onehot = pd.get_dummies(green_venues[['venue category']], prefix="", prefix_sep="", dtype=int)

# add zip column back to dataframe
green_onehot['zip'] = green_venues['zip']

# move zip column to the first column
fixed_columns = [green_onehot.columns[-1]] + list(green_onehot.columns[:-1])
green_onehot = green_onehot[fixed_columns]

green_onehot
```

Unnamed: 0,zip,ATM,Accessories Store,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Auto Dealership,BBQ Joint,Bagel Shop,Bank,Bar,Baseball Field,Baseball Stadium,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Shop,Bistro,Boat or Ferry,Bookstore,Breakfast Spot,Brewery,Bridge,Burger Joint,Café,Caribbean Restaurant,Chinese Restaurant,Cocktail Bar,Coffee Shop,College Cafeteria,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Dance Studio,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Gastropub,General Entertainment,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Hobby Shop,Hockey Arena,Home Service,Hot Dog Joint,Hotel,Hunting Supply,Ice Cream Shop,Italian Restaurant,Japanese Restaurant,Lawyer,Liquor Store,Lounge,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Miscellaneous Shop,Mobile Phone Shop,Movie Theater,Museum,Music Store,Music Venue,New American Restaurant,Optical Shop,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Pool Hall,Pub,Rental Car Location,Resort,Restaurant,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shipping Store,Shoe Store,Skating Rink,Smoothie Shop,Soup Place,Southern / Soul Food Restaurant,Spa,Sporting Goods Shop,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Video Game Store,Video Store,Women's Store,Yoga Studio,Zoo
0,29617,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,29617,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,29617,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,29617,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
4,29617,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,29617,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,29617,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,29617,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,29617,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,29617,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


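Some of these are not restaurants at all: auto dealerships, banks, and so on, because the explore request returns every type of venue. I'll check how many auto dealerships there really are.

```python
# count() returns the number of non-null entries in the column
green_onehot['Auto Dealership'].count()
```

```
300
```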
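The 300 above is the number of rows in `green_onehot`, not the number of auto dealerships: `Series.count()` counts non-null entries, and every row of a one-hot column holds either a 0 or a 1. Summing the column counts the 1s instead, as this toy series (hypothetical values) shows.

```python
import pandas as pd

# A hypothetical one-hot column: 1 where a venue is in the category, else 0
auto_dealership = pd.Series([0, 1, 0, 0, 1])

print(auto_dealership.count())  # 5: non-null entries, i.e. every row
print(auto_dealership.sum())    # 2: rows that actually are in the category
```

In the notebook, `green_onehot['Auto Dealership'].sum()` would give the actual number of auto dealerships.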

```python
green_onehot.shape
```

```
(300, 129)
```

Next, let's group the rows by zip code. Since the venue category column is categorical, I converted it to numeric columns with one-hot encoding (pandas `get_dummies`) above. Taking the mean of each one-hot column within a zip code gives the frequency of that venue category there, which serves as a measure of the most popular venue types in that area.

```python
green_grouped = green_onehot.groupby('zip').mean().reset_index()
green_grouped.head()
```

Unnamed: 0,zip,ATM,Accessories Store,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Auto Dealership,BBQ Joint,Bagel Shop,Bank,Bar,Baseball Field,Baseball Stadium,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Shop,Bistro,Boat or Ferry,Bookstore,Breakfast Spot,Brewery,Bridge,Burger Joint,Café,Caribbean Restaurant,Chinese Restaurant,Cocktail Bar,Coffee Shop,College Cafeteria,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Dance Studio,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Gastropub,General Entertainment,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Hobby Shop,Hockey Arena,Home Service,Hot Dog Joint,Hotel,Hunting Supply,Ice Cream Shop,Italian Restaurant,Japanese Restaurant,Lawyer,Liquor Store,Lounge,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Miscellaneous Shop,Mobile Phone Shop,Movie Theater,Museum,Music Store,Music Venue,New American Restaurant,Optical Shop,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Pool Hall,Pub,Rental Car Location,Resort,Restaurant,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shipping Store,Shoe Store,Skating Rink,Smoothie Shop,Soup Place,Southern / Soul Food Restaurant,Spa,Sporting Goods Shop,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Video Game Store,Video Store,Women's Store,Yoga Studio,Zoo
0,29601,0.0,0.0,0.06,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.03,0.0,0.01,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.02,0.03,0.01,0.01,0.02,0.01,0.0,0.01,0.05,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.06,0.0,0.02,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.04,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.02,0.02,0.0,0.0,0.05,0.01,0.02,0.01,0.0,0.03,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.01
1,29605,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,29607,0.023256,0.0,0.0,0.023256,0.0,0.023256,0.023256,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.046512,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.046512,0.0,0.0,0.046512,0.0,0.023256,0.0,0.0,0.046512,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.023256,0.0,0.023256,0.0,0.023256,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.023256,0.0,0.023256,0.0,0.0,0.0,0.023256,0.0,0.0,0.023256,0.023256,0.0,0.0,0.023256,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.046512,0.023256,0.023256,0.023256,0.046512,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.023256,0.0,0.023256,0.0,0.0
3,29609,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.6,0.0,0.0,0.0,0.0,0.0
4,29611,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0
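Why does the mean of a one-hot column measure frequency? Each column holds 1 for venues in that category and 0 otherwise, so its mean within a zip code is the share of that zip code's venues in the category. A toy example with two hypothetical zip codes and made-up venues:

```python
import pandas as pd

# Made-up venues for two hypothetical zip codes (not Foursquare output)
venues = pd.DataFrame({
    "zip": [1, 1, 1, 1, 2, 2],
    "venue category": ["Pizza Place", "Pizza Place", "Bank", "Café", "Bank", "Trail"],
})

# One 0/1 column per category; each row has a single 1 in its own category
onehot = pd.get_dummies(venues["venue category"], dtype=int)
onehot.insert(0, "zip", venues["zip"])

# The mean of a 0/1 column within a group is that category's share of the group's venues
freq = onehot.groupby("zip").mean()
print(freq)
```

In zip 1, two of its four venues are pizza places, so the Pizza Place column averages 0.5, and each row of frequencies sums to 1.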


Grouping by zip code gives a much smaller dataframe to process, with one row per zip code. Here is its size:

```python
green_grouped.shape
```

```
(17, 136)
```

Next, I'll create a new dataframe with the 10 most common venue categories in each zip code for analysis.

First, a helper function that sorts a row's category frequencies in descending order and returns the top names:

```python
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]
```

```python
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['zip']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhood_venues_sorted = pd.DataFrame(columns=columns)
neighborhood_venues_sorted['zip'] = green_grouped['zip']

for ind in np.arange(green_grouped.shape[0]):
    neighborhood_venues_sorted.iloc[ind, 1:] = return_most_common_venues(green_grouped.iloc[ind, :], num_top_venues)

neighborhood_venues_sorted.head()
```

| | zip | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29601 | American Restaurant | Hotel | Coffee Shop | Sushi Restaurant | Pizza Place | Trail | Bar | Theater | Brewery | Southern / Soul Food Restaurant |
| 1 | 29605 | Hotel | Gym / Fitness Center | Discount Store | Pizza Place | American Restaurant | Miscellaneous Shop | Golf Course | Thrift / Vintage Store | Farmers Market | Farm |
| 2 | 29607 | Rental Car Location | Fast Food Restaurant | Bookstore | Discount Store | Department Store | Sandwich Place | ATM | Coffee Shop | Restaurant | Resort |
| 3 | 29609 | Trail | Home Service | Golf Course | Zoo | Discount Store | Fast Food Restaurant | Farmers Market | Farm | Electronics Store | Donut Shop |
| 4 | 29611 | Grocery Store | Fast Food Restaurant | Convenience Store | Discount Store | Pharmacy | American Restaurant | Fried Chicken Joint | Sandwich Place | Food Court | Mexican Restaurant |
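One caveat with this table: zip code 29609 has only three categories with a non-zero frequency (Trail 0.6, Home Service 0.2, Golf Course 0.2 in the grouped data above), so its 4th through 10th "most common venues" (Zoo, Discount Store, and so on) are zero-frequency categories listed in arbitrary tie order. A minimal sketch of an alternative helper, `top_categories` (my own hypothetical name), keeps only non-zero categories and pads the remaining slots with None:

```python
import pandas as pd


def top_categories(freq_row, n=10):
    """Return up to n category names with a non-zero frequency, most frequent first.

    Slots beyond the last non-zero category are filled with None rather than
    with zero-frequency categories.
    """
    nonzero = freq_row[freq_row > 0].sort_values(ascending=False, kind="mergesort")
    names = list(nonzero.index[:n])
    return names + [None] * (n - len(names))


# Frequencies from the 29609 row above (its other categories are all 0.0)
row_29609 = pd.Series({"Trail": 0.6, "Home Service": 0.2, "Golf Course": 0.2,
                       "Zoo": 0.0, "Discount Store": 0.0})
print(top_categories(row_29609, n=5))
```

In the loop above it could be called as `top_categories(green_grouped.iloc[ind, 1:], num_top_venues)`, skipping the zip column just as `return_most_common_venues` does.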


### Results

#### Finally, we label / cluster our neighborhoods

I will segment the zip codes into 4 clusters to get more separation for cultural analysis. I want my results to separate out the most popular cuisine in a particular zip code or neighborhood.

The data has no predefined labels, and I do not currently have a method of grouping the restaurants by cuisine. Therefore, I need a machine learning algorithm to create the labels. K-Means is an unsupervised clustering algorithm that assigns each zip code a cluster label, and I first ran it with 3 clusters.

However, three clusters did not provide enough separation of the restaurants for analysis. Most of the restaurant cuisines were clustered in the same vicinity, which did not provide enough separation to determine the type of demographic that patronized them in each area.

Therefore, I used a k-value of 4, which produced four clusters labeled 0 to 3 (red, green, yellow, purple), making it easier to determine the most popular restaurant cuisine in a particular zip code. See figure 5, Neighborhood Clusters map.
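Choosing k by inspecting the map works, but a common, more systematic check is the silhouette score, which compares how close each point is to its own cluster versus the nearest other cluster (higher is better, on a scale from -1 to 1). The sketch below runs it on synthetic data with three well-separated groups, so the best score should land at k = 3; on the 17 sparse zip-code rows the scores may be far less decisive, so treat them as one input alongside the map. `n_init=10` is set explicitly because recent scikit-learn versions changed its default.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic 2-D data with three well-separated groups (a stand-in for green_grouped_clustering)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # mean silhouette over all points for this choice of k
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores, best_k)
```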


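```python
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 4

# keep only the venue-frequency columns as features (drop the zip identifier);
# drop(columns=...) replaces the positional axis argument removed in pandas 2.0
green_grouped_clustering = green_grouped.drop(columns='zip')

# run k-means clustering; n_init=10 pins the number of restarts to the
# pre-1.4 scikit-learn default so results stay comparable across versions
kmeans = KMeans(n_clusters=kclusters, random_state=3, n_init=10).fit(green_grouped_clustering)

# check cluster labels generated for each row in the dataframe
clusters_ = kmeans.labels_.astype(int)
clusters_
```

```
array([3, 0, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 1, 3, 0, 3, 3])
```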
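To judge how much separation a given k actually achieves, count how many zip codes land in each cluster. The sketch below uses only the standard library and copies the label array printed above. In the notebook itself, `pd.Series(clusters_).value_counts()` would give the same counts.

```python
from collections import Counter

# Cluster labels copied from the k-means output above, one per zip code.
clusters_ = [3, 0, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 1, 3, 0, 3, 3]

# Tally how many zip codes received each label.
sizes = Counter(clusters_)
for label in sorted(sizes):
    print(f"Cluster {label}: {sizes[label]} zip code(s)")
```

For the labels above, this prints 2, 1, 2 and 12 zip codes for clusters 0 to 3, so cluster 3 holds most of the zip codes at k = 4.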

Next, create a new dataframe that includes the cluster label as well as the top 10 venues for each neighborhood.

```python
# add clustering labels (re-running this cell raises a ValueError
# because the 'Cluster Labels' column already exists)
neighborhood_venues_sorted.insert(0, 'Cluster Labels', clusters_)

# merge neighborhood_venues_sorted with df2 to add latitude/longitude for each neighborhood
green_merged = df2.join(neighborhood_venues_sorted.set_index('zip'), on='zip')
green_merged.head()
```

|  | zip | neighborhood | latitude | longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29617 | berea | 34.895414 | -82.447059 | 3 | Fast Food Restaurant | Convenience Store | Chinese Restaurant | Pharmacy | Sandwich Place | Discount Store | Diner | Gas Station | Thrift / Vintage Store | Arts & Crafts Store |
| 1 | 29635 | cleveland | 35.069949 | -82.600574 | 2 | Trail | Park | Zoo | Discount Store | Fast Food Restaurant | Farmers Market | Farm | Electronics Store | Donut Shop | Dessert Shop |
| 2 | 29644 | fountain_inn | 34.687266 | -82.218567 | 3 | Convenience Store | Rental Car Location | Pizza Place | Breakfast Spot | Baseball Field | Discount Store | Fast Food Restaurant | Grocery Store | Department Store | Dessert Shop |
| 3 | 29605 | gantt | 34.799591 | -82.394795 | 0 | Hotel | Gym / Fitness Center | Discount Store | Pizza Place | American Restaurant | Miscellaneous Shop | Golf Course | Thrift / Vintage Store | Farmers Market | Farm |
| 4 | 29607 | greenville 1 | 34.828555 | -82.331346 | 3 | Rental Car Location | Fast Food Restaurant | Bookstore | Discount Store | Department Store | Sandwich Place | ATM | Coffee Shop | Restaurant | Resort |
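`DataFrame.join(..., on='zip')` is a left join by default. Every row of `df2` is kept, and a zip code with no row in `neighborhood_venues_sorted` would get `NaN` in `Cluster Labels`. That would turn the column into floats and break the integer palette lookup in the map cell. The dictionary-based sketch below mirrors the left-join behavior in plain Python. The `"00000"` row is hypothetical and only there to show an unmatched zip code; the other values come from the table above.

```python
# (zip, neighborhood) rows taken from the df2 output above, plus one hypothetical
# zip code ("00000") with no venue data, to show what an unmatched row looks like.
neighborhoods = [
    ("29617", "berea"),
    ("29635", "cleveland"),
    ("29605", "gantt"),
    ("00000", "hypothetical"),
]

# zip -> cluster label, as produced by k-means (values from the table above).
cluster_by_zip = {"29617": 3, "29635": 2, "29605": 0}

# Left join on zip: every neighborhood is kept; an unmatched zip gets None (NaN in pandas).
merged = [(zip_code, name, cluster_by_zip.get(zip_code)) for zip_code, name in neighborhoods]

# Rows without a label cannot be colored by cluster, so drop them before mapping.
mappable = [row for row in merged if row[2] is not None]
print(merged)
print(mappable)
```

In pandas, the corresponding clean-up before mapping would be `green_merged.dropna(subset=['Cluster Labels']).astype({'Cluster Labels': int})`.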


Let's map our clusters

```python
import folium
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map centered on Greenville (latitude/longitude are defined earlier in the notebook)
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(green_merged['latitude'], green_merged['longitude'],
                                  green_merged['neighborhood'], green_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    # rainbow[cluster - 1] shifts the palette by one (cluster 0 uses the last color);
    # kept as in the original run so the colors match Figure 5
    color = rainbow[int(cluster) - 1]
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=color,
        fill=True,
        fill_color=color,
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
```


#### Examine Our Clusters

Select clusters according to the restaurants' main cuisine or name, and make a judgment about the culture of their patrons in that zip code or area. This judgment indicates the demographic of potential customers in each zip code.
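Naming a cluster's demographic is a judgment call. The evidence behind it can be made repeatable by tallying which venue categories appear most often across a cluster's zip codes. The sketch below is a hypothetical helper, not part of the original notebook. To stay short, it copies only the first five venues of the cluster 0 and cluster 2 zip codes from the tables that follow.

```python
from collections import Counter

# First five venues per zip code, copied from the cluster tables in this section.
top_venues = {
    "29605": ["Hotel", "Gym / Fitness Center", "Discount Store", "Pizza Place", "American Restaurant"],
    "29673": ["Pizza Place", "Discount Store", "American Restaurant", "Farm", "Electronics Store"],
    "29635": ["Trail", "Park", "Zoo", "Discount Store", "Fast Food Restaurant"],
    "29609": ["Trail", "Home Service", "Golf Course", "Zoo", "Discount Store"],
}
cluster_by_zip = {"29605": 0, "29673": 0, "29635": 2, "29609": 2}


def venue_tally(cluster):
    """Count how often each venue category appears among the zip codes of one cluster."""
    counts = Counter()
    for zip_code, label in cluster_by_zip.items():
        if label == cluster:
            counts.update(top_venues[zip_code])
    return counts


for cluster in sorted(set(cluster_by_zip.values())):
    print(cluster, venue_tally(cluster).most_common(3))
```

On this subset, "Pizza Place" appears in both cluster 0 zip codes and "Trail" in both cluster 2 zip codes. A reader can weigh that signal against the full tables before accepting a label.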

#### Italian

```python
# zip code plus the top-10 venue columns (columns 5 onward) for cluster 0
green_merged.loc[green_merged['Cluster Labels'] == 0, green_merged.columns[[0] + list(range(5, green_merged.shape[1]))]]
```

|  | zip | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 29605 | Hotel | Gym / Fitness Center | Discount Store | Pizza Place | American Restaurant | Miscellaneous Shop | Golf Course | Thrift / Vintage Store | Farmers Market | Farm |
| 14 | 29673 | Pizza Place | Discount Store | American Restaurant | Farm | Electronics Store | Flower Shop | Fast Food Restaurant | Farmers Market | Donut Shop | Diner |


#### Working-class Caucasian

```python
# zip code plus the top-10 venue columns for cluster 1
green_merged.loc[green_merged['Cluster Labels'] == 1, green_merged.columns[[0] + list(range(5, green_merged.shape[1]))]]
```

|  | zip | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | 29661 | BBQ Joint | Discount Store | Flower Shop | Fast Food Restaurant | Farmers Market | Farm | Electronics Store | Donut Shop | Zoo | Food & Drink Shop |


#### Caucasian - Professional

```python
# zip code plus the top-10 venue columns for cluster 2
green_merged.loc[green_merged['Cluster Labels'] == 2, green_merged.columns[[0] + list(range(5, green_merged.shape[1]))]]
```

|  | zip | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 29635 | Trail | Park | Zoo | Discount Store | Fast Food Restaurant | Farmers Market | Farm | Electronics Store | Donut Shop | Dessert Shop |
| 5 | 29609 | Trail | Home Service | Golf Course | Zoo | Discount Store | Fast Food Restaurant | Farmers Market | Farm | Electronics Store | Donut Shop |


#### African-American and Hispanic

```python
# zip code plus the top-10 venue columns for cluster 3
green_merged.loc[green_merged['Cluster Labels'] == 3, green_merged.columns[[0] + list(range(5, green_merged.shape[1]))]]
```

|  | zip | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29617 | Fast Food Restaurant | Convenience Store | Chinese Restaurant | Pharmacy | Sandwich Place | Discount Store | Diner | Gas Station | Thrift / Vintage Store | Arts & Crafts Store |
| 2 | 29644 | Convenience Store | Rental Car Location | Pizza Place | Breakfast Spot | Baseball Field | Discount Store | Fast Food Restaurant | Grocery Store | Department Store | Dessert Shop |
| 4 | 29607 | Rental Car Location | Fast Food Restaurant | Bookstore | Discount Store | Department Store | Sandwich Place | ATM | Coffee Shop | Restaurant | Resort |
| 6 | 29613 | Concert Hall | Boat or Ferry | Pool | Music Venue | Movie Theater | Playground | Golf Course | Gastropub | Gas Station | Convenience Store |
| 8 | 29601 | American Restaurant | Hotel | Coffee Shop | Sushi Restaurant | Pizza Place | Trail | Bar | Theater | Brewery | Southern / Soul Food Restaurant |
| 9 | 29650 | Grocery Store | Baseball Field | College Cafeteria | Park | Salon / Barbershop | Smoothie Shop | Fast Food Restaurant | Construction & Landscaping | Food & Drink Shop | Pizza Place |
| 10 | 29651 | American Restaurant | Discount Store | Park | Mexican Restaurant | Grocery Store | Pharmacy | Pizza Place | Brewery | Pub | Cosmetics Shop |
| 12 | 29662 | Pizza Place | Grocery Store | Sandwich Place | Fast Food Restaurant | American Restaurant | Mexican Restaurant | Convenience Store | Park | Pharmacy | Cosmetics Shop |
| 13 | 29611 | Grocery Store | Fast Food Restaurant | Convenience Store | Discount Store | Pharmacy | American Restaurant | Fried Chicken Joint | Sandwich Place | Food Court | Mexican Restaurant |
| 16 | 29681 | Pharmacy | Video Store | Movie Theater | Discount Store | Convenience Store | Park | Pet Store | Mexican Restaurant | Coffee Shop | Southern / Soul Food Restaurant |


Based on these tables, each cluster was assigned to a demographic as follows:  

Cluster 0 = Italian demographic  

Cluster 1 = Working-class Caucasian demographic  

Cluster 2 = Caucasian - Professional demographic  

Cluster 3 = African-American / Hispanic demographic
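A salon owner would typically use this assignment in one of two directions. One is to find which demographic a candidate zip code's cluster corresponds to. The other is to find which zip codes correspond to a given demographic. The sketch below turns the list above into a small lookup. The function names are hypothetical, and only a subset of zip codes from the cluster tables is hard-coded.

```python
# Cluster -> demographic assignment, as listed above.
DEMOGRAPHIC_BY_CLUSTER = {
    0: "Italian",
    1: "Working-class Caucasian",
    2: "Caucasian - Professional",
    3: "African-American / Hispanic",
}

# zip -> cluster label, taken from the cluster tables in this section (partial list).
CLUSTER_BY_ZIP = {"29605": 0, "29673": 0, "29661": 1, "29635": 2, "29609": 2, "29611": 3, "29601": 3}


def demographic_for_zip(zip_code):
    """Return the demographic assigned to a zip code's cluster, or None if the zip was not clustered."""
    return DEMOGRAPHIC_BY_CLUSTER.get(CLUSTER_BY_ZIP.get(zip_code))


def zips_for_demographic(demographic):
    """List the clustered zip codes whose cluster was assigned to the given demographic."""
    return sorted(z for z, c in CLUSTER_BY_ZIP.items() if DEMOGRAPHIC_BY_CLUSTER[c] == demographic)


print(demographic_for_zip("29661"))
print(zips_for_demographic("Italian"))
```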


### Conclusion<a name="conclusion"></a>

The analysis showed clusters whose restaurants serve cuisines typically enjoyed by African-Americans, upscale Caucasians, working-class Caucasians and Italians. However, it failed to distinguish Asians or Hispanics as separate groups, which could be important to salon owners.  

This analysis indicates that distinct cuisines exist in each cluster of zip codes. Therefore, a salon owner can choose a shop location based on these clusters in order to attract a certain demographic.

### Discussion<a name="discussion"></a>

It may be possible to dig deeper into the analysis by searching Foursquare for the main menu item served by each restaurant, then correlating that item with a specific demographic.
Further analysis could also give insights into other cultures in the city of Greenville.
