
<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Lubbock</font></h1>

## Introduction

In this project, the neighborhoods in the city of Lubbock are explored, segmented, and clustered. Data is scraped from the City of Lubbock website, the Lubbock Area Connect website, and the GeoNames website.

Data from these web pages was scraped, wrangled, and cleaned, then saved to an Excel file and read into a pandas dataframe so that it is in a structured format.

Once the data is in a structured format, the analysis explores and clusters the neighborhoods in the city of Lubbock.

## Project Description  
Lubbock is a city in Texas with approximately 307,500 residents living in the area. As of 2017, according to the US Bureau of Statistics, there were 7,202 employer establishments in total, a number that shows an increasing trend. The majority of these businesses are service companies and small businesses. A car salesman is looking to invest in his business and expand to Lubbock. With the growing student population at Texas Tech University, there seem to be more opportunities for a car sales business. This project provides an analysis of the features of the Lubbock neighborhoods and of where the proposed car business could be established.  
The project requires geolocation information about Lubbock, Texas, which can provide the salesman with the data needed to make his decision. The project will also help other stakeholders, including other car salespersons, parents, students, and families in Lubbock, other small businesses, and the city of Lubbock as a whole.  
To illustrate, the project will compare the different zip codes in Lubbock and identify the neighborhoods where car shops are among the most common venues and those where there are no car shops. Applying the k-means clustering algorithm will give a better understanding of the neighborhoods and insights into suitable areas for establishing the car shop.  

## Data sets and APIs
The data for the Lubbock neighborhoods, including the zip codes, was found on the City of Lubbock website, the Lubbock Area Connect website, and the GeoNames website. The main data scraped from these websites are the zip codes, the location data (longitude and latitude), and the county names.  
After the data was downloaded from these three websites, it was joined into one table. There was no missing data for the zip codes obtained. After cleaning, the data had a total of 33 samples and 4 features: zip code, county name, longitude, and latitude.  
The Foursquare API is also used as a data source for this project. It has a very large database of business locations and returns information based on a location search, which will be used to understand the businesses in Lubbock. Photos and user reviews provided by the Foursquare API can also be used to gather insights on car shops in Lubbock.  
Python scientific and visualization libraries, along with their dependencies, will also be used to gather information about the city of Lubbock. The k-means clustering algorithm will be applied to the venue categories of the Lubbock neighborhoods.




## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Download and Explore Dataset</a>

2. <a href="#item2">Explore Neighborhoods in Lubbock</a>

3. <a href="#item3">Analyze Each Neighborhood</a>

4. <a href="#item4">Cluster Neighborhoods</a>

5. <a href="#item5">Examine Clusters</a>    
</font>
</div>

Before we get the data and start exploring it, let's install and import all the dependencies that we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c anaconda xlrd --yes # used by pandas to read Excel files

!conda install -c conda-forge geopy --yes # comment out this line if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
try:
    from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)
except ImportError:
    from pandas.io.json import json_normalize # location in older pandas versions

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed
import folium # map rendering library
!pip install beautifulsoup4
from bs4 import BeautifulSoup # HTML parsing

!pip install wikipedia
import wikipedia as wp

!pip install lxml
import lxml as lh # HTML/XML parser used by pandas.read_html

print('Libraries imported.')



<a id='item1'></a>

## 1. Download and Explore Dataset

The data is scraped, downloaded, and extracted.

In [2]:
# download the postal code table from GeoNames
import requests
import pandas as pd

url = 'https://www.geonames.org/postal-codes/US/TX/303/lubbock.html'
html = requests.get(url).content
df_list = pd.read_html(html)  # one dataframe per <table> element on the page
df1 = df_list[-1]  # the postal code table is the last table on the page
print(df1)
df1.to_csv('my data.csv')

    Unnamed: 0            Place             Code          Country  \
0          1.0          Lubbock            79401    United States   
1          NaN  33.587/-101.861  33.587/-101.861  33.587/-101.861   
2          2.0          Lubbock            79404    United States   
3          NaN  33.526/-101.833  33.526/-101.833  33.526/-101.833   
4          3.0          Lubbock            79407    United States   
5          NaN  33.568/-101.942  33.568/-101.942  33.568/-101.942   
6          4.0          Lubbock            79408    United States   
7          NaN  33.566/-101.927  33.566/-101.927  33.566/-101.927   
8          5.0          Lubbock            79410    United States   
9          NaN   33.569/-101.89   33.569/-101.89   33.569/-101.89   
10         6.0          Lubbock            79412    United States   
11         NaN  33.546/-101.858  33.546/-101.858  33.546/-101.858   
12         7.0          Lubbock            79414    United States   
13         NaN   33.55/-101.919   

In [3]:
# download the zip code table from Area Connect
import requests
import pandas as pd

url = 'http://lubbock.areaconnect.com/zip2.htm?city=Lubbock&qs=TX&searchtype=bycity'
html = requests.get(url).content
df_list2 = pd.read_html(html)  # one dataframe per <table> element on the page
df2 = df_list2[-4]  # fourth table from the end of the page
print(df2)
df2.to_csv('my data2.csv')


                                 0
0  Lubbock Zip Code Search Results
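
The output above suggests that index `-4` selected a one-cell banner table rather than the zip code listing. Picking a table by position is fragile, so one option is to select it by the columns it contains. The following is only a sketch: `pick_table` is a hypothetical helper, and the column names `Zip Code` and `City` are assumptions about the Area Connect page, not values confirmed by this notebook.

```python
import pandas as pd

def pick_table(tables, required_columns):
    """Return the first table whose columns include every name in required_columns."""
    wanted = set(required_columns)
    for table in tables:
        if wanted.issubset(str(col) for col in table.columns):
            return table
    raise ValueError("no table contains columns {}".format(sorted(wanted)))

# Stand-ins for the list returned by pd.read_html (contents are illustrative only).
banner = pd.DataFrame({0: ["Lubbock Zip Code Search Results"]})
listing = pd.DataFrame({"Zip Code": [79401, 79402], "City": ["Lubbock", "Lubbock"]})

chosen = pick_table([banner, listing], ["Zip Code"])
print(chosen)
```

With the real page, `pick_table(df_list2, [...])` would need the actual header names, which can be checked by printing the columns of each dataframe in `df_list2`.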


In [4]:
df1


Unnamed: 0.1,Unnamed: 0,Place,Code,Country,Admin1,Admin2,Admin3
0,1.0,Lubbock,79401,United States,Texas,Lubbock,
1,,33.587/-101.861,33.587/-101.861,33.587/-101.861,33.587/-101.861,33.587/-101.861,33.587/-101.861
2,2.0,Lubbock,79404,United States,Texas,Lubbock,
3,,33.526/-101.833,33.526/-101.833,33.526/-101.833,33.526/-101.833,33.526/-101.833,33.526/-101.833
4,3.0,Lubbock,79407,United States,Texas,Lubbock,
5,,33.568/-101.942,33.568/-101.942,33.568/-101.942,33.568/-101.942,33.568/-101.942,33.568/-101.942
6,4.0,Lubbock,79408,United States,Texas,Lubbock,
7,,33.566/-101.927,33.566/-101.927,33.566/-101.927,33.566/-101.927,33.566/-101.927,33.566/-101.927
8,5.0,Lubbock,79410,United States,Texas,Lubbock,
9,,33.569/-101.89,33.569/-101.89,33.569/-101.89,33.569/-101.89,33.569/-101.89,33.569/-101.89
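
In the GeoNames table, every place row is followed by a row that repeats a `latitude/longitude` string in each column. A minimal sketch of how such interleaved rows could be reshaped into one row per zip code is shown below. It assumes the strict alternation seen in the output above; `tidy_geonames` is a hypothetical helper, and the sample values are copied from that output.

```python
import pandas as pd

# Small stand-in for the scraped table (values copied from the output above).
raw = pd.DataFrame({
    "Place": ["Lubbock", "33.587/-101.861", "Lubbock", "33.526/-101.833"],
    "Code": ["79401", "33.587/-101.861", "79404", "33.526/-101.833"],
})

def tidy_geonames(table):
    """Pair each place row (even positions) with the coordinate row that follows it."""
    places = table.iloc[0::2].reset_index(drop=True)
    coords = table.iloc[1::2].reset_index(drop=True)
    # Split "lat/lng" strings into two numeric columns.
    lat_lng = coords["Place"].str.split("/", expand=True).astype(float)
    return pd.DataFrame({
        "Zip Code": places["Code"].astype(int),
        "Latitude": lat_lng[0],
        "Longitude": lat_lng[1],
    })

tidy = tidy_geonames(raw)
print(tidy)
```

The coordinates in the merged Excel file used below differ slightly from the GeoNames values, so this sketch illustrates the reshaping step only, not how the final table was produced.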


In [5]:
# Load the merged table containing zip code, county and location data

df = pd.read_excel(r'lubbockcodes.xlsx')

print('Data downloaded and read into a dataframe!')

Data downloaded and read into a dataframe!


In [6]:
df

Unnamed: 0,Zip Code,City Name,Latitude,Longitude
0,79401,Lubbock79401,33.585,-101.847
1,79402,Lubbock79402,33.588,-101.847
2,79403,Lubbock79403,33.603,-101.815
3,79404,Lubbock79404,33.554,-101.821
4,79405,Lubbock79405,33.572,-101.85
5,79406,Lubbock79406,33.58,-101.88
6,79407,Lubbock79407,33.57,-101.948
7,79408,Lubbock79408,33.581,-101.978
8,79409,Lubbock79409,33.586,-101.854
9,79410,Lubbock79410,33.568,-101.895


In [7]:
# let's rename the columns so that they make sense
df.rename(columns={'City Name':'County'}, inplace=True)

In [8]:
df.head()

Unnamed: 0,Zip Code,County,Latitude,Longitude
0,79401,Lubbock79401,33.585,-101.847
1,79402,Lubbock79402,33.588,-101.847
2,79403,Lubbock79403,33.603,-101.815
3,79404,Lubbock79404,33.554,-101.821
4,79405,Lubbock79405,33.572,-101.85


In [9]:
df.shape

(33, 4)

#### Use geopy library to get the latitude and longitude values of Lubbock.

In [10]:
address = 'Lubbock'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Lubbock are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Lubbock are 33.5778631, -101.8551665.


#### Create a map of Lubbock with neighborhoods superimposed on top.

In [11]:
# create map of Lubbock using latitude and longitude values
map_lubbock = folium.Map(location=[latitude, longitude], zoom_start=10)

# add one marker per zip code to the map
for lat, lng, zip_code, county in zip(df['Latitude'], df['Longitude'], df['Zip Code'], df['County']):
    label = '{}, {}'.format(county, zip_code)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_lubbock)

map_lubbock

**Folium** is a great visualization library. Feel free to zoom into the above map, and click on each circle marker to reveal the county label and zip code of that location.

Explore and cluster the neighborhoods in Lubbock.

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version
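
The credentials cell is not shown in this notebook. A minimal sketch of one way to define them without hard-coding secrets is given below. The environment variable names are assumptions chosen for illustration; the version string matches the `v=20180605` value that appears in the request URL later on.

```python
import os

# Hypothetical environment variable names; set them in the shell before starting Jupyter.
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")
VERSION = "20180605"  # Foursquare API version date

if not CLIENT_ID or not CLIENT_SECRET:
    print("Foursquare credentials are not set; API requests will fail.")
```

Reading the keys from the environment keeps them out of the saved notebook and its outputs.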

#### Let's explore the first neighborhood in our dataframe.

Get the neighborhood's name.

In [13]:
df.loc[0, 'Zip Code']

79401

Get the neighborhood's latitude and longitude values.

In [14]:
neighborhood_latitude = df.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df.loc[0, 'Zip Code'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of 79401 are 33.585, -101.847.


#### Now, let's get the top 100 venues that are in 79401 within a radius of 500 meters.

First, let's create the GET request URL and store it in **url**.

In [15]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius in meters

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL (note that this prints the credentials)


'https://api.foursquare.com/v2/venues/explore?client_id=CLIENT_ID&client_secret=CLIENT_SECRET&v=20180605&ll=33.585,-101.847&radius=500&limit=100'
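
As an alternative to formatting the query string by hand, the parameters can be encoded with the standard library. This is a sketch only: `build_explore_url` is a hypothetical helper, the credentials are placeholders, and the endpoint and parameter names are the ones used in this notebook.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Assemble the explore URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials only; real keys should not be stored in a notebook.
example_url = build_explore_url("MY_ID", "MY_SECRET", "20180605", 33.585, -101.847)
print(example_url)
```

`urlencode` percent-encodes the comma in the `ll` value, which is a valid encoding of the same coordinates.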

Send the GET request and examine the results.

In [16]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5de80a68edbcad001b83ce9c'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Downtown Lubbock',
  'headerFullLocation': 'Downtown Lubbock, Lubbock',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 6,
  'suggestedBounds': {'ne': {'lat': 33.5895000045, 'lng': -101.84160834796657},
   'sw': {'lat': 33.5804999955, 'lng': -101.85239165203342}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b16ac1af964a520ccbb23e3',
       'name': 'Giorgios Pizza',
       'location': {'address': '1020 Ave J',
        'lat': 33.584780488465334,
        'lng': -101.84666292196528,
        'labeledLatLngs': [{'label': 'display',
          'lat': 33.584

In [17]:
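# function that extracts the category of the venue
def get_category_type(row):
    # the column is named 'categories' or 'venue.categories' depending on the input
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    # venues without any category get None
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']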

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [18]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Giorgios Pizza,Pizza Place,33.58478,-101.846663
1,The West Table,Bistro,33.584847,-101.848237
2,Wells Fargo,Bank,33.584911,-101.851885
3,Levine's,Men's Store,33.584292,-101.846655
4,Mcwhorter Tire & Auto,Automotive Shop,33.586174,-101.84596


Determine the number of venues returned by Foursquare.

In [19]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

6 venues were returned by Foursquare.


## 2. Explore Neighborhoods in Lubbock

In [20]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return a dataframe of the Foursquare venues found near each zip code location."""
    
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue;
        # a venue without categories gets None instead of raising an IndexError
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    # flatten the per-zip-code lists into one row per venue
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Zip Code', 
                  'Zip Code Latitude', 
                  'Zip Code Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [21]:
lubbock_venues = getNearbyVenues(names=df['Zip Code'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

79401
79402
79403
79404
79405
79406
79407
79408
79409
79410
79411
79412
79413
79414
79415
79416
79423
79424
79430
79452
79453
79457
79464
79490
79491
79493
79499
79329
79350
79363
79364
79366
79382


#### Let's check the size of the resulting dataframe

In [22]:
print(lubbock_venues.shape)
lubbock_venues.head()

(242, 7)


Unnamed: 0,Zip Code,Zip Code Latitude,Zip Code Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,79401,33.585,-101.847,Giorgios Pizza,33.58478,-101.846663,Pizza Place
1,79401,33.585,-101.847,The West Table,33.584847,-101.848237,Bistro
2,79401,33.585,-101.847,Wells Fargo,33.584911,-101.851885,Bank
3,79401,33.585,-101.847,Levine's,33.584292,-101.846655,Men's Store
4,79401,33.585,-101.847,Mcwhorter Tire & Auto,33.586174,-101.84596,Automotive Shop


Let's check how many venues were returned for each neighborhood

In [23]:
lubbock_venues.groupby('Zip Code').count()

Unnamed: 0_level_0,Zip Code Latitude,Zip Code Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Zip Code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
79364,4,4,4,4,4,4
79366,1,1,1,1,1,1
79382,3,3,3,3,3,3
79401,6,6,6,6,6,6
79402,7,7,7,7,7,7
79404,4,4,4,4,4,4
79405,6,6,6,6,6,6
79406,12,12,12,12,12,12
79407,27,27,27,27,27,27
79408,2,2,2,2,2,2


#### Let's find out how many unique categories can be curated from all the returned venues

In [24]:
print('There are {} unique categories.'.format(len(lubbock_venues['Venue Category'].unique())))

There are 102 unique categories.


In [25]:
# one hot encoding: one indicator column per venue category
lubbock_onehot = pd.get_dummies(lubbock_venues[['Venue Category']], prefix="", prefix_sep="")

# add zip code column back to dataframe
lubbock_onehot['Zip Code'] = lubbock_venues['Zip Code'] 

# move zip code column to the first column
fixed_columns = [lubbock_onehot.columns[-1]] + list(lubbock_onehot.columns[:-1])
lubbock_onehot = lubbock_onehot[fixed_columns]

lubbock_onehot.head()


Unnamed: 0,Zip Code,ATM,Accessories Store,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Big Box Store,Bistro,Bookstore,Brewery,Bridal Shop,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Business Service,Cajun / Creole Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,College Bookstore,College Cafeteria,College Quad,College Residence Hall,College Theater,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Electronics Store,Farm,Fast Food Restaurant,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Service,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Gym,Gym / Fitness Center,Hardware Store,Hobby Shop,Hot Dog Joint,Hotel,Hotel Pool,Hunting Supply,IT Services,Ice Cream Shop,Intersection,Italian Restaurant,Karaoke Bar,Kids Store,Locksmith,Men's Store,Mexican Restaurant,Mobile Phone Shop,Motorcycle Shop,Music Store,Outdoor Supply Store,Park,Pharmacy,Pizza Place,Pool,Recreation Center,Rental Car Location,Restaurant,Rock Club,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Smoke Shop,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Thai Restaurant,Thrift / Vintage Store,Train Station,Video Game Store,Video Store,Wine Bar,Wings Joint,Women's Store
0,79401,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,79401,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,79401,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,79401,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,79401,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [26]:
lubbock_onehot.shape

(242, 103)

#### Next, let's group rows by zip code and take the mean of the frequency of occurrence of each category

In [27]:
lubbock_grouped = lubbock_onehot.groupby('Zip Code').mean().reset_index()
lubbock_grouped

Unnamed: 0,Zip Code,ATM,Accessories Store,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Big Box Store,Bistro,Bookstore,Brewery,Bridal Shop,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Business Service,Cajun / Creole Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,College Bookstore,College Cafeteria,College Quad,College Residence Hall,College Theater,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Electronics Store,Farm,Fast Food Restaurant,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Service,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Gym,Gym / Fitness Center,Hardware Store,Hobby Shop,Hot Dog Joint,Hotel,Hotel Pool,Hunting Supply,IT Services,Ice Cream Shop,Intersection,Italian Restaurant,Karaoke Bar,Kids Store,Locksmith,Men's Store,Mexican Restaurant,Mobile Phone Shop,Motorcycle Shop,Music Store,Outdoor Supply Store,Park,Pharmacy,Pizza Place,Pool,Recreation Center,Rental Car Location,Restaurant,Rock Club,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Smoke Shop,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Thai Restaurant,Thrift / Vintage Store,Train Station,Video Game Store,Video Store,Wine Bar,Wings Joint,Women's Store
0,79364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,79366,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,79382,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0
3,79401,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,79402,0.0,0.0,0.0,0.0,0.428571,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,79404,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,79405,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0
7,79406,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.083333,0.083333,0.083333,0.083333,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,79407,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.037037,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.037037,0.037037,0.0,0.0,0.0,0.0,0.0,0.074074,0.037037,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.037037,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.037037,0.0,0.0,0.037037,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037
9,79408,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's confirm the new size


In [28]:
lubbock_grouped.shape


(28, 103)
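
The grouped table has 28 rows although the dataset lists 33 zip codes, presumably because zip codes for which Foursquare returned no venues do not appear in `lubbock_venues`. To see why the mean of one-hot columns yields category frequencies, here is a toy version of the same steps. The venue data is made up for illustration and is not taken from the Foursquare results.

```python
import pandas as pd

# Made-up venues: two in 79401 and three in 79402.
venues = pd.DataFrame({
    "Zip Code": [79401, 79401, 79402, 79402, 79402],
    "Venue Category": ["Bank", "Bistro", "Art Gallery", "Art Gallery", "Bistro"],
})

# One indicator column per category, with the zip code as the first column.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Zip Code", venues["Zip Code"])

# The mean of 0/1 indicators within a zip code is the share of venues in that category.
grouped = onehot.groupby("Zip Code").mean().reset_index()
print(grouped)
```

In this toy table, two of the three venues in 79402 are art galleries, so the `Art Gallery` value for that zip code is 2/3.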

In [29]:
lubbock_grouped.head()

Unnamed: 0,Zip Code,ATM,Accessories Store,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Big Box Store,Bistro,Bookstore,Brewery,Bridal Shop,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Business Service,Cajun / Creole Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,College Bookstore,College Cafeteria,College Quad,College Residence Hall,College Theater,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Electronics Store,Farm,Fast Food Restaurant,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Service,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Gym,Gym / Fitness Center,Hardware Store,Hobby Shop,Hot Dog Joint,Hotel,Hotel Pool,Hunting Supply,IT Services,Ice Cream Shop,Intersection,Italian Restaurant,Karaoke Bar,Kids Store,Locksmith,Men's Store,Mexican Restaurant,Mobile Phone Shop,Motorcycle Shop,Music Store,Outdoor Supply Store,Park,Pharmacy,Pizza Place,Pool,Recreation Center,Rental Car Location,Restaurant,Rock Club,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Smoke Shop,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Thai Restaurant,Thrift / Vintage Store,Train Station,Video Game Store,Video Store,Wine Bar,Wings Joint,Women's Store
0,79364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,79366,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,79382,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0
3,79401,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,79402,0.0,0.0,0.0,0.0,0.428571,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [30]:
lubbock_grouped.dtypes

Zip Code                       int64
ATM                          float64
Accessories Store            float64
American Restaurant          float64
Antique Shop                 float64
Art Gallery                  float64
Arts & Crafts Store          float64
Asian Restaurant             float64
Athletics & Sports           float64
Automotive Shop              float64
BBQ Joint                    float64
Bakery                       float64
Bank                         float64
Bar                          float64
Baseball Field               float64
Basketball Court             float64
Big Box Store                float64
Bistro                       float64
Bookstore                    float64
Brewery                      float64
Bridal Shop                  float64
Bubble Tea Shop              float64
Buffet                       float64
Burger Joint                 float64
Burrito Place                float64
Business Service             float64
Cajun / Creole Restaurant    float64
C

#### Let's print each neighborhood along with the top 5 most common venues

In [31]:
num_top_venues = 5

for hood in lubbock_grouped['Zip Code']:
    print(hood)
    temp = lubbock_grouped[lubbock_grouped['Zip Code'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

79364
               venue  freq
0  Convenience Store  0.25
1        Flower Shop  0.25
2          BBQ Joint  0.25
3             Bakery  0.25
4                ATM  0.00


79366
                  venue  freq
0        Scenic Lookout   1.0
1                   ATM   0.0
2           Pizza Place   0.0
3                  Park   0.0
4  Outdoor Supply Store   0.0


79382
                  venue  freq
0        Discount Store  0.33
1         Train Station  0.33
2        Sandwich Place  0.33
3    Italian Restaurant  0.00
4  Outdoor Supply Store  0.00


79401
             venue  freq
0      Pizza Place  0.17
1           Bistro  0.17
2  Automotive Shop  0.17
3           Bakery  0.17
4             Bank  0.17


79402
             venue  freq
0      Art Gallery  0.43
1      Pizza Place  0.14
2  Automotive Shop  0.14
3      Men's Store  0.14
4           Bistro  0.14


79404
            venue  freq
0  Hardware Store  0.25
1    Food Service  0.25
2            Food  0.25
3  Sandwich Place  0.25
4           

#### Let's put that into a *pandas* dataframe

First, let's write a function that sorts the venue categories of one row in descending order of frequency and returns the top ones.

In [32]:
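def return_most_common_venues(row, num_top_venues):
    # drop the first entry (the zip code) so only venue frequencies remain
    row_categories = row.iloc[1:]
    # rank the categories from most to least frequent
    row_categories_sorted = row_categories.sort_values(ascending=False)
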
    return row_categories_sorted.index.values[0:num_top_venues]
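
As a quick check of the function above, here is a minimal sketch that applies it to a single hand-made row. The row `toy_row` and its frequencies are hypothetical values chosen for illustration, not data from the notebook; the function body is repeated so the snippet runs on its own.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # drop the first entry (the zip code) and rank the remaining frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical row shaped like one row of lubbock_grouped
toy_row = pd.Series({
    'Zip Code': 79402,
    'Art Gallery': 0.43,
    'Bistro': 0.14,
    'ATM': 0.0,
    'Pizza Place': 0.2,
})

top_two = list(return_most_common_venues(toy_row, 2))
print(top_two)
```

The toy frequencies are all distinct on purpose. When several categories share the same frequency (for example many categories at 0.0), the order among the tied categories is probably not meaningful, so lower-ranked entries for zip codes with few venues should be read with care.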

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [33]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Zip Code']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond 3rd have no entry in `indicators`
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Zip Code'] = lubbock_grouped['Zip Code']

for ind in np.arange(lubbock_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(lubbock_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Zip Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,79364,Flower Shop,Convenience Store,BBQ Joint,Bakery,Women's Store,Dessert Shop,Coffee Shop,College Bookstore,College Cafeteria,College Quad
1,79366,Scenic Lookout,Women's Store,Business Service,Chinese Restaurant,Clothing Store,Coffee Shop,College Bookstore,College Cafeteria,College Quad,College Residence Hall
2,79382,Train Station,Sandwich Place,Discount Store,Women's Store,Cosmetics Shop,Chinese Restaurant,Clothing Store,Coffee Shop,College Bookstore,College Cafeteria
3,79401,Pizza Place,Men's Store,Automotive Shop,Bakery,Bank,Bistro,Women's Store,Department Store,College Bookstore,College Cafeteria
4,79402,Art Gallery,Pizza Place,Men's Store,Automotive Shop,Bistro,Women's Store,Department Store,Coffee Shop,College Bookstore,College Cafeteria


In [34]:
neighborhoods_venues_sorted

Unnamed: 0,Zip Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,79364,Flower Shop,Convenience Store,BBQ Joint,Bakery,Women's Store,Dessert Shop,Coffee Shop,College Bookstore,College Cafeteria,College Quad
1,79366,Scenic Lookout,Women's Store,Business Service,Chinese Restaurant,Clothing Store,Coffee Shop,College Bookstore,College Cafeteria,College Quad,College Residence Hall
2,79382,Train Station,Sandwich Place,Discount Store,Women's Store,Cosmetics Shop,Chinese Restaurant,Clothing Store,Coffee Shop,College Bookstore,College Cafeteria
3,79401,Pizza Place,Men's Store,Automotive Shop,Bakery,Bank,Bistro,Women's Store,Department Store,College Bookstore,College Cafeteria
4,79402,Art Gallery,Pizza Place,Men's Store,Automotive Shop,Bistro,Women's Store,Department Store,Coffee Shop,College Bookstore,College Cafeteria
5,79404,Food Service,Food,Hardware Store,Sandwich Place,Department Store,Clothing Store,Coffee Shop,College Bookstore,College Cafeteria,College Quad
6,79405,Pizza Place,Flea Market,Fast Food Restaurant,Thrift / Vintage Store,Thai Restaurant,Bar,Women's Store,Convenience Store,Chinese Restaurant,Clothing Store
7,79406,Park,Food Service,Pool,Art Gallery,Farm,Coffee Shop,College Bookstore,College Cafeteria,College Quad,Sandwich Place
8,79407,Clothing Store,Furniture / Home Store,Cajun / Creole Restaurant,Women's Store,Mobile Phone Shop,Mexican Restaurant,Hunting Supply,Electronics Store,Dessert Shop,Department Store
9,79408,Flower Shop,Arts & Crafts Store,Women's Store,Dessert Shop,Clothing Store,Coffee Shop,College Bookstore,College Cafeteria,College Quad,College Residence Hall


## 4. Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [35]:
# set number of clusters
kclusters = 5

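# keep only the venue-frequency columns as features (axis passed by keyword,
# since recent pandas versions no longer accept it positionally)
lubbock_grouped_clustering = lubbock_grouped.drop(columns='Zip Code')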

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(lubbock_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 2, 1, 0, 1, 1, 0, 1, 0, 1], dtype=int32)
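
To see how unevenly the rows are spread across clusters, the labels can be tallied. The sketch below is only an illustration: it copies the ten labels printed above by hand, whereas in the notebook one would presumably pass `kmeans.labels_` directly.

```python
import numpy as np

# the ten labels printed above, copied by hand for illustration
labels = np.array([1, 2, 1, 0, 1, 1, 0, 1, 0, 1])
kclusters = 5

# count how many of these rows fall into each cluster id 0..kclusters-1
cluster_sizes = np.bincount(labels, minlength=kclusters)
for cluster_id, size in enumerate(cluster_sizes):
    print(cluster_id, size)
```

Among these ten rows, most fall into labels 0 and 1, which is consistent with the small single-row clusters that appear later in the notebook.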

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [36]:
# add clustering labels
neighborhoods_venues_sorted.insert(0,'Label Cluster', kmeans.labels_)

lubbock_merged = df

# join the cluster labels and sorted venues onto the zip code data, which holds latitude/longitude for each neighborhood
lubbock_merged = lubbock_merged.join(neighborhoods_venues_sorted.set_index('Zip Code'), on='Zip Code')

lubbock_merged.head() # check the last columns!

Unnamed: 0,Zip Code,County,Latitude,Longitude,Label Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,79401,Lubbock79401,33.585,-101.847,0.0,Pizza Place,Men's Store,Automotive Shop,Bakery,Bank,Bistro,Women's Store,Department Store,College Bookstore,College Cafeteria
1,79402,Lubbock79402,33.588,-101.847,1.0,Art Gallery,Pizza Place,Men's Store,Automotive Shop,Bistro,Women's Store,Department Store,Coffee Shop,College Bookstore,College Cafeteria
2,79403,Lubbock79403,33.603,-101.815,,,,,,,,,,,
3,79404,Lubbock79404,33.554,-101.821,1.0,Food Service,Food,Hardware Store,Sandwich Place,Department Store,Clothing Store,Coffee Shop,College Bookstore,College Cafeteria,College Quad
4,79405,Lubbock79405,33.572,-101.85,0.0,Pizza Place,Flea Market,Fast Food Restaurant,Thrift / Vintage Store,Thai Restaurant,Bar,Women's Store,Convenience Store,Chinese Restaurant,Clothing Store


In [37]:
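# fill missing cluster labels and convert them to int; zip codes with no venue
# data have NaN after the join and are placed in cluster 0 by this step
lubbock_merged["Label Cluster"] = lubbock_merged["Label Cluster"].fillna(0).astype(int)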
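
Filling the missing labels with 0 mixes zip codes that returned no venues into the first cluster. One possible alternative, sketched here under assumptions, is to mark them with a sentinel value instead. The frames `zip_data` and `venues` are hypothetical stand-ins for `df` and `neighborhoods_venues_sorted`, and `-1` is an assumed sentinel, not something the notebook uses.

```python
import pandas as pd

# hypothetical stand-ins for df and neighborhoods_venues_sorted
zip_data = pd.DataFrame({
    'Zip Code': [79401, 79402, 79403],
    'Latitude': [33.585, 33.588, 33.603],
})
venues = pd.DataFrame({
    'Zip Code': [79401, 79402],
    'Label Cluster': [0, 1],
})

# left join: every zip code is kept, unmatched ones get NaN labels
merged = zip_data.join(venues.set_index('Zip Code'), on='Zip Code')

# -1 is an assumed sentinel meaning "no venue data, not clustered"
merged['Label Cluster'] = merged['Label Cluster'].fillna(-1).astype(int)

unclustered = merged.loc[merged['Label Cluster'] == -1, 'Zip Code'].tolist()
print(unclustered)
```

With a sentinel like this, the map and the per-cluster tables would need to handle the `-1` rows separately, for example by filtering them out before plotting.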

In [38]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]


# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(lubbock_merged['Latitude'], lubbock_merged['Longitude'], lubbock_merged['Zip Code'], lubbock_merged['Label Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 5. Examine Clusters

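Now you can examine each cluster and determine the venue categories that distinguish it from the others. Based on those defining categories, each cluster can then be assigned a name. Note that the headings below count clusters from 1, while the k-means labels start at 0, so "Cluster 1" corresponds to label 0.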
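
One way to support this inspection is to count, per cluster, which category is ranked first most often. The following is a rough sketch on a hypothetical frame `toy` that mimics two columns of `lubbock_merged`; its five rows are a hand-picked excerpt for illustration, not the full data.

```python
import pandas as pd

# hypothetical excerpt shaped like two columns of lubbock_merged
toy = pd.DataFrame({
    'Label Cluster': [0, 0, 0, 1, 1],
    '1st Most Common Venue': [
        'Pizza Place', 'Pizza Place', 'Bank', 'Art Gallery', 'Park',
    ],
})

# for each cluster, count how often each category is ranked first
first_counts = (
    toy.groupby('Label Cluster')['1st Most Common Venue'].value_counts()
)
print(first_counts)

# look up one (cluster, category) pair in the resulting MultiIndex
pizza_in_cluster_0 = first_counts.loc[(0, 'Pizza Place')]
```

The same idea could be applied to the real `lubbock_merged` frame, though with only a handful of zip codes per cluster the counts are small and should be treated as a rough guide.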

#### Cluster 1

In [39]:
lubbock_merged.loc[lubbock_merged['Label Cluster'] == 0, lubbock_merged.columns[[1] + list(range(5, lubbock_merged.shape[1]))]]

Unnamed: 0,County,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Lubbock79401,Pizza Place,Men's Store,Automotive Shop,Bakery,Bank,Bistro,Women's Store,Department Store,College Bookstore,College Cafeteria
2,Lubbock79403,,,,,,,,,,
4,Lubbock79405,Pizza Place,Flea Market,Fast Food Restaurant,Thrift / Vintage Store,Thai Restaurant,Bar,Women's Store,Convenience Store,Chinese Restaurant,Clothing Store
6,Lubbock79407,Clothing Store,Furniture / Home Store,Cajun / Creole Restaurant,Women's Store,Mobile Phone Shop,Mexican Restaurant,Hunting Supply,Electronics Store,Dessert Shop,Department Store
8,Lubbock79409,Bank,Diner,Pharmacy,Italian Restaurant,Convenience Store,Hotel,Hot Dog Joint,Spa,Chinese Restaurant,Video Game Store
11,Lubbock79412,Mexican Restaurant,Dive Bar,American Restaurant,Pharmacy,Pizza Place,Bubble Tea Shop,Ice Cream Shop,Bank,Discount Store,Burger Joint
12,Lubbock79413,Coffee Shop,Discount Store,Supermarket,Automotive Shop,Bakery,Dessert Shop,Clothing Store,College Bookstore,College Cafeteria,College Quad
13,Lubbock79414,Fast Food Restaurant,Discount Store,Bar,ATM,Seafood Restaurant,Chinese Restaurant,Mexican Restaurant,Hunting Supply,Bookstore,Pharmacy
15,Lubbock79416,Pizza Place,Sports Bar,Big Box Store,Rental Car Location,Cosmetics Shop,Sandwich Place,Burger Joint,Gas Station,Discount Store,Fried Chicken Joint
16,Lubbock79423,,,,,,,,,,


#### Cluster 2

In [40]:
lubbock_merged.loc[lubbock_merged['Label Cluster'] == 1, lubbock_merged.columns[[1] + list(range(5, lubbock_merged.shape[1]))]]

Unnamed: 0,County,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Lubbock79402,Art Gallery,Pizza Place,Men's Store,Automotive Shop,Bistro,Women's Store,Department Store,Coffee Shop,College Bookstore,College Cafeteria
3,Lubbock79404,Food Service,Food,Hardware Store,Sandwich Place,Department Store,Clothing Store,Coffee Shop,College Bookstore,College Cafeteria,College Quad
5,Lubbock79406,Park,Food Service,Pool,Art Gallery,Farm,Coffee Shop,College Bookstore,College Cafeteria,College Quad,Sandwich Place
7,Lubbock79408,Flower Shop,Arts & Crafts Store,Women's Store,Dessert Shop,Clothing Store,Coffee Shop,College Bookstore,College Cafeteria,College Quad,College Residence Hall
9,Lubbock79410,Recreation Center,Gym,Women's Store,Department Store,Clothing Store,Coffee Shop,College Bookstore,College Cafeteria,College Quad,College Residence Hall
10,Lubbock79411,Business Service,Locksmith,Convenience Store,IT Services,Park,Cosmetics Shop,Chinese Restaurant,Clothing Store,Coffee Shop,College Bookstore
14,Lubbock79415,Baseball Field,Motorcycle Shop,Burger Joint,Brewery,Women's Store,Coffee Shop,College Bookstore,College Cafeteria,College Quad,College Residence Hall
24,Lubbock79491,Gym / Fitness Center,Hobby Shop,Athletics & Sports,Women's Store,Dessert Shop,Coffee Shop,College Bookstore,College Cafeteria,College Quad,College Residence Hall
30,Slaton,Flower Shop,Convenience Store,BBQ Joint,Bakery,Women's Store,Dessert Shop,Coffee Shop,College Bookstore,College Cafeteria,College Quad
32,Wolfforth,Train Station,Sandwich Place,Discount Store,Women's Store,Cosmetics Shop,Chinese Restaurant,Clothing Store,Coffee Shop,College Bookstore,College Cafeteria


#### Cluster 3

In [41]:
lubbock_merged.loc[lubbock_merged['Label Cluster'] == 2, lubbock_merged.columns[[1] + list(range(5, lubbock_merged.shape[1]))]]

Unnamed: 0,County,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
31,Ransom Canyon,Scenic Lookout,Women's Store,Business Service,Chinese Restaurant,Clothing Store,Coffee Shop,College Bookstore,College Cafeteria,College Quad,College Residence Hall


#### Cluster 4

In [42]:
lubbock_merged.loc[lubbock_merged['Label Cluster'] == 3, lubbock_merged.columns[[1] + list(range(5, lubbock_merged.shape[1]))]]

Unnamed: 0,County,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Lubbock79430,Fast Food Restaurant,Women's Store,Dessert Shop,Clothing Store,Coffee Shop,College Bookstore,College Cafeteria,College Quad,College Residence Hall,College Theater


#### Cluster 5

In [43]:
lubbock_merged.loc[lubbock_merged['Label Cluster'] == 4, lubbock_merged.columns[[1] + list(range(5, lubbock_merged.shape[1]))]]

Unnamed: 0,County,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Lubbock79490,Antique Shop,Women's Store,Dessert Shop,Clothing Store,Coffee Shop,College Bookstore,College Cafeteria,College Quad,College Residence Hall,College Theater


### Thank you. This was done by Olufunke Oladimeji