# The battle of neighborhoods
## Author: [Carlos Morlan](https://www.linkedin.com/in/carlos-morlan-96343a15/)
### Published date: July 24<sup>th</sup>, 2019

[![Battle of neighborhoods](https://www.garybarker.co.uk/files/uk-city-life-cartoon-illustration.jpg)](https://www.garybarker.co.uk)

# Table of contents

  - [Executive Summary](#ExSum)
  - [Introduction](#Intro)
  - [Methodology](#Metho)
  - [Detailed results](#DetRe)
  - [Discussion section](#DiSec)
  - [Conclusions](#Concl)
  - [References](#Refer)
  - [Appendices](#Appen)

### <a name="ExSum"></a>Executive Summary

Taking a Mexican state as input, this analysis creates clusters by venue category, visitor profile and time series.

Explain the crux of your arguments in 3 paragraphs or less.

> The ultimate purpose of analytics is to communicate findings to the concerned who might use these insights to formulate policy or strategy.
> The data scientist should then use the insights to build the narrative to communicate the findings.


### <a name="Intro"></a>Introduction

This section sets up the problem for readers who may be new to the topic and need a gentle introduction to the subject matter before being immersed in intricate details.

### <a name="Metho"></a>Methodology (this section will be expanded in the second week's submission)

Research methods will be introduced and data sources used for the analysis will be described.

### <a name="DetRe"></a>Detailed results (this section will be expanded in the second week's submission)

Illustrative graphics will be included to present my empirical findings and I will formally test my hypothesis.

### <a name="DiSec"></a>Discussion section (this section will be expanded in the second week's submission)

I will craft my main arguments, supported by the results presented earlier. I will rely on the power of narrative to let the numbers communicate my thesis to my readers.

### <a name="Concl"></a>Conclusions (this section will be expanded in the second week's submission)

I will generalize the specific findings and promote them.

### <a name="Refer"></a>References

Housekeeping

  - [Notebook image](https://www.garybarker.co.uk)
  - [Geolocation Mexico Postal Codes](http://download.geonames.org/export/zip/)

### <a name="Appen"></a>Appendices (this section will be expanded in the second week's submission)

This section will be included only if needed.


# Data Processing

The data file is tab-delimited text in UTF-8 encoding, with the following fields:

* country code      : iso country code, 2 characters
* postal code       : varchar(20)
* place name        : varchar(180)
* admin name1       : first-order subdivision (state) varchar(100)
* admin code1       : first-order subdivision (state) varchar(20)
* admin name2       : second-order subdivision (county/province) varchar(100)
* admin code2       : second-order subdivision (county/province) varchar(20)
* admin name3       : third-order subdivision (community) varchar(100)
* admin code3       : third-order subdivision (community) varchar(20)
* latitude          : estimated latitude (wgs84)
* longitude         : estimated longitude (wgs84)
* accuracy          : accuracy of lat/lng from 1=estimated, 4=geonameid, 6=centroid of addresses or shape

In [1]:
import pandas as pd
import numpy as np

# Read source, the file is tab delimited and the postal code column (#2) should be treated as string
postal_codes_tmp = pd.read_csv('MX.txt', sep='\t', header=None, dtype={1:str})

# Assign column headers because the file doesn't have it
postal_codes_tmp.columns = ['CountryCode', 'PostalCode', 'PlaceName', 'State', 'StateCode', 'TownHall', 'TownHallCode', 'AdminName3', 'AdminCode3', 'Latitude', 'Longitude', 'Accuracy']
# print(postal_codes_tmp.dtypes)

# Add leading zeros to the Postal code
# postal_codes_tmp['PostalCode'] = postal_codes_tmp['PostalCode'].apply(lambda x: x.zfill(5))

# Add filters to process a single state (and optionally a single town hall) for testing purposes
state_filter = postal_codes_tmp['StateCode']==9
#townhall_filter = postal_codes_tmp['TownHallCode']==3
#filtered_postal_codes = postal_codes_tmp[state_filter & townhall_filter]
# Take an explicit copy so later changes do not act on a slice of postal_codes_tmp
filtered_postal_codes = postal_codes_tmp[state_filter].copy()
#filtered_postal_codes.head()

# Remove unused columns
filtered_postal_codes = filtered_postal_codes.drop(columns=['CountryCode', 'State', 'StateCode', 'TownHall', 'TownHallCode', 'AdminName3', 'AdminCode3'])

# Leave only one latitude-longitude by PostalCode (mean will be used)
#unique_coordinates = filtered_postal_codes.groupby('PostalCode').agg({'PlaceName': [(', '.join)], 'Latitude': 'mean', 'Longitude': 'mean', 'Accuracy': 'count'}).reset_index()

# Leave unique latitude-longitude combination; otherwise, the map will show only one point for multiple places
unique_coordinates = filtered_postal_codes.groupby(['Latitude','Longitude']).agg({'PlaceName': [(', '.join)], 'Accuracy': 'count'}).reset_index()

# Rename column headers
unique_coordinates.columns = ['Latitude', 'Longitude', 'PlaceName', 'RecordCount']

# Rearrange columns
cols = ['PlaceName', 'Latitude', 'Longitude', 'RecordCount']
unique_coordinates = unique_coordinates[cols]

# Temporary filter (to be removed): keep only the first 50 rows
places = unique_coordinates.head(50)

# Verify data frame consistency
print('{} rows in dataframe'.format(places.shape[0]))
total_places = len(places['PlaceName'].unique())
print('{} unique places'.format(total_places))
places.head()

50 rows in dataframe
50 unique places



|   | PlaceName | Latitude | Longitude | RecordCount |
|---|---|---|---|---|
| 0 | Parres El Guarda | 19.1361 | -99.1738 | 1 |
| 1 | La Concepción, San Mateo, Los Ángeles, Emilian... | 19.1395 | -99.0511 | 10 |
| 2 | San Marcos | 19.1694 | -99.0257 | 1 |
| 3 | San Lorenzo Tlacoyucan | 19.1761 | -99.0322 | 1 |
| 4 | San Juan, San Juan Tepenahuac | 19.1877 | -98.9945 | 2 |
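The de-duplication step above can be traced on a few rows. This is a minimal sketch rather than the notebook's exact code: it assumes pandas 0.25 or later (for named aggregation), and the `sample` frame is hand-built from coordinates that appear in the output above. Named aggregation yields flat column names directly, so no renaming step is needed.

```python
import pandas as pd

# Hand-built rows reusing coordinates shown in the output above
sample = pd.DataFrame({
    'PlaceName': ['San Juan', 'San Juan Tepenahuac', 'San Marcos'],
    'Latitude': [19.1877, 19.1877, 19.1694],
    'Longitude': [-98.9945, -98.9945, -99.0257],
})

# One row per coordinate pair: place names are joined, rows are counted
grouped = (
    sample.groupby(['Latitude', 'Longitude'], as_index=False)
          .agg(PlaceName=('PlaceName', ', '.join),
               RecordCount=('PlaceName', 'count'))
)

# Same column order as the notebook's final dataframe
grouped = grouped[['PlaceName', 'Latitude', 'Longitude', 'RecordCount']]
print(grouped)
```

Places that share a coordinate pair collapse into a single row whose `RecordCount` records how many source rows were merged.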


## Creating Mexico City Map

In [2]:
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

import requests # library to handle requests

# import k-means from clustering stage
from sklearn.cluster import KMeans

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

address = 'Mexico City, Mexico'

geolocator = Nominatim(user_agent="city_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

# create map of the chosen city using latitude and longitude values
map_city = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, placename in zip(places['Latitude'], places['Longitude'], places['PlaceName']):
    label = '{}'.format(placename)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_city)  
    
map_city

The geographical coordinates of Mexico City, Mexico are 19.4326009, -99.1333416.


## Defining Foursquare functions

1. To get nearby venues
2. To sort venues in descending order

In [3]:
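import os

# Define Foursquare Credentials and Version
# Credentials are read from environment variables instead of being hardcoded
# (the variable names below are placeholders; use whatever names your environment defines)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180605'
LIMIT = 10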

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print('Getting data for ' + name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['PlaceName', 
                  'PlaceName Latitude', 
                  'PlaceName Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

def searchNearbyVenues(names, latitudes, longitudes, radius=500, intent='checkin'):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print('Getting data for ' + name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&intent={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            intent)

        # make the GET request
        results = requests.get(url).json()["response"]['venues']

        try:
            # return only relevant information for each nearby venue
            venues_list.append([(
                name, 
                lat, 
                lng, 
                v['name'], 
                v['location']['lat'], 
                v['location']['lng'],  
                v['categories'][0]['name']) for v in results])
        except Exception as e:
            print('Exception when calling ' + url + ' with (lat,lon) = (' + str(lat) + ',' + str(lng) + ')')
            print(str(e))
            pass
    
    try:
        nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
        nearby_venues.columns = ['PlaceName', 
                      'PlaceName Latitude', 
                      'PlaceName Longitude', 
                      'Venue', 
                      'Venue Latitude', 
                      'Venue Longitude', 
                      'Venue Category']
    except Exception as e:
        print('Exception when creating dataframe')
        print(str(e))
        nearby_venues = pd.DataFrame()
        pass
    
    return(nearby_venues)
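
Both functions above assemble the query string with `str.format`. One possible alternative, sketched here with only the standard library, is to let `urlencode` build and escape the query. `build_search_url` and the placeholder credentials are hypothetical names introduced for illustration; they are not part of the notebook or of the Foursquare API.

```python
from urllib.parse import urlencode

def build_search_url(lat, lng, client_id, client_secret,
                     version='20180605', radius=500, limit=10, intent='checkin'):
    """Build a venues/search URL; urlencode escapes every value."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
        'intent': intent,
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

# Placeholder credentials, for illustration only
url = build_search_url(19.1361, -99.1738, 'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET')
print(url)
```

Keeping the parameters in a dictionary also makes it easier to log a request without printing the credentials.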

## Getting Foursquare information

In [4]:
city_venues = pd.DataFrame(np.empty((0, 7)))  # pd.np is not available in recent pandas; use numpy directly

# Get nearby venues data using explore end point
#city_venues = getNearbyVenues(names=places['PlaceName'],
#                                   latitudes=places['Latitude'],
#                                   longitudes=places['Longitude']
#                                  )

# Get nearby venues data using search end point
city_venues = searchNearbyVenues(names=places['PlaceName'],
                                   latitudes=places['Latitude'],
                                   longitudes=places['Longitude']
                                  )

print(str(city_venues.shape[0]) + ' venues with ' + str(city_venues.shape[1]) + ' columns')
print('There are {} unique categories.'.format(len(city_venues['Venue Category'].unique())))

city_venues.head()

Getting data for Parres El Guarda
Exception when calling https://api.foursquare.com/v2/venues/search?&client_id=REDACTED&client_secret=REDACTED&v=20180605&ll=19.1361,-99.1738&radius=500&limit=10&intent=checkin with (lat,lon) = (19.1361,-99.1738)
list index out of range
Getting data for La Concepción, San Mateo, Los Ángeles, Emiliano Zapata, La Conchita, Tula, San Miguel, Centro, Chalmita, San Miguel
Getting data for San Marcos
Exception when calling https://api.foursquare.com/v2/venues/search?&client_id=REDACTED&client_secret=REDACTED&v=20180605&ll=19.1694,-99.0257&radius=500&limit=10&intent=checkin with (lat,lon) = (19.1694,-99.0257)
list index out of range
Getting data for San Lorenzo Tlacoyucan
Getting data for San Juan, San Juan Tepenahuac
Getting data for Cabeza de Juárez 5 o Frente 5
Exception when calling https://api.four

|   | PlaceName | PlaceName Latitude | PlaceName Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | La Concepción, San Mateo, Los Ángeles, Emilian... | 19.1395 | -99.0511 | Deportivo San Pablo Oztotepec | 19.184272 | -99.069881 | Soccer Field |
| 1 | La Concepción, San Mateo, Los Ángeles, Emilian... | 19.1395 | -99.0511 | LURIASROCK | 19.176086 | -99.031683 | Plaza |
| 2 | La Concepción, San Mateo, Los Ángeles, Emilian... | 19.1395 | -99.0511 | Mercado San Pablo | 19.184831 | -99.070734 | Farmers Market |
| 3 | La Concepción, San Mateo, Los Ángeles, Emilian... | 19.1395 | -99.0511 | Advanced Deisel LLC | 19.138053 | -99.058506 | Automotive Shop |
| 4 | La Concepción, San Mateo, Los Ángeles, Emilian... | 19.1395 | -99.0511 | CICS UMA IPN | 19.174541 | -99.062473 | Medical School |
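
The `list index out of range` messages in the log above are consistent with `v['categories'][0]` being evaluated on a venue whose category list is empty, although the log alone does not confirm this. Because the `try` block in `searchNearbyVenues` wraps the whole list comprehension, one such venue discards every venue returned for that place. A minimal sketch of a per-venue check follows; `extract_venues` is a hypothetical helper, and the sample data is hand-made rather than real API output.

```python
def extract_venues(place_name, place_lat, place_lng, venues):
    """Flatten venue dicts into tuples, skipping venues without a category."""
    rows = []
    for v in venues:
        categories = v.get('categories') or []
        if not categories:
            continue  # skip only this venue instead of the whole place
        rows.append((place_name, place_lat, place_lng,
                     v['name'],
                     v['location']['lat'],
                     v['location']['lng'],
                     categories[0]['name']))
    return rows

# Hand-made sample mimicking the shape of response['venues'] (not real API output)
sample = [
    {'name': 'Mercado', 'location': {'lat': 19.18, 'lng': -99.07},
     'categories': [{'name': 'Farmers Market'}]},
    {'name': 'Sin categoria', 'location': {'lat': 19.17, 'lng': -99.06},
     'categories': []},
]

rows = extract_venues('San Marcos', 19.1694, -99.0257, sample)
print(rows)
```

With a check like this, a place would drop out of the analysis only when none of its venues carries a category.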


In [36]:
city_onehot = pd.get_dummies(city_venues[['Venue Category']], prefix="", prefix_sep="")

# add place name column back to dataframe
city_onehot['PlaceName'] = city_venues['PlaceName'] 

# move place name column to the first column
fixed_columns = [city_onehot.columns[-1]] + list(city_onehot.columns[:-1])
city_onehot = city_onehot[fixed_columns]

# city_onehot.head()
print('{} places and {} categories before grouping by Place Name'.format(city_onehot.shape[0], city_onehot.shape[1]))

city_grouped = city_onehot.groupby('PlaceName').mean().reset_index()
# print(city_grouped)

print('{} places and {} categories after grouping by Place Name'.format(city_grouped.shape[0], city_grouped.shape[1]))

num_top_venues = 10

for hood in city_grouped['PlaceName']:
    print("----"+hood+"----")
    temp = city_grouped[city_grouped['PlaceName'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    if hood == 'Del Carmen':
        print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
        print('\n')
        
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['PlaceName']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
city_venues_sorted = pd.DataFrame(columns=columns)
city_venues_sorted['PlaceName'] = city_grouped['PlaceName']

for ind in np.arange(city_grouped.shape[0]):
    city_venues_sorted.iloc[ind, 1:] = return_most_common_venues(city_grouped.iloc[ind, :], num_top_venues)

city_venues_sorted.head()

330 places and 117 categories before grouping by Place Name
33 places and 117 categories after grouping by Place Name
----Cantera----
----Cruztitla----
----Culhuacán CTM Sección IX-A, Culhuacán CTM Sección IX-B, Culhuacán CTM Sección VIII----
----Degollado, La Magueyera, Prados, Arboledas Zafiro, Predio San Rafael----
----El Arenal----
----El Cedral----
----Jaime Torres Bodet----
----Jardines del Llano----
----La Concepción, San Mateo, Los Ángeles, Emiliano Zapata, La Conchita, Tula, San Miguel, Centro, Chalmita, San Miguel----
----La Quinta, Estación Ajusco----
----Ocotitla----
----Ojo de Agua, Guadalupe Tlaltenco----
----Quiahuatla, San Isidro, San Andrés----
----Reclusorio Sur----
----San Andrés Ahuayucan----
----San Bartolomé Xicomulco----
----San Francisco Tecoxpa----
----San Jerónimo Miacatlán----
----San José----
----San Juan, San Juan Tepenahuac----
----San Lorenzo Tlacoyucan----
----San Mateo Xalpa----
----San Nicolás Tetelco----
----San Salvador Cuauhtenco----
----Santa Cruz-

|   | PlaceName | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Cantera | Mexican Restaurant | French Restaurant | Candy Store | Convenience Store | Scenic Lookout | Rest Area | University | Cupcake Shop | College Academic Building | College Auditorium |
| 1 | Cruztitla | Field | General Entertainment | BBQ Joint | Coworking Space | Diner | Event Space | Building | Garden | Taco Place | College Classroom |
| 2 | Culhuacán CTM Sección IX-A, Culhuacán CTM Secc... | Mexican Restaurant | Street Fair | Cocktail Bar | Business Service | Garden | Clothing Store | College Auditorium | College Classroom | Cupcake Shop | College Library |
| 3 | Degollado, La Magueyera, Prados, Arboledas Zaf... | Housing Development | Temple | Church | Farm | Movie Theater | Speakeasy | Spa | College Rec Center | Outdoors & Recreation | General College & University |
| 4 | El Arenal | Food Truck | Pharmacy | Festival | General College & University | Cupcake Shop | Outdoors & Recreation | Food Stand | Alternative Healer | Stationery Store | Auto Dealership |
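
The one-hot and ranking steps can be traced on a toy example. The sketch below uses made-up places `A` and `B`, not the notebook's data. It also illustrates a caveat: with at most `LIMIT = 10` venues per place, a place usually has fewer than ten distinct categories, so the lower ranks of a top-10 list are filled with categories whose frequency is zero.

```python
import pandas as pd

# Made-up venues for two places (not taken from the notebook's data)
venues = pd.DataFrame({
    'PlaceName': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Taco Place', 'Taco Place', 'Church', 'Plaza'],
})

# One column per category, one row per venue
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'PlaceName', venues['PlaceName'])

# The mean of the one-hot rows is the share of each category within a place
freq = onehot.groupby('PlaceName').mean()

# Top-2 categories per place; ties (including zeros) come out in no guaranteed order
top2 = {place: row.sort_values(ascending=False).index[:2].tolist()
        for place, row in freq.iterrows()}

print(freq.round(2))
print(top2)
```

Place `B` has a single venue, so its second "most common" category has frequency zero. Checking the frequency before reporting a rank would avoid reading meaning into such entries.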


## K-Means Clustering

In [37]:
# set number of clusters
kclusters = 7

city_grouped_clustering = city_grouped.drop(columns='PlaceName')  # keyword form; a positional axis argument is rejected by recent pandas

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(city_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

# add clustering labels
city_venues_sorted.insert(0, 'ClusterL', kmeans.labels_)

city_venues_merged = places

# merge city_grouped with places to add latitude/longitude for each place
city_venues_merged = city_venues_merged.join(city_venues_sorted.set_index('PlaceName'), on='PlaceName')
city_venues_merged.fillna(99, inplace=True) # Places with no venue data received no cluster label; mark them with 99 so they can be separated below
# city_venues_merged.dtypes

# store places not classified for future analysis in a different dataframe
city_venues_merged['ClusterL'] = city_venues_merged['ClusterL'].astype(int)
city_venues_wo_cluster = city_venues_merged[city_venues_merged['ClusterL']==99]
city_venues_merged = city_venues_merged[city_venues_merged['ClusterL']<99]

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(city_venues_merged['Latitude'], city_venues_merged['Longitude'], city_venues_merged['PlaceName'], city_venues_merged['ClusterL']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Labeling clusters

In [40]:
# Keep PlaceName plus the "Most Common Venue" columns (drops Latitude, Longitude, RecordCount and ClusterL)
cluster_columns = city_venues_merged.columns[[0] + list(range(5, city_venues_merged.shape[1]))]

# One dataframe per cluster label
clusters = [city_venues_merged.loc[city_venues_merged['ClusterL'] == k, cluster_columns] for k in range(kclusters)]

# Named references used by the analysis cells below (requires kclusters == 7)
cluster0, cluster1, cluster2, cluster3, cluster4, cluster5, cluster6 = clusters

for k, cluster in enumerate(clusters):
    print('Cluster {}: {} rows --> {}% places included'.format(k, cluster.shape[0], round(cluster.shape[0]/total_places*100,2)))

cluster0.head()

Cluster 0: 7 rows --> 14.0% places included
Cluster 1: 7 rows --> 14.0% places included
Cluster 2: 5 rows --> 10.0% places included
Cluster 3: 5 rows --> 10.0% places included
Cluster 4: 5 rows --> 10.0% places included
Cluster 5: 2 rows --> 4.0% places included
Cluster 6: 2 rows --> 4.0% places included


|   | PlaceName | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | San Lorenzo Tlacoyucan | Church | Seafood Restaurant | Medical Center | Mexican Restaurant | Pizza Place | Plaza | Burger Joint | Building | Bookstore | Food Court |
| 4 | San Juan, San Juan Tepenahuac | Taco Place | Mexican Restaurant | Auto Dealership | Pizza Place | Bank | Recreation Center | Street Fair | Student Center | Church | Coffee Shop |
| 7 | San Salvador Cuauhtenco | Mexican Restaurant | Restaurant | Church | Taco Place | Plaza | Stables | Speakeasy | Bar | College Rec Center | Country Dance Club |
| 14 | Santísima Trinidad | Mexican Restaurant | Food Truck | School | Convenience Store | Italian Restaurant | Scenic Lookout | Field | Assisted Living | Taco Place | Auto Dealership |
| 17 | Ocotitla | Taco Place | Mexican Restaurant | Pharmacy | Non-Profit | Factory | Auto Dealership | Plaza | Market | University | Comfort Food Restaurant |


### C0 analysis: Mexican Restaurant (7), Taco Place (3), Church (2), Pizza Place (2) and General College & University (2)

In [43]:
mcv1 = cluster0.groupby('1st Most Common Venue').count()
mcv2 = cluster0.groupby('2nd Most Common Venue').count()
mcv3 = cluster0.groupby('3rd Most Common Venue').count()
mcv4 = cluster0.groupby('4th Most Common Venue').count()
mcv5 = cluster0.groupby('5th Most Common Venue').count()
tmcv = mcv1.add(mcv2.add(mcv3.add(mcv4.add(mcv5, fill_value=0), fill_value=0), fill_value=0), fill_value=0)
print('{} records in cluster 0, 5 top venues = {} entries'.format(str(cluster0.shape[0]),str(tmcv['PlaceName'].sum())))
tmcv['PlaceName'].sort_values(ascending=False)

7 records in cluster 0, 5 top venues = 35.0 entries


Mexican Restaurant              7.0
Taco Place                      3.0
Church                          2.0
Pizza Place                     2.0
General College & University    2.0
Housing Development             1.0
Auto Dealership                 1.0
Bank                            1.0
Campaign Office                 1.0
Convenience Store               1.0
Country Dance Club              1.0
Factory                         1.0
Food Truck                      1.0
Internet Cafe                   1.0
Soccer Stadium                  1.0
Italian Restaurant              1.0
Medical Center                  1.0
Non-Profit                      1.0
Pharmacy                        1.0
Plaza                           1.0
Restaurant                      1.0
School                          1.0
Seafood Restaurant              1.0
Alternative Healer              1.0
Name: PlaceName, dtype: float64
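
The five `groupby` calls and the nested `add` above tally how often each category appears among the top five ranks. A shorter way to get the same kind of tally, sketched under the assumption that the rank columns end in `Most Common Venue`, is to flatten those columns and count the values. `top_venue_counts` is a hypothetical helper and the two-row frame is made up.

```python
import pandas as pd

def top_venue_counts(cluster, n_ranks=5):
    """Count category occurrences across the first n_ranks rank columns."""
    rank_cols = [c for c in cluster.columns
                 if c.endswith('Most Common Venue')][:n_ranks]
    # Flatten the selected columns into one long array of category names
    values = cluster[rank_cols].to_numpy().ravel()
    return pd.Series(values).value_counts()

# Made-up cluster with two places and two rank columns
cluster = pd.DataFrame({
    'PlaceName': ['A', 'B'],
    '1st Most Common Venue': ['Taco Place', 'Church'],
    '2nd Most Common Venue': ['Church', 'Plaza'],
})

counts = top_venue_counts(cluster, n_ranks=2)
print(counts)
```

The same helper could be called once per cluster instead of repeating the cell for each one, and it returns integer counts rather than floats.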

### C1 analysis: General College & University (3), Taco Place (2), Church (2) and Festival (2)

In [44]:
mcv1 = cluster1.groupby('1st Most Common Venue').count()
mcv2 = cluster1.groupby('2nd Most Common Venue').count()
mcv3 = cluster1.groupby('3rd Most Common Venue').count()
mcv4 = cluster1.groupby('4th Most Common Venue').count()
mcv5 = cluster1.groupby('5th Most Common Venue').count()
tmcv = mcv1.add(mcv2.add(mcv3.add(mcv4.add(mcv5, fill_value=0), fill_value=0), fill_value=0), fill_value=0)
print('{} records in cluster 1, 5 top venues = {} entries'.format(str(cluster1.shape[0]),str(tmcv['PlaceName'].sum())))
tmcv['PlaceName'].sort_values(ascending=False)

7 records in cluster 1, 5 top venues = 35.0 entries


General College & University    3.0
Taco Place                      2.0
Church                          2.0
Festival                        2.0
University                      1.0
General Entertainment           1.0
BBQ Joint                       1.0
Business Center                 1.0
City Hall                       1.0
Coffee Shop                     1.0
College Library                 1.0
Cupcake Shop                    1.0
Farm                            1.0
Farmers Market                  1.0
Food Truck                      1.0
High School                     1.0
Temple                          1.0
Housing Development             1.0
Miscellaneous Shop              1.0
Movie Theater                   1.0
Outdoor Sculpture               1.0
Outdoors & Recreation           1.0
Park                            1.0
Pharmacy                        1.0
Plaza                           1.0
Pool                            1.0
Salon / Barbershop              1.0
Stables                     

### C2 analysis: Coworking Space (2), Diner (2) and Field (2)

In [45]:
mcv1 = cluster2.groupby('1st Most Common Venue').count()
mcv2 = cluster2.groupby('2nd Most Common Venue').count()
mcv3 = cluster2.groupby('3rd Most Common Venue').count()
mcv4 = cluster2.groupby('4th Most Common Venue').count()
mcv5 = cluster2.groupby('5th Most Common Venue').count()
tmcv = mcv1.add(mcv2.add(mcv3.add(mcv4.add(mcv5, fill_value=0), fill_value=0), fill_value=0), fill_value=0)
print('{} records in cluster 2, 5 top venues = {} entries'.format(str(cluster2.shape[0]),str(tmcv['PlaceName'].sum())))
tmcv['PlaceName'].sort_values(ascending=False)

5 records in cluster 2, 5 top venues = 25.0 entries


Coworking Space          2.0
Diner                    2.0
Field                    2.0
Toll Booth               1.0
Taco Place               1.0
BBQ Joint                1.0
Bar                      1.0
Cafeteria                1.0
Campground               1.0
Electronics Store        1.0
Food Truck               1.0
French Restaurant        1.0
General Entertainment    1.0
Italian Restaurant       1.0
Mexican Restaurant       1.0
Office                   1.0
Park                     1.0
Scenic Lookout           1.0
School                   1.0
Seafood Restaurant       1.0
Soccer Field             1.0
Auto Garage              1.0
Name: PlaceName, dtype: float64

### C3 analysis: Mexican Restaurant (5), Farm (2), Clothing Store (2), Garden (2) and Funeral Home (2)

In [46]:
mcv1 = cluster3.groupby('1st Most Common Venue').count()
mcv2 = cluster3.groupby('2nd Most Common Venue').count()
mcv3 = cluster3.groupby('3rd Most Common Venue').count()
mcv4 = cluster3.groupby('4th Most Common Venue').count()
mcv5 = cluster3.groupby('5th Most Common Venue').count()
tmcv = mcv1.add(mcv2.add(mcv3.add(mcv4.add(mcv5, fill_value=0), fill_value=0), fill_value=0), fill_value=0)
print('{} records in cluster 3, 5 top venues = {} entries'.format(str(cluster3.shape[0]),str(tmcv['PlaceName'].sum())))
tmcv['PlaceName'].sort_values(ascending=False)

5 records in cluster 3, 5 top venues = 25.0 entries


Mexican Restaurant        5.0
Farm                      2.0
Clothing Store            2.0
Garden                    2.0
Funeral Home              2.0
Street Fair               1.0
Business Service          1.0
Candy Store               1.0
Cocktail Bar              1.0
Convenience Store         1.0
Frozen Yogurt Shop        1.0
French Restaurant         1.0
Scenic Lookout            1.0
Market                    1.0
Pizza Place               1.0
Pool                      1.0
Argentinian Restaurant    1.0
Name: PlaceName, dtype: float64

### C4 analysis: Church (3), Breakfast Spot (2), Cemetery (2) and Field (2)

In [47]:
mcv1 = cluster4.groupby('1st Most Common Venue').count()
mcv2 = cluster4.groupby('2nd Most Common Venue').count()
mcv3 = cluster4.groupby('3rd Most Common Venue').count()
mcv4 = cluster4.groupby('4th Most Common Venue').count()
mcv5 = cluster4.groupby('5th Most Common Venue').count()
tmcv = mcv1.add(mcv2.add(mcv3.add(mcv4.add(mcv5, fill_value=0), fill_value=0), fill_value=0), fill_value=0)
print('{} records in cluster 4, 5 top venues = {} entries'.format(str(cluster4.shape[0]),str(tmcv['PlaceName'].sum())))
tmcv['PlaceName'].sort_values(ascending=False)

5 records in cluster 4, 5 top venues = 25.0 entries


Church                       3.0
Breakfast Spot               2.0
Cemetery                     2.0
Field                        2.0
Dentist's Office             1.0
Brewery                      1.0
Café                         1.0
Coffee Shop                  1.0
College Academic Building    1.0
College Auditorium           1.0
Student Center               1.0
Soccer Field                 1.0
Gas Station                  1.0
Lake                         1.0
Medical School               1.0
Mexican Restaurant           1.0
Museum                       1.0
Plaza                        1.0
Salad Place                  1.0
Farmers Market               1.0
Name: PlaceName, dtype: float64

### C5 analysis: Medical Center (2); every other category appears once

In [48]:
mcv1 = cluster5.groupby('1st Most Common Venue').count()
mcv2 = cluster5.groupby('2nd Most Common Venue').count()
mcv3 = cluster5.groupby('3rd Most Common Venue').count()
mcv4 = cluster5.groupby('4th Most Common Venue').count()
mcv5 = cluster5.groupby('5th Most Common Venue').count()
tmcv = mcv1.add(mcv2.add(mcv3.add(mcv4.add(mcv5, fill_value=0), fill_value=0), fill_value=0), fill_value=0)
print('{} records in cluster 5, 5 top venues = {} entries'.format(str(cluster5.shape[0]),str(tmcv['PlaceName'].sum())))
tmcv['PlaceName'].sort_values(ascending=False)

2 records in cluster 5, 5 top venues = 10.0 entries


Medical Center          2.0
Public Art              1.0
Other Great Outdoors    1.0
Library                 1.0
Housing Development     1.0
High School             1.0
College Classroom       1.0
Bank                    1.0
African Restaurant      1.0
Name: PlaceName, dtype: float64

### C6 analysis: Office (2), Mexican Restaurant (2) and Café (2)

In [49]:
mcv1 = cluster6.groupby('1st Most Common Venue').count()
mcv2 = cluster6.groupby('2nd Most Common Venue').count()
mcv3 = cluster6.groupby('3rd Most Common Venue').count()
mcv4 = cluster6.groupby('4th Most Common Venue').count()
mcv5 = cluster6.groupby('5th Most Common Venue').count()
tmcv = mcv1.add(mcv2.add(mcv3.add(mcv4.add(mcv5, fill_value=0), fill_value=0), fill_value=0), fill_value=0)
print('{} records in cluster 6, 5 top venues = {} entries'.format(str(cluster6.shape[0]),str(tmcv['PlaceName'].sum())))
tmcv['PlaceName'].sort_values(ascending=False)

2 records in cluster 6, 5 top venues = 10.0 entries


Office                2.0
Mexican Restaurant    2.0
Café                  2.0
Scenic Lookout        1.0
Plaza                 1.0
Market                1.0
Dry Cleaner           1.0
Name: PlaceName, dtype: float64

### Places without a cluster

In [50]:
city_venues_wo_cluster

|   | PlaceName | Latitude | Longitude | RecordCount | ClusterL | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Parres El Guarda | 19.1361 | -99.1738 | 1 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
| 2 | San Marcos | 19.1694 | -99.0257 | 1 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
| 5 | Cabeza de Juárez 5 o Frente 5 | 19.1883 | -99.1347 | 1 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
| 9 | Estrella Mora | 19.1935 | -99.1635 | 1 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
| 10 | Club Monte Sur | 19.196 | -99.0902 | 1 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
| 11 | San Francisco Tlalnepantla | 19.1974 | -99.1223 | 1 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
| 12 | Los Encinos | 19.1977 | -99.1307 | 1 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
| 13 | Isidro Fabela, Luis Donaldo Colosio, Cuitlahua... | 19.1983 | -99.2063 | 9 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
| 15 | San Miguel Topilejo | 19.2026 | -99.1419 | 1 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
| 29 | San Jorge | 19.2187 | -99.219 | 1 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
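
In the table above, `fillna(99)` also overwrites the venue columns with 99. An alternative way to separate places without a cluster label, sketched here on made-up data, is a left merge with `indicator=True`, which leaves the other columns untouched. The frame names below are illustrative only.

```python
import pandas as pd

# Made-up places and cluster labels; place 'B' has no label
places = pd.DataFrame({'PlaceName': ['A', 'B', 'C']})
labels = pd.DataFrame({'PlaceName': ['A', 'C'], 'ClusterL': [0, 1]})

# indicator=True adds a '_merge' column telling where each row came from
merged = places.merge(labels, on='PlaceName', how='left', indicator=True)

# Rows found only on the left side have no cluster label
unclustered = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')

# Rows found on both sides have a label; restore the integer dtype
clustered = (merged[merged['_merge'] == 'both']
             .drop(columns='_merge')
             .astype({'ClusterL': int}))

print(unclustered)
print(clustered)
```

This keeps missing values as `NaN` in the unclustered frame, so no sentinel value is needed.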
