<h1>Capstone Project - The Battle of the Neighborhoods</h1>
<h3>Applied Data Science Capstone by IBM on Coursera</h3>

<hr>
<h2>Introduction: Business Problem</h2>

<p>Many people in the United States move from one state to another for reasons that include family ties, professional opportunities, a lower cost of living and the overall pursuit of happiness. Before deciding to move, they need to gather relevant information about the new location. Most would like to keep the things they enjoyed in their old location while leaving behind the things they found undesirable.</p>

<p>People love Chicago for many reasons: cultural diversity, beautiful architecture, convenient public transportation, plentiful professional opportunities and attractions such as museums, theaters and great dining options. Living in Chicago also has its challenges, such as long cold winters and high property costs, including high property taxes. One way to achieve a better life balance is to look for places in the United States that have warmer weather and a lower cost of living. Florida offers both when compared to Chicago. On the other hand, Florida has a completely different culture and style of living from the Midwest, so people who consider moving there should do their research to find the best fit. This project is limited to an analysis of venue data for neighborhoods of Palm Beach County, Florida, but a similar approach can be used to analyze any other county in Florida.</p>

<p>The target audience is people who are considering a move to a warmer state such as Florida. This analysis will help them learn how venues such as restaurants, health clubs and parks are distributed within Palm Beach County.</p>

<hr>
<h2>Data</h2>

<p>The following data will be used to address the problem defined above:</p>
<ul>
<li>Zip Codes for the county of Palm Beach, Florida:
<a href="https://www.zipcodestogo.com/Palm%20beach/FL/">https://www.zipcodestogo.com/Palm%20beach/FL/</a>
</li>

<li>US Zip Code Latitude and Longitude values: 
<a href="https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/?refine.state=FL">https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/?refine.state=FL</a>
</li>

<li>Foursquare API to get the most common venues of given zip code of Palm Beach, Florida</li>
</ul>




In [1]:
#Import the pandas library as pd
import pandas as pd
#Import the Numpy library as np
import numpy as np

<h3>Use <i>pandas</i> to read the table into a pandas dataframe.</h3>
<p>I used the <i>read_html()</i> function to parse the HTML table at the given URL and store its contents in a pandas dataframe: https://www.zipcodestogo.com/Palm%20beach/FL/</p>

In [2]:
url = 'https://www.zipcodestogo.com/Palm%20beach/FL/'
df = pd.read_html(url, header=1)
df = df[0]
df.head()

Unnamed: 0,Zip Code,City,State,Zip Code Map
0,33401,West Palm Beach,Florida,View Map
1,33402,West Palm Beach,Florida,View Map
2,33403,West Palm Beach,Florida,View Map
3,33404,West Palm Beach,Florida,View Map
4,33405,West Palm Beach,Florida,View Map


<h3>Data clean up:</h3>

In [3]:
df = df.drop(columns=['State', 'Zip Code Map'])
df = df.rename(columns={'Zip Code':'Zipcodes'})

<h3>Print the number of rows of your dataframe:</h3>

In [4]:
df.shape

(74, 2)

<h3>Read data from csv file into a pandas dataframe:</h3>

In [5]:
df2 = pd.read_html('https://github.com/jversinina/Coursera_Capstone/blob/master/my-file-2.csv',header=0)
df2 = df2[0]
df2.head()

Unnamed: 0.1,Unnamed: 0,Zipcode,Latitude,Longitude
0,,33446,26.452473,-80.16509
1,,33499,26.645895,-80.430269
2,,33415,26.659344,-80.12704
3,,33431,26.381304,-80.09623
4,,33434,26.382408,-80.16699
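
<p>The cell above parses an HTML rendering of the CSV file, which is probably where the empty <i>Unnamed: 0</i> column comes from. As a minimal sketch, assuming the raw CSV text is available and contains only the three data columns, <i>read_csv()</i> reads it directly. The sample text below is a hypothetical excerpt built from the rows shown above, not the actual file:</p>

```python
import io

import pandas as pd

# Hypothetical excerpt of the CSV contents (assumed layout: three data columns).
csv_text = """Zipcode,Latitude,Longitude
33446,26.452473,-80.16509
33499,26.645895,-80.430269
33415,26.659344,-80.12704
"""

# read_csv accepts any file-like object, a local path or a URL to raw CSV text.
coords = pd.read_csv(io.StringIO(csv_text))

# Sort by zip code, as the clean-up step does.
coords = coords.sort_values('Zipcode').reset_index(drop=True)
print(coords)
```

<p>Read this way, there is no extra column to drop afterwards.</p>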


<h3>Data clean up:</h3>

In [6]:
df2 = df2.drop(columns="Unnamed: 0")
df2 = df2.sort_values('Zipcode', ascending=True)
df2 = df2.reset_index(drop=True)

<h3>Combine two dataframes:</h3>

In [7]:
df = df.join(df2).drop(['Zipcodes'], axis=1)
df

Unnamed: 0,City,Zipcode,Latitude,Longitude
0,West Palm Beach,33401,26.711192,-80.060430
1,West Palm Beach,33402,26.645895,-80.430269
2,West Palm Beach,33403,26.802139,-80.070320
3,West Palm Beach,33404,26.782114,-80.065280
4,West Palm Beach,33405,26.669744,-80.058500
5,West Palm Beach,33406,26.659294,-80.091180
6,West Palm Beach,33407,26.750991,-80.072960
7,North Palm Beach,33408,26.840684,-80.063120
8,West Palm Beach,33409,26.709575,-80.094430
9,Palm Beach Gardens,33410,26.839588,-80.088240


<h3>Download all the dependencies:</h3>

In [8]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Libraries imported.


<h3>Check how many unique zipcodes and cities there are:</h3>

In [9]:
print('The dataframe has {} zipcodes and {} cities.'.format(
        len(df['Zipcode'].unique()),
        len(df['City'].unique())
    )
)

The dataframe has 74 zipcodes and 16 cities.


<h3>Use geopy library to get the latitude and longitude values of Palm Beach county.</h3>

<p>In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent palmbeach_explorer:</p>

In [10]:
address = 'Palm Beach county, FL'
geolocator = Nominatim(user_agent="palmbeach_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Palm Beach county are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Palm Beach county are 26.6279798, -80.4494174.


<h3>Create a map of Palm Beach county with cities superimposed on top.</h3>

In [11]:
# create map of Palm Beach county using latitude and longitude values
map_palmbeach = folium.Map(location=[latitude, longitude], zoom_start=10)

In [12]:
# add markers to map
for lat, lng, city, zipcode in zip(df['Latitude'], df['Longitude'], df['City'], df['Zipcode']):
    label = '{}, {}'.format(city, zipcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_palmbeach)  

In [13]:
map_palmbeach

<h3>Display unique cities from original dataframe:</h3>

In [14]:
df['City'].unique()

array(['West Palm Beach', 'North Palm Beach', 'Palm Beach Gardens',
       'Boynton Beach', 'Boca Raton', 'Belle Glade', 'Canal Point',
       'Bryant', 'Delray Beach', 'Lake Worth', 'Jupiter', 'Lake Harbor',
       'Loxahatchee', 'Pahokee', 'Palm Beach', 'South Bay'], dtype=object)

<hr>
<h2>Methodology</h2>

<p>We will use the pandas Python library to work with the data. Data collected from several public web sources will be loaded into pandas dataframes, cleaned up with built-in pandas methods and combined into a single dataframe for the final analysis.</p>

<p>We will use several other libraries for additional data retrieval and analysis. Specifically, we will use the Geopy library to retrieve the latitude and longitude of Palm Beach County, while the coordinates of each zip code come from the public zip code dataset listed in the Data section. We will use the Matplotlib and Folium libraries to visualize data as maps. We will use the Foursquare API to find the venues closest to any given zip code. We will use the scikit-learn library to run k-means and identify clusters of zip codes within one chosen city of Palm Beach County.</p>

<hr>
<h2>Analysis</h2>

<h3>Define Foursquare Credentials and Version</h3>

In [15]:
CLIENT_ID = 'TS5UQ2IG5OQJUKHUTNJBEJ2ULQSREGI1A3CGDZIQN3FAQRLH' # your Foursquare ID
CLIENT_SECRET = 'C00JAJANFYP5IJJUTG1GMBMTU3MICVZL1GRJSMSQTU2RSW0T' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
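
<p>Credentials written directly into a notebook become visible to anyone who can read it. One common alternative, sketched below, is to read them from environment variables. The variable names <i>FOURSQUARE_CLIENT_ID</i> and <i>FOURSQUARE_CLIENT_SECRET</i> and the helper function are hypothetical choices made for this example:</p>

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables.

    The variable names used here are assumptions made for this sketch.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        # Fail early with a clear message instead of sending an empty credential.
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None


# Usage with an explicit mapping; in a notebook the default os.environ would be used.
client_id, client_secret = load_foursquare_credentials({
    'FOURSQUARE_CLIENT_ID': 'my-id',
    'FOURSQUARE_CLIENT_SECRET': 'my-secret',
})
```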

<h3>Let's examine the city of West Palm Beach:</h3>

In [16]:
west_palm_beach_df = df[df['City'] == 'West Palm Beach'].reset_index(drop=True)
west_palm_beach_df

Unnamed: 0,City,Zipcode,Latitude,Longitude
0,West Palm Beach,33401,26.711192,-80.06043
1,West Palm Beach,33402,26.645895,-80.430269
2,West Palm Beach,33403,26.802139,-80.07032
3,West Palm Beach,33404,26.782114,-80.06528
4,West Palm Beach,33405,26.669744,-80.0585
5,West Palm Beach,33406,26.659294,-80.09118
6,West Palm Beach,33407,26.750991,-80.07296
7,West Palm Beach,33409,26.709575,-80.09443
8,West Palm Beach,33411,26.719596,-80.22077
9,West Palm Beach,33412,26.795367,-80.24044


<p>Get first zipcode:</p>

In [17]:
west_palm_beach_df.loc[0, 'Zipcode']

33401

<p>Get latitude and longitude values:</p>

In [18]:
zipcode_latitude = west_palm_beach_df.loc[0, 'Latitude'] # zipcode latitude value
zipcode_longitude = west_palm_beach_df.loc[0, 'Longitude'] # zipcode longitude value

zipcode_name = west_palm_beach_df.loc[0, 'Zipcode'] # zipcode name

print('Latitude and longitude values of {} are {}, {}.'.format(zipcode_name, 
                                                               zipcode_latitude, 
                                                               zipcode_longitude))

Latitude and longitude values of 33401 are 26.711191999999997, -80.06043000000001.


<h3>Get the top 100 venues that are in West Palm Beach within a radius of 500 meters.</h3>

<p>Create the GET request URL and store it in a variable named <i>url</i>:</p>

In [19]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    zipcode_latitude, 
    zipcode_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=TS5UQ2IG5OQJUKHUTNJBEJ2ULQSREGI1A3CGDZIQN3FAQRLH&client_secret=C00JAJANFYP5IJJUTG1GMBMTU3MICVZL1GRJSMSQTU2RSW0T&v=20180605&ll=26.711191999999997,-80.06043000000001&radius=500&limit=100'
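
<p>The same URL can also be assembled from a dictionary of parameters, which keeps each query field labeled and handles escaping. This is a minimal sketch using only the standard library; the helper name <i>build_explore_url</i> and the placeholder credentials are hypothetical:</p>

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build an explore request URL from named query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude pair
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-escapes reserved characters (the comma becomes %2C).
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


example_url = build_explore_url('MY_ID', 'MY_SECRET', '20180605', 26.711192, -80.06043)
print(example_url)
```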

<p>Send the GET request and examine the results:</p>

In [20]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d7968a9ad1789002cd0e1b2'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Downtown West Palm Beach',
  'headerFullLocation': 'Downtown West Palm Beach, West Palm Beach',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 23,
  'suggestedBounds': {'ne': {'lat': 26.715692004500003,
    'lng': -80.05540180686641},
   'sw': {'lat': 26.70669199549999, 'lng': -80.06545819313361}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c59e5c95c57c9b6eff11b4a',
       'name': 'The Cheesecake Factory',
       'location': {'address': '701 S Rosemary Ave',
        'crossStreet': 'at Cityplace',
        'la

<p>Function that extracts the category of the venue:</p>

In [21]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

<p>Clean the json and structure it into a pandas dataframe:</p>

In [22]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,The Cheesecake Factory,American Restaurant,26.708346,-80.057215
1,Brightline West Palm Beach,Train Station,26.711554,-80.055798
2,Brio Tuscan Grille,Italian Restaurant,26.708645,-80.057064
3,Uptown Art,Art Gallery,26.711106,-80.055754
4,Cityplace,Shopping Mall,26.708155,-80.056851


<p>Check how many venues were returned by Foursquare:</p>

In [23]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

23 venues were returned by Foursquare.


In [24]:
nearby_venues

Unnamed: 0,name,categories,lat,lng
0,The Cheesecake Factory,American Restaurant,26.708346,-80.057215
1,Brightline West Palm Beach,Train Station,26.711554,-80.055798
2,Brio Tuscan Grille,Italian Restaurant,26.708645,-80.057064
3,Uptown Art,Art Gallery,26.711106,-80.055754
4,Cityplace,Shopping Mall,26.708155,-80.056851
5,"Kravis Center for the Performing Arts, Inc.",Performing Arts Venue,26.706785,-80.059656
6,Blue Martini,Bar,26.709093,-80.056826
7,Rita's Italian Ice & Frozen Custard,Ice Cream Shop,26.709735,-80.057185
8,Publix,Grocery Store,26.710622,-80.057529
9,Sloan's Ice Cream,Ice Cream Shop,26.708334,-80.056669


<h3>Create a function to repeat the same process to all the zipcodes in West Palm Beach:</h3>

In [25]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Zipcode', 
                  'Zipcode Latitude', 
                  'Zipcode Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
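
<p>The function above assumes that every response contains a group and that every venue has at least one category. A more forgiving parsing step might look like the sketch below; the helper name <i>extract_venue_rows</i> and the sample payload are hypothetical, with the payload shaped like the response printed earlier:</p>

```python
def extract_venue_rows(zipcode, lat, lng, payload):
    """Flatten one explore-style response into a list of row tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []

    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Use None when a venue has no category instead of raising IndexError.
        category = categories[0]['name'] if categories else None
        rows.append((zipcode, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hypothetical payload with one categorized and one uncategorized venue.
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Publix',
               'location': {'lat': 26.710622, 'lng': -80.057529},
               'categories': [{'name': 'Grocery Store'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 26.71, 'lng': -80.06},
               'categories': []}},
]}]}}

sample_rows = extract_venue_rows(33401, 26.711192, -80.06043, sample_payload)
print(sample_rows)
```

<p>A zip code whose response has no groups then yields an empty list rather than an exception.</p>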

<p>Create a new dataframe called west_palm_beach_venues:</p>

In [26]:
west_palm_beach_venues = getNearbyVenues(names=west_palm_beach_df['Zipcode'],
                                   latitudes=west_palm_beach_df['Latitude'],
                                   longitudes=west_palm_beach_df['Longitude']
                                  )

33401
33402
33403
33404
33405
33406
33407
33409
33411
33412
33413
33414
33415
33416
33417
33419
33420
33421
33422


<p>Let's check the size of the resulting dataframe:</p>

In [27]:
print(west_palm_beach_venues.shape)
west_palm_beach_venues.head()

(100, 7)


Unnamed: 0,Zipcode,Zipcode Latitude,Zipcode Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,33401,26.711192,-80.06043,The Cheesecake Factory,26.708346,-80.057215,American Restaurant
1,33401,26.711192,-80.06043,Brightline West Palm Beach,26.711554,-80.055798,Train Station
2,33401,26.711192,-80.06043,Brio Tuscan Grille,26.708645,-80.057064,Italian Restaurant
3,33401,26.711192,-80.06043,Uptown Art,26.711106,-80.055754,Art Gallery
4,33401,26.711192,-80.06043,Cityplace,26.708155,-80.056851,Shopping Mall


<p>Check how many venues were returned for each zipcode:</p>

In [28]:
west_palm_beach_venues.groupby('Zipcode').count()

Zipcode,Zipcode Latitude,Zipcode Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
33401,23,23,23,23,23,23
33403,9,9,9,9,9,9
33404,4,4,4,4,4,4
33405,5,5,5,5,5,5
33407,8,8,8,8,8,8
33409,30,30,30,30,30,30
33411,1,1,1,1,1,1
33412,2,2,2,2,2,2
33413,2,2,2,2,2,2
33414,4,4,4,4,4,4


<p>How many unique categories can be curated from all the returned venues:</p>

In [29]:
print('There are {} unique categories.'.format(len(west_palm_beach_venues['Venue Category'].unique())))

There are 66 unique categories.


<h3>Analyze each zipcode</h3>

In [30]:
# one hot encoding
west_palm_beach_onehot = pd.get_dummies(west_palm_beach_venues[['Venue Category']], prefix="", prefix_sep="")

# add zipcode column back to dataframe
west_palm_beach_onehot['Zipcode'] = west_palm_beach_venues['Zipcode'] 

# move zipcode column to the first column
fixed_columns = [west_palm_beach_onehot.columns[-1]] + list(west_palm_beach_onehot.columns[:-1])
west_palm_beach_onehot = west_palm_beach_onehot[fixed_columns]

west_palm_beach_onehot.head()

Unnamed: 0,Zipcode,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Bus Stop,Business Service,Café,Clothing Store,Coffee Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Dance Studio,Deli / Bodega,Diner,Discount Store,Electronics Store,Fast Food Restaurant,Food Truck,Furniture / Home Store,Gas Station,Gastropub,Golf Course,Grocery Store,Gym / Fitness Center,Home Service,Ice Cream Shop,Italian Restaurant,Juice Bar,Latin American Restaurant,Light Rail Station,Lingerie Store,Market,Mattress Store,Mediterranean Restaurant,Mexican Restaurant,Mobile Phone Shop,Movie Theater,Music Venue,Other Repair Shop,Park,Pawn Shop,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Pool,Pool Hall,Pub,Sandwich Place,Seafood Restaurant,Shopping Mall,Sports Bar,Steakhouse,Tanning Salon,Thrift / Vintage Store,Train Station,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,33401,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,33401,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
2,33401,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,33401,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,33401,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0


<p>New dataframe size:</p>

In [31]:
west_palm_beach_onehot.shape

(100, 67)

<p>Group the rows by zipcode and take the mean of each category column, which gives the frequency of occurrence of each category per zipcode:</p>

In [32]:
west_palm_beach_grouped = west_palm_beach_onehot.groupby('Zipcode').mean().reset_index()
west_palm_beach_grouped

Unnamed: 0,Zipcode,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Bus Stop,Business Service,Café,Clothing Store,Coffee Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Dance Studio,Deli / Bodega,Diner,Discount Store,Electronics Store,Fast Food Restaurant,Food Truck,Furniture / Home Store,Gas Station,Gastropub,Golf Course,Grocery Store,Gym / Fitness Center,Home Service,Ice Cream Shop,Italian Restaurant,Juice Bar,Latin American Restaurant,Light Rail Station,Lingerie Store,Market,Mattress Store,Mediterranean Restaurant,Mexican Restaurant,Mobile Phone Shop,Movie Theater,Music Venue,Other Repair Shop,Park,Pawn Shop,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Pool,Pool Hall,Pub,Sandwich Place,Seafood Restaurant,Shopping Mall,Sports Bar,Steakhouse,Tanning Salon,Thrift / Vintage Store,Train Station,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,33401,0.043478,0.043478,0.0,0.043478,0.0,0.043478,0.0,0.0,0.086957,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.086957,0.086957,0.043478,0.0,0.0,0.043478,0.0,0.0,0.043478,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.043478,0.0,0.0,0.0,0.043478,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.043478,0.0,0.043478,0.0
1,33403,0.111111,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.111111,0.0,0.0
2,33404,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,33405,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2
4,33407,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,33409,0.0,0.0,0.033333,0.0,0.033333,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.033333,0.033333,0.0,0.033333,0.033333,0.0,0.133333,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.1,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.066667,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0
6,33411,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,33412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,33413,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,33414,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
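
<p>Each value in the grouped table is the share of a zip code's venues that fall into a category, because the mean of a 0/1 column equals the count of ones divided by the number of rows. The sketch below reproduces this on hypothetical toy data chosen to mirror the shares reported for 33405 (five venues, two of them diners):</p>

```python
import pandas as pd

# Hypothetical toy data: five venues in one zip code, one venue in another.
venues = pd.DataFrame({
    'Zipcode': [33405, 33405, 33405, 33405, 33405, 33411],
    'Venue Category': ['Diner', 'Diner', 'Yoga Studio', 'Park',
                       'Cosmetics Shop', 'Bus Stop'],
})

# One 0/1 column per category, then the zip code as the first column.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Zipcode', venues['Zipcode'])

# The mean of each 0/1 column is the category's share within the zip code.
shares = onehot.groupby('Zipcode').mean()
print(shares)
```

<p>A zip code with a single venue ends up with a share of 1.0 in one category, as happens for 33411 and 33412 in the table above.</p>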


<p>Confirm the new size:</p>

In [33]:
west_palm_beach_grouped.shape

(13, 67)

<p>Print each zipcode along with the top 5 most common venues:</p>

In [34]:
num_top_venues = 5

for hood in west_palm_beach_grouped['Zipcode']:
    
    temp = west_palm_beach_grouped[west_palm_beach_grouped['Zipcode'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

                 venue  freq
0                  Bar  0.09
1   Italian Restaurant  0.09
2       Ice Cream Shop  0.09
3  American Restaurant  0.04
4       Cosmetics Shop  0.04


                  venue  freq
0   American Restaurant  0.11
1                  Café  0.11
2           Art Gallery  0.11
3                Market  0.11
4  Gym / Fitness Center  0.11


                venue  freq
0  Light Rail Station  0.25
1    Business Service  0.25
2              Bakery  0.25
3       Grocery Store  0.25
4           Pet Store  0.00


            venue  freq
0           Diner   0.4
1     Yoga Studio   0.2
2            Park   0.2
3  Cosmetics Shop   0.2
4  Mattress Store   0.0


                  venue  freq
0           Pizza Place  0.25
1    Seafood Restaurant  0.12
2        Sandwich Place  0.12
3  Fast Food Restaurant  0.12
4     Convenience Store  0.12


                    venue  freq
0  Furniture / Home Store  0.13
1             Pizza Place  0.10
2              Sports Bar  0.07
3       Electron

<p>Put that into a <i>pandas</i> dataframe:</p>

<p>Function to sort the venues in descending order:</p>

In [35]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

<p>Create the new dataframe and display the top 10 venues for each neighborhood:</p>

In [36]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Zipcode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Zipcode'] = west_palm_beach_grouped['Zipcode']

for ind in np.arange(west_palm_beach_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(west_palm_beach_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Zipcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,33401,Italian Restaurant,Bar,Ice Cream Shop,Performing Arts Venue,Art Gallery,Asian Restaurant,BBQ Joint,Clothing Store,Cosmetics Shop,Grocery Store
1,33403,American Restaurant,Café,Gastropub,Market,Pizza Place,Gym / Fitness Center,Vietnamese Restaurant,Art Gallery,Thrift / Vintage Store,Asian Restaurant
2,33404,Light Rail Station,Grocery Store,Bakery,Business Service,Yoga Studio,Food Truck,Diner,Discount Store,Electronics Store,Fast Food Restaurant
3,33405,Diner,Yoga Studio,Cosmetics Shop,Park,Gym / Fitness Center,Grocery Store,Golf Course,Gastropub,Gas Station,Furniture / Home Store
4,33407,Pizza Place,Coffee Shop,Fast Food Restaurant,Sandwich Place,Convenience Store,Seafood Restaurant,Grocery Store,Furniture / Home Store,Food Truck,Gas Station


<h3>Cluster Neighborhoods by Zipcode</h3>

<p>Run <i>k</i>-means to cluster the zipcodes into 5 clusters.</p>

In [37]:
# set number of clusters
kclusters = 5

west_palm_beach_grouped_clustering = west_palm_beach_grouped.drop(columns='Zipcode')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(west_palm_beach_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 3, 0, 0, 2, 1, 4, 0], dtype=int32)

<p>Create a new dataframe that includes the cluster as well as the top 10 venues for each zipcode:</p>

In [38]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

west_palm_beach_merged = west_palm_beach_df

# merging
west_palm_beach_merged = west_palm_beach_merged.join(neighborhoods_venues_sorted.set_index('Zipcode'), on='Zipcode')

west_palm_beach_merged.head() # check the last columns!

Unnamed: 0,City,Zipcode,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,West Palm Beach,33401,26.711192,-80.06043,0.0,Italian Restaurant,Bar,Ice Cream Shop,Performing Arts Venue,Art Gallery,Asian Restaurant,BBQ Joint,Clothing Store,Cosmetics Shop,Grocery Store
1,West Palm Beach,33402,26.645895,-80.430269,,,,,,,,,,,
2,West Palm Beach,33403,26.802139,-80.07032,0.0,American Restaurant,Café,Gastropub,Market,Pizza Place,Gym / Fitness Center,Vietnamese Restaurant,Art Gallery,Thrift / Vintage Store,Asian Restaurant
3,West Palm Beach,33404,26.782114,-80.06528,0.0,Light Rail Station,Grocery Store,Bakery,Business Service,Yoga Studio,Food Truck,Diner,Discount Store,Electronics Store,Fast Food Restaurant
4,West Palm Beach,33405,26.669744,-80.0585,3.0,Diner,Yoga Studio,Cosmetics Shop,Park,Gym / Fitness Center,Grocery Store,Golf Course,Gastropub,Gas Station,Furniture / Home Store
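
<p>Zip codes for which no venues were returned (33402 in the table above) have no cluster label, so the join leaves missing values and the labels display as floats such as 0.0 and 3.0. Before using the labels, for example as integer indices into a color list, such rows can be dropped and the column cast back to integers. A minimal sketch on hypothetical miniature tables:</p>

```python
import pandas as pd

# Hypothetical miniature tables: 33402 has no cluster label.
zipcodes = pd.DataFrame({'Zipcode': [33401, 33402, 33405]})
labels = pd.DataFrame({'Zipcode': [33401, 33405], 'Cluster Labels': [0, 3]})

# Same style of join as above; the unmatched zip code receives NaN.
merged = zipcodes.join(labels.set_index('Zipcode'), on='Zipcode')

# Keep only labeled zip codes and restore the integer dtype.
clustered = merged.dropna(subset=['Cluster Labels']).copy()
clustered['Cluster Labels'] = clustered['Cluster Labels'].astype(int)
print(clustered)
```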


<h3>Examine Clusters</h3>

<h4>Cluster 1</h4>

In [39]:
west_palm_beach_merged.loc[west_palm_beach_merged['Cluster Labels'] == 0, west_palm_beach_merged.columns[[1] + list(range(5, west_palm_beach_merged.shape[1]))]]

Unnamed: 0,Zipcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,33401,Italian Restaurant,Bar,Ice Cream Shop,Performing Arts Venue,Art Gallery,Asian Restaurant,BBQ Joint,Clothing Store,Cosmetics Shop,Grocery Store
2,33403,American Restaurant,Café,Gastropub,Market,Pizza Place,Gym / Fitness Center,Vietnamese Restaurant,Art Gallery,Thrift / Vintage Store,Asian Restaurant
3,33404,Light Rail Station,Grocery Store,Bakery,Business Service,Yoga Studio,Food Truck,Diner,Discount Store,Electronics Store,Fast Food Restaurant
6,33407,Pizza Place,Coffee Shop,Fast Food Restaurant,Sandwich Place,Convenience Store,Seafood Restaurant,Grocery Store,Furniture / Home Store,Food Truck,Gas Station
7,33409,Furniture / Home Store,Pizza Place,Sports Bar,Italian Restaurant,Home Service,Fast Food Restaurant,Electronics Store,Latin American Restaurant,Diner,Deli / Bodega
11,33414,Pet Store,Grocery Store,Park,Playground,Electronics Store,Dance Studio,Deli / Bodega,Diner,Discount Store,Fast Food Restaurant
12,33415,Food Truck,Discount Store,Playground,Pawn Shop,Golf Course,Gastropub,Gas Station,Furniture / Home Store,Cosmetics Shop,Fast Food Restaurant
13,33416,Italian Restaurant,Fast Food Restaurant,Furniture / Home Store,Sandwich Place,Dance Studio,Deli / Bodega,Diner,Discount Store,Electronics Store,Food Truck
14,33417,Pharmacy,Music Venue,Pool,Pool Hall,Yoga Studio,Discount Store,Dance Studio,Deli / Bodega,Diner,Electronics Store


<h4>Cluster 2</h4>

In [40]:
west_palm_beach_merged.loc[west_palm_beach_merged['Cluster Labels'] == 1, west_palm_beach_merged.columns[[1] + list(range(5, west_palm_beach_merged.shape[1]))]]

Unnamed: 0,Zipcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,33412,Construction & Landscaping,Yoga Studio,Food Truck,Deli / Bodega,Diner,Discount Store,Electronics Store,Fast Food Restaurant,Furniture / Home Store,Cosmetics Shop


<h4>Cluster 3</h4>

In [41]:
west_palm_beach_merged.loc[west_palm_beach_merged['Cluster Labels'] == 2, west_palm_beach_merged.columns[[1] + list(range(5, west_palm_beach_merged.shape[1]))]]

Unnamed: 0,Zipcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,33411,Bus Stop,Yoga Studio,Food Truck,Deli / Bodega,Diner,Discount Store,Electronics Store,Fast Food Restaurant,Furniture / Home Store,Cosmetics Shop


<h4>Cluster 4</h4>

In [42]:
west_palm_beach_merged.loc[west_palm_beach_merged['Cluster Labels'] == 3, west_palm_beach_merged.columns[[1] + list(range(5, west_palm_beach_merged.shape[1]))]]

Unnamed: 0,Zipcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,33405,Diner,Yoga Studio,Cosmetics Shop,Park,Gym / Fitness Center,Grocery Store,Golf Course,Gastropub,Gas Station,Furniture / Home Store


<h4>Cluster 5</h4>

In [43]:
west_palm_beach_merged.loc[west_palm_beach_merged['Cluster Labels'] == 4, west_palm_beach_merged.columns[[1] + list(range(5, west_palm_beach_merged.shape[1]))]]

Unnamed: 0,Zipcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,33413,Golf Course,Other Repair Shop,Yoga Studio,Cosmetics Shop,Gym / Fitness Center,Grocery Store,Gastropub,Gas Station,Furniture / Home Store,Food Truck


<hr>
<h2>Results and Discussion</h2>
<p>Based on the results, Cluster 1 contains many more zip codes than Cluster 2, Cluster 3, Cluster 4 and Cluster 5. Cluster 1 includes nine unique zip codes: 33401, 33403, 33404, 33407, 33409, 33414, 33415, 33416, and 33417. It offers many fun and convenient venues such as various restaurants, entertainment places and retail stores. Cluster 2, Cluster 3, Cluster 4 and Cluster 5 have only one zip code each, with just a few venues such as construction, landscaping, discount store and park venues.</p>
<p>Based on these observations, if I were considering a move from Chicago to Florida, West Palm Beach could be a good candidate. I would specifically look into the zip codes in Cluster 1, since they seem to offer many of the things I already like in Chicago, such as great dining and access to health clubs.</p>

<hr>
<h2>Conclusion</h2>
<p>Similar calculations and analysis can be performed for other counties and zip codes, in Florida or in other states, to examine clusters based on venues. This sort of analysis can help people who plan to move to another state decide which locations and zip codes have the set of venues that suits them best.</p>