# Final Capstone Project (Week 2)

## Table of Contents

* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

### Introduction and Business Problem <a id="introduction"></a>

#### A theoretical residential home builder is looking to set up a new location in the Houston, Texas area. This analysis will help decide where in this area might be a good location to consider based on neighborhoods.

We will analyze neighborhoods in the Houston area to determine whether they share similarities and characteristics. Since the builder has not provided details about the market they want to serve (custom vs. high volume, luxury vs. not, etc.), an exact location cannot be determined. However, we can make observations about the different types of neighborhoods and present them to the builder so they can make a final decision.

### Data <a id="data"></a>

Data comes from several sources. The neighborhood information, with some real estate information, is from Houstonia Magazine. The article is titled "Neighborhoods by the Numbers 2017" and was published in March 2017. It can be found at "https://www.houstoniamag.com/home-and-real-estate/2017/03/neighborhoods-by-the-numbers-real-estate-data-2017". It provides median home prices and home value growth by percent for specified time periods. The geocoordinates of the neighborhoods are derived from their ZIP codes: using the site zipinfo.com, I will download a CSV file with all ZIP codes and their corresponding latitude and longitude. Lastly, Foursquare (Foursquare.com) will be used to pull in venue data for the neighborhoods.

The real estate data will be scaled and combined with the venue data to group the neighborhoods using **KMeans**. Once completed, the neighborhood clusters will be mapped across the Houston area to show visually how they relate to each other. A review of the top venues in each neighborhood cluster should also reveal characteristics and similarities. Certain clusters may prove better suited for a new location than others, and this information will be provided to the homebuilder.

Import the needed resources to start the project.

In [1]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt

Read in the dataframe containing the Houston, TX area neighborhoods using Beautiful Soup

In [2]:
res = requests.get("https://www.houstoniamag.com/home-and-real-estate/2017/03/neighborhoods-by-the-numbers-real-estate-data-2017")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0] 
df_raw = pd.read_html(str(table))

In [3]:
df_neighborhood = df_raw[0]
df_neighborhood.dtypes

Unnamed: 0                      object
ZIP Code                        object
2016 Median Home Price          object
% Growth 2010-2016              object
% Growth 2015-2016              object
Avg. Days on Market in 2016    float64
% Owner Occupied                object
dtype: object

In [4]:
df_neighborhood.head()

Unnamed: 0.1,Unnamed: 0,ZIP Code,2016 Median Home Price,% Growth 2010-2016,% Growth 2015-2016,Avg. Days on Market in 2016,% Owner Occupied
0,1960/Cypress,77065,"$179,000",45.50%,8.50%,32.0,47%
1,Aldine Area,77039,"$133,500",57.10%,7.70%,35.1,61%
2,Alief,77072,"$164,000",80.20%,14.70%,31.3,47%
3,Alvin North,77511,"$227,000",43.40%,5.60%,58.3,71%
4,Alvin South,77511,"$163,900",46.30%,6.20%,35.1,71%


Examination of the dataframe shows most columns as object type.  I want to convert those that will not be dropped to numeric.  There is also an abnormal zipcode format in 1 row that will be adjusted.

In [5]:
df_neighborhood[(df_neighborhood['ZIP Code'].str.len() > 5)]

Unnamed: 0.1,Unnamed: 0,ZIP Code,2016 Median Home Price,% Growth 2010-2016,% Growth 2015-2016,Avg. Days on Market in 2016,% Owner Occupied
144,Willis/New Waverly,77318/77378,"$146,250",68.10%,4.50%,49.5,74%


In [6]:
df_neighborhood.at[144, 'ZIP Code'] = '77318'

In [7]:
df_neighborhood.at[144, 'ZIP Code']

'77318'
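The manual fix above handles the single row found here. If the source table ever contained more slash-separated ZIP entries, the same rule (keep the first ZIP) could be applied to the whole column. A minimal sketch, assuming a toy `zips` Series that stands in for `df_neighborhood['ZIP Code']`:

```python
import pandas as pd

# Toy stand-in for df_neighborhood['ZIP Code']; only the second value has the
# slash-separated format seen in row 144.
zips = pd.Series(['77065', '77318/77378', '77072'])

# Keep only the part before the first '/', mirroring the manual fix above.
first_zip = zips.str.split('/').str[0]

print(first_zip.tolist())
```

Keeping the first ZIP is the same choice the notebook makes by hand; it is a simplification, since the neighborhood actually spans both ZIP codes.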

Clean up some of the columns so they can be converted to numeric

In [8]:
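# regex=False treats ',' and '$' as literal characters ('$' is otherwise a regex anchor)
df_neighborhood['2016 Median Home Price'] = df_neighborhood['2016 Median Home Price'].str.replace(',', '', regex=False)
df_neighborhood['2016 Median Home Price'] = df_neighborhood['2016 Median Home Price'].str.replace('$', '', regex=False)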

df_neighborhood['% Growth 2010-2016'] = df_neighborhood['% Growth 2010-2016'].str.replace('%', '')
df_neighborhood['% Growth 2015-2016'] = df_neighborhood['% Growth 2015-2016'].str.replace('%', '')

Convert type on the columns to keep and drop the remaining

In [9]:
df_neighborhood['ZIP Code'] = df_neighborhood['ZIP Code'].astype(int)
df_neighborhood['2016 Median Home Price'] = pd.to_numeric(df_neighborhood['2016 Median Home Price'], errors='coerce')
df_neighborhood['% Growth 2010-2016'] = pd.to_numeric(df_neighborhood['% Growth 2010-2016'], errors='coerce')/100
df_neighborhood['% Growth 2015-2016'] = pd.to_numeric(df_neighborhood['% Growth 2015-2016'], errors='coerce')/100

In [10]:
df_neighborhood.dtypes

Unnamed: 0                      object
ZIP Code                         int32
2016 Median Home Price           int64
% Growth 2010-2016             float64
% Growth 2015-2016             float64
Avg. Days on Market in 2016    float64
% Owner Occupied                object
dtype: object
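The cleanup and conversion steps above repeat the same pattern for each column: strip formatting characters, then convert. A small helper could express that pattern once. This is only a sketch; `to_number` and its parameters are hypothetical names, not part of the notebook:

```python
import pandas as pd

def to_number(series, strip_chars, divisor=1):
    """Remove the given literal characters, then convert the column to numeric."""
    cleaned = series.astype(str)
    for ch in strip_chars:
        # regex=False so characters such as '$' are treated literally
        cleaned = cleaned.str.replace(ch, '', regex=False)
    return pd.to_numeric(cleaned, errors='coerce') / divisor

# Sample strings in the same format as the scraped table
prices = pd.Series(['$179,000', '$133,500'])
growth = pd.Series(['45.50%', '57.10%'])

price_num = to_number(prices, '$,')
growth_num = to_number(growth, '%', divisor=100)  # percent -> fraction
```

Because the result is divided even when `divisor=1`, the helper always returns floats, whereas the notebook keeps the price column as integers.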

In [11]:
df_neighborhood.drop(['Avg. Days on Market in 2016','% Owner Occupied'], axis=1, inplace=True)

In [12]:
df_neighborhood.rename(columns={df_neighborhood.columns[0]: 'Neighborhood'}, inplace=True)

In [13]:
print(df_neighborhood.shape)
print(df_neighborhood.dtypes)
df_neighborhood.head()

(147, 5)
Neighborhood               object
ZIP Code                    int32
2016 Median Home Price      int64
% Growth 2010-2016        float64
% Growth 2015-2016        float64
dtype: object


Unnamed: 0,Neighborhood,ZIP Code,2016 Median Home Price,% Growth 2010-2016,% Growth 2015-2016
0,1960/Cypress,77065,179000,0.455,0.085
1,Aldine Area,77039,133500,0.571,0.077
2,Alief,77072,164000,0.802,0.147
3,Alvin North,77511,227000,0.434,0.056
4,Alvin South,77511,163900,0.463,0.062


Read in the zipcode csv file containing latitude and longitude for each zipcode

In [14]:
ZipCode = pd.read_csv('ZipCode List - TX.csv')

In [15]:
ZipCode.dtypes

Country        object
Zip Code        int64
City           object
State          object
St             object
Area           object
A               int64
Unnamed: 7    float64
Unnamed: 8    float64
lat           float64
long          float64
B             float64
dtype: object

Drop all unnecessary columns so the 2 dataframes can be merged.

In [16]:
ZipCode.rename(columns={"lat": "latitude", "long": "longitude"}, inplace=True)

In [17]:
ZipCode.drop(ZipCode.columns[ZipCode.columns.str.contains('unnamed',case = False)],axis = 1, inplace = True)

In [18]:
ZipCode.drop(['Country','City','State','St','Area','A','B'], axis=1, inplace=True)

In [19]:
print(ZipCode.dtypes)
ZipCode.head()

Zip Code       int64
latitude     float64
longitude    float64
dtype: object


Unnamed: 0,Zip Code,latitude,longitude
0,75763,32.0535,-95.5163
1,75779,31.8668,-95.4958
2,75801,31.7588,-95.6342
3,75802,31.7621,-95.6308
4,75803,31.7571,-95.6545


Merge the 2 dataframes now that they are prepared

In [20]:
df = pd.merge(df_neighborhood, ZipCode, left_on='ZIP Code', right_on='Zip Code', how='left')

In [21]:
df.head()

Unnamed: 0,Neighborhood,ZIP Code,2016 Median Home Price,% Growth 2010-2016,% Growth 2015-2016,Zip Code,latitude,longitude
0,1960/Cypress,77065,179000,0.455,0.085,77065,29.9319,-95.6106
1,Aldine Area,77039,133500,0.571,0.077,77039,29.9067,-95.3334
2,Alief,77072,164000,0.802,0.147,77072,29.699,-95.5862
3,Alvin North,77511,227000,0.434,0.056,77511,29.412,-95.2515
4,Alvin South,77511,163900,0.463,0.062,77511,29.412,-95.2515


In [22]:
df.drop(['Zip Code'], axis=1, inplace=True)

Dataframe now has all the information desired and is ready for analysis

In [23]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 147 entries, 0 to 146
Data columns (total 7 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Neighborhood            147 non-null    object 
 1   ZIP Code                147 non-null    int32  
 2   2016 Median Home Price  147 non-null    int64  
 3   % Growth 2010-2016      147 non-null    float64
 4   % Growth 2015-2016      147 non-null    float64
 5   latitude                147 non-null    float64
 6   longitude               147 non-null    float64
dtypes: float64(4), int32(1), int64(1), object(1)
memory usage: 8.6+ KB
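Because coordinates are looked up by ZIP code, neighborhoods that share a ZIP code (Alvin North and Alvin South above, both 77511) receive identical coordinates. Two quick checks can make this visible, along with any ZIP code that found no match. The sketch below uses toy frames rather than the notebook's data, and the ZIP 99999 is made up to force a miss:

```python
import pandas as pd

hoods = pd.DataFrame({
    'Neighborhood': ['Alvin North', 'Alvin South', 'Alief', 'Nowhere'],
    'ZIP Code': [77511, 77511, 77072, 99999],  # 99999 is a made-up ZIP with no match
})
zips = pd.DataFrame({
    'Zip Code': [77511, 77072],
    'latitude': [29.412, 29.699],
    'longitude': [-95.2515, -95.5862],
})

# indicator=True adds a '_merge' column marking rows found only on the left side
merged = pd.merge(hoods, zips, left_on='ZIP Code', right_on='Zip Code',
                  how='left', indicator=True)

# Neighborhoods whose ZIP code has no coordinates
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Neighborhood'].tolist()

# Neighborhoods that share a ZIP code and therefore share coordinates
shared = merged.loc[merged.duplicated(subset='ZIP Code', keep=False), 'Neighborhood'].tolist()
```

In the notebook's own data, `df.info()` reports 147 non-null values for latitude and longitude, so every ZIP code found a match.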


### Methodology <a id="methodology"></a>

I plan to use KMeans to cluster the neighborhoods using some real estate data and venue data obtained from Foursquare.

Add the necessary resources to plot the neighborhoods and run analysis

In [24]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [25]:
address = 'Houston, Texas'

geolocator = Nominatim(user_agent="houston_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Houston, TX are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Houston, TX are 29.7589382, -95.3676974.


In [26]:
# create map of Houston neighborhoods using the latitude and longitude values
map_houston = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(df['latitude'], df['longitude'], df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_houston)  
    
map_houston

### Start process of gathering Foursquare data

Define credentials for Foursquare

In [27]:
CLIENT_ID = 'SPJVTOPRDIRUMXDQS33NEUYZUFNLNI13NUCY4E0SIOB4BKOT' # your Foursquare ID
CLIENT_SECRET = '5JJ2KJGVXSI2LWTAUM3RGLRUK1EZ1XVFD3MDXFMRIBCMC3FK' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: SPJVTOPRDIRUMXDQS33NEUYZUFNLNI13NUCY4E0SIOB4BKOT
CLIENT_SECRET:5JJ2KJGVXSI2LWTAUM3RGLRUK1EZ1XVFD3MDXFMRIBCMC3FK


In [28]:
df.loc[0, 'Neighborhood']

'1960/Cypress'

In [29]:
neighborhood_latitude = df.loc[0, 'latitude'] # neighborhood latitude value
neighborhood_longitude = df.loc[0, 'longitude'] # neighborhood longitude value

neighborhood_name = df.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and Longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and Longitude values of 1960/Cypress are 29.9319, -95.6106.


In [30]:
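LIMIT = 150   # number of venues to request
radius = 700  # search radius in meters

# note: this test request uses the Houston city-center coordinates (latitude, longitude),
# not the 1960/Cypress coordinates defined in the previous cell
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)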
url


'https://api.foursquare.com/v2/venues/explore?client_id=SPJVTOPRDIRUMXDQS33NEUYZUFNLNI13NUCY4E0SIOB4BKOT&client_secret=5JJ2KJGVXSI2LWTAUM3RGLRUK1EZ1XVFD3MDXFMRIBCMC3FK&ll=29.7589382,-95.3676974&v=20180605&radius=700&limit=150'
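The credentials above are written directly into the notebook and also appear in the printed URL. One alternative, sketched here under assumptions, is to read them from environment variables and build the query string from a dictionary. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `build_explore_url` are hypothetical; the endpoint and parameter names are the ones used in the cell above:

```python
import os
from urllib.parse import urlencode

# Hypothetical environment variable names; set them outside the notebook.
client_id = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID')
client_secret = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')

def build_explore_url(lat, lng, radius=700, limit=150, version='20180605'):
    """Build the venues/explore URL from keyword parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the ll value instead of percent-encoding it
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')

# Coordinates of 1960/Cypress from the earlier cell
example_url = build_explore_url(29.9319, -95.6106)
```

Passing the coordinates as arguments also makes it explicit which location a request targets.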

In [31]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ec40b651a4b0a002837f9b2'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Downtown Houston',
  'headerFullLocation': 'Downtown Houston, Houston',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 103,
  'suggestedBounds': {'ne': {'lat': 29.765238206300005,
    'lng': -95.36045389291286},
   'sw': {'lat': 29.752638193699994, 'lng': -95.37494090708714}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bd9ae2d2e6f0f47c9730b08',
       'name': 'Becks Prime',
       'location': {'address': '910 Travis St',
        'lat': 29.75818486875766,
        'lng': -95.36617176399422,
        'labeledLat

Define a function to help clean up returned data

In [32]:
# function that collects the nearby venues for each neighborhood into one dataframe
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
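The function above indexes `v['venue']['categories'][0]` directly, which would raise an `IndexError` if a venue came back with an empty category list. A more tolerant version of the flattening step might look like the sketch below. `extract_venues` is a hypothetical helper, and the sample payload is hand-written to match the response shape shown earlier, not real API output:

```python
def extract_venues(name, lat, lng, items):
    """Flatten Foursquare 'items' into row tuples, tolerating venues with no category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None when the venue has no category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Hand-written sample; the second venue is invented to exercise the empty case.
sample_items = [
    {'venue': {'name': 'Smart Daddys Pizza',
               'location': {'lat': 29.934253, 'lng': -95.607823},
               'categories': [{'name': 'Pizza Place'}]}},
    {'venue': {'name': 'Example Venue',
               'location': {'lat': 29.9340, 'lng': -95.6070},
               'categories': []}},
]

rows = extract_venues('1960/Cypress', 29.9319, -95.6106, sample_items)
```

Rows with a `None` category would then need to be dropped or labeled before one-hot encoding.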

In [33]:
houston_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['latitude'],
                                   longitudes=df['longitude']
                                  )

1960/Cypress
Aldine Area
Alief
Alvin North
Alvin South
Atascocita North
Atascocita South
Bacliff/San Leon
Bayou Vista
Baytown/Chambers County
Baytown/Harris County
Bear Creek
Bellaire
Braeswood Place
Brays Oaks
Briargrove
Briargrove Park/Walnut Bend
Briarmeadow/Tanglewilde
Brookshire
Chambers County East
Chambers County West
Champions Area
Charnwood/Briarbend
Clear Lake Area
Cleveland Area
Coldspring/South San Jacinto County
Conroe Northeast
Conroe Southeast
Conroe Southwest
Copperfield Area
Cottage Grove
Crosby Area
Crystal Beach
Cypress North
Cypress South
Dayton
Deer Park
Denver Harbor
Dickinson
East End-Galveston
East End Revitalized
Eldridge North
Energy Corridor
Fall Creek Area
Five Corners
Fort Bend County North/Richmond
Fort Bend Southeast
Friendswood
Fulshear/South Brookshire/Simonton
Galleria
Garden Oaks
Greenway Plaza
Gulfton
Heights/Greater Heights
Hempstead
Highland Village/Midlane
Hitchcock
Hobby Area
Hockley
Huffman Area
Humble Area East
Humble Area South
Humble Area Wes

Check the returned dataframe shape and information

In [34]:
print(houston_venues.shape)
houston_venues.head()

(1288, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,1960/Cypress,29.9319,-95.6106,Smart Daddys Pizza,29.934253,-95.607823,Pizza Place
1,1960/Cypress,29.9319,-95.6106,Taqueria Acapulcos,29.934069,-95.607536,Mexican Restaurant
2,1960/Cypress,29.9319,-95.6106,Karma Kolache,29.9342,-95.607617,Donut Shop
3,1960/Cypress,29.9319,-95.6106,Eldridge & Fallbrook,29.932655,-95.606442,Intersection
4,Aldine Area,29.9067,-95.3334,Domino's Pizza,29.902884,-95.3311,Pizza Place


In [35]:
houston_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1960/Cypress,4,4,4,4,4,4
Aldine Area,7,7,7,7,7,7
Alief,3,3,3,3,3,3
Alvin North,4,4,4,4,4,4
Alvin South,4,4,4,4,4,4
...,...,...,...,...,...,...
Webster,15,15,15,15,15,15
West University/Southside Area,5,5,5,5,5,5
Westchase Area,15,15,15,15,15,15
Willow Meadows Area,3,3,3,3,3,3


In [36]:
print('There are {} unique categories.'.format(len(houston_venues['Venue Category'].unique())))

There are 215 unique categories.


Convert the category column to binary columns

In [37]:
# one hot encoding
houston_onehot = pd.get_dummies(houston_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
houston_onehot['Neighborhood'] = houston_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [houston_onehot.columns[-1]] + list(houston_onehot.columns[:-1])
houston_onehot = houston_onehot[fixed_columns]

houston_onehot.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,Advertising Agency,Airport,Airport Terminal,American Restaurant,Art Gallery,Arts & Entertainment,Asian Restaurant,...,Trail,Tree,Turkish Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,1960/Cypress,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1960/Cypress,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1960/Cypress,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,1960/Cypress,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Aldine Area,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [38]:
houston_onehot.shape

(1288, 216)

Group all the venues by neighborhood

In [39]:
houston_grouped = houston_onehot.groupby('Neighborhood').mean().reset_index()
houston_grouped

Unnamed: 0,Neighborhood,ATM,Accessories Store,Advertising Agency,Airport,Airport Terminal,American Restaurant,Art Gallery,Arts & Entertainment,Asian Restaurant,...,Trail,Tree,Turkish Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,1960/Cypress,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
1,Aldine Area,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
2,Alief,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
3,Alvin North,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
4,Alvin South,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
125,Webster,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.066667,...,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0
126,West University/Southside Area,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
127,Westchase Area,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
128,Willow Meadows Area,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
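Taking the mean of the one-hot columns per neighborhood turns the 0/1 indicators into the share of that neighborhood's venues that fall in each category. A toy illustration with invented neighborhoods `A` and `B`:

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Pizza Place', 'Pizza Place', 'Donut Shop', 'Park',
                       'Park', 'Gas Station'],
})

# One indicator column per category (dtype=int gives 0/1 rather than booleans)
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Mean of 0/1 indicators = fraction of the neighborhood's venues in each category
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

Each row of `freq` sums to 1, so a neighborhood with only a few returned venues gets large shares from very little evidence (for example 0.25 each for the four venues in 1960/Cypress).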


Print each neighborhood with its top 5 venues listed.

In [40]:
num_top_venues = 5

for hood in houston_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = houston_grouped[houston_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----1960/Cypress----
                venue  freq
0          Donut Shop  0.25
1         Pizza Place  0.25
2        Intersection  0.25
3  Mexican Restaurant  0.25
4                 ATM  0.00


----Aldine Area----
                venue  freq
0         Pizza Place  0.14
1            Pharmacy  0.14
2         Gas Station  0.14
3  Chinese Restaurant  0.14
4          Taco Place  0.14


----Alief----
         venue  freq
0       Lounge  0.33
1  Gas Station  0.33
2         Park  0.33
3          ATM  0.00
4    Nightclub  0.00


----Alvin North----
              venue  freq
0  Insurance Office  0.25
1        Restaurant  0.25
2      Dessert Shop  0.25
3    Baseball Field  0.25
4               ATM  0.00


----Alvin South----
              venue  freq
0  Insurance Office  0.25
1        Restaurant  0.25
2      Dessert Shop  0.25
3    Baseball Field  0.25
4               ATM  0.00


----Atascocita North----
                    venue  freq
0        Department Store  0.18
1  Furniture / Home Store  0.09


                       venue  freq
0              Garden Center   1.0
1                        ATM   0.0
2                 Non-Profit   0.0
3  Middle Eastern Restaurant   0.0
4          Mobile Phone Shop   0.0


----Humble Area East----
                  venue  freq
0    Mexican Restaurant  0.06
1      Department Store  0.06
2   Fried Chicken Joint  0.06
3  Fast Food Restaurant  0.06
4        Sandwich Place  0.06


----Humble Area South----
                  venue  freq
0    Mexican Restaurant  0.06
1      Department Store  0.06
2   Fried Chicken Joint  0.06
3  Fast Food Restaurant  0.06
4        Sandwich Place  0.06


----Humble Area West----
                  venue  freq
0    Mexican Restaurant  0.06
1      Department Store  0.06
2   Fried Chicken Joint  0.06
3  Fast Food Restaurant  0.06
4        Sandwich Place  0.06


----Jersey Village----
                        venue  freq
0  Construction & Landscaping  0.33
1          Advertising Agency  0.33
2          Athletics & Sports  0.33

                  venue  freq
0  Fast Food Restaurant  0.29
1        Ice Cream Shop  0.14
2           Pizza Place  0.14
3           Coffee Shop  0.14
4        Sandwich Place  0.14


----Sugar Land East----
                       venue  freq
0       Gym / Fitness Center  0.14
1  Latin American Restaurant  0.14
2                 Food Truck  0.14
3                 Smoke Shop  0.14
4             Shipping Store  0.14


----Sugar Land North----
                       venue  freq
0                Pizza Place   0.5
1                 Restaurant   0.5
2                        ATM   0.0
3                 Non-Profit   0.0
4  Middle Eastern Restaurant   0.0


----Sugar Land South----
                  venue  freq
0        Baseball Field  0.33
1        Sandwich Place  0.17
2    Athletics & Sports  0.17
3  Fast Food Restaurant  0.17
4             BBQ Joint  0.17


----Sugar Land West----
                       venue  freq
0     Furniture / Home Store  0.50
1                        Gym  0.25
2        

Define function to sort venues in descending order.

In [41]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Create a dataframe displaying the top 10 venues for each neighborhood.

In [42]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = houston_grouped['Neighborhood']

for ind in np.arange(houston_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(houston_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1960/Cypress,Pizza Place,Mexican Restaurant,Intersection,Donut Shop,Yoga Studio,Doctor's Office,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant
1,Aldine Area,Pizza Place,Pharmacy,Mexican Restaurant,Fast Food Restaurant,Chinese Restaurant,Taco Place,Gas Station,Dive Bar,Flower Shop,Factory
2,Alief,Lounge,Park,Gas Station,Yoga Studio,Doctor's Office,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory
3,Alvin North,Insurance Office,Restaurant,Baseball Field,Dessert Shop,Yoga Studio,Doctor's Office,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant
4,Alvin South,Insurance Office,Restaurant,Baseball Field,Dessert Shop,Yoga Studio,Doctor's Office,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant
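In the table above, 1960/Cypress has only four venues, yet all ten columns are filled: the remaining entries (Yoga Studio, Doctor's Office, and so on) are categories with a frequency of zero that happen to sort next. A possible variant that reports only categories actually present, padding with `None`, is sketched below. `most_common_nonzero` is a hypothetical name, and the sample row is invented:

```python
import pandas as pd

def most_common_nonzero(row, num_top_venues):
    """Return up to num_top_venues categories with frequency > 0, padded with None."""
    freqs = row.iloc[1:].astype(float)  # skip the 'Neighborhood' entry
    present = freqs[freqs > 0].sort_values(ascending=False)
    top = list(present.index[:num_top_venues])
    return top + [None] * (num_top_venues - len(top))

# Invented row in the same layout as houston_grouped
row = pd.Series({'Neighborhood': 'Example', 'ATM': 0.0, 'Donut Shop': 0.25,
                 'Pizza Place': 0.5, 'Park': 0.25, 'Yoga Studio': 0.0})

top5 = most_common_nonzero(row, 5)
print(top5)
```

The order of tied categories (here Donut Shop and Park) is not guaranteed, which also applies to the ties in the notebook's own table.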


Scale the real estate data and merge with Foursquare data to run KMeans

In [43]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

In [44]:
scale_cols = ['2016 Median Home Price', '% Growth 2010-2016', '% Growth 2015-2016']
# take an explicit copy so the assignment below does not write into a slice of df_neighborhood
df_scaled = df_neighborhood[['Neighborhood'] + scale_cols].copy()
df_scaled[scale_cols] = scaler.fit_transform(df_neighborhood[scale_cols])
df_scaled


Unnamed: 0,Neighborhood,2016 Median Home Price,% Growth 2010-2016,% Growth 2015-2016
0,1960/Cypress,0.031609,0.269124,0.332820
1,Aldine Area,0.015266,0.322581,0.326656
2,Alief,0.026221,0.429032,0.380586
3,Alvin North,0.048851,0.259447,0.310478
4,Alvin South,0.026185,0.272811,0.315100
...,...,...,...,...
142,West University/Southside Area,0.395474,0.316590,0.247304
143,Westchase Area,0.184896,0.045161,0.159476
144,Willis/New Waverly,0.019846,0.373272,0.302003
145,Willow Meadows Area,0.078664,0.251152,0.254237
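`MinMaxScaler` maps each column to the range [0, 1] by subtracting the column minimum and dividing by the column range. A small sketch of that formula, using the first four median prices from the earlier table; because it sees only four values rather than all 147 rows, the numbers will not match the scaled output above:

```python
import numpy as np

# First four median home prices from the neighborhood table
prices = np.array([179000, 133500, 164000, 227000], dtype=float)

# Min-max scaling: (x - min) / (max - min)
scaled = (prices - prices.min()) / (prices.max() - prices.min())
print(scaled)
```

The smallest value always maps to 0 and the largest to 1, so a single very expensive neighborhood compresses all other prices toward 0.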


In [45]:
houston_grouped1=pd.merge(df_scaled,houston_grouped, how='right', on='Neighborhood')

In [46]:
houston_grouped1

Unnamed: 0,Neighborhood,2016 Median Home Price,% Growth 2010-2016,% Growth 2015-2016,ATM,Accessories Store,Advertising Agency,Airport,Airport Terminal,American Restaurant,...,Trail,Tree,Turkish Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,1960/Cypress,0.031609,0.269124,0.332820,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
1,Aldine Area,0.015266,0.322581,0.326656,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
2,Alief,0.026221,0.429032,0.380586,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
3,Alvin North,0.048851,0.259447,0.310478,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
4,Alvin South,0.026185,0.272811,0.315100,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
125,Webster,0.080460,0.641014,0.529276,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0
126,West University/Southside Area,0.395474,0.316590,0.247304,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
127,Westchase Area,0.184896,0.045161,0.159476,0.0,0.0,0.0,0.0,0.0,0.066667,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
128,Willow Meadows Area,0.078664,0.251152,0.254237,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0


In [47]:
houston_grouped1.describe()

Unnamed: 0,2016 Median Home Price,% Growth 2010-2016,% Growth 2015-2016,ATM,Accessories Store,Advertising Agency,Airport,Airport Terminal,American Restaurant,Art Gallery,...,Trail,Tree,Turkish Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
count,130.0,130.0,130.0,130.0,130.0,130.0,130.0,130.0,130.0,130.0,...,130.0,130.0,130.0,130.0,130.0,130.0,130.0,130.0,130.0,130.0
mean,0.102611,0.291648,0.305108,0.000405,0.00052,0.005128,0.001923,0.001282,0.004259,0.007846,...,0.000699,0.003846,0.000549,0.00035,0.008094,0.001009,0.000794,0.000869,0.001465,0.001923
std,0.149366,0.147095,0.108856,0.004616,0.003408,0.041184,0.021926,0.014618,0.0193,0.08771,...,0.007973,0.043853,0.004413,0.003987,0.030386,0.007066,0.005326,0.004371,0.011767,0.021926
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.030814,0.212903,0.259823,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,0.056932,0.268664,0.29661,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,0.105833,0.335945,0.33282,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,1.0,0.880184,1.0,0.052632,0.02381,0.333333,0.25,0.166667,0.142857,1.0,...,0.090909,0.5,0.035714,0.045455,0.25,0.055556,0.04,0.023256,0.095238,0.25


Start Cluster analysis of neighborhoods

In [48]:
# set number of clusters
kclusters = 7

houston_grouped_clustering = houston_grouped1.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=42).fit(houston_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([5, 5, 5, 5, 5, 5, 1, 5, 4, 5])
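The notebook fixes the number of clusters at 7 without stating how it was chosen. One common heuristic is to fit KMeans for a range of k values and look for the point where inertia stops dropping sharply (the "elbow"). The sketch below uses synthetic data from `make_blobs` as a stand-in for `houston_grouped_clustering`, so its numbers say nothing about the Houston data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for houston_grouped_clustering; not the notebook's data.
X, _ = make_blobs(n_samples=130, centers=4, n_features=5, random_state=42)

inertias = {}
for k in range(2, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_  # within-cluster sum of squared distances

for k, value in inertias.items():
    print(k, round(value, 1))
```

Running the same loop on `houston_grouped_clustering` would show whether 7 is a reasonable choice or whether fewer clusters capture most of the structure.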

In [49]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

houston_merged = df

# join df with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
houston_merged = houston_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

houston_merged.head(10) # check the last columns!

Unnamed: 0,Neighborhood,ZIP Code,2016 Median Home Price,% Growth 2010-2016,% Growth 2015-2016,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1960/Cypress,77065,179000,0.455,0.085,29.9319,-95.6106,5.0,Pizza Place,Mexican Restaurant,Intersection,Donut Shop,Yoga Studio,Doctor's Office,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant
1,Aldine Area,77039,133500,0.571,0.077,29.9067,-95.3334,5.0,Pizza Place,Pharmacy,Mexican Restaurant,Fast Food Restaurant,Chinese Restaurant,Taco Place,Gas Station,Dive Bar,Flower Shop,Factory
2,Alief,77072,164000,0.802,0.147,29.699,-95.5862,5.0,Lounge,Park,Gas Station,Yoga Studio,Doctor's Office,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory
3,Alvin North,77511,227000,0.434,0.056,29.412,-95.2515,5.0,Insurance Office,Restaurant,Baseball Field,Dessert Shop,Yoga Studio,Doctor's Office,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant
4,Alvin South,77511,163900,0.463,0.062,29.412,-95.2515,5.0,Insurance Office,Restaurant,Baseball Field,Dessert Shop,Yoga Studio,Doctor's Office,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant
5,Atascocita North,77346,189900,0.461,0.043,30.0042,-95.1728,5.0,Department Store,Coffee Shop,Trail,Big Box Store,Furniture / Home Store,Shoe Store,Business Service,Gym / Fitness Center,Spa,Paper / Office Supplies Store
6,Atascocita South,77396,199000,0.327,0.008,29.9507,-95.2622,1.0,Boat or Ferry,Home Service,Construction & Landscaping,Auto Workshop,Electronics Store,Business Service,Doctor's Office,Food Service,Food,Fondue Restaurant
7,Bacliff/San Leon,77518,165941,0.739,0.153,29.5055,-94.9893,5.0,Mexican Restaurant,Furniture / Home Store,Yoga Studio,Doctor's Office,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor
8,Bayou Vista,77563,240000,0.601,0.16,29.3398,-94.9926,4.0,Bar,Yoga Studio,Doctor's Office,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor
9,Baytown/Chambers County,77521,167950,0.375,0.059,29.7705,-94.9695,5.0,Hotel,Sandwich Place,Dive Bar,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor


Check the merged DataFrame for nulls. These are neighborhoods for which no venues were returned. Most of them are far suburbs or on the coast, so it is OK to drop them.

In [50]:
houston_merged_null = houston_merged[houston_merged['Cluster Labels'].isnull()]
houston_merged_null.head()

Unnamed: 0,Neighborhood,ZIP Code,2016 Median Home Price,% Growth 2010-2016,% Growth 2015-2016,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Chambers County East,77514,134750,0.162,0.091,29.662,-94.593,,,,,,,,,,,
24,Cleveland Area,77327,125000,0.894,0.111,30.33,-95.0202,,,,,,,,,,,
25,Coldspring/South San Jacinto County,77331,109250,0.561,-0.09,30.6027,-95.1086,,,,,,,,,,,
35,Dayton,77535,157500,0.373,0.212,30.0102,-94.8787,,,,,,,,,,,
38,Dickinson,77539,158500,0.203,-0.006,29.4585,-95.0345,,,,,,,,,,,
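
The nulls appear because `DataFrame.join` performs a left join by default: every neighborhood in `df` is kept, and any neighborhood without a matching row in `neighborhoods_venues_sorted` is filled with `NaN`. That is also why `Cluster Labels` displays as `5.0` rather than `5`. The following is a minimal sketch of that behavior on a toy frame; the names `left` and `right` are hypothetical stand-ins, and the three rows are taken from the tables above.

```python
import pandas as pd

# Toy stand-in for df: three neighborhoods from the tables above.
left = pd.DataFrame({
    "Neighborhood": ["Alief", "Bayou Vista", "Dayton"],
    "ZIP Code": [77072, 77563, 77535],
})

# Toy stand-in for neighborhoods_venues_sorted: Dayton returned no venues,
# so it has no row here.
right = pd.DataFrame({
    "Cluster Labels": [5, 4],
    "Neighborhood": ["Alief", "Bayou Vista"],
    "1st Most Common Venue": ["Lounge", "Bar"],
})

# DataFrame.join defaults to how="left": all rows of `left` are kept.
merged = left.join(right.set_index("Neighborhood"), on="Neighborhood")

# Rows with no match carry NaN, which forces the integer labels to float64.
unmatched = merged[merged["Cluster Labels"].isnull()]
print(merged)
print(unmatched["Neighborhood"].tolist())
```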


Map the null neighborhoods

In [51]:
# create map of Houston neighborhoods using the latitude and longitude values
map_houston = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(houston_merged_null['latitude'], houston_merged_null['longitude'], houston_merged_null['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_houston)  
    
map_houston

Drop the neighborhoods that had no venues returned and have null Cluster labels

In [52]:
houston_merged.dropna(inplace=True)
houston_merged.head()

Unnamed: 0,Neighborhood,ZIP Code,2016 Median Home Price,% Growth 2010-2016,% Growth 2015-2016,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1960/Cypress,77065,179000,0.455,0.085,29.9319,-95.6106,5.0,Pizza Place,Mexican Restaurant,Intersection,Donut Shop,Yoga Studio,Doctor's Office,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant
1,Aldine Area,77039,133500,0.571,0.077,29.9067,-95.3334,5.0,Pizza Place,Pharmacy,Mexican Restaurant,Fast Food Restaurant,Chinese Restaurant,Taco Place,Gas Station,Dive Bar,Flower Shop,Factory
2,Alief,77072,164000,0.802,0.147,29.699,-95.5862,5.0,Lounge,Park,Gas Station,Yoga Studio,Doctor's Office,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory
3,Alvin North,77511,227000,0.434,0.056,29.412,-95.2515,5.0,Insurance Office,Restaurant,Baseball Field,Dessert Shop,Yoga Studio,Doctor's Office,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant
4,Alvin South,77511,163900,0.463,0.062,29.412,-95.2515,5.0,Insurance Office,Restaurant,Baseball Field,Dessert Shop,Yoga Studio,Doctor's Office,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant
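
Calling `dropna()` with no arguments removes any row that has a null in any column. That works here because the only nulls come from the missing venue data. A narrower variant, shown below as a sketch rather than the method used in this notebook, drops rows based on `Cluster Labels` alone and then casts the labels back to integers. The frame `merged` is a hypothetical three-row stand-in.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for houston_merged before the drop.
merged = pd.DataFrame({
    "Neighborhood": ["Alief", "Bayou Vista", "Dayton"],
    "Cluster Labels": [5.0, 4.0, np.nan],
    "1st Most Common Venue": ["Lounge", "Bar", None],
})

# Drop only the rows that have no cluster label.
cleaned = merged.dropna(subset=["Cluster Labels"]).copy()

# With the NaNs gone, the labels can be stored as integers again.
cleaned["Cluster Labels"] = cleaned["Cluster Labels"].astype(int)
print(cleaned)
```

Integer labels also make the `int(cluster)` conversion in the mapping cell unnecessary, although keeping it is harmless.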


In [53]:
houston_merged.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 130 entries, 0 to 146
Data columns (total 18 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Neighborhood            130 non-null    object 
 1   ZIP Code                130 non-null    int32  
 2   2016 Median Home Price  130 non-null    int64  
 3   % Growth 2010-2016      130 non-null    float64
 4   % Growth 2015-2016      130 non-null    float64
 5   latitude                130 non-null    float64
 6   longitude               130 non-null    float64
 7   Cluster Labels          130 non-null    float64
 8   1st Most Common Venue   130 non-null    object 
 9   2nd Most Common Venue   130 non-null    object 
 10  3rd Most Common Venue   130 non-null    object 
 11  4th Most Common Venue   130 non-null    object 
 12  5th Most Common Venue   130 non-null    object 
 13  6th Most Common Venue   130 non-null    object 
 14  7th Most Common Venue   130 non-null    ob

Create a color-coded map of the neighborhood clusters

In [54]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(houston_merged['latitude'], houston_merged['longitude'], houston_merged['Neighborhood'], houston_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters
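
The marker color is looked up with `rainbow[int(cluster)-1]`, so the palette is shifted by one position: cluster 1 takes the first color, and cluster 0 wraps around to the last color through Python's negative indexing. This is why Cluster 0 appears red on the map. The sketch below rebuilds the palette on its own to show the mapping, assuming `kclusters = 7` as stated in the results section.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 7  # number of clusters reported in this notebook

# Same palette as the cell above: kclusters evenly spaced rainbow colors.
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(c) for c in colors_array]

# The map cell indexes with cluster - 1, so cluster 0 maps to rainbow[-1].
for cluster in range(kclusters):
    print(cluster, rainbow[cluster - 1])
```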

Let's examine each cluster in more detail

Cluster 0 (Red dots on map)

In [55]:
houston_merged.loc[houston_merged['Cluster Labels'] == 0, houston_merged.columns[[0] + list(range(2, houston_merged.shape[1]))]]

Unnamed: 0,Neighborhood,2016 Median Home Price,% Growth 2010-2016,% Growth 2015-2016,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Braeswood Place,715000,0.585,0.014,29.6889,-95.4341,0.0,Recreation Center,Park,Yoga Studio,Dive Bar,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor
72,Knollwood/Woodside Area,430000,0.352,-0.075,29.6889,-95.4341,0.0,Recreation Center,Park,Yoga Studio,Dive Bar,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor
90,Mission Bend Area,171000,0.513,0.097,29.6947,-95.6511,0.0,Park,Yoga Studio,Dive Bar,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor


Cluster 1 (Purple dots on map)

In [56]:
houston_merged.loc[houston_merged['Cluster Labels'] == 1, houston_merged.columns[[0] + list(range(2, houston_merged.shape[1]))]]

Unnamed: 0,Neighborhood,2016 Median Home Price,% Growth 2010-2016,% Growth 2015-2016,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Atascocita South,199000,0.327,0.008,29.9507,-95.2622,1.0,Boat or Ferry,Home Service,Construction & Landscaping,Auto Workshop,Electronics Store,Business Service,Doctor's Office,Food Service,Food,Fondue Restaurant
14,Brays Oaks,225000,0.772,0.1,29.6581,-95.5413,1.0,Construction & Landscaping,Historic Site,Dive Bar,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor
20,Chambers County West,260000,0.396,0.036,29.77,-94.8608,1.0,Moving Target,Business Service,Yoga Studio,Doctor's Office,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory
21,Champions Area,234195,0.338,0.055,29.9863,-95.5208,1.0,Home Service,Locksmith,Boutique,Playground,Historic Site,Dive Bar,Fondue Restaurant,Hobby Shop,Flower Shop,Fast Food Restaurant
33,Cypress North,230995,0.32,-0.006,29.9766,-95.6358,1.0,Yoga Studio,Home Service,Video Store,Gym / Fitness Center,Dive Bar,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory
43,Fall Creek Area,302000,0.162,-0.005,29.9507,-95.2622,1.0,Boat or Ferry,Home Service,Construction & Landscaping,Auto Workshop,Electronics Store,Business Service,Doctor's Office,Food Service,Food,Fondue Restaurant
48,Fulshear/South Brookshire/Simonton,380058,0.071,-0.071,29.7217,-95.8977,1.0,Business Service,Home Service,Discount Store,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor,Event Space
65,Katy-Old Towne,257509,0.715,0.144,29.8678,-95.8298,1.0,Locksmith,Home Service,Auto Garage,Business Service,Yoga Studio,Doctor's Office,Food Service,Food,Fondue Restaurant,Flower Shop
66,Katy-Southeast,284575,0.308,0.006,29.745,-95.7326,1.0,Home Service,Advertising Agency,Lake,Yoga Studio,Doctor's Office,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant
76,Lake Livingston Area,133500,0.233,0.161,30.6829,-94.8976,1.0,Home Service,Electronics Store,Yoga Studio,Dive Bar,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory


Cluster 2 (Dark blue dots on map)

In [57]:
houston_merged.loc[houston_merged['Cluster Labels'] == 2, houston_merged.columns[[0] + list(range(2, houston_merged.shape[1]))]].head(50)

Unnamed: 0,Neighborhood,2016 Median Home Price,% Growth 2010-2016,% Growth 2015-2016,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Bellaire,933000,0.333,0.009,29.7023,-95.4611,2.0,Pharmacy,Yoga Studio,Dive Bar,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor
83,Memorial Close In,2348200,1.539,0.269,29.7696,-95.5201,2.0,Clothing Store,Business Service,Yoga Studio,Doctor's Office,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory
85,Memorial Villages,1368088,0.41,-0.148,29.7696,-95.5201,2.0,Clothing Store,Business Service,Yoga Studio,Doctor's Office,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory
107,Rice/Museum District,819200,0.366,0.1,29.7179,-95.4263,2.0,Boutique,Pet Store,Clothing Store,Beach,Bakery,Yoga Studio,Electronics Store,Food Service,Food,Fondue Restaurant
108,River Oaks Area,2194000,0.809,0.139,29.7517,-95.4054,2.0,Men's Store,Gym,Cosmetics Shop,Spa,Clothing Store,Kids Store,French Restaurant,Italian Restaurant,Café,Gym / Fitness Center
109,Rivercrest,2875000,0.597,0.042,29.7404,-95.5589,2.0,Hotel,Dessert Shop,Pizza Place,Gas Station,Shipping Store,Bank,Bakery,Doctor's Office,Music Store,Juice Bar
129,Tanglewood Area,1643500,0.758,0.028,29.7446,-95.4683,2.0,Hotel,Hotel Bar,Automotive Shop,Gym / Fitness Center,Deli / Bodega,Pharmacy,Clothing Store,Residential Building (Apartment / Condo),Sandwich Place,Shopping Mall
142,West University/Southside Area,1192000,0.558,-0.026,29.7179,-95.4263,2.0,Boutique,Pet Store,Clothing Store,Beach,Bakery,Yoga Studio,Electronics Store,Food Service,Food,Fondue Restaurant


Cluster 3 (Light blue dots on map)

In [58]:
houston_merged.loc[houston_merged['Cluster Labels'] == 3, houston_merged.columns[[0] + list(range(2, houston_merged.shape[1]))]]

Unnamed: 0,Neighborhood,2016 Median Home Price,% Growth 2010-2016,% Growth 2015-2016,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,Denver Harbor,105000,1.543,0.235,29.7758,-95.3121,3.0,Fast Food Restaurant,Sandwich Place,Pizza Place,Mexican Restaurant,Pharmacy,Gas Station,Discount Store,Bus Station,Grocery Store,Intersection
39,East End-Galveston,176900,0.474,0.109,29.2983,-94.793,3.0,Museum,Pharmacy,College Football Field,Library,Mexican Restaurant,Fried Chicken Joint,Donut Shop,Bar,Monument / Landmark,Taco Place
51,Greenway Plaza,1005000,1.781,0.951,29.733,-95.4306,3.0,Mexican Restaurant,Bank,Hotel,Thai Restaurant,Seafood Restaurant,Spa,Sandwich Place,Burger Joint,Bar,Café
61,Humble Area South,108300,0.74,0.61,30.0041,-95.2825,3.0,Fried Chicken Joint,Sandwich Place,Fast Food Restaurant,Department Store,Mexican Restaurant,Sporting Goods Shop,Smoke Shop,Shipping Store,Seafood Restaurant,Buffet
73,La Marque,137000,0.73,0.489,29.3676,-94.9742,3.0,Burger Joint,Discount Store,Yoga Studio,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor
95,Northeast Houston,91000,0.685,0.022,29.8297,-95.2879,3.0,Fried Chicken Joint,Burger Joint,Convenience Store,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor
96,Northside,105000,1.121,0.141,29.8324,-95.472,3.0,Hotel,Gym / Fitness Center,Dive Bar,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor
99,Oak Forest West Area,250000,0.678,0.004,29.8324,-95.472,3.0,Hotel,Gym / Fitness Center,Dive Bar,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor
110,Riverside,309000,0.907,-0.031,29.7247,-95.3625,3.0,Hotel,Bagel Shop,Food Service,Theater,Cajun / Creole Restaurant,Burger Joint,Snack Place,College Rec Center,Fast Food Restaurant,Factory
130,Texas City,118500,0.823,0.274,29.397,-94.9203,3.0,Convenience Store,Soccer Field,Clothing Store,Discount Store,Bakery,Donut Shop,Food Service,Food,Fondue Restaurant,Flower Shop


Cluster 4 (Teal dots on map)

In [59]:
houston_merged.loc[houston_merged['Cluster Labels'] == 4, houston_merged.columns[[0] + list(range(2, houston_merged.shape[1]))]]

Unnamed: 0,Neighborhood,2016 Median Home Price,% Growth 2010-2016,% Growth 2015-2016,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Bayou Vista,240000,0.601,0.16,29.3398,-94.9926,4.0,Bar,Yoga Studio,Doctor's Office,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor
56,Hitchcock,114900,0.017,-0.123,29.3398,-94.9926,4.0,Bar,Yoga Studio,Doctor's Office,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor
100,Omega Bay,289900,0.734,0.035,29.3398,-94.9926,4.0,Bar,Yoga Studio,Doctor's Office,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor


Cluster 5 (Light brown dots on map)

In [62]:
houston_merged.loc[houston_merged['Cluster Labels'] == 5, houston_merged.columns[[0] + list(range(2, houston_merged.shape[1]))]].head(50)

Unnamed: 0,Neighborhood,2016 Median Home Price,% Growth 2010-2016,% Growth 2015-2016,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1960/Cypress,179000,0.455,0.085,29.9319,-95.6106,5.0,Pizza Place,Mexican Restaurant,Intersection,Donut Shop,Yoga Studio,Doctor's Office,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant
1,Aldine Area,133500,0.571,0.077,29.9067,-95.3334,5.0,Pizza Place,Pharmacy,Mexican Restaurant,Fast Food Restaurant,Chinese Restaurant,Taco Place,Gas Station,Dive Bar,Flower Shop,Factory
2,Alief,164000,0.802,0.147,29.699,-95.5862,5.0,Lounge,Park,Gas Station,Yoga Studio,Doctor's Office,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory
3,Alvin North,227000,0.434,0.056,29.412,-95.2515,5.0,Insurance Office,Restaurant,Baseball Field,Dessert Shop,Yoga Studio,Doctor's Office,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant
4,Alvin South,163900,0.463,0.062,29.412,-95.2515,5.0,Insurance Office,Restaurant,Baseball Field,Dessert Shop,Yoga Studio,Doctor's Office,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant
5,Atascocita North,189900,0.461,0.043,30.0042,-95.1728,5.0,Department Store,Coffee Shop,Trail,Big Box Store,Furniture / Home Store,Shoe Store,Business Service,Gym / Fitness Center,Spa,Paper / Office Supplies Store
7,Bacliff/San Leon,165941,0.739,0.153,29.5055,-94.9893,5.0,Mexican Restaurant,Furniture / Home Store,Yoga Studio,Doctor's Office,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor
9,Baytown/Chambers County,167950,0.375,0.059,29.7705,-94.9695,5.0,Hotel,Sandwich Place,Dive Bar,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor
10,Baytown/Harris County,149900,0.486,0.063,29.7461,-94.9653,5.0,Martial Arts Dojo,Airport,Dessert Shop,Convenience Store,Hobby Shop,Doctor's Office,Food Service,Food,Fondue Restaurant,Flower Shop
15,Briargrove,824000,0.511,0.03,29.7422,-95.4903,5.0,Steakhouse,Italian Restaurant,Spa,Cajun / Creole Restaurant,Smoothie Shop,Smoke Shop,Sporting Goods Shop,Men's Store,Restaurant,Diner


Cluster 6 (Orange dots on map)

In [61]:
houston_merged.loc[houston_merged['Cluster Labels'] == 6, houston_merged.columns[[0] + list(range(2, houston_merged.shape[1]))]]

Unnamed: 0,Neighborhood,2016 Median Home Price,% Growth 2010-2016,% Growth 2015-2016,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Bear Creek,162500,0.548,0.062,29.8323,-95.736,6.0,Home Service,Yoga Studio,Dive Bar,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor
28,Conroe Southwest,325000,0.383,-0.044,30.3217,-95.5285,6.0,Home Service,Locksmith,Yoga Studio,Doctor's Office,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory
29,Copperfield Area,204000,0.333,0.046,29.8941,-95.6481,6.0,Home Service,Yoga Studio,Dive Bar,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor
64,Katy-North,182000,0.517,0.058,29.8323,-95.736,6.0,Home Service,Yoga Studio,Dive Bar,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor
74,La Porte/Shoreacres,159900,0.355,0.088,29.6884,-95.0513,6.0,Home Service,Food,Yoga Studio,Dive Bar,Food Service,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor
104,Porter/New Caney East,227000,1.536,-0.025,30.1579,-95.198,6.0,Home Service,Yoga Studio,Dive Bar,Food Service,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor
112,Santa Fe,181250,0.353,0.076,29.4032,-95.0734,6.0,Home Service,Garden Center,Yoga Studio,Dive Bar,Food,Fondue Restaurant,Flower Shop,Fast Food Restaurant,Factory,Eye Doctor


### Results and Discussion <a id="results"></a>

I used 7 clusters in the KMeans model.  The clusters show distinct differences and group the neighborhoods well.  Below is a short summary of each cluster with some observations and comments for the client.

Cluster 0:  These neighborhoods are represented by the red dots on the map.  The most common venue for these is a park or recreation center.  The remaining venues are oriented toward residential customers.  Home value appreciation for these neighborhoods was good, with all over 30% for the 2010 - 2016 period.

Cluster 1:  These neighborhoods are represented by purple dots on the map.  They are grouped because the first or second most common venue is oriented toward industrial customers.  This suggests that many of the neighborhoods are located in or near an industrial area.  Home appreciation for the 2010 - 2016 period ranged from under 10% to over 100%.

Cluster 2:  These neighborhoods are represented by dark blue dots on the map.  These are most of the luxury neighborhoods in the area, with every 2016 median home price over $800,000 and all but 2 neighborhoods over $1 million.  The venue data is consistent with this, with boutiques, clothing stores, and hotels among the most common venues.

Cluster 3:  These neighborhoods are represented by light blue dots on the map.  They show strong growth over the 2010 - 2016 period.  All are over 60% growth with the exception of one at 47%.  Several had growth rates over 100%.  The venue data shows that most of the popular venues are oriented toward residential customers.

Cluster 4:  These neighborhoods are represented by teal dots on the map.  There are only three neighborhoods in this group, and they all share the same venue data because they map to the same coordinates.  The most common venue is a bar, followed by a yoga studio and a doctor's office.  This suggests that these neighborhoods are close to some type of retail center.  Growth ranged from just under 2% to over 70% for the 2010 - 2016 period.

Cluster 5:  These neighborhoods are represented by light brown dots on the map.  This is the largest group of neighborhoods.  Growth for these covers a wide range, from little growth to nearly 100%.  The venue data, while diverse, does indicate that they are located near retail centers.  Most of these neighborhoods had moderate home values in 2016.

Cluster 6:  These neighborhoods are represented by orange dots on the map.  Growth for all these neighborhoods was good, with all over 30% and one over 100%.  Home service is the most common venue for every neighborhood in the group.  All are located farther out from the center of the metro area.  From the data, I assume that many venues are oriented toward residential services.
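
The price and growth ranges quoted in these summaries can be checked with a per-cluster aggregation. The sketch below is not part of the original analysis: it rebuilds a small frame from the rows shown in the Cluster 0, 2 and 4 tables and summarizes it with `groupby`. The frame name `sample` and the output column names are hypothetical; on the full data the same call could be applied to `houston_merged`.

```python
import pandas as pd

# Rows copied from the Cluster 0, 2 and 4 tables above (subset of columns).
rows = [
    ("Braeswood Place", 715000, 0.585, 0),
    ("Knollwood/Woodside Area", 430000, 0.352, 0),
    ("Mission Bend Area", 171000, 0.513, 0),
    ("Bellaire", 933000, 0.333, 2),
    ("Memorial Close In", 2348200, 1.539, 2),
    ("Memorial Villages", 1368088, 0.410, 2),
    ("Rice/Museum District", 819200, 0.366, 2),
    ("River Oaks Area", 2194000, 0.809, 2),
    ("Rivercrest", 2875000, 0.597, 2),
    ("Tanglewood Area", 1643500, 0.758, 2),
    ("West University/Southside Area", 1192000, 0.558, 2),
    ("Bayou Vista", 240000, 0.601, 4),
    ("Hitchcock", 114900, 0.017, 4),
    ("Omega Bay", 289900, 0.734, 4),
]
sample = pd.DataFrame(
    rows,
    columns=[
        "Neighborhood",
        "2016 Median Home Price",
        "% Growth 2010-2016",
        "Cluster Labels",
    ],
)

# One row per cluster: size, price floor and median, and growth range.
summary = sample.groupby("Cluster Labels").agg(
    neighborhoods=("Neighborhood", "count"),
    min_price=("2016 Median Home Price", "min"),
    median_price=("2016 Median Home Price", "median"),
    min_growth=("% Growth 2010-2016", "min"),
    max_growth=("% Growth 2010-2016", "max"),
)
print(summary)
```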

My recommendation to the residential builder is as follows.  If they want to build luxury homes, the neighborhoods in cluster 2 are a good starting point.  If they are looking to build moderately priced custom homes or planned neighborhoods, clusters 0, 3, 5, and 6 might be good areas to start looking.  Many of the cluster 3 neighborhoods are located close to the center of the metro area and may be a good place to look for urban-style living, while clusters 0, 5, and 6 are generally farther out and might be better for suburban living.  Clusters 1 and 4 seem to be located near industrial or major retail areas and should be investigated for further insight.

### Conclusion <a id="conclusion"></a>

We were able to pull in data from several sources and run an analysis to help determine a good location for a residential home builder.  We analyzed most of the neighborhoods using real estate metrics and used Foursquare to identify the top 10 venues in each neighborhood.  Based on this analysis, we have provided the home builder with several neighborhoods that could be targets for further study, depending on the type and style of homes they want to build.