# *Investigation of Houston City Neighborhoods*

## **1. Introduction**

#### **1.1 Background**


The city of Houston is one of the largest cities in Texas. It is the fourth most populous solo city in the United States. Houston is famous for its economy in broad industries such as energy, manufacturing, aeronautics, and medical. 

Houston is the largest city in the United States by total area. It has many neighborhoods. Houson city has been described as the most racially and ethnically diverse major metropolis in the U.S.A. It is home of many cultural institutions, restaurants and millions of people from all over the world. 

#### **1.2 Business Problem**

Houston is a big city and heavily diversified. Most people don't know the features of the neighborhoods outside their living/working area. It will be interesting to investigate the neighborhoods, hence people can quickly find places to eat, shop, entertaiment, etc. This project is going to investigate the venues within neighborhoods of Houston city and make a thorough report.

#### **1.3 Interest**

Then of thousands of people visit Houston over the world each year. Knowing more about the city before arrival is always helpful. This investigation will be very handy for visitors to get familiar with the city and plan activities. Besides that, Houston residents also benefit from the results and can enjoy more about this great city.

## **2. Data Acquisition**

#### **2.1 Data Sources**

This project is divided into two parts: 1. data acquisition; 2. data process and analysis. We will first download the neighborhoods information from the Wikipedia page. Next use the Geocoder Python package to get the latitude and the longitude coordinates of each neighborhood. Then use the development API to access Foursquare database to explore the neighborhoods. Finally we will cluster the neighborhoods into certain groups and plot them on maps via folium tool.

The data acquisition steps are:
1. Get the neighborhood names from the Wikipedia page of Houston.
2. Use the Geocoder Python package to get the latitude and the longitude coordinates of each neighborhood.
3. Use the Foursquare to collect the venues of the neighborhoods.

The data process and analysis steps are:
1. Clean the data and prepare for clustering
2. Check the venues and plot on the map

#### **2.2 Download Raw Data**

<font color='blue'> __Get the Houston's neighborhood information from Wikipedia page.__ </font> <br>
https://en.wikipedia.org/wiki/List_of_Houston_neighborhoods

In [1]:
!pip install lxml



In [2]:
import pandas as pd
url='https://en.wikipedia.org/wiki/List_of_Houston_neighborhoods'
tables = pd.read_html(url)
table = tables[0]
table.head()

Unnamed: 0,#,Name,Location relative to Downtown Houston,Approximate boundaries
0,1,Willowbrook,Northwest,Along Texas State Highway 249 northwest of Bel...
1,2,Greater Greenspoint,North,Around the junction of Beltway 8 and Interstat...
2,3,Carverdale,Northwest,South of the junction of Beltway 8 and U.S. Ro...
3,4,Fairbanks / Northwest Crossing,Northwest,Along U.S. Route 290 between Interstate 610 an...
4,5,Greater Inwood,Northwest,North of Fairbanks / Northwest Crossing and ea...


## **3. Methodology**

The table from the Wikipedia page needs a lot of cleaning work prior of use. First, we perform some basic analysis to the table, drop useless columns and rename certain clumns for easy understanding.

When looking at the table more carefully, the neighborhood name is not always regular. For example, some of these are A/B pattern. Such pattern may cause problem when getting the Latitude/Longitude information from the Geocode. In this case, it will be splitted into 2 more new columns for A & B. There are some neighborhoods have parenthesis, and it may cause similar issues. This will also be taken care of using the lambda function. Once all the columns are created, a loop with priority will be set to get the geocode information. 

Once the longitude and lengitude information are successfully achieved, a good check will be performed using the Folium to create maps showing the neighborhoods. This will present the neighborhoods of the Houston city to confirm the data were collected correctly.

The next step will use the Foursquare credential to get the venues information for all the neighborhoods. Foursquare can help us to get the venues with given latitude/longitude, the output is json format. We can flat that format to get data frame, grab the key information we need, and prepare a new data frame.

After we have the venues information for all the neighborhoods, we will test K-Means method to group the data. Customer segmentation is the practice of partitioning
a customer base into groups of individuals that have similar characteristics. In our case, it is unsupervised. And K-means can group data based on the similarity of neighborhoods to each other.


#### **3.1 Exploratory Data Analysis (EDA)**

<font color='blue'> __Basic analysis to the table.__ </font>

In [3]:
print(table.shape,'\n' ,table.columns, '\n' ,table.index)

(88, 4) 
 Index(['#', 'Name', 'Location relative to Downtown Houston',
       'Approximate boundaries'],
      dtype='object') 
 RangeIndex(start=0, stop=88, step=1)


In [4]:
table.describe(include='all')

Unnamed: 0,#,Name,Location relative to Downtown Houston,Approximate boundaries
count,88.0,88,88,88
unique,,88,9,88
top,,Lazybrook / Timbergrove,Southwest,"North of Buffalo Bayou, east of Elysian Street..."
freq,,1,15,1
mean,44.5,,,
std,25.547342,,,
min,1.0,,,
25%,22.75,,,
50%,44.5,,,
75%,66.25,,,


<font color='blue'> __Drop the unnecessary columns, and rename the columns for easy understanding.__ </font>

In [5]:
table.drop(['#','Approximate boundaries'], axis=1, inplace=True)
table.rename(columns={'Location relative to Downtown Houston':'Location','Name':'Neighborhood'},inplace=True)
table.head()

Unnamed: 0,Neighborhood,Location
0,Willowbrook,Northwest
1,Greater Greenspoint,North
2,Carverdale,Northwest
3,Fairbanks / Northwest Crossing,Northwest
4,Greater Inwood,Northwest


<font color='blue'> __Save a copy of the table and start to work on data cleaning.__ </font>

In [6]:
table1=table.copy()
table1.iloc[1:16:2,:]

Unnamed: 0,Neighborhood,Location
1,Greater Greenspoint,North
3,Fairbanks / Northwest Crossing,Northwest
5,Acres Home,Northwest
7,Westbranch,West
9,Spring Branch West,West
11,Central Northwest (formerly Near Northwest),Northwest
13,Lazybrook / Timbergrove,Northwest
15,Memorial,West


<font color='blue'> __Looking at the table, there are some cleaning work needed, otherwise it will bring problems when loading the geocode information.__ </font> <br>
<font color='blue'> __1. There are some A/B patterns, which will be split into 2 columns for A & B each.__ </font> <br>
<font color='blue'> __2. There are some values have parenthesis() with values, which will also be split into columns.__ </font> <br>


In [7]:
table1['Left1']=table1['Neighborhood'].apply(lambda x:x.split("/")[0])
table1['Right1']=table1['Neighborhood'].apply(lambda x:x.split("/")[-1])
table1['Left1Left2']=table1['Left1'].apply(lambda x:x.split("(")[0])
table1['Left1Right2']=table1['Left1'].apply(lambda x:x.split("(")[-1])
table1['Left1Right2']=table1['Left1Right2'].apply(lambda x:x.split(")")[0])
table1['Left1Right2']=table1['Left1Right2'].apply(lambda x:x.split("formerly")[-1])
table1['Right1Left2']=table1['Right1'].apply(lambda x:x.split("(")[0])
table1['Right1Right2']=table1['Right1'].apply(lambda x:x.split("(")[-1])
table1['Right1Right2']=table1['Right1Right2'].apply(lambda x:x.split(")")[0])
table1['Right1Right2']=table1['Right1Right2'].apply(lambda x:x.split("formerly")[-1])
table1.drop(['Left1','Right1'],axis=1,inplace=True)
table1.rename(columns={'Left1Left2':'Name_1','Left1Right2':'Name_2','Right1Left2':'Name_3','Right1Right2':'Name_4'},inplace=True)
table1.iloc[1:12:2,:]

Unnamed: 0,Neighborhood,Location,Name_1,Name_2,Name_3,Name_4
1,Greater Greenspoint,North,Greater Greenspoint,Greater Greenspoint,Greater Greenspoint,Greater Greenspoint
3,Fairbanks / Northwest Crossing,Northwest,Fairbanks,Fairbanks,Northwest Crossing,Northwest Crossing
5,Acres Home,Northwest,Acres Home,Acres Home,Acres Home,Acres Home
7,Westbranch,West,Westbranch,Westbranch,Westbranch,Westbranch
9,Spring Branch West,West,Spring Branch West,Spring Branch West,Spring Branch West,Spring Branch West
11,Central Northwest (formerly Near Northwest),Northwest,Central Northwest,Near Northwest,Central Northwest,Near Northwest


<font color='blue'> __In addition, there are a few neighborhoods need minor editing in order to get the geocode information.__ </font> 

In [8]:
#del table2
table2=table1.copy()
for r in (("Eastex - Jensen","Eastex"),
          ("Greater Fifth Ward","Fifth Ward"),
          ("South Main","South Main Street"),
          ("Greater Hobby Area","Hobby Area")):
    table2 = table2.replace(*r)

In [9]:
table2

Unnamed: 0,Neighborhood,Location,Name_1,Name_2,Name_3,Name_4
0,Willowbrook,Northwest,Willowbrook,Willowbrook,Willowbrook,Willowbrook
1,Greater Greenspoint,North,Greater Greenspoint,Greater Greenspoint,Greater Greenspoint,Greater Greenspoint
2,Carverdale,Northwest,Carverdale,Carverdale,Carverdale,Carverdale
3,Fairbanks / Northwest Crossing,Northwest,Fairbanks,Fairbanks,Northwest Crossing,Northwest Crossing
4,Greater Inwood,Northwest,Greater Inwood,Greater Inwood,Greater Inwood,Greater Inwood
5,Acres Home,Northwest,Acres Home,Acres Home,Acres Home,Acres Home
6,Hidden Valley,North,Hidden Valley,Hidden Valley,Hidden Valley,Hidden Valley
7,Westbranch,West,Westbranch,Westbranch,Westbranch,Westbranch
8,Addicks / Park Ten,West,Addicks,Addicks,Park Ten,Park Ten
9,Spring Branch West,West,Spring Branch West,Spring Branch West,Spring Branch West,Spring Branch West


#### **3.2 Data Collection of Latitude/Longitude**

In [11]:
!conda install -c conda-forge geopy --yes 
# uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim 
# convert an address into latitude and longitude values

Collecting package metadata: ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: C:\ProgramData\Anaconda3

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2019.3.9           |           py37_0         149 KB  conda-forge
    conda-4.7.12               |           py37_0         3.0 MB  conda-forge
    conda-package-handling-1.6.0|   py37h2fa13f4_0         733 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-1.20.0               |             py_0          57 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         4.0 MB

The following NEW packages will be INSTALLED:

  conda-package-han~ conda-forge/win-64::conda-package-handling-1.6.0-py37h2fa13f4_0
  ge



  current version: 4.6.11
  latest version: 4.7.12

Please update conda by running

    $ conda update -n base -c defaults conda



EnvironmentNotWritableError: The current user does not have write permissions to the target environment.
  environment location: C:\ProgramData\Anaconda3




In [12]:
pd.set_option('display.max_rows', 100)

<font color='blue'> __As shown above, we will use a for loop to use the location information from Neighborhood and splitted names, each by each, \
    and then collect the latitude/longitude information from the geocode. We also output those into lists.__ </font> 

In [13]:
list_0=table1['Neighborhood'].tolist()
list_1=table1['Name_1'].tolist()
list_2=table1['Name_2'].tolist()
list_3=table1['Name_3'].tolist()
list_4=table1['Name_4'].tolist()
i=0
latitude_list=[]
longitude_list=[]
for i in range(0,88):
    data=table2['Neighborhood'][i]
    neighbor=data+', Houston, TX'
    geolocator = Nominatim(user_agent="houston_explorer")
    location = geolocator.geocode(neighbor)
    if location!= None:
        latitude = location.latitude
        longitude = location.longitude
    else:
        data0=table2['Neighborhood'][i]
        neighbor0=data0+', TX'
        geolocator = Nominatim(user_agent="houston_explorer")
        location = geolocator.geocode(neighbor0)
        if location!= None:
            latitude = location.latitude
            longitude = location.longitude
        else:
            data1=table2['Name_1'][i]
            neighbor1=data1+', Houston, TX'
            geolocator = Nominatim(user_agent="houston_explorer")
            location = geolocator.geocode(neighbor1)
            if location!= None:
                latitude = location.latitude
                longitude = location.longitude
            else:
                data2=table2['Name_2'][i]
                neighbor2=data2+', Houston, TX'
                geolocator = Nominatim(user_agent="houston_explorer")
                location = geolocator.geocode(neighbor2)
                if location!= None:
                    latitude = location.latitude
                    longitude = location.longitude
                else:
                    data3=table2['Name_3'][i]
                    neighbor3=data3+', Houston, TX'
                    geolocator = Nominatim(user_agent="houston_explorer")
                    location = geolocator.geocode(neighbor3)
                    if location!= None:
                        latitude = location.latitude
                        longitude = location.longitude
                    else:
                        data4=table2['Name_4'][i]
                        neighbor4=data4+', Houston, TX'
                        geolocator = Nominatim(user_agent="houston_explorer")
                        location = geolocator.geocode(neighbor4)
                        if location!= None:
                            latitude = location.latitude
                            longitude = location.longitude
                        else:
                            latitude = 'NaN'
                            longitude = 'NaN'
    print('Number',i,': The geograpical coordinate of ', data , 'are {}, {}.'.format(latitude, longitude))
    latitude_list.append(latitude)
    longitude_list.append(longitude)
    i=i+1

Number 0 : The geograpical coordinate of  Willowbrook are 29.66025385, -95.4560961512888.
Number 1 : The geograpical coordinate of  Greater Greenspoint are 29.944719, -95.4160739082852.
Number 2 : The geograpical coordinate of  Carverdale are 29.8486866, -95.5394501446645.
Number 3 : The geograpical coordinate of  Fairbanks / Northwest Crossing are 29.85382035, -95.5045965150636.
Number 4 : The geograpical coordinate of  Greater Inwood are 51.4626008, -0.361756032593902.
Number 5 : The geograpical coordinate of  Acres Home are 29.770793, -95.328129.
Number 6 : The geograpical coordinate of  Hidden Valley are 39.489543, -119.752991.
Number 7 : The geograpical coordinate of  Westbranch are 29.839541, -95.5515223510907.
Number 8 : The geograpical coordinate of  Addicks / Park Ten are 29.7824508, -95.6424457.
Number 9 : The geograpical coordinate of  Spring Branch West are 29.8014103, -95.5478793.
Number 10 : The geograpical coordinate of  Langwood are 29.8262816, -95.4827951690116.
Number

<font color='blue'> __Now let's add these latitude/longitude information to the table, clean that, and review the table.__ </font> 

In [14]:
table2['latitude']=latitude_list
table2['longitude']=longitude_list
table2.drop(['Name_1','Name_2','Name_3','Name_4'],axis=1, inplace=True)
table2['ID']=list(table2.index+1)
#table2=table2[['ID','Neighborhood','Location','latitude','longitude']]
#table2.set_index('ID',inplace=True)
table2

Unnamed: 0,Neighborhood,Location,latitude,longitude,ID
0,Willowbrook,Northwest,29.660254,-95.456096,1
1,Greater Greenspoint,North,29.944719,-95.416074,2
2,Carverdale,Northwest,29.848687,-95.53945,3
3,Fairbanks / Northwest Crossing,Northwest,29.85382,-95.504597,4
4,Greater Inwood,Northwest,51.462601,-0.361756,5
5,Acres Home,Northwest,29.770793,-95.328129,6
6,Hidden Valley,North,39.489543,-119.752991,7
7,Westbranch,West,29.839541,-95.551522,8
8,Addicks / Park Ten,West,29.782451,-95.642446,9
9,Spring Branch West,West,29.80141,-95.547879,10


<font color='blue'> __To confirm, let's check if there is any NaN in the table.__ </font> 

In [15]:
table2[table2.isnull().any(axis=1)]

Unnamed: 0,Neighborhood,Location,latitude,longitude,ID


In [16]:
table2[table2.Neighborhood.str.contains('NaN')]

Unnamed: 0,Neighborhood,Location,latitude,longitude,ID


#### **3.3 Houston Map Using Latitude/Longitude Collected**

In [18]:
import json # library to handle JSON files

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
# uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library


Collecting package metadata: ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: C:\ProgramData\Anaconda3

  added / updated specs:
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    altair-3.2.0               |           py37_0         749 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         749 KB

The following NEW packages will be INSTALLED:

  altair             conda-forge/win-64::altair-3.2.0-py37_0
  branca             conda-forge/noarch::branca-0.3.1-py_0
  conda-package-han~ conda-forge/win-64::conda-package-handling-1.6.0-py37h2fa13f4_0
  folium             conda-forge/noarch::folium-0.5.0-py_0
  vincent            conda-forge/noarch::vincent-0.4.4-py_1

The following packages will be UPDATED:

  conda                      



  current version: 4.6.11
  latest version: 4.7.12

Please update conda by running

    $ conda update -n base -c defaults conda



EnvironmentNotWritableError: The current user does not have write permissions to the target environment.
  environment location: C:\ProgramData\Anaconda3




In [19]:
# create map of Houston using latitude and longitude values
map_houston = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neigh, ID in zip(table2['latitude'], table2['longitude'], table2['Neighborhood'], table2['ID']):
    label = '{}, {}'.format(neigh, ID)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_houston)  
    
map_houston

In [20]:
table2

Unnamed: 0,Neighborhood,Location,latitude,longitude,ID
0,Willowbrook,Northwest,29.660254,-95.456096,1
1,Greater Greenspoint,North,29.944719,-95.416074,2
2,Carverdale,Northwest,29.848687,-95.53945,3
3,Fairbanks / Northwest Crossing,Northwest,29.85382,-95.504597,4
4,Greater Inwood,Northwest,51.462601,-0.361756,5
5,Acres Home,Northwest,29.770793,-95.328129,6
6,Hidden Valley,North,39.489543,-119.752991,7
7,Westbranch,West,29.839541,-95.551522,8
8,Addicks / Park Ten,West,29.782451,-95.642446,9
9,Spring Branch West,West,29.80141,-95.547879,10


#### **3.4 Define Foursquare Credentials and Version**

In [21]:
CLIENT_ID = '2XIKZYRLDQYID3TKI15J30RCPNPKYFMNTXNELVCWTVYYA0UD' # your Foursquare ID
CLIENT_SECRET = '4QEYKHANLVBUBZHGYIGN1TXFBXGXF2KKREZ0XWP3UVXFANXL' # your Foursquare Secret
VERSION = '20191014'
#LIMIT = 30
print('My credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

My credentails:
CLIENT_ID: 2XIKZYRLDQYID3TKI15J30RCPNPKYFMNTXNELVCWTVYYA0UD
CLIENT_SECRET:4QEYKHANLVBUBZHGYIGN1TXFBXGXF2KKREZ0XWP3UVXFANXL


<font color='blue'> __As a test, let's check the top 100 venues that are in a downtown neighborhood within a radius of 500 meters.__ </font> 

In [22]:
downtown_latitude = table2.loc[35, 'latitude'] # neighborhood latitude value
downtown_longitude = table2.loc[35, 'longitude'] # neighborhood longitude value

downtown_name = table2.loc[35, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} Houston are {}, {}.'.format(downtown_name, 
                                                               downtown_latitude, 
                                                               downtown_longitude))

Latitude and longitude values of Brays Oaks (formerly Greater Fondren S.W.) Houston are 29.724266, -95.286365.


In [23]:
LIMIT = 10 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    downtown_latitude, 
    downtown_longitude, 
    radius, 
    LIMIT)
url # display URL


'https://api.foursquare.com/v2/venues/explore?&client_id=2XIKZYRLDQYID3TKI15J30RCPNPKYFMNTXNELVCWTVYYA0UD&client_secret=4QEYKHANLVBUBZHGYIGN1TXFBXGXF2KKREZ0XWP3UVXFANXL&v=20191014&ll=29.724266,-95.286365&radius=1000&limit=10'

In [24]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5da53e7ebf7dde002c1a97d9'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Eastside',
  'headerFullLocation': 'Eastside, Houston',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 9,
  'suggestedBounds': {'ne': {'lat': 29.73326600900001,
    'lng': -95.27602070989629},
   'sw': {'lat': 29.715265990999992, 'lng': -95.29670929010372}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4ba673cbf964a5200c5439e3',
       'name': 'Mason Park',
       'location': {'address': '449-499 S 75th St',
        'lat': 29.7305507051007,
        'lng': -95.29315818029116,
        'labeledLatLngs': [{'label': 'display',
          'lat': 29.73055070

<font color='blue'> __From the Foursquare lab in the previous module, we know that all the information is in the _items_ key. Before we proceed, let's borrow the _get_category_type_ function from the Foursquare lab.__ </font> 

In [25]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [26]:
venues = results['response']['groups'][0]['items']
venues

[{'reasons': {'count': 0,
   'items': [{'summary': 'This spot is popular',
     'type': 'general',
     'reasonName': 'globalInteractionReason'}]},
  'venue': {'id': '4ba673cbf964a5200c5439e3',
   'name': 'Mason Park',
   'location': {'address': '449-499 S 75th St',
    'lat': 29.7305507051007,
    'lng': -95.29315818029116,
    'labeledLatLngs': [{'label': 'display',
      'lat': 29.7305507051007,
      'lng': -95.29315818029116}],
    'distance': 959,
    'postalCode': '77023',
    'cc': 'US',
    'city': 'Houston',
    'state': 'TX',
    'country': 'United States',
    'formattedAddress': ['449-499 S 75th St',
     'Houston, TX 77023',
     'United States']},
   'categories': [{'id': '4bf58dd8d48988d163941735',
     'name': 'Park',
     'pluralName': 'Parks',
     'shortName': 'Park',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/park_',
      'suffix': '.png'},
     'primary': True}],
   'photos': {'count': 0, 'groups': []}},
  'referralId': 'e-0-4b

In [27]:
nearby_venues = json_normalize(venues) # flatten JSON
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues

Unnamed: 0,name,categories,lat,lng
0,Mason Park,Park,29.730551,-95.293158
1,El Jardin Mexican Restaurant,Mexican Restaurant,29.730553,-95.284302
2,Domino's Pizza,Pizza Place,29.7165,-95.2898
3,Jack in the Box,Fast Food Restaurant,29.732392,-95.29044
4,REY DEL POLLO- THE CHICKEN KING,Mexican Restaurant,29.716755,-95.291112
5,Mason Park Baseball Field,Baseball Field,29.724155,-95.294011
6,Rostisados Venecia,Food,29.72308,-95.278568
7,La Victoria,Taco Place,29.717383,-95.292643
8,Martini True Value Hardware,Hardware Store,29.71739,-95.29178


#### **3.5 Using Foursquare to Investigate All the neighborhoods**

In [28]:
def getNearbyVenues(names, latitudes, longitudes, radius=1500):
    
    venues_list=[]
    LIMIT = 100
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [30]:
houston_venues=getNearbyVenues(names=table2['Neighborhood'],
                                   latitudes=table2['latitude'],
                                   longitudes=table2['longitude']
                                  )

Willowbrook
Greater Greenspoint
Carverdale
Fairbanks / Northwest Crossing
Greater Inwood
Acres Home
Hidden Valley
Westbranch
Addicks / Park Ten
Spring Branch West
Langwood
Central Northwest (formerly Near Northwest)
Independence Heights
Lazybrook / Timbergrove
Greater Heights
Memorial
Eldridge / West Oaks
Briar Forest
Westchase
Mid-West (formerly Woodlake/Briarmeadow)
Greater Uptown
Washington Avenue Coalition / Memorial Park
Afton Oaks / River Oaks
Neartown / Montrose
Alief
Sharpstown
Gulfton
University Place
Westwood
Braeburn
Meyerland
Braeswood
Medical Center
Astrodome Area
South Main Street
Brays Oaks (formerly Greater Fondren S.W.)
Westbury
Willow Meadows / Willowbend
Fondren Gardens
Central Southwest
Fort Bend / Houston
IAH Airport
Kingwood
Lake Houston
Northside / Northline
Eastex
East Little York / Homestead
Trinity / Houston Gardens
East Houston
Settegast
Northside Village
Kashmere Gardens
El Dorado / Oates Prairie
Hunterwood
Fifth Ward
Denver Harbor / Port Houston
Pleasantvil

In [31]:
houston_venues.shape

(4259, 7)

In [32]:
houston_venues.iloc[0:2876:200,:]

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Willowbrook,29.660254,-95.456096,Cue's Burgers 'n More,29.668602,-95.462011,Burger Joint
200,Fairbanks / Northwest Crossing,29.85382,-95.504597,APP Manufacturing,29.846065,-95.514525,Factory
400,Addicks / Park Ten,29.782451,-95.642446,Residence Inn Marriot,29.7839,-95.650501,Hotel
600,Lazybrook / Timbergrove,29.803899,-95.439195,Applebee's Grill + Bar,29.802599,-95.44774,American Restaurant
800,Eldridge / West Oaks,29.742049,-95.625406,Paisa's Twin Colombian Restaurant,29.735317,-95.637883,South American Restaurant
1000,Mid-West (formerly Woodlake/Briarmeadow),29.739332,-95.535835,Fuzzy's Pizza,29.737458,-95.522356,Pizza Place
1200,Afton Oaks / River Oaks,29.7315,-95.453725,Amorino,29.743416,-95.453363,Ice Cream Shop
1400,Sharpstown,29.687611,-95.521051,Maharaja Bhog,29.68887,-95.531143,Indian Restaurant
1600,Westwood,29.682478,-95.554908,Mambo Seafood,29.683145,-95.559513,Seafood Restaurant
1800,Meyerland,29.67983,-95.467977,Shipley Do-Nuts,29.667189,-95.462706,Donut Shop


In [33]:
len(houston_venues['Neighborhood'].unique())

87

In [34]:
len(houston_venues['Venue Category'].unique())

318

In [35]:
print(houston_venues['Venue Category'].unique())

['Burger Joint' 'Bar' 'Accessories Store' 'Salon / Barbershop'
 'Breakfast Spot' 'Fried Chicken Joint' 'Dive Bar' 'Gas Station'
 'Supermarket' 'Mexican Restaurant' 'Wings Joint' 'Department Store'
 'Grocery Store' 'Restaurant' 'Fast Food Restaurant' 'Coffee Shop'
 'Video Store' 'Resort' 'Sandwich Place' 'Donut Shop' 'Taco Place'
 'Pizza Place' 'BBQ Joint' 'Park' 'Cosmetics Shop' 'Mobile Phone Shop'
 'Bank' 'Pharmacy' 'Discount Store' 'Bakery' 'Event Service'
 'Massage Studio' 'IT Services' 'Shop & Service' 'Food Truck'
 'Soccer Stadium' 'Furniture / Home Store' 'Pawn Shop' "Women's Store"
 'Movie Theater' 'American Restaurant' 'Smoothie Shop' 'Supplement Shop'
 'Japanese Restaurant' 'Shoe Store' 'Chinese Restaurant'
 'Seafood Restaurant' 'Frozen Yogurt Shop' 'Wine Bar' 'Gym'
 'Storage Facility' 'Asian Restaurant' 'Indian Restaurant'
 'Italian Restaurant' 'Hotel' 'Flea Market' 'Thrift / Vintage Store'
 'Hotel Pool' 'Shopping Mall' 'Moving Target'
 'Paper / Office Supplies Store' 'Clothi

In [36]:
houston_venues['Venue Category'].value_counts()

Fast Food Restaurant               171
Mexican Restaurant                 157
Pizza Place                        149
Coffee Shop                        131
Sandwich Place                     126
Hotel                              126
Park                                97
Burger Joint                        96
Discount Store                      80
Gas Station                         75
Bar                                 71
American Restaurant                 67
Fried Chicken Joint                 67
Pharmacy                            63
Italian Restaurant                  61
Seafood Restaurant                  61
Chinese Restaurant                  58
Grocery Store                       58
Mobile Phone Shop                   57
Restaurant                          55
Bank                                54
BBQ Joint                           52
Convenience Store                   47
Gym / Fitness Center                47
Gym                                 43
Clothing Store           

#### **3.6 Houston Neighborhoods Analysis Using K-Means Method**

In [37]:
pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_columns', 1000)

In [38]:
# one hot encoding
houston_onehot = pd.get_dummies(houston_venues[['Venue Category']], prefix="", prefix_sep="")
print(houston_onehot.shape)
#houston_onehot.iloc[0:1046:100,:]

(4259, 318)


In [39]:
houston_onehot['Neighborhood']=houston_venues['Neighborhood']
column_list=[houston_onehot.columns[-1]]+list(houston_onehot.columns[:-1])
houston_onehot=houston_onehot[column_list]
houston_onehot.iloc[0:2890:500,:]

Unnamed: 0,Neighborhood,ATM,Accessories Store,Advertising Agency,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Bath House,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bistro,Board Shop,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Buffet,Building,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Butcher,Café,Cajun / Creole Restaurant,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Casino,Check Cashing Service,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Baseball Diamond,College Residence Hall,College Theater,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Creperie,Cupcake Shop,Currency Exchange,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Electronics Store,Empanada Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor,Fabric Shop,Factory,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Football Stadium,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Go Kart Track,Golf Course,Golf Driving Range,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gymnastics Gym,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Hotpot Restaurant,Hunan Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indoor Play Area,Intersection,Irish Pub,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Library,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Motorcycle Shop,Movie Theater,Moving Target,Multiplex,Museum,Music Store,Music Venue,Nail Salon,New American Restaurant,Nightclub,Noodle House,Office,Optical Shop,Other Great Outdoors,Other Nightlife,Other Repair Shop,Outdoors & Recreation,Outlet Store,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pawn Shop,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pizza Place,Planetarium,Playground,Plaza,Pool,Pool Hall,Portuguese Restaurant,Print Shop,Pub,Public Art,Ramen Restaurant,Recreation Center,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Resort,Restaurant,Rock Club,Rugby Pitch,Salad Place,Salon / Barbershop,Sandwich Place,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,South American Restaurant,Southern / Soul Food Restaurant,Souvenir Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Storage Facility,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tiki Bar,Toll Plaza,Track,Track Stadium,Trail,Turkish Restaurant,Used Bookstore,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Willowbrook,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
500,Central Northwest (formerly Near Northwest),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1000,Mid-West (formerly Woodlake/Briarmeadow),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1500,Gulfton,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2000,Astrodome Area,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2500,Eastex,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


<font color='blue'> __From the table, let's group by taking the means of the frequency of occurrence of each category.__ </font> 

In [40]:
houston_group=houston_onehot.groupby('Neighborhood').max().reset_index()
print(houston_group.shape)
houston_group.head()

(87, 319)


Unnamed: 0,Neighborhood,ATM,Accessories Store,Advertising Agency,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Bath House,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bistro,Board Shop,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Buffet,Building,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Butcher,Café,Cajun / Creole Restaurant,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Casino,Check Cashing Service,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Baseball Diamond,College Residence Hall,College Theater,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Creperie,Cupcake Shop,Currency Exchange,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Electronics Store,Empanada Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Eye Doctor,Fabric Shop,Factory,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Football Stadium,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Go Kart Track,Golf Course,Golf Driving Range,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gymnastics Gym,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Hotpot Restaurant,Hunan Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indoor Play Area,Intersection,Irish Pub,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Library,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Motorcycle Shop,Movie Theater,Moving Target,Multiplex,Museum,Music Store,Music Venue,Nail Salon,New American Restaurant,Nightclub,Noodle House,Office,Optical Shop,Other Great Outdoors,Other Nightlife,Other Repair Shop,Outdoors & Recreation,Outlet Store,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pawn Shop,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pizza Place,Planetarium,Playground,Plaza,Pool,Pool Hall,Portuguese Restaurant,Print Shop,Pub,Public Art,Ramen Restaurant,Recreation Center,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Resort,Restaurant,Rock Club,Rugby Pitch,Salad Place,Salon / Barbershop,Sandwich Place,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,South American Restaurant,Southern / Soul Food Restaurant,Souvenir Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Storage Facility,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tiki Bar,Toll Plaza,Track,Track Stadium,Trail,Turkish Restaurant,Used Bookstore,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Acres Home,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
1,Addicks / Park Ten,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
2,Afton Oaks / River Oaks,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,1,1,0,0,1,1,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,1,0,1,1,1,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,1,0,1,0,1,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0
3,Alief,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
4,Astrodome Area,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,1,0,0,1,0,1,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,1,0,1,1,0,0,0,0,0,1,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,1,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,0


<font color='blue'>  __Foe each neighborhood, let's check the top 5 most common venues.__ </font> 

In [41]:
num_top_venues = 5
count=0

for hood in houston_group['Neighborhood']:
#    print("----"+hood+"----")
    temp = houston_group[houston_group['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    count=count+1
    if count<6:
        print("----"+hood+"----")
        print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
        print('Neighborhood sequence number as ',count,'\n')
    else:
        continue
print('Note here only print first 5 neighborhoods to check',' \n')

----Acres Home----
              venue  freq
0       Flower Shop   1.0
1       Video Store   1.0
2    Sandwich Place   1.0
3              Bank   1.0
4  Business Service   1.0
Neighborhood sequence number as  1 

----Addicks / Park Ten----
            venue  freq
0  Cosmetics Shop   1.0
1     Pizza Place   1.0
2      Sports Bar   1.0
3            Café   1.0
4  Shipping Store   1.0
Neighborhood sequence number as  2 

----Afton Oaks / River Oaks----
            venue  freq
0   Jewelry Store   1.0
1      Steakhouse   1.0
2     Coffee Shop   1.0
3      Nail Salon   1.0
4  Clothing Store   1.0
Neighborhood sequence number as  3 

----Alief----
              venue  freq
0  Football Stadium   1.0
1  Asian Restaurant   1.0
2          Pharmacy   1.0
3    Discount Store   1.0
4        Restaurant   1.0
Neighborhood sequence number as  4 

----Astrodome Area----
                   venue  freq
0             Hookah Bar   1.0
1             Taco Place   1.0
2             Shoe Store   1.0
3     Chinese

<font color='blue'>  __Now let's create the new dataframe and display the top 5 venues for each neighborhood.__ </font> 

In [42]:
import numpy as np
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [43]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = houston_group['Neighborhood']

for ind in np.arange(houston_group.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(houston_group.iloc[ind, :], num_top_venues)

print('Dataframe shape is',neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted.loc[0:38:5,:]

Dataframe shape is (87, 6)


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Acres Home,Health Food Store,Intersection,Pharmacy,Park,Pub
5,Braeburn,Bar,Food,Indian Restaurant,Bank,School
10,Central Northwest (formerly Near Northwest),Pizza Place,Chinese Restaurant,Mexican Restaurant,Spa,Big Box Store
15,East Houston,BBQ Joint,Museum,Convenience Store,Moving Target,Seafood Restaurant
20,Eldridge / West Oaks,Home Service,Convenience Store,Paper / Office Supplies Store,Clothing Store,Pet Store
25,Fourth Ward,Park,Hotel Bar,Restaurant,Rock Club,Salon / Barbershop
30,Greater Inwood,Pet Store,Bakery,Bed & Breakfast,Sporting Goods Shop,Fish & Chips Shop
35,Gulfgate Riverview / Pine Valley,Accessories Store,Baseball Field,Park,Skate Park,Zoo Exhibit


<font color='blue'>  __With the data we created, we can sort the neighborhoods into 6 groups.__ </font> <br>
<font color='blue'>  __K-Means was applied here.__ </font>

In [44]:
#houston_group.drop('Neighborhood', 1)
houston_group.shape

(87, 319)

In [45]:
# set number of clusters
kclusters = 6

houston_group_clustering = houston_group.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(houston_group_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[1:20:2] 

array([2, 1, 5, 5, 1, 1, 1, 1, 5, 1])

In [46]:
#neighborhoods_venues_sorted.drop(columns='Cluster Labels',inplace=True)

In [47]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

houston_merged = table2

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
houston_merged = houston_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

#houston_merged # check the last columns!

In [48]:
houston_final=houston_merged.dropna()
houston_final.reset_index(drop=True)
houston_final.astype({"Cluster Labels":int})

Unnamed: 0,Neighborhood,Location,latitude,longitude,ID,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Willowbrook,Northwest,29.660254,-95.456096,1,5,Bank,Gas Station,Furniture / Home Store,Burger Joint,Fried Chicken Joint
1,Greater Greenspoint,North,29.944719,-95.416074,2,5,Coffee Shop,Movie Theater,BBQ Joint,Indian Restaurant,Frozen Yogurt Shop
2,Carverdale,Northwest,29.848687,-95.53945,3,1,Clothing Store,Gas Station,Moving Target,Burger Joint,Paper / Office Supplies Store
3,Fairbanks / Northwest Crossing,Northwest,29.85382,-95.504597,4,4,Fast Food Restaurant,Seafood Restaurant,Buffet,Electronics Store,Rental Service
4,Greater Inwood,Northwest,51.462601,-0.361756,5,5,Pet Store,Bakery,Bed & Breakfast,Sporting Goods Shop,Fish & Chips Shop
5,Acres Home,Northwest,29.770793,-95.328129,6,1,Health Food Store,Intersection,Pharmacy,Park,Pub
6,Hidden Valley,North,39.489543,-119.752991,7,5,Italian Restaurant,Furniture / Home Store,Sandwich Place,Discount Store,Shipping Store
7,Westbranch,West,29.839541,-95.551522,8,1,Gas Station,Park,Taco Place,Café,Furniture / Home Store
8,Addicks / Park Ten,West,29.782451,-95.642446,9,2,Cosmetics Shop,Indian Restaurant,Furniture / Home Store,Thai Restaurant,Bakery
9,Spring Branch West,West,29.80141,-95.547879,10,5,Paper / Office Supplies Store,Mexican Restaurant,Café,Donut Shop,Fried Chicken Joint


In [49]:
houston_final['Cluster Labels'].value_counts()

1.0    30
5.0    28
2.0    14
3.0     7
4.0     5
0.0     3
Name: Cluster Labels, dtype: int64

## **4. Results**

Now we can visualize the neighborhoods groups of Houston. We will first make a comparison of the Houston map to see the distribution of the venues. From the analysis we get 6 groups. And we will see the map of each group individually as well.
1. Houston Neighborhood Map
2. Houston Clustered Neighborhood Map (of all groups)
3, Individual Clustered Neighborhood Map

From the visualization results, we will discuss the observation and recommendation to people need. 
 

<font color='blue'>  __Now we can visualize the neighborhoods groups of Houston.__ </font> <font color='blue'>  __We will do a comparison of the Houston map.__ </font>

<font color='blue'>  __This is the raw Houston neighborhoods map without clustering.__ </font>

In [50]:
# create map of Houston using latitude and longitude values
map_houston_no_group = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neigh, ID in zip(table2['latitude'], table2['longitude'], table2['Neighborhood'], table2['ID']):
    label = '{}, {}'.format(neigh, ID)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=False,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_houston_no_group)  
    
map_houston_no_group

<font color='blue'>  __This is the Houston neighborhoods clustered map for all groups.__ </font>

In [51]:
# create map of Houston using latitude and longitude values
map_houston_grouped_all = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(houston_final['latitude'], houston_final['longitude'], houston_final['Neighborhood'], houston_final['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-2)],
        fill_opacity=2).add_to(map_houston_grouped_all)
    
map_houston_grouped_all

<font color='blue'>  __This is the Houston neighborhoods clustered map for the first group.__ </font>

In [52]:
# create map of Houston using latitude and longitude values
map_houston_grouped_0 = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(houston_final['latitude'], houston_final['longitude'], houston_final['Neighborhood'], houston_final['Cluster Labels']):
    if cluster == 0:
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[int(cluster-1)],
            fill=True,
            fill_color=rainbow[int(cluster-2)],
            fill_opacity=2).add_to(map_houston_grouped_0)
    
map_houston_grouped_0

<font color='blue'>  __This is the Houston neighborhoods clustered map for the second group.__ </font>

In [53]:
# create map of Houston using latitude and longitude values
map_houston_grouped_1 = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(houston_final['latitude'], houston_final['longitude'], houston_final['Neighborhood'], houston_final['Cluster Labels']):
    if cluster == 1:
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[int(cluster-1)],
            fill=True,
            fill_color=rainbow[int(cluster-2)],
            fill_opacity=2).add_to(map_houston_grouped_1)
    
map_houston_grouped_1

<font color='blue'>  __This is the Houston neighborhoods clustered map for the third group.__ </font>

In [54]:
# create map of Houston using latitude and longitude values
map_houston_grouped_2 = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(houston_final['latitude'], houston_final['longitude'], houston_final['Neighborhood'], houston_final['Cluster Labels']):
    if cluster == 2:
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[int(cluster-1)],
            fill=True,
            fill_color=rainbow[int(cluster-2)],
            fill_opacity=2).add_to(map_houston_grouped_2)
    
map_houston_grouped_2

<font color='blue'>  __This is the Houston neighborhoods clustered map for the fourth group.__ </font>

In [55]:
# create map of Houston using latitude and longitude values
map_houston_grouped_3 = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(houston_final['latitude'], houston_final['longitude'], houston_final['Neighborhood'], houston_final['Cluster Labels']):
    if cluster == 3:
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[int(cluster-1)],
            fill=True,
            fill_color=rainbow[int(cluster-2)],
            fill_opacity=2).add_to(map_houston_grouped_3)
    
map_houston_grouped_3

<font color='blue'>  __This is the Houston neighborhoods clustered map for the fifth group.__ </font>

In [56]:
# create map of Houston using latitude and longitude values
map_houston_grouped_4 = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(houston_final['latitude'], houston_final['longitude'], houston_final['Neighborhood'], houston_final['Cluster Labels']):
    if cluster == 4:
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[int(cluster-1)],
            fill=True,
            fill_color=rainbow[int(cluster-2)],
            fill_opacity=2).add_to(map_houston_grouped_4)
    
map_houston_grouped_4

<font color='blue'>  __This is the Houston neighborhoods clustered map for the sixth group.__ </font>

In [57]:
# create map of Houston using latitude and longitude values
map_houston_grouped_5 = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(houston_final['latitude'], houston_final['longitude'], houston_final['Neighborhood'], houston_final['Cluster Labels']):
    if cluster == 5:
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[int(cluster-1)],
            fill=True,
            fill_color=rainbow[int(cluster-2)],
            fill_opacity=2).add_to(map_houston_grouped_5)
    
map_houston_grouped_5

## **5. Discussion**

Looking at the analysis results, we can see that distribution of the Houston neighborhoods is quite interesting. AS a big city, it clearly offers various venues for people to know, find and enjoy.

In this case the neighborhoods are clustered into 6 groups. The number size of each group are:

Group_0 - 11  
Group_1 - 37  
Group_2 - 7  
Group_3 - 24  
Group_4 - 2  
Group_5 - 6  

The biggest group has 37 venus, and it shows the wide spread of the restaurants. This makes good sense as Houston is a city with name for its diversity and good food. Due to weather conditions, Houston is not a place with a lot of travel sites. The second largest one has 24, and it shows the shopping is a popular activity. The third one has 11, which indicates the activities around the Banking. Considering it as the oil Capital city, one can imagine how busy the banks are. The fourth group shows the coffee and fast food. The fifth one which has a number as 6, shows the real life of Houstonians, they go to supermarkets, home furniture stores, etc. The smallest one, which only has a number 2, reflects the only popular visiting area of Houston, the Memorial park area. It has a big zoo, a good outdoor park, and museums nearby. This is the place people looking for relaxation with family.

The clustering definitely can help people coming to Houston to have some ideas first. They probably won't bring kids to here for a long vacation as there is no Disney or Lego Land. For people who likes various food, they probably will have a good impression of the city. If that's a business trip, then it's perfect probably. For anyone who likes to shopping, Houston is also a good choice, it has everything from regular to luxury. Overall, people will find the part they like Houston, and the part Houston is missing.



## **6. Conclusion**

The project gives good results with useful information. As we know, everyday many people come to Houston over the world. Houston is so big and diversified. The visitors would want to know more about the city before they arrive. This result is very useful for them to get familiar with the city and plan for their trip. The local people can get benefits as well.

Looking forward, this project have space to be further improved. One example is that it may combine with zip code along with the population information. As the project is based on the neighborhoods, and the neighborhoods varies in the size, population, etc. One example is that if we weighs the population factor, that probably will improve the analysis accuracy. 
