 <h1 align=center><font size = 6>Where & What Investment in Mobile County, Alabama Will Yield Maximum Return on Investment (ROI)?</font></h1>
 <h1 align = right><font size = 2>Created By - Neel Patel</font></h1>

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Introduction/Business Problem</a>
2. <a href="#item2">Data Description</a>
3. <a href="#item3">Data Preparation</a>
4. <a href="#item4">Visualizing and Exploring</a>
5. <a href="#item5">Modeling and Clustering</a>

    </font>
    </div>

## 1. Introduction/Business Problem

   In the last decade, Mobile County has become home to an Airbus final assembly line, an Amazon sorting facility and a Walmart distribution center, to name a few, which has led to an influx of residents. The county is also home to two universities, a private college and a community college, and it draws people from all walks of life, including professionals, college students and tourists (the county is known for its Mardi Gras origins and beaches). Hence, in the recent past, the county has attracted many investors from around the world, and they would like to know which type of investment would be ideal in each neighborhood.

   The following analysis explores the neighborhoods of Mobile County and determines the ideal investment type for each neighborhood based on existing trends.

## 2. Data

To execute the aforementioned idea, a dataset (the base dataset) containing borough, neighborhood, postal code and geographical coordinates (for borough) will be used along with Foursquare location data. In the base dataset, the neighborhoods are defined by multiple listing service (MLS) zones, a measure used by local realtors in the real estate community, whereas the boroughs are defined by the cities in the county. The geographical coordinates represent each neighborhood.

Additionally, the Foursquare data will be used to explore each borough and neighborhood in Mobile County. The Foursquare 'explore' endpoint will be used to get the top venues in each neighborhood, and the neighborhoods will then be clustered by their top venues using the k-means machine learning algorithm.

## 3. Data Preparation

First, all the required dependencies will be downloaded and imported.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import seaborn as sns
import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if it is already installed
import folium # map rendering library

!pip install lxml
!pip install BeautifulSoup4
!pip install html5lib
!pip install geocoder

print('Libraries imported.')


Now, the base dataset will be downloaded and read into a DataFrame using pandas.

In [2]:
raw_data_link = 'https://github.com/patelneel17/Coursera_Capstone/blob/master/base_data.csv'
data = pd.read_html(raw_data_link)
df = data[0]
print(df.shape)
df.head()

(446, 6)


Unnamed: 0.1,Unnamed: 0,City,Postal Code,MLS Area,Latitude (generated),Longitude (generated)
0,,ATMORE,36502.0,29 - NE Mobile City/Mt. Vernon,31.0238,-87.4939
1,,AXIS,36502.0,28 - Saraland/Satsuma/Axis/Creola,30.9299,-88.0272
2,,AXIS,36505.0,29 - NE Mobile City/Mt. Vernon,30.9299,-88.0272
3,,AXIS,36505.0,28 - Saraland/Satsuma/Axis/Creola,30.9299,-88.0272
4,,AXIS,36525.0,28 - Saraland/Satsuma/Axis/Creola,30.9299,-88.0272
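
`pd.read_html` relies on the page at that URL containing an HTML table. If the same file is available as plain CSV text, `pd.read_csv` can parse it directly. The following is a minimal sketch under that assumption; the inline rows are copied from the output above and stand in for the downloaded file.

```python
import io
import pandas as pd

# Sample rows copied from the output above; they stand in for the CSV file.
sample_csv = """City,Postal Code,MLS Area,Latitude (generated),Longitude (generated)
ATMORE,36502,29 - NE Mobile City/Mt. Vernon,31.0238,-87.4939
AXIS,36502,28 - Saraland/Satsuma/Axis/Creola,30.9299,-88.0272
AXIS,36505,29 - NE Mobile City/Mt. Vernon,30.9299,-88.0272
"""

# read_csv accepts a path, a URL to raw CSV text, or any file-like object.
sample_df = pd.read_csv(io.StringIO(sample_csv))
print(sample_df.shape)
```

Read this way, the frame has no extra 'Unnamed: 0' column, so the drop in the next cell would not be needed.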


Next, the 'Unnamed: 0' column will be dropped, and then all rows with missing values will be filtered out. Rows without Latitude and Longitude values represent unincorporated areas in Mobile County.

In [3]:
df = df.drop('Unnamed: 0',axis =1)
df = df.dropna()
print(df.shape)
df.head()

(368, 5)


Unnamed: 0,City,Postal Code,MLS Area,Latitude (generated),Longitude (generated)
0,ATMORE,36502.0,29 - NE Mobile City/Mt. Vernon,31.0238,-87.4939
1,AXIS,36502.0,28 - Saraland/Satsuma/Axis/Creola,30.9299,-88.0272
2,AXIS,36505.0,29 - NE Mobile City/Mt. Vernon,30.9299,-88.0272
3,AXIS,36505.0,28 - Saraland/Satsuma/Axis/Creola,30.9299,-88.0272
4,AXIS,36525.0,28 - Saraland/Satsuma/Axis/Creola,30.9299,-88.0272
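
Note that `df.dropna()` removes a row when any column is missing, not only the coordinates. If the intent is to filter on latitude and longitude alone, the `subset` argument restricts the check to those columns. A small sketch with made-up rows (the values are illustrative, not taken from the base dataset):

```python
import numpy as np
import pandas as pd

# Illustrative rows only; the column names follow the base dataset.
demo = pd.DataFrame({
    'City': ['AXIS', 'MOBILE', None],
    'Latitude (generated)': [30.9299, np.nan, 30.6954],
    'Longitude (generated)': [-88.0272, np.nan, -88.0399],
})

# Drops every row that has a missing value in any column (rows 1 and 2 here).
any_missing = demo.dropna()

# Drops only the rows that lack coordinates (row 1 here).
coords_only = demo.dropna(subset=['Latitude (generated)', 'Longitude (generated)'])

print(len(any_missing), len(coords_only))
```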


In [4]:
#Rename the columns names
df = df.rename(columns = {"City":"Borough","Postal Code":"PostalCode","MLS Area":"Neighborhood","Latitude (generated)":"Lat-Boro","Longitude (generated)":"Long-Boro"})
df.head()

Unnamed: 0,Borough,PostalCode,Neighborhood,Lat-Boro,Long-Boro
0,ATMORE,36502.0,29 - NE Mobile City/Mt. Vernon,31.0238,-87.4939
1,AXIS,36502.0,28 - Saraland/Satsuma/Axis/Creola,30.9299,-88.0272
2,AXIS,36505.0,29 - NE Mobile City/Mt. Vernon,30.9299,-88.0272
3,AXIS,36505.0,28 - Saraland/Satsuma/Axis/Creola,30.9299,-88.0272
4,AXIS,36525.0,28 - Saraland/Satsuma/Axis/Creola,30.9299,-88.0272


In [5]:
#Remove the numeric prefix (e.g. '29 - ') from Neighborhood (MLS Area) and the decimal from Postal Code
df['Neighborhood'] = df['Neighborhood'].str[5:]

df['PostalCode'] = df['PostalCode'].astype(np.int64)
df.head()

Unnamed: 0,Borough,PostalCode,Neighborhood,Lat-Boro,Long-Boro
0,ATMORE,36502,NE Mobile City/Mt. Vernon,31.0238,-87.4939
1,AXIS,36502,Saraland/Satsuma/Axis/Creola,30.9299,-88.0272
2,AXIS,36505,NE Mobile City/Mt. Vernon,30.9299,-88.0272
3,AXIS,36505,Saraland/Satsuma/Axis/Creola,30.9299,-88.0272
4,AXIS,36525,Saraland/Satsuma/Axis/Creola,30.9299,-88.0272
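
Slicing with `.str[5:]` assumes that every MLS prefix is exactly five characters long (two digits, a space, a hyphen and a space). A pattern-based alternative is sketched below. It is an assumption about how the prefixes might vary, not part of the original notebook, and the last row is made up to show a one-digit prefix.

```python
import pandas as pd

areas = pd.Series(['29 - NE Mobile City/Mt. Vernon',
                   '28 - Saraland/Satsuma/Axis/Creola',
                   '7 - Hypothetical Area'])  # made-up row with a one-digit prefix

# Strip leading digits, optional spaces, a hyphen and optional spaces.
cleaned = areas.str.replace(r'^\d+\s*-\s*', '', regex=True)
print(cleaned.tolist())
```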


In [6]:
#Grouping Neighborhoods by Postal Code 
df_1 = df.groupby('PostalCode').agg({'Borough':'first','Neighborhood':','.join}).reset_index()
print(df_1.shape)

(71, 3)
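
Because `','.join` concatenates every row, a neighborhood that occurs several times under one postal code is repeated in the joined string. If that is not wanted, duplicates can be removed before joining. A minimal sketch on toy data (`join_unique` is a hypothetical helper name):

```python
import pandas as pd

toy = pd.DataFrame({
    'PostalCode': [36505, 36505, 36505, 36512],
    'Borough': ['AXIS', 'AXIS', 'AXIS', 'MOBILE'],
    'Neighborhood': ['Saraland', 'NE Mobile City/Mt. Vernon', 'Saraland', 'Eight Mile/Prichard'],
})

def join_unique(values):
    # dict.fromkeys keeps the first occurrence of each value, in order
    return ','.join(dict.fromkeys(values))

toy_grouped = (toy.groupby('PostalCode')
                  .agg({'Borough': 'first', 'Neighborhood': join_unique})
                  .reset_index())
print(toy_grouped)
```

Applying this to the real data would change the neighborhood labels used in the later steps, so the outputs shown in this notebook correspond to the plain `','.join` version.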


In [7]:
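#moving Borough to the index and checking for non-assigned or null values in Neighborhood
df_1 = df_1.set_index('Borough')
# isna() catches real missing values; the string comparison alone only matches the literal text 'NaN'
df_2  = df_1[df_1.Neighborhood.isna() | (df_1.Neighborhood == 'NaN')]
df_1 = df_1.reset_index()
print(df_2)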
df_1.head()

Empty DataFrame
Columns: [PostalCode, Neighborhood]
Index: []


Unnamed: 0,Borough,PostalCode,Neighborhood
0,MOBILE,33605,Dauphin Island Pkwy South
1,MOBILE,33617,MidCentral
2,MOBILE,33695,West Mobile
3,MOBILE,35595,"West Mobile County,West Mobile"
4,ATMORE,36502,"NE Mobile City/Mt. Vernon,Saraland/Satsuma/Axi..."


In [8]:
#Downloading latitude and longitude data
coord_raw_data_link = 'https://github.com/patelneel17/Coursera_Capstone/blob/master/zip_code_lat_and_long.csv'
data_coord = pd.read_html(coord_raw_data_link)
df_coord = data_coord[0]
df_coord.head()

Unnamed: 0.1,Unnamed: 0,Zip,City,State,Latitude,Longitude
0,,35979,Higdon,AL,34.831242,-85.61564
1,,36350,Midland City,AL,31.319083,-85.48718
2,,36879,Waverly,AL,32.733511,-85.55322
3,,35004,Moody,AL,33.606379,-86.50249
4,,36744,Greensboro,AL,32.703529,-87.60177


In [9]:
#cleaning and dropping unwanted columns
df_coord = df_coord.drop('Unnamed: 0',axis =1)
df_coord = df_coord.drop('City',axis =1)
df_coord = df_coord.dropna()
df_coord = df_coord.rename(columns ={'Zip':'PostalCode'})
df_coord.head()

Unnamed: 0,PostalCode,State,Latitude,Longitude
0,35979,AL,34.831242,-85.61564
1,36350,AL,31.319083,-85.48718
2,36879,AL,32.733511,-85.55322
3,35004,AL,33.606379,-86.50249
4,36744,AL,32.703529,-87.60177


In [10]:
#merge to keep rows with latitude and longitude values
DataFrame = pd.merge(df_1,df_coord,on = 'PostalCode')
DataFrame = DataFrame.reset_index(drop = True)
DataFrame.head()

Unnamed: 0,Borough,PostalCode,Neighborhood,State,Latitude,Longitude
0,ATMORE,36502,"NE Mobile City/Mt. Vernon,Saraland/Satsuma/Axi...",AL,31.090528,-87.49715
1,AXIS,36505,"NE Mobile City/Mt. Vernon,Saraland/Satsuma/Axi...",AL,30.930065,-88.00103
2,BAY MINETTE,36507,NBald/SpFt/BayMin/Loxley,AL,30.875697,-87.76592
3,BAYOU LA BATRE,36509,"S Mobile City/Theodore,Western Bay Shores,West...",AL,30.401384,-88.24671
4,MOBILE,36512,Eight Mile/Prichard,AL,30.658865,-88.177975
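
`pd.merge` performs an inner join by default, so postal codes without a match in the coordinate table are dropped silently. For example, the first rows of `df_1` (such as 33605) no longer appear after the merge. The sketch below, on a few rows taken from the tables above, uses `how='left'` with `indicator=True` to list the unmatched codes.

```python
import pandas as pd

left = pd.DataFrame({'PostalCode': [33605, 36502, 36505],
                     'Borough': ['MOBILE', 'ATMORE', 'AXIS']})
right = pd.DataFrame({'PostalCode': [36502, 36505],
                      'Latitude': [31.090528, 30.930065],
                      'Longitude': [-87.49715, -88.00103]})

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left join
checked = pd.merge(left, right, on='PostalCode', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```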


## 4. Visualizing Base Dataset and Exploring Using Foursquare Data. 

First, the coordinates of Mobile, AL will be obtained with geopy to center a folium map, and then the latitude and longitude values from the merged dataset will be superimposed on the map as markers.

In [11]:
address = 'Mobile, AL'

geolocator = Nominatim(user_agent="mob_town_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude


map_mobile = folium.Map(location=[latitude, longitude], zoom_start=7)

# add markers to map
for lat, lng, borough, neighborhood, postal in zip(DataFrame['Latitude'], DataFrame['Longitude'], DataFrame['Borough'], DataFrame['Neighborhood'], DataFrame['PostalCode']):
    label = '{}, {}, {}'.format(neighborhood, borough, postal)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_mobile)  
    
map_mobile

In [12]:
#Calling Foursquare API 
CLIENT_ID = 'your-foursquare-client-id' # placeholder: supply your own Foursquare ID
CLIENT_SECRET = 'your-foursquare-client-secret' # placeholder: supply your own Foursquare secret
VERSION = '20200505' 
LIMIT = 100


#Function to get nearby venues
def getNearbyVenues(names, latitudes, longitudes, radius=10000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
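
`getNearbyVenues` indexes straight into the JSON response, so a failed request or a venue without categories raises an exception in the middle of the loop. The helper below is a sketch of more defensive parsing. `parse_venues` is a hypothetical name, and the mock payload only mimics the nesting that the function above relies on (`response`, `groups`, `items`, `venue`); it is not a real API reply.

```python
def parse_venues(payload, name, lat, lng):
    """Return one tuple per venue, skipping entries that lack the expected fields."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        if 'name' not in venue or 'lat' not in location or 'lng' not in location:
            continue  # incomplete record
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((name, lat, lng, venue['name'],
                     location['lat'], location['lng'], category))
    return rows

# Mock payload shaped like the structure the code above indexes into.
mock = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Waffle House',
               'location': {'lat': 31.107221, 'lng': -87.47734},
               'categories': [{'name': 'Breakfast Spot'}]}},
    {'venue': {'name': 'No Category Venue',
               'location': {'lat': 31.0, 'lng': -87.5},
               'categories': []}},
]}]}}

rows = parse_venues(mock, 'Eight Mile/Prichard', 31.090528, -87.49715)
print(rows)
```

An empty or malformed payload simply yields an empty list, so the loop over neighborhoods can continue.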

In [13]:
#Dataframe for each venue generated using Foursquare

mobile_venues = getNearbyVenues(names=DataFrame['Neighborhood'],
                                   latitudes=DataFrame['Latitude'],
                                   longitudes=DataFrame['Longitude'])

NE Mobile City/Mt. Vernon,Saraland/Satsuma/Axis/Creola,Tillman's Corner/Theodore
NE Mobile City/Mt. Vernon,Saraland/Satsuma/Axis/Creola,Semmes/Wilmer/Lott Rd,Semmes/Wilmer/Lott Rd
NBald/SpFt/BayMin/Loxley
S Mobile City/Theodore,Western Bay Shores,Western Bay Shores
Eight Mile/Prichard
NE Mobile City/Mt. Vernon
NW Mobile City/Citronelle,NE Mobile City/Mt. Vernon,Saraland/Satsuma/Axis/Creola,Saraland,Semmes/Wilmer/Lott Rd,West Mobile/Semmes,NW Mobile City/Citronelle,Semmes/Wilmer/Lott Rd,Municipal Park/West Central,NW Mobile City/Citronelle,Semmes/Wilmer/Lott Rd,Saraland,Tillman's Corner/Theodore
NW Mobile City/Citronelle,Saraland,NW Mobile City/Citronelle,NE Mobile City/Mt. Vernon,Saraland/Satsuma/Axis/Creola,Semmes/Wilmer/Lott Rd,NW Mobile City/Citronelle
Grand Bay North,S Mobile City/Theodore,Dauphin Island Pkwy South,Western Bay Shores,Grand Bay North,Western Bay Shores,S Mobile City/Theodore
Saraland/Satsuma/Axis/Creola,Western Bay Shores
SpFt/Daph/Fairhope/Mont,Lake Forest/Daphne,S

In [14]:
print(mobile_venues.shape)
mobile_venues.head()

(3771, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"NE Mobile City/Mt. Vernon,Saraland/Satsuma/Axi...",31.090528,-87.49715,Poarch Creek Travel Plaza,31.08678,-87.538245,Convenience Store
1,"NE Mobile City/Mt. Vernon,Saraland/Satsuma/Axi...",31.090528,-87.49715,Waffle House,31.107221,-87.47734,Breakfast Spot
2,"NE Mobile City/Mt. Vernon,Saraland/Satsuma/Axi...",31.090528,-87.49715,Walmart,31.038058,-87.494428,Big Box Store
3,"NE Mobile City/Mt. Vernon,Saraland/Satsuma/Axi...",31.090528,-87.49715,the coffee house,31.025738,-87.494461,Café
4,"NE Mobile City/Mt. Vernon,Saraland/Satsuma/Axi...",31.090528,-87.49715,Wind Creek Casino & Hotel Atmore,31.10286,-87.482801,Casino


In [15]:
mobile_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
CentBald/Summerdale/Loxley,83,83,83,83,83,83
"Cottage Hill N,Dauphin Island Pkwy South,Tillman's Corner/Theodore,West Mobile,Cottage Hill S,Cottage Hill N,Springhill/USA,Municipal Park/West Central,MidCentral,MidTown West,Dauphin Island Pkwy North,MidTown East,MidTown/South",100,100,100,100,100,100
"Cottage Hill S,Springhill/USA,Municipal Park/West Central,Chickasaw,MidCentral,MidTown West,Dauphin Island Pkwy North,MidTown East,Downtown",100,100,100,100,100,100
"Dauphin Island Pkwy South,Dauphin Island Pkwy South,Springhill/USA,Municipal Park/West Central,Chickasaw,MidCentral,MidTown West,Dauphin Island Pkwy North,MidTown East,MidTown/South,Downtown",100,100,100,100,100,100
"Dauphin Island,Dauphin Island,Dauphin Island,Dauphin Island,Dauphin Island,Dauphin Island,Dauphin Island,Dauphin Island,Municipal Park/West Central,MidTown East,S Mobile City/Theodore",28,28,28,28,28,28
Eight Mile/Prichard,100,100,100,100,100,100
"Eight Mile/Prichard,Eight Mile/Prichard",100,100,100,100,100,100
"Eight Mile/Prichard,Eight Mile/Prichard,MidCentral",100,100,100,100,100,100
"Eight Mile/Prichard,Municipal Park/West Central,Chickasaw,MidTown East",100,100,100,100,100,100
Fairhope/PointClear,86,86,86,86,86,86



Next, the venue categories are one-hot encoded and averaged per neighborhood to obtain the frequency of each category, and then the top 10 venue categories in each neighborhood are displayed.

In [16]:
# one hot encoding
mobile_onehot = pd.get_dummies(mobile_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
mobile_onehot['Neighborhood'] = mobile_venues['Neighborhood'] 

#groupby Neighborhood to get frequency of each venue type in neighborhood
mobile_grouped = mobile_onehot.groupby('Neighborhood').mean().reset_index()
mobile_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport,American Restaurant,Amphitheater,Aquarium,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Bay,Beach,Beach Bar,Beer Bar,Beer Garden,Big Box Store,Board Shop,Boat or Ferry,Bookstore,Border Crossing,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Burger Joint,Business Service,Café,Cajun / Creole Restaurant,Campground,Canal,Candy Store,Caribbean Restaurant,Casino,Chinese Restaurant,City,Clothing Store,Coffee Shop,Comic Shop,Construction & Landscaping,Convenience Store,Convention Center,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Donut Shop,Electronics Store,Exhibit,Fabric Shop,Farm,Farmers Market,Fast Food Restaurant,Fish Market,Fishing Spot,Fishing Store,Flea Market,Flower Shop,Food,Food Service,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Home Service,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Lake,Latin American Restaurant,Lingerie Store,Liquor Store,Martial Arts Dojo,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Motel,Movie Theater,Museum,Music Venue,Nature Preserve,New American Restaurant,Outdoors & Recreation,Outlet Store,Park,Pet Store,Pharmacy,Pier,Pizza Place,Playground,Plaza,Pub,RV Park,Racetrack,Rental Car Location,Reservoir,Resort,Rest Area,Restaurant,River,Rock Club,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Social Club,Southern / Soul Food Restaurant,Spa,Sporting Goods Shop,Sports Bar,State / Provincial Park,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tourist Information Center,Track,Trail,Truck Stop,Vacation Rental,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Zoo
0,CentBald/Summerdale/Loxley,0.0,0.0,0.012048,0.0,0.0,0.0,0.012048,0.012048,0.0,0.024096,0.0,0.012048,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.024096,0.0,0.0,0.0,0.0,0.0,0.024096,0.0,0.0,0.012048,0.012048,0.0,0.024096,0.048193,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.024096,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.108434,0.0,0.0,0.012048,0.0,0.0,0.012048,0.012048,0.060241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.048193,0.0,0.012048,0.0,0.072289,0.0,0.0,0.0,0.012048,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.060241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.036145,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.024096,0.0,0.036145,0.0,0.0,0.0,0.012048,0.012048,0.0,0.0,0.012048,0.0,0.0,0.012048,0.0,0.0,0.060241,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.024096,0.0,0.0,0.0,0.0,0.0,0.012048
1,"Cottage Hill N,Dauphin Island Pkwy South,Tillm...",0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.03,0.0,0.01,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.05,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.03,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.04,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0
2,"Cottage Hill S,Springhill/USA,Municipal Park/W...",0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.03,0.0,0.01,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.05,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.04,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.04,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0
3,"Dauphin Island Pkwy South,Dauphin Island Pkwy ...",0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.04,0.0,0.01,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.01,0.0,0.01,0.0,0.02,0.03,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.04,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0
4,"Dauphin Island,Dauphin Island,Dauphin Island,D...",0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.035714,0.0,0.071429,0.107143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.035714,0.0,0.0,0.0,0.035714,0.0,0.035714,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
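
Taking the mean of the one-hot columns per neighborhood yields the share of each venue category among that neighborhood's venues. The toy example below shows the computation and checks it against `pd.crosstab` with `normalize='index'`, which gives the same shares in one call; the venue rows are made up.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Beach', 'Beach', 'Bakery', 'Bar', 'Bar', 'Bar'],
})

# Same steps as in the cell above, on the toy data
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')
onehot['Neighborhood'] = venues['Neighborhood']
shares = onehot.groupby('Neighborhood').mean()

# Row-normalized contingency table: the same shares in one call
shares_ct = pd.crosstab(venues['Neighborhood'], venues['Venue Category'], normalize='index')
print(shares)
```

Each row sums to 1, so a value of 0.06 for 'Coffee Shop' means that 6% of the venues returned for that neighborhood are coffee shops.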


In [17]:
#function to sort venues in desc order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


#####
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = mobile_grouped['Neighborhood']

for ind in np.arange(mobile_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(mobile_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,CentBald/Summerdale/Loxley,Discount Store,Gas Station,Intersection,Sandwich Place,Fast Food Restaurant,Cajun / Creole Restaurant,Fried Chicken Joint,Mexican Restaurant,Pizza Place,Convenience Store
1,"Cottage Hill N,Dauphin Island Pkwy South,Tillm...",Coffee Shop,Southern / Soul Food Restaurant,Grocery Store,Sandwich Place,Mexican Restaurant,BBQ Joint,Bar,Italian Restaurant,Seafood Restaurant,Café
2,"Cottage Hill S,Springhill/USA,Municipal Park/W...",Coffee Shop,Southern / Soul Food Restaurant,Grocery Store,Italian Restaurant,Sandwich Place,Mexican Restaurant,Bar,Café,BBQ Joint,Seafood Restaurant
3,"Dauphin Island Pkwy South,Dauphin Island Pkwy ...",Seafood Restaurant,Coffee Shop,Southern / Soul Food Restaurant,Mexican Restaurant,Bar,Sandwich Place,Grocery Store,BBQ Joint,Café,Italian Restaurant
4,"Dauphin Island,Dauphin Island,Dauphin Island,D...",Beach,Bay,Historic Site,Seafood Restaurant,Resort,Bakery,Sandwich Place,Restaurant,Taco Place,Science Museum
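
The ranking step can also be written with `Series.nlargest`, and the ordinal suffix can come from a small helper instead of the `try`/`except` block. Both are sketched below on toy frequencies; `ordinal` and `top_venues` are hypothetical names, and ties may be ordered differently than with `sort_values`.

```python
import pandas as pd

def ordinal(n):
    """Return '1st', '2nd', '3rd', '4th', ... for a positive integer."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

def top_venues(grouped, n):
    """grouped: one row per neighborhood, a 'Neighborhood' column plus frequency columns."""
    freqs = grouped.set_index('Neighborhood')
    # For each row, keep the names of the n columns with the largest values
    ranked = freqs.apply(lambda row: pd.Series(row.nlargest(n).index.tolist()), axis=1)
    ranked.columns = ['{} Most Common Venue'.format(ordinal(i + 1)) for i in range(n)]
    return ranked.reset_index()

toy = pd.DataFrame({'Neighborhood': ['A', 'B'],
                    'Bar': [0.1, 0.6], 'Beach': [0.7, 0.1], 'Bakery': [0.2, 0.3]})
result = top_venues(toy, 2)
print(result)
```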


## 5. Modeling and Clustering using _k-means_ Algorithm.

#### Modeling

In [18]:
mobile_grouped_clustering = mobile_grouped.drop('Neighborhood', axis=1)

#declare number of clusters
kclusters = 5

#fit the model by running k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(mobile_grouped_clustering)

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

mobile_merged = DataFrame

# join the sorted venues and cluster labels onto DataFrame (mobile data), which holds latitude/longitude for each neighborhood
mobile_merged = mobile_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

mobile_merged.head()

Unnamed: 0,Borough,PostalCode,Neighborhood,State,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ATMORE,36502,"NE Mobile City/Mt. Vernon,Saraland/Satsuma/Axi...",AL,31.090528,-87.49715,2,Fast Food Restaurant,Hotel,Fried Chicken Joint,Sandwich Place,Discount Store,Gas Station,Breakfast Spot,Café,Spa,Seafood Restaurant
1,AXIS,36505,"NE Mobile City/Mt. Vernon,Saraland/Satsuma/Axi...",AL,30.930065,-88.00103,1,River,Convenience Store,Fast Food Restaurant,Harbor / Marina,Food,Gas Station,Baseball Field,Boat or Ferry,Discount Store,Lake
2,BAY MINETTE,36507,NBald/SpFt/BayMin/Loxley,AL,30.875697,-87.76592,2,Fast Food Restaurant,Discount Store,Pizza Place,Seafood Restaurant,Pharmacy,Grocery Store,Gas Station,Convenience Store,Shoe Store,Motel
3,BAYOU LA BATRE,36509,"S Mobile City/Theodore,Western Bay Shores,West...",AL,30.401384,-88.24671,2,Discount Store,Seafood Restaurant,Breakfast Spot,Fast Food Restaurant,City,Pizza Place,Sandwich Place,Pharmacy,Donut Shop,Electronics Store
4,MOBILE,36512,Eight Mile/Prichard,AL,30.658865,-88.177975,0,Sandwich Place,Grocery Store,Coffee Shop,Italian Restaurant,Mexican Restaurant,Seafood Restaurant,Burger Joint,BBQ Joint,Fried Chicken Joint,Donut Shop
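
The number of clusters is fixed at 5 here. One common way to sanity-check such a choice is to compare the within-cluster sum of squares (`inertia_`) over a range of k values and look for the point where the improvement levels off. The sketch below does this on synthetic data; the blob data and the range of k are assumptions for illustration, and in the notebook the feature matrix would be `mobile_grouped_clustering`.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
# Three synthetic, well-separated groups standing in for the venue-frequency rows
X = np.vstack([rng.normal(loc=center, scale=0.05, size=(20, 4))
               for center in (0.0, 1.0, 2.0)])

inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_  # sum of squared distances to the nearest centroid

for k, value in inertias.items():
    print(k, round(value, 3))
```

On this synthetic data the inertia drops sharply up to k = 3 and only marginally afterwards, which is the pattern to look for on the real data.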


#### Clustering

In [19]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=8)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(mobile_merged['Latitude'], mobile_merged['Longitude'], mobile_merged['Neighborhood'], mobile_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
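
In the loop above the color is looked up with `rainbow[cluster-1]`, so label 0 wraps around to the last color in the list. Each cluster still gets its own color, but indexing by the label itself is easier to reason about. A minimal sketch, assuming integer labels from 0 to `kclusters - 1`:

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5
# One evenly spaced color per cluster
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

labels = [2, 1, 2, 2, 0]  # cluster labels of the first five rows in the merged table above
marker_colors = [palette[int(label)] for label in labels]
print(marker_colors)
```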

In [20]:
cluster_0 = mobile_merged.loc[mobile_merged['Cluster Labels'] == 0, mobile_merged.columns[[0] + list(range(7, mobile_merged.shape[1]))]]
cluster_0.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,MOBILE,Sandwich Place,Grocery Store,Coffee Shop,Italian Restaurant,Mexican Restaurant,Seafood Restaurant,Burger Joint,BBQ Joint,Fried Chicken Joint,Donut Shop
10,DAPHNE,Seafood Restaurant,Grocery Store,American Restaurant,Hotel,Coffee Shop,Pizza Place,Department Store,Sandwich Place,Cosmetics Shop,Fast Food Restaurant
11,SPANISH FORT,Seafood Restaurant,American Restaurant,Hotel,Grocery Store,Pizza Place,Fried Chicken Joint,Department Store,Italian Restaurant,Furniture / Home Store,Fast Food Restaurant
13,FAIRHOPE,Seafood Restaurant,American Restaurant,Sandwich Place,Coffee Shop,Breakfast Spot,Mexican Restaurant,Hotel,Gas Station,Café,Beach
28,MOBILE,Coffee Shop,Southern / Soul Food Restaurant,Grocery Store,Sandwich Place,Mexican Restaurant,Bar,Italian Restaurant,Café,BBQ Joint,Seafood Restaurant


An inspection of cluster 0 shows that the same venues and boroughs appear repeatedly. Hence, the rows in cluster 0 will be grouped by borough to count how often each venue category appears among the top 10 venues of each borough.

In [21]:
cluster_0_grouped = cluster_0.groupby('Borough').agg({'1st Most Common Venue':'value_counts',\
                                                      '2nd Most Common Venue':'value_counts',\
                                                      '3rd Most Common Venue':'value_counts',\
                                                      '4th Most Common Venue':'value_counts',\
                                                      '5th Most Common Venue':'value_counts',\
                                                      '6th Most Common Venue':'value_counts',\
                                                      '7th Most Common Venue':'value_counts',\
                                                      '8th Most Common Venue':'value_counts',\
                                                      '9th Most Common Venue':'value_counts',\
                                                      '10th Most Common Venue':'value_counts',\
                                                     })

cluster_0_grouped = cluster_0_grouped.fillna(0)
cluster_0_grouped.loc[:,'Count_Venue'] = cluster_0_grouped.sum(numeric_only = True,axis=1)

cluster_0_grouped = cluster_0_grouped.drop(['1st Most Common Venue',\
                                            '2nd Most Common Venue',\
                                            '3rd Most Common Venue',\
                                            '4th Most Common Venue',\
                                            '5th Most Common Venue',\
                                            '6th Most Common Venue',\
                                            '7th Most Common Venue',\
                                            '8th Most Common Venue',\
                                            '9th Most Common Venue',\
                                            '10th Most Common Venue'], axis=1)
cluster_0_grouped

Unnamed: 0,Unnamed: 1,Count_Venue
AXIS,BBQ Joint,1.0
AXIS,Café,1.0
AXIS,Coffee Shop,1.0
AXIS,Fried Chicken Joint,1.0
AXIS,Grocery Store,1.0
AXIS,Hotel,1.0
AXIS,Italian Restaurant,1.0
AXIS,Mexican Restaurant,1.0
AXIS,Sandwich Place,1.0
AXIS,Seafood Restaurant,1.0


Next, a similar assessment of clusters 1, 2, 3 and 4 is carried out.

In [22]:
cluster_1 = mobile_merged.loc[mobile_merged['Cluster Labels'] == 1, mobile_merged.columns[[0] + list(range(7, mobile_merged.shape[1]))]]
cluster_1.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,AXIS,River,Convenience Store,Fast Food Restaurant,Harbor / Marina,Food,Gas Station,Baseball Field,Boat or Ferry,Discount Store,Lake
5,CALVERT,American Restaurant,Sandwich Place,Grocery Store,BBQ Joint,Discount Store,Pharmacy,Gas Station,Food,Snack Place,Exhibit
8,CODEN,Discount Store,Food,Seafood Restaurant,Gun Range,Gift Shop,Pharmacy,Sandwich Place,Chinese Restaurant,City,River
9,AXIS,Discount Store,Pizza Place,Fast Food Restaurant,Convenience Store,Pharmacy,Gas Station,Grocery Store,Harbor / Marina,Sandwich Place,River
14,GRAND BAY,Discount Store,Gas Station,Convenience Store,Fast Food Restaurant,Pharmacy,Grocery Store,Home Service,Thai Restaurant,Rest Area,Donut Shop


In [23]:
cluster_1_grouped = cluster_1.groupby('Borough').agg({'1st Most Common Venue':'value_counts',\
                                                      '2nd Most Common Venue':'value_counts',\
                                                      '3rd Most Common Venue':'value_counts',\
                                                      '4th Most Common Venue':'value_counts',\
                                                      '5th Most Common Venue':'value_counts',\
                                                      '6th Most Common Venue':'value_counts',\
                                                      '7th Most Common Venue':'value_counts',\
                                                      '8th Most Common Venue':'value_counts',\
                                                      '9th Most Common Venue':'value_counts',\
                                                      '10th Most Common Venue':'value_counts',\
                                                     })

cluster_1_grouped = cluster_1_grouped.fillna(0)
cluster_1_grouped.loc[:,'Count_Venue'] = cluster_1_grouped.sum(numeric_only = True,axis=1)

cluster_1_grouped = cluster_1_grouped.drop(['1st Most Common Venue',\
                                            '2nd Most Common Venue',\
                                            '3rd Most Common Venue',\
                                            '4th Most Common Venue',\
                                            '5th Most Common Venue',\
                                            '6th Most Common Venue',\
                                            '7th Most Common Venue',\
                                            '8th Most Common Venue',\
                                            '9th Most Common Venue',\
                                            '10th Most Common Venue'], axis=1)
cluster_1_grouped

Borough,Venue,Count_Venue
AXIS,Baseball Field,1.0
AXIS,Boat or Ferry,1.0
AXIS,Convenience Store,2.0
AXIS,Discount Store,2.0
AXIS,Fast Food Restaurant,2.0
AXIS,Food,1.0
AXIS,Gas Station,2.0
AXIS,Grocery Store,1.0
AXIS,Harbor / Marina,2.0
AXIS,Lake,1.0


In [24]:
cluster_2 = mobile_merged.loc[mobile_merged['Cluster Labels'] == 2, mobile_merged.columns[[0] + list(range(7, mobile_merged.shape[1]))]]
cluster_2.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ATMORE,Fast Food Restaurant,Hotel,Fried Chicken Joint,Sandwich Place,Discount Store,Gas Station,Breakfast Spot,Café,Spa,Seafood Restaurant
2,BAY MINETTE,Fast Food Restaurant,Discount Store,Pizza Place,Seafood Restaurant,Pharmacy,Grocery Store,Gas Station,Convenience Store,Shoe Store,Motel
3,BAYOU LA BATRE,Discount Store,Seafood Restaurant,Breakfast Spot,Fast Food Restaurant,City,Pizza Place,Sandwich Place,Pharmacy,Donut Shop,Electronics Store
7,CHUNCHULA,Fast Food Restaurant,Gas Station,Discount Store,Sandwich Place,Fried Chicken Joint,Pizza Place,Pharmacy,Furniture / Home Store,Zoo,Fish Market


In [25]:
cluster_2_grouped = cluster_2.groupby('Borough').agg({'1st Most Common Venue':'value_counts',\
                                                      '2nd Most Common Venue':'value_counts',\
                                                      '3rd Most Common Venue':'value_counts',\
                                                      '4th Most Common Venue':'value_counts',\
                                                      '5th Most Common Venue':'value_counts',\
                                                      '6th Most Common Venue':'value_counts',\
                                                      '7th Most Common Venue':'value_counts',\
                                                      '8th Most Common Venue':'value_counts',\
                                                      '9th Most Common Venue':'value_counts',\
                                                      '10th Most Common Venue':'value_counts',\
                                                     })

cluster_2_grouped = cluster_2_grouped.fillna(0)
cluster_2_grouped.loc[:,'Count_Venue'] = cluster_2_grouped.sum(numeric_only = True,axis=1)

cluster_2_grouped = cluster_2_grouped.drop(['1st Most Common Venue',\
                                            '2nd Most Common Venue',\
                                            '3rd Most Common Venue',\
                                            '4th Most Common Venue',\
                                            '5th Most Common Venue',\
                                            '6th Most Common Venue',\
                                            '7th Most Common Venue',\
                                            '8th Most Common Venue',\
                                            '9th Most Common Venue',\
                                            '10th Most Common Venue'], axis=1)
cluster_2_grouped

Borough,Venue,Count_Venue
ATMORE,Breakfast Spot,1.0
ATMORE,Café,1.0
ATMORE,Discount Store,1.0
ATMORE,Fast Food Restaurant,1.0
ATMORE,Fried Chicken Joint,1.0
ATMORE,Gas Station,1.0
ATMORE,Hotel,1.0
ATMORE,Sandwich Place,1.0
ATMORE,Seafood Restaurant,1.0
ATMORE,Spa,1.0


In [26]:
cluster_3 = mobile_merged.loc[mobile_merged['Cluster Labels'] == 3, mobile_merged.columns[[0] + list(range(7, mobile_merged.shape[1]))]]
cluster_3.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,CHUNCHULA,Discount Store,American Restaurant,Flea Market,Gas Station,Zoo,Fabric Shop,Flower Shop,Fishing Store,Fishing Spot,Fish Market


In [27]:
cluster_3_grouped = cluster_3.groupby('Borough').agg({'1st Most Common Venue':'value_counts',\
                                                      '2nd Most Common Venue':'value_counts',\
                                                      '3rd Most Common Venue':'value_counts',\
                                                      '4th Most Common Venue':'value_counts',\
                                                      '5th Most Common Venue':'value_counts',\
                                                      '6th Most Common Venue':'value_counts',\
                                                      '7th Most Common Venue':'value_counts',\
                                                      '8th Most Common Venue':'value_counts',\
                                                      '9th Most Common Venue':'value_counts',\
                                                      '10th Most Common Venue':'value_counts',\
                                                     })

cluster_3_grouped = cluster_3_grouped.fillna(0)
cluster_3_grouped.loc[:,'Count_Venue'] = cluster_3_grouped.sum(numeric_only = True,axis=1)

cluster_3_grouped = cluster_3_grouped.drop(['1st Most Common Venue',\
                                            '2nd Most Common Venue',\
                                            '3rd Most Common Venue',\
                                            '4th Most Common Venue',\
                                            '5th Most Common Venue',\
                                            '6th Most Common Venue',\
                                            '7th Most Common Venue',\
                                            '8th Most Common Venue',\
                                            '9th Most Common Venue',\
                                            '10th Most Common Venue'], axis=1)
cluster_3_grouped

Borough,Venue,Count_Venue
CHUNCHULA,American Restaurant,1.0
CHUNCHULA,Discount Store,1.0
CHUNCHULA,Fabric Shop,1.0
CHUNCHULA,Fish Market,1.0
CHUNCHULA,Fishing Spot,1.0
CHUNCHULA,Fishing Store,1.0
CHUNCHULA,Flea Market,1.0
CHUNCHULA,Flower Shop,1.0
CHUNCHULA,Gas Station,1.0
CHUNCHULA,Zoo,1.0


In [28]:
cluster_4 = mobile_merged.loc[mobile_merged['Cluster Labels'] == 4, mobile_merged.columns[[0] + list(range(7, mobile_merged.shape[1]))]]
cluster_4.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,DAUPHIN ISLAND,Beach,Bay,Historic Site,Seafood Restaurant,Resort,Bakery,Sandwich Place,Restaurant,Taco Place,Science Museum
15,GRAND BAY,Resort,Seafood Restaurant,Beach,Hotel,Bar,American Restaurant,Harbor / Marina,Gas Station,Campground,State / Provincial Park
19,GRAND BAY,Resort,Seafood Restaurant,Beach,Hotel,Bar,American Restaurant,Harbor / Marina,Gas Station,Campground,State / Provincial Park


In [None]:
cluster_4_grouped = cluster_4.groupby('Borough').agg({'1st Most Common Venue':'value_counts',\
                                                      '2nd Most Common Venue':'value_counts',\
                                                      '3rd Most Common Venue':'value_counts',\
                                                      '4th Most Common Venue':'value_counts',\
                                                      '5th Most Common Venue':'value_counts',\
                                                      '6th Most Common Venue':'value_counts',\
                                                      '7th Most Common Venue':'value_counts',\
                                                      '8th Most Common Venue':'value_counts',\
                                                      '9th Most Common Venue':'value_counts',\
                                                      '10th Most Common Venue':'value_counts',\
                                                     })

cluster_4_grouped = cluster_4_grouped.fillna(0)
cluster_4_grouped.loc[:,'Count_Venue'] = cluster_4_grouped.sum(numeric_only = True,axis=1)

cluster_4_grouped = cluster_4_grouped.drop(['1st Most Common Venue',\
                                            '2nd Most Common Venue',\
                                            '3rd Most Common Venue',\
                                            '4th Most Common Venue',\
                                            '5th Most Common Venue',\
                                            '6th Most Common Venue',\
                                            '7th Most Common Venue',\
                                            '8th Most Common Venue',\
                                            '9th Most Common Venue',\
                                            '10th Most Common Venue'], axis=1)
cluster_4_grouped