# IBM Applied Data Science Capstone Project

## Week 5 Final Report: Finding the best location for shopping malls in Nashville, Tennessee

### Import Libraries

In [12]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    branca-0.4.1               |             py_0          26 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    altair-4.1.0               |             py_1         614 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         713 KB

The following NEW packages will be INSTALLED:

    altair:  4.1.0-py_1 conda-forge
    branca:  0.4.1-py_0 conda-forge
    folium:  0.5.0-py_0 conda-forge
    vincent: 0.4.4-py_1 conda-forge


Down

In [13]:
import requests 
from bs4 import BeautifulSoup

In [14]:
!pip install geocoder

Collecting geocoder
[?25l  Downloading https://files.pythonhosted.org/packages/4f/6b/13166c909ad2f2d76b929a4227c952630ebaf0d729f6317eb09cbceccbab/geocoder-1.38.1-py2.py3-none-any.whl (98kB)
[K     |████████████████████████████████| 102kB 10.1MB/s ta 0:00:01
Collecting ratelim (from geocoder)
  Downloading https://files.pythonhosted.org/packages/f2/98/7e6d147fd16a10a5f821db6e25f192265d6ecca3d82957a4fdd592cad49c/ratelim-0.1.6-py2.py3-none-any.whl
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


### Scrape Data from Wikipedia into Dataframe

In [15]:
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighborhoods_in_Nashville,_Tennessee").text

In [16]:
soup = BeautifulSoup(data, 'html.parser')

In [17]:
neighborhoodList = []

In [18]:
for row in soup.find_all("div", class_="mw-category")[0].findAll("li"):
    neighborhoodList.append(row.text)

In [19]:
nv_df = pd.DataFrame({"Neighborhood": neighborhoodList})

nv_df.head()

Unnamed: 0,Neighborhood
0,"Antioch, Tennessee"
1,"Bakers, Tennessee"
2,"Bellevue, Tennessee"
3,"Donelson, Tennessee"
4,"East Nashville, Tennessee"


In [20]:
nv_df.shape #number of rows in the nv_df dataframe

(20, 1)

### Geographical Coordinates

In [21]:
# define function to get coordinates
def get_latlng(neighborhood):
    # initialize variable to None
    lat_lng_coords = None
    # loop to get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Nashville, TN'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords

In [None]:
coords = [ get_latlng(neighborhood) for neighborhood in nv_df["Neighborhood"].tolist() ]

In [None]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [None]:
# merge the coordinates into the original dataframe
df1['Latitude'] = df_coords['Latitude']
df1['Longitude'] = 

In [25]:
# check the neighborhoods and the coordinates
df1.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Antioch,36.162757,86.781661
1,Bakers,36.264589,86.873351
2,Bellvue,36.213641,86.677695
3,Donelson,36.188542,86.598365
4,East Nashville,36.193642,86.552143


### Create a map of Nashville with Neighborhoods Superimposed on Top

In [None]:
# get the coordinates of Nashville, TN
address = 'Nashville, TN'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

In [None]:
# create map of Nashville using latitude and longitude values
map_nv = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(kl_df['Latitude'], df1['Longitude'], df1['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_kl)  
    
map_nv

In [None]:
# save the map as HTML file
map_nv.save('map_nv.html')

### Use the Foursquare API to Explore Neighborhoods

In [None]:
### Credentials are hidden

##### Get top 100 venues that are within a radius of 2000m

In [None]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(df1['Latitude'], df1['Longitude'], df1['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))

In [None]:
venues_df = pd.DataFrame(venues)

# define the column names
df2.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

In [27]:
df2.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Antioch,36.162757,86.781661,Target,36.162887,86.789656,Shopping
1,Antioch,36.162757,86.781661,Subway,36.159744,86.814555,Restaurant
2,Antioch,36.162757,86.781661,Jimmie Johns,36.162261,86.799832,Restaurant
3,Antioch,36.162757,86.781661,Walmart,36.171882,86.781663,Shopping
4,Antioch,36.162757,86.781661,Walgreens,36.170876,86.781661,Pharmacy


##### Check for how many venues were returned for each neighborhood

In [None]:
df3 = df2.groupby(["Neighborhood"]).count()

In [29]:
df3

Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Antioch,100,100,100,100,100,100
1,Bakers,100,100,100,100,100,100
2,Bellvue,100,100,100,100,100,100
3,Donelson,89,89,89,89,89,89
4,East Nashville,100,100,100,100,100,100
5,Green Hills,79,79,79,79,79,79
6,The Gulch,100,100,100,100,100,100
7,Hermitage,100,100,100,100,100,100
8,Hillsboro,100,100,100,100,100,100
9,Hopewell,90,90,90,90,90,90


##### Unique Venue Categories

In [None]:
# print out the list of categories
z = df2['VenueCategory'].unique()[:50]

In [47]:
z

['Noodle House',
 'Supplement Shop',
 'Chinese Restaurant',
 'Restaurant',
 'Food Court',
 'Vegetarian / Vegan Restaurant',
 'Asian Restaurant',
 'Dim Sum Restaurant',
 'Snack Place',
 'Other Great Outdoors',
 'Seafood Restaurant',
 'Spa',
 'Food Truck',
 'Café',
 'Park',
 'Chinese Breakfast Place',
 'Indian Restaurant',
 'Japanese Restaurant',
 'Outlet Store',
 'Convenience Store',
 'Bubble Tea Shop',
 'Dessert Shop',
 'Farmers Market',
 'Cantonese Restaurant',
 'Malay Restaurant',
 'Bakery',
 'Hakka Restaurant',
 'Supermarket',
 'Steakhouse',
 'Pet Store',
 'Middle Eastern Restaurant',
 'Badminton Court',
 'Athletics & Sports',
 'Hookah Bar',
 'Winery',
 'Burger Joint',
 'Gym / Fitness Center',
 'Bistro',
 'Grocery Store',
 'Halal Restaurant',
 'College Bookstore',
 'Flea Market',
 'Vietnamese Restaurant',
 'Italian Restaurant',
 'Coffee Shop',
 'Juice Bar',
 'Korean Restaurant',
 'Pizza Place',
 'Ice Cream Shop',
 'Massage Studio']

### Analyze Each Neighborhood

In [None]:
# one hot encoding
df1_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
df1_onehot['Neighborhoods'] = df3['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [df1_onehot.columns[-1]] + list(df1_onehot.columns[:-1])
df4 = df_onehot[fixed_columns]

In [50]:
df4.head()

Unnamed: 0,Neighborhoods,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,Airport Service,Airport Terminal,American Restaurant,Aquarium,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,Auto Dealership,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Badminton Court,Bagel Shop,Bakery,Bar,Baseball Field,Basketball Court,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Belgian Restaurant,Bistro,Board Shop,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Buffet,Building,Burger Joint,Bus Station,Business Service,Butcher,Cafeteria,Café,Campground,Candy Store,Cantonese Restaurant,Casino,Chettinad Restaurant,Chinese Breakfast Place,Chinese Restaurant,Circus,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Bookstore,Comedy Club,Comfort Food Restaurant,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Dive Bar,Dive Shop,Dog Run,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Fish & Chips Shop,Fish Market,Fishing Store,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,Frame Store,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Government Building,Grilled Meat Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hainan Restaurant,Hakka Restaurant,Halal Restaurant,Harbor / Marina,Hawaiian Restaurant,Health & Beauty Service,High School,Himalayan Restaurant,Historic Site,History Museum,Hockey Arena,Hong Kong Restaurant,Hookah Bar,Hostel,Hot Dog Joint,Hot Spring,Hotel,Hotel Bar,Hotel Pool,Hotpot Restaurant,Housing Development,Hunan Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Kushikatsu Restaurant,Lake,Latin American Restaurant,Leather Goods Store,Light Rail Station,Lingerie Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Mosque,Motel,Motorcycle Shop,Mountain,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Neighborhood,Night Market,Nightclub,Noodle House,North Indian Restaurant,Office,Optical Shop,Other Great Outdoors,Other Nightlife,Outlet Store,Pakistani Restaurant,Palace,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Piano Bar,Pie Shop,Pizza Place,Playground,Plaza,Poke Place,Pool,Pool Hall,Pub,Public Art,Racetrack,Ramen Restaurant,Record Shop,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,Road,Rock Climbing Spot,Rock Club,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Satay Restaurant,Scandinavian Restaurant,Scenic Lookout,Science Museum,Seafood Restaurant,Shabu-Shabu Restaurant,Shoe Store,Shop & Service,Shopping Mall,Skate Park,Ski Area,Ski Lodge,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South American Restaurant,South Indian Restaurant,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Stationery Store,Steakhouse,Street Food Gathering,Supermarket,Supplement Shop,Surf Spot,Sushi Restaurant,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Temple,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toll Plaza,Tourist Information Center,Toy / Game Store,Track,Track Stadium,Trail,Train Station,Tunnel,Turkish Restaurant,Udon Restaurant,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Antioch,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Antioch,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Antioch,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Antioch,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Antioch,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


##### Mean Frequency Occurance of each category

In [None]:
nv_grouped = nv_onehot.groupby(["Neighborhoods"]).mean().reset_index()

##### Create a new dataframe for Restaurants only

In [None]:
df5=nv_grouped[["Neighborhoods","Shopping Malls"]]

In [2]:
df5.head()

Unnamed: 0,Neighborhood,Shopping Malls
0,Antioch,0.01
1,Bakers,0.0
2,Bellvue,0.02
3,Donelson,0.01
4,East Nashville,0.0


### Cluster Neighborhoods

In [None]:
# set number of clusters
kclusters = 3

nv_clustering = nv_mall.drop(["Neighborhoods"], 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(nv_clustering)

In [None]:
# create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
nv_merged = nv_mall.copy()

# add clustering labels
nv_merged["Cluster Labels"] = kmeans.labels_

In [None]:
df6=nv_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)

In [5]:
df6.head()

Unnamed: 0,Neighborhood,Shopping Malls,Cluster Labels
0,Antioch,0.01,1
1,Bakers,0.0,1
2,Bellvue,0.02,1
3,Donelson,0.01,1
4,East Nashville,0.0,1


In [None]:
# merge nashville_grouped with nashville_data to add latitude/longitude for each neighborhood
df7 = nv_merged = nv_merged.join(nv_df.set_index("Neighborhood"), on="Neighborhood")

In [7]:
df7.head()

Unnamed: 0,Neighborhood,Shopping Malls,Cluster Labels,Latitude,Longitude
0,Antioch,0.01,1,36.162757,86.781661
1,Bakers,0.0,1,36.264589,86.873351
2,Bellvue,0.02,1,36.213641,86.677695
3,Donelson,0.01,1,36.188542,86.598365
4,East Nashville,0.0,1,36.193642,86.552143


In [None]:
# sort the results by Cluster Labels
nv_merged.sort_values(["Cluster Labels"], inplace=True)

##### Visualize Clusters

In [None]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(nv_merged['Latitude'],nv_merged['Longitude'], nv_merged['Neighborhood'], nv_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [None]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

### Examine Clusters

##### Cluster 0

In [None]:
df8 = nv_merged.loc[nv_merged['Cluster Labels'] == 0]

In [2]:
df8

Unnamed: 0,Neighborhood,Shopping Malls,Cluster Labels,Latitude,Longitude
0,Green Hills,0.02,0,36.162757,86.781446
1,Hermitage,0.01,0,36.135674,86.904766
2,Hopewell,0.01,0,36.145989,86.884123
3,Joelton,0.01,0,36.142218,86.790034
4,Old Hickory,0.02,0,36.376198,86.771131
5,Pasquo,0.01,0,36.245091,86.073658
6,Tusculum,0.01,0,36.200437,86.698772


##### Cluster 1

In [None]:
df9 = nv_merged.loc[nv_merged['Cluster Labels'] == 1]

In [4]:
df9

Unnamed: 0,Neighborhood,Shopping Malls,Cluster Labels,Latitude,Longitude
0,Antioch,0,1,36.162757,86.781661
1,Bakers,0,1,36.264589,86.873351
2,Bellvue,0,1,36.213641,86.677695
3,Donelson,0,1,36.188542,86.598365
4,East Nashville,0,1,36.193642,86.552143
5,Inglewood,0,1,36.18343,86.556721
6,Richland-West End,0,1,36.29943,86.619832


##### Cluster 2

In [None]:
df10 = nv_merged.loc[nv_merged['Cluster Labels'] == 2]

In [6]:
df10

Unnamed: 0,Neighborhood,Shopping Malls,Cluster Labels,Latitude,Longitude
0,The Gulch,0.03,2,36.162686,86.781911
1,Hillsboro,0.03,2,36.155903,86.763308
2,Lakewood,0.04,2,36.288931,86.689945
3,Lockeland,0.03,2,36.147206,86.814455
4,Madison,0.02,2,36.150122,86.753391
5,Whites Creek,0.04,2,36.176189,86.724867


##### Observations

Most of the shopping malls are concentrated in the central area of Nashville, Tennesse, with the highest number in cluster 2 and moderate number in cluster 0. On the other hand, cluster 1 has very low number to totally no shopping malls in the neighborhoods. This represents a great opportunity and high potential areas to open new shopping malls as there is very little to no competition from existing malls. Meanwhile, shopping malls in cluster 2 are likely suffering from intense competition due to oversupply and high concentration of shopping malls. From another perspective, this also shows that the oversupply of shopping malls mostly happened in the central area of the city, with the suburb area still have very few shopping malls. Therefore, this project recommends property developers to capitalize on these findings to open new shopping malls in neighborhoods in cluster 1 with little to no competition. Property developers with unique selling propositions to stand out from the competition can also open new shopping malls in neighborhoods in cluster 0 with moderate competition. Lastly, property developers are advised to avoid neighborhoods in cluster 2 which already have high concentration of shopping malls and suffering from intense competition.