# The Battle of Neighborhoods (Week 1)

## Business Opportunities for a Start-Up

Date: 23 April 2020

### 1. Introduction

Before launching a start-up, it is important to understand the business opportunity.
I want to open a new supermarket in my city, Nizhny Novgorod, and I need to convince investors to fund the project. To do that, I need to know how many supermarkets are already open in the city and how densely they are distributed across its districts.


This research can also be useful to anyone else interested in opening a new business in the city.

### 2. Data

To explore this question, I will use the Foursquare API to obtain data on the supermarkets in the city: https://developer.foursquare.com/.
Additionally, I will use the list of city districts from Wikipedia: https://en.wikipedia.org/wiki/Administrative_divisions_of_Nizhny_Novgorod.

### 3. Methodology

For this research, we first import the required Python libraries.

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


Now we need to scrape the district data from Wikipedia using BeautifulSoup.

In [3]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Administrative_divisions_of_Nizhny_Novgorod").text

In [4]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [6]:
# create a list to store neighborhood data
neighborhoodList = []

In [7]:
import re # import the regular-expression module
# append every table-of-contents entry that ends with "District" to the list
neighborhoodList.clear()
for row in soup.find_all(class_=re.compile("toctext")):
    reg = "District$"
    x = re.search(reg, row.text)
    if x:
        neighborhoodList.append(row.text)
neighborhoodList

['Nizhegorodsky District',
 'Prioksky District',
 'Sovetsky District',
 'Avtozavodsky District',
 'Kanavinsky District',
 'Leninsky District',
 'Moskovsky District',
 'Sormovsky District']
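
The filter above relies on the `$` anchor in the pattern `District$`. As a minimal sketch of how that anchor behaves, the example below applies the same pattern to a small sample list. Only the two district names come from the output above; the other strings are made up for illustration.

```python
import re

# Hypothetical sample of table-of-contents entries; only the district names
# come from the notebook output, the other strings are illustrative.
toc_entries = [
    "History",
    "Nizhegorodsky District",
    "Districts of the city",
    "Sovetsky District",
    "References",
]

# "District$" matches only when the text ends with the word "District",
# so "Districts of the city" is rejected.
pattern = re.compile(r"District$")
districts = [text for text in toc_entries if pattern.search(text)]
print(districts)
```

Anchoring the pattern at the end of the string is what keeps section titles that merely mention districts out of the list.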

In [8]:
# create a new DataFrame from the list
nn_df = pd.DataFrame({"Neighborhood": neighborhoodList})
nn_df

Unnamed: 0,Neighborhood
0,Nizhegorodsky District
1,Prioksky District
2,Sovetsky District
3,Avtozavodsky District
4,Kanavinsky District
5,Leninsky District
6,Moskovsky District
7,Sormovsky District


In [9]:
nn_df.shape

(8, 1)

Next, we need to get the coordinates of each district using the geocoder package.

In [10]:
# define a function to get coordinates
def get_latlng(neighborhood, max_attempts=5):
    # initialize your variable to None
    lat_lng_coords = None
    # retry a limited number of times so a failing geocoder cannot loop forever
    for _ in range(max_attempts):
        g = geocoder.arcgis('{}, Nizhny Novgorod, Russia'.format(neighborhood))
        lat_lng_coords = g.latlng
        if lat_lng_coords is not None:
            return lat_lng_coords
    raise RuntimeError('No coordinates found for {}'.format(neighborhood))

In [11]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in nn_df["Neighborhood"].tolist() ]
coords

[[56.323240000000055, 44.03143000000006],
 [56.24166000000008, 43.979340000000036],
 [56.30964000000006, 44.02209000000005],
 [56.24613000000005, 43.85072000000008],
 [56.32359000000008, 43.952180000000055],
 [56.27573000000007, 43.920930000000055],
 [56.33635000000004, 43.84272000000004],
 [56.36460000000005, 43.81774000000007]]

In [12]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
nn_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [13]:
# merge the coordinates into the original dataframe
nn_df['Latitude'] = nn_coords['Latitude']
nn_df['Longitude'] = nn_coords['Longitude']

In [14]:
# check the neighborhoods and the coordinates
print(nn_df.shape)
nn_df

(8, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Nizhegorodsky District,56.32324,44.03143
1,Prioksky District,56.24166,43.97934
2,Sovetsky District,56.30964,44.02209
3,Avtozavodsky District,56.24613,43.85072
4,Kanavinsky District,56.32359,43.95218
5,Leninsky District,56.27573,43.92093
6,Moskovsky District,56.33635,43.84272
7,Sormovsky District,56.3646,43.81774


In [15]:
# save the DataFrame as CSV file
nn_df.to_csv("nn_df.csv", index=False)

Let's create a map of Nizhny Novgorod with the districts marked.

In [16]:
# get the coordinates of Nizhny Novgorod
address = 'Nizhny Novgorod, Russia'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Nizhny Novgorod, Russia are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Nizhny Novgorod, Russia are 56.328571, 44.003506.


In [17]:
# create map of Nizhny Novgorod using latitude and longitude values
map_nn = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(nn_df['Latitude'], nn_df['Longitude'], nn_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_nn)  
    
map_nn

In [18]:
# save the map as HTML file
map_nn.save('map_nn.html')

Next, we use the Foursquare API to explore the districts.

In [19]:
# define Foursquare Credentials and Version
# (replace the placeholders with your own keys; do not publish real secrets)
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID


Now, let's get the top 100 venues that are within a radius of 5000 meters.

In [21]:
radius = 5000
LIMIT = 100
venues = []
for lat, long, neighborhood in zip(nn_df['Latitude'], nn_df['Longitude'], nn_df['Neighborhood']):    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)    
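    # make the GET request
    response = requests.get(url).json().get("response", {})
    # an error reply from the API carries no 'groups' key, so skip it instead of failing
    if not response.get("groups"):
        print("No venues returned for {}".format(neighborhood))
        continue
    results = response["groups"][0]["items"]
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))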

In [None]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()
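
The request URL in the loop above is assembled with `str.format`. As an alternative sketch, the query string can be built with `urllib.parse.urlencode` from the standard library, which escapes each parameter value. The helper name `build_explore_url` and the credential strings are placeholders introduced here for illustration; the endpoint and parameter names are the ones used in the notebook.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble a venues/explore URL from its query parameters (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in "ll" readable instead of encoding it as %2C
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params, safe=",")

# Placeholder credentials, not real keys
example_url = build_explore_url("MY_ID", "MY_SECRET", "20180605", 56.32324, 44.03143, 5000, 100)
print(example_url)
```

Keeping the parameters in a dictionary also makes it easy to log or inspect a request without printing the secret.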

Let's check how many venues were returned for each neighborhood.

In [22]:
venues_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Avtozavodsky District,63,63,63,63,63,63
Kanavinsky District,100,100,100,100,100,100
Leninsky District,96,96,96,96,96,96
Moskovsky District,56,56,56,56,56,56
Nizhegorodsky District,100,100,100,100,100,100
Prioksky District,45,45,45,45,45,45
Sormovsky District,53,53,53,53,53,53
Sovetsky District,55,55,55,55,55,55
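
When reading these counts, it helps to compare the 5000 m search radius with the spacing between district centers: if two centers are closer than twice the radius, their search circles overlap and the same venue can be counted for both districts. The sketch below checks this for two of the geocoded centers using the haversine formula. The function name `haversine_m` is introduced here for illustration, and the Earth radius is the usual mean value.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    earth_radius_m = 6371000  # mean Earth radius
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))

# District centers as geocoded earlier in this notebook
nizhegorodsky = (56.32324, 44.03143)
sovetsky = (56.30964, 44.02209)

distance_m = haversine_m(*nizhegorodsky, *sovetsky)
radius_m = 5000
# Two search circles of equal radius overlap when the centers are closer than 2 * radius
print(round(distance_m), distance_m < 2 * radius_m)
```

A smaller radius, or assigning each venue to its nearest district center, would be two ways to reduce this double counting.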


Let's find out how many unique categories can be curated from all the returned venues

In [23]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 120 unique categories.


In [24]:
# print out the first 50 categories
venues_df['VenueCategory'].unique()[:50]

array(['Hotel', 'Road', 'Gym / Fitness Center', 'Arcade',
       'Scenic Lookout', 'Pub', 'Supermarket', 'Historic Site', 'Café',
       'Outdoor Sculpture', 'Coffee Shop', 'Plaza', 'Art Gallery',
       'Wine Shop', 'Sri Lankan Restaurant', 'Pizza Place', 'Wine Bar',
       'Electronics Store', 'Pelmeni House', 'College Arts Building',
       'Burger Joint', 'Theme Restaurant', 'Beer Bar', 'Fountain',
       'Burrito Place', 'History Museum', 'Hostel', 'Bank',
       'Cocktail Bar', 'Mobile Phone Shop', 'Tea Room', 'Bookstore',
       'Gastropub', 'Hookah Bar', 'Gym', 'Restaurant', 'Yoga Studio',
       'Theater', 'Blini House', 'Caucasian Restaurant',
       'Falafel Restaurant', 'Grocery Store', 'Movie Theater', 'Market',
       'Steakhouse', 'Stables', 'Big Box Store',
       'Middle Eastern Restaurant', 'Bakery', 'Clothing Store'],
      dtype=object)

In [25]:
# check if the results contain "Supermarket"
"Supermarket" in venues_df['VenueCategory'].unique()

True

Let's analyze each district

In [26]:
# one hot encoding
nn_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
nn_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [nn_onehot.columns[-1]] + list(nn_onehot.columns[:-1])
nn_onehot = nn_onehot[fixed_columns]

print(nn_onehot.shape)
nn_onehot.head()

(568, 121)


Unnamed: 0,Neighborhoods,ATM,Airport,Airport Service,American Restaurant,Arcade,Art Gallery,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Workshop,Bakery,Bank,Bar,Beach,Beer Bar,Beer Store,Big Box Store,Blini House,Bookstore,Bowling Alley,Breakfast Spot,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Butcher,Café,Caucasian Restaurant,Church,Circus,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,Concert Hall,Dance Studio,Design Studio,Doner Restaurant,Eastern European Restaurant,Electronics Store,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Food Court,Fountain,Furniture / Home Store,Garden,Gas Station,Gastropub,General Entertainment,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Historic Site,History Museum,Hockey Arena,Hookah Bar,Hostel,Hotel,Intersection,Italian Restaurant,Japanese Restaurant,Lake,Light Rail Station,Lounge,Market,Middle Eastern Restaurant,Mobile Phone Shop,Movie Theater,Multiplex,Nightclub,Outdoor Sculpture,Park,Pedestrian Plaza,Pelmeni House,Pie Shop,Pizza Place,Plane,Platform,Playground,Plaza,Pub,Racetrack,Restaurant,Road,Sandwich Place,Scenic Lookout,Shopping Mall,Ski Area,Snack Place,Soccer Field,Soccer Stadium,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stables,Stadium,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Tea Room,Tennis Court,Theater,Theme Park,Theme Restaurant,Train Station,Ukrainian Restaurant,Volleyball Court,Wine Bar,Wine Shop,Yoga Studio,Zoo,Zoo Exhibit
0,Nizhegorodsky District,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Nizhegorodsky District,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Nizhegorodsky District,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Nizhegorodsky District,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Nizhegorodsky District,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Next, let's group the rows by neighborhood and take the mean of each category's one-hot column, which gives the share of that category among the district's venues.

In [27]:
nn_grouped = nn_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(nn_grouped.shape)
nn_grouped

(8, 121)


Unnamed: 0,Neighborhoods,ATM,Airport,Airport Service,American Restaurant,Arcade,Art Gallery,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Workshop,Bakery,Bank,Bar,Beach,Beer Bar,Beer Store,Big Box Store,Blini House,Bookstore,Bowling Alley,Breakfast Spot,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Butcher,Café,Caucasian Restaurant,Church,Circus,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,Concert Hall,Dance Studio,Design Studio,Doner Restaurant,Eastern European Restaurant,Electronics Store,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Food Court,Fountain,Furniture / Home Store,Garden,Gas Station,Gastropub,General Entertainment,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Historic Site,History Museum,Hockey Arena,Hookah Bar,Hostel,Hotel,Intersection,Italian Restaurant,Japanese Restaurant,Lake,Light Rail Station,Lounge,Market,Middle Eastern Restaurant,Mobile Phone Shop,Movie Theater,Multiplex,Nightclub,Outdoor Sculpture,Park,Pedestrian Plaza,Pelmeni House,Pie Shop,Pizza Place,Plane,Platform,Playground,Plaza,Pub,Racetrack,Restaurant,Road,Sandwich Place,Scenic Lookout,Shopping Mall,Ski Area,Snack Place,Soccer Field,Soccer Stadium,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stables,Stadium,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Tea Room,Tennis Court,Theater,Theme Park,Theme Restaurant,Train Station,Ukrainian Restaurant,Volleyball Court,Wine Bar,Wine Shop,Yoga Studio,Zoo,Zoo Exhibit
0,Avtozavodsky District,0.0,0.015873,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.015873,0.015873,0.0,0.0,0.047619,0.0,0.015873,0.015873,0.0,0.031746,0.0,0.0,0.0,0.0,0.0,0.063492,0.0,0.0,0.0,0.015873,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.063492,0.015873,0.063492,0.0,0.0,0.031746,0.0,0.0,0.0,0.031746,0.0,0.0,0.015873,0.0,0.015873,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.015873,0.015873,0.015873,0.0,0.015873,0.0,0.0,0.015873,0.0,0.0,0.0,0.031746,0.0,0.0,0.015873,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.015873,0.0,0.095238,0.0,0.063492,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.015873,0.0,0.0,0.0,0.0,0.0
1,Kanavinsky District,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.06,0.0,0.01,0.01,0.0,0.02,0.06,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.07,0.0,0.02,0.01,0.0,0.02,0.02,0.06,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.02,0.01,0.02,0.02,0.0,0.04,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.03,0.0,0.0,0.02,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0
2,Leninsky District,0.0,0.0,0.0,0.010417,0.0,0.0,0.010417,0.0,0.010417,0.0,0.0,0.0,0.010417,0.0,0.0,0.0,0.010417,0.0,0.010417,0.010417,0.010417,0.0,0.010417,0.010417,0.010417,0.020833,0.0,0.020833,0.010417,0.0,0.0,0.010417,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.03125,0.0,0.0,0.041667,0.010417,0.020833,0.0,0.0,0.0,0.0,0.010417,0.0,0.0,0.083333,0.041667,0.052083,0.010417,0.0,0.020833,0.0,0.0,0.0,0.010417,0.0,0.0,0.010417,0.0,0.0,0.0,0.010417,0.0,0.03125,0.010417,0.0,0.0,0.010417,0.010417,0.010417,0.0,0.0,0.020833,0.0,0.010417,0.010417,0.010417,0.0,0.0,0.010417,0.0,0.010417,0.010417,0.020833,0.020833,0.010417,0.020833,0.0,0.0,0.03125,0.0,0.010417,0.0,0.0,0.010417,0.0,0.083333,0.0,0.03125,0.010417,0.010417,0.0,0.0,0.0,0.010417,0.010417,0.0,0.0,0.0,0.0,0.0,0.010417
3,Moskovsky District,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.017857,0.0,0.0,0.0,0.0,0.017857,0.0,0.071429,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.017857,0.0,0.0,0.017857,0.017857,0.0,0.017857,0.017857,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.053571,0.017857,0.017857,0.017857,0.0,0.0,0.0,0.0,0.017857,0.0,0.017857,0.017857,0.0,0.0,0.017857,0.0,0.017857,0.053571,0.0,0.017857,0.017857,0.0,0.035714,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.017857,0.017857,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.035714,0.017857,0.0,0.0,0.0,0.0,0.0,0.089286,0.017857,0.035714,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0
4,Nizhegorodsky District,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.01,0.01,0.01,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.07,0.02,0.01,0.0,0.01,0.02,0.07,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.03,0.02,0.0,0.02,0.01,0.0,0.03,0.01,0.06,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.02,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.02,0.0,0.01,0.02,0.0,0.04,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.02,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0
5,Prioksky District,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.0,0.0,0.0,0.0,0.044444,0.0,0.022222,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.022222,0.022222,0.022222,0.0,0.0,0.088889,0.066667,0.022222,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.022222,0.0,0.0,0.022222,0.044444,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.022222,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.022222,0.022222,0.0,0.0,0.022222,0.022222,0.022222,0.0,0.0,0.0,0.0,0.088889,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Sormovsky District,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09434,0.018868,0.0,0.0,0.0,0.0,0.018868,0.0,0.018868,0.018868,0.0,0.0,0.037736,0.0,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.056604,0.037736,0.018868,0.0,0.018868,0.0,0.018868,0.0,0.0,0.018868,0.0,0.018868,0.018868,0.018868,0.0,0.018868,0.0,0.018868,0.037736,0.0,0.018868,0.0,0.0,0.037736,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.018868,0.037736,0.018868,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.018868,0.018868,0.018868,0.0,0.0,0.0,0.018868,0.0,0.09434,0.0,0.037736,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0
7,Sovetsky District,0.0,0.0,0.0,0.0,0.018182,0.018182,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.036364,0.0,0.018182,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.018182,0.036364,0.0,0.0,0.0,0.0,0.036364,0.054545,0.018182,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.018182,0.0,0.0,0.018182,0.018182,0.0,0.0,0.0,0.0,0.0,0.036364,0.018182,0.054545,0.0,0.036364,0.018182,0.0,0.018182,0.0,0.054545,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.018182,0.0,0.0,0.0,0.018182,0.018182,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.036364,0.018182,0.0,0.0,0.036364,0.0,0.072727,0.018182,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.018182,0.0,0.0,0.018182,0.0,0.018182,0.0,0.018182,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0
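
To see why the mean of a one-hot column equals a category's share of venues, here is a minimal sketch in plain Python. The district names and venue list are made up for illustration; the calculation is count of a category divided by the total number of venues in the district.

```python
from collections import Counter, defaultdict

# Hypothetical (district, category) pairs, made up for illustration
venues = [
    ("District A", "Supermarket"),
    ("District A", "Café"),
    ("District A", "Supermarket"),
    ("District A", "Hotel"),
    ("District B", "Café"),
    ("District B", "Supermarket"),
    ("District B", "Hotel"),
    ("District B", "Pub"),
]

# Count categories per district
counts = defaultdict(Counter)
for district, category in venues:
    counts[district][category] += 1

# Share of each category = its count divided by the district's venue total,
# which is what the mean of a 0/1 one-hot column computes
shares = {
    district: {cat: n / sum(cats.values()) for cat, n in cats.items()}
    for district, cats in counts.items()
}
print(shares["District A"]["Supermarket"], shares["District B"]["Supermarket"])
```

Because the value is a share, it depends on how many venues of other kinds were returned for the district as well as on the number of supermarkets.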


In [28]:
len(nn_grouped[nn_grouped["Supermarket"] > 0])

8

Create a new DataFrame for Supermarket data only

In [29]:
nn_mall = nn_grouped[["Neighborhoods","Supermarket"]]
nn_mall

Unnamed: 0,Neighborhoods,Supermarket
0,Avtozavodsky District,0.095238
1,Kanavinsky District,0.03
2,Leninsky District,0.083333
3,Moskovsky District,0.089286
4,Nizhegorodsky District,0.01
5,Prioksky District,0.088889
6,Sormovsky District,0.09434
7,Sovetsky District,0.018182
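
Since each value above is a share of the venues returned for a district, multiplying it by the district's venue total recovers the number of supermarkets among those results. The sketch below does this with the totals from the earlier `groupby` count and the shares from `nn_mall`. Note that these are counts within the API results, which are capped at 100 venues per district, so they are not necessarily the full number of supermarkets in each district.

```python
# Venue totals per district, copied from the groupby count earlier in the notebook
venue_totals = {
    "Avtozavodsky District": 63,
    "Kanavinsky District": 100,
    "Leninsky District": 96,
    "Moskovsky District": 56,
    "Nizhegorodsky District": 100,
    "Prioksky District": 45,
    "Sormovsky District": 53,
    "Sovetsky District": 55,
}

# Supermarket shares, copied from nn_mall
supermarket_share = {
    "Avtozavodsky District": 0.095238,
    "Kanavinsky District": 0.03,
    "Leninsky District": 0.083333,
    "Moskovsky District": 0.089286,
    "Nizhegorodsky District": 0.01,
    "Prioksky District": 0.088889,
    "Sormovsky District": 0.09434,
    "Sovetsky District": 0.018182,
}

# share * total gives the count; round() removes the small rounding error in the shares
supermarket_counts = {
    district: round(supermarket_share[district] * venue_totals[district])
    for district in venue_totals
}
for district, count in sorted(supermarket_counts.items(), key=lambda item: item[1]):
    print(district, count)
```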


To find the best district for opening a supermarket, we cluster the districts by their supermarket share.

Run k-means to cluster the neighborhoods in Nizhny Novgorod into 3 clusters.

In [30]:
# set number of clusters
kclusters = 3

nn_clustering = nn_mall.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(nn_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 2, 1, 1, 0, 1, 1, 0])
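
Because the clustering uses a single feature, the grouping can be cross-checked without scikit-learn. In one dimension, the clusters that minimize the within-cluster sum of squares are contiguous in sorted order, so it is enough to try every way of cutting the sorted values into three groups. The sketch below does that exhaustive search; it is a check on the result, not the iterative procedure that `KMeans` runs, and the function names are introduced here for illustration.

```python
from itertools import combinations

# Supermarket shares, copied from nn_mall
shares = {
    "Avtozavodsky District": 0.095238,
    "Kanavinsky District": 0.03,
    "Leninsky District": 0.083333,
    "Moskovsky District": 0.089286,
    "Nizhegorodsky District": 0.01,
    "Prioksky District": 0.088889,
    "Sormovsky District": 0.09434,
    "Sovetsky District": 0.018182,
}

def sse(values):
    """Sum of squared deviations from the mean (within-cluster inertia)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_contiguous_split(items, k):
    """Cut the sorted items into k contiguous groups with the lowest total SSE."""
    ordered = sorted(items, key=lambda pair: pair[1])
    n = len(ordered)
    best_groups, best_cost = None, float("inf")
    # choose k - 1 cut positions between consecutive sorted items
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        groups = [ordered[bounds[i]:bounds[i + 1]] for i in range(k)]
        cost = sum(sse([value for _, value in group]) for group in groups)
        if cost < best_cost:
            best_groups, best_cost = groups, cost
    return best_groups

groups = best_contiguous_split(shares.items(), 3)
for group in groups:
    print([name for name, _ in group])
```

The groups are listed from the lowest supermarket share to the highest; cluster label numbers from `KMeans` are arbitrary, so only the membership of each group is comparable.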

In [31]:
# create a new dataframe that holds the supermarket share and the cluster label for each neighborhood
nn_merged = nn_mall.copy()

# add clustering labels
nn_merged["Cluster Labels"] = kmeans.labels_

In [32]:
nn_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
nn_merged.head()

Unnamed: 0,Neighborhood,Supermarket,Cluster Labels
0,Avtozavodsky District,0.095238,1
1,Kanavinsky District,0.03,2
2,Leninsky District,0.083333,1
3,Moskovsky District,0.089286,1
4,Nizhegorodsky District,0.01,0


In [33]:
# join nn_merged with nn_df to add latitude/longitude for each neighborhood
nn_merged = nn_merged.join(nn_df.set_index("Neighborhood"), on="Neighborhood")

print(nn_merged.shape)
nn_merged.head() # check the last columns!

(8, 5)


Unnamed: 0,Neighborhood,Supermarket,Cluster Labels,Latitude,Longitude
0,Avtozavodsky District,0.095238,1,56.24613,43.85072
1,Kanavinsky District,0.03,2,56.32359,43.95218
2,Leninsky District,0.083333,1,56.27573,43.92093
3,Moskovsky District,0.089286,1,56.33635,43.84272
4,Nizhegorodsky District,0.01,0,56.32324,44.03143


In [34]:
# sort the results by Cluster Labels
print(nn_merged.shape)
nn_merged.sort_values(["Cluster Labels"], inplace=True)
nn_merged

(8, 5)


Unnamed: 0,Neighborhood,Supermarket,Cluster Labels,Latitude,Longitude
4,Nizhegorodsky District,0.01,0,56.32324,44.03143
7,Sovetsky District,0.018182,0,56.30964,44.02209
0,Avtozavodsky District,0.095238,1,56.24613,43.85072
2,Leninsky District,0.083333,1,56.27573,43.92093
3,Moskovsky District,0.089286,1,56.33635,43.84272
5,Prioksky District,0.088889,1,56.24166,43.97934
6,Sormovsky District,0.09434,1,56.3646,43.81774
1,Kanavinsky District,0.03,2,56.32359,43.95218


Finally, let's visualize the resulting clusters

In [5]:
# create map (latitude and longitude come from the Nominatim cell above, so run that cell first)
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(nn_merged['Latitude'], nn_merged['Longitude'], nn_merged['Neighborhood'], nn_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters


In [36]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

Let's examine the clusters.

Cluster 0.

In [37]:
nn_merged.loc[nn_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Supermarket,Cluster Labels,Latitude,Longitude
4,Nizhegorodsky District,0.01,0,56.32324,44.03143
7,Sovetsky District,0.018182,0,56.30964,44.02209


Cluster 1.

In [38]:
nn_merged.loc[nn_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Supermarket,Cluster Labels,Latitude,Longitude
0,Avtozavodsky District,0.095238,1,56.24613,43.85072
2,Leninsky District,0.083333,1,56.27573,43.92093
3,Moskovsky District,0.089286,1,56.33635,43.84272
5,Prioksky District,0.088889,1,56.24166,43.97934
6,Sormovsky District,0.09434,1,56.3646,43.81774


Cluster 2.

In [39]:
nn_merged.loc[nn_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Supermarket,Cluster Labels,Latitude,Longitude
1,Kanavinsky District,0.03,2,56.32359,43.95218


### 4. Conclusion

Most supermarkets are concentrated in the residential districts, not in the center. Nizhegorodsky and Sovetsky Districts have the fewest supermarkets. Nizhegorodsky District is the cultural and business center of the city: little land is available for building there, and what is available is expensive.
Sovetsky District is also a residential district, located close to the center. It has few supermarkets, and the available land is not expensive.
Sovetsky District is therefore a good and potentially profitable choice for building a new supermarket.