# IBM Applied Data Science Capstone

## Opening a Shopping Mall in Paris

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a shopping mall. Specifically, this report targets stakeholders interested in opening a **shopping mall** in **Paris**, France.

Since there are many shopping malls in Paris, we will try to detect **locations that are not already crowded with shopping malls**. We are particularly interested in **areas with no shopping malls in the vicinity**. We would also prefer locations **as close to the city center as possible**, provided the first two conditions are met.

We will use data science tools to identify the few most promising neighborhoods based on these criteria. The advantages of each area will then be stated clearly so that stakeholders can choose the best final location.

## Data <a name="data"></a>

To solve the problem, we will need the following data:

* A list of neighbourhoods in Paris. This defines the scope of the project, which is confined to the city of Paris, the capital of France.
* The latitude and longitude coordinates of those neighbourhoods. These are required to plot the map and to retrieve the venue data.
* Venue data, particularly data related to shopping malls. We will use this data to cluster the neighbourhoods.

#### Sources of Data and methods to extract the Data

This <a href="https://en.wikipedia.org/wiki/Category:Districts_of_Paris">Wikipedia page</a> is a list of 29 neighborhoods in Paris. I used web scraping with the Python requests and Beautiful Soup packages to extract the data from the page. The latitude and longitude coordinates of the neighborhoods are then obtained with the Python Geocoder package, and the Foursquare API provides the venue data for those neighborhoods.

The Foursquare API returns venues in many categories, and we are particularly interested in shopping malls to help solve the business problem. No "Shopping Mall" category appears among the venues returned for these neighborhoods, so the analysis uses the "Shopping Plaza" category as its measure of existing malls. This project makes use of many data science skills: web scraping (Wikipedia), working with an API (Foursquare), data cleaning and data wrangling, machine learning (K-means clustering), and map visualization (Folium).


## Methodology <a name="methodology"> </a>

The Foursquare API allows application developers to interact with the Foursquare platform. The API is a RESTful set of endpoints to which you send HTTP requests, so there is nothing to download onto your server.
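Each call is an HTTP GET to an endpoint URL with query parameters. A minimal sketch, assuming the same `venues/explore` endpoint and parameters (`client_id`, `client_secret`, `v`, `ll`, `radius`, `limit`) that the exploration step of this notebook uses; the helper `build_explore_url` and the placeholder credentials are hypothetical:

```python
from urllib.parse import urlencode

# Endpoint used by the exploration step of this notebook
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius=2000, limit=100):
    """Return a GET URL for venues/explore with URL-encoded query parameters (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,               # API version date
        "ll": f"{lat},{lng}",       # "latitude,longitude" of the search center
        "radius": radius,           # search radius in meters
        "limit": limit,             # maximum number of venues returned
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"

# Placeholder credentials, not real keys
url = build_explore_url("MY_ID", "MY_SECRET", "20210615", 48.88333, 2.31667)
print(url)
```

Letting `urlencode` (or the `params=` argument of `requests.get`) build the query string avoids hand-formatting and escapes special characters such as the comma in `ll`.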

**1. Import Libraries**

In [1]:
```python
!pip install geopy
!pip install beautifulsoup4
!pip install geocoder

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON into a pandas DataFrame (pandas.io.json.json_normalize was removed in pandas 2.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library
print("Libraries imported")
```

Libraries imported


**2. Scrape the list of neighbourhoods from the Wikipedia page using the Python requests and Beautiful Soup packages, and store it in a DataFrame.**

In [2]:
```python
# Send the GET request
url = "https://en.wikipedia.org/wiki/Category:Districts_of_Paris"
data = requests.get(url).text
# Parse data from the HTML into a BeautifulSoup object
soup = BeautifulSoup(data, 'html.parser')
# Create a list to store neighbourhood data
neighborhoodList = []
# Collect the list items of every category group, skipping the first group (index 0)
for group in soup.find_all("div", class_="mw-category-group")[1:]:
    for row in group.find_all("li"):
        neighborhoodList.append(row.text)
# Create a new DataFrame from the list
df = pd.DataFrame({"Neighborhood": neighborhoodList})
df
```

| | Neighborhood |
|---|---|
| 0 | Batignolles |
| 1 | Belleville, Paris |
| 2 | Bercy |
| 3 | Cité des Fleurs |
| 4 | Cour des miracles |
| 5 | Épinettes |
| 6 | Faubourg Saint-Antoine |
| 7 | Faubourg Saint-Germain |
| 8 | Front de Seine |
| 9 | Goutte d'Or |
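Some scraped titles carry Wikipedia disambiguation text, for example "Belleville, Paris", which repeats "Paris" once the geocoding query below appends the city. A minimal cleaning sketch; the helper `clean_neighborhood_name` is hypothetical, and the rule for parenthetical qualifiers is an assumption about other titles on the page:

```python
import re

def clean_neighborhood_name(title: str) -> str:
    """Strip Wikipedia disambiguators from a page title (hypothetical helper)."""
    # Drop a trailing parenthetical qualifier, e.g. "X (Paris)" -> "X" (assumed pattern)
    name = re.sub(r"\s*\([^)]*\)\s*$", "", title)
    # Drop a trailing ", Paris", e.g. "Belleville, Paris" -> "Belleville"
    name = re.sub(r",\s*Paris$", "", name)
    return name.strip()

titles = ["Batignolles", "Belleville, Paris", "Goutte d'Or"]
print([clean_neighborhood_name(t) for t in titles])
```

In the notebook this could be applied with `df["Neighborhood"].map(clean_neighborhood_name)` before geocoding.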


**3. Get the location coordinates of each neighborhood. To do so, we use the Geocoder package (with its ArcGIS provider), which converts an address into geographical coordinates in the form of latitude and longitude.**

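In [3]:
```python
def get_latlng(neighborhood, max_tries=10):
    # initialize the coordinates to None
    lat_lng_coords = None
    # retry until ArcGIS returns coordinates, giving up after max_tries attempts
    for _ in range(max_tries):
        g = geocoder.arcgis('{}, Paris, France'.format(neighborhood))
        lat_lng_coords = g.latlng
        if lat_lng_coords is not None:
            break
    # return [None, None] when every attempt failed so the DataFrame below still gets two columns
    return lat_lng_coords if lat_lng_coords is not None else [None, None]
```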

In [4]:
```python
neigh_latlng = [get_latlng(neighborhood) for neighborhood in df["Neighborhood"].tolist()]
```

In [5]:
```python
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_latlng = pd.DataFrame(neigh_latlng, columns=['Latitude', 'Longitude'])

# merge the coordinates into the original dataframe
df['Latitude'] = df_latlng['Latitude']
df['Longitude'] = df_latlng['Longitude']

# check the neighborhoods and the coordinates
df
```

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Batignolles | 48.88333 | 2.31667 |
| 1 | Belleville, Paris | 48.87018 | 2.38423 |
| 2 | Bercy | 48.83488 | 2.38459 |
| 3 | Cité des Fleurs | 48.892615 | 2.320325 |
| 4 | Cour des miracles | 46.100403 | 4.323412 |
| 5 | Épinettes | 48.842963 | 2.325298 |
| 6 | Faubourg Saint-Antoine | 48.85094 | 2.37567 |
| 7 | Faubourg Saint-Germain | 48.857815 | 2.323802 |
| 8 | Front de Seine | 48.84935 | 2.28573 |
| 9 | Goutte d'Or | 48.88504 | 2.35395 |
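Two rows in this notebook look wrong: Cour des miracles (46.100403, 4.323412) here and, further down, Passy (46.54055, 4.54073) lie hundreds of kilometres from Paris, so the geocoder most likely matched a different place with the same name. A quick bounding-box check catches this kind of error before venues are fetched. This is a sketch; the box limits are approximate values for the city of Paris assumed for illustration, not taken from the original notebook, and `outside_paris` is a hypothetical helper:

```python
# Approximate bounding box of the city of Paris (assumed values, including the two woods)
PARIS_LAT = (48.815, 48.905)
PARIS_LNG = (2.22, 2.47)

def outside_paris(rows):
    """Return the names of (name, lat, lng) rows whose coordinates fall outside the box."""
    return [
        name
        for name, lat, lng in rows
        if not (PARIS_LAT[0] <= lat <= PARIS_LAT[1] and PARIS_LNG[0] <= lng <= PARIS_LNG[1])
    ]

# A few rows copied from the table above
rows = [
    ("Batignolles", 48.88333, 2.31667),
    ("Bercy", 48.83488, 2.38459),
    ("Cour des miracles", 46.100403, 4.323412),
    ("Goutte d'Or", 48.88504, 2.35395),
]
print(outside_paris(rows))
```

On the DataFrame itself, the same test can be written with `Series.between`, for example `df[~(df['Latitude'].between(48.815, 48.905) & df['Longitude'].between(2.22, 2.47))]`.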


**4. Create a map of Paris**

In [6]:
```python
address = "Paris, France"

geolocator = Nominatim(user_agent="Paris_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Paris are {}, {}.'.format(latitude, longitude))
```

When this cell was run with the original query string "Paris,French", Nominatim returned -34.5922042, -58.3968701, a point in South America rather than in Paris, so the query above names the country, "France", explicitly.


In [7]:
```python
map_p = folium.Map(location=[latitude, longitude], zoom_start=12)

# add a marker for each neighborhood; the loop variables must not reuse the names
# latitude/longitude, which hold the city center and are needed again for the cluster map
for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = folium.Popup(neighborhood, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True
    ).add_to(map_p)

map_p
```

**5. Use Foursquare to explore neighborhoods**

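In [8]:
```python
import os

# Read the Foursquare credentials from environment variables instead of hard-coding them
# (the variable names are placeholders; set them to your own client ID and secret)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET')
VERSION = '20210615'
```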

In [9]:
```python
# Get the top 100 venues within a radius of 2000 meters of each neighborhood
radius = 2000
LIMIT = 100

venues = []

for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):

    # build the API request; requests URL-encodes the query parameters
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': LIMIT,
    }

    # make the GET request and keep the list of recommended venues
    response = requests.get("https://api.foursquare.com/v2/venues/explore", params=params).json()
    results = response["response"]['groups'][0]['items']

    # keep only the relevant information for each nearby venue
    for item in results:
        venue = item['venue']
        venues.append((
            neighborhood,
            lat,
            lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            venue['categories'][0]['name']))
```
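The loop above assumes every venue has at least one category; `venue['categories'][0]` raises an `IndexError` otherwise. As a hedged sketch, the hypothetical helper `parse_explore_items` below flattens one response using the same nesting the loop reads (`response` → `groups[0]` → `items` → `venue`) and records `None` when a venue comes back without a category. The sample payload is hand-written for illustration (the second venue is invented), not real API output:

```python
def parse_explore_items(neighborhood, lat, lng, payload):
    """Flatten a venues/explore JSON payload into one tuple per venue (hypothetical helper)."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        # use None instead of failing when the categories list is missing or empty
        categories = venue.get("categories") or []
        rows.append((
            neighborhood,
            lat,
            lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"] if categories else None,
        ))
    return rows

# Hand-written payload with the same nesting as the API response read above
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Les Beaux Gamins",
               "location": {"lat": 48.88364, "lng": 2.31667},
               "categories": [{"name": "Bar"}]}},
    {"venue": {"name": "Unnamed spot",
               "location": {"lat": 48.8834, "lng": 2.317},
               "categories": []}},
]}]}}

for row in parse_explore_items("Batignolles", 48.88333, 2.31667, sample):
    print(row)
```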

In [10]:
```python
# convert the venues list into a new DataFrame with named columns
df_venues = pd.DataFrame(
    venues,
    columns=['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory'])

df_venues.head()
```

| | Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|
| 0 | Batignolles | 48.88333 | 2.31667 | Les Beaux Gamins | 48.88364 | 2.31667 | Bar |
| 1 | Batignolles | 48.88333 | 2.31667 | Marché de Levis | 48.88313 | 2.314958 | Farmers Market |
| 2 | Batignolles | 48.88333 | 2.31667 | Saïdoune | 48.884715 | 2.315185 | Lebanese Restaurant |
| 3 | Batignolles | 48.88333 | 2.31667 | L'Ébéniste du Vin | 48.886152 | 2.317851 | Wine Bar |
| 4 | Batignolles | 48.88333 | 2.31667 | Bistrot du Passage | 48.882409 | 2.317221 | French Restaurant |


In [11]:
```python
# Number of venues returned for each neighborhood
df_venues.groupby(["Neighborhood"]).count()
```

| Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|
| Batignolles | 100 | 100 | 100 | 100 | 100 | 100 |
| Belleville, Paris | 100 | 100 | 100 | 100 | 100 | 100 |
| Bercy | 100 | 100 | 100 | 100 | 100 | 100 |
| Cité des Fleurs | 100 | 100 | 100 | 100 | 100 | 100 |
| Cour des miracles | 1 | 1 | 1 | 1 | 1 | 1 |
| Faubourg Saint-Antoine | 100 | 100 | 100 | 100 | 100 | 100 |
| Faubourg Saint-Germain | 100 | 100 | 100 | 100 | 100 | 100 |
| Front de Seine | 100 | 100 | 100 | 100 | 100 | 100 |
| Goutte d'Or | 100 | 100 | 100 | 100 | 100 | 100 |
| Grenelle | 100 | 100 | 100 | 100 | 100 | 100 |
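This table deserves a second look: most neighborhoods return exactly 100 venues, the `LIMIT` of the request, so their results are truncated, and a Shopping Plaza missing from them is not proof that none exists nearby. Cour des miracles returns a single venue, which fits its suspicious coordinates. A small sketch that flags both cases; the helper `venue_count_flags` and the `min_expected=10` threshold are assumptions for illustration:

```python
from collections import Counter

LIMIT = 100  # same cap as the explore request above

def venue_count_flags(neighborhoods, limit=LIMIT, min_expected=10):
    """Count venues per neighborhood and flag capped or suspiciously sparse results."""
    counts = Counter(neighborhoods)
    capped = sorted(n for n, c in counts.items() if c >= limit)         # truncated by the API cap
    sparse = sorted(n for n, c in counts.items() if c < min_expected)   # possibly mis-geocoded
    return counts, capped, sparse

# One entry per returned venue, mirroring the counts in the table above
names = ["Batignolles"] * 100 + ["Bercy"] * 100 + ["Cour des miracles"]
counts, capped, sparse = venue_count_flags(names)
print("capped:", capped)
print("sparse:", sparse)
```

In the notebook it could be called as `venue_count_flags(df_venues['Neighborhood'])`, since `Counter` accepts any iterable of names.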


In [12]:
```python
# Number of unique categories
print('There are {} unique categories.'.format(df_venues['VenueCategory'].nunique()))
```

There are 176 unique categories.


**6. Analyze each neighborhood. Here we apply one-hot encoding to the venue categories, so the DataFrame gets one column per category plus the Neighborhoods column: 177 columns in total.**

In [13]:
```python
# one-hot encoding (dtype=int keeps 0/1 values; recent pandas versions default to booleans)
onehot = pd.get_dummies(df_venues[['VenueCategory']], prefix="", prefix_sep="", dtype=int)

# add neighborhood column back to dataframe
onehot['Neighborhoods'] = df_venues['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [onehot.columns[-1]] + list(onehot.columns[:-1])
onehot = onehot[fixed_columns]

onehot.head()
```

Unnamed: 0,Neighborhoods,African Restaurant,Alsatian Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Auvergne Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Basque Restaurant,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Breton Restaurant,Brewery,Bridge,Bubble Tea Shop,Burger Joint,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Canal,Candy Store,Caribbean Restaurant,Cemetery,Champagne Bar,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Comic Shop,Concert Hall,Corsican Restaurant,Cosmetics Shop,Creperie,Cultural Center,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Doner Restaurant,Donut Shop,Electronics Store,Empanada Restaurant,Escape Room,Ethiopian Restaurant,Exhibit,Falafel Restaurant,Farmers Market,Food & Drink Shop,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,Gastropub,Gourmet Shop,Greek Restaurant,Grocery Store,Gym / Fitness Center,Historic Site,History Museum,Hookah Bar,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Island,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Karaoke Bar,Korean BBQ Restaurant,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Library,Liquor Store,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Multiplex,Museum,Music Store,Music Venue,New American Restaurant,Opera House,Outdoor Sculpture,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pizza Place,Planetarium,Playground,Plaza,Pop-Up Shop,Portuguese Restaurant,Provençal Restaurant,Radio Station,Record Shop,Recording Studio,Restaurant,Roof Deck,Russian Restaurant,Salad Place,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Seafood Restaurant,Shopping Plaza,Soccer Field,Southern / Soul Food Restaurant,Spa,Sports Bar,Steakhouse,Street Art,Supermarket,Sushi Restaurant,Taco Place,Tailor Shop,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Thai Restaurant,Theater,Toy / Game Store,Track,Trail,Tram Station,Trattoria/Osteria,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store
0,Batignolles,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Batignolles,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Batignolles,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Batignolles,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
4,Batignolles,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [14]:
```python
# Group rows by neighborhood and sum the one-hot columns, giving the number of venues of each category per neighborhood
grouped = onehot.groupby('Neighborhoods').sum().reset_index()
grouped.head()
```

Unnamed: 0,Neighborhoods,African Restaurant,Alsatian Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Auvergne Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Basque Restaurant,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Breton Restaurant,Brewery,Bridge,Bubble Tea Shop,Burger Joint,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Canal,Candy Store,Caribbean Restaurant,Cemetery,Champagne Bar,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Comic Shop,Concert Hall,Corsican Restaurant,Cosmetics Shop,Creperie,Cultural Center,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Doner Restaurant,Donut Shop,Electronics Store,Empanada Restaurant,Escape Room,Ethiopian Restaurant,Exhibit,Falafel Restaurant,Farmers Market,Food & Drink Shop,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,Gastropub,Gourmet Shop,Greek Restaurant,Grocery Store,Gym / Fitness Center,Historic Site,History Museum,Hookah Bar,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Island,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Karaoke Bar,Korean BBQ Restaurant,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Library,Liquor Store,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Multiplex,Museum,Music Store,Music Venue,New American Restaurant,Opera House,Outdoor Sculpture,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pizza Place,Planetarium,Playground,Plaza,Pop-Up Shop,Portuguese Restaurant,Provençal Restaurant,Radio Station,Record Shop,Recording Studio,Restaurant,Roof Deck,Russian Restaurant,Salad Place,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Seafood Restaurant,Shopping Plaza,Soccer Field,Southern / Soul Food Restaurant,Spa,Sports Bar,Steakhouse,Street Art,Supermarket,Sushi Restaurant,Taco Place,Tailor Shop,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Thai Restaurant,Theater,Toy / Game Store,Track,Trail,Tram Station,Trattoria/Osteria,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store
0,Batignolles,0,0,0,0,2,1,0,0,1,1,3,4,0,0,0,1,0,0,0,2,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,1,0,0,3,0,0,0,19,0,0,0,0,0,0,1,0,1,0,1,0,0,9,1,2,1,1,0,0,4,2,0,0,0,1,0,2,0,1,1,0,2,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,2,3,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,2,0,1,1,0,0,0,0,0,1,0,0,0,4,0,0
1,"Belleville, Paris",0,0,0,0,0,1,2,0,1,0,5,4,0,0,0,1,1,1,0,4,0,2,0,0,1,1,0,1,0,0,0,0,0,1,0,1,1,0,0,1,0,2,1,0,0,1,1,4,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,11,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,1,0,0,0,7,1,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,2,0,0,1,1,0,0,0,3,0,0,0,0,0,0,4,0,0,1,0,0,0,0,0,0,4,0,0,0,2,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,1,0,0,1,1,0,0,0,0,1,1,0,1,2,4,0,0
2,Bercy,0,0,0,0,0,0,1,0,0,1,2,2,0,0,0,1,3,0,0,2,0,1,0,0,0,0,1,1,0,0,1,0,1,1,1,0,0,1,1,0,0,0,3,0,0,0,1,3,0,0,0,0,0,3,0,0,0,0,1,0,0,0,0,0,0,0,0,1,1,1,1,0,11,0,0,2,0,0,0,0,0,0,0,0,0,0,3,0,1,0,0,0,0,4,1,0,0,0,1,0,0,0,0,1,0,0,0,0,2,1,0,0,0,0,2,2,0,1,0,0,0,1,0,2,1,1,0,0,0,0,0,0,1,0,0,0,0,1,2,0,0,0,0,0,0,1,1,1,0,0,0,0,1,0,3,1,0,0,0,0,1,0,5,0,0,0,1,0,0,1,0,0,3,4,0,0
3,Cité des Fleurs,0,0,1,0,1,0,0,0,1,1,3,4,0,0,0,3,0,0,0,3,0,1,0,0,0,0,1,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,0,0,3,0,0,0,16,0,0,0,0,0,0,1,0,1,0,1,0,0,5,0,2,1,1,0,0,6,2,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,1,0,0,0,0,2,0,0,2,0,1,0,0,0,0,5,0,0,0,1,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,6,1,0
4,Cour des miracles,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
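Summing the one-hot rows per neighborhood amounts to counting how often each venue category occurs there. The plain-Python sketch below shows that equivalence on a few hand-picked (neighborhood, category) pairs; `category_counts` is a hypothetical helper for illustration, not a replacement for `pd.get_dummies` and `groupby`:

```python
from collections import Counter, defaultdict

def category_counts(venue_rows):
    """Map each neighborhood to a Counter of venue categories (hypothetical helper).

    Summing one-hot rows per neighborhood gives exactly these counts.
    """
    counts = defaultdict(Counter)
    for neighborhood, category in venue_rows:
        counts[neighborhood][category] += 1
    return counts

# Illustrative (neighborhood, category) pairs in the style of df_venues
rows = [
    ("Batignolles", "Bar"),
    ("Batignolles", "French Restaurant"),
    ("Batignolles", "French Restaurant"),
    ("Bercy", "Shopping Plaza"),
    ("Bercy", "French Restaurant"),
]
counts = category_counts(rows)
categories = sorted({c for _, c in rows})  # one column per unique category, as in get_dummies
for neighborhood in sorted(counts):
    # a Counter returns 0 for categories it has not seen, like the zero cells of the table
    print(neighborhood, [counts[neighborhood][c] for c in categories])
```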


In [15]:
```python
len(grouped[grouped["Shopping Plaza"] > 0])
```

1

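Only one neighborhood (Bercy) has a venue in the Shopping Plaza category among its returned venues, which is very low. Because the request returns at most 100 venues per neighborhood, a count of 0 means that no shopping plaza appeared in those results, not that none exists. We now look for neighborhoods where this count is 0, since the chances of successfully setting up a shopping mall there should be good.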

In [16]:
```python
mall = grouped[["Neighborhoods", "Shopping Plaza"]]
mall.head()
```

| | Neighborhoods | Shopping Plaza |
|---|---|---|
| 0 | Batignolles | 0 |
| 1 | Belleville, Paris | 0 |
| 2 | Bercy | 1 |
| 3 | Cité des Fleurs | 0 |
| 4 | Cour des miracles | 0 |


**7. Cluster neighborhoods**

Now we cluster all the neighbourhoods into 2 clusters. The results will allow us to identify which neighbourhoods have one shopping mall and which have none. Based on the occurrence of shopping malls in different neighbourhoods, this helps answer which neighbourhoods are most suitable for opening a new shopping mall.
We set the number of clusters to 2 and run the K-Means clustering algorithm on the Shopping Plaza counts.

In [17]:
```python
# set number of clusters
chclusters = 2

# cluster on the Shopping Plaza counts only (drop the name column)
clustering = mall.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=chclusters, random_state=0, n_init=10).fit(clustering)

# check cluster labels generated for the first 10 rows in the dataframe
kmeans.labels_[0:10]
```

array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])
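Because the clustering uses a single feature, the Shopping Plaza count, whose values here are only 0 and 1, two-cluster k-means simply separates neighborhoods with a Shopping Plaza from those without. The sketch below makes that visible with a plain one-dimensional implementation of Lloyd's algorithm. `kmeans_1d` is a hypothetical helper, not scikit-learn's `KMeans`: it starts the centroids evenly between the minimum and maximum values instead of using k-means++ initialisation, and cluster numbering is arbitrary in both:

```python
def kmeans_1d(values, k=2, iters=100):
    """Lloyd's algorithm on one feature (illustrative helper, requires k >= 2)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    lo, hi = min(values), max(values)
    # assumed initialisation: k centroids evenly spaced between min and max
    centroids = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # assignment step: each value goes to its nearest centroid
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # update step: each centroid moves to the mean of its members
        new = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new.append(sum(members) / len(members) if members else centroids[j])
        if new == centroids:  # converged
            break
        centroids = new
    return labels, centroids

# Shopping Plaza counts of the first five neighborhoods shown above
counts = [0, 0, 1, 0, 0]
labels, centroids = kmeans_1d(counts)
print(labels)
print(centroids)
```

With this single 0/1 feature, the same split could be obtained directly with `mall["Shopping Plaza"] == 0`.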

In [18]:
```python
# copy the neighborhood names and their Shopping Plaza counts into a new dataframe
merged = mall.copy()

# add clustering labels
merged["Cluster Labels"] = kmeans.labels_
```

Here the Shopping Plaza column represents the number of shopping plazas returned for that neighborhood, and Cluster Labels represents the cluster number (either 0 or 1).

In [19]:
```python
merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
merged.head()
```

| | Neighborhood | Shopping Plaza | Cluster Labels |
|---|---|---|---|
| 0 | Batignolles | 0 | 0 |
| 1 | Belleville, Paris | 0 | 0 |
| 2 | Bercy | 1 | 1 |
| 3 | Cité des Fleurs | 0 | 0 |
| 4 | Cour des miracles | 0 | 0 |


In [20]:
```python
# join the coordinates from df to add latitude/longitude for each neighborhood
merged = merged.join(df.set_index("Neighborhood"), on="Neighborhood")

merged.head() # check the last columns!
```

| | Neighborhood | Shopping Plaza | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Batignolles | 0 | 0 | 48.88333 | 2.31667 |
| 1 | Belleville, Paris | 0 | 0 | 48.87018 | 2.38423 |
| 2 | Bercy | 1 | 1 | 48.83488 | 2.38459 |
| 3 | Cité des Fleurs | 0 | 0 | 48.892615 | 2.320325 |
| 4 | Cour des miracles | 0 | 0 | 46.100403 | 4.323412 |


In [21]:
```python
# sort the results by Cluster Labels
merged.sort_values(["Cluster Labels"], inplace=True)
merged
```

| | Neighborhood | Shopping Plaza | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Batignolles | 0 | 0 | 48.88333 | 2.31667 |
| 25 | The Marais | 0 | 0 | 48.85868 | 2.36098 |
| 24 | Saint-Germain-des-Prés | 0 | 0 | 48.85377 | 2.33331 |
| 23 | Quartier des Grandes-Carrières | 0 | 0 | 48.89088 | 2.33102 |
| 22 | Quartier de La Chapelle | 0 | 0 | 48.884149 | 2.357046 |
| 21 | Quartier Pigalle | 0 | 0 | 48.882026 | 2.337575 |
| 20 | Quarters of Paris | 0 | 0 | 48.85717 | 2.3414 |
| 19 | Petit-Montrouge | 0 | 0 | 48.82642 | 2.3252 |
| 18 | Passy | 0 | 0 | 46.54055 | 4.54073 |
| 17 | Paris Rive Gauche | 0 | 0 | 48.83188 | 2.33953 |


## Visualizing the resulting clusters

In [22]:
```python
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, chclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map; rainbow[cluster-1] maps cluster 0 to the last color (red)
# and cluster 1 to the first color (purple)
for lat, lon, poi, cluster in zip(merged['Latitude'], merged['Longitude'], merged['Neighborhood'], merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
```

**8. Examine clusters**

In [23]:
```python
# Cluster 0
merged.loc[merged['Cluster Labels'] == 0]
```

| | Neighborhood | Shopping Plaza | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Batignolles | 0 | 0 | 48.88333 | 2.31667 |
| 25 | The Marais | 0 | 0 | 48.85868 | 2.36098 |
| 24 | Saint-Germain-des-Prés | 0 | 0 | 48.85377 | 2.33331 |
| 23 | Quartier des Grandes-Carrières | 0 | 0 | 48.89088 | 2.33102 |
| 22 | Quartier de La Chapelle | 0 | 0 | 48.884149 | 2.357046 |
| 21 | Quartier Pigalle | 0 | 0 | 48.882026 | 2.337575 |
| 20 | Quarters of Paris | 0 | 0 | 48.85717 | 2.3414 |
| 19 | Petit-Montrouge | 0 | 0 | 48.82642 | 2.3252 |
| 18 | Passy | 0 | 0 | 46.54055 | 4.54073 |
| 17 | Paris Rive Gauche | 0 | 0 | 48.83188 | 2.33953 |


In [24]:
```python
# Cluster 1
merged.loc[merged['Cluster Labels'] == 1]
```

| | Neighborhood | Shopping Plaza | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 2 | Bercy | 1 | 1 | 48.83488 | 2.38459 |


## Results <a name="results"></a>

Cluster 0 is the larger of the two clusters, with 28 neighbourhoods, none of which had a Shopping Plaza among its returned venues. Cluster 1 contains a single neighbourhood, Bercy, which had exactly one.
We visualize the clustering results on the map, with cluster 0 in red and cluster 1 in purple.
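The introduction also asks for locations as close to the city center as possible, a criterion the clustering itself does not use. A hedged sketch of how cluster 0 could be ranked by great-circle (haversine) distance from the center: the reference point (approximately Notre-Dame de Paris) is an assumption, `haversine_km` is a hypothetical helper, and the rows are copied from the cluster 0 table above:

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres, using a mean Earth radius of 6371 km."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Assumed city-center reference point (roughly Notre-Dame de Paris)
CENTER = (48.8530, 2.3499)

# Cluster 0 rows copied from the table above: (name, latitude, longitude)
cluster0 = [
    ("Batignolles", 48.88333, 2.31667),
    ("The Marais", 48.85868, 2.36098),
    ("Saint-Germain-des-Prés", 48.85377, 2.33331),
    ("Quarters of Paris", 48.85717, 2.3414),
    ("Petit-Montrouge", 48.82642, 2.3252),
]

# sort candidates from nearest to farthest
ranked = sorted(
    ((name, haversine_km(CENTER[0], CENTER[1], lat, lon)) for name, lat, lon in cluster0),
    key=lambda pair: pair[1],
)
for name, km in ranked:
    print(f"{name}: {km:.2f} km")
```

Rows with bad geocodes, such as Passy at (46.54055, 4.54073), would land at the bottom of such a ranking with distances of hundreds of kilometres, so their coordinates should be corrected first.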

## Conclusion <a name="conclusion"></a>

Cluster 0 contains the neighbourhoods with no Shopping Plaza among their returned venues. These areas represent a good opportunity and high potential for opening a new shopping mall, since competition from existing malls appears to be low or absent.

However, setting up a shopping mall also depends on other factors, such as the cost of rent, the surroundings of the mall, and the kind of people living in the locality. In an affluent area, many residents like going out and tend to spend a lot, so their lifestyle differs from that of other areas. Even where competition is low, we need to consider the people living in that locality: if they spend a lot and enjoy going out, the mall is more likely to succeed. If the people living near the mall do not like going out, it is better to consider another place that has both less competition and a good crowd.