# The optimal location for a new shopping center - Report

## Table of contents

* [Introduction](#Introduction)
* [Data](#Data)
* [Methodology](#Methodology)
* [Results and Discussion](#Results-and-Discussion)
* [Conclusion](#Conclusion)


# Introduction

Milano is a relatively small city (with ~1.4 million inhabitants and a city area of ~182 km<sup>2</sup>), but it is full of small shops and large shopping centers. For this reason, it is not easy to find an area to open new commercial activities. 

Our target audience is stakeholders who are interested in establishing a new shopping center in the city. Their goal is clear: they want a location where competition from other shops is not too strong, but where enough people will still come to shop.

In this project, we want to identify the ideal position for a new mall, considering different factors:

* the area for the new shopping center should not have a large number of shops already established
* the area should be lively, and already visited by people for other types of venues
* the area should be as far as possible from the city center, because in the center of the city it is more difficult to find space for a new commercial activity, and construction costs are much higher

We will analyse the possible locations within the city, group the areas according to a set of indicators, and identify the areas that are best suited for the new shopping center.

# Data

Based on the goal of our analysis, we need the following information:

* a measure of the **density and diversity of the shops**, which indicates how "crowded" the area is in terms of commercial activities;
* a measure of the **total number of venues** in the area, regardless of the type of activity; this indicates how lively the area is and how many people are expected to visit it and hopefully do some shopping there;
* the **distance from the city center**, as a rough indication of how feasible it is to establish a new shopping center, in terms of costs and available space.

In practice, we need a *division of the city into areas of similar size*. For this, we use the existing division based on **postal codes** (in Italian: CAP - "Codici di avviamento postale"). Postal codes in Italy are 5-digit numbers, and the codes of Milano range from 20121 to 20162.

To get **geographical coordinates** for each of the postal codes, we use OpenStreetMap, with the **Nominatim API** of the **geopy library**.
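A minimal sketch of this geocoding step is shown here. The query format (`"<code>, Milano, Italy"`), the `user_agent` string, and the helper names `geocode_postal_codes` and `make_nominatim_geocoder` are assumptions made for illustration; the report does not state how the queries were built. The geocoder is passed in as a callable so that the mapping logic can be tried without network access.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

Coordinates = Tuple[float, float]


def geocode_postal_codes(
    codes: Iterable[int],
    geocode: Callable[[str], Optional[Any]],
    city: str = "Milano",
    country: str = "Italy",
) -> Dict[int, Coordinates]:
    """Map each postal code to (latitude, longitude), skipping unresolved codes."""
    coordinates: Dict[int, Coordinates] = {}
    for code in codes:
        # Assumed query format; the geocoder returns None when nothing is found.
        location = geocode(f"{code}, {city}, {country}")
        if location is not None:
            coordinates[code] = (location.latitude, location.longitude)
    return coordinates


def make_nominatim_geocoder() -> Callable[[str], Optional[Any]]:
    """Build a rate-limited Nominatim geocoder (needs geopy and network access)."""
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    # The user_agent value is a hypothetical application name.
    geolocator = Nominatim(user_agent="milano-shopping-center-report")
    # Space requests at least one second apart to respect the public service.
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)
```

With geopy installed, a call such as `geocode_postal_codes(range(20121, 20163), make_nominatim_geocoder())` would cover the code range given above; codes that Nominatim cannot resolve are simply left out of the result.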

Using the suburb coordinates identified as described above, we then use the **Foursquare API** to get the number of shops, their type, and the number of venues in every neighborhood.

# Methodology

In the previous section we have described how to collect the required data.
Once this data is available, we need to process it and extract a numerical score for each of the criteria outlined above, which form the basis of our analysis.

To compute these indicators, we consider a circular area with a radius of 1 km around the center of each neighborhood. For this area, we retrieve venue information from Foursquare and compute:
* the density of commercial activities, by counting the number of venues that belong to the category "Shop & Service" (``4d4b7105d754a06378d81259``);
* the diversity of commercial activities, by counting the number of sub-categories of the category "Shop & Service" that are present in the area;
* the density of venues, by counting the total number of venues.
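The three counts above can be sketched as follows. The `Venue` record is a hypothetical, simplified stand-in for what is extracted from a Foursquare response (the real response is nested JSON and is not reproduced here); only the category ID is taken from the text.

```python
from typing import Dict, Iterable, NamedTuple

# Category ID of "Shop & Service", as quoted in the text.
SHOP_AND_SERVICE = "4d4b7105d754a06378d81259"


class Venue(NamedTuple):
    """Hypothetical flattened venue record for one area."""
    name: str
    top_category_id: str  # ID of the top-level category the venue belongs to
    sub_category: str     # name of the venue's own (sub-)category


def area_indicators(venues: Iterable[Venue]) -> Dict[str, int]:
    """Compute venue density, shop density and shop diversity for one area."""
    venues = list(venues)
    shops = [v for v in venues if v.top_category_id == SHOP_AND_SERVICE]
    return {
        "n_venues": len(venues),                               # all venues
        "n_shops": len(shops),                                 # shops only
        "n_shop_types": len({v.sub_category for v in shops}),  # distinct shop types
    }
```

Because the area is always a circle of the same radius, raw counts can be compared across areas as densities without dividing by the surface.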

The distance of each area from the city center can be computed directly from the latitude and longitude information.
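One common way to obtain this distance is the haversine (great-circle) formula, sketched here. The report does not say which formula was used, so this choice is an assumption, and so are the approximate coordinates of Piazza del Duomo and the mean Earth radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius (assumed value)
DUOMO = (45.4642, 9.1900)  # approximate coordinates of Piazza del Duomo (assumed)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_from_center_km(lat: float, lon: float) -> float:
    """Distance of an area center from the assumed city-center coordinates."""
    return haversine_km(lat, lon, *DUOMO)
```

At the scale of a single city the curvature correction is tiny, so a flat-plane approximation would give nearly the same ranking of areas.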

The first step of the analysis is to cluster the areas based on these four indicators, in order to identify the most promising cluster of neighborhoods: low density and diversity of commercial activities, a high total number of venues, and a large distance from the center.
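The clustering of the four indicators can be illustrated with a small, standard-library version of k-means (Lloyd's algorithm). This is a sketch only: a library implementation would normally be used, and standardising each indicator before clustering is an assumption made here because the four numbers live on very different scales. The function names and the `seed` parameter are hypothetical.

```python
import random
import statistics
from typing import List, Sequence


def standardise(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale every column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    # A constant column has zero deviation; divide by 1 instead to avoid errors.
    stdevs = [statistics.pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stdevs)] for row in rows]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: Sequence[Sequence[float]], k: int,
            seed: int = 0, max_iter: int = 100) -> List[int]:
    """Return one cluster label per point, using Lloyd's algorithm."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [statistics.fmean(c) for c in zip(*members)]
    return labels
```

Each input row would hold the four indicators of one area, for example `[n_venues, n_shops, n_shop_types, distance_km]`, and `k_means(standardise(rows), k)` returns one cluster label per area. The label numbers themselves are arbitrary and depend on the random initialisation.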

Then, as a second step, we perform a finer analysis of the types of commercial activities present in each area. We take the full list of venues of the category "Shop & Service", grouped by the sub-category they belong to.
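One way to prepare this grouped list for clustering is to turn it into a table with one row per area and one column per shop sub-category, where each cell counts the shops of that type. The sketch below assumes the shops are available as `(area, sub_category)` pairs, which is a hypothetical input format; the function name is also illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def shop_count_matrix(
    shops: Iterable[Tuple[str, str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build an area-by-sub-category count table from (area, sub_category) pairs."""
    counts: Dict[str, Counter] = {}
    for area, sub_category in shops:
        counts.setdefault(area, Counter())[sub_category] += 1
    areas = sorted(counts)
    categories = sorted({c for counter in counts.values() for c in counter})
    # A Counter returns 0 for missing keys, so absent shop types become zeros.
    matrix = [[counts[area][c] for c in categories] for area in areas]
    return areas, categories, matrix
```

Each row of the resulting matrix is the feature vector of one area, so areas with a similar commercial offer end up close to each other when clustered.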

The analysis is done with k-means clustering, and helps identify the areas that are worth further exploration when looking for the best location for the shopping center.

# Results and Discussion

Using the list of Milano postal codes and associating coordinates with each postal code (via the Nominatim API of the geopy library), we have identified the following areas in the city of Milano:

![Screenshot_20200401_113321.png](attachment:Screenshot_20200401_113321.png)

### First step: clustering of all the areas of Milano

To each area we have assigned the following four numbers:
1. the total number of venues
2. the total number of shops
3. the number of types of shops
4. the distance from the city center ("Piazza del Duomo")

Based on these four numbers, we have clustered the areas into 5 groups. The following map shows the areas color-coded by cluster:

![Screenshot_20200401_130508.png](attachment:Screenshot_20200401_130508.png)

The analysis shows that the areas of Milano can be classified in this way:

1. central areas, with a **large number of venues** and a **large number of shops** of diverse types (cluster 2, color cyan)
2. non-central areas, with a **large number of venues** and a **relatively high number of shops**, still of diverse types (cluster 4, color orange)
3. non-central areas, with a **moderate number of shops** and a **large number of venues** (cluster 0, color red)
4. non-central areas, with a **moderate number of shops** and a **moderate number of venues** (cluster 3, color green)
5. non-central areas, with a **small number of shops and venues** (cluster 1, color violet)

Evidently, **the interesting areas for opening a new shopping center are those of the 3rd and 4th groups**, with a particular preference for the 3rd group. Areas of the 1st and 2nd groups are already too crowded in terms of shops, while areas of the 5th group are too peripheral, as shown by the small number of venues.
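A reading of the clusters like the one above can be supported by computing the average of each indicator within every cluster. The following sketch shows one way to do this; the function name and the input layout (one row of indicators per area, plus a parallel list of cluster labels) are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, List, Sequence


def cluster_profiles(rows: Sequence[Sequence[float]],
                     labels: Sequence[int]) -> Dict[int, List[float]]:
    """Return the per-cluster mean of every indicator column."""
    grouped: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: [fmean(column) for column in zip(*members)]
        for label, members in grouped.items()
    }
```

Comparing these averages side by side makes it easier to attach descriptions such as "moderate number of shops, large number of venues" to a cluster number.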

### Second step: analysis of the shops in the interesting areas

We have then compiled the list of all the shops from the areas of the 3rd and 4th groups, in an attempt to gain a better understanding of the commercial offer in these areas.

Clustering the areas according to the number of shops in each category, we have identified 4 clusters, depicted in the following map.

![Screenshot_20200401_130529.png](attachment:Screenshot_20200401_130529.png)

* Clusters 1 and 2 (violet and cyan dots on the map) have only one area each. These two areas have a large number of shops, with a large number of supermarkets (7 and 6 respectively) and a large variety of shops (18 and 19 different types of shops, respectively).

* Cluster 3 (yellow dots on the map) comprises three areas. These areas have a relatively large number of supermarkets (4 or 5), plus many other shops of different types (there are 16, 19 and 22 different types of shops in these three areas).

* Cluster 0 contains the rest of the areas, for a total of 8 areas. These areas are characterized by a lower number of supermarkets in comparison with the other clusters (the number of supermarkets ranges from 2 to 5). Apart from these, there is in general less variety of shops.



# Conclusion

In conclusion, we have identified a number of areas in Milano that meet three criteria:
* have a relatively large number of venues,
* have a moderate number of shops, 
* are far from the city center.

We have further restricted our search by analyzing the types of shops present in these areas. As a result, we have identified 8 areas that have the fewest supermarkets and the least variety of shops.

These areas are:
* Segnano (Postal Code 20126)
* Gorla (Postal Code 20127)
* Precotto (Postal Code 20128)
* Cimiano (Postal Code 20132)
* Lambrate (Postal Code 20134)
* Arzaga (Postal Code 20147)
* Bullona (Postal Code 20155)
* Pratocentenaro (Postal Code 20162)

Our final indication for the stakeholders is that these areas appear particularly promising for the location of a new shopping center, and are therefore worth some additional investigation.