# SOLVING A BUSINESS OPTIMAL LOCATION PROBLEM
### 1. Problem Description
The main task is to find the best possible location for a Colombian restaurant in the city of Madrid, Spain. To accomplish this, an analytical approach will be used that combines Machine Learning (mainly clustering) with data analysis and data visualization techniques.

During the analysis, several data transformations will be performed to put the data into a suitable format for the Machine Learning model. Once the data is prepared, a modeling step will be carried out, and the resulting analysis will point to the most promising places to locate the Colombian restaurant.
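
The section does not name a specific clustering algorithm. As a hedged illustration, the sketch below assumes k-means, a common choice for grouping neighborhoods by their venue profiles, and implements Lloyd's algorithm in plain Python so that the assignment and update steps are visible. In practice, scikit-learn's `KMeans` would normally be used instead. The function name `kmeans` and the feature vectors (share of restaurants, share of bars) are hypothetical toy values, not project data.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Minimal k-means (Lloyd's algorithm). Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise the centroids with k input points sampled without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centroids


# Hypothetical (share of restaurants, share of bars) for six neighborhoods.
toy = [(0.90, 0.10), (0.85, 0.15), (0.80, 0.20), (0.10, 0.90), (0.15, 0.85), (0.20, 0.80)]
labels, centroids = kmeans(toy, k=2)
print(labels, centroids)
```

With real data, each point would be one neighborhood's feature vector (for example, venue-category frequencies), and the resulting clusters would group neighborhoods with similar profiles.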

### 2. Data Presentation

The data used in this project comes from two sources:

1. The Foursquare API: This data will be accessed via Python and used to obtain the most common venues per neighborhood in the city of Madrid. This gives a sense of how the city's venues are distributed and which places are most common for leisure and, to a large extent, an idea of the inhabitants' preferences.

2. The Madrid City Hall's Web Portal: This site provides several data sources that are very useful for this problem. The files are provided in Excel format and are the product of statistical processing. The data contains up-to-date information about the immigrant population by nationality. It will be analyzed to determine the best location for a new venue, restaurant, or other business based on residents' nationalities. For the sake of simplicity, this exercise assumes that people's preferences vary according to their nationality, and that people from a specific country will be more attracted to places that match the environment and culture of their own country than to places associated with other countries.
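
The Foursquare step can be sketched as below. This is a minimal sketch assuming the legacy v2 `venues/explore` endpoint that is commonly used for this kind of neighborhood analysis; the v2 endpoints may have been superseded by Foursquare's newer Places API, so the URL, parameter names, and credentials are assumptions to verify against the current documentation. The helpers `build_explore_url` and `most_common_venues` are hypothetical, the coordinates are approximately central Madrid, the version date is a placeholder, and the venue pairs are made-up examples. No network request is sent.

```python
from collections import Counter
from urllib.parse import urlencode

# ASSUMPTION: endpoint and parameter names follow Foursquare's legacy v2
# "venues/explore" API; verify them against the current Foursquare docs.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng,
                      radius=500, limit=50, version="20180605"):
    """Build (but do not send) a request URL for venues around (lat, lng).

    `version` is a placeholder API version date, not a recommendation.
    """
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,  # search radius in meters
        "limit": limit,    # maximum number of venues to return
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def most_common_venues(venues, top_n=5):
    """Return the top_n venue categories for each neighborhood.

    `venues` is an iterable of (neighborhood, category) pairs, e.g. flattened
    from the API responses.
    """
    counts = {}
    for neighborhood, category in venues:
        counts.setdefault(neighborhood, Counter())[category] += 1
    return {n: [cat for cat, _ in c.most_common(top_n)] for n, c in counts.items()}


# Hypothetical (neighborhood, category) pairs, for illustration only.
sample = [
    ("Centro", "Tapas Restaurant"), ("Centro", "Bar"), ("Centro", "Tapas Restaurant"),
    ("Usera", "Chinese Restaurant"), ("Usera", "Chinese Restaurant"), ("Usera", "Bakery"),
]
print(build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", 40.4168, -3.7038))
print(most_common_venues(sample, top_n=1))
# {'Centro': ['Tapas Restaurant'], 'Usera': ['Chinese Restaurant']}
```

Counting categories per neighborhood keeps the "most common venues" step independent of the API response format: whatever the endpoint returns only needs to be flattened into `(neighborhood, category)` pairs.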


## Let's see what the data looks like

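```python
import pandas as pd

# Read the nationality-by-district sheet, skipping the first 12 rows of the sheet
# that sit above the table. pandas reads legacy .xls files through the xlrd package.
data = pd.read_excel('DEMOGRAFIAMADRID.xls', sheet_name='DEMOGRAFIAYPOBLACION', skiprows=12)
```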

```python
# Drop the empty leading column, then name the remaining columns:
# country of origin, the city-wide total, and Madrid's 21 districts.
data.drop(columns=['Unnamed: 0'], inplace=True)
data.columns = ['Country of Origin', 'Total Ciudad de Madrid', 'Centro', 'Arganzuela', 'Retiro', 'Salamanca', 'Chamartin',
                'Tetuán', 'Chamberí', 'Fuencarral-El Pardo', 'Moncloa-Aravaca', 'Latina', 'Carabanchel',
                'Usera', 'Puente de Vallecas', 'Moratalaz', 'Ciudad Lineal', 'Hortaleza', 'Villaverde',
                'Villa de Vallecas', 'Vicálvaro', 'San Blas-Canillejas', 'Barajas']
# index=False avoids writing the 0..n-1 row index as an extra unnamed column.
data.to_csv('Madrid Districts.csv', index=False)
data.head(35)
```

| | Country of Origin | Total Ciudad de Madrid | Centro | Arganzuela | Retiro | Salamanca | Chamartin | Tetuán | Chamberí | Fuencarral-El Pardo | ... | Usera | Puente de Vallecas | Moratalaz | Ciudad Lineal | Hortaleza | Villaverde | Villa de Vallecas | Vicálvaro | San Blas-Canillejas | Barajas |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Rumanía | 45036.0 | 815.0 | 754.0 | 480.0 | 753.0 | 680.0 | 1468.0 | 597.0 | 1830.0 | ... | 2241.0 | 4784.0 | 1286.0 | 2888.0 | 1466.0 | 3646.0 | 3384.0 | 2606.0 | 2929.0 | 661.0 |
| 1 | China | 37276.0 | 1508.0 | 1356.0 | 564.0 | 755.0 | 652.0 | 1988.0 | 816.0 | 1733.0 | ... | 9207.0 | 3602.0 | 564.0 | 1960.0 | 1104.0 | 1236.0 | 685.0 | 472.0 | 972.0 | 190.0 |
| 2 | Ecuador | 23953.0 | 647.0 | 741.0 | 265.0 | 619.0 | 380.0 | 1395.0 | 453.0 | 632.0 | ... | 1806.0 | 3290.0 | 491.0 | 2471.0 | 401.0 | 2017.0 | 498.0 | 439.0 | 1015.0 | 138.0 |
| 3 | Venezuela | 23359.0 | 1563.0 | 913.0 | 638.0 | 1564.0 | 933.0 | 1310.0 | 794.0 | 1428.0 | ... | 875.0 | 1829.0 | 480.0 | 1858.0 | 1434.0 | 909.0 | 762.0 | 321.0 | 1486.0 | 314.0 |
| 4 | Colombia | 22618.0 | 998.0 | 717.0 | 483.0 | 803.0 | 551.0 | 822.0 | 659.0 | 999.0 | ... | 1752.0 | 1733.0 | 482.0 | 1792.0 | 910.0 | 1618.0 | 740.0 | 384.0 | 1282.0 | 258.0 |
| 5 | Marruecos | 21909.0 | 1101.0 | 390.0 | 184.0 | 322.0 | 280.0 | 1393.0 | 320.0 | 930.0 | ... | 942.0 | 3437.0 | 258.0 | 1011.0 | 426.0 | 3372.0 | 1655.0 | 802.0 | 649.0 | 333.0 |
| 6 | Italia | 20308.0 | 3030.0 | 1219.0 | 840.0 | 1817.0 | 1060.0 | 1194.0 | 1640.0 | 1195.0 | ... | 412.0 | 704.0 | 310.0 | 1258.0 | 1109.0 | 330.0 | 427.0 | 189.0 | 786.0 | 337.0 |
| 7 | Perú | 18829.0 | 563.0 | 521.0 | 253.0 | 612.0 | 419.0 | 965.0 | 567.0 | 805.0 | ... | 1131.0 | 2079.0 | 668.0 | 1726.0 | 603.0 | 1280.0 | 564.0 | 338.0 | 810.0 | 106.0 |
| 8 | Paraguay | 18682.0 | 364.0 | 474.0 | 237.0 | 521.0 | 657.0 | 3311.0 | 584.0 | 1024.0 | ... | 727.0 | 1354.0 | 360.0 | 1619.0 | 583.0 | 870.0 | 217.0 | 199.0 | 581.0 | 151.0 |
| 9 | República Dominicana | 17511.0 | 365.0 | 654.0 | 204.0 | 344.0 | 322.0 | 2272.0 | 443.0 | 589.0 | ... | 1202.0 | 1989.0 | 223.0 | 1581.0 | 359.0 | 1881.0 | 296.0 | 151.0 | 889.0 | 103.0 |


The dataset shown above will be used, in combination with the Foursquare API data, to find the optimal business location. Together, these two sources should be enough to carry out a sound analytical approach to this problem.
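
Under the assumption stated in the data description (people prefer places tied to their own culture), one simple first heuristic is to rank districts by the number of Colombian-born residents. The sketch below copies the Colombia row from the output above. Moncloa-Aravaca, Latina, and Carabanchel are hidden behind `...` in that output, so this ranking is incomplete. The helper `rank_districts` is hypothetical, and raw counts ignore district size; normalizing would require district population totals that are not shown here.

```python
# Colombian-born residents per district, copied from the "Colombia" row above.
# Moncloa-Aravaca, Latina and Carabanchel are hidden behind "..." in that output,
# so they are missing here and the ranking below is incomplete.
colombia_row = {
    "Centro": 998, "Arganzuela": 717, "Retiro": 483, "Salamanca": 803,
    "Chamartin": 551, "Tetuán": 822, "Chamberí": 659, "Fuencarral-El Pardo": 999,
    "Usera": 1752, "Puente de Vallecas": 1733, "Moratalaz": 482,
    "Ciudad Lineal": 1792, "Hortaleza": 910, "Villaverde": 1618,
    "Villa de Vallecas": 740, "Vicálvaro": 384, "San Blas-Canillejas": 1282,
    "Barajas": 258,
}
CITY_TOTAL = 22618  # "Total Ciudad de Madrid" for Colombia


def rank_districts(row, total, top_n=5):
    """Return (district, residents, share of the city-wide total), most residents first."""
    ranked = sorted(row.items(), key=lambda kv: kv[1], reverse=True)
    return [(district, n, n / total) for district, n in ranked[:top_n]]


for district, n, share in rank_districts(colombia_row, CITY_TOTAL, top_n=4):
    print(f"{district:20s} {n:5d} {share:.1%}")
```

On the full DataFrame, which includes the hidden districts, the same ranking can be obtained with `data.set_index('Country of Origin').loc['Colombia'].drop('Total Ciudad de Madrid').sort_values(ascending=False)`, assuming the country label is stored exactly as `Colombia`.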

### 3. Target Audience

The target audience of this project is any business owner who is planning to open a new business. Since the approach is applicable not only to Colombian restaurants but also to other kinds of businesses, it may be useful to anybody considering opening a new local business or relocating an existing one.