# Introduction/Business Problem:

### In this capstone project I set out to find the best location in Madrid to open a French-style bakery, based on the most visited locations.

# Background Discussion:

### Although, in general terms, the bread market in Spain tends to shrink from year to year, it should be noted that there are also many changes in the types of offerings being marketed.
### Specialty and artisan breads are increasingly in demand by a significant contingent of consumers. This has led to the emergence of new specialized bakeries that provide very diversified offerings of high quality and greater added value.
### Bread is sold mainly through specialized shops (51.6%) and supermarkets (34.8%), while hypermarkets (7.4%) and other channels (6.2%) account for a smaller share.
### Although Madrid is not among the regions with the highest bread consumption per capita, it is among those that spend the most on bread compared with other autonomous communities.


# Data methods to use:

### To accomplish this I will look for the best bakeries within a radius of 40 km of Puerta del Sol in the city of Madrid, and then use Folium to find the best place to put a new bakery. We don't want two sites of the same type close to each other. After showing the data in a data frame, I will plot the locations on a Folium map so that it is easy to see the clusters and understand where the most populated areas are. Finally, the best place to open our site will be proposed.
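
The two spatial rules described above (keep only venues within 40 km of Puerta del Sol, and avoid a site that sits close to an existing bakery) can be sketched with a great-circle distance. This is a minimal sketch and not the notebook's actual method: the helper names, the 0.5 km minimum spacing, and the sample venues are assumptions for illustration, and the centre point reuses the coordinates that the geocoder reports for Madrid further down in the notebook as a stand-in for Puerta del Sol.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

CENTER = (40.4167047, -3.7035825)  # geocoder result for Madrid, used as a stand-in for Puerta del Sol
RADIUS_KM = 40.0                   # search radius stated in the text
MIN_SPACING_KM = 0.5               # assumed minimum distance to the nearest existing bakery

def within_radius(venues, center=CENTER, radius_km=RADIUS_KM):
    """Keep only the venues that lie within radius_km of center."""
    return [v for v in venues
            if haversine_km(center[0], center[1], v["lat"], v["lng"]) <= radius_km]

def nearest_bakery_km(site, bakeries):
    """Distance in km from a candidate site (lat, lng) to the closest existing bakery."""
    return min(haversine_km(site[0], site[1], b["lat"], b["lng"]) for b in bakeries)

# Hypothetical sample venues, for illustration only
sample_bakeries = [
    {"name": "Bakery A", "lat": 40.4250, "lng": -3.7000},
    {"name": "Bakery B", "lat": 40.4300, "lng": -3.6900},
    {"name": "Bakery far away", "lat": 39.8628, "lng": -4.0273},
]

nearby = within_radius(sample_bakeries)
candidate = (40.3850, -3.7400)  # hypothetical candidate site south of the centre
spacing = nearest_bakery_km(candidate, nearby)
print([b["name"] for b in nearby])
print(f"nearest bakery: {spacing:.2f} km, far enough: {spacing >= MIN_SPACING_KM}")
```

A fixed spacing threshold is only one possible rule; the right value would depend on the walking catchment the new bakery is expected to serve.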

# Who would be interested in this?:

### This project can be of great help to both individuals and real estate investment consultants who specialize in buying and selling property.

# Data that will be used:

### I will use the Foursquare API to bring in venues located near Madrid.

# Methodology:

### I plan to use the Foursquare API to bring the venue data in and Folium to show the locations of the bakeries, so that I can determine where a good location for a new business might be.

# Project:

### Import all the libraries necessary.

In [1]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# transforming a json file into a pandas dataframe
# (json_normalize is exposed at the top level since pandas 1.0; pandas.io.json.json_normalize is deprecated)
from pandas import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')


In [30]:
url = 'https://raw.githubusercontent.com/sltg-92/test/main/Madrid.csv'
data = pd.read_csv(url, sep=';', header=[0])

print(data.shape)
data.head(20)

(131, 5)


|   | Postal Code | Name | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | 28050 | FUENCARRAL-EL PARDO | VALVERDE | 40.497887 | -3.679938 |
| 1 | 28050 | HORTALEZA | VALDEFUENTES | 40.493046 | -3.635102 |
| 2 | 28027 | CIUDAD LINEAL | SAN JUAN BAUTISTA | 40.450695 | -3.656308 |
| 3 | 28027 | CIUDAD LINEAL | SAN PASCUAL | 40.442683 | -3.653454 |
| 4 | 28027 | HORTALEZA | PIOVERA | 40.455438 | -3.63596 |
| 5 | 28027 | SAN BLAS-CANILLEJAS | EL SALVADOR | 40.444125 | -3.627987 |
| 6 | 28042 | HORTALEZA | PALOMAS | 40.452238 | -3.614116 |
| 7 | 28042 | SAN BLAS-CANILLEJAS | CANILLEJAS | 40.447322 | -3.608456 |
| 8 | 28042 | BARAJAS | ALAMEDA DE OSUNA | 40.457581 | -3.587975 |
| 9 | 28022 | SAN BLAS-CANILLEJAS | REJAS | 40.444082 | -3.569656 |


## We now have a complete dataset of all the districts. Next we will plot the neighborhoods on a map of Madrid, using the folium package

In [4]:
# getting the lat & lon of Madrid using the geocoder
address = 'Madrid, España'

geolocator = Nominatim(user_agent="madrid_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The coordinates of Madrid are {}, {}.'.format(latitude, longitude))

The coordinates of Madrid are 40.4167047, -3.7035825.


In [20]:
map_Madrid = folium.Map(location=[40.4167047, -3.7035825], zoom_start=11)

# adding markers to map
for latitude, longitude, name, neighborhood in zip(data['Latitude'], data['Longitude'], data['Name'], data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='red',
        fill=True
        ).add_to(map_Madrid)  
    
map_Madrid

## Now we will initialize the Foursquare API and 'explore' the venue data for all the neighborhoods

In [21]:
CLIENT_ID = 'DJOPEKTYQH0YT04MQGNYX2R5F12RULW3THGSS0YRWGHN5OPS' 
CLIENT_SECRET = 'A43RAXWFTQ2ZLBRFYD0TSA40KRLMPXFTMODET32RG5KTQXGY'
VERSION = '20180604' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: DJOPEKTYQH0YT04MQGNYX2R5F12RULW3THGSS0YRWGHN5OPS
CLIENT_SECRET:A43RAXWFTQ2ZLBRFYD0TSA40KRLMPXFTMODET32RG5KTQXGY


In [24]:
LIMIT = 100 # maximum number of venues per request
BAKERY_CATEGORY_ID = '4bf58dd8d48988d16a941735' # Foursquare category ID for bakeries
cities = ['Madrid']
results = {}
for city in cities:
    # pass the query as params so that requests builds and URL-encodes the query string
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'near': city,
        'limit': LIMIT,
        'categoryId': BAKERY_CATEGORY_ID,
    }
    results[city] = requests.get('https://api.foursquare.com/v2/venues/explore', params=params).json()

In [25]:
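df_venues = {}
for city in cities:
    # flatten the nested JSON items into a table with one row per venue
    venues = json_normalize(results[city]['response']['groups'][0]['items'])
    # copy the selected columns so that renaming them does not act on a view of `venues`
    df_venues[city] = venues[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']].copy()
    df_venues[city].columns = ['Name', 'Address', 'Lat', 'Lng']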
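
For readers unfamiliar with the nested response, the flattening performed in this cell can be sketched in plain Python. The sample payload below is hypothetical and only imitates the keys that the cell reads (`response`, `groups`, `items`, `venue`, `location`); the helper name `extract_venues` is likewise an assumption. The sketch also shows that a venue without an address yields an empty value rather than an error.

```python
def extract_venues(result):
    """Return one dict per venue with the four fields kept in the notebook's dataframe."""
    items = result["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        location = venue.get("location", {})
        rows.append({
            "Name": venue.get("name"),
            "Address": location.get("address"),  # may be absent for some venues
            "Lat": location.get("lat"),
            "Lng": location.get("lng"),
        })
    return rows

# Hypothetical payload that mimics the structure read in the notebook
sample_result = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Bakery A",
                           "location": {"address": "Calle Ejemplo 1",
                                        "lat": 40.425, "lng": -3.700}}},
                {"venue": {"name": "Bakery B",
                           "location": {"lat": 40.430, "lng": -3.690}}},
            ]
        }]
    }
}

rows = extract_venues(sample_result)
for row in rows:
    print(row)
```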



In [26]:
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)

    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])  
    print(f"Total number of bakeries in {city} = ", results[city]['response']['totalResults'])
    print("Showing Top 100")

Total number of bakeries in Madrid =  157
Showing Top 100


In [27]:
maps[cities[0]]

### As we can see, the largest number of bakeries is concentrated in the north-central part of Madrid.
### We will now compute the mean distance of these bakeries from their mean coordinate.

In [28]:
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)
    venues_mean_coor = [df_venues[city]['Lat'].mean(), df_venues[city]['Lng'].mean()] 
    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])
        folium.PolyLine([venues_mean_coor, [lat, lng]], color="green", weight=1.5, opacity=0.5).add_to(maps[city])
    
    label = folium.Popup("Mean Co-ordinate", parse_html=True)
    folium.CircleMarker(
        venues_mean_coor,
        radius=10,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(maps[city])

    print(city)
    print("Mean Distance from Mean coordinates")
    print(np.mean(np.apply_along_axis(lambda x: np.linalg.norm(x - venues_mean_coor),1,df_venues[city][['Lat','Lng']].values)))

Madrid
Mean Distance from Mean coordinates
0.029770239328859704
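
The value printed above is a Euclidean norm over raw latitude and longitude differences, so it is expressed in degrees rather than kilometres. A rough conversion is sketched below, assuming a spherical Earth of radius 6371 km and a locally flat approximation around Madrid's latitude; the function names and the sample points are illustrative and not part of the notebook. Under those assumptions, 0.0298 degrees corresponds to roughly 2.5 km in the east-west direction or 3.3 km in the north-south direction.

```python
from math import cos, radians, pi, hypot

EARTH_RADIUS_KM = 6371.0  # spherical approximation
KM_PER_DEG_LAT = 2 * pi * EARTH_RADIUS_KM / 360  # about 111.2 km per degree of latitude

def offsets_km(lat, lng, ref_lat, ref_lng):
    """North-south and east-west offsets in km from a reference point (flat approximation)."""
    north = (lat - ref_lat) * KM_PER_DEG_LAT
    # a degree of longitude shrinks with the cosine of the latitude
    east = (lng - ref_lng) * KM_PER_DEG_LAT * cos(radians(ref_lat))
    return north, east

def mean_distance_km(points):
    """Mean distance in km from each (lat, lng) point to the mean coordinate of all points."""
    mean_lat = sum(p[0] for p in points) / len(points)
    mean_lng = sum(p[1] for p in points) / len(points)
    dists = [hypot(*offsets_km(lat, lng, mean_lat, mean_lng)) for lat, lng in points]
    return sum(dists) / len(dists)

# Hypothetical bakery coordinates, for illustration only
sample_points = [(40.425, -3.700), (40.430, -3.690), (40.440, -3.705), (40.415, -3.710)]
print(f"mean distance: {mean_distance_km(sample_points):.2f} km")

# The notebook's value in degrees, read as a purely north-south or purely east-west offset
deg = 0.029770239328859704
print(f"north-south: {deg * KM_PER_DEG_LAT:.2f} km")
print(f"east-west:   {deg * KM_PER_DEG_LAT * cos(radians(40.4167047)):.2f} km")
```

Applying `mean_distance_km` to the `Lat` and `Lng` columns of the venue dataframe would give the same statistic as the cell above, but in kilometres.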


In [29]:
maps[cities[0]]

## The analyses above show that bakeries are densely concentrated in the north-central part of Madrid. It would therefore be prudent to look for investment options in the south, with the aim of offering a service that is less widespread there. Some of the candidates would be Aluche, Carabanchel, Villaverde, Campamento, Usera, Puente de Vallecas, etc.

### As an extension of this project, it would be necessary to study the purchasing and consumption levels in these neighborhoods in order to determine with certainty the best location for our bakery. 

# Project end