### Brief explanation of the data and how it was obtained.
The data come from the free Yelp API, which returns information about businesses
in a given location (address, price range, rating, number of reviews). Here it is used to collect information on restaurants across the districts of Madrid.
The first problem encountered is that the Yelp API returns at most 1000 observations per search, and
when the city of Madrid is used as the location, most of those thousand records are restaurants in the Centro district. To
get more than 1000 observations and cover every district, we create a function that takes as
parameters the coordinates of each district's centroid and a district-specific radius, chosen to maximize
the data retrieved per district. A loop then makes 21 calls to the API with this function, varying those parameters.
Since the districts are not circular, the radius often extends beyond the district boundary, which produces duplicate records.
Note also that the API cannot take a district name or polygon as a parameter so as to return only the data
for that district.
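
The collection function itself is not included in this notebook. The following is only a minimal sketch of how such a per-district search could look. The function names (`yelp_get`, `buscar_restaurantes`), the `categories` filter, and the page size of 50 are assumptions for illustration, not the code that produced `yelp_data.csv`.

```python
import json
import urllib.parse
import urllib.request

YELP_SEARCH_URL = "https://api.yelp.com/v3/businesses/search"


def yelp_get(params, api_key):
    """Send one GET request to the Yelp business search endpoint and parse the JSON reply."""
    url = YELP_SEARCH_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def buscar_restaurantes(lat, lon, radio, api_key, fetch=yelp_get,
                        page_size=50, max_results=1000):
    """Collect the restaurants around one centroid, one page at a time.

    `fetch` is injectable so the paging logic can be tried without network access.
    `page_size` and `max_results` are assumed limits, not values taken from the notebook.
    """
    negocios = []
    for offset in range(0, max_results, page_size):
        params = {
            "latitude": lat,
            "longitude": lon,
            "radius": radio,              # metres around the centroid
            "categories": "restaurants",  # assumed filter
            "limit": page_size,
            "offset": offset,
        }
        pagina = fetch(params, api_key).get("businesses", [])
        negocios.extend(pagina)
        if len(pagina) < page_size:       # last page reached
            break
    return negocios
```

Calling `buscar_restaurantes` once per district, with that district's centroid and radius, and concatenating the results would give a table like the one loaded below, duplicates included.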

From the description above, a clear problem (which will show up later) is that some districts (roughly 3 or 4) will have very few observations, so conclusions about them are probably unreliable. This work could be improved by collecting the data with the Google Places API; this was not done because that API is paid.

In [1]:
# Additional packages required.
# !pip install shapely
# !pip install folium

In [2]:
import pandas as pd
import folium
import numpy as np
from shapely.geometry import Point
from shapely.geometry.polygon import Polygon

In [3]:
datos = pd.read_csv("data/yelp_data.csv")

In [4]:
# Load the polygon data for the 21 districts of the city of Madrid.
import json

with open('data/madrid-districts.geojson.json', 'r', encoding = 'utf-8') as file:
    datos_mapa = json.load(file)

In [5]:
datos.head()

Unnamed: 0.1,Unnamed: 0,id,alias,name,is_closed,url,review_count,categories,rating,transactions,...,phone,display_phone,distance,coordinates.latitude,coordinates.longitude,location.address1,location.city,location.zip_code,location.country,location.state
0,0,rQSFuKAyrkZtRRdOnJglJQ,el-sur-madrid,El Sur,False,https://www.yelp.com/biz/el-sur-madrid?adjust_...,686,"[{'alias': 'tapas', 'title': 'Tapas Bars'}]",4.5,[],...,34915280000.0,+34 915 27 83 40,909.513229,40.411048,-3.699545,"Calle de la Torrecilla del Leal, 12",Madrid,28012.0,ES,M
1,1,BOrDvpojA8U1Z1BBHFY6Fg,takos-al-pastor-madrid,Takos al Pastor,False,https://www.yelp.com/biz/takos-al-pastor-madri...,378,"[{'alias': 'mexican', 'title': 'Mexican'}]",4.5,[],...,34680250000.0,+34 680 24 72 17,58.149932,40.418963,-3.703647,"Calle de la Abada, 2",Madrid,28013.0,ES,M
2,2,l4Y3Qmb510T_hbGzc3WG5g,carmencita-madrid,Carmencita,False,https://www.yelp.com/biz/carmencita-madrid?adj...,148,"[{'alias': 'modern_european', 'title': 'Modern...",4.5,[],...,34915240000.0,+34 915 23 80 73,893.279561,40.426088,-3.707439,"Calle San Vicente Ferrer, 51",Madrid,28015.0,ES,M
3,3,yjGE3mlUOvJfTDZecObTHg,celso-y-manolo-madrid,Celso y Manolo,False,https://www.yelp.com/biz/celso-y-manolo-madrid...,170,"[{'alias': 'tapas', 'title': 'Tapas Bars'}, {'...",4.5,[],...,34915320000.0,+34 915 31 80 79,580.686452,40.420185,-3.697477,"Calle de la Libertad, 1",Madrid,28004.0,ES,M
4,4,uHL7ravKYyrTl07fv_hfUg,rosi-la-loca-madrid,Rosi La Loca,False,https://www.yelp.com/biz/rosi-la-loca-madrid?a...,158,"[{'alias': 'tapas', 'title': 'Tapas Bars'}, {'...",4.5,[],...,34915330000.0,+34 915 32 66 81,310.324098,40.415814,-3.702979,"Calle de Cádiz, 4",Madrid,28012.0,ES,M


In [6]:
datos.drop(["Unnamed: 0"], inplace = True, axis = 1)

In [7]:
len(datos)

15603

In [8]:
# Create a column of [longitude, latitude] pairs from the latitude and longitude columns.
datos["coordinates"] = datos[["coordinates.longitude", "coordinates.latitude"]].values.tolist()

In [9]:
datos.head()

Unnamed: 0,id,alias,name,is_closed,url,review_count,categories,rating,transactions,price,...,display_phone,distance,coordinates.latitude,coordinates.longitude,location.address1,location.city,location.zip_code,location.country,location.state,coordinates
0,rQSFuKAyrkZtRRdOnJglJQ,el-sur-madrid,El Sur,False,https://www.yelp.com/biz/el-sur-madrid?adjust_...,686,"[{'alias': 'tapas', 'title': 'Tapas Bars'}]",4.5,[],€€,...,+34 915 27 83 40,909.513229,40.411048,-3.699545,"Calle de la Torrecilla del Leal, 12",Madrid,28012.0,ES,M,"[-3.6995454, 40.4110475]"
1,BOrDvpojA8U1Z1BBHFY6Fg,takos-al-pastor-madrid,Takos al Pastor,False,https://www.yelp.com/biz/takos-al-pastor-madri...,378,"[{'alias': 'mexican', 'title': 'Mexican'}]",4.5,[],€,...,+34 680 24 72 17,58.149932,40.418963,-3.703647,"Calle de la Abada, 2",Madrid,28013.0,ES,M,"[-3.70364716931158, 40.4189626550889]"
2,l4Y3Qmb510T_hbGzc3WG5g,carmencita-madrid,Carmencita,False,https://www.yelp.com/biz/carmencita-madrid?adj...,148,"[{'alias': 'modern_european', 'title': 'Modern...",4.5,[],€€,...,+34 915 23 80 73,893.279561,40.426088,-3.707439,"Calle San Vicente Ferrer, 51",Madrid,28015.0,ES,M,"[-3.707439, 40.426088]"
3,yjGE3mlUOvJfTDZecObTHg,celso-y-manolo-madrid,Celso y Manolo,False,https://www.yelp.com/biz/celso-y-manolo-madrid...,170,"[{'alias': 'tapas', 'title': 'Tapas Bars'}, {'...",4.5,[],€€,...,+34 915 31 80 79,580.686452,40.420185,-3.697477,"Calle de la Libertad, 1",Madrid,28004.0,ES,M,"[-3.6974769, 40.420185]"
4,uHL7ravKYyrTl07fv_hfUg,rosi-la-loca-madrid,Rosi La Loca,False,https://www.yelp.com/biz/rosi-la-loca-madrid?a...,158,"[{'alias': 'tapas', 'title': 'Tapas Bars'}, {'...",4.5,[],€€,...,+34 915 32 66 81,310.324098,40.415814,-3.702979,"Calle de Cádiz, 4",Madrid,28012.0,ES,M,"[-3.70297881289753, 40.4158141746249]"


In [10]:
# Create an empty column for the district of each observation.
datos['Distrito'] = np.nan

In [11]:
# Using the GeoJSON district polygons, find the district each observation falls in and store it
# in the Distrito column of the dataframe.
# Each polygon is built once from the outer ring of the first polygon of its MultiPolygon.
poligonos = {
    feature["properties"]["name"]: Polygon(feature["geometry"]["coordinates"][0][0])
    for feature in datos_mapa["features"]
}

def buscar_distrito(coordenadas):
    """Return the name of the district that contains the point, or NaN if none does."""
    punto = Point(coordenadas)
    for nombre, poligono in poligonos.items():
        if poligono.contains(punto):
            return nombre
    return np.nan

# Assigning the whole column at once avoids chained indexing (and its SettingWithCopyWarning).
datos["Distrito"] = datos["coordinates"].apply(buscar_distrito)
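
For intuition about what a point-in-polygon test does, here is a pure-Python ray-casting sketch. It is a conceptual illustration only, not shapely's implementation, and the function name `punto_en_poligono` is hypothetical. As in the GeoJSON, `x` is the longitude and `y` the latitude.

```python
def punto_en_poligono(x, y, anillo):
    """Ray casting: count how many polygon edges a horizontal ray from (x, y) crosses.

    `anillo` is a list of (x, y) vertices; an odd number of crossings means "inside".
    Points lying exactly on an edge are not handled specially in this sketch.
    """
    dentro = False
    n = len(anillo)
    for i in range(n):
        x1, y1 = anillo[i]
        x2, y2 = anillo[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cruce = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cruce:
                dentro = not dentro
    return dentro


cuadrado = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(punto_en_poligono(1, 1, cuadrado))  # point inside the square
print(punto_en_poligono(3, 1, cuadrado))  # point outside the square
```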


In [12]:
datos.head()

Unnamed: 0,id,alias,name,is_closed,url,review_count,categories,rating,transactions,price,...,distance,coordinates.latitude,coordinates.longitude,location.address1,location.city,location.zip_code,location.country,location.state,coordinates,Distrito
0,rQSFuKAyrkZtRRdOnJglJQ,el-sur-madrid,El Sur,False,https://www.yelp.com/biz/el-sur-madrid?adjust_...,686,"[{'alias': 'tapas', 'title': 'Tapas Bars'}]",4.5,[],€€,...,909.513229,40.411048,-3.699545,"Calle de la Torrecilla del Leal, 12",Madrid,28012.0,ES,M,"[-3.6995454, 40.4110475]",Centro
1,BOrDvpojA8U1Z1BBHFY6Fg,takos-al-pastor-madrid,Takos al Pastor,False,https://www.yelp.com/biz/takos-al-pastor-madri...,378,"[{'alias': 'mexican', 'title': 'Mexican'}]",4.5,[],€,...,58.149932,40.418963,-3.703647,"Calle de la Abada, 2",Madrid,28013.0,ES,M,"[-3.70364716931158, 40.4189626550889]",Centro
2,l4Y3Qmb510T_hbGzc3WG5g,carmencita-madrid,Carmencita,False,https://www.yelp.com/biz/carmencita-madrid?adj...,148,"[{'alias': 'modern_european', 'title': 'Modern...",4.5,[],€€,...,893.279561,40.426088,-3.707439,"Calle San Vicente Ferrer, 51",Madrid,28015.0,ES,M,"[-3.707439, 40.426088]",Centro
3,yjGE3mlUOvJfTDZecObTHg,celso-y-manolo-madrid,Celso y Manolo,False,https://www.yelp.com/biz/celso-y-manolo-madrid...,170,"[{'alias': 'tapas', 'title': 'Tapas Bars'}, {'...",4.5,[],€€,...,580.686452,40.420185,-3.697477,"Calle de la Libertad, 1",Madrid,28004.0,ES,M,"[-3.6974769, 40.420185]",Centro
4,uHL7ravKYyrTl07fv_hfUg,rosi-la-loca-madrid,Rosi La Loca,False,https://www.yelp.com/biz/rosi-la-loca-madrid?a...,158,"[{'alias': 'tapas', 'title': 'Tapas Bars'}, {'...",4.5,[],€€,...,310.324098,40.415814,-3.702979,"Calle de Cádiz, 4",Madrid,28012.0,ES,M,"[-3.70297881289753, 40.4158141746249]",Centro


In [13]:
# How many of the total records are unique (not duplicates).
len(datos["id"].unique())

5028

In [14]:
# Remove the duplicates.
datos = datos.drop_duplicates(subset = ["id"])
len(datos)

5028
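
Going from 15603 rows to 5028 unique ids is what one would expect when the search circles overlap. The sketch below shows how a single restaurant can fall inside two circles at once. The two search centres and the 2000 m radius are hypothetical values chosen for illustration; only the restaurant coordinates come from the table above.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (latitude, longitude) points."""
    radio_tierra = 6371000.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * radio_tierra * asin(sqrt(a))


# Hypothetical search centres and radius (not the values used to build the dataset).
busquedas = {"centro_a": (40.4153, -3.7074), "centro_b": (40.4003, -3.6960)}
RADIO_M = 2000

# Coordinates of "El Sur", the first row of the dataframe.
restaurante = (40.411048, -3.699545)

# Every search whose circle contains the restaurant would return it.
coincidencias = [
    nombre
    for nombre, (lat, lon) in busquedas.items()
    if haversine_m(lat, lon, *restaurante) <= RADIO_M
]
print(coincidencias)
```

A restaurant listed for more than one search shows up once per search in the raw data, which is why the deduplication on `id` is needed.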

In [15]:
# Records per district. As mentioned in the introduction, some districts have very few records.
datos.Distrito.value_counts()

Centro                 1401
Salamanca               449
Arganzuela              348
Chamartin               334
Chamberi                295
Retiro                  244
Tetuan                  243
Ciudad Lineal           203
Moncloa-Aravaca         167
Fuencarral-El Pardo     166
Carabanchel             148
Latina                  135
Hortaleza               124
San Blas                119
Puente de Vallecas      104
Barajas                  88
Villa de Vallecas        56
Vicalvaro                54
Usera                    48
Moratalaz                39
Villaverde               17
Name: Distrito, dtype: int64
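
One way to make the "very few records" caveat explicit is to flag districts below a minimum sample size. This is a sketch only: the threshold of 50 is an arbitrary assumption, and the counts are copied from the output above (smallest districts only).

```python
import pandas as pd

# Counts copied from the value_counts() output (only the six smallest districts).
conteos = pd.Series({
    "Barajas": 88,
    "Villa de Vallecas": 56,
    "Vicalvaro": 54,
    "Usera": 48,
    "Moratalaz": 39,
    "Villaverde": 17,
})

UMBRAL = 50  # hypothetical minimum number of restaurants for a district to be trusted

# Districts whose averages should be read with caution.
pocos_datos = conteos[conteos < UMBRAL]
print(pocos_datos.index.tolist())
```

With the real data the same filter can be applied directly to `datos.Distrito.value_counts()`.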

In [16]:
# Convert the price category variable into a numeric scale.
datos.loc[datos.price == '€', 'price'] = 1
datos.loc[datos.price == '€€', 'price'] = 2
datos.loc[datos.price == '€€€', 'price'] = 3
datos.loc[datos.price == '€€€€', 'price'] = 4
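
The same recoding can be written as a single dictionary lookup. The sketch below uses a small made-up Series rather than the real column; note that `map` yields floats when missing prices are present, which differs slightly from the in-place assignments above.

```python
import pandas as pd

# Mapping from Yelp price symbols to a 1-4 scale.
precio_a_numero = {"€": 1, "€€": 2, "€€€": 3, "€€€€": 4}

# Toy data for illustration; None stands for a restaurant without price information.
precios = pd.Series(["€€", "€", None, "€€€€"])

# Values missing from the dictionary (including None) become NaN.
precios_numericos = precios.map(precio_a_numero)
print(precios_numericos)
```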

In [17]:
# Build the variables used to attach the Yelp data to each district in the GeoJSON, which we use later
# for the map visualizations.

In [21]:
# Average only the rating column; averaging every column fails on text columns in recent pandas versions.
avg_rating = datos.groupby('Distrito')['rating'].mean().reset_index()
dist_avg_rating = dict(zip(list(avg_rating['Distrito']), list(round(avg_rating['rating'], 1))))

In [22]:
# Average only the review_count column, for the same reason as above.
mean_reviews = datos.groupby('Distrito')['review_count'].mean().reset_index()
dist_avg_review_count = dict(zip(list(mean_reviews['Distrito']), list(round((mean_reviews['review_count']), 0))))

In [23]:
# Count restaurants per district and price category, then keep the largest count in each district.
precios = datos.groupby(['Distrito', 'price']).count().reset_index()[['Distrito', 'price', 'id']]
d = precios.groupby('Distrito')['id'].max()

In [24]:
precios_modales = pd.merge(d, precios, on = ['Distrito', 'id']).rename(columns = {'id' : 'More frequent price'})
dist_prices = dict(zip(list(precios_modales['Distrito']), list(precios_modales['price'])))
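
When two price categories are tied for the largest count in a district, the merge above returns one row per tied category and the dictionary keeps whichever comes last. A more explicit alternative, sketched here on made-up data, is to take the mode per district and state the tie-breaking rule. The helper name `precio_modal` and the choice of the lowest category on ties are assumptions.

```python
import pandas as pd

# Toy data: district A has a clear mode (2); district B has a tie between 1 and 3.
ejemplo = pd.DataFrame({
    "Distrito": ["A", "A", "A", "B", "B", "B", "B"],
    "price": [1, 2, 2, 1, 1, 3, 3],
})


def precio_modal(precios):
    """Most frequent price; Series.mode() is sorted, so ties resolve to the lowest category."""
    return precios.mode().iloc[0]


modales = ejemplo.groupby("Distrito")["price"].agg(precio_modal)
print(modales.to_dict())
```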

In [25]:
# Add the average rating, average number of reviews and most frequent price of each district to the GeoJSON data.
for feature in datos_mapa['features']:
    nombre = feature['properties']['name']
    feature['properties']['yelp_rating'] = dist_avg_rating[nombre]
    feature['properties']['yelp_reviews'] = dist_avg_review_count[nombre]
    feature['properties']['yelp_price'] = dist_prices[nombre]
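
The lookups above raise a `KeyError` if a district name in the GeoJSON has no entry in one of the dictionaries (for example because of a spelling or accent mismatch, or a district with no restaurants). A small check like the following could be run first; the function name `nombres_sin_datos` and the toy inputs are hypothetical.

```python
def nombres_sin_datos(geojson, valores_por_distrito):
    """List the GeoJSON district names that have no entry in the given dictionary."""
    return [
        feature["properties"]["name"]
        for feature in geojson["features"]
        if feature["properties"]["name"] not in valores_por_distrito
    ]


# Toy GeoJSON with two districts and ratings for only one of them.
geojson_ejemplo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Centro"}},
        {"type": "Feature", "properties": {"name": "Retiro"}},
    ],
}
ratings_ejemplo = {"Centro": 4.1}

print(nombres_sin_datos(geojson_ejemplo, ratings_ejemplo))
```

An empty list for each of the three dictionaries means the enrichment loop can run safely.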

In [26]:
datos_mapa

{'type': 'FeatureCollection',
 'features': [{'type': 'Feature',
   'properties': {'name': 'Centro',
    'cartodb_id': 1,
    'created_at': '2013-12-02T07:20:26+0100',
    'updated_at': '2013-12-02T07:20:26+0100',
    'yelp_rating': 4.1,
    'yelp_reviews': 18.0,
    'yelp_price': 2},
   'geometry': {'type': 'MultiPolygon',
    'coordinates': [[[[-3.691853, 40.408527],
       [-3.691893, 40.408377],
       [-3.691919, 40.408167],
       [-3.692368, 40.408309],
       [-3.692541, 40.408438],
       [-3.692594, 40.408467],
       [-3.692657, 40.408475],
       [-3.692796, 40.408454],
       [-3.69398, 40.408134],
       [-3.694102, 40.408179],
       [-3.696566, 40.407529],
       [-3.696634, 40.407509],
       [-3.698068, 40.40711],
       [-3.698772, 40.406915],
       [-3.699676, 40.406683],
       [-3.701244, 40.40625],
       [-3.701663, 40.406245],
       [-3.702577, 40.406379],
       [-3.705198, 40.40686],
       [-3.707094, 40.40725],
       [-3.708229, 40.407459],
       [-3.708

In [27]:
datos.head()

Unnamed: 0,id,alias,name,is_closed,url,review_count,categories,rating,transactions,price,...,distance,coordinates.latitude,coordinates.longitude,location.address1,location.city,location.zip_code,location.country,location.state,coordinates,Distrito
0,rQSFuKAyrkZtRRdOnJglJQ,el-sur-madrid,El Sur,False,https://www.yelp.com/biz/el-sur-madrid?adjust_...,686,"[{'alias': 'tapas', 'title': 'Tapas Bars'}]",4.5,[],2,...,909.513229,40.411048,-3.699545,"Calle de la Torrecilla del Leal, 12",Madrid,28012.0,ES,M,"[-3.6995454, 40.4110475]",Centro
1,BOrDvpojA8U1Z1BBHFY6Fg,takos-al-pastor-madrid,Takos al Pastor,False,https://www.yelp.com/biz/takos-al-pastor-madri...,378,"[{'alias': 'mexican', 'title': 'Mexican'}]",4.5,[],1,...,58.149932,40.418963,-3.703647,"Calle de la Abada, 2",Madrid,28013.0,ES,M,"[-3.70364716931158, 40.4189626550889]",Centro
2,l4Y3Qmb510T_hbGzc3WG5g,carmencita-madrid,Carmencita,False,https://www.yelp.com/biz/carmencita-madrid?adj...,148,"[{'alias': 'modern_european', 'title': 'Modern...",4.5,[],2,...,893.279561,40.426088,-3.707439,"Calle San Vicente Ferrer, 51",Madrid,28015.0,ES,M,"[-3.707439, 40.426088]",Centro
3,yjGE3mlUOvJfTDZecObTHg,celso-y-manolo-madrid,Celso y Manolo,False,https://www.yelp.com/biz/celso-y-manolo-madrid...,170,"[{'alias': 'tapas', 'title': 'Tapas Bars'}, {'...",4.5,[],2,...,580.686452,40.420185,-3.697477,"Calle de la Libertad, 1",Madrid,28004.0,ES,M,"[-3.6974769, 40.420185]",Centro
4,uHL7ravKYyrTl07fv_hfUg,rosi-la-loca-madrid,Rosi La Loca,False,https://www.yelp.com/biz/rosi-la-loca-madrid?a...,158,"[{'alias': 'tapas', 'title': 'Tapas Bars'}, {'...",4.5,[],2,...,310.324098,40.415814,-3.702979,"Calle de Cádiz, 4",Madrid,28012.0,ES,M,"[-3.70297881289753, 40.4158141746249]",Centro


In [32]:
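# Restaurants in Centro in the highest price category (4).
pricer = datos[(datos.Distrito == 'Centro') & (datos.price == 4)]
# folium expects [latitude, longitude], so reverse the stored [longitude, latitude] pairs.
coordenadas_folium = [list(reversed(c)) for c in pricer['coordinates']]
# Show the first pair as a check.
coordenadas_folium[0]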

[40.41776, -3.70027]

In [25]:
# Write a new JSON file with the enriched data.
json_yelp = json.dumps(datos_mapa)
with open('data/json_data_yelp.json', 'w') as outfile:
    outfile.write(json_yelp)

In [26]:
# Write a CSV file with the new dataframe of Yelp data and the corresponding districts.
datos.to_csv("data/yelp_distritos.csv", sep = ",")

In [None]:
# Some maps to use in Streamlit

In [33]:
import branca.colormap as cmp

colores = cmp.LinearColormap(
    ['red','lightblue', 'steelblue', 'blue'],
    vmin = min(list(avg_rating['rating'])), vmax = max(list(avg_rating['rating'])),
    caption = 'Average rating'
)

def estilo_provincias (feature):

    return{ 'radius': 7,
        'fillColor': colores(feature['properties']['yelp_rating']), 
        'color': colores(feature['properties']['yelp_rating']), 
        'weight': 1,
        'opacity' : 1,
        'fillOpacity' : 0.8}

In [34]:
m = folium.Map(location=[40.4309, -3.6878], zoom_start = 10)

folium.GeoJson(datos_mapa, style_function = estilo_provincias,
               tooltip = folium.GeoJsonTooltip(fields = ['name', 'yelp_rating'],
                                                    aliases=['Distrito: ', 'Avg Rating: '],
                                                    style = ("background-color: white; color: #333333; font-family: arial; font-size: 12px; padding: 10px;"),
                                                    sticky = True
                    )).add_to(m)

colores.add_to(m)
m

In [35]:
colores = cmp.LinearColormap(
    ['red','lightblue', 'steelblue', 'blue'],
    vmin = min(list(mean_reviews['review_count'])), vmax = max(list(mean_reviews['review_count'])),
    caption = 'Number of reviews'
)

def estilo_provincias (feature):

    return{ 'radius': 7,
        'fillColor': colores(feature['properties']['yelp_reviews']), 
        'color': colores(feature['properties']['yelp_reviews']), 
        'weight': 1,
        'opacity' : 1,
        'fillOpacity' : 0.8}

In [36]:
m = folium.Map(location=[40.4309, -3.6878], zoom_start = 10)

folium.GeoJson(datos_mapa, style_function = estilo_provincias,
               tooltip = folium.GeoJsonTooltip(fields = ['name', 'yelp_reviews'],
                                                    aliases=['Distrito: ', 'Avg reviews: '],
                                                    style = ("background-color: white; color: #333333; font-family: arial; font-size: 12px; padding: 10px;"),
                                                    sticky = True
                    )).add_to(m)

colores.add_to(m)
m

In [45]:
colores = cmp.LinearColormap(
    ['red','lightblue', 'steelblue', 'blue'],
    vmin = min(list(precios_modales['price'])), vmax = max(list(precios_modales['price'])),
    caption = 'Modal Price Category'
)

def estilo_provincias (feature):

    return{ 'radius': 7,
        'fillColor': colores(feature['properties']['yelp_price']), 
        'color': colores(feature['properties']['yelp_price']), 
        'weight': 1,
        'opacity' : 1,
        'fillOpacity' : 0.8}

m = folium.Map(location=[40.4309, -3.6878], zoom_start = 10)

folium.GeoJson(datos_mapa, style_function = estilo_provincias,
               tooltip = folium.GeoJsonTooltip(fields = ['name', 'yelp_price'],
                                                    aliases=['Distrito: ', 'Price category: '],
                                                    style = ("background-color: white; color: #333333; font-family: arial; font-size: 12px; padding: 10px;"),
                                                    sticky = True
                    )).add_to(m)

colores.add_to(m)
m
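
The three map cells redefine `estilo_provincias` each time, changing only the GeoJSON property that drives the colour. A possible refactor, sketched below, is a small factory that builds the style function for any property. The name `crear_estilo` is hypothetical, and the toy colormap stands in for a `branca` `LinearColormap`, which is called the same way (value in, colour string out).

```python
def crear_estilo(colormap, campo):
    """Return a folium-style style_function that colours each feature by properties[campo]."""
    def estilo(feature):
        color = colormap(feature["properties"][campo])
        return {
            "fillColor": color,
            "color": color,
            "weight": 1,
            "opacity": 1,
            "fillOpacity": 0.8,
        }
    return estilo


# Toy colormap for illustration: red above 4, blue otherwise.
def colormap_ejemplo(valor):
    return "#ff0000" if valor > 4 else "#0000ff"


estilo_rating = crear_estilo(colormap_ejemplo, "yelp_rating")
print(estilo_rating({"properties": {"name": "Centro", "yelp_rating": 4.1}}))
```

With the real objects, `crear_estilo(colores, 'yelp_rating')` could be passed as `style_function` to `folium.GeoJson`, and likewise for `'yelp_reviews'` and `'yelp_price'`.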