### <div align="center">***DATA CLEANING AND ANALYSIS***</div>
***
We concatenate the CSV files containing the data extracted for businesses in Madrid's 21 districts.

#### **IMPORTS**

In [35]:
import pandas as pd
from utils.utils import LoadSetCSV
import folium

##### **CONCATENATING THE CSV FILES INTO A SINGLE DF**

In [36]:
## READ AND CONCATENATE THE CSV FILES

lista_distritos = [
('../data/arganzuela.csv', 'Arganzuela'),              # Arganzuela 
('../data/barajas.csv', 'Barajas'),                    # Barajas  
('../data/carabanchel.csv', 'Carabanchel'),            # Carabanchel  
('../data/centro.csv', 'Centro'),                      # Centro   
('../data/chamartin.csv', 'Chamartin'),                # Chamartin 
('../data/chamberi.csv', 'Chamberi'),                  # Chamberi 
('../data/ciudadlineal.csv', 'Ciudad_Lineal'),         # Ciudad Lineal 
('../data/fuencarral.csv', 'Fuencarral-El_Pardo'),     # Fuencarral - El Pardo 
('../data/hortaleza.csv', 'Hortaleza'),                # Hortaleza  
('../data/latina.csv', 'Latina'),                      # Latina  
('../data/moncloa.csv', 'Moncloa-Aravaca'),            # Moncloa - Aravaca 
('../data/moratalaz.csv', 'Moratalaz'),                # Moratalaz  
('../data/retiro.csv', 'Retiro'),                      # Retiro   
('../data/salamanca.csv', 'Salamanca'),                # Salamanca 
('../data/sanblas.csv', 'San_Blas-Canillejas'),        # San Blas - Canillejas 
('../data/tetuan.csv', 'Tetuan'),                      # Tetuan 
('../data/usera.csv', 'Usera'),                        # Usera  
('../data/puentevallecas.csv', 'Puente_de_Vallecas'),  # Puente de Vallecas  
('../data/vicalvaro.csv', 'Vicalvaro'),                # Vicalvaro  
('../data/villaverde.csv', 'Villaverde'),              # Villaverde 
('../data/villavallecas.csv', 'Villa_de_Vallecas'),    # Villa de Vallecas  
]

df_comercios = LoadSetCSV(lista_distritos)
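
`LoadSetCSV` lives in the project's `utils/utils.py`, which is not shown in this notebook. The sketch below is one plausible way such a helper could work, assuming it reads each file, tags its rows with the district label in an `origin` column, and concatenates the frames without resetting the index (which would match the repeated `0 to 49` index range that `info()` reports). The name `load_set_csv` and its body are hypothetical, not the project's actual implementation.

```python
import io

import pandas as pd


def load_set_csv(sources):
    """Hypothetical stand-in for LoadSetCSV.

    Reads (path_or_buffer, label) pairs, stores the label in an
    'origin' column, and concatenates all frames.
    """
    frames = []
    for source, label in sources:
        frame = pd.read_csv(source)
        frame["origin"] = label
        frames.append(frame)
    # ignore_index keeps its default (False), so each file keeps its own 0..n-1 index
    return pd.concat(frames)


# Tiny in-memory demo; real usage would pass paths such as '../data/centro.csv'
demo_a = io.StringIO("id,name\na1,Bar Uno\na2,Bar Dos\n")
demo_b = io.StringIO("id,name\na2,Bar Dos\nb1,Bar Tres\n")
df_demo = load_set_csv([(demo_a, "Centro"), (demo_b, "Retiro")])
print(df_demo)
```

Because the index restarts for every file, it is not unique after concatenation, so row selection by label (`.loc`) can return several rows.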

In [37]:
df_comercios.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1039 entries, 0 to 49
Data columns (total 27 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Unnamed: 0       1039 non-null   int64  
 1   id               1039 non-null   object 
 2   name             1039 non-null   object 
 3   alias            1039 non-null   object 
 4   categories       1039 non-null   object 
 5   description      1039 non-null   object 
 6   latitude         1039 non-null   float64
 7   longitude        1039 non-null   float64
 8   address          1039 non-null   object 
 9   phone            995 non-null    float64
 10  rating           1039 non-null   float64
 11  review_count     1039 non-null   int64  
 12  review           667 non-null    object 
 13  url              1039 non-null   object 
 14  image_url        996 non-null    object 
 15  is_closed        1039 non-null   bool   
 16  price            822 non-null    object 
 17  hours           

#### **CLEANING AND EDA**

After cleaning, we have a DataFrame with:
- 712 records, one for each business registered in Madrid as an oil recycling point.
- 15 columns: id, name, alias, description, latitude, longitude, address, phone, rating, review_count, review, url, image_url, price, origin.
- There are null values in the 'review', 'image_url', 'phone' and 'price' columns.
- There were 327 duplicate records, which we removed using the unique id.
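
The figures in this summary can be checked with a few pandas calls. The sketch below uses a small made-up frame (the values are illustrative, not taken from the dataset) to show how the duplicate count and the per-column null counts are obtained; on the real data the same calls would be run on `df_comercios`.

```python
import pandas as pd

# Illustrative data only: two ids are repeated and three prices are missing
toy = pd.DataFrame({
    "id": ["a", "b", "a", "c", "b"],
    "price": ["€€", None, "€€", None, None],
})

n_total = len(toy)
n_duplicated = int(toy["id"].duplicated().sum())  # rows whose id already appeared earlier
n_unique = toy["id"].nunique()
nulls_per_column = toy.isna().sum()

print(n_total, n_duplicated, n_unique)
print(nulls_per_column)
```

In the notebook, the same arithmetic gives 1039 rows minus 327 duplicates, leaving 712 unique businesses.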

In [38]:
## DROP COLUMNS WE DO NOT NEED

df_comercios.drop(['Unnamed: 0','hours','transactions','languages','attributes','transit','best_time','delivery','pickup','online_ordering','categories', 'is_closed'], axis=1, inplace= True)

In [39]:
## CHECK FOR DUPLICATES

print(df_comercios['id'].duplicated().any())  # There are duplicated businesses. We filter by id, which is unique.
df_comercios[df_comercios['id'].duplicated()]

True


Unnamed: 0,id,name,alias,description,latitude,longitude,address,phone,rating,review_count,review,url,image_url,price,origin
0,m-suxON2HwsWYImvTNX1Jw,Botín,botín-madrid-3,Spanish Steakhouses,40.414060,-3.708030,"Calle de Cuchilleros, 17, 28005 Madrid, Spain",3.491366e+10,3.0,648,Historical restaurant that we felt honored to ...,https://www.yelp.com/biz/bot%C3%ADn-madrid-3?a...,https://s3-media2.fl.yelpcdn.com/bphoto/UpGNFa...,€€€,Centro
1,S-Md-BF6C53iB2OlB5BCww,Casa Lucas,casa-lucas-madrid,Tapas Bars Wine Bars,40.412325,-3.709429,"Calle de la Cava Baja, 30, 28005 Madrid, Spain",3.491365e+10,5.0,157,Great little tapas wine bar on Calle cava baja...,https://www.yelp.com/biz/casa-lucas-madrid?adj...,https://s3-media4.fl.yelpcdn.com/bphoto/xqe6Mv...,€€,Centro
2,afcRqfWrcA1soyoNI__FRQ,Juana la Loca,juana-la-loca-madrid-2,Tapas/Small Plates Spanish,40.411358,-3.711094,"Plaza de Puerta de Moros, 4, 28005 Madrid, Spain",3.491364e+10,5.0,220,"This is my first meal in Spain, that is easily...",https://www.yelp.com/biz/juana-la-loca-madrid-...,https://s3-media2.fl.yelpcdn.com/bphoto/NgunxE...,€€€,Centro
3,uHL7ravKYyrTl07fv_hfUg,Rosi La Loca,rosi-la-loca-madrid,Tapas Bars Spanish,40.415814,-3.702979,"Calle de Cádiz, 4, 28012 Madrid, Spain",3.491533e+10,4.0,194,Rosie La Loca is the result of someone saying ...,https://www.yelp.com/biz/rosi-la-loca-madrid?a...,https://s3-media3.fl.yelpcdn.com/bphoto/SN6sHZ...,€€€,Centro
4,iNtK9yWv7wOXaPai3uAv-g,InClan Brutal,inclan-brutal-madrid,Tapas Bars Mediterranean,40.415080,-3.701920,"Calle Álvarez Gato, 4, 28012 Madrid, Spain",3.491024e+10,5.0,171,Amazing cocktails and good service!!!!! Also l...,https://www.yelp.com/biz/inclan-brutal-madrid?...,https://s3-media3.fl.yelpcdn.com/bphoto/qnACZF...,€€,Centro
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
27,jgBTxwO0CDJQnxT5IpNcbw,Sienna Vicalvaro,sienna-vicalvaro-madrid,Ice Cream & Frozen Yogurt Cafeteria,40.402510,-3.606250,"Calle Galeote, 1, 28032 Madrid, Spain",3.465689e+10,4.5,4,,https://www.yelp.com/biz/sienna-vicalvaro-madr...,https://s3-media4.fl.yelpcdn.com/bphoto/KpU7n1...,€€€,Villa_de_Vallecas
28,Clsa6wDBgxsgowTGueMLnQ,Kapicúa,kapicúa-madrid-2,Tapas Bars,40.382820,-3.607890,"Calle de Puentelarra, 7, 28031 Madrid, Spain",,5.0,1,,https://www.yelp.com/biz/kapic%C3%BAa-madrid-2...,https://s3-media3.fl.yelpcdn.com/bphoto/FiUidY...,,Villa_de_Vallecas
31,eBgwj5HxASNHuIj7aQntLA,La Birra es Bella,la-birra-es-bella-madrid,Spanish,40.396825,-3.621028,"Bulevar de José Prat, 32, 28032 Madrid, Spain",3.467877e+10,4.0,1,,https://www.yelp.com/biz/la-birra-es-bella-mad...,https://s3-media2.fl.yelpcdn.com/bphoto/Ba4GI4...,,Villa_de_Vallecas
37,lozi5r68Dhp0tGEP2z5ltw,Reina Banana,reina-banana-madrid,Vegan,40.402233,-3.608465,"Calle del Lago Titicaca, 10, 28032 Madrid, Spain",3.491255e+10,5.0,1,,https://www.yelp.com/biz/reina-banana-madrid?a...,https://s3-media3.fl.yelpcdn.com/bphoto/Z2Pa8Z...,€€,Villa_de_Vallecas


In [40]:
## DROP DUPLICATES

df_comercios = df_comercios.drop_duplicates(subset='id', keep='first')  # keep='first' keeps the first occurrence of each id and drops later repeats.
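
One consequence of `keep='first'` is worth keeping in mind: when the same business appears in several district files, the surviving row carries the `origin` of whichever file was loaded first, which is not necessarily the district where the business is located. A minimal illustration with made-up rows, plus an optional `reset_index` step that is not part of the original notebook:

```python
import pandas as pd

# Made-up rows: business x1 shows up under two different districts
rows = pd.DataFrame({
    "id": ["x1", "x2", "x1"],
    "origin": ["Arganzuela", "Arganzuela", "Centro"],
})

deduped = rows.drop_duplicates(subset="id", keep="first")
# Optional: rebuild a unique 0..n-1 index after dropping rows
deduped = deduped.reset_index(drop=True)
print(deduped)
```

Here `x1` keeps the `Arganzuela` label simply because that row came first.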

In [41]:
df_comercios.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 712 entries, 0 to 49
Data columns (total 15 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   id            712 non-null    object 
 1   name          712 non-null    object 
 2   alias         712 non-null    object 
 3   description   712 non-null    object 
 4   latitude      712 non-null    float64
 5   longitude     712 non-null    float64
 6   address       712 non-null    object 
 7   phone         672 non-null    float64
 8   rating        712 non-null    float64
 9   review_count  712 non-null    int64  
 10  review        412 non-null    object 
 11  url           712 non-null    object 
 12  image_url     674 non-null    object 
 13  price         527 non-null    object 
 14  origin        712 non-null    object 
dtypes: float64(4), int64(1), object(10)
memory usage: 89.0+ KB


In [42]:
df_comercios.head(3)

Unnamed: 0,id,name,alias,description,latitude,longitude,address,phone,rating,review_count,review,url,image_url,price,origin
0,iNtK9yWv7wOXaPai3uAv-g,InClan Brutal,inclan-brutal-madrid,Tapas Bars Mediterranean,40.41508,-3.70192,"Calle Álvarez Gato, 4, 28012 Madrid, Spain",34910240000.0,5.0,171,Amazing cocktails and good service!!!!! Also l...,https://www.yelp.com/biz/inclan-brutal-madrid?...,https://s3-media3.fl.yelpcdn.com/bphoto/qnACZF...,€€,Arganzuela
1,afcRqfWrcA1soyoNI__FRQ,Juana la Loca,juana-la-loca-madrid-2,Tapas/Small Plates Spanish,40.411358,-3.711094,"Plaza de Puerta de Moros, 4, 28005 Madrid, Spain",34913640000.0,5.0,220,"This is my first meal in Spain, that is easily...",https://www.yelp.com/biz/juana-la-loca-madrid-...,https://s3-media2.fl.yelpcdn.com/bphoto/NgunxE...,€€€,Arganzuela
2,uHL7ravKYyrTl07fv_hfUg,Rosi La Loca,rosi-la-loca-madrid,Tapas Bars Spanish,40.415814,-3.702979,"Calle de Cádiz, 4, 28012 Madrid, Spain",34915330000.0,4.0,194,Rosie La Loca is the result of someone saying ...,https://www.yelp.com/biz/rosi-la-loca-madrid?a...,https://s3-media3.fl.yelpcdn.com/bphoto/SN6sHZ...,€€€,Arganzuela


In [43]:
map_madrid = folium.Map(location=[40.427919,-3.680877], zoom_start=14)

for (index, row) in df_comercios.iterrows():
    folium.Marker(location = [row.loc["latitude"], row.loc["longitude"]],
    #popup = row.loc["rotulo"] + " " + row["desc_distrito_local"],
    tooltip = "click").add_to(map_madrid)

map_madrid

#### **CONVERSION TO CSV FOR THE DATABASE**

In [44]:
df_comercios.to_csv('../data/df_comerios.csv', index=False)  # The DataFrame index is not written to the CSV file.