### <div align="center">***DATA CLEANING AND ANALYSIS***</div>
***
We concatenate the CSV files containing the data extracted for the businesses in Madrid's 21 districts.

#### **IMPORTS**

In [31]:
import pandas as pd
from utils import LoadSetCSV

##### **CONCATENATING THE CSV FILES INTO A SINGLE DATAFRAME**

In [32]:
## READ AND CONCATENATE THE CSV FILES

lista_distritos = [
('../data/arganzuela.csv', 'Arganzuela'),               # Arganzuela 
('../data/barajas.csv', 'Barajas'),                    # Barajas  
('../data/carabanchel.csv', 'Carabanchel'),            # Carabanchel  
('../data/centro.csv', 'Centro'),                      # Centro   
('../data/chamartin.csv', 'Chamartin'),                # Chamartin 
('../data/chamberi.csv', 'Chamberi'),                  # Chamberi 
('../data/ciudadlineal.csv', 'Ciudad_Lineal'),         # Ciudad Lineal 
('../data/fuencarral.csv', 'Fuencarral-El_Pardo'),     # Fuencarral - El Pardo 
('../data/hortaleza.csv', 'Hortaleza'),                # Hortaleza  
('../data/latina.csv', 'Latina'),                      # Latina  
('../data/moncloa.csv', 'Moncloa-Aravaca'),            # Moncloa - Aravaca 
('../data/moratalaz.csv', 'Moratalaz'),                # Moratalaz  
('../data/retiro.csv', 'Retiro'),                      # Retiro   
('../data/salamanca.csv', 'Salamanca'),                # Salamanca 
('../data/sanblas.csv', 'San_Blas-Canillejas'),        # San Blas - Canillejas 
('../data/tetuan.csv', 'Tetuan'),                      # Tetuan 
('../data/usera.csv', 'Usera'),                        # Usera  
('../data/puentevallecas.csv', 'Puente_de_Vallecas'),  # Puente de Vallecas  
('../data/vicalvaro.csv', 'Vicalvaro'),                # Vicalvaro  
('../data/villaverde.csv', 'Villaverde'),              # Villaverde 
('../data/villavallecas.csv', 'Villa_de_Vallecas'),    # Villa de Vallecas
]

df_comercios = LoadSetCSV(lista_distritos)
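
`LoadSetCSV` lives in the project's `utils` module and its source is not shown in this notebook. The following is only a minimal sketch of what such a helper might do, assuming it reads each `(path, district)` pair, tags the rows with the district name, and stacks the results. The function name `load_set_csv_sketch` is hypothetical; the `Distrito` column name is taken from the `info()` output later in the notebook.

```python
import pandas as pd


def load_set_csv_sketch(files):
    """Hypothetical stand-in for utils.LoadSetCSV (the real code is not shown).

    files: list of (csv_path, district_name) tuples.
    """
    frames = []
    for path, district in files:
        df = pd.read_csv(path)
        # Tag every row with the district its CSV belongs to
        df['Distrito'] = district
        frames.append(df)
    # A plain concat keeps each file's own row labels, so index values repeat
    return pd.concat(frames)
```

A plain `pd.concat` without `ignore_index=True` would be consistent with the repeated `0 to 49` index range reported by `info()`, although the actual helper may work differently.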

In [33]:
df_comercios.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1009 entries, 0 to 49
Data columns (total 26 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Unnamed: 0       1009 non-null   int64  
 1   id               1009 non-null   object 
 2   name             1009 non-null   object 
 3   rating           1009 non-null   float64
 4   address          1009 non-null   object 
 5   description      1007 non-null   object 
 6   review           473 non-null    object 
 7   alias            1009 non-null   object 
 8   image_url        926 non-null    object 
 9   is_closed        1009 non-null   bool   
 10  review_count     1009 non-null   int64  
 11  latitude         1009 non-null   float64
 12  longitude        1009 non-null   float64
 13  location         1009 non-null   object 
 14  display_phone    910 non-null    object 
 15  price            651 non-null    object 
 16  hours            0 non-null      float64
 17  transactions    

#### **CLEANING AND EDA**

After cleaning, we have a DataFrame with:
- 920 records, equal to the number of businesses registered in Madrid as oil recycling points.
- 16 columns.
- Null values in the 'description', 'review', 'image_url', 'display_phone' and 'price' columns.
- 89 duplicated records were removed using the unique id (1009 rows before, 920 after).
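
The figures in this summary can be checked with a few pandas calls. The sketch below wraps them in a hypothetical helper, `summarize`, and runs it on invented toy data rather than on `df_comercios`, so it illustrates the checks without reproducing the notebook's numbers.

```python
import pandas as pd


def summarize(df, key='id'):
    """Collect row/column counts, duplicated keys and columns containing nulls."""
    nulls = df.isna().sum()
    return {
        'rows': len(df),
        'columns': df.shape[1],
        'duplicated_keys': int(df[key].duplicated().sum()),
        'columns_with_nulls': nulls[nulls > 0].index.tolist(),
    }


# Toy data standing in for df_comercios (values are invented for illustration)
toy = pd.DataFrame({
    'id': ['a', 'b', 'a', 'c'],
    'price': ['€', None, '€', '€€'],
    'rating': [4.0, 3.5, 4.0, 5.0],
})
summary = summarize(toy)
print(summary)
```

Calling `summarize(df_comercios)` before and after the deduplication step would show the row count and the number of duplicated ids at each stage.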

In [34]:
## DROP UNNEEDED COLUMNS

df_comercios.drop(['Unnamed: 0','hours','transactions','languages','attributes','transit','best_time','delivery','pickup','online_ordering'], axis=1, inplace=True)

In [36]:
## CHECK FOR DUPLICATES

print(df_comercios['id'].duplicated().any()) # There are duplicated businesses. We filter on id, which is a unique identifier.
df_comercios[df_comercios['id'].duplicated()]

True

In [38]:
## DROP DUPLICATES

df_comercios = df_comercios.drop_duplicates(subset='id', keep='first') # keep='first' keeps the first occurrence of each id and drops any later repeats.
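
One consequence of `keep='first'` is worth illustrating: when the same id appears under two districts, the surviving row carries the district that came first in the concatenated data. The toy example below uses invented rows and assumes (this is not confirmed by the notebook) that duplicates arise from a business appearing in more than one district file.

```python
import pandas as pd

# Invented rows: business 'a' appears under two different districts
toy = pd.DataFrame({
    'id': ['a', 'b', 'a'],
    'Distrito': ['Arganzuela', 'Centro', 'Centro'],
})

# Only the first row for each id survives; later repeats are dropped
deduped = toy.drop_duplicates(subset='id', keep='first')
print(deduped)
```

Under that assumption, the order of `lista_distritos` would decide which district a shared business ends up assigned to.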

In [39]:
df_comercios.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 920 entries, 0 to 49
Data columns (total 16 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             920 non-null    object 
 1   name           920 non-null    object 
 2   rating         920 non-null    float64
 3   address        920 non-null    object 
 4   description    918 non-null    object 
 5   review         426 non-null    object 
 6   alias          920 non-null    object 
 7   image_url      841 non-null    object 
 8   is_closed      920 non-null    bool   
 9   review_count   920 non-null    int64  
 10  latitude       920 non-null    float64
 11  longitude      920 non-null    float64
 12  location       920 non-null    object 
 13  display_phone  828 non-null    object 
 14  price          587 non-null    object 
 15  Distrito       920 non-null    object 
dtypes: bool(1), float64(3), int64(1), object(11)
memory usage: 115.9+ KB


In [41]:
df_comercios

,id,name,rating,address,description,review,alias,image_url,is_closed,review_count,latitude,longitude,location,display_phone,price,Distrito
0,GV-WXC3F4MUzwjhAH_f_XA,El Rincón Asturiano,1.0,"Calle de las Delicias, 26, 28045 Madrid, Spain",Spanish Asturian Tapas Bars,We are here for dinner again on two consecutiv...,el-rincón-asturiano-madrid-2,https://s3-media1.fl.yelpcdn.com/bphoto/cS74hs...,False,32,40.403985,-3.692258,"{'address1': 'Calle de las Delicias, 26', 'add...",+34 915 30 89 68,€€,Arganzuela
1,7zpK35tqV8uFtg9BGwfbRg,Donde da la Vuelta el Viento,5.0,"Calle de Mesón de Paredes, 81, 28012 Madrid, S...",Tapas Bars Spanish Modern European,Great place with friendly staff. I came for ta...,donde-da-la-vuelta-el-viento-madrid,https://s3-media2.fl.yelpcdn.com/bphoto/4YsbpQ...,False,37,40.406190,-3.701441,"{'address1': 'Calle de Mesón de Paredes, 81', ...",+34 910 17 72 40,€,Arganzuela
2,cxypKAKs_zzJ8kvB_6G2Bw,El Valle,4.0,"Calle de Sebastián Herrera, 6, 28012 Madrid, S...",Asturian Tapas Bars,The size of everything you order at this resta...,el-valle-madrid,https://s3-media2.fl.yelpcdn.com/bphoto/RApDyk...,False,21,40.403780,-3.699350,"{'address1': 'Calle de Sebastián Herrera, 6', ...",+34 914 67 70 07,€,Arganzuela
3,J6Mq8jWYD9ntHd0u4OQr9A,La Pequeña Graná,4.0,"Calle de Embajadores, 124, 28045 Madrid, Spain",Tapas Bars Tapas/Small Plates Beer Bar,Delicious tapas with awesome service. We order...,la-pequeña-graná-madrid-2,https://s3-media1.fl.yelpcdn.com/bphoto/BhD7tJ...,False,35,40.399390,-3.698510,"{'address1': 'Calle de Embajadores, 124', 'add...",+34 914 74 26 30,€,Arganzuela
4,om6h-4trsKlw9cOp53QXcg,Hermanos Egea,4.0,"Calle de la Batalla del Salado, 33, 28045 Madr...",Spanish Mediterranean Tapas/Small Plates,"The service was phenomenal, very understanding...",hermanos-egea-madrid-2,https://s3-media2.fl.yelpcdn.com/bphoto/rg5ZUM...,False,3,40.400160,-3.696130,"{'address1': 'Calle de la Batalla del Salado, ...",+34 914 68 46 82,€€,Arganzuela
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
44,bniDNIa2zPuL_9D9Rv9taQ,Telepizza,3.0,"Avenida Albufera, 268, 28018 Madrid, Spain",Pizza,,telepizza-madrid-23,,False,1,40.387513,-3.641045,"{'address1': 'Avenida Albufera, 268', 'address...",+34 917 78 20 15,,Villa_de_Vallecas
46,8RsbMBvbWhR7QwyXEh4yfA,Sureste,3.5,"Avenida de la Gran Vía del Sureste, 16, 28031 ...",Dive Bars Tapas/Small Plates Beer Bar,,sureste-madrid,https://s3-media3.fl.yelpcdn.com/bphoto/Rnhtzc...,False,3,40.364350,-3.588800,{'address1': 'Avenida de la Gran Vía del Sures...,+34 911 70 08 39,,Villa_de_Vallecas
47,ftNQ5ZIit98OvANlk2vTGQ,Eloys,4.0,"Calle Luis Marín, 7, 28038 Madrid, Spain",Tapas/Small Plates Tapas Bars,,eloys-madrid,https://s3-media3.fl.yelpcdn.com/bphoto/KJrV2S...,False,2,40.391239,-3.639460,"{'address1': 'Calle Luis Marín, 7', 'address2'...",,,Villa_de_Vallecas
48,lJwToEknJlkhQzN5S6xCgA,Xin Xin,4.0,"Avenida Rafael Alberti, 18, 28038 Madrid, Spain",Chinese,,xin-xin-madrid,,False,1,40.388060,-3.640203,"{'address1': 'Avenida Rafael Alberti, 18', 'ad...",+34 917 77 47 23,€€,Villa_de_Vallecas


#### **CONVERSION TO CSV FOR THE DATABASE**

In [40]:
df_comercios.to_csv('../data/df_comerios.csv', index=False) # The DataFrame index is not written to the CSV file.
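
As a small illustration of why `index=False` matters here, the sketch below writes an invented two-row frame to an in-memory buffer and reads it back. With `index=False` no extra column appears on reload. With the default `index=True`, pandas writes the row labels as an unnamed first column, which `read_csv` then loads as `Unnamed: 0`, the same kind of column dropped earlier in this notebook.

```python
import io

import pandas as pd

# Invented rows with a repeated index label, similar to the concatenated data
toy = pd.DataFrame({'id': ['a', 'b'], 'rating': [4.0, 3.5]}, index=[0, 0])

buffer = io.StringIO()
toy.to_csv(buffer, index=False)  # row labels are not written
buffer.seek(0)

restored = pd.read_csv(buffer)   # gets a fresh 0..n-1 index on load
print(restored.columns.tolist())
```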