<h1 style="text-align: center;">Dataset ETL example</h1>


In this course we will work with data on holiday rentals in the Canary Islands over the period 2010-2021. We will use three datasets:

- Characteristics of the properties that are or have been open in the Canary Islands during that period [Airbnb_properties file](https://alumnosulpgc-my.sharepoint.com/:x:/g/personal/juan_hernandez_ulpgc_es/EeYrbeFBYR1OjINagjy6MisBFCnHh2lLxlqM_-nkYiVZ_w?e=oWLGDp).
- Guest information and the reviews left about the properties [Airbnb_reviews file](https://alumnosulpgc-my.sharepoint.com/:x:/g/personal/juan_hernandez_ulpgc_es/EQhLfFpd9epGpxGenMQx5u8BpfnzRfs_VG51FwXP6tnflQ?e=mBqXfm).
- Information about the hosts [Airbnb_host file](https://alumnosulpgc-my.sharepoint.com/:x:/g/personal/juan_hernandez_ulpgc_es/Eaa3oetSDgZAtc5xa0hfAzMBHnMYfaGVZsGrUujOMgPv-Q?e=LbOIne).

The variables are described in the following [glossary](https://alumnosulpgc-my.sharepoint.com/:w:/g/personal/juan_hernandez_ulpgc_es/ESzatil3opNJs04UDr3nqqkBechOAjvPe4Wk6rtKJIDJSQ?e=Sdu0Dv).
  
First, we pose a general research question that will guide our analysis: which factors determine the commercial success of an Airbnb listing in the Canary Islands? Throughout the study we will try to establish what the current situation is and what should be done to increase the commercial success of these properties.

**Theoretical note**

According to Hopken and Fuchs (2018), the data useful for BI work can be classified into three groups:

- Resources: tourism and non-tourism resources provided by or available to the business (property features, financing, etc.).
- Performance: performance and economic indicators (occupancy, revenue, etc.), as well as customer satisfaction.
- Demand: customer information, as well as external factors that influence demand (weather conditions, events, etc.).

**End of theoretical note**
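As an illustration of how this classification could be applied to the properties file, the sketch below assigns some of its columns to each group. The assignment is our own assumption, not taken from Hopken and Fuchs (2018); the `Demand` entries are hypothetical names for columns that the enrichment step could add, and `BI_GROUPS` and `columns_by_group` are hypothetical helpers.

```python
import pandas as pd

# Illustrative assignment of columns to the three BI data groups (our own assumption)
BI_GROUPS = {
    'Resources': ['Property Type', 'Bedrooms', 'Bathrooms', 'Max Guests', 'Amenities', 'Pets Allowed'],
    'Performance': ['Average Daily Rate (USD)', 'Annual Revenue LTM (USD)', 'Occupancy Rate LTM',
                    'Number of Bookings LTM', 'Overall Rating'],
    # Demand data (guest origin, weather, events) is not in the properties file;
    # these are hypothetical names for columns that enrichment could add later
    'Demand': ['Guest Country', 'Origin GDP', 'Origin Mean Temperature'],
}


def columns_by_group(df: pd.DataFrame, group: str) -> pd.DataFrame:
    """Return the columns of `df` that belong to `group`, ignoring names that are not present."""
    present = [col for col in BI_GROUPS[group] if col in df.columns]
    return df[present]


# Toy frame with a few columns from the properties file
toy = pd.DataFrame({
    'Property Type': ['Villa', 'Apartment'],
    'Bedrooms': [3.0, 1.0],
    'Occupancy Rate LTM': [0.392, 0.359],
})
resources = columns_by_group(toy, 'Resources')
performance = columns_by_group(toy, 'Performance')
demand = columns_by_group(toy, 'Demand')
print(list(resources.columns), list(performance.columns), demand.shape)
```

Keeping the mapping in one dictionary makes it easy to revise the assignment once the reviews and host files are added.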

The goal is to walk through the first stages of a BI process, namely the extraction, transformation and storage of the information. To do so, we will carry out the following tasks:

- Identify missing or incorrect data.
- Clean the data, handling missing values according to a reasonable criterion.
- Remove irrelevant or confusing information (categorical variables).
- Classify the data into resources, performance and demand.
- Identify and remove duplicate data.
- Homogenize the information: normalize continuous variables; group categories in variables with a large number of them.
- Enrich the data with information from other sources, for example the GDP and mean temperature of the user's country of origin. Think of other data sources that could enrich the information.
- Store the data in a suitable format. Do not delete the original, unnormalized sources.
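For the last task, here is a minimal sketch of storing a processed table without touching the raw source, assuming a separate output folder and a `_clean.csv` suffix (both our own naming choices, as is the helper `save_processed`). CSV keeps the example dependency-free; `DataFrame.to_parquet` would preserve dtypes better but needs pyarrow or fastparquet.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_processed(df: pd.DataFrame, raw_path: Path, out_dir: Path) -> Path:
    """Write `df` as '<raw name>_clean.csv' in `out_dir`, which must differ from the raw data folder."""
    if out_dir.resolve() == raw_path.parent.resolve():
        raise ValueError("Store processed data in a separate folder from the raw sources")
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{raw_path.stem}_clean.csv"
    df.to_csv(out_path, index=False)
    return out_path


# Demo in a temporary folder with a made-up raw file
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "properties.csv"
    pd.DataFrame({"Property ID": ["ab-1", "ab-2"], "Bedrooms": [3.0, None]}).to_csv(raw, index=False)
    raw_before = raw.read_text()

    clean = pd.read_csv(raw).fillna({"Bedrooms": 0})
    out = save_processed(clean, raw, Path(tmp) / "processed")

    raw_unchanged = raw.read_text() == raw_before  # the raw file is never rewritten
    reloaded = pd.read_csv(out)
    try:
        save_processed(clean, raw, Path(tmp))  # same folder as the raw file: rejected
        same_folder_rejected = False
    except ValueError:
        same_folder_rejected = True

print(out.name, raw_unchanged, reloaded["Bedrooms"].tolist(), same_folder_rejected)
```

On the real data this could be called as `save_processed(property_df, Path('properties.csv'), Path('processed'))`, where the `processed` folder name is hypothetical.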


In [1]:
import pandas as pd


# Read the data files (the host and review files are not used yet)
#host_df = pd.read_csv('CI_Airbnb_Host_Sept21.csv')
#review_df = pd.read_csv('CI_Airbnb_Review_Sept21.csv')
property_df = pd.read_csv('properties.csv')




In [844]:
# Show every column when displaying a DataFrame
pd.set_option('display.max_columns', None)
property_df.head()


Unnamed: 0,Property ID,Listing Title,Property Type,Listing Type,Created Date,Last Scraped Date,Country,State,City,Zipcode,Neighborhood,Metropolitan Statistical Area,Currency Native,Average Daily Rate (USD),Average Daily Rate (Native),Annual Revenue LTM (USD),Annual Revenue LTM (Native),Occupancy Rate LTM,Number of Bookings LTM,Number of Reviews,Bedrooms,Bathrooms,Max Guests,Calendar Last Updated,Response Rate,Airbnb Response Time (Text),Airbnb Superhost,HomeAway Premier Partner,Cancellation Policy,Security Deposit (USD),Security Deposit (Native),Cleaning Fee (USD),Cleaning Fee (Native),Extra People Fee (USD),Extra People Fee (Native),Published Nightly Rate (USD),Published Monthly Rate (USD),Published Weekly Rate (USD),Check-in Time,Checkout Time,Minimum Stay,Count Reservation Days LTM,Count Available Days LTM,Count Blocked Days LTM,Number of Photos,Instantbook Enabled,Listing URL,Listing Main Image URL,Listing Images,Latitude,Longitude,Exact Location,Overall Rating,Airbnb Communication Rating,Airbnb Accuracy Rating,Airbnb Cleanliness Rating,Airbnb Checkin Rating,Airbnb Location Rating,Airbnb Value Rating,Pets Allowed,Integrated Property Manager,Amenities,HomeAway Location Type,Airbnb Property Plus,Airbnb Home Collection,License,Airbnb Property ID,Airbnb Host ID,HomeAway Property ID,HomeAway Property Manager ID
0,ab-48306612,Inviting 3-Bed Villa in Playa Blanca,Villa,Entire home/apt,2021-02-28,2021-10-11,Spain,Las Palmas,Yaiza,,Playa Blanca,,EUR,320.88,272.03,19253.0,16322.0,0.392,18.0,0.0,3.0,2.0,6.0,,96.0,within an hour,False,,super_strict_60,,,23.0,,,,311.0,,,After 4:00 PM,10:00 AM,1.0,60.0,93.0,0.0,29.0,True,https://www.airbnb.com/rooms/48306612,https://a0.muscache.com/im/pictures/bfe747a5-3...,['https://a0.muscache.com/im/pictures/bfe747a5...,28.87018,-13.84505,False,,,,,,,,False,,"[""wireless_internet"", ""kitchen"", ""pool"", ""iron...",,False,,VV-35-3-0002942,48306612.0,310835509.0,,
1,ab-48306645,El loft de Mila,Guest suite,Entire home/apt,2021-07-19,2021-10-09,Spain,Santa Cruz de Tenerife,San Cristóbal de La Laguna,,,,EUR,41.45,35.0,1368.0,1155.0,0.359,7.0,5.0,1.0,1.0,2.0,,100.0,within an hour,False,,flexible,,,,,,,42.0,,,Flexible,,2.0,33.0,59.0,0.0,13.0,True,https://www.airbnb.com/rooms/48306645,https://a0.muscache.com/im/pictures/c2a7b84d-b...,['https://a0.muscache.com/im/pictures/c2a7b84d...,28.53897,-16.36011,False,100.0,10.0,10.0,10.0,10.0,9.0,10.0,False,,"[""wireless_internet"", ""kitchen"", ""iron"", ""hair...",,False,,,48306645.0,389885911.0,,
2,ab-48306649,Stunning 2-Bed Villa in Playa Blanca,Villa,Entire home/apt,2021-03-02,2021-10-13,Spain,Las Palmas,Yaiza,,Playa Blanca,,EUR,197.66,167.41,6325.0,5357.0,0.222,10.0,0.0,2.0,2.0,4.0,,96.0,within an hour,False,,super_strict_60,,,23.0,,,,191.0,,,After 4:00 PM,10:00 AM,1.0,32.0,112.0,9.0,24.0,True,https://www.airbnb.com/rooms/48306649,https://a0.muscache.com/im/pictures/c1d1d47c-7...,['https://a0.muscache.com/im/pictures/c1d1d47c...,28.86526,-13.79952,False,,,,,,,,False,,"[""dishwasher"", ""kitchen"", ""wireless_internet"",...",,False,,vv,48306649.0,310835509.0,,
3,ab-48306844,Most Beautiful Location South Tenerife Sea Views,Condominium,Entire home/apt,2021-03-04,2021-06-19,Spain,Santa Cruz de Tenerife,Adeje,,,,EUR,,,0.0,0.0,,0.0,0.0,1.0,1.0,4.0,,81.0,within a few hours,False,,strict_14_with_grace_period,,,73.0,,,,85.0,,,2:00 PM - 11:00 PM,12:00 PM,5.0,,,,77.0,False,https://www.airbnb.com/rooms/48306844,https://a0.muscache.com/im/pictures/193fbe3f-b...,['https://a0.muscache.com/im/pictures/193fbe3f...,28.09549,-16.74055,False,,,,,,,,True,,"[""dishwasher"", ""free_parking"", ""beachfront"", ""...",,False,,,48306844.0,356800202.0,,
4,ab-48306961,Ventura Caprice: nuestra casa para 2 con vistas,Residential home,Entire home/apt,2021-03-05,2021-10-03,Spain,Las Palmas,La Oliva,,,,EUR,73.25,61.4,6519.0,5465.0,0.509,20.0,4.0,1.0,1.0,2.0,,90.0,within an hour,True,,moderate,,,,,,,71.0,,,2:00 PM - 8:00 PM,11:00 AM,5.0,89.0,86.0,39.0,28.0,True,https://www.airbnb.com/rooms/48306961,https://a0.muscache.com/im/pictures/32b05977-a...,['https://a0.muscache.com/im/pictures/32b05977...,28.69038,-13.89302,False,85.0,10.0,10.0,8.0,10.0,9.0,8.0,False,,"[""wireless_internet"", ""free_parking"", ""kitchen...",,False,,,48306961.0,150678647.0,,


### Identifying and removing duplicate data

In [845]:
property_df['Currency Native'].value_counts()

EUR    93236
GBP     1965
USD      238
NOK       47
CNY       26
PLN       12
SEK        9
CHF        8
RUB        6
RON        6
DKK        4
CAD        2
THB        1
HUF        1
BGN        1
TRY        1
BAM        1
CRC        1
Name: Currency Native, dtype: int64

In [846]:
# For the three most common currencies, correlate each "(Native)" monetary column with its "(USD)" counterpart
currencies = ['EUR', 'GBP', 'USD']
money_columns = ['Average Daily Rate', 'Annual Revenue LTM', 'Security Deposit',
                 'Cleaning Fee', 'Extra People Fee']

for column in money_columns:
    print(f"Correlation between the {column} columns (Native vs USD)")
    for currency in currencies:
        subset = property_df[property_df['Currency Native'] == currency]
        correlation = subset[f'{column} (Native)'].corr(subset[f'{column} (USD)'])
        print(f"{currency}: {correlation}")

Correlation between the Average Daily Rate columns (Native vs USD)
EUR: 0.9996709317870618
GBP: 0.9995501832656606
USD: 0.9999999999999998
Correlation between the Annual Revenue LTM columns (Native vs USD)
EUR: 0.9999660069560933
GBP: 0.9997416225382506
USD: 1.0
Correlation between the Security Deposit columns (Native vs USD)
EUR: 0.9931015318247407
GBP: 0.9995664172013825
USD: 1.0
Correlation between the Cleaning Fee columns (Native vs USD)
EUR: 0.9256126937493422
GBP: 0.957980383400459
USD: 1.0
Correlation between the Extra People Fee columns (Native vs USD)
EUR: 0.9981559997856094
GBP: 0.9985419136634697
USD: 1.0


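As we can see, all these correlations are very high (close to 1, and above 0.92 in every case), so within each currency the Native column carries essentially the same information as its USD counterpart. We therefore keep only the USD versions, which share a common unit across all listings, and drop the Native columns.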
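A correlation close to 1 only shows that two columns are linearly related. A more direct check of redundancy is the implied exchange rate, USD / Native: if it is nearly constant within each currency, the Native column adds nothing. The sketch below uses made-up prices, not values from the dataset; on the real data, rows with a Native value of 0 would produce `inf` or `NaN` and should be filtered out first. Some spread is to be expected if, as we assume, conversions were made at rates from different dates.

```python
import pandas as pd

# Made-up prices in native currency and converted to USD
df = pd.DataFrame({
    'Currency Native': ['EUR', 'EUR', 'EUR', 'GBP', 'GBP'],
    'Average Daily Rate (Native)': [100.0, 50.0, 200.0, 80.0, 40.0],
    'Average Daily Rate (USD)': [118.0, 59.0, 236.0, 108.0, 54.0],
})

# Implied exchange rate per row (USD per unit of native currency)
df['implied_rate'] = df['Average Daily Rate (USD)'] / df['Average Daily Rate (Native)']

# A standard deviation close to 0 within a currency means the Native column is redundant
summary = df.groupby('Currency Native')['implied_rate'].agg(['mean', 'std'])
print(summary)
```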

In [847]:
# Keep only the USD version of each monetary column
property_df = property_df.drop(columns=['Average Daily Rate (Native)', 'Annual Revenue LTM (Native)', 'Extra People Fee (Native)', 'Cleaning Fee (Native)', 'Security Deposit (Native)'])

### Identifying missing or incorrect data

We check the percentage of missing values (NaN) in each column.

In [848]:
# Compute the share of NaN values in each column, sort from highest to lowest and show the top 10
property_df.isna().mean().sort_values(ascending=False).head(10)


Zipcode                          1.000000
Airbnb Home Collection           1.000000
Metropolitan Statistical Area    1.000000
HomeAway Property Manager ID     0.947203
License                          0.894769
Neighborhood                     0.876669
HomeAway Location Type           0.861680
Extra People Fee (USD)           0.790879
Security Deposit (USD)           0.785960
Integrated Property Manager      0.712210
dtype: float64

### Cleaning the data, handling missing values according to a reasonable criterion

We drop the columns with more than 85% missing values. 'Zipcode', 'Airbnb Home Collection' and 'Metropolitan Statistical Area' are entirely empty. 'HomeAway Property Manager ID' is about 95% missing, so there is little point in working with it. 'License' is about 89% missing and, besides, the licence code itself tells us nothing about the property, so we do not keep it. Finally, 'Neighborhood' and 'HomeAway Location Type' (about 88% and 86% missing, respectively) are also dropped: to study where the properties are located we will rely on columns with fewer missing values, such as the city.

In [849]:
import math

# Keep only the columns with at least 15% non-null values, i.e. drop those with more than 85% NaN
property_df = property_df.dropna(thresh=math.ceil(0.15 * len(property_df)), axis=1)
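`dropna(thresh=...)` counts non-missing values, so the rule "drop columns with more than 85% NaN" becomes "keep columns with at least 15% non-null values". The toy example below (made-up data, 20 rows) checks that a `thresh`-based call and an explicit mask on `isna().mean()` keep the same columns, including column `b`, which sits exactly on the 85% boundary and is therefore kept.

```python
import math

import numpy as np
import pandas as pd

# 20 rows: 'a' is 90% NaN, 'b' is exactly 85% NaN, 'c' has no NaN
n = 20
df = pd.DataFrame({
    'a': [1.0] * 2 + [np.nan] * 18,
    'b': [1.0] * 3 + [np.nan] * 17,
    'c': np.arange(n, dtype=float),
})

# Option 1: dropna keeps columns with at least `thresh` non-null values
kept_thresh = df.dropna(thresh=math.ceil(0.15 * n), axis=1).columns.tolist()

# Option 2: explicit mask on the share of missing values
kept_mask = df.loc[:, df.isna().mean() <= 0.85].columns.tolist()

print(kept_thresh, kept_mask)
```

The mask version is more verbose but states the 85% rule directly, which can make the intent easier to review.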

We replace the missing values in the 'HomeAway Premier Partner' column with False, assuming that a listing that is not flagged is not a Premier Partner.

In [850]:
# Replace NaN in HomeAway Premier Partner with False: an unflagged listing is assumed not to be a partner
property_df["HomeAway Premier Partner"] = property_df["HomeAway Premier Partner"].fillna(False)

property_df["HomeAway Premier Partner"].value_counts()

False    134641
True        329
Name: HomeAway Premier Partner, dtype: int64

We look for columns that have the same value in every row and drop them, since they provide no relevant information.

In [851]:
unique_values = property_df.loc[:, property_df.apply(pd.Series.nunique) == 1].columns
print(unique_values)

Index(['Country'], dtype='object')


The only constant column is 'Country'. Since every record is from Spain, it adds nothing to the analysis, so we drop it.

In [852]:
property_df = property_df.drop(columns=unique_values)
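The `apply(pd.Series.nunique) == 1` filter used above ignores missing values, so a column holding a single value plus NaNs would also be dropped, while a column that is entirely NaN (0 distinct values) would be kept. The toy example below (made-up data) shows the difference with `nunique(dropna=False)`; which behaviour is preferable depends on whether a partially empty flag should be treated as informative.

```python
import numpy as np
import pandas as pd

# Made-up frame: a constant column, a flag with one value plus NaN, an all-NaN column and a varying one
df = pd.DataFrame({
    'Country': ['Spain', 'Spain', 'Spain'],
    'Flag': ['yes', np.nan, np.nan],
    'Empty': [np.nan, np.nan, np.nan],
    'City': ['Adeje', 'Arona', 'Adeje'],
})

# Default: NaN is ignored, so 'Flag' looks constant and 'Empty' has 0 distinct values
default_counts = df.nunique().to_dict()
# dropna=False counts NaN as a value of its own
with_nan_counts = df.nunique(dropna=False).to_dict()

constant_cols = df.columns[df.nunique(dropna=False) == 1].tolist()
print(default_counts, with_nan_counts, constant_cols)
```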

We now fix the `City` column. One property has 'Spain' as its country but 'Chiang Mai', a city in Thailand, as its city, so we try to recover the correct value from its latitude and longitude. We do the same for the missing values in this column and for the rows whose value is 'Unknown name for way with ID 45326926'.

In [853]:
property_df['City'].value_counts()

Adeje                                    12194
Arona                                    12164
San Bartolomé de Tirajana                10732
Las Palmas de Gran Canaria                8927
La Oliva                                  8580
                                         ...  
La Victoria de Acentejo                     76
Betancuria                                  59
San Andrés y Sauces                         53
Unknown name for way with ID 45326926        6
Chiang Mai                                   1
Name: City, Length: 89, dtype: int64

In [854]:
# How many missing values are there in the City column?
property_df['City'].isna().sum()

4495

The following process takes roughly an hour to complete. To save time in later runs, we store the results in a CSV file; subsequent runs of the script simply load that CSV instead of repeating the whole process.

In [855]:
from geopy.geocoders import Nominatim

# Create the geocoder once and reuse it for every lookup
locator = Nominatim(user_agent='myencoder2', timeout=10)

# Return the city for a (lat, lon) pair, or "Unknown" if it cannot be resolved
def get_city(lat, lon):
    try:
        location = locator.reverse((lat, lon), language='en')
        if location is None:
            return "Unknown"
        parts = location.raw['display_name'].split(",")
        # Heuristic: pick the city by its position counted from the end of the address
        if len(parts) > 6:
            return parts[-5].strip()
        return parts[-4].strip()
    except Exception:
        return "Unknown"
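Picking the city by its position in `display_name` depends on how many components each address has. An alternative, sketched below, assumes that the reverse-geocoding result includes Nominatim's structured `address` dictionary (geopy's `Nominatim.reverse` requests address details by default and exposes the response in `location.raw`) and takes the first available key among `city`, `town`, `village` and `municipality`. The helper name and the key order are our own choices; the example runs offline on hand-written dictionaries, not real API responses.

```python
# Keys tried in order; the order is an assumption (smaller settlements often appear as town or village)
CITY_KEYS = ('city', 'town', 'village', 'municipality')


def city_from_raw(raw: dict, default: str = "Unknown") -> str:
    """Return the first available settlement name from a Nominatim-style `raw` result."""
    address = raw.get('address', {})
    for key in CITY_KEYS:
        if key in address:
            return address[key]
    return default


# Hand-written examples shaped like Nominatim responses (not real API output)
sample_town = {'address': {'town': 'Adeje', 'state': 'Canary Islands', 'country': 'Spain'}}
sample_city = {'address': {'city': 'Las Palmas de Gran Canaria', 'country': 'Spain'}}
sample_none = {'address': {'country': 'Spain'}}

print(city_from_raw(sample_town), city_from_raw(sample_city), city_from_raw(sample_none))
```

With a live geocoder this would be used as `city_from_raw(locator.reverse((lat, lon), language='en').raw)`, after checking that the result is not `None`.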


In [856]:
import os
import time

CITY_CACHE = 'city.csv'

# Geocode only if the cached results do not exist yet (slow: one request per row)
if not os.path.exists(CITY_CACHE):
    # Rows whose City is missing or clearly wrong
    to_fix = (property_df['City'].isna()
              | property_df['City'].isin(['Chiang Mai', 'Unknown name for way with ID 45326926']))

    # Geocode each row separately so that every property gets its own city
    for idx, row in property_df.loc[to_fix, ['Latitude', 'Longitude']].iterrows():
        property_df.at[idx, 'City'] = get_city(row['Latitude'], row['Longitude'])
        time.sleep(1)  # Nominatim's usage policy allows at most one request per second

    # Save the corrected City column so that later runs can skip this step
    property_df['City'].to_csv(CITY_CACHE, index=False)

We load the processed results from 'city.csv'.

In [857]:
# Load the corrected City column from city.csv
city_df = pd.read_csv('city.csv')

property_df['City'] = city_df['City']
property_df['City'].isna().sum()

0

In [1]:
property_df['City'].value_counts()


We drop the URL columns, since they carry no meaningful information for the analysis.

In [859]:
# Show all columns
pd.set_option('display.max_columns', None)

# Drop the columns that carry no information
property_df = property_df.drop(columns=['Listing Images', 'Listing Main Image URL', 'Listing URL'])

We drop the State column, since it can be derived from the city. We also merge the 'Latitude' and 'Longitude' columns into a single column called 'Location'.

In [860]:
# Drop State: the province can be derived from the city
property_df = property_df.drop(columns=['State'])

# Merge latitude and longitude into a single text column
property_df["Location"] = property_df["Latitude"].astype(str) + ", " + property_df["Longitude"].astype(str)

# Drop the separate latitude and longitude columns
property_df = property_df.drop(columns=['Latitude', 'Longitude'])
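Storing the coordinates as a single text column is convenient for display, but distance or map computations will need numbers again. A minimal sketch, assuming the `"lat, lon"` format built above (where missing coordinates become the text `"nan, nan"` through `astype(str)`), splits `Location` back into two float columns:

```python
import pandas as pd

# Locations in the "lat, lon" text format built above; the last one comes from missing coordinates
df = pd.DataFrame({'Location': ['28.87018, -13.84505', '28.53897, -16.36011', 'nan, nan']})

# Split on the ", " separator and convert both parts back to floats
coords = df['Location'].str.split(', ', expand=True).astype(float)
coords.columns = ['Latitude', 'Longitude']
print(coords)
```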

### Homogenizing the information: normalizing continuous variables; grouping categories in variables with many distinct values

We work on the Property Type column because it has a large number of distinct values.

In [861]:
print("Number of distinct values in the 'Property Type' column:")
print(property_df["Property Type"].nunique())

property_df["Property Type"].value_counts()

Number of distinct values in the 'Property Type' column:
196


Apartment                  45027
Rental unit                18378
House                      15906
Villa                      15400
Residential home            5189
                           ...  
propertyTypes.YACHT            1
Lodge                          1
Shared room in bungalow        1
Shared room in tipi            1
Mas                            1
Name: Property Type, Length: 196, dtype: int64

In [862]:
# Convert every value in the "Property Type" column to lowercase
property_df["Property Type"] = property_df["Property Type"].str.lower()

# Show the number of distinct values in the "Property Type" column
print("Number of distinct values in the 'Property Type' column:")
print(property_df["Property Type"].nunique())

Number of distinct values in the 'Property Type' column:
169


In the rows where Property Type is missing, most of the other columns are missing as well (see the sample below), so we drop those rows.

In [863]:
# Show the rows where "Property Type" is missing
print("\nRows where 'Property Type' is missing:")
property_df[property_df["Property Type"].isnull()].head(20)


Rows where 'Property Type' is missing:


Unnamed: 0,Property ID,Listing Title,Property Type,Listing Type,Created Date,Last Scraped Date,City,Currency Native,Average Daily Rate (USD),Annual Revenue LTM (USD),Occupancy Rate LTM,Number of Bookings LTM,Number of Reviews,Bedrooms,Bathrooms,Max Guests,Calendar Last Updated,Response Rate,Airbnb Response Time (Text),Airbnb Superhost,HomeAway Premier Partner,Cancellation Policy,Security Deposit (USD),Cleaning Fee (USD),Extra People Fee (USD),Published Nightly Rate (USD),Published Monthly Rate (USD),Published Weekly Rate (USD),Check-in Time,Checkout Time,Minimum Stay,Count Reservation Days LTM,Count Available Days LTM,Count Blocked Days LTM,Number of Photos,Instantbook Enabled,Exact Location,Overall Rating,Airbnb Communication Rating,Airbnb Accuracy Rating,Airbnb Cleanliness Rating,Airbnb Checkin Rating,Airbnb Location Rating,Airbnb Value Rating,Pets Allowed,Integrated Property Manager,Amenities,Airbnb Property Plus,Airbnb Property ID,Airbnb Host ID,HomeAway Property ID,Location
6590,ab-18572519,,,,2017-05-03,2017-05-09,Las Palmas de Gran Canaria,EUR,,0.0,,0.0,,,,,,,,,False,,113.0,40.0,,102.0,1745.0,436.0,,,,,,,,True,False,,,,,,,,False,,,,18572519.0,,,"28.12769472480299, -15.43029298157123"
12181,ab-20458191,León canteras playa,,Entire home/apt,2017-08-14,2017-08-16,Las Palmas de Gran Canaria,EUR,,0.0,,0.0,0.0,0.0,1.0,3.0,2017-08-15,,,False,False,flexible,,,,72.0,2033.0,508.0,Anytime after 3PM,11AM,4.0,,,,2.0,True,False,,,,,,,,False,,"[""kitchen"", ""tv"", ""wireless_internet"", ""elevat...",,20458191.0,52417266.0,,"28.13957783569516, -15.43568591518705"
13320,ab-20884384,,,,2017-09-09,2017-09-13,Adeje,EUR,,0.0,,0.0,,,,,,,,,False,,,,,54.0,1487.0,372.0,,,,,,,,True,False,,,,,,,,False,,,,20884384.0,,,"28.08033900800283, -16.72150037290409"
13359,ab-20896049,,,,2017-09-09,2017-09-13,Güímar,GBP,,0.0,,0.0,,,,,,,,,False,,,,,30.0,836.0,209.0,,,,,,,,False,False,,,,,,,,False,,,,20896049.0,,,"28.29295393674634, -16.37688546490293"
13383,ab-20907434,,,,2017-09-10,2017-09-12,Santa Cruz de Tenerife,EUR,,0.0,,0.0,,,,,,,,,False,,,,,66.0,1833.0,458.0,,,,,,,,True,False,,,,,,,,False,,,,20907434.0,,,"28.46716110846244, -16.25124935310215"
13408,ab-20913225,,,,2017-09-11,2017-09-13,San Cristóbal de La Laguna,EUR,,0.0,,0.0,,,,,,,,,False,,123.0,49.0,,136.0,2421.0,605.0,,,,,,,,True,False,,,,,,,,False,,,,20913225.0,,,"28.49143710421769, -16.32154539910833"
16226,ab-289896,,,,2011-12-16,2017-03-13,San Bartolomé de Tirajana,EUR,,0.0,,0.0,,,,,,,,,False,,220.0,66.0,,176.0,2749.0,1319.0,,,,,,,,False,False,,,,,,,,False,,,,289896.0,,,"27.74851526705882, -15.60366212131932"
23059,ab-6874592,,,,2017-03-09,2017-03-18,Arona,EUR,,0.0,,0.0,,,,,,,,,False,,,33.0,,114.0,2288.0,577.0,,,,,,,,False,False,,,,,,,,False,,,,6874592.0,,,"28.06168272924284, -16.73100102644663"
30809,ab-16952225,,,,2017-01-24,2017-03-10,Santa Cruz de Tenerife,EUR,,0.0,,0.0,,,,,,,,,False,,109.0,8.0,11.0,26.0,520.0,130.0,,,,,,,,True,False,,,,,,,,False,,,,16952225.0,,,"28.46905665384872, -16.27003262434121"
30816,ab-16953927,,,,2017-01-24,2017-03-08,Tinajo,EUR,,0.0,,0.0,,,,,,,,,False,,,,,22.0,611.0,153.0,,,,,,,,True,False,,,,,,,,False,,,,16953927.0,,,"29.08066638771436, -13.68191845036784"


In [864]:
property_df = property_df.dropna(subset=["Property Type"])

We group the 196 original property types into 7 general categories using keywords associated with each one. First, we define a dictionary that maps each category to a set of relevant keywords. Then, we write a function that goes through each property type, checks whether any of the keywords appears in it (case-insensitively) and assigns the property to the corresponding category; types that match no keyword are left unchanged.

In [865]:
import pandas as pd

# Mapping from each category to its associated keywords
categories_keywords = {
    'Apartments & Condos': ['apartment', 'condo', 'loft', 'studio', 'condominium', 'apt', 'floor'],
    'Houses & Villas': ['house', 'villa', 'bungalow', 'home', 'casa'],
    'Cottages & Country Houses': ['cottage', 'chateau', 'townhome', 'country', 'townhouse', 'lodge'],
    'Hotels': ['hotel', 'resort', 'aparthotel', 'boutique hotel'],
    'Bed & Breakfasts / Guesthouses': ['bed and breakfast', 'bed & breakfast', 'guesthouse', 'hostel'],
    'Recreational Properties': ['camper', 'rv', 'tent', 'yurt', 'treehouse', 'cave', 'farm', 'campground', 'cabin'],
    'Luxury & Unique Stays': ['castle', 'lighthouse', 'island', 'estate', 'yacht', 'chalet', 'boat']
}

# Assign a category based on the keywords; the first matching category wins
def categorize_property_type(property_type):
    for category, keywords in categories_keywords.items():
        for keyword in keywords:
            if keyword in property_type.lower():  # case-insensitive comparison
                return category
    # If no keyword matches, keep the original value
    return property_type

# Apply the function to the 'Property Type' column
property_df['Property Type'] = property_df['Property Type'].apply(categorize_property_type)

# Show the number of distinct values in the "Property Type" column
print("Number of distinct values in the 'Property Type' column:")
print(property_df["Property Type"].nunique())


Number of distinct values in the 'Property Type' column:
47
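Because `categorize_property_type` uses substring matching and returns the first category whose keyword appears, the order of `categories_keywords` matters: 'guesthouse' and 'townhouse', for instance, contain 'house' and therefore land in 'Houses & Villas' before 'Cottages & Country Houses' or 'Bed & Breakfasts / Guesthouses' are checked. The sketch below reproduces the same first-match rule with vectorized `str.contains` and `Series.mask` calls on a reduced, illustrative keyword dictionary, which makes the effect easy to inspect.

```python
import re

import pandas as pd

# Reduced, illustrative keyword dictionary (the full one is defined above); order matters
keywords = {
    'Apartments & Condos': ['apartment', 'condo', 'loft'],
    'Houses & Villas': ['house', 'villa'],
    'Bed & Breakfasts / Guesthouses': ['bed and breakfast', 'guesthouse'],
}

types = pd.Series(['apartment', 'guesthouse', 'bed and breakfast', 'villa', 'tipi'])

# Walk the categories in reverse so that earlier categories overwrite later ones:
# the final label is the first category (in dictionary order) with a matching keyword
result = types.copy()
for category, kws in reversed(list(keywords.items())):
    pattern = '|'.join(re.escape(kw) for kw in kws)
    result = result.mask(types.str.contains(pattern, regex=True), category)

print(result.tolist())
```

Here 'guesthouse' ends up in 'Houses & Villas'; moving the guesthouse category before the houses one would change that.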


Values that appear 10 times or fewer and were not grouped by the keywords are assigned to a new group called "other".

In [866]:
# Count the occurrences of each property type once
property_counts = property_df["Property Type"].value_counts()

# Replace with "other" when the count is 10 or less
property_df["Property Type"] = property_df["Property Type"].apply(
    lambda x: x if property_counts[x] > 10 else "other"
)

# Show how many properties fall into each "Property Type" value
print("Number of properties per 'Property Type' value:")
print(property_df["Property Type"].value_counts())


Number of properties per 'Property Type' value:
Apartments & Condos               59647
Houses & Villas                   45940
rental unit                       18378
Cottages & Country Houses          3264
Bed & Breakfasts / Guesthouses     1809
Luxury & Unique Stays              1764
Hotels                             1360
Recreational Properties            1244
guest suite                         678
place                               252
other                               204
private room                         94
entire place                         56
dorm                                 38
campsite                             30
bed &amp; breakfast                  26
hut                                  25
in-law                               21
private room in guest suite          21
barn                                 16
caravan                              14
entire guest suite                   12
building                             11
Name: Property Type, dtype: int64
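The lambda above looks up the count of each value row by row. The same rule (10 or fewer occurrences become "other") can be written with vectorized operations by mapping each value to its frequency. The sketch below uses a made-up series and the same threshold of 10.

```python
import pandas as pd

# Made-up column: 'villa' appears 12 times, 'hut' 10 times, 'yurt' 3 times
s = pd.Series(['villa'] * 12 + ['hut'] * 10 + ['yurt'] * 3)

# Frequency of each row's value, aligned with the original rows
freq = s.map(s.value_counts())

# Keep values seen more than 10 times; everything else becomes 'other'
grouped = s.where(freq > 10, 'other')
print(grouped.value_counts().to_dict())
```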

Finally, we manually map the remaining relevant values that the keywords did not capture.

In [867]:
# Mapping dictionary for the remaining values
property_mapping = {
    'rental unit': 'Apartments & Condos',
    'in-law': 'Apartments & Condos',
    'building': 'Apartments & Condos',

    'bed &amp; breakfast': 'Bed & Breakfasts / Guesthouses',
    'guest house/pension': 'Bed & Breakfasts / Guesthouses',
    'dorm': 'Bed & Breakfasts / Guesthouses',
    
    'caravan': 'Recreational Properties',
    'hut': 'Recreational Properties',
    'campsite': 'Recreational Properties',
    'barn': 'Recreational Properties',
    
    'castle': 'Luxury & Unique Stays',
    'guest suite': 'Luxury & Unique Stays',
    'private room in guest suite': 'Luxury & Unique Stays',
    'entire guest suite': 'Luxury & Unique Stays',

    'place': 'other',
    'private room': 'other',
    'entire place': 'other',
}

# Replace the values in the 'Property Type' column according to the mapping
property_df['Property Type'] = property_df['Property Type'].replace(property_mapping)

# Show the number of distinct values in the "Property Type" column
print("Number of distinct values in the 'Property Type' column:")
print(property_df["Property Type"].nunique())

# Show the remaining values and their counts
print("Values in the 'Property Type' column:")
print(property_df["Property Type"].value_counts())


Number of distinct values in the 'Property Type' column:
8
Values in the 'Property Type' column:
Apartments & Condos               78057
Houses & Villas                   45940
Cottages & Country Houses          3264
Luxury & Unique Stays              2475
Bed & Breakfasts / Guesthouses     1873
Hotels                             1360
Recreational Properties            1329
other                               606
Name: Property Type, dtype: int64
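After the manual mapping, a simple guard can confirm that only the expected labels remain, so that a new unmapped type (for example, one appearing in a later scrape) is flagged instead of silently surviving. The set of allowed labels is taken from the output above; the helper `unexpected_types` and the sample values are hypothetical.

```python
import pandas as pd

# The eight labels present after the mapping above
ALLOWED_TYPES = {
    'Apartments & Condos', 'Houses & Villas', 'Cottages & Country Houses',
    'Luxury & Unique Stays', 'Bed & Breakfasts / Guesthouses', 'Hotels',
    'Recreational Properties', 'other',
}


def unexpected_types(series: pd.Series) -> set:
    """Return the values of `series` that are not in ALLOWED_TYPES (missing values ignored)."""
    return set(series.dropna().unique()) - ALLOWED_TYPES


# Made-up example with one unmapped value
sample = pd.Series(['Hotels', 'other', 'tipi', 'Houses & Villas'])
print(unexpected_types(sample))
```

On the real data, `assert not unexpected_types(property_df['Property Type'])` would stop the pipeline when the mapping needs updating.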


### Normalizing continuous variables

We normalize the numeric variables with scikit-learn's MinMaxScaler, excluding the identifiers.

In [868]:
# Scale the continuous numeric columns to [0, 1], except Airbnb Property ID and Airbnb Host ID,
# and store the results in new columns named after the original plus "_normalized"
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

# Select the numeric columns
numeric_columns = property_df.select_dtypes(include=['float64', 'int64']).columns

# Remove the identifier columns, which should not be normalized
numeric_columns = numeric_columns.drop(['Airbnb Property ID', 'Airbnb Host ID'])

# Normalize the numeric columns (MinMaxScaler ignores NaN when fitting and keeps them as NaN)
property_df[numeric_columns + '_normalized'] = scaler.fit_transform(property_df[numeric_columns])
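MinMaxScaler rescales each column as x' = (x - min) / (max - min), computed column by column. Because the result depends only on the two extremes, a single outlier squeezes every other value towards 0: in the output further down, a 'Minimum Stay' of 1 night becomes about 3.3e-05, which suggests a very large maximum in that column. The sketch below writes the formula in plain pandas on made-up values and compares it with a variant that caps the column first; the one-year cap of 365 nights is our own assumption, not something the notebook does.

```python
import numpy as np
import pandas as pd


def min_max(col: pd.Series) -> pd.Series:
    """Rescale to [0, 1] with (x - min) / (max - min); NaN is skipped by min/max and stays NaN."""
    return (col - col.min()) / (col.max() - col.min())


# Made-up minimum stays (nights) with one extreme value and one missing value
stay = pd.Series([1.0, 2.0, 5.0, np.nan, 1000.0])

plain = min_max(stay)                    # the outlier pushes typical stays close to 0
capped = min_max(stay.clip(upper=365))   # hypothetical cap of one year before scaling
print(pd.DataFrame({'stay': stay, 'plain': plain, 'capped': capped}))
```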


### Enriching the data with information from other sources, for example the GDP and mean temperature of the user's country of origin. Think of other data sources that could enrich the information.

From 'Created Date' and 'Last Scraped Date' we can obtain the number of months each listing has been open (its age). We then drop 'Last Scraped Date', since it says nothing about the property itself. From 'Created Date' we keep only the year and month, because the day is not very relevant when comparing properties.

In [869]:
# Keep only the year and month ("YYYY-MM") of "Created Date" and "Last Scraped Date": the first 7 characters of the string
property_df['Created Date'] = property_df['Created Date'].str[:7]
property_df['Last Scraped Date'] = property_df['Last Scraped Date'].str[:7]

# New column with the approximate number of months (30-day months) between "Created Date" and "Last Scraped Date"
property_df['Open Months'] = (pd.to_datetime(property_df['Last Scraped Date']) - pd.to_datetime(property_df['Created Date'])).dt.days // 30

# Drop "Last Scraped Date"
property_df = property_df.drop(columns=['Last Scraped Date'])
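`.dt.days // 30` treats every month as 30 days. Since both dates were truncated to the first day of the month, the exact number of calendar months can be computed from the year and month instead. The made-up pairs below show where the two differ: a February gap of 28 days gives 0 instead of 1, and over long periods the approximation overshoots (143 instead of 141 months from 2010-01 to 2021-10). Missing dates give NaN with either formula.

```python
import pandas as pd

# Year-month strings in the format produced above (made-up pairs)
df = pd.DataFrame({
    'Created Date': ['2021-02', '2010-01', '2021-07'],
    'Last Scraped Date': ['2021-03', '2021-10', '2021-10'],
})

created = pd.to_datetime(df['Created Date'])
scraped = pd.to_datetime(df['Last Scraped Date'])

# Approximation used above: 30-day months
df['approx_months'] = (scraped - created).dt.days // 30
# Exact number of calendar months between the two year-month values
df['exact_months'] = (scraped.dt.year - created.dt.year) * 12 + (scraped.dt.month - created.dt.month)
print(df)
```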



We compute each host's seniority on Airbnb as the largest 'Open Months' value among their listings.

In [870]:
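# Listings whose open months could not be computed get 0
property_df['Open Months'] = property_df['Open Months'].fillna(0)
property_df['Open Months'] = property_df['Open Months'].astype(int)
# Seniority: the longest-running listing of each host, repeated on every row of that host
property_df['Seniority'] = property_df.groupby('Airbnb Host ID')['Open Months'].transform('max')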
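`groupby(...).transform('max')` returns one value per row (the maximum within that row's host) rather than one value per host, which is what allows `Seniority` to sit next to each listing. Rows without a host ID belong to no group and receive NaN, which is consistent with `Seniority` appearing as a float (for example 22.0) in the output below. A toy example with made-up hosts:

```python
import numpy as np
import pandas as pd

# Made-up listings: host 1 has two listings, host 2 one, and the last row has no host ID
df = pd.DataFrame({
    'Airbnb Host ID': [1.0, 1.0, 2.0, np.nan],
    'Open Months': [8, 22, 3, 5],
})

# Each row receives the maximum 'Open Months' of its own host
df['Seniority'] = df.groupby('Airbnb Host ID')['Open Months'].transform('max')
print(df)
```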



We add another column with the number of Airbnb properties each host manages.

In [871]:
# Count how many properties each host has
property_df['Profesionality'] = property_df.groupby('Airbnb Host ID')['Airbnb Property ID'].transform('count')

property_df.head()

Unnamed: 0,Property ID,Listing Title,Property Type,Listing Type,Created Date,City,Currency Native,Average Daily Rate (USD),Annual Revenue LTM (USD),Occupancy Rate LTM,Number of Bookings LTM,Number of Reviews,Bedrooms,Bathrooms,Max Guests,Calendar Last Updated,Response Rate,Airbnb Response Time (Text),Airbnb Superhost,HomeAway Premier Partner,Cancellation Policy,Security Deposit (USD),Cleaning Fee (USD),Extra People Fee (USD),Published Nightly Rate (USD),Published Monthly Rate (USD),Published Weekly Rate (USD),Check-in Time,Checkout Time,Minimum Stay,Count Reservation Days LTM,Count Available Days LTM,Count Blocked Days LTM,Number of Photos,Instantbook Enabled,Exact Location,Overall Rating,Airbnb Communication Rating,Airbnb Accuracy Rating,Airbnb Cleanliness Rating,Airbnb Checkin Rating,Airbnb Location Rating,Airbnb Value Rating,Pets Allowed,Integrated Property Manager,Amenities,Airbnb Property Plus,Airbnb Property ID,Airbnb Host ID,HomeAway Property ID,Location,Average Daily Rate (USD)_normalized,Annual Revenue LTM (USD)_normalized,Occupancy Rate LTM_normalized,Number of Bookings LTM_normalized,Number of Reviews_normalized,Bedrooms_normalized,Bathrooms_normalized,Max Guests_normalized,Response Rate_normalized,Security Deposit (USD)_normalized,Cleaning Fee (USD)_normalized,Extra People Fee (USD)_normalized,Published Nightly Rate (USD)_normalized,Published Monthly Rate (USD)_normalized,Published Weekly Rate (USD)_normalized,Minimum Stay_normalized,Count Reservation Days LTM_normalized,Count Available Days LTM_normalized,Count Blocked Days LTM_normalized,Number of Photos_normalized,Overall Rating_normalized,Airbnb Communication Rating_normalized,Airbnb Accuracy Rating_normalized,Airbnb Cleanliness Rating_normalized,Airbnb Checkin Rating_normalized,Airbnb Location Rating_normalized,Airbnb Value Rating_normalized,Open Months,Seniority,Profesionality
0,ab-48306612,Inviting 3-Bed Villa in Playa Blanca,Houses & Villas,Entire home/apt,2021-02,Yaiza,EUR,320.88,19253.0,0.392,18.0,0.0,3.0,2.0,6.0,,96.0,within an hour,False,False,super_strict_60,,23.0,,311.0,,,After 4:00 PM,10:00 AM,1.0,60.0,93.0,0.0,29.0,True,False,,,,,,,,False,,"[""wireless_internet"", ""kitchen"", ""pool"", ""iron...",False,48306612.0,310835509.0,,"28.87018, -13.84505",0.106917,0.029217,0.371901,0.075,0.0,0.053571,0.066667,0.06,0.96,,0.001311,,0.013427,,,3.3e-05,0.165266,0.274336,0.0,0.056641,,,,,,,,8,22.0,57.0
1,ab-48306645,El loft de Mila,Luxury & Unique Stays,Entire home/apt,2021-07,San Cristóbal de La Laguna,EUR,41.45,1368.0,0.359,7.0,5.0,1.0,1.0,2.0,,100.0,within an hour,False,False,flexible,,,,42.0,,,Flexible,,2.0,33.0,59.0,0.0,13.0,True,False,100.0,10.0,10.0,10.0,10.0,9.0,10.0,False,,"[""wireless_internet"", ""kitchen"", ""iron"", ""hair...",False,48306645.0,389885911.0,,"28.53897, -16.36011",0.01352,0.002076,0.33781,0.029167,0.010352,0.017857,0.033333,0.02,1.0,,,,0.001813,,,6.7e-05,0.089636,0.174041,0.0,0.025391,1.0,1.0,1.0,1.0,1.0,0.875,1.0,3,3.0,1.0
2,ab-48306649,Stunning 2-Bed Villa in Playa Blanca,Houses & Villas,Entire home/apt,2021-03,Yaiza,EUR,197.66,6325.0,0.222,10.0,0.0,2.0,2.0,4.0,,96.0,within an hour,False,False,super_strict_60,,23.0,,191.0,,,After 4:00 PM,10:00 AM,1.0,32.0,112.0,9.0,24.0,True,False,,,,,,,,False,,"[""dishwasher"", ""kitchen"", ""wireless_internet"",...",False,48306649.0,310835509.0,,"28.86526, -13.79952",0.065732,0.009598,0.196281,0.041667,0.0,0.035714,0.066667,0.04,0.96,,0.001311,,0.008246,,,3.3e-05,0.086835,0.330383,0.031579,0.046875,,,,,,,,7,22.0,57.0
3,ab-48306844,Most Beautiful Location South Tenerife Sea Views,Apartments & Condos,Entire home/apt,2021-03,Adeje,EUR,,0.0,,0.0,0.0,1.0,1.0,4.0,,81.0,within a few hours,False,False,strict_14_with_grace_period,,73.0,,85.0,,,2:00 PM - 11:00 PM,12:00 PM,5.0,,,,77.0,False,False,,,,,,,,True,,"[""dishwasher"", ""free_parking"", ""beachfront"", ""...",False,48306844.0,356800202.0,,"28.09549, -16.74055",,0.0,,0.0,0.0,0.017857,0.033333,0.04,0.81,,0.004291,,0.00367,,,0.000167,,,,0.150391,,,,,,,,3,7.0,14.0
4,ab-48306961,Ventura Caprice: nuestra casa para 2 con vistas,Houses & Villas,Entire home/apt,2021-03,La Oliva,EUR,73.25,6519.0,0.509,20.0,4.0,1.0,1.0,2.0,,90.0,within an hour,True,False,moderate,,,,71.0,,,2:00 PM - 8:00 PM,11:00 AM,5.0,89.0,86.0,39.0,28.0,True,False,85.0,10.0,10.0,8.0,10.0,9.0,8.0,False,,"[""wireless_internet"", ""free_parking"", ""kitchen...",False,48306961.0,150678647.0,,"28.69038, -13.89302",0.024149,0.009893,0.492769,0.083333,0.008282,0.017857,0.033333,0.02,0.9,,,,0.003065,,,0.000167,0.246499,0.253687,0.136842,0.054688,0.8125,1.0,1.0,0.75,1.0,0.875,0.75,7,49.0,7.0
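The Profesionality column gives every listing the number of listings its host has in the dataset. `groupby('Airbnb Host ID')` forms one group per host, and `transform('count')` broadcasts each group's count back onto every row. That is why both Playa Blanca villas of host 310835509 show 57. Keep in mind that `'count'` skips missing values, so a row without an Airbnb Property ID does not add to its host's total. Here is a minimal sketch on made-up toy data:

```python
import numpy as np
import pandas as pd

# Made-up toy data: host 10 has three rows, one of them without an Airbnb Property ID
toy = pd.DataFrame({
    'Airbnb Host ID': [10, 10, 10, 20],
    'Airbnb Property ID': [1.0, 2.0, np.nan, 3.0],
})

# Same pattern as the notebook: count the non-missing property IDs per host
# and write that count back onto every row of the host
toy['Profesionality'] = toy.groupby('Airbnb Host ID')['Airbnb Property ID'].transform('count')
print(toy['Profesionality'].tolist())  # [2, 2, 2, 1]
```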


We now add a new column, City Population, that holds the population of the municipality where each property is located. The figures come from data downloaded from Spain's Instituto Nacional de Estadística (INE), with one CSV file per province.

In [872]:
# Population data for the province of Las Palmas

# Read the CSV file with the correct encoding and delimiter
df_lp = pd.read_csv('1mun35.csv', sep=';', encoding='utf-8')

# Clean the 'Municipios' column:
# drop the 6-character prefix (presumably the 5-digit municipality code and a space),
# then move a trailing article to the front ("Oliva, La" -> "La Oliva")
df_lp['Municipios'] = df_lp['Municipios'].str[6:].str.lstrip()
df_lp['Municipios'] = df_lp['Municipios'].str.split(',').apply(lambda x: x[-1] + ' ' + x[0] if len(x) > 1 else x[0])
df_lp['Municipios'] = df_lp['Municipios'].str.strip()

# Rename the municipality 'La Aldea de San Nicolás' to 'San Nicolás de Tolentino'
df_lp['Municipios'] = df_lp['Municipios'].replace('La Aldea de San Nicolás', 'San Nicolás de Tolentino')

df_lp['Municipios'].unique()

array(['provincial', 'Agaete', 'Agüimes', 'San Nicolás de Tolentino',
       'Antigua', 'Arrecife', 'Artenara', 'Arucas', 'Betancuria',
       'Firgas', 'Gáldar', 'Haría', 'Ingenio', 'Mogán', 'Moya',
       'La Oliva', 'Pájara', 'Las Palmas de Gran Canaria',
       'Puerto del Rosario', 'San Bartolomé', 'San Bartolomé de Tirajana',
       'Santa Brígida', 'Santa Lucía de Tirajana',
       'Santa María de Guía de Gran Canaria', 'Teguise', 'Tejeda',
       'Telde', 'Teror', 'Tías', 'Tinajo', 'Tuineje', 'Valleseco',
       'Valsequillo de Gran Canaria', 'Vega de San Mateo', 'Yaiza'],
      dtype=object)
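The cleaning lines do two things. They strip the first six characters of each entry, which are presumably the 5-digit INE municipality code plus a space. They also move a trailing article such as ", Las" to the front, so the names match the spelling used in the City column of `property_df`. The `'provincial'` value in the output is presumably what remains of the province-total row. No property has that City, so the later left merge ignores it.

The same transformation can also be written as a vectorized regular expression. In the sketch below, `clean_municipios` is a hypothetical helper name, and the codes in the sample strings are illustrative:

```python
import pandas as pd

def clean_municipios(s: pd.Series) -> pd.Series:
    """Hypothetical helper mirroring the notebook's cleaning steps."""
    # Drop the 6-character prefix (code + space): "35001 Agaete" -> "Agaete"
    s = s.str[6:].str.strip()
    # Move the part after the last comma to the front: "Oliva, La" -> "La Oliva"
    s = s.str.replace(r'^(.*),\s*(.*)$', r'\2 \1', regex=True)
    return s.str.strip()

raw = pd.Series(['35001 Agaete', '35014 Oliva, La', '35016 Palmas de Gran Canaria, Las'])
print(clean_municipios(raw).tolist())
# ['Agaete', 'La Oliva', 'Las Palmas de Gran Canaria']
```

A regex replacement avoids building a Python list for every row. Unlike `x[-1] + ' ' + x[0]`, it also keeps all the text if a name ever contains more than one comma.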

In [873]:
# Population data for the province of Santa Cruz de Tenerife

# Read the CSV file with the correct encoding and delimiter
df_tf = pd.read_csv('1mun38.csv', sep=';', encoding='utf-8')

# Clean the 'Municipios' column (same steps as for Las Palmas)
df_tf['Municipios'] = df_tf['Municipios'].str[6:].str.lstrip()
df_tf['Municipios'] = df_tf['Municipios'].str.split(',').apply(lambda x: x[-1] + ' ' + x[0] if len(x) > 1 else x[0])
df_tf['Municipios'] = df_tf['Municipios'].str.strip()

df_tf.head()

|   | Municipios | Sexo | Edad (ano a ano) | Total |
|---|---|---|---|---|
| 0 | Adeje | Ambos sexos | Total | 42.886 |
| 1 | Agulo | Ambos sexos | Total | 1.148 |
| 2 | Alajeró | Ambos sexos | Total | 2.005 |
| 3 | Arafo | Ambos sexos | Total | 5.509 |
| 4 | Arico | Ambos sexos | Total | 7.688 |
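The Total column stores population counts in Spanish number format, where "." separates thousands. Adeje's 42.886 therefore means 42,886 inhabitants. With the default settings, `read_csv` parses these values as decimals, so City Population ends up expressed in thousands of inhabitants. A value written without a separator (a municipality under 1,000 inhabitants) would not fit that scale. A value with two separators cannot be parsed as a number at all and would turn the column into strings.

A minimal sketch of the difference follows. It assumes the INE files use this format, and it reuses the values from the `head()` output above with illustrative codes. Passing `thousands='.'` and `decimal=','` to the two `read_csv` calls would be one way to apply the fix.

```python
import io
import pandas as pd

# Two rows in the INE layout; values copied from the head() output, codes illustrative
sample = "Municipios;Total\n38001 Adeje;42.886\n38002 Agulo;1.148\n"

# Default parsing treats '.' as a decimal point
naive = pd.read_csv(io.StringIO(sample), sep=';')
# Spanish number format: '.' groups thousands, ',' marks decimals
fixed = pd.read_csv(io.StringIO(sample), sep=';', thousands='.', decimal=',')

print(naive['Total'].tolist())  # [42.886, 1.148]
print(fixed['Total'].tolist())  # [42886, 1148]
```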


In [874]:
# Stack the two provincial DataFrames into a single one
df = pd.concat([df_lp, df_tf], ignore_index=True)

# Rename 'Municipios' to 'City' so it matches the column in property_df
df = df.rename(columns={'Municipios': 'City'})
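The population table also carries the Sexo and Edad columns. The `head()` output only shows "Ambos sexos" / "Total" rows, but the full files might also contain rows broken down by sex or age. In that case a municipality would appear several times, and the left merge in the next cell would duplicate the matching properties. A minimal safeguard is to keep one row per municipality and check uniqueness before merging. It is sketched below on made-up toy rows; the "Mujeres" row and its value are invented for illustration.

```python
import pandas as pd

# Made-up toy rows in the same layout as df after the rename
pop = pd.DataFrame({
    'City': ['Adeje', 'Adeje', 'Agulo'],
    'Sexo': ['Ambos sexos', 'Mujeres', 'Ambos sexos'],
    'Edad (ano a ano)': ['Total', 'Total', 'Total'],
    'Total': [42886, 21000, 1148],
})

# Keep only the rows for both sexes and all ages: one row per municipality
pop = pop[(pop['Sexo'] == 'Ambos sexos') & (pop['Edad (ano a ano)'] == 'Total')]

# Fail early if any municipality is still duplicated
assert pop['City'].is_unique
print(pop[['City', 'Total']].to_dict('records'))
```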

In [875]:
# Left-merge the population onto the properties DataFrame by municipality name
property_df = property_df.merge(df[['City', 'Total']], how='left', on='City')

# Rename the 'Total' column to 'City Population'
property_df = property_df.rename(columns={'Total': 'City Population'})

property_df.head()


Unnamed: 0,Property ID,Listing Title,Property Type,Listing Type,Created Date,City,Currency Native,Average Daily Rate (USD),Annual Revenue LTM (USD),Occupancy Rate LTM,Number of Bookings LTM,Number of Reviews,Bedrooms,Bathrooms,Max Guests,Calendar Last Updated,Response Rate,Airbnb Response Time (Text),Airbnb Superhost,HomeAway Premier Partner,Cancellation Policy,Security Deposit (USD),Cleaning Fee (USD),Extra People Fee (USD),Published Nightly Rate (USD),Published Monthly Rate (USD),Published Weekly Rate (USD),Check-in Time,Checkout Time,Minimum Stay,Count Reservation Days LTM,Count Available Days LTM,Count Blocked Days LTM,Number of Photos,Instantbook Enabled,Exact Location,Overall Rating,Airbnb Communication Rating,Airbnb Accuracy Rating,Airbnb Cleanliness Rating,Airbnb Checkin Rating,Airbnb Location Rating,Airbnb Value Rating,Pets Allowed,Integrated Property Manager,Amenities,Airbnb Property Plus,Airbnb Property ID,Airbnb Host ID,HomeAway Property ID,Location,Average Daily Rate (USD)_normalized,Annual Revenue LTM (USD)_normalized,Occupancy Rate LTM_normalized,Number of Bookings LTM_normalized,Number of Reviews_normalized,Bedrooms_normalized,Bathrooms_normalized,Max Guests_normalized,Response Rate_normalized,Security Deposit (USD)_normalized,Cleaning Fee (USD)_normalized,Extra People Fee (USD)_normalized,Published Nightly Rate (USD)_normalized,Published Monthly Rate (USD)_normalized,Published Weekly Rate (USD)_normalized,Minimum Stay_normalized,Count Reservation Days LTM_normalized,Count Available Days LTM_normalized,Count Blocked Days LTM_normalized,Number of Photos_normalized,Overall Rating_normalized,Airbnb Communication Rating_normalized,Airbnb Accuracy Rating_normalized,Airbnb Cleanliness Rating_normalized,Airbnb Checkin Rating_normalized,Airbnb Location Rating_normalized,Airbnb Value Rating_normalized,Open Months,Seniority,Profesionality,City Population
0,ab-48306612,Inviting 3-Bed Villa in Playa Blanca,Houses & Villas,Entire home/apt,2021-02,Yaiza,EUR,320.88,19253.0,0.392,18.0,0.0,3.0,2.0,6.0,,96.0,within an hour,False,False,super_strict_60,,23.0,,311.0,,,After 4:00 PM,10:00 AM,1.0,60.0,93.0,0.0,29.0,True,False,,,,,,,,False,,"[""wireless_internet"", ""kitchen"", ""pool"", ""iron...",False,48306612.0,310835509.0,,"28.87018, -13.84505",0.106917,0.029217,0.371901,0.075,0.0,0.053571,0.066667,0.06,0.96,,0.001311,,0.013427,,,3.3e-05,0.165266,0.274336,0.0,0.056641,,,,,,,,8,22.0,57.0,14.468
1,ab-48306645,El loft de Mila,Luxury & Unique Stays,Entire home/apt,2021-07,San Cristóbal de La Laguna,EUR,41.45,1368.0,0.359,7.0,5.0,1.0,1.0,2.0,,100.0,within an hour,False,False,flexible,,,,42.0,,,Flexible,,2.0,33.0,59.0,0.0,13.0,True,False,100.0,10.0,10.0,10.0,10.0,9.0,10.0,False,,"[""wireless_internet"", ""kitchen"", ""iron"", ""hair...",False,48306645.0,389885911.0,,"28.53897, -16.36011",0.01352,0.002076,0.33781,0.029167,0.010352,0.017857,0.033333,0.02,1.0,,,,0.001813,,,6.7e-05,0.089636,0.174041,0.0,0.025391,1.0,1.0,1.0,1.0,1.0,0.875,1.0,3,3.0,1.0,152.025
2,ab-48306649,Stunning 2-Bed Villa in Playa Blanca,Houses & Villas,Entire home/apt,2021-03,Yaiza,EUR,197.66,6325.0,0.222,10.0,0.0,2.0,2.0,4.0,,96.0,within an hour,False,False,super_strict_60,,23.0,,191.0,,,After 4:00 PM,10:00 AM,1.0,32.0,112.0,9.0,24.0,True,False,,,,,,,,False,,"[""dishwasher"", ""kitchen"", ""wireless_internet"",...",False,48306649.0,310835509.0,,"28.86526, -13.79952",0.065732,0.009598,0.196281,0.041667,0.0,0.035714,0.066667,0.04,0.96,,0.001311,,0.008246,,,3.3e-05,0.086835,0.330383,0.031579,0.046875,,,,,,,,7,22.0,57.0,14.468
3,ab-48306844,Most Beautiful Location South Tenerife Sea Views,Apartments & Condos,Entire home/apt,2021-03,Adeje,EUR,,0.0,,0.0,0.0,1.0,1.0,4.0,,81.0,within a few hours,False,False,strict_14_with_grace_period,,73.0,,85.0,,,2:00 PM - 11:00 PM,12:00 PM,5.0,,,,77.0,False,False,,,,,,,,True,,"[""dishwasher"", ""free_parking"", ""beachfront"", ""...",False,48306844.0,356800202.0,,"28.09549, -16.74055",,0.0,,0.0,0.0,0.017857,0.033333,0.04,0.81,,0.004291,,0.00367,,,0.000167,,,,0.150391,,,,,,,,3,7.0,14.0,42.886
4,ab-48306961,Ventura Caprice: nuestra casa para 2 con vistas,Houses & Villas,Entire home/apt,2021-03,La Oliva,EUR,73.25,6519.0,0.509,20.0,4.0,1.0,1.0,2.0,,90.0,within an hour,True,False,moderate,,,,71.0,,,2:00 PM - 8:00 PM,11:00 AM,5.0,89.0,86.0,39.0,28.0,True,False,85.0,10.0,10.0,8.0,10.0,9.0,8.0,False,,"[""wireless_internet"", ""free_parking"", ""kitchen...",False,48306961.0,150678647.0,,"28.69038, -13.89302",0.024149,0.009893,0.492769,0.083333,0.008282,0.017857,0.033333,0.02,0.9,,,,0.003065,,,0.000167,0.246499,0.253687,0.136842,0.054688,0.8125,1.0,1.0,0.75,1.0,0.875,0.75,7,49.0,7.0,22.827
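A left merge keeps every property. Any City spelled differently in `property_df` and in the INE files, or a locality that is not a municipality, therefore ends up with a missing City Population without any warning.

One way to surface these cases is sketched below on toy data. `indicator=True` adds a `_merge` column that flags rows with no match. `validate='many_to_one'` makes pandas raise a `MergeError` if a municipality appears more than once on the right-hand side. The "Playa Blanca" row is a made-up example of an unmatched name.

```python
import pandas as pd

# Toy properties; 'Playa Blanca' is a made-up unmatched City value
props = pd.DataFrame({'Property ID': ['ab-1', 'ab-2', 'ab-3'],
                      'City': ['Yaiza', 'Adeje', 'Playa Blanca']})
pop = pd.DataFrame({'City': ['Yaiza', 'Adeje'], 'Total': [14468, 42886]})

# validate raises if pop has duplicate cities; indicator adds the _merge column
merged = props.merge(pop, how='left', on='City',
                     validate='many_to_one', indicator=True)

# Cities in props that found no population figure
unmatched = merged.loc[merged['_merge'] == 'left_only', 'City'].unique()
print(unmatched)  # ['Playa Blanca']
```

Checking that `len(merged) == len(props)` after the merge is another cheap guard against duplicated rows.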


### Store the data in a suitable format, without deleting the original, non-normalized sources

We save the modified DataFrame to a new file, Property_cleaned.csv, so the original source files stay untouched.

In [876]:
property_df.to_csv('Property_cleaned.csv', index=False)
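Passing `index=False` keeps the RangeIndex out of the file. Without it, reading the file back would add an extra `Unnamed: 0` column. A quick round-trip check confirms that the saved copy reloads with the same columns and values. It is sketched here on a toy DataFrame inside a temporary directory, so no real file is touched.

```python
import os
import tempfile
import pandas as pd

# Toy stand-in for property_df (made-up values)
toy = pd.DataFrame({'Property ID': ['ab-1', 'ab-2'], 'City Population': [14468, 42886]})

with tempfile.TemporaryDirectory() as out_dir:
    # Write the cleaned copy to its own file; no source file is overwritten
    path = os.path.join(out_dir, 'Property_cleaned.csv')
    toy.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# Raises an AssertionError if columns, dtypes or values differ
pd.testing.assert_frame_equal(toy, reloaded)
```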