# Tourist Accommodations Data

**Objective**  
This part covers data wrangling and exploratory data analysis (EDA) of the tourist accommodation dataset.

**Description and characteristics of dataset**  
Tourist rental property listings on major platforms.

The characteristics of this dataset are as follows:
- Estimated volume: 25,000 records every 7 days
- Historical data: Available from 2017-01

**Data description**  
[Data description](https://datamarket.es/#alojamientos-turisticos-dataset)  
[Data overview (number of non missing values, unique, missing values)](https://github.com/ITACADEMYprojectes/ProjecteData/blob/main/Equip_F/Data/data_overview.xlsx)

In [2]:
# import libraries 
import os
import warnings

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

warnings.filterwarnings("ignore")

In [20]:
# font sizes
plt.rcParams['figure.titlesize'] = 18 
plt.rcParams['axes.titlesize'] = 16
plt.rcParams['font.size'] = 14             
plt.rcParams['axes.labelsize'] = 14  

plt.rcParams['xtick.labelsize'] = 12      
plt.rcParams['ytick.labelsize'] = 12       
plt.rcParams['legend.fontsize'] = 12

# display settings
#pd.set_option('display.max_colwidth', 100)
pd.set_option('display.max_columns', None)
#pd.set_option('display.max_rows', 50)
pd.set_option('display.float_format', '{:.2f}'.format)
 
# set palette
sns.set_palette('Paired')

sns.set_style("darkgrid")

<div class="alert alert-info"> 
<b>Comment</b> 

We can switch to any other color palette if you prefer different colors =)
    
</div>

In [4]:
# Check versions of libraries
def lib_versions(libs):
    # print the name and version of every imported library
    for lib in libs:
        print('Version', lib.__name__, '-', lib.__version__)

lib_versions([np, pd, mpl, sns])

Version numpy - 1.25.2
Version pandas - 2.0.3
Version matplotlib - 3.10.0
Version seaborn - 0.13.2


## Data Loading

In [5]:
PATH = ['../data']

In [6]:
# define function to read file
def read_file(sprint=1):
    
    """Input: number of sprint.
    Function reads .csv data of desired sprint. 
    Output: Dataframe or error.
    """
    
    # reading file
    filename = f'data_sprint_{sprint}.csv'
    try:
        df = pd.read_csv(os.path.join(*PATH, filename), 
                         parse_dates=['insert_date', 'first_review_date', 'last_review_date'], 
                         date_format='%d/%m/%Y')
        display(df.head())
        return df
    except Exception as error:
        print("An exception occurred:", error)
        return error     

In [7]:
df = read_file()

Unnamed: 0,apartment_id,name,description,host_id,neighbourhood_name,neighbourhood_district,room_type,accommodates,bathrooms,bedrooms,beds,amenities_list,price,minimum_nights,maximum_nights,has_availability,availability_30,availability_60,availability_90,availability_365,number_of_reviews,first_review_date,last_review_date,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,is_instant_bookable,reviews_per_month,country,city,insert_date
0,11964,A ROOM WITH A VIEW,Private bedroom in our attic apartment. Right ...,45553,Centro,,Private room,2,2.0,1.0,1.0,"TV,Internet,Wifi,Air conditioning,Elevator,Buz...",400.0,3,365,VERDADERO,7,20,40,130,78,2010-01-02,2017-09-05,970.0,100.0,100.0,100.0,100.0,100.0,100.0,FALSO,75.0,spain,malaga,2018-07-31
1,21853,Bright and airy room,We have a quiet and sunny room with a good vie...,83531,C�rmenes,Latina,Private room,1,1.0,1.0,1.0,"TV,Internet,Wifi,Air conditioning,Kitchen,Free...",170.0,4,40,VERDADERO,0,0,0,162,33,2014-10-10,2018-07-15,920.0,90.0,90.0,100.0,100.0,80.0,90.0,FALSO,52.0,spain,madrid,2020-01-10
2,32347,Explore Cultural Sights from a Family-Friendly...,Open French doors and step onto a plant-filled...,139939,San Vicente,Casco Antiguo,Entire home/apt,4,1.0,2.0,2.0,"TV,Internet,Wifi,Air conditioning,Wheelchair a...",990.0,2,120,VERDADERO,26,31,31,270,148,2011-01-05,2019-07-22,980.0,100.0,100.0,100.0,100.0,100.0,100.0,VERDADERO,142.0,spain,sevilla,2019-07-29
3,35379,Double 02 CasanovaRooms Barcelona,Room at a my apartment. Kitchen and 2 bathroom...,152232,l'Antiga Esquerra de l'Eixample,Eixample,Private room,2,2.0,1.0,1.0,"TV,Internet,Wifi,Kitchen,Breakfast,Elevator,Bu...",400.0,2,730,VERDADERO,9,23,49,300,292,2012-03-13,2020-01-04,940.0,100.0,90.0,100.0,100.0,100.0,90.0,VERDADERO,306.0,spain,barcelona,2020-01-10
4,35801,Can Torras Farmhouse Studio Suite,Lay in bed & watch sunlight change the mood of...,153805,Quart,,Private room,5,1.0,2.0,5.0,"Wifi,Pool,Free parking on premises,Breakfast,P...",900.0,1,180,VERDADERO,0,19,49,312,36,2011-07-08,2018-08-08,970.0,100.0,100.0,100.0,100.0,100.0,100.0,FALSO,39.0,spain,girona,2019-02-19


## Data Overview

In [8]:
# define function to display information about the data
def data_info(df):
   
    """Input: dataframe.
    Function displays basic information, 
    checks for duplicates and NaN. 
    """
    
    # get information about the data
    print(df.info())
    
    # number of unique values for each column
    print()
    print('\033[1mNumber of unique values\033[0m')  # reset bold after the header
    display(df.nunique())

    # check for NaN
    if df.isna().sum().sum() > 0:
        print()
        print('\033[1mNumber of missing values\033[0m')  # reset bold after the header
        display(
            pd.DataFrame({'number': df.isna().sum(), 
                          'percentage': df.isna().mean().mul(100)})
            .query('number > 0')
            .sort_values(by='number', ascending=False)
        )
    else:
        print('There are no NaNs in the data\n')
    
    # check for full duplicates, ignoring the first column (apartment_id)
    print()
    if df.iloc[:,1:].duplicated().sum() > 0:
        print('Data contain full duplicates\n')
    else:
        print('There are no full duplicates in the data\n')

In [9]:
data_info(df)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7001 entries, 0 to 7000
Data columns (total 35 columns):
 #   Column                       Non-Null Count  Dtype         
---  ------                       --------------  -----         
 0   apartment_id                 7001 non-null   int64         
 1   name                         6998 non-null   object        
 2   description                  6972 non-null   object        
 3   host_id                      7001 non-null   int64         
 4   neighbourhood_name           7001 non-null   object        
 5   neighbourhood_district       4241 non-null   object        
 6   room_type                    7001 non-null   object        
 7   accommodates                 7001 non-null   int64         
 8   bathrooms                    6969 non-null   float64       
 9   bedrooms                     6972 non-null   float64       
 10  beds                         6998 non-null   float64       
 11  amenities_list               6984 non-null 

apartment_id                   6733
name                           6755
description                    6790
host_id                        5239
neighbourhood_name              478
neighbourhood_district           61
room_type                         4
accommodates                     19
bathrooms                        13
bedrooms                         14
beds                             22
amenities_list                 6792
price                           383
minimum_nights                   38
maximum_nights                  129
has_availability                  1
availability_30                  31
availability_60                  61
availability_90                  91
availability_365                366
number_of_reviews               319
first_review_date              1793
last_review_date               1459
review_scores_rating             45
review_scores_accuracy            9
review_scores_cleanliness         8
review_scores_checkin             8
review_scores_communication 


Number of missing values


Unnamed: 0,number,percentage
neighbourhood_district,2760,39.42
review_scores_value,1342,19.17
review_scores_location,1342,19.17
review_scores_checkin,1341,19.15
review_scores_accuracy,1336,19.08
review_scores_communication,1332,19.03
review_scores_cleanliness,1330,19.0
review_scores_rating,1327,18.95
last_review_date,1255,17.93
first_review_date,1254,17.91



There are no full duplicates in the data



In [10]:
df.country.unique(), df.city.unique()

(array(['spain'], dtype=object),
 array(['malaga', 'madrid', 'sevilla', 'barcelona', 'girona', 'valencia',
        'mallorca', 'menorca'], dtype=object))


---

The dataset contains 7,001 rows and 35 columns.  
The unique values in `country` and `city` confirm that the dataset focuses on Spain, covering six cities ('Malaga', 'Madrid', 'Sevilla', 'Barcelona', 'Girona', and 'Valencia') as well as two islands: 'Mallorca' and 'Menorca'.

*Missing Values*
- 130 rows lack a `price` value.
- The `neighbourhood_district` column has 2,760 missing values (~39% of the data).
- Review-related columns have between 1,254 and 1,342 missing values (roughly 18-19% of the data).
- 550 rows are missing `has_availability` information.

*Duplicates*
- 268 duplicate values in `apartment_id`.
- 243 duplicate values in `name`.
- 182 duplicate values in `description`.
- 192 duplicate values in `amenities_list`.
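
These counts follow from the `df.info()` and `df.nunique()` outputs above: for each column, the number of repeated values is the non-null count minus the number of unique values (for example, 7,001 - 6,733 = 268 for `apartment_id`). Below is a minimal sketch of that calculation on toy data. The helper name `repeated_value_counts` is hypothetical and not part of the notebook.

```python
import pandas as pd


def repeated_value_counts(frame, columns):
    """Per column: number of non-null values minus number of unique values."""
    return pd.Series(
        {col: int(frame[col].notna().sum() - frame[col].nunique()) for col in columns}
    )


# toy data for illustration only, not taken from the real dataset
toy = pd.DataFrame({
    'apartment_id': [1, 1, 2, 3, 3, 3],
    'name': ['a', 'a', 'b', None, 'c', 'c'],
})
repeats = repeated_value_counts(toy, ['apartment_id', 'name'])
print(repeats)
```

On the real data, `repeated_value_counts(df, ['apartment_id', 'name', 'description', 'amenities_list'])` should reproduce the four numbers listed above.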

*Text format issues*  
The presence of � (replacement characters) in string columns is likely due to incorrect encoding, which prevents proper display of special characters (e.g., Spanish accents: á, é, í, ó, ú, ñ).
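
To see how widespread the problem is, one option is to count the rows that contain the replacement character in each text column. The sketch below uses toy data, and the helper name `count_replacement_chars` is hypothetical. Whether the original characters can be recovered depends on whether the raw CSV bytes are still intact, which this check does not tell us.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype

REPLACEMENT_CHAR = '\ufffd'  # the "�" character


def count_replacement_chars(frame):
    """Count, per text column, the rows containing the replacement character."""
    counts = {}
    for col in frame.columns:
        series = frame[col]
        # skip numeric and datetime columns
        if not (is_object_dtype(series) or is_string_dtype(series)):
            continue
        has_char = series.str.contains(REPLACEMENT_CHAR, regex=False, na=False)
        counts[col] = int(has_char.sum())
    return pd.Series(counts, dtype='int64')


# toy data for illustration only
toy = pd.DataFrame({
    'neighbourhood_name': ['C\ufffdrmenes', 'Centro', None],
    'city': ['madrid', 'malaga', 'girona'],
    'price': [170.0, 400.0, 900.0],
})
counts = count_replacement_chars(toy)
print(counts)
```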

**Conclusion**  
Before performing an exhaustive EDA, several data quality issues need to be addressed:  
*Missing Values*
- The missing `price` values should be investigated, as they directly impact the marketing business task.
- NaN values in `has_availability` can be treated as 'False', and depending on the business needs, these rows may be dropped.
- Missing values in other columns are not critical for Sprint_1 and do not affect KPIs.
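
As a sketch of the `has_availability` suggestion, the 'VERDADERO' flag can be compared directly against the string, which maps NaN to `False` in the same step. The column name `has_availability_bool` is an assumption made for this example, and the toy values are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# toy data for illustration only
toy = pd.DataFrame({'has_availability': ['VERDADERO', np.nan, 'VERDADERO', 'FALSO']})

# NaN never equals 'VERDADERO', so missing values become False
toy['has_availability_bool'] = toy['has_availability'].eq('VERDADERO')
print(toy)
```

Keeping the original column next to the boolean one makes it easy to drop the formerly missing rows later if the business case requires it.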

*Duplicates*  
Duplicate `apartment_id`, `name`, and `description` values should be analyzed. It's unusual for different apartments to have the same descriptions, which may indicate data inconsistencies or incorrect entries.

## Duplicates

First, let's handle duplicates, since removing them might also eliminate listings with missing prices.

In [11]:
print('Number of objects with 1 or more duplicates')
(df['apartment_id'].value_counts()>1).sum()

Number of objects with 1 or more duplicates


261

In [78]:
df['apartment_id'].value_counts().head(20)

apartment_id
10005342    3
10713417    3
14582385    3
24038577    3
15402794    3
14326808    3
13966456    3
6986979     2
22546373    2
12167208    2
15628078    2
12145784    2
12144391    2
12140710    2
20149628    2
1624014     2
15579708    2
17974912    2
18082152    2
12036000    2
Name: count, dtype: int64
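
The 261 ids with repeats and the 268 repeated `apartment_id` values reported in the overview are consistent: the output above shows 7 ids with three records each, which leaves 254 ids with two, and 7 × 2 + 254 × 1 = 268 extra rows. A small sketch of the two measures on toy data (variable names are illustrative):

```python
import pandas as pd

# toy ids for illustration only
toy_ids = pd.Series([1, 1, 1, 2, 2, 3], name='apartment_id')

rows_per_id = toy_ids.value_counts()
ids_with_repeats = int((rows_per_id > 1).sum())  # ids that occur more than once
extra_rows = int((rows_per_id - 1).sum())        # rows beyond the first one per id

print('ids with repeats:', ids_with_repeats)
print('extra rows:', extra_rows)
```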

In [35]:
df[df.apartment_id==10713417]

Unnamed: 0,apartment_id,name,description,host_id,neighbourhood_name,neighbourhood_district,room_type,accommodates,bathrooms,bedrooms,beds,amenities_list,price,minimum_nights,maximum_nights,has_availability,availability_30,availability_60,availability_90,availability_365,number_of_reviews,first_review_date,last_review_date,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,is_instant_bookable,reviews_per_month,country,city,insert_date
1989,10713417,Room in the heart of barcelona,Room in the center of barcelona just 2 minutes...,2456260,"Sant Pere, Santa Caterina i la Ribera",Ciutat Vella,Private room,1,1.0,1.0,1.0,"Wifi,Kitchen,Smoking allowed,Pets allowed,Door...",370.0,2,1125,VERDADERO,0,0,7,282,68,2016-03-02,2018-09-30,960.0,100.0,90.0,100.0,100.0,100.0,100.0,VERDADERO,214.0,spain,barcelona,2018-10-10
1990,10713417,Room in the heart of barcelona,Room in the center of barcelona just 2 minutes...,2456260,"Sant Pere, Santa Caterina i la Ribera",Ciutat Vella,Private room,1,1.0,1.0,1.0,"Wifi,Kitchen,Smoking allowed,Pets allowed,Door...",370.0,2,1125,VERDADERO,26,56,86,355,72,2016-03-02,2019-01-20,960.0,100.0,90.0,100.0,100.0,100.0,100.0,FALSO,196.0,spain,barcelona,2019-03-08
1991,10713417,Room in the heart of barcelona,Room in the center of barcelona just 2 minutes...,2456260,"Sant Pere, Santa Caterina i la Ribera",Ciutat Vella,Private room,1,1.0,1.0,1.0,"Wireless Internet,Kitchen,Smoking allowed,Pets...",330.0,1,1125,VERDADERO,0,0,30,305,27,2016-03-02,2017-09-08,950.0,100.0,90.0,100.0,100.0,100.0,100.0,FALSO,138.0,spain,barcelona,2017-10-07


Some objects with the same `apartment_id` have 1 or even 2 duplicates recorded on different `insert_date` values. Let's check whether they also share the same name and location, as it's possible for different properties to have the same name but be in different places.

In [70]:
res = (df[df['apartment_id'].duplicated(keep=False)]
       .groupby('apartment_id')[['name', 'description', 'room_type', 'host_id', 'city', 'insert_date']]
       .nunique())

res

Unnamed: 0_level_0,name,description,room_type,host_id,city,insert_date
apartment_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
144471,1,1,1,1,1,2
157327,1,1,1,1,1,2
343864,1,1,1,1,1,2
503253,1,1,1,1,1,2
886569,1,2,1,1,1,2
...,...,...,...,...,...,...
26546841,1,0,1,1,1,2
26889462,1,1,1,1,1,2
26987062,1,0,1,1,1,2
27141824,1,1,1,1,1,2


In [38]:
res[res.insert_date<2]

Unnamed: 0_level_0,name,description,host_id,city,insert_date
apartment_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1


`apartment_id` duplicates always have different `insert_date` → We can keep the most recent entry.
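
Note that an empty result for `res.insert_date < 2` shows that every repeated id has at least two distinct dates. For ids with three records, it does not by itself rule out two of them sharing a date. A stricter check, sketched here on toy data, looks for rows that share both `apartment_id` and `insert_date`:

```python
import pandas as pd

# toy data for illustration only
toy = pd.DataFrame({
    'apartment_id': [1, 1, 1, 2],
    'insert_date': pd.to_datetime(
        ['2018-10-10', '2019-03-08', '2017-10-07', '2019-07-29']
    ),
})

# True for every row whose (apartment_id, insert_date) pair occurs more than once
same_snapshot = toy.duplicated(subset=['apartment_id', 'insert_date'], keep=False)
print('Rows sharing both apartment_id and insert_date:', int(same_snapshot.sum()))
```

If the same check on `df` also returns zero, the most recent record per id is unambiguous.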

In [53]:
res[res.city>1]

Unnamed: 0_level_0,name,description,host_id,city,insert_date
apartment_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1


`apartment_id` duplicates always belong to the same city → This supports the idea that they are actual duplicates rather than different properties.

In [71]:
res[res.room_type>1]

Unnamed: 0_level_0,name,description,room_type,host_id,city,insert_date
apartment_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
14288527,1,2,2,1,1,2
18682379,2,2,2,1,1,2
24342447,1,1,2,1,1,2


In [75]:
df[df.apartment_id.isin(res[res.room_type>1].index)]

Unnamed: 0,apartment_id,name,description,host_id,neighbourhood_name,neighbourhood_district,room_type,accommodates,bathrooms,bedrooms,beds,amenities_list,price,minimum_nights,maximum_nights,has_availability,availability_30,availability_60,availability_90,availability_365,number_of_reviews,first_review_date,last_review_date,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,is_instant_bookable,reviews_per_month,country,city,insert_date
2861,14288527,"Great penthouse with terrace, overlooking the sea",-The accommodation meets the Sanitary Cleaning...,34433429,Centro,,Entire home/apt,2,1.0,1.0,1.0,"Free street parking, Garden or backyard, Iron,...",670.0,2,31,VERDADERO,19,49,79,160,237,2016-08-10,2020-10-12,970.0,100.0,100.0,100.0,100.0,100.0,100.0,VERDADERO,443.0,spain,malaga,2020-12-30
2862,14288527,"Great penthouse with terrace, overlooking the sea","Great penthouse with terrace, overlooking the ...",34433429,Este,,Private room,2,1.0,1.0,1.0,"TV,Internet,Wireless Internet,Pool,Free parkin...",440.0,2,15,VERDADERO,14,18,40,41,102,2016-08-10,2017-11-21,970.0,100.0,100.0,100.0,100.0,90.0,100.0,VERDADERO,647.0,spain,malaga,2017-11-25
4238,18682379,"Lush, sunny & quiet 2 bedroom apartment",Search no more! Welcome to your spacious 2 bed...,27775148,la Dreta de l'Eixample,Eixample,Entire home/apt,4,1.0,2.0,2.0,"Wifi,Kitchen,Breakfast,Elevator,Family/kid fri...",2750.0,4,60,VERDADERO,0,8,8,14,8,2017-07-18,2019-01-01,1000.0,100.0,100.0,100.0,100.0,100.0,90.0,FALSO,44.0,spain,barcelona,2019-01-14
4239,18682379,"Lush, sunny & quiet 2ble room in Barcelona center",Search no more! Welcome to your spacious doubl...,27775148,la Dreta de l'Eixample,Eixample,Private room,2,1.0,1.0,1.0,"Wireless Internet,Kitchen,Breakfast,Elevator,F...",650.0,2,20,VERDADERO,0,0,13,288,4,2017-07-18,2017-08-18,1000.0,100.0,100.0,100.0,100.0,100.0,100.0,FALSO,146.0,spain,barcelona,2017-10-07
6262,24342447,Suite Apt 4 pax en Sants,Apartamento con capacidad para 4 personas con ...,3346610,Sants,Sants-Montju�c,Hotel room,4,,1.0,4.0,"TV,Internet,Wifi,Air conditioning,Kitchen,Pets...",2000.0,1,1125,VERDADERO,0,16,46,160,8,2018-08-04,2019-10-23,900.0,90.0,90.0,100.0,100.0,100.0,90.0,VERDADERO,34.0,spain,barcelona,2020-07-17
6263,24342447,Suite Apt 4 pax en Sants,Apartamento con capacidad para 4 personas con ...,3346610,Sants,Sants-Montju�c,Entire home/apt,4,1.0,1.0,2.0,"Internet,Wifi,Air conditioning,Free parking on...",1650.0,1,1125,VERDADERO,28,58,88,215,0,NaT,NaT,,,,,,,,VERDADERO,,spain,barcelona,2018-07-10


In [54]:
res[res.host_id>1]

Unnamed: 0_level_0,name,description,host_id,city,insert_date
apartment_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
19196593,2,2,2,1,2


In [55]:
df.query('apartment_id==19196593')

Unnamed: 0,apartment_id,name,description,host_id,neighbourhood_name,neighbourhood_district,room_type,accommodates,bathrooms,bedrooms,beds,amenities_list,price,minimum_nights,maximum_nights,has_availability,availability_30,availability_60,availability_90,availability_365,number_of_reviews,first_review_date,last_review_date,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,is_instant_bookable,reviews_per_month,country,city,insert_date
4492,19196593,"Centro 4 con garaje gratis, 1 dormitorio",Apartamento de 1 dormitorio con garaje incluid...,18077486,Cruz De Humilladero,,Entire home/apt,4,1.0,1.0,3.0,"Free street parking, Iron, Washer, Hot water, ...",500.0,2,1125,VERDADERO,0,0,0,213,42,2017-06-27,2020-09-05,930.0,100.0,90.0,100.0,100.0,90.0,90.0,VERDADERO,98.0,spain,malaga,2020-12-30
4493,19196593,"CENTRO 4 CON GARAJE GRATIS, APARTAM. VFT/MA/13723","Supermercado Mercadona a 1 minuto, tiendas y b...",134286611,Bailen-Miraflores,,Entire home/apt,4,1.0,1.0,3.0,"TV,Wifi,Air conditioning,Kitchen,Free parking ...",500.0,2,1125,VERDADERO,0,0,0,181,29,2017-06-27,2018-09-30,940.0,100.0,90.0,100.0,100.0,90.0,90.0,VERDADERO,16.0,spain,malaga,2018-12-22


In [58]:
df.query('apartment_id==19196593')['name']

4492             Centro 4 con garaje gratis, 1 dormitorio
4493    CENTRO 4 CON GARAJE GRATIS, APARTAM. VFT/MA/13723
Name: name, dtype: object

<div class="alert alert-info"> 
<b>Comment</b> 

I am not sure about this case: the two records list different (though adjacent) neighbourhoods and different `host_id` values, but the rest of the information is very similar.
    
</div>

In [24]:
res[res.name>1]

Unnamed: 0_level_0,name,description,host_id,city
apartment_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1624014,2,2,1,1
3163230,2,2,1,1
3634997,2,2,1,1
3770072,2,2,1,1
6584564,2,2,1,1
7029540,2,1,1,1
9619438,2,2,1,1
10806296,2,1,1,1
12514998,2,2,1,1
13166575,2,1,1,1


In [67]:
ap_id_dupl_name = res[(res.name>1) & (res.description>1) & (res.host_id<2)].index

In [69]:
df[df.apartment_id.isin(ap_id_dupl_name)]

Unnamed: 0,apartment_id,name,description,host_id,neighbourhood_name,neighbourhood_district,room_type,accommodates,bathrooms,bedrooms,beds,amenities_list,price,minimum_nights,maximum_nights,has_availability,availability_30,availability_60,availability_90,availability_365,number_of_reviews,first_review_date,last_review_date,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,is_instant_bookable,reviews_per_month,country,city,insert_date
470,1624014,Beautiful apartment to Enjoy the real Barcelona,Recently refurbished modern and spacious 3 bed...,1260560,la Nova Esquerra de l'Eixample,Eixample,Entire home/apt,6,2.00,3.00,5.00,"TV,Cable TV,Internet,Wifi,Air conditioning,Kit...",1600.00,3,90,VERDADERO,16,46,76,334,141,2014-04-14,2020-01-06,910.00,90.00,90.00,100.00,100.00,90.00,90.00,VERDADERO,198.00,spain,barcelona,2020-02-16
471,1624014,Brand-New Low Priced Large Apt 6p,Recently refurbished modern and spacious 3 bed...,1260560,la Nova Esquerra de l'Eixample,Eixample,Entire home/apt,6,2.00,3.00,5.00,"TV,Cable TV,Internet,Wireless Internet,Air con...",1870.00,3,1125,,0,17,47,322,51,2014-04-14,2016-12-04,890.00,90.00,90.00,90.00,90.00,90.00,90.00,VERDADERO,153.00,spain,barcelona,2017-01-04
758,3163230,Apartment in Sagrada Familia for 4 - WIFI,Bel appartement � 100 m de La Sagrada Familia....,8811801,la Sagrada Fam�lia,Eixample,Entire home/apt,4,1.00,2.00,3.00,"Dedicated workspace, Oven, Cooking basics, Dis...",600.00,15,330,VERDADERO,1,31,61,336,39,2014-06-16,2019-10-10,860.00,90.00,90.00,90.00,100.00,90.00,90.00,FALSO,49.00,spain,barcelona,2021-01-12
759,3163230,PISO PARA 5 EN SAGRADA FAMILIA,Piso muy lindo y comodo que puede acoger hasta...,8811801,la Sagrada Fam�lia,Eixample,Entire home/apt,5,1.00,2.00,3.00,"TV,Wireless Internet,Kitchen,Elevator in build...",900.00,7,365,,0,0,0,196,38,2014-06-16,2017-05-18,860.00,90.00,90.00,90.00,100.00,90.00,90.00,VERDADERO,105.00,spain,barcelona,2017-06-05
855,3634997,"Elegant Apt in Eixample, near Paseo de Gracia",Elegant and Original 160 sqm apartment - 12 pe...,2439400,la Dreta de l'Eixample,Eixample,Entire home/apt,12,2.00,6.00,9.00,"TV,Internet,Wifi,Air conditioning,Kitchen,Paid...",3000.00,2,300,VERDADERO,11,19,19,147,34,2014-08-05,2018-03-02,830.00,80.00,90.00,100.00,90.00,90.00,80.00,VERDADERO,74.00,spain,barcelona,2018-05-14
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5762,22985103,Gran habitaci�n para 4 personas en calle Gran Via,Gran habitaci�n exterior sobre calle gran v�a ...,8124160,Cortes,Centro,Private room,4,2.00,1.00,2.00,"TV,Cable TV,Internet,Wifi,Air conditioning,Kit...",400.00,1,1125,VERDADERO,7,7,7,7,26,2018-08-05,2019-05-17,940.00,100.00,100.00,100.00,100.00,100.00,100.00,FALSO,229.00,spain,madrid,2019-07-10
6047,23723324,Double Bedroom front of Ramblas,Best location apartment in the Heart of Barcel...,61874060,el Raval,Ciutat Vella,Private room,2,1.00,1.00,1.00,"Refrigerator, Hot water, Paid parking off prem...",450.00,3,100,VERDADERO,30,60,90,365,42,2018-04-10,2019-10-07,850.00,90.00,70.00,100.00,100.00,100.00,80.00,VERDADERO,128.00,spain,barcelona,2020-12-16
6048,23723324,Double Bedroom next to Plaza Real,Best location apartment in the Heart of Barcel...,61874060,el Raval,Ciutat Vella,Private room,2,1.00,1.00,1.00,"Wifi,Kitchen,Paid parking off premises,Smoking...",400.00,3,100,VERDADERO,6,31,61,331,1,2018-04-10,2018-04-10,1000.00,100.00,80.00,100.00,100.00,100.00,100.00,VERDADERO,10.00,spain,barcelona,2018-04-12
6999,27245117,MATILLA - Fant�stico apartamento con garaje,Apartamento espacioso a 7 minutos del centro d...,137859766,Cadaqu�s,,Entire home/apt,6,2.00,3.00,5.00,"Kitchen,Free parking on premises,Heating,Washe...",1100.00,2,31,VERDADERO,2,31,61,151,0,NaT,NaT,,,,,,,,VERDADERO,,spain,girona,2018-07-31


The names vary, but other characteristics remain similar. Sometimes, hosts adjust details, such as reducing the number of guests or changing the listing from a private room to an entire apartment.

Given these factors, we can decide to **retain only the most recent entry for each `apartment_id`** to ensure we have the latest and most relevant data.

In [83]:
# data with unique apartment_id
df_clean = (df.sort_values(by=['insert_date', 'apartment_id'], ascending=[False, True])
              .drop_duplicates(subset=['apartment_id'], keep='first')
           )
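
As a sanity check on this step, the kept rows can be compared with the latest `insert_date` per `apartment_id`. The sketch below applies the same sort-and-drop logic to toy data; the names `toy`, `toy_clean`, `latest`, and `kept` are illustrative only.

```python
import pandas as pd

# toy data for illustration only
toy = pd.DataFrame({
    'apartment_id': [10, 10, 10, 20],
    'price': [370.0, 370.0, 330.0, 990.0],
    'insert_date': pd.to_datetime(
        ['2018-10-10', '2019-03-08', '2017-10-07', '2019-07-29']
    ),
})

# same logic as above: newest rows first, then keep the first row per id
toy_clean = (toy.sort_values(by=['insert_date', 'apartment_id'], ascending=[False, True])
                .drop_duplicates(subset=['apartment_id'], keep='first'))

# the kept insert_date should equal the maximum insert_date of each id
latest = toy.groupby('apartment_id')['insert_date'].max()
kept = toy_clean.set_index('apartment_id')['insert_date']
all_latest = bool((kept.reindex(latest.index) == latest).all())
print('Every kept row is the most recent one:', all_latest)
```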

In [84]:
data_info(df_clean)

<class 'pandas.core.frame.DataFrame'>
Index: 6733 entries, 345 to 3358
Data columns (total 35 columns):
 #   Column                       Non-Null Count  Dtype         
---  ------                       --------------  -----         
 0   apartment_id                 6733 non-null   int64         
 1   name                         6730 non-null   object        
 2   description                  6706 non-null   object        
 3   host_id                      6733 non-null   int64         
 4   neighbourhood_name           6733 non-null   object        
 5   neighbourhood_district       4075 non-null   object        
 6   room_type                    6733 non-null   object        
 7   accommodates                 6733 non-null   int64         
 8   bathrooms                    6702 non-null   float64       
 9   bedrooms                     6704 non-null   float64       
 10  beds                         6730 non-null   float64       
 11  amenities_list               6717 non-null   o

apartment_id                   6733
name                           6712
description                    6687
host_id                        5238
neighbourhood_name              478
neighbourhood_district           61
room_type                         4
accommodates                     19
bathrooms                        13
bedrooms                         14
beds                             22
amenities_list                 6635
price                           380
minimum_nights                   38
maximum_nights                  129
has_availability                  1
availability_30                  31
availability_60                  61
availability_90                  91
availability_365                366
number_of_reviews               318
first_review_date              1793
last_review_date               1450
review_scores_rating             45
review_scores_accuracy            9
review_scores_cleanliness         8
review_scores_checkin             8
review_scores_communication 


Number of missing values


Unnamed: 0,number,percentage
neighbourhood_district,2658,39.48
review_scores_value,1289,19.14
review_scores_location,1289,19.14
review_scores_checkin,1288,19.13
review_scores_accuracy,1283,19.06
review_scores_communication,1279,19.0
review_scores_cleanliness,1277,18.97
review_scores_rating,1274,18.92
last_review_date,1204,17.88
first_review_date,1203,17.87



There are no full duplicates in the data

