# Overview and analysis of Airbnb listings in Barcelona
In this project, we explore Airbnb listings in Barcelona, a city that receives millions of visitors every year and whose hospitality sector is heavily shaped by tourism. Airbnb is a major player in that market. The analysis focuses on how listings are distributed across Barcelona's neighbourhoods, how they are priced, and how the market has evolved over time. Through data exploration, we aim to uncover patterns that give a deeper understanding of the Airbnb market in this Mediterranean city.

![Barcelona](https://media.traveler.es/photos/63838947050e0f92cd80c982/16:9/w_2560%2Cc_limit/GettyImages-1392907424.jpg)

## Structure of the notebook
1. Loading and exploring dataset
2. Cleaning dataset
3. Business analysis and visualization
4. Hypothesis testing
5. Data preprocessing
6. Prediction of listing price based on other variables

## 1. Loading and exploring dataset

In [1]:
# Importing libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as st

In [2]:
# Loading dataset
data = pd.read_csv('data/cleaned/listings_cleaned.csv')

# The dataframe has more columns than pandas displays by default, so show all of them
pd.set_option('display.max_columns', None)

# Exploring first 5 rows
data.head()

Unnamed: 0.1,Unnamed: 0,host_id,host_since,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_total_listings_count,host_has_profile_pic,host_identity_verified,neighbourhood_cleansed,neighbourhood_group_cleansed,room_type,accommodates,bathrooms_text,bedrooms,beds,price,minimum_nights,maximum_nights,has_availability,availability_30,first_review,review_scores_rating,instant_bookable,calculated_host_listings_count,reviews_per_month
0,0,71615,2010-01-19,within an hour,0.97,0.9,f,48.0,t,t,la Sagrada Familia,Eixample,Entire home/apt,8,2.0,3.0,6.0,202.0,1,1125,t,4,2013-05-27,4.3,t,30,0.3
1,1,90417,2010-03-09,within an hour,1.0,0.94,t,9.0,t,t,el Besos i el Maresme,Sant Marti,Entire home/apt,5,2.0,3.0,4.0,255.0,3,300,t,16,2011-03-15,4.77,f,2,0.48
2,2,567180,2011-05-08,within a few hours,0.88,0.98,f,19.0,t,f,la Sagrada Familia,Eixample,Entire home/apt,8,2.0,3.0,6.0,331.0,2,30,f,0,2011-08-09,4.55,f,19,0.33
3,3,135703,2010-05-31,within an hour,1.0,1.0,f,15.0,t,t,el Camp d'en Grassot i Gracia Nova,Gracia,Entire home/apt,6,1.5,2.0,3.0,171.0,21,31,t,6,2011-07-17,4.46,t,3,0.64
4,4,567180,2011-05-08,within a few hours,0.88,0.98,f,19.0,t,f,la Sagrada Familia,Eixample,Entire home/apt,8,2.5,3.0,5.0,333.0,2,28,f,0,2011-09-13,4.56,f,19,0.34


In [3]:
# Checking the shape of the dataframe

data.shape

(11731, 27)

In [4]:
# Checking dataframe information

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11731 entries, 0 to 11730
Data columns (total 27 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Unnamed: 0                      11731 non-null  int64  
 1   host_id                         11731 non-null  int64  
 2   host_since                      11731 non-null  object 
 3   host_response_time              11731 non-null  object 
 4   host_response_rate              11731 non-null  float64
 5   host_acceptance_rate            11731 non-null  float64
 6   host_is_superhost               11731 non-null  object 
 7   host_total_listings_count       11731 non-null  float64
 8   host_has_profile_pic            11731 non-null  object 
 9   host_identity_verified          11731 non-null  object 
 10  neighbourhood_cleansed          11731 non-null  object 
 11  neighbourhood_group_cleansed    11731 non-null  object 
 12  room_type                       

## 3. Business analysis and visualization

#### What is the neighbourhood with the most listings?

In [None]:
# Neighbourhood with most listings

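# Count the listings in each neighbourhood group, largest first
most_listings = data.groupby('neighbourhood_group_cleansed').size().sort_values(ascending=False)
most_listings.head()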

#### What year did most hosts create accounts?
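
One possible way to approach this question is sketched below. It assumes `host_since` holds ISO date strings, as in the preview above, and that each host should be counted once even if they own several listings. The `toy` dataframe and its values are made up for illustration and stand in for `data`.

```python
import pandas as pd

# Toy stand-in for the notebook's `data`; values are invented for illustration
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 4],
    "host_since": ["2010-01-19", "2010-01-19", "2011-05-08",
                   "2011-08-09", "2013-05-27"],
})

# Keep one row per host so that multi-listing hosts are counted once
hosts = toy.drop_duplicates(subset="host_id")

# Parse the date strings and count hosts per sign-up year
signup_year = pd.to_datetime(hosts["host_since"]).dt.year
hosts_per_year = signup_year.value_counts().sort_index()

# Year with the most new hosts (the first one is returned if there is a tie)
busiest_year = hosts_per_year.idxmax()
print(hosts_per_year)
print("Year with most new hosts:", busiest_year)
```

Replacing `toy` with `data` should apply the same steps to the real listings, provided the column names match.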

#### How long, on average, does it take hosts to receive their first review?
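
A rough sketch of one way to measure this, under a stated assumption: the dataset has no listing creation date, so the time from `host_since` to the host's earliest `first_review` is used as a proxy. The `toy` dataframe is invented for illustration.

```python
import pandas as pd

# Toy stand-in for the notebook's `data`; values are invented for illustration
toy = pd.DataFrame({
    "host_id": [1, 1, 2],
    "host_since": ["2010-01-01", "2010-01-01", "2011-01-01"],
    "first_review": ["2010-01-31", "2010-03-01", "2011-01-11"],
})
toy["host_since"] = pd.to_datetime(toy["host_since"])
toy["first_review"] = pd.to_datetime(toy["first_review"])

# For each host, keep the sign-up date and the earliest first review
per_host = toy.groupby("host_id").agg(
    host_since=("host_since", "first"),
    first_review=("first_review", "min"),
)

# Days between sign-up and first review, then the average across hosts
days_to_first_review = (per_host["first_review"] - per_host["host_since"]).dt.days
average_days = days_to_first_review.mean()
print(f"Average days to first review: {average_days:.1f}")
```

The mean can be skewed by hosts who waited years for a first review, so the median (`days_to_first_review.median()`) may be worth reporting alongside it.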

#### What is the average price by neighbourhood?

#### What are the neighbourhoods with the highest and lowest prices overall?
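
Both price questions above can be approached with a single grouped mean. The sketch below assumes `neighbourhood_group_cleansed` is the grouping of interest (the finer-grained `neighbourhood_cleansed` would work the same way) and uses a made-up `toy` dataframe in place of `data`.

```python
import pandas as pd

# Toy stand-in for the notebook's `data`; values are invented for illustration
toy = pd.DataFrame({
    "neighbourhood_group_cleansed": ["Eixample", "Eixample", "Gracia", "Sant Marti"],
    "price": [200.0, 300.0, 170.0, 255.0],
})

# Average price per neighbourhood group, most expensive first
avg_price = (
    toy.groupby("neighbourhood_group_cleansed")["price"]
    .mean()
    .sort_values(ascending=False)
)

# Labels of the groups with the highest and lowest average price
most_expensive = avg_price.idxmax()
cheapest = avg_price.idxmin()
print(avg_price)
print("Highest:", most_expensive, "| Lowest:", cheapest)
```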

#### What is the average rating received by listings in each neighbourhood?

#### Which is the worst-rated neighbourhood? What is its average price? What about the size and number of rooms of its listings?

#### Is the overall rating correlated with the prices of the listings?
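
A minimal sketch of how the rating/price relationship could be checked, assuming `review_scores_rating` and `price` are the columns of interest. Pearson's r measures linear association and Spearman's rho measures monotonic association; the latter is less sensitive to a few very expensive listings. The `toy` values are invented for illustration.

```python
import pandas as pd
import scipy.stats as st

# Toy stand-in for the notebook's `data`; values are invented for illustration
toy = pd.DataFrame({
    "review_scores_rating": [4.30, 4.77, 4.55, 4.46, 4.56, 4.90, None],
    "price": [202.0, 255.0, 331.0, 171.0, 333.0, 120.0, 99.0],
})

# Both tests need complete pairs, so drop rows with a missing value
clean = toy.dropna(subset=["review_scores_rating", "price"])

# Linear (Pearson) and rank-based (Spearman) correlation with p-values
r, p_value = st.pearsonr(clean["review_scores_rating"], clean["price"])
rho, p_rho = st.spearmanr(clean["review_scores_rating"], clean["price"])
print(f"Pearson r = {r:.3f} (p = {p_value:.3f})")
print(f"Spearman rho = {rho:.3f} (p = {p_rho:.3f})")
```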

#### Does the location/neighbourhood of the listings affect the price?

#### What variables seem to be correlated?

## 4. Hypothesis testing

#### Is the average price in the Gracia neighbourhood significantly different from that in Sants-Montjuic?
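
One way to test this is a two-sample t-test. The sketch below uses Welch's version (`equal_var=False`), which does not assume the two groups have equal variances; that choice, the 0.05 significance level, and the `toy` prices are assumptions made for illustration.

```python
import pandas as pd
import scipy.stats as st

# Toy stand-in for the notebook's `data`; values are invented for illustration
toy = pd.DataFrame({
    "neighbourhood_group_cleansed": ["Gracia"] * 5 + ["Sants-Montjuic"] * 5,
    "price": [171.0, 150.0, 190.0, 165.0, 180.0,
              120.0, 140.0, 135.0, 110.0, 150.0],
})

# Split prices by neighbourhood group
is_gracia = toy["neighbourhood_group_cleansed"] == "Gracia"
is_sants = toy["neighbourhood_group_cleansed"] == "Sants-Montjuic"
gracia = toy.loc[is_gracia, "price"]
sants = toy.loc[is_sants, "price"]

# H0: the two mean prices are equal; H1: they differ (two-sided)
t_stat, p_value = st.ttest_ind(gracia, sants, equal_var=False)

alpha = 0.05  # assumed significance level
reject_null = p_value < alpha
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, reject H0: {reject_null}")
```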

#### Is the average rating in Sant Andreu higher than 4.50?
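
Because the question asks whether the mean is *higher* than a fixed value, a one-sided, one-sample t-test is a natural candidate. The sketch below assumes a SciPy version that supports the `alternative` argument of `ttest_1samp`; the ratings are invented and stand in for `review_scores_rating` of the Sant Andreu listings.

```python
import numpy as np
import scipy.stats as st

# Made-up ratings standing in for the Sant Andreu listings
ratings = np.array([4.62, 4.48, 4.71, 4.55, 4.39, 4.80, 4.66, 4.58])

# H0: mean rating <= 4.50; H1: mean rating > 4.50 (one-sided)
t_stat, p_value = st.ttest_1samp(ratings, popmean=4.50, alternative="greater")

alpha = 0.05  # assumed significance level
print(f"sample mean = {ratings.mean():.3f}")
print(f"t = {t_stat:.3f}, one-sided p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```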

#### Is the average number of listings per host less than 30?
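
Here the unit of analysis is the host rather than the listing, so the data should probably be reduced to one row per host before testing. The sketch below assumes `calculated_host_listings_count` is the count to use and that it is constant within a host; the `toy` values are invented for illustration.

```python
import pandas as pd
import scipy.stats as st

# Toy stand-in for the notebook's `data`; values are invented for illustration
toy = pd.DataFrame({
    "host_id": [71615, 71615, 90417, 567180, 567180, 135703],
    "calculated_host_listings_count": [30, 30, 2, 19, 19, 3],
})

# One row per host, otherwise hosts with many listings are over-represented
per_host = toy.drop_duplicates(subset="host_id")["calculated_host_listings_count"]

# H0: mean listings per host >= 30; H1: mean < 30 (one-sided)
t_stat, p_value = st.ttest_1samp(per_host, popmean=30, alternative="less")
print(f"mean listings per host = {per_host.mean():.2f}")
print(f"t = {t_stat:.3f}, one-sided p = {p_value:.4f}")
```

Listing counts per host tend to be strongly right-skewed, so with the real data it may be worth checking the distribution before relying on a t-test.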

#### Are xyz categorical columns correlated?
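
The columns for this question are not yet chosen, so the sketch below picks `host_is_superhost` and `instant_bookable` purely as a hypothetical pair. It shows the mechanics of a chi-square test of independence on a contingency table; the `toy` values are invented and far too few for a meaningful result.

```python
import pandas as pd
import scipy.stats as st

# Toy stand-in for the notebook's `data`; values are invented for illustration
toy = pd.DataFrame({
    "host_is_superhost": ["t", "t", "t", "f", "f", "f", "f", "t"],
    "instant_bookable":  ["t", "t", "f", "f", "f", "t", "f", "t"],
})

# Cross-tabulate the two categorical columns
contingency = pd.crosstab(toy["host_is_superhost"], toy["instant_bookable"])

# H0: the two variables are independent
# (SciPy applies Yates' continuity correction to 2x2 tables by default)
chi2, p_value, dof, expected = st.chi2_contingency(contingency)
print(contingency)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.4f}")
```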

#### Is the average number of bedrooms for listings in Ciutat Vella significantly different from 3?

## 5. Data preprocessing

In [508]:
# Remember to remove nulls from relevant columns
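
A minimal sketch of the null-removal step noted above. Which columns count as relevant is not decided yet, so the `relevant` list here is a hypothetical choice, and `toy` is an invented stand-in for `data`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the notebook's `data`; values are invented for illustration
toy = pd.DataFrame({
    "price": [202.0, np.nan, 331.0],
    "bedrooms": [3.0, 2.0, np.nan],
    "beds": [6.0, 4.0, 5.0],
})

# Hypothetical choice of columns that must not contain nulls
relevant = ["price", "bedrooms"]

# Inspect how many nulls each relevant column has before dropping anything
null_counts = toy[relevant].isna().sum()
print(null_counts)

# Drop rows with a null in any relevant column and rebuild the index
cleaned = toy.dropna(subset=relevant).reset_index(drop=True)
print(cleaned.shape)
```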

## 6. Prediction of listing price based on other variables
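
The model for this section is not specified, so the following is only a baseline sketch: an ordinary least squares fit using NumPy, with `accommodates` and `bedrooms` as assumed predictors. The data is synthetic, generated from a made-up linear rule, and the 80/20 split is an arbitrary choice.

```python
import numpy as np

# Synthetic data generated from a made-up rule, for illustration only
rng = np.random.default_rng(0)
n = 200
accommodates = rng.integers(1, 9, size=n)
bedrooms = rng.integers(1, 5, size=n)
price = 40 + 25 * accommodates + 15 * bedrooms + rng.normal(0, 10, size=n)

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), accommodates, bedrooms])

# Simple 80/20 train/test split (rows are already in random order)
split = int(0.8 * n)
X_train, X_test = X[:split], X[split:]
y_train, y_test = price[:split], price[split:]

# Fit ordinary least squares on the training rows
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Evaluate on the held-out rows with root mean squared error
pred = X_test @ coef
rmse = np.sqrt(np.mean((y_test - pred) ** 2))
print("intercept, accommodates, bedrooms:", np.round(coef, 2))
print(f"test RMSE: {rmse:.2f}")
```

A baseline like this gives a reference error that any more elaborate model should beat.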