<a href="https://colab.research.google.com/github/fredericmenezes/EDA-Airbnb-Rio-de-Janeiro-Project/blob/main/eda_airbnb_rio_de_janeiro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This is an insights project: through exploratory data analysis (EDA), it aims to generate insights and recommend solutions to a business problem.

The dataset covers the accommodations listed on Airbnb in the city of Rio de Janeiro.

According to their own website, Airbnb "began in 2008 when two designers with extra space hosted three travelers looking for a place to stay. Now, millions of hosts and travelers choose to create a free Airbnb account so they can list their space and book unique accommodations, anywhere in the world. Additionally, Airbnb Experience hosts share their passions and interests with travelers and locals alike. Airbnb makes sharing easy, enjoyable, and safe. We verify personal profiles and listings, maintain a smart messaging system so hosts and guests can communicate with confidence, and manage a trusted platform for collecting and transferring payments."

In other words, Airbnb is an online service that lets hosts and guests rent and book accommodations directly and more safely. It covers many types of accommodation, often at lower prices than hotels or properties booked through other channels.

Source: [Airbnb](https://www.airbnb.com/help/article/2503)

With this context in mind, the following questions were formulated to frame the business problem. The exploratory data analysis aims to answer them:

1. What are the most offered types of accommodations and which ones have the highest revenue?

2. What are the Top 10 most expensive and cheapest accommodations?

3. Which neighborhoods have the highest number of accommodations and which ones have the highest total revenue?

4. What is the most popular price range, and which one generates the highest revenue?

5. Do Superhosts, Verified Hosts, and Hosts with the highest number of listings generate higher total revenue?

6. How does price behavior vary with the number of bathrooms, bedrooms, beds, and different types of rooms and beds?

7. Do accommodations with a higher number of reviews, guest capacity, and included guests have advantages in total revenue?

8. Do accommodations with the instant booking feature yield higher gains? What is the most commonly offered length of stay?

9. What is the proportion of accommodations that offer extra guest services, and what is the average price?

10. What factors have the most influence on price?

By addressing these questions, the project can generate insights for investors, hosts, and travelers.

# Importing libraries and the dataset

In [6]:
!wget -P input https://raw.githubusercontent.com/fredericmenezes/EDA-Airbnb-Rio-de-Janeiro-Project/main/input/listings.csv

--2023-09-16 18:39:13--  https://raw.githubusercontent.com/fredericmenezes/EDA-Airbnb-Rio-de-Janeiro-Project/main/input/listings.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 73031461 (70M) [text/plain]
Saving to: ‘input/listings.csv.1’


2023-09-16 18:39:16 (275 MB/s) - ‘input/listings.csv.1’ saved [73031461/73031461]
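
The `listings.csv.1` name in the log above indicates that a copy of the file already existed when the cell ran, so wget saved a duplicate instead of overwriting it. wget's `-nc` (`--no-clobber`) flag skips the download in that case. A Python alternative is sketched below; `download_if_missing` is a hypothetical helper name, not something the notebook defines.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_if_missing(url: str, dest: Path) -> bool:
    """Download `url` to `dest` unless `dest` already exists.

    Returns True when a download happened, False when it was skipped.
    """
    dest = Path(dest)
    if dest.exists():
        return False
    # Create the target directory (e.g. ./input) if needed
    dest.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, dest)
    return True


# Example call (performs a network request, so it is left commented out):
# download_if_missing(
#     "https://raw.githubusercontent.com/fredericmenezes/"
#     "EDA-Airbnb-Rio-de-Janeiro-Project/main/input/listings.csv",
#     Path("input/listings.csv"),
# )
```

Re-running the cell then leaves a single `input/listings.csv` in place.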



In [9]:
# Input data files are available in the "./input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('./input/'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

./input/listings.csv


In [8]:
# View Settings
%matplotlib inline
%config InlineBackend.figure_formats = ['svg']
# Import Libraries
import pandas as pd
import numpy as np
import seaborn as sns
import statistics as sts
import matplotlib.pyplot as plt
from matplotlib import cm
import folium
import plotly.express as px
import warnings
warnings.filterwarnings('ignore')
# Dataset visualization settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 90)

The dataset used is `listings.csv`.

In [10]:
# Load the CSV file into a DataFrame

df_l = pd.read_csv('./input/listings.csv')

# Data Exploration

In [11]:
# First lines of the Dataset
df_l.head(3)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,source,name,description,neighborhood_overview,picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,latitude,longitude,property_type,room_type,accommodates,bathrooms,bathrooms_text,bedrooms,beds,amenities,price,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,number_of_reviews_ltm,number_of_reviews_l30d,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,3123306,https://www.airbnb.com/rooms/3123306,20230626155752,2023-06-26,previous scrape,Home in Rio · 2 bedrooms · 2 beds · 1 bath,apartamento espaçoso com varanda e vista panorâmica para o mar de ipanema. wifi . tv ...,vizinhança muito hospitaleira e festiva,https://a0.muscache.com/pictures/39851024/a51ce65d_original.jpg,15864313,https://www.airbnb.com/users/show/15864313,José Mario,2014-05-22,"Rio, Brazil",,,,,f,https://a0.muscache.com/im/users/15864313/profile_pic/1400787988/original.jpg?aki_poli...,https://a0.muscache.com/im/users/15864313/profile_pic/1400787988/original.jpg?aki_poli...,Vidigal,1.0,1.0,"['email', 'phone']",t,f,"Rio, Rio de Janeiro, Brazil",Vidigal,,-22.99298,-43.2391,Entire home,Entire home/apt,6,,1 bath,2.0,2.0,[],$800.00,10,30,10,10,30,30,10.0,30.0,,f,0,0,0,0,2023-06-26,0,0,0,,,,,,,,,,,f,1,1,0,0,
1,912633,https://www.airbnb.com/rooms/912633,20230626155752,2023-06-26,previous scrape,Home in Rio de Janeiro · 1 bedroom · 2 beds · 1 shared bath,<b>The space</b><br />Come have fun with Carnival Carioca <br />And be dazzled by the ...,,https://a0.muscache.com/pictures/13482859/f724b2a1_original.jpg,4897168,https://www.airbnb.com/users/show/4897168,Taryhk,2013-01-29,"Rio de Janeiro, Brazil","Meu nome é Taryhk.\r\nSou bailarino e músico. Trabalho com forró, Dança Contemporanea...",,,,f,https://a0.muscache.com/im/users/4897168/profile_pic/1359493287/original.jpg?aki_polic...,https://a0.muscache.com/im/users/4897168/profile_pic/1359493287/original.jpg?aki_polic...,,1.0,2.0,"['email', 'phone']",t,f,,Rio Comprido,,-22.92466,-43.20748,Shared room in home,Shared room,4,,1 shared bath,,2.0,"[""TV with standard cable"", ""Breakfast"", ""Air conditioning"", ""Kitchen"", ""Wifi"", ""Washer""]",$407.00,1,1125,1,1,1125,1125,1.0,1125.0,,f,0,0,0,0,2023-06-26,1,0,0,2013-02-15,2013-02-15,5.0,5.0,4.0,5.0,5.0,4.0,5.0,,f,1,0,0,1,0.01
2,29051942,https://www.airbnb.com/rooms/29051942,20230626155752,2023-06-27,city scrape,Rental unit in Ipanema · Studio · 1 bed · 1 bath,Central location ipanema <br />Walking distance to Leblon<br />Low season rates,,https://a0.muscache.com/pictures/ce7ceee5-25a5-4ac8-a6a1-6f49fe42c26e.jpg,4307081,https://www.airbnb.com/users/show/4307081,Nereu A,2012-12-02,"Rio de Janeiro, Brazil","30 anos de experiencia na area de turismo, idiomas ingles, espanhol e portugues",within an hour,98%,28%,t,https://a0.muscache.com/im/pictures/user/6c5b1dec-21b3-4fdf-abc8-787d2fa9bde4.jpg?aki_...,https://a0.muscache.com/im/pictures/user/6c5b1dec-21b3-4fdf-abc8-787d2fa9bde4.jpg?aki_...,Ipanema,56.0,85.0,['phone'],t,t,,Ipanema,,-22.98341,-43.2147,Entire rental unit,Entire home/apt,3,,1 bath,,1.0,"[""Iron"", ""Air conditioning"", ""Kitchen"", ""Wifi"", ""Elevator"", ""Smoking allowed""]",$334.00,3,1125,3,3,1125,1125,3.0,1125.0,,t,30,60,90,362,2023-06-27,1,0,0,2018-10-07,2018-10-07,0.0,,,,,,,,f,52,43,9,0,0.02
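
The preview above shows empty fields in columns such as `bathrooms` and `bedrooms`. A quick way to quantify this is the share of missing values per column. The sketch below uses a small stand-in frame (`sample`, an illustrative name) built from the three rows shown; in the notebook the same expression could be applied to `df_l`.

```python
import numpy as np
import pandas as pd

# Stand-in for df_l: three columns copied from the preview rows above
sample = pd.DataFrame({
    "bathrooms": [np.nan, np.nan, np.nan],
    "bedrooms": [2.0, np.nan, np.nan],
    "beds": [2.0, 2.0, 1.0],
})

# Fraction of missing values per column, largest first
missing_share = sample.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Columns with a share close to 1 carry almost no information and are candidates for removal before answering the business questions.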


In [12]:
# Number of rows and columns in the Dataset
df_l.shape

(31401, 75)

In [13]:
# Dataframe Columns
df_l.columns.values

array(['id', 'listing_url', 'scrape_id', 'last_scraped', 'source', 'name',
       'description', 'neighborhood_overview', 'picture_url', 'host_id',
       'host_url', 'host_name', 'host_since', 'host_location',
       'host_about', 'host_response_time', 'host_response_rate',
       'host_acceptance_rate', 'host_is_superhost', 'host_thumbnail_url',
       'host_picture_url', 'host_neighbourhood', 'host_listings_count',
       'host_total_listings_count', 'host_verifications',
       'host_has_profile_pic', 'host_identity_verified', 'neighbourhood',
       'neighbourhood_cleansed', 'neighbourhood_group_cleansed',
       'latitude', 'longitude', 'property_type', 'room_type',
       'accommodates', 'bathrooms', 'bathrooms_text', 'bedrooms', 'beds',
       'amenities', 'price', 'minimum_nights', 'maximum_nights',
       'minimum_minimum_nights', 'maximum_minimum_nights',
       'minimum_maximum_nights', 'maximum_maximum_nights',
       'minimum_nights_avg_ntm', 'maximum_nights_avg_ntm',
   

In [14]:
# Dataset data types
df_l.dtypes

id                                                int64
listing_url                                      object
scrape_id                                         int64
last_scraped                                     object
source                                           object
name                                             object
description                                      object
neighborhood_overview                            object
picture_url                                      object
host_id                                           int64
host_url                                         object
host_name                                        object
host_since                                       object
host_location                                    object
host_about                                       object
host_response_time                               object
host_response_rate                               object
host_acceptance_rate                            

In [15]:
# Frequency of data types
df_l.dtypes.value_counts()

object     34
int64      21
float64    20
dtype: int64
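
Several of the 34 `object` columns hold values that are numeric, boolean, or dates in practice: in the preview, `price` appears as `$800.00`, `host_response_rate` as `98%`, `host_is_superhost` as `t`/`f`, and `last_scraped` as an ISO date. The sketch below shows one possible conversion on a small stand-in frame built from those preview values. The names `sample` and `clean` are illustrative, and the assumption that every row follows these formats would need to be checked against the full dataset.

```python
import numpy as np
import pandas as pd

# Stand-in for df_l, using values from the three preview rows
sample = pd.DataFrame({
    "price": ["$800.00", "$407.00", "$334.00"],
    "host_response_rate": [np.nan, np.nan, "98%"],
    "host_is_superhost": ["f", "f", "t"],
    "last_scraped": ["2023-06-26", "2023-06-26", "2023-06-27"],
})

clean = sample.copy()

# "$1,234.00" -> 1234.0 (strip currency symbol and thousands separator)
clean["price"] = (
    clean["price"].str.replace(r"[$,]", "", regex=True).astype(float)
)

# "98%" -> 0.98; missing values stay NaN
clean["host_response_rate"] = (
    pd.to_numeric(clean["host_response_rate"].str.rstrip("%")) / 100
)

# "t"/"f" flags -> booleans
clean["host_is_superhost"] = clean["host_is_superhost"].map(
    {"t": True, "f": False}
)

# ISO date strings -> datetime64
clean["last_scraped"] = pd.to_datetime(clean["last_scraped"])

print(clean.dtypes)
```

With `price` stored as a float, aggregations such as averages per neighborhood or price ranges become straightforward.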