 
# <img src="./data/inflation.png" alt="Statistics Icon" width="30"/> **Introduction: Exploratory Data Analysis of Airbnb Listings for Price Determinants** <img src="./data/books.png" alt="Book Icon" width="30"/>


As demand for Airbnb accommodations grows, *<span style="color:#4285f4">understanding what drives pricing</span>* in specific locations matters to hosts, property managers, and other stakeholders. This *<span style="color:#4285f4">Exploratory Data Analysis (EDA)</span>* examines the patterns in the *<span style="color:#4285f4">Airbnb listings dataset</span>* to address that need.

# <img src="./data/target.png" alt="Objective Icon" width="30"/> **Objective**

The primary goal of this analysis is to *<span style="color:#4285f4">discern the factors contributing to higher prices in certain places.</span>* 

The aim is not only to *<span style="color:#4285f4">facilitate strategic decision-making</span>* for hosts and stakeholders but also to clarify the *<span style="color:#4285f4">many factors that shape the pricing</span>* of Airbnb listings.

Within the short-term rental market, these insights can serve as a resource for:
- Optimizing pricing strategies
- Enhancing the overall guest experience
- Seizing opportunities

# <img src="./data/file.png" alt="Overview Icon" width="30"/> **Dataset Overview**

The dataset contains 30,179 listings described by 81 columns, ranging from property characteristics and host details to guest reviews and geographic coordinates. This breadth allows for *<span style="color:#4285f4">a comprehensive exploration of both the numerical and the categorical features that influence pricing.</span>*
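
One way to separate numerical from categorical features is `select_dtypes`. The sketch below is only an illustration: `sample` is a hypothetical stand-in for the listings table with made-up values, and it borrows four column names from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the listings table; the values are made up.
sample = pd.DataFrame({
    "price": [120.0, 85.0, 300.0],
    "bedrooms": [1, 1, 3],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt"],
    "property_type": ["Apartment", "House", "Loft"],
})

# Numeric columns (ints and floats) versus everything else.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("Numerical:", numeric_cols)
print("Categorical:", categorical_cols)
```

The same two calls could be applied to the full `listings` DataFrame once it is loaded, although columns such as `id` are numeric by dtype only and would likely need to be excluded by hand.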

# <img src="./data/iteration.png" alt="Methodology Icon" width="30"/> **Methodology**

![Methodology](./data/methodology-t.png)


# <img src="./data/question-mark.png" alt="Questionmark Icon" width="30"/> **Decoding Airbnb Prices: Probing Questions for Deeper Insights**

1. **What is the <span style="color:#4285f4">*distribution of Airbnb prices*</span>, and what insights can we gain from it?**

2. **How do *<span style="color:#4285f4">property types</span>* correlate with prices?**

3. **Are *<span style="color:#4285f4">specific neighborhoods</span>* associated with higher or lower prices?**

4. **What *<span style="color:#4285f4">amenities</span>* significantly impact listing prices?**

5. **Do *<span style="color:#4285f4">host-related features</span>* (superhost status, response time) correlate with prices?**

6. **Are *<span style="color:#4285f4">outlier listings</span>* due to unique features or circumstances, and how do they influence prices?**

# <img src="./data/data-cleaning.png" alt="Datacleaning Icon" width="30"/> **Data Loading and Cleaning**

The libraries required for this EDA are imported in the cell below.

In [5]:
import pandas as pd
import numpy as np
import geopandas
import folium
import seaborn as sns
import matplotlib.pyplot as plt


In [None]:
# read & load the dataset into pandas dataframe
listings = pd.read_csv('./data/airbnb_nyc.csv', delimiter=',')

In [9]:
# drop the rows in which ALL columns are NaN
# dropna returns a new DataFrame, so assign the result back to keep the change
listings = listings.dropna(how="all")
listings

Unnamed: 0,id,name,summary,description,experiences_offered,neighborhood_overview,transit,house_rules,host_id,host_since,...,hot_tub_sauna_or_pool,internet,long_term_stays,pets_allowed,private_entrance,secure,self_check_in,smoking_allowed,accessible,event_suitable
0,2539,Clean & quiet apt home by the park,Renovated apt home in elevator building.,Renovated apt home in elevator building. Spaci...,none,Close to Prospect Park and Historic Ditmas Park,Very close to F and G trains and Express bus i...,-The security and comfort of all our guests is...,2787,39698.0,...,-1,1,1,-1,-1,1,1,-1,1,1
1,3647,THE VILLAGE OF HARLEM....NEW YORK !,,WELCOME TO OUR INTERNATIONAL URBAN COMMUNITY T...,none,,,Upon arrival please have a legibile copy of yo...,4632,39777.0,...,-1,1,-1,-1,-1,-1,-1,-1,-1,-1
2,7750,Huge 2 BR Upper East Cental Park,,Large Furnished 2BR one block to Central Park...,none,,,,17985,39953.0,...,-1,1,-1,1,-1,-1,-1,-1,-1,-1
3,8505,Sunny Bedroom Across Prospect Park,Just renovated sun drenched bedroom in a quiet...,Just renovated sun drenched bedroom in a quiet...,none,Quiet and beautiful Windsor Terrace. The apart...,Ten minutes walk to the 15th sheet F&G train s...,- No shoes in the house - Quiet hours after 11...,25326,40006.0,...,-1,1,-1,-1,-1,-1,-1,-1,-1,-1
4,8700,Magnifique Suite au N de Manhattan - vue Cloitres,Suite de 20 m2 a 5 min des 2 lignes de metro a...,Suite de 20 m2 a 5 min des 2 lignes de metro a...,none,,Metro 1 et A,,26394,40014.0,...,-1,1,-1,-1,-1,-1,-1,-1,-1,-1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
30174,36484363,QUIT PRIVATE HOUSE,THE PUBLIC TRANSPORTATION: THE TRAIN STATION I...,THE PUBLIC TRANSPORTATION: THE TRAIN STATION I...,none,QUIT QUIT QUIT !!!!!!,TRAIN STATION 5 MINUTE UBER OR 15 MINUTE WALK ...,"Guest should not wear shoes, no smoking mariju...",107716952,42722.0,...,-1,1,-1,-1,-1,-1,1,-1,-1,1
30175,36484665,Charming one bedroom - newly renovated rowhouse,"This one bedroom in a large, newly renovated r...","This one bedroom in a large, newly renovated r...",none,"There's an endless number of new restaurants, ...",We are three blocks from the G subway and abou...,,8232441,41504.0,...,-1,1,-1,-1,1,-1,1,-1,-1,-1
30176,36485057,Affordable room in Bushwick/East Williamsburg,,,none,,,,6570630,41419.0,...,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
30177,36485609,43rd St. Time Square-cozy single bed,,,none,,,,30985759,42104.0,...,-1,1,-1,-1,-1,-1,1,-1,-1,-1
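
As a small illustration of how `dropna` behaves, the sketch below compares `how="all"` with `how="any"` on a hypothetical three-row frame (`sample`, with made-up values). It also shows that the original frame is left unchanged unless the result is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is entirely NaN, row 2 is only partly NaN.
sample = pd.DataFrame({
    "name": ["Loft", np.nan, np.nan],
    "price": [120.0, np.nan, 80.0],
})

dropped_all = sample.dropna(how="all")  # removes only the fully empty row
dropped_any = sample.dropna(how="any")  # removes every row with at least one NaN

print(len(sample), len(dropped_all), len(dropped_any))
```

Since `how="any"` can discard many partly filled rows, `how="all"` is the more conservative choice at this stage of cleaning.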


In [10]:
# check the no. of columns & rows
print('Airbnb Dataset Contains, Rows: {:,d} & Columns: {}'.format(listings.shape[0], listings.shape[1]))

Airbnb Dataset Contains, Rows: 30,179 & Columns: 81


In [19]:
# check key details to comprehend the dataset thoroughly and initiate a structured exploration
# with this we can also uncover the lineup of columns in the listing dataset
listings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30179 entries, 0 to 30178
Data columns (total 81 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   id                                            30179 non-null  int64  
 1   name                                          30166 non-null  object 
 2   summary                                       28961 non-null  object 
 3   description                                   29575 non-null  object 
 4   experiences_offered                           30179 non-null  object 
 5   neighborhood_overview                         18113 non-null  object 
 6   transit                                       18190 non-null  object 
 7   house_rules                                   16622 non-null  object 
 8   host_id                                       30179 non-null  int64  
 9   host_since                                    30170 non-null 

In [29]:
# count the NaN values in each column to gauge how complete the data is
for column in listings.columns:
    print(column, listings[column].isnull().sum())

id 0
name 13
summary 1218
description 604
experiences_offered 0
neighborhood_overview 12066
transit 11989
house_rules 13557
host_id 0
host_since 9
host_response_time 13097
host_response_rate 13097
host_is_superhost 9
host_listings_count 9
host_identity_verified 9
street 0
neighbourhood 9
latitude 0
longitude 0
property_type 0
room_type 0
accommodates 0
bathrooms 0
bedrooms 0
beds 0
bed_type 0
amenities 0
price 0
guests_included 0
extra_people 0
minimum_nights 0
calendar_updated 0
has_availability 0
availability_30 0
availability_60 0
availability_90 0
availability_365 0
number_of_reviews 0
number_of_reviews_ltm 0
review_scores_rating 9085
review_scores_accuracy 9111
review_scores_cleanliness 9101
review_scores_checkin 9129
review_scores_communication 9110
review_scores_location 9132
review_scores_value 9130
instant_bookable 0
cancellation_policy 0
calculated_host_listings_count 0
calculated_host_listings_count_entire_homes 0
calculated_host_listings_count_private_rooms 0
calculated_hos
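
The per-column loop above can also be written as a vectorized summary that adds percentages and sorts the columns by how much is missing. This is a minimal sketch: `missing_summary` is a hypothetical helper, not part of pandas, and `sample` holds made-up values.

```python
import numpy as np
import pandas as pd

def missing_summary(df):
    """Return the count and percentage of missing values per column, largest first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    # Keep only the columns that actually have missing values.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Hypothetical stand-in for the listings table; the values are made up.
sample = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "house_rules": ["No smoking", np.nan, np.nan, np.nan],
    "transit": ["Near F train", np.nan, "Bus", "Metro"],
})

summary = missing_summary(sample)
print(summary)
```

Calling `missing_summary(listings)` would give the same counts as the loop, with the most incomplete columns at the top.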

In [34]:
# list the columns that have more than 12,000 NaN values
columns_with_high_nan = [column for column in listings.columns if listings[column].isnull().sum() > 12000]
listings[columns_with_high_nan].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30179 entries, 0 to 30178
Data columns (total 4 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   neighborhood_overview  18113 non-null  object 
 1   house_rules            16622 non-null  object 
 2   host_response_time     17082 non-null  object 
 3   host_response_rate     17082 non-null  float64
dtypes: float64(1), object(3)
memory usage: 943.2+ KB


<div style="background-color: rgba(144, 238, 144, 0.5); color: black; padding: 10px; border-radius: 5px;">
    <img src="./data/research.png" alt="Research Icon" width="25"/> <strong>Noteworthy Data Observation:</strong> 

The columns <strong><i>"neighborhood_overview", "house_rules", "host_response_time", "host_response_rate"</i></strong> each have a considerable number of missing values, roughly 40% to 45% of the 30,179 rows. Depending on how important these columns are to the analysis, consider filling in the missing data or carefully examining how its absence might affect the results.
</div>
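
One possible way to act on this observation is sketched below, on the assumption that the host response columns are worth keeping. It is not necessarily the approach taken later in this analysis. The frame `sample` and its values are made up, and the flag column `host_response_missing` is a hypothetical name introduced for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the listings table; the values are made up.
sample = pd.DataFrame({
    "host_response_time": ["within an hour", np.nan, "within a day", np.nan],
    "host_response_rate": [1.0, np.nan, 0.8, np.nan],
    "price": [150, 90, 200, 75],
})

cleaned = sample.copy()

# Keep a flag so the fact that a value was missing is not lost after filling.
cleaned["host_response_missing"] = cleaned["host_response_rate"].isnull()

# Categorical column: use an explicit "unknown" category.
cleaned["host_response_time"] = cleaned["host_response_time"].fillna("unknown")

# Numerical column: fill with the median of the observed values.
median_rate = cleaned["host_response_rate"].median()
cleaned["host_response_rate"] = cleaned["host_response_rate"].fillna(median_rate)

# Check whether listings with missing host data are priced differently.
print(cleaned.groupby("host_response_missing")["price"].mean())
```

Comparing prices between the two groups is a quick check of whether the missingness itself carries information. If it does, dropping or silently filling those rows could bias the analysis.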


# <img src="./data/market2.png" alt="House Price Icon" width="30"/> **Price Distribution Analysis**