# **Project Name**    -



##### **Project Type**    - EDA of AirBnB Booking Analysis
##### **Contribution**    - Individual


# **Project Summary -**

The rise of short-term rental platforms, particularly Airbnb, has significantly transformed the hospitality landscape in urban areas. New York City, a bustling metropolis and a prime tourist destination, has seen explosive growth in Airbnb listings. This project conducts an Exploratory Data Analysis (EDA) of Airbnb listings in New York City for the year 2019, with the aim of uncovering patterns, trends, and insights regarding the rental market, host characteristics, and pricing strategies.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The short-term rental market in New York City has experienced significant growth, driven largely by platforms like Airbnb. However, both hosts and potential guests face challenges in navigating this dynamic landscape. Hosts often struggle to set competitive pricing and optimize their listings, while guests may find it difficult to identify the best options for their needs.

#### **Define Your Business Objective?**

**1. Pricing Variability:** There is a lack of clarity on how different factors—such as location, amenities, and host experience—impact rental prices. Hosts need insights to effectively price their listings to attract bookings while maximizing revenue.

**2. Host Performance Optimization:** New and existing hosts may not understand the characteristics that lead to successful listings. Identifying factors that correlate with high ratings and booking rates can help hosts improve their offerings and increase guest satisfaction.

**3. Market Trends and Demand Fluctuations:** The short-term rental market is influenced by seasonal trends and external events. A comprehensive understanding of these trends is essential for both hosts and guests to make informed decisions regarding availability and booking timing.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')



In [None]:
import pandas as pd

df = pd.read_csv('/content/drive/MyDrive/Airbnb.csv')
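
The guidelines above ask for exception handling and a notebook that runs top to bottom without errors. A minimal sketch of a guarded loader is shown below; `load_listings` is a hypothetical helper name, not part of the original notebook, and the error messages are illustrative.

```python
from pathlib import Path

import pandas as pd


def load_listings(csv_path):
    """Read the listings CSV, raising a clear error if the file is missing or has no rows.

    `load_listings` is a hypothetical helper, not part of the original notebook.
    """
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Dataset not found at {path}; is the Drive mounted?")
    frame = pd.read_csv(path)
    if frame.empty:
        raise ValueError(f"{path} was read but contains no rows")
    return frame
```

With this in place, the load step could be written as `df = load_listings('/content/drive/MyDrive/Airbnb.csv')`.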

### Dataset First View

In [None]:
# Dataset First Look
df.head(6)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
import missingno as msno
msno.bar(df)

### What did you know about your dataset?

The dataset has no duplicate rows. Missing values appear in four columns: 'name' (16), 'host_name' (21), 'last_review' (10052), and 'reviews_per_month' (10052).
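
Raw counts are easier to judge next to the share of rows they represent. The sketch below uses a hypothetical helper, `missing_summary`, and a tiny made-up frame to show the idea; on the real data it would be called as `missing_summary(df)`.

```python
import numpy as np
import pandas as pd


def missing_summary(frame):
    """Return per-column missing counts and percentages, largest first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only the columns that actually have gaps
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Tiny synthetic frame for illustration only (not the real Airbnb data)
demo = pd.DataFrame({
    "name": ["a", None, "c", "d"],
    "price": [100, 150, 200, 250],
    "reviews_per_month": [np.nan, np.nan, 1.2, 0.4],
})
print(missing_summary(demo))
```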

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

 This DataFrame has 48895 rows and 16 columns, representing various attributes
of Airbnb listings in NYC for the year 2019. Columns include 'price', 'neighbourhood',
'room_type', and 'availability_365', among others. Data is sourced from the CSV file
'Airbnb.csv'.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
df.dropna(inplace=True)
df.isnull().sum()

In [None]:
# Filter out outliers
mean_price = df['price'].mean()
std_dev_price = df['price'].std()
threshold = mean_price + 3 * std_dev_price

df = df[df['price'] < threshold]
df.shape


### What all manipulations have you done and insights you found?

First, I removed all rows containing null values. The dataset has no duplicates, so none needed to be removed. Then I filtered out 'price' outliers, keeping only listings priced below the mean plus three standard deviations.
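
Because 'last_review' and 'reviews_per_month' are missing for 10052 of the 48895 rows (about 21%), dropping every row with a null discards a sizeable share of the listings. One possible alternative, sketched below, fills the review gap instead and then applies the same mean plus three standard deviations rule. This is not the approach used in this notebook; `clean_listings` is a hypothetical helper, and it rests on the assumption that a missing `reviews_per_month` means the listing has no reviews.

```python
import pandas as pd


def clean_listings(frame, n_std=3):
    """Fill review gaps instead of dropping rows, then trim high-price outliers.

    Assumption: a missing `reviews_per_month` means the listing has no reviews.
    Check this against `number_of_reviews == 0` before relying on it.
    """
    cleaned = frame.copy()
    cleaned["reviews_per_month"] = cleaned["reviews_per_month"].fillna(0)
    # Same rule as the notebook: keep prices below mean + n_std * std
    threshold = cleaned["price"].mean() + n_std * cleaned["price"].std()
    return cleaned[cleaned["price"] < threshold]
```

It could be called as `df_clean = clean_listings(df)` on the freshly loaded data; the original frame is left unchanged because the function works on a copy.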

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
sns.histplot(df['price'], bins=20)
plt.title('Distribution of Airbnb Prices')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a histogram because it is well suited to visualizing the distribution of a single numerical column.

##### 2. What is/are the insight(s) found from the chart?

The distribution of prices shows that most listings are clustered in the 100-300 price range, with a few outliers at higher prices. This can indicate market segments (e.g., budget vs. luxury).
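
The size of that cluster can be measured rather than read off the histogram. A minimal sketch, using a hypothetical helper `share_in_range` and made-up prices:

```python
import pandas as pd


def share_in_range(prices, low, high):
    """Return the fraction of values with low <= price <= high."""
    return prices.between(low, high).mean()


# Synthetic prices for illustration only (not the real Airbnb data)
demo_prices = pd.Series([50, 80, 120, 150, 220, 300, 450, 900])
print(share_in_range(demo_prices, 100, 300))  # 4 of the 8 values -> 0.5
```

On the cleaned data this would be `share_in_range(df['price'], 100, 300)`.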

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights help create a positive business impact. Understanding price distributions and trends allows hosts to adjust their pricing based on demand, seasonality, and competitive analysis, maximizing occupancy and revenue.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
room_type_counts = df['room_type'].value_counts()
room_type_counts.plot(kind='pie', autopct='%1.1f%%')
plt.title('Room Type Distribution')
plt.ylabel('')  # Hides the y-label
plt.show()


##### 1. Why did you pick the specific chart?

To show proportions of categorical data.

##### 2. What is/are the insight(s) found from the chart?

A pie chart of room type distribution shows that a majority of listings are Entire home/apt, which could inform market strategy or guest preferences.
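
The percentages on the pie chart can also be printed as a table, which is easier to quote. A small sketch with a hypothetical helper `room_type_shares` and made-up labels:

```python
import pandas as pd


def room_type_shares(room_types):
    """Percentage of listings per room type, largest first."""
    return room_types.value_counts(normalize=True).mul(100).round(1)


# Synthetic labels for illustration only (not the real Airbnb data)
demo_rooms = pd.Series(
    ["Entire home/apt", "Private room", "Entire home/apt",
     "Shared room", "Private room", "Entire home/apt"]
)
print(room_type_shares(demo_rooms))
```

On the notebook's data the call would be `room_type_shares(df['room_type'])`.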

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Insights into guest preferences enable hosts to create personalized offers, improving customer engagement and satisfaction.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
plt.scatter(df['number_of_reviews'], df['price'])
plt.title('Price vs. Number of Reviews')
plt.xlabel('Number of Reviews')
plt.ylabel('Price')
plt.show()


##### 1. Why did you pick the specific chart?

To explore the relationship between price and reviews.

##### 2. What is/are the insight(s) found from the chart?

A scatter plot of price vs. number of reviews indicates that higher-priced listings tend to have more reviews, suggesting they are more popular or trusted.
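
A dense scatter plot can be hard to read by eye, so it may be worth checking the direction of the relationship numerically. The sketch below computes a Spearman rank correlation as the Pearson correlation of ranks; `rank_correlation` is a hypothetical helper, and the demo values are made up, so the printed number says nothing about the real data.

```python
import pandas as pd


def rank_correlation(frame, x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return frame[x].rank().corr(frame[y].rank())


# Synthetic values for illustration only (not the real Airbnb data)
demo = pd.DataFrame({
    "price": [300, 200, 150, 100, 50],
    "number_of_reviews": [1, 5, 10, 40, 90],
})
# Close to -1 here, because the made-up prices fall as reviews rise
print(rank_correlation(demo, "number_of_reviews", "price"))
```

A positive value from `rank_correlation(df, 'number_of_reviews', 'price')` would support the reading above, while a negative value would contradict it.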

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

 Insights from reviews and popularity of features can inform hosts about what amenities to offer or enhance, leading to improved guest satisfaction and positive reviews.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
g = sns.pairplot(df[['price', 'number_of_reviews', 'availability_365']])
# pairplot returns a grid of axes, so set the title on the whole figure
g.fig.suptitle('Pair Plot of Selected Features', y=1.02)
plt.show()


##### 1. Why did you pick the specific chart?

To visualize relationships between multiple numerical variables in a single view.

##### 2. What is/are the insight(s) found from the chart?

Pair plots can highlight relationships between multiple features. Here, the plot shows how price, number of reviews, and availability relate to one another pairwise, along with the distribution of each variable on the diagonal.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These insights can help inform strategic decisions for hosts, such as optimizing pricing strategies, enhancing property features, or identifying target markets. By understanding these relationships, hosts can better cater to guest preferences and improve overall occupancy and revenue.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
avg_price_by_room_type = df.groupby('room_type')['price'].mean().sort_values()
plt.figure(figsize=(8, 5))
avg_price_by_room_type.plot(kind='bar', color='skyblue')
plt.title('Average Price by Room Type')
plt.xlabel('Room Type')
plt.ylabel('Average Price ($)')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

To know how different types of accommodations are priced.

##### 2. What is/are the insight(s) found from the chart?

We can clearly see that Entire home/apt listings have the highest average price. The chart shows price only, so it does not by itself tell us how often each room type is booked.
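
A mean can be pulled upward by a few expensive listings, so it can help to view the median and the listing count alongside it. A minimal sketch, with a hypothetical helper `price_by_room_type` and made-up prices:

```python
import pandas as pd


def price_by_room_type(frame):
    """Mean and median price plus listing count for each room type."""
    return (
        frame.groupby("room_type")["price"]
        .agg(mean_price="mean", median_price="median", listings="count")
        .sort_values("mean_price")
    )


# Synthetic values for illustration only (not the real Airbnb data)
demo = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 3 + ["Private room"] * 2,
    "price": [200, 300, 1000, 80, 120],
})
print(price_by_room_type(demo))
```

In the made-up data, the single 1000 listing lifts the Entire home/apt mean well above its median, which is the kind of gap this table is meant to expose.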

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, we can identify which type of accommodation is the most and least expensive, guiding pricing strategies and marketing efforts.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

I suggest the following:
1. Optimize Pricing Strategies
2. Enhance Guest Experience
3. Targeted Marketing and Promotion
4. Analyze and Improve Listing Quality
5. Expand into New Markets







# **Conclusion**

Overall, the EDA has equipped stakeholders with actionable insights that can drive informed decision-making, enhance guest experiences, and improve business performance. By leveraging these insights, Airbnb hosts can optimize their offerings and position themselves competitively in the market, ultimately leading to increased occupancy and revenue. Continuous monitoring and adaptation based on ongoing data analysis will be essential for sustained success in the dynamic short-term rental market.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***