
# **Project Name**    - Airbnb Booking Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual



# **Project Summary -**

The Airbnb NYC 2019 dataset provides valuable insights into the dynamics of short-term rentals across New York City's five boroughs: Manhattan, Brooklyn, Queens, the Bronx, and Staten Island. With nearly 49,000 listings and key details like room type, price, availability, and reviews, this analysis helps understand the behaviors of both hosts and guests on the platform.

Data preparation involved cleaning missing values and removing extreme outliers (listings priced above $500 per night or requiring more than 365 minimum nights). Manhattan consistently showed higher price points compared to Brooklyn and Queens, aligning with its premium location. Availability trends revealed that many listings operate on a seasonal or part-time basis, with the average listing available for about 113 days per year. Boroughs did not show major differences in availability, though some neighborhoods within Manhattan displayed more consistent availability patterns. Listings with higher numbers of reviews or consistent monthly reviews typically correlated with greater demand and customer trust, highlighting the importance of maintaining positive guest feedback.

From a business perspective, hosts can leverage these insights to set competitive prices aligned with their specific borough and room type. Hosts in Brooklyn and Queens, in particular, could attract guests looking for affordable alternatives to Manhattan by optimizing their pricing and improving guest experiences to increase reviews and visibility. For Airbnb as a platform, the data suggests opportunities for borough-specific marketing and strategies to encourage hosts in less saturated areas, like Queens or Staten Island, where potential growth exists. Moreover, Airbnb could guide hosts through educational resources on pricing optimization, availability management, and service quality to enhance overall platform performance.

In conclusion, the analysis of Airbnb NYC 2019 data reveals clear patterns regarding location preferences, room types, pricing, availability, and the importance of customer feedback. These insights offer valuable recommendations for both hosts and the platform itself, supporting improved business decisions, enhanced customer satisfaction, and sustainable growth in the highly competitive short-term rental market of New York City.



# **GitHub Link -**

https://github.com/nitinpandit2530/python-project/blob/main/Sample_EDA_Submission_Template.ipynb

# **Problem Statement**


Airbnb, as a global platform for short-term rentals, has a significant presence in New York City, offering thousands of property listings across different boroughs. However, with such a large volume of data on listings, hosts, prices, availability, and customer interactions, it becomes challenging for hosts, potential investors, and even Airbnb itself to clearly understand market trends and make informed decisions. This project aims to analyze the Airbnb NYC 2019 dataset to uncover valuable insights about listing behaviors, pricing strategies, neighborhood demand, room type preferences, and factors influencing performance such as availability and customer reviews. By leveraging data-driven analysis, the goal is to help stakeholders optimize pricing, improve listing performance, identify profitable areas, and enhance the overall customer experience on the platform.



#### **Define Your Business Objective?**

1. Identify the most popular neighborhoods and boroughs for Airbnb listings in NYC.

2. Analyze the distribution of room types and their relationship to price and availability.

3. Understand the pricing trends and detect any patterns based on location.

4. Explore how factors like number of reviews and availability impact listing performance.

5. Provide actionable insights to improve business strategies for hosts and Airbnb.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

### Dataset First View

In [None]:
# Dataset First Look
df = pd.read_csv('/content/drive/MyDrive/Airbnb NYC 2019.csv')
df.head()
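
The guidelines above mention exception handling, so here is a minimal sketch of a loader that fails early with a readable message when the CSV is not where it is expected. The helper name `load_listings` is hypothetical and not part of the original notebook; the path in the usage comment is the one used in the cell above.

```python
from pathlib import Path

import pandas as pd


# Hypothetical helper (an assumption, not part of the original notebook).
def load_listings(csv_path):
    """Read the listings CSV, failing early with a clear message if it is missing."""
    path = Path(csv_path)
    if not path.is_file():
        # Typical cause in Colab: Google Drive was not mounted or the file was moved.
        raise FileNotFoundError(f"Dataset not found at {path}. Is Google Drive mounted?")
    return pd.read_csv(path)


# Usage in this notebook would look like:
# df = load_listings('/content/drive/MyDrive/Airbnb NYC 2019.csv')
```

Checking the path before calling `pd.read_csv` keeps the error message specific to the most likely problem instead of surfacing a long traceback.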

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape


### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
sns.heatmap(df.isnull(), cbar=False)
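
The heatmap shows where the gaps are, but not how large they are. As a rough sketch, the counts can be combined with percentages in one small table. The `toy` frame below is made-up data standing in for `df`; only the column names come from the dataset.

```python
import pandas as pd

# Toy frame standing in for the real listings (values are made up).
toy = pd.DataFrame({
    'name': ['Loft', None, 'Studio', 'Room'],
    'last_review': ['2019-05-01', None, None, '2019-06-15'],
    'reviews_per_month': [1.2, None, None, 0.4],
    'price': [150, 80, 95, 60],
})

# Count and percentage of missing values per column, largest first.
missing = pd.DataFrame({
    'missing_count': toy.isnull().sum(),
    'missing_pct': (toy.isnull().mean() * 100).round(2),
}).sort_values('missing_count', ascending=False)

# Keep only the columns that actually have gaps.
missing = missing[missing['missing_count'] > 0]
print(missing)
```

Running the same lines on `df` instead of `toy` would give the summary for the real dataset.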

### What did you know about your dataset?

The Airbnb NYC 2019 dataset is a publicly available dataset, commonly used for data science and machine learning practice, especially for exploratory data analysis (EDA), visualization, and price prediction projects. It provides detailed information on nearly 49,000 Airbnb property listings from across New York City as of 2019.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

| **Variable Name**                     | **Description**                                                        | **Type**    |
| ------------------------------------- | ---------------------------------------------------------------------- | ----------- |
| **id**                                | Unique identifier for each listing.                                    | Numerical   |
| **name**                              | Name/title of the listing given by the host.                           | Categorical |
| **host\_id**                          | Unique identifier for each host.                                       | Numerical   |
| **host\_name**                        | Name of the host who owns the listing.                                 | Categorical |
| **neighbourhood\_group**              | Borough in which the listing is located (e.g., Manhattan, Brooklyn).   | Categorical |
| **neighbourhood**                     | Specific neighborhood within the borough.                              | Categorical |
| **latitude**                          | Geographical latitude of the listing location.                         | Numerical   |
| **longitude**                         | Geographical longitude of the listing location.                        | Numerical   |
| **room\_type**                        | Type of room provided (Entire home/apt, Private room, Shared room).    | Categorical |
| **price**                             | Price per night in USD.                                                | Numerical   |
| **minimum\_nights**                   | Minimum number of nights required to book the listing.                 | Numerical   |
| **number\_of\_reviews**               | Total number of reviews received by the listing.                       | Numerical   |
| **last\_review**                      | Date of the most recent review.                                        | Date        |
| **reviews\_per\_month**               | Average number of reviews per month.                                   | Numerical   |
| **calculated\_host\_listings\_count** | Total number of listings the host has on Airbnb.                       | Numerical   |
| **availability\_365**                 | Number of days in a year the listing is available for booking (0-365). | Numerical   |


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
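# Fill missing 'reviews_per_month' with 0 (no reviews means 0 reviews per month)
df['reviews_per_month'] = df['reviews_per_month'].fillna(0)
# Fill missing 'last_review' with 0 as a "never reviewed" placeholder.
# Note: the column then holds a mix of date strings and the integer 0.
df['last_review'] = df['last_review'].fillna(0)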

df['reviews_per_month']


In [None]:
df['last_review']

In [None]:
# Drop rows with missing 'name' or 'host_name'
df = df.dropna(subset=['name', 'host_name'])

df

In [None]:
# Confirm missing values are handled
print(df.isnull().sum())

In [None]:
# Remove listings with price > 500 USD
df = df[df['price'] <= 500]
df['price']

In [None]:
# Remove outliers in 'minimum_nights' column (keep only listings with minimum_nights <= 365)
df = df[df['minimum_nights'] <= 365]
df['minimum_nights']

In [None]:
# Reset the index after removing rows to keep DataFrame clean
df = df.reset_index(drop=True)
df

In [None]:
# Check the updated dataset shape and basic statistics
print(df.describe())
print(f"Cleaned dataset shape: {df.shape}")

### What all manipulations have you done and insights you found?

In this project, several important data manipulations were carried out to make the Airbnb NYC 2019 dataset analysis-ready. Firstly, missing values were addressed by replacing the missing entries in the reviews_per_month column with zero, as listings with no reviews logically receive no reviews per month. The columns name and host_name, which had very few missing values, were cleaned by dropping those rows entirely, as these columns do not impact the core analysis significantly. After handling missing data, outliers were removed to ensure the dataset represented realistic market conditions. Specifically, listings with prices exceeding $500 per night were excluded, as such prices are not reflective of the majority of Airbnb users in NYC. Similarly, listings requiring more than 365 minimum nights were removed, given that such properties do not align with the concept of short-term rentals. After these manipulations, the index of the dataset was reset to maintain a clean structure.
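
The cleaning steps described above can be gathered into one function so they are easy to rerun and to check. This is only a sketch: the function name `clean_listings` and its parameters are hypothetical, the thresholds mirror the ones used in the wrangling cells, and the `toy` rows are made up.

```python
import pandas as pd


# Hypothetical wrapper around the cleaning steps (not part of the original notebook).
def clean_listings(raw, max_price=500, max_min_nights=365):
    cleaned = raw.copy()  # leave the input frame untouched
    # No reviews means 0 reviews per month.
    cleaned['reviews_per_month'] = cleaned['reviews_per_month'].fillna(0)
    # Drop the few rows without a listing name or host name.
    cleaned = cleaned.dropna(subset=['name', 'host_name'])
    # Remove price and minimum-night outliers.
    cleaned = cleaned[cleaned['price'] <= max_price]
    cleaned = cleaned[cleaned['minimum_nights'] <= max_min_nights]
    return cleaned.reset_index(drop=True)


# Made-up rows: one missing name, one 400-night minimum, one $900 listing.
toy = pd.DataFrame({
    'name': ['Loft', None, 'Studio', 'Room', 'Penthouse'],
    'host_name': ['Ana', 'Ben', 'Cy', 'Di', 'Ed'],
    'price': [150, 80, 95, 60, 900],
    'minimum_nights': [2, 1, 400, 3, 1],
    'reviews_per_month': [1.2, 0.5, None, None, 2.0],
})
cleaned = clean_listings(toy)
print(f"Kept {len(cleaned)} of {len(toy)} rows")
```

Printing how many rows survive each run makes it easier to see how much data the thresholds remove.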

Following data cleaning, exploratory data analysis (EDA) was conducted through 15 visualizations. These visualizations uncovered several key insights about the NYC Airbnb market. It was found that Manhattan and Brooklyn dominate the Airbnb landscape in terms of the number of listings, while Queens, the Bronx, and Staten Island hold significantly fewer. Neighborhoods like Williamsburg, Harlem, and the Upper West Side emerged as hotspots with the highest listing counts. In terms of room type, entire homes and apartments were clearly the most common and highest-priced choice for hosts, whereas private rooms served a smaller market, and shared rooms were extremely rare.

From a pricing perspective, most listings fell between $50 and $200 per night, with Manhattan commanding the highest average prices. Queens and Bronx listings offered more affordable options, especially below $100, making them attractive to budget-conscious travelers. Regarding availability, the data showed that many hosts either listed their properties year-round or only for very short, specific periods. Availability was broadly similar across boroughs, with Manhattan showing somewhat more consistent year-round availability.

The analysis of reviews indicated that most listings received fewer than 50 total reviews, though some high-performing listings in Brooklyn and Manhattan accumulated hundreds. Interestingly, cheaper listings tended to gather more reviews, suggesting that affordability drives higher booking frequency. Host behavior also revealed that while most hosts managed one or two properties, a smaller group of professional hosts controlled multiple listings. Correlation analysis further showed weak relationships between price and other variables, though there was a noticeable link between number of reviews and reviews per month.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
sns.countplot(data=df, x='neighbourhood_group', palette='Set2')
plt.title('Number of Listings by Neighbourhood Group')
plt.show()

##### 1. Why did you pick the specific chart?

I chose the countplot because it is the simplest and clearest way to visualize the number of listings by neighbourhood group. It effectively shows the frequency of listings in each category, making it easy to compare and quickly identify which boroughs dominate the Airbnb market in NYC.

##### 2. What is/are the insight(s) found from the chart?

Manhattan and Brooklyn clearly dominate the Airbnb market in NYC. These two boroughs together account for the majority of listings. Queens has moderate activity, while the Bronx and Staten Island have relatively fewer listings.
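
To put a number on "the majority of listings", the counts behind the chart can be turned into percentages. A small sketch on made-up borough labels (the real check would use `df['neighbourhood_group']`):

```python
import pandas as pd

# Made-up borough labels standing in for df['neighbourhood_group'].
toy = pd.DataFrame({
    'neighbourhood_group': ['Manhattan'] * 4 + ['Brooklyn'] * 4 + ['Queens', 'Bronx'],
})

# Percentage of listings per borough, largest first.
share = toy['neighbourhood_group'].value_counts(normalize=True).mul(100).round(1)

# Combined share of the two largest boroughs.
top_two_share = share.head(2).sum()
print(share)
print(f"Top two boroughs: {top_two_share:.1f}% of listings")
```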

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights will help create a positive business impact. By identifying that Manhattan and Brooklyn dominate the Airbnb market, businesses and hosts can focus their marketing, pricing strategies, and investments in these high-demand areas to maximize revenue. Additionally, recognizing that Queens holds moderate potential can help Airbnb expand its customer base by targeting more budget-conscious travelers there.

Yes, there are insights that could point toward potential risks or stagnation. The low number of listings in the Bronx and Staten Island suggests limited demand, low tourist attraction, or regulatory challenges in these areas. Investing resources in these boroughs without demand validation may lead to wasted efforts, lower occupancy, and negative growth for hosts. This insight warns businesses to carefully analyze demand before expanding into less popular locations.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
top_neighbourhoods = df['neighbourhood'].value_counts().head(10)
top_neighbourhoods.plot(kind='barh', color='skyblue')
plt.title('Top 10 Neighbourhoods with Most Listings')
plt.xlabel('Number of Listings')
plt.ylabel('Neighbourhood')
plt.show()


##### 1. Why did you pick the specific chart?

I chose the horizontal bar plot because it effectively displays the top 10 neighbourhoods with the most listings in a clear, easy-to-read format. It helps visually compare neighbourhoods side by side, making it simple to identify which areas are most active and important for Airbnb’s business strategy.

##### 2. What is/are the insight(s) found from the chart?

Neighbourhoods like Williamsburg, Harlem, and the Upper West Side have the highest listing counts, which likely reflects their popularity among tourists and short-term visitors.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights will help create a positive business impact. Identifying the top 10 neighbourhoods with the most listings allows Airbnb and hosts to focus their marketing, competitive analysis, and pricing strategies in these high-demand areas. These locations likely have higher customer traffic, better infrastructure, and consistent demand, which can help maximize occupancy and revenue.

Yes, there are potential risks of negative growth. Highly saturated neighbourhoods (like Williamsburg or Harlem) might lead to increased competition, causing downward pressure on prices and lower occupancy rates for new hosts. Over-saturation can make it harder for new listings to succeed, leading to slower revenue growth if not managed carefully.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
sns.countplot(data=df, x='room_type', palette='Set3')
plt.title('Distribution of Room Types')
plt.show()

##### 1. Why did you pick the specific chart?

I chose the countplot because it clearly shows the number of listings in each room type category (Entire home/apt, Private room, Shared room). This chart makes it easy to understand the market share of each room type, helping to analyze customer preferences and potential revenue opportunities.

##### 2. What is/are the insight(s) found from the chart?

Entire homes/apartments make up the largest share of listings, which suggests that hosts expect most travelers to value privacy. Private rooms follow, while shared rooms are rare.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Reveals market preference for entire homes, guiding hosts to choose profitable room types.

Negative Insight: Oversupply of entire homes might force lower prices over time due to competition.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
sns.countplot(data=df, x='neighbourhood_group', hue='room_type', palette='pastel')
plt.title('Room Types in Each Neighbourhood Group')
plt.show()


##### 1. Why did you pick the specific chart?

A grouped countplot was selected to visually compare two categorical variables: neighbourhood group and room type. This helps understand how different boroughs cater to different customer needs, revealing trends and gaps in the market that can drive business decisions.

##### 2. What is/are the insight(s) found from the chart?

Manhattan listings are heavily dominated by entire homes and apartments. Brooklyn shows a healthy mix of both private rooms and entire homes, whereas Queens has a larger proportion of private rooms.
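
Since boroughs differ a lot in size, the room-type mix is easier to compare as row percentages than as raw counts. A minimal sketch with `pd.crosstab` on made-up rows (on the real data the two columns of `df` would be passed in instead):

```python
import pandas as pd

# Made-up listings: Manhattan leans to entire homes, Queens to private rooms.
toy = pd.DataFrame({
    'neighbourhood_group': ['Manhattan'] * 4 + ['Queens'] * 4,
    'room_type': ['Entire home/apt'] * 3 + ['Private room']
                 + ['Private room'] * 3 + ['Entire home/apt'],
})

# normalize='index' makes each borough's row sum to 1; convert to percent.
room_mix = pd.crosstab(
    toy['neighbourhood_group'], toy['room_type'], normalize='index'
).mul(100).round(1)
print(room_mix)
```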

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Helps tailor room types to borough-specific demand, improving occupancy and revenue.

Negative Insight: Misalignment (e.g., too many shared rooms in low-demand areas) could cause stagnation.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
sns.histplot(df['price'], bins=50, kde=True, color='purple')
plt.title('Distribution of Airbnb Prices')
plt.xlabel('Price (USD per night)')
plt.show()


##### 1. Why did you pick the specific chart?

The histogram is perfect for showing the frequency distribution of price ranges. It provides a clear picture of common price points and helps detect anomalies or pricing trends, guiding hosts and Airbnb to align offerings with typical customer expectations.

##### 2. What is/are the insight(s) found from the chart?

Most listings fall within the $50 to $200 range. Very high prices (approaching the $500 threshold) are rare, suggesting a competitive pricing landscape for short-term stays.
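
The "$50 to $200" range read off the histogram can be checked numerically. The sketch below uses made-up prices; with the real data, `df['price']` would take the place of `toy_prices`.

```python
import pandas as pd

# Made-up nightly prices standing in for df['price'].
toy_prices = pd.Series([40, 60, 75, 120, 150, 199, 200, 250, 480, 90], name='price')

# between() is inclusive on both ends by default.
in_band = toy_prices.between(50, 200)
share_in_band = in_band.mean() * 100

# Quartiles give a second view of where typical prices sit.
quantiles = toy_prices.quantile([0.25, 0.5, 0.75])
print(f"{share_in_band:.1f}% of listings are priced between $50 and $200")
print(quantiles)
```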

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Provides pricing benchmarks; helps hosts price competitively.

Negative Insight: Underpricing by competitors can lead to a price war, reducing profitability.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
sns.boxplot(data=df, x='neighbourhood_group', y='price', palette='coolwarm')
plt.title('Price Distribution by Neighbourhood Group')
plt.show()


##### 1. Why did you pick the specific chart?

The boxplot effectively shows the spread, median, and outliers for prices across boroughs. It helps identify which areas are most expensive or affordable. This chart supports strategic decisions about pricing, investment, and customer targeting within specific locations.

##### 2. What is/are the insight(s) found from the chart?

Manhattan shows the highest median and variability in price. Brooklyn’s prices are more moderate, and Queens offers the most affordable listings on average.
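
The median and spread that the boxplot draws can also be computed directly, which makes the comparison between boroughs less dependent on reading the chart. A sketch on made-up prices; the helper `interquartile_range` is introduced here only for illustration.

```python
import pandas as pd

# Made-up prices for two boroughs.
toy = pd.DataFrame({
    'neighbourhood_group': ['Manhattan'] * 4 + ['Queens'] * 4,
    'price': [100, 150, 200, 300, 50, 60, 70, 80],
})


# Illustrative helper: width of the box in a boxplot (Q3 - Q1).
def interquartile_range(values):
    return values.quantile(0.75) - values.quantile(0.25)


# One row per borough with its median price and interquartile range.
price_summary = toy.groupby('neighbourhood_group')['price'].agg(
    median='median', iqr=interquartile_range
)
print(price_summary)
```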



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Highlights borough-specific pricing trends for better strategic planning.

Negative Insight: High price volatility in some areas may deter consistent business performance.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
sns.boxplot(data=df, x='room_type', y='price', palette='Blues')
plt.title('Price Distribution by Room Type')
plt.show()


##### 1. Why did you pick the specific chart?

I chose the boxplot to highlight price variation within each room type. It clearly shows which room types offer higher earning potential and where price competition is strongest. This visualization aids hosts in selecting competitive but profitable pricing strategies.

##### 2. What is/are the insight(s) found from the chart?

Entire homes and apartments command significantly higher prices than private rooms or shared rooms. Shared rooms are the cheapest but also the least common.
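
Room type and borough both affect price, so a two-way table of median prices can show whether the room-type gap holds within each borough. A minimal sketch on made-up rows using `pivot_table`:

```python
import pandas as pd

# Made-up listings covering two boroughs and two room types.
toy = pd.DataFrame({
    'neighbourhood_group': ['Manhattan', 'Manhattan', 'Manhattan',
                            'Brooklyn', 'Brooklyn', 'Brooklyn'],
    'room_type': ['Entire home/apt', 'Entire home/apt', 'Private room',
                  'Entire home/apt', 'Private room', 'Private room'],
    'price': [200, 240, 90, 150, 60, 70],
})

# Median nightly price for every borough / room type combination.
median_price = toy.pivot_table(
    index='neighbourhood_group', columns='room_type',
    values='price', aggfunc='median',
)
print(median_price)
```

The median is used rather than the mean because it is less sensitive to the few very expensive listings that remain below the $500 cut-off.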



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Shows earning potential by room type, aiding hosts’ investment decisions.

Negative Insight: High variability in prices may confuse guests or affect trust in pricing fairness.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
sns.histplot(df['availability_365'], bins=50, kde=False, color='orange')
plt.title('Distribution of Availability (Days Per Year)')
plt.show()


##### 1. Why did you pick the specific chart?

The histogram is ideal for showing how availability is distributed across listings. It reveals patterns such as full-year availability versus seasonal availability, helping understand host behavior and identify opportunities to optimize inventory.

##### 2. What is/are the insight(s) found from the chart?

Many listings are either available all year (365 days) or only for short periods (less than 100 days). This indicates a mix of professional hosts and casual/seasonal hosts.
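
The split between short-availability and near year-round listings can be quantified by bucketing `availability_365`. The bucket edges and labels below are assumptions chosen for illustration, and the values are made up.

```python
import pandas as pd

# Made-up availability values standing in for df['availability_365'].
toy_avail = pd.Series([0, 0, 12, 45, 90, 180, 300, 365, 365, 364],
                      name='availability_365')

# Assumed bucket edges; each interval is open on the left, closed on the right.
bins = [-1, 0, 100, 300, 365]
labels = ['0 days', '1-100 days', '101-300 days', '301-365 days']
buckets = pd.cut(toy_avail, bins=bins, labels=labels)

# Percentage of listings in each bucket, in bucket order.
bucket_share = buckets.value_counts(normalize=True).reindex(labels).mul(100)
print(bucket_share)
```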

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Informs Airbnb about year-round inventory stability for reliable bookings.

Negative Insight: Listings with extremely limited availability reduce platform reliability for travelers.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
sns.boxplot(data=df, x='neighbourhood_group', y='availability_365', palette='YlOrBr')
plt.title('Availability of Listings by Neighbourhood Group')
plt.show()


##### 1. Why did you pick the specific chart?

The boxplot shows availability spread across boroughs, helping identify which areas offer properties year-round versus seasonally. This supports business decisions regarding targeting more reliable areas with continuous availability for better occupancy rates.

##### 2. What is/are the insight(s) found from the chart?

Availability is relatively balanced across boroughs, but Manhattan has more consistently available listings year-round, likely due to its popularity with tourists.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Identifies boroughs with strong inventory availability, helping operational planning.

Negative Insight: Poor availability in certain areas weakens Airbnb’s customer offering.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
df['room_type'].value_counts().plot.pie(autopct='%1.1f%%', colors=['gold', 'lightblue', 'lightgreen'])
plt.title('Percentage Distribution of Room Types')
plt.ylabel('')
plt.show()



##### 1. Why did you pick the specific chart?

The pie chart effectively visualizes the proportion of each room type, giving a quick and intuitive understanding of market share among entire homes, private rooms, and shared rooms. It simplifies complex numbers into a clear percentage-based story.

##### 2. What is/are the insight(s) found from the chart?

The pie chart confirms that Entire home/apt is the most common room type, making up over half the listings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Visualizes market share clearly, guiding Airbnb’s room type strategies.

Negative Insight: Over-reliance on entire homes increases regulatory risks in urban markets.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
# Keep one row per host so that each bar counts hosts rather than listings
hosts = df.drop_duplicates(subset='host_id')
hosts['calculated_host_listings_count'].value_counts().head(10).plot(kind='bar', color='coral')
plt.title('Top 10 Host Listing Counts')
plt.xlabel('Number of Listings per Host')
plt.ylabel('Number of Hosts')
plt.show()




##### 1. Why did you pick the specific chart?

The bar plot shows how many hosts own multiple listings. This highlights the presence of professional hosts in the market, helping Airbnb and analysts understand host behavior and the degree of professionalization within the platform.

##### 2. What is/are the insight(s) found from the chart?

A few hosts manage multiple properties, suggesting professional involvement in the Airbnb market. Most hosts have only 1 or 2 listings.
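
The share of multi-listing hosts can be computed from `host_id` directly, counting each host once. A sketch on made-up host ids; note that on the cleaned data this counts only the listings that survived the filters.

```python
import pandas as pd

# Made-up host ids: host 1 has three listings, host 3 has two, the rest one each.
toy = pd.DataFrame({'host_id': [1, 1, 1, 2, 3, 3, 4, 5]})

# Number of listings owned by each host.
listings_per_host = toy.groupby('host_id').size()

# How many hosts have 1 listing, 2 listings, and so on.
hosts_by_portfolio = listings_per_host.value_counts().sort_index()

# Percentage of hosts with more than one listing.
multi_share = (listings_per_host > 1).mean() * 100
print(hosts_by_portfolio)
print(f"{multi_share:.1f}% of hosts have more than one listing")
```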

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Identifies key professional hosts for potential partnerships and scaling strategies.

Negative Insight: Market dominance by few hosts might limit opportunities for new entrants.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
sns.scatterplot(data=df, x='number_of_reviews', y='price', alpha=0.5)
plt.title('Price vs Number of Reviews')
plt.show()


##### 1. Why did you pick the specific chart?

The scatterplot was chosen to explore the relationship between price and review count. It visually highlights trends, such as cheaper listings getting more bookings and reviews, supporting insights on demand elasticity and pricing strategy.

##### 2. What is/are the insight(s) found from the chart?

Cheaper listings tend to attract more reviews, likely due to higher booking frequency. Expensive listings attract fewer but possibly longer or more exclusive stays.
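
A dense scatterplot makes it hard to judge how strong this tendency is. One way to check it is a rank (Spearman) correlation, which only asks whether reviews tend to fall as price rises. The sketch below computes it as the Pearson correlation of the ranks, on made-up values; review counts are only a proxy for bookings.

```python
import pandas as pd

# Made-up listings where reviews fall steadily as price rises.
toy = pd.DataFrame({
    'price': [50, 80, 120, 200, 400],
    'number_of_reviews': [300, 120, 60, 10, 2],
})

# Spearman correlation = Pearson correlation of the ranked values.
spearman = toy['price'].rank().corr(toy['number_of_reviews'].rank())
print(f"Rank correlation between price and reviews: {spearman:.2f}")
```

A value near -1 would support the insight, while a value near 0 would suggest the pattern in the chart is mostly noise.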



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Reveals how affordability drives engagement; helps price listings to boost reviews.

Negative Insight: Expensive listings attract fewer reviews, which may discourage bookings and limit growth.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
low_price = df[df['price'] < 100]
sns.countplot(data=low_price, x='neighbourhood_group', palette='pastel')
plt.title('Listings Under $100 by Neighbourhood Group')
plt.show()


##### 1. Why did you pick the specific chart?

The countplot clearly highlights which boroughs have more budget-friendly listings under $100. This insight helps target price-sensitive travelers and identify market opportunities in affordable areas like Queens or Bronx.

##### 2. What is/are the insight(s) found from the chart?

Queens and the Bronx dominate the low-cost segment, making them attractive for budget travelers. Manhattan has fewer listings under $100.
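
A countplot shows absolute counts, so a borough with many listings overall can show a tall bar even when only a small part of its listings are cheap. Comparing the share of listings under $100 within each borough is a fairer test. A sketch on made-up rows:

```python
import pandas as pd

# Made-up rows: both boroughs have two listings under $100,
# but Manhattan has twice as many listings overall.
toy = pd.DataFrame({
    'neighbourhood_group': ['Manhattan'] * 6 + ['Queens'] * 3,
    'price': [80, 90, 150, 200, 250, 300, 60, 70, 120],
})

under_100 = toy['price'] < 100
by_borough = under_100.groupby(toy['neighbourhood_group'])

# Count and within-borough percentage of listings under $100.
summary = pd.DataFrame({
    'count_under_100': by_borough.sum(),
    'share_under_100_pct': by_borough.mean().mul(100).round(1),
})
print(summary)
```

In this toy example the counts are equal while the shares differ, which is exactly the case the countplot alone cannot distinguish.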



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Highlights budget-friendly segments for price-conscious traveler strategies.

Negative Insight: Lower-priced markets may struggle with profitability for hosts and quality consistency.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()


##### 1. Why did you pick the specific chart?

The heatmap is the best visual to quickly identify correlations between numerical variables. It helps understand which metrics influence others, supporting data-driven decisions around pricing, availability, and customer engagement.

##### 2. What is/are the insight(s) found from the chart?

Price has very weak correlations with most variables. Reviews per month correlate moderately with the number of reviews, which makes sense as more reviews indicate active listings.
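
The heatmap can be backed up with a ranked list of the strongest correlations, so that the pairs worth discussing are not picked by eye. A sketch on made-up columns; on the real data `toy` would be replaced by `df`.

```python
import numpy as np
import pandas as pd

# Made-up values; reviews_per_month is proportional to number_of_reviews here.
toy = pd.DataFrame({
    'price': [100, 150, 90, 200, 120],
    'number_of_reviews': [10, 40, 5, 80, 20],
    'reviews_per_month': [0.5, 2.0, 0.25, 4.0, 1.0],
})

corr = toy.corr(numeric_only=True)

# Keep only the upper triangle so each pair appears once and the diagonal is dropped.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna()

# Sort by absolute strength, strongest first.
pairs = pairs.sort_values(key=abs, ascending=False)
print(pairs)
```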

#### Chart - 15 - Pair Plot

In [None]:
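# Pair Plot visualization code
# Restrict to key numeric columns and colour by borough; plotting every column
# (including id and host_id) is slow and hard to read.
pair_cols = ['price', 'minimum_nights', 'number_of_reviews', 'reviews_per_month', 'availability_365']
sns.pairplot(df[pair_cols + ['neighbourhood_group']], hue='neighbourhood_group')
plt.show()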

##### 1. Why did you pick the specific chart?

I chose the pair plot because it visually represents the relationships between multiple numerical variables at once. It helps to quickly identify patterns, correlations, and outliers across key factors like price, availability, and reviews. This chart is ideal for exploring overall trends and spotting hidden insights.

##### 2. What is/are the insight(s) found from the chart?

The pair plot shows no strong correlation between price and other variables. Listings with high reviews tend to receive more monthly reviews, reflecting active engagement. Availability clusters at 365 or below 100 days. Manhattan and Brooklyn dominate higher-priced listings, while Queens and Bronx focus on affordable, short-term stays.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

To help Airbnb achieve its business objectives, the analysis suggests several key strategies. First, Airbnb should continue to focus on high-demand areas like Manhattan and Brooklyn, as these boroughs dominate both in listings and booking activity. Targeted marketing and optimized pricing strategies in these regions can help hosts remain competitive while maximizing revenue. In contrast, Queens and the Bronx offer potential for growth, especially among budget-conscious travelers. Airbnb can encourage hosts in these areas to improve their listings’ quality and attract more bookings through competitive pricing and targeted promotions. Additionally, since entire home listings generate higher revenue, Airbnb should promote this room type where demand supports higher prices, especially in tourist-heavy neighborhoods. Improving review engagement is also vital, as listings with more reviews tend to attract more bookings; hosts should be encouraged to enhance guest experiences to drive positive feedback. Moreover, Airbnb should advise hosts to maintain year-round availability where possible, particularly in high-demand areas, to increase occupancy rates and income. New hosts can benefit from education on competitive pricing to avoid mistakes like overpricing in low-demand locations or underpricing in saturated markets. Lastly, Airbnb should monitor and manage risks of oversaturation in popular neighborhoods, guiding new hosts toward under-served areas. By leveraging these data-driven insights, Airbnb can refine its marketing and operational strategies to boost bookings, improve host success, and achieve sustainable business growth.

# **Conclusion**

The analysis of the Airbnb NYC 2019 dataset provides valuable insights into the dynamics of the short-term rental market in New York City. It clearly highlights that Manhattan and Brooklyn are the core markets, dominating in terms of both listings and higher average pricing. These boroughs continue to attract tourists and generate significant revenue, while Queens and the Bronx present opportunities for growth, particularly for budget travelers. The data shows a strong preference for entire home or apartment rentals, which command higher prices and greater demand compared to private or shared rooms.

Furthermore, the analysis reveals that listings with higher review activity tend to achieve better performance and greater customer engagement, reinforcing the importance of maintaining high service quality and availability. However, the presence of market saturation in certain areas suggests that Airbnb and hosts need to strategically manage their listings to avoid excessive competition and price undercutting.

Overall, this project concludes that by leveraging these data-driven insights, Airbnb can enhance its platform strategy, help hosts optimize their listings, and expand intelligently into under-served markets. This will ultimately lead to improved booking rates, higher revenue, and a stronger competitive position in the NYC market.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***