<a href="https://colab.research.google.com/github/manishaachary13/AirBnb_EDA/blob/main/Airbnb_EDA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Airbnb Booking Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Name**            - R MANISHA ACHARY

# **Project Summary -**

 **Airbnb, Inc** is an American company operating an online marketplace for short- and long-term homestays and experiences. Founded in **2008** by *Brian Chesky, Nathan Blecharczyk, and Joe Gebbia,* Airbnb has revolutionized the travel industry by offering unique and personalized lodging options compared to traditional hotels. Airbnb's mission is to foster a sense of belonging and create a world where anyone can feel at home anywhere.

This project aims to explore and analyze an Airbnb dataset to uncover key trends and insights that can inform hosts and potential investors. Specifically, the analysis seeks to understand the distribution of listing prices, identify the most popular room types and their average prices, analyze the distribution of listings across different neighbourhoods, and explore seasonal booking trends and factors affecting bookings. By leveraging data analysis on millions of listings, Airbnb can enhance its security measures, business decisions, understanding of customer and host behavior, marketing strategies, and the implementation of innovative services.


# **GitHub Link -**

https://github.com/manishaachary13/EDA-Projects/blob/dd69b00dc43a42994c7e4ae8283f5eab8d8ff868/Airbnb_booking_analysis_RManishaAchary.ipynb

# **Problem Statement**


Airbnb has transformed the hospitality industry by offering unique and personalized lodging options to travelers worldwide. However, to maintain its competitive edge and optimize its offerings, Airbnb needs to gain deeper insights into the patterns and trends within its vast array of listings. Understanding these patterns can help hosts optimize their pricing strategies, improve listing visibility, and enhance guest satisfaction.

The primary problems addressed in this analysis are:


*   **Price Distribution:** What is the distribution of listing prices, and what factors influence these prices?
*   **Room Type Popularity:** What are the most common room types offered on Airbnb, and how do their prices compare?
*   **Neighbourhood Analysis:** How are listings distributed across different neighbourhoods, and which neighbourhoods command higher prices?
*   **Seasonal Trends:** Are there identifiable seasonal trends in booking demand, and how do they affect pricing and availability?

#### **Define Your Business Objective?**

**Business Objectives**

The primary business objectives of this Airbnb booking analysis project are to:

1. **Optimize Pricing Strategies**: Identify pricing trends and factors influencing listing prices to help hosts set competitive and attractive prices, maximizing occupancy rates and revenue.
2. **Improve Listing Visibility and Attractiveness**: Understand which room types and amenities are most popular and how their presence affects booking rates, guiding hosts in enhancing their listings.
3. **Strategic Investment Decisions**: Provide potential investors with data on the most lucrative neighborhoods and types of properties, helping them make informed decisions about where to invest.
4. **Seasonal Demand Management**: Analyze booking patterns across different seasons to help hosts adjust their availability and pricing dynamically, capitalizing on peak booking periods.
5. **Customer and Host Behavior Insights**: Gain insights into customer preferences and host performance to improve user experience, tailor marketing strategies, and develop new features that meet user needs.
6. **Marketing and Promotion**: Inform marketing initiatives by identifying key trends and target demographics, ensuring promotional efforts are effectively directed and yield the highest return on investment.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd              # Data manipulation and analysis
import numpy as np               # Numerical operations
import matplotlib.pyplot as plt  # Data visualization
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')
# load dataset
file_path = '/content/drive/MyDrive/dataset/Airbnb NYC 2019.csv'


### Dataset First View

In [None]:
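# Dataset First Look
# Try each encoding in order and stop at the first one that decodes the file.
# Note: 'latin1' is an alias of 'ISO-8859-1' and can decode any byte sequence,
# so the entries after it act only as a nominal fallback.
encodings = ['utf-8', 'latin1', 'ISO-8859-1', 'utf-16']
for encoding in encodings:
    try:
        df = pd.read_csv(file_path, encoding=encoding)
        print("CSV file read successfully using encoding:", encoding)
        break
    except UnicodeDecodeError:
        print("Error decoding with encoding:", encoding)
else:
    # No encoding worked, so fail here instead of with a NameError on df later
    raise ValueError("Could not decode the CSV file with any of: " + ", ".join(encodings))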

In [None]:
# display first few rows of data
print(df.head())

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
# Count duplicate rows in the DataFrame
duplicate_count = df.duplicated().sum()

print("Number of duplicate rows:", duplicate_count)


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
# Count missing values in each column of the DataFrame
missing_values_count = df.isnull().sum()

print("Missing values count per column:")
print(missing_values_count)


In [None]:
# Visualizing the missing values

In [None]:
# Generate heatmap to visualize missing values
plt.figure(figsize=(8, 6))
sns.heatmap(df.isnull(), cmap='viridis', cbar=False, yticklabels=False)
plt.title('Missing Values Heatmap')
plt.show()

### What did you know about your dataset?

Dataset Overview:

*   Dimensions: The dataset has 48,895 rows and 16 columns.

*   Data Types: There are columns of type int64, float64, and object.
*   Missing Values: Several columns have missing values:

   1.   name: 16 missing values
   2.   host_name: 21 missing values
   3.   last_review: 10,052 missing values
   4.   reviews_per_month: 10,052 missing values

*   Duplicates: There are no duplicate rows in the dataset.
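
The counts above are easier to compare when expressed as a share of all rows. The following is a minimal sketch of such a summary. It runs on a small hypothetical frame (`toy`) rather than the real Airbnb file, and the helper name `missing_summary` is an assumption introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the Airbnb data (not the real dataset)
toy = pd.DataFrame({
    "name": ["Loft", None, "Studio", "Room"],
    "host_name": ["Ann", "Bob", "Cy", "Dee"],
    "last_review": ["2019-05-01", None, None, "2019-06-10"],
    "reviews_per_month": [1.2, np.nan, np.nan, 0.4],
})


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values for columns that have any."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns with at least one missing value, largest first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


summary = missing_summary(toy)
duplicate_rows = int(toy.duplicated().sum())

print(summary)
print("Duplicate rows:", duplicate_rows)
```

Calling `missing_summary(df)` on the loaded dataset should give the same kind of table for the real columns.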

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

**Columns and Their Descriptions:**

1. **id:** ID of the listing
2. **name:** Name of the listing
3. **host_id:** ID of the host
4. **host_name:** Name of the host
5. **neighbourhood_group:** Group of neighbourhoods
6. **neighbourhood:** Specific neighbourhood
7. **latitude:** Latitude coordinate of the listing
8. **longitude:** Longitude coordinate of the listing
9. **room_type:** Type of room offered
10. **price:** Price per night for the listing
11. **minimum_nights:** Minimum number of nights per booking
12. **number_of_reviews:** Total number of reviews for the listing
13. **last_review:** Date of the last review
14. **reviews_per_month:** Average number of reviews per month
15. **calculated_host_listings_count:** Total number of listings by the host
16. **availability_365:** Number of days the listing is available within a year

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
# Assuming 'df' is your DataFrame
for column in df.columns:
    unique_values = df[column].unique()
    print(f"Unique values for {column}:", unique_values)
    print("--"*30)


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

In [None]:
# Drop rows where 'name' or 'host_name' is missing
df.dropna(subset=['name', 'host_name'], inplace=True)


In [None]:
# Fill missing 'reviews_per_month' values with 0
df["reviews_per_month"] = df["reviews_per_month"].fillna(0)

### What all manipulations have you done and insights you found?

**Handling Missing name and host_name Data:**
In the Airbnb dataset used for analysis, missing values were identified in the name and host_name columns. With careful consideration of the dataset size and the impact on subsequent analyses, rows containing missing name or host_name entries were removed. This decision was made to maintain data integrity and ensure reliable insights.

**Handling Missing Values in last_review and reviews_per_month Columns:**

The dataset also has missing values in the last_review and reviews_per_month columns. Given that not all guests provide reviews, these null values reflect listings for which no review was submitted. The missing last_review values were intentionally retained as null entries, while the missing values in the reviews_per_month column were filled with 0.
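
The wrangling steps described above can be sketched on a small example. The frame `toy` below is hypothetical and only mirrors the relevant column names. The final check is an assumption about how one might confirm that missing `reviews_per_month` values belong to listings with zero reviews before filling them with 0.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame that mirrors the relevant columns (not the real dataset)
toy = pd.DataFrame({
    "name": ["Loft", None, "Studio"],
    "host_name": ["Ann", "Bob", "Cy"],
    "number_of_reviews": [3, 5, 0],
    "last_review": ["2019-05-01", "2019-04-02", None],
    "reviews_per_month": [0.5, 0.9, np.nan],
})

# Check that missing reviews_per_month values belong to listings with zero reviews
unreviewed = toy["reviews_per_month"].isnull()
zero_reviews_only = bool((toy.loc[unreviewed, "number_of_reviews"] == 0).all())

# Drop rows without a listing name or host name
cleaned = toy.dropna(subset=["name", "host_name"]).copy()

# Fill reviews_per_month with 0 and leave last_review as null
cleaned["reviews_per_month"] = cleaned["reviews_per_month"].fillna(0)

print(cleaned)
print("Missing reviews_per_month only where number_of_reviews == 0:", zero_reviews_only)
```

Working on a copy, as here, keeps the original frame available for comparison. The notebook itself modifies `df` in place.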

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1: Locations and Prices of Airbnb Listings
import matplotlib.pyplot as plt

# Data
latitudes = df['latitude']
longitudes = df['longitude']
prices = df['price']

# Create a scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(longitudes, latitudes, c=prices, cmap='cool', s=100, alpha=0.75)

# Add labels and title
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('Locations and Prices of Airbnb Listings')

# Add color bar
plt.colorbar(label='Price ($)')


# Show plot
plt.tight_layout()
plt.grid(True)
plt.show()


##### 1. Why did you pick the specific chart?


A scatter plot with a color gradient effectively visualizes geographical distribution and the pricing pattern of Airbnb listings. It provides spatial insights while correlating the location with prices.

##### 2. What is/are the insight(s) found from the chart?



*  Most listings are concentrated in specific areas, likely popular neighborhoods.
* Higher-priced listings seem sparse and localized, while lower-priced listings are more widespread.
* Manhattan likely exhibits the highest prices compared to other regions.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


1. **Positive Business Impact:**
   - **Targeted Marketing Strategies:**  
     The concentration of listings in specific areas (e.g., central Manhattan and popular neighborhoods) suggests high customer demand. Businesses can focus their marketing efforts on these high-demand zones to attract more bookings.
   - **Price Optimization:**  
     Observing that higher-priced listings are localized allows property owners in those areas to position their properties as premium and adjust pricing accordingly. This could lead to revenue maximization.
   - **Expansion Opportunities:**  
     Areas with a low density of listings but visible demand can be targeted for expansion by adding new properties, creating opportunities for growth.

2. **Insights That May Lead to Negative Growth:**
   - **Market Saturation in High-Density Areas:**  
     High density in central locations may indicate market saturation, leading to intense competition among hosts. If not managed properly, this could result in a price war and reduced profit margins.
   - **Neglecting Underserved Areas:**  
     The focus on popular neighborhoods could overshadow potential opportunities in less-saturated areas. This might lead to underutilized market potential in outer regions or emerging tourist destinations.

**Justification:**
While the scatter plot provides valuable insights for optimizing business strategies in high-demand areas, over-reliance on these zones without exploring new markets may limit long-term growth. Diversifying into less-dense regions with strategic investments could balance growth and mitigate saturation risks.

#### Chart - 2

In [None]:
# chart 2: Distribution of room types

plt.figure(figsize=(8, 6))
sns.countplot(x='room_type', data=df, palette='viridis')
plt.title('Distribution of Room Types')
plt.xlabel('Room Type')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

I chose a **Bar chart** for visualizing the **Distribution of Room Types** because it is ideal for comparing categorical data, making it easy to see and compare the frequencies of different room types. Bar charts are straightforward and clear, allowing for quick interpretation of the data. They also enable effective highlighting of differences between categories and are highly customizable, allowing for additional details like annotations.

##### 2. What is/are the insight(s) found from the chart?

From the chart "Distribution of Room Types," we can derive the following insights:

1.  Prevalence of Entire Home/Apt: The most common type of room in the Airbnb listings is "Entire home/apt," with the highest count among all room types.
2.  Significant Number of Private Rooms: "Private room" is the second most common type, indicating a substantial number of listings offer private rooms within a larger property.
3.  Low Proportion of Shared Rooms: "Shared room" has the lowest count, suggesting that fewer hosts are offering shared accommodations compared to private rooms or entire homes/apartments.
4.  Market Composition: The distribution highlights that the market is dominated by entire properties and private rooms, which may reflect traveler preferences for more private or exclusive accommodations.
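
The ranking described above can be backed with percentages instead of being read off the bar heights. This is a minimal sketch on a hypothetical frame (`toy`) whose counts are invented for illustration and do not reflect the real dataset.

```python
import pandas as pd

# Hypothetical room type column; the counts are made up for illustration
toy = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 5 + ["Private room"] * 4 + ["Shared room"] * 1
})

# Share of each room type as a percentage of all listings, largest first
shares = (toy["room_type"].value_counts(normalize=True) * 100).round(1)

print(shares)
```

Running the same `value_counts(normalize=True)` call on `df['room_type']` should give the actual shares behind the chart.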

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

1. Strategic Investment: The dominance of "Entire home/apt" listings indicates a high demand for entire properties. Investing in entire properties can lead to higher occupancy rates and increased revenue.

2. Marketing Focus: A significant number of "Private room" listings suggest a market for budget-conscious or solo travelers. Targeted marketing campaigns can attract this segment, increasing bookings and revenue.

3. Product Differentiation: The low number of "Shared room" listings could imply a niche market. Enhancing the appeal of shared accommodations can attract budget travelers and create unique value propositions.

Negative Growth Insights:

1. Market Saturation: The high number of "Entire home/apt" and "Private room" listings may indicate market saturation. Intense competition in a saturated market can lead to price wars and lower profit margins.

2. Risk of Overinvestment: Focusing heavily on the most common room types could lead to overinvestment. Market shifts or economic downturns may reduce demand for these categories, leading to potential financial losses.

3. Neglect of Alternative Options: The low number of "Shared room" listings suggests a neglected segment. Ignoring this segment might mean missing out on opportunities to attract budget travelers looking for communal living experiences.

Justification:

Positive Impact: The insights about room type distribution can guide strategic decisions, helping businesses align their offerings with market demand and target specific customer segments effectively.

Negative Growth Potential: Overemphasis on the most common categories without considering market dynamics and alternative segments might lead to issues like saturation and missed opportunities.

#### Chart - 3

In [None]:
# Chart - 3: Distribution of Airbnb Listings by Neighbourhood Group
plt.figure(figsize=(10, 6))
sns.countplot(x='neighbourhood_group', data=df, palette='Set2')
plt.title('Distribution of Airbnb Listings by Neighbourhood Group')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()



##### 1. Why did you pick the specific chart?

A bar chart is ideal for visualizing and comparing categorical data, such as Airbnb listings across neighborhood groups, as it provides a clear, intuitive representation of the distribution. It highlights key trends and makes it easy to identify areas with the highest and lowest listings.

##### 2. What is/are the insight(s) found from the chart?

1. The chart shows that Manhattan and Brooklyn have the highest number of Airbnb listings, significantly more than other boroughs.
2. Queens, Staten Island, and Bronx have considerably fewer listings compared to Manhattan and Brooklyn, indicating underrepresentation.
3. The high number of listings in Manhattan and Brooklyn suggests strong demand in these areas. Focusing investments and marketing efforts in these boroughs can leverage the existing high demand to maximize occupancy rates and revenue.
4. The high number of listings in Manhattan and Brooklyn may indicate market saturation. Intense competition in these saturated markets can lead to price wars and reduced profit margins, negatively impacting revenue.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Justification:

**Positive Impact:**

Focusing on high-demand areas like Manhattan and Brooklyn ensures alignment with market demand, which can lead to higher occupancy rates and increased revenue.
Strategic marketing in these areas can capitalize on their popularity, attracting more guests and boosting bookings.

**Negative Growth Potential:**

Market saturation in Manhattan and Brooklyn could result in intense competition, forcing hosts to lower prices to attract guests, which reduces profit margins.
Ignoring underrepresented boroughs like Queens, Staten Island, and Bronx could mean missing out on potential growth opportunities in less competitive markets.

#### Chart - 4

In [None]:
# Chart - 4: Average Price by room type and neighbourhood group
plt.figure(figsize=(12, 8))
sns.barplot(x='neighbourhood_group', y='price', hue='room_type', data=df, palette='muted', ci=None)
plt.title('Average Price by Room Type and Neighbourhood Group')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Average Price ($)')
plt.xticks(rotation=45)
plt.legend(title='Room Type')
plt.show()


##### 1. Why did you pick the specific chart?

A grouped bar chart is excellent for comparing average prices across neighborhoods and room types. It makes it easy to spot trends between regions and room categories.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that Manhattan has the highest average prices across all room types, higher than the other neighbourhood groups.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.



*   Entire home/apartments are consistently the most expensive across all neighborhoods.
* Manhattan has the highest average prices across all room types.
* Shared rooms are the cheapest option, regardless of the neighborhood.
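
The price comparisons listed above can be checked against a table of group means, which is what each bar in the chart represents. This is a minimal sketch using `pivot_table` on a hypothetical frame (`toy`). The prices are invented for illustration.

```python
import pandas as pd

# Hypothetical listings; the prices are made up for illustration
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn", "Brooklyn"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt",
                  "Entire home/apt", "Private room", "Private room"],
    "price": [250, 120, 230, 180, 70, 80],
})

# Mean price for every neighbourhood group and room type combination
avg_price = toy.pivot_table(
    index="neighbourhood_group",
    columns="room_type",
    values="price",
    aggfunc="mean",
).round(2)

print(avg_price)
```

Applied to `df`, the same call should return the values plotted by `sns.barplot`, whose default estimator is the mean.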



#### Chart - 5

In [None]:
# Chart - 5: Price Distribution by Neighbourhood Group


# Set up the matplotlib figure size
plt.figure(figsize=(10, 8))

# Create the box plot using seaborn
sns.boxplot(x='neighbourhood_group', y='price', data=df, palette='viridis')

# Customize the plot
plt.title('Price Distribution by Neighbourhood Group')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Price ($)')
plt.xticks(rotation=45)

# Show the plot
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

The box plot is a suitable choice for this analysis because it effectively summarizes the distribution of prices across different neighborhood groups.

##### 2. What is/are the insight(s) found from the chart?

Insights:

* Median Prices: Manhattan has the highest median price among the neighborhood groups, followed by Brooklyn, Queens, Staten Island, and the Bronx.
* Price Spread: Manhattan and Brooklyn have a wider spread of prices, indicating more variability in the listings' prices.
* Outliers: All neighborhood groups exhibit outliers, but Manhattan has the most significant number of high-price outliers, indicating the presence of high-end listings.
* Comparison: The Bronx, Staten Island, and Queens have lower median prices and less variability compared to Manhattan and Brooklyn.
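
The quantities a box plot draws (median, quartiles, and points beyond the whiskers) can also be computed directly, which makes statements about spread and outliers checkable. This is a minimal sketch on a hypothetical frame (`toy`). The helper name `price_spread` is an assumption, and the 1.5 × IQR fence follows the default whisker rule used by seaborn's `boxplot`.

```python
import pandas as pd

# Hypothetical prices for two groups; the values are made up for illustration
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 5 + ["Bronx"] * 5,
    "price": [100, 150, 200, 250, 1000, 40, 50, 60, 70, 80],
})


def price_spread(frame: pd.DataFrame) -> pd.DataFrame:
    """Median, quartiles, IQR and count of high outliers per neighbourhood group."""
    grouped = frame.groupby("neighbourhood_group")["price"]
    stats = pd.DataFrame({
        "median": grouped.median(),
        "q1": grouped.quantile(0.25),
        "q3": grouped.quantile(0.75),
    })
    stats["iqr"] = stats["q3"] - stats["q1"]

    # Upper fence per group: anything above it counts as a high outlier
    upper_fence = stats["q3"] + 1.5 * stats["iqr"]
    fence_per_row = frame["neighbourhood_group"].map(upper_fence)
    is_outlier = frame["price"] > fence_per_row
    stats["high_outliers"] = is_outlier.groupby(frame["neighbourhood_group"]).sum()
    return stats


stats = price_spread(toy)
print(stats)
```

Calling `price_spread(df)` should return the numbers behind each box in the chart.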

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Justification:

1. Managing Competition: The high variability in prices within Manhattan and Brooklyn suggests that while there is potential for high revenue, it also indicates fierce competition. If not strategically managed, this could result in negative growth due to price wars or market saturation.
2. Occupancy Rates: Mispricing in lower-cost areas could lead to lower occupancy rates, negatively impacting overall revenue. It's crucial to align pricing strategies with market demand and sensitivity to maintain growth.

#### Chart - 6

In [None]:
# Chart - 6: Listing Density Heatmap
import matplotlib.pyplot as plt
import seaborn as sns

# Set up the matplotlib figure size
plt.figure(figsize=(10, 8))

# Create the heatmap using seaborn
sns.histplot(x=df['longitude'], y=df['latitude'], bins=100, cmap='viridis',cbar=True)

# Customize the plot
plt.title('Listing Density Heatmap')
plt.xlabel('Longitude')
plt.ylabel('Latitude')


# Show the plot
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A heatmap is ideal for showing the density of listings spatially. It highlights hotspots where listings are densely packed, providing a visual overview of supply concentration.

##### 2. What is/are the insight(s) found from the chart?

* Central Manhattan and parts of Brooklyn are the densest areas for listings.
* Outer regions, like Staten Island, have significantly fewer listings.
* Density corresponds to urban and high-demand areas, suggesting higher customer preference for central locations.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

*   Managing Saturation: While high-density areas indicate high demand, they also imply greater competition. If not managed well, this could result in reduced profitability due to price wars or an oversupply of listings.
*   Balanced Focus: Neglecting low-density areas could mean missing out on opportunities for growth. These areas might have lower competition and could become lucrative markets with the right strategy.

#### Chart - 7

In [None]:
# Chart - 7: Number of Reviews in terms of Neighbourhood Group
import matplotlib.pyplot as plt

# Group by 'neighbourhood_group' and get the max 'number_of_reviews', then reset the index
area_reviews = df.groupby(['neighbourhood_group'])['number_of_reviews'].max().reset_index()
area_reviews

# Extract the 'neighbourhood_group' and 'number_of_reviews' columns
area = area_reviews['neighbourhood_group']
review = area_reviews['number_of_reviews']

# Create the plot
fig = plt.figure(figsize=(10, 5))
plt.bar(area, review, color='blue', width=0.5)
plt.xlabel('Area')
plt.ylabel('Reviews')
plt.title('Number of Reviews in terms of Neighbourhood Group')
plt.show()


##### 1. Why did you pick the specific chart?

A bar chart is suitable because it effectively compares the number of reviews across neighborhood groups. It provides a clear visual of which areas generate the most customer engagement, making it easy to interpret.

##### 2. What is/are the insight(s) found from the chart?

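The most-reviewed single listing is in the Queens neighbourhood group, followed by Manhattan, Brooklyn, and the others. Because the chart plots the maximum `number_of_reviews` per group, each bar reflects one listing rather than the total number of reviews in that group.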
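
Because the chart's code takes the `max()` of `number_of_reviews` per group, the ranking can differ from a ranking by total or average reviews. This is a minimal sketch of that difference on a hypothetical frame (`toy`). The review counts are invented so that the two rankings disagree.

```python
import pandas as pd

# Hypothetical review counts, chosen so that max and sum rank the groups differently
toy = pd.DataFrame({
    "neighbourhood_group": ["Queens", "Queens", "Manhattan", "Manhattan", "Manhattan"],
    "number_of_reviews": [600, 10, 400, 300, 200],
})

# Compare the single most-reviewed listing with the total and mean per group
review_stats = toy.groupby("neighbourhood_group")["number_of_reviews"].agg(
    ["max", "sum", "mean"]
)

print(review_stats)
print("Top group by max:", review_stats["max"].idxmax())
print("Top group by sum:", review_stats["sum"].idxmax())
```

Running the same aggregation on `df` would show whether the ordering in the chart also holds for total reviews.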

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact of the Insights:**

* **Targeting High-Engagement Areas:**

  1. Queens having the most reviews suggests high customer activity, indicating strong demand or customer satisfaction. Businesses can focus marketing campaigns and additional listings in this area to capitalize on this trend.
  2. Similarly, leveraging Manhattan and Brooklyn, which also show significant engagement, can maximize revenue potential in these already popular neighborhoods.

* **Customer Feedback Utilization:**

  1. Reviews provide direct insights into customer preferences. Businesses in Queens can analyze reviews to refine services, maintaining their competitive advantage.

**Potential for Negative Growth:**

* **Over-reliance on Queens:**

1. Over-concentrating resources in Queens while neglecting Manhattan and Brooklyn might lead to missing opportunities in these neighborhoods with high potential.

* **Ignoring Less-Reviewed Areas:**

1. Staten Island and the Bronx, with fewer reviews, might signify underdeveloped markets. Ignoring these regions could mean losing opportunities to expand in untapped areas.

**Justification:**
While Queens demonstrates high engagement, a balanced strategy focusing on both high-performing and underutilized areas will ensure sustainable growth and prevent overdependence on a single neighborhood group.

#### Chart - 8

In [None]:
# Chart - 8: Busiest Hosts in Terms of Reviews
import matplotlib.pyplot as plt

# Group by 'host_id', 'host_name', and 'room_type', and get the max 'number_of_reviews', then reset the index
busy_hosts = df.groupby(['host_id', 'host_name', 'room_type'])['number_of_reviews'].max().reset_index()

# Sort the DataFrame by 'number_of_reviews' in descending order and select the top 10
busy_hosts = busy_hosts.sort_values(by='number_of_reviews', ascending=False).head(10)
busy_hosts

# Extract the 'host_name' and 'number_of_reviews' columns
name_hosts = busy_hosts['host_name']
review_got = busy_hosts['number_of_reviews']

# Create the plot
fig = plt.figure(figsize=(10, 5))
plt.bar(name_hosts, review_got, color='purple', width=0.5)
plt.xlabel('Host Name')
plt.ylabel('Number of Reviews')
plt.title('Busiest Hosts in Terms of Reviews')
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
plt.show()



##### 1. Why did you pick the specific chart?

A bar chart is ideal for showcasing the top 10 busiest hosts in terms of reviews as it effectively compares discrete categories (hostnames) against their review counts. This clear visual ranking makes it easy to interpret the busiest hosts' performance.

##### 2. What is/are the insight(s) found from the chart?

**Insights Found from the Chart:**

1. **High Engagement with Top Hosts:**

The top **10** hosts have consistently high review counts **(500 to 600+)**, indicating strong customer interaction and engagement.

2. **Popularity and Customer Trust:**

Hosts with more reviews likely attract repeat bookings or receive higher visibility on the platform, signaling customer trust and satisfaction.

3. **Potential Market Leaders:**

These hosts can be considered market leaders in their respective segments, possibly setting benchmarks for customer service and property management.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Will the Gained Insights Help Create a Positive Business Impact?**
1. **Recognizing High-Performing Hosts:**
Airbnb can highlight these top hosts in promotional content to showcase quality service, boosting platform credibility and trust among users.
2. **Learning Opportunities for Other Hosts:**
Insights from these busy hosts (e.g., property features, customer service strategies) can help new or less popular hosts improve their offerings.
3. **Revenue Maximization:**
The busiest hosts likely generate significant revenue for the platform, so providing them with additional support or incentives can further enhance their performance.

**Insights That Could Lead to Negative Growth:**
1. **Over-reliance on Top Hosts:**
Relying too heavily on a few top-performing hosts could create market imbalances, where smaller or newer hosts struggle to compete, leading to dissatisfaction and churn.
2. **Quality Concerns at Scale:**
With higher demand, busy hosts might face operational challenges (e.g., slower responses or declining service quality), potentially leading to negative reviews and a drop in customer satisfaction.

**Justification:**

The bar chart highlights high-performing hosts, offering actionable insights to recognize and replicate their success. However, ensuring a balanced ecosystem that supports newer hosts while maintaining service quality among busy hosts is crucial for sustained growth.

#### Chart - 9

In [None]:
# Chart - 9: Hosts with Maximum Price Charge
# Highest listing price per host, room type and neighbourhood group
highest_price = df.groupby(['host_id', 'host_name', 'room_type', 'neighbourhood_group'])['price'].max().reset_index()

# Sort by price in descending order and keep the top 10
highest_price = highest_price.sort_values(by='price', ascending=False).head(10)

# Extract the 'host_name' and 'price' columns
name_of_host = highest_price['host_name']
price_charge = highest_price['price']

# Create the plot
fig = plt.figure(figsize=(10, 5))
plt.bar(name_of_host, price_charge, color='orange', width=0.5)
plt.xlabel('Host Name')
plt.ylabel('Price ($)')
plt.title('Hosts with Maximum Price Charge')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is effective for visualizing and comparing the maximum price charged by different hosts. It provides a clear, categorical representation of pricing variations, making it easy to identify hosts charging the highest rates.

##### 2. What is/are the insight(s) found from the chart?

* The chart helps differentiate hosts based on their pricing strategies, showing which ones target budget-conscious versus high-end customers.
* The chart highlights significant differences in pricing strategies among hosts, ranging from moderate to premium charges.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Justification:**
While the bar chart enables clear comparisons of maximum pricing and helps inform strategic pricing decisions, ensuring value alignment with pricing is critical. Balancing the needs of both luxury and budget travelers ensures long-term growth and customer satisfaction.

#### Chart - 10

In [None]:
# Chart - 10: Average Price of Airbnb Listings over Time


# Convert 'last_review' to datetime so it can be resampled
df['last_review'] = pd.to_datetime(df['last_review'])

# Average price per calendar month, grouped by each listing's last review date
average_price_per_month = df.resample('M', on='last_review')['price'].mean()

# Plotting the line plot
plt.figure(figsize=(10, 6))
plt.plot(average_price_per_month.index, average_price_per_month.values, marker='o', linestyle='-')
plt.title('Average Price of Airbnb Listings over Time')
plt.xlabel('Month')
plt.ylabel('Average Price ($)')
plt.grid(True)
plt.tight_layout()

# Show plot
plt.show()


##### 1. Why did you pick the specific chart?

A line chart of the average price of Airbnb listings over time was selected because it is well suited to trend analysis, to reading historical data, and to identifying peaks and troughs.

##### 2. What is/are the insight(s) found from the chart?

Insights:
* Price Volatility: There is significant volatility in prices, with noticeable peaks in 2014 and 2015.
* Stabilization: Prices appear to stabilize after 2016, with fewer extreme peaks and a more consistent trend.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Justification:**
* Managing Volatility: While price peaks can lead to high revenue in the short term, they also suggest periods of instability. Consistent volatility can make it challenging to maintain steady growth and could lead to negative perceptions among potential investors.
* External Factors: Peaks followed by dips might be influenced by factors outside the control of the business, such as changes in tourism trends, economic conditions, or regulatory impacts. These can lead to unpredictable revenue streams and require adaptive strategies to mitigate risks.
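
Because the chart groups listings by the month of their last review, a month that contains only a handful of listings can produce a peak on its own. One way to check this, sketched under assumptions, is to compute the listing count next to the monthly mean. The `sample` data and the `MIN_LISTINGS` threshold are hypothetical, and `to_period('M')` is used here in place of the notebook's `resample` call.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `df`; all values are made up.
sample = pd.DataFrame({
    "last_review": ["2014-03-02", "2019-06-10", "2019-06-21", "2019-06-30", None],
    "price": [950, 120, 100, 140, 200],
})
sample["last_review"] = pd.to_datetime(sample["last_review"])

# Listings that were never reviewed have no date and cannot be placed on the axis.
reviewed = sample.dropna(subset=["last_review"])

# Mean price and number of listings for each month of last review.
monthly = (
    reviewed.groupby(reviewed["last_review"].dt.to_period("M"))["price"]
    .agg(["mean", "count"])
)

# Illustrative threshold: ignore months backed by very few listings.
MIN_LISTINGS = 3
reliable = monthly[monthly["count"] >= MIN_LISTINGS]

print(monthly)
print(reliable)
```

In this toy data the 2014 "peak" rests on a single listing, so it disappears once thinly populated months are filtered out.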

#### Chart - 11

In [None]:
# Chart - 11: Number of Reviews per Month

# Convert 'last_review' to datetime (has no effect if it was already converted above)
df['last_review'] = pd.to_datetime(df['last_review'])

# Count listings by the month of their last review
reviews_per_month = df.resample('M', on='last_review').size()

# Plotting the line plot
plt.figure(figsize=(10, 6))
plt.plot(reviews_per_month.index, reviews_per_month.values, marker='o', linestyle='-')
plt.title('Number of Reviews per Month')
plt.xlabel('Month')
plt.ylabel('Number of Reviews')

plt.tight_layout()

# Show plot
plt.show()


##### 1. Why did you pick the specific chart?

A line chart is ideal for showing trends over time, such as the number of reviews per month. It effectively highlights fluctuations and patterns in reviews, making it easy to analyze changes before and after 2019.

##### 2. What is/are the insight(s) found from the chart?

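Far more reviews fall after 2019 than before it. Because the chart counts each listing once, in the month of its last review, the rise mainly shows that most listings were last reviewed recently.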

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


**Positive Business Impact of the Insights:**

1. **Understanding Demand Trends:**  
   - The increase in reviews after 2019 indicates growing customer engagement, which businesses can use to optimize pricing, improve services, and target marketing campaigns during high-demand periods.
   
2. **Seasonal Insights:**  
   - If specific months consistently show spikes in reviews, businesses can plan promotions or launch new listings during these periods to maximize profits.

3. **Customer Retention:**  
   - The trend can reveal customer preferences post-2019, helping businesses tailor offerings to align with evolving demands.

---

**Potential for Negative Growth:**

1. **Over-Reliance on Post-2019 Trends:**  
   - Focusing solely on post-2019 insights may lead businesses to ignore historical patterns, missing opportunities to cater to a broader audience or address past issues.
   
2. **Market Saturation Risks:**  
   - An increase in reviews might also indicate a growing number of competitors, leading to saturation in high-demand areas. This could pressure hosts to lower prices, reducing profitability.

---

**Justification:**

* The line chart provides actionable insights into customer engagement trends over time, helping businesses align strategies with customer preferences. However, a balanced approach, considering past trends and competition, is essential to sustain growth and avoid potential risks.
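
The seasonal point above (months that consistently show spikes) can be checked by pooling the same calendar month across years. The following is a rough sketch on hypothetical data; `sample` and its dates are made up, and the counts are still based on `last_review`, so they are listing counts rather than a full review history.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `df`; the dates are made up.
sample = pd.DataFrame({
    "last_review": pd.to_datetime(
        ["2017-06-05", "2018-06-18", "2018-12-30", "2019-06-02", "2019-01-15"]
    )
})

# Count listings per calendar month (1 = January ... 12 = December),
# pooling all years together, and show months with no data as zero.
by_month = (
    sample["last_review"].dt.month
    .value_counts()
    .reindex(range(1, 13), fill_value=0)
)

print(by_month)
print("Busiest calendar month:", by_month.idxmax())
```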

#### Chart - 12

In [None]:
# Chart - 12: Distribution of Room Types in Airbnb Listings

# Count the occurrences of each room type
room_type_counts = df['room_type'].value_counts()

# Plotting the pie chart
plt.figure(figsize=(8, 6))
plt.pie(room_type_counts, labels=room_type_counts.index, autopct='%1.1f%%', startangle=140, colors=['#66c2a5', '#fc8d62', '#8da0cb', '#e78ac3'])
plt.title('Distribution of Room Types in Airbnb Listings')
plt.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

# Show plot
plt.show()


##### 1. Why did you pick the specific chart?

A pie chart is ideal for displaying proportional data, such as the distribution of room types. It visually represents the share of each room type in the total listings, making it easy to compare their relative popularity.



##### 2. What is/are the insight(s) found from the chart?

**Insights Found from the Chart:**

1. **Dominance of Entire Homes/Apartments:**
Entire homes/apartments account for the majority of listings (52%), which suggests that hosts see strong demand for exclusive spaces.

2. **Popularity of Private Rooms:**
Private rooms (45.7%) are almost as prevalent as entire homes/apartments, pointing to steady demand for more affordable stays within a shared home.

3. **Minimal Demand for Shared Rooms:**
Shared rooms make up a small fraction (2.4%) of the listings, indicating low customer preference for this type of accommodation.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Will the Gained Insights Help Create a Positive Business Impact?**

1. **Optimizing Room Type Availability:**
Hosts can prioritize listing entire homes/apartments and private rooms, catering to the most popular categories to increase bookings and revenue.
2. **Targeted Marketing:**
Marketing campaigns can emphasize private rooms and entire homes/apartments, addressing customer demand directly.
3. **Reducing Underperforming Listings:**
Insights about low demand for shared rooms can guide hosts to reconfigure shared room setups into private rooms or other popular options.

**Insights That Could Lead to Negative Growth:**
1. **Neglecting Shared Rooms:**
Completely phasing out shared rooms might result in missed opportunities to attract budget-conscious travelers or groups, especially in highly competitive markets.
2. **Market Saturation for Popular Room Types:**
Oversaturating the market with entire homes/apartments could lead to price wars, reduced profitability, and a lack of differentiation among listings.

**Justification:**

While prioritizing the most popular room types ensures high customer satisfaction and better profitability, a balanced approach, including offerings for niche markets (like shared rooms), can cater to diverse customer needs and sustain long-term growth.
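
The pie chart shows only each room type's share of listings. A small table that puts the share next to the average price can support the revenue argument above. This is a sketch on hypothetical data: the `sample` rows and the exact room type labels are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `df`; labels and prices are made up.
sample = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Shared room"],
    "price": [200, 180, 80, 40],
})

# Listing count and average price for each room type.
summary = sample.groupby("room_type")["price"].agg(listings="count", avg_price="mean")

# Share of all listings, in percent, rounded as in the pie chart labels.
summary["share_pct"] = (100 * summary["listings"] / summary["listings"].sum()).round(1)

print(summary)
```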

#### Chart - 13

In [None]:
# Correlation Heatmap visualization code


# Selecting numeric columns for the correlation heatmap
numeric_columns = ['price', 'minimum_nights', 'number_of_reviews', 'reviews_per_month', 'calculated_host_listings_count', 'availability_365']

# Calculate the correlation matrix
corr_matrix = df[numeric_columns].corr()

# Create a heatmap using seaborn
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f', annot_kws={"size": 10})
plt.title('Correlation Heatmap of Airbnb Data')
plt.show()


##### 1. Why did you pick the specific chart?

A heatmap is used to visualize the correlation between variables. It highlights relationships through color gradients, helping identify strong positive or negative correlations.

##### 2. What is/are the insight(s) found from the chart?

**Insights:**

1. Weak correlations between most variables and price, suggesting no single strong predictor.
2. reviews_per_month and number_of_reviews have a moderate positive correlation (0.59), suggesting that these variables are related, possibly reflecting guest engagement.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Business Impact:**

1. Identifying variables with weak correlations to price might lead to exploring other factors affecting pricing strategies, such as location or amenities.
2. Using the relationship between reviews_per_month and number_of_reviews can help in designing targeted marketing campaigns to boost engagement.

**Negative Growth Insights:**

1. Weak correlations could indicate missing features or need for a richer dataset for a better predictive model of pricing.

**Justification:**

Businesses can use the relationship between reviews_per_month and number_of_reviews to identify and promote high-performing listings or improve visibility for low-performing ones, driving growth through increased customer engagement.
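
The heatmap uses the default Pearson coefficient, which only measures linear association and is sensitive to extreme prices. As a hedged illustration, the sketch below compares Pearson with the rank-based Spearman coefficient for the `price` column. The `sample` values are made up, and using Spearman here is a suggestion rather than part of the original analysis.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `df`; values are made up and
# include one extreme price to mimic a heavy right tail.
sample = pd.DataFrame({
    "price": [50, 80, 120, 200, 5000],
    "availability_365": [10, 40, 90, 200, 365],
    "number_of_reviews": [90, 60, 30, 10, 1],
})

# Correlation of every other column with price, under both methods.
pearson = sample.corr(method="pearson")["price"].drop("price")
spearman = sample.corr(method="spearman")["price"].drop("price")

comparison = pd.DataFrame({"pearson": pearson, "spearman": spearman})
print(comparison.round(2))
```

When the two columns disagree strongly, the relationship is likely monotonic but not linear, or is being driven by outliers.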

#### Chart - 14 - Pair Plot

In [None]:
# Pair Plot visualization code
# Selecting numeric columns for the pair plot
numeric_columns = ['price', 'minimum_nights', 'number_of_reviews', 'reviews_per_month', 'calculated_host_listings_count', 'availability_365']

# Create a pair plot using seaborn
sns.pairplot(df[numeric_columns])

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

Pairwise scatterplots provide a visual understanding of relationships between variables, detecting trends, clusters, or outliers.

##### 2. What is/are the insight(s) found from the chart?

**Insights:**

1. The relationships between price and several other variables appear non-linear, suggesting potential for feature transformation.
2. Some variables, like availability_365, show significant clustering, which may reflect common host behaviors such as opening listings only for part of the year.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Business Impact:**

1. Feature engineering, such as log-transforming skewed variables, can improve prediction accuracy.
2. Seasonal demand patterns can guide inventory or pricing strategies.

**Negative Growth Insights:**

Outliers may distort predictions and business insights. Addressing them through preprocessing is critical.

**Justification:**

Businesses can leverage the availability patterns seen in the pair plot to plan for seasonal fluctuations, optimize inventory, and implement dynamic pricing strategies during peak demand periods.
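
The log transform and outlier handling mentioned above could look roughly like the following. This is a sketch on a made-up `price` series, and the 1.5 × IQR rule is one common convention chosen here for illustration, not a method taken from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with one extreme value; not taken from the dataset.
price = pd.Series([40, 60, 80, 100, 120, 150, 10000], name="price")

# log1p compresses the right tail and is defined for a price of 0.
log_price = np.log1p(price)

# Flag values above the upper fence Q3 + 1.5 * IQR.
q1, q3 = price.quantile([0.25, 0.75])
upper = q3 + 1.5 * (q3 - q1)
is_outlier = price > upper

print("Skew before:", round(price.skew(), 2), "after:", round(log_price.skew(), 2))
print("Upper fence:", upper, "outliers:", int(is_outlier.sum()))
```

Flagged rows can then be inspected, capped, or excluded before plotting or modelling, depending on whether they are data errors or genuine luxury listings.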

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.


**1. Optimize Pricing Strategies**
- **Dynamic Pricing:** Implement dynamic pricing models that adjust rates based on real-time demand, seasonality, local events, and competition. Utilize historical data to predict peak periods and adjust prices accordingly.
- **Competitive Analysis:** Regularly benchmark prices against competitors in similar neighborhoods and for similar property types to ensure competitive pricing without compromising profitability.
- **Price Monitoring Tools:** Develop or utilize existing tools that provide real-time price suggestions based on market conditions and occupancy rates.

**2. Improve Listing Visibility and Attractiveness**
- **Popular Room Types and Amenities:** Focus on listing entire homes/apartments and private rooms, as they are most in demand. Highlight key amenities that attract guests, such as Wi-Fi, air conditioning, kitchen facilities, and free parking.
- **Listing Enhancements:** Use high-quality photos, detailed descriptions, and guest reviews to improve listing attractiveness. Offer virtual tours or video walkthroughs for a better guest experience.
- **SEO Optimization:** Optimize listings with relevant keywords that potential guests are likely to search for, improving visibility on the platform.

**3. Strategic Investment Decisions**
- **Neighborhood Analysis:** Provide detailed reports on the most lucrative neighborhoods, focusing on areas like Manhattan, Brooklyn, and Queens. Highlight metrics such as average booking rates, occupancy rates, and ROI.
- **Property Type Recommendations:** Advise investors on the most profitable property types, considering factors like room type popularity, price volatility, and guest reviews.
- **Emerging Markets:** Identify and recommend underrepresented areas like Staten Island and the Bronx, which could offer high growth potential with lower initial competition.

**4. Seasonal Demand Management**
- **Booking Patterns:** Analyze historical booking data to identify seasonal trends and peak periods. Provide insights into high-demand times like holidays, festivals, and local events.
- **Dynamic Availability:** Recommend hosts adjust their availability and pricing dynamically based on seasonal demand. Encourage minimum stay requirements during peak periods to maximize revenue.
- **Off-Peak Promotions:** Suggest offering discounts or special packages during off-peak seasons to attract more bookings and maintain steady occupancy.

**5. Customer and Host Behavior Insights**
- **Customer Preferences:** Analyze data on guest preferences for room types, amenities, and locations. Use these insights to tailor listings and improve guest satisfaction.
- **Host Performance:** Evaluate host performance based on reviews, response times, and occupancy rates. Provide feedback and training to help hosts improve their service.
- **Feature Development:** Develop new features based on customer feedback, such as personalized recommendations, flexible booking options, and enhanced communication tools.

**6. Marketing and Promotion**
- **Target Demographics:** Identify key demographics and tailor marketing campaigns to reach these segments effectively. Use data to understand the preferences and behaviors of different guest groups.
- **Trend Analysis:** Monitor market trends and adjust promotional strategies accordingly. Focus on promoting high-demand areas and unique listings that stand out.
- **Influencer Collaborations:** Partner with influencers and travel bloggers to reach a broader audience and increase visibility. Highlight positive guest experiences and unique selling points of the listings.
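
The neighbourhood analysis recommended under Strategic Investment Decisions could start from a per-area summary like the one sketched below. The `sample` rows are made up, and the dataset has no occupancy or ROI column, so `availability_365` is used here only as a rough, assumed proxy for how often a listing is open.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `df`; all values are made up.
sample = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Bronx"],
    "price": [250, 150, 120, 100, 70],
    "availability_365": [100, 200, 50, 150, 300],
})

# Listing count, median price, and mean availability for each area.
area_summary = (
    sample.groupby("neighbourhood_group")
    .agg(
        listings=("price", "count"),
        median_price=("price", "median"),
        mean_availability=("availability_365", "mean"),
    )
    .sort_values("median_price", ascending=False)
)

print(area_summary)
```

The median is used instead of the mean so that a few very expensive listings do not dominate an area's price level.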


# **Conclusion**

In conclusion, the comprehensive analysis of Airbnb booking data has provided valuable insights into market dynamics, customer behavior, and strategic opportunities for hosts and investors. By implementing the recommended strategies, hosts can optimize their pricing, enhance listing visibility, and effectively manage seasonal demand, ultimately improving occupancy rates and revenue. For investors, understanding the most lucrative neighborhoods and property types will facilitate informed decisions to maximize return on investment.

This project underscores the importance of data-driven decision-making in the competitive landscape of short-term rental markets. Moving forward, ongoing monitoring of market trends and guest preferences, and adapting strategies in response, will be essential to maintain competitiveness and achieve long-term success in the Airbnb ecosystem.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***