# **Airbnb Booking Analysis**



##### **Project Type**    - EDA
##### **Contribution**    - Individual/Team


# **Project Summary -**

This project involved analyzing the 'Airbnb NYC 2019' dataset to uncover insights that can drive business growth for Airbnb hosts and the platform. The dataset contains information on over 48,000 listings in New York City, including details on pricing, location, room type, availability, and customer reviews.

The primary objectives were to optimize pricing strategies, enhance customer satisfaction, and identify growth opportunities. Key analyses included exploring correlations between variables, identifying demand patterns, and segmenting the market based on listing characteristics.

By utilizing visualizations such as pair plots and bar charts, we identified important trends, such as the impact of location on pricing and the influence of customer reviews on booking rates. These insights can guide data-driven decisions, leading to better pricing models, targeted marketing efforts, and overall improved competitiveness in the market.

The project highlights the importance of leveraging data to inform business strategies, ultimately helping Airbnb hosts and the platform maximize revenue, improve customer experiences, and maintain a strong market presence in New York City.



# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


**Write Problem Statement Here.**

#### **Define Your Business Objective?**

Answer Here.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. For each chart, make sure the following format is answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


### Dataset Loading

In [None]:
df = pd.read_csv('/content/Airbnb NYC 2019.csv')
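The guidelines above ask for exception handling, so here is a minimal sketch of a safer loading step. The helper name `load_listings` is hypothetical and not part of the original notebook; it only wraps `pd.read_csv` with a path check and a clearer error for an empty file.

```python
from pathlib import Path

import pandas as pd


def load_listings(csv_path):
    """Hypothetical helper: read the listings CSV with clearer error messages."""
    path = Path(csv_path)
    # Fail early with a readable message if the file is not where we expect it.
    if not path.is_file():
        raise FileNotFoundError(f"Dataset not found at: {path}")
    try:
        return pd.read_csv(path)
    except pd.errors.EmptyDataError as exc:
        # Raised by pandas when the file exists but contains no columns.
        raise ValueError(f"Dataset at {path} is empty") from exc
```

In this notebook it could be called as `df = load_listings('/content/Airbnb NYC 2019.csv')`, assuming the file sits at that Colab path.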

### Dataset First View

In [None]:
df.head()

### Dataset Rows & Columns count

In [None]:
df.shape

### Dataset Information

In [None]:
df.info()

#### Duplicate Values

In [None]:
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
df.isnull().sum()

In [None]:
df.isnull()
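A raw null count is easier to act on when it is shown next to the share of rows affected. The sketch below is one possible way to do that; `missing_summary` and the tiny `demo` frame are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def missing_summary(frame):
    """Count and percentage of missing values, only for columns that have any."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Drop complete columns and show the worst offenders first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Made-up rows for illustration only; in the notebook, pass df instead.
demo = pd.DataFrame({
    "name": ["A", None, "C", "D"],
    "price": [100, 80, 120, 60],
    "reviews_per_month": [0.5, np.nan, np.nan, 1.2],
})
print(missing_summary(demo))
```

Calling `missing_summary(df)` on the real data would list only the columns that need attention during wrangling.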

### What did you know about your dataset?

Answer Here: This is an Airbnb dataset of New York City listings. It provides the columns id, name, host_id, host_name, neighbourhood_group, neighbourhood, latitude, longitude, room_type, price, minimum_nights, number_of_reviews, last_review, reviews_per_month, calculated_host_listings_count, and availability_365. Before analysis, the data needs cleaning, such as handling missing values and correcting data types, so that useful information can be extracted from it.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns

In [None]:
df.columns

In [None]:
# Dataset Describe

In [None]:
df.describe()

### Variables Description

* id: Unique identifier for each listing.
* name: Name of the listing.
* host_id: ID of the host.
* host_name: Name of the host.
* neighbourhood_group: Larger geographical area (e.g., boroughs of NYC).
* neighbourhood: Specific neighborhood within the borough.
* latitude: Latitude of the listing.
* longitude: Longitude of the listing.
* room_type: Type of room (e.g., Entire home/apt, Private room).
* price: Price per night.
* minimum_nights: Minimum number of nights required for booking.
* number_of_reviews: Total number of reviews received.
* last_review: Date of the last review.
* reviews_per_month: Number of reviews per month.
* calculated_host_listings_count: Number of listings the host has.
* availability_365: Number of available days in a year.


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

In [None]:
values_uni = df.apply(lambda col: col.unique())
values_uni
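Printing every unique value can be hard to read for columns such as `id` or `name`. As a compact complement, the sketch below reports only the number of distinct values per column; `unique_summary` and the `demo` frame are assumptions made for illustration.

```python
import pandas as pd


def unique_summary(frame):
    """One row per column: its dtype and its number of distinct non-null values."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_unique": frame.nunique(),
    }).sort_values("n_unique")


# Made-up rows for illustration only; in the notebook, pass df instead.
demo = pd.DataFrame({
    "room_type": ["Private room", "Entire home/apt", "Private room"],
    "price": [60, 150, 75],
})
print(unique_summary(demo))
```

Columns with very few distinct values (for example `room_type`) are natural candidates for `hue` in the charts that follow.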


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

In [None]:
df['last_review'] = df['last_review'].fillna(df['last_review'].mode().iloc[0])

In [None]:
df['reviews_per_month'] = df['reviews_per_month'].fillna(df['reviews_per_month'].mode().iloc[0])

In [None]:
df['host_name'] = df['host_name'].fillna(df['host_name'].mode().iloc[0])

In [None]:
df['name'] = df['name'].fillna(df['name'].mode().iloc[0])

In [None]:
df.dtypes

In [None]:
df['last_review'] = pd.to_datetime(df['last_review'])

In [None]:
df['Year'] = df['last_review'].dt.year

In [None]:
df['Months'] = df['last_review'].dt.month_name()

### What all manipulations have you done and insights you found?

Answer Here:- Some columns had data types that did not match their values, so I converted them; for example, last_review was converted from text to datetime. Several columns (name, host_name, last_review, and reviews_per_month) contained NaN values, which I filled with the mode (the most frequently occurring value in the column). I also created two new columns, Year and Months, from the existing last_review column.
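The wrangling cells above fill gaps with the column mode. A different technique, sketched here as an alternative rather than as what the notebook does, is to treat a listing without reviews explicitly: a review rate of zero and no review date. The function name `impute_reviews`, the `has_review` flag, and the `demo` rows are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def impute_reviews(frame):
    """Alternative to mode filling: represent 'no reviews yet' explicitly."""
    out = frame.copy()
    # A listing that has never been reviewed has a review rate of zero.
    out["reviews_per_month"] = out["reviews_per_month"].fillna(0)
    # Keep missing review dates as NaT instead of assigning a borrowed date.
    out["last_review"] = pd.to_datetime(out["last_review"], errors="coerce")
    # Flag the rows so later charts can include or exclude them on purpose.
    out["has_review"] = out["last_review"].notna()
    return out


# Made-up rows for illustration only.
demo = pd.DataFrame({
    "number_of_reviews": [9, 0, 45],
    "last_review": ["2018-10-19", None, "2019-05-21"],
    "reviews_per_month": [0.21, np.nan, 0.38],
})
print(impute_reviews(demo))
```

With this variant, year and month charts could be drawn on `frame[frame["has_review"]]` so that unreviewed listings do not pile up in a single year or month.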

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Create the figure before plotting so that figsize applies to this chart
plt.figure(figsize=(10, 6))
sns.barplot(data=df, x='Year', y='availability_365', hue='room_type')
plt.title("Room Availability by Year of Last Review")
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here:- The bar chart was chosen to easily compare how the availability of different room types changes across the years. It visually shows differences and trends in room availability, making it simple to understand.

##### 2. What is/are the insight(s) found from the chart?

Answer Here: The chart reveals whether certain room types, like "Entire home/apt" or "Private room," have become more or less available over time, highlighting trends in how room offerings have shifted.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here :- The insights help identify which room types are in demand, guiding decisions to focus on popular options. If availability is dropping for some room types, it could be a warning sign to address potential issues and avoid losing business.


#### Chart - 2

In [None]:
# Chart - 2 visualization code
# Create the figure before plotting so that figsize applies to this chart
plt.figure(figsize=(10, 6))
sns.lineplot(data=df, x="Year", y="price", hue='room_type')
plt.title('Price by Year of Last Review')
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here:- The line chart was chosen to track and compare price changes over the years for different room types. It clearly shows trends and fluctuations in prices over time.

##### 2. What is/are the insight(s) found from the chart?

Answer Here: The chart reveals how the prices for each room type, like "Entire home/apt" or "Private room," have increased or decreased over the years, highlighting patterns or sudden changes in pricing.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.  

Answer Here: These insights help identify trends in room pricing, allowing businesses to adjust strategies to maximize revenue. If prices for a certain room type are rising steadily, it suggests growing demand, whereas a drop might indicate reduced interest or increased competition.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
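# Create the figure before plotting so that figsize applies to this chart
plt.figure(figsize=(12, 6))
sns.countplot(data=df, x="calculated_host_listings_count", hue='room_type', palette='viridis')
plt.xticks(rotation=90)
plt.title("Host Listings Count by Room Type")
plt.show()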

##### 1. Why did you pick the specific chart?

Answer Here:- The count plot was chosen to visualize the distribution of the number of listings each host manages, broken down by room type. It helps to see how many hosts manage multiple listings and what types they are.

##### 2. What is/are the insight(s) found from the chart?

Answer Here: The chart shows how many hosts have different numbers of listings and which room types are more common among hosts with many or few listings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here: This insight helps understand the scale of operations by hosts. If many hosts manage multiple listings of a specific room type, it may indicate a profitable strategy. However, a high concentration of listings by a few hosts might signal potential market control or vulnerability if those hosts leave the platform.
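The concentration concern can be put into a single number. Because each row of the dataset is one listing, a count plot over `calculated_host_listings_count` counts listings rather than hosts, so a per-host tally is a useful cross-check. The sketch below is a hypothetical helper (`top_host_share`) run on made-up rows, not a result from the real data.

```python
import pandas as pd


def top_host_share(frame, n_hosts=10):
    """Fraction of all listings that belong to the n_hosts largest hosts."""
    # value_counts returns hosts sorted from most to fewest listings.
    listings_per_host = frame["host_id"].value_counts()
    return listings_per_host.head(n_hosts).sum() / len(frame)


# Made-up host ids for illustration only; in the notebook, pass df instead.
demo = pd.DataFrame({"host_id": [1, 1, 1, 2, 2, 3, 4, 5]})
print(top_host_share(demo, n_hosts=2))
```

A value close to 1 would mean a handful of hosts control most of the supply; a small value would suggest the market is spread across many independent hosts.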

#### Chart - 4

In [None]:
# Chart - 4 visualization code
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='longitude', y='latitude', hue='neighbourhood_group', palette='viridis', s=50 )
plt.title('Geographical Distribution of Listings')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here:- The scatter plot was chosen to visualize the geographical spread of listings across different neighborhoods in NYC. It helps to see how listings are distributed on the map.

##### 2. What is/are the insight(s) found from the chart?

Answer Here: The chart shows where most listings are concentrated within each neighborhood group, revealing patterns in location popularity across the city.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here :- These insights help identify hotspots for listings, guiding decisions on where to focus marketing efforts or expand offerings. It also shows potential areas with fewer listings that could be targeted for growth.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
# Create the figure before plotting so that figsize applies to this chart
plt.figure(figsize=(10, 6))
sns.kdeplot(data=df, x='price', hue='neighbourhood_group')
plt.title("Price Distribution by Neighbourhood Group")
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here: The KDE plot was chosen to compare the distribution of prices across different neighborhood groups. It helps in visualizing where most listings are priced within each neighborhood.

##### 2. What is/are the insight(s) found from the chart?

Answer Here:
The chart shows how prices are distributed in each neighborhood group, revealing which areas have higher or lower-priced listings.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here :- These insights can guide pricing strategies by showing where pricing is competitive or where there's room to adjust. It also helps in identifying neighborhoods with potentially undervalued or overpriced listings, which can be crucial for maximizing revenue.
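Density curves of price can be dominated by a few very expensive listings, so it may help to back the chart with per-group quartiles computed after trimming the extreme tail. This is a sketch under assumptions: the 99th-percentile cutoff, the helper name `price_quantiles`, and the `demo` rows are illustrative choices, not values taken from the notebook.

```python
import pandas as pd


def price_quantiles(frame, group_col="neighbourhood_group", upper_q=0.99):
    """Per-group price quartiles after dropping prices above a global quantile."""
    # Global cutoff: anything above this price is treated as an extreme outlier.
    cap = frame["price"].quantile(upper_q)
    trimmed = frame[frame["price"] <= cap]
    # Rows are groups; columns are the 0.25, 0.5 and 0.75 quantiles.
    return trimmed.groupby(group_col)["price"].quantile([0.25, 0.5, 0.75]).unstack()


# Made-up prices for illustration only, including one extreme value.
demo = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 4 + ["Brooklyn"] * 4,
    "price": [150, 200, 250, 10000, 80, 100, 120, 140],
})
print(price_quantiles(demo))
```

The same cutoff could also be applied before calling `sns.kdeplot`, so the curves are not stretched by the long right tail.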

#### Chart - 6

In [None]:
# Chart - 6 visualization code
# Create the figure before plotting so that figsize applies to this chart
plt.figure(figsize=(10, 6))
# Bar height is the mean year of last review for each neighbourhood group and room type
sns.barplot(data=df, x='neighbourhood_group', y='Year', hue='room_type')
plt.title("Neighbourhood Group by Year")
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here:- I chose the bar plot to visualize the relationship between "neighbourhood group" and "year" with respect to the "room_type." Bar plots are effective in comparing categorical data across different groups, which makes it easier to understand how the distribution of room types varies across different neighbourhoods and over time.

##### 2. What is/are the insight(s) found from the chart?

Answer Here :- From the chart, one can observe the distribution of room types across different neighbourhood groups over the years. This can reveal trends such as which neighbourhoods have a higher concentration of specific room types or how the popularity of certain room types has evolved in particular areas over time.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here:  Yes, the insights can help create a positive business impact. For instance, if a particular room type is more popular in a specific neighbourhood group, businesses can target that area with marketing strategies tailored to that preference. Conversely, if the data shows a decline in a particular room type in certain neighbourhoods, it might indicate oversaturation or shifting consumer preferences, which could negatively impact growth if not addressed. Understanding these trends allows businesses to optimize their offerings and better meet customer demands, thus driving growth.

#### Chart - 7

In [None]:
# Create the figure before plotting so that figsize applies to this chart
plt.figure(figsize=(10, 6))
sns.boxenplot(data=df, x='price', hue="Year", palette="viridis")
plt.title("Price by Year")
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here:- The boxen plot was chosen because it effectively displays the distribution of prices across different years while also highlighting potential outliers. It is particularly useful for identifying the overall trend and range of prices, especially when dealing with a large dataset where extreme values might influence the analysis.

##### 2. What is/are the insight(s) found from the chart?

Answer Here:- The chart allows us to observe the distribution and variation of prices over the years. Key insights could include identifying whether prices have generally increased, decreased, or remained stable over time. Additionally, the chart could reveal any significant outliers or trends in price distribution across different years.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.



Answer Here:- The insights gained from this chart can indeed help create a positive business impact. For instance, if the analysis shows an upward trend in prices, it could indicate a growing market demand, allowing businesses to adjust pricing strategies accordingly. On the other hand, if the chart reveals a decline in prices, it could suggest market saturation or decreased demand, potentially leading to negative growth. Understanding these trends allows businesses to strategize and make informed decisions about pricing, marketing, and investment.


#### Chart - 8

In [None]:
# Chart - 8 visualization code
top_scores = df.sort_values(by='price', ascending=False).drop_duplicates(subset='name').nlargest(10, 'price')
sns.barplot(data=top_scores,x='name',y='price',hue='room_type',palette='inferno')
plt.title("Top 10 Most Expensive Listings")
plt.xticks(rotation=90)
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here:- The bar plot was chosen because it effectively visualizes categorical data against a numerical variable. It allows us to clearly see which listings have the highest prices and how they are distributed across different room types, providing a straightforward comparison.


##### 2. What is/are the insight(s) found from the chart?

Answer Here:- The chart reveals the top 10 most expensive Airbnb listings in New York City. Additionally, it shows the room type associated with each of these high-priced listings. This insight can highlight which types of properties command the highest prices, indicating what kind of properties are in high demand or seen as luxurious.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here:- Yes, the gained insights can have a positive business impact. Understanding which properties are able to command higher prices can help hosts and Airbnb optimize pricing strategies, target luxury markets, and enhance features in listings that are proven to attract high-paying customers. However, if these high prices are out of sync with customer expectations or the value provided, it could lead to negative growth due to customer dissatisfaction or negative reviews. Therefore, it's crucial to ensure that high prices are justified by the quality and uniqueness of the experience offered.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
room_type_counts = df['room_type'].value_counts()

# Create the figure before plotting so that figsize applies to this chart
plt.figure(figsize=(6, 6))
plt.pie(room_type_counts, labels=room_type_counts.index, autopct='%1.1f%%', startangle=140)
plt.title('Distribution of Room Types')
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here:- The pie chart was chosen because it effectively visualizes the proportional distribution of categorical data—in this case, the different types of rooms available on Airbnb. It allows for a quick and intuitive understanding of the relative share of each room type in the dataset.

##### 2. What is/are the insight(s) found from the chart?



Answer Here:- The chart reveals that the majority of Airbnb listings in New York City are either "Entire home/apt" or "Private room" types, with these two categories dominating the market. "Shared room" listings represent a much smaller portion of the overall listings.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here:- The insights can help create a positive business impact by guiding hosts or property managers on the types of rooms they should focus on offering to meet market demand. Since "Entire home/apt" and "Private room" dominate, focusing on these categories could maximize occupancy rates. Conversely, a heavy reliance on "Shared room" types may lead to negative growth if market demand shifts further towards more private accommodations, which seems to be the trend.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
# Create the figure before plotting so that figsize applies to this chart
plt.figure(figsize=(10, 6))
sns.countplot(data=df, x="Months", hue='Months', legend=False, palette='viridis')
plt.title("Listings by Month of Last Review")
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here:- The count plot was chosen to visualize how listings are distributed across the months of their last review, which serves here as a rough proxy for booking activity. It is particularly effective in highlighting which months have higher or lower counts, providing a clear visual comparison of seasonality in the data.


##### 2. What is/are the insight(s) found from the chart?

Answer Here:- The chart reveals how Airbnb bookings are distributed throughout the year. Peaks in certain months might indicate high demand periods, while dips suggest off-peak times. This insight is valuable for understanding seasonal trends in the Airbnb market in NYC.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here :- Yes, the gained insights can lead to a positive business impact. By identifying the peak and off-peak months, hosts can adjust their pricing strategies, offering discounts during slower months to increase occupancy or raising prices during high-demand periods. This strategic pricing can optimize revenue. Conversely, not acting on this information could result in lost revenue opportunities during peak times and higher vacancy rates during off-peak times, potentially leading to negative growth.
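Seasonal patterns are easier to read when months appear in calendar order rather than in the order they happen to occur in the data. The sketch below builds that order and applies it to the month counts; `MONTH_ORDER`, `monthly_counts`, and the `demo` rows are assumptions for illustration.

```python
import pandas as pd

# English month names in calendar order, matching what dt.month_name() produces.
MONTH_ORDER = pd.date_range("2019-01-01", periods=12, freq="MS").month_name().tolist()


def monthly_counts(frame, month_col="Months"):
    """Count rows per month name, returned in calendar order with zeros kept."""
    return frame[month_col].value_counts().reindex(MONTH_ORDER, fill_value=0)


# Made-up rows for illustration only; in the notebook, pass df instead.
demo = pd.DataFrame({"Months": ["June", "January", "June", "July"]})
print(monthly_counts(demo))
```

The same list can be passed to seaborn as `order=MONTH_ORDER` in `sns.countplot` so that the bars run from January to December.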

#### Chart - 11

In [None]:
grouped_data = df.groupby('Months')["minimum_nights"].sum().reset_index()
grouped_data

In [None]:
# Chart - 11 visualization code
# Create the figure before plotting so that figsize applies to this chart
plt.figure(figsize=(6, 6))
plt.pie(grouped_data['minimum_nights'], labels=grouped_data['Months'], autopct='%1.1f%%', startangle=140)
plt.title("Total Minimum Nights by Month")
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here:- The pie chart was selected because it effectively visualizes the proportion of total minimum nights spent by customers across different months. This allows for a clear and immediate understanding of which months have the highest and lowest occupancy rates in terms of minimum nights spent.

##### 2. What is/are the insight(s) found from the chart?

Answer Here:- The insights derived from the chart show the distribution of minimum nights spent across the year. This could reveal peak seasons where customers stay longer and off-peak seasons where the stay duration is shorter. Identifying these trends can help businesses adjust their pricing or marketing strategies according to seasonal demand.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here:- The gained insights can lead to a positive business impact by allowing businesses to optimize their operations during peak months, perhaps by raising prices or offering special packages to attract longer stays during off-peak months. However, if a significant portion of nights is concentrated in a few months, it could indicate underutilization during other months, leading to potential negative growth if not addressed. Balancing occupancy rates throughout the year would be crucial for sustained revenue growth.

#### Chart - 12

In [None]:
# displot is figure-level, so its size is set with height rather than plt.figure
plot = sns.displot(data=df, x="Months", hue="room_type", col="room_type", height=4)
plot.set_xticklabels(rotation=90)
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here:- The distribution plot (displot) with hue="room_type" and col="room_type" was chosen to visually compare the distribution of different room types across months. This chart is effective for identifying seasonal patterns and understanding how room availability or popularity fluctuates over time for each room type. By separating the data by room type, the plot allows for a clear comparison without overcrowding the visualization.

##### 2. What is/are the insight(s) found from the chart?

Answer Here:- The insights from the chart could include observing which room types have more consistent bookings throughout the year and identifying any seasonal peaks or drops in demand. For example, if the chart shows that certain room types are more popular in the summer months, it might indicate a seasonal trend where demand increases due to tourism.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here:- The gained insights can lead to a positive business impact by enabling hosts and property managers to adjust pricing strategies or availability based on seasonal demand. For example, higher demand periods might justify higher prices, while lower demand periods might require promotional offers to attract guests. However, if the chart reveals a consistent drop in demand for certain room types during specific months, this could indicate potential negative growth, requiring a reassessment of marketing strategies or property improvements to attract more bookings during those times.

#### Chart - 13

In [None]:
# Listing count and total price per month; reset_index keeps 'Months' as a plottable column
group_data = df.groupby('Months')[['name', 'price']].agg({'name': 'count', 'price': 'sum'}).sort_values(by='name').reset_index()
plt.figure(figsize=(10, 6))
sns.barplot(data=group_data, x='Months', y='name', hue='price', palette='viridis')
plt.xticks(rotation=90)
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here:- The bar plot was chosen to visually compare the number of listings across months, with the total price as a hue, allowing for a clear assessment of seasonal trends and price variations.

##### 2. What is/are the insight(s) found from the chart?

Answer Here:- The chart highlights the months with the highest and lowest listing counts, and indicates if these months correspond to higher total prices, suggesting potential peak and off-peak periods.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here:- The insights can drive pricing strategies, maximizing revenue during peak months. However, identifying low-demand periods may indicate potential for negative growth unless strategic discounts or promotions are implemented.


#### Chart - 14 - Correlation Heatmap

In [None]:
numerical_columns = ['price','minimum_nights','number_of_reviews','reviews_per_month',
                     'calculated_host_listings_count','availability_365']
corr_matrix = df[numerical_columns].corr()
plt.figure(figsize=(10,6))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix of Numerical Features')
plt.show()


##### 1. Why did you pick the specific chart?

Answer Here:- The correlation heatmap was chosen to visualize the relationships between multiple numerical features in the dataset, helping to quickly identify strong positive or negative correlations that could influence pricing and availability strategies.

##### 2. What is/are the insight(s) found from the chart?

Answer Here:- The heatmap reveals that 'price' has a low correlation with most other numerical features, while 'minimum_nights' and 'availability_365' show some moderate correlations with other factors like 'reviews_per_month' and 'number_of_reviews.'
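Reading the strongest relationships off a heatmap by eye is error-prone, so a ranked list of pairs can complement it. The sketch below is one way to extract such a list from a correlation matrix; `strongest_pairs` and the `demo` columns are hypothetical and the numbers are not from the Airbnb data.

```python
from itertools import combinations

import pandas as pd


def strongest_pairs(frame, top_n=5):
    """Rank distinct column pairs by the absolute value of their correlation."""
    corr = frame.corr(numeric_only=True)
    # combinations() yields each unordered pair once and skips the diagonal.
    rows = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)]
    pairs = pd.DataFrame(rows, columns=["feature_a", "feature_b", "corr"])
    order = pairs["corr"].abs().sort_values(ascending=False).index
    return pairs.loc[order].head(top_n).reset_index(drop=True)


# Made-up numbers for illustration only; in the notebook, pass df[numerical_columns].
demo = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],
    "z": [5, 3, 4, 1, 2],
})
print(strongest_pairs(demo))
```

Sorting by absolute value keeps strong negative relationships near the top, which a quick glance at heatmap colors can miss.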

#### Chart - 15 - Pair Plot

In [None]:
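# Markers are assigned per hue level, so hue is required; three markers assume three room types
pair_cols = ['price', 'minimum_nights', 'number_of_reviews', 'availability_365']
sns.pairplot(data=df, vars=pair_cols, hue='room_type', markers=['o', 's', 'D'])
plt.show()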

##### 1. Why did you pick the specific chart?

Answer Here:- The pair plot was chosen because it provides a comprehensive view of the relationships between multiple numerical variables, making it easier to spot trends, correlations, and outliers in the dataset.

##### 2. What is/are the insight(s) found from the chart?

Answer Here:- The pair plot likely reveals correlations between variables, such as price, number of reviews, and availability, helping to identify factors that may influence pricing and occupancy rates in Airbnb listings.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

1. Optimize Pricing Strategies:
Analyze the correlation between price and other factors like location, room type, and availability to set competitive pricing. For example, adjusting prices based on seasonal trends and demand in specific neighborhoods can maximize occupancy and revenue.
2. Enhance Customer Experience:
Identify patterns in customer reviews and ratings to improve service quality. Listings with high review counts and ratings can be studied to replicate successful features across other listings, boosting overall guest satisfaction.
3. Targeted Marketing:
Utilize insights from the distribution of room types and locations to tailor marketing efforts. For example, promoting listings in high-demand neighborhoods or unique property types can attract more guests.
4. Expand High-Demand Listings:
Based on the availability and booking trends, consider expanding or acquiring more listings in areas with consistent high demand. This can lead to increased market share and revenue growth.
5. Monitor and Adjust to Market Trends:
Regularly analyze new data to stay updated with market trends, such as changes in booking patterns or guest preferences, and adjust the business strategies accordingly to maintain competitiveness.

Implementing these strategies can lead to improved occupancy rates, better customer satisfaction, and ultimately, higher profitability.
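As a starting point for the pricing recommendation, a host could compare a listing against the typical price for its borough and room type. The sketch below builds such a reference table with the median, which is less sensitive to extreme prices than the mean; `median_price_table` and the `demo` rows are illustrative assumptions, not figures from the dataset.

```python
import pandas as pd


def median_price_table(frame):
    """Median nightly price for each neighbourhood group and room type."""
    return frame.pivot_table(
        index="neighbourhood_group",
        columns="room_type",
        values="price",
        aggfunc="median",
    )


# Made-up rows for illustration only; in the notebook, pass df instead.
demo = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn", "Brooklyn"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Entire home/apt", "Private room", "Private room"],
    "price": [200, 300, 90, 150, 60, 80],
})
print(median_price_table(demo))
```

A listing priced well above its cell in this table may need stronger reviews or amenities to justify the premium, while one priced well below it may have room to increase.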

# **Conclusion**

The analysis of the 'Airbnb NYC 2019' dataset has provided valuable insights into the dynamics of the Airbnb market in New York City. By exploring correlations between price, room types, location, and availability, we can identify key factors that influence booking rates and pricing strategies. These insights enable targeted improvements in pricing, marketing, and customer experience, which are crucial for maximizing occupancy and revenue.

Ultimately, the project demonstrates how data-driven strategies can enhance business performance, ensuring that listings are competitively priced, well-promoted, and aligned with market demand. This approach not only drives positive business outcomes but also helps in maintaining a competitive edge in the ever-evolving short-term rental market.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***