In [None]:
# Display the Airbnb logo

from IPython.display import Image
Image(url='https://upload.wikimedia.org/wikipedia/commons/thumb/6/69/Airbnb_Logo_B%C3%A9lo.svg/1200px-Airbnb_Logo_B%C3%A9lo.svg.png')

# **Airbnb**


##### **Project Type**    - EDA
##### **Contribution**    - Individual
**Name :** Hitesh Kumar Choudhary

# **Project Summary -**

The Airbnb booking analysis explores various factors affecting listings, guest behavior, pricing, and availability on the platform. With the aim of uncovering trends in booking patterns and identifying areas for optimization, the analysis primarily focuses on understanding what drives successful listings, how location and room type influence bookings, and the relationship between pricing strategies and guest demand. By delving into these aspects, the analysis provides valuable insights that can help both hosts and Airbnb itself improve their strategies.

**Room Types and Availability**
One of the key factors behind Airbnb bookings is the type of room offered. Entire homes/apartments make up the largest share of listings, followed closely by private rooms, with shared rooms a small minority. This may reflect demand for privacy and independence among travelers: entire homes suit families and groups, while private rooms attract solo travelers or couples looking for a more affordable option that still offers some privacy.

Regarding availability, many listings are open for booking on only a few days each year, often fewer than 30. This could reflect seasonal renting, hosts who list their properties only briefly, or calendars that are already largely booked. Because the dataset is a single snapshot, availability_365 cannot show when demand peaks (for example, in summer or over the holidays); it only records how many days of the year each listing is open. Some areas have listings available year-round, but whether those listings achieve higher occupancy cannot be determined from availability alone.
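The "fewer than 30 days" observation can be quantified directly. A minimal sketch, assuming a DataFrame with the `availability_365` column described in the dataset; `availability_summary` is a hypothetical helper, and the toy frame only stands in for the real listings:

```python
import pandas as pd

def availability_summary(df: pd.DataFrame) -> dict:
    """Share of listings in a few availability_365 ranges (hypothetical helper)."""
    avail = df["availability_365"]
    return {
        "zero_days": float((avail == 0).mean()),              # not open at all
        "under_30_days": float(avail.between(1, 29).mean()),  # open 1-29 days (inclusive)
        "year_round": float((avail >= 365).mean()),           # open every day
    }

# Toy data for illustration only; replace with the real listings DataFrame.
toy = pd.DataFrame({"availability_365": [0, 0, 5, 20, 120, 365, 365, 200]})
print(availability_summary(toy))
```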

**Pricing Trends and Influence of Location**
Price is another critical element influencing booking behavior. Nightly prices, which hosts set, vary greatly with room type, location, and property features. Listings in prime locations such as central Manhattan or near tourist attractions typically charge higher rates. These properties often see higher demand, but they also face fierce competition. Listings in less central areas tend to be cheaper, which may appeal to budget-conscious travelers or those seeking a quieter stay.

# **Problem Statement**


1. Are private rooms preferred over other room types?
2. Is the Manhattan neighbourhood group preferred over other neighbourhood groups?


#### **Define Your Business Objective?**

Increase User Acquisition and Engagement: Attract a growing number of hosts and guests to the platform, with the aim of increasing monthly active users (MAUs) by 20% within the first year.

Expand Market Presence and Geographic Reach: Expand the platform's presence in key cities and countries to compete with global players in the vacation rental space.

Enhance User Experience and Satisfaction: Ensure the platform provides a seamless, easy-to-use experience for both guests and hosts, increasing customer satisfaction.

Ensure Safety, Trust, and Legal Compliance: Build a secure environment where users feel safe while adhering to local laws and regulations.

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')
file_path = '/content/drive/MyDrive/PYTHON - DATA SCIENCE/Projects/Module 2/Airbnb NYC 2019.csv'
df = pd.read_csv(file_path)

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
plt.figure(figsize=(10,6))
sns.heatmap(df.isnull(),cbar=False,cmap='viridis')
plt.title('Missing Values Heatmap')
plt.show()

### What did you know about your dataset?

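1. Imported the required libraries (pandas, NumPy, Matplotlib, seaborn)
2. Checked the shape of the dataset (rows and columns)
3. Inspected the data type of each column
4. Counted null/missing values in each column and visualized them with a heatmap
5. Checked for duplicate rows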
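The checks above can be collected into one table. A minimal sketch, assuming any pandas DataFrame; `data_quality_report` is a hypothetical helper, and the toy frame is illustrative only:

```python
import numpy as np
import pandas as pd

def data_quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, missing count, missing %, distinct non-null values."""
    report = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isnull().sum(),
        "missing_pct": (df.isnull().mean() * 100).round(2),
        "unique": df.nunique(),  # NaN is not counted as a value
    })
    return report.sort_values("missing", ascending=False)

# Toy data for illustration only.
toy = pd.DataFrame({
    "name": ["Cozy flat", None, "Loft", "Loft"],
    "price": [100, 80, np.nan, 150],
    "room_type": ["Entire home/apt", "Private room", "Private room", "Private room"],
})
print(data_quality_report(toy))
print("Duplicate rows:", toy.duplicated().sum())
```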

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

The dataset contains the following 16 variables:

1. id: Unique identifier for each listing
2. name: Name of the listing (e.g., "Cozy Apartment")
3. host_id: Unique identifier for the host
4. host_name: Name of the host
5. neighbourhood_group: The broader area where the listing is located
6. neighbourhood: Specific neighborhood of the listing
7. latitude: Geographical latitude of the listing
8. longitude: Geographical longitude of the listing
9. room_type: Type of room (e.g., Entire home/apt, Private room, Shared room)
10. price: Price per night
11. minimum_nights: Minimum number of nights required for booking
12. number_of_reviews: Total reviews received for the listing
13. last_review: Date of the most recent review
14. reviews_per_month: Average number of reviews received per month
15. calculated_host_listings_count: Total number of listings by the host
16. availability_365: Number of days the listing is available in a year
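Because later cells assume these exact column names (Chart 9, for instance, looks for a `rating` column that is not in this list), a quick schema check can catch mismatches early. A minimal sketch, assuming the CSV header matches the names listed above; `EXPECTED_COLUMNS` and `check_schema` are hypothetical names:

```python
import pandas as pd

# Column names as listed above (assumed to match the CSV header exactly).
EXPECTED_COLUMNS = [
    "id", "name", "host_id", "host_name", "neighbourhood_group", "neighbourhood",
    "latitude", "longitude", "room_type", "price", "minimum_nights",
    "number_of_reviews", "last_review", "reviews_per_month",
    "calculated_host_listings_count", "availability_365",
]

def check_schema(df: pd.DataFrame, expected=EXPECTED_COLUMNS) -> dict:
    """Report expected columns that are missing and unexpected extra columns."""
    actual = set(df.columns)
    return {
        "missing": [c for c in expected if c not in actual],
        "extra": sorted(actual - set(expected)),
    }

# Toy frame: drops availability_365 and adds a column the dataset does not have.
toy = pd.DataFrame(columns=EXPECTED_COLUMNS[:-1] + ["rating"])
print(check_schema(toy))
# {'missing': ['availability_365'], 'extra': ['rating']}
```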

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# Dropping irrelevant columns
df = df.drop(columns=['host_id', 'host_name', 'last_review'])

# Handling missing values
df['reviews_per_month'] = df['reviews_per_month'].fillna(0)  # Replace NaN with 0
df = df.dropna(subset=['price', 'neighbourhood_group'])  # Drop rows where price or neighbourhood is NaN

# Converting data types
df['price'] = df['price'].astype(float)
df['neighbourhood_group'] = df['neighbourhood_group'].astype('category')

# Creating new features
df['high_price'] = df['price'] > 150  # Creating a binary variable for high price
df['availability_rate'] = df['availability_365'] / 365  # Rate of availability as a proportion


### What manipulations have you done, and what insights did you find?


**Dropping Irrelevant Columns:** host_id, host_name and last_review were dropped because they are not needed for this analysis.

**Handling Missing Values:** Filled missing reviews_per_month with 0 to signify no reviews, which is more meaningful than NaN. Dropped rows with missing price or neighbourhood_group to ensure data quality for analysis.

**Data Type Conversion:** Converted the price column to float for accurate numerical operations, and converted neighbourhood_group to the category dtype for better handling in visualizations and analysis.

**Feature Engineering:** Created a high_price binary feature to easily filter listings that are considered expensive. Created an availability_rate feature to understand how frequently listings are available.

**Price Distribution:** Understanding the range of prices helps identify which listings are premium or budget-friendly.

**Availability Trends:** The availability rate shows what fraction of the year a listing is open for booking, which can inform pricing strategy. A low rate does not by itself mean a listing is heavily booked, since hosts can also block dates.

**High Price Listings:** Flagging listings priced above $150 per night makes it easy to target marketing strategies at the premium segment.
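Re-running the wrangling cell above fails once the dropped columns are gone, because `df.drop` raises a `KeyError` for missing labels. A minimal sketch of the same steps wrapped in a function that works on a copy and can be re-run safely; `prepare_listings` and its `high_price_threshold` parameter are hypothetical, and the 150 default simply mirrors the cutoff used above:

```python
import pandas as pd

def prepare_listings(raw: pd.DataFrame, high_price_threshold: float = 150) -> pd.DataFrame:
    """Apply the wrangling steps above to a copy of the listings data."""
    # errors="ignore" lets the function run again on already-cleaned data
    df = raw.drop(columns=["host_id", "host_name", "last_review"], errors="ignore").copy()
    df["reviews_per_month"] = df["reviews_per_month"].fillna(0)      # no reviews -> 0
    df = df.dropna(subset=["price", "neighbourhood_group"]).copy()
    df["price"] = df["price"].astype(float)
    df["neighbourhood_group"] = df["neighbourhood_group"].astype("category")
    df["high_price"] = df["price"] > high_price_threshold             # binary premium flag
    df["availability_rate"] = df["availability_365"] / 365            # fraction of the year open
    return df

# Toy data for illustration only.
raw = pd.DataFrame({
    "host_id": [1, 2, 3],
    "host_name": ["A", "B", "C"],
    "last_review": ["2019-01-01", None, "2019-05-02"],
    "neighbourhood_group": ["Manhattan", "Brooklyn", None],
    "price": [200, 90, 60],
    "reviews_per_month": [1.2, None, 0.5],
    "availability_365": [365, 73, 0],
})
clean = prepare_listings(raw)
print(clean)
```

In the notebook this would be called once as `df = prepare_listings(pd.read_csv(file_path))`.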

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
# Chart 1: Price Distribution by Room Type
plt.figure(figsize=(12, 6))
sns.boxplot(data=df, x='room_type', y='price')
plt.title('Price Distribution by Room Type')
plt.show()


##### 1. Why did you pick the specific chart?

A box plot shows the distribution of prices across room types, making medians, ranges, and outliers easy to identify.

##### 2. What is/are the insight(s) found from the chart?

Entire homes/apartments have a noticeably higher median price than private and shared rooms, with some very high outliers.
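The medians read off the box plot can also be printed as a table. A minimal sketch, assuming the `room_type` and `price` columns; `price_quantiles` is a hypothetical helper and the toy prices are illustrative:

```python
import pandas as pd

def price_quantiles(df: pd.DataFrame, by: str = "room_type") -> pd.DataFrame:
    """First quartile, median, third quartile and IQR of price per group."""
    q = df.groupby(by, observed=True)["price"].quantile([0.25, 0.5, 0.75]).unstack()
    q.columns = ["q1", "median", "q3"]
    q["iqr"] = q["q3"] - q["q1"]
    return q.sort_values("median", ascending=False)

# Toy data for illustration only; note the single 1000 outlier.
toy = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 4 + ["Private room"] * 4,
    "price": [100, 150, 200, 1000, 40, 60, 80, 100],
})
print(price_quantiles(toy))
```

Medians and quartiles are barely affected by the 1000 outlier, which is why they are a safer summary than the mean for this skewed price column.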

#### Chart - 2

In [None]:
# Chart 2: Average Price by Neighbourhood Group
plt.figure(figsize=(12, 6))
sns.barplot(data=df, x='neighbourhood_group', y='price', estimator=np.mean)
plt.title('Average Price by Neighbourhood Group')
plt.show()


##### 1. Why did you pick the specific chart?

A bar plot allows straightforward comparison of average prices across neighbourhood groups, highlighting geographical pricing trends.

##### 2. What is/are the insight(s) found from the chart?

Manhattan has the highest average price, while other neighbourhood groups such as Brooklyn and Queens are cheaper on average. This may reflect differences in demand by location; note that means are sensitive to a few very expensive listings.
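Because a handful of very expensive listings can pull the mean up, comparing mean and median per neighbourhood group shows how much of the gap is driven by outliers. A minimal sketch, assuming the `neighbourhood_group` (category dtype after wrangling) and `price` columns; `mean_vs_median` is a hypothetical helper and the toy data is illustrative:

```python
import pandas as pd

def mean_vs_median(df: pd.DataFrame, by: str = "neighbourhood_group") -> pd.DataFrame:
    """Mean, median and count of price per group, plus the mean/median ratio."""
    # observed=True keeps only categories that actually occur
    summary = df.groupby(by, observed=True)["price"].agg(["mean", "median", "count"])
    summary["mean_to_median"] = summary["mean"] / summary["median"]
    return summary.sort_values("median", ascending=False)

# Toy data for illustration only.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 4 + ["Queens"] * 3,
    "price": [150, 200, 250, 2000, 60, 80, 100],
})
toy["neighbourhood_group"] = toy["neighbourhood_group"].astype("category")
print(mean_vs_median(toy))
```

A mean/median ratio well above 1 signals that the average is being inflated by a few high prices.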

#### Chart - 3

In [None]:
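# Chart 3: Availability by Room Type
plt.figure(figsize=(12, 6))
df_filtered = df[df['availability_365'] > 0].copy()
# Group availability into bands; using the raw 365-valued column as hue
# would create hundreds of bars and an unreadable legend
df_filtered['availability_band'] = pd.cut(df_filtered['availability_365'],
                                          bins=[0, 90, 180, 270, 365],
                                          labels=['1-90', '91-180', '181-270', '271-365'])
sns.countplot(data=df_filtered, x='room_type', hue='availability_band')
plt.title('Room Type Availability Distribution (listings open at least 1 day)')
plt.show()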


##### 1. Why did you pick the specific chart?

A count plot visually represents the number of available listings by room type, making availability patterns clear.

##### 2. What is/are the insight(s) found from the chart?

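In absolute counts, entire homes appear more often than private or shared rooms among listings open for most of the year. Because entire homes are also the most numerous room type, this partly reflects overall supply rather than a higher availability rate per listing.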
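To separate supply from availability, the counts can be normalized within each room type so every row sums to 1. A minimal sketch, assuming the `room_type` and `availability_365` columns and the same bands as the chart; `availability_mix` is a hypothetical helper and the toy data is illustrative:

```python
import pandas as pd

def availability_mix(df: pd.DataFrame) -> pd.DataFrame:
    """Share of each room type's open listings falling in each availability band."""
    is_open = df["availability_365"] > 0
    bands = pd.cut(df["availability_365"], bins=[0, 90, 180, 270, 365],
                   labels=["1-90", "91-180", "181-270", "271-365"])
    # normalize="index" divides each row by its total, so room types of
    # different sizes become directly comparable
    return pd.crosstab(df.loc[is_open, "room_type"], bands[is_open], normalize="index")

# Toy data for illustration only.
toy = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 4 + ["Private room"] * 4,
    "availability_365": [365, 300, 10, 0, 30, 60, 200, 0],
})
print(availability_mix(toy).round(2))
```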

#### Chart - 4

In [None]:
# Chart 4: Price vs. Number of Reviews
plt.figure(figsize=(12, 6))
sns.scatterplot(data=df, x='number_of_reviews', y='price')
plt.title('Price vs. Number of Reviews')
plt.show()


##### 1. Why did you pick the specific chart?

A scatter plot shows the relationship between two continuous variables, in this case price and number of reviews.

##### 2. What is/are the insight(s) found from the chart?

Listings with many reviews are concentrated at lower prices. Since reviews are a rough proxy for completed stays, this suggests cheaper listings are booked more often, which may also indicate guest satisfaction; a scatter plot alone does not establish which way the relationship runs.
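Because price has extreme outliers, a rank-based (Spearman) correlation is a useful complement to the visual impression. A minimal sketch on illustrative toy data, computing Spearman as the Pearson correlation of the ranks so no extra dependency is needed:

```python
import pandas as pd

# Toy data for illustration only: cheap listings with many reviews plus one extreme price.
toy = pd.DataFrame({
    "price": [50, 60, 80, 120, 300, 5000],
    "number_of_reviews": [120, 90, 40, 15, 3, 0],
})

# Pearson measures linear association and is dominated by the 5000 outlier.
pearson = toy["price"].corr(toy["number_of_reviews"])
# Spearman = Pearson correlation of the ranks, so it captures any monotonic trend.
spearman = toy["price"].rank().corr(toy["number_of_reviews"].rank())
print(f"Pearson:  {pearson:.2f}")
print(f"Spearman: {spearman:.2f}")
```

On the real data the same two lines can be run on `df['price']` and `df['number_of_reviews']`.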

#### Chart - 5

In [None]:
# Chart 5: Distribution of Reviews
plt.figure(figsize=(12, 6))
sns.histplot(df['number_of_reviews'], bins=30, kde=True)
plt.title('Distribution of Number of Reviews')
plt.show()


##### 1. Why did you pick the specific chart?

A histogram with a kernel density estimate shows the distribution of review counts, revealing how often listings are reviewed.

##### 2. What is/are the insight(s) found from the chart?

Most listings have few reviews, with a long tail indicating a small number of listings receiving a high number of reviews.
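The long tail can be summarized with two numbers: the share of listings with no reviews and the share of all reviews held by the top 10% of listings. A minimal sketch on an illustrative toy series; `np.log1p` is a common transform (log of 1 + x, so zero stays zero) for plotting such skewed counts:

```python
import numpy as np
import pandas as pd

# Toy review counts for illustration only.
reviews = pd.Series([0, 0, 0, 1, 2, 3, 5, 8, 40, 400], name="number_of_reviews")

share_zero = (reviews == 0).mean()                    # fraction never reviewed
top_n = max(1, len(reviews) // 10)                    # size of the top 10%
top10_share = reviews.nlargest(top_n).sum() / reviews.sum()
log_reviews = np.log1p(reviews)                       # compresses the long tail

print(f"Listings with no reviews: {share_zero:.0%}")
print(f"Reviews held by top 10% of listings: {top10_share:.0%}")
```

Plotting `sns.histplot(np.log1p(df['number_of_reviews']), bins=30)` spreads out the values that the raw histogram squeezes into the first bar.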

#### Chart - 6

In [None]:
# Chart 6: Relationship Between Price and Room Type
plt.figure(figsize=(12, 6))
sns.violinplot(data=df, x='room_type', y='price')
plt.title('Price Distribution by Room Type')
plt.show()


##### 1. Why did you pick the specific chart?

A violin plot combines a box plot with a density estimate, showing the full shape of the price distribution for each room type.

##### 2. What is/are the insight(s) found from the chart?

The spread of prices for entire homes is wider than for private and shared rooms, indicating more diverse pricing strategies.
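A few listings priced in the thousands squash the violins into thin lines near zero. A minimal sketch of trimming prices above a high quantile before plotting, assuming the `price` column; `trim_price_outliers` and the 0.99 cutoff are assumptions for illustration, not part of the original analysis:

```python
import pandas as pd

def trim_price_outliers(df: pd.DataFrame, upper_quantile: float = 0.99) -> pd.DataFrame:
    """Drop listings priced above the given quantile (hypothetical helper)."""
    cap = df["price"].quantile(upper_quantile)
    return df[df["price"] <= cap]

# Toy data for illustration only: 100 ordinary prices plus one extreme value.
toy = pd.DataFrame({
    "room_type": ["Private room"] * 101,
    "price": list(range(1, 101)) + [10_000],
})
trimmed = trim_price_outliers(toy)
print(len(toy), "->", len(trimmed), "rows; max price", trimmed["price"].max())
```

The chart can then use the trimmed data: `sns.violinplot(data=trim_price_outliers(df), x='room_type', y='price')`. The trimmed rows should be reported rather than silently discarded from the rest of the analysis.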

#### Chart - 7

In [None]:
# Chart 7: Average Price by Room Type
plt.figure(figsize=(10, 6))
# errorbar=None replaces the deprecated ci=None (seaborn >= 0.12);
# hue + legend=False applies the palette without a FutureWarning (seaborn >= 0.13)
sns.barplot(x='room_type', y='price', hue='room_type', data=df, estimator=np.mean,
            errorbar=None, palette='viridis', legend=False)
plt.title('Average Price by Room Type')
plt.xlabel('Room Type')
plt.ylabel('Average Price ($)')
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

I chose a bar chart for visualizing the average price by room type because it effectively compares the average prices across different categories (room types). This type of chart makes it easy to identify trends and differences at a glance.

##### 2. What is/are the insight(s) found from the chart?

From the chart, we can observe that:

Entire Home/Apt has the highest average price, reflecting the extra space and privacy these listings offer.
Private Rooms are generally more affordable, attracting budget-conscious travelers.
Shared Rooms have the lowest average price, appealing to guests looking for economical options.

#### Chart - 8

In [None]:
# Chart 8: Correlation Heatmap
# Select only numeric columns for correlation calculation
numeric_df = df.select_dtypes(include=np.number)

plt.figure(figsize=(12, 8))
sns.heatmap(numeric_df.corr(), annot=True, fmt=".2f", cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()


##### 1. Why did you pick the specific chart?

A heatmap visually represents correlations between numerical variables, highlighting potential relationships.

##### 2. What is/are the insight(s) found from the chart?

Price shows only weak correlations with the review variables, consistent with the scattered pattern in Chart 4. Correlation also says nothing about direction, so even a strong price-review correlation would not show that more reviews drive higher prices.
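The heatmap above also includes `id`, an identifier whose correlations carry no meaning. A minimal sketch of excluding such columns before correlating, assuming a DataFrame with an `id` column; `numeric_correlations` is a hypothetical helper and the toy data is illustrative:

```python
import numpy as np
import pandas as pd

def numeric_correlations(df: pd.DataFrame, exclude=("id",), method: str = "pearson") -> pd.DataFrame:
    """Correlation matrix of numeric columns, skipping identifier columns."""
    numeric = df.select_dtypes(include=np.number)
    numeric = numeric.drop(columns=list(exclude), errors="ignore")
    return numeric.corr(method=method)

# Toy data for illustration only.
toy = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "price": [50, 80, 120, 300],
    "number_of_reviews": [40, 30, 10, 2],
    "availability_365": [10, 200, 90, 365],
    "room_type": ["Private room", "Private room", "Entire home/apt", "Entire home/apt"],
})
corr = numeric_correlations(toy)
print(corr.round(2))
```

The result can be passed straight to `sns.heatmap(corr, annot=True, fmt=".2f", cmap='coolwarm')`; `method="spearman"` gives a rank-based version that is less sensitive to price outliers.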

#### Chart - 9

In [None]:
# Chart 9: Ratings Distribution
plt.figure(figsize=(12, 6))

# Check if 'rating' column exists in the DataFrame
if 'rating' in df.columns:
    sns.histplot(df['rating'], bins=10, kde=True)
    plt.title('Distribution of Ratings')
    plt.show()
else:
    print("Column 'rating' not found in the DataFrame.")
    print("Available columns:", df.columns)

##### 1. Why did you pick the specific chart?

A histogram displays the distribution of ratings, providing insights into customer satisfaction levels.

##### 2. What is/are the insight(s) found from the chart?


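The NYC 2019 dataset has no rating column, so the code above prints the available columns instead of drawing the chart, and no conclusion about ratings can be drawn from it. number_of_reviews and reviews_per_month are the closest available signals, but they measure review volume, not guest satisfaction.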

#### Chart - 10

In [None]:
# Chart 10: Listings by Neighbourhood Group
plt.figure(figsize=(12, 6))
sns.countplot(data=df, x='neighbourhood_group')
plt.title('Number of Listings by Neighbourhood Group')
plt.show()


##### 1. Why did you pick the specific chart?

A count plot shows the number of listings in each neighbourhood group, making quick comparisons across areas easy.

##### 2. What is/are the insight(s) found from the chart?

Manhattan and Brooklyn have far more listings than Queens, the Bronx and Staten Island, which may mean stronger competition among hosts in those two areas.
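Listing counts measure supply, not guest preference. For the second problem statement, a rough comparison is each group's share of listings versus its share of all reviews (reviews being an imperfect proxy for stays). A minimal sketch on illustrative toy data, assuming the `neighbourhood_group` and `number_of_reviews` columns:

```python
import pandas as pd

# Toy data for illustration only; in the notebook, use the cleaned df instead.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 5 + ["Brooklyn"] * 4 + ["Queens"],
    "number_of_reviews": [10] * 5 + [20] * 4 + [20],
})

total_reviews = toy["number_of_reviews"].sum()
supply_vs_reviews = pd.DataFrame({
    # Fraction of all listings in each neighbourhood group (supply)
    "listing_share": toy["neighbourhood_group"].value_counts(normalize=True),
    # Fraction of all reviews in each group (rough proxy for bookings)
    "review_share": toy.groupby("neighbourhood_group", observed=True)["number_of_reviews"].sum()
                    / total_reviews,
}).sort_values("listing_share", ascending=False)
print(supply_vs_reviews.round(2))
```

A group whose review share exceeds its listing share attracts more guest activity per listing than its size alone would suggest.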

#### Chart - 11

In [None]:
# Chart 11: Average Price by Room Type
plt.figure(figsize=(12, 6))
sns.barplot(data=df, x='room_type', y='price', estimator=np.mean)
plt.title('Average Price by Room Type')
plt.show()


##### 1. Why did you pick the specific chart?

A bar chart allows an easy comparison of average prices among different room types.

##### 2. What is/are the insight(s) found from the chart?

Entire homes tend to have higher average prices compared to private or shared rooms, reflecting differing market demands.

#### Chart - 12

In [None]:
# Chart 12: Most Common Room Types
plt.figure(figsize=(12, 6))
sns.countplot(data=df, x='room_type')
plt.title('Count of Listings by Room Type')
plt.show()


##### 1. Why did you pick the specific chart?

A count plot visually represents the number of listings by room type, facilitating easy comparison across categories.


##### 2. What is/are the insight(s) found from the chart?

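Entire homes/apartments are the most common listing type, with private rooms a close second and shared rooms a small minority. Listing counts measure supply rather than guest preference, so this chart alone does not show that private rooms are preferred.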
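The first problem statement can be checked as proportions rather than raw counts, both overall and within each neighbourhood group. A minimal sketch on illustrative toy data, assuming the `room_type` and `neighbourhood_group` columns:

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 4 + ["Brooklyn"] * 4,
    "room_type": ["Entire home/apt", "Entire home/apt", "Entire home/apt", "Private room",
                  "Private room", "Private room", "Entire home/apt", "Shared room"],
})

# Share of each room type across all listings
overall = toy["room_type"].value_counts(normalize=True)
# Share of each room type within each neighbourhood group (columns sum to 1)
by_group = pd.crosstab(toy["room_type"], toy["neighbourhood_group"], normalize="columns")
print(overall.round(3))
print(by_group.round(2))
```

The room-type mix can differ sharply between areas even when the overall shares are close, so the per-group table is worth reading alongside the chart.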

#### Chart - 13

In [None]:
# Chart 13: Price by Minimum Nights
plt.figure(figsize=(12, 6))
sns.scatterplot(data=df, x='minimum_nights', y='price')
plt.title('Price vs. Minimum Nights')
plt.show()


##### 1. Why did you pick the specific chart?

A scatter plot can help visualize the relationship between minimum nights required and listing prices.

##### 2. What is/are the insight(s) found from the chart?

Higher prices might correlate with stricter minimum night requirements, indicating premium listings.
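A scatter plot of minimum nights is hard to read because most listings cluster at small values. A minimal sketch of grouping minimum_nights into stay-length bands and comparing median prices; the band edges and the helper name `price_by_min_nights` are assumptions for illustration:

```python
import pandas as pd

def price_by_min_nights(df: pd.DataFrame) -> pd.Series:
    """Median price per minimum-nights band (hypothetical bands)."""
    # Intervals are right-closed: (0, 1], (1, 6], (6, 29], (29, inf)
    bins = [0, 1, 6, 29, float("inf")]
    labels = ["1", "2-6", "7-29", "30+"]
    stay = pd.cut(df["minimum_nights"], bins=bins, labels=labels)
    return df.groupby(stay, observed=True)["price"].median()

# Toy data for illustration only.
toy = pd.DataFrame({
    "minimum_nights": [1, 1, 3, 5, 14, 30, 60],
    "price": [80, 100, 120, 140, 90, 70, 60],
})
print(price_by_min_nights(toy))
```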

#### Chart - 14

In [None]:
# Chart 14: Price Distribution by Neighbourhood Group
plt.figure(figsize=(12, 6))
sns.boxplot(data=df, x='neighbourhood_group', y='price')
plt.title('Price Distribution by Neighbourhood Group')
plt.show()


##### 1. Why did you pick the specific chart?

A box plot is useful for visualizing the distribution of prices within different neighborhood groups.

##### 2. What is/are the insight(s) found from the chart?

Certain neighborhoods may show significant price variation, revealing competitive landscapes and potential market opportunities.

#### Chart - 15

In [None]:
# Chart 15: Availability by Room Type
plt.figure(figsize=(12, 6))
sns.boxplot(data=df, x='room_type', y='availability_365')
plt.title('Availability by Room Type')
plt.show()


##### 1. Why did you pick the specific chart?

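A box plot highlights differences in availability_365 across room types. Because availability counts the days a listing is open for booking, low values can mean either that a listing is heavily booked or that the host has blocked dates.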


##### 2. What is/are the insight(s) found from the chart?

Entire homes may have lower availability than private rooms, which could mean they are booked more often or that their hosts open their calendars for fewer days; this chart cannot distinguish between the two.
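The box plot can be backed by a small summary table that separates listings with zero availability from the rest. A minimal sketch, assuming the `room_type` and `availability_365` columns; `availability_by_room_type` is a hypothetical helper and the toy data is illustrative:

```python
import pandas as pd

def availability_by_room_type(df: pd.DataFrame) -> pd.DataFrame:
    """Median availability, share of zero-availability listings and count per room type."""
    g = df.groupby("room_type")["availability_365"]
    return pd.DataFrame({
        "median_days": g.median(),
        "share_zero": g.apply(lambda s: (s == 0).mean()),  # not open on any day
        "listings": g.size(),
    })

# Toy data for illustration only.
toy = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 4 + ["Private room"] * 3,
    "availability_365": [0, 0, 30, 200, 100, 150, 365],
})
print(availability_by_room_type(toy))
```

A high `share_zero` pulls the median down, so it is worth checking before reading low availability as high demand.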

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

The primary objective of this analysis was to understand the factors influencing Airbnb bookings in New York City. By examining various aspects of the dataset—such as room types, pricing trends, neighborhood impacts, and user engagement through reviews—we aimed to derive actionable insights that could help enhance the booking experience and improve market positioning.

To achieve the business objective, I suggest the following strategies:

Dynamic Pricing Models: Implement dynamic pricing strategies based on seasonal demand and neighborhood trends to maximize revenue potential.

Targeted Marketing: Focus marketing efforts on popular room types and neighborhoods with high booking rates to attract more guests.

Enhance User Experience: Improve the booking experience by providing detailed information and recommendations for listings that have high review ratings, ensuring guest satisfaction.

Host Training Programs: Offer training for hosts on how to optimize their listings, improve photography, and enhance guest interactions to boost ratings and bookings.

Leverage Data Analytics: Continuously analyze booking trends and guest feedback to refine strategies and respond to market changes proactively.

# **Conclusion**

In conclusion, this analysis of Airbnb booking data provides valuable insights into consumer behavior, pricing strategies, and market trends in New York City. By leveraging these insights, the client can implement data-driven strategies that enhance guest experiences, optimize pricing, and ultimately drive higher occupancy rates and revenue. The combination of targeted marketing, enhanced host engagement, and continuous data analysis will position the client to adapt to the dynamic nature of the short-term rental market, ensuring long-term success and competitiveness.
