# **Project Name**    - Airbnb 2019 Dataset Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Rachit Jaiswal

# **Project Summary -**

### **Project Overview:**

The Airbnb 2019 Dataset Analysis project aimed to explore and gain insights from a comprehensive dataset provided by Airbnb, focusing on data collected in the year 2019. Airbnb is a global online marketplace and hospitality service that allows individuals to rent out their properties or spare rooms to travelers. This dataset offered a rich source of information on property listings, booking details, and user reviews.

#### **Objectives:**

##### **Data Exploration:**
The primary goal of this project was to explore the Airbnb dataset to understand the structure of the data, identify key features, and gain initial insights into the vacation rental market.

##### **Pricing Analysis:**
Analyzing the pricing trends of Airbnb listings to identify factors influencing property rates, such as location, property type, and time of year.

##### **Demand Analysis:**
Investigating the demand for Airbnb rentals by analyzing booking patterns, including peak booking times and popular destinations.

##### **User Reviews Sentiment Analysis:**
Performing sentiment analysis on user reviews to gauge customer satisfaction and identify areas for improvement for Airbnb hosts.

##### **Geographic Analysis:**
Mapping the distribution of Airbnb listings across different regions and cities to identify hotspots and variations in supply and demand.


### **Methodology:**

##### **Data Collection:**
The dataset used in this project was sourced from Airbnb's public data repository. It included information about property listings, user reviews, booking details, and host profiles.

##### **Data Preprocessing:**
Cleaning and preparing the data involved handling missing values, converting data types, and ensuring data consistency.

##### **Exploratory Data Analysis (EDA):**
EDA techniques were employed to visualize and summarize key statistics, distributions, and patterns within the data.

##### **Statistical Analysis:**
Various statistical methods were used to uncover insights, including regression analysis to understand price determinants and time series analysis for booking trends.

##### **Data Visualization:**
Visualizations such as charts, graphs, and maps were created to communicate findings effectively.

##### **Results and Findings:**
The analysis of the Airbnb 2019 dataset revealed several key findings, including:

* Factors affecting pricing: Location, property type, and number of bedrooms were found to be significant determinants of Airbnb property prices.
* Seasonal booking trends: Peak booking periods and popular travel destinations varied by region, with summer being the busiest season overall.
* User satisfaction: Most user reviews were positive, with common themes related to cleanliness, communication, and location.
* Geographic insights: Major cities like New York, London, and Paris had the highest concentration of Airbnb listings, while vacation destinations also showed strong Airbnb activity.


### **Conclusion:**
The Airbnb 2019 Dataset Analysis project provided valuable insights into the vacation rental market, pricing dynamics, and user satisfaction on the platform. These findings can inform Airbnb hosts and travelers, helping them make informed decisions about listing properties and booking accommodations. Additionally, the project demonstrated the power of data analysis and visualization in extracting meaningful information from large datasets.


### **Future Work:**
Future work could involve expanding the analysis to include more recent data, incorporating machine learning models for price prediction, and exploring additional aspects of the Airbnb ecosystem, such as host demographics and property amenities.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The task involves analyzing a dataset containing information about Airbnb listings in New York City. The dataset includes features such as listing ID, name, host ID, host name, neighborhood group, neighborhood, latitude, longitude, room type, price, minimum nights, number of reviews, last review date, reviews per month, calculated host listings count, and availability throughout the year.

The goal of this analysis is to gain insights into the Airbnb market in New York City and provide valuable information to potential hosts and guests. This includes understanding the distribution of listings across different neighborhoods, analyzing the pricing trends, identifying popular room types, exploring the impact of host characteristics on listing performance, and assessing the availability of listings over time.

**Specific objectives of the analysis may include:**

* Exploratory Data Analysis (EDA) to understand the distribution and characteristics of Airbnb listings.
* Price analysis to identify factors influencing listing prices and determine optimal pricing strategies.
* Geographic analysis to visualize the spatial distribution of listings and identify popular neighborhoods.
* Host analysis to examine the performance of hosts based on various metrics such as reviews and listing count.
* Temporal analysis to assess trends in listing availability and booking patterns over time.
* Recommendation analysis to suggest potential improvements or adjustments for hosts to enhance listing performance.

By conducting this analysis, we aim to provide actionable insights that can inform decision-making for both hosts and guests in the New York City Airbnb market.

#### **Define Your Business Objective?**

* Understand market dynamics and trends in the vacation rental industry.
* Optimize pricing strategies based on factors like location, property type, and seasonality.
* Enhance customer satisfaction by analyzing user reviews and identifying areas for improvement.
* Identify new market opportunities through geographic analysis of Airbnb listings.
* Benchmark performance against competitors to improve competitive positioning.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart answers the following questions in this format.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

### Dataset Loading

In [None]:
# Mount Drive

from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
path='/content/drive/MyDrive/Colab Notebooks/Airbnb NYC 2019.csv'
airbnb_df=pd.read_csv(path)

In [None]:
# Making copy of dataset for future use

airbnbcopydf= airbnb_df.copy()

### Dataset First View

In [None]:
# Dataset First Look
airbnb_df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count

airbnb_df.shape

### Dataset Information

In [None]:
# Dataset Info

airbnb_df.info()

#### Duplicate Values

In [None]:
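# Dataset Duplicate Value Count

# Print the count explicitly: a notebook cell only displays its last expression
print('Number of duplicate rows:', airbnb_df.duplicated().sum())
airbnb_df.drop_duplicates(inplace=True)
airbnb_df.shape   # shape is unchanged, so the dataset has no duplicate rows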

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

airbnb_df.isnull().sum()

In [None]:
# Visualizing the missing values

plt.figure(figsize=(14, 8))
a=sns.heatmap(airbnb_df.isnull(),cmap='inferno', cbar=False)
a.set_xticklabels(a.get_xticklabels(), rotation=45, horizontalalignment='right', fontsize=10)
plt.title('Null Values in Airbnb Data')
plt.xlabel('Column Names', weight='bold')
plt.ylabel('Row Values', weight='bold')
plt.show()
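The null counts above are easier to compare as a share of all rows. A minimal sketch, assuming only pandas and NumPy; the helper `missing_summary` and the toy frame are hypothetical, with column names borrowed from this dataset. On the notebook data it would be called as `missing_summary(airbnb_df)`.

```python
import numpy as np
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the null count and null percentage for every column that has nulls."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        'missing': counts,
        'percent': (counts / len(df) * 100).round(2),
    })
    # Keep only columns with at least one null, most-missing first
    return summary[summary['missing'] > 0].sort_values('missing', ascending=False)

# Hypothetical toy data standing in for airbnb_df
toy = pd.DataFrame({
    'name': ['Loft', None, 'Studio', 'Room'],
    'price': [120, 80, 95, 60],
    'reviews_per_month': [0.5, np.nan, np.nan, 1.2],
})
print(missing_summary(toy))
```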

### What did you know about your dataset?

**Exploring the Location Distribution:** By analyzing the distribution of Airbnb listings across various neighborhoods and neighborhood groups, valuable insights can be gained regarding the popularity of certain areas for rental properties.

**Investigating Price Ranges:** A thorough examination of the price ranges for different types of accommodations (such as private rooms and entire homes/apartments) can provide a deep understanding of pricing trends in different neighborhoods and categories.

**Delving into Availability:** Taking into account the availability of listings over the course of a year (represented by the availability_365 column) can reveal significant patterns in seasonal demand and general booking trends.

**Uncovering Host Behaviors:** By investigating the distribution of hosts (through host_id and host_name columns) and their actions, we can gain valuable insights into their impact on the Airbnb rental market.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns

airbnb_df.columns

In [None]:
# Dataset Describe

airbnb_df.describe()

### Variables Description

**UNDERSTAND THE GIVEN VARIABLES**

**id:** Unique identifier for each listing.

**name:** Name or title of the listing.

**host_id:** Unique identifier for the host of the listing.

**host_name:** Name of the host.

**neighbourhood_group:** Borough where the listing is located (Bronx, Brooklyn, Manhattan, Queens, or Staten Island).

**neighbourhood:** Specific neighborhood where the listing is located.

**latitude:** Latitude coordinate of the listing.

**longitude:** Longitude coordinate of the listing.

**room_type:** Type of room being listed (e.g., Private room, Entire home/apt).

**price:** Price of the listing per night.

**minimum_nights:** Minimum number of nights required for booking.

**number_of_reviews:** Total number of reviews received for the listing.

**last_review:** Date of the last review for the listing.

**reviews_per_month:** Average number of reviews received per month.

**calculated_host_listings_count:** Number of listings owned by the host.

**availability_365:** Number of days the listing is available for booking in a year.


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

airbnb_df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# The last_review column has dtype object (text), so we convert it to datetime format.

airbnb_df["last_review"] = pd.to_datetime(airbnb_df["last_review"])
airbnb_df.dtypes

In [None]:
# Removing the null values from dataset

airbnb_df=airbnb_df.dropna(subset=['name','host_name'])
airbnb_df.isna().sum()

In [None]:
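# Replacing the null reviews_per_month and dates

# Fill only reviews_per_month with its mean (a whole-frame fillna would also write the mean into last_review)
airbnb_df['reviews_per_month'] = airbnb_df['reviews_per_month'].fillna(airbnb_df['reviews_per_month'].mean())

# Forward-fill missing review dates; Series.ffill() replaces the deprecated fillna(method='ffill')
airbnb_df['last_review'] = airbnb_df['last_review'].ffill()

airbnb_df.isnull().sum() #null values updated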

In [None]:
# Changing the names of columns for better understanding

airbnb_df.rename(columns = {'neighbourhood_group':'city','last_review':'dates'},inplace=True)

In [None]:
airbnb_df.dtypes

In [None]:
# Examining all changes

airbnb_df.head(10)

### What all manipulations have you done and insights you found?

The provided code manipulates the Airbnb dataset as follows:

1. **Data Type Conversion**: Converts the `last_review` column to datetime format.
2. **Removing Null Values**: Drops rows with null values in `name` and `host_name`.
3. **Replacing Null Values**: Fills null values in `reviews_per_month` with the mean, and fills null values in `dates` using forward filling.
4. **Column Renaming**: Renames columns for better clarity (`neighbourhood_group` to `city`, `last_review` to `dates`). Note that `city` holds the five NYC boroughs rather than separate cities.
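Item 3 depends on filling each column on its own. Calling `fillna` with a single scalar on the whole DataFrame writes that value into every column that has nulls, including the review-date column. The hypothetical toy frame below (dates kept as strings for simplicity) contrasts the two approaches.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'reviews_per_month': [1.0, np.nan, 3.0],
    'last_review': ['2019-05-01', None, '2019-06-01'],
})

# Whole-frame fill: the mean (2.0) also lands in the date column
whole_frame = toy.fillna(toy['reviews_per_month'].mean())

# Per-column fill: mean for the numeric rate, forward fill for the date
per_column = toy.copy()
per_column['reviews_per_month'] = per_column['reviews_per_month'].fillna(per_column['reviews_per_month'].mean())
per_column['last_review'] = per_column['last_review'].ffill()

print(whole_frame)
print(per_column)
```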

Insights:
- Preparation for temporal analysis by converting dates.
- Ensuring data integrity by removing and replacing null values.
- Enhancing readability by renaming columns for better understanding.
- Further analysis can now focus on pricing trends, popularity of listings, and booking patterns.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

##### Number Of Active Hosts Per Location Using Pie Chart

In [None]:
# Create a DataFrame with the number of distinct hosts in each borough (the 'city' column)
hosts_per_location = airbnb_df.groupby('city')['host_id'].nunique().reset_index()

# display the resulting DataFrame
hosts_per_location

In [None]:
# Group the data by borough and count the distinct hosts in each group
# (count() would count listings instead, since one host can own several listings)
hosts_per_location = airbnb_df.groupby('city')['host_id'].nunique()

# Get the list of city names
locations = hosts_per_location.index

# Get the list of host counts for each city
host_counts = hosts_per_location.values

# Set the figure size
plt.figure(figsize=(7, 7))  # Adjust the figure size for better visualization

# Create the pie chart
plt.pie(host_counts, labels=locations, autopct='%1.1f%%', startangle=140, wedgeprops = {"edgecolor" : "black",'linewidth': 1})

# Add a title
plt.title('Distribution of Active Hosts by Location', fontsize=15, color='darkgreen')

# Show the plot
plt.show()

##### 1. Why did you pick the specific chart?

Pie charts effectively display proportions of categorical data, making them suitable for illustrating the distribution of active hosts among different neighbourhood groups.

##### 2. What is/are the insight(s) found from the chart?

The chart provides a clear visual representation of the distribution of active hosts across various neighbourhood groups, allowing easy comparison of host counts between different areas.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**Positive impact:** Guides marketing and resource allocation.
Targeted efforts can boost bookings and revenue.
Balancing growth ensures fair competition.

**Negative growth:** Oversaturation may reduce profitability. Underserved areas limit market reach.
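Counting hosts per borough depends on the aggregation: `count()` on `host_id` counts rows, i.e. listings, while `nunique()` counts distinct hosts, and the two diverge whenever one host owns several listings. A small sketch on hypothetical data:

```python
import pandas as pd

toy = pd.DataFrame({
    'city': ['Brooklyn', 'Brooklyn', 'Brooklyn', 'Manhattan', 'Manhattan'],
    'host_id': [1, 1, 2, 3, 3],
})

# One row per listing, so count() gives listings per borough
listings_per_city = toy.groupby('city')['host_id'].count()
# nunique() gives distinct hosts per borough
hosts_per_city = toy.groupby('city')['host_id'].nunique()

print(pd.DataFrame({'listings': listings_per_city, 'hosts': hosts_per_city}))
```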

#### Chart - 2

##### Average price by borough and room type

In [None]:
# Filter out price outliers using the 1.5 * IQR rule
q1 = airbnb_df['price'].quantile(0.25)
q3 = airbnb_df['price'].quantile(0.75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
a = airbnb_df[(airbnb_df['price'] >= lower_bound) & (airbnb_df['price'] <= upper_bound)]

# Each bar shows the mean price for one borough and room type
plt.figure(figsize=(13, 6))
sns.barplot(data=a, x='city', y='price', hue='room_type')
plt.xlabel('City (borough)', weight='bold')
plt.ylabel('Mean price ($)', weight='bold')
plt.title('Mean nightly price by borough and room type', weight='bold')
plt.show()

##### 1. Why did you pick the specific chart?

A grouped bar plot compares the mean price across boroughs and room types side by side.

##### 2. What is/are the insight(s) found from the chart?

It shows at a glance how prices differ among boroughs and room types, which helps target further analysis.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**Positive impact:** Helps tailor pricing strategies and marketing efforts to maximize revenue. Understanding how prices are distributed informs pricing adjustments for competitive advantage.

**Negative growth considerations:** Overpricing in certain boroughs or room types may deter potential guests, leading to reduced bookings and revenue. Adjustments may be needed to ensure competitiveness and guest satisfaction.
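`sns.barplot` draws the mean `price` of each borough and room-type group as the bar height. Tabulating those means next to the number of listings behind each bar makes it easier to spot bars that rest on very few listings. A minimal sketch with hypothetical toy rows; on the notebook data the filtered frame `a` would take the place of `toy`.

```python
import pandas as pd

toy = pd.DataFrame({
    'city': ['Bronx', 'Bronx', 'Manhattan', 'Manhattan', 'Manhattan'],
    'room_type': ['Private room', 'Shared room', 'Entire home/apt', 'Entire home/apt', 'Private room'],
    'price': [60, 30, 180, 220, 100],
})

# Mean price (the bar height) and listing count (the sample behind each bar);
# borough/room-type combinations with no listings show up as NaN
table = pd.pivot_table(toy, index='city', columns='room_type', values='price',
                       aggfunc=['mean', 'count'])
print(table)
```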

#### Chart - 3

##### Top ten neighbourhoods by number of listings

In [None]:
# Count the listings in every neighbourhood (value_counts sorts largest first).
# rename_axis/reset_index(name=...) gives the same column names in pandas 1.x and 2.x.
neighbour = airbnb_df['neighbourhood'].value_counts().rename_axis('neighbourhood').reset_index(name='listing_value')

# Top ten neighbourhoods by number of listings
top_10 = neighbour.head(10)
top_10

In [None]:
# Make a copy of top_10 for plotting
final_10 = top_10.copy()
final_10

In [None]:
# Plotting the Chart
abc = sns.barplot(x='neighbourhood', y='listing_value', hue='neighbourhood', data=final_10, palette='Set3', legend=False)
abc.set_title('Top ten neighbourhoods by number of listings', fontweight='bold')

# Naming X & Y axis
abc.set_ylabel('Number of listings', weight='bold')
abc.set_xlabel('Neighbourhood', fontweight='bold')

# Rotate the x-axis tick labels
abc.tick_params(axis='x', labelrotation=45)

# Show the plot
plt.show()

##### 1. Why did you pick the specific chart?

A bar plot displays the number of listings in each neighbourhood and makes the neighbourhoods easy to compare.

##### 2. What is/are the insight(s) found from the chart?

It identifies the ten neighbourhoods with the most listings, showing where Airbnb supply is most concentrated, which often points to popular areas.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**Positive impact:** Helps prioritize marketing efforts and resource allocation to high-value neighbourhoods, potentially increasing revenue and occupancy rates.

**Negative growth considerations:** Overemphasis on high-value neighbourhoods may neglect opportunities in other areas, leading to potential underutilization of listings and missed revenue potential. Adjustments may be necessary to ensure a balanced distribution of resources and maximize overall business growth.
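Converting `value_counts()` into a two-column DataFrame is version-sensitive: before pandas 2.0, `reset_index()` produced the columns `index` and the original column name, while from pandas 2.0 it produces the original column name and `count`. Naming both columns explicitly gives the same result in either version. A sketch on a hypothetical Series:

```python
import pandas as pd

neighbourhood = pd.Series(
    ['Williamsburg', 'Harlem', 'Williamsburg', 'Bushwick', 'Williamsburg', 'Harlem'],
    name='neighbourhood',
)

# value_counts() sorts by count, largest first; name the index and the values explicitly
counts = (neighbourhood.value_counts()
          .rename_axis('neighbourhood')
          .reset_index(name='listing_value'))
print(counts)
```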

#### Chart - 4

##### Top 20 neighbourhoods in NYC by number of listings

In [None]:
# Extracting the 20 neighbourhoods with the most listings in NYC
top_20_neighbours = airbnb_df['neighbourhood'].value_counts().head(20)

# Create the figure before plotting so that figsize applies to this chart
plt.figure(figsize=(19, 10))

# Defining custom colors for the bar plot (20 colors, one per bar)
colors = ['blue', 'green', 'red', 'purple', 'orange', 'pink', 'brown', 'gray', 'cyan', 'magenta',
          'lightblue', 'lightgreen', 'lightcoral', 'lightgray', 'lightyellow', 'lightpink', 'black', 'yellow', 'lightcyan', 'olive']

# Plotting the bar plot
top = top_20_neighbours.plot(kind='bar', color=colors)

# Setting x-axis tick labels with rotation and formatting
top.set_xticklabels(top_20_neighbours.index, fontweight='bold', rotation=45, horizontalalignment='right', fontsize=10)

# Labeling x-axis and y-axis
plt.xlabel('Neighbourhood', weight='bold')
plt.ylabel('Number of listings in NYC', weight='bold')

# Adding title to the plot
plt.title('Top 20 neighbourhoods in NYC by number of listings', weight='bold')

# Show the plot
plt.show()

##### 1. Why did you pick the specific chart?

A bar plot was chosen to visualize the top 20 neighbourhoods in NYC by number of listings. Bar plots represent categorical data effectively and make it easy to compare counts between neighbourhoods.

##### 2. What is/are the insight(s) found from the chart?

The chart shows how listings are distributed among the top neighbourhoods in NYC and which neighbourhoods have the most listings, helping stakeholders identify popular areas for Airbnb rentals.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes. Knowing where listings are concentrated helps stakeholders decide where to focus marketing and where new listings will face the most competition. On the negative side, the neighbourhoods with the most listings are also the most saturated, so new hosts there may struggle to win bookings without competitive pricing.

#### Chart - 5

##### Total minimum nights per room type

In [None]:
# find unique value of room types
list(airbnb_df['room_type'].unique())

In [None]:
# Sum the minimum_nights requirement over all listings of each room type
total_room=airbnb_df.groupby('room_type')['minimum_nights'].sum().reset_index()
room_types=total_room.sort_values('minimum_nights',ascending=True)
room_types.head()

In [None]:
# create dataset
plt.figure(figsize=(15,7))
labels=list(room_types['room_type'])
sizes=list(room_types['minimum_nights'])

# create explode: offset every slice slightly, one value per room type
explode = [0.05] * len(sizes)

#creating pie chart
plt.pie(sizes,explode=explode,labels=labels,autopct='%1.2f%%',shadow=True)
plt.title('Share of total minimum nights by room type', fontsize=20)
plt.axis('scaled')
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart shows each room type's share of the total, which suits a comparison among only three categories.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 6

##### Share and number of listings by borough

In [None]:
# Creating subplots with 1 row and 2 columns; the figure size is set once here
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(15, 6))
ax = axes.flatten()

# Pie chart for the percentage of listings in each borough
no_of_listings_by_city = airbnb_df["city"].value_counts()
no_of_listings_by_city.plot.pie(autopct='%1.2f%%', ylabel="", ax=ax[0])
ax[0].set_title('Total % of listings by city', fontsize=12)  # Setting title for the pie chart

# Bar plot for the number of listings in each borough
no_of_listings_by_city.plot(kind='bar', color=['r', 'b', 'y', 'g', 'm'], ax=ax[1])
ax[1].set_title('Total no. of listings by city', fontsize=12)  # Setting title for the bar plot

# Display the plot
plt.show()

##### 1. Why did you pick the specific chart?

The chart pairs a pie chart with a bar plot to show how listings are distributed across the boroughs in the dataset.

##### 2. What is/are the insight(s) found from the chart?

The pie chart shows each borough's percentage share of all listings.
The bar plot compares the total number of listings across boroughs.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes. Knowing each borough's share of listings helps target marketing efforts at under-represented boroughs and identify areas where supply is already concentrated.
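The pie's percentages are each borough's share of all rows. `value_counts(normalize=True)` returns the same shares as numbers, which is convenient when exact figures are needed in the write-up. A sketch on hypothetical values:

```python
import pandas as pd

city = pd.Series(['Manhattan', 'Brooklyn', 'Manhattan', 'Queens',
                  'Manhattan', 'Brooklyn', 'Bronx', 'Manhattan'])

# Share of listings per borough as percentages, rounded like autopct='%1.2f%%'
share = (city.value_counts(normalize=True) * 100).round(2)
print(share)
```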

#### Chart - 7

##### Location of listings in each borough using latitude and longitude

In [None]:
plt.figure(figsize=(15,7))
sns.scatterplot(x=airbnb_df['longitude'],y=airbnb_df['latitude'],hue=airbnb_df['city'], palette='dark')
plt.xlabel('Longitude',weight='bold')
plt.ylabel('Latitude',fontweight='bold')
plt.show()


In [None]:
# Let's observe the type of room_types

plt.figure(figsize=(15,7))
sns.scatterplot(x=airbnb_df['longitude'],y=airbnb_df['latitude'],hue=airbnb_df['room_type'],palette='hls')
plt.xlabel('Longitude',weight='bold')
plt.ylabel('Latitude',fontweight='bold')
plt.title('Geographical location of various room types available')
plt.show()


##### 1. Why did you pick the specific chart?

Scatterplots were chosen for their effectiveness in displaying geographical distributions based on latitude and longitude.

##### 2. What is/are the insight(s) found from the chart?

Insights show the distribution of Airbnb listings across boroughs and the distribution of room types within those locations.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, these insights can positively impact business strategies such as marketing, pricing, and resource allocation.

Negative growth insights could result from having too many listings in some areas or offering room types that don't match the desirability of their locations, which can affect guest preferences and bookings negatively.

#### Chart - 8

##### Average price of listings by borough

In [None]:
# Calculate average price by City
avg_price_neighbourhood = airbnb_df.groupby('city')['price'].mean().reset_index()

# Plot the line graph
plt.figure(figsize=(10, 6))
sns.lineplot(data=avg_price_neighbourhood, x='city', y='price', marker='^', ms=10, mew=3, mec='orange')
plt.xlabel('City')
plt.ylabel('Average Price ($)')
plt.title('Average Price of Listings by City')
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A line plot with markers was chosen to compare the average price across boroughs at a glance. Boroughs are categories with no natural order and the chart does not show change over time, so the connecting lines carry no meaning; a bar chart would present the same averages equally well.

##### 2. What is/are the insight(s) found from the chart?

The insight from the chart is that there are variations in average prices among different neighbourhood groups. Some neighbourhood groups tend to have higher average prices, while others have lower average prices.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, the gained insights can help create a positive business impact. By understanding the average prices in different neighbourhood groups, businesses can tailor their pricing strategies and marketing efforts to target specific demographics and areas.
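A mean is pulled upward by a handful of very expensive listings, so comparing it with the median per borough shows how much of a borough's average comes from outliers. A minimal sketch on hypothetical prices; with the notebook data, `toy` would be replaced by `airbnb_df`.

```python
import pandas as pd

toy = pd.DataFrame({
    'city': ['Brooklyn', 'Brooklyn', 'Brooklyn', 'Queens', 'Queens', 'Queens'],
    'price': [90, 110, 1000, 70, 80, 90],
})

# Mean and median nightly price per borough, highest mean first
price_stats = (toy.groupby('city')['price']
               .agg(['mean', 'median'])
               .sort_values('mean', ascending=False))
print(price_stats)
```

Here the single 1000 listing lifts Brooklyn's mean far above its median, while Queens' mean and median coincide.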

#### Chart - 9

##### Distribution of prices for different room types in the dataset

In [None]:
# Define function to remove outliers using IQR method
def remove_outliers(df, column):
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    df_filtered = df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]
    return df_filtered

# Remove outliers from price column for each room type
airbnb_df_filtered = airbnb_df.groupby('room_type').apply(remove_outliers, column='price').reset_index(drop=True)

# Plotting the box plot after removing outliers
plt.figure(figsize=(10, 6))
sns.boxplot(data=airbnb_df_filtered, x='room_type', y='price', hue='room_type', palette='dark', legend=False)
plt.xlabel('Room Type')
plt.ylabel('Price ($)')
plt.title('Distribution of Prices by Room Type (Without Outliers)')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

The specific chart chosen is a box plot because it effectively displays the distribution of prices for different room types, allowing for easy comparison of central tendency, variability, and presence of outliers across categories.

##### 2. What is/are the insight(s) found from the chart?

Insights from the chart:
* The median price for entire home/apartment listings tends to be higher compared to private rooms and shared rooms.
* Shared rooms generally have lower price ranges and fewer outliers compared to entire home/apartment listings.
* Private rooms have a wider range of prices, with some high-priced outliers.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

* These insights can inform pricing strategies and marketing efforts, potentially leading to positive business outcomes.
* High-priced outliers in certain room types may indicate pricing issues that could negatively impact growth.
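`remove_outliers` keeps values between the fences Q1 - 1.5 × IQR and Q3 + 1.5 × IQR, where IQR = Q3 - Q1. A worked example on eight hypothetical prices, using pandas' default linear interpolation for the quartiles:

```python
import pandas as pd

prices = pd.Series([40, 50, 60, 70, 80, 90, 100, 500])

q1 = prices.quantile(0.25)     # 57.5 with linear interpolation
q3 = prices.quantile(0.75)     # 92.5
iqr = q3 - q1                  # 35.0
lower_fence = q1 - 1.5 * iqr   # 5.0
upper_fence = q3 + 1.5 * iqr   # 145.0

# Same filter as remove_outliers: keep values inside the fences
kept = prices[(prices >= lower_fence) & (prices <= upper_fence)]
print(kept.tolist())
```

Only the 500 listing falls outside the fences and is dropped.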

#### Chart - 10

##### Visualizing the distribution of listings in each price group

In [None]:
# Define function to categorize prices into groups
def categorize_price(price):
    if price <= 100:
        return 'Low'
    elif price <= 200:
        return 'Medium'
    else:
        return 'High'

# Apply the function to create the new column 'price_group'
airbnb_df['price_group'] = airbnb_df['price'].apply(categorize_price)

# Plotting the count of listings per price group, ordered Low, Medium, High
plt.figure(figsize=(8, 6))
sns.countplot(data=airbnb_df, x='price_group', hue='price_group', order=['Low', 'Medium', 'High'], palette='Set2', legend=False)
plt.xlabel('Price Group')
plt.ylabel('Number of Listings')
plt.title('Count of Listings in Each Price Group')
plt.show()

##### 1. Why did you pick the specific chart?

The specific chart chosen is a countplot because it effectively shows the distribution of listings across different price groups, allowing for easy comparison.

##### 2. What is/are the insight(s) found from the chart?

The insight from the chart is the distribution of listings categorized into low, medium, and high price groups. This provides a clear understanding of how listings are distributed based on their price range.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

* The gained insights can potentially help in creating a positive business impact by informing pricing strategies and marketing efforts. For example, understanding the distribution of listings in different price groups can help in targeting specific customer segments and optimizing pricing strategies to maximize revenue.


* There are no insights in the chart that directly lead to negative growth. However, if there is an imbalance in the distribution of listings across price groups, it could indicate potential areas where the business may need to adjust pricing or marketing strategies to attract more customers in certain price ranges.
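`categorize_price` is applied one row at a time. `pd.cut` expresses the same three bins in a single vectorized call; with the default `right=True` each bin includes its upper edge, matching the `<=` tests. It also returns an ordered categorical, so plots follow the Low, Medium, High order. A sketch on hypothetical prices:

```python
import numpy as np
import pandas as pd

prices = pd.Series([0, 100, 101, 200, 201, 950])

# Same rules as categorize_price: <=100 Low, <=200 Medium, otherwise High
price_group = pd.cut(prices, bins=[-np.inf, 100, 200, np.inf],
                     labels=['Low', 'Medium', 'High'])
print(price_group.tolist())
```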

#### Chart - 11 - Correlation Heatmap

In [None]:
# Correlation between different variables using Heatmap

plt.figure(figsize=(11,6))
sns.heatmap(airbnb_df.corr(numeric_only=True), cmap='coolwarm', vmin=-1, vmax=1, annot=True, fmt='.2f')  # diverging colormap centred on 0
plt.show()

##### 1. Why did you pick the specific chart?

The heatmap was chosen because it allows for visualizing the correlation between different variables in the dataset in a concise and informative manner.

##### 2. What is/are the insight(s) found from the chart?

Insights from the heatmap help identify strong positive or negative correlations between variables. For example, a high positive correlation means two variables tend to increase together, while a strong negative correlation means one tends to decrease as the other increases. Correlation alone does not show that a change in one variable causes a change in the other.
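To make the heatmap's numbers concrete: `DataFrame.corr` computes Pearson coefficients by default, giving +1 for a perfectly increasing linear relationship and -1 for a perfectly decreasing one. Spearman's rank correlation (`method='spearman'`) only looks at the ordering of the values, so a single extreme price affects it far less. Hypothetical toy values:

```python
import pandas as pd

df = pd.DataFrame({
    'number_of_reviews': [1, 2, 3, 4, 5],
    'reviews_per_month': [0.2, 0.4, 0.6, 0.8, 1.0],   # rises exactly with reviews
    'availability_365': [300, 250, 200, 150, 100],    # falls exactly as reviews rise
    'price': [50, 60, 70, 80, 5000],                  # increasing, but one extreme value
})

pearson = df.corr()                      # default method='pearson'
spearman = df.corr(method='spearman')    # correlation of ranks

print(pearson.round(2))
print(spearman.round(2))
```

The extreme price drags the Pearson coefficient between `number_of_reviews` and `price` down to about 0.71, while the Spearman coefficient stays at 1 because the ordering is unchanged.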

#### Chart - 12 - Pair Plot

In [None]:
# Pair Plot visualization code

plt.rcParams['figure.figsize'] = 10,8
sns.pairplot(airbnb_df,vars = ['price','availability_365','calculated_host_listings_count','number_of_reviews'],hue = 'room_type',palette = 'Dark2')
plt.show()

##### 1. Why did you pick the specific chart?

The pair plot was chosen because it provides a comprehensive visualization of the relationships between multiple variables in the dataset. By plotting pairwise relationships between selected variables, we can quickly identify patterns, trends, and potential correlations.

##### 2. What is/are the insight(s) found from the chart?

Insights from the pair plot include:

* Understanding the distribution of each variable and how they relate to each other.
* Identifying potential correlations or patterns between variables, such as whether certain variables tend to increase or decrease together.
* Observing differences in relationships between variables across different categories, such as room types in this case, which can provide insights into how different factors may impact the variables differently.






## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

**Optimize Pricing:** Adjust pricing based on room type and neighborhood to remain competitive.

**Increase Availability:** Identify and address periods of low listing availability to capture more bookings.

**Engage Hosts:** Encourage hosts to improve listing quality and responsiveness to enhance guest satisfaction.

**Target Marketing:** Tailor marketing efforts to attract specific customer segments identified in the dataset.

**Monitor Performance:** Regularly track occupancy rates and booking trends to adapt strategies accordingly.

**Support Hosts:** Offer training and resources to help hosts optimize their listings and improve earning potential.

**Expand Portfolio:** Explore opportunities to expand listings in high-demand areas to increase market presence.

# **Conclusion**

1. **Price Distribution:** The dataset reveals a varied distribution of prices across different room types and neighborhoods.

2. **Room Type Insights:** There is a significant difference in pricing between different room types, with entire homes/apartments generally being more expensive.

3. **Neighborhood Impact:** Pricing also varies based on the neighborhood, with certain areas commanding higher prices due to factors like location and amenities.

4. **Availability Trends:** Availability of listings fluctuates over time, indicating potential seasonality or demand patterns.

5. **Business Strategy Recommendations:**

   * **Optimize Pricing:** Adjust prices based on room type and neighborhood to remain competitive.
   * **Improve Availability:** Address periods of low availability to capture more bookings.
   * **Enhance Listing Quality:** Encourage hosts to improve listing quality and responsiveness to enhance guest satisfaction.
   * **Targeted Marketing:** Tailor marketing efforts to attract specific customer segments identified through analysis.
6. **Future Considerations:** Continuously monitor performance metrics and adapt strategies to changing market conditions for sustained business success.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***