# **Project Name** - airbnb_nyc_eda



##### **Project Type**    - EDA
##### **Contribution**    - Individual

# **Project Summary -**

### Project Summary: Airbnb NYC EDA  

This project focuses on exploring and analyzing the Airbnb NYC dataset to uncover meaningful insights and trends. The analysis involves cleaning the data to address missing values and inconsistencies, followed by exploring various features such as neighborhood distributions, room types, pricing patterns, and availability. Visualizations are used extensively to highlight key findings and provide a clear understanding of the data. The ultimate goal is to derive actionable insights that can help stakeholders better understand the dynamics of Airbnb listings in New York City.

# **GitHub Link -**

https://github.com/kush-agra-soni/5_airbnb_nyc_eda.git

# **Problem Statement**


The aim of this project is to conduct an **Exploratory Data Analysis (EDA)** on the **Airbnb NYC 2019** dataset, which contains detailed information about Airbnb listings in New York City. By analyzing the dataset, the project seeks to uncover key insights about factors such as pricing, room types, neighborhood distribution, availability, and the review patterns of Airbnb listings in the city. The analysis will explore how various variables interact with each other, focusing on identifying trends and patterns that may influence Airbnb pricing, booking behaviors, and the popularity of different neighborhoods. The project will also examine the relationship between the number of reviews and the availability of listings, providing valuable insights into how listing characteristics, such as room type and host activity, impact overall performance. Furthermore, the goal is to highlight opportunities for new hosts, help guests understand pricing dynamics, and support stakeholders in making informed decisions. By addressing these questions, this project aims to provide a deeper understanding of the Airbnb market in New York City, ultimately enabling better strategies for hosts and more informed choices for guests.

#### **Define Your Business Objective?**

The business objective of this project is to leverage **Exploratory Data Analysis (EDA)** to gain valuable insights from the **Airbnb NYC 2019 dataset**. By understanding the key factors that influence Airbnb listing prices, availability, and popularity across different neighborhoods, the goal is to help potential hosts optimize their pricing strategies, enhance their listing visibility, and make data-driven decisions regarding property offerings. Additionally, the analysis aims to provide guests with a better understanding of what drives pricing and availability in various locations, helping them make more informed booking decisions. Ultimately, the objective is to offer actionable recommendations that benefit both Airbnb hosts and guests, improve user experience, and support strategic business decisions for Airbnb and other stakeholders in the hospitality industry.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objs as go
import missingno as msno
from sklearn.preprocessing import StandardScaler
from scipy import stats
import plotly.io as pio

### Dataset Loading

In [None]:
# GitHub raw URL for your dataset
data_url = "https://raw.githubusercontent.com/kush-agra-soni/5_airbnb_nyc_eda/main/Airbnb%20NYC%202019.csv"

airbnb_df = pd.read_csv(data_url)

### Dataset First View

In [None]:
# Dataset First Look
airbnb_df.head(1)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
rows, columns = airbnb_df.shape

# Display the result
print(f"Number of rows: {rows}")
print(f"Number of columns: {columns}")

### Dataset Information

In [None]:
# Dataset Info
airbnb_df.info()

#### Duplicate Values

In [None]:
# Count the number of duplicate rows in the dataset
duplicate_count = airbnb_df.duplicated().sum()

# Display the result
print(f"Number of duplicate rows: {duplicate_count}")

#### Missing Values/Null Values

In [None]:
# Count the number of missing (null) values in each column
missing_values = airbnb_df.isnull().sum()

# Display the result
print("Missing values in each column:")
print(missing_values)

In [None]:
# Visualizing the missing values
# Plot a heatmap to visualize missing values
plt.figure(figsize=(12, 6))
sns.heatmap(airbnb_df.isnull(), cbar=False, cmap='viridis', yticklabels=False)
plt.title('Missing Values Heatmap')
plt.show()

### What did you know about your dataset?

The **Airbnb NYC 2019** dataset (`airbnb_df`) contains 16 columns with data on various aspects of Airbnb listings in New York City, such as listing details, host information, pricing, availability, and reviews. The dataset includes a total of 22,059 entries, with columns like `id`, `host_id`, `name`, `host_name`, `neighbourhood_group`, `room_type`, `price`, `number_of_reviews`, `last_review`, `reviews_per_month`, and `availability_365`. The data types are a mix of integers, floats, and strings. There are some missing values in a few columns: `name` (16 missing), `host_name` (21 missing), `last_review` (10,052 missing), and `reviews_per_month` (10,052 missing). These missing values could affect analysis and may require imputation or other handling techniques. Other columns, such as `id`, `host_id`, `latitude`, `longitude`, and `price`, have no missing data. The dataset offers valuable insights into the performance of Airbnb listings, including factors influencing pricing, availability, and reviews, making it ideal for analysis and decision-making in the context of Airbnb market trends in New York City.
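The missing-value counts above are easier to compare when expressed as a share of all rows. The following is a minimal sketch on a small made-up frame (`toy_df` is hypothetical and only stands in for `airbnb_df`); the same two expressions should work on the real DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for airbnb_df (values are made up)
toy_df = pd.DataFrame({
    "name": ["Loft", None, "Studio", "Room"],
    "price": [150, 80, 120, 60],
    "reviews_per_month": [1.2, np.nan, np.nan, 0.4],
})

# Count and percentage of missing values per column, worst column first
missing_summary = pd.DataFrame({
    "missing_count": toy_df.isnull().sum(),
    "missing_pct": (toy_df.isnull().mean() * 100).round(2),
}).sort_values("missing_pct", ascending=False)

print(missing_summary)
```

Sorting by percentage puts the columns that need the most attention at the top of the table.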

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
airbnb_df.columns

In [None]:
# Get the summary statistics of the dataset
airbnb_df.describe()

### Variables Description

Here’s a theoretical description of each variable in the **Airbnb NYC 2019** dataset:

1. **id**: A unique identifier assigned to each Airbnb listing. This is a numerical value and helps in distinguishing each listing from others.

2. **name**: The name of the Airbnb listing as provided by the host. This is typically a short description or title for the property and is a string type variable.

3. **host_id**: A unique identifier assigned to each host. It is a numerical value that helps in distinguishing one host from another, even if they have multiple listings.

4. **host_name**: The name of the host. This can be the personal name or alias of the individual or organization managing the listing. It’s a string variable.

5. **neighbourhood_group**: The broader geographical area in which the listing is located. In the context of New York City, this might refer to boroughs like Manhattan, Brooklyn, Queens, etc. It is a categorical variable.

6. **neighbourhood**: The specific neighborhood within the borough where the listing is located. For example, it could be neighborhoods like Williamsburg or SoHo. This is a categorical variable and provides more precise location data.

7. **latitude**: The geographic latitude of the Airbnb listing, representing the North-South position on the Earth's surface. This is a continuous numeric variable.

8. **longitude**: The geographic longitude of the listing, representing the East-West position on the Earth's surface. Like latitude, this is a continuous numeric variable.

9. **room_type**: The type of room being offered for rent. This could include options like "Entire home/apt", "Private room", or "Shared room". It is a categorical variable indicating the nature of the accommodation.

10. **price**: The nightly price of the Airbnb listing. It is a continuous numeric variable and represents how much guests are charged for staying one night at the listing.

11. **minimum_nights**: The minimum number of nights a guest must book to stay at the listing. This is a numerical variable and provides insight into the booking policy of the listing.

12. **number_of_reviews**: The total number of reviews the listing has received. This is a numeric variable that gives an idea of how popular or frequently reviewed the listing is.

13. **last_review**: The date of the most recent review left by a guest. It is stored as a string, but ideally, it would be a datetime type variable. This field helps in determining the freshness of the reviews.

14. **reviews_per_month**: The average number of reviews the listing receives per month. This is a continuous numeric variable that helps in gauging the listing's activity and popularity.

15. **calculated_host_listings_count**: The number of listings managed by the host. This is a numeric variable, and a higher count suggests that the host may be managing multiple properties.

16. **availability_365**: The number of days in a year that the listing is available for booking. This is an integer count between 0 and 365, indicating how frequently the listing is available to guests.

These variables collectively provide a detailed overview of Airbnb listings in New York City, allowing for various analyses such as pricing trends, popularity, host activity, and geographic distribution.
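Since `last_review` is stored as a string, one option is to parse it into a datetime column before doing any recency analysis. The sketch below is illustrative only: the toy values, the `YYYY-MM-DD` format, the `last_review_dt` and `days_since_review` column names, and the reference date are all assumptions, not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample of the last_review column (strings, placeholders and gaps)
toy_reviews = pd.DataFrame(
    {"last_review": ["2019-05-21", "No Reviews", "2018-10-19", None]}
)

# errors="coerce" turns anything that is not a valid date into NaT
toy_reviews["last_review_dt"] = pd.to_datetime(
    toy_reviews["last_review"], format="%Y-%m-%d", errors="coerce"
)

# Assumed reference date for measuring how stale each review is
reference_date = pd.Timestamp("2019-12-31")
toy_reviews["days_since_review"] = (
    reference_date - toy_reviews["last_review_dt"]
).dt.days

print(toy_reviews)
```

Keeping the parsed dates in a separate column leaves the original strings untouched, so either representation can be used later.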

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
airbnb_df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Round latitude and longitude to 2 decimal places
airbnb_df['latitude'] = airbnb_df['latitude'].round(2)
airbnb_df['longitude'] = airbnb_df['longitude'].round(2)

# Handle missing values for 'last_review' and 'reviews_per_month'
# For 'last_review', set to 'No Reviews' if missing
airbnb_df['last_review'] = airbnb_df['last_review'].fillna('No Reviews')

# For 'reviews_per_month', set to 0 if missing (assuming no reviews means 0 reviews per month)
airbnb_df['reviews_per_month'] = airbnb_df['reviews_per_month'].fillna(0)

# Handle any other missing values (mean for numerical columns, mode for categorical columns)
# For numerical columns, fill with the mean (the median is a more outlier-robust alternative)
for column in airbnb_df.select_dtypes(include=['float64', 'int64']).columns:
    if airbnb_df[column].isnull().sum() > 0:
        # Use the mean for numerical columns
        airbnb_df[column] = airbnb_df[column].fillna(airbnb_df[column].mean())

# For categorical columns, fill with mode
for column in airbnb_df.select_dtypes(include=['object']).columns:
    if airbnb_df[column].isnull().sum() > 0:
        # Use the mode for categorical columns
        airbnb_df[column] = airbnb_df[column].fillna(airbnb_df[column].mode()[0])

# Check if there are any remaining missing values and display the result
missing_values = airbnb_df.isnull().sum()
print(missing_values)

### What all manipulations have you done and insights you found?

In the process of data wrangling, I performed several key manipulations on the dataset to clean and prepare it for further analysis. The primary manipulations I undertook are as follows:

1. **Cleaning and Handling Missing Values**:
   - **Latitude and Longitude**: These values were rounded to two decimal places to give the coordinates uniform precision. This is a precision-reduction step rather than a missing-value fix, since neither column has missing data. Two decimal places correspond to roughly 1 km, which is adequate for borough- or neighbourhood-level analysis but too coarse for street-level detail.
   - **Last Review**: Missing values in the `last_review` column were filled with the placeholder `"No Reviews"`, as many listings have not received any reviews. This allows the dataset to reflect the absence of reviews without dropping rows.
   - **Reviews per Month**: Missing values for `reviews_per_month` were set to `0`, under the assumption that a listing without reviews has 0 reviews per month. This keeps the column fully numeric for plotting and correlation.
   - **Numerical Columns**: Any remaining missing values in numeric columns are filled with the column mean. In this dataset `reviews_per_month` was the only numeric column with missing values and it had already been set to `0`, so this step acts as a safeguard and does not change columns such as `price`, `minimum_nights`, or `number_of_reviews`.
   - **Categorical Columns**: Any remaining missing values in categorical columns are filled with the mode (the most frequent value). In practice this affects only `name` (16 missing) and `host_name` (21 missing), because `last_review` had already been filled and the other categorical columns are complete.

2. **Insights Found**:
   - **Geographic Consistency**: Rounding `latitude` and `longitude` to two decimal places keeps the coordinates precise enough for borough- and neighbourhood-level analysis, at the cost of placing nearby listings on the same point.
   - **Missing Reviews**: The handling of missing `last_review` and `reviews_per_month` provided a clear insight that many listings in the dataset either haven't been reviewed yet or haven't received frequent reviews, which is important for understanding user engagement with listings.
   - **Data Imputation**: By filling missing values in numerical columns with the mean, and in categorical columns with the mode, the dataset was made more complete without introducing significant bias. However, filling missing `reviews_per_month` with 0 could introduce assumptions about the listings with no reviews, which should be considered in later analyses.
   
In summary, these manipulations ensure the dataset is clean and prepared for analysis, allowing for reliable insights to be drawn from the data, such as patterns in reviews, pricing, or room types. This data wrangling process helped in reducing the noise from missing values and improved the overall quality of the dataset.
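Because filling `reviews_per_month` with 0 mixes "never reviewed" listings with genuinely reviewed ones, one way to keep the two apart is to record a flag before filling. This is a sketch on made-up data; the `has_reviews` column and `toy_df` are hypothetical names, not part of the notebook above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: two listings have never been reviewed
toy_df = pd.DataFrame({
    "number_of_reviews": [12, 0, 3, 0],
    "reviews_per_month": [0.8, np.nan, 0.2, np.nan],
})

# Record which rows had real review data BEFORE the zeros are filled in
toy_df["has_reviews"] = toy_df["reviews_per_month"].notna()
toy_df["reviews_per_month"] = toy_df["reviews_per_month"].fillna(0)

# The mean differs depending on whether unreviewed listings are included
mean_all = toy_df["reviews_per_month"].mean()
mean_reviewed = toy_df.loc[toy_df["has_reviews"], "reviews_per_month"].mean()

print(f"Mean over all listings:      {mean_all:.2f}")
print(f"Mean over reviewed listings: {mean_reviewed:.2f}")
```

With the flag in place, later charts can either include or exclude unreviewed listings explicitly instead of relying on the filled zeros.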

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 Price vs Room Type (Boxplot)

In [None]:
plt.figure(figsize=(10,6))
sns.boxplot(x='room_type', y='price', data=airbnb_df)
plt.title('Price vs Room Type')
plt.xlabel('Room Type')
plt.ylabel('Price')
plt.show()

##### 1. Why did you pick the specific chart?

A boxplot is ideal for visualizing the distribution of prices across different room types. It effectively shows central tendency, spread, and outliers.

##### 2. What is/are the insight(s) found from the chart?

- Price Variation: The boxplots show that prices vary significantly across room types. Private rooms and entire homes/apartments tend to have higher median prices and a wider range of prices compared to shared rooms.
- Outliers: There are numerous outliers, especially for private rooms and entire homes/apartments, indicating the presence of very expensive listings.
- Shared Rooms: Shared rooms have a lower median price and a tighter price range, suggesting more consistency in pricing.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Positive Impacts:

- Pricing Strategy: Understanding the price distribution for each room type can help optimize pricing strategies.
- Target Market: Identifying outliers and the range of prices can help target specific market segments (e.g., luxury travelers, budget travelers).
- Marketing: The insights can be used to tailor marketing messages and promotions to different room types and price points.
2. Negative Impacts:

- Outliers: The presence of very high-priced outliers for private rooms and entire homes/apartments could negatively impact the perception of overall pricing. This could potentially deter budget-conscious travelers.

3. Specific Reasons:

- Outliers: These high-priced outliers may not be representative of the majority of listings and could skew the perception of average pricing.
- Competition: If competitors have more affordable options for private rooms and entire homes/apartments, it could lead to lost business.
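The outliers a boxplot draws are, by default, the points beyond 1.5 times the interquartile range from the quartiles. The sketch below applies the same rule per room type on a made-up frame so the outliers can be counted or inspected; `toy_df`, `flag_iqr_outliers`, and the `is_outlier` column are hypothetical names introduced for illustration.

```python
import pandas as pd

# Hypothetical toy prices: one extreme listing among the entire homes
toy_df = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 5 + ["Private room"] * 5,
    "price": [150, 160, 170, 180, 2000, 60, 65, 70, 75, 80],
})


def flag_iqr_outliers(prices: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a price lies more than k * IQR outside the quartiles."""
    q1, q3 = prices.quantile(0.25), prices.quantile(0.75)
    iqr = q3 - q1
    return (prices < q1 - k * iqr) | (prices > q3 + k * iqr)


# Apply the rule within each room type, not across the whole dataset
toy_df["is_outlier"] = (
    toy_df.groupby("room_type")["price"].transform(flag_iqr_outliers).astype(bool)
)

print(toy_df[toy_df["is_outlier"]])
```

Flagging within each room type matters because a price that is ordinary for an entire home can be extreme for a shared room.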

#### Chart - 2 Room Type Distribution (Bar Plot)

In [None]:
plt.figure(figsize=(10, 6))
sns.countplot(x='room_type', data=airbnb_df, hue='room_type', palette='Set2', legend=False)
plt.title('Room Type Distribution')
plt.xlabel('Room Type')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

A bar plot is the perfect choice for visualizing the frequency or count of different room types. It allows for easy comparison and identification of the most common type.

##### 2. What is/are the insight(s) found from the chart?

- Dominant Room Types: Entire homes/apartments are the most common room type, followed by private rooms. Shared rooms are the least frequent.
- Distribution: The distribution is skewed towards entire homes/apartments, indicating a preference for this type of accommodation.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Positive Impacts:

- Inventory Management: The insights help in understanding the demand for different room types, aiding in inventory management and allocation.
- Marketing Strategy: Targeting specific room types can be prioritized based on their popularity.
- Pricing Strategy: Pricing strategies can be tailored to the different room types based on their demand.

2. Negative Impacts:

- Shared Room Demand: The low demand for shared rooms could potentially lead to lower occupancy rates and revenue for this category.

#### Chart - 3 Average Price by Neighbourhood Group (Bar Plot)

In [None]:
plt.figure(figsize=(10, 6))
avg_price_neigh_group = airbnb_df.groupby('neighbourhood_group')['price'].mean().sort_values(ascending=False)

# Assign the x variable to hue and set legend=False
sns.barplot(x=avg_price_neigh_group.index, y=avg_price_neigh_group.values, hue=avg_price_neigh_group.index, palette='viridis', legend=False)

plt.title('Average Price by Neighbourhood Group')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Average Price')
plt.show()

##### 1. Why did you pick the specific chart?

A bar plot is the perfect choice for visualizing the average price across different neighborhood groups. It allows for easy comparison and identification of the highest and lowest average prices.

##### 2. What is/are the insight(s) found from the chart?

- Price Variation: There is a significant variation in average prices across the neighbourhood groups (boroughs). Manhattan has the highest average price, followed by Brooklyn and Staten Island. Queens and Bronx have the lowest average prices.
- Price Hierarchy: The chart clearly shows the price hierarchy among the boroughs, with Manhattan being the most expensive and Bronx being the most affordable.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Positive Impacts:

- Pricing Strategy: Understanding the average price for each neighborhood can help in setting appropriate pricing strategies.
- Target Market: Identifying the most expensive and affordable neighborhoods can help target specific market segments.
- Marketing: Marketing messages can be tailored to highlight the value proposition for each neighborhood based on pricing.
2. Negative Impacts:

- Pricing Perception: High average prices in certain neighborhoods (like Manhattan) might deter budget-conscious travelers.
- Competition: If competitors have more affordable options in certain neighborhoods, it could lead to lost business.
3. Specific Reasons:

- Pricing Sensitivity: Travelers might be more sensitive to price differences, especially when choosing between neighborhoods.
- Competition: If there are many similar accommodations with lower prices in a particular neighborhood, it could negatively impact demand.
4. To mitigate these negative impacts, consider strategies like:

- Value Proposition: Emphasize the unique value proposition of each neighborhood, such as amenities, attractions, or cultural experiences.

#### Chart - 4   Price vs Minimum Nights (Scatter Plot)

In [None]:
plt.figure(figsize=(10,6))
sns.scatterplot(x='minimum_nights', y='price', data=airbnb_df, color='red', alpha=0.6)
plt.title('Price vs Minimum Nights')
plt.xlabel('Minimum Nights')
plt.ylabel('Price')
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot is the perfect choice for visualizing the relationship between two continuous variables, in this case, price and minimum nights. It helps identify patterns, trends, and potential correlations.

##### 2. What is/are the insight(s) found from the chart?

- Price-Minimum Nights Relationship: The scatter plot doesn't show a strong linear correlation between price and minimum nights. However, there are some interesting observations:
- There's a cluster of points with low minimum nights and a wide range of prices, suggesting that many listings have short minimum stays and vary in price.
- There are some outliers with high minimum nights and relatively low prices, which might be interesting to investigate further.
- There seems to be a slight positive trend, indicating that as minimum nights increase, the price tends to increase as well, but the relationship is weak.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Positive Impacts:

- Pricing Strategy: Understanding the relationship between price and minimum nights can help in setting competitive prices for different stay durations.
- Inventory Management: Identifying popular minimum night ranges can help optimize inventory allocation.
- Marketing: Targeted marketing campaigns can be created for specific minimum night segments.
2. Negative Impacts:

- Outliers: Outliers with high minimum nights and low prices might not be ideal for revenue generation.
- Weak Correlation: The weak correlation between price and minimum nights suggests that other factors might be more influential on pricing.
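A visual impression of a weak relationship can be checked numerically. The sketch below compares Pearson correlation with Spearman rank correlation, which is less affected by extreme values such as very long minimum stays; the data in `toy_df` is made up for illustration.

```python
import pandas as pd

# Hypothetical toy data with a few very long minimum stays
toy_df = pd.DataFrame({
    "minimum_nights": [1, 2, 3, 5, 30, 365],
    "price": [80, 120, 90, 150, 100, 110],
})

# Pearson measures linear association; Spearman measures monotonic association
pearson = toy_df.corr(method="pearson").loc["minimum_nights", "price"]
spearman = toy_df.corr(method="spearman").loc["minimum_nights", "price"]

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

If both coefficients stay close to zero on the real data, that would support the reading that minimum nights alone explains little of the price.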

#### Chart - 5 Correlation Heatmap

In [None]:
plt.figure(figsize=(8, 6))

# Select only the numeric columns from the DataFrame ('number' covers every integer and float dtype)
numeric_cols = airbnb_df.select_dtypes(include='number')

# Compute the correlation matrix for the numeric columns
corr = numeric_cols.corr()

# Plot the heatmap
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Heatmap')
plt.show()

##### 1. Why did you pick the specific chart?

A correlation heatmap is an excellent choice for visualizing the relationships between numerical variables. It quickly reveals the strength and direction of the correlations using color-coded squares.

##### 2. What is/are the insight(s) found from the chart?

1. Strongest Correlations:
- number_of_reviews and reviews_per_month have a moderately strong positive correlation (0.59), indicating that listings with more reviews tend to have more reviews per month.
- host_id and id also correlate at 0.59. Both are identifiers rather than measurements, so this most likely reflects the order in which IDs were assigned and carries no business meaning.
2. Weak Correlations:
- calculated_host_listings_count and availability_365 have a weak positive correlation (0.23), suggesting that hosts with more listings tend to have somewhat higher availability.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Positive Impacts:

- Feature Engineering: The strong correlation between number_of_reviews and reviews_per_month suggests that one of these features might be redundant and can be removed to reduce dimensionality.
- Model Building: Understanding the correlation between variables can help in selecting relevant features for machine learning models.
- Business Insights: The correlation between calculated_host_listings_count and availability_365 can be used to identify potential strategies for increasing availability, such as incentives for hosts with multiple listings.
2. Negative Impacts:

- Multicollinearity: Strong correlations between variables can lead to multicollinearity, which can negatively impact the performance of statistical models.
3. Specific Reasons:

- Model Instability: Multicollinearity can make models unstable and less reliable.
- Interpretation Difficulty: It can be difficult to interpret the impact of individual variables when they are highly correlated.
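One common way to quantify multicollinearity is the variance inflation factor (VIF). For standardized predictors, the VIF of each variable equals the corresponding diagonal entry of the inverse correlation matrix. The sketch below uses synthetic data (the generated columns only borrow the dataset's column names, and the numbers are made up) to illustrate the calculation.

```python
import numpy as np
import pandas as pd

# Synthetic data: reviews_per_month is built to depend on number_of_reviews
rng = np.random.default_rng(0)
n = 500
number_of_reviews = rng.poisson(20, n).astype(float)
reviews_per_month = 0.05 * number_of_reviews + rng.normal(0, 0.3, n)
availability_365 = rng.integers(0, 366, n).astype(float)

toy_df = pd.DataFrame({
    "number_of_reviews": number_of_reviews,
    "reviews_per_month": reviews_per_month,
    "availability_365": availability_365,
})

# VIF_j is the j-th diagonal element of the inverse correlation matrix
corr = toy_df.corr()
vif = pd.Series(
    np.diag(np.linalg.inv(corr.to_numpy())), index=corr.columns, name="VIF"
)

print(vif.round(2))
```

A VIF near 1 means a variable is almost independent of the others, and larger values indicate more overlap. Which threshold counts as problematic is a judgment call that depends on the model being built.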

#### Chart - 6 Number of Reviews vs Price (Scatter Plot)

In [None]:
plt.figure(figsize=(10,6))
sns.scatterplot(x='number_of_reviews', y='price', data=airbnb_df, color='green', alpha=0.6)
plt.title('Number of Reviews vs Price')
plt.xlabel('Number of Reviews')
plt.ylabel('Price')
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot is the perfect choice for visualizing the relationship between two continuous variables, in this case, the number of reviews and price. It helps identify patterns, trends, and potential correlations.

##### 2. What is/are the insight(s) found from the chart?

- Price-Review Relationship: The scatter plot shows a weak negative trend between the number of reviews and price. This suggests that as the number of reviews increases, the price tends to decrease slightly.
- Price Variation: There is a wide range of prices for all levels of reviews, indicating that the number of reviews is not the sole determinant of price.
- Outliers: There are a few outliers with a high number of reviews and a relatively low price. These could be interesting to investigate further.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Positive Impacts:

- Pricing Strategy: Understanding the relationship between price and reviews can help optimize pricing strategies.
- Inventory Management: Identifying listings with a high number of reviews and affordable prices can help prioritize them for guests.
- Marketing: Highlighting listings with a high number of reviews can be a powerful marketing tool, keeping in mind that the dataset records review counts only, not whether the reviews are positive.
2. Negative Impacts:

- Weak Correlation: The weak negative correlation suggests that other factors might be more influential on pricing.
- Outliers: Outliers with a high number of reviews and low prices might not be profitable or might require special marketing efforts to attract guests.
3. Specific Reasons:

- Competition: If competitors offer similar listings with lower prices, it could negatively impact demand.
- Guest Expectations: Listings with a high number of reviews might have higher guest expectations, which could put pressure on hosts to maintain high standards.

#### Chart - 7  Reviews Per Month vs Price (Scatter Plot)

In [None]:
plt.figure(figsize=(10,6))
sns.scatterplot(x='reviews_per_month', y='price', data=airbnb_df, color='purple', alpha=0.6)
plt.title('Reviews Per Month vs Price')
plt.xlabel('Reviews Per Month')
plt.ylabel('Price')
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot is the perfect choice for visualizing the relationship between two continuous variables, in this case, reviews per month and price. It helps identify patterns, trends, and potential correlations.

##### 2. What is/are the insight(s) found from the chart?

- Price-Review Relationship: The scatter plot shows a weak negative trend between reviews per month and price. This suggests that as the number of reviews per month increases, the price tends to decrease slightly.
- Price Variation: There is a wide range of prices for all levels of reviews per month, indicating that the number of reviews per month is not the sole determinant of price.
- Outliers: There are a few outliers with a high number of reviews per month and a relatively low price. These could be interesting to investigate further.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Positive Impacts:

- Pricing Strategy: Understanding the relationship between price and reviews per month can help optimize pricing strategies.
- Inventory Management: Identifying listings with a high number of reviews per month and affordable prices can help prioritize them for guests.
- Marketing: Highlighting listings with a high number of reviews per month can be a powerful marketing tool, keeping in mind that the dataset records review counts only, not whether the reviews are positive.
2. Negative Impacts:

- Weak Correlation: The weak negative correlation suggests that other factors might be more influential on pricing.
- Outliers: Outliers with a high number of reviews per month and low prices might not be profitable or might require special marketing efforts to attract guests.
3. Specific Reasons:

- Competition: If competitors offer similar listings with lower prices, it could negatively impact demand.
- Guest Expectations: Listings with a high number of reviews per month might have higher guest expectations, which could put pressure on hosts to maintain high standards.
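A weak trend in a dense scatter plot is often easier to read after binning the x-axis and comparing a typical price per bin. The sketch below is illustrative: the data in `toy_df`, the bin edges, and the band labels are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: price loosely falls as review activity rises
toy_df = pd.DataFrame({
    "reviews_per_month": [0.0, 0.0, 0.3, 0.8, 1.5, 2.5, 4.0, 6.0],
    "price": [200, 150, 180, 120, 110, 100, 90, 85],
})

# Assumed bands: no reviews, up to 1, up to 3, and more than 3 per month
bins = [-0.01, 0, 1, 3, np.inf]
labels = ["none", "0-1", "1-3", "3+"]
toy_df["review_band"] = pd.cut(toy_df["reviews_per_month"], bins=bins, labels=labels)

# Median price and listing count per band
band_summary = (
    toy_df.groupby("review_band", observed=True)["price"].agg(["median", "count"])
)

print(band_summary)
```

Giving the zero-review listings their own band keeps the filled-in zeros from being mixed with listings that do receive reviews.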

#### Chart - 8  Average Price by Room Type in Neighbourhood Groups (Grouped Bar Plot)

In [None]:
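# Average price per neighbourhood group (rows) and room type (columns)
avg_price_room_neigh = airbnb_df.groupby(['neighbourhood_group', 'room_type'])['price'].mean().unstack()

# DataFrame.plot creates its own figure, so a separate plt.figure() call would only leave an empty figure behind
avg_price_room_neigh.plot(kind='bar', figsize=(12,6), colormap='viridis', width=0.8)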
plt.title('Average Price by Room Type in Neighbourhood Groups')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Average Price')
plt.xticks(rotation=45)
plt.legend(title='Room Type')
plt.show()

##### 1. Why did you pick the specific chart?

A grouped bar plot is an excellent choice for comparing the average price of different room types across multiple neighborhood groups. It allows for easy visual comparison and identification of trends.

##### 2. What is/are the insight(s) found from the chart?

- Price Variation: There is significant price variation across different room types and neighborhoods.
- Dominant Room Type: Entire homes/apartments consistently have the highest average price across all neighborhoods.
- Neighborhood Impact: Manhattan has the highest average prices for all room types, followed by Brooklyn and Staten Island. Queens and Bronx have the lowest average prices.
- Room Type Impact: Shared rooms have the lowest average price in all neighborhoods, followed by private rooms.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Positive Impacts:

- Pricing Strategy: Understanding the average price for each room type and neighborhood can help in setting competitive pricing strategies.
- Inventory Management: Identifying the most profitable room types and neighborhoods can help optimize inventory allocation.
- Marketing: Targeted marketing campaigns can be created for specific room types and neighborhoods.

2. Negative Impacts:

- Pricing Perception: High average prices in certain neighborhoods might deter budget-conscious travelers.
- Competition: If competitors have more affordable options in certain neighborhoods, it could lead to lost business.

3. Specific Reasons:

- Pricing Sensitivity: Travelers might be more sensitive to price differences, especially when choosing between neighborhoods.
- Competition: If there are many similar accommodations with lower prices in a particular neighborhood, it could negatively impact demand.

#### Chart - 9 Latitude vs Longitude (Scatter Plot)

In [None]:
plt.figure(figsize=(10,6))
sns.scatterplot(x='longitude', y='latitude', data=airbnb_df, hue='room_type', palette='Set1', alpha=0.5)
plt.title('Geographical Distribution of Listings')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot of longitude against latitude acts as a simple map of the listings. It shows where listings are dense, where they are sparse, and how the room types are spread across the city.

##### 2. What is/are the insight(s) found from the chart?

- Clustered Distribution: The listings appear to be clustered in specific areas, indicating popular neighborhoods or districts.
- Room Type Distribution: Different room types are distributed across the map, suggesting that each type can be found in various locations.
- Density Variation: The density of listings varies across the map, with some areas having higher concentrations than others.
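
With tens of thousands of overlapping points, density is hard to judge by eye, so one option is to count listings per grid cell. This is a minimal sketch, assuming the `latitude` and `longitude` columns used in the plot. The function name `listing_density`, the 0.01-degree cell size, and the toy coordinates are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def listing_density(df: pd.DataFrame, cell_deg: float = 0.01) -> pd.DataFrame:
    """Count listings per square grid cell of `cell_deg` degrees per side."""
    cells = pd.DataFrame({
        # Integer cell indices avoid floating-point noise in the group keys.
        "lat_idx": np.floor(df["latitude"] / cell_deg).astype(int),
        "lon_idx": np.floor(df["longitude"] / cell_deg).astype(int),
    })
    return (
        cells.groupby(["lat_idx", "lon_idx"])
        .size()
        .reset_index(name="listings")
        .sort_values("listings", ascending=False, ignore_index=True)
    )


# Made-up coordinates: three points share one cell, two sit alone.
toy_df = pd.DataFrame({
    "latitude": [40.751, 40.758, 40.755, 40.681, 40.702],
    "longitude": [-73.985, -73.981, -73.989, -73.951, -73.802],
})

density = listing_density(toy_df)
print(density)
```

Running `listing_density(airbnb_df).head(10)` in the notebook would list the most crowded cells, which gives the "Density Variation" point a number to cite.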

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Positive Impacts:

- Market Analysis: Identifying areas with high density of listings can help in understanding the competitive landscape and potential demand.
- Pricing Strategy: Analyzing the distribution of different room types in specific areas can help in setting competitive pricing strategies.
- Marketing: Targeted marketing campaigns can be created for specific neighborhoods or districts based on their popularity.

2. Negative Impacts:

- Overcrowding: Areas with high density of listings might face increased competition and lower occupancy rates.
- Seasonality: Some areas might be more seasonal than others, which could impact demand and pricing.

3. Specific Reasons:

- Competition: If there are many similar accommodations in a particular area, it could lead to lower occupancy rates and lower prices.
- Seasonality: If a neighborhood is heavily dependent on tourism, it might experience fluctuations in demand throughout the year.

#### Chart - 10 Neighbourhood Group vs Number of Reviews (Bar Plot)

In [None]:
plt.figure(figsize=(10,6))
reviews_by_neigh_group = airbnb_df.groupby('neighbourhood_group')['number_of_reviews'].sum().sort_values(ascending=False)
sns.barplot(x=reviews_by_neigh_group.index, y=reviews_by_neigh_group.values, hue=reviews_by_neigh_group.index, palette='coolwarm', legend=False)
plt.title('Total Number of Reviews by Neighbourhood Group')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Total Reviews')
plt.show()

##### 1. Why did you pick the specific chart?

A bar plot suits a comparison of a single total across a handful of categories. It makes the neighborhood groups with the most and the fewest reviews easy to identify.

##### 2. What is/are the insight(s) found from the chart?

- Review Dominance: Brooklyn has the highest number of reviews, followed by Manhattan.
- Review Disparity: There is a large gap in the number of reviews between the top two neighborhood groups (Brooklyn and Manhattan) and the remaining three. Because these are totals, groups with more listings naturally accumulate more reviews.
- Least Reviewed: Staten Island has the lowest number of reviews.
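
Total reviews depend heavily on how many listings each group has, so a per-listing view can complement the chart. The sketch below assumes the `neighbourhood_group` and `number_of_reviews` columns from the notebook. The helper `review_summary` and the toy review counts are hypothetical; the numbers are made up to show that the ranking by total and the ranking by average can differ.

```python
import pandas as pd


def review_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Total reviews, listing count, mean reviews per listing, and share of all reviews."""
    summary = df.groupby("neighbourhood_group")["number_of_reviews"].agg(
        total_reviews="sum",
        listings="size",
        reviews_per_listing="mean",
    )
    summary["share_of_reviews_pct"] = (
        100 * summary["total_reviews"] / summary["total_reviews"].sum()
    )
    return summary.sort_values("total_reviews", ascending=False).round(2)


# Made-up review counts, used only to demonstrate the call.
toy_df = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Brooklyn", "Brooklyn", "Manhattan", "Manhattan", "Staten Island"],
    "number_of_reviews": [10, 20, 30, 5, 10, 25],
})

summary = review_summary(toy_df)
print(summary)
```

In this toy frame the group with the most total reviews is not the one with the most reviews per listing, which is the kind of difference `review_summary(airbnb_df)` would reveal or rule out for the real data.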

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Positive Impacts:

- Market Analysis: Understanding the number of reviews for each neighborhood can help identify popular areas and potential demand.
- Pricing Strategy: Neighborhoods with a high number of reviews might be able to command higher prices.
- Marketing: Targeted marketing campaigns can be created for popular neighborhoods to attract more guests.

2. Negative Impacts:

- Competition: Neighborhoods with a high number of reviews might face increased competition.
- Guest Expectations: Neighborhoods with a high number of reviews might have higher guest expectations.

3. Specific Reasons:

- Competition: If there are many similar accommodations in a popular neighborhood, it could lead to lower occupancy rates and lower prices.
- Guest Expectations: Guests might have higher expectations for accommodations in popular neighborhoods, which could put pressure on hosts to maintain high standards.

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

Based on the analysis of the charts, here are some recommendations to achieve the business objective of optimizing pricing and marketing strategies:

**Pricing Strategy:**

* **Dynamic Pricing:** Implement a dynamic pricing strategy that adjusts prices based on demand, seasonality, and other factors. This can help optimize revenue and fill vacancies.
* **Neighborhood-Based Pricing:** Consider pricing variations based on the neighborhood, taking into account factors like popularity, amenities, and competition.
* **Room Type-Based Pricing:** Adjust prices based on the room type (entire home/apt, private room, shared room), considering factors like amenities, space, and privacy.
* **Review-Based Pricing:** Reward hosts with a high number of positive reviews by allowing them to set higher prices.
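
As a starting point for neighborhood-based and room-type-based pricing, each listing can be compared with the median price of its own segment. This is a minimal sketch under stated assumptions, not a full pricing model: the segment is taken to be the combination of `neighbourhood_group` and `room_type`, and the helper name `add_price_benchmark`, the two new column names, and the toy prices are hypothetical.

```python
import pandas as pd


def add_price_benchmark(df: pd.DataFrame) -> pd.DataFrame:
    """Attach the segment median price and each listing's ratio to it."""
    out = df.copy()
    # Segment = neighbourhood group x room type (an assumption of this sketch).
    out["segment_median_price"] = (
        out.groupby(["neighbourhood_group", "room_type"])["price"].transform("median")
    )
    # Ratio above 1 means the listing is priced above its segment median.
    out["price_vs_segment"] = (out["price"] / out["segment_median_price"]).round(2)
    return out


# Made-up prices, used only to demonstrate the call.
toy_df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 3 + ["Brooklyn"] * 2,
    "room_type": ["Entire home/apt"] * 3 + ["Private room"] * 2,
    "price": [100, 200, 300, 50, 70],
})

benchmarked = add_price_benchmark(toy_df)
print(benchmarked)
```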

**Marketing Strategy:**

* **Targeted Marketing:** Use data-driven insights to target specific segments of travelers, such as families, business travelers, or budget travelers.
* **Leverage Reviews:** Highlight positive reviews in marketing materials to attract guests.
* **Geographical Targeting:** Focus marketing efforts on areas with high demand and potential for growth.
* **Seasonal Promotions:** Offer seasonal promotions and discounts to attract guests during off-peak periods.

**Inventory Management:**

* **Optimal Pricing:** Set optimal prices for different room types and neighborhoods to maximize revenue.
* **Demand Forecasting:** Use historical data and future trends to forecast demand and adjust inventory accordingly.
* **Flexible Cancellation Policies:** Offer flexible cancellation policies to attract more guests.

**Additional Considerations:**

* **Unique Selling Points:** Identify and highlight the unique selling points of each listing, such as amenities, location, or experiences.
* **Guest Experience:** Focus on providing a positive guest experience to encourage repeat bookings and positive reviews.
* **Data-Driven Decision Making:** Continuously monitor key metrics and use data-driven insights to inform decision-making.

# **Conclusion**

Through a comprehensive analysis of various charts and data, we have gained valuable insights into the factors influencing pricing and demand in the short-term rental market. By leveraging these insights, we can develop effective strategies to optimize pricing, marketing, and inventory management.

Key findings from the analysis include:

* **Price Variation:** Prices vary significantly across room types and neighborhood groups.
* **Review Impact:** A higher number of reviews can positively impact pricing and demand. Note that the dataset records review counts, not ratings.
* **Geographical Distribution:** The geographical distribution of listings plays a crucial role in determining pricing and demand.
* **Room Type Preference:** Entire homes/apartments are generally more popular and command higher prices.

Based on these findings, we recommend the following strategies:

* **Dynamic Pricing:** Implement a dynamic pricing strategy to adjust prices based on real-time demand and market conditions.
* **Targeted Marketing:** Utilize data-driven insights to target specific segments of travelers and promote listings effectively.
* **Optimal Inventory Management:** Optimize inventory allocation to maximize revenue and minimize vacancies.
* **Guest Experience:** Prioritize guest satisfaction to encourage positive reviews and repeat bookings.

By implementing these strategies, the client can improve their business performance, increase revenue, and enhance guest satisfaction. Continuous monitoring and analysis of data will be essential to adapt to changing market conditions and optimize strategies over time.
