
# **Project Name**    - Airbnb Data Analysis Project



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member**    - Kanishka Sharma





# **Project Summary -**

**📝 <u>Project Summary: Airbnb Data Analysis</u>**

The objective of this project was to perform comprehensive exploratory data analysis (EDA) on a real-world Airbnb dataset to uncover meaningful insights that can help improve host strategy, platform efficiency, and guest experience. By leveraging Python libraries like Pandas, Matplotlib, and Seaborn, we examined trends across prices, availability, reviews, and host behavior. The dataset included listings from various neighborhoods, room types, host activity metrics, and temporal review data.

To begin, we cleaned and preprocessed the dataset. Listings with missing values in reviews_per_month and last_review (listings that have never been reviewed) were dropped. We also created new features: price_per_night_ratio (price divided by minimum_nights) and review_rate_indicator (a categorization of listings based on their review activity). This enriched dataset allowed for deeper and more interpretable analysis.

A central part of the analysis was understanding room type popularity. Count plots and average pricing comparisons showed that "Entire home/apt" listings dominate in both price and quantity, followed by "Private room" listings. "Shared room" listings were the least common, suggesting limited demand or limited host willingness to share spaces.

We also investigated how neighborhood and location (latitude/longitude) influence listing behavior. By plotting average price and availability by neighborhood group, we observed that central areas (like Manhattan) typically have the highest prices and lowest availability, whereas outer neighborhoods (like the Bronx or Queens) offer more availability at lower prices. A geographic scatter plot using latitude and longitude helped visualize density clusters, highlighting property "hotspots".

Temporal trends were another major focus. We converted last_review to datetime format and grouped listings by the month and year of their most recent review. Line plots of these groups were used to explore seasonal engagement. The graphs indicated that recent reviews cluster in the mid-year months, possibly reflecting summer travel peaks. Additionally, we explored how reviews_per_month varied with room type and location to identify which types of listings maintain consistent customer engagement.

Another dimension of the project involved understanding host behavior. We grouped hosts by the number of listings they manage and compared this with their average reviews per month. This identified "super hosts" with multiple listings and high engagement. Box plots showed the distribution of availability and pricing across different room types and activity indicators, giving insight into how commercial and occasional hosts behave.

Advanced plots such as pair plots, heatmaps, and correlation matrices were used to identify strong relationships among variables like price, availability, and review activity. For instance, we found that availability_365 is weakly correlated with price, while reviews_per_month has a stronger relationship with the number of reviews and host listing count.

To summarize, this project delivered a detailed analytical view of Airbnb listings by breaking down the factors that affect price, availability, engagement, and host activity. Visualizations such as line plots, heatmaps, scatter plots, bar charts, and violin plots provided a clear and intuitive understanding of the data. The analysis not only helps hosts optimize listing strategies but also helps Airbnb understand market dynamics across different urban zones and times of the year.



# **GitHub Link -**

https://github.com/kanishka2985/airbnb-eda-analysis

# **Problem Statement**


**AIRBNB PROBLEM OVERVIEW**

Airbnb is a global online marketplace that connects people looking to rent out their homes with those seeking accommodations. With millions of listings worldwide, Airbnb generates a wealth of data related to prices, locations, availability, host activity, and customer reviews. Understanding this data can offer valuable insights into how Airbnb operates at a city level and how guests choose where to stay.

This project focuses on analyzing Airbnb listings from a dataset that includes features such as listing price, room type, neighborhood, host information, number of reviews, availability, and more. By using this dataset, we aim to uncover trends that can help both Airbnb hosts and the platform itself make smarter, data-driven decisions.

Despite Airbnb’s popularity, hosts often struggle to price their listings competitively and to understand guest behavior. Travelers, in turn, are frequently overwhelmed by the number of choices available, and the platform needs to ensure quality, trust, and an optimal user experience. This dataset offers a snapshot of Airbnb listings in New York City (2019), which we will analyze to address some key business and user challenges.

#### **Define Your Business Objective?**

The main business objective of this project is to analyze Airbnb listing data at the city level to uncover actionable insights that can improve pricing strategies, host performance, and the guest experience on the platform. By studying key features such as price, room type, location, availability, host activity, and user reviews, the project aims to:

* Help hosts optimize pricing and listing strategies to attract more guests and increase revenue.

* Assist Airbnb as a platform in enhancing recommendation systems, ensuring listing quality, and identifying top-performing neighborhoods and hosts.

* Empower travelers by highlighting trends that indicate better value, reliability, and satisfaction across different room types and areas.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
     [ Note: - Deployment Ready Code means that the whole .ipynb notebook can be executed in one go without a single error being logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following questions in this format:
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
path='/content/drive/MyDrive/Colab Notebooks/Airbnb dataset/Copy of Copy of Airbnb NYC 2019.csv'
df=pd.read_csv(path)

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print(df.isnull().sum())

In [None]:
# Visualizing the missing values
sns.heatmap(df.isnull(),cbar=False)

### What did you know about your dataset?

1. **🔢<u>Basic Info:-</u>**

* The dataset contains 48,895 rows and 16 columns.

* It includes details about listings, hosts, location, pricing, and review activity.

2. **🔍 <u>Missing Values</u>**

* `name`: 16 missing → few listings without titles.

* `host_name`: 21 missing → some hosts haven't added names.

* `last_review`: 10,052 missing → listings have never been reviewed.

* `reviews_per_month`: 10,052 missing → same listings with no reviews.

* Other columns have no missing values.

3. **🧠<u> Column Types</u>**

* Numerical (`int64`, `float64`): price, minimum_nights, number_of_reviews, availability, etc.

* Categorical/Text (`object`): name, host_name, neighbourhood, room_type, etc.

* `last_review` is text but can be converted to datetime.

4. **🗺️<u> Geographic Info</u>**

* `latitude` and `longitude` are complete for all listings.

* Useful for creating location maps or hotspot plots.

5. **💡<u> Key Observations</u>**

* Around 20% of listings have never been reviewed.

* Core metrics like price, availability_365, and room_type are fully available.

* The data is suitable for visual analysis, feature engineering, and ML modeling.
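The missing-value counts above, including the roughly 20% share of never-reviewed listings, can be computed in one step. This is a minimal sketch. `missing_value_report` is a hypothetical helper name, and the toy DataFrame only stands in for the real listings.

```python
import pandas as pd


def missing_value_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values for each column."""
    counts = frame.isnull().sum()
    percent = (counts / len(frame) * 100).round(2)
    report = pd.DataFrame({"missing": counts, "percent": percent})
    # List the columns with the most missing values first
    return report.sort_values("missing", ascending=False)


# Hypothetical toy data standing in for the real listings
toy = pd.DataFrame({
    "name": ["Cozy room", None, "Sunny loft", "Studio"],
    "last_review": ["2019-06-01", None, None, "2019-05-20"],
    "price": [100, 150, 80, 200],
})
report = missing_value_report(toy)
print(report)
```

On the full dataset, 10,052 missing `last_review` values out of 48,895 rows work out to about 20.6%.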


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

| **Column Name**                  | **Description**                                                              |
| -------------------------------- | ---------------------------------------------------------------------------- |
| `id`                             | Unique identifier for each listing on the Airbnb platform.                   |
| `name`                           | Title or name of the listing (e.g., "Cozy Apartment in Brooklyn").           |
| `host_id`                        | Unique ID assigned to each host on the platform.                             |
| `host_name`                      | Name of the host managing the listing.                                       |
| `neighbourhood_group`            | Broad geographic area or borough (e.g., Manhattan, Brooklyn, Queens).        |
| `neighbourhood`                  | More specific area within the neighbourhood group (e.g., Harlem, Bushwick).  |
| `latitude`                       | Geographic coordinate indicating the latitude of the listing.                |
| `longitude`                      | Geographic coordinate indicating the longitude of the listing.               |
| `room_type`                      | Type of room offered: "Entire home/apt", "Private room", "Shared room", etc. |
| `price`                          | Price per booking (often per night, but not always clear).            |
| `minimum_nights`                 | Minimum number of nights required to book the listing.                       |
| `number_of_reviews`              | Total number of reviews received for the listing.                            |
| `last_review`                    | Date of the most recent review received.                                     |
| `reviews_per_month`              | Average number of reviews per month.                                         |
| `calculated_host_listings_count` | Number of listings the host has on Airbnb.                                   |
| `availability_365`               | Number of days the listing is available for booking in a year (0–365).       |


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for col in df.columns:
    print(f"No. of unique values in {col} is {df[col].nunique()}.")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Keep an untouched copy of the raw data before any cleaning
raw_df = df.copy()
df.head()

In [None]:
# Write your code to make your dataset analysis ready.

In [None]:
# Parse last_review strings into datetime values (missing entries become NaT)
df['last_review'] = pd.to_datetime(df['last_review'])

In [None]:
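# Handle missing values:
# drop never-reviewed listings (missing last_review or reviews_per_month)
df = df.dropna(subset=['last_review', 'reviews_per_month']).copy()
# keep listings without a title or host name, but label the gap explicitly
df['name'] = df['name'].fillna('not present')
df['host_name'] = df['host_name'].fillna('not present')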

In [None]:
# Review Rate Indicator: bucket each listing by its reviews_per_month value
def review_rate_indicator(x):
  if x >= 2.0:
    return 'Highly Active'
  elif x >= 1.0:
    return 'Moderately Active'
  elif x > 0:
    return 'Low Activity'
  else:
    return 'Inactive'
df['review_rate_indicator']=df['reviews_per_month'].apply(review_rate_indicator)
df.head()

In [None]:
# Price Per Night Ratio: price divided by minimum_nights
df['price_per_night_ratio']=df['price']/df['minimum_nights']
df.head()

In [None]:
# Remove listings whose minimum_nights exceeds 365 (likely data-entry errors)
df = df[df['minimum_nights'] <= 365].copy()

In [None]:
# Keep only listings whose latitude and longitude fall within approximate NYC bounds
df = df[(df['latitude'].between(40.5, 41.0)) & (df['longitude'].between(-74.5, -73.5))].copy()
df.head()

In [None]:
#Gives average prices for each neighbourhood group
df.groupby('neighbourhood_group')['price'].mean()

In [None]:
# Average latitude and longitude of each neighbourhood
dist=df.groupby('neighbourhood')[['latitude', 'longitude']].mean()
dist

In [None]:
#Get name and price of listing(s) with the highest price
max_price = df[df['price'] == df['price'].max()][['name', 'price']]
print('The hotels with the highest price:')
print(max_price)

In [None]:
# Remove listings with a price of 0 (unrealistic or incorrect)
df = df[df['price'] > 0].copy()
#Get name and price of listing(s) with the lowest price
min_price = df[df['price'] == df['price'].min()][['name', 'price']]
print('The hotels with the lowest price:')
print(min_price)

In [None]:
#Get the mean of the price
mean_price = df['price'].mean()
print("The mean price is :-",mean_price)
#Get the mode of the price
mode_price = df['price'].mode()
print("The most common price is :-",mode_price.iloc[0])
#Get the median of the price
median_price = df['price'].median()
print("The median price is :-", median_price)

In [None]:
#Total number of unique host
unique_hosts = df['host_id'].nunique()
print("The total number of unique hosts are:", unique_hosts)

In [None]:
# Total number of listings
total_listings = len(df)
print("Total number of listings:", total_listings)

In [None]:
# Listings with the maximum availability
max_avail= df[df['availability_365'] == df['availability_365'].max()][['name', 'price','availability_365']]
print('The hotels with the maximum availability:')
max_avail.head()

In [None]:
#Hotels with the minimum availability
min_avail= df[df['availability_365'] == df['availability_365'].min()][['name', 'price','availability_365']]
print('The hotels with the minimum availability:')
min_avail.head()

### What all manipulations have you done and insights you found?

**✅DATA WRANGLING / CLEANING**

1. <u>Date Parsing</u>-Converted 'last_review' to datetime using pd.to_datetime().

2. <u>Missing Value Handling</u>-
* Dropped rows where last_review or reviews_per_month was null (listings that have never been reviewed).
* Filled missing name and host_name with 'not present'.

3. <u>Price Cleaning</u>-Removed listings with price = 0 (which are unrealistic or incorrect).

4. <u>Outlier Removal</u>-
* Removed listings with minimum_nights > 365 (likely data errors).
* Filtered out listings outside NYC bounds (latitude, longitude constraints).
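The cleaning rules listed above can be gathered into a single function, which makes the notebook easier to re-run in one go. This is a sketch under a few assumptions. `clean_listings` is a hypothetical name, and the NYC bounds are the approximate ones used in the wrangling code. The toy rows are invented so that each rule removes exactly one of them.

```python
import pandas as pd


def clean_listings(frame: pd.DataFrame) -> pd.DataFrame:
    """Apply the notebook's cleaning rules and return a new DataFrame."""
    out = frame.copy()
    # Parse review dates; unparseable values become NaT
    out["last_review"] = pd.to_datetime(out["last_review"], errors="coerce")
    # Drop listings that have never been reviewed
    out = out.dropna(subset=["last_review", "reviews_per_month"])
    # Keep listings without a title or host name, but label the gap
    out = out.fillna({"name": "not present", "host_name": "not present"})
    # Drop zero prices and implausible minimum stays
    out = out[(out["price"] > 0) & (out["minimum_nights"] <= 365)]
    # Keep only coordinates inside approximate NYC bounds
    in_nyc = out["latitude"].between(40.5, 41.0) & out["longitude"].between(-74.5, -73.5)
    return out[in_nyc].copy()


# Hypothetical rows: one valid listing plus one row that breaks each rule
raw = pd.DataFrame({
    "name": ["Cozy room", "Loft", "Studio", "Suite", "Cabin"],
    "host_name": [None, "Ana", "Ben", "Cara", "Dev"],
    "last_review": ["2019-06-01", None, "2019-05-01", "2019-04-01", "2019-03-01"],
    "reviews_per_month": [1.5, None, 0.8, 2.1, 0.4],
    "price": [100, 120, 0, 90, 75],
    "minimum_nights": [2, 1, 3, 400, 2],
    "latitude": [40.70, 40.71, 40.72, 40.73, 42.00],
    "longitude": [-73.95, -73.96, -73.97, -73.98, -73.90],
})
cleaned = clean_listings(raw)
print(cleaned[["name", "host_name", "price"]])
```

Calling `.copy()` on the final filtered result means later column assignments work on an independent DataFrame rather than on a view of an earlier one.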


**✅FEATURE ENGINEERING**

1. <u>price_per_night_ratio</u>

* Created by dividing price by minimum_nights.

2. <u>review_rate_indicator</u>

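* Applied a custom function to classify listings by reviews_per_month:

* Greater than or equal to 2 → Highly Active

* Greater than or equal to 1 (but below 2) → Moderately Active

* Greater than 0 (but below 1) → Low Activity

* Equal to 0 → Inactive (never-reviewed listings were dropped during cleaning, so this group stays empty unless a remaining listing has a reviews_per_month of exactly 0)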
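The two engineered features can also be computed without a row-by-row `apply`. This sketch uses `numpy.select`, which checks its conditions in order and picks the first match, just like the if/elif chain in `review_rate_indicator`. The `add_features` name and the toy values are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def add_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of frame with review_rate_indicator and price_per_night_ratio."""
    out = frame.copy()
    rpm = out["reviews_per_month"]
    # Conditions are checked in order; the first match wins, as in the if/elif chain
    conditions = [rpm >= 2.0, rpm >= 1.0, rpm > 0]
    labels = ["Highly Active", "Moderately Active", "Low Activity"]
    out["review_rate_indicator"] = np.select(conditions, labels, default="Inactive")
    # Same ratio as the notebook: price divided by minimum_nights
    out["price_per_night_ratio"] = out["price"] / out["minimum_nights"]
    return out


toy = pd.DataFrame({
    "reviews_per_month": [2.5, 1.0, 0.3, 0.0],
    "price": [100, 90, 60, 200],
    "minimum_nights": [2, 3, 1, 4],
})
result = add_features(toy)
print(result)
```

On large DataFrames the vectorized version is usually much faster than `apply`, and it produces the same labels.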

**✅ EXPLORATORY DATA ANALYSIS (EDA)**

1. <u>Descriptive Statistics</u>
* Mean price: 142.36 (approx.)

* Most common price (mode): 100

* Median price: 106.0

* Total Listings: 38,827

* Unique Hosts: 30,245
→ Indicates some hosts have multiple properties.

2. <u>Extreme Price Listings</u>
* Highest price: 10,000, for listings such as “Luxury 1 bedroom apt…” and “Furnished room in Astoria”.

* Lowest valid price (after cleaning): 10, for listings such as “Girls only cozy room…” and “Very Spacious Bedroom…”.

3. <u>Availability</u>
* Listings with availability_365 = 0 are inactive or blocked for bookings.
Example listings with 0 availability were identified.

4. <u>Location Insights</u>
* Average Price per Neighbourhood Group:

* Manhattan: 196.88 (most expensive)

* Bronx: 87.50 (cheapest)

* Neighborhood Coordinates:
Grouped by neighbourhood to find average latitude and longitude for mapping.
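The mean price (142.36) is well above the median (106.0), so a few very expensive listings likely pull the average up. One way to check this per neighbourhood group is to report the median and listing count next to the mean. The `price_summary` helper and the toy prices below are hypothetical.

```python
import pandas as pd


def price_summary(frame: pd.DataFrame, by: str = "neighbourhood_group") -> pd.DataFrame:
    """Mean, median and listing count of price per group, most expensive first."""
    summary = frame.groupby(by)["price"].agg(["mean", "median", "count"])
    return summary.sort_values("mean", ascending=False)


# Hypothetical prices; one very expensive listing inflates the Manhattan mean
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Bronx", "Bronx"],
    "price": [100, 150, 1000, 60, 80],
})
summary = price_summary(toy)
print(summary)
```

In the toy output, Manhattan's mean (about 416.7) is far above its median (150) because of the single 1,000 listing. The Bronx mean and median both equal 70.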



## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
#Room Popularity Graph
sns.countplot(data=df, x='room_type', hue='room_type', palette='Set2', legend=False)
plt.title('Room Type Popularity')
plt.xlabel('Room Type')
plt.ylabel('Number of Listings')
plt.show()

##### 1. Why did you pick the specific chart?

The **count plot** was selected because:

* room_type is a categorical variable.

* A count plot shows the frequency of each category.

* It is ideal for comparing the number of listings across room types.

* Seaborn counts and displays the data automatically.

* It is easy to read and interpret visually.

##### 2. What is/are the insight(s) found from the chart?

Insights from the Room Type Popularity Chart:

* **Entire home/apt** is the **most popular** room type, with the highest number of listings.
* **Private rooms** are the **second most common**, showing strong demand or availability.
* **Shared rooms** are **rare**, indicating **low popularity or supply**.
* Users or hosts may **prefer more privacy**, as seen from the high count of entire homes and private rooms.
* Airbnb listings are likely geared more toward **individual travelers or small groups** rather than large/shared accommodations.
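The counts behind the chart can be expressed as shares of all listings with `value_counts(normalize=True)`, which makes the gap between room types easier to quote. The toy Series below is hypothetical.

```python
import pandas as pd

# Hypothetical room types for eight listings
room_types = pd.Series([
    "Entire home/apt", "Entire home/apt", "Entire home/apt", "Entire home/apt",
    "Private room", "Private room", "Private room", "Shared room",
])
# Share of listings per room type, as percentages, largest first
shares = (room_types.value_counts(normalize=True) * 100).round(1)
print(shares)
```

On the notebook's data, the equivalent call would be `df['room_type'].value_counts(normalize=True)`.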


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**<u>Positive Business Impact:</u>**

* Yes, the insights can drive positive business decisions, such as:

* Focusing inventory on "Entire home/apt" and "Private room" categories, which are in higher demand.

* Targeting marketing campaigns toward these preferred room types to maximize ROI.

* Improving amenities or pricing strategies for these popular types to further increase bookings.

**<u>Potential Negative Growth Insight:</u>**

* The low popularity of “Shared room” listings might indicate either a lack of demand or poor user experience. If the company continues to invest in shared room inventory without addressing customer concerns (privacy, safety, etc.), it could result in low returns and resource misallocation.

**Justification:** Maintaining underperforming listings increases overhead without much benefit, and may even damage the brand’s perception if customers associate it with low-quality options.



#### Chart - 2

In [None]:
# Chart - 2 visualization code
#Average price of neighbourhood group
neighbourhood = df.groupby('neighbourhood_group')['price'].mean()

# Plot as a pie chart
plt.figure(figsize=(9,3))
neighbourhood.plot(
    kind='pie',
    autopct='%1.1f%%',
    colors=sns.color_palette('Set2'),
    ylabel=''  # remove default label
)
plt.title('Neighbourhood Group Vs Average Price')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A **pie chart** was chosen to compare the average price of each neighbourhood group at a glance. Each slice shows one group's average price as a share of the sum of the five group averages. It does not show a share of total listings or revenue, so the chart is best read as a relative comparison of price levels across regions.



##### 2. What is/are the insight(s) found from the chart?

Key insights from the pie chart:

* Manhattan has the highest average price, accounting for 31.8% of the combined group averages.

* Brooklyn follows at 21.4%, indicating that it is also a relatively expensive area.

* The Bronx and Staten Island have the lowest average prices, at 14.0% and 15.9% respectively.

* Queens sits in the mid-range at 16.9%.

These figures show that pricing varies significantly by location.
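The chart plots group averages, so each slice equals one group's mean price divided by the sum of all group means. A small sketch of that arithmetic follows. The `slice_shares` name and the prices are hypothetical, not taken from the real data.

```python
import pandas as pd


def slice_shares(frame: pd.DataFrame) -> pd.Series:
    """Percentage each group's mean price contributes to the sum of group means."""
    group_means = frame.groupby("neighbourhood_group")["price"].mean()
    return (group_means / group_means.sum() * 100).round(1)


toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Queens"],
    "price": [180, 220, 100, 100],
})
shares = slice_shares(toy)
print(shares)
```

Slice size ignores how many listings each group has. A group with many listings and a group with only a few can therefore get the same slice if their averages match.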

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**<u>Positive Business Impact:</u>**

* Yes, these insights can support several strategic decisions:

* Target premium listings in Manhattan and Brooklyn to maximize revenue, as they command higher average prices.

* Design pricing strategies that reflect each neighborhood's demand and value proposition.

* Promote affordable options in Bronx or Staten Island to attract budget-conscious travelers, enabling broader market reach.

**<u>Potential Negative Growth Insight:</u>**

* If too much investment is made in high-price areas like Manhattan, it could limit accessibility for price-sensitive customers, possibly reducing booking volume.

* Neglecting lower-cost areas like Bronx or Staten Island might mean missing out on untapped market segments or local travelers seeking affordability.

**Justification:** A narrow focus on high-value neighborhoods could alienate budget travelers, affecting overall booking diversity and growth potential.



#### Chart - 3

In [None]:
# Chart - 3 visualization code
# Which month of the year holds the most last-review dates?
df['review_month'] = df['last_review'].dt.month
# Count listings per month of their most recent review, in calendar order
monthly_reviews = df['review_month'].value_counts().sort_index()
plt.figure(figsize=(9,4))
# A single line takes a color; palette only applies when hue is set
sns.lineplot(x=monthly_reviews.index, y=monthly_reviews.values, color='teal')
plt.title('Monthly Engagement (Based on Last Review Dates)')
plt.xlabel('Month')
plt.ylabel('Number of Listings (by Last Review Month)')
plt.xticks(range(1, 13))
plt.grid(axis='y')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A **line plot** was selected because it effectively visualizes trends over time, in this case, user engagement across months based on the number of reviews. It helps identify seasonal patterns, peaks, and drops in engagement, which are essential for business planning and forecasting.



##### 2. What is/are the insight(s) found from the chart?

Key insights from the chart:

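* Engagement peaks sharply in June (Month 6), with the highest count (~14,000 listings whose most recent review falls in June). This suggests June is a high-activity season.

* There is a gradual increase in engagement from March to May, followed by a steep decline from July onwards.

* The lowest engagement is observed in February (Month 2) and again in the last quarter (October to December).

This suggests a strong seasonal pattern, with summer seeing significantly more activity. However, each listing contributes only its most recent review date, so the months just before the data snapshot are over-represented. Part of the June peak may therefore reflect recency rather than seasonality alone.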
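The counting step behind the chart can be sketched as follows, assuming each listing contributes exactly one date (its most recent review). Reindexing over months 1 to 12 keeps months with no listings visible as zeros instead of silently dropping them. The dates are invented for illustration.

```python
import pandas as pd

# Hypothetical last_review dates, one per listing
last_review = pd.to_datetime(pd.Series([
    "2019-06-03", "2019-06-20", "2018-06-11", "2019-05-02", "2019-02-14",
]))
# Count listings by calendar month of their most recent review,
# filling months that never appear with 0
by_month = (
    last_review.dt.month.value_counts()
    .reindex(range(1, 13), fill_value=0)
)
print(by_month)
```

Grouping by calendar month merges different years, as June 2018 and June 2019 do here. This matches what the chart above shows, but it hides year-over-year change.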

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**<u>Positive Business Impact:</u>**

* Yes, these insights are highly valuable for:

* Strategic marketing: Focus campaigns in Q2 (April–June) to capture the peak engagement window.

* Resource planning: Increase staffing, inventory, or listings during high-engagement months.

* Promotions during slow months (like February and November) to stimulate demand and avoid flat sales.

**<u>Potential Negative Growth Insight:</u>**

* The sharp drop after June indicates seasonal disengagement. If unaddressed, this off-peak inactivity can lead to revenue dips and underutilized assets.

**Justification:** Without proactive measures in off-season months, hosts and platforms could suffer decreased occupancy, impacting profitability and long-term sustainability.



#### Chart - 4

In [None]:
# Chart - 4 visualization code
#Geographical Distribution of listings
plt.figure(figsize=(9,4))
sns.scatterplot(
    data=df,
    x='longitude',   # east-west position on the horizontal axis
    y='latitude',    # north-south position on the vertical axis
    hue='neighbourhood_group',
    palette='Set1',
    alpha=0.5
)
plt.title('Geographical Distribution of Listings')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

A **scatter plot** is ideal for visualizing spatial data. This chart effectively shows how Airbnb listings are distributed across different neighbourhood groups based on their latitude and longitude. It allows for a clear view of geographic clustering and regional density patterns, which are not easily captured in bar or pie charts.



##### 2. What is/are the insight(s) found from the chart?

Key insights from the chart:

* Manhattan and Brooklyn have dense clusters of listings, indicating high activity and popularity in these regions.

* Queens also has a broad geographical spread of listings, though slightly less dense.

* Staten Island and the Bronx have fewer and more spread-out listings, showing they are less saturated markets.

* There’s a clear geographic separation between neighbourhood groups, helping to identify the regional spread of inventory.
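The "hotspots" mentioned in the project summary can be located more precisely than by eye. One approach is to split the map into a coarse grid and count the listings in each cell. This sketch uses a 2D histogram (`numpy.histogram2d`) for that purpose. The `densest_cell` name, the grid size and the coordinates are illustrative assumptions.

```python
import numpy as np


def densest_cell(lat, lon, bins=10):
    """Return the grid cell with the most listings and its coordinate ranges."""
    # Count points on a bins x bins latitude/longitude grid
    counts, lat_edges, lon_edges = np.histogram2d(lat, lon, bins=bins)
    # Row/column position of the busiest cell
    i, j = np.unravel_index(np.argmax(counts), counts.shape)
    return {
        "count": int(counts[i, j]),
        "lat_range": (float(lat_edges[i]), float(lat_edges[i + 1])),
        "lon_range": (float(lon_edges[j]), float(lon_edges[j + 1])),
    }


# Hypothetical coordinates: three listings close together plus two outliers
lat = [40.70, 40.701, 40.702, 40.90, 40.60]
lon = [-73.96, -73.961, -73.962, -73.80, -74.10]
hotspot = densest_cell(lat, lon, bins=4)
print(hotspot)
```

On the notebook's data, a call like `densest_cell(df['latitude'], df['longitude'], bins=50)` would give a finer grid. The right number of bins is a judgment call.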

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**<u>Positive Business Impact:</u>**

* Yes, this visualization can guide:

* Strategic placement of new listings in under-served or high-demand areas.

* Targeted marketing and promotions based on regional density.

* Urban planning and zoning analysis to avoid oversaturation or to encourage expansion into underutilized areas (e.g., Staten Island).

**<u>Potential Negative Growth Insight:</u>**

* The overconcentration in Manhattan and Brooklyn could lead to market saturation, driving price competition and possibly lower profit margins.

* Underutilization in the Bronx or Staten Island may mean missed opportunities if those regions have untapped demand.

**Justification:** A skewed distribution may lead to resource imbalance—too much focus on saturated areas and not enough on emerging zones could restrict long-term growth.



#### Chart - 5

In [None]:
# Chart - 5 visualization code
plt.figure(figsize=(5,3))
sns.histplot(df['availability_365'], bins=30, kde=True, color='teal')

plt.title('Availability Distribution of Listings (0–365 days)')
plt.xlabel('Number of Available Days per Year')
plt.ylabel('Number of Listings')
plt.grid(True)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A **histogram** was chosen because it’s the most effective way to show the distribution of a continuous variable—in this case, the number of available days per year for listings. It reveals how listings are spread across different availability ranges and helps identify patterns like peaks or gaps in the dataset. The added KDE (Kernel Density Estimate) curve gives insight into the distribution's shape and density.



##### 2. What is/are the insight(s) found from the chart?

Key insights:

* A large number of listings are available for very few days (0–50 days), indicating many hosts use Airbnb occasionally or seasonally.

* There's also a small spike at 365 days, meaning some listings are available year-round—likely professional or full-time rentals.

* Availability between 100–300 days is much less common, suggesting a gap between occasional and full-time listings.

This distribution is highly skewed, with most listings concentrated at the lower availability range.
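The availability bands discussed above (very low, mid-range and year-round) can be quantified by bucketing `availability_365`. The bucket edges below are an assumption chosen to match those bands, and the values are invented.

```python
import pandas as pd

# Hypothetical availability_365 values
availability = pd.Series([0, 0, 10, 45, 200, 365])
# Buckets are right-inclusive: (-1, 0] holds 0, (0, 50] holds 1-50, and so on
buckets = pd.cut(
    availability,
    bins=[-1, 0, 50, 100, 300, 364, 365],
    labels=["0", "1-50", "51-100", "101-300", "301-364", "365"],
)
# Counts per bucket in bucket order, including empty buckets
counts = buckets.value_counts(sort=False)
print(counts)
```

Keeping 0 and 365 as their own buckets separates listings that are blocked or inactive from those available year-round.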

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**<u>Positive Business Impact:</u>**

* Yes, these insights are valuable for:

* Segmenting the host base (e.g., part-time vs. full-time hosts), allowing for customized outreach, support, and tools.

* Predicting booking volume and availability trends more accurately.

* Encouraging high-performing occasional hosts to increase availability, potentially improving supply.

**<u>Potential Negative Growth Insight:</u>**

* The high number of low-availability listings could limit booking opportunities for guests, especially during peak seasons. This reduces overall platform reliability and satisfaction.

* Listings with very few available days may lead to lower revenue and underutilized potential, which could affect long-term platform growth.

**Justification:** If most listings are not available consistently, it undermines Airbnb's ability to meet demand—especially in high-traffic periods—leading to lost revenue and potentially dissatisfied users.



#### Chart - 6

In [None]:
# Chart - 6 visualization code
# Reviews per month over time
# last_review was parsed to datetime and never-reviewed listings were dropped during
# data wrangling, so each listing can be bucketed directly by its last review month
df['month_year'] = df['last_review'].dt.to_period('M')

# Count listings per month and convert the Period index to timestamps for plotting
monthly_reviews = df['month_year'].value_counts().sort_index()
monthly_reviews.index = monthly_reviews.index.to_timestamp()

plt.figure(figsize=(9,4))
sns.lineplot(x=monthly_reviews.index, y=monthly_reviews.values, marker='o', color='purple')

plt.title('Reviews Per Month Over Time (Based on Last Review Dates)')
plt.xlabel('Year')
plt.ylabel('Number of Listings (by Last Review Month)')
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A **line plot** with a time series is the best choice for visualizing trends over time. This chart allows for the detection of growth patterns, fluctuations, and sudden spikes or drops in review activity. Tracking reviews per month is a strong proxy for user engagement and platform adoption over the years.



##### 2. What is/are the insight(s) found from the chart?

Key insights:

* There is a steady increase in review volume from 2011 to around 2017.

* From 2018 onward, there is a sharp spike in monthly reviews, indicating a surge in platform usage.

* The peak occurs around early/mid-2019, with more than 12,000 reviews in a single month.

* A sudden drop right after the peak might indicate data truncation or a temporary event (like a system cutoff or external disruption).

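Overall, the chart suggests rapid growth and market penetration in recent years. However, each listing is counted only once, at the month of its most recent review, so listings that are still active naturally pile up in the latest months. Part of the upward trend may therefore reflect recency rather than growth alone.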
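`value_counts` only returns months that actually appear in the data. A month with no last-review dates would therefore be skipped, and the line would jump straight across the gap. The sketch below fills such gaps with zeros using `pd.period_range`, assuming monthly granularity. The dates are invented.

```python
import pandas as pd

# Hypothetical last_review dates with no listings in Feb or Apr 2019
last_review = pd.to_datetime(pd.Series(["2019-01-05", "2019-01-20", "2019-03-02", "2019-05-30"]))
month_year = last_review.dt.to_period("M")
counts = month_year.value_counts().sort_index()
# Reindex over every month in the range so empty months show up as 0
full_range = pd.period_range(counts.index.min(), counts.index.max(), freq="M")
counts = counts.reindex(full_range, fill_value=0)
# Convert to timestamps for plotting, as in the chart code
counts.index = counts.index.to_timestamp()
print(counts)
```

With the gaps filled, a flat stretch at zero on the plot is a real absence of last-review dates rather than an interpolated line.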

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**<u>Positive Business Impact:</u>**

* Yes, the insights highlight:

* Strong user engagement growth, which is critical for investor confidence and strategic scaling.

* The ability to forecast demand based on historical growth trends.

* A signal to invest in infrastructure, marketing, and listing support to handle growing volume.

**<u>Potential Negative Growth Insight:</u>**

* The sharp decline after the 2019 peak raises questions. If not due to data collection limitations, it could signal an external event (like a policy change or market issue) that hurt user engagement.

* Without proper context or intervention, this drop may affect forecasting and trust in consistent growth.

**Justification:** Rapid spikes followed by declines can indicate unsustainable growth or anomaly-driven peaks, which, if unaddressed, may result in volatility and instability.



#### Chart - 7

In [None]:
# Chart - 7 visualization code
# Average reviews_per_month for each listing name
top_hotels = df.groupby('name')['reviews_per_month'].mean()
top10 = top_hotels.sort_values(ascending=False).head(10)

# Sort ascending so the highest value is drawn at the top of the plot
top10 = top10.sort_values()

# Plot
plt.figure(figsize=(10, 6))

# Plot the sticks
plt.hlines(y=top10.index, xmin=0, xmax=top10.values, color='skyblue')

# Plot the circles (lollipops)
plt.plot(top10.values, top10.index, 'o', color='teal')

# Labels and titles
plt.title('Top 10 Hotel Names by Average Reviews per Month (Lollipop Plot)')
plt.xlabel('Average Reviews per Month')
plt.ylabel('Hotel Name')
plt.grid(True, axis='x', linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A **lollipop plot** was chosen because it is a cleaner alternative to a standard bar chart for ranking discrete items, in this case listing names, by their average reviews per month. The combination of horizontal lines and end points makes performance differences among the top listings easy to read while keeping the chart uncluttered.

##### 2. What is/are the insight(s) found from the chart?

Key insights:

* The listing titled "Enjoy great views of the City in our Deluxe Room!" stands out significantly, with nearly 60 reviews per month, far exceeding the rest.

* Other top listings cluster between 15 to 30 reviews per month, indicating strong but more typical performance.

* Judging by their titles, most top-performing listings are near JFK Airport, suggesting that proximity to transportation hubs may boost engagement.

* Descriptive and benefit-driven titles (e.g., mentioning views, Times Square, or “NO CLEANING FEE”) may correlate with higher review rates.
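Because the chart groups by the `name` column, listings that share a title are averaged together. A listing with very few reviews can also rank highly from a short burst of activity. The minimal sketch below shows one alternative, assuming `id` uniquely identifies each listing (the Chart 8 code uses it that way). The data and the `MIN_REVIEWS` threshold are hypothetical.

```python
import pandas as pd

# Hypothetical listings; ids 1 and 2 are different listings with the same title
listings = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Cozy Room", "Cozy Room", "Deluxe View", "Studio"],
    "number_of_reviews": [2, 40, 120, 5],
    "reviews_per_month": [2.0, 4.0, 8.0, 0.5],
})

# Grouping by title blends the two "Cozy Room" listings into one value
by_name = listings.groupby("name")["reviews_per_month"].mean()

# One row per listing id, keeping only listings with enough review history
MIN_REVIEWS = 10  # hypothetical, illustrative threshold
top_listings = (
    listings[listings["number_of_reviews"] >= MIN_REVIEWS]
    .set_index("id")
    .sort_values("reviews_per_month", ascending=False)
    .head(10)
)

print(by_name)
print(top_listings[["name", "reviews_per_month"]])
```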

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**<u>Positive Business Impact:</u>**

Yes, these insights can be leveraged to:

* Analyze what makes high-performing listings successful, such as naming strategies, location, and amenities.

* Guide new hosts to improve listing titles and services for better engagement.

* Feature top listings in promotions or use them as benchmarks for onboarding and training.

**<u>Potential Negative Growth Insight:</u>**

* Over-reliance on a few high-performing listings may limit growth opportunities for other hosts or give those listings disproportionate visibility on the platform.

* A lack of diversity among high performers (e.g., mostly airport-related) might indicate a need to diversify high-engagement zones.

**Justification:** Overconcentration in specific themes or areas might lead to neglect of other potentially profitable regions or experiences, stalling broader platform growth.



#### Chart - 8

In [None]:
# Chart - 8 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Step 1: Group by host_name (a display name, so distinct hosts sharing a name are merged)
host_stats = df.groupby('host_name').agg(
    num_listings=('id', 'count'),
    total_reviews=('number_of_reviews', 'sum')
).reset_index()

# Step 2: Calculate reviews per listing
host_stats['reviews_per_listing'] = host_stats['total_reviews'] / host_stats['num_listings']

# Step 3: Filter to hosts with multiple listings (e.g., 5 or more)
high_listing_hosts = host_stats[host_stats['num_listings'] >= 5]

# Step 4: Sort by lowest reviews per listing
low_review_hosts = high_listing_hosts.sort_values(by='reviews_per_listing').head(10)

# Step 5: Plot
plt.figure(figsize=(10, 5))
bar_width = 0.4
x = range(len(low_review_hosts))

# Bar chart: listings and total reviews
plt.bar(x, low_review_hosts['num_listings'], width=bar_width, label='Number of Listings', color='blue')
plt.bar([i + bar_width for i in x], low_review_hosts['total_reviews'], width=bar_width, label='Total Reviews', color='orange')

# Labels
plt.xticks([i + bar_width / 2 for i in x], low_review_hosts['host_name'], rotation=45, ha='right')
plt.xlabel('Host Name')
plt.ylabel('Count')
plt.title('Top 10 Hosts with High Listings but Low Reviews per Listing')
plt.legend()
plt.tight_layout()
plt.show()



##### 1. Why did you pick the specific chart?

A **grouped bar** chart is well suited to comparing two related metrics, number of listings and total reviews, across multiple hosts. This format highlights gaps between quantity and engagement, helping to identify hosts who are scaling listings without receiving proportionate guest feedback.



##### 2. What is/are the insight(s) found from the chart?

Key insights:

* Hosts like Kenny, Sonder, and Blueground have many listings, but low total reviews in comparison, leading to a very low average number of reviews per listing.

* These hosts likely operate at scale, but engagement per property is poor—suggesting potential quality, visibility, or user satisfaction issues.

* The imbalance indicates these hosts may be struggling to generate traction or maintain guest experience despite having a high presence.
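`host_name` is a display name, so the Chart 8 grouping merges hosts who share a first name into one row. This can inflate `num_listings` and distort `reviews_per_listing`. The minimal sketch below uses hypothetical data and assumes the `host_id` column (listed among the columns excluded from the correlation heatmap) is available.

```python
import pandas as pd

# Hypothetical data: two different hosts (host_id 100 and 200) both named "Kenny"
listings = pd.DataFrame({
    "id": [10, 11, 12, 13, 14],
    "host_id": [100, 100, 200, 200, 200],
    "host_name": ["Kenny"] * 5,
    "number_of_reviews": [0, 2, 30, 10, 20],
})

agg_spec = dict(num_listings=("id", "count"), total_reviews=("number_of_reviews", "sum"))

# Grouping by display name merges both hosts into a single row
by_name = listings.groupby("host_name").agg(**agg_spec)

# Grouping by host_id keeps hosts separate; host_name is kept for labeling
by_host = listings.groupby(["host_id", "host_name"]).agg(**agg_spec).reset_index()
by_host["reviews_per_listing"] = by_host["total_reviews"] / by_host["num_listings"]

print(by_name)
print(by_host)
```

Here the merged row reports 12.4 reviews per listing (62 / 5). That figure hides the fact that one of the two hosts averages only 1.0.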



##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**<u>Positive Business Impact:</u>**

Yes, these insights are valuable because they:

* Help identify underperforming high-volume hosts, allowing for intervention (e.g., quality reviews, customer support, or listing optimization).

* Support the improvement of guest satisfaction by focusing on hosts whose properties may be abundant but lack engagement.

* Allow the platform to set benchmarks or offer personalized host guidance to improve overall performance.

**<u>Potential Negative Growth Insight:</u>**

* These high-volume but low-engagement hosts could dilute platform quality, resulting in poor guest experiences, lower repeat rates, and negative reviews.

* Over-prioritizing listing quantity over guest experience can hurt brand reputation and trust, especially if these hosts dominate search results.

**Justification:** Hosts who prioritize scale without maintaining listing quality or guest interaction may create a false sense of supply and drag down platform-wide metrics such as review scores or occupancy rates.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
import matplotlib.pyplot as plt

# Count the number of listings in each review rate category
review_counts = df['review_rate_indicator'].value_counts().reindex(
    ['Highly Active', 'Moderately Active', 'Low Activity', 'Inactive'], fill_value=0
)

# Plotting
plt.figure(figsize=(8, 5))
review_counts.plot(kind='bar', color='teal', edgecolor='black')

# Labels
plt.title('Review Rate Indicator Distribution')
plt.xlabel('Review Activity Level')
plt.ylabel('Number of Listings')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A **bar chart** is a natural fit for categorical data such as review activity levels. It compares the number of listings in each activity category (Highly Active, Moderately Active, Low Activity, and Inactive) and gives a simple view of how engagement is distributed across listings.



##### 2. What is/are the insight(s) found from the chart?

Key insights:

* A majority of listings are categorized under Low Activity, indicating they receive few or infrequent reviews.

* Only a small portion of listings are Highly Active, showing a much smaller group of high-performing or frequently reviewed listings.

* The Inactive category has almost no presence, which is positive—suggesting most listings are at least occasionally engaged with.

* The Moderately Active listings are fewer than Low Activity but still represent a decent mid-tier group.

This suggests an imbalance: lots of supply but relatively low user engagement per listing.
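To quantify a claim like "a majority of listings," the category shares can be computed directly instead of read off the bar heights. Here is a minimal sketch, with hypothetical labels standing in for `df['review_rate_indicator']`:

```python
import pandas as pd

# Hypothetical activity labels standing in for df['review_rate_indicator']
indicator = pd.Series(
    ["Low Activity"] * 6 + ["Moderately Active"] * 3 + ["Highly Active"] * 1
)

order = ["Highly Active", "Moderately Active", "Low Activity", "Inactive"]

# Share of listings per category (0 to 1); empty categories are shown as 0
shares = indicator.value_counts(normalize=True).reindex(order, fill_value=0)

# "Majority" means strictly more than half of all listings
low_activity_is_majority = shares["Low Activity"] > 0.5

print(shares)
print("Low Activity is a majority:", low_activity_is_majority)
```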



##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**<u>Positive Business Impact:</u>**

Yes, these insights can help in the following ways:

* Targeted host support: Hosts in the Low Activity segment can be educated or incentivized to improve their listings (e.g., photos, pricing, communication).

* Platform optimization: Helps Airbnb identify which listings are performing well vs. those needing visibility or quality improvements.

* Marketing focus: Promotions can be designed to lift underperforming listings into the "Moderately Active" or "Highly Active" categories.

**<u>Potential Negative Growth Insight:</u>**

* A large number of low-activity listings might indicate poor guest experience, listing quality, or discoverability issues.

* If unaddressed, this could lead to guest dissatisfaction, fewer bookings, and ultimately churn—especially if users frequently encounter inactive or low-value listings.

**Justification:** While supply is high, low engagement suggests inefficient utilization, reducing platform value. Improving listing activity is critical to maintaining healthy marketplace dynamics.



#### Chart - 10

In [None]:
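# Chart - 10 visualization code (listings with minimum_nights < 60 only)
import seaborn as sns
import matplotlib.pyplot as plt
sns.scatterplot(data=df[df['minimum_nights'] < 60], x='minimum_nights', y='price', alpha=0.5)
plt.title('Minimum Nights vs Price')
plt.show()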

##### 1. Why did you pick the specific chart?

A **scatter plot** is ideal for identifying relationships between two continuous numerical variables—in this case, minimum nights and price. This visualization makes it easy to spot outliers, trends, and concentrations in booking behavior and pricing strategies. It was chosen to understand if pricing is influenced by required minimum stay durations.



##### 2. What is/are the insight(s) found from the chart?

Key insights:

* Most listings cluster between 1–10 minimum nights and are priced under $1,000.

* There are noticeable outliers where listings with low minimum nights have extremely high prices, possibly indicating premium/luxury or incorrectly entered data.

* Listings with minimum nights > 30 tend to have lower prices, suggesting discounted long-term stays or compliance with rental laws (e.g., 30-day minimums in some cities).

* The lack of a strong linear trend suggests that price is not directly proportional to minimum nights; prices are likely influenced by other factors such as location, amenities, or listing type.
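A weak Pearson correlation does not rule out a monotonic relationship, and a few extreme prices can dominate it. Spearman's rank correlation, computed here as the Pearson correlation of the ranks, is less sensitive to such outliers. Here is a minimal sketch on a hypothetical sample:

```python
import pandas as pd

# Hypothetical listings, including one extreme $10,000 price
sample = pd.DataFrame({
    "minimum_nights": [1, 1, 2, 3, 5, 30, 30],
    "price": [80, 10000, 120, 90, 150, 60, 70],
})

# Pearson: linear association on raw values (sensitive to the outlier)
pearson = sample["minimum_nights"].corr(sample["price"])

# Spearman: Pearson correlation of ranks (ties get their average rank)
ranks = sample.rank()
spearman = ranks["minimum_nights"].corr(ranks["price"])

print(f"Pearson:  {pearson:.2f}")
print(f"Spearman: {spearman:.2f}")
```

On this sample, the rank-based coefficient is noticeably more negative than Pearson's because the single $10,000 listing weakens the linear fit.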

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**<u>Positive Business Impact:</u>**

Yes, these insights are actionable:

* Helps detect data quality issues like abnormal pricing (e.g., $10,000+ for 1 night), enabling platform moderation or automatic outlier filtering.

* Encourages dynamic pricing strategies based on length of stay, enabling better yield management.

* Useful for policy compliance (e.g., monthly rentals) and for identifying market segments (e.g., short-term vs. long-term stays).
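For the automatic outlier filtering mentioned above, one common option is Tukey's rule, though not necessarily what a platform would use. The rule flags values more than 1.5 × IQR beyond the first or third quartile. Here is a minimal sketch on hypothetical prices:

```python
import pandas as pd

# Hypothetical nightly prices with one extreme value
prices = pd.Series([60, 75, 90, 110, 120, 150, 200, 10000], name="price")

# First and third quartiles (pandas uses linear interpolation by default)
q1, q3 = prices.quantile([0.25, 0.75])
iqr = q3 - q1

# Tukey fences: anything outside [lower, upper] is flagged
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
is_outlier = (prices < lower) | (prices > upper)

print(f"Fences: [{lower:.2f}, {upper:.2f}]")
print(prices[is_outlier])
```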

**<u>Potential Negative Growth Insight:</u>**

* Overpricing short-stay listings could deter bookings and harm customer satisfaction.

* A large number of listings with high minimum nights but low engagement could suggest underutilized inventory, leading to revenue loss.

**Justification:** Understanding and optimizing the balance between minimum stay requirements and pricing is crucial to drive higher occupancy, better conversion, and guest satisfaction.



#### Chart - 11

In [None]:
# Chart - 11 visualization code
import pandas as pd
import matplotlib.pyplot as plt

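# Step 1: Calculate price per night as defined in this project (price / minimum_nights)
# Note: if 'price' is already a nightly rate, this ratio is not the actual nightly cost
df['price_per_night'] = df['price'] / df['minimum_nights']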

# Step 2: Bin price per night
bins = [0, 50, 100, 200, 500]
labels = ['0-50', '51-100', '101-200', '201-500']
df['price_range'] = pd.cut(df['price_per_night'], bins=bins, labels=labels)

# Step 3: Filter out outliers
filtered_df = df[df['price_range'].notnull() & (df['reviews_per_month'] < 15)]

# Step 4: Group and aggregate
grouped = filtered_df.groupby(['price_range', 'room_type'])['reviews_per_month'].mean().unstack(fill_value=0)

# Step 5: Plot stacked bar
grouped.plot(kind='bar', stacked=True, figsize=(10, 6), colormap='Set2', edgecolor='black')

# Styling
plt.title('Avg Reviews per Month by Room Type and Price Range')
plt.xlabel('Price per Night Range')
plt.ylabel('Avg Reviews per Month')
plt.legend(title='Room Type')
plt.grid(True, axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()



##### 1. Why did you pick the specific chart?

This **stacked bar chart** was chosen because it allows comparison across multiple dimensions:

* Room type (Entire home/apt, Private room, Shared room)

* Price range per night (binned into 4 categories)

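It visualizes how engagement (reviews per month) varies with both price and room type in a single plot. Because the segments are stacked averages, a bar's total height is the sum of per-room-type means rather than an overall average, so comparisons are best made segment by segment.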

##### 2. What is/are the insight(s) found from the chart?

Key insights:

* Listings in the 101–200 price range receive the highest average reviews per month, suggesting it's a sweet spot in terms of value and popularity.

* Entire home/apartment listings consistently get more reviews across all price ranges, highlighting user preference for privacy and autonomy.

* Shared rooms have the lowest engagement across price tiers, reinforcing earlier findings of lower popularity.

* The lowest (<50) and highest (>200) price ranges see lower engagement, indicating reduced demand or niche usage.
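Because the chart stacks per-room-type averages, the total height of a bar is a sum of means, not the average for that price range. The minimal sketch below uses hypothetical listings to show that the two measures can rank price ranges differently. It reuses the chart's `bins` and `labels`. Note that `pd.cut` uses right-closed intervals by default, so a price of exactly 50 lands in the `0-50` bin.

```python
import pandas as pd

# Hypothetical listings (not taken from the dataset)
listings = pd.DataFrame({
    "price_per_night": [40, 50, 80, 150, 180, 300],
    "room_type": ["Private room", "Entire home/apt", "Private room",
                  "Entire home/apt", "Shared room", "Entire home/apt"],
    "reviews_per_month": [1.0, 2.0, 3.0, 4.0, 0.5, 1.5],
})

# Same bins as the chart; intervals are (0, 50], (50, 100], (100, 200], (200, 500]
bins = [0, 50, 100, 200, 500]
labels = ["0-50", "51-100", "101-200", "201-500"]
listings["price_range"] = pd.cut(listings["price_per_night"], bins=bins, labels=labels)

# Per-room-type means, as in the stacked chart
grouped = (
    listings.groupby(["price_range", "room_type"], observed=True)["reviews_per_month"]
    .mean()
    .unstack(fill_value=0)
)
stacked_height = grouped.sum(axis=1)  # what the stacked bar height shows

# True average reviews per month for each price range
overall_mean = listings.groupby("price_range", observed=True)["reviews_per_month"].mean()

print(pd.DataFrame({"stacked_height": stacked_height, "overall_mean": overall_mean}))
```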

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**<u>Positive Business Impact:</u>**

Yes, the insights are highly actionable:

* Hosts can optimize pricing by targeting the $101–200 range to maximize exposure and bookings.

* Platforms can prioritize entire home/apt listings in search rankings or marketing for higher user satisfaction and engagement.

* This helps identify high-performing listing types and pricing combinations, improving conversion rates.

**<u>Negative Growth Insight:</u>**

* Shared rooms, even at low price points, show poor engagement, indicating a low ROI for hosts and possibly dissatisfaction from guests.

* Listings priced above $200 may need added value (luxury amenities, unique features) to justify their price, given their lower engagement.

**Justification:**
Not acting on these insights could lead to overpricing, underutilized listings, or promoting low-performing categories—ultimately harming business growth and user trust.



#### Chart - 12

In [None]:
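# Chart - 12 visualization code
import matplotlib.pyplot as plt

# Mean minimum_nights per month_year period (rows without a month_year are dropped)
min_nights_trend = df[df['month_year'].notnull()].groupby('month_year')['minimum_nights'].mean()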

min_nights_trend.plot(kind='line', figsize=(12, 6), color='darkgreen', marker='d')
plt.title('Average Minimum Nights Over Time')
plt.xlabel('Month-Year')
plt.ylabel('Minimum Nights')
plt.grid(True)
plt.tight_layout()
plt.show()




##### 1. Why did you pick the specific chart?

This **line chart** was selected because it suits temporal data. It shows how the average minimum_nights varies across monthly month_year periods, which makes spikes, drops, and unusual patterns easy to spot.



##### 2. What is/are the insight(s) found from the chart?

Key insights:

* There is a major spike in the average minimum nights around a specific time period (exceeding 180 days).
This is likely caused by a small number of outliers with extremely high minimum stay requirements.

* Outside the spike, most of the values fluctuate between 5 and 15 nights, which seems to be the typical trend.

* The trend appears inconsistent early on, but stabilizes in the more recent months.
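To confirm that the spike comes from a few extreme listings, compare the monthly median and listing count with the mean. Here is a minimal sketch on hypothetical data, where the `month_year` values are placeholder strings:

```python
import pandas as pd

# Hypothetical data: one listing with a 1,000-night minimum in 2019-01
trend_data = pd.DataFrame({
    "month_year": ["2019-01"] * 4 + ["2019-02"] * 4,
    "minimum_nights": [2, 3, 5, 1000, 2, 4, 3, 5],
})

# Mean is pulled up by the outlier; the median reflects the typical listing
by_month = trend_data.groupby("month_year")["minimum_nights"].agg(["mean", "median", "count"])

print(by_month)
```

A month whose mean sits far above its median, as `2019-01` does here, points to a small number of outliers rather than a shift in typical minimum stays.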

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**<u>Positive Business Impact:</u>**

* Yes. The insights help identify outliers or data anomalies, which may skew analytics or pricing models.

* Platforms can warn or advise hosts against unusually long minimum stay requirements that may reduce booking rates.

* It provides historical evidence to support policy tuning, such as recommending optimal minimum night ranges to maximize engagement.

**<u>Negative Growth Risk:</u>**

* Listings with unusually high minimum_nights (e.g., >100 days) are probably booked rarely and are less competitive, which wastes listing space.

* If not addressed, it may mislead market analysis or hurt guest experience, especially if such listings appear in search results.

**Justification:**
Monitoring and reacting to this metric over time can improve listing quality, host behavior, and platform trust—thus directly impacting conversion and retention positively.



#### Chart - 13

In [None]:
# Chart - 13 visualization code
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
sns.boxplot(
    data=df,
    x='review_rate_indicator',
    y='availability_365',
    palette='Set2'
)

plt.title('Box Plot of Availability by Review Rate Indicator')
plt.xlabel('Review Activity Level')
plt.ylabel('Availability (Days per Year)')
plt.grid(True, axis='y', linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()



##### 1. Why did you pick the specific chart?

A **box plot** is ideal for comparing distribution characteristics (like median, IQR, and outliers) across categories—in this case, availability across different review activity levels (Low Activity, Highly Active, Moderately Active).

It highlights not just average values, but also variability, spread, and presence of extreme values, which a bar chart or mean-based plot would miss.



##### 2. What is/are the insight(s) found from the chart?

Key observations:

* Highly Active listings generally have higher availability and tighter spread (Q1 to Q3), suggesting consistent hosting.

* Low Activity listings show the widest range and highest variation, including many with nearly zero availability.

* Moderately Active listings fall somewhere in between, with a slightly wider spread than Highly Active ones.

* Outliers are present in all three categories—many listings with availability over 300 days.
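The spread described above can be checked numerically by computing quartiles and the interquartile range (IQR) for each category. Here is a minimal sketch with hypothetical availability values:

```python
import pandas as pd

# Hypothetical availability values for two activity categories
avail = pd.DataFrame({
    "review_rate_indicator": ["Highly Active"] * 4 + ["Low Activity"] * 4,
    "availability_365": [200, 220, 240, 260, 0, 5, 180, 365],
})

# Q1, median, and Q3 per category; unstack turns the quantiles into columns
quartiles = (
    avail.groupby("review_rate_indicator")["availability_365"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)

# IQR (Q3 - Q1) is the height of each box in the box plot
quartiles["iqr"] = quartiles[0.75] - quartiles[0.25]

print(quartiles)
```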

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**<u>Positive Business Impact:</u>**

* Yes. The chart helps platforms identify host commitment levels, i.e., listings that are consistently available and receive reviews.

* Useful for targeting or incentivizing moderately active or inconsistent hosts to improve availability and engagement.

* Indicates potential for listing quality improvement by encouraging more consistent availability for those with high review potential.

**<u>Risk of Negative Impact:</u>**

Yes, if ignored:

* Listings with low activity and high availability may not be generating bookings—indicating either poor visibility or guest experience issues.

* Over-reliance on highly active listings could reduce diversity in options and create demand concentration, hurting long-tail hosts.

**Justification:**
This plot enables strategic platform decisions—such as better listing recommendations, review prompting, or availability-based quality rankings.



#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
import seaborn as sns
import matplotlib.pyplot as plt

# Exclude identifier, free-text, categorical, and date-derived columns
exclude_cols = [
    'id', 'name', 'host_id', 'host_name',
    'neighbourhood_group', 'neighbourhood',
    'room_type', 'last_review', 'review_month',
    'month_year', 'review_rate_indicator'
]

# Keep the remaining numeric columns
numeric_cols = df.drop(columns=exclude_cols, errors='ignore').select_dtypes(include='number')

# Compute correlation
correlation_matrix = numeric_cols.corr()

# Plot heatmap
plt.figure(figsize=(10, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap (Numeric Features Only)')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

The **correlation heatmap** is essential for exploring how numeric variables are related to each other. It helps:

* Identify multicollinearity (important for modeling).

* Reveal strong relationships worth deeper analysis.

* Visualize pairwise trends in a compact, intuitive format.



##### 2. What is/are the insight(s) found from the chart?

Key insights from the heatmap:

1. Moderate-to-strong positive correlation between:

* price and price_per_night_ratio (0.66)

* price and price_per_night (0.66); since price_per_night is computed from price, part of this correlation is by construction.

* reviews_per_month and number_of_reviews (0.55) – expected, since both measure review activity.

2. Slight negative correlation between:

* minimum_nights and reviews_per_month (–0.15): Listings with longer minimum stays tend to get fewer monthly reviews.

3. Weak or negligible correlation between geographical coordinates and other numeric features.

4. availability_365 shows only slight positive relationships with other features like price and reviews.
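Part of the correlation between price and price_per_night is mechanical, because price_per_night is computed from price. The minimal simulation below uses hypothetical lognormal prices and a minimum_nights column drawn independently of price. Even so, the derived feature still correlates clearly with price:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Hypothetical data: minimum_nights is independent of price by construction
feature_df = pd.DataFrame({
    "price": rng.lognormal(mean=4.8, sigma=0.6, size=n),
    "minimum_nights": rng.integers(1, 8, size=n),  # values 1..7
})

# Derived feature, defined the same way as in Chart 11
feature_df["price_per_night"] = feature_df["price"] / feature_df["minimum_nights"]

corr = feature_df.corr()
print(corr.round(2))
```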



#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
# Import necessary libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Assumes the cleaned DataFrame is already loaded as 'df'

# Optional: Set a clean style
sns.set(style='whitegrid', palette='muted')

# Select the relevant numerical columns for multivariate analysis
selected_cols = ['price', 'number_of_reviews', 'availability_365', 'reviews_per_month']

# Create the pairplot
sns.pairplot(df[selected_cols])

# Add a title to the plot
plt.suptitle('Multivariate Analysis: Price vs Reviews and Availability', y=1.02)

# Display the plot
plt.show()



##### 1. Why did you pick the specific chart?

The **pair plot** was selected to:

* Visualize relationships between multiple numeric variables (price, number_of_reviews, availability_365, and reviews_per_month) simultaneously.

* Quickly identify patterns, clusters, and outliers.

* Explore potential linear or nonlinear correlations between variables.

* Spot skewed distributions and any potential anomalies before modeling.

* It's ideal for getting an overall sense of the data’s structure across multiple dimensions in one compact view.

##### 2. What is/are the insight(s) found from the chart?

Key insights observed:

1. Price has a right-skewed distribution with several extreme outliers.

2. There is no strong linear relationship between price and:

* number_of_reviews

* availability_365

* reviews_per_month

3. number_of_reviews vs reviews_per_month shows a moderate positive trend—listings with more reviews tend to receive them more frequently.

4. Availability (availability_365) is spread across the full 0–365 range without a strong visible relation to the number of reviews or pricing.

5. Dense vertical lines (e.g., at 365 availability) likely reflect common default values or cutoff thresholds in the data.
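A log transform is a common way to make a right-skewed variable like price easier to read in a pair plot. Here is a minimal sketch on hypothetical prices that compares skewness before and after `np.log1p`:

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed prices with two large values
prices = pd.Series([50, 60, 75, 80, 95, 110, 150, 200, 1000, 10000], name="price")

# Sample skewness of the raw values
raw_skew = prices.skew()

# log1p(x) = log(1 + x), safe for zero prices
log_prices = np.log1p(prices)
log_skew = log_prices.skew()

print(f"Skew before: {raw_skew:.2f}, after log1p: {log_skew:.2f}")
```

The transformed column (for example, a hypothetical `log_price`) could then be passed to `sns.pairplot` in place of `price`.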



## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

**✅ <u>Business Objective Recommendation:</u>**

* **Boost Review Activity:** Listings with more reviews are more trusted and get more bookings. Encourage guests to leave reviews after their stay.

* **Optimize Availability:** Many listings are either rarely or always available. Focus on making listings available during high-demand periods to maximize revenue.

* **Smart Pricing + Minimum Nights:** Listings priced moderately with 1–3 night minimums tend to perform better. Avoid overly high prices or long minimum stays.

* **Promote High-Performing Listings:** Highlight "Highly Active" listings (frequent reviews) in marketing campaigns, and prioritize listings with strong reviews, consistent availability, and competitive pricing in search results.

* **Monitor Multi-Listing Hosts:** Some hosts have many listings but low engagement. Ensure that quality and guest experience are not compromised for quantity.

* **Use Data Trends for Forecasting:** Time-based trends in reviews and minimum nights can help predict demand surges or market changes. Use these trends to plan promotions, onboarding waves, or platform updates.

* **Build with Data-Driven Strategy:** Use correlation and pair plots to guide modeling, recommendations, and platform changes. Focus on variables most closely tied to engagement (reviews, availability, room type) rather than noise (e.g., minimum nights alone).



# **Conclusion**




This project provided a comprehensive analysis of Airbnb listings using visual exploration of key metrics such as availability, reviews, pricing, room types, host behavior, and listing performance. By leveraging 15 unique charts, we identified critical patterns and insights that directly impact business outcomes.

Key takeaways include:

* **Listings with frequent reviews and moderate pricing perform best.**
* **High availability does not guarantee high engagement**—strategic calendar management is essential.
* **Entire homes** are preferred over shared spaces, especially in mid-range price segments.
* Some hosts operate at scale but **underperform in guest engagement**, signaling a need for quality control.
* **Review activity and availability** are key signals of booking potential and guest trust.

By applying these insights, Airbnb can optimize platform performance by:

* Supporting high-quality hosts,
* Promoting active and well-reviewed listings,
* Implementing smarter pricing and availability tools,
* And improving overall guest satisfaction.

Ultimately, data-driven decisions will empower Airbnb to enhance user experience, increase bookings, and drive sustainable revenue growth.

