# **Project Name - EDA Project on AirBNB Booking Analysis**


##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Name -** Ratul Dutta

# **Project Summary -**

Since 2008, guests and hosts have used Airbnb to expand travel possibilities and offer a more unique, personalized way of experiencing the world. Today, Airbnb is a one-of-a-kind service that is used and recognized around the world. Data analysis on the millions of listings provided through Airbnb is a crucial factor for the company. These listings generate a lot of data that can be analyzed and used for security, business decisions, understanding customers' and providers' (hosts') behavior and performance on the platform, guiding marketing initiatives, implementing innovative additional services, and much more.

This dataset has around 49,000 observations and 16 columns, with a mix of categorical and numeric values.

# **GitHub Link -**

# **Problem Statement**


Let's explore and analyze the dataset and find some insights (a few questions are listed below):
1. What can we learn about different hosts and areas?
2. What can we learn from room types and their prices across areas?
3. What can we learn from the data (e.g., locations, prices, reviews)?
4. Which hosts are the busiest, and why?
5. Which hosts charge higher prices?
6. Is there any traffic difference among different areas, and what could be the reason for it?
7. What is the correlation between different variables?

#### **Define Your Business Objective?**

The primary business objective of this project is to optimize the client's Airbnb rental business in New York City by maximizing occupancy rates, profitability, and guest satisfaction. By implementing data-driven strategies and insights, the aim is to achieve higher occupancy rates, increase revenue, and ensure exceptional guest experiences, ultimately driving the success and growth of the client's business in the competitive NYC Airbnb market.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
import itertools
import calendar
from plotnine import *
import missingno as mno

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')
dataset = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/AirBNB Booking Analysis/Airbnb NYC 2019.csv')
df=dataset.copy()

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
print("Dataset has",df.shape[0],"Rows &",df.shape[1],"Columns")

### Dataset Information

In [None]:
# Dataset Info
df.info(verbose=True)

In [None]:
print(type(df['last_review'][0]))

In [None]:
from datetime import datetime
df['last_review']= pd.to_datetime(df['last_review'], format = '%Y-%m-%d')
# df['last_review'] = df['last_review'].apply(lambda x: datetime.strptime(x,"%Y-%m-%d"))
#merged_df['Date1'] = merged_df['Date'].apply(lambda x: datetime.strptime(x,"%Y-%m-%d"))

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
print("This dataset has",len(df[df.duplicated()]),"Duplicate values")

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum().sum()

In [None]:
# Visualizing the missing values
#create the missing value matrix
mno.matrix(df, figsize=(18,6),sparkline=False,color=(0.27,0.52,1.0))
plt.title('Missing values in dataset')
# Show the graph
plt.show()

### What did you know about your dataset?

1. Total Null values/missing values - 20141.
2. Dataset has 48895 rows and 16 columns.
3. No duplicate Values found in Dataset.
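
The total missing-value count above can also be broken down per column. The snippet below is a minimal sketch on a small made-up frame (`toy` and its values are hypothetical stand-ins, not the Airbnb data); the same calls should work on `df`.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Airbnb data (values are made up for illustration).
toy = pd.DataFrame({
    "name": ["Cozy loft", None, "Sunny room", "Studio"],
    "host_name": ["Ana", "Ben", None, "Dee"],
    "reviews_per_month": [0.5, np.nan, np.nan, 2.0],
    "price": [120, 80, 60, 200],
})

# Count and percentage of missing values per column, largest first.
missing_count = toy.isnull().sum()
missing_pct = (toy.isnull().mean() * 100).round(1)
missing_summary = (
    pd.DataFrame({"missing": missing_count, "percent": missing_pct})
    .sort_values("missing", ascending=False)
)
print(missing_summary)
```

A per-column view like this makes it easier to decide which columns to fill and which to drop.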

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe().T

### Variables Description

Answer Here

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()

## ***3. Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
df.isnull().sum()

In [None]:
# Remove the rows where price is 0 from the dataset
df = df[df['price'] != 0].copy()

In [None]:
# Fill missing text fields with a placeholder and missing review rates with 0
df['name'] = df['name'].fillna('empty')
df['host_name'] = df['host_name'].fillna('empty')
df['reviews_per_month'] = df['reviews_per_month'].fillna(0)

In [None]:
# The last_review column is not needed for this analysis, so drop it
df = df.drop(columns='last_review')

In [None]:
# For checking any null value is present or not
df.isnull().sum()

### What all manipulations have you done and insights you found?

1.   First, we removed the listings with price = 0, which makes no sense for a bookable listing.
2.   After that, we replaced the null values in the 'name', 'host_name' and 'reviews_per_month' columns and dropped the 'last_review' column.
3.   At this stage we rechecked the null values: all were resolved, leaving a final dataframe shape of (48884, 15).
4.   We found the mean and median of all values for each neighbourhood group.
5.   We can see that the mean and median of the price column usually tend to be in the same range.
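
The mean and median comparison per neighbourhood group mentioned above can be sketched as follows. This is an illustrative example on made-up numbers (`toy` is hypothetical); on the real data the same `groupby(...).agg(...)` call would be applied to `df`.

```python
import pandas as pd

# Toy listings (made-up prices) with one deliberately extreme value in Manhattan.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn", "Queens"],
    "price": [150, 200, 1000, 90, 110, 70],
})

# Mean, median and listing count of price for each neighbourhood group.
price_stats = (
    toy.groupby("neighbourhood_group")["price"]
    .agg(["mean", "median", "count"])
    .round(2)
)
print(price_stats)
```

A large gap between mean and median within a group, as in the toy Manhattan rows, usually points to a few very expensive listings pulling the mean upward.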

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

### **Univariate Analysis**

#### Chart - 1: Column wise Histogram and Box Plot

In [None]:
# Chart - 1 visualization
for col in df.describe().columns:
  fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(18, 6))
  # Histogram plot
  sns.histplot(df[col], ax=axes[0], kde=True, color='#05204a',edgecolor='white')
  axes[0].set_xlabel(col)
  axes[0].set_ylabel('Frequency')
  axes[0].set_title("Distribution of " + col, fontsize=15)
  axes[0].grid(True, linestyle='--')

  #Boxplot
  sns.boxplot(x=df[col], ax=axes[1], orient='h', showmeans=True, color='#cafe48')
  axes[1].set_xlabel(col)
  axes[1].set_ylabel("")
  axes[1].set_title("Boxplot of " + col, fontsize=15)
  axes[1].grid(True, linestyle='--')

  # Adjust spacing between subplots
  plt.tight_layout()

  #Show the plots
  plt.show()


##### 1. Why did you pick the specific chart?

*   A histplot is a type of chart that displays the distribution of a dataset. It is a graphical representation of the data that shows how often each value or group of values occurs. Histplots are useful for understanding the distribution of a dataset and identifying patterns or trends in the data. They are also useful when dealing with large datasets (more than 100 observations), and can help detect unusual observations (outliers) or gaps in the data.

*   Thus we used the histogram to check whether each variable's distribution over the whole dataset is symmetric or not.

*   A boxplot is used to summarize the key statistical characteristics of a dataset, including the median, quartiles, and range, in a single plot. Boxplots are useful for identifying the presence of outliers in a dataset, comparing the distribution of multiple datasets, and understanding the dispersion of the data. They are often used in statistical analysis and data visualization.

*   Thus, for each numerical variable in the given dataset, I used a box plot to analyse the outliers and the interquartile range, including the median, maximum and minimum values.
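
The outliers a boxplot draws beyond its whiskers follow the 1.5 × IQR rule by default. The sketch below reproduces that rule numerically; `iqr_outlier_mask` is a hypothetical helper name and the prices are made up.

```python
import pandas as pd

def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (series < lower) | (series > upper)

# Made-up prices with one extreme value.
prices = pd.Series([50, 60, 70, 80, 90, 100, 110, 2000], name="price")
mask = iqr_outlier_mask(prices)
print(prices[mask])
```

Applied to a column such as `df['price']`, the mask gives a count of the points that the boxplot only shows visually.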



##### 2. What is/are the insight(s) found from the chart?

From the distribution charts above we can see that not all columns are symmetrically distributed, and that the mean and median fall in a similar range for the numerical columns.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

A histogram and boxplot cannot give us the whole picture of the data. They are used here only to see how each column's values are distributed over the dataset.

#### Chart - 2: Average Price For Each Neighbourhood Group

In [None]:
# Chart - 2 visualization code
# First grouping the dataset by neighbourhood groups
neigh = df.groupby('neighbourhood_group')['price'].mean()

sns.set(style="whitegrid")

#Create a barplot
plt.figure(figsize=(10,6))
colors = ['#272932', '#f05d5e', '#0f7173', '#9fc2cc', '#f1ecce']
ax = sns.barplot(x=neigh.index, y=neigh, palette= colors)

#set the title
plt.title('Neighbourhood Group vs. Average Price', fontsize = 16)
plt.xlabel('Neighbourhood Group', fontsize = 12)
plt.ylabel('Average Price', fontsize= 12)
plt.xticks(rotation = 30)

#Add value labels on top of each bar
for x in ax.patches:
  ax.annotate(format(x.get_height(),'.2f'), (x.get_x() + x.get_width() / 2. , x.get_height()), ha = 'center', va='center',xytext = (0,5), textcoords='offset points',fontsize = 10)

#Remove the top and right spines
sns.despine()

#Display the plot
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A bar plot is an effective visualization technique for comparing the average prices of different neighborhood groups in the Airbnb 2019 NYC
data set. By representing each neighborhood group with a separate bar, the height of each bar corresponds to the average price. This visual
representation allows for easy interpretation and identification of trends. Viewers can quickly compare the heights of the bars to determine
which neighborhood groups tend to have higher or lower average prices. The clear and intuitive nature of a bar plot makes it a useful tool for
understanding and analyzing the average prices across different neighborhood groups in the data set.

##### 2. What is/are the insight(s) found from the chart?

Analyzing the average prices of different neighborhood groups in the Airbnb 2019 NYC data set using a bar plot can provide valuable insights. By visually comparing the heights of the bars, we can identify which neighborhoods have higher or lower average prices. This information can be crucial for both hosts and guests. Hosts can gain insights into which neighborhoods tend to command higher prices and potentially adjust their pricing strategies accordingly. Guests, on the other hand, can use this information to make informed decisions about where to book accommodations based on their budget and preferences. Additionally, the ordered arrangement of the bars can reveal trends, highlighting which neighborhood groups consistently have higher or lower average prices. Overall, this analysis can help stakeholders in the Airbnb market understand the variations in prices across different neighborhood groups, enabling them to make data-driven decisions and optimize their experiences in the NYC market.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from analyzing the average prices of different neighborhood groups in the Airbnb 2019 NYC data set can have a positive
business impact by helping hosts strategically set prices and maximize profits, and enabling guests to find accommodations within their
desired budget. However, there is a potential for negative growth if certain neighborhoods consistently have lower average prices, indicating a
lack of demand or less desirable features. Focusing solely on high-priced neighborhoods may lead to increased competition and saturation,
driving down prices. To mitigate these risks, hosts should consider factors beyond average prices, such as location amenities and uniqueness
of their listings, and maintain competitive pricing strategies across diverse neighborhoods.

#### Chart - 3: Neighbourhood Group That Has Highest Number Of Listings

In [None]:
# Chart - 3 visualization code
df_group = pd.pivot_table(df, index=['neighbourhood_group'], values = 'id', aggfunc=['count'], margins = True, margins_name='Total Count')
df_group

By creating a pivot table of the total number of listings grouped by borough, we learn that out of the 48,884 listings being
analyzed, 21,660 are located in Manhattan and 20,095 in Brooklyn. Staten Island shows up last with only 278 listings.
We can use a pie chart to better visualize how the numbers are distributed in percentages:


In [None]:
df_pie = df.groupby('neighbourhood_group')['id'].count()
plt.figure(figsize=(10, 8))
colors = ['#272932', '#f05d5e','#0f7173','#9fc2cc','#f1ecce']
explode = (0, 0, 0.1, 0, 0) # Explode the third slice (Manhattan, in alphabetical group order) for emphasis
sns.set(style="whitegrid")
sns.set_palette(sns.color_palette(colors))

df_pie.plot.pie(labels=df_pie.index, autopct='%1.0f%%', startangle=0, explode=explode, wedgeprops=dict(width=0.3), shadow=True)
plt.ylabel('Neighbourhood Group', fontsize=12)
plt.title("Total Listings by Neighbourhood Group", fontsize =16)
plt.axis('equal') # Equal aspect ratio ensures a circular pie

plt.show()

##### 1. Why did you pick the specific chart?

A pie plot is a suitable visualization for determining the neighborhood group with the highest number of listings in the Airbnb 2019 NYC dataset. It represents the distribution of listings through proportional slices, where each slice corresponds to a neighborhood group and its size reflects the relative number of listings. By comparing the sizes of the slices, it becomes evident which neighborhood group has the largest slice and thus the highest number of listings. The clear and intuitive nature of the pie plot allows viewers to easily grasp the distribution and identify the most popular neighborhood group for accommodations in NYC.
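
The percentages a pie chart prints on its slices can also be computed directly, which is handy for quoting exact shares. A minimal sketch on made-up rows (`toy` is hypothetical):

```python
import pandas as pd

# Toy listings: 5 in Manhattan, 4 in Brooklyn, 1 in Queens (made-up counts).
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 5 + ["Brooklyn"] * 4 + ["Queens"]
})

# Share of listings per neighbourhood group, in percent, largest first.
share = (
    toy["neighbourhood_group"]
    .value_counts(normalize=True)
    .mul(100)
    .round(1)
)
print(share)
```

`value_counts(normalize=True)` returns fractions that sum to 1, so multiplying by 100 gives the same figures as `autopct` in the pie chart.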

##### 2. What is/are the insight(s) found from the chart?

Analyzing the distribution of listings across neighborhood groups in the Airbnb 2019 NYC dataset using a pie plot can provide valuable insights. By visualizing the proportional representation of each neighborhood group, we can identify the group with the highest number of listings, offering valuable information about the popularity and demand for accommodations in different areas of the city. This insight can be advantageous for hosts looking to optimize their listings by focusing on the most popular neighborhood group. Additionally, guests can benefit from this information by knowing which neighborhood group offers a wider range of choices and potentially better availability. It also allows stakeholders to identify potential opportunities for business expansion or to better understand the dynamics of the NYC Airbnb market.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from analyzing the distribution of listings across neighborhood groups in the Airbnb 2019 NYC dataset using a pie plot can
have a positive business impact by allowing hosts to focus on the most popular neighborhood group and optimize their marketing efforts.
Concentrating resources in the group with the highest number of listings can increase bookings and profitability. However, there is a potential
for negative growth if one neighborhood group dominates, leading to oversupply, intense competition, and potential decreases in prices and
profitability. It may also limit diversity and opportunities for hosts in other neighborhood groups. To mitigate these risks, hosts should consider
factors beyond just the number of listings, maintain a balanced approach, and adapt to market dynamics to ensure sustainable growth.

#### Chart - 4: Relation Between Average Availability and Price

In [None]:
# Chart - 4 visualization code

# Group the data by 'price' and calculate the average availability
avail_group = df.groupby('price')['availability_365'].mean()

# Create a line plot
plt.figure(figsize=(10,6))
sns.lineplot(x=avail_group.index, y=avail_group.values, color='#f05d5e')

plt.title("Average Availability of Listings by Price")
plt.xlabel('Price')
plt.ylabel("Average Availability (in days)")

plt.show()

##### 1. Why did you pick the specific chart?

A line plot shows how the average availability of listings changes as price increases. This information can help hosts and guests make
informed decisions based on the relationship between price and availability in the Airbnb 2019 NYC dataset.

##### 2. What is/are the insight(s) found from the chart?

The line plot shows how average availability changes with price across the Airbnb 2019 NYC dataset. It indicates whether
higher prices are associated with higher or lower availability and which price ranges show consistently high availability. Because the
data is grouped by price only, comparing this pattern across neighbourhood groups would require a separate breakdown by group.
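
Grouping by every distinct price gives a noisy line, because many prices occur only once. One possible alternative, sketched here on made-up numbers, is to bin prices into bands with `pd.cut` before averaging; the band edges and labels below are arbitrary assumptions.

```python
import pandas as pd

# Toy listings with made-up prices and availability.
toy = pd.DataFrame({
    "price": [40, 60, 90, 120, 180, 250, 400, 900],
    "availability_365": [100, 120, 80, 200, 150, 300, 365, 10],
})

# Assign each listing to a price band (right edge inclusive by default).
bins = [0, 100, 200, 500, 1000]
labels = ["0-100", "101-200", "201-500", "501-1000"]
toy["price_band"] = pd.cut(toy["price"], bins=bins, labels=labels)

# Average availability within each band.
avail_by_band = toy.groupby("price_band", observed=True)["availability_365"].mean()
print(avail_by_band)
```

The banded averages can then be drawn as a bar or line plot in place of the per-price series.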

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the line plot can have a positive business impact by optimizing pricing strategies and helping guests make informed choices. However, there is a potential for negative growth if higher prices lead to significantly lower availability or if the optimal price range falls outside the desired range for guests. Competition among hosts may also drive down prices. Balancing pricing strategies, considering unique features, and diversifying offerings can mitigate risks and promote positive growth.

### **Bivariate Analysis**

#### Chart - 1: Average Price Of Airbnb Listings in NYC in 2019

In [None]:
# Chart - 1 visualization code
df1 = df[['neighbourhood_group','room_type','price']]

#Group the data by neighbourhood_group and room_type, and calculate the mean price
df1 = df1.groupby(['neighbourhood_group','room_type'], as_index = False)[['price']].mean()

# Create a figure and set the theme to white
plt.figure(figsize=(12,6))
sns.set_theme(style = 'white')

# Create the palette, one color per room type
my_pal = {'Entire home/apt':'#f05bfe','Private room':'#ead94c','Shared room':'#0f7173'}
ax = sns.barplot(x="neighbourhood_group", y="price", data=df1, hue="room_type",palette=my_pal)

# Add value labels just below the top of each bar
for p in ax.patches:
  ax.annotate(format(p.get_height(),'.2f'),(p.get_x() + p.get_width()/2., p.get_height()), ha='center', va ='center',size =11,xytext=(0,-12), textcoords = 'offset points')

plt.xlabel("Neighbourhood Group")
plt.ylabel("Average Price")
plt.title('Average Price By Room Type')

plt.show()

##### 1. Why did you pick the specific chart?

A bar plot is used in the Airbnb 2019 NYC dataset to visualize the average listing price of each room type within each neighbourhood group. This plot allows for a
clear comparison of prices among the categories, with each room type represented by a separate bar. The height of each bar indicates the
average price for that room type, enabling viewers to identify variations in pricing across different types of accommodations. The bar
plot provides valuable insights into the price dynamics and trends associated with each room type, assisting hosts and guests in making
informed decisions based on their budget and preferences.

##### 2. What is/are the insight(s) found from the chart?

The insights gained from the bar plot depicting the distribution of listing prices across different room types in the Airbnb 2019 NYC dataset can provide valuable information. The plot reveals the price ranges and differences between room types, allowing guests to identify accommodations that align with their budget. It also highlights popular room types and potential outliers, providing an understanding of market preferences and unique pricing patterns. This information assists both hosts and guests in making informed decisions regarding pricing strategies and accommodation choices, ultimately enhancing the overall Airbnb experience in NYC.
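
The numbers behind a grouped bar chart like this can be laid out as a table with `pivot_table`, which makes the borough by room type comparison easy to read off. A small sketch on made-up rows (`toy` is hypothetical):

```python
import pandas as pd

# Toy listings with made-up prices.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn", "Brooklyn"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt",
                  "Entire home/apt", "Private room", "Private room"],
    "price": [250, 110, 150, 170, 70, 90],
})

# Rows: neighbourhood group, columns: room type, cells: mean price.
price_table = toy.pivot_table(
    index="neighbourhood_group",
    columns="room_type",
    values="price",
    aggfunc="mean",
)
print(price_table)
```

Each cell corresponds to one bar in the grouped bar plot.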

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from analyzing the distribution of listing prices across different room types using a bar plot in the Airbnb 2019 NYC dataset
can have a positive business impact by optimizing pricing strategies and meeting guest preferences. However, there is a potential for negative
growth if certain room types are consistently overpriced or if intense price competition arises. Striking a balance between competitiveness and
profitability is crucial for hosts to mitigate these risks and maintain positive business growth.

#### Chart - 2: Top Monthly Reviewed Room Types in Each Neighborhood

In [None]:
# Chart - 2 visualization code

g, vx = plt.subplots(figsize=(10,8))

vx = sns.stripplot(x='room_type', y='reviews_per_month', hue='neighbourhood_group', dodge=True, data = df, palette= ['#272932', '#f05d5e','#0f7173','#9fc2cc','#f1ecce'])

vx.set_title("Most monthly-reviewed room types in each neighborhood")

sns.despine(right=False, bottom=True)

plt.show()

##### 1. Why did you pick the specific chart?

By utilizing a strip plot, analysts can gain insights into the most reviewed room types in each neighborhood per month in the Airbnb 2019 NYC
dataset. This visualization helps hosts understand which room types are in high demand and can guide their marketing and investment
decisions. It also provides valuable information for guests seeking popular accommodations in specific neighborhoods and time periods.

##### 2. What is/are the insight(s) found from the chart?

The insights gained from the strip plot showcasing the most reviewed room types in each neighborhood per month in the Airbnb 2019 NYC dataset provide valuable information. The plot reveals popular room types, neighborhood preferences, and potential outliers. This knowledge helps hosts and guests understand demand patterns, make informed decisions, and tailor their offerings or choices accordingly. The insights from the chart can guide marketing strategies and enhance guest satisfaction in the NYC Airbnb market.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from analyzing the most reviewed room types in each neighborhood per month using a strip plot in the Airbnb 2019 NYC
dataset can help create a positive business impact. By understanding popular room types and tailoring marketing efforts accordingly, hosts can
attract more guests and increase bookings, leading to business growth. However, there is a potential for negative growth if the most reviewed room types have limited availability or if other room types are overlooked. To mitigate these risks, hosts should maintain a diverse range of offerings and adapt to market demands to ensure continued positive business growth.

#### Chart - 3: Distribution of Listing Prices Across Room Types

In [None]:
df['price'].mean()

In [None]:
# Chart - 3 visualization code

df_price=df[(df['price']<400)]

plt.figure(figsize=(10,6))
sns.set_theme(style="whitegrid")

colors=["#f05d5e","#f1ecce","#9fc2cc"]
sns.violinplot(x="room_type",y="price",data=df_price,palette=colors)

plt.xlabel('Room Type',fontsize=12)
plt.ylabel('Price', fontsize=12)
plt.title('Distribution of Price by Room Type', fontsize = 14)

plt.xticks(fontsize=11)
plt.yticks(fontsize=11)

sns.despine(right=False)

plt.tight_layout()
plt.show()

In [None]:
# Group the dataset by room type and calculate the average price
room_avg_price = df.groupby('room_type')['price'].mean().reset_index()

sns.set(style = 'whitegrid')
pal=['#f05d5e','#f1ecce','#9fc2cc']

plt.figure(figsize=(10,5))
bx = sns.barplot(x='room_type', y='price',data = room_avg_price, palette = pal)

# Add Labels to the bar
for x in bx.patches:
  bx.annotate(format(x.get_height(),'.2f'),(x.get_x() + x.get_width()/2., x.get_height()), ha='center', va ='center',size =11,xytext=(0,-12), textcoords = 'offset points')

plt.title("Average Room Price By Type", fontsize=16)
plt.xlabel("Room Type", fontsize = 12)
plt.ylabel('Average Price', fontsize = 12)

#Remove the top and right Spine
sns.despine()

plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

By utilizing a violin plot, analysts can gain insights into how the listing prices vary across different room types in the Airbnb 2019 NYC dataset. It
helps in understanding the range, shape, and skewness of the price distributions for each room type, enabling hosts and guests to make
informed decisions based on their pricing preferences and expectations.
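
The violin plot only includes listings priced below 400, so it can help to check how much data that cutoff keeps and what the quartiles look like per room type. A minimal sketch on made-up prices (`toy` is hypothetical; the cutoff of 400 mirrors the filter used for the chart):

```python
import pandas as pd

# Toy listings with made-up prices; one listing exceeds the cutoff.
toy = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 4 + ["Private room"] * 4,
    "price": [100, 200, 300, 1200, 40, 60, 80, 100],
})

cutoff = 400
below_cutoff = toy[toy["price"] < cutoff]

# Fraction of listings that survive the price filter.
share_kept = len(below_cutoff) / len(toy)

# Quartiles of price per room type, one column per quantile.
quartiles = (
    below_cutoff.groupby("room_type")["price"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
print(share_kept)
print(quartiles)
```

The quartile table gives the same information as the inner box of each violin, in numeric form.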

##### 2. What is/are the insight(s) found from the chart?

The violin plot depicting the distribution of listing prices across different room types in the Airbnb 2019 NYC dataset provides valuable insights.
It reveals the variation in price ranges, central tendency, skewness, and potential outliers for each room type. This information helps hosts and
guests understand the pricing landscape, make informed decisions based on their budget and preferences, and identify unique or exceptional
accommodations. Overall, the violin plot enhances the understanding of listing prices across room types, contributing to an improved Airbnb
experience in NYC.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from analyzing the distribution of listing prices across different room types using a violin plot in the Airbnb 2019 NYC
dataset have the potential to create a positive business impact. By optimizing pricing strategies based on the observed price ranges and
distributions, hosts can attract more bookings and increase revenue. However, there is a risk of negative growth if certain room types are consistently overpriced, leading to decreased demand, or if intense price competition arises. Hosts should strike a balance between competitiveness and profitability, considering market dynamics and guest preferences to ensure sustainable business growth.

### **Multivariate Analysis**

#### Chart - 1: Property Price Variations Across NYC Locations

In [None]:
# Chart - 1 visualization code
from skimage.io import imread
from matplotlib.colors import ListedColormap

plt.figure(figsize=(10,8))

# Load the map image using the imread function
img = imread("/content/drive/MyDrive/Colab Notebooks/AirBNB Booking Analysis/new-york-city-map.jpg")

# Define custom colors
custom_colors = ['#272932','#f05d5e','#0f7173','#9fc2cc','#f1ecce']

# Create a custom color map
custom_cmap = ListedColormap(custom_colors)

# Display the map image, stretched to the bounding box of the listings
plt.imshow(img, zorder=0,extent=[df['longitude'].min(),df['longitude'].max(),df['latitude'].min(),df['latitude'].max()])
nx = plt.gca()

# Create a scatter plot of latitude and longitude with price as the color
df.plot(legend=False, kind='scatter', x='longitude', y='latitude', label='price_trends', c='price', ax=nx, cmap=custom_cmap, colorbar=True, alpha=0.7, figsize=(10,8))
plt.title("Price variation using Spatial Data")
plt.legend()

plt.show()

##### 1. Why did you pick the specific chart?

The specific chart chosen is a scatter plot overlaid on a map image of New York City. This chart was selected to visualize the geographical
distribution of Airbnb rental prices across different neighborhoods. It combines location information (latitude and longitude) with the price
variable, providing a spatial understanding of price trends.

##### 2. What is/are the insight(s) found from the chart?

Geographical price distribution: The chart shows the variation in rental prices across different neighborhoods in New York City.

Spatial patterns: It helps identify clusters of high or low-priced rentals in specific areas of the city.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the chart can have a positive business impact by guiding pricing strategies, targeting specific markets, and identifying
investment opportunities. However, it's crucial to consider areas with consistently low prices or oversaturation of listings to avoid negative
growth prospects.

#### Chart - 2: Top 30 Airbnb Hosts By Average Review Rates, Exploring Review Rate Distribution Across Room Types

In [None]:
# Chart - 2 visualization code
# Select the top 30 hosts by average reviews per month
review = df.groupby('host_id')['reviews_per_month'].mean().nlargest(30).index

# Create a new dataframe containing information of only these top 30 hosts
top_host = df.loc[df['host_id'].isin(review)]
top_host = top_host.sort_values('reviews_per_month')

# Define a custom palette
color = ['#272932','#f05d5e','#0f7173','#9fc2cc','#f1ecce']

# Create a catplot to visualize the review rates of these top 30 hosts by room type
fig = sns.catplot(data=top_host,x='host_id',y='reviews_per_month', col='room_type', kind='bar', height=2, aspect=8, palette=color)
fig.set_xticklabels(rotation = 90)
fig.fig.set_size_inches(20,10)

##### 1. Why did you pick the specific chart?

The specific chart chosen is a catplot bar chart that visualizes the review rates of the top 30 hosts by room type. This chart was selected to compare the review rates of these hosts across different room types and identify any variations or patterns.

##### 2. What is/are the insight(s) found from the chart?

Comparison by room type: The chart allows for a comparison of review rates among the top hosts across different room types, such as
entire homes/apartments, private rooms, and shared rooms.

Variations in review rates: The chart reveals any discrepancies in review rates among the top hosts, highlighting differences in
performance across room types.
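
Selecting the "top N hosts by average review rate" means ranking on the aggregated value before cutting the list. A minimal sketch of that selection on made-up data (`toy` is hypothetical, and N is 2 here only to keep the example small):

```python
import pandas as pd

# Toy listings: several hosts, some with more than one listing (made-up values).
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 3, 4],
    "reviews_per_month": [0.5, 1.5, 4.0, 2.0, 3.0, 0.2],
})

# Average review rate per host, then the ids of the N highest averages.
avg_reviews = toy.groupby("host_id")["reviews_per_month"].mean()
top_ids = avg_reviews.nlargest(2).index

# Keep only the listings that belong to those hosts.
top_rows = toy[toy["host_id"].isin(top_ids)]
print(top_rows)
```

Calling `head(n)` on the grouped result before sorting would instead return the first `n` host ids in id order, which is not a ranking.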

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Quality improvement: Hosts can identify areas for improvement based on review rates and focus on enhancing the guest experience in specific room types with lower rates.

Competitive advantages: Higher review rates can be leveraged as a competitive advantage in marketing efforts to attract more guests and build a strong reputation.

#### Chart - 3: Price Variation with Minimum Nights, Reviews, and Availability

In [None]:
# Chart - 3 visualization code
columns = ['price', 'minimum_nights', 'number_of_reviews','availability_365']
colors = ['#f05d5e', '#481d24', '#9fc2cc']

# Create a dataframe with the selected columns
data = df[columns]

# Normalize the data for better visualization
data_normalized = (data - data.mean()) / data.std()

# Add a target column for color mapping
data_normalized['target'] = df["room_type"]

# Create a scatter matrix plot
sns.set_theme(style="ticks")
sns.pairplot(data_normalized, hue='target', palette= colors)

# Show the plot
plt.show()

##### 1. Why did you pick the specific chart?

The chosen chart is a pairplot that visualizes the relationships between multiple variables (price, minimum nights, number of reviews, and
availability) in a scatter matrix. It was selected to explore the correlations and distributions among these variables and observe any patterns or trends.

##### 2. What is/are the insight(s) found from the chart?

Variable relationships: The chart allows for the examination of relationships between different variables. For example, it helps identify if
there is a correlation between price and availability, or between the number of reviews and minimum nights.

Distributions: The chart provides insights into the distributions of each variable along the diagonal. It helps determine if the variables
follow a normal distribution or exhibit skewness.
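
The visual impressions from the pairplot can be checked numerically. The following is a minimal sketch, assuming a frame with the same four columns as in the chart; `listings_demo` and its values are made up for illustration. It uses `DataFrame.skew()` to measure asymmetry and a Spearman rank correlation, which is less sensitive to extreme prices than the default Pearson correlation.

```python
import pandas as pd

# Invented values; only the column names mirror the notebook.
listings_demo = pd.DataFrame({
    "price": [50, 60, 70, 80, 1000],
    "minimum_nights": [1, 2, 3, 4, 5],
    "number_of_reviews": [5, 4, 3, 2, 1],
    "availability_365": [0, 90, 180, 270, 365],
})

# Skewness near 0 suggests a roughly symmetric distribution;
# a large positive value indicates a long right tail.
skewness = listings_demo.skew()

# Rank-based correlation between every pair of columns
spearman = listings_demo.corr(method="spearman")

print(skewness)
print(spearman)
```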

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Pricing strategy: By analyzing the relationship between price and other variables, businesses can adjust their pricing strategies accordingly. For example, they can determine if higher prices are correlated with longer minimum nights or higher availability.

Operational decisions: Understanding the relationship between minimum nights, availability, and number of reviews can inform operational decisions such as adjusting minimum stay requirements or managing availability to optimize guest satisfaction and maximize bookings.

### **Correlation Heatmap**

In [None]:
colors = ['#f1ecce','#9fc2cc','#0f7173','#0f7173','#f05d5e','#272932']
plt.figure(figsize=(10,7))
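# Correlate numeric columns only; text columns such as room_type raise an error in pandas 2.x
corr = df.select_dtypes(include='number').corr()

# Visualize the correlation matrix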
sns.heatmap(corr, cmap=colors, annot=True)

plt.show()

##### 1. Why did you pick the specific chart?

The correlation heatmap was chosen because it provides a comprehensive overview of the relationships between variables in the dataset. It allows for easy identification of patterns and associations between different features, helping to uncover potential insights and understand the underlying structure of the data.

##### 2. What is/are the insight(s) found from the chart?

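The heatmap shows the strength and direction of the pairwise correlations. With the custom palette used here, colors run from the pale cream shade at the low end of the value range to the near-black shade at the high end, so the strongest positive correlations (including the diagonal) appear darkest, while the lowest values, which include any negative correlations, appear lightest. The annotated coefficients in each cell give the exact values. These insights help in understanding the interdependencies between variables and can inform decision-making.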
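
Reading individual cells off a heatmap is error-prone, so it can help to list the variable pairs ranked by the absolute value of their correlation. This is a sketch under assumptions: `df_demo` is a hypothetical frame with invented values, and the upper-triangle mask is just one common way to avoid listing each pair twice.

```python
import numpy as np
import pandas as pd

# Invented values; room_type is included to show that text columns are skipped.
df_demo = pd.DataFrame({
    "price": [100, 150, 200, 250, 300],
    "number_of_reviews": [50, 40, 30, 20, 10],
    "availability_365": [10, 300, 50, 200, 120],
    "room_type": ["Private room", "Shared room", "Private room",
                  "Entire home/apt", "Entire home/apt"],
})

corr_demo = df_demo.select_dtypes(include="number").corr()

# Keep only the cells above the diagonal so each pair appears once
upper = np.triu(np.ones(corr_demo.shape, dtype=bool), k=1)
pairs = corr_demo.where(upper).stack().dropna()

# Order the pairs by correlation strength, strongest first
pairs = pairs.sort_values(key=abs, ascending=False)
print(pairs)
```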

### **Pairplot**

In [None]:
# Create a pairplot with the seaborn library to visualize the relationships between the numeric variables in the dataset
sns.pairplot(df)

plt.show()

##### 1. Why did you pick the specific chart?

The pair plot was chosen to visualize the relationships between different variables in the Airbnb NYC dataset in a concise and comprehensive manner.

##### 2. What is/are the insight(s) found from the chart?

The pair plot provides insights into the correlations and distributions between variables. It allows us to observe patterns such as linear relationships, clusters, and outliers. These insights can help identify potential associations and dependencies between variables, which can be useful for further analysis and decision-making processes.
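
Outliers spotted in a pair plot can be flagged explicitly. The sketch below applies the common 1.5 × IQR rule to a price column; the rule is an assumption chosen for illustration (the notebook does not state an outlier criterion), and the prices are invented.

```python
import pandas as pd

# Invented nightly prices with one extreme value
prices = pd.Series([80, 95, 100, 110, 120, 130, 150, 5000], name="price")

# Interquartile range and the usual 1.5 * IQR fences
q1, q3 = prices.quantile([0.25, 0.75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are treated as outliers
outliers = prices[(prices < lower_fence) | (prices > upper_fence)]
print(outliers)
```

On the real data the same filter could be applied to `df['price']`; whether flagged listings should be dropped or only inspected is a judgment call.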

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the Business Objective?
Explain Briefly.

Based on the analysis of the NYC Airbnb dataset, here are some suggestions to help the client achieve their business objectives:

1. Optimize Pricing Strategy: Analyze the relationship between pricing and factors such as location, property type, and amenities. Adjust prices accordingly to maximize occupancy rates and profitability. Consider implementing dynamic pricing strategies based on demand and seasonality.

2. Improve Listing Quality: Enhance the attractiveness and competitiveness of listings by improving the quality of descriptions, photos, and amenities. Highlight unique selling points and ensure accurate and detailed information to attract potential guests.

3. Enhance Customer Experience: Focus on providing excellent customer service and ensuring a positive experience for guests. Promptly address any issues or concerns, and encourage guests to leave reviews to build credibility and attract more bookings.

4. Maintain Property Availability: Ensure a consistent availability of properties throughout the year, especially during peak tourist seasons and events. Plan maintenance and renovation schedules to minimize disruptions and maximize occupancy rates.

5. Monitor Competitor Activity: Stay updated on the offerings and prices of competitors in the local market. Adjust strategies accordingly to remain competitive and capture potential guests.

6. Consider Special Offers and Discounts: Implement promotional offers, discounts, or loyalty programs to attract guests and encourage repeat bookings. Monitor the impact of these offers on occupancy rates and profitability.

7. Stay Compliant with Regulations: Stay informed about local regulations and legal requirements for short-term rentals. Ensure compliance with tax obligations, licensing, and zoning laws to avoid any legal issues.

8. Seek Guest Feedback: Continuously seek feedback from guests to identify areas for improvement and address any recurring issues. Use guest feedback to enhance the overall guest experience and make necessary adjustments to property management.

9. Foster Positive Community Relations: Establish positive relationships with the local community by being respectful, responsible, and considerate. Encourage guests to be mindful of the neighborhood and local regulations to maintain good community relations.

By implementing these suggestions, the client can optimize their business operations, attract more guests, and achieve their business objectives of maximizing occupancy rates, profitability, and guest satisfaction.



# **Conclusion**

The 2019 Airbnb NYC project aimed to provide a comprehensive analysis of the short-term rental market in New York City using a dataset that includes various attributes of Airbnb listings. By examining the information available, we were able to gain valuable insights into the trends and patterns within the industry.

Firstly, we observed that there was a wide range of unique listings in the dataset, each identified by a unique ID. These listings had diverse names, reflecting the creativity of hosts in branding their accommodations. The project also captured the unique host IDs and names, allowing us to understand the distribution of hosts and their involvement in the Airbnb market.

Location played a crucial role in the Airbnb market, and the dataset included information on both the neighborhood group and specific neighborhood of each listing. This allowed us to analyze the popularity of different areas and understand the distribution of listings across the city. By utilizing the latitude and longitude ranges, we were able to visualize the geographical spread of the listings and identify areas with high concentration or scarcity of Airbnb accommodations.

Room types provided insights into the variety of accommodations available on Airbnb. The dataset included information on the type of room listed, such as entire homes, private rooms, or shared rooms. This analysis helped us understand the preferences of hosts and guests and provided valuable information on the diversity of options for travelers in NYC.

Price was a significant aspect of the project, as it directly impacted both hosts and guests. The dataset included the price of each listing, allowing us to analyze the range and distribution of prices across different neighborhoods and room types. This information helped us identify areas with higher-priced accommodations and understand the factors that influenced pricing.
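
A price comparison across areas and room types of this kind can be expressed as a pivot table. The following is a minimal sketch with invented rows; the column names `neighbourhood_group`, `room_type` and `price` are assumed to match the dataset, and the median is used here because it is less affected by a few very expensive listings than the mean.

```python
import pandas as pd

# Invented rows for illustration only
price_demo = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn", "Queens"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Entire home/apt", "Private room", "Private room"],
    "price": [200, 300, 100, 150, 70, 60],
})

# Median price for each combination of area and room type;
# combinations with no listings show up as NaN.
median_price = price_demo.pivot_table(
    index="neighbourhood_group",
    columns="room_type",
    values="price",
    aggfunc="median",
)
print(median_price)
```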

Minimum nights indicated the minimum duration for which guests had to book a listing. By examining this attribute, we gained insights into the preferences of hosts and the demand for longer stays in the Airbnb market.
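
One way to examine minimum-stay requirements is to group them into bands. The sketch below uses `pd.cut` on invented values; the band edges (1, 7 and 30 nights) are an assumption made for illustration, not thresholds taken from the analysis.

```python
import pandas as pd

# Invented minimum-night requirements
min_nights = pd.Series([1, 2, 3, 7, 30, 365], name="minimum_nights")

# Hypothetical band edges; intervals are right-inclusive, e.g. (1, 7]
bins = [0, 1, 7, 30, float("inf")]
labels = ["1 night", "2-7 nights", "8-30 nights", "over 30 nights"]

stay_band = pd.cut(min_nights, bins=bins, labels=labels)

# Number of listings per band, kept in band order
band_counts = stay_band.value_counts().reindex(labels)
print(band_counts)
```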