# **Project Name**    -  **AirBnb Bookings Analysis**



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Name -**          - Santosh Puri


# **Project Summary -**

In this project we analyze Airbnb’s New York City (NYC) data for 2019. NYC is not only one of the most famous cities in the world but also a top global destination for visitors attracted to its museums, entertainment, restaurants, UN offices and commerce.

The project began with building a comprehensive understanding of the Airbnb dataset, including its size and information such as properties and their availability, price, location, and reviews and ratings. We then explored data related to Airbnb listings, including the number of properties listed, host characteristics, the variety of amenities available and the occupancy rate of different properties. Further analysis looked at the significance of the reviews left by Airbnb users.

Exploratory data analysis projects on Airbnb typically involve investigating patterns and trends in various aspects of the platform, such as pricing, popularity and availability of listings. This data can be used to gain insights into consumer behavior and preferences, as well as to inform marketing and business strategies for hosts and Airbnb as a company. Techniques such as data visualization and objective solution may be used to analyze the data and draw meaningful conclusions.

In this type of analysis, data visualizations such as line plots, scatter plots, and bar charts are used to help identify trends, patterns, and relationships in the data. For instance, a bar chart can be used to show the distribution of properties across different neighbourhoods in a city.

Overall, the exploratory data analysis provides crucial insights for the Airbnb platform to improve customer satisfaction and enhance rental revenues. The insights also benefit renters, who can use the data to gain a deeper understanding of the landscape and make informed decisions.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


**The purpose of this exploratory data analysis project is to analyze and examine the factors that influence customer bookings and preferences. The dataset used in this analysis includes information on customer demographics, subscription room type and location, minimum stays, retention rate and experience with service.**

**The aim is to identify insights and patterns in the data that can help the company understand the drivers of customer retention and inform future decision-making regarding host listing, location, price and customer service and marketing strategies.**

#### **Define Your Business Objective?**



1.   Recommend marketing campaign strategies and predict the destination neighbourhoods that are in high demand.

2.   Using Exploratory Data Analysis, find the most demanded room type and neighbourhood_group.

3.   Find the average number of days guests prefer to stay in a single visit, by room type and neighbourhood_group.

4.   Find the most sought-after price bracket, in which the most bookings happen and which receives the most reviews.

5.   Find the neighbourhood_group in which the top hosts have the most listings, and explain the reason behind it with your insight.





# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')


### Dataset Loading

In [None]:
# Load Dataset
file_path = '/content/drive/MyDrive/Data/Airbnb NYC 2019.csv'
airbnb_df = pd.read_csv(file_path)
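
The guidelines above mention exception handling, so here is a minimal sketch of a guarded loader. The helper name `load_listings` is hypothetical, and a temporary CSV with made-up rows stands in for the real file so the cell can run anywhere.

```python
import os
import tempfile

import pandas as pd


def load_listings(path):
    """Hypothetical helper: read a CSV, failing with a clear message if it is missing."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"Dataset not found at: {path}")
    return pd.read_csv(path)


# Demonstrate with a small temporary file holding made-up rows (not the real data).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.csv")
    pd.DataFrame({"id": [1, 2], "price": [100, 150]}).to_csv(sample_path, index=False)
    sample_df = load_listings(sample_path)

    # A path that does not exist should be reported, not silently ignored.
    missing_path = os.path.join(tmp_dir, "does_not_exist.csv")
    try:
        load_listings(missing_path)
        missing_file_reported = False
    except FileNotFoundError:
        missing_file_reported = True

print(sample_df.shape, missing_file_reported)
```

With a check like this, a wrong Drive path fails at the loading step with a readable message instead of surfacing later in the notebook.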

### Dataset First View

In [None]:
# Dataset First Look
airbnb_df.head()

In [None]:
# Checking which variables are present:
airbnb_df.columns

*   **First, rename a few columns so the variables are easier to understand -**

In [None]:
# Map alternative column names to the names used in the rest of this notebook
rename_col = {'listing_id':'id','listing_name':'name','total_reviews':'number_of_reviews','host_listing_count':'calculated_host_listings_count'}

In [None]:
# Use the pandas rename function to rename the columns
airbnb_df = airbnb_df.rename(columns = rename_col)
airbnb_df.head(2)
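
One detail of `DataFrame.rename` is worth keeping in mind: keys that do not match an existing column are silently ignored by default. The sketch below shows this on a made-up frame (`demo_df` is illustrative, not the Airbnb data).

```python
import pandas as pd

# Made-up frame using two of the column names seen in this notebook.
demo_df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# 'listing_id' is not a column here, so the default errors='ignore' changes nothing.
unchanged = demo_df.rename(columns={"listing_id": "id"})

# With errors='raise', the same mapping fails loudly instead.
try:
    demo_df.rename(columns={"listing_id": "id"}, errors="raise")
    raised = False
except KeyError:
    raised = True

print(list(unchanged.columns), raised)
```

Checking `airbnb_df.columns` before and after the rename is a quick way to confirm that the mapping actually applied.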

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
airbnb_df.shape

### Dataset Information

In [None]:
# Dataset Info
airbnb_df.info()

**So, name, host_name, neighbourhood_group, neighbourhood and room_type are categorical variables.**

**id, host_id, latitude, longitude, price, minimum_nights, number_of_reviews, reviews_per_month, calculated_host_listings_count and availability_365 are numerical variables, while last_review holds dates.**

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
len(airbnb_df[airbnb_df.duplicated()]) # at this point the dataset does not appear to have duplicate rows

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print(airbnb_df.isnull().sum()) # null count for every column

In [None]:
# Visualizing the missing values
# Checking Null Values by plotting Heatmap
sns.heatmap(airbnb_df.isnull(), cbar=False)

**host_name** and **name** (the listing name) have only a few null values, so we can first fill both columns with placeholder values.

In [None]:
# Assign the result back instead of using inplace=True on a column selection
airbnb_df['name'] = airbnb_df['name'].fillna('unknown')
airbnb_df['host_name'] = airbnb_df['host_name'].fillna('no_name')

In [None]:
# So the null values are removed
airbnb_df[['host_name','name']].isnull().sum()

Now, the columns **last_review** and **reviews_per_month** have 10052 null values each.

The **last_review** column is less useful for our analysis than **number_of_reviews** and **reviews_per_month**, so we can drop it.

The listing **id** is also not very important for our analysis, but it is kept for now because **id** and **name** form a pair; it is dropped later during data wrangling.

In [None]:
airbnb_df = airbnb_df.drop(['last_review'], axis=1) # remove the last_review column because it is not important for this analysis

The **reviews_per_month** column also contains null values; we can simply replace the NaNs with 0 reviews.

In [None]:
# Replace NaN with 0 and keep the column as float so fractional rates are not truncated
airbnb_df['reviews_per_month'] = airbnb_df['reviews_per_month'].fillna(0)
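
A minimal sketch, on made-up values, of what replacing NaN with 0 does to a fractional column and how an integer cast afterwards would change it. The series `rates` is illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up monthly review rates; NaN marks a listing without reviews.
rates = pd.Series([0.21, np.nan, 4.64, np.nan])

filled = rates.fillna(0)            # keeps the fractional values
truncated = filled.astype("int64")  # drops everything after the decimal point

print(filled.tolist())
print(truncated.tolist())
```

Because monthly review rates are often below 1, an integer cast would turn many of them into 0 and make them indistinguishable from listings without reviews.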

In [None]:
# checking null values of each columns
airbnb_df.isnull().sum()

### What did you know about your dataset?

*   **id:** Unique ID of the listing
*   **name:** Name of the listing
*   **host_id:** Unique ID of the host
*   **host_name:** Name of the host
*   **neighbourhood_group:** Location (area of the city)
*   **neighbourhood:** Area within the neighbourhood group
*   **latitude:** Latitude of the listing
*   **longitude:** Longitude of the listing
*   **room_type:** Type of listing
*   **price:** Price of the listing
*   **minimum_nights:** Minimum nights to be paid for
*   **number_of_reviews:** Number of reviews
*   **last_review:** Date of the last review
*   **reviews_per_month:** Number of reviews per month
*   **calculated_host_listings_count:** Total number of listings by the host
*   **availability_365:** Availability around the year (days)















## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
airbnb_df.columns

In [None]:
# Dataset Describe
airbnb_df.describe(include='all')

## **Variables Description**

***So, we get to know that some columns are categorical and the remaining are numerical, except last_review, which falls under the Date_Time category.***

*   Categorical variable : name, host_name, neighbourhood_group, neighbourhood, room_type

*   Numerical variable : id, host_id, price, minimum_nights, number_of_reviews, reviews_per_month, calculated_host_listings_count, availability_365

*   Date_Time variable : last_review

*   Coordinates : latitude, longitude



### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
airbnb_df['id'].nunique()

In [None]:
airbnb_df['name'].nunique() # shows that some listing names are shared

In [None]:
airbnb_df['host_id'].nunique() # fewer unique host_ids than listings, so many host_ids are repeated

In [None]:
airbnb_df['host_name'].nunique()   # about 11.5k host names for 49k listings, so a single host can have multiple listings

In [None]:
(airbnb_df['neighbourhood_group'].value_counts()) # area of the city

In [None]:
airbnb_df['neighbourhood'].nunique()  #no. of neighbourhood

In [None]:
airbnb_df["room_type"].value_counts() # room type listing count

In [None]:
price_value_counts = airbnb_df["price"].value_counts()

price_value_counts.sort_index()      #shows 673 different prices ranging from 0 to 10k


In [None]:
airbnb_df["calculated_host_listings_count"].unique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# Dropping unnecessary columns
airbnb_df.drop(['id'], axis=1, inplace=True)

In [None]:
# Examining changes after dropping unnecessary columns
airbnb_df.head()

In [None]:
airbnb_df.tail()

In [None]:
len(airbnb_df[airbnb_df['availability_365']==0])

In [None]:
len(airbnb_df[airbnb_df['price']==0])

In [None]:
len(airbnb_df[airbnb_df['price']<10])

In [None]:
index_names = airbnb_df[(airbnb_df["price"]==0)].index
airbnb_df.drop(index_names, inplace = True)

In [None]:
len(airbnb_df[airbnb_df['price']>=500])

In [None]:
airbnb_df.info()

In [None]:
airbnb_df['host_name'].value_counts()[:5] # top 5 host listing counts in entries

In [None]:
# Maximum listings by host across the dataset, with one row per host and neighbourhood_group.
# This table gives a partial answer to the 5th objective.
hosts_listings = airbnb_df.groupby(['host_name','host_id','neighbourhood_group'])['calculated_host_listings_count'].max().reset_index()
hosts_listings.sort_values('calculated_host_listings_count', ascending=False,).head(10)
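
The grouping above can be sketched on a tiny made-up frame to show what each output row means. The host names and counts below are invented for illustration.

```python
import pandas as pd

# Made-up listings: each row is one listing, with the host's total count repeated.
listings = pd.DataFrame({
    "host_name": ["Ana", "Ana", "Ben", "Ana"],
    "host_id": [1, 1, 2, 1],
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn"],
    "calculated_host_listings_count": [3, 3, 1, 3],
})

# One row per (host, area) pair, keeping the host's listing count.
per_host_area = (
    listings
    .groupby(["host_name", "host_id", "neighbourhood_group"])["calculated_host_listings_count"]
    .max()
    .reset_index()
    .sort_values("calculated_host_listings_count", ascending=False)
)
print(per_host_area)
```

Because the count is a host-level total, the same host can appear once per neighbourhood_group with the same value, which is worth remembering when reading the top-10 table.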

In [None]:
airbnb_df.loc[(airbnb_df['neighbourhood_group']=='Manhattan') & (airbnb_df['host_name']=='John')]
# The same host can have many listings in the same neighbourhood_group with different room types, or the same/different room_type in other neighbourhoods


### What all manipulations have you done and insights you found?

**From the above experiments we get some more insights:**
*   Entire home/apt has the most listings, followed by Private room; Shared room makes a minuscule contribution.
*   The number of unique host names, together with the experiment above, shows that one host can have several room types and/or more than one listing in the same or different neighbourhoods.
*   Listings are distributed across 5 neighbourhood_groups containing over 200 neighbourhoods. Prices go up to 10k.
*   After inspection, a particular property name belongs to one particular host_name, but a particular host_name can have multiple properties in a neighbourhood_group or neighbourhood.
*   From the unique ids we know that all property ids are different, so each listing is distinct.
*   The columns "price" and "availability_365" contain zeros, meaning zero cost and no availability throughout the year respectively. A host being unavailable all year is plausible, but a zero price is not, so rows with a zero price were dropped.

## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1    **Top neighbourhoods in entire NYC on the basis of count of listings Using Bar plot**

In [None]:
top_20_neigbours= airbnb_df['neighbourhood'].value_counts()[:20] #checking top 20 neighbourhoods on the basis of no of listings in entire NYC!
top_20_neigbours.plot(kind='bar',color='chocolate')
plt.xlabel('neighbourhood')
plt.ylabel('counts in entire NYC')
plt.title('Top neighbourhoods in entire NYC on the basis of count of listings')

##### 1. Why did you pick the specific chart?

This chart shows which neighbourhoods have the highest number of listings, used here as a proxy for bookings.

##### 2. What is/are the insight(s) found from the chart?

Williamsburg, Bedford-Stuyvesant and Harlem are the top 3 neighbourhoods by number of listings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

With this data we can identify target areas in which to improve services. Limiting the advertising market to these areas can also reduce cost and increase profit.

#### Chart - 2 **Average price in different neighbourhood_groups using Bar plot**

In [None]:
avg_price = airbnb_df.groupby(['neighbourhood_group'])['price'].mean()
a = avg_price.plot.bar(figsize=(5,5), fontsize = 10)
a.set_xlabel("neighbourhood group", fontsize=11)
a.set_ylabel("average price",fontsize=11)
a.set_title("average price in different neighbourhood groups", fontsize=12);

##### 1. Why did you pick the specific chart?

This chart shows price difference between the neighbourhood groups

##### 2. What is/are the insight(s) found from the chart?

The chart shows that Manhattan has the highest average price among the neighbourhood groups. This suggests stronger demand for Manhattan than for the others.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The higher average price suggests that higher listing prices are feasible there for more profit, and that more listings can be added in the high-demand neighbourhood group. Offline advertising spend can be planned accordingly.

#### Chart - 3 **%_of_booking_neighbourhood_group using pie chart**

In [None]:
# Chart - 3 visualization code
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(15, 6))
ax = axes.flatten()

# Left: share of each neighbourhood_group as a pie chart
no_of_booking_neighbourhood_group = airbnb_df["neighbourhood_group"].value_counts()
no_of_booking_neighbourhood_group.plot.pie(autopct='%1.2f%%', ylabel="", ax=ax[0])
ax[0].set_title('%_of_booking_neighbourhood_group', fontsize=12)

# Right: the same counts as a bar chart
no_of_booking_neighbourhood_group.plot(kind='bar', color=['r','b','y','g','m'], ax=ax[1]);

##### 1. Why did you pick the specific chart?

The pie chart shows the percentage of bookings in each neighbourhood group, and the bar chart shows how many bookings each neighbourhood group has.



##### 2. What is/are the insight(s) found from the chart?

This chart shows that Manhattan and Brooklyn together account for more than 85% of bookings. We can therefore list more properties by targeting these neighbourhood groups at the beginning.
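
A combined share like this can be computed directly rather than read off the pie chart. The sketch below uses made-up group labels, so the printed number is illustrative and not the real figure.

```python
import pandas as pd

# Made-up group labels; the real column has five neighbourhood groups.
groups = pd.Series(["Manhattan"] * 4 + ["Brooklyn"] * 4 + ["Queens"] * 2)

# Share of rows per group, as percentages.
share_pct = groups.value_counts(normalize=True) * 100

# Combined share of the two groups of interest.
top_two_pct = share_pct[["Manhattan", "Brooklyn"]].sum()
print(round(top_two_pct, 1))
```

On the real data the same two lines would be applied to `airbnb_df["neighbourhood_group"]`.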



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights will help to increase business by targeting the market areas that have a higher number of bookings. Keep track of the number of listings in the area and keep an eye on percentage occupancy so that demand and supply do not mismatch.

#### Chart - 4 most demanded room type using Bar plot

In [None]:
# Chart - 4 visualization code for finding most demanded room type
most_demand_room = airbnb_df.groupby(['room_type'])["host_id"].count()
b = most_demand_room.plot.bar(figsize = (4,4), fontsize=10)
b.set_xlabel('room type', fontsize = 12)
b.set_ylabel("no of booking", fontsize = 12)
b.set_title("most demanded room type", fontsize= 14);

##### 1. Why did you pick the specific chart?

This chart shows which room type is in most demand.

##### 2. What is/are the insight(s) found from the chart?

From this chart we found that Shared room is least in demand and Entire home/apt is most in demand: people prefer to rent an entire house/apt, followed by a private room. The data therefore suggests ensuring the availability and service quality of the demanded room types.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

It helps us know that most people want an entire house/apt to rent, so we should try to list and make available the largest number of rental properties as entire house/apt or private rooms.

#### Chart - 5 Average prices of different room types

In [None]:
# Chart - 5 visualization code  this time we will try to find out average prices of different room types

new = airbnb_df.groupby(["room_type"])['price'].mean()
chart5 = new.plot.bar(figsize = (4,4), fontsize = 10)
chart5.set_xlabel("room type", fontsize = 12)
chart5.set_ylabel('average price', fontsize = 12)
chart5.set_title('average price in different room type', fontsize = 14);

##### 1. Why did you pick the specific chart?

To get an idea of the average price of each room type with respect to its demand/occupancy, as shown in the earlier chart.

##### 2. What is/are the insight(s) found from the chart?

This chart shows that the average price of Entire home/apt is higher than that of the other two room types. This also suggests a profit-oriented approach to new listings that benefits all stakeholders.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This chart calls for special attention to the availability of, and service quality for, this room type, to support positive customer sentiment and good reviews.

#### Chart - 6  Average Stays in different room types

In [None]:
# Chart - 6 visualization code Average Stays in different room types
airbnb_df.groupby('room_type')['minimum_nights'].mean().plot(figsize=(4,4), kind='bar', color='chocolate')
plt.title('Average Stays in different room types', fontsize = 14)
plt.xlabel('Room types', fontsize = 12)
plt.ylabel('Average Stays' , fontsize = 12);

##### 1. Why did you pick the specific chart?

We chose this chart to show the average minimum nights, used here as a measure of average stay, in different room types.

##### 2. What is/are the insight(s) found from the chart?

This chart helps in planning adequate facilities for the average length of stay and in estimating when each room type will be available again for booking.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

From this chart we can derive business profit with basic calculations, such as the number of times each room type can be made available for booking in a month, and set the booking price accordingly while considering expenses.
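
The basic calculation mentioned here can be sketched as follows. It uses the approximate averages reported in this notebook (8, 6 and 5 nights) and assumes a 30-day month with back-to-back minimum-length stays, so the results are upper bounds rather than observed booking counts.

```python
# Approximate average minimum nights per room type, as reported in this notebook.
avg_min_nights = {"Entire home/apt": 8, "Shared room": 6, "Private room": 5}

DAYS_IN_MONTH = 30  # assumption: a 30-day month with no gaps between stays

# Upper bound on the number of minimum-length stays that fit in one month.
max_stays_per_month = {
    room: DAYS_IN_MONTH // nights for room, nights in avg_min_nights.items()
}
print(max_stays_per_month)
```

Multiplying each count by the room type's average price and nights per stay would give a rough monthly revenue ceiling to compare against expenses.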

#### Chart - 7 Visualization of each neighbourhood_group using latitude and longitude

In [None]:
# Chart - 7 visualization code
# chart visualization of each neighbourhood_group using latitude and longitude
plt.figure(figsize=(12,6))
sns.scatterplot(x = airbnb_df['longitude'], y = airbnb_df['latitude'], hue = airbnb_df['neighbourhood_group'])
plt.show()

##### 1. Why did you pick the specific chart?

This chart shows the location wrt longitude and latitude of different neighbourhood groups in the city.

##### 2. What is/are the insight(s) found from the chart?

The scatter plot shows that Manhattan and Brooklyn lie next to each other at similar longitudes, and together they garner almost 85% of bookings. Staten Island lies on the outskirts, so it has fewer bookings as well as listings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This chart suggests that locations nearer to the prime hotspots garner more bookings, so we should try to list more of the in-demand room types there and increase overall revenue by attracting customers.

#### Chart - 8  Room Availability throughout Neighbourhood/Room Type

In [None]:
# Chart - 8 visualization code Room Availability throughout Neighbourhood/Room Type
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(22, 6))
ax = axes.flatten()

sns.lineplot(data=airbnb_df, x='neighbourhood_group', y='availability_365', hue='room_type', ax=ax[0])
ax[0].set_title('Room Availability throughout Neighbourhood/Room Type')

sns.scatterplot(data = airbnb_df, x='price', y='number_of_reviews', hue='room_type', ax=ax[1])
ax[1].set_title('price vs Number of Reviews')
sns.despine(fig, left=True)

##### 1. Why did you pick the specific chart?


The first chart shows how busy or available each neighbourhood group is for booking throughout the year. In the second, we try to draw the relationship between price and number of reviews.

##### 2. What is/are the insight(s) found from the chart?

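The first chart shows that Staten Island stands out among all groups, even for the least-demanded shared rooms, while Manhattan and Brooklyn show decent bookings and availability. Since availability_365 counts the days a listing is open for booking, a higher value points to more vacant days rather than more bookings.

The second chart shows a negative relation between price and number of reviews. Cheaper rooms usually have higher occupancy and hence more reviews.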

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The occupancy and/or vacancy of each room type in the respective areas can point to different solutions, such as listing or delisting a room_type in a neighbourhood group. Price vs reviews shows the booking volume at reasonable booking prices, and hence the price range that generates revenue.

#### Chart - 9 No. of Properties Available 365 days

In [None]:
# Chart - 9 visualization code
fig, ax = plt.subplots(figsize=(6,6))

# Count listings that are available all 365 days, split by room type
sns.countplot(data=airbnb_df[airbnb_df["availability_365"] == 365], x='neighbourhood_group', hue='room_type', palette='GnBu_d', ax=ax)
plt.title('No. of properties Available 365 days', fontsize=12)
plt.show()

##### 1. Why did you pick the specific chart?

This chart will give us idea about availability of room_type in respective neighbourhood group.

##### 2. What is/are the insight(s) found from the chart?

This plot gives a clear picture of the trend in which room_type is available most during the year: Private room has the most year-round availability except in Manhattan, and Shared room has the least regardless of neighbourhood group.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The plot above shows which room type in which neighbourhood group has the most and the least availability throughout the year. This data can therefore be used for listing the demanded room types and delisting the least requested ones.

Negative Insight - In Manhattan, Entire home/apt has the most year-round availability, and from earlier charts we know that the same room_type also has the most bookings. The high vacancy may be due to the larger number of listings of this room_type in the area.

#### Chart - 10  Average preferred price in every neighbourhood group by room type

In [None]:
# Chart - 10 visualization code
# chart of average preferred price in every neighbourhood group by room type
avg_price_df = airbnb_df.groupby(['neighbourhood_group','room_type'])['price'].mean().unstack()
avg_price_df
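
The `groupby(...).mean().unstack()` pattern used above turns a two-level grouped result into a table with one row per neighbourhood group and one column per room type. A minimal sketch on made-up prices:

```python
import pandas as pd

# Made-up prices for two areas and two room types.
prices = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Manhattan"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt", "Private room", "Entire home/apt"],
    "price": [200, 100, 150, 70, 300],
})

# Rows become neighbourhood groups, columns become room types.
avg_table = prices.groupby(["neighbourhood_group", "room_type"])["price"].mean().unstack()
print(avg_table)
```

This wide layout is what lets `plot.bar` draw one cluster of bars per neighbourhood group with one bar per room type.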

In [None]:
avg_price_df.plot.bar(figsize=(10,5),ylabel='Average price calculated')

##### 1. Why did you pick the specific chart?


This chart gives us idea about relationship between Price and Room_type in different neighbourhood_group. In this bar plot we are getting the price comparison of each room type in different neighbourhood groups.



##### 2. What is/are the insight(s) found from the chart?

This chart clearly shows some trends in price and room type across neighbourhood groups. Entire home/apt has the highest average price, followed by Private room and Shared room. The chart also shows that Manhattan has the highest average price compared to the other neighbourhood groups.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This chart gives us good business insight about revenue realisation like from which room type and which neighbourhood group are going to give us highest monetary benefits for ultimate goal of profit.

Negative Insight: If this insight reaches the hosts, they may be less attracted to room types and neighbourhood groups with a low average price, so business expansion may face a hurdle.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
#Busiest_host
busiest_hosts = airbnb_df.groupby(['host_name', 'host_id','room_type'])['number_of_reviews'].max().reset_index()
busiest_hosts = busiest_hosts.sort_values(by='number_of_reviews', ascending=False).head(10)
busiest_hosts

In [None]:
name = busiest_hosts['host_name']
reviews = busiest_hosts['number_of_reviews']

fig = plt.figure(figsize = (8, 5))
plt.bar(name, reviews, color ='chocolate', width = 0.4)
plt.xlabel("Name of the Host")
plt.ylabel("Number of Reviews")
plt.title("Busiest Hosts", fontsize=14)
plt.show()

##### 1. Why did you pick the specific chart?

This chart gives us an idea of which hosts and room types attract the most customers, using the number of reviews as a measure.

##### 2. What is/are the insight(s) found from the chart?

This chart shows that the busiest hosts in the top 10 mostly offer Private rooms, with a few Entire home/apt. This suggests that Private rooms have a higher booking frequency than the others.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

We can use the busiest hosts' experience and their room types to understand which factors, such as locality, facilities, aesthetics and service, satisfy customers and bring repeat visits. Accordingly, we can arrange training for our host community so they can make a positive impact like these hosts.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
# Find outliers in the price variable
# Split the dataset by room type
entire_home_apt = airbnb_df[airbnb_df['room_type'] == 'Entire home/apt']
private_room = airbnb_df[airbnb_df['room_type'] == 'Private room']
shared_room = airbnb_df[airbnb_df['room_type'] == 'Shared room']

# Create boxplots for price
fig, axs = plt.subplots(1, 3, figsize=(15, 5))
axs[0].boxplot(entire_home_apt['price'])
axs[0].set_title('Entire home/apt')
axs[1].boxplot(private_room['price'])
axs[1].set_title('Private room')
axs[2].boxplot(shared_room['price'])
axs[2].set_title('Shared room')
plt.show()
# This shows that the price variable has many outliers in each room_type, which need to be removed.

In [None]:
# Remove outliers from price variable for each room_type
def remove_outliers(data):
    Q1 = np.percentile(data['price'], 25)
    Q3 = np.percentile(data['price'], 75)
    IQR = Q3 - Q1
    upper_bound = Q3 + 1.5 * IQR
    lower_bound = Q1 - 1.5 * IQR
    data1 = data[(data['price'] >= lower_bound) & (data['price'] <= upper_bound)]
    return data1

entire_home_apt1 = remove_outliers(entire_home_apt)
private_room1 = remove_outliers(private_room)
shared_room1 = remove_outliers(shared_room)

# Combine the datasets in combined_df
combined_df = pd.concat([entire_home_apt1, private_room1, shared_room1], axis=0)
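
To make the 1.5 × IQR rule used above concrete, here is a worked sketch on six made-up prices with one obvious extreme value. It follows the same steps as `remove_outliers`.

```python
import numpy as np
import pandas as pd

# Made-up prices with one obvious extreme value.
prices = pd.DataFrame({"price": [50, 60, 70, 80, 90, 1000]})

q1 = np.percentile(prices["price"], 25)
q3 = np.percentile(prices["price"], 75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Keep only rows inside the bounds.
kept = prices[(prices["price"] >= lower_bound) & (prices["price"] <= upper_bound)]
print(lower_bound, upper_bound, len(kept))
```

Here the quartiles are 62.5 and 87.5, so the bounds are 25 and 125, and only the 1000 entry is dropped. Applying the rule per room type, as done above, keeps a high but normal Entire home/apt price from being judged against Shared room prices.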

In [None]:
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(20, 6))
ax = axes.flatten()

# Show the final boxplot without outliers
sns.boxplot(x='room_type', y='price', data=combined_df, ax=ax[0])

# histplot with a KDE overlay shows the univariate price distribution (distplot is deprecated in seaborn)
sns.histplot(combined_df['price'], kde=True, stat='density', ax=ax[1])

##### 1. Why did you pick the specific chart?

We chose these graphs to understand the price distribution overall and within each room type.

##### 2. What is/are the insight(s) found from the chart?

The three price graphs above give a clearer picture of the minimum and maximum prices. Most importantly, they show that prices are densest between $1 and $300, although in the full data prices extend up to $10,000 for some properties, so luxury villas also have customers.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

From a business point of view, we should list the maximum number of properties in the $1-300 booking range to maximize booking frequency for hosts, and also list a few luxurious properties to fulfil the needs of rare but high-paying customers.

#### Chart - 13

In [None]:
# Chart - 13 visualization code

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(20, 6))
ax = axes.flatten()

sns.violinplot(data=combined_df, x='neighbourhood_group', y='price', ax=ax[0])

sns.violinplot(data=combined_df, x='neighbourhood_group', y='price', hue='room_type', ax=ax[1])

##### 1. Why did you pick the specific chart?

After a lot of deliberation, we picked the violin plot to get a clearer picture of the most demanded price bracket for each neighbourhood_group as well as room_type.

##### 2. What is/are the insight(s) found from the chart?

The five violin plots for the respective neighbourhood_groups show that the most demanded price range is $50-100 everywhere except Manhattan, where considerable booking also happens above $100.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

From a business perspective, the company should ensure adequate listings in the demanded price bracket and also in the above-average price range, because some guests are ready to pay for luxury and better services.

Most importantly, the 2nd plot shows that Private room and Shared room don't fetch considerable bookings beyond this price bracket.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
corr = airbnb_df.corr(method='kendall', numeric_only=True)  # text columns are excluded from the correlation
plt.figure(figsize= (10,8))
sns.heatmap(corr, annot =True)
airbnb_df.columns
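
Correlation is only defined for numeric columns, so text columns have to be left out before the matrix is computed. A minimal sketch on a made-up frame, using the default Pearson method rather than the Kendall method used above:

```python
import pandas as pd

# Made-up frame mixing numeric and text columns.
mixed = pd.DataFrame({
    "price": [50, 100, 150, 200],
    "minimum_nights": [1, 2, 3, 4],
    "room_type": ["Private room", "Private room", "Entire home/apt", "Entire home/apt"],
})

# Restrict to numeric columns before computing the correlation matrix.
numeric_corr = mixed.select_dtypes(include="number").corr()
print(numeric_corr)
```

The `room_type` column does not appear in the result, and the two numeric columns, which rise together in this toy data, have a correlation of 1.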

##### 1. Why did you pick the specific chart?

The Correlation Heatmap shows us visual bi-variate correlation and relationship between numerical columns of Dataframe in the form of colour shades. We get quick visualisation of large amount of data.

##### 2. What is/are the insight(s) found from the chart?

In the Heatmap Matrix some variables with some other have positive correlation and some also have negative correlation. e.g.

Positive corr : Number_of_reviews and reviews_per_month, calculated_host_listings_count and availability_365

Negative corr : price and longitude, number_of_reviews and id, reviews_per_month and minimum_nights

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

#Pair Plot visualization of outlier cleaned data

sns.pairplot(combined_df, hue='room_type',
             x_vars=['price', 'number_of_reviews','reviews_per_month','availability_365'],
             y_vars=['price', 'number_of_reviews','reviews_per_month','availability_365'],
             kind='scatter', diag_kind= 'hist')

##### 1. Why did you pick the specific chart?

The pair plot is a good way to show the relationship between any two numerical variables in a dataset. It shows all the considered charts in one grid, like a dashboard.

##### 2. What is/are the insight(s) found from the chart?

The relationships between pairs of variables and the formation of separate clusters give some insights: price and number of reviews have a negative relationship (higher price, fewer reviews per month); reviews per month vs availability_365 shows fewer than 20 reviews per month; and the cluster of availability_365 vs price shows entire home/apt properties available at higher prices.

Histogram shows every variable has its frequency, distribution and density. Also shows characteristics of skewness.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

We have found the top 20 neighbourhoods where the number of bookings is high. We can focus on those areas for marketing campaigns and advertisements to maximize bookings and reduce marketing cost by reducing the focus areas.

Most guests do not prefer shared rooms and instead choose an entire home/apt or a private room. Manhattan and Brooklyn are the most demanded neighbourhood groups.

The average stays in Entire home/apt, Shared room and Private room are approximately 8, 6 and 5 days respectively. From these we can estimate the average number of bookings for each room type in a month.

The most subscribed booking price falls within the bracket of $10-400, where almost 95% of bookings happen and where most reviews are given. In addition, Private room and Entire home/apt receive the most reviews.
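
The share of listings inside a price bracket can be checked with `Series.between`. The prices below are made up, so the printed share is illustrative and not the 95% figure from the analysis.

```python
import pandas as pd

# Made-up prices; the real analysis uses the 'price' column of the dataset.
prices = pd.Series([5, 40, 80, 120, 250, 399, 400, 450, 900, 60])

# Series.between is inclusive on both ends by default.
in_bracket = prices.between(10, 400)

# The mean of a boolean mask is the fraction of True values.
share_pct = in_bracket.mean() * 100
print(round(share_pct, 1))
```

Running the same two lines on `airbnb_df['price']` would confirm or refine the bracket quoted above.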

The listings of the top hosts are also concentrated in Manhattan and Brooklyn, which also have the higher prices and most bookings. Hosts appear to be inclined towards Manhattan and Brooklyn because of higher price realisation and more bookings, as seen in earlier charts, so we can ask for more commission from hosts there and realise more profit.

# **Conclusion**

From the above Exploratory Data Analysis of Airbnb Dataset we can conclude that:
* Manhattan and Brooklyn are the two distinguished, expensive and posh areas of NYC. Although the location of a property has a strong effect on its price, a property in a popular location will not necessarily stay occupied most of the time.

* People who prefer an Entire home or apartment tend to stay a bit longer, and it is also the most booked room type in the neighbourhood groups.

* The findings from an exploratory data analysis project on Airbnb can help both hosts and guests make more informed decisions. Hosts can learn more about what amenities guests are looking for and how to price their property competitively. Guests, on the other hand, can follow some parameters to make decisions about the location, amenities and price of the properties they want to book.

The given Airbnb dataset is vast but lacks some required features, which makes it difficult to decide property valuation. Overall, conducting an exploratory data analysis project on Airbnb can provide valuable insights into the dynamics of the short-term rental market and enhance the user experience for both hosts and guests.

### ***Hurrah! successfully completed EDA Capstone Project !!!***