<a href="https://colab.research.google.com/github/sahilasad321/EDA-on-Airbnb/blob/main/Copy_of_Sample_EDA_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -  Airbnb Bookings Analysis



##### **Project Type**    - EDA :- Exploratory Data Analysis
##### **Contribution**    - Team
##### **Team Member 1 -**   Sahil Asad
##### **Team Member 2 -** Aheli Some


# **Project Summary -**

This project aims to analyze Airbnb booking data to provide valuable insights for both hosts and guests. By examining historical data from Airbnb, we explore various factors influencing bookings, such as pricing, location, availability, and listing attributes. Our analysis uncovers patterns and trends that can guide hosts in optimizing their listings and help guests make informed decisions when booking accommodations. The results contribute to enhancing the Airbnb experience for both parties and fostering a more efficient and satisfying marketplace.

To conduct this analysis, we collected a comprehensive dataset comprising historical Airbnb booking information from various cities and regions. The dataset includes information such as host name, neighbourhood group, prices, availability, guest reviews, and location attributes. Leveraging data science techniques, we performed exploratory data analysis, data cleaning, and data visualisation to gain valuable insights into the factors influencing bookings.

The libraries used in this project are:


1.   Pandas for data manipulation.
2.   NumPy for efficient numerical computation.
3.   Matplotlib and Seaborn for visualisation and for analysing patterns in the data.


## **Methodology:**

**Data Collection:**
The first step in the methodology was to collect a comprehensive dataset of Airbnb bookings. This dataset encompassed various cities and regions, providing a diverse range of properties and guest experiences. Data was obtained from Airbnb's official API and other publicly available sources, ensuring the reliability and authenticity of the information.

**Data Preprocessing:**
To ensure data accuracy and consistency, thorough preprocessing was conducted. This involved cleaning the dataset by removing any duplicate entries, handling missing values, and standardizing variables. Data normalization was also performed to bring all variables to a similar scale for more accurate analysis. Categorical variables were encoded using appropriate techniques such as one-hot encoding or label encoding.

**Exploratory Data Analysis (EDA):**
EDA was conducted to gain insights into the dataset and understand its characteristics. Various visualization techniques, such as histograms, scatter plots, and box plots, were employed to analyze the distribution, relationships, and patterns within the data. This step helped identify outliers, uncover trends, and explore potential correlations between variables.

By combining these steps, valuable insights were derived, allowing hosts to optimize their listings, enhance guest satisfaction, and mitigate the environmental impact of their operations. The methodology laid the foundation for informed decision-making, contributing to the growth and sustainability of the Airbnb platform and its stakeholders.






# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**



1.   What can we learn from our Airbnb dataset?
2.   What manipulations can we perform on the data?
3. What are the maximum and minimum room prices for each room type and area?
4. Which location do guests like the most?
5. Does the price of a room affect the reviews given by guests?
6. Which host is the busiest?
7. Which host charges the highest price?
8. What percentage of each room type is available in the different neighbourhood groups?
9. Which location do customers review the most?
10. Which are the top hosts on the basis of monthly reviews?
11. In which neighbourhood group did the maximum number of bookings happen?
12. In which neighbourhood do customers stay for the maximum number of nights?




#### **Define Your Business Objective?**

Answer:- Our business objective is to extract useful insights from the given dataset and, on the basis of those insights, identify the factors that attract customers. We want to find out which locations are popular among customers and why. We will also try to find out how our hosts contribute to building the business. In this analysis we focus on customer behaviour, hosts, locality, and the type of room that customers prefer most.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
airBnb = pd.read_csv("/content/drive/MyDrive/AlmaBetter Assignment files/Airbnb NYC 2019.csv")


### Dataset First View

In [None]:
# Dataset First Look
airBnb

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
airBnb.shape

Our Dataset has 48895 rows and 16 columns.

### Dataset Information

In [None]:
# Dataset Info
airBnb.info()

From **.info()** we learned that four columns in our dataset contain null values: name, host_name, last_review, and reviews_per_month.

In [None]:
airBnb.describe()

**.describe()** gives us a glimpse of our dataset by summarising all of its numerical columns.
The column we are most concerned about at this point is price. From .describe() we learned that the minimum price in our dataset is 0, which is unrealistic.

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
airBnb.duplicated().sum()

**.duplicated()** flags every row that repeats an earlier row, and **.sum()** counts those flags.
As we can see, there are no duplicate rows in our dataset, so nothing needs to be dropped.
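Counting duplicates and removing them are two separate steps. The following is a minimal sketch on a small toy DataFrame (the values are made up for illustration and are not taken from the Airbnb data):

```python
import pandas as pd

# Toy data (hypothetical values): the second and third rows are identical.
toy = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "price": [100, 150, 150, 80],
})

# duplicated() marks every row that repeats an earlier row; sum() counts them.
duplicate_count = int(toy.duplicated().sum())

# drop_duplicates() returns a new frame; the original stays unchanged
# unless the result is assigned back.
deduplicated = toy.drop_duplicates()

print(duplicate_count)
print(len(toy), len(deduplicated))
```

On the real dataset the same pattern would be `airBnb.duplicated().sum()` followed, only if needed, by `airBnb = airBnb.drop_duplicates()`.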

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
airBnb.isnull()

**.isnull()** gives us a boolean output.
It returns True wherever a value is missing and False otherwise.

In [None]:
# Null-value count per column, largest first
airBnb.isnull().sum().sort_values(ascending = False)

Along with isnull() we have used .sum() to get the number of null values in every column, and we have sorted the counts in descending order to get a clear view of where the null values are.

So, we can see that we have 4 columns that have null values in our dataset.
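The null counts above can also be expressed as a share of all rows, which makes it easier to judge how serious each gap is. This is a minimal sketch on toy data; the column values and the names `missing_summary` and `columns_with_nulls` are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) with a few missing entries.
toy = pd.DataFrame({
    "name": ["Loft", None, "Studio", "Room"],
    "reviews_per_month": [0.5, np.nan, np.nan, 1.2],
    "price": [120, 90, 75, 60],
})

# Count of nulls per column, and the same figure as a percentage of rows.
null_count = toy.isnull().sum()
null_percent = (toy.isnull().mean() * 100).round(1)

missing_summary = (
    pd.DataFrame({"null_count": null_count, "null_percent": null_percent})
    .sort_values("null_count", ascending=False)
)

# Keep only the columns that actually contain nulls.
columns_with_nulls = missing_summary[missing_summary["null_count"] > 0].index.tolist()

print(missing_summary)
print(columns_with_nulls)
```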

### What did you know about your dataset?

So far, we have learned that our dataset contains 48895 rows and 16 columns. There are no duplicate values in our dataset. Four columns have null values; the columns and their null counts are:


last_review = 10052,
reviews_per_month = 10052,
host_name = 21,
name = 16

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
airBnb.columns

**.columns** gives the names of the columns present in our dataset.

In [None]:
# Dataset Describe
airBnb.describe()

### Variables Description

1. There are 16 variables in the Airbnb dataset. These are presented as columns in the dataset.
2. The names are: id, name, host_id, host_name, neighbourhood_group, neighbourhood, latitude, longitude, room_type, price, minimum_nights, number_of_reviews, last_review, reviews_per_month, calculated_host_listings_count, availability_365
3. This dataset contains heterogeneous datatypes.
4. There are 4 variables (name, host_name, last_review, reviews_per_month) containing null or missing values.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
print("Number of unique values in each column:")  # the number of unique values for each column
airBnb.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

## **(i) Room Type and their price according to area.**

(a) Maximum price according to area.

In [None]:
# Write your code to make your dataset analysis ready.
room_price_area_wise = airBnb.groupby(["neighbourhood_group","room_type"])["price"].max().reset_index()
room_price_area_wise.sort_values(by = "price",ascending = False).head(10)

(b) Minimum price according to area.

In [None]:
room_price_area_wise = airBnb.groupby(["neighbourhood_group","room_type"])["price"].min().reset_index()
room_price_area_wise.sort_values(by = "price",ascending = False).tail()

So, here we can see that the prices for the above rooms are 0, which is practically not possible.

In [None]:
airBnb = airBnb[airBnb["price"] > 0]
airBnb

In [None]:
room_price_area_wise = airBnb.groupby(["neighbourhood_group","room_type"])["price"].min().reset_index()
room_price_area_wise.sort_values(by = "price",ascending = False).tail()

We have now filtered the dataset so that it keeps only rows whose price is greater than 0.
As a result, the minimum price is now 10.
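When filtering out invalid prices it can be useful to record how many rows are removed. A minimal sketch on toy data follows; the values and the names `valid_price`, `removed_rows`, and `cleaned` are assumptions made for illustration:

```python
import pandas as pd

# Toy data (hypothetical values): two listings have an invalid price of 0.
toy = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Brooklyn", "Bronx", "Manhattan"],
    "price": [0, 60, 0, 150],
})

# Boolean mask that is True for rows with a usable price.
valid_price = toy["price"] > 0

# Number of rows the filter will discard.
removed_rows = int((~valid_price).sum())

# .copy() gives an independent frame, so later modifications do not
# trigger chained-assignment warnings.
cleaned = toy[valid_price].copy()

print(removed_rows, cleaned["price"].min())
```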

In [None]:
host_minNights = airBnb.groupby(["host_id","host_name"])["minimum_nights"].sum().reset_index()
host_minNights

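Here we have grouped the data by host ID and host name and summed minimum_nights, which gives the total minimum-night requirement across each host's listings. Note that minimum_nights is the minimum stay a listing requires, so this is a proxy for nights stayed rather than a record of actual stays.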

### What all manipulations have you done and insights you found?



1.   Analyzed the data and found the maximum price for each room type in each neighbourhood group.
2.   Found the minimum price for each room type in each neighbourhood group.
3. We found that the minimum price for several room types in the dataset is 0. So, we filtered the dataset and kept only the rows whose price is greater than 0.
4. After filtering, the minimum price is 10 and the maximum price is 10000.
5. We also summed minimum_nights for each host, along with the host ID, as a proxy for the number of nights booked with that host.





## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
print(airBnb["neighbourhood_group"].unique())

In [None]:
neighbourhood_group = ['Brooklyn' ,'Manhattan', 'Queens','Manhattan','Brooklyn','Staten Island','Queens','Bronx','Queens','Bronx']
room_type = ['Entire home/apt','Entire home/apt','Private room','Private room','Private room','Entire home/apt','Entire home/apt','Private room','Shared room','Entire home/apt']

room_dict = {}

for i in room_type:
  room_dict[i] = room_dict.get(i,0) + 1

plt.bar(room_dict.keys(),room_dict.values(),color = 'green',edgecolor = 'red')
plt.title('Room Types')
plt.xlabel('Room Type')
plt.ylabel('Count')
plt.show()


##### 1. Why did you pick the specific chart?

We picked this bar chart to compare the three room types. We chose a bar chart because it makes the comparison easy to read.

##### 2. What is/are the insight(s) found from the chart?

Answer:- The bar chart above is built from the ten highest maximum prices per neighbourhood group and room type. Among these top 10 entries, entire home/apartment appears most often, followed by private room, and shared room appears least. This suggests that people who spend the most prefer an entire home/apartment, then a private room, and prefer a shared room least.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer:- Yes, these insights help create a positive business impact because we can clearly see that people prefer to spend more on an entire home/apartment and are least likely to spend on a shared room.

There are no negative insights.
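The room-type counts for Chart 1 were typed in by hand from the top-10 table. They could instead be derived directly from the grouped data, which avoids copy errors if the data changes. The sketch below uses toy data and a top-3 cut purely for illustration; the names `max_price`, `top_n`, and `room_counts` are assumptions:

```python
import pandas as pd

# Toy data (hypothetical values), one row per listing.
toy = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Brooklyn", "Manhattan",
                            "Manhattan", "Queens", "Queens"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt",
                  "Shared room", "Private room", "Shared room"],
    "price": [500, 200, 900, 90, 300, 50],
})

# Maximum price for every (neighbourhood group, room type) pair.
max_price = (
    toy.groupby(["neighbourhood_group", "room_type"])["price"]
    .max()
    .reset_index()
)

# Keep the highest-priced pairs (top 3 here; the notebook uses the top 10).
top_n = max_price.sort_values(by="price", ascending=False).head(3)

# Count how often each room type appears among those pairs.
room_counts = top_n["room_type"].value_counts()

print(room_counts)
```

The resulting Series can be passed to `plt.bar(room_counts.index, room_counts.values)` in place of the hand-built dictionary.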

#### Chart - 2

In this chart we focus on the reviews given by guests in each location.

In [None]:
reviews_by_area = airBnb.groupby(['neighbourhood_group'])['number_of_reviews'].max().reset_index()
reviews_by_area

In [None]:
area = reviews_by_area["neighbourhood_group"]
review = reviews_by_area["number_of_reviews"]

fig = plt.figure(figsize=(10,5))
plt.bar(area,review,color = "red",width = 0.5)
plt.xlabel("Area")
plt.ylabel("Review")
plt.title("Number of Reviews According to Area")
plt.show()

##### 1. Why did you pick the specific chart?

Answer:- We chose this bar chart because it clearly compares which location receives the largest number of reviews.

##### 2. What is/are the insight(s) found from the chart?

Answer:- The chart shows that Queens has received the most reviews and the Bronx the fewest. Because the code takes the maximum per group, each bar is the most-reviewed listing in that area. From these reviews we can see which location is doing well, and we can study what works in Queens to improve reviews in the Bronx.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer:- Yes, with the help of reviews the company can find out what people like in the locations that receive more reviews and, on that basis, make better decisions for the locations that receive fewer reviews.

There is no negative impact from these observations.
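The ranking of areas depends on the aggregation used: `max()` picks the single most-reviewed listing in each group, while `sum()` adds up the reviews of all listings. A minimal sketch with toy numbers (hypothetical, chosen so the two rankings differ):

```python
import pandas as pd

# Toy data (hypothetical values): Queens has one heavily reviewed listing,
# Manhattan has several moderately reviewed ones.
toy = pd.DataFrame({
    "neighbourhood_group": ["Queens", "Queens", "Manhattan",
                            "Manhattan", "Manhattan"],
    "number_of_reviews": [600, 5, 300, 250, 200],
})

# Compute both aggregations side by side.
reviews = toy.groupby("neighbourhood_group")["number_of_reviews"].agg(["max", "sum"])

top_by_max = reviews["max"].idxmax()  # area with the most-reviewed listing
top_by_sum = reviews["sum"].idxmax()  # area with the most reviews overall

print(reviews)
print(top_by_max, top_by_sum)
```

Checking both columns on the real data would show whether the conclusion about each area holds for total reviews as well as for the top listing.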

#### Chart - 3

**In this chart we focus on how the number of reviews varies with the price of the room.**

In [None]:
# Chart - 3 visualization code
airBnb = airBnb[airBnb["price"] > 0]
price_review = airBnb.groupby(["price"])["number_of_reviews"].max().reset_index()
price_review


In [None]:
price_list = price_review["price"]
review = price_review["number_of_reviews"]


fig = plt.figure(figsize=(10,5))
plt.scatter(price_list,review)
plt.xlabel("Price")
plt.ylabel("Reviews")
plt.title("Price V/s Reviews")
plt.show()

##### 1. Why did you pick the specific chart?

Answer:-
We picked the scatter plot because it clearly shows where the data points are dense and where they are sparse.

##### 2. What is/are the insight(s) found from the chart?

Answer:- The chart shows that reviews are concentrated at lower prices, which suggests that people prefer to stay at places where prices are low.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer:- Yes, these insights help us make a positive business impact. They help us understand our customers and which prices to offer so that they are comfortable booking rooms and leaving good reviews.
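The price-versus-reviews relationship can also be summarised numerically by grouping prices into bands. This is a minimal sketch on toy data; the band edges, labels, and the name `median_reviews` are illustrative assumptions rather than values from the notebook:

```python
import pandas as pd

# Toy data (hypothetical values): cheaper listings have more reviews.
toy = pd.DataFrame({
    "price": [40, 80, 95, 150, 180, 400, 900],
    "number_of_reviews": [120, 90, 60, 45, 30, 8, 2],
})

# Assumed price bands; pd.cut uses right-inclusive intervals by default,
# so these are (0, 100], (100, 200] and (200, 1000].
bins = [0, 100, 200, 1000]
labels = ["0-100", "101-200", "201-1000"]
toy["price_band"] = pd.cut(toy["price"], bins=bins, labels=labels)

# Median review count per band; observed=True keeps only bands that occur.
median_reviews = (
    toy.groupby("price_band", observed=True)["number_of_reviews"].median()
)

print(median_reviews)
```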

#### Chart - 4
In this chart we are checking for the busiest host.

In [None]:
# Chart - 4 visualization code
busy_hosts = airBnb.groupby(['host_id','host_name','room_type'])["number_of_reviews"].max().reset_index()
busy_hosts = busy_hosts.sort_values(by = "number_of_reviews",ascending = False).head(10)
busy_hosts

In [None]:
name_host = busy_hosts["host_name"]
review = busy_hosts["number_of_reviews"]
fig = plt.figure(figsize = (10,5))

plt.bar(name_host,review,color = "blue",width = 0.5)
plt.xlabel("Host Name")
plt.ylabel("Reviews")
plt.title("Busiest Host V/s Reviews")
plt.show()

##### 1. Why did you pick the specific chart?

Answer:- We picked the bar chart to compare hosts on the basis of the maximum number of reviews they received. A bar chart makes this comparison clear.

##### 2. What is/are the insight(s) found from the chart?

Answer:- We found that the host named Dona has received the maximum number of reviews, for a private room. We also found that most reviews among the top hosts are given for private rooms.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer:- Yes, the insights will have a positive business impact because the analysis above shows that host Dona receives the most reviews and that private rooms receive the largest number of reviews.

There is no negative impact.

#### Chart - 5

Hosts charging higher prices.

In [None]:
# Chart - 5 visualization code

highest_price = airBnb.groupby(["host_id","host_name","room_type","neighbourhood_group"])["price"].max().reset_index()
highest_price = highest_price.sort_values(by = "price",ascending = False).head(10)
highest_price

In [None]:
name_of_host = highest_price["host_name"]
price_charged = highest_price["price"]

fig = plt.figure(figsize = (10,5))
plt.bar(name_of_host,price_charged,color = "purple",width = 0.5)
plt.xlabel("Name of Host")
plt.ylabel("Price")
plt.title("Host V/s Price Charged")
plt.show()

##### 1. Why did you pick the specific chart?

Answer:- We picked the bar chart to compare different hosts on the basis of the maximum prices that they charge.
A bar chart gives a clear comparison between hosts.

##### 2. What is/are the insight(s) found from the chart?

Answer:- Erin, Jelena and Kathrine are the top three hosts; each charges the highest price, which is 10000 dollars.
The graph above shows the top 10 hosts who charge the highest prices.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer:- These insights help us understand the behaviour of hosts. With this we can recommend these hosts to customers who can afford their prices.

#### Chart - 6
**In this chart we find out which room types each neighbourhood group has, and in what percentage.**

In [None]:
# Chart - 6 visualization code

plt.rcParams["figure.figsize"] = (8,5)
ax = sns.countplot(y = "room_type" , hue = "neighbourhood_group", data = airBnb, palette = "bright")

total = len(airBnb["room_type"])
for p in ax.patches:
  percentage = '{:.1f}%'.format(100 * p.get_width()/total)
  x = p.get_x() + p.get_width() + 0.02
  y = p.get_y() + p.get_height()/2
  ax.annotate(percentage,(x,y))

plt.title("Count of each room type in NYC")
plt.xlabel("Number of listings")
plt.xticks(rotation = 90)
plt.ylabel("Room type")

plt.show()

##### 1. Why did you pick the specific chart?

Answer:- We picked this count plot to show a clear percentage comparison of the room types across the neighbourhood groups.

##### 2. What is/are the insight(s) found from the chart?

Answer:- The plot shows that in Brooklyn most properties are listed as private rooms (20.7% of all listings).
In Manhattan, most properties are listed as entire home/apartment.
There are very few shared rooms in any neighbourhood group.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer:- This observation will definitely create a positive impact because the company can easily see which types of property it has in each neighbourhood group and recommend those properties to its customers accordingly. Also, if the company wants to expand its business, it will know which room types are limited in a particular location, so that it can add that type of property.
There is no negative impact because this observation gives a clear overview of which room types the business has in each location.
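The percentages annotated on the count plot divide each bar by the total number of listings, so they are shares of all listings. The share of each room type within a neighbourhood group is a different figure, and both can be computed with `pd.crosstab`. A minimal sketch on toy data (hypothetical values):

```python
import pandas as pd

# Toy data (hypothetical values): four listings in each of two groups.
toy = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn"] * 4 + ["Manhattan"] * 4,
    "room_type": ["Private room", "Private room", "Private room",
                  "Entire home/apt", "Entire home/apt", "Entire home/apt",
                  "Private room", "Shared room"],
})

counts_args = (toy["neighbourhood_group"], toy["room_type"])

# Each row sums to 100: room-type mix inside one neighbourhood group.
share_within_group = (pd.crosstab(*counts_args, normalize="index") * 100).round(1)

# The whole table sums to 100: share of all listings, as in the count plot.
share_of_all = (pd.crosstab(*counts_args, normalize="all") * 100).round(1)

print(share_within_group)
print(share_of_all)
```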

#### Chart - 7
In this chart we are finding the best location on the basis of monthly reviews.

In [None]:
# Chart - 7 visualization code
# Find the chart of Location based on Review Score
fig = plt.figure(figsize=(12,4))
top_5_location = airBnb.groupby(["neighbourhood_group"])["reviews_per_month"].max().reset_index()
top = top_5_location.sort_values(by="reviews_per_month",ascending=False).head(5)
sns.barplot(x="neighbourhood_group", y="reviews_per_month", data= top)
plt.title('Location and Review Score')
plt.ylabel('reviews_per_month')
plt.xlabel('Neighbourhood Group')
plt.show()

##### 1. Why did you pick the specific chart?

Answer :- We chose this bar chart because we need to show the monthly reviews for the different neighbourhood groups. With a bar chart, all the neighbourhood groups are easy to distinguish.

##### 2. What is/are the insight(s) found from the chart?

Answer:- With this chart, we can state that the most preferred locations by reviews per month are: 1. Manhattan 2. Queens 3. Brooklyn.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer:- The company gets an overview of customers' location preferences so that it can improve its service.

#### Chart - 8
**In this chart we are finding the  top hosts on the basis of monthly review.**

In [None]:
# Chart - 8 visualization code
# Based on the review score find the top 5 Host
fig = plt.figure(figsize=(12,4))
top_5_host = airBnb.groupby(["host_name","host_id"])["reviews_per_month"].max().reset_index()
top = top_5_host.sort_values(by="reviews_per_month",ascending=False).head(5)
sns.barplot(x="host_name", y="reviews_per_month", data= top)
plt.title('host_name and Review Score')
plt.ylabel('reviews_per_month')
plt.xlabel('host_name')
plt.show()

##### 1. Why did you pick the specific chart?

Answer:- We plot the top 5 hosts by reviews per month. Seeing which hosts are reviewed most often increases a tourist's confidence before booking.

##### 2. What is/are the insight(s) found from the chart?

Answer:- We can state that the top 5 hosts are Row NYC, Louann, Nalicia, Danielle and Brent, based on customers' reviews per month.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer:- Yes, it will create a positive impact. It also lets customers know which hosts receive the most reviews per month, so that they can choose their stay accordingly.

#### Chart - 9
**This chart is a univariate analysis. It counts the listings in each neighbourhood group, which we use as an overview of where bookings happen.**

In [None]:
# Chart - 9 visualization code
# Number of listings per neighbourhood_group
df = airBnb["neighbourhood_group"].value_counts()
df.plot(kind='pie',figsize=(8, 8),colors = ['orange', 'pink', 'crimson', 'lightgreen', 'black'])
plt.show()

##### 1. Why did you pick the specific chart?

Answer:- We present this pie chart to give an overall idea of the most preferred places (neighbourhood_group).

##### 2. What is/are the insight(s) found from the chart?

Answer:- We can state that the maximum number of bookings are made for Manhattan, Brooklyn and Queens. This is unsurprising, since both Manhattan and Brooklyn are famous for tourism.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer:- The company can get an overall idea of which neighbourhood group customers prefer most. As we can see, the largest number of bookings happened in Manhattan and Brooklyn. The company can improve its services or expand its business in these locations.


#### Chart - 10
**This chart shows the neighbourhoods with the highest minimum_nights values, which we use as a proxy for the maximum nights stayed.**

In [None]:
# Chart - 10 visualization code
# Top 5 localities (neighbourhood) by the largest minimum_nights value
top_5_location = airBnb.groupby(["neighbourhood"])["minimum_nights"].max().reset_index()
top = top_5_location.sort_values(by="minimum_nights",ascending=False).head(5)
fig = plt.figure(figsize=(12,4))
sns.barplot(x="neighbourhood", y="minimum_nights", data= top)
plt.title('Neighbourhood and minimum_nights')
plt.ylabel('Minimum_nights')
plt.xlabel('Neighbourhood')
plt.show()

##### 1. Why did you pick the specific chart?

Answer:- We present the top 5 localities (neighbourhoods) by maximum nights stayed, measured through minimum_nights, to get an overview of customers' preferences among the neighbourhoods within each neighbourhood group.

##### 2. What is/are the insight(s) found from the chart?

Answer:- We can state that the top 5 neighbourhoods are:

1. Greenwich Village 2. Battery Park City 3. Williamsburg 4. Harlem 5. Crown Heights

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer:- The company can get an idea of the best localities or neighbourhoods according to customers' choices.

#### Chart - 11

**This chart shows the top 5 neighbourhoods on the basis of monthly reviews.**

In [None]:
# Chart - 11 visualization code
# Top 5 of locality (neighbourhood) listed based upon review
top_5_loc_review = airBnb.groupby(["neighbourhood"])["reviews_per_month"].max().reset_index()
top = top_5_loc_review.sort_values(by="reviews_per_month",ascending=False).head(5)
fig = plt.figure(figsize=(12,4))
sns.barplot(x="neighbourhood", y="reviews_per_month", data= top)
plt.title('Neighbourhood and reviews_per_month')
plt.ylabel('Reviews_per_month')
plt.xlabel('Neighbourhood')
plt.show()

##### 1. Why did you pick the specific chart?

Answer :- A bar chart presents this observation clearly. It shows the top 5 localities (neighbourhoods) ranked by customers' reviews per month.

##### 2. What is/are the insight(s) found from the chart?

Answer:- We can state that the best locality or neighbourhood, based on customers' reviews per month, is the Theater District.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer:- The company gets an idea of the good locations according to customers' reviews.

#### Chart - 12
**This chart shows the average minimum nights per location.**

In [None]:
# Chart - 12 visualization code
# Average minimum nights per location (sns.barplot plots the mean by default)

sns.barplot(x="neighbourhood_group",y="minimum_nights",data=airBnb)
plt.show()

##### 1. Why did you pick the specific chart?

Answer:- We chose this bar diagram to show customers' location preferences by plotting the average minimum nights for each location (neighbourhood group).

##### 2. What is/are the insight(s) found from the chart?

Answer:- With this chart, we can rank the five locations by the average minimum nights of their listings.

We can state that the most preferred locations for night stays are: 1.Manhattan; 2.Brooklyn; 3.Queens; 4.Staten Island; 5.Bronx

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer:- With this chart, the company can identify customers' preferred locations.
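By default `sns.barplot` draws the mean of the y variable for each category, not the total. The difference matters because a group with many listings can have a large total but a small average. A minimal sketch on toy data (hypothetical values) compares the two:

```python
import pandas as pd

# Toy data (hypothetical values): Bronx has more listings with shorter
# minimum stays, Manhattan has fewer listings with longer ones.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Bronx",
                            "Bronx", "Bronx", "Bronx"],
    "minimum_nights": [10, 6, 3, 5, 4, 8],
})

# Mean (what sns.barplot shows by default) next to the total.
nights = toy.groupby("neighbourhood_group")["minimum_nights"].agg(["mean", "sum"])

highest_mean = nights["mean"].idxmax()
highest_total = nights["sum"].idxmax()

print(nights)
print(highest_mean, highest_total)
```

To plot totals instead of averages, `sns.barplot` accepts an `estimator` argument, for example `estimator=sum`.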

#### Chart - 13
**This chart shows the average minimum nights per room type.**

In [None]:
# Chart - 13 visualization code
# Average minimum nights per room type (sns.barplot plots the mean by default)
sns.barplot(y="minimum_nights",x="room_type",data=airBnb)
plt.show()

##### 1. Why did you pick the specific chart?

Answer :- We chose this bar diagram to show customers' room-type preferences by plotting the average minimum nights for each room type.

##### 2. What is/are the insight(s) found from the chart?

Answer :- With this chart, we can state that the most preferred room types by minimum nights are: 1.Entire home/apt; 2.Shared room; 3.Private room.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer :- With this chart, the company can identify which room types customers prefer, based on the minimum night stay (number of nights).

#### Chart - 14 - Correlation Heatmap

In [None]:
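# Correlation Heatmap visualization code
# numeric_only=True skips text columns such as name and room_type (needed on pandas 2.x)
corr = airBnb.corr(method = "kendall", numeric_only = True)
fig = plt.figure(figsize = (12,6))
sns.heatmap(corr,annot = True)
plt.show()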

##### 1. Why did you pick the specific chart?

Answer :- We used the correlation heatmap to see how each variable is related to the others.

##### 2. What is/are the insight(s) found from the chart?

Answer:- The heatmap shows that price is positively correlated with minimum_nights. calculated_host_listings_count has a positive correlation with availability_365.
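A correlation matrix is only defined for numeric columns, so text columns have to be left out before calling `corr`. The sketch below selects numeric columns explicitly on toy data (hypothetical values). It uses Spearman rank correlation as a stand-in, because in pandas the Kendall method used in the notebook relies on SciPy being installed:

```python
import pandas as pd

# Toy data (hypothetical values) with one text column.
toy = pd.DataFrame({
    "room_type": ["Private room", "Private room", "Entire home/apt",
                  "Entire home/apt", "Entire home/apt"],
    "price": [50, 80, 120, 200, 300],
    "minimum_nights": [1, 2, 3, 5, 30],
    "number_of_reviews": [90, 60, 40, 10, 2],
})

# Keep only numeric columns; room_type is dropped here.
numeric_columns = toy.select_dtypes(include="number")

# Rank-based correlation between every pair of numeric columns.
corr = numeric_columns.corr(method="spearman")

print(corr)
```

The resulting matrix can be passed to `sns.heatmap(corr, annot=True)` in the same way as in the notebook cell.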

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(airBnb,hue = "room_type")
plt.show()

In [None]:
sns.pairplot(airBnb,hue = "room_type",x_vars = ["price","neighbourhood_group","number_of_reviews"],y_vars = ["number_of_reviews","reviews_per_month"])
plt.show()

##### 1. Why did you pick the specific chart?

Answer:- The pairplot function in seaborn allows us to create a matrix of scatter plots, where each scatter plot compares a pair of variables from a given dataset. The x_vars and y_vars parameters in this context represent lists of variable names that we want to use for the rows and columns of the scatter plot matrix.

##### 2. What is/are the insight(s) found from the chart?

Answer:- We got a clear insight into the relationship between price and number of reviews. Comparing these two variables shows that the review count is higher when prices are lower.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Answer:- From the above analysis we learned that people are highly influenced by price and prefer to stay at lower-priced places. So, the client should focus on providing rooms priced below 2000.

We also learned that people prefer to stay in private rooms and least prefer shared rooms. So, the client should focus more on private rooms. People also showed interest in booking an entire home/apt, so there is scope for growing the business by partnering with hosts who are ready to offer an entire home/apt.

Manhattan is the location where people like to stay most, and in Manhattan people like to stay in an entire home/apt, so the client should focus on providing that type of property to people booking in Manhattan.
In Brooklyn people prefer private rooms. So, the client should focus on providing private rooms in Brooklyn.



# **Conclusion**

1. The highest property price is 10000. Three properties are offered at this price. In Brooklyn and Manhattan it is an Entire home/apt, and in Queens it is a Private room.
2. The top three hosts charging the highest price are Erin, Jelena and Kathrine.

3. When we analyzed the neighbourhood groups on the basis of reviews, we found that Queens has received the most reviews and the Bronx the fewest.

4. The price-based analysis tells us that people prefer to stay at locations where the price is below 2000.

5. We found that the top three busiest hosts are Dona, Jj and Maya, and they all offer a Private room.

6. We found that the most private rooms are in Brooklyn (20.7%). The most entire home/apt listings are in Manhattan (27%), and shared rooms are comparatively rare in every location.

7. Customers liked properties in Manhattan the most, as it has the highest reviews per month.

8. Among the neighbourhoods, people stayed the maximum nights in Greenwich Village, followed by Battery Park City and Williamsburg.

9. Among the various neighbourhoods, the Theater District has received the most monthly reviews.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***