# **Project Name**    -  Airbnb Booking Analysis Exploratory Data Analysis




##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### Name - Rahul Rajput




# **Project Summary -**
Since 2008, guests and hosts have used Airbnb to expand travel possibilities and offer a more unique, personalised way of experiencing the world. Today, Airbnb is a one-of-a-kind service that is used and recognised around the world. Analysing the data from the millions of listings on Airbnb is crucial for the company. These listings generate a lot of data that can be analysed and used for security, business decisions, understanding the behaviour and performance of customers and providers (hosts) on the platform, guiding marketing initiatives, implementing innovative additional services and much more.

This dataset has around 49,000 observations and 16 columns, with a mix of categorical and numeric values. The goal is to explore and analyse the data to discover key insights.

# **Problem Statement**
**Let's explore and analyze the dataset to find some insights:**
1. What can we learn about different hosts and areas?

2. What can we learn from room types and their prices according to area?

3. What can we learn from the data? (e.g. locations, prices, reviews, etc.)

4. Which hosts are the busiest and what is the reason?

5. Which hosts are charging the highest prices?

6. Is there any traffic difference among different areas and what could be the reason for it?

7. What is the correlation between different variables?

8. What is the room count in overall NYC according to the listing of room types?

**Github Link :**

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

### Mounting drive


In [None]:
from google.colab import drive
drive.mount('/content/drive')

### Reading Dataset

In [None]:
airbnb = pd.read_csv('/Airbnb NYC 2019 (1).csv')
airbnb

### Dataset Description and Information

In [None]:
airbnb.describe()

In [None]:
airbnb.info()

In [None]:
airbnb.isnull().sum()

In [None]:
airbnb.columns

### Deleting the unnecessary columns

In [None]:
airbnb.drop(['latitude','longitude','last_review','reviews_per_month'],axis=1,inplace=True)

In [None]:
airbnb.head(10)

**1. What can we learn about different hosts and areas?**

In [None]:
host_areas = airbnb.groupby(['host_name','neighbourhood_group'])['calculated_host_listings_count'].max().reset_index()
host_areas.sort_values(by='calculated_host_listings_count',ascending=False).head(5)

We find that the host **Sonder (NYC)** has the highest number of listings, located in **Manhattan**, followed by Blueground.
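The `calculated_host_listings_count` column appears to hold one total per host, so taking its maximum per area repeats that total in every area where the host is present. As a cross-check, the sketch below counts listing rows directly per host and area. The toy DataFrame and the host names in it are made up for illustration; only the column names come from the dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the real listings table.
toy = pd.DataFrame({
    "host_name": ["HostA", "HostA", "HostA", "HostB", "HostB"],
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Queens", "Queens"],
})

# Count the rows (listings) each host has in each area.
listing_counts = (
    toy.groupby(["host_name", "neighbourhood_group"])
       .size()
       .reset_index(name="listings")
       .sort_values(by="listings", ascending=False)
)
print(listing_counts)
```

Comparing this table with the `max()` version would show whether a host's listings are really concentrated in one area.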

**2. What can we learn from room types and their prices according to area?**

In [None]:
room_price_area_wise = airbnb.groupby(['neighbourhood_group','room_type'])['price'].max().reset_index()
room_price_area_wise.sort_values(by='price',ascending=False).head(10)

Visualize how often each room type appears among these top 10 rows:


In [None]:
# Areas and room types of the top 10 rows above, entered by hand
neighbourhood_group = ['Brooklyn','Manhattan','Queens','Manhattan','Brooklyn','Staten Island','Queens','Bronx','Queens','Bronx']
room_type = ['Entire home/apt','Entire home/apt','Private room','Private room','Entire home/apt','Entire home/apt','Private room','Private room','Entire home/apt','Shared room']

# Count how often each room type appears in the top 10
room_dict = {}

for i in room_type:
  room_dict[i] = room_dict.get(i,0) + 1

# Convert the dict views to lists so plt.bar accepts them on any matplotlib version
plt.bar(list(room_dict.keys()),list(room_dict.values()),color = 'green', edgecolor='blue')
plt.title('Room Types')
plt.xlabel('Room Type')
plt.ylabel('Count')
plt.show()

We found that **Entire home/apt** is the room type that appears most often among the ten highest maximum prices, and that the highest prices for Entire home/apt are in **Brooklyn** and **Manhattan**.
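A maximum price reflects only the single most expensive listing in each group, so one outlier can dominate the ranking. The sketch below reports the median alongside the maximum for each area and room type. The prices are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy prices; one extreme value is included on purpose.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 3 + ["Brooklyn"] * 3,
    "room_type": ["Entire home/apt"] * 3 + ["Private room"] * 3,
    "price": [150, 200, 10000, 60, 70, 80],
})

# Median is robust to the outlier, max is not.
price_summary = (
    toy.groupby(["neighbourhood_group", "room_type"])["price"]
       .agg(["median", "max"])
       .reset_index()
)
print(price_summary)
```

When the two columns differ widely for a group, the median is probably the better description of a typical price there.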

**3. What can we learn from the data? (e.g. locations, prices, reviews, etc.)**

In [None]:
area_reviews = airbnb.groupby(['neighbourhood_group'])['number_of_reviews'].max().reset_index()
area_reviews

In [None]:
area = area_reviews['neighbourhood_group']
review = area_reviews['number_of_reviews']
fig = plt.figure(figsize=(10,5))

plt.bar(area,review,color = 'blue',width = 0.5)
plt.xlabel('Area')
plt.ylabel('Review')
plt.title("Number of Reviews in terms of area")
plt.show()

In [None]:
price_area = airbnb.groupby(['price'])['number_of_reviews'].max().reset_index()
price_area.head(10)

In [None]:
price_list = price_area['price']
review = price_area['number_of_reviews']
fig = plt.figure(figsize = (10,5))

plt.scatter(price_list, review)
plt.xlabel('Price')
plt.ylabel("Number of reviews")
plt.title('Number of Reviews Vs Price')
plt.show()

From the scatter plot above, lower-priced listings tend to have the highest review counts, which suggests that most guests prefer to stay at cheaper listings.
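One way to check the price and review relationship numerically is to group listings into price bands and compare the average review count per band. This is a minimal sketch on invented numbers. The band edges are arbitrary assumptions, not values from the notebook.

```python
import pandas as pd

# Hypothetical toy data; only the column names match the dataset.
toy = pd.DataFrame({
    "price": [40, 80, 120, 180, 300, 900],
    "number_of_reviews": [120, 90, 60, 40, 10, 2],
})

# Arbitrary illustrative price bands: (0, 100], (100, 200], (200, 1000].
bands = pd.cut(
    toy["price"],
    bins=[0, 100, 200, 1000],
    labels=["0-100", "101-200", "201-1000"],
)

# Average number of reviews within each price band.
reviews_by_band = toy.groupby(bands, observed=True)["number_of_reviews"].mean()
print(reviews_by_band)
```

A steady drop in the average from the cheapest band to the most expensive one would support the reading of the scatter plot.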

**4. Which hosts are the busiest and what is the reason?**

In [None]:
busy_hosts = airbnb.groupby(['host_id','host_name','room_type'])['number_of_reviews'].max().reset_index()
busy_hosts = busy_hosts.sort_values(by = 'number_of_reviews',ascending = False).head(10)
busy_hosts

In [None]:
name_hosts = busy_hosts['host_name']
review_got = busy_hosts['number_of_reviews']

fig = plt.figure(figsize = (10,5))

plt.bar(name_hosts,review_got, color= 'purple',width =0.5)
plt.xlabel('Name of the Host')
plt.ylabel('Review')
plt.title("Busiest Host in terms of reviews")
plt.show()

**The busiest hosts by review count are:**

1. Dona
2. Ji
3. Maya
4. Carol
5. Danielle

These hosts are the busiest because they list Entire home/apt and Private room types, which most guests prefer, and their listings have the highest review counts.
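Ranking by the maximum review count rewards a host's single most-reviewed listing. As an alternative view, the sketch below also sums reviews across all of a host's listings. The hosts and counts are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data: HostC has three moderately reviewed listings.
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 3, 3],
    "host_name": ["HostA", "HostA", "HostB", "HostC", "HostC", "HostC"],
    "number_of_reviews": [300, 10, 400, 150, 150, 150],
})

# Named aggregation: total across listings vs. the single best listing.
host_activity = (
    toy.groupby(["host_id", "host_name"])["number_of_reviews"]
       .agg(total_reviews="sum", top_listing_reviews="max")
       .reset_index()
       .sort_values(by="total_reviews", ascending=False)
)
print(host_activity)
```

In this toy example the two rankings disagree, which is why it can be worth looking at both before naming the busiest host.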

**5. Which Hosts are charging higher price ?**

In [None]:
Highest_price = airbnb.groupby(['host_id','host_name','room_type','neighbourhood_group'])['price'].max().reset_index()
Highest_price = Highest_price.sort_values(by = 'price',ascending = False).head(10)
Highest_price

In [None]:
name_of_host = Highest_price['host_name']
price_charge = Highest_price['price']

fig = plt.figure(figsize = (10,5))

plt.bar(name_of_host,price_charge, color = 'orange',width = 0.5)
plt.xlabel('Name of the Host')
plt.ylabel('Price')
plt.title("Hosts with maximum price charges")
plt.show()

**The 10 hosts charging the highest prices are:**

Jelena, Kathrine, Erin, Matt, Olson, Amy, Rum, Jessica, Sally, Jack

The maximum price is 10,000 USD.
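Because every row is already a single listing, the most expensive listings can also be picked out directly without grouping. The sketch below uses `DataFrame.nlargest` on invented data. The host names and prices are hypothetical.

```python
import pandas as pd

# Hypothetical toy listings, one row per listing.
toy = pd.DataFrame({
    "host_name": ["HostA", "HostB", "HostC", "HostD"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt", "Shared room"],
    "price": [10000, 250, 9999, 75],
})

# Keep the two most expensive listings, highest price first.
top_priced = toy.nlargest(2, "price")[["host_name", "room_type", "price"]]
print(top_priced)
```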

**6. Is there any traffic difference among different areas and what could be the reason for it?**

In [None]:
traffic_areas = airbnb.groupby(['neighbourhood_group','room_type'])['minimum_nights'].count().reset_index()
traffic_areas = traffic_areas.sort_values(by = 'minimum_nights',ascending = False).head(10)
traffic_areas

In [None]:
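# Label each bar with its area and room type so bars for the same room type do not overlap
areas_Traffic = traffic_areas['neighbourhood_group'] + ' / ' + traffic_areas['room_type']
room_stayed = traffic_areas['minimum_nights']  # count of listings in each group

fig = plt.figure(figsize = (10,5))
plt.bar(areas_Traffic,room_stayed, color = 'Blue', width = 0.5)

plt.xlabel('Area / Room Type')
plt.ylabel('Number of Listings')
plt.title("Traffic Areas based on Listing Counts")
plt.xticks(rotation=45, ha='right')
plt.show()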

From this visualization, we found that most people are likely to stay in Entire home/apt and Private room listings, which are concentrated in Manhattan, Brooklyn and Queens, and that visitors prefer to stay in rooms with a lower listing price.
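Counting rows per group measures how many listings exist rather than how many nights were booked, so it is best read as a supply-side view of traffic. The sketch below arranges those counts as an area-by-room-type table, which also plots cleanly as grouped bars. The rows are invented for illustration.

```python
import pandas as pd

# Hypothetical toy listings.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn", "Queens"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Private room", "Private room", "Private room"],
})

# Rows are areas, columns are room types, cells are listing counts.
listing_table = (
    toy.groupby(["neighbourhood_group", "room_type"])
       .size()
       .unstack(fill_value=0)
)
print(listing_table)
# listing_table.plot(kind="bar") would draw one bar group per area (needs matplotlib).
```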

In [None]:
airbnb

**7. What is the correlation between different variables ?**

In [None]:
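# Restrict to numeric columns; text columns such as host_name cannot be correlated
corr = airbnb.select_dtypes(include='number').corr(method = 'kendall')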
fig = plt.figure(figsize = (12,6))
sns.heatmap(corr, annot = True)
airbnb.columns

The heatmap shows the Kendall rank correlation between each pair of numeric variables.
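To show how a rank correlation matrix is built from numeric columns only, here is a minimal sketch on invented data. It uses Spearman rank correlation in place of the notebook's Kendall to keep the sketch light. The values are hypothetical and chosen to be perfectly monotonic.

```python
import pandas as pd

# Hypothetical toy data with one text column and three numeric columns.
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "price": [50, 80, 120, 200, 400],
    "minimum_nights": [1, 2, 3, 4, 5],
    "number_of_reviews": [90, 70, 40, 20, 5],
})

# Drop non-numeric columns before computing correlations.
numeric = toy.select_dtypes(include="number")

# Spearman rank correlation (swapped in for Kendall in this sketch).
toy_corr = numeric.corr(method="spearman")
print(toy_corr)
```

Values near 1 or -1 indicate a strong monotonic relationship, while values near 0 indicate little or none.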

**8. What is the room count in overall NYC according to the listing of room types ?**

In [None]:
plt.rcParams['figure.figsize'] = (8,5)
ax = sns.countplot(y='room_type', hue= 'neighbourhood_group',data = airbnb, palette = 'bright')

total = len(airbnb['room_type'])
for p in ax.patches:
      percentage = '{:.1f}%'.format(100 * p.get_width() / total)
      x = p.get_x() + p.get_width() +0.02
      y = p.get_y() + p.get_height()/2
      ax.annotate(percentage, (x,y))

plt.title('Count of each room types in NYC')
# With y='room_type' the bars are horizontal: counts run along x, room types along y
plt.xlabel('Room Counts')
plt.xticks(rotation=90)
plt.ylabel('Room Type')

plt.show()

Manhattan has the most Entire home/apt listings, around 27% of all listed properties, followed by Brooklyn with around 19.6%.

Private rooms are most common in Brooklyn, at 20.7% of all listed properties, followed by Manhattan with 16.3% and Queens with 6.9%.

We can infer that Brooklyn, Queens and the Bronx have more Private room listings than Entire home/apt listings, while Manhattan, which has the highest number of listings in NYC, has more Entire home/apt listings.
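The percentages annotated on the chart can also be produced as a table. The sketch below uses `pd.crosstab` with `normalize="all"` so that each cell is a share of all listings. The rows are invented for illustration and do not reproduce the figures above.

```python
import pandas as pd

# Hypothetical toy listings: 10 rows in total.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 5 + ["Brooklyn"] * 4 + ["Queens"] * 1,
    "room_type": (["Entire home/apt"] * 3 + ["Private room"] * 2
                  + ["Entire home/apt"] * 1 + ["Private room"] * 3
                  + ["Private room"] * 1),
})

# Each cell is that combination's share of all listings, in percent.
share = pd.crosstab(toy["room_type"], toy["neighbourhood_group"], normalize="all") * 100
share = share.round(1)
print(share)
```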


**CONCLUSION :**

1. The host **Sonder (NYC)** has the highest number of listings, located in **Manhattan**, followed by Blueground.

2. Entire home/apt is the room type that appears most often among the ten highest maximum prices, and the highest prices for Entire home/apt are in Brooklyn and Manhattan.

3. Lower-priced listings tend to have the highest review counts, which suggests that most guests prefer to stay at cheaper listings.

4. The busiest hosts by review count are Dona, Ji, Maya, Carol and Danielle.

5. The 10 hosts charging the highest prices are Jelena, Kathrine, Erin, Matt, Olson, Amy, Rum, Jessica, Sally and Jack. The maximum price is 10,000 USD.

6. Most people are likely to stay in Entire home/apt and Private room listings, which are concentrated in Manhattan, Brooklyn and Queens, and visitors prefer to stay in rooms with a lower listing price.

7. The heatmap shows the Kendall rank correlation between each pair of numeric variables.

8. Manhattan has the most Entire home/apt listings, around 27% of all listed properties, followed by Brooklyn with around 19.6%. Private rooms are most common in Brooklyn, at 20.7% of all listed properties, followed by Manhattan with 16.3% and Queens with 6.9%. We can infer that Brooklyn, Queens and the Bronx have more Private room listings than Entire home/apt listings, while Manhattan, which has the highest number of listings in NYC, has more Entire home/apt listings.


***Thank You...!***