# **Project Name**    - AirBnb Booking Analysis



##### **Project Type**    - EDA(AirBnb Booking Analysis)
##### **Contribution**    - Individual
##### **(Sanjana Nasa)**

# **Project Summary -**

### **AirBnb Booking analysis**

---
Airbnb, as in **“Air Bed and Breakfast,”** is a service that lets property owners rent out their spaces to travelers looking for a place to stay. Travelers can rent a space for multiple people to share, a shared space with private rooms, or the entire property for themselves.

The company was founded in 2008 by Brian Chesky, Nathan Blecharczyk, and Joe Gebbia.

Here, in this project, we perform **Exploratory Data Analysis** on the Airbnb New York City listings dataset for the year 2019.

The dataset contains around 49,000 observations and 16 columns, with a mix of categorical and numeric values.










## **CATEGORICAL VALUES**

1. name
2. host_name
3. neighbourhood
4. neighbourhood_group
5. room_type

## **NUMERICAL VALUES**

1. latitude
2. longitude
3. price
4. number_of_reviews
5. minimum_nights
6. reviews_per_month
7. calculated_host_listings_count
8. availability_365

## **DATE**


1. last_review

## **UNIQUE**


1. id
2. host_id

























* **id** - unique id of the listing
* **name** - name of the listed house/apartment
* **host_id** - unique id of each individual who lists a property on AirBnb
* **host_name** - name of the host who listed the property on AirBnb
* **neighbourhood_group** - borough in which the property is located (for example, Manhattan or Brooklyn)
* **neighbourhood** - name of the neighbourhood where the property is located
* **latitude** - angular position north/south of the equator
* **longitude** - angular position east/west of the prime meridian
* **room_type** - category of the room: private room, entire home/apartment, or shared room
* **price** - listed price per night in US dollars, as set by the host
* **minimum_nights** - minimum number of nights a guest must book
* **number_of_reviews** - total number of reviews guests have submitted for the listing
* **last_review** - date on which the latest review was submitted by a guest
* **reviews_per_month** - average number of reviews the listing receives per month
* **calculated_host_listings_count** - number of listings owned by the host
* **availability_365** - number of days in the year for which the listing is available for booking
























# **GitHub Link -**

https://github.com/sanjananasa/AirBnb-booking-analysis.git

# **Problem Statement**

The main objective of the analysis is to draw insights from the given data about hosts, areas, prices, locations, availability, and reviews. We are not restricted to these topics, so we will also look for further insights along the way.


Using the given dataset, we'll try to answer the following questions:

1. Which type of room is preferred the most?
2. How is price distributed according to room type?
3. Is Manhattan preferred over other areas?
4. What is the average price for each area?
5. Which 5 neighbourhoods have the highest number of apartments?
6. Which 5 hosts receive the highest number of reviews?
7. Which host owns the maximum number of listings?
8. What relationships exist between the variables?
9. Where is each apartment located, visualized using latitude and longitude values?



#### **Define Your Business Objective?**

“Airbnb's mission is to create a world where anyone can belong anywhere, and we are focused on creating an end-to-end travel platform that will handle every part of your trip.”

By facilitating access to distinctive spaces and local culture, Airbnb aims to enable travelers to “feel at home anywhere you go in the world” through connections with local hosts.

Airbnb is a peer-to-peer platform that collects a “platform tax”: it charges guests a service fee of 5%-20% of the booking and charges hosts 3%.

AirBnB applied platform thinking to solve the problem of traveller accommodation. It didn't compete on features. Instead, it created a platform that allowed anyone with a spare room, apartment or island to start running a B&B with access to a global market of travelers.

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline


### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

In [None]:
file_path='/content/drive/MyDrive/Airbnb NYC 2019.csv'
df=pd.read_csv(file_path)

### Dataset First View

In [None]:
# Dataset First Look
df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

Number of rows = 48895

Number of columns = 16

### Dataset Information

In [None]:
# Dataset Info
df.info()

# **Null values in dataset**

In [None]:
#null/missing values in each column
number_of_null_values=df.isna().sum()
print(number_of_null_values)
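
To judge how serious the gaps are, it can help to look at the share of missing values rather than the raw counts. The sketch below uses a small made-up `sample` frame (not the real data). Filling a missing `reviews_per_month` with 0 is an assumption that only holds if the value is missing because the listing has no reviews yet.

```python
import pandas as pd

# Hypothetical four-row sample standing in for the real dataset.
sample = pd.DataFrame({
    "name": ["Cozy loft", None, "Quiet studio", "Big flat"],
    "number_of_reviews": [12, 0, 3, 0],
    "reviews_per_month": [1.5, None, 0.4, None],
})

# Share of missing values in each column, as a percentage.
null_share = sample.isna().mean().mul(100).round(1)
print(null_share)

# Assumption: a missing reviews_per_month means "no reviews yet", so fill it with 0.
cleaned = sample.assign(reviews_per_month=sample["reviews_per_month"].fillna(0))
print(cleaned)
```

On the real data, the same two steps would run on `df` in place of `sample`.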



## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

# **Observations**

*   In the **'price'** column, the minimum listed price is zero, which is not possible in practice.
*   The maximum price charged is 10000, which is far above the typical listing, and we'll try to find out why.
*   On average, a property gets about 1 review per month, which suggests low guest engagement for many listings.
*   In the **'availability_365'** column, which gives the number of days in the year for which the apartment is available, the minimum and the 25th percentile are both zero, and the maximum is 365.

Zero availability may mean that the property is no longer operational or that it is already fully booked for the year.

Availability for the whole year may indicate a new property that is not getting bookings yet, or one whose reviews are not good.
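
The suspicious prices noted above can be pulled out for a closer look. This is a minimal sketch on a made-up `sample` frame. The 1.5 × IQR fence is a common rule of thumb for flagging outliers and is our assumption here, not something the dataset prescribes.

```python
import pandas as pd

# Hypothetical sample with one free listing and one extreme price.
sample = pd.DataFrame({
    "price": [0, 60, 95, 120, 150, 10000],
    "availability_365": [0, 365, 120, 0, 30, 365],
})

# Listings with a price of zero, which cannot be real paid bookings.
zero_price = sample[sample["price"] == 0]

# Flag unusually high prices with the 1.5 * IQR rule of thumb.
q1, q3 = sample["price"].quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
high_price = sample[sample["price"] > upper_fence]

print(len(zero_price), "listing(s) priced at zero")
print(high_price)
```

Rows flagged this way are candidates for inspection, not automatic removal, since some luxury listings may be priced high on purpose.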





## 3. ***Data Wrangling and visualisation***

# **Q1. which type of room is preferred the most and least?**

In [None]:
#import library for plotting/visualising data
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
#room type preference
bookings_for_each_room_type=df['room_type'].value_counts()
print(bookings_for_each_room_type)

In [None]:
#bar chart: room type is categorical, so bars read better than a line
plt.rcParams['figure.figsize']=[10,5]
bookings_for_each_room_type.plot.bar(rot=0)
plt.xlabel('type of room')
plt.ylabel('number of listings')



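* The number of listings is highest for the entire home/apartment type, and private room is the second most common. Treating listing counts as a proxy for preference, these are the two most preferred room types.
* Very few listings are shared rooms, which suggests that few people prefer to book them.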
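
Raw counts are easier to compare as percentages. A minimal sketch, using a made-up `room_type` series in place of `df['room_type']`:

```python
import pandas as pd

# Hypothetical room types for ten listings.
sample = pd.Series(
    ["Entire home/apt"] * 5 + ["Private room"] * 4 + ["Shared room"],
    name="room_type",
)

# normalize=True returns fractions instead of counts; scale them to percentages.
room_share = sample.value_counts(normalize=True).mul(100).round(1)
print(room_share)
```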



# **Q2.  Price distribution according to room type**

In [None]:
avg_room_price=df.groupby(['neighbourhood_group','room_type'])['price'].mean().unstack()
print(avg_room_price)

In [None]:
avg_room_price.plot.bar(figsize=(10,5),ylabel='Average Price calculated')
plt.title('price distribution')

## **Observations-**

1. We can see that Manhattan is the costliest area for every room type, while the Bronx is among the cheapest.

2. This could be made more useful for the business by analysing successful hosts (those with the highest number of reviews) and suggesting their prices to other hosts to help them attract bookings.
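
The second observation could be sketched as follows. Here "successful" is taken to mean a review count at or above the median, which is our assumption, and the `sample` frame is made up for illustration.

```python
import pandas as pd

# Hypothetical listings; the real analysis would use df instead.
sample = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 4 + ["Bronx"] * 4,
    "room_type": ["Private room"] * 8,
    "price": [120, 100, 300, 110, 60, 50, 150, 70],
    "number_of_reviews": [200, 150, 2, 180, 90, 120, 1, 100],
})

# Assumption: a listing is "successful" if its review count is at or above the median.
threshold = sample["number_of_reviews"].median()
well_reviewed = sample[sample["number_of_reviews"] >= threshold]

# Median price of well-reviewed listings, per area and room type.
suggested_price = (
    well_reviewed.groupby(["neighbourhood_group", "room_type"])["price"].median()
)
print(suggested_price)
```

The median is used instead of the mean so that a single expensive listing does not pull the suggestion upwards.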


# **Q3.  Is Manhattan preferred over other  neighbourhood_group/area?**

In [None]:
#number of listings in each neighbourhood_group (size() also counts rows whose name is missing)
b=df.groupby('neighbourhood_group').size()
print(b)

### As the number of properties listed on AirBnb is highest in Manhattan, we can say that yes, Manhattan is preferred over other areas

In [None]:
#take the labels from the grouped index so they always match the values
plt.pie(b,labels=b.index,autopct='%1.1f%%')
plt.title('Number of properties in each area')


# **Q4. average price of the property in each  area**

---



In [None]:
#neighbourhoods along with the average prices charged there
a=df.groupby(['neighbourhood_group'])['price'].mean()
print(a)



In [None]:
a.plot(kind='barh')
plt.xlabel('average price')




* The average price charged for a room is highest in Manhattan, at nearly $200. This is much higher than the prices in other areas.
* This could be due to demand, as we have seen above that Manhattan is the most preferred area to stay.
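
Because a few very expensive listings can inflate an average, it may be worth putting the median next to the mean. A minimal sketch on a made-up `sample` frame:

```python
import pandas as pd

# Hypothetical prices; one Manhattan listing is far more expensive than the rest.
sample = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 3 + ["Queens"] * 3,
    "price": [100, 150, 2000, 60, 80, 100],
})

# Mean, median and number of listings for each area.
price_summary = sample.groupby("neighbourhood_group")["price"].agg(
    ["mean", "median", "count"]
)
print(price_summary)
```

A large gap between mean and median in an area is a hint that outliers, and not typical listings, drive the average.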



## **Q5. Top 5 neighbourhoods /locations having highest number of apartments**  **and 5 neighbourhoods with least number of apartments**

In [None]:
#top 5 neighbourhoods with maximum number of apartments
df[['neighbourhood_group','neighbourhood']].value_counts().head(5)

In [None]:
df['neighbourhood'].value_counts().head(5).plot.bar()
plt.ylabel('number of properties')

In [None]:
#neighbourhoods with least number of listings on AirBnb
df[['neighbourhood_group','neighbourhood']].value_counts().tail(5)

In [None]:
df['neighbourhood'].value_counts().tail(5).plot.bar()
plt.ylabel('number of properties')

# **Q6.  Top 5 hosts receiving highest number of reviews**

In [None]:
#total reviews per host, highest first
(
    df.groupby(['host_id','host_name'],as_index=False)['number_of_reviews']
    .sum()
    .sort_values('number_of_reviews',ascending=False)
    .head(5)
)

# **Q7. Host who owns maximum number of listings**

In [None]:
#number of listings owned by different hosts (grouped by name)
df['host_name'].value_counts()

### From the above information, we can say that Michael owns the maximum number of properties, but that might not be a true insight because there may be more than one person named Michael.

### So, to get an accurate answer, we can use host_id, which is unique for each host.

In [None]:
#getting number of properties by host_id
df['host_id'].value_counts()

## The host with host_id **"219517861"** owns the highest number of properties.

## Let's find out the name of the host.





In [None]:
df[df['host_id']==219517861]['host_name'].unique()

### Sonder (NYC) is the host with the highest number of listings.
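
The count and the name lookup can also be done in one step by grouping on `host_id` and carrying the name along. A minimal sketch on a made-up `sample` frame, in which two different hosts share the name Michael:

```python
import pandas as pd

# Hypothetical hosts: ids 11 and 22 are two different people named Michael.
sample = pd.DataFrame({
    "host_id": [11, 11, 11, 22, 33, 33],
    "host_name": ["Michael", "Michael", "Michael", "Michael", "Sonder", "Sonder"],
})

# Counting by name merges the two Michaels into one.
listings_by_name = sample["host_name"].value_counts()

# Counting by host_id keeps them apart and still shows the name.
listings_per_host = (
    sample.groupby("host_id")
    .agg(host_name=("host_name", "first"), listings=("host_name", "size"))
    .sort_values("listings", ascending=False)
)
print(listings_by_name)
print(listings_per_host)
```

In this toy data the name-based count reports 4 listings for Michael, while no single host actually owns more than 3.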

# **Q8. Establishing relation between different variables**

In [None]:
import seaborn as sns

In [None]:
#correlation between numeric columns (text columns cannot be correlated)
sns.heatmap(df.corr(numeric_only=True),cmap='coolwarm',annot=True)
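
A heatmap shows every pair twice, so it can be handy to list each pair once, ordered by strength. A minimal sketch on a made-up `sample` frame; the values are invented and say nothing about the real correlations.

```python
from itertools import combinations

import pandas as pd

# Hypothetical listings; room_type is text and is skipped by numeric_only=True.
sample = pd.DataFrame({
    "number_of_reviews": [10, 20, 30, 40, 50],
    "reviews_per_month": [1.0, 2.0, 3.0, 4.0, 5.0],
    "price": [100, 90, 120, 80, 110],
    "room_type": ["Private room"] * 5,
})

corr = sample.corr(numeric_only=True)

# Keep each pair of columns once, then sort by absolute correlation.
pairs = pd.Series(
    {(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)}
)
strongest = pairs.sort_values(key=abs, ascending=False)
print(strongest)
```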

# **Q9. visualize the location of each apartment using latitude and longitude values**

In [None]:
sns.scatterplot(x=df['longitude'],y=df['latitude'],hue=df['neighbourhood_group'])
plt.title('Location of listings by neighbourhood group')

# **Distribution of the room type over the location**

In [None]:
sns.scatterplot(x=df['longitude'],y=df['latitude'],hue=df['room_type'])

**observations-**

1) Most rooms are of the Entire home/Apartment or Private room type; there are only a few shared rooms.

2) So hosts mostly prefer to offer an Entire home/Apartment or a Private room rather than a Shared room.
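
The same observation can be checked in numbers with a cross-tabulation of area against room type. A minimal sketch on a made-up `sample` frame:

```python
import pandas as pd

# Hypothetical listings in two areas.
sample = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 4 + ["Brooklyn"] * 4,
    "room_type": [
        "Entire home/apt", "Entire home/apt", "Entire home/apt", "Private room",
        "Entire home/apt", "Private room", "Private room", "Shared room",
    ],
})

# Percentage of each room type within every area (each row sums to 100).
room_mix = pd.crosstab(
    sample["neighbourhood_group"], sample["room_type"], normalize="index"
).mul(100)
print(room_mix)
```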

## **4. Solution to Business Objective**

## **To improve the overall performance of AirBnb, hosts can consider the following:**
1. Provide an accurate and optimized description of the property and update it from time to time. A detailed description helps guests know what to expect and allows them to envision themselves in the space.
2. Expand your amenities to make sure your listing stands out from the crowd.
3. Use dynamic pricing to ensure that your listings are always competitive.

# **How the project is useful to stakeholders?**

Common examples of stakeholders include employees, customers, shareholders, suppliers, communities, and governments.

1. Employees at AirBnb could use the observations to identify properties that have zero availability or are no longer operational, so that guests are not misled.
They can also guide hosts who are not getting many reviews and bookings to update the description of their property and add the amenities that make other properties in that area more popular.

2. Shareholders expect many things from a business, but their main interest is profitability. They want the business to generate high revenue so that they can get higher share prices and dividends.

They can use the project insights as a basis for deciding whether to invest in the business.

They could use the number of hosts and properties listed on AirBnb, the average price that people are ready to pay, and the minimum number of nights booked per property.

These variables can be used to estimate the turnover that the business earns through the fees and commission it levies on hosts and customers, and shareholders can decide accordingly.
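
A very rough version of such an estimate is sketched below. It rests on strong assumptions that the dataset does not confirm: every review stands for one stay of exactly `minimum_nights`, the host fee is the 3% quoted earlier, and the guest fee is set to 10%, an arbitrary value inside the quoted 5%-20% range. The `sample` frame is made up.

```python
import pandas as pd

# Hypothetical listings.
sample = pd.DataFrame({
    "price": [100, 200],
    "minimum_nights": [2, 1],
    "number_of_reviews": [10, 5],
})

HOST_FEE = 0.03   # host-side fee quoted in the business objective
GUEST_FEE = 0.10  # assumption: a value inside the quoted 5%-20% range

# Assumption: each review corresponds to one stay of minimum_nights nights.
bookings_value = (
    sample["price"] * sample["minimum_nights"] * sample["number_of_reviews"]
)
platform_fees = bookings_value * (HOST_FEE + GUEST_FEE)

print("estimated booking value:", bookings_value.sum())
print("estimated platform fees:", round(platform_fees.sum(), 2))
```

Since many guests never leave a review and many stays are longer than the minimum, a figure obtained this way is better read as a lower bound than as a forecast.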

# **Conclusion**

Airbnb isn’t a place where you book hotels (even though you can).

By listing the properties of ordinary people, Airbnb sets itself apart by offering a different experience to travellers.

Airbnb listings tend to be cheaper than hotels, so the cost-conscious traveller may prefer to pay less and go without some of the frills and security that a hotel provides.
