<a href="https://colab.research.google.com/github/satendra-p/Airbnb_data-analysis/blob/main/Team_Project_EDA_Capstone_ipynb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Since 2008, guests and hosts have used Airbnb to expand their travel possibilities and to experience the world in a more unique, personalized way. Today, Airbnb is a one-of-a-kind service that is used and recognized around the world. Analysis of the millions of listings on Airbnb is crucial for the company. These listings generate a lot of data that can be analyzed and used for security, business decisions, understanding the behavior and performance of customers and providers (hosts) on the platform, guiding marketing initiatives, implementing innovative additional services, and much more. This dataset has around 49,000 observations and 16 columns, with a mix of categorical and numeric values. Explore and analyze the data to discover key insights (not limited to these), such as:

What can we learn about different hosts and areas?
What can we learn from predictions? (e.g. locations, prices, reviews, etc.)
Which hosts are the busiest and why?
Is there any noticeable difference in traffic among different areas, and what could be the reason for it?

EDA project by Satendra

Breakdown of this Notebook:
1. Importing Libraries

2. Loading the dataset

3. Data Cleaning:

Dropping duplicates.
Cleaning individual columns.
Removing null values from the dataset.
Applying some transformations.
4. Data Analysis and Visualization: using plots to find relations between the features

What is the average preferred price by customers according to the location?

Number of active hosts per location (where do most hosts own property?)

Where do customers pay the highest and lowest rent, by location?

Most popular/in-demand host on Airbnb in New York

Find the total count of each room type

Room types and their relation with availability in different neighbourhood groups

Which are the top 25 most used words in listing names?

Find top 10 hosts with most listings

Find the top three hosts based on their turnover

Find the total number of nights spent per location

Find the total number of nights spent per room type

Top 10 neighbourhoods with the most listings

Answering the following questions:
What is the average preferred price by customers according to the location?

Number of active hosts per location (where do most hosts own property?)

What are the highest and lowest rents paid by customers in each location, or which hosts charge the highest and lowest rent in each location?

Most popular/in-demand host on Airbnb in New York

In [1]:
# Import all libraries used throughout the project
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (15, 10)
import seaborn as sns

In [2]:
# Load the dataset from a Google Drive share link
url = 'https://drive.google.com/file/d/1ioU5r9KEYSfwgfUi22SclVkx4l1a_8ou/view?usp=share_link'
# Build a direct-download URL from the file ID (the second-to-last path segment)
url = 'https://drive.google.com/uc?id=' + url.split('/')[-2]
airbnb_df = pd.read_csv(url)
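
The link conversion in the cell above relies on the file ID being the second-to-last segment of the share link. As a minimal sketch, the same step can be wrapped in a small helper that checks the link's shape first. The function name `drive_share_to_download_url` is hypothetical (not part of pandas or any Google library), and the sketch assumes the share link follows the `/file/d/<id>/view` pattern used here.

```python
from urllib.parse import urlparse


def drive_share_to_download_url(share_url: str) -> str:
    """Turn a Google Drive '/file/d/<id>/view' share link into a 'uc?id=<id>' URL.

    Hypothetical helper for illustration; it only handles the link shape
    used in this notebook.
    """
    # Split the path only, so the '?usp=share_link' query string is ignored
    parts = urlparse(share_url).path.strip('/').split('/')
    # Expected path shape: file/d/<file_id>/view
    if len(parts) < 3 or parts[0] != 'file' or parts[1] != 'd':
        raise ValueError(f"Unrecognised Drive share link: {share_url}")
    return 'https://drive.google.com/uc?id=' + parts[2]


share_url = 'https://drive.google.com/file/d/1ioU5r9KEYSfwgfUi22SclVkx4l1a_8ou/view?usp=share_link'
download_url = drive_share_to_download_url(share_url)
print(download_url)
```

The resulting `download_url` can be passed to `pd.read_csv` in the same way as the `url` variable in the cell above.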

In [3]:
airbnb_df.info() # checking basic information

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48895 entries, 0 to 48894
Data columns (total 16 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              48895 non-null  int64  
 1   name                            48879 non-null  object 
 2   host_id                         48895 non-null  int64  
 3   host_name                       48874 non-null  object 
 4   neighbourhood_group             48895 non-null  object 
 5   neighbourhood                   48895 non-null  object 
 6   latitude                        48895 non-null  float64
 7   longitude                       48895 non-null  float64
 8   room_type                       48895 non-null  object 
 9   price                           48895 non-null  int64  
 10  minimum_nights                  48895 non-null  int64  
 11  number_of_reviews               48895 non-null  int64  
 12  last_review                     

In [4]:
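# Remove the latitude, longitude, last_review and reviews_per_month columns from the original dataset.
# Dropping by name (instead of by position) keeps the cell safe to re-run.
airbnb_df.drop(columns=['latitude', 'longitude', 'last_review', 'reviews_per_month'],
               errors='ignore', inplace=True)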

In [5]:
airbnb_df.isnull().sum()  # count missing values in each column

id                                 0
name                              16
host_id                            0
host_name                         21
neighbourhood_group                0
neighbourhood                      0
room_type                          0
price                              0
minimum_nights                     0
number_of_reviews                  0
calculated_host_listings_count     0
availability_365                   0
dtype: int64
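
The counts above show that only `name` (16) and `host_name` (21) have missing values, both well under 0.1% of the 48,895 rows. A minimal sketch of how counts and percentages could be shown side by side follows. It uses a small made-up frame (`toy_df`) rather than the real dataset, and the variable names are illustrative only.

```python
import pandas as pd

# Toy frame reusing two of the dataset's column names; the rows are for illustration
toy_df = pd.DataFrame({
    'name': ['Skylit Midtown Castle', None, 'Cozy Entire Floor of Brownstone', None],
    'host_name': ['Jennifer', 'Elisabeth', None, 'Laura'],
    'price': [225, 150, 89, 80],
})

# Missing count per column, plus the same figure as a percentage of all rows
missing = toy_df.isnull().sum()
missing_summary = pd.DataFrame({
    'missing': missing,
    'percent': (missing / len(toy_df) * 100).round(2),
})

# Keep only the columns that actually have gaps
missing_summary = missing_summary[missing_summary['missing'] > 0]
print(missing_summary)
```

Applied to `airbnb_df`, the same pattern would make it easy to judge whether filling or dropping the affected rows is the better choice.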

In [6]:
# Replace NaN in the two text columns with zero
airbnb_df.fillna({'name': 0, 'host_name': 0}, inplace=True)
airbnb_df.isnull().sum()
# No null values remain


id                                0
name                              0
host_id                           0
host_name                         0
neighbourhood_group               0
neighbourhood                     0
room_type                         0
price                             0
minimum_nights                    0
number_of_reviews                 0
calculated_host_listings_count    0
availability_365                  0
dtype: int64
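
Filling text columns with the integer 0 removes the nulls, but it leaves a number mixed in with strings, which may complicate later text work such as counting words in listing names. One possible alternative, sketched here on a small made-up frame, is to fill with a string placeholder instead. The placeholder `'Unknown'` and the names `toy_df`, `filled_df` and `word_counts` are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Toy frame with gaps in both text columns; the rows are for illustration
toy_df = pd.DataFrame({
    'name': ['Skylit Midtown Castle', None],
    'host_name': [None, 'Laura'],
})

# 'Unknown' is an arbitrary placeholder chosen for this sketch
filled_df = toy_df.fillna({'name': 'Unknown', 'host_name': 'Unknown'})

# Every value is a string, so vectorised string methods apply to all rows
word_counts = filled_df['name'].str.split().str.len()
print(filled_df.isnull().sum())
print(word_counts.tolist())
```

If a placeholder like this were used on the real data, it would need to be excluded from any later word-frequency or host-ranking step so that it is not counted as a genuine name.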

In [7]:
airbnb_df.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,room_type,price,minimum_nights,number_of_reviews,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,Private room,149,1,9,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,Entire home/apt,225,1,45,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,Private room,150,3,0,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,Entire home/apt,89,1,270,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,Entire home/apt,80,10,9,1,0
