<a href="https://colab.research.google.com/github/pooja251096/airbnb_booking_analysis/blob/main/Copy_of_AlmaBetter_EDA_Airbnb_Rough1_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


Since 2008, guests and hosts have used Airbnb to expand travel possibilities and to offer a more unique, personalized way of experiencing the world. Today, Airbnb is a one-of-a-kind service that is used and recognized around the world. Analysis of the millions of listings provided through Airbnb is crucial for the company. These listings generate a lot of data that can be analyzed and used for security, business decisions, understanding customer and host behavior and performance on the platform, guiding marketing initiatives, implementing innovative additional services, and much more.
This dataset has around 49,000 observations and 16 columns, with a mix of categorical and numeric values.
Explore and analyze the data to discover key insights (not limited to these) such as:

1. What can we learn about different hosts and areas?
2. What can we learn from predictions? (e.g., locations, prices, reviews, etc.)
3. Which hosts are the busiest and why?
4. Is there any noticeable difference in traffic among different areas, and what could be the reason for it?


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
data = pd.read_csv('Airbnb NYC 2019.csv')

In [None]:
data.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [None]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48895 entries, 0 to 48894
Data columns (total 16 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              48895 non-null  int64  
 1   name                            48879 non-null  object 
 2   host_id                         48895 non-null  int64  
 3   host_name                       48874 non-null  object 
 4   neighbourhood_group             48895 non-null  object 
 5   neighbourhood                   48895 non-null  object 
 6   latitude                        48895 non-null  float64
 7   longitude                       48895 non-null  float64
 8   room_type                       48895 non-null  object 
 9   price                           48895 non-null  int64  
 10  minimum_nights                  48895 non-null  int64  
 11  number_of_reviews               48895 non-null  int64  
 12  last_review                     

In [None]:
unwanted_columns = ['id', 'name']
data = data.drop(unwanted_columns, axis=1)

In [None]:
data.head()

Unnamed: 0,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [None]:
data.isna().sum()

host_id                               0
host_name                            21
neighbourhood_group                   0
neighbourhood                         0
latitude                              0
longitude                             0
room_type                             0
price                                 0
minimum_nights                        0
number_of_reviews                     0
last_review                       10052
reviews_per_month                 10052
calculated_host_listings_count        0
availability_365                      0
dtype: int64

The missing values in last_review are difficult to handle because they are dates. The missing values in reviews_per_month, on the other hand, can be filled with 0 because those listings have no reviews.
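Before filling, it can be worth confirming that the missing reviews_per_month values really belong to listings with zero reviews. The sketch below does this on a small made-up sample (the `sample` frame and its values are hypothetical stand-ins for the real data). It also shows one possible alternative to discarding last_review: parsing it as a date, so that listings with no reviews carry `NaT`.

```python
import pandas as pd

# Hypothetical mini-sample mimicking the relevant columns of the dataset.
sample = pd.DataFrame({
    "number_of_reviews": [9, 0, 270, 0],
    "last_review": ["2018-10-19", None, "2019-07-05", None],
    "reviews_per_month": [0.21, None, 4.64, None],
})

# Rows where reviews_per_month is missing.
missing_rpm = sample["reviews_per_month"].isna()

# Check that every such row has zero reviews before filling with 0.
all_missing_have_no_reviews = bool(
    (sample.loc[missing_rpm, "number_of_reviews"] == 0).all()
)
print(all_missing_have_no_reviews)

# Fill the missing rates with 0, as the text proposes.
sample["reviews_per_month"] = sample["reviews_per_month"].fillna(0)

# Possible alternative to dropping last_review: keep it as a datetime column,
# where listings without reviews are represented by NaT.
sample["last_review"] = pd.to_datetime(sample["last_review"])
```

In the real data, the equal missing counts for last_review and reviews_per_month (10052 each) are consistent with this assumption, but the check makes it explicit.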


In [None]:
# Drop the date column and treat a missing review rate as zero reviews per month
data = data.drop('last_review', axis=1)
data['reviews_per_month'] = data['reviews_per_month'].fillna(0)

In [None]:
data.isna().sum()

host_id                            0
host_name                         21
neighbourhood_group                0
neighbourhood                      0
latitude                           0
longitude                          0
room_type                          0
price                              0
minimum_nights                     0
number_of_reviews                  0
reviews_per_month                  0
calculated_host_listings_count     0
availability_365                   0
dtype: int64

In [None]:
data.groupby('neighbourhood_group').count()

Unnamed: 0_level_0,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
neighbourhood_group,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
Bronx,1091,1090,1091,1091,1091,1091,1091,1091,1091,1091,1091,1091
Brooklyn,20104,20095,20104,20104,20104,20104,20104,20104,20104,20104,20104,20104
Manhattan,21661,21652,21661,21661,21661,21661,21661,21661,21661,21661,21661,21661
Queens,5666,5664,5666,5666,5666,5666,5666,5666,5666,5666,5666,5666
Staten Island,373,373,373,373,373,373,373,373,373,373,373,373
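The table above repeats the same count in every column, because `count()` reports non-null values per column. If only the number of listings per borough is needed, `value_counts()` is a more direct option. The following is a minimal sketch on a hypothetical `sample` frame, not the real dataset:

```python
import pandas as pd

# Hypothetical mini-sample; the real notebook would use `data` instead.
sample = pd.DataFrame({
    "neighbourhood_group": [
        "Brooklyn", "Manhattan", "Manhattan", "Queens", "Brooklyn", "Manhattan",
    ],
    "price": [149, 225, 150, 80, 89, 80],
})

# Number of listings per borough, largest first.
listings_per_group = sample["neighbourhood_group"].value_counts()

# Same counts expressed as a share of all listings.
share_per_group = sample["neighbourhood_group"].value_counts(normalize=True)

print(listings_per_group)
print(share_per_group.round(3))
```

In the output above, Manhattan (21661) and Brooklyn (20104) together account for 41,765 of the 48,895 listings, roughly 85%.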


In [None]:

data.groupby('neighbourhood').count()

Unnamed: 0_level_0,host_id,host_name,neighbourhood_group,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
Allerton,42,42,42,42,42,42,42,42,42,42,42,42
Arden Heights,4,4,4,4,4,4,4,4,4,4,4,4
Arrochar,21,21,21,21,21,21,21,21,21,21,21,21
Arverne,77,77,77,77,77,77,77,77,77,77,77,77
Astoria,900,900,900,900,900,900,900,900,900,900,900,900
...,...,...,...,...,...,...,...,...,...,...,...,...
Windsor Terrace,157,157,157,157,157,157,157,157,157,157,157,157
Woodhaven,88,88,88,88,88,88,88,88,88,88,88,88
Woodlawn,11,11,11,11,11,11,11,11,11,11,11,11
Woodrow,1,1,1,1,1,1,1,1,1,1,1,1


In [None]:
data.sort_values(by='price')

Unnamed: 0,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
25796,86327101,Adeyemi,Brooklyn,Bedford-Stuyvesant,40.68258,-73.91284,Private room,0,1,95,4.35,6,222
25634,15787004,Martial Loft,Brooklyn,Bushwick,40.69467,-73.92433,Private room,0,2,16,0.71,5,0
25433,131697576,Anisha,Bronx,East Morrisania,40.83296,-73.88668,Private room,0,2,55,2.56,4,127
25753,1641537,Lauren,Brooklyn,Greenpoint,40.72462,-73.94072,Private room,0,2,12,0.53,2,0
23161,8993084,Kimberly,Brooklyn,Bedford-Stuyvesant,40.69023,-73.95428,Private room,0,4,1,0.05,4,28
...,...,...,...,...,...,...,...,...,...,...,...,...,...
40433,4382127,Matt,Manhattan,Lower East Side,40.71980,-73.98566,Entire home/apt,9999,30,0,0.00,1,365
12342,3906464,Amy,Manhattan,Lower East Side,40.71355,-73.98507,Private room,9999,99,6,0.14,1,83
17692,5143901,Erin,Brooklyn,Greenpoint,40.73260,-73.95739,Entire home/apt,10000,5,5,0.16,1,0
9151,20582832,Kathrine,Queens,Astoria,40.76810,-73.91651,Private room,10000,100,2,0.04,1,0
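The sorted output shows listings priced at 0 and at 10000, which are likely data-entry artifacts or extreme outliers. One way to isolate them before further analysis is sketched below. The `sample` frame is hypothetical, and excluding zero-price rows is an assumption about how one might clean the data, not a step taken in this notebook.

```python
import pandas as pd

# Hypothetical mini-sample with one zero-price row and one extreme price.
sample = pd.DataFrame({
    "host_name": ["Adeyemi", "John", "Jennifer", "Erin"],
    "price": [0, 149, 225, 10000],
})

# Listings with a price of 0 cannot be real nightly rates.
zero_price = sample[sample["price"] == 0]

# The most expensive listing, for manual inspection.
most_expensive = sample.nlargest(1, "price")

# A cleaned view that excludes the zero-price rows (an assumed cleaning rule).
cleaned = sample[sample["price"] > 0]

print(len(zero_price), "zero-price listing(s)")
print(most_expensive)
```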


In [None]:
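# Note: after count(), the 'price' column holds the number of listings, not prices,
# so this orders the boroughs by listing count
data.groupby('neighbourhood_group').count().sort_values(by='price')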

Unnamed: 0_level_0,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
neighbourhood_group,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
Staten Island,373,373,373,373,373,373,373,373,373,373,373,373
Bronx,1091,1090,1091,1091,1091,1091,1091,1091,1091,1091,1091,1091
Queens,5666,5664,5666,5666,5666,5666,5666,5666,5666,5666,5666,5666
Brooklyn,20104,20095,20104,20104,20104,20104,20104,20104,20104,20104,20104,20104
Manhattan,21661,21652,21661,21661,21661,21661,21661,21661,21661,21661,21661,21661
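The cell above sorts by the count in the price column, so it orders boroughs by number of listings rather than by price. To compare actual prices across boroughs, one option is to aggregate the price column directly. This is a minimal sketch on a hypothetical `sample` frame; the choice of median and mean as summary statistics is an assumption.

```python
import pandas as pd

# Hypothetical mini-sample; the real notebook would use `data` instead.
sample = pd.DataFrame({
    "neighbourhood_group": [
        "Brooklyn", "Brooklyn", "Manhattan", "Manhattan", "Manhattan",
    ],
    "price": [149, 89, 225, 150, 80],
})

# Listing count plus median and mean price per borough, cheapest median first.
price_summary = (
    sample.groupby("neighbourhood_group")["price"]
    .agg(listings="count", median_price="median", mean_price="mean")
    .sort_values(by="median_price")
)

print(price_summary)
```

The median is included because a few extreme prices, such as the 10000 values seen in the sorted listing, can pull the mean upward.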


In [None]:
# import folium
# from folium.plugins import HeatMap

# m=folium.Map([40.7128,-74.0060],zoom_start=11)
# HeatMap(data[['latitude','longitude']].dropna(),radius=5,gradient={0.2:'blue',0.4:'purple',0.6:'orange',1.0:'red'}).add_to(m)
# display(m)

In [None]:
import pandas as pd
import plotly.express as px

# Reload the raw CSV into a separate DataFrame for mapping
Airbnb = pd.read_csv("Airbnb NYC 2019.csv")

# Plot every listing on an OpenStreetMap base layer
fig = px.scatter_mapbox(Airbnb, lat="latitude", lon="longitude", hover_name="neighbourhood_group",
                        hover_data=["neighbourhood_group", "neighbourhood"],
                        color_discrete_sequence=["fuchsia"], zoom=10, height=750, width=720)
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

In [None]:
## Changes by Abdul
import plotly.express as px

fig = px.scatter_mapbox(Airbnb, lat="latitude", lon="longitude", color="neighbourhood_group",
                        color_discrete_map={'Bronx': '#222A2A', 'Brooklyn': '#2E91E5', 'Manhattan': '#FC0080',
                                            'Queens': '#750D86', 'Staten Island': '#0000EE'},
                        zoom=9.5, height=700, width=800)
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()