Problem background

    Suppose that you are working as a data analyst at Airbnb. For the past few months, Airbnb has seen a major decline in revenue. Now that restrictions have started to lift and people have started to travel more, Airbnb wants to make sure that it is fully prepared for this change.

End Objective

    To prepare for the next best steps that Airbnb needs to take as a business, you have been asked to analyse a dataset consisting of various Airbnb listings in New York.

Presentation - I

    Data Analysis Managers: They directly manage the data analysts and their processes; their technical expertise is basic.
    Lead Data Analyst: The lead data analyst looks after the entire team of data and business analysts and is technically sound.

Presentation - II

    Head of Acquisitions and Operations, NYC: This head looks after all property and host acquisitions and operations. Acquiring the best properties, negotiating prices, and negotiating the services the properties offer all fall under the purview of this role.
    Head of User Experience, NYC: The head of user experience looks after customer preferences and also handles the properties listed on the website and the Airbnb app. In practice, this role tries to optimise the order in which properties are listed within neighbourhoods and cities so that every property gets an appropriate amount of traction.


In [1]:
# Import the necessary libraries
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

In [5]:
from google.colab import files
uploaded = files.upload()


Saving AB_NYC_2019.csv to AB_NYC_2019.csv


In [10]:
# Read the uploaded file into a DataFrame
for filename in uploaded.keys():
    AB_NYC_2019 = pd.read_csv(filename)

In [13]:
# Check the shape of the dataset
AB_NYC_2019.shape

(48895, 16)

In [14]:
# Calculating the missing values in the dataset
AB_NYC_2019.isnull().sum()

id                                    0
name                                 16
host_id                               0
host_name                            21
neighbourhood_group                   0
neighbourhood                         0
latitude                              0
longitude                             0
room_type                             0
price                                 0
minimum_nights                        0
number_of_reviews                     0
last_review                       10052
reviews_per_month                 10052
calculated_host_listings_count        0
availability_365                      0
dtype: int64
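Raw counts are easier to judge next to the share of rows they represent. In the output above, 10052 of 48895 rows is roughly 20.6%. A minimal sketch of such a summary follows; the `toy` DataFrame and the `missing_summary` helper are made up for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for AB_NYC_2019 (values are invented for illustration).
toy = pd.DataFrame({
    "name": ["A", None, "C", "D"],
    "price": [149, 225, 150, 89],
    "reviews_per_month": [0.21, np.nan, np.nan, 4.64],
})

def missing_summary(df):
    """Return missing-value counts and percentages for columns that have gaps."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Keep only columns with at least one missing value, worst first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

summary = missing_summary(toy)
print(summary)
```

On the real data the same helper could be called as `missing_summary(AB_NYC_2019)`.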

In [21]:
# Drop columns that are not needed for this analysis.
# errors='ignore' keeps the cell safe to re-run after the columns are already gone;
# without it, a second run raises KeyError: "... not found in axis".
AB_NYC_2019.drop(columns=['id', 'name', 'last_review'], errors='ignore', inplace=True)

In [22]:
# View whether the columns are dropped
AB_NYC_2019.head(5)

Unnamed: 0,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
0,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,0.21,6,365
1,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,0.38,2,355
2,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,1,365
3,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,4.64,1,194
4,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,0.1,1,0


In [30]:
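# Count missing values in reviews_per_month.
# Note: the execution count (30) is later than the fill in cell 27, so this cell
# was re-run after the fill and reports 0 instead of the 10052 seen earlier.
AB_NYC_2019.reviews_per_month.isnull().sum()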

0

In [27]:
# reviews_per_month has 10052 missing values (the same count as last_review); replace them with 0
AB_NYC_2019.fillna({'reviews_per_month':0},inplace=True)

In [29]:
AB_NYC_2019.reviews_per_month.isnull().sum()

0
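Passing a dict to `fillna` fills only the named columns and leaves other gaps (such as `host_name`) untouched. The sketch below shows that behaviour on a made-up `toy` DataFrame, and adds a sanity check that the filled rows are listings with zero reviews. That check is an assumption about why the values are missing, suggested by the matching missing counts for `last_review` and `reviews_per_month`; it is not something the original notebook verifies.

```python
import numpy as np
import pandas as pd

# Toy stand-in for AB_NYC_2019 (values are invented for illustration).
toy = pd.DataFrame({
    "host_name": ["John", None, "Elisabeth"],
    "number_of_reviews": [9, 45, 0],
    "reviews_per_month": [0.21, 0.38, np.nan],
})

# The dict keys select which columns are filled; host_name is left alone.
filled = toy.fillna({"reviews_per_month": 0})

# Sanity check: every row that was missing should have no reviews at all.
was_missing = toy["reviews_per_month"].isnull()
only_unreviewed = (toy.loc[was_missing, "number_of_reviews"] == 0).all()
print(filled)
print("Missing values only on unreviewed listings:", only_unreviewed)
```

Assigning the result to a new name, instead of using `inplace=True`, keeps the original frame available for this kind of before-and-after comparison.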

In [31]:
# No missing values remain in the reviews_per_month column
# Next, check the unique values of the categorical columns
AB_NYC_2019.room_type.unique()

array(['Private room', 'Entire home/apt', 'Shared room'], dtype=object)

In [35]:
len(AB_NYC_2019.room_type.unique())

3

In [33]:
AB_NYC_2019.neighbourhood_group.unique()

array(['Brooklyn', 'Manhattan', 'Queens', 'Staten Island', 'Bronx'],
      dtype=object)

In [34]:
len(AB_NYC_2019.neighbourhood_group.unique())

5

In [36]:
len(AB_NYC_2019.neighbourhood.unique())

221
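The three `len(...unique())` calls above can be collapsed into a single `nunique` call over the columns of interest. A minimal sketch on a made-up `toy` DataFrame (the values and the `categorical_cols` list are illustrative):

```python
import pandas as pd

# Toy stand-in for AB_NYC_2019 (values are invented for illustration).
toy = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Manhattan", "Manhattan", "Brooklyn"],
    "neighbourhood": ["Kensington", "Midtown", "Harlem", "Clinton Hill"],
    "room_type": ["Private room", "Entire home/apt", "Private room", "Entire home/apt"],
})

categorical_cols = ["room_type", "neighbourhood_group", "neighbourhood"]

# One Series with the number of distinct values per column.
distinct_counts = toy[categorical_cols].nunique()
print(distinct_counts)
```

One difference worth knowing: `nunique` skips missing values by default, whereas `len(series.unique())` counts NaN as a value. The two agree here because these columns have no missing values.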

In [38]:
# Save the cleaned dataset to CSV, without writing the index as a column
AB_NYC_2019.to_csv(r'C:\Users\graina\Downloads\AB_NYC_2019.csv',index=False, header=True)
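A hard-coded Windows path like the one above only works on one machine. The sketch below shows a portable variant using `pathlib` and then re-reads the file to confirm the round trip; the temporary directory, the file name `AB_NYC_2019_clean.csv`, and the `toy` DataFrame are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for the cleaned dataset (values are invented for illustration).
toy = pd.DataFrame({"host_id": [2787, 2845], "price": [149, 225]})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "AB_NYC_2019_clean.csv"  # hypothetical file name
    # index=False avoids an extra unnamed index column when the file is re-read.
    toy.to_csv(out_path, index=False)
    reloaded = pd.read_csv(out_path)

print(reloaded)
```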

In [39]:
# Top 10 hosts by number of listings in the dataset
AB_NYC_2019.host_id.value_counts().head(10)

host_id
219517861    327
107434423    232
30283594     121
137358866    103
16098958      96
12243051      96
61391963      91
22541573      87
200380610     65
7503643       52
Name: count, dtype: int64
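The row counts per `host_id` should match the `calculated_host_listings_count` column if that column reflects the listings present in this file. A minimal sketch of that cross-check, on a made-up `toy` DataFrame (whether the real data matches for every host is not verified here):

```python
import pandas as pd

# Toy stand-in for AB_NYC_2019 (values are invented for illustration).
toy = pd.DataFrame({
    "host_id": [1, 1, 1, 2, 2, 3],
    "calculated_host_listings_count": [3, 3, 3, 2, 2, 1],
})

# Number of rows (listings) observed for each host.
rows_per_host = toy["host_id"].value_counts()

# Attach the observed count to every row and compare with the stored column.
observed = toy["host_id"].map(rows_per_host)
mismatches = toy[observed != toy["calculated_host_listings_count"]]
print("Rows where the counts disagree:", len(mismatches))
```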

In [40]:
# Sort listings by the host's total listing count, largest first
airbnb2 = AB_NYC_2019.sort_values(by="calculated_host_listings_count", ascending=False)
airbnb2.head()

Unnamed: 0,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
39773,219517861,Sonder (NYC),Manhattan,Hell's Kitchen,40.76037,-73.99744,Entire home/apt,185,29,1,1.0,327,332
41463,219517861,Sonder (NYC),Manhattan,Financial District,40.70782,-74.01227,Entire home/apt,396,2,8,2.12,327,289
41469,219517861,Sonder (NYC),Manhattan,Financial District,40.7062,-74.01192,Entire home/apt,498,2,8,2.5,327,255
38294,219517861,Sonder (NYC),Manhattan,Financial District,40.70771,-74.00641,Entire home/apt,229,29,1,0.73,327,219
41468,219517861,Sonder (NYC),Manhattan,Financial District,40.70726,-74.0106,Entire home/apt,229,2,2,0.77,327,351
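Because the sort is per listing, the first rows all belong to the same host. To see the largest hosts instead, one option is to keep a single row per `host_id` after sorting. A minimal sketch on a made-up `toy` DataFrame (host names and counts are illustrative):

```python
import pandas as pd

# Toy stand-in for AB_NYC_2019 (values are invented for illustration).
toy = pd.DataFrame({
    "host_id": [10, 10, 20, 30, 30, 30],
    "host_name": ["Ann", "Ann", "Bob", "Cy", "Cy", "Cy"],
    "calculated_host_listings_count": [2, 2, 1, 3, 3, 3],
})

top_hosts = (
    toy.sort_values("calculated_host_listings_count", ascending=False)
       .drop_duplicates(subset="host_id")   # keep one row per host
       [["host_id", "host_name", "calculated_host_listings_count"]]
       .reset_index(drop=True)
)
print(top_hosts)
```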
