# AirBnB NY Locations Data Case Study

Your task will be to take the data provided and find evidence to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

This is to simulate what you will face when you are out in the wild. 

Happy Coding!

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt

In [19]:
air_bnb = pd.read_csv('./AB_NYC_2019.csv')
air_bnb['last_review'] = pd.to_datetime(air_bnb['last_review'])
air_bnb.head()

,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,NaT,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0
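A minimal sketch of an alternative load step: `pd.read_csv` can convert `last_review` while reading through its `parse_dates` argument, which makes the separate `pd.to_datetime` call unnecessary. The two inline rows are made-up sample data standing in for `AB_NYC_2019.csv`; the empty date becomes `NaT`, like the Harlem row in the output above.

```python
import io

import pandas as pd

# Two made-up rows standing in for AB_NYC_2019.csv (hypothetical sample)
csv_text = """id,host_id,neighbourhood_group,last_review
2539,2787,Brooklyn,2018-10-19
3647,4632,Manhattan,
"""

# parse_dates converts last_review to datetimes while reading;
# the empty field in the second row becomes NaT
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["last_review"])
print(sample["last_review"].isna().sum())  # 1
```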


In [3]:
# Check for duplicate host IDs
dup_id = air_bnb.host_id.duplicated().sum()
dup_id

11438
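`duplicated()` marks every row whose `host_id` already appeared earlier, so 11438 counts the extra listings held by hosts with more than one listing, not the number of such hosts. A small sketch on made-up IDs shows the relationship: the number of repeats equals the number of rows minus the number of distinct hosts.

```python
import pandas as pd

# Made-up host IDs: host 1 has two listings, host 3 has three
listings = pd.DataFrame({"host_id": [1, 1, 2, 3, 3, 3]})

# duplicated() is True for every repeat after a host's first listing
repeats = listings.host_id.duplicated().sum()
# nunique() counts distinct hosts
distinct_hosts = listings.host_id.nunique()
print(repeats, distinct_hosts)  # 3 3

# Each row is either a host's first listing or a repeat
assert repeats == len(listings) - distinct_hosts
```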

In [4]:
# Which hosts are the busiest and why?
# Count rows (listings) per host and keep the five largest
hosts = air_bnb.groupby('host_id').calculated_host_listings_count.count().nlargest(5)
hosts

host_id
219517861    327
107434423    232
30283594     121
137358866    103
12243051      96
Name: calculated_host_listings_count, dtype: int64
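Grouping and counting works, but as a sketch, `value_counts()` gives the same listings-per-host tally in one call, already sorted in descending order. The host IDs below are made up.

```python
import pandas as pd

# Made-up listings table: one row per listing
listings = pd.DataFrame({"host_id": [10, 10, 10, 20, 20, 30]})

# groupby().size() counts rows per host (the cell above counts non-null
# values of one column, which is the same when that column has no gaps)
via_groupby = listings.groupby("host_id").size().sort_values(ascending=False)

# value_counts() returns the same counts, sorted largest first by default
via_value_counts = listings.host_id.value_counts()
print(via_value_counts.head(5))
```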

In [5]:
# How many neighborhood groups are available and which shows up the most?
neighborhoods = air_bnb.groupby('neighbourhood_group').neighbourhood_group.count().nlargest(5)
neighborhoods

neighbourhood_group
Manhattan        21661
Brooklyn         20104
Queens            5666
Bronx             1091
Staten Island      373
Name: neighbourhood_group, dtype: int64
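`nlargest(5)` caps the output at five rows, so on its own it does not state how many groups exist. A sketch answering both parts of the question directly, on a made-up column: `nunique()` for how many groups, `value_counts().idxmax()` for the most frequent one.

```python
import pandas as pd

# Made-up neighbourhood_group column
groups = pd.Series(
    ["Manhattan", "Brooklyn", "Manhattan", "Queens", "Bronx", "Staten Island", "Manhattan"],
    name="neighbourhood_group",
)

n_groups = groups.nunique()                    # how many distinct groups
most_common = groups.value_counts().idxmax()   # group with the most listings
print(n_groups, most_common)  # 5 Manhattan
```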

In [6]:
# Are private rooms the most popular in manhattan?
manhattan = air_bnb.query('neighbourhood_group == "Manhattan"')
rooms = manhattan.room_type.value_counts()
rooms
# No: Entire home/apt, not Private room, is the most common room type in Manhattan

Entire home/apt    13199
Private room        7982
Shared room          480
Name: room_type, dtype: int64
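Raw counts answer the question, but shares make the comparison easier: in the output above, entire homes/apartments are about 61% of Manhattan listings (13,199 of 21,661) and private rooms about 37% (7,982). A sketch using `pd.crosstab(..., normalize="index")` computes these shares for every borough at once; the rows below are made up.

```python
import pandas as pd

# Made-up listings for two boroughs
listings = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn", "Brooklyn"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Private room", "Private room"],
})

# Share of each room type within each borough (each row sums to 1)
shares = pd.crosstab(listings.neighbourhood_group, listings.room_type, normalize="index")
print(shares.round(2))
```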

In [7]:
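# Which hosts are the busiest based on their reviews?
# Note: count() tallies non-null values per column, so sorting by the
# number_of_reviews column ranks hosts by number of listings, not total reviews
most_reviewed = air_bnb.groupby('host_id').count()
most_reviewed.sort_values(by=['number_of_reviews'], ascending=False).head(5)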

host_id,id,name,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
219517861,327,327,327,327,327,327,327,327,327,327,327,207,207,327,327
107434423,232,232,232,232,232,232,232,232,232,232,232,28,28,232,232
30283594,121,121,121,121,121,121,121,121,121,121,121,43,43,121,121
137358866,103,103,103,103,103,103,103,103,103,103,103,51,51,103,103
16098958,96,96,96,96,96,96,96,96,96,96,96,61,61,96,96
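Because `count()` only tallies non-null values, the ranking above reflects listings per host. A sketch of ranking by total reviews instead sums `number_of_reviews` per host; the made-up data shows how the two rankings can disagree.

```python
import pandas as pd

# Made-up data: host 1 has three lightly reviewed listings,
# host 2 has one heavily reviewed listing
listings = pd.DataFrame({
    "host_id": [1, 1, 1, 2],
    "number_of_reviews": [2, 0, 1, 500],
})

# count() tallies non-null values, i.e. listings per host
listing_counts = listings.groupby("host_id").number_of_reviews.count()
# sum() totals the reviews themselves
review_totals = listings.groupby("host_id").number_of_reviews.sum()

print(listing_counts.idxmax(), review_totals.idxmax())  # 1 2
print(review_totals.nlargest(5))
```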


In [9]:
# Which neighborhood group has the highest average price?
neighborhood_average = air_bnb.groupby('neighbourhood_group').price.mean().nlargest(5)
neighborhood_average
# Manhattan has the highest average price

neighbourhood_group
Manhattan        196.875814
Brooklyn         124.383207
Staten Island    114.812332
Queens            99.517649
Bronx             87.496792
Name: price, dtype: float64

In [10]:
# Which neighborhood group has the highest total price?
highest_neighborhood = air_bnb.groupby('neighbourhood_group').price.sum().nlargest(5)
highest_neighborhood
# Manhattan has the highest total price

neighbourhood_group
Manhattan        4264527
Brooklyn         2500600
Queens            563867
Bronx              95459
Staten Island      42825
Name: price, dtype: int64
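A group's total price equals its listing count times its mean price, which links this cell to the previous two: for Manhattan, 21,661 × 196.88 ≈ 4,264,527. So Manhattan leads on total partly because it has the most listings. A sketch computing count, mean and total together with `agg`, on made-up prices:

```python
import numpy as np
import pandas as pd

# Made-up prices for two boroughs
listings = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Bronx", "Bronx"],
    "price": [200, 150, 250, 80, 100],
})

# Listings, average price and total price per borough in one pass
summary = listings.groupby("neighbourhood_group").price.agg(["count", "mean", "sum"])
print(summary)

# Sanity check: total = number of listings x average price
print(np.allclose(summary["count"] * summary["mean"], summary["sum"]))  # True
```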

In [11]:
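# Which top 5 hosts have the highest total price?
# numeric_only=True keeps only numeric columns (pandas 2.x no longer drops the others by default)
top_hosts = air_bnb.groupby('host_id').sum(numeric_only=True)
top_hosts.sort_values(by=['price'], ascending=False).head(5)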

host_id,id,latitude,longitude,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
219517861,10885561678,13316.25823,-24198.18856,82795,4353,1281,397.56,106929,98588
107434423,7210036953,9451.60418,-17166.13165,70331,7470,29,6.04,53824,58884
156158778,332529233,488.73929,-887.71735,37097,12,1,1.0,144,776
205031545,1415225676,1996.92821,-3624.34656,35294,750,127,21.21,2401,10796
30283594,1611854192,4931.41347,-8952.50779,33581,3767,65,3.94,14641,37924
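Summing every column also adds up `id`, `latitude` and `longitude`, which gives the meaningless totals in the table above, and in pandas 2.x a non-numeric column such as the datetime `last_review` can make `GroupBy.sum()` raise a `TypeError`. A sketch that selects `price` before summing avoids both; the listings are made up.

```python
import pandas as pd

# Made-up listings with a text and a date column, like the real data
listings = pd.DataFrame({
    "host_id": [1, 1, 2, 3],
    "name": ["Loft", "Studio", "Room", "Suite"],
    "last_review": pd.to_datetime(["2019-01-01", None, "2019-05-01", "2018-12-31"]),
    "price": [100, 150, 400, 90],
})

# Sum only the price column, so IDs, coordinates, strings and dates
# are never added up
top_hosts = listings.groupby("host_id").price.sum().nlargest(5)
print(top_hosts)
```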


In [12]:
# Who currently has no (zero) availability with a review count of 100 or more?

not_available = air_bnb.loc[(air_bnb['availability_365'] == 0) & (air_bnb['number_of_reviews'] >= 100)]
not_available

,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
8,5203,Cozy Clean Guest Room - Family Apt,7490,MaryEllen,Manhattan,Upper West Side,40.80178,-73.96723,Private room,79,2,118,2017-07-21,0.99,1,0
94,20913,Charming 1 bed GR8 WBurg LOCATION!,79402,Christiana,Brooklyn,Williamsburg,40.70984,-73.95775,Entire home/apt,100,5,168,2018-07-22,1.57,1,0
132,30031,NYC artists’ loft with roof deck,129352,Sol,Brooklyn,Greenpoint,40.73494,-73.95030,Private room,50,3,193,2019-05-20,1.86,1,0
174,44221,Financial District Luxury Loft,193722,Coral,Manhattan,Financial District,40.70666,-74.01374,Entire home/apt,196,3,114,2019-06-20,1.06,1,0
180,45556,"Fort Greene, Brooklyn: Center Bedroom",67778,Doug,Brooklyn,Fort Greene,40.68863,-73.97691,Private room,65,2,206,2019-06-30,1.92,2,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
29581,22705516,The Quietest Block in Manhattan :),127740507,Kathleen,Manhattan,Harlem,40.83102,-73.94181,Private room,65,2,103,2019-07-07,5.89,2,0
30461,23574142,queens get away!!,176185168,Janet,Queens,Laurelton,40.68209,-73.73662,Private room,65,1,119,2018-12-24,7.79,1,0
31250,24267706,entire sunshine of the spotless mind room,21074914,Albert,Brooklyn,Bedford-Stuyvesant,40.68234,-73.91318,Private room,49,1,102,2019-07-05,6.73,3,0
32670,25719044,COZY Room for Female Guests,40119874,Stephany,Brooklyn,Prospect-Lefferts Gardens,40.66242,-73.94417,Private room,48,1,131,2019-05-31,9.97,2,0
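Each row is a listing, so the filtered table counts listings, and one host can appear several times. A sketch separating the two counts with `len()` and `nunique()`, on made-up data:

```python
import pandas as pd

# Made-up listings; host 7 has two fully booked, well-reviewed listings
listings = pd.DataFrame({
    "host_id": [7, 7, 8, 9],
    "availability_365": [0, 0, 0, 120],
    "number_of_reviews": [150, 210, 100, 300],
})

# Same filter as the cell above, written with query()
not_available = listings.query("availability_365 == 0 and number_of_reviews >= 100")

n_listings = len(not_available)             # rows = listings
n_hosts = not_available.host_id.nunique()   # distinct hosts behind them
print(n_listings, n_hosts)  # 3 2
```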


In [13]:
# What host has the highest total of prices and where are they located?

highest_host = air_bnb.groupby(['host_id', 'neighbourhood_group']).sum(numeric_only=True)
highest_host.sort_values(by=['price'], ascending=False).head(5)
# Host ID 219517861 has the highest total of price and they are located in Manhattan.

host_id,neighbourhood_group,id,latitude,longitude,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
219517861,Manhattan,10885561678,13316.25823,-24198.18856,82795,4353,1281,397.56,106929,98588
107434423,Manhattan,7142993903,9370.18553,-17018.18255,69741,7410,29,6.04,53360,58347
205031545,Manhattan,1415225676,1996.92821,-3624.34656,35294,750,127,21.21,2401,10796
30283594,Manhattan,1611854192,4931.41347,-8952.50779,33581,3767,65,3.94,14641,37924
156158778,Manhattan,232134838,326.02619,-591.83023,29194,8,1,1.0,96,711
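Grouping by both keys can split a host across boroughs: host 107434423 totals 70,331 over all groups in the total-price cell but 69,741 in Manhattan. A sketch that lays out each host's total per borough with `unstack` and converts it to shares, on made-up listings:

```python
import pandas as pd

# Made-up listings for two hosts
listings = pd.DataFrame({
    "host_id": [1, 1, 1, 2],
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Queens"],
    "price": [300, 200, 100, 80],
})

# Host x borough table of total price; boroughs a host does not use get 0
by_borough = (
    listings.groupby(["host_id", "neighbourhood_group"]).price.sum()
    .unstack(fill_value=0)
)

# Share of each host's total price that comes from each borough
shares = by_borough.div(by_borough.sum(axis=1), axis=0)
print(shares.round(2))
```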


In [14]:
# When did Danielle from Queens last receive a review?
danielle_info = air_bnb.loc[(air_bnb['host_name'] == 'Danielle') & (air_bnb['neighbourhood_group'] == 'Queens')]
danielle_info.last_review.max()

Timestamp('2019-07-08 00:00:00')
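The filter matches on `host_name`, so if several different hosts named Danielle list in Queens, `max()` returns the latest review across all of them. A sketch that separates them by grouping on `host_id`; the rows are made up, and whether the real data contains more than one such host is not shown in this notebook.

```python
import pandas as pd

# Made-up rows: two different hosts both named Danielle in Queens
listings = pd.DataFrame({
    "host_id": [101, 101, 202],
    "host_name": ["Danielle", "Danielle", "Danielle"],
    "neighbourhood_group": ["Queens", "Queens", "Queens"],
    "last_review": pd.to_datetime(["2019-06-01", "2019-07-08", "2018-03-15"]),
})

danielle = listings.query('host_name == "Danielle" and neighbourhood_group == "Queens"')

# Latest review per distinct host, rather than across all matching hosts
latest_per_host = danielle.groupby("host_id").last_review.max()
print(latest_per_host)
```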

## Further Questions

1. Which host has the most listings?

In [15]:
most_listings = air_bnb.groupby(['host_id','host_name']).calculated_host_listings_count.count()
most_listings.sort_values(ascending=False).head(5)

host_id    host_name     
219517861  Sonder (NYC)      327
107434423  Blueground        232
30283594   Kara              121
137358866  Kazuya            103
16098958   Jeremy & Laura     96
Name: calculated_host_listings_count, dtype: int64
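The dataset also carries a precomputed `calculated_host_listings_count` on every row. A sketch that cross-checks it against the actual number of rows per host, on made-up data; whether the real file agrees for every host is not shown in this notebook.

```python
import pandas as pd

# Made-up listings whose precomputed count happens to match the rows
listings = pd.DataFrame({
    "host_id": [5, 5, 5, 6],
    "calculated_host_listings_count": [3, 3, 3, 1],
})

# Rows per host, mapped back onto every listing of that host
rows_per_host = listings.host_id.map(listings.host_id.value_counts())

# True where the precomputed column agrees with the actual row count
matches = rows_per_host == listings.calculated_host_listings_count
print(matches.all())  # True
```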

2. How many listings have completely open availability?

In [16]:
host_availability = air_bnb.query('availability_365 == 365')
open_availability = host_availability.availability_365.count()
open_availability

1295
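Comparing with 365 yields a boolean Series; summing it counts the `True` values directly, and its mean gives the share of listings open all year. A sketch on made-up values:

```python
import pandas as pd

# Made-up availability_365 values
availability = pd.Series([365, 0, 120, 365, 365], name="availability_365")

is_open = availability == 365
n_open = is_open.sum()        # True counts as 1
share_open = is_open.mean()   # fraction of listings open all year
print(n_open, share_open)  # 3 0.6
```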

3. Which room types have the highest review numbers?

In [17]:
room_type = air_bnb.groupby(['room_type', 'host_name', 'name']).number_of_reviews.max()
max_reviews = room_type.sort_values(ascending=False).head(5)
max_reviews

room_type     host_name  name                          
Private room  Dona       Room near JFK Queen Bed           629
              Jj         Great Bedroom in Manhattan        607
                         Beautiful Bedroom in Manhattan    597
                         Private Bedroom in Manhattan      594
              Dona       Room Near JFK Twin Beds           576
Name: number_of_reviews, dtype: int64
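The cell above ranks individual listings: it shows that the five most-reviewed listings are private rooms, but it does not compare room types as a whole. A sketch of that comparison, aggregating reviews per `room_type` on made-up listings:

```python
import pandas as pd

# Made-up listings
listings = pd.DataFrame({
    "room_type": ["Private room", "Private room", "Entire home/apt", "Shared room"],
    "number_of_reviews": [629, 10, 200, 5],
})

# Reviews per room type: total and average per listing
per_type = listings.groupby("room_type").number_of_reviews.agg(["sum", "mean"])
print(per_type.sort_values("sum", ascending=False))
```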

# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered some more details that were not asked above, please describe them here.

-- Add your conclusion --

In [None]:
# The busiest hosts are located in Manhattan, possibly because New York City is a popular tourist destination and Manhattan is its main draw for tourists.
# There are 5 neighbourhood groups. Manhattan shows up the most (21,661 listings, just ahead of Brooklyn's 20,104), possibly due to its location; it also offers a variety of room types (entire home/apt, private room, shared room).
# Private rooms are not the most popular in Manhattan: entire homes/apartments lead (13,199 listings vs 7,982 private rooms). This could reflect people renting out a second property on Airbnb as a source of income while living on the outskirts of the city.
# Host ID 219517861 tops the review-based ranking, which counts listings rather than total reviews; its 327 listings have 1,281 reviews in total, possibly helped by a central location with easy access to different parts of the city.
# Manhattan has the highest average price (about $197, versus about $124 for Brooklyn), presumably because people are willing to pay more for a popular, central location.
# Manhattan also has the highest total price ($4,264,527), since it combines the most listings with the highest average price.
# The top five hosts by total price all have most of that total in Manhattan, likely due to its popularity and central location.
# 162 listings with 100 or more reviews currently have no availability. Possible reasons: the host's living arrangements changed, the host is not currently interested in hosting, or renovations are planned for the property.
# Host ID 219517861 has the highest total price ($82,795), and all of its listings are in Manhattan, consistent with the link between location and price seen in the average-price results.
# Danielle from Queens last received a review on 2019-07-08. There may have been no reviews since because she has not hosted again, or because guests had a good experience but chose not to submit a review.
# Host ID 219517861, Sonder (NYC), has the most listings (327), and those listings are available for most of the year (about 300 days on average).
# 1,295 listings are available all 365 days, which suggests their owners may use the property only for Airbnb.
# The five most-reviewed listings are all private rooms; the top one is Dona's "Room near JFK Queen Bed" (629 reviews). Listings near JFK may suit travelers who need a room to sleep in before catching a flight the next day; location and amenities offered by the host may also play a part.