# AirBnB NY Locations Data Case Study

Your task will be to take the data provided and find evidence to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

This is to simulate what you will face when you are out in the wild. 

Happy Coding!

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [122]:
df = pd.read_csv('./AB_NYC_2019.csv')
df

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.94190,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.10,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48890,36484665,Charming one bedroom - newly renovated rowhouse,8232441,Sabrina,Brooklyn,Bedford-Stuyvesant,40.67853,-73.94995,Private room,70,2,0,,,2,9
48891,36485057,Affordable room in Bushwick/East Williamsburg,6570630,Marisol,Brooklyn,Bushwick,40.70184,-73.93317,Private room,40,4,0,,,2,36
48892,36485431,Sunny Studio at Historical Neighborhood,23492952,Ilgar & Aysel,Manhattan,Harlem,40.81475,-73.94867,Entire home/apt,115,10,0,,,1,27
48893,36485609,43rd St. Time Square-cozy single bed,30985759,Taz,Manhattan,Hell's Kitchen,40.75751,-73.99112,Shared room,55,1,0,,,6,2


In [14]:
# How many neighborhood groups are available and which shows up the most?

num_neighborhood_groups = df.groupby(['neighbourhood_group']).count()[['id']]
num_neighborhood_groups.sort_values(by = ['id'],ascending=False)
#There are 5 groups available and Manhattan shows up the most.

neighbourhood_group,id
Manhattan,21661
Brooklyn,20104
Queens,5666
Bronx,1091
Staten Island,373
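`value_counts()` answers "which shows up the most" in one step, and `nunique()` gives the number of groups directly. Here is a minimal sketch on a small made-up DataFrame. `toy_df` is hypothetical and only the column name matches the real data, so running it won't overwrite `df`.

```python
import pandas as pd

# Hypothetical toy data: made-up rows, same column name as the Airbnb CSV.
toy_df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Brooklyn", "Manhattan",
                            "Queens", "Manhattan", "Brooklyn"],
})

# How many distinct groups are there?
n_groups = toy_df["neighbourhood_group"].nunique()

# How often does each group appear? value_counts() sorts descending by default.
group_counts = toy_df["neighbourhood_group"].value_counts()
most_common_group = group_counts.idxmax()

print(n_groups)           # 3
print(most_common_group)  # Manhattan
```

On the real data, `df['neighbourhood_group'].nunique()` should return 5, matching the five rows in the table above.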


In [38]:
# Are private rooms the most popular in manhattan?

filtered_neighbourhood = df.loc[df['neighbourhood_group'] == 'Manhattan']
most_pop_room = filtered_neighbourhood.groupby(['room_type']).count()[['id']] 
most_pop_room
#Private rooms are the second most popular room type in Manhattan.

room_type,id
Entire home/apt,13199
Private room,7982
Shared room,480


In [77]:
# Which hosts are the busiest and based on their reviews?

hosts = df.groupby(['host_id', 'host_name'])[['number_of_reviews']].sum()
hosts.sort_values(by = ['number_of_reviews'],ascending=False)
#Maya, Brooklyn& Breakfast -Len-, Danielle, Yasu & Akiko, and Brady are the busiest based on the number of their reviews.

host_id,host_name,number_of_reviews
37312959,Maya,2273
344035,Brooklyn& Breakfast -Len-,2205
26432133,Danielle,2017
35524316,Yasu & Akiko,1971
40176101,Brady,1818
...,...,...
39695769,Avra,0
39706334,Erin,0
39724060,Jaime,0
39731713,Polina,0


In [53]:
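# Which neighborhood group has the highest average price?

# Select price before averaging: in pandas 2.x, .mean() on the whole frame raises a TypeError on text columns.
average = df.groupby(['neighbourhood_group'])[['price']].mean()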
average.sort_values(by = ['price'],ascending=False)
#Manhattan has the highest average price

neighbourhood_group,price
Manhattan,196.875814
Brooklyn,124.383207
Staten Island,114.812332
Queens,99.517649
Bronx,87.496792


In [58]:
# Which neighborhood group has the highest total price?

most_expensive_group = df.groupby(['neighbourhood_group'])[['price']].sum()
most_expensive_group.sort_values(by = ['price'],ascending=False)
#Manhattan has the highest total price by a wide margin.

neighbourhood_group,price
Manhattan,4264527
Brooklyn,2500600
Queens,563867
Bronx,95459
Staten Island,42825


In [75]:
#Which top 5 hosts have the highest total price?

boujee_hosts = df.groupby(['host_id', 'host_name'])[['price']].sum()
boujee_hosts.sort_values(by = ['price'],ascending=False).head()
#Sonder (NYC), Blueground, Sally, Red Awning, and Kara have the highest total price.

host_id,host_name,price
219517861,Sonder (NYC),82795
107434423,Blueground,70331
156158778,Sally,37097
205031545,Red Awning,35294
30283594,Kara,33581
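`nlargest` is shorthand for sorting and taking the head. Adding a listing count and an average next to the total shows whether a host ranks high because of price or because of volume. A sketch on made-up hosts, with hypothetical names:

```python
import pandas as pd

# Hypothetical toy data: host A has many cheap listings, host B a few expensive ones.
toy_df = pd.DataFrame({
    "host_id": [1, 1, 1, 2, 2, 3],
    "host_name": ["A", "A", "A", "B", "B", "C"],
    "price": [100, 100, 100, 500, 400, 250],
})

by_host = toy_df.groupby(["host_id", "host_name"])["price"].agg(
    total="sum", listings="count", average="mean"
)

# Same result as sort_values("total", ascending=False).head(2) when there are no ties.
top_hosts = by_host.nlargest(2, "total")
print(top_hosts)
```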


In [84]:
# Who currently has no (zero) availability with a review count of 100 or more?

df.loc[(df['availability_365'] == 0) & (df['number_of_reviews'] >= 100)]
#Listings with 0 availability and 100 or more reviews. (A strict > 100 filter returned 158 listings.)

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
8,5203,Cozy Clean Guest Room - Family Apt,7490,MaryEllen,Manhattan,Upper West Side,40.80178,-73.96723,Private room,79,2,118,2017-07-21,0.99,1,0
94,20913,Charming 1 bed GR8 WBurg LOCATION!,79402,Christiana,Brooklyn,Williamsburg,40.70984,-73.95775,Entire home/apt,100,5,168,2018-07-22,1.57,1,0
132,30031,NYC artists’ loft with roof deck,129352,Sol,Brooklyn,Greenpoint,40.73494,-73.95030,Private room,50,3,193,2019-05-20,1.86,1,0
174,44221,Financial District Luxury Loft,193722,Coral,Manhattan,Financial District,40.70666,-74.01374,Entire home/apt,196,3,114,2019-06-20,1.06,1,0
180,45556,"Fort Greene, Brooklyn: Center Bedroom",67778,Doug,Brooklyn,Fort Greene,40.68863,-73.97691,Private room,65,2,206,2019-06-30,1.92,2,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
29581,22705516,The Quietest Block in Manhattan :),127740507,Kathleen,Manhattan,Harlem,40.83102,-73.94181,Private room,65,2,103,2019-07-07,5.89,2,0
30461,23574142,queens get away!!,176185168,Janet,Queens,Laurelton,40.68209,-73.73662,Private room,65,1,119,2018-12-24,7.79,1,0
31250,24267706,entire sunshine of the spotless mind room,21074914,Albert,Brooklyn,Bedford-Stuyvesant,40.68234,-73.91318,Private room,49,1,102,2019-07-05,6.73,3,0
32670,25719044,COZY Room for Female Guests,40119874,Stephany,Brooklyn,Prospect-Lefferts Gardens,40.66242,-73.94417,Private room,48,1,131,2019-05-31,9.97,2,0


In [88]:
# What host has the highest total of prices and where are they located?

most_expensivest_host = df.groupby(['host_id', 'host_name', 'neighbourhood_group'])[['price']].sum()
most_expensivest_host.sort_values(by = ['price'],ascending=False)
#Sonder (NYC) has the highest total price and they are in Manhattan.

host_id,host_name,neighbourhood_group,price
219517861,Sonder (NYC),Manhattan,82795
107434423,Blueground,Manhattan,69741
205031545,Red Awning,Manhattan,35294
30283594,Kara,Manhattan,33581
156158778,Sally,Manhattan,29194
...,...,...,...
91034542,Maureen,Manhattan,10
205820814,Luz,Bronx,10
52777892,Amy,Manhattan,10
10132166,Aymeric,Brooklyn,0
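Comparing this table with the top-5 table shows that some hosts also list outside Manhattan. Blueground totals 70,331 overall but 69,741 in Manhattan, leaving about 590 elsewhere. Sally drops from 37,097 overall to 29,194 in Manhattan. A pivot table puts each host's per-borough totals side by side. A sketch on made-up data with hypothetical hosts:

```python
import pandas as pd

# Hypothetical toy data: host A lists in two boroughs, host B only in one.
toy_df = pd.DataFrame({
    "host_id": [1, 1, 2, 2, 2],
    "host_name": ["A", "A", "B", "B", "B"],
    "neighbourhood_group": ["Manhattan", "Brooklyn", "Manhattan", "Manhattan", "Manhattan"],
    "price": [200, 50, 100, 100, 100],
})

# One row per host, one column per borough; missing combinations become 0.
by_location = toy_df.pivot_table(
    index=["host_id", "host_name"],
    columns="neighbourhood_group",
    values="price",
    aggfunc="sum",
    fill_value=0,
)
by_location["total"] = by_location.sum(axis=1)
by_location = by_location.sort_values("total", ascending=False)
print(by_location)
```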


In [105]:
# When did Danielle from Queens last receive a review?

df.loc[(df['host_name'] == 'Danielle') & (df['neighbourhood_group'] == 'Queens'), ['last_review']]
#Danielle's Queens listings most recently received a review on 2019-07-08.

Unnamed: 0,last_review
7086,2019-07-03
16349,
20403,2019-07-06
21517,2019-07-07
22068,2019-07-06
22469,2019-07-08
27021,2018-01-02
33861,2019-06-20
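If you parse `last_review` as dates, pandas can compute the latest review directly instead of you reading it off the table. The filter above matches on name only, so grouping by `host_id` also guards against it catching more than one host named Danielle. A sketch on made-up rows (the host IDs are hypothetical):

```python
import pandas as pd

# Hypothetical toy data; None stands in for a listing with no reviews.
toy_df = pd.DataFrame({
    "host_id": [10, 10, 10, 20],
    "host_name": ["Danielle"] * 4,
    "neighbourhood_group": ["Queens", "Queens", "Queens", "Brooklyn"],
    "last_review": ["2019-07-03", None, "2019-07-08", "2019-07-09"],
})
toy_df["last_review"] = pd.to_datetime(toy_df["last_review"])  # None -> NaT

in_queens = (toy_df["host_name"] == "Danielle") & (toy_df["neighbourhood_group"] == "Queens")

# max() skips NaT, so listings without reviews don't affect the result.
latest_by_host = toy_df.loc[in_queens].groupby("host_id")["last_review"].max()
print(latest_by_host)
```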


## Further Questions

1. Which host has the most listings?

In [109]:
hostest_with_mostest = df.groupby(['host_id', 'host_name']).count()[['id']]
hostest_with_mostest.sort_values(by = ['id'],ascending=False)
#Sonder (NYC) has the most listings.

host_id,host_name,id
219517861,Sonder (NYC),327
107434423,Blueground,232
30283594,Kara,121
137358866,Kazuya,103
16098958,Jeremy & Laura,96
...,...,...
13543967,Paulina,1
13541655,Michael,1
13540183,Ashley,1
13538150,Mariana,1
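The dataset also has a `calculated_host_listings_count` column. If it counts each host's listings within this file (an assumption), it should agree with the group sizes computed above, which makes it a cheap sanity check. The sketch below uses made-up rows. It uses `.size()` because that counts rows regardless of missing values:

```python
import pandas as pd

# Hypothetical toy data in which the reported counts are consistent.
toy_df = pd.DataFrame({
    "host_id": [1, 1, 1, 2],
    "host_name": ["A", "A", "A", "B"],
    "calculated_host_listings_count": [3, 3, 3, 1],
})

keys = ["host_id", "host_name"]
listing_counts = toy_df.groupby(keys).size().sort_values(ascending=False)

# The reported count is repeated on every row, so take the first value per host.
reported = toy_df.groupby(keys)["calculated_host_listings_count"].first()
counts_agree = (listing_counts == reported.reindex(listing_counts.index)).all()

print(listing_counts.head())
print(counts_agree)  # True
```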


2. How many listings have completely open availability?

In [119]:
no_takers = df.loc[(df['availability_365'] == 365)].availability_365.count()
no_takers
#1295 listings have 365 days of availability

1295
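Comparing a column to a value gives a boolean Series. Because `True` counts as 1, `.sum()` counts the matches and `.mean()` gives their share. You can combine the same mask with a host filter to check the "is one of them a Sonder spot?" question from the conclusions. A sketch on made-up rows (only the "Sonder (NYC)" name comes from the real data):

```python
import pandas as pd

# Hypothetical toy data; "Sonder (NYC)" is borrowed from the real host names.
toy_df = pd.DataFrame({
    "host_name": ["Sonder (NYC)", "A", "Sonder (NYC)", "B", "C"],
    "availability_365": [365, 0, 120, 365, 365],
})

fully_open = toy_df["availability_365"] == 365   # boolean mask

n_fully_open = fully_open.sum()       # count of True values
share_fully_open = fully_open.mean()  # fraction of all listings
n_sonder_open = (fully_open & (toy_df["host_name"] == "Sonder (NYC)")).sum()

print(n_fully_open, share_fully_open, n_sonder_open)  # 3 0.6 1
```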

3. What room_types have the highest review numbers?

In [124]:
best_room_types = df.groupby(['room_type'])[['number_of_reviews']].sum()
best_room_types
#Entire home/apt has the highest number of reviews.

room_type,number_of_reviews
Entire home/apt,580403
Private room,538346
Shared room,19256
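Totals favor whichever room type has the most listings, so reviews per listing is a fairer way to compare how often each type gets reviewed. This sketch uses made-up data in which the two measures disagree:

```python
import pandas as pd

# Hypothetical toy data: more entire-home listings, but private rooms get more reviews each.
toy_df = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 3 + ["Private room"] * 2 + ["Shared room"],
    "number_of_reviews": [10, 20, 30, 25, 25, 4],
})

review_stats = toy_df.groupby("room_type")["number_of_reviews"].agg(
    total="sum", listings="count", per_listing="mean"
)
print(review_stats)
print(review_stats["total"].idxmax())        # Entire home/apt
print(review_stats["per_listing"].idxmax())  # Private room
```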


# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered more details that were not asked about above, please describe them here.

-- Add your conclusion --

In [125]:
#The neighborhood group question showed me how much more common Manhattan and Brooklyn are in this dataset than
#the rest of the boroughs. That makes sense to me, though, as those seem to be the most popular in terms of nightlife
#and having a younger crowd.

In [126]:
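#The Manhattan room type question confirmed my own feelings as someone who uses Airbnb fairly frequently. I would
#much rather have an entire space to myself and/or the group I'm with, and the data seems to back that up: in
#Manhattan, entire homes/apartments outnumber private rooms 13,199 to 7,982 (about 1.65 to 1), and shared rooms are
#rare (480). Listing counts measure supply rather than bookings, though, so this is only a rough proxy for popularity.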

In [127]:
#The next question doesn't explain much beyond what the query shows at face value. These hosts have received the
#most reviews in total across their listings, which suggests a lot of stays, but a review count alone can't tell us
#whether guests loved the place or really did not like it.

In [128]:
#It makes sense that Manhattan and Brooklyn are the most expensive places to rent an Airbnb, but it's very interesting
#to see how large the gap is between the most expensive, Manhattan (about $197 on average), and the second, Brooklyn
#(about $124), given how close they are in number of listings (21,661 vs 20,104). Outside Manhattan, the boroughs are
#priced fairly similarly, averaging between about $87 and $124.

In [129]:
#The next question only reiterates the one before it: Manhattan has the largest total price across all rentals,
#and it ain't close. It has the most listings, the highest average price, and the highest total price by a huge margin.

In [130]:
#Sonder (NYC) has the highest total price at $82,795, but that is mostly volume: spread across its 327 listings, it
#averages about $253 per listing. And, of course, all five of the top hosts by total price are in Manhattan.

In [131]:
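#NYC is a very busy city and a major tourist destination, and this query shows over 150 heavily reviewed listings with
#zero availability over the next 365 days. That may mean they are fully booked, but hosts can also block their calendars
#or stop renting, so zero availability isn't proof of demand. Hopefully there are some hidden gems with more availability.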

In [132]:
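#Danielle is a host, not a reviewer: last_review is the date one of her listings most recently received a review. Her
#Queens listings were reviewed as recently as 2019-07-08, so they look actively booked (the filter matches on name
#only, so this assumes all of these rows belong to the same Danielle).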

In [133]:
#Ahh, that makes sense. Sonder (NYC) has the most listings, 95 more than Blueground. Three of the top five hosts by
#listing count (Sonder (NYC), Blueground, and Kara) are also in the top five by total price. These hosts are running
#the Airbnb game in Manhattan.

In [134]:
#Whew, we'll be fine. There are a ton (well, 1,295) of listings available to rent on every day of the coming year.
#Maybe one of them is a Sonder spot? They seem reputable.

In [None]:
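#Entire homes/apartments have the most reviews in total (580,403), with private rooms close behind (538,346). If entire
#homes also have the most listings (as they do in Manhattan), part of that lead may just reflect volume, so reviews
#per listing would be a fairer comparison.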