# AirBnB NY Locations Data Case Study

In this final project, your task is to use the data provided to find evidence that answers the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest, based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment. 
**Be advised:** I will go dark for the entire assignment period. Any questions you have about the data or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild.

Happy Coding!

In [17]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [18]:
air_bnb = pd.read_csv('../files/AB_NYC_2019.csv')
air_bnb.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [36]:
# How many neighborhood groups are available and which shows up the most?

print(air_bnb.neighbourhood_group.unique())
print('\nNeighborhood Group Count:',len(air_bnb.neighbourhood_group.unique()))

# There are five neighborhood groups available

print('\n',air_bnb.groupby('neighbourhood_group')['id'].count())

# Manhattan shows up the most, with 21661

['Brooklyn' 'Manhattan' 'Queens' 'Staten Island' 'Bronx']

Neighborhood Group Count: 5

 neighbourhood_group
Bronx             1091
Brooklyn         20104
Manhattan        21661
Queens            5666
Staten Island      373
Name: id, dtype: int64
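The same two answers can also be read off with `nunique()` and `value_counts()`. The sketch below is only an illustration: the `toy` frame is made up so the snippet runs on its own, and only the column name `neighbourhood_group` comes from the notebook.

```python
import pandas as pd

# Toy rows invented for illustration; not taken from AB_NYC_2019.csv.
toy = pd.DataFrame({
    "neighbourhood_group": [
        "Manhattan", "Brooklyn", "Manhattan", "Queens", "Manhattan", "Brooklyn",
    ],
})

# Number of distinct neighborhood groups.
group_count = toy["neighbourhood_group"].nunique()

# Listings per group, sorted from most to least frequent.
counts = toy["neighbourhood_group"].value_counts()

# Label of the most frequent group.
most_common = counts.idxmax()

print("Neighborhood group count:", group_count)
print(counts)
print("Most common:", most_common)
```

On the real data, replacing `toy` with `air_bnb` should give the same numbers as the `groupby(...).count()` call above, provided `id` has no missing values.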


In [58]:
# Are private rooms the most popular in Manhattan?

print(air_bnb[air_bnb['neighbourhood_group'] == 'Manhattan'].groupby('room_type').count()['id'])

# No, it appears Entire home/apt are the most popular, with 13199 of them

room_type
Entire home/apt    13199
Private room        7982
Shared room          480
Name: id, dtype: int64


In [67]:
# Which hosts are the busiest, based on their reviews?

sorted_by_reviews = air_bnb.sort_values('number_of_reviews', kind = 'mergesort', ascending = False)
sorted_by_reviews.head(1) 
# The most-reviewed listing belongs to Dona (host_id 47621202), with 629 reviews at 14.58 per month,
# so Dona appears to be the busiest host by this per-listing measure.

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
11759,9145202,Room near JFK Queen Bed,47621202,Dona,Queens,Jamaica,40.6673,-73.76831,Private room,47,1,629,2019-07-05,14.58,2,333
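The sort above ranks individual listings, so a host whose reviews are spread across several listings could be undercounted. A minimal sketch of a per-host total, assuming `host_id` identifies a host uniquely, might look like the following. The `toy` rows and the result column names `total_reviews` and `listings` are invented for illustration.

```python
import pandas as pd

# Toy rows invented for illustration; not taken from AB_NYC_2019.csv.
toy = pd.DataFrame({
    "id": [10, 11, 12, 13],
    "host_id": [1, 1, 2, 3],
    "host_name": ["Ana", "Ana", "Ben", "Cy"],
    "number_of_reviews": [300, 250, 500, 40],
})

# Sum reviews and count listings per host, then rank by total reviews.
per_host = (
    toy.groupby(["host_id", "host_name"])
    .agg(total_reviews=("number_of_reviews", "sum"), listings=("id", "count"))
    .sort_values("total_reviews", ascending=False)
)

print(per_host.head(3))
```

In the toy data, Ben has the single most-reviewed listing, but Ana comes out ahead once both of Ana's listings are added together.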


In [73]:
# Which neighborhood group has the highest average price?
print(air_bnb.groupby('neighbourhood_group')['price'].mean())

# Manhattan has the highest average price, at about $196.88 per listing.

neighbourhood_group
Bronx             87.496792
Brooklyn         124.383207
Manhattan        196.875814
Queens            99.517649
Staten Island    114.812332
Name: price, dtype: float64


In [74]:
# Which neighborhood group has the highest total price?
print(air_bnb.groupby('neighbourhood_group')['price'].sum())

# Unsurprisingly, Manhattan has the highest total price (4264527). This makes sense: it has the most listings,
# about 1,550 more than Brooklyn, and the highest average price, by about $72 per listing. Its total is about
# 1.7 times the next highest (Brooklyn, 2500600).

neighbourhood_group
Bronx              95459
Brooklyn         2500600
Manhattan        4264527
Queens            563867
Staten Island      42825
Name: price, dtype: int64


In [93]:
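# Which top 5 hosts have the highest total price?
# Note: host_name is not unique, so common names such as Michael, David, and Alex
# may combine several different hosts; grouping by host_id would separate them.
top_5_hosts = air_bnb.groupby('host_name')['price'].sum().sort_values(ascending=False)
print(top_5_hosts.head(5))

# The top 5 hosts by total price appear to be:
# Sonder (NYC), Blueground, Michael, David, Alex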

host_name
Sonder (NYC)    82795
Blueground      70331
Michael         66895
David           65844
Alex            52563
Name: price, dtype: int64
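Grouping on `host_name` merges every host who happens to share a first name. The sketch below uses invented `toy` rows, and assumes `host_id` is the unique host key, to show how the two groupings can disagree.

```python
import pandas as pd

# Toy rows invented for illustration; not taken from AB_NYC_2019.csv.
# Two different hosts (host_id 1 and 2) are both named Michael.
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3],
    "host_name": ["Michael", "Michael", "Michael", "Dona"],
    "price": [100, 150, 300, 200],
})

# Grouping by name lumps both Michaels together.
by_name = toy.groupby("host_name")["price"].sum()

# Grouping by id (keeping the name for readability) separates them.
by_host = (
    toy.groupby(["host_id", "host_name"])["price"]
    .sum()
    .sort_values(ascending=False)
)

print(by_name)
print(by_host.head(5))
```

Whether the ranking on the real data changes is not shown here; the point is only that the name-based totals are an upper bound for any single host with that name.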


In [115]:
# Who currently has no (zero) availability with a review count of 100 or more?
# pd.set_option('display.max_rows',None)

print(air_bnb[(air_bnb['availability_365'] == 0) & (air_bnb['number_of_reviews'] >= 100)].count(), '\n')
air_bnb[(air_bnb['availability_365'] == 0) & (air_bnb['number_of_reviews'] >= 100)]
# Based on the counts printed below, 162 listings have 100 or more reviews and zero availability
# (one of them has no host_name recorded). The matching rows are displayed after the counts.

id                                162
name                              162
host_id                           162
host_name                         161
neighbourhood_group               162
neighbourhood                     162
latitude                          162
longitude                         162
room_type                         162
price                             162
minimum_nights                    162
number_of_reviews                 162
last_review                       162
reviews_per_month                 162
calculated_host_listings_count    162
availability_365                  162
dtype: int64 



Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
8,5203,Cozy Clean Guest Room - Family Apt,7490,MaryEllen,Manhattan,Upper West Side,40.80178,-73.96723,Private room,79,2,118,2017-07-21,0.99,1,0
94,20913,Charming 1 bed GR8 WBurg LOCATION!,79402,Christiana,Brooklyn,Williamsburg,40.70984,-73.95775,Entire home/apt,100,5,168,2018-07-22,1.57,1,0
132,30031,NYC artists’ loft with roof deck,129352,Sol,Brooklyn,Greenpoint,40.73494,-73.9503,Private room,50,3,193,2019-05-20,1.86,1,0
174,44221,Financial District Luxury Loft,193722,Coral,Manhattan,Financial District,40.70666,-74.01374,Entire home/apt,196,3,114,2019-06-20,1.06,1,0
180,45556,"Fort Greene, Brooklyn: Center Bedroom",67778,Doug,Brooklyn,Fort Greene,40.68863,-73.97691,Private room,65,2,206,2019-06-30,1.92,2,0
220,57468,"Modern, Large East Village Loft",239208,Ori,Manhattan,East Village,40.72821,-73.98701,Entire home/apt,189,3,205,2019-06-23,1.96,1,0
250,62461,B NYC Staten Alternative...,303939,Lissette,Staten Island,Tompkinsville,40.63627,-74.08543,Private room,37,2,147,2019-06-10,1.44,6,0
357,99070,Comfortable Cozy Space in El Barrio,522065,Liz And Melissa,Manhattan,East Harlem,40.79406,-73.94102,Shared room,65,7,131,2019-05-26,1.31,2,0
415,140425,Holiday Time in NY - Oh My!!,683975,Ivy,Brooklyn,Crown Heights,40.6755,-73.95878,Private room,79,2,115,2017-05-25,1.18,1,0
462,163627,Blue Room in Awesome Artist's Apartment!,242506,Jsun,Brooklyn,Williamsburg,40.71023,-73.96665,Private room,89,3,205,2017-12-31,2.31,3,0


In [179]:
# What host has the highest total of prices and where are they located?
air_bnb.groupby(['host_name','neighbourhood'])[['price']].sum() #.nlargest(3,['price'])
# Running the commented-out .nlargest(3, ['price']) call shows that Sonder (NYC) has the highest total
# for a single neighborhood, with 57738 in the Financial District.

Unnamed: 0_level_0,Unnamed: 1_level_0,price
host_name,neighbourhood,Unnamed: 2_level_1
'Cil,Astoria,120
(Ari) HENRY LEE,East Harlem,140
(Email hidden by Airbnb),Clinton Hill,261
(Email hidden by Airbnb),Midtown,389
(Email hidden by Airbnb),Upper West Side,90
(Email hidden by Airbnb),West Village,200
(Email hidden by Airbnb),Williamsburg,120
(Mary) Haiy,Bay Ridge,126
-TheQueensCornerLot,Queens Village,150
0123,Lower East Side,600
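The grouped table above is sorted alphabetically, so the largest total is not visible in it. As a sketch of the `nlargest` step mentioned in the comment, the snippet below ranks grouped totals on a small invented frame; the `toy` rows are hypothetical and only the column names follow the notebook.

```python
import pandas as pd

# Toy rows invented for illustration; not taken from AB_NYC_2019.csv.
toy = pd.DataFrame({
    "host_name": ["Ana", "Ana", "Ana", "Ben"],
    "neighbourhood": ["Harlem", "Harlem", "Midtown", "Astoria"],
    "price": [100, 120, 150, 200],
})

# Total price per (host, neighbourhood) pair.
totals = toy.groupby(["host_name", "neighbourhood"])[["price"]].sum()

# Keep only the row with the largest total.
top = totals.nlargest(1, "price")

# The index of the grouped frame is a (host_name, neighbourhood) tuple.
host, neighbourhood = top.index[0]
print(host, neighbourhood, top["price"].iloc[0])
```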


In [164]:
# When did Danielle from Queens last receive a review?

print(air_bnb[(air_bnb['host_name']=='Danielle') & (air_bnb['neighbourhood_group']=='Queens')]['last_review'])

# According to the output below, the most recent review is dated 2019-07-08 (6th row down)

7086     2019-07-03
16349           NaN
20403    2019-07-06
21517    2019-07-07
22068    2019-07-06
22469    2019-07-08
27021    2018-01-02
33861    2019-06-20
Name: last_review, dtype: object
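Rather than scanning the output by eye, the latest date can be computed. A minimal sketch, assuming `last_review` holds ISO-formatted date strings with missing values for unreviewed listings, is shown below on invented `toy` rows.

```python
import pandas as pd

# Toy rows invented for illustration; not taken from AB_NYC_2019.csv.
toy = pd.DataFrame({
    "host_name": ["Danielle", "Danielle", "Danielle", "Sam"],
    "neighbourhood_group": ["Queens", "Queens", "Brooklyn", "Queens"],
    "last_review": ["2019-07-03", None, "2019-07-09", "2019-07-10"],
})

# Same filter as in the notebook cell.
mask = (toy["host_name"] == "Danielle") & (toy["neighbourhood_group"] == "Queens")

# Parse the strings as dates; missing values become NaT and are skipped by max().
latest = pd.to_datetime(toy.loc[mask, "last_review"]).max()

print(latest.date())
```

Note that the filter matches on name only, so several different hosts named Danielle could be included in the result.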


## Further Questions

1. Which host has the most listings?

In [178]:
air_bnb.groupby('host_name')['name'].count().nlargest()

# Michael appears to have the most listings, with 417

host_name
Michael         417
David           403
Sonder (NYC)    327
John            294
Alex            279
Name: name, dtype: int64

2. How many listings have completely open availability?

In [188]:
# air_bnb.columns
print(air_bnb[air_bnb['availability_365'] >= 365]['id'].count())
air_bnb[air_bnb['availability_365'] >= 365].head()

# It appears that 1295 listings are available 365 days of the year!

1295


Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
36,11452,Clean and Quiet in Brooklyn,7355,Vt,Brooklyn,Bedford-Stuyvesant,40.68876,-73.94312,Private room,35,60,0,,,1,365
38,11943,Country space in the city,45445,Harriet,Brooklyn,Flatbush,40.63702,-73.96327,Private room,150,1,0,,,1,365
97,21644,"Upper Manhattan, New York",82685,Elliott,Manhattan,Harlem,40.82803,-73.94731,Private room,89,1,1,2018-10-09,0.11,1,365


3. What room_types have the highest review numbers?

In [193]:
# air_bnb.columns

print(air_bnb.groupby('room_type')['number_of_reviews'].sum())

# It appears that Entire home/apt have the most reviews, with 580403.

room_type
Entire home/apt    580403
Private room       538346
Shared room         19256
Name: number_of_reviews, dtype: int64


# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered more details that were not asked about above, please describe them here.

-- Add your conclusion --