# AirBnB NY Locations Data Case Study

In this final project, your task will be to take the data provided and find evidence to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment. 
**Be advised:** I will go dark for the entire assignment period. Any questions you would like to ask about the data or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild.

Happy Coding!

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [4]:
air_bnb = pd.read_csv('AB_NYC_2019.csv')
air_bnb.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [11]:
# Which hosts are the busiest and why?
maxreviews = air_bnb.groupby(['host_id', 'host_name'])[['number_of_reviews']].sum()
maxreviews

host_id,host_name,number_of_reviews
2438,Tasos,1
2571,Teedo,27
2787,John,105
2845,Jennifer,46
2868,Letha M.,2
...,...,...
274273284,Anastasia,0
274298453,Adrien,0
274307600,Jonathan,0
274311461,Scott,0


In [16]:
# Another way to look at busy hosts: mean availability across each host's listings
hosta365 = air_bnb.groupby(['host_id', 'host_name'])[['availability_365']].mean()
hosta365 = hosta365[hosta365['availability_365'] <= 50]

# Join low-availability hosts (mean of 50 days or fewer) with total reviews - inner join
maxreviews.merge(hosta365, on=['host_id', 'host_name'], how='inner').sort_values('number_of_reviews', ascending=False).head()

host_id,host_name,number_of_reviews,availability_365
40176101,Brady,1818,48.857143
22959695,Gurpreet Singh,1157,0.0
156684502,Nalicia,1046,25.666667
50600973,Joyell,949,49.428571
2267153,John,846,7.0
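The inner merge above keeps only hosts that appear in both grouped frames. Both results are indexed by the same `(host_id, host_name)` levels, so `DataFrame.join` can express the same join. The sketch below runs on a small made-up frame; the host IDs, names and numbers are hypothetical, and only the column names mirror `AB_NYC_2019.csv`.

```python
import pandas as pd

# Hypothetical toy listings; column names mirror AB_NYC_2019.csv
listings = pd.DataFrame({
    'host_id':           [1, 1, 2, 3, 3],
    'host_name':         ['Ann', 'Ann', 'Bo', 'Cy', 'Cy'],
    'number_of_reviews': [10, 20, 5, 40, 2],
    'availability_365':  [0, 30, 300, 10, 20],
})

keys = ['host_id', 'host_name']
# Total reviews per host
total_reviews = listings.groupby(keys)[['number_of_reviews']].sum()
# Mean availability per host, kept only when it is 50 days or fewer
mean_avail = listings.groupby(keys)[['availability_365']].mean()
low_avail = mean_avail[mean_avail['availability_365'] <= 50]

# Both frames share the (host_id, host_name) MultiIndex, so join() aligns on it;
# how='inner' keeps only hosts present in both frames (Bo is dropped)
busy = total_reviews.join(low_avail, how='inner').sort_values('number_of_reviews', ascending=False)
print(busy)
```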


In [26]:
# How many neighborhood groups are available and which shows up the most?
# .nunique() on a Series returns the number of distinct values in it
print(f"There are {air_bnb['neighbourhood_group'].nunique()} neighbourhood groups!")
air_bnb.groupby('neighbourhood_group')[['id']].count().sort_values('id', ascending=False)
# Manhattan has the most listings.

There are 5 neighbourhood groups!


neighbourhood_group,id
Manhattan,21661
Brooklyn,20104
Queens,5666
Bronx,1091
Staten Island,373


In [32]:
# Are private rooms the most popular in manhattan?
# Before answering, filter the DataFrame down to Manhattan listings,
# then count how many listings of each room type it contains

manhattan = air_bnb[air_bnb['neighbourhood_group'] == 'Manhattan']

manhattan.groupby('room_type').count()[['neighbourhood_group']]
# One potential answer - how many airbnbs of this room type exist in Manhattan?
# By that interpretation, private rooms are not the most popular in Manhattan.
# The most popular room type is an entire home/apt.

room_type,neighbourhood_group
Entire home/apt,13199
Private room,7982
Shared room,480
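Raw counts answer "which room type exists most often". Another way to read "most popular" is each room type's *share* within a borough, which also makes boroughs comparable. `pd.crosstab` with `normalize='index'` computes those shares in one call. The sketch below uses a hypothetical six-row frame standing in for `air_bnb`.

```python
import pandas as pd

# Hypothetical toy data standing in for air_bnb
listings = pd.DataFrame({
    'neighbourhood_group': ['Manhattan', 'Manhattan', 'Manhattan', 'Brooklyn', 'Brooklyn', 'Brooklyn'],
    'room_type': ['Entire home/apt', 'Entire home/apt', 'Private room',
                  'Private room', 'Private room', 'Shared room'],
})

# Share of each room type within each borough; normalize='index' makes each row sum to 1
shares = pd.crosstab(listings['neighbourhood_group'], listings['room_type'], normalize='index')
print(shares.round(3))

# Most common room type per borough
print(shares.idxmax(axis=1))
```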


In [35]:
# What about average availability? Fewer open days per year suggests a room type is booked more often
manhattan.groupby('room_type')[['availability_365']].mean().sort_values('availability_365')
# By that measure (lowest average availability), private rooms are the most popular.

room_type,availability_365
Private room,101.845026
Entire home/apt,117.140996
Shared room,138.572917


In [53]:
# Which hosts are the busiest based on their reviews?
# Let's look at this through the lens of average reviews per month and the sum total of reviews
top_reviews = maxreviews[maxreviews['number_of_reviews'] > 527]  # from question 1; a new name keeps maxreviews intact
avgrpm = air_bnb.groupby(['host_id', 'host_name'])[['reviews_per_month']].mean()
avgrpm = avgrpm[avgrpm['reviews_per_month'] > 6.75]

top_reviews.merge(avgrpm, on=['host_id', 'host_name'], how='inner').sort_values('number_of_reviews', ascending=False)
# these hosts are in the top 1% for both total number of reviews and average reviews per month

host_id,host_name,number_of_reviews,reviews_per_month
37312959,Maya,2273,10.706
26432133,Danielle,2017,13.604
4734398,Jj,1798,7.68
47621202,Dona,1205,13.99
58391491,Juel,1154,7.436
156948703,Asad,1052,9.406667
156684502,Nalicia,1046,18.126667
121391142,Deloris,693,12.48
1314045,Tim,678,6.986667
55125246,Yvonne,653,8.556667
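The 527 and 6.75 cut-offs above are hard-coded, and the notebook does not show how they were derived. The closing comment says "top 1%", so assume they are meant to be 99th percentiles of the per-host values. Under that assumption, `Series.quantile(0.99)` computes them reproducibly. The per-host numbers below are hypothetical.

```python
import pandas as pd

# Hypothetical per-host values; in the notebook these would come from the groupby results
per_host = pd.DataFrame({
    'number_of_reviews': list(range(1, 101)),              # 1 .. 100
    'reviews_per_month': [x / 10 for x in range(1, 101)],  # 0.1 .. 10.0
})

# 99th-percentile cut-offs (linear interpolation by default) instead of hard-coded numbers
review_cut = per_host['number_of_reviews'].quantile(0.99)
rpm_cut = per_host['reviews_per_month'].quantile(0.99)

# Hosts above both cut-offs
top = per_host[(per_host['number_of_reviews'] > review_cut) &
               (per_host['reviews_per_month'] > rpm_cut)]
print(review_cut, rpm_cut, len(top))
```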


In [55]:
# Which neighborhood group has the highest average price?
air_bnb.groupby('neighbourhood_group')[['price']].mean().sort_values('price', ascending=False)
# Highest average price goes to Manhattan at about $196.88

neighbourhood_group,price
Manhattan,196.875814
Brooklyn,124.383207
Staten Island,114.812332
Queens,99.517649
Bronx,87.496792


In [56]:
# Which neighborhood group has the highest total price?
air_bnb.groupby('neighbourhood_group')[['price']].sum().sort_values('price', ascending=False)
# The highest total price also goes to Manhattan

neighbourhood_group,price
Manhattan,4264527
Brooklyn,2500600
Queens,563867
Bronx,95459
Staten Island,42825
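The mean and total rankings differ in the lower positions. A total depends on how many listings a borough has as well as on its prices: Staten Island has a higher mean than Queens but a far smaller total. Computing mean, sum and count side by side makes that visible. The sketch below uses hypothetical prices.

```python
import pandas as pd

# Hypothetical toy prices by borough
listings = pd.DataFrame({
    'neighbourhood_group': ['Manhattan', 'Manhattan', 'Brooklyn', 'Queens', 'Queens', 'Queens'],
    'price': [200, 180, 120, 90, 100, 110],
})

# One pass: mean price, total price and number of listings per borough
by_group = listings.groupby('neighbourhood_group')['price'].agg(['mean', 'sum', 'count'])
print(by_group.sort_values('sum', ascending=False))
```

Here Queens has a lower mean than Brooklyn but a higher total, because it has three listings instead of one.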


In [57]:
# Which top 5 hosts have the highest total price?
air_bnb.groupby(['host_id', 'host_name'])[['price']].sum().sort_values('price', ascending=False).head()


host_id,host_name,price
219517861,Sonder (NYC),82795
107434423,Blueground,70331
156158778,Sally,37097
205031545,Red Awning,35294
30283594,Kara,33581
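For a Series, `sort_values(ascending=False).head(n)` can be shortened to `nlargest(n)`. The sketch below uses hypothetical hosts and prices.

```python
import pandas as pd

# Hypothetical toy listings
listings = pd.DataFrame({
    'host_id':   [10, 10, 20, 30, 40, 50, 60],
    'host_name': ['A', 'A', 'B', 'C', 'D', 'E', 'F'],
    'price':     [100, 150, 300, 50, 80, 120, 90],
})

# Total price per host, then the five largest totals
totals = listings.groupby(['host_id', 'host_name'])['price'].sum()
top5 = totals.nlargest(5)
print(top5)
```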


In [45]:
sonderbnbs = air_bnb[air_bnb['host_name'] == 'Sonder (NYC)'].groupby(['host_name', 'neighbourhood_group','neighbourhood', 'room_type'])[['price']].sum().sort_values('price', ascending=False)
sonderbnbs

host_name,neighbourhood_group,neighbourhood,room_type,price
Sonder (NYC),Manhattan,Financial District,Entire home/apt,55303
Sonder (NYC),Manhattan,Murray Hill,Entire home/apt,11005
Sonder (NYC),Manhattan,Theater District,Entire home/apt,7743
Sonder (NYC),Manhattan,Hell's Kitchen,Entire home/apt,2789
Sonder (NYC),Manhattan,Financial District,Private room,2435
Sonder (NYC),Manhattan,Chelsea,Entire home/apt,1761
Sonder (NYC),Manhattan,Upper East Side,Entire home/apt,958
Sonder (NYC),Manhattan,Midtown,Entire home/apt,801


In [46]:
# Append the total as a row of the grouped DataFrame (this upcasts price to float)
sonderbnbs.loc['Total', 'price'] = sonderbnbs['price'].sum()
sonderbnbs

host_name,neighbourhood_group,neighbourhood,room_type,price
Sonder (NYC),Manhattan,Financial District,Entire home/apt,55303.0
Sonder (NYC),Manhattan,Murray Hill,Entire home/apt,11005.0
Sonder (NYC),Manhattan,Theater District,Entire home/apt,7743.0
Sonder (NYC),Manhattan,Hell's Kitchen,Entire home/apt,2789.0
Sonder (NYC),Manhattan,Financial District,Private room,2435.0
Sonder (NYC),Manhattan,Chelsea,Entire home/apt,1761.0
Sonder (NYC),Manhattan,Upper East Side,Entire home/apt,958.0
Sonder (NYC),Manhattan,Midtown,Entire home/apt,801.0
Total,,,,82795.0
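Writing a `Total` row with `.loc` on a MultiIndex works, but it leaves the other index levels empty and turns the prices into floats. One alternative is `pivot_table(..., margins=True)`, which adds both row and column totals. The sketch below uses a hypothetical four-listing frame rather than the real Sonder data.

```python
import pandas as pd

# Hypothetical toy listings for one host
listings = pd.DataFrame({
    'neighbourhood': ['Financial District', 'Financial District', 'Murray Hill', 'Chelsea'],
    'room_type': ['Entire home/apt', 'Private room', 'Entire home/apt', 'Entire home/apt'],
    'price': [300, 120, 250, 200],
})

# Sum of price per neighbourhood x room type; margins adds a 'Total' row and column,
# fill_value=0 replaces combinations that have no listings
summary = listings.pivot_table(index='neighbourhood', columns='room_type', values='price',
                               aggfunc='sum', fill_value=0,
                               margins=True, margins_name='Total')
print(summary)
```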


In [64]:
air_bnb['neighbourhood_group'].value_counts()
# Another way to show the neighbourhood group with the most listings:
# value_counts() returns the number of rows per group, already sorted in descending order

Manhattan        21661
Brooklyn         20104
Queens            5666
Bronx             1091
Staten Island      373
Name: neighbourhood_group, dtype: int64

In [66]:
air_bnb['neighbourhood'].value_counts().head()

Williamsburg          3920
Bedford-Stuyvesant    3714
Harlem                2658
Bushwick              2465
Upper West Side       1971
Name: neighbourhood, dtype: int64

In [73]:
# Who currently has no (zero) availability with a review count of 100 or more?

# Interpreting "who" as a single listing:
zero_avail = air_bnb[air_bnb['availability_365'] == 0]
zero_avail = zero_avail[zero_avail['number_of_reviews'] >= 100]
zero_avail  # 162 listings currently have zero availability and 100 or more reviews

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
8,5203,Cozy Clean Guest Room - Family Apt,7490,MaryEllen,Manhattan,Upper West Side,40.80178,-73.96723,Private room,79,2,118,2017-07-21,0.99,1,0
94,20913,Charming 1 bed GR8 WBurg LOCATION!,79402,Christiana,Brooklyn,Williamsburg,40.70984,-73.95775,Entire home/apt,100,5,168,2018-07-22,1.57,1,0
132,30031,NYC artists’ loft with roof deck,129352,Sol,Brooklyn,Greenpoint,40.73494,-73.95030,Private room,50,3,193,2019-05-20,1.86,1,0
174,44221,Financial District Luxury Loft,193722,Coral,Manhattan,Financial District,40.70666,-74.01374,Entire home/apt,196,3,114,2019-06-20,1.06,1,0
180,45556,"Fort Greene, Brooklyn: Center Bedroom",67778,Doug,Brooklyn,Fort Greene,40.68863,-73.97691,Private room,65,2,206,2019-06-30,1.92,2,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
29581,22705516,The Quietest Block in Manhattan :),127740507,Kathleen,Manhattan,Harlem,40.83102,-73.94181,Private room,65,2,103,2019-07-07,5.89,2,0
30461,23574142,queens get away!!,176185168,Janet,Queens,Laurelton,40.68209,-73.73662,Private room,65,1,119,2018-12-24,7.79,1,0
31250,24267706,entire sunshine of the spotless mind room,21074914,Albert,Brooklyn,Bedford-Stuyvesant,40.68234,-73.91318,Private room,49,1,102,2019-07-05,6.73,3,0
32670,25719044,COZY Room for Female Guests,40119874,Stephany,Brooklyn,Prospect-Lefferts Gardens,40.66242,-73.94417,Private room,48,1,131,2019-05-31,9.97,2,0


In [74]:
# Per-listing filter first, then total the reviews of the matching listings per host
air_bnb[(air_bnb['availability_365'] == 0) & (air_bnb['number_of_reviews'] >= 100)].groupby(['host_id', 'host_name'])[['number_of_reviews']].sum().sort_values('number_of_reviews')


host_id,host_name,number_of_reviews
42399786,Braydon,100
22423049,Abraham,100
84141923,Marisha,100
96148809,Raymond,100
1492339,Karin,101
...,...,...
37818581,Sofia,432
792159,Wanda,480
121391142,Deloris,693
99392252,Michael,732


In [88]:
# Are there any hosts that meet these criteria across EVERY listing?
# i.e. the host's total reviews are 100 or more and the summed availability is 0 (every listing fully booked)
hosts = air_bnb.groupby(['host_id', 'host_name'])[['number_of_reviews', 'availability_365']].sum()
hosts = hosts[hosts['availability_365'] == 0]
hosts = hosts[hosts['number_of_reviews'] >= 100].sort_values('number_of_reviews', ascending=False)
hosts

host_id,host_name,number_of_reviews,availability_365
22959695,Gurpreet Singh,1157,0
99392252,Michael,732,0
121391142,Deloris,693,0
792159,Wanda,480,0
37818581,Sofia,479,0
...,...,...,...
22423049,Abraham,100,0
42399786,Braydon,100,0
21090508,Jarad,100,0
140293912,Awilda,100,0
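`availability_365` is never negative, so a per-host sum of zero is the same as every listing having zero availability. Taking the per-host maximum states that intent more directly. Like the cell above, this sketch sums reviews across all of a host's listings rather than requiring 100 reviews per listing. The hosts and values are hypothetical.

```python
import pandas as pd

# Hypothetical toy listings
listings = pd.DataFrame({
    'host_id':           [1, 1, 2, 2, 3],
    'host_name':         ['Ann', 'Ann', 'Bo', 'Bo', 'Cy'],
    'availability_365':  [0, 0, 0, 45, 0],
    'number_of_reviews': [60, 50, 200, 10, 90],
})

# Named aggregation: highest availability and total reviews per host
per_host = listings.groupby(['host_id', 'host_name']).agg(
    max_avail=('availability_365', 'max'),
    total_reviews=('number_of_reviews', 'sum'),
)

# max_avail == 0 means every listing of that host has zero availability
fully_booked = per_host[(per_host['max_avail'] == 0) & (per_host['total_reviews'] >= 100)]
print(fully_booked)
```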


In [90]:
# What host has the highest total of prices and where are they located?
sonderbnbs = air_bnb[air_bnb['host_name'] == 'Sonder (NYC)'].groupby(['host_name', 'neighbourhood_group','neighbourhood'])[['price']].sum().sort_values('price', ascending=False)
sonderbnbs


host_name,neighbourhood_group,neighbourhood,price
Sonder (NYC),Manhattan,Financial District,57738
Sonder (NYC),Manhattan,Murray Hill,11005
Sonder (NYC),Manhattan,Theater District,7743
Sonder (NYC),Manhattan,Hell's Kitchen,2789
Sonder (NYC),Manhattan,Chelsea,1761
Sonder (NYC),Manhattan,Upper East Side,958
Sonder (NYC),Manhattan,Midtown,801


In [91]:
air_bnb[air_bnb['host_name'] == 'Sonder (NYC)']

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
38293,30181691,Sonder | 180 Water | Incredible 2BR + Rooftop,219517861,Sonder (NYC),Manhattan,Financial District,40.70637,-74.00645,Entire home/apt,302,29,0,,,327,309
38294,30181945,Sonder | 180 Water | Premier 1BR + Rooftop,219517861,Sonder (NYC),Manhattan,Financial District,40.70771,-74.00641,Entire home/apt,229,29,1,2019-05-29,0.73,327,219
38588,30347708,Sonder | 180 Water | Charming 1BR + Rooftop,219517861,Sonder (NYC),Manhattan,Financial District,40.70743,-74.00443,Entire home/apt,232,29,1,2019-05-21,0.60,327,159
39769,30937590,Sonder | The Nash | Artsy 1BR + Rooftop,219517861,Sonder (NYC),Manhattan,Murray Hill,40.74792,-73.97614,Entire home/apt,262,2,8,2019-06-09,1.86,327,91
39770,30937591,Sonder | The Nash | Lovely Studio + Rooftop,219517861,Sonder (NYC),Manhattan,Murray Hill,40.74771,-73.97528,Entire home/apt,255,2,14,2019-06-10,2.59,327,81
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
47691,35871510,Sonder | 116 John | Vibrant Studio + Fitness Room,219517861,Sonder (NYC),Manhattan,Financial District,40.70818,-74.00631,Entire home/apt,135,29,0,,,327,339
47692,35871511,Sonder | 116 John | Vibrant 1BR + Fitness Room,219517861,Sonder (NYC),Manhattan,Financial District,40.70691,-74.00682,Entire home/apt,165,29,0,,,327,342
47693,35871515,Sonder | 116 John | Stunning 1BR + Rooftop,219517861,Sonder (NYC),Manhattan,Financial District,40.70772,-74.00673,Entire home/apt,165,29,0,,,327,347
47814,35936418,Sonder | 116 John | Polished Studio + Gym,219517861,Sonder (NYC),Manhattan,Financial District,40.70840,-74.00518,Entire home/apt,699,29,0,,,327,327


In [102]:
# When did Danielle from Queens last receive a review?
danielles = air_bnb[air_bnb['host_name'] == 'Danielle']
danielles = danielles[danielles['neighbourhood_group'] == 'Queens']
# last_review holds ISO dates (YYYY-MM-DD), so a descending string sort puts the most recent first; missing values sort last
latest = danielles.sort_values('last_review', ascending=False).iloc[0]
# More than one host may be named Danielle, so report which one (by host_id) owns that listing
print(f"Danielle with the host_id {latest['host_id']} last received a review on {latest['last_review']}.")

Danielle with the host_id 26432133 last received a review on 2019-07-08.
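The string sort works because the dates are ISO-formatted. Parsing `last_review` with `pd.to_datetime` removes that dependency. Missing dates then become `NaT`, which `idxmax` skips by default. The host IDs and dates below are hypothetical.

```python
import pandas as pd

# Hypothetical toy listings
listings = pd.DataFrame({
    'host_id':             [111, 111, 222, 333],
    'host_name':           ['Danielle', 'Danielle', 'Danielle', 'Danielle'],
    'neighbourhood_group': ['Queens', 'Queens', 'Queens', 'Brooklyn'],
    'last_review':         ['2019-06-01', None, '2019-07-08', '2019-07-09'],
})

# Keep Queens listings hosted by a Danielle; copy() avoids modifying a view of the original
queens = listings[(listings['host_name'] == 'Danielle') &
                  (listings['neighbourhood_group'] == 'Queens')].copy()
queens['last_review'] = pd.to_datetime(queens['last_review'])  # None becomes NaT

# Row with the most recent review date (NaT is skipped)
latest = queens.loc[queens['last_review'].idxmax()]
print(latest['host_id'], latest['last_review'].date())
```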


## Further Questions

1. Which host has the most listings?

In [107]:
air_bnb.groupby(['host_id', 'host_name'])[['name']].count().sort_values('name', ascending=False).head(2)

host_id,host_name,name
219517861,Sonder (NYC),327
107434423,Blueground,232
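`count()` on the `name` column skips listings whose name is missing. `size()` counts every row in the group, so it is the safer way to count listings per host. This sketch uses a hypothetical frame with one missing name.

```python
import numpy as np
import pandas as pd

# Hypothetical toy listings; one listing has no name
listings = pd.DataFrame({
    'host_id':   [1, 1, 1, 2, 2],
    'host_name': ['Ann', 'Ann', 'Ann', 'Bo', 'Bo'],
    'name':      ['Loft', np.nan, 'Studio', 'Room', 'Suite'],
})

keys = ['host_id', 'host_name']
by_count = listings.groupby(keys)['name'].count()  # skips the missing name
by_size = listings.groupby(keys).size()            # counts every row
print(by_count.sort_values(ascending=False).head(2))
print(by_size.sort_values(ascending=False).head(2))
```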


2. How many listings have completely open availability?

In [110]:
air_bnb[air_bnb['availability_365'] == 365].shape[0]
# 1,295 listings have completely open availability

1295

3. What room_types have the highest review numbers?

In [124]:
# group by room_type and compute:
# 1. count of listings (ids)
# 2. sum of number_of_reviews
# 3. mean reviews_per_month
# then merge the three results
a = air_bnb.groupby(['room_type'])[['number_of_reviews']].sum()
b = air_bnb.groupby(['room_type'])[['reviews_per_month']].mean()
c = air_bnb.groupby(['room_type'])[['id']].count()
m1 = a.merge(b, on='room_type', how='inner')
m2 = c.merge(m1, on='room_type', how='inner')
m2 = m2.rename({'id':'Total Listings', 'number_of_reviews': 'Total Reviews', 'reviews_per_month': 'Mean Reviews per Month'}, axis='columns')

In [126]:
m2

room_type,Total Listings,Total Reviews,Mean Reviews per Month
Entire home/apt,25409,580403,1.306578
Private room,22326,538346,1.445209
Shared room,1160,19256,1.471726
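The three groupbys and two merges can also be written as one named aggregation. Passing a dict through `**` allows output column names that contain spaces. `mean` skips missing values. In this data, a listing with zero reviews has an empty `reviews_per_month` (see the `head()` output), so the mean covers reviewed listings only. The toy frame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy listings; the listing with 0 reviews has no reviews_per_month
listings = pd.DataFrame({
    'id':                [1, 2, 3, 4, 5],
    'room_type':         ['Entire home/apt', 'Entire home/apt', 'Private room', 'Private room', 'Shared room'],
    'number_of_reviews': [10, 0, 5, 7, 3],
    'reviews_per_month': [1.0, np.nan, 0.5, 1.5, 2.0],
})

# One groupby producing all three columns with their final names
summary = listings.groupby('room_type').agg(**{
    'Total Listings': ('id', 'count'),
    'Total Reviews': ('number_of_reviews', 'sum'),
    'Mean Reviews per Month': ('reviews_per_month', 'mean'),  # NaN is skipped
})
print(summary)
```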


# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered some more details that were not asked above, please describe them here.

-- Add your conclusion --