# AirBnB NY Locations Data Case Study

In this final project, your task is to take the data provided and find evidence to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?


In [72]:
import pandas as pd
import numpy as np
bnb_data = pd.read_csv(r'AB_NYC_2019.csv')

print(bnb_data.shape)
print(bnb_data.columns)

# make DF of unique IDs with summary stats

uh_total_reviews = (bnb_data.groupby('host_id')['number_of_reviews'].sum()).to_frame(name='total_reviews')
uh_total_rpm = (bnb_data.groupby('host_id')['reviews_per_month'].sum()).to_frame(name='tot_reviews_per_month')
uh_avg_avail = (bnb_data.groupby('host_id')['availability_365'].mean()).to_frame(name='avg_availability')
uh_avg_price = (bnb_data.groupby('host_id')['price'].mean()).to_frame(name='avg_price')

uh_sum_stats = pd.concat([uh_total_reviews, uh_total_rpm, uh_avg_avail, uh_avg_price], axis = 1).reset_index()

# creating a simplified version of main DF that omits columns not needed for host profile DF.
bnb_simplified = bnb_data.drop(['name', 'id', 'neighbourhood', 'latitude', 'longitude', 'price', 'minimum_nights', 'reviews_per_month', 'availability_365'], axis=1)

# drop duplicate host IDs, but first sorting by last review date, so only listing with the most recent review is kept for each unique host.
unique_hosts = uh_sum_stats.merge(bnb_simplified.sort_values('last_review', ascending=False).drop_duplicates('host_id'), on='host_id', how='inner')

def set_host_status(row):
    if row['total_reviews'] == 0:
        status = 'no history'
    elif row['last_review'] > '2019':
        status = 'active'
    else:
        status = 'inactive'
    return status

unique_hosts['status'] = unique_hosts.apply(set_host_status, axis = 1)

''' Old method for setting host status, keeping it to remember that process'''
# # hosts with no reviews (for this analysis, no reviews will be interpreted as no bookings)
# no_review_hosts = unique_hosts[unique_hosts['total_reviews'] == 0]
# no_review_hosts['status'] = 'inactive'

# # hosts who have not booked in the last 12 months
# inactive_hosts = unique_hosts[(unique_hosts['last_review'] < '2019') & (unique_hosts['last_review']> '2000')]

# # hosts active in 2019
# active_hosts = unique_hosts[unique_hosts['last_review'] > '2019']

# display(inactive_hosts)
'''end old method'''
    

# Unique Hosts DF now contains summary stats, plus host name, neighborhood group and room type of most recently reviewed property 
display(unique_hosts.head())

# display(no_review_hosts.groupby('neighbourhood_group').size())

display(unique_hosts.groupby(['status', 'neighbourhood_group']).size())



(48895, 16)
Index(['id', 'name', 'host_id', 'host_name', 'neighbourhood_group',
       'neighbourhood', 'latitude', 'longitude', 'room_type', 'price',
       'minimum_nights', 'number_of_reviews', 'last_review',
       'reviews_per_month', 'calculated_host_listings_count',
       'availability_365'],
      dtype='object')


Unnamed: 0,host_id,total_reviews,tot_reviews_per_month,avg_availability,avg_price,host_name,neighbourhood_group,room_type,number_of_reviews,last_review,calculated_host_listings_count,status
0,2438,1,0.06,0.0,95.0,Tasos,Brooklyn,Entire home/apt,1,2018-03-17,1,inactive
1,2571,27,0.37,23.0,182.0,Teedo,Brooklyn,Entire home/apt,27,2019-05-21,1,active
2,2787,105,2.88,235.333333,100.666667,John,Brooklyn,Private room,17,2019-06-26,6,active
3,2845,46,0.46,360.0,162.0,Jennifer,Manhattan,Entire home/apt,45,2019-05-21,2,active
4,2868,2,0.06,221.0,60.0,Letha M.,Brooklyn,Entire home/apt,2,2017-07-31,1,inactive


status      neighbourhood_group
active      Bronx                   491
            Brooklyn               7989
            Manhattan              7947
            Queens                 2329
            Staten Island           188
inactive    Bronx                   137
            Brooklyn               5062
            Manhattan              5220
            Queens                  860
            Staten Island            28
no history  Bronx                   156
            Brooklyn               2875
            Manhattan              3376
            Queens                  760
            Staten Island            39
dtype: int64

In [74]:
### Don't run the line below again, or the status column will be added again! 🙈
bnb_data = pd.merge(bnb_data, unique_hosts[['host_id', 'status']], how='left', on='host_id')
bnb_data.rename(columns= {'status':'host_status'}, inplace=True)
display(bnb_data)



Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,host_status
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365,active
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355,active
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.94190,Private room,150,3,0,,,1,365,no history
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194,active
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.10,1,0,inactive
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48890,36484665,Charming one bedroom - newly renovated rowhouse,8232441,Sabrina,Brooklyn,Bedford-Stuyvesant,40.67853,-73.94995,Private room,70,2,0,,,2,9,active
48891,36485057,Affordable room in Bushwick/East Williamsburg,6570630,Marisol,Brooklyn,Bushwick,40.70184,-73.93317,Private room,40,4,0,,,2,36,no history
48892,36485431,Sunny Studio at Historical Neighborhood,23492952,Ilgar & Aysel,Manhattan,Harlem,40.81475,-73.94867,Entire home/apt,115,10,0,,,1,27,no history
48893,36485609,43rd St. Time Square-cozy single bed,30985759,Taz,Manhattan,Hell's Kitchen,40.75751,-73.99112,Shared room,55,1,0,,,6,2,active
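
The re-run hazard flagged in the cell above can be avoided by guarding the merge. This is a minimal sketch on toy data, not the notebook's own method; the helper name add_host_status and the toy frames are hypothetical stand-ins for bnb_data and unique_hosts.

```python
import pandas as pd

def add_host_status(listings: pd.DataFrame, hosts: pd.DataFrame) -> pd.DataFrame:
    """Attach each host's status to their listings; safe to call repeatedly."""
    if 'host_status' in listings.columns:
        # Already merged: return unchanged instead of adding a duplicate column.
        return listings
    status = hosts[['host_id', 'status']].rename(columns={'status': 'host_status'})
    return listings.merge(status, how='left', on='host_id')

# Toy stand-ins for bnb_data and unique_hosts (hypothetical values).
toy_listings = pd.DataFrame({'id': [1, 2, 3], 'host_id': [10, 10, 20]})
toy_hosts = pd.DataFrame({'host_id': [10, 20], 'status': ['active', 'inactive']})

once = add_host_status(toy_listings, toy_hosts)
twice = add_host_status(once, toy_hosts)  # second call is a no-op
```

Renaming the column before the merge, rather than after, also removes the separate rename step.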



In [75]:
host_names_ids = bnb_data[['host_id', 'host_name', 'host_status']].drop_duplicates()
print(host_names_ids.head())

   host_id    host_name host_status
0     2787         John      active
1     2845     Jennifer      active
2     4632    Elisabeth  no history
3     4869  LisaRoxanne      active
4     7192        Laura    inactive


In [125]:
# Which hosts are the busiest and why?

# also factor in reviews per month

# Finding hosts with multiple properties
multi_hosts = bnb_data.groupby(['host_id'])['id'].count().sort_values(ascending=False)

# Reframing results as DataFrame
multi_hosts = multi_hosts.to_frame(name='property_count')

# Filtering results to hosts with at least 5 properties
mh_mask = multi_hosts['property_count'] >= 5
multi_hosts = multi_hosts[mh_mask]

## display(multi_hosts)

#finding average availability by host id
avg_avail_by_host = bnb_data.groupby(['host_id'])['availability_365'].mean().sort_values()

# Reshaping Series to DF
avg_avail_by_host = avg_avail_by_host.to_frame(name='mean_availability')

## display(avg_avail_by_host)

# calculating total reviews per month by host
host_monthly_reviews = bnb_data.groupby(['host_id'])['reviews_per_month'].sum(numeric_only=True).sort_values()
host_monthly_reviews = host_monthly_reviews.to_frame(name='total_rev_per_mo')
## display(host_monthly_reviews)

# joining the two full-length host lists

host_stats = host_monthly_reviews.join(avg_avail_by_host, on='host_id', how='inner')
## display(host_stats)


#-------
# Joining to compare sets

busy_hosts = multi_hosts.join(host_stats, on='host_id', how='inner')

busy_hosts_named = pd.merge(busy_hosts, unique_hosts[['host_id', 'host_name', 'status']], how='left', on='host_id')

busy_hosts_named['rev_per_property'] = busy_hosts_named['total_rev_per_mo'] / busy_hosts_named['property_count']

# done computing, time to display

display(busy_hosts_named.sort_values('total_rev_per_mo', ascending=False).head(10))

display(busy_hosts_named.sort_values('rev_per_property', ascending=False).head(10))

# Deprecating the below calculation after researching availability_365: https://www.kaggle.com/datasets/dgomonov/new-york-city-airbnb-open-data/discussion/111835
# display(busy_hosts_named[busy_hosts_named['mean_availability'] > 0].sort_values('mean_availability').head(10))

Unnamed: 0,host_id,property_count,total_rev_per_mo,mean_availability,host_name,status,rev_per_property
0,219517861,327,397.56,301.492355,Sonder (NYC),active,1.21578
139,244361589,9,111.72,292.555556,Row NYC,active,12.413333
189,232251881,8,80.63,171.125,Lakshmee,active,10.07875
404,26432133,5,68.02,288.6,Danielle,active,13.604
77,137274917,12,62.89,235.583333,David,active,5.240833
26,224414117,30,59.1,289.3,Gabriel,active,1.97
303,156948703,6,56.44,339.666667,Asad,active,9.406667
67,344035,13,56.0,286.384615,Brooklyn& Breakfast -Len-,active,4.307692
357,37312959,5,53.53,164.8,Maya,active,10.706
18,119669058,34,45.17,326.558824,Melissa,active,1.328529


Unnamed: 0,host_id,property_count,total_rev_per_mo,mean_availability,host_name,status,rev_per_property
404,26432133,5,68.02,288.6,Danielle,active,13.604
139,244361589,9,111.72,292.555556,Row NYC,active,12.413333
357,37312959,5,53.53,164.8,Maya,active,10.706
189,232251881,8,80.63,171.125,Lakshmee,active,10.07875
303,156948703,6,56.44,339.666667,Asad,active,9.406667
355,187822288,5,43.11,186.4,Zahir,active,8.622
438,264092618,5,40.36,83.0,Ajmol,active,8.072
462,155691570,5,39.9,82.6,Mili,active,7.98
371,58391491,5,37.18,124.6,Juel,active,7.436
473,22959695,5,30.19,0.0,Gurpreet Singh,inactive,6.038


In [94]:
print(busy_hosts_named.columns)

busy_hosts_named[busy_hosts_named['mean_availability'] > 0].sort_values('mean_availability')


Index(['host_id', 'property_count', 'total_rev_per_mo', 'mean_availability',
       'host_name', 'status', 'rev_per_property'],
      dtype='object')


Unnamed: 0,host_id,property_count,total_rev_per_mo,mean_availability,host_name,status,rev_per_property
369,232578558,5,6.72,4.200000,Nick & D.,active,1.344000
491,43044876,5,2.12,7.200000,Haruhisa,active,0.424000
55,204852306,14,0.62,10.285714,Dee,inactive,0.044286
321,61042,6,6.60,11.333333,Marlon,active,1.100000
17,19303369,37,5.42,12.324324,Hiroki,active,0.146486
...,...,...,...,...,...,...,...
336,26556695,6,3.09,364.833333,Alyssa,active,0.515000
430,259776956,5,11.17,365.000000,Gladwyn,active,2.234000
282,9130040,6,3.74,365.000000,Candace,active,0.623333
76,96098402,12,2.29,365.000000,Wynpoints,inactive,0.190833


In [109]:
# How many neighborhood groups are available and which shows up the most?
print('number of neighborhood groups:')
display(bnb_data.groupby('neighbourhood_group').size().count())

print('total listings per neighborhood group:')
display(bnb_data.groupby('neighbourhood_group').size().sort_values(ascending=False))

print('total rooms available:')
display(bnb_data.groupby('neighbourhood_group')['availability_365'].sum().sort_values(ascending=False))


print('listings per neighborhood group, active hosts only:')
display(bnb_data[bnb_data['host_status']=='active'].groupby('neighbourhood_group').size().sort_values(ascending=False))



number of neighborhood groups:


5

total listings per neighborhood group:


neighbourhood_group
Manhattan        21661
Brooklyn         20104
Queens            5666
Bronx             1091
Staten Island      373
dtype: int64

total rooms available:


neighbourhood_group
Manhattan        2425586
Brooklyn         2015070
Queens            818464
Bronx             180843
Staten Island      74480
Name: availability_365, dtype: int64

listings per neighborhood group, active hosts only:


neighbourhood_group
Manhattan        12399
Brooklyn         11596
Queens            3869
Bronx              766
Staten Island      298
dtype: int64

In [116]:
groups_active_listings = bnb_data[bnb_data['host_status']=='active'].groupby('neighbourhood_group').size().sort_values(ascending=False).to_frame(name='active_listings')
groups_total_stay_nights = bnb_data.groupby('neighbourhood_group')['availability_365'].sum().sort_values(ascending=False).to_frame(name = 'total_stay_nights')

ng_rooms = groups_active_listings.merge(groups_total_stay_nights, on ='neighbourhood_group', how='outer')
display(ng_rooms)

ng_rooms['stay_nights_per_listing'] = round(ng_rooms['total_stay_nights']/ng_rooms['active_listings'], 2)

display(ng_rooms.sort_values('stay_nights_per_listing', ascending=False))

Unnamed: 0_level_0,active_listings,total_stay_nights
neighbourhood_group,Unnamed: 1_level_1,Unnamed: 2_level_1
Manhattan,12399,2425586
Brooklyn,11596,2015070
Queens,3869,818464
Bronx,766,180843
Staten Island,298,74480


Unnamed: 0_level_0,active_listings,total_stay_nights,stay_nights_per_listing
neighbourhood_group,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Staten Island,298,74480,249.93
Bronx,766,180843,236.09
Queens,3869,818464,211.54
Manhattan,12399,2425586,195.63
Brooklyn,11596,2015070,173.77


In [123]:
# Are private rooms the most popular in Manhattan? (in Manhattan, are private rooms most popular?)
'''
mh_list = bnb_data[bnb_data['neighbourhood_group'] == 'Manhattan']

mh_popular_type = mh_list.groupby('room_type').size().sort_values(ascending=False)

display(mh_popular_type)
'''

bnb_data[bnb_data['host_status']=='active'].groupby(['neighbourhood_group', 'room_type']).size().sort_values(ascending=False)


neighbourhood_group  room_type      
Manhattan            Entire home/apt    7645
Brooklyn             Entire home/apt    5732
                     Private room       5607
Manhattan            Private room       4459
Queens               Private room       2317
                     Entire home/apt    1419
Bronx                Private room        453
Manhattan            Shared room         295
Bronx                Entire home/apt     279
Brooklyn             Shared room         257
Staten Island        Private room        147
                     Entire home/apt     143
Queens               Shared room         133
Bronx                Shared room          34
Staten Island        Shared room           8
dtype: int64

In [17]:
# Which hosts are the busiest based on their reviews?

display(bnb_data.groupby('host_id').size().sort_values(ascending=False).head())

# reviews per month, average reviews per month



host_id
219517861    327
107434423    232
30283594     121
137358866    103
16098958      96
dtype: int64

host_name
Michael         417
David           403
Sonder (NYC)    327
John            294
Alex            279
dtype: int64

In [129]:
# Which neighborhood group has the highest average price?

# display(bnb_data.groupby('neighbourhood_group').mean()['price'])

display(bnb_data[bnb_data['host_status']== 'active'].groupby('neighbourhood_group').mean(numeric_only = True).round(2)['price'].sort_values(ascending=False))

neighbourhood_group
Manhattan        190.45
Brooklyn         123.97
Queens            93.27
Staten Island     90.99
Bronx             81.24
Name: price, dtype: float64

In [140]:
# Which neighborhood group has the highest total price?
# highest value calculation

# select 'price' before summing so non-numeric columns are never aggregated
display(bnb_data.groupby('neighbourhood_group')['price'].sum().sort_values(ascending=False))
display(bnb_data[bnb_data['host_status']=='active'].groupby('neighbourhood_group')['price'].sum().sort_values(ascending=False))


# highest individual listing calculation

bnb_data['unit_value'] = bnb_data['price'] * bnb_data['availability_365']

bnb_data.sort_values('unit_value', ascending=False).head()

neighbourhood_group
Manhattan        4264527
Brooklyn         2500600
Queens            563867
Bronx              95459
Staten Island      42825
Name: price, dtype: int64

neighbourhood_group
Manhattan        2361448
Brooklyn         1437614
Queens            360846
Bronx              62232
Staten Island      27115
Name: price, dtype: int64

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,host_status,unit_value
40433,31340283,2br - The Heart of NYC: Manhattans Lower East ...,4382127,Matt,Manhattan,Lower East Side,40.7198,-73.98566,Entire home/apt,9999,30,0,,,1,365,no history,3649635
4377,2953058,Film Location,1177497,Jessica,Brooklyn,Clinton Hill,40.69137,-73.96723,Entire home/apt,8000,1,1,2016-09-15,0.03,11,365,active,2920000
42523,33007610,70' Luxury MotorYacht on the Hudson,7407743,Jack,Manhattan,Battery Park City,40.71162,-74.01693,Entire home/apt,7500,1,0,,,1,364,no history,2730000
44034,33998396,3000 sq ft daylight photo studio,3750764,Kevin,Manhattan,Chelsea,40.7506,-74.00388,Entire home/apt,6800,1,0,,,6,364,no history,2475200
48043,36056808,Luxury TriBeCa Apartment at an amazing price,271248669,Jenny,Manhattan,Tribeca,40.71206,-74.00999,Entire home/apt,6500,180,0,,,1,365,no history,2372500


In [143]:
ng_values = bnb_data[bnb_data['host_status']=='active'].groupby('neighbourhood_group')['unit_value'].sum().sort_values(ascending=False).to_frame(name='ng_total_value')

ng_values['val_percent'] = ng_values['ng_total_value'] / ng_values['ng_total_value'].sum()

display(ng_values)

Unnamed: 0_level_0,ng_total_value,val_percent
neighbourhood_group,Unnamed: 1_level_1,Unnamed: 2_level_1
Manhattan,411788585,0.580702
Brooklyn,214649569,0.302698
Queens,64857882,0.091462
Bronx,11908010,0.016793
Staten Island,5917410,0.008345


In [133]:
bnb_data[bnb_data['host_status']=='active'].sort_values('unit_value', ascending=False).head(10)

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,host_status,unit_value
4377,2953058,Film Location,1177497,Jessica,Brooklyn,Clinton Hill,40.69137,-73.96723,Entire home/apt,8000,1,1,2016-09-15,0.03,11,365,active,2920000
43009,33397385,Midtown Manhattan great location (Gramacy park),16105313,Debra,Manhattan,Midtown,40.74482,-73.98367,Entire home/apt,5100,30,1,2019-06-22,1.0,2,343,active,1749300
4376,2952861,Photography Location,1177497,Jessica,Brooklyn,Clinton Hill,40.69127,-73.96563,Entire home/apt,4500,1,5,2018-12-29,0.09,11,365,active,1642500
45666,34895693,Gem of east Flatbush,262534951,Sandra,Brooklyn,East Flatbush,40.65724,-73.9245,Private room,7500,1,8,2019-07-07,6.15,2,179,active,1342500
29662,22779726,East 72nd Townhouse by (Hidden by Airbnb),156158778,Sally,Manhattan,Upper East Side,40.76824,-73.95989,Entire home/apt,7703,1,0,,,12,146,active,1124638
28858,22263855,SPECTACULAR SOHO GREAT ROOM LOFT 6000sq feet,6145729,Stephanie,Manhattan,SoHo,40.72605,-74.00572,Entire home/apt,3000,7,1,2019-06-30,1.0,1,325,active,975000
42680,33133321,Majestic Mansion LifeStyle :),74373729,Shah,Queens,Bayside,40.77811,-73.77069,Entire home/apt,2600,6,3,2019-05-30,1.73,1,362,active,941200
21955,17666300,Ultimate 50th Floor Downtown Penthouse - 4000...,120250860,Diana,Manhattan,SoHo,40.72318,-74.00223,Entire home/apt,2250,2,21,2019-07-01,0.74,2,343,active,771750
22407,18094418,4000 SqFt Luxury Penthouse - Downtown NYC,120250860,Diana,Manhattan,SoHo,40.72207,-74.00232,Entire home/apt,2250,2,20,2019-06-15,0.82,2,341,active,767250
29663,22779746,East 7th Street III by (Hidden by Airbnb),156158778,Sally,Manhattan,East Village,40.72779,-73.98644,Entire home/apt,3518,1,0,,,12,215,active,756370


In [166]:
# Which top 5 hosts have the highest total price?

top5_hosts = bnb_data.groupby('host_id').sum(numeric_only=True).sort_values('price', ascending=False)['price'].to_frame().head()

display(top5_hosts)
# add an inner join to add host name

named_top5 = top5_hosts.merge(unique_hosts, on='host_id', how='inner')

named_top5[['host_name', 'host_id', 'price']]


top_host = top5_hosts.index[0]

top_host

Unnamed: 0_level_0,price
host_id,Unnamed: 1_level_1
219517861,82795
107434423,70331
156158778,37097
205031545,35294
30283594,33581


219517861

In [163]:
# Who currently has no (zero) availability with a review count of 100 or more?

no_avail = bnb_data[bnb_data['availability_365'] == 0]

over_100_review = no_avail[no_avail['number_of_reviews'] >= 100]

print(over_100_review.shape)
display(over_100_review.groupby('host_status').size())

display(over_100_review[['last_review', 'number_of_reviews', 'reviews_per_month']].sort_values('number_of_reviews', ascending=False).head(20))

display(over_100_review[['last_review', 'number_of_reviews', 'reviews_per_month']].sort_values('last_review').head(20))
display(over_100_review[['last_review', 'number_of_reviews' ,'reviews_per_month']].sort_values('last_review').tail(20))
# size 

(162, 18)


host_status
active      112
inactive     50
dtype: int64

Unnamed: 0,last_review,number_of_reviews,reviews_per_month
471,2019-07-07,480,6.7
9974,2018-11-25,424,8.86
9976,2018-11-24,408,8.56
22104,2019-06-21,368,13.24
5876,2019-06-24,351,6.09
22100,2019-06-12,325,11.72
890,2019-07-07,320,3.6
6888,2019-06-24,318,5.78
1242,2019-06-19,304,3.7
13539,2019-06-16,255,5.95


Unnamed: 0,last_review,number_of_reviews,reviews_per_month
802,2015-10-31,101,1.1
969,2016-01-03,116,1.32
813,2016-08-11,157,1.71
1603,2017-01-01,164,2.07
2742,2017-05-04,130,1.85
415,2017-05-25,115,1.18
13320,2017-07-10,112,2.63
8,2017-07-21,118,0.99
5411,2017-07-28,101,1.71
12375,2017-07-31,100,2.28


Unnamed: 0,last_review,number_of_reviews,reviews_per_month
180,2019-06-30,206,1.92
1933,2019-07-01,145,1.82
1089,2019-07-01,178,2.06
4942,2019-07-01,160,2.78
17986,2019-07-02,147,4.1
27007,2019-07-02,108,5.18
23562,2019-07-02,144,5.65
13769,2019-07-02,120,2.81
6404,2019-07-03,148,2.88
10856,2019-07-03,141,3.09


In [177]:
# What host has the highest total of prices and where are they located?
# top_host name taken from indexing top 5


df_of_top_host = bnb_data[bnb_data['host_id'] == top_host][['neighbourhood', 'neighbourhood_group']]

display(top5_hosts.merge(unique_hosts, on='host_id', how='inner'))

display(df_of_top_host.groupby(['neighbourhood_group','neighbourhood']).size())



Unnamed: 0,host_id,price,total_reviews,tot_reviews_per_month,avg_availability,avg_price,host_name,neighbourhood_group,room_type,number_of_reviews,last_review,calculated_host_listings_count,status
0,219517861,82795,1281,397.56,301.492355,253.195719,Sonder (NYC),Manhattan,Entire home/apt,7,2019-06-26,327,active
1,107434423,70331,29,6.04,253.810345,303.150862,Blueground,Manhattan,Entire home/apt,2,2019-05-16,232,active
2,156158778,37097,1,1.0,64.666667,3091.416667,Sally,Manhattan,Entire home/apt,1,2019-06-15,12,active
3,205031545,35294,127,21.21,220.326531,720.285714,Red Awning,Manhattan,Entire home/apt,8,2019-06-16,49,active
4,30283594,33581,65,3.94,313.421488,277.528926,Kara,Manhattan,Entire home/apt,1,2019-06-22,121,active


neighbourhood_group  neighbourhood     
Manhattan            Chelsea                 7
                     Financial District    218
                     Hell's Kitchen         15
                     Midtown                 4
                     Murray Hill            50
                     Theater District       27
                     Upper East Side         6
dtype: int64

In [178]:
# When did Danielle from Queens last receive a review?

name_mask = bnb_data['host_name'] == "Danielle"

ng_mask = bnb_data['neighbourhood_group'] == 'Queens'

danni_df = bnb_data[name_mask & ng_mask]

danni_df_by_date = danni_df.sort_values('last_review', ascending=False)

display(danni_df_by_date['last_review'].dropna().max())

bnb_data['last_review'].sort_values(ascending=False)

'2019-07-08'

48852    2019-07-08
42665    2019-07-08
44459    2019-07-08
44446    2019-07-08
44382    2019-07-08
            ...    
48890           NaN
48891           NaN
48892           NaN
48893           NaN
48894           NaN
Name: last_review, Length: 48895, dtype: object

## Further Questions

1. Which host has the most listings?

In [90]:
listing_qty = bnb_data.groupby('host_id').size().sort_values(ascending=False)

listing_qty.index[0]

219517861

2. How many listings have completely open availability?

In [62]:
open_avail_mask = bnb_data['availability_365'] == 365

len(bnb_data[open_avail_mask])

1295

3. What room_types have the highest review numbers?

In [73]:
display(bnb_data.groupby('room_type').sum(numeric_only=True).sort_values('number_of_reviews', ascending=False))


Unnamed: 0_level_0,id,host_id,latitude,longitude,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
room_type,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
Entire home/apt,468495600287,1569156532318,1034874.0,-1879267.0,5381480,216152,580403,26565.34,271834,2843783
Private room,434663231749,1618079882718,909320.3,-1650850.0,2004450,120067,538346,25529.62,72062,2482739
Shared room,26684386497,119044005530,47247.4,-85774.28,81348,7511,19256,1245.08,5409,187921


# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered some more details that were not asked above, please describe them here.

-- Add your conclusion --


The biggest additional layer I added to the data was host status.

(It lives in the larger dataframe unique_hosts, which also holds per-host summary statistics such as total reviews per month and average price.)

While sorting through the columns of bnb_data to decide what to drop or modify before adding them to unique_hosts, I took a closer look at the "last_review" column.

Last review was used to sort hosts into three categories: "active" (most recent review in 2019), "inactive" (most recent review before 2019), and "no history" (no reviews at all).

[If I hadn't already hard-coded answers below based on this, I'd go back and change the "active" status to cover hosts that received a review in the past 12 months, rather than just the last 6. I didn't fully realize that the review data cut off in July 2019 until after answering question #10.]
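
The 12-month variant described above could look something like the sketch below. It runs on a toy host table with hypothetical values, and the 12-month cutoff measured back from the newest review in the data is an assumption, not what the notebook currently does.

```python
import numpy as np
import pandas as pd

# Toy host table (hypothetical values), mirroring the columns used in unique_hosts.
hosts = pd.DataFrame({
    'host_id': [1, 2, 3, 4],
    'total_reviews': [12, 3, 0, 7],
    'last_review': ['2019-07-08', '2018-09-15', None, '2017-03-02'],
})

# Parse dates so comparisons are chronological rather than string-based.
last_review = pd.to_datetime(hosts['last_review'])

# Assumption: "active" means a review within 12 months of the newest review in the data.
cutoff = last_review.max() - pd.DateOffset(months=12)

# Conditions are checked in order; hosts matching neither fall through to 'inactive'.
hosts['status'] = np.select(
    [hosts['total_reviews'] == 0, last_review >= cutoff],
    ['no history', 'active'],
    default='inactive',
)
```

Anchoring the cutoff to the newest review, instead of a hard-coded year, means the definition would keep working if the dataset were refreshed.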


Where relevant, I have included additional calculations pulled from only listings belonging to active hosts.


1. Which hosts are the busiest and why?

"Busyness" is a state of mind, which can't be resolved just by looking at numbers. However, we can look at numbers that may affect how busy a host feels:
- Total number of reviews
- Average number of reviews per month (a proxy for turnover, to be read in the context of the host's total number of properties)

Ergo:
+ Sonder (NYC) has both a large number of properties (327) and the most reviews per month (397.56 in total), but also high average availability (about 301 days), potentially pointing to low occupancy.

+ Danielle (host id: 26432133) has only 5 listings, but averages 13.6 reviews per month *per listing*. This points to each unit being filled at least 13 nights per month, despite a calculated mean availability of about 289 days.
+ + Calculations up to this point cover hosts with at least 5 properties. If the threshold is lowered to 3 properties per host, Nalicia (host id: 156684502) has a higher rate of reviews per listing per month (18.13), and it may be a matter of personal opinion whether Danielle or Nalicia feels busier.
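
To make the effect of the property threshold easier to explore, the per-host calculation can be wrapped in a function. This is a rough sketch on toy data; the function name reviews_per_property, its min_properties parameter, and the toy values are all hypothetical.

```python
import pandas as pd

def reviews_per_property(listings: pd.DataFrame, min_properties: int) -> pd.DataFrame:
    """Per-host listing count, summed reviews_per_month, and their ratio."""
    per_host = listings.groupby('host_id').agg(
        property_count=('id', 'count'),
        total_rev_per_mo=('reviews_per_month', 'sum'),
    )
    # Keep only hosts that meet the property threshold.
    per_host = per_host[per_host['property_count'] >= min_properties].copy()
    per_host['rev_per_property'] = per_host['total_rev_per_mo'] / per_host['property_count']
    return per_host.sort_values('rev_per_property', ascending=False)

# Toy listings: host 1 has 5 listings, host 2 has 3, host 3 has 1.
toy = pd.DataFrame({
    'id': range(1, 10),
    'host_id': [1, 1, 1, 1, 1, 2, 2, 2, 3],
    'reviews_per_month': [2.0, 2.0, 2.0, 2.0, 2.0, 6.0, 6.0, 6.0, 20.0],
})

at_least_5 = reviews_per_property(toy, min_properties=5)  # only host 1 qualifies
at_least_3 = reviews_per_property(toy, min_properties=3)  # host 2 now ranks first
```

In the toy data, lowering the threshold from 5 to 3 changes who ranks first, which mirrors the Danielle versus Nalicia comparison.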

2. How many neighborhood groups are available and which shows up the most?

- There are 5 neighborhood groups. Manhattan has the most listings overall (21.7k) and also the most listings from active hosts (12.4k).

+ Despite the high number of rooms available, Manhattan has the second-highest rate of blackout nights (an average of 169 per year).
+ + Staten Island listings average only 115 blackout nights per year.
+ + Brooklyn listings have the highest average, at 191 blackout nights per year.
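
One caveat on the blackout figures: the stay-nights-per-listing calculation they come from divides availability summed over all listings by the count of listings from active hosts only, so the two sides of the ratio cover different sets. A possible alternative, sketched here on toy data with hypothetical values, applies the same filter to both sides by taking the mean directly.

```python
import pandas as pd

# Toy listings (hypothetical values) with the columns used in the notebook.
toy = pd.DataFrame({
    'neighbourhood_group': ['Manhattan', 'Manhattan', 'Brooklyn', 'Brooklyn'],
    'host_status': ['active', 'inactive', 'active', 'active'],
    'availability_365': [200, 0, 100, 200],
})

# Filter once, then average: numerator and denominator use the same listings.
active = toy[toy['host_status'] == 'active']
mean_avail = active.groupby('neighbourhood_group')['availability_365'].mean()

# Blackout nights are the days of the year a listing is not available.
blackout_nights = (365 - mean_avail).rename('blackout_nights')
```

Running this on the real data would likely shift the blackout numbers, so the ranking of neighborhood groups would need to be rechecked.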

3. Are private rooms the most popular in Manhattan? 

- Among listings from active hosts, the most popular type of housing in Manhattan is 'Entire home/apt' (7,645 listings versus 4,459 private rooms).

+ + Brooklyn's top listing type is also 'Entire home/apt'.
+ + In Queens, the Bronx, and Staten Island, private rooms are most popular.
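
Rather than reading the top room type off a sorted list, it can be pulled out per neighborhood group directly. A small sketch on toy data (hypothetical values), assuming "most popular" means "most listings":

```python
import pandas as pd

# Toy listings: Manhattan leans toward entire homes, Queens toward private rooms.
toy = pd.DataFrame({
    'neighbourhood_group': ['Manhattan'] * 3 + ['Queens'] * 3,
    'room_type': ['Entire home/apt', 'Entire home/apt', 'Private room',
                  'Private room', 'Private room', 'Entire home/apt'],
})

# Rows are neighborhood groups, columns are room types, cells are listing counts.
counts = toy.groupby(['neighbourhood_group', 'room_type']).size().unstack(fill_value=0)

# For each row, the column label holding the largest count.
top_room_type = counts.idxmax(axis=1)
```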

4. Which hosts are the busiest based on their reviews?
(revisited #1 and tweaked its code to get these numbers)

- Louann (host id: 228415932) only has one property listed on AirBnB, but has a whopping 20.9 reviews per month. 
+ + Nalicia (host id: 156684502) is not too far behind with 18.1 reviews per month per property.

5. Which neighborhood group has the highest average price?

- Manhattan, by a country mile.
- - Manhattan: 196.88
- - Brooklyn: 124.38

+ Evaluating only active hosts:
+ + Manhattan: 190.45 
+ + Brooklyn down to 123.97

6. Which neighborhood group has the highest total price?

...Manhattan? (based on knowing from previous questions that it has the highest average price (#5) and that it shows up the most (#2))

- Doing new math: if all listed Manhattan homes (regardless of availability) were filled on the same night, the collective guests would spend $4.26 million for the privilege.

+ + Limiting this calculation to active hosts would drop the aggregate price to a mere $2.36 million for one night of accommodation.

- Multiplying the price of each listing (held by active hosts) by the number of nights that listing is available gives the listing's theoretical annual value.
- - All together, Manhattan's listings by active hosts have a theoretical annual value of about $412 million, which accounts for 58% of the earning potential of active hosts' AirBnB accommodations in New York.
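
The theoretical annual value and each group's share of it reduce to a multiply, a grouped sum, and a division. A worked sketch on toy numbers (hypothetical values, chosen so the arithmetic is easy to check by hand):

```python
import pandas as pd

# Toy listings: two in Manhattan, one in Brooklyn, each available 100 nights.
toy = pd.DataFrame({
    'neighbourhood_group': ['Manhattan', 'Manhattan', 'Brooklyn'],
    'price': [200, 100, 100],
    'availability_365': [100, 100, 100],
})

# Theoretical annual value of a listing: nightly price times available nights.
toy['unit_value'] = toy['price'] * toy['availability_365']

# Manhattan: 20000 + 10000 = 30000; Brooklyn: 10000.
group_value = toy.groupby('neighbourhood_group')['unit_value'].sum()

# Each group's share of the citywide total: 30000 / 40000 and 10000 / 40000.
share = group_value / group_value.sum()
```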

7. Which top 5 hosts have the highest total price?

The 5 hosts with the top per-night earning potential are:
- Sonder (NYC)  (id: 219517861)    
- -  $82,795
- Blueground    (id: 107434423)    
- - $70,331
- Sally         (id: 156158778)     
- - $37,097
- Red Awning    (id: 205031545)    
- - $35,294
- Kara          (id: 30283594)      
- - $33,581

8. Who currently has no (zero) availability with a review count of 100 or more?

- There are currently 162 listings with 100 or more reviews that have no availability.

+ 112 of them are overseen by hosts that have received reviews (for any property) in 2019.
+ 50 are overseen by hosts that have not received any reviews in 2019.
+ + 18 of these formerly popular listings have not been stayed in since 2017 (at the latest).
+ + 19 more were stayed in as recently as July 2019
+ + + (including the most popular listing in this group, which has 480 reviews).
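
Since the question asks "who", the same two filters can also be used to list the hosts behind these listings. A minimal sketch on toy data; the host names and values are made up for illustration.

```python
import pandas as pd

# Toy listings (hypothetical hosts and values).
toy = pd.DataFrame({
    'host_id': [1, 2, 3],
    'host_name': ['Ann', 'Ben', 'Cy'],
    'availability_365': [0, 0, 120],
    'number_of_reviews': [150, 40, 300],
})

# Both conditions must hold: fully booked AND at least 100 reviews.
mask = (toy['availability_365'] == 0) & (toy['number_of_reviews'] >= 100)

# One row per host, even if a host has several qualifying listings.
booked_up_hosts = toy.loc[mask, ['host_id', 'host_name']].drop_duplicates()
```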

9. What host has the highest total of prices and where are they located?

- Sonder (NYC) (id: 219517861) has 327 listings with a potential nightly value of $82,795.
- - All of their listings are in Manhattan, with the majority (218) in the Financial District.

10. When did Danielle from Queens last receive a review?

- Danielle from Queens last received a review on July 08, 2019...
+ + ... which seems to be the last date any data was collected for this set.
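
Filtering on host name alone would lump together every host called Danielle who has a listing in Queens. A sketch of one way to check for that, on toy data with hypothetical host ids and dates, is to parse the dates and take the latest review per host_id:

```python
import pandas as pd

# Toy listings: three different hosts named Danielle, two of them in Queens.
toy = pd.DataFrame({
    'host_id': [7, 7, 8, 9],
    'host_name': ['Danielle', 'Danielle', 'Danielle', 'Danielle'],
    'neighbourhood_group': ['Queens', 'Queens', 'Queens', 'Brooklyn'],
    'last_review': ['2019-07-08', '2019-06-01', None, '2019-07-01'],
})

# Real datetimes make max() chronological and turn missing values into NaT.
toy['last_review'] = pd.to_datetime(toy['last_review'])

name_mask = toy['host_name'] == 'Danielle'
ng_mask = toy['neighbourhood_group'] == 'Queens'
queens_danielles = toy[name_mask & ng_mask]

# Latest review per distinct host; a host with no reviews shows NaT.
latest_by_host = queens_danielles.groupby('host_id')['last_review'].max()
```

If more than one host id comes back, the answer to question 10 depends on which Danielle is meant.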