# AirBnB NY Locations Data Case Study

In this final project, your task will be to take the data provided and find evidence to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment.
**Be Advised** I will go dark for this entire assignment time period. That said, any questions that you would like to ask about the data or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild.

Happy Coding!

In [None]:
# print
# Which hosts are the busiest and why?
# Q: How many neighborhood groups are available and which shows up the most?
# A: From the time this data was taken, likely before NYC's pro-resident/anti-short-term-rental legal changes,
# the data indicates that there were five neighborhood groups, the most active being Manhattan with 21,661 locations.
# Are private rooms the most popular in Manhattan?
# Which hosts are the busiest based on their reviews?
# Which neighborhood group has the highest average price?
# Which neighborhood group has the highest total price?
# Which top 5 hosts have the highest total price?
# Who currently has no (zero) availability with a review count of 100 or more?
# What host has the highest total of prices and where are they located?
# When did Danielle from Queens last receive a review?

In [71]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from itertools import groupby

In [82]:
air_bnb = pd.read_csv('csv_files/AB_NYC_2019.csv', sep=',')
air_bnb.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0




In [205]:
# How many neighborhood groups are available and which shows up the most?

air_bnb_neighbourhood_group = air_bnb.groupby(['neighbourhood_group'], as_index = False).count()[['neighbourhood_group','id']]
air_bnb_neighbourhood_group.sort_values('id', kind = 'mergesort', ascending = False, inplace= True)
air_bnb_group_total = air_bnb_neighbourhood_group.reset_index(drop=True)
air_bnb_group_total

Unnamed: 0,neighbourhood_group,id
0,Manhattan,21661
1,Brooklyn,20104
2,Queens,5666
3,Bronx,1091
4,Staten Island,373


In [535]:
# Are private rooms the most popular in Manhattan?

room_types = air_bnb.groupby(['neighbourhood_group','room_type'], as_index = False).count()[['neighbourhood_group','room_type','name']]
filtered_room_types = room_types[(room_types.neighbourhood_group == 'Manhattan')]

# Rows are ordered alphabetically by room_type: entire home/apt, private room, shared room
manhattan_entireHome_ct = filtered_room_types.iloc[0,2]
manhattan_privateRm_ct = filtered_room_types.iloc[1,2]
manhattan_sharedRm_ct = filtered_room_types.iloc[2,2]
manhattan_ttl_rms = filtered_room_types['name'].sum()
# Share of Manhattan listings that are entire homes/apartments
resulting_rms = (manhattan_entireHome_ct/manhattan_ttl_rms)*100
print(f'{round(resulting_rms)}%')
room_types

61%


Unnamed: 0,neighbourhood_group,room_type,name
0,Bronx,Entire home/apt,379
1,Bronx,Private room,652
2,Bronx,Shared room,59
3,Brooklyn,Entire home/apt,9558
4,Brooklyn,Private room,10127
5,Brooklyn,Shared room,413
6,Manhattan,Entire home/apt,13193
7,Manhattan,Private room,7979
8,Manhattan,Shared room,480
9,Queens,Entire home/apt,2096


In [325]:
# Which hosts are the busiest based on their reviews?

air_bnb_hosts = air_bnb.groupby(['host_id','host_name'], as_index = False).agg(
    listings_count = ('name', np.count_nonzero),
    sum_reviews = ('number_of_reviews', np.sum),
    ttl_availability = ('availability_365',np.sum))

air_bnb_hosts['avg_availability'] = (air_bnb_hosts['ttl_availability']/air_bnb_hosts['listings_count'])
air_bnb_hosts['rev_per_listing'] = (air_bnb_hosts['sum_reviews']/air_bnb_hosts['listings_count'])


busiest_hosts = air_bnb_hosts[(air_bnb_hosts.avg_availability >= 250)]
busiest_hosts.sort_values('rev_per_listing', ascending = False).head()

# Sonder (NYC) has the most listings at 327, but Dona averages 602.5 reviews per listing across her two listings

Unnamed: 0,host_id,host_name,listings_count,sum_reviews,ttl_availability,avg_availability,rev_per_listing
23486,47621202,Dona,2,1205,506,253.0,602.5
7362,4734398,Jj,3,1798,974,324.666667,599.333333
13109,12949460,Asa,1,488,269,269.0,488.0
4281,2321321,Lloyd,1,454,353,353.0,454.0
18626,26432133,Danielle,5,2017,1443,288.6,403.4
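
The per-host summary above can also be written with string aggregator names, which newer pandas releases tend to prefer over NumPy callables such as `np.sum` inside `agg`. The following is only a sketch under that assumption: the hosts, ids, and numbers in `sample` are invented, and `"size"` is used to count rows per host.

```python
import pandas as pd

# Hypothetical listings; ids, names, and numbers are invented for illustration.
sample = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 3, 3],
    "host_name": ["Ana", "Ana", "Ben", "Cy", "Cy", "Cy"],
    "number_of_reviews": [300, 100, 150, 10, 20, 30],
    "availability_365": [300, 200, 365, 100, 50, 0],
})

# Named aggregation: one output column per (source column, aggregator) pair.
hosts = sample.groupby(["host_id", "host_name"], as_index=False).agg(
    listings_count=("number_of_reviews", "size"),
    sum_reviews=("number_of_reviews", "sum"),
    avg_availability=("availability_365", "mean"),
)
hosts["rev_per_listing"] = hosts["sum_reviews"] / hosts["listings_count"]

# Rank hosts by reviews per listing, highest first.
ranked = hosts.sort_values("rev_per_listing", ascending=False).reset_index(drop=True)
print(ranked)
```

Computing `avg_availability` with `"mean"` inside `agg` removes the separate division step used in the cell above.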


In [334]:
#Which neighborhood group has the highest average price?
air_bnb['price'] = air_bnb['price'].astype(int)

air_bnb_group_prices = air_bnb.groupby(['neighbourhood_group'], as_index = False).mean(numeric_only = True).round(decimals = 2)[['neighbourhood_group','price']]
air_bnb_group_prices.sort_values('price', ascending = False, inplace = True)
air_bnb_group_prices.reset_index(drop=True)

Unnamed: 0,neighbourhood_group,price
0,Manhattan,196.88
1,Brooklyn,124.38
2,Staten Island,114.81
3,Queens,99.52
4,Bronx,87.5


In [7]:
# Which neighborhood group has the highest total price?

# Stored under its own name so the listing counts in air_bnb_group_total (used in the final report) are not overwritten
air_bnb_group_price_total = air_bnb.groupby(['neighbourhood_group'], as_index = False).sum(numeric_only = True)[['neighbourhood_group','price']]
air_bnb_group_price_total.sort_values('price', ascending = False, inplace = True)
air_bnb_group_price_total.reset_index(drop=True)


Unnamed: 0,neighbourhood_group,price
0,Manhattan,4264527
1,Brooklyn,2500600
2,Queens,563867
3,Bronx,95459
4,Staten Island,42825
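
The average-price and total-price questions both group by `neighbourhood_group`, so they can share one pass over the data. A minimal sketch, assuming a frame with the same two columns; the prices in `sample` are invented and the column names `avg_price` and `total_price` are chosen here for illustration.

```python
import pandas as pd

# Hypothetical prices, invented for illustration.
sample = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn",
                            "Brooklyn", "Brooklyn", "Queens"],
    "price": [200, 300, 100, 150, 110, 90],
})

# One groupby yields both the average and the total price per group.
price_summary = (
    sample.groupby("neighbourhood_group")["price"]
    .agg(avg_price="mean", total_price="sum")
    .sort_values("total_price", ascending=False)
)
print(price_summary.round(2))
```

Because the group name stays in the index, a value such as `price_summary.loc["Brooklyn", "total_price"]` can be looked up by name, whatever the sort order.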


In [385]:
#Which top 5 hosts have the highest total price?
air_bnb_host_total = air_bnb.groupby(['host_id','host_name'], as_index = False).agg(
    count_listings = ('id', np.count_nonzero),
    sum_price = ('price', np.sum))
air_bnb_host_total.sort_values('sum_price', ascending = False, inplace = True)
air_bnb_host_total.reset_index(drop=True)
high_hosts_result = air_bnb_host_total.head()
high_hosts_result

Unnamed: 0,host_id,host_name,count_listings,sum_price
34629,219517861,Sonder (NYC),327,82795
29393,107434423,Blueground,232,70331
32054,156158778,Sally,12,37097
34034,205031545,Red Awning,49,35294
19564,30283594,Kara,121,33581


In [348]:
# Who currently has no (zero) availability with a review count of 100 or more?

# Select the numeric columns before summing so text columns such as name and last_review are not summed
host_data = air_bnb.groupby(['host_id','host_name','neighbourhood_group'], as_index=False)[['price','number_of_reviews','availability_365']].sum()

popular_hosts = host_data[(host_data.number_of_reviews >= 100) & (host_data.availability_365 == 0)]

popular_hosts.reset_index(drop=True)

Unnamed: 0,host_id,host_name,neighbourhood_group,price,number_of_reviews,availability_365
0,7490,MaryEllen,Manhattan,79,118,0
1,36897,Lydia,Manhattan,90,107,0
2,79402,Christiana,Brooklyn,100,168,0
3,129352,Sol,Brooklyn,50,193,0
4,193722,Coral,Manhattan,196,114,0
...,...,...,...,...,...,...
130,143944704,Ash,Manhattan,239,104,0
131,155125855,Vicente,Manhattan,394,125,0
132,176185168,Janet,Queens,65,119,0
133,187487947,Diego,Brooklyn,459,164,0


In [381]:
# What host has the highest total of prices and where are they located?

host_data = host_data.sort_values('price',ascending = False).head()
host_data

Unnamed: 0,host_id,host_name,neighbourhood_group,price,number_of_reviews,availability_365
34740,219517861,Sonder (NYC),Manhattan,82795,1281,98588
29481,107434423,Blueground,Manhattan,69741,29,58347
34141,205031545,Red Awning,Manhattan,35294,127,10796
19626,30283594,Kara,Manhattan,33581,65,37924
32152,156158778,Sally,Manhattan,29194,1,711


In [422]:
# When did Danielle from Queens last receive a review?
recent_reviews = air_bnb[['host_name','neighbourhood_group','last_review']]
filter_1 = recent_reviews["host_name"] == "Danielle"
filter_2 = recent_reviews["neighbourhood_group"] == "Queens"

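# Boolean indexing keeps only the matching rows; listings without a review are dropped
danielle_reviews = recent_reviews[filter_1 & filter_2].dropna(subset=['last_review'])
danielle_reviews = danielle_reviews.sort_values('last_review', ascending = False)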
danielle_reviews

Unnamed: 0,host_name,neighbourhood_group,last_review
22469,Danielle,Queens,2019-07-08
21517,Danielle,Queens,2019-07-07
20403,Danielle,Queens,2019-07-06
22068,Danielle,Queens,2019-07-06
7086,Danielle,Queens,2019-07-03
33861,Danielle,Queens,2019-06-20
27021,Danielle,Queens,2018-01-02
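
`host_name` is not a unique key, so the rows above may belong to more than one host called Danielle. One way to separate them is to parse `last_review` as dates and take the latest value per `host_id`. The sketch below assumes that approach; the `host_id` values and dates in `sample` are invented for illustration.

```python
import pandas as pd

# Hypothetical rows for illustration only; host_ids and dates are invented.
sample = pd.DataFrame({
    "host_id": [101, 101, 202, 303],
    "host_name": ["Danielle", "Danielle", "Danielle", "Laura"],
    "neighbourhood_group": ["Queens", "Queens", "Queens", "Queens"],
    "last_review": ["2019-07-08", "2019-06-20", None, "2019-07-01"],
})

# Parse the dates so that max() compares real timestamps, not strings.
sample["last_review"] = pd.to_datetime(sample["last_review"])

is_danielle = (sample["host_name"] == "Danielle") & (sample["neighbourhood_group"] == "Queens")

# Latest review per host_id; a host with no reviews comes back as NaT.
latest_per_host = sample[is_danielle].groupby("host_id")["last_review"].max()
latest_overall = latest_per_host.max()

print(latest_per_host)
print(latest_overall.date())
```

If several host ids appear in `latest_per_host`, the question needs a decision about which Danielle is meant before a single date can be reported.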


## Further Questions

1. Which host has the most listings?

In [368]:
# host_listings = air_bnb.groupby(['host_id','host_name','neighbourhood_group'], as_index=False).sum(numeric_only= False)[['host_id','host_name','neighbourhood_group','calculated_host_listings_count']]
most_listings = air_bnb.sort_values('calculated_host_listings_count', ascending = False)[['host_id','host_name','calculated_host_listings_count']].head(1)
most_listings

Unnamed: 0,host_id,host_name,calculated_host_listings_count
39773,219517861,Sonder (NYC),327


2. How many listings have completely open availability?

In [562]:
open_listings = air_bnb[(air_bnb.availability_365 == 365)]
open_listings[['name','host_id','room_type','availability_365']]
open_listings.reset_index(drop=True).tail()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
1290,36415840,A BEAUTIFUL SPACE IN HEART OF WILLIAMSBURG,223715460,Simon And Julian,Brooklyn,Williamsburg,40.71091,-73.9656,Entire home/apt,499,30,0,,,1,365
1291,36453952,West Village Studio on quiet cobblestone street,115491896,Will,Manhattan,West Village,40.7362,-74.00827,Entire home/apt,205,1,0,,,1,365
1292,36473253,Heaven for you(only for guy),261338177,Diana,Brooklyn,Gravesend,40.59118,-73.97119,Shared room,25,7,0,,,6,365
1293,36481315,The Raccoon Artist Studio in Williamsburg New ...,208514239,Melki,Brooklyn,Williamsburg,40.71232,-73.9422,Entire home/apt,120,1,0,,,3,365
1294,36483152,Garden Jewel Apartment in Williamsburg New York,208514239,Melki,Brooklyn,Williamsburg,40.71232,-73.9422,Entire home/apt,170,1,0,,,3,365
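
The output above shows the last rows only; the answer to "how many" is the row count of `open_listings`. Since the reset index ends at 1294, that count appears to be 1,295. A minimal sketch of counting directly from a boolean mask, with invented availability values standing in for `air_bnb["availability_365"]`:

```python
import pandas as pd

# Hypothetical availability values; the real column is air_bnb["availability_365"].
sample = pd.DataFrame({"availability_365": [365, 0, 194, 365, 355, 365]})

# A boolean mask sums to the number of True values and averages to their share.
fully_open = sample["availability_365"] == 365
open_count = int(fully_open.sum())
open_share = fully_open.mean()

print(f"{open_count} of {len(sample)} listings ({open_share:.0%}) are open all year")
```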


3. What room_types have the highest review numbers?

In [547]:
room_types_reviewed = air_bnb.groupby(['room_type'], as_index = False).sum(numeric_only = True)[['room_type','number_of_reviews']]
room_types_reviewed = room_types_reviewed.sort_values('number_of_reviews', ascending = False)
room_types_reviewed

Unnamed: 0,room_type,number_of_reviews
0,Entire home/apt,580403
1,Private room,538346
2,Shared room,19256


In [558]:
busiest_host_by_listing = busiest_hosts.sort_values("listings_count", ascending = False).head(1)
busiest_host_by_reviews = busiest_hosts.sort_values("rev_per_listing", ascending = False).head(1)


602.5

# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered more details that were not asked about above, please describe them here.

-- Add your conclusion --

In [565]:
#answering the questions in the style of a report
print(('\033[1mNYC AirBnb Listings Analysis\033[0m'
       '\nExamining the now banned practices of AirBnb '
       'by analyzing NYC 2019 listings data AKA '
       'observations of an industry touting the virtues of passive income'
       ' and tourism over the sanctity of shelter.'))

print('\n\n\033[1mAirbnb Listings & Location Data, NYC 2019\033[0m')
#This section will cover questions regarding neighborhoods and listings: #s 2,3,5,6,2-2,3-2

print('\n--Participating NYC Neighborhood Groups & Listing Frequency--') #Questions no. 2 & 3
print(f'\nThe data covers all 2019 listings among {len(air_bnb_group_total.index)} neighborhood groups:')
neigh_grps = air_bnb_group_total.loc[:,"neighbourhood_group"]
for n, neighbourhood in enumerate(neigh_grps, start=1):
    print(f'{n}. {neighbourhood}')
print(f'\nOf those groups, {neigh_grps.loc[0]} shows up most frequently in the data set with {air_bnb_group_total.loc[0,"id"]} listings.')

print((f'\nWhile many proponents of short-term rentals argue that the majority of renters are letting private rooms '
       'or shared rooms in their homes, the truth is that the majority of listings in Manhattan are '
       f'{filtered_room_types.iloc[0, 1]}. These listings make up {filtered_room_types.iloc[0, 2]} or {round(resulting_rms)}% '
       f'of Manhattan\'s total listings. This correlates with the finding that {room_types_reviewed.iloc[0,0]} listings have the most reviews'
       ' among the various room types.'))

print('\n--Neighbourhood Prices--') #questions no. 6 & 5

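# Look the figures up by group name rather than by row label, so the sentence does not depend on how earlier cells were sorted
group_price_stats = air_bnb.groupby('neighbourhood_group')['price'].agg(['sum', 'mean'])
priciest_group = group_price_stats['sum'].idxmax()
print((f'\nOne night at each room in {priciest_group} will run an individual ${group_price_stats.loc[priciest_group, "sum"]}'
       f' or ${group_price_stats.loc[priciest_group, "mean"]:.2f} per night on average, the highest pricing among NYC neighbourhood groups.'))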


print('\n\n\033[1mHost Data, NYC 2019\033[0m')
print('\n--Host Work Load, Availability & Reviews--') #questions no. 1,4,8,10, 1-1,2-1

print((f'{busiest_host_by_listing.iloc[0, 1]} has {busiest_host_by_listing.iloc[0, 2]} individual listings, by far the most in the data set. '
       f'However, among all hosts, it can be deduced that the busiest bnb host overall was {busiest_host_by_reviews.iloc[0, 1]}'
       f' who received a remarkable {busiest_host_by_reviews.iloc[0, 3]} reviews between her {busiest_host_by_reviews.iloc[0, 2]} listings.'))

print((f'Of the {air_bnb.host_id.nunique()} hosts in the data set, {len(popular_hosts.index)} have 100 or more'
      f' reviews and no availability. {len(open_listings.index)} listings have completely open availability.'))

print(f'Host Danielle\'s last review was on: {danielle_reviews.iat[0,2]}')

print('\n--Listing Prices--') #questions no. 7 & 9

print('\nThe top 5 hosts with the highest total prices are:')
name_result = high_hosts_result.loc[:,"host_name"]
# enumerate supplies the rank; the original counter was never incremented
for n, host in enumerate(name_result, start=1):
    print(f'{n}. {host}')

print(f'\nGiven this information it is unsurprising that {host_data.iat[0,1]} from {host_data.iat[0,2]} has the highest total of prices.')





NYC AirBnb Listings Analysis
Examining the now banned practices of AirBnb by analyzing NYC 2019 listings data AKA observations of an industry touting the virtues of passive income and tourism over the sanctity of shelter.


Airbnb Listings & Location Data, NYC 2019

--Participating NYC Neighborhood Groups & Listing Frequency--

The data covers all 2019 listings among 5 neighborhood groups:
1. Manhattan
2. Brooklyn
3. Queens
4. Bronx
5. Staten Island

Of those groups, Manhattan shows up most frequently in the data set with 21661 listings.

While many proponents of short-term rentals argue that the majority of renters are letting private rooms or shared rooms in their homes, the truth is that the majority of listings in Manhattan are Entire home/apt. These listings make up 13193 or 61% of Manhattan's total listings. This correlates with the finding that Entire home/apt listings have the most reviews among the various room types.

--Neighbourhood Prices--

One night at each