# AirBnB NY Locations Data Case Study

In this final project, your task will be to take the data provided and find evidence to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment. 
**Be advised:** I will go dark for this entire assignment period. Any questions you would like to ask about the data or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild.

Happy Coding!

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
df = pd.read_csv('files/AB_NYC_2019.csv')
df.head()



| | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2539 | Clean & quiet apt home by the park | 2787 | John | Brooklyn | Kensington | 40.64749 | -73.97237 | Private room | 149 | 1 | 9 | 2018-10-19 | 0.21 | 6 | 365 |
| 1 | 2595 | Skylit Midtown Castle | 2845 | Jennifer | Manhattan | Midtown | 40.75362 | -73.98377 | Entire home/apt | 225 | 1 | 45 | 2019-05-21 | 0.38 | 2 | 355 |
| 2 | 3647 | THE VILLAGE OF HARLEM....NEW YORK ! | 4632 | Elisabeth | Manhattan | Harlem | 40.80902 | -73.9419 | Private room | 150 | 3 | 0 | | | 1 | 365 |
| 3 | 3831 | Cozy Entire Floor of Brownstone | 4869 | LisaRoxanne | Brooklyn | Clinton Hill | 40.68514 | -73.95976 | Entire home/apt | 89 | 1 | 270 | 2019-07-05 | 4.64 | 1 | 194 |
| 4 | 5022 | Entire Apt: Spacious Studio/Loft by central park | 7192 | Laura | Manhattan | East Harlem | 40.79851 | -73.94399 | Entire home/apt | 80 | 10 | 9 | 2018-11-19 | 0.1 | 1 | 0 |


In [None]:
# Splitting room type
# Split any '|'-separated room_type values into one row per room type.
# str.split with a single-character pattern splits on the literal '|';
# explode then gives each list element its own row, repeating the other columns.
df_r = (df.assign(room_type=df['room_type'].str.split('|'))
          .explode('room_type')
          .drop('neighbourhood_group', axis=1)
          .reset_index(drop=True))

df_r.head()
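`str.contains` treats its pattern as a regular expression by default, so a pattern of `'|'` (an empty alternation) matches every string, and filtering `room_type` with it keeps every row. Passing `regex=False` matches the literal character instead. A minimal sketch on hypothetical room-type values (the `'Private room|Shared room'` entry is invented for illustration):

```python
import pandas as pd

# Hypothetical room-type values; only the last one contains a literal '|'.
room_types = pd.Series(['Private room', 'Entire home/apt', 'Private room|Shared room'])

# Default regex=True: '|' is an empty alternation, so every value matches.
regex_match = room_types.str.contains('|')

# regex=False: '|' is matched as a literal character.
literal_match = room_types.str.contains('|', regex=False)

print(regex_match.tolist())    # [True, True, True]
print(literal_match.tolist())  # [False, False, True]
```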

In [None]:
# Splitting neighborhood groups
# Same approach: one row per '|'-separated neighbourhood_group value.
df_n = (df.assign(neighbourhood_group=df['neighbourhood_group'].str.split('|'))
          .explode('neighbourhood_group')
          .drop('room_type', axis=1)
          .reset_index(drop=True))

df_n.head()
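The split-and-explode approach to `'|'`-separated values can be sketched on hypothetical rows (the `'Brooklyn|Queens'` value is invented; the real `neighbourhood_group` column may contain no such values). Each separated entry becomes its own row and the other columns are repeated:

```python
import pandas as pd

# Hypothetical listings; the second row carries two '|'-separated groups.
toy = pd.DataFrame({
    'host_id': [1, 2],
    'neighbourhood_group': ['Manhattan', 'Brooklyn|Queens'],
    'price': [200, 90],
})

# str.split turns each value into a list; explode gives each list item its own row.
exploded = (toy.assign(neighbourhood_group=toy['neighbourhood_group'].str.split('|'))
               .explode('neighbourhood_group')
               .reset_index(drop=True))

print(exploded)
```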

In [49]:
# How many neighborhood groups are available and which shows up the most?
group = df.groupby('neighbourhood_group')['neighbourhood'].count()
group

neighbourhood_group
Bronx             1091
Brooklyn         20104
Manhattan        21661
Queens            5666
Staten Island      373
Name: neighbourhood, dtype: int64
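The counts above answer the question only implicitly (five groups, with Manhattan the largest). A short sketch on hypothetical values shows how `nunique` returns the number of groups and `value_counts().idxmax()` the most frequent one; on the real data the same two calls would apply to `df['neighbourhood_group']`:

```python
import pandas as pd

# Hypothetical sample of neighbourhood_group values (not the real data).
groups = pd.Series(['Manhattan', 'Brooklyn', 'Manhattan', 'Queens', 'Manhattan', 'Brooklyn'])

n_groups = groups.nunique()                   # number of distinct groups
most_common = groups.value_counts().idxmax()  # label with the highest count

print(n_groups, most_common)  # 3 Manhattan
```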

In [104]:
# Are private rooms the most popular in Manhattan?
private_rooms = df.groupby('neighbourhood_group').room_type.value_counts()
private_rooms

neighbourhood_group  room_type      
Bronx                Private room         652
                     Entire home/apt      379
                     Shared room           60
Brooklyn             Private room       10132
                     Entire home/apt     9559
                     Shared room          413
Manhattan            Entire home/apt    13199
                     Private room        7982
                     Shared room          480
Queens               Private room        3372
                     Entire home/apt     2096
                     Shared room          198
Staten Island        Private room         188
                     Entire home/apt      176
                     Shared room            9
Name: room_type, dtype: int64
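To answer the Manhattan question directly, one option is to filter to Manhattan first and look at each room type's share of listings. A sketch on hypothetical rows (values invented, not from `AB_NYC_2019.csv`):

```python
import pandas as pd

# Hypothetical listings (invented values, not from the real file).
toy = pd.DataFrame({
    'neighbourhood_group': ['Manhattan', 'Manhattan', 'Manhattan', 'Manhattan', 'Brooklyn'],
    'room_type': ['Entire home/apt', 'Entire home/apt', 'Private room', 'Shared room', 'Private room'],
})

# Restrict to Manhattan, then get each room type's share of its listings.
manhattan = toy[toy['neighbourhood_group'] == 'Manhattan']
shares = manhattan['room_type'].value_counts(normalize=True)

print(shares.idxmax(), shares.max())  # Entire home/apt 0.5
```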

In [5]:
# Which hosts are the busiest based on their reviews?
# Note: count() counts non-null number_of_reviews values, i.e. listings per host, not total reviews.
busiest = df.groupby('host_id')['number_of_reviews'].count().nlargest(10)
busiest

host_id
219517861    327
107434423    232
30283594     121
137358866    103
12243051      96
16098958      96
61391963      91
22541573      87
200380610     65
1475015       52
Name: number_of_reviews, dtype: int64
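Because `count()` returns the number of non-null values per group, this output ranks hosts by how many listings they have (it is identical to the listing counts under Further Questions). Summing `number_of_reviews` ranks them by reviews instead. A sketch on hypothetical hosts contrasting the two:

```python
import pandas as pd

# Hypothetical hosts: host 1 has several listings with few reviews,
# host 2 has one heavily reviewed listing.
toy = pd.DataFrame({
    'host_id': [1, 1, 1, 2],
    'number_of_reviews': [2, 0, 1, 250],
})

by_listings = toy.groupby('host_id')['number_of_reviews'].count()  # listings per host
by_reviews = toy.groupby('host_id')['number_of_reviews'].sum()     # total reviews per host

print(by_listings.idxmax(), by_reviews.idxmax())  # 1 2
```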

In [7]:
# Which neighborhood group has the highest average price?
highest_avg = df.groupby('neighbourhood_group')['price'].mean().nlargest(5)
highest_avg

neighbourhood_group
Manhattan        196.875814
Brooklyn         124.383207
Staten Island    114.812332
Queens            99.517649
Bronx             87.496792
Name: price, dtype: float64

In [8]:
# Which neighborhood group has the highest total price?
highest_total = df.groupby('neighbourhood_group')['price'].sum().nlargest(5)
highest_total

neighbourhood_group
Manhattan        4264527
Brooklyn         2500600
Queens            563867
Bronx              95459
Staten Island      42825
Name: price, dtype: int64
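The average and total rankings differ because totals also scale with the number of listings: Staten Island has a higher average price than Queens (114.81 vs 99.52) but a far lower total (42,825 vs 563,867). A single `groupby` with `agg` can report mean, sum, and listing count side by side; a sketch on hypothetical prices:

```python
import pandas as pd

# Hypothetical prices per neighbourhood group (not the real data).
toy = pd.DataFrame({
    'neighbourhood_group': ['Manhattan', 'Manhattan', 'Brooklyn', 'Brooklyn', 'Brooklyn'],
    'price': [300, 100, 80, 90, 100],
})

# Average, total, and number of listings per group in one pass.
summary = toy.groupby('neighbourhood_group')['price'].agg(['mean', 'sum', 'count'])

print(summary.sort_values('sum', ascending=False))
```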

In [17]:
#Which top 5 hosts have the highest total price?
top_hosts = df.groupby('host_id')['price'].sum().nlargest(5)
top_hosts

host_id
219517861    82795
107434423    70331
156158778    37097
205031545    35294
30283594     33581
Name: price, dtype: int64
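The result lists only `host_id` values. Grouping on `host_id` and `host_name` together attaches a readable name while still keeping distinct hosts who happen to share a name apart. A sketch on hypothetical hosts (names and prices invented):

```python
import pandas as pd

# Hypothetical listings; names and prices are invented.
toy = pd.DataFrame({
    'host_id': [10, 10, 20, 30, 30, 30],
    'host_name': ['Ana', 'Ana', 'Ben', 'Cleo', 'Cleo', 'Cleo'],
    'price': [150, 250, 500, 100, 120, 80],
})

# Total price per host, keeping the name alongside the id.
top = toy.groupby(['host_id', 'host_name'])['price'].sum().nlargest(2)

print(top)
```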

In [74]:
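# Who currently has no (zero) availability with a review count of 100 or more?
# Note: this query uses > 100 (excluding exactly 100) and does not filter on availability_365 == 0.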
availability = df.query('number_of_reviews > 100').host_id.value_counts().nlargest(10)
availability

35524316    11
344035      10
16677326     7
59529529     6
40176101     6
6885157      6
50600973     5
37312959     5
9922972      5
7831209      5
Name: host_id, dtype: int64
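As written, the query above keeps listings with more than 100 reviews and never checks `availability_365`, so it does not answer the question as asked. A sketch of a filter that applies both conditions (`availability_365 == 0` and `number_of_reviews >= 100`), on hypothetical rows:

```python
import pandas as pd

# Hypothetical listings (invented values).
toy = pd.DataFrame({
    'host_id': [1, 2, 3, 4],
    'host_name': ['Ana', 'Ben', 'Cleo', 'Dev'],
    'number_of_reviews': [100, 250, 99, 300],
    'availability_365': [0, 0, 0, 45],
})

# Both conditions from the question: zero availability and 100+ reviews (inclusive).
no_availability = toy.query('availability_365 == 0 and number_of_reviews >= 100')

print(no_availability[['host_id', 'host_name', 'number_of_reviews']])
```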

In [83]:
# What host has the highest total of prices and where are they located?
highest_host = df.groupby('host_id')['price'].sum().nlargest(10)
highest_host

host_id
219517861    82795
107434423    70331
156158778    37097
205031545    35294
30283594     33581
12243051     20451
16098958     20060
836168       19500
200380610    18865
3750764      18780
Name: price, dtype: int64
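This ranks hosts by total price but does not say where the top host's listings are. One way, sketched on hypothetical rows, is to take the `idxmax` of the per-host totals and count that host's listings by `neighbourhood_group` and `neighbourhood`:

```python
import pandas as pd

# Hypothetical listings (invented values).
toy = pd.DataFrame({
    'host_id': [7, 7, 7, 8],
    'neighbourhood_group': ['Manhattan', 'Manhattan', 'Brooklyn', 'Queens'],
    'neighbourhood': ['Midtown', 'Midtown', 'Williamsburg', 'Astoria'],
    'price': [300, 250, 150, 400],
})

# Host with the highest summed price.
top_host = toy.groupby('host_id')['price'].sum().idxmax()

# Where that host's listings are located.
locations = (toy[toy['host_id'] == top_host]
             .groupby(['neighbourhood_group', 'neighbourhood'])
             .size())

print(top_host)
print(locations)
```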

In [None]:
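# When did Danielle from Queens last receive a review?
danielle = df[(df['host_name'] == 'Danielle') & (df['neighbourhood_group'] == 'Queens')]
# last_review is stored as text; convert to datetime, then take the latest date across matching listings
pd.to_datetime(danielle['last_review']).max()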

## Further Questions

1. Which host has the most listings?

In [48]:
most_listings = df.groupby('host_id')['calculated_host_listings_count'].count().nlargest(10)
most_listings

host_id
219517861    327
107434423    232
30283594     121
137358866    103
12243051      96
16098958      96
61391963      91
22541573      87
200380610     65
1475015       52
Name: calculated_host_listings_count, dtype: int64
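Counting rows per host gives the number of listings present in the file. Assuming `calculated_host_listings_count` records each host's listing total on every row, it can serve as a cross-check. A sketch on hypothetical rows using `size()` and comparing against the stored column:

```python
import pandas as pd

# Hypothetical rows; assumes calculated_host_listings_count repeats each host's listing total.
toy = pd.DataFrame({
    'host_id': [5, 5, 5, 6],
    'host_name': ['Eve', 'Eve', 'Eve', 'Finn'],
    'calculated_host_listings_count': [3, 3, 3, 1],
})

# Number of rows (listings) per host, with the name attached.
listings = toy.groupby(['host_id', 'host_name']).size().sort_values(ascending=False)

# Stored per-host value, taken from the first row of each host.
stored = toy.groupby(['host_id', 'host_name'])['calculated_host_listings_count'].first()

# True where the row count agrees with the stored column.
agrees = listings.sort_index().eq(stored.sort_index())
print(listings)
print(agrees.all())  # True
```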

2. How many listings have completely open availability?

In [90]:
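# Note: count() gives listings per host; it does not check whether availability_365 == 365.
open_listings = df.groupby('host_id')['availability_365'].count().nlargest(10)
open_listings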

host_id
219517861    327
107434423    232
30283594     121
137358866    103
12243051      96
16098958      96
61391963      91
22541573      87
200380610     65
1475015       52
Name: availability_365, dtype: int64
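The code above counts listings per host rather than listings that are open all year. Assuming "completely open" means `availability_365 == 365`, a boolean mask and `sum()` give the count directly, and `mean()` gives the share. Sketched on hypothetical values:

```python
import pandas as pd

# Hypothetical availability values (days per year).
availability_365 = pd.Series([365, 0, 194, 365, 355])

# True where a listing is available every day of the year.
fully_open = availability_365 == 365

print(int(fully_open.sum()))  # 2
print(fully_open.mean())      # 0.4
```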

3. What room_types have the highest review numbers?

In [36]:
# Note: count() gives listings per room type, not total reviews.
room_type_reviews = df.groupby('room_type')['number_of_reviews'].count()
room_type_reviews

room_type
Entire home/apt    25409
Private room       22326
Shared room         1160
Name: number_of_reviews, dtype: int64
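As with the busiest-hosts query, `count()` here counts listings per room type. Summing `number_of_reviews`, and averaging it per listing, ranks room types by reviews instead. Sketched on hypothetical rows:

```python
import pandas as pd

# Hypothetical listings (invented review counts).
toy = pd.DataFrame({
    'room_type': ['Entire home/apt', 'Entire home/apt', 'Private room', 'Shared room'],
    'number_of_reviews': [10, 20, 45, 5],
})

# Total and per-listing average reviews for each room type.
reviews = (toy.groupby('room_type')['number_of_reviews']
              .agg(['sum', 'mean'])
              .sort_values('sum', ascending=False))

print(reviews)
```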

# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered more details that were not asked about above, please describe them here.

-- Add your conclusion --

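From this dataset, we can conclude that Manhattan has the most Airbnb listings of the five neighborhood groups (21,661), narrowly ahead of Brooklyn (20,104). Private rooms are not the most popular room type in Manhattan; entire homes/apartments are (13,199 listings versus 7,982 private rooms). Host 219517861 has the most listings (327); the busiest-host query counts listings rather than summing reviews, so this ranking reflects listing volume. Manhattan has the highest average listing price (about 196.88), so it is the most expensive neighborhood group to rent in, and it also has the highest total of listing prices (4,264,527). Host 219517861 also has the highest total of prices (82,795).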