# AirBnB NY Locations Data Case Study

In this final project, your task is to take the data provided and find evidence to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment. 
**Be Advised:** I will go dark for the entire assignment period. Any questions about the data or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild.

Happy Coding!

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [4]:
air_bnb = pd.read_csv('AB_NYC_2019.csv')
air_bnb.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [22]:
# How many neighborhood groups are available and which shows up the most?
# Answer: 5 neighborhood groups are available; Manhattan shows up the most

air_bnb['neighbourhood_group'].nunique()



5

In [23]:
air_bnb['neighbourhood_group'].value_counts()


Manhattan        21661
Brooklyn         20104
Queens            5666
Bronx             1091
Staten Island      373
Name: neighbourhood_group, dtype: int64

In [40]:
# Are private rooms the most popular in Manhattan?
# Answer: No, Entire home/apt is the most common room type.
# Note: this count covers every listing in the dataset; it is not filtered to Manhattan.

rooms = air_bnb['room_type']
rooms.value_counts()


Entire home/apt    25409
Private room       22326
Shared room         1160
Name: room_type, dtype: int64
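
The question is specific to Manhattan, so the count needs a filter on `neighbourhood_group` first. Below is a minimal sketch of that filter. It uses a small made-up frame in place of `AB_NYC_2019.csv`; the rows and the helper name `room_type_counts` are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for air_bnb; these rows are invented for illustration only.
sample = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn", "Brooklyn"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Private room", "Private room"],
})

def room_type_counts(df, group):
    """Count listings per room type within one neighbourhood group (hypothetical helper)."""
    in_group = df[df["neighbourhood_group"] == group]
    return in_group["room_type"].value_counts()

manhattan_counts = room_type_counts(sample, "Manhattan")
print(manhattan_counts)
```

On the real data, `room_type_counts(air_bnb, "Manhattan")` should give the Manhattan-only breakdown.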

In [9]:
# Which hosts are the busiest and based on their reviews?
#Answer: Host_id 37312959, 344035, 26432133 are the busiest

busiest_hosts = air_bnb.groupby('host_id').number_of_reviews.sum()
busiest_hosts.sort_values(ascending=False).head(3)

host_id
37312959    2273
344035      2205
26432133    2017
Name: number_of_reviews, dtype: int64

In [24]:
# Which neighborhood group has the highest average price?
#Answer: Manhattan

air_bnb.groupby('neighbourhood_group').price.mean()


neighbourhood_group
Bronx             87.496792
Brooklyn         124.383207
Manhattan        196.875814
Queens            99.517649
Staten Island    114.812332
Name: price, dtype: float64

In [25]:
# Which neighborhood group has the highest total price?
#Answer: Manhattan

air_bnb.groupby('neighbourhood_group').price.sum()


neighbourhood_group
Bronx              95459
Brooklyn         2500600
Manhattan        4264527
Queens            563867
Staten Island      42825
Name: price, dtype: int64

In [29]:
#Which top 5 hosts have the highest total price?
# Answer: host_ids 219517861, 107434423, 156158778, 205031545, and 30283594 have the highest total price

host_total_price = air_bnb.groupby(["host_id"], as_index=False).price.sum()
host_total_price.sort_values(by = 'price', ascending=False).head(5)

Unnamed: 0,host_id,price
34646,219517861,82795
29407,107434423,70331
32069,156158778,37097
34051,205031545,35294
19574,30283594,33581
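
The result above identifies hosts only by `host_id`. One way to make the answer easier to read is to group on `host_id` and `host_name` together so the name appears in the output. This is a sketch on invented rows, not the real dataset. Note that `groupby` drops rows whose `host_name` is missing unless `dropna=False` is passed.

```python
import pandas as pd

# Toy stand-in for air_bnb; host names and prices are invented.
sample = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 3],
    "host_name": ["Ana", "Ana", "Ben", "Cy", "Cy"],
    "price": [100, 150, 300, 50, 60],
})

# Sum listing prices per host, keeping the name alongside the id.
top_hosts = (
    sample.groupby(["host_id", "host_name"], as_index=False)["price"]
    .sum()
    .sort_values(by="price", ascending=False)
    .head(2)
)
print(top_hosts)
```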


In [13]:
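# Who currently has no (zero) availability with a review count of 100 or more?
# Note: `> 100` below excludes listings with exactly 100 reviews; `>= 100` would match "100 or more".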

noavail_100review = air_bnb[(air_bnb['availability_365'] == 0) & (air_bnb['number_of_reviews'] > 100)]
noavail_100review['host_name']

#Answer:

8         MaryEllen
94       Christiana
132             Sol
174           Coral
180            Doug
            ...    
29581      Kathleen
30461         Janet
31250        Albert
32670      Stephany
35014       Mariluz
Name: host_name, Length: 158, dtype: object
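
"100 or more" is an inclusive bound, so the comparison that matches the question is `>=`. Here is a minimal sketch of the inclusive filter on invented rows. The row count on the real data may differ from the output shown above.

```python
import pandas as pd

# Toy stand-in for air_bnb; the rows are invented for illustration.
sample = pd.DataFrame({
    "host_name": ["Ana", "Ben", "Cy", "Dee"],
    "availability_365": [0, 0, 0, 12],
    "number_of_reviews": [100, 250, 99, 400],
})

# Zero availability AND at least 100 reviews (inclusive bound).
booked_out = sample[
    (sample["availability_365"] == 0) & (sample["number_of_reviews"] >= 100)
]
print(booked_out["host_name"].tolist())
```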

In [15]:
# What host has the highest total of prices and where are they located?
# Answer: Host_id 219517861; located in Manhattan
host_total_price = air_bnb.groupby(["host_id"], as_index=False).price.sum()
host_total_price.sort_values(by='price', ascending=False).head(1)

Unnamed: 0,host_id,price
34646,219517861,82795


In [41]:
air_bnb[air_bnb['host_id'] == 219517861].neighbourhood_group


38293    Manhattan
38294    Manhattan
38588    Manhattan
39769    Manhattan
39770    Manhattan
           ...    
47691    Manhattan
47692    Manhattan
47693    Manhattan
47814    Manhattan
47821    Manhattan
Name: neighbourhood_group, Length: 327, dtype: object

In [42]:
# When did Danielle from Queens last receive a review?
#Answer: 2019-07-08

danielle = air_bnb[(air_bnb['host_name'] == 'Danielle') & (air_bnb['neighbourhood_group'] == 'Queens')]
danielle.sort_values(by='last_review', ascending=False).head(1)


Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
22469,18173787,Cute Tiny Room Family Home by LGA NO CLEANING FEE,26432133,Danielle,Queens,East Elmhurst,40.7638,-73.87238,Private room,48,1,436,2019-07-08,16.03,5,337
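
Sorting `last_review` as text works here because the dates are in ISO `YYYY-MM-DD` form. An alternative is to parse the column with `pd.to_datetime` and take the maximum, which skips missing dates. The sketch below uses invented rows. As in the cell above, it matches on the name, so several different hosts called Danielle would be pooled together.

```python
import pandas as pd

# Toy stand-in for air_bnb; the rows are invented for illustration.
sample = pd.DataFrame({
    "host_name": ["Danielle", "Danielle", "Danielle", "Laura"],
    "neighbourhood_group": ["Queens", "Queens", "Brooklyn", "Queens"],
    "last_review": ["2019-07-08", None, "2019-07-09", "2019-07-10"],
})

# Parse the date strings; missing values become NaT.
sample["last_review"] = pd.to_datetime(sample["last_review"])

mask = (sample["host_name"] == "Danielle") & (sample["neighbourhood_group"] == "Queens")
latest = sample.loc[mask, "last_review"].max()  # max() ignores NaT
print(latest.date())
```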


## Further Questions

1. Which host has the most listings?

In [43]:
hosts = air_bnb.groupby('host_id', as_index=False).id.count()
hosts.sort_values(by='id', ascending=False)

# Answer: host_id 219517861 has the most listings (327)



Unnamed: 0,host_id,id
34646,219517861,327
29407,107434423,232
19574,30283594,121
31079,137358866,103
14436,16098958,96
...,...,...
13358,13540183,1
13357,13538150,1
13356,13535952,1
13355,13533446,1


2. How many listings have completely open availability?

In [34]:
air_bnb[air_bnb['availability_365'] == 365].shape[0]

#Answer:

1295

3. What room_types have the highest review numbers?

In [44]:
air_bnb.groupby('room_type').number_of_reviews.sum()

#Answer: Highest number of reviews, Entire home/apt


room_type
Entire home/apt    580403
Private room       538346
Shared room         19256
Name: number_of_reviews, dtype: int64
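
A total favours room types that have more listings, so it can help to look at the average per listing as well. Below is a sketch that reports sum, mean, and listing count side by side, using invented rows rather than the real dataset.

```python
import pandas as pd

# Toy stand-in for air_bnb; the rows are invented for illustration.
sample = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt", "Entire home/apt", "Private room"],
    "number_of_reviews": [10, 20, 30, 40],
})

# Total reviews, average reviews per listing, and number of listings per room type.
review_summary = sample.groupby("room_type")["number_of_reviews"].agg(["sum", "mean", "count"])
print(review_summary)
```

In this toy frame the room type with the largest total is not the one with the largest average, which is why the two columns are worth comparing.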

# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered more details that were not asked about above, please describe them here.

-- Add your conclusion --

In [None]:
# Entire home/apt is the most common room type across all listings.
# Manhattan has the highest average and total prices.
# 1295 listings are available all 365 days.
# Shared rooms are the least common room type.

