# AirBnB NY Locations Data Case Study

In this final project, your task is to take the data provided and find evidence to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest, based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment.
**Be advised:** I will go dark for the entire assignment period. Any questions you would like to ask about the data or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild.

Happy Coding!

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
air_bnb = pd.read_csv('../files/AB_NYC_2019.csv')
air_bnb.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [18]:
# How many neighborhood groups are available and which shows up the most?
print(f'Total Neighborhood Groups: {air_bnb.neighbourhood_group.nunique()}')

# Number of listings in each neighbourhood group
print(air_bnb.groupby('neighbourhood_group')['id'].count())

print('\nManhattan is the neighborhood group that shows up the most')

Total Neighborhood Groups: 5
neighbourhood_group
Bronx             1091
Brooklyn         20104
Manhattan        21661
Queens            5666
Staten Island      373
Name: id, dtype: int64

Manhattan is the neighborhood group that shows up the most
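
The most frequent group can also be read off programmatically instead of by eye. A minimal sketch, assuming a small made-up frame (`toy`) in place of `air_bnb`; the rows are illustrative and not taken from the dataset:

```python
import pandas as pd

# Toy stand-in for air_bnb; these rows are illustrative only.
toy = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'neighbourhood_group': ['Manhattan', 'Brooklyn', 'Manhattan',
                            'Queens', 'Manhattan', 'Brooklyn'],
})

# Number of distinct neighbourhood groups.
n_groups = toy['neighbourhood_group'].nunique()

# Listings per group, sorted from most to least frequent.
group_counts = toy['neighbourhood_group'].value_counts()

# idxmax() returns the label that has the largest count.
most_common = group_counts.idxmax()

print(f'Total Neighborhood Groups: {n_groups}')
print(f'{most_common} shows up the most ({group_counts.max()} listings)')
```

Swapping `toy` for `air_bnb` should reproduce the numbers above without hard-coding the group name in the final `print`.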


In [81]:
# Are private rooms the most popular in Manhattan?

manhattan = air_bnb[air_bnb['neighbourhood_group'] == 'Manhattan']

# Total reviews per room type, used here as a proxy for popularity.
print(manhattan.groupby('room_type')['number_of_reviews'].sum())

print('\nBased on the total number of reviews left, it looks like the Entire Home/apt category is most popular in Manhattan.')

room_type
Entire home/apt    235147
Private room       209150
Shared room         10272
Name: number_of_reviews, dtype: int64

Based on the total number of reviews left, it looks like the Entire Home/apt category is most popular in Manhattan.
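
Review totals are only one way to measure popularity; the number of listings per room type is another, and the two can disagree. A hedged sketch on made-up rows (the `toy` frame and its values are assumptions for illustration) that puts both measures side by side:

```python
import pandas as pd

# Toy stand-in for air_bnb; values are illustrative only.
toy = pd.DataFrame({
    'neighbourhood_group': ['Manhattan', 'Manhattan', 'Manhattan',
                            'Manhattan', 'Brooklyn'],
    'room_type': ['Entire home/apt', 'Private room', 'Private room',
                  'Shared room', 'Private room'],
    'number_of_reviews': [50, 10, 5, 2, 100],
})

manhattan_toy = toy[toy['neighbourhood_group'] == 'Manhattan']

# Named aggregation: listing count and review total per room type.
summary = manhattan_toy.groupby('room_type').agg(
    listings=('number_of_reviews', 'count'),
    total_reviews=('number_of_reviews', 'sum'),
)
print(summary)
```

In this toy data, private rooms have the most listings while entire homes have the most reviews, so it is worth stating which measure a conclusion relies on.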


In [46]:
# Which hosts are the busiest, based on their reviews?
# Sum only the review counts, grouped by the unique host_id.
review_sums = air_bnb.groupby('host_id')['number_of_reviews'].sum()
print(review_sums.sort_values(ascending=False))

print('\nHost with ID #37312959 is the busiest host with 2273 total reviews across their properties')

host_id
37312959     2273
344035       2205
26432133     2017
35524316     1971
40176101     1818
             ... 
34233009        0
34222449        0
34202667        0
34149113        0
274321313       0
Name: number_of_reviews, Length: 37457, dtype: int64

Host with ID #37312959 is the busiest host with 2273 total reviews across their properties
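
A bare host ID is hard to read in a report. One possible way to keep the host's name next to the ID is to group on both columns. This is a sketch on made-up rows (`toy` and its names are assumptions), not the notebook's original approach:

```python
import pandas as pd

# Toy stand-in for air_bnb; two different hosts share the name 'Ann'.
toy = pd.DataFrame({
    'host_id': [10, 10, 20, 30],
    'host_name': ['Ann', 'Ann', 'Bob', 'Ann'],
    'number_of_reviews': [40, 25, 50, 5],
})

# Grouping on host_id keeps hosts distinct; adding host_name only labels them.
# Note: rows with a missing host_name are dropped unless dropna=False is passed.
busiest = (
    toy.groupby(['host_id', 'host_name'])['number_of_reviews']
       .sum()
       .sort_values(ascending=False)
)
print(busiest.head())
```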


In [48]:
# Which neighborhood group has the highest average price?
print(air_bnb.groupby('neighbourhood_group')['price'].mean())

print('\nManhattan has the highest average, with a price of $196.88')

neighbourhood_group
Bronx             87.496792
Brooklyn         124.383207
Manhattan        196.875814
Queens            99.517649
Staten Island    114.812332
Name: price, dtype: float64

Manhattan has the highest average, with a price of $196.88


In [50]:
# Which neighborhood group has the highest total price?
print(air_bnb.groupby('neighbourhood_group')['price'].sum())

print('\nManhattan has the highest total price.')

neighbourhood_group
Bronx              95459
Brooklyn         2500600
Manhattan        4264527
Queens            563867
Staten Island      42825
Name: price, dtype: int64

Manhattan has the highest total price.
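
These totals can be cross-checked against the earlier outputs, because a group's total price equals its listing count times its mean price. A quick sanity check using only the figures printed in the cells above, assuming no listing is missing a price:

```python
# Figures copied from the cell outputs above.
counts = {'Bronx': 1091, 'Brooklyn': 20104, 'Manhattan': 21661,
          'Queens': 5666, 'Staten Island': 373}
means = {'Bronx': 87.496792, 'Brooklyn': 124.383207, 'Manhattan': 196.875814,
         'Queens': 99.517649, 'Staten Island': 114.812332}
totals = {'Bronx': 95459, 'Brooklyn': 2500600, 'Manhattan': 4264527,
          'Queens': 563867, 'Staten Island': 42825}

for group in counts:
    # count * mean should reproduce the reported sum, up to rounding.
    reconstructed = round(counts[group] * means[group])
    print(f'{group:<14} {reconstructed:>9,} vs {totals[group]:>9,}')
```

Manhattan leads on both factors (most listings and highest mean), which is why it also leads on total price.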


In [52]:
# Which top 5 hosts have the highest total price?
# Sum only the price column, then keep the five largest totals.
# host_name is not a unique key, so hosts who share a name are summed together.
top5 = air_bnb.groupby('host_name')['price'].sum().nlargest(5)

print('Top 5 Hosts: Sonder (NYC), Blueground, Michael, David, Alex')

Top 5 Hosts: Sonder (NYC), Blueground, Michael, David, Alex
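
Grouping by `host_name` merges every host who happens to share a name, which can push common first names up the ranking. The sketch below contrasts the two groupings on made-up rows; the names and prices are assumptions for illustration only:

```python
import pandas as pd

# Toy stand-in for air_bnb: three separate hosts named 'Sam', one company.
toy = pd.DataFrame({
    'host_id': [1, 2, 3, 4],
    'host_name': ['Sam', 'Sam', 'Sam', 'Acme Stays'],
    'price': [100, 120, 90, 250],
})

# By name: the three Sams are added together (310) and outrank Acme Stays.
by_name = toy.groupby('host_name')['price'].sum().nlargest(5)

# By ID: each host is counted separately, so host 4 (Acme Stays) leads.
by_id = toy.groupby('host_id')['price'].sum().nlargest(5)

print(by_name)
print(by_id)
```

If the question is about individual hosts, `host_id` is the safer grouping key; names can be joined back afterwards for display.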


In [55]:
# Who currently has no (zero) availability with a review count of 100 or more?

no_avail = air_bnb[(air_bnb['availability_365'] == 0) & (air_bnb['number_of_reviews'] >= 100)]

# len() gives the number of listings that match both conditions.
print(f'There are {len(no_avail)} listings given the current criteria')

There are 162 listings given the current criteria
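
The question asks *who*, so a list of hosts may be more useful than a count. A minimal sketch on made-up rows (`toy` is an assumption) that applies the same filter and then extracts the distinct hosts:

```python
import pandas as pd

# Toy stand-in for air_bnb; values are illustrative only.
toy = pd.DataFrame({
    'host_id': [1, 1, 2, 3],
    'host_name': ['Ann', 'Ann', 'Bob', 'Cy'],
    'availability_365': [0, 0, 0, 120],
    'number_of_reviews': [150, 210, 99, 300],
})

# Same criteria as above: zero availability and at least 100 reviews.
mask = (toy['availability_365'] == 0) & (toy['number_of_reviews'] >= 100)

# One row per host, even if several of their listings match.
hosts = toy.loc[mask, ['host_id', 'host_name']].drop_duplicates()

print(f'{mask.sum()} listings from {len(hosts)} hosts')
print(hosts)
```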


In [64]:
# What host has the highest total of prices and where are they located?

print(air_bnb.groupby(['host_name','neighbourhood'])[['price']].sum().nlargest(1,['price']))

print('\nSonder has the highest total - located in the Financial District')

                                 price
host_name    neighbourhood            
Sonder (NYC) Financial District  57738

Sonder has the highest total - located in the Financial District


In [67]:
# When did Danielle from Queens last receive a review?
print(air_bnb[(air_bnb['host_name']=='Danielle') & (air_bnb['neighbourhood_group']=='Queens')]['last_review'])

print("\nDanielle's last review was on 2019-07-08")

7086     2019-07-03
16349           NaN
20403    2019-07-06
21517    2019-07-07
22068    2019-07-06
22469    2019-07-08
27021    2018-01-02
33861    2019-06-20
Name: last_review, dtype: object

Danielle's last review was on 2019-07-08
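
Rather than scanning the printed column for the latest date, the dates can be parsed and reduced with `max()`. A sketch on made-up rows (`toy` is an assumption); missing reviews become `NaT` and are skipped:

```python
import pandas as pd

# Toy stand-in for air_bnb; the Brooklyn row must be excluded by the filter.
toy = pd.DataFrame({
    'host_name': ['Danielle', 'Danielle', 'Danielle', 'Danielle'],
    'neighbourhood_group': ['Queens', 'Queens', 'Queens', 'Brooklyn'],
    'last_review': ['2019-07-03', None, '2019-07-08', '2019-07-20'],
})

mask = (toy['host_name'] == 'Danielle') & (toy['neighbourhood_group'] == 'Queens')

# Convert the strings to datetimes; None/NaN turns into NaT.
review_dates = pd.to_datetime(toy.loc[mask, 'last_review'])

# max() ignores NaT and returns the most recent review date.
latest = review_dates.max()
print(f"Danielle's last review was on {latest.date()}")
```

Note that the filter matches on name only, so if more than one host named Danielle lists in Queens, the result covers all of them.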


## Further Questions

1. Which host has the most listings?

In [79]:
print(air_bnb.groupby('host_name')['name'].count().nlargest())

print('Michael has the most listings')

host_name
Michael         417
David           403
Sonder (NYC)    327
John            294
Alex            279
Name: name, dtype: int64
Michael has the most listings


2. How many listings have completely open availability?

In [78]:
open365 = air_bnb[air_bnb['availability_365'] >= 365]
print(open365.id.count())

print('There are 1295 properties that have completely open availability right now.')

1295
There are 1295 properties that have completely open availability right now.


3. What room_types have the highest review numbers?

In [80]:
print(air_bnb.groupby('room_type')['number_of_reviews'].sum())

print('Entire home/apt has the highest number of reviews.')

room_type
Entire home/apt    580403
Private room       538346
Shared room         19256
Name: number_of_reviews, dtype: int64
Entire home/apt has the highest number of reviews.


# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered more details that were not asked about above, please describe them here.

-- Add your conclusion --

1. There are 5 different neighborhood groups. Manhattan shows up the most, with 21,661 total listings.
2. The most popular room type in Manhattan is Entire home/apt. I found this by summing the number of reviews for each room type and using that total as a proxy for stays.
3. Manhattan has the highest mean price at $196.88.
4. Manhattan also has the highest total price ($4,264,527). This makes sense, given it has the highest average and the most available listings.
5. Top 5 hosts: Sonder (NYC), Blueground, Michael, David, Alex. This is based on the sum of all of their listing prices, grouped by host name, so hosts who share a name are counted together.
6. There are 162 listings with 0 availability and 100 or more reviews.
7. Sonder (NYC) has the highest total price, with listings in the Financial District totaling $57,738.

Further Questions:

1. Michael is the host name with the most listings at 417. Because this groups by host name, hosts who share the name are counted together.
2. There are 1,295 total properties that have completely open availability right now.
3. Entire home/apt is the room type with the highest number of reviews.
