# AirBnB NY Locations Data Case Study

In this final project, your task is to use the data provided to find evidence that answers the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment.
**Be advised:** I will go dark for the entire assignment period. Any questions about the data or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild.

Happy Coding!

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [66]:
air_bnb = pd.read_csv('../files/AB_NYC_2019.csv')
display(air_bnb.head())

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [82]:
# Which hosts are the busiest and why?

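# Select the column before aggregating so non-numeric columns are never summed
grouped_data = (
    air_bnb.groupby('host_name')[['calculated_host_listings_count']]
    .sum()
    .sort_values('calculated_host_listings_count', ascending=False, kind='mergesort')
)
display(grouped_data.head())

## Ranked by the per-host sum of calculated_host_listings_count, the top five busiest hosts are shown.
## Note: that column appears to repeat a host's total on each of their rows, so the sums are inflated
## (106929 = 327 * 327). They rank hosts but should not be read as listing counts.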

host_name,calculated_host_listings_count
Sonder (NYC),106929
Blueground,53824
Kara,14679
Kazuya,10609
Jeremy & Laura,9216
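The totals above are sums of a column that already stores a per-host count, which is why the top value equals 327 × 327. A minimal sketch of a direct count follows. It uses a small made-up frame (`toy`, invented values) in place of `air_bnb` and assumes `host_id` uniquely identifies a host, since names such as "Ann" can be shared.

```python
import pandas as pd

# Hypothetical toy data standing in for air_bnb; all values are invented.
toy = pd.DataFrame({
    "host_id": [1, 1, 1, 2, 2, 3],
    "host_name": ["Ann", "Ann", "Ann", "Bob", "Bob", "Ann"],
    "calculated_host_listings_count": [3, 3, 3, 2, 2, 1],
})

# Summing the precomputed column inflates the totals (3 + 3 + 3 = 9 for host 1).
inflated = toy.groupby("host_id")["calculated_host_listings_count"].sum()

# Counting rows per host gives the actual number of listings.
listing_counts = (
    toy.groupby(["host_id", "host_name"])
    .size()
    .sort_values(ascending=False)
    .rename("listings")
)
print(listing_counts.head())
```

Grouping on `host_id` as well as `host_name` keeps the two different hosts called "Ann" apart while still showing a readable name.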


In [44]:
# How many neighborhood groups are available and which shows up the most?
neighborhood_data = (
    air_bnb.groupby('neighbourhood_group')[['id']]
    .count()
    .sort_values('id', ascending=False)
)
display(neighborhood_data)

## Based on the count of the neighbourhood_group column, there are 5 neighborhood groups,
## with 'Manhattan' appearing most often.


neighbourhood_group,id
Manhattan,21661
Brooklyn,20104
Queens,5666
Bronx,1091
Staten Island,373
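The same two facts (how many groups, and which is most frequent) can also be read off directly with `nunique` and `value_counts`. This is a minimal sketch on an invented Series, not the notebook's data.

```python
import pandas as pd

# Hypothetical stand-in for air_bnb['neighbourhood_group']; values are invented.
groups = pd.Series(
    ["Manhattan", "Brooklyn", "Manhattan", "Queens"],
    name="neighbourhood_group",
)

n_groups = groups.nunique()      # number of distinct groups
counts = groups.value_counts()   # listings per group, largest first
most_common = counts.idxmax()    # label of the most frequent group

print(n_groups, most_common)
```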


In [112]:
# Are private rooms the most popular in Manhattan?
manhattan_listings = (
    air_bnb[air_bnb['neighbourhood_group'] == 'Manhattan']
    .groupby('room_type')[['price']]
    .mean()
    .sort_values('price', ascending=False)
)
display(manhattan_listings)


## As there is no easy method by which to determine popularity for a room type based on the data given, 
## this answer demonstrates the amount of money each room type in Manhattan is listed for. 
## If the logic is accepted that the price of a type of room is based on the room type's popularity, 
## then it follows that the most popular room type in Manhattan is not a private room, but rather an entire home or apt.

manhattan_listing_count = (
    air_bnb[air_bnb['neighbourhood_group'] == 'Manhattan']
    .groupby('room_type')[['id']]
    .count()
    .sort_values('id', ascending=False)
)
display(manhattan_listing_count)

## If, on the other hand, it seems more logical to determine popularity of a room type based on the number of listings
## of that type, then the data still suggests that entire homes/apts are more popular in Manhattan than private rooms.

room_type,price
Entire home/apt,249.239109
Private room,116.776622
Shared room,88.977083


room_type,id
Entire home/apt,13199
Private room,7982
Shared room,480
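Price and listing counts both describe supply. A third, equally imperfect proxy is demand as reflected in reviews. The sketch below computes each room type's share of listings and its total review count on invented data. Treating reviews as a stand-in for bookings is an assumption, not something the dataset states.

```python
import pandas as pd

# Hypothetical toy data standing in for air_bnb; all values are invented.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt", "Private room"],
    "number_of_reviews": [10, 40, 5, 7],
})

manhattan = toy[toy["neighbourhood_group"] == "Manhattan"]

# Share of Manhattan listings per room type (fractions summing to 1).
room_share = manhattan["room_type"].value_counts(normalize=True)

# Total reviews per room type, used here as a rough demand proxy.
reviews_by_type = manhattan.groupby("room_type")["number_of_reviews"].sum()

print(room_share)
print(reviews_by_type)
```

In this toy example the two proxies disagree: entire homes have more listings, but private rooms have more reviews, which is why the choice of "popularity" measure should be stated with the answer.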


In [83]:
# Which hosts are the busiest based on their reviews?

# Select the review columns before aggregating so non-numeric columns are never summed
grouped_data = (
    air_bnb.groupby('host_name')[['number_of_reviews', 'reviews_per_month']]
    .sum()
    .sort_values('reviews_per_month', ascending=False, kind='mergesort')
)
display(grouped_data.head())

## Based on the number of reviews per month of each host, the top five busiest hosts are shown 

host_name,number_of_reviews,reviews_per_month
David,8103,508.61
Michael,11081,475.82
Alex,6204,443.44
Sonder (NYC),1281,397.56
John,7223,321.02
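Common first names such as David and Michael near the top suggest that several different hosts may be pooled under one `host_name`. A minimal sketch of the same aggregation keyed on `host_id` follows, using invented data and assuming `host_id` is unique per host.

```python
import pandas as pd

# Hypothetical toy data standing in for air_bnb; all values are invented.
toy = pd.DataFrame({
    "host_id": [10, 10, 20, 30],
    "host_name": ["David", "David", "David", "Mia"],
    "number_of_reviews": [5, 7, 100, 50],
    "reviews_per_month": [0.5, 0.7, 3.0, 2.0],
})

# Grouping by name merges the two different hosts called David.
by_name = toy.groupby("host_name")["reviews_per_month"].sum()

# Grouping by host_id keeps them separate; the name is carried along for display.
by_id = (
    toy.groupby("host_id")
    .agg(
        host_name=("host_name", "first"),
        number_of_reviews=("number_of_reviews", "sum"),
        reviews_per_month=("reviews_per_month", "sum"),
    )
    .sort_values("reviews_per_month", ascending=False)
)
print(by_id)
```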


In [67]:
# Which neighborhood group has the highest average price?
neighbourhood_group_prices = (
    air_bnb.groupby('neighbourhood_group')[['price']]
    .mean()
    .sort_values('price', ascending=False)
)
display(neighbourhood_group_prices)

## The neighborhood group with the highest average price is Manhattan

neighbourhood_group,price
Manhattan,196.875814
Brooklyn,124.383207
Staten Island,114.812332
Queens,99.517649
Bronx,87.496792


In [68]:
# Which neighborhood group has the highest total price?
neighbourhood_group_prices = (
    air_bnb.groupby('neighbourhood_group')[['price']]
    .sum()
    .sort_values('price', ascending=False)
)
display(neighbourhood_group_prices)

## The neighborhood group with the highest total price is Manhattan

neighbourhood_group,price
Manhattan,4264527
Brooklyn,2500600
Queens,563867
Bronx,95459
Staten Island,42825
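Average and total price can be computed in one pass with `agg`, which also makes it easy to see that the two rankings need not agree (Queens and Staten Island swap places in the tables above). A minimal sketch on invented data:

```python
import pandas as pd

# Hypothetical toy data standing in for air_bnb; all values are invented.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Bronx", "Bronx", "Bronx"],
    "price": [300, 100, 150, 150, 150],
})

# One row per group with the mean, the sum, and the number of listings.
price_stats = toy.groupby("neighbourhood_group")["price"].agg(["mean", "sum", "count"])

highest_average = price_stats["mean"].idxmax()
highest_total = price_stats["sum"].idxmax()

print(price_stats)
print(highest_average, highest_total)
```

Here the group with the highest average is not the one with the highest total, because the total also depends on how many listings a group has.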


In [84]:
# Which top 5 hosts have the highest total price?

# Select the price column before aggregating so non-numeric columns are never summed
grouped_data = (
    air_bnb.groupby('host_name')[['price']]
    .sum()
    .sort_values('price', ascending=False, kind='mergesort')
)
display(grouped_data.head())

## The top 5 hosts with the highest total price are shown

host_name,price
Sonder (NYC),82795
Blueground,70331
Michael,66895
David,65844
Alex,52563


In [109]:
# Who currently has no (zero) availability with a review count of 100 or more?

zero_available = air_bnb[air_bnb['availability_365'] == 0]
display(
    zero_available[zero_available['number_of_reviews'] >= 100]
    .drop_duplicates('host_name')
    .loc[:, ['host_name', 'number_of_reviews', 'availability_365']]
    .reset_index(drop=True)
)

## Excluding duplicate host names, there are 142 hosts with 100 or more reviews and zero availability.
## These are shown.


Unnamed: 0,host_name,number_of_reviews,availability_365
0,MaryEllen,118,0
1,Christiana,168,0
2,Sol,193,0
3,Coral,114,0
4,Doug,206,0
...,...,...,...
137,Kathleen,103,0
138,Janet,119,0
139,Albert,102,0
140,Stephany,131,0
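Dropping duplicates on `host_name` can hide distinct hosts who share a name. The sketch below combines both conditions in a single boolean mask and counts hosts by `host_id` instead. The data is invented, and `host_id` is assumed to be unique per host.

```python
import pandas as pd

# Hypothetical toy data standing in for air_bnb; all values are invented.
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 4, 5],
    "host_name": ["Sol", "Sol", "Doug", "Sol", "Kim", "Lee"],
    "number_of_reviews": [150, 120, 100, 200, 99, 300],
    "availability_365": [0, 0, 0, 0, 0, 10],
})

# Both conditions at once; the parentheses are required around each comparison.
mask = (toy["availability_365"] == 0) & (toy["number_of_reviews"] >= 100)
booked = toy.loc[mask, ["host_id", "host_name", "number_of_reviews", "availability_365"]]

n_hosts_by_id = booked["host_id"].nunique()                # distinct hosts
n_hosts_by_name = len(booked.drop_duplicates("host_name")) # distinct names only

print(n_hosts_by_id, n_hosts_by_name)
```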


In [92]:
# What host has the highest total of prices and where are they located?
# numeric_only=True keeps the numeric columns (including latitude/longitude) and skips text columns
grouped_data = air_bnb.groupby(['host_name']).sum(numeric_only=True).sort_values('price', ascending=False, kind='mergesort')
display(grouped_data.merge(air_bnb, on='host_name', how='outer').loc[0][['host_name','neighbourhood_group','neighbourhood','latitude_y','longitude_y','price_x']])


## According to the sum of all the prices for all the listings grouped by host,
## the host with the highest total of prices is shown, along with the location of one of their listings
## (the first row of the merged frame; a host with many listings may have others elsewhere).
## This location places that listing on Maiden Lane, between Water Street and Pearl Street in Manhattan, New York.

host_name                    Sonder (NYC)
neighbourhood_group             Manhattan
neighbourhood          Financial District
latitude_y                       40.70637
longitude_y                     -74.00645
price_x                           82795.0
Name: 0, dtype: object
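Because the row above describes a single listing, a host with many listings is better located by summarising all of them. A minimal sketch using `idxmax` to find the top host and `value_counts` to list where their listings are follows. The data is invented and `host_id` is assumed to identify a host.

```python
import pandas as pd

# Hypothetical toy data standing in for air_bnb; all values are invented.
toy = pd.DataFrame({
    "host_id": [1, 1, 1, 2],
    "price": [100, 200, 150, 400],
    "neighbourhood": ["Financial District", "Midtown", "Financial District", "Harlem"],
})

# Total price per host, then the id of the host with the largest total.
totals = toy.groupby("host_id")["price"].sum()
top_host = totals.idxmax()

# All neighbourhoods where that host has listings, most frequent first.
locations = toy.loc[toy["host_id"] == top_host, "neighbourhood"].value_counts()

print(top_host, totals.loc[top_host])
print(locations)
```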

In [110]:
# When did Danielle from Queens last receive a review?
Danielle_queens = air_bnb[air_bnb['host_name'] == 'Danielle']
display(
    # Use == here: a >= string comparison would also match 'Staten Island'
    Danielle_queens[Danielle_queens['neighbourhood_group'] == 'Queens']
    .drop_duplicates('host_name')
    .loc[:, ['host_name', 'neighbourhood_group', 'last_review']]
)

## The last_review date of the first listing for a Danielle in Queens is shown.

Unnamed: 0,host_name,neighbourhood_group,last_review
7086,Danielle,Queens,2019-07-03
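If a Danielle in Queens has more than one listing, the first row is not necessarily the most recent review. A minimal sketch that parses the dates and takes the maximum is shown below, on invented data. It also illustrates why an equality test matters: a `>=` comparison on strings would include "Staten Island".

```python
import pandas as pd

# Hypothetical toy data standing in for air_bnb; all values are invented.
toy = pd.DataFrame({
    "host_name": ["Danielle", "Danielle", "Danielle", "Danielle", "Omar"],
    "neighbourhood_group": ["Queens", "Queens", "Staten Island", "Brooklyn", "Queens"],
    "last_review": ["2018-01-05", "2019-03-02", "2019-07-01", "2019-08-01", "2019-06-30"],
})
toy["last_review"] = pd.to_datetime(toy["last_review"])

is_danielle = toy["host_name"] == "Danielle"

# Exact match on the borough: only the Queens rows qualify.
latest = toy.loc[is_danielle & (toy["neighbourhood_group"] == "Queens"), "last_review"].max()

# Lexicographic >= also lets "Staten Island" through and changes the answer.
latest_loose = toy.loc[is_danielle & (toy["neighbourhood_group"] >= "Queens"), "last_review"].max()

print(latest.date(), latest_loose.date())
```

Several different hosts can be named Danielle, so on the real data it may be worth grouping the matching rows by `host_id` before taking the maximum.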


## Further Questions

1. Which host has the most listings?

2. How many listings have completely open availability?

3. What room_types have the highest review numbers?
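The second and third follow-up questions reduce to a filter and a grouped sum. The sketch below is a starting point on invented data. It assumes "completely open" means `availability_365 == 365` and that "review numbers" refers to the `number_of_reviews` column.

```python
import pandas as pd

# Hypothetical toy data standing in for air_bnb; all values are invented.
toy = pd.DataFrame({
    "room_type": ["Private room", "Entire home/apt", "Entire home/apt", "Shared room"],
    "availability_365": [365, 0, 365, 120],
    "number_of_reviews": [10, 50, 30, 5],
})

# Listings available every day of the year.
fully_open = int((toy["availability_365"] == 365).sum())

# Total reviews per room type, largest first.
reviews_by_room = (
    toy.groupby("room_type")["number_of_reviews"]
    .sum()
    .sort_values(ascending=False)
)

print(fully_open)
print(reviews_by_room)
```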

# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered any further details that were not asked about above, please describe them here.

-- Add your conclusion --