# AirBnB NY Locations Data Case Study

In this final project, your task is to use the data provided to find evidence that answers the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment.
**Be advised:** I will go dark for the entire assignment period. Any questions about the data or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild.

Happy Coding!

In [18]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [37]:
air_bnb = pd.read_csv('../downloads/AB_NYC_2019 - AB_NYC_2019.csv')
air_bnb.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [28]:
# How many neighborhood groups are available and which shows up the most?
num_groups = len(air_bnb['neighbourhood_group'].unique())
print(f"There are {num_groups} neighborhood groups in the dataset.")

group_counts = air_bnb['neighbourhood_group'].value_counts()
most_common_group = group_counts.index[0]
print(f"The most common neighborhood group is {most_common_group}.")


There are 5 neighborhood groups in the dataset.
The most common neighborhood group is Manhattan.


In [82]:
# Are private rooms the most popular in Manhattan?
manhattan_rooms = air_bnb[air_bnb['neighbourhood_group'] == 'Manhattan']
room_type_counts = manhattan_rooms['room_type'].value_counts()
room_type_counts
# if room_type_counts['Private room'] > room_type_counts['Entire home/apt'] and \
#    room_type_counts['Private room'] > room_type_counts['Shared room']:
#     print("Yes, private rooms are the most popular in Manhattan.")
# else:
#     print("No, private rooms are not the most popular in Manhattan.")


Entire home/apt    13199
Private room        7982
Shared room          480
Name: room_type, dtype: int64
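
The counts above show that entire homes/apartments outnumber private rooms in Manhattan, so the answer is no. As a minimal sketch of deriving that answer in code rather than by reading the table, the example below uses a small hand-made frame (`sample`, with invented rows) in place of `air_bnb`:

```python
import pandas as pd

# Toy stand-in for air_bnb; these rows are invented for illustration only.
sample = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Private room"],
})

manhattan = sample[sample["neighbourhood_group"] == "Manhattan"]
room_type_counts = manhattan["room_type"].value_counts()

# idxmax returns the index label (room type) with the largest count.
most_popular = room_type_counts.idxmax()
is_private_most_popular = most_popular == "Private room"
print(f"Most common room type in Manhattan: {most_popular}")
print(f"Private rooms most popular? {is_private_most_popular}")
```

Using `idxmax` avoids comparing each room type by hand and keeps working if the data contains room types other than the three shown here.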

In [38]:
# Which hosts are the busiest based on their reviews?
host_review_counts = air_bnb.groupby('host_id')['number_of_reviews'].sum()
busiest_hosts = host_review_counts.sort_values(ascending=False)
print("Top 5 busiest hosts based on review counts:")
print(busiest_hosts.head())


Top 5 busiest hosts based on review counts:
host_id
37312959    2273
344035      2205
26432133    2017
35524316    1971
40176101    1818
Name: number_of_reviews, dtype: int64


In [49]:
# Which neighborhood group has the highest average price?
group_avg_prices = air_bnb.groupby('neighbourhood_group')['price'].mean()
# idxmax returns the group label with the largest mean price.
highest_avg_price_group = group_avg_prices.idxmax()
print(f"The neighborhood group with the highest average price is {highest_avg_price_group}.")


The neighborhood group with the highest average price is Manhattan.


In [47]:
# Which neighborhood group has the highest total price?
group_total_prices = air_bnb.groupby('neighbourhood_group')['price'].sum()
# idxmax returns the group label with the largest summed price.
highest_total_price_group = group_total_prices.idxmax()
print(f"The neighborhood group with the highest total price is {highest_total_price_group}.")


The neighborhood group with the highest total price is Manhattan.


In [51]:
# Which top 5 hosts have the highest total price?
host_total_prices = air_bnb.groupby('host_id')['price'].sum()
top_hosts = host_total_prices.sort_values(ascending=False).head(5)
print("Top 5 hosts with the highest total price:")
print(top_hosts)


Top 5 hosts with the highest total price:
host_id
219517861    82795
107434423    70331
156158778    37097
205031545    35294
30283594     33581
Name: price, dtype: int64


In [65]:
# Who currently has no (zero) availability with a review count of 100 or more?
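# Keep only listings with at least 100 reviews.
airbnb_hosts = air_bnb[air_bnb['number_of_reviews'] >= 100]
# Keep hosts whose 100+ review listings all have zero availability.
unavailable_hosts = airbnb_hosts.groupby('host_id').filter(lambda x: (x['availability_365'] == 0).all())
unavailable_host_ids = unavailable_hosts['host_id'].unique()
print("Hosts with a review count of 100 or more and currently no availability:")
# Note: this prints every listing row of each matching host, so a host with
# several listings appears more than once.
print(air_bnb[air_bnb['host_id'].isin(unavailable_host_ids)][['host_id', 'host_name']])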


Hosts with a review count of 100 or more and currently no availability:
         host_id       host_name
8           7490       MaryEllen
94         79402      Christiana
132       129352             Sol
174       193722           Coral
180        67778            Doug
...          ...             ...
35009  209549523         Mariluz
35013  209549523         Mariluz
35014  209549523         Mariluz
35070     814747           Maeve
42337   22171095  George & Diana

[180 rows x 2 columns]
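
The 180 rows above are listing rows, not distinct hosts: host 209549523 (Mariluz) appears three times. One way to get one row per host is sketched below. It reads the question at the listing level (a listing with 100 or more reviews and zero availability) and then de-duplicates by host. That reading is an assumption, and the `sample` frame is invented for illustration:

```python
import pandas as pd

# Toy stand-in for air_bnb; these rows are invented for illustration only.
sample = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 3],
    "host_name": ["Ana", "Ana", "Ben", "Cy", "Cy"],
    "number_of_reviews": [120, 5, 150, 300, 101],
    "availability_365": [0, 200, 0, 0, 0],
})

# Listings that have 100+ reviews AND no availability.
matching = sample[
    (sample["number_of_reviews"] >= 100) & (sample["availability_365"] == 0)
]

# One row per host, even if several of their listings match.
unique_hosts = matching[["host_id", "host_name"]].drop_duplicates()
print(unique_hosts)
print(f"{len(matching)} matching listings from {len(unique_hosts)} distinct hosts")
```

On the real data, `len(unique_hosts)` would give the host count, which cannot be read off the 180-row output directly.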


In [66]:
# What host has the highest total of prices and where are they located?
host_total_prices = air_bnb.groupby('host_id')['price'].sum()
highest_total_price_host_id = host_total_prices.idxmax()
highest_total_price_host_rows = air_bnb[air_bnb['host_id'] == highest_total_price_host_id]
location = highest_total_price_host_rows['neighbourhood'].iloc[0]
host_name = highest_total_price_host_rows['host_name'].iloc[0]
print(f"The host with the highest total price is {host_name}, located in {location}.")


The host with the highest total price is Sonder (NYC), located in Financial District.
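
The location reported above is the neighbourhood of the host's first listing row only. A host with many listings may operate in several neighbourhoods, so a fuller answer could count listings per neighbourhood. A minimal sketch, using an invented `sample` frame rather than the real data:

```python
import pandas as pd

# Toy stand-in for air_bnb; these rows are invented for illustration only.
sample = pd.DataFrame({
    "host_id": [1, 1, 1, 2],
    "neighbourhood": ["Financial District", "Financial District", "Midtown", "Harlem"],
    "price": [200, 250, 300, 500],
})

# Host with the largest summed price across all of their listings.
top_host_id = sample.groupby("host_id")["price"].sum().idxmax()

# Number of that host's listings in each neighbourhood.
locations = sample.loc[sample["host_id"] == top_host_id, "neighbourhood"].value_counts()
print(locations)
```

Reporting the full `locations` table, or at least its most frequent entry, avoids depending on whichever row happens to come first in the file.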


In [68]:
# When did Danielle from Queens last receive a review?
danielle_from_queens = air_bnb[(air_bnb['host_name'] == 'Danielle') & (air_bnb['neighbourhood_group'] == 'Queens')]
danielle_from_queens = danielle_from_queens.sort_values('last_review', ascending=False)
last_review_date = danielle_from_queens['last_review'].iloc[0]
print(f"Danielle from Queens last received a review on {last_review_date}.")


Danielle from Queens last received a review on 2019-07-08.
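
The filter above matches on `host_name`, so it would pool every Queens host named Danielle if there is more than one. A hedged sketch of checking this is to parse `last_review` as dates and take the latest review per `host_id`. The `sample` frame below is invented for illustration:

```python
import pandas as pd

# Toy stand-in for air_bnb; these rows are invented for illustration only.
sample = pd.DataFrame({
    "host_id": [10, 10, 20, 30],
    "host_name": ["Danielle", "Danielle", "Danielle", "Danielle"],
    "neighbourhood_group": ["Queens", "Queens", "Queens", "Brooklyn"],
    "last_review": ["2019-01-05", None, "2018-06-30", "2019-07-01"],
})

# Parse the date strings; missing values become NaT and are skipped by max().
sample["last_review"] = pd.to_datetime(sample["last_review"])

mask = (sample["host_name"] == "Danielle") & (sample["neighbourhood_group"] == "Queens")
latest_per_host = sample[mask].groupby("host_id")["last_review"].max()
print(latest_per_host)
```

If the result has a single row, the name identifies one host and the date is unambiguous. Otherwise the answer should state which `host_id` it refers to.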


## Further Questions

1. Which host has the most listings?

In [70]:
# Count listing rows per host; the host with the most rows has the most listings.
host_listing_count = air_bnb.groupby('host_id')['calculated_host_listings_count'].count()
most_listings_host_id = host_listing_count.idxmax()
# All listing rows that belong to that host.
most_listings_rows = air_bnb[air_bnb['host_id'] == most_listings_host_id]
host_name = most_listings_rows['host_name'].iloc[0]
print(f"The host with the most listings is {host_name}.")


The host with the most listings is Sonder (NYC).


2. How many listings have completely open availability?

In [71]:
open_listings = air_bnb[air_bnb['availability_365'] == 365]
open_listings_count = len(open_listings)
print(f"There are {open_listings_count} listings with completely open availability.")

There are 1295 listings with completely open availability.


3. Which room types have the most reviews?

In [75]:
reviews_by_room_type = air_bnb.groupby('room_type')['number_of_reviews'].sum()
reviews_by_room_type_sorted = reviews_by_room_type.sort_values(ascending=False)
reviews_by_room_type_sorted


room_type
Entire home/apt    580403
Private room       538346
Shared room         19256
Name: number_of_reviews, dtype: int64

# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered more details that were not asked about above, please describe them here.

-- Add your conclusion --

In [None]:
There are 5 neighborhood groups in the dataset.
The most common neighborhood group is Manhattan.
Entire home/apt listings, not private rooms, are the most popular in Manhattan.
The neighborhood group with the highest average and total price is Manhattan.
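The query for hosts with a review count of 100 or more and currently no availability returned 180 listing rows; hosts with several listings appear more than once, so the number of distinct hosts is lower.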
The host with the highest total price is Sonder (NYC), located in Financial District.
Danielle from Queens last received a review on 2019-07-08.
The host with the most listings is Sonder (NYC).
There are 1295 listings with completely open availability.
Entire home/apt listings have the highest total number of reviews.
