# AirBnB NY Locations Data Case Study

In this final project, your task is to use the data provided to find evidence that answers the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment.
**Be advised:** I will go dark for the entire assignment period. Any questions about the data or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild.

Happy Coding!

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [6]:
bnb = pd.read_csv(r"C:\Users\Godzilla\Documents\python\ABNB\AB_NYC_2019.csv")
bnb.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [40]:
# How many neighborhood groups are available and which shows up the most?
bnb_groups = bnb['neighbourhood_group'].value_counts()


print(f"There are {len(bnb_groups)} neighborhood groups available")

There are 5 neighborhood groups available
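
The count above covers only the first half of the question. Because `value_counts()` returns one count per group, the most frequent group is the label with the largest count. A minimal sketch, assuming a small made-up `sample` Series in place of `bnb['neighbourhood_group']` (the values are hypothetical, not taken from the real data):

```python
import pandas as pd

# Hypothetical sample standing in for bnb['neighbourhood_group']
sample = pd.Series(["Brooklyn", "Manhattan", "Manhattan", "Queens", "Manhattan", "Brooklyn"])

group_counts = sample.value_counts()        # one count per distinct group
most_common_group = group_counts.idxmax()   # label with the largest count
most_common_count = int(group_counts.max())

print(f"{most_common_group} shows up the most with {most_common_count} listings")
```

On the real data the same two calls could be applied to `bnb_groups` directly.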


In [85]:
# Are private rooms the most popular in manhattan?
# Cast the text columns to str (pandas still reports the dtype as object)
bnb['neighbourhood'] = bnb['neighbourhood'].astype(str)
bnb['room_type'] = bnb['room_type'].astype(str)


# Select all listings in the Manhattan neighborhood group
manhattan_rooms = bnb[bnb['neighbourhood_group'] == 'Manhattan']
total = len(manhattan_rooms)
# Select the Manhattan listings that are private rooms
manhattan_rooms_priv = manhattan_rooms[manhattan_rooms['room_type'] == 'Private room']
# Select the Manhattan listings of any other room type
manhattan_rooms_other = manhattan_rooms[manhattan_rooms['room_type'] != 'Private room']
# Counts used in the summary below
private = len(manhattan_rooms_priv)
other = len(manhattan_rooms_other)


print(f"The total number of AirBNBs in Manhattan is {total} with {private} of them being private rooms and {other} of them being either an Entire home/apt or a Shared room")
print("With this information we can conclude that private rooms are NOT the most popular listing in Manhattan")

# count() returns the number of non-null entries (one per listing), not the total number of reviews
num_reviews = manhattan_rooms_priv['number_of_reviews'].count()

print(num_reviews)

The total number of AirBNBs in Manhattan is 21661 with 7982 of them being private rooms and 13679 of them being either an Entire home/apt or a Shared room
With this information we can conclude that private rooms are NOT the most popular listing in Manhattan
7982
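
The comparison above sets private rooms against the other two room types combined. To say which single room type is the most popular, each type needs its own count. A minimal sketch on a made-up `listings` frame (hypothetical rows, only the two columns the comparison needs):

```python
import pandas as pd

# Hypothetical listings, not the real data
listings = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn", "Manhattan"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt", "Private room", "Shared room"],
})

manhattan = listings[listings["neighbourhood_group"] == "Manhattan"]
room_counts = manhattan["room_type"].value_counts()  # one count per room type
top_room_type = room_counts.idxmax()                 # room type with the most listings

print(room_counts)
print(f"Most common room type in the sample: {top_room_type}")
```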


In [185]:
# Which hosts are the busiest based on their reviews?

busiest_host = bnb.groupby('host_name')['number_of_reviews'].sum().sort_values(ascending=False) 
print(busiest_host)

host_name
'Cil         0.0
Nawar        0.0
Nathaniel    0.0
Nati         0.0
Natia        0.0
            ... 
Griffith     0.0
Grinis       0.0
Grisel       0.0
Grisha       0.0
현선           0.0
Name: number_of_reviews, Length: 11452, dtype: float64
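
Every total above is 0.0, so this output cannot rank the hosts. One possible explanation is that `number_of_reviews` no longer held usable numbers when the cell ran, since the execution counts show the cells were run out of order. Grouping on `host_name` also merges different hosts who share a first name. A minimal sketch of a guarded version, assuming a made-up `reviews` frame (hypothetical values):

```python
import pandas as pd

# Hypothetical sample; review counts are strings to mimic a column read as text
reviews = pd.DataFrame({
    "host_id": [1, 1, 2, 3],
    "host_name": ["Michael", "Michael", "Michael", "Laura"],
    "number_of_reviews": ["10", "5", "7", "20"],
})

reviews["number_of_reviews"] = pd.to_numeric(reviews["number_of_reviews"], errors="coerce")
# Guard: stop early if the conversion left nothing but NaN
assert reviews["number_of_reviews"].notna().any(), "number_of_reviews has no numeric values"

# Group on host_id so two different hosts named Michael are not merged
busiest = (
    reviews.groupby(["host_id", "host_name"])["number_of_reviews"]
    .sum()
    .sort_values(ascending=False)
)
print(busiest.head())
```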


In [101]:
# Which neighborhood group has the highest average price?
# Group by neighbourhood, select the 'price' column from the grouped data, and calculate the mean
neighbourhood_mean = bnb.groupby('neighbourhood')['price'].mean()


highest_neighbourhood = neighbourhood_mean.sort_values(ascending=False).head()
print(highest_neighbourhood)
print(f"The neighbourhood with the highest average price is {highest_neighbourhood.index[0]}")


neighbourhood
Fort Wadsworth    800.000000
Woodrow           700.000000
Tribeca           490.638418
Sea Gate          487.857143
Riverdale         442.090909
Name: price, dtype: float64
The neighbourhood with the highest average price is Fort Wadsworth
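
The cell above groups by `neighbourhood`, so it ranks individual neighborhoods, while the question asks about neighborhood groups. A minimal sketch of the group-level average, assuming a made-up `prices` frame (hypothetical rows and prices):

```python
import pandas as pd

# Hypothetical listings, not the real data
prices = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Queens"],
    "neighbourhood": ["Tribeca", "Harlem", "Kensington", "Clinton Hill", "Astoria"],
    "price": [500, 100, 150, 90, 80],
})

# Average price per neighbourhood group, highest first
group_mean = prices.groupby("neighbourhood_group")["price"].mean().sort_values(ascending=False)
print(group_mean)
print(f"Highest average price in the sample: {group_mean.index[0]}")
```

On the real data, swapping `'neighbourhood'` for `'neighbourhood_group'` in the `groupby` call would give the group-level answer.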


In [113]:
# Which neighborhood group has the highest total price?

highest_price = bnb.groupby('neighbourhood_group')['price'].sum().sort_values(ascending=False).head(1)
print(highest_price)
print(f"The neighbourhood group with the highest total price is {highest_price.index[0]}")


neighbourhood_group
Manhattan    4264527
Name: price, dtype: int64
The neighbourhood group with the highest total price is Manhattan


In [122]:
#Which top 5 hosts have the highest total price?
highest_hosts = bnb.groupby('host_name')['price'].sum().sort_values(ascending=False).head(5)
print(highest_hosts)


print(f"The top 5 hosts with the highest total price are {', '.join(highest_hosts.index)}")



host_name
Sonder (NYC)    82795
Blueground      70331
Michael         66895
David           65844
Alex            52563
Name: price, dtype: int64
The top 5 hosts with the highest total price are Sonder (NYC), Blueground, Michael, David, Alex


In [142]:
# Who currently has no (zero) availability with a review count of 100 or more?


#change from obj type to numeric type
bnb['number_of_reviews'] = pd.to_numeric(bnb['number_of_reviews'], errors='coerce')
zero_hundred = bnb[(bnb['availability_365'] == 0) & (bnb['number_of_reviews'] >= 100)]
print(zero_hundred)



Empty DataFrame
Columns: [id, name, host_id, host_name, neighbourhood_group, neighbourhood, latitude, longitude, room_type, price, minimum_nights, number_of_reviews, last_review, reviews_per_month, calculated_host_listings_count, availability_365]
Index: []
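
An empty result can mean that no listing matches, or that one of the columns did not hold comparable numbers when the filter ran. Checking the dtypes first helps tell the two cases apart. A minimal sketch, assuming a made-up `avail` frame (hypothetical hosts and values):

```python
import pandas as pd

# Hypothetical listings, not the real data
avail = pd.DataFrame({
    "host_name": ["Ana", "Ben", "Cara", "Dev"],
    "availability_365": [0, 0, 120, 0],
    "number_of_reviews": [150, 40, 300, 100],
})

# Confirm both columns are numeric before comparing them
assert pd.api.types.is_numeric_dtype(avail["availability_365"])
assert pd.api.types.is_numeric_dtype(avail["number_of_reviews"])

# Zero availability and at least 100 reviews
booked_out = avail[(avail["availability_365"] == 0) & (avail["number_of_reviews"] >= 100)]
booked_hosts = booked_out["host_name"].tolist()
print(booked_hosts)
```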


In [144]:
# What host has the highest total of prices and where are they located?
# Total price per host, largest first; keep only the top host
highest_host = bnb.groupby('host_name')['price'].sum().sort_values(ascending=False).head(1)
print(highest_host)


print(f"The host with the highest total price is {highest_host.index[0]}")


host_name
Sonder (NYC)    82795
Name: price, dtype: int64
The host with the highest total price is Sonder (NYC)
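
The second half of the question, where the host is located, can be answered by counting the top host's listings per neighborhood group. A minimal sketch, assuming a made-up `hosts` frame (the host names and prices are hypothetical):

```python
import pandas as pd

# Hypothetical listings, not the real data
hosts = pd.DataFrame({
    "host_name": ["Acme Stays", "Acme Stays", "Acme Stays", "Laura"],
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Queens"],
    "price": [300, 250, 200, 80],
})

# Host whose listings add up to the largest total price
top_host = hosts.groupby("host_name")["price"].sum().idxmax()
# Count that host's listings in each neighbourhood group
top_host_locations = hosts.loc[hosts["host_name"] == top_host, "neighbourhood_group"].value_counts()

print(top_host)
print(top_host_locations)
```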


In [161]:
# When did Danielle from Queens last receive a review?
#find host
danielle_queens = bnb[(bnb['host_name'] == 'Danielle') & (bnb['neighbourhood_group'] == 'Queens')]
#sort dates
danielle_queens_sorted = danielle_queens.sort_values('last_review', ascending=False)
last_review_dani = danielle_queens_sorted.iloc[0]['last_review']
print(f"Danielle from Queens last received a review on {last_review_dani}")


Danielle from Queens last received a review on 2019-07-08
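
Sorting `last_review` as text works here because the dates are in ISO `YYYY-MM-DD` form. Parsing the column to real dates is more robust and lets `max()` skip missing values. Note that several rows may belong to different hosts named Danielle. A minimal sketch, assuming a made-up `dates` frame (hypothetical dates):

```python
import pandas as pd

# Hypothetical listings, not the real data
dates = pd.DataFrame({
    "host_name": ["Danielle", "Danielle", "Danielle"],
    "neighbourhood_group": ["Queens", "Queens", "Brooklyn"],
    "last_review": ["2019-01-15", None, "2019-06-30"],
})

# Parse the text dates; unparseable or missing values become NaT
dates["last_review"] = pd.to_datetime(dates["last_review"], errors="coerce")
mask = (dates["host_name"] == "Danielle") & (dates["neighbourhood_group"] == "Queens")
latest = dates.loc[mask, "last_review"].max()  # max() skips missing dates (NaT)

print(f"Latest review in the sample: {latest.date()}")
```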


## Further Questions

1. Which host has the most listings?

In [166]:

host_most = bnb['host_name'].value_counts().head(1)


print(f"The host with the most listings is {host_most.index[0]} with {host_most.values[0]} listings.")

The host with the most listings is Michael with 417 listings.


2. How many listings have completely open availability?

In [169]:
open_avail = (bnb['availability_365'] == 365).sum()


print(f"There are {open_avail} listings with completely open availability.")

There are 1295 listings with completely open availability.


3. What room_types have the highest review numbers?

In [181]:
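# Sum the reviews for each room type. Calling .sort() on the grouped column raised
# "TypeError: 'bool' object is not callable": on a groupby object, `sort` is a
# boolean setting rather than a method, so an aggregation such as sum() is needed first.
roomtype_reviews = bnb.groupby('room_type')['number_of_reviews'].sum()
highest_bytype = roomtype_reviews.sort_values(ascending=False)
print(highest_bytype)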

# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered more details that were not asked about above, please describe them here.

-- Add your conclusion --
My final conclusion is that Manhattan is the most expensive neighborhood group, with the highest total listing price (4,264,527).