# AirBnB NY Locations Data Case Study

In this final project, your task is to use the data provided to find evidence that answers the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment. 
**Be Advised** I will go dark for this entire assignment time period. That said, any questions that you would like to ask about the data or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild.

Happy Coding!

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [5]:
# air_bnb = pd.read_csv('files/AB_NYC_2019.csv')
# air_bnb.head()

df = pd.read_csv("/Users/seanbunk/Documents/CodingTemple/advanced_python/AB_NYC_2019.csv")

In [6]:
df.columns

Index(['id', 'name', 'host_id', 'host_name', 'neighbourhood_group',
       'neighbourhood', 'latitude', 'longitude', 'room_type', 'price',
       'minimum_nights', 'number_of_reviews', 'last_review',
       'reviews_per_month', 'calculated_host_listings_count',
       'availability_365'],
      dtype='object')

In [7]:
#1 How many neighborhood groups are available and which shows up the most?

num_hoods = df["neighbourhood_group"].nunique()
print(f"Number of neighborhood groups: {num_hoods}")

neighbourhood_counts = df['neighbourhood_group'].value_counts()
print("Most common neighborhood group:", neighbourhood_counts.index[0])

Number of neighborhood groups: 5
Most common neighborhood group: Manhattan


In [8]:
#2 Are private rooms the most popular in Manhattan?  No, entire homes/apartments are

manhattan_listings = df[df["neighbourhood_group"] == "Manhattan"]
room_type_counts = manhattan_listings["room_type"].value_counts()
print(room_type_counts)


Entire home/apt    13199
Private room        7982
Shared room          480
Name: room_type, dtype: int64


In [9]:
#3 Which hosts are the busiest based on their reviews?
# Average reviews_per_month across all listings that share a host name
reviews_per_month = df.groupby('host_name')['reviews_per_month'].mean()
top_5 = reviews_per_month.sort_values(ascending=False).head(5)
top_5

host_name
Row NYC    18.620000
Nalicia    18.126667
Dona       13.990000
Aisling    13.420000
Malini     13.150000
Name: reviews_per_month, dtype: float64
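
Grouping by `host_name` treats every host who shares a first name as one host. The sketch below shows the difference on a few made-up rows (not taken from the dataset) and groups by `host_id` instead, keeping the name only for display. It is an illustration of the idea, not a replacement for the result above.

```python
import pandas as pd

# Toy rows, invented for illustration; not taken from AB_NYC_2019.csv.
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3],
    "host_name": ["Alex", "Alex", "Alex", "Dana"],
    "reviews_per_month": [2.0, 4.0, 10.0, 5.0],
})

# Grouping by name merges host 1 and host 2 because both are called "Alex".
by_name = toy.groupby("host_name")["reviews_per_month"].mean()

# Grouping by host_id (with the name kept for readability) separates them.
by_id = (
    toy.groupby(["host_id", "host_name"])["reviews_per_month"]
    .mean()
    .sort_values(ascending=False)
)

print(by_name)
print(by_id.head(5))
```

On the real data, the same pattern would be applied to `df` in place of `toy`.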

In [10]:
#4 Which neighborhood group has the highest average price?  Manhattan

avg_prices = df.groupby("neighbourhood_group")["price"].mean().sort_values(ascending=False)
print(avg_prices.head(1))

neighbourhood_group
Manhattan    196.875814
Name: price, dtype: float64


In [11]:
#5 Which neighborhood group has the highest total price?

total_prices = df.groupby("neighbourhood_group")["price"].sum().sort_values(ascending=False)
print(total_prices.head(1))

neighbourhood_group
Manhattan    4264527
Name: price, dtype: int64


In [12]:
#6 Which top 5 hosts have the highest total price?

top_hosts = df.groupby("host_name")["price"].sum().nlargest(5)
print(top_hosts)

host_name
Sonder (NYC)    82795
Blueground      70331
Michael         66895
David           65844
Alex            52563
Name: price, dtype: int64


In [16]:
#7 Who currently has no (zero) availability with a review count of 100 or more?

no_avail = df[(df["availability_365"] == 0) & (df["number_of_reviews"] >= 100)]
no_avail['host_name']

8         MaryEllen
94       Christiana
132             Sol
174           Coral
180            Doug
            ...    
29581      Kathleen
30461         Janet
31250        Albert
32670      Stephany
35014       Mariluz
Name: host_name, Length: 162, dtype: object
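
The output above has one row per listing, so a host with several qualifying listings appears more than once. A minimal sketch of collapsing the result to distinct hosts, using made-up rows rather than the real data:

```python
import pandas as pd

# Toy rows, invented for illustration; not taken from AB_NYC_2019.csv.
toy = pd.DataFrame({
    "host_id": [10, 10, 20, 30],
    "host_name": ["Sol", "Sol", "Coral", "Doug"],
    "availability_365": [0, 0, 0, 120],
    "number_of_reviews": [150, 101, 99, 300],
})

# Same filter as above: no availability and at least 100 reviews.
mask = (toy["availability_365"] == 0) & (toy["number_of_reviews"] >= 100)

# Keep one row per host instead of one row per listing.
hosts = toy.loc[mask, ["host_id", "host_name"]].drop_duplicates()

print(mask.sum(), "listings from", len(hosts), "distinct hosts")
print(hosts["host_name"].tolist())
```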

In [67]:
#8 What host has the highest total of prices and where are they located?  Sonder

host_prices = df.groupby('host_name')['price'].sum().sort_values(ascending=False)
top_host = host_prices.index[0]
top_host_price = host_prices.iloc[0]
# Note: this takes the neighbourhood of the host's first listing only
top_host_location = df.loc[df['host_name'] == top_host, 'neighbourhood'].iloc[0]
print(top_host_location)
print(top_host)
print(top_host_price)


Financial District
Sonder (NYC)
82795
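
A host with many listings may be spread over several neighbourhoods, so the first row is not necessarily representative. The sketch below counts listings per neighbourhood for the top host. The rows are invented for illustration and the host names `A` and `B` are hypothetical.

```python
import pandas as pd

# Toy rows, invented for illustration; not taken from AB_NYC_2019.csv.
toy = pd.DataFrame({
    "host_name": ["A", "A", "A", "B"],
    "neighbourhood": ["Midtown", "Financial District",
                      "Financial District", "Astoria"],
    "price": [100, 200, 150, 300],
})

# Host with the highest summed price.
top_host = toy.groupby("host_name")["price"].sum().idxmax()

# How that host's listings are distributed across neighbourhoods,
# most common first.
locations = toy.loc[toy["host_name"] == top_host, "neighbourhood"].value_counts()

print(top_host)
print(locations)
```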


In [20]:
#9 When did Danielle from Queens last receive a review?  #2019-07-08

danielle_reviews = df[(df["neighbourhood_group"] == "Queens") & (df["host_name"] == "Danielle")]
danielle_reviews

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
7086,5115372,Comfy Room Family Home LGA Airport NO CLEANING...,26432133,Danielle,Queens,East Elmhurst,40.76374,-73.87103,Private room,54,1,430,7/3/19,13.45,5,347
16349,13151075,ASTORIA APARTMENT OUTDOOR SPACE,18051286,Danielle,Queens,Astoria,40.77221,-73.92901,Private room,50,1,0,,,1,0
20403,16276632,Cozy Room Family Home LGA Airport NO CLEANING FEE,26432133,Danielle,Queens,East Elmhurst,40.76335,-73.87007,Private room,48,1,510,7/6/19,16.22,5,341
21517,17222454,Sun Room Family Home LGA Airport NO CLEANING FEE,26432133,Danielle,Queens,East Elmhurst,40.76367,-73.87088,Private room,48,1,417,7/7/19,14.36,5,338
22068,17754072,Bed in Family Home Near LGA Airport,26432133,Danielle,Queens,East Elmhurst,40.76389,-73.87155,Shared room,38,1,224,7/6/19,7.96,5,80
22469,18173787,Cute Tiny Room Family Home by LGA NO CLEANING FEE,26432133,Danielle,Queens,East Elmhurst,40.7638,-73.87238,Private room,48,1,436,7/8/19,16.03,5,337
27021,21386105,Quiet & clean 1br haven with balcony near the ...,154256662,Danielle,Queens,Astoria,40.77134,-73.92424,Entire home/apt,250,3,1,1/2/18,0.05,1,180
33861,26814763,One bedroom with full bed / 1 stop from Manhattan,201647469,Danielle,Queens,Long Island City,40.74565,-73.94699,Private room,108,2,13,6/20/19,1.74,1,333
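
Instead of reading the latest date off the table by eye, `last_review` can be parsed and the maximum taken. The sketch below uses the `host_id` and `last_review` values from the rows displayed above. The `M/D/YY` format string is an assumption based on how the dates appear in that output, and would need to change if the file stores dates differently.

```python
import numpy as np
import pandas as pd

# host_id and last_review copied from the rows displayed above;
# NaN marks the listing that has no reviews.
danielle = pd.DataFrame({
    "host_id": [26432133, 18051286, 26432133, 26432133,
                26432133, 26432133, 154256662, 201647469],
    "last_review": ["7/3/19", np.nan, "7/6/19", "7/7/19",
                    "7/6/19", "7/8/19", "1/2/18", "6/20/19"],
})

# Assumed format: month/day/two-digit year. Missing values become NaT.
dates = pd.to_datetime(danielle["last_review"], format="%m/%d/%y")

# Most recent review across every Danielle in Queens.
latest = dates.max()
print(latest.date())

# The name matches several host_ids, so also report the latest per host.
per_host = dates.groupby(danielle["host_id"]).max()
print(per_host)
```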


## Further Questions

1. Which host has the most listings?

In [18]:
most_listings_host = df['host_name'].value_counts().index[0]
print("Host with the most listings:", most_listings_host)

Host with the most listings: Michael


2. How many listings have completely open availability?

In [10]:
open_avail_count = df[df['availability_365'] == 365]['id'].count()
print("Number of listings with completely open availability:", open_avail_count)


Number of listings with completely open availability: 1295


3. What room_types have the highest review numbers?

In [9]:
room_type_review_counts = df.groupby('room_type')['number_of_reviews'].sum()
highest_review_room_type = room_type_review_counts.idxmax()
print("Room type with the highest review numbers:", highest_review_room_type)

Room type with the highest review numbers: Entire home/apt


# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered any details that were not asked about above, please describe them here.

-- Add your conclusion --

In [None]:
# 1.) There are 5 neighborhood groups, and Manhattan shows up the most.

# 2.) In Manhattan, entire homes/apartments (13,199 listings) are more popular than private rooms (7,982) or shared rooms (480).

# 3.) According to the data, host Row NYC is the busiest, with the highest average reviews per month (18.62).

# 4.) Manhattan is the neighborhood group with the highest average price (about 196.88).

# 5.) Manhattan is the neighborhood group with the highest total price (4,264,527).

# 6.) Sonder (NYC), Blueground, Michael, David, and Alex are the top 5 hosts by total price.

# 7.) 162 listings have zero availability and 100 or more reviews; their hosts include MaryEllen, Christiana, Sol, Coral, Doug, Kathleen, Janet, Albert, Stephany, and Mariluz.

# 8.) The host with the highest total of prices is Sonder (NYC) at 82,795; its first listing in the data is in the Financial District.

# 9.) Danielle from Queens last received a review on 2019-07-08.


