<a href="https://colab.research.google.com/github/somas1/CT/blob/main/Pandas/AirBnB_Case_Study_Project_Jupyter_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AirBnB NY Locations Data Case Study

In this final project, your task is to use the data provided to find evidence that answers the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment.
**Be Advised:** I will go dark for the entire assignment period. Any questions you have about the data or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild.

Happy Coding!

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
air_bnb = pd.read_csv('AB_NYC_2019.csv')
air_bnb.head()

| | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2539 | Clean & quiet apt home by the park | 2787 | John | Brooklyn | Kensington | 40.64749 | -73.97237 | Private room | 149 | 1 | 9 | 2018-10-19 | 0.21 | 6 | 365 |
| 1 | 2595 | Skylit Midtown Castle | 2845 | Jennifer | Manhattan | Midtown | 40.75362 | -73.98377 | Entire home/apt | 225 | 1 | 45 | 2019-05-21 | 0.38 | 2 | 355 |
| 2 | 3647 | THE VILLAGE OF HARLEM....NEW YORK ! | 4632 | Elisabeth | Manhattan | Harlem | 40.80902 | -73.9419 | Private room | 150 | 3 | 0 | | | 1 | 365 |
| 3 | 3831 | Cozy Entire Floor of Brownstone | 4869 | LisaRoxanne | Brooklyn | Clinton Hill | 40.68514 | -73.95976 | Entire home/apt | 89 | 1 | 270 | 2019-07-05 | 4.64 | 1 | 194 |
| 4 | 5022 | Entire Apt: Spacious Studio/Loft by central park | 7192 | Laura | Manhattan | East Harlem | 40.79851 | -73.94399 | Entire home/apt | 80 | 10 | 9 | 2018-11-19 | 0.1 | 1 | 0 |


In [6]:
# How many neighborhood groups are available and which shows up the most?
#air_bnb.neighbourhood_group
neighbourhood_groups = air_bnb.groupby(['neighbourhood_group']).size()
neighbourhood_groups

neighbourhood_group
Bronx             1091
Brooklyn         20104
Manhattan        21661
Queens            5666
Staten Island      373
dtype: int64

There are five neighborhood groups. Manhattan has the largest number of entries (21,661).
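The same two answers can also be read off directly with `nunique` and `value_counts`. This is a minimal sketch on a small made-up frame; `toy` and its rows are invented for illustration and are not taken from `AB_NYC_2019.csv`.

```python
import pandas as pd

# Toy stand-in for air_bnb; the rows are invented for illustration.
toy = pd.DataFrame({
    "neighbourhood_group": [
        "Manhattan", "Brooklyn", "Manhattan", "Queens", "Manhattan", "Brooklyn",
    ],
})

n_groups = toy["neighbourhood_group"].nunique()      # number of distinct groups
counts = toy["neighbourhood_group"].value_counts()   # rows per group, largest first
most_common = counts.idxmax()                        # group label with the most rows

print(n_groups, most_common, counts.loc[most_common])
```

On the real data, replacing `toy` with `air_bnb` should give the same counts as the `groupby(...).size()` cell, just sorted by frequency.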

In [5]:
# Are private rooms the most popular in Manhattan?
manhattan = air_bnb.query('neighbourhood_group == "Manhattan"')
manhattan.groupby('room_type').size()

room_type
Entire home/apt    13199
Private room        7982
Shared room          480
dtype: int64

Private rooms are not the most popular rental type in Manhattan. Entire home/apartment rentals are the most popular (13,199 listings versus 7,982 private rooms).
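Expressed as shares of the 21,661 Manhattan listings, the counts above work out to roughly 61% entire homes/apartments, 37% private rooms and 2% shared rooms. A minimal sketch of how those shares could be computed with `value_counts(normalize=True)`, using a made-up frame (`toy` and its rows are illustrative only):

```python
import pandas as pd

# Toy stand-in for air_bnb; the rows are invented for illustration.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Manhattan", "Brooklyn"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Shared room", "Private room"],
})

# Keep only Manhattan rows, then compute each room type's share of them.
manhattan = toy[toy["neighbourhood_group"] == "Manhattan"]
shares = manhattan["room_type"].value_counts(normalize=True)

print(shares.round(2))
```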

In [7]:
# Which hosts are the busiest based on their reviews?
# Note: grouping on number_of_reviews ranks individual listings, not host totals.
reviews = air_bnb.groupby(['number_of_reviews', 'host_id', 'host_name']).count().sort_values('number_of_reviews', ascending=False)
reviews[[]].head()

number_of_reviews,host_id,host_name
629,47621202,Dona
607,4734398,Jj
597,4734398,Jj
594,4734398,Jj
576,47621202,Dona


Listings hosted by Dona and Jj have the highest review counts.
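Because the cell above ranks individual listings, a host with several listings appears on several rows. One way to get a single total per host is to group on `host_id` and `host_name` and sum `number_of_reviews`. The sketch below assumes that approach and uses made-up data; the hosts and numbers in `toy` are hypothetical.

```python
import pandas as pd

# Toy stand-in for air_bnb; hosts and review counts are invented.
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 2, 2, 3],
    "host_name": ["Ann", "Ann", "Bo", "Bo", "Bo", "Cy"],
    "number_of_reviews": [300, 250, 200, 200, 190, 400],
})

# Sum reviews across each host's listings, then rank hosts by that total.
host_totals = (
    toy.groupby(["host_id", "host_name"])["number_of_reviews"]
    .sum()
    .sort_values(ascending=False)
)

print(host_totals.head())
```

Grouping on `host_id` as well as `host_name` keeps two different hosts who share a first name from being merged.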

In [8]:
# Which neighborhood group has the highest average price?
air_bnb.groupby('neighbourhood_group')['price'].mean().sort_values(ascending=False)

neighbourhood_group
Manhattan        196.875814
Brooklyn         124.383207
Staten Island    114.812332
Queens            99.517649
Bronx             87.496792
Name: price, dtype: float64

Manhattan has the highest average price for rentals.

In [9]:
# Which neighborhood group has the highest total price?
air_bnb.groupby('neighbourhood_group')['price'].sum()


neighbourhood_group
Bronx              95459
Brooklyn         2500600
Manhattan        4264527
Queens            563867
Staten Island      42825
Name: price, dtype: int64

Manhattan has the highest total price (4,264,527).
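The total is the average multiplied by the number of listings (for Manhattan, 21,661 × 196.8758 ≈ 4,264,527), so a group can lead on total price simply by having many listings. A minimal sketch that puts count, mean and sum side by side with `agg`, on a made-up frame (`toy` and its prices are illustrative only):

```python
import pandas as pd

# Toy stand-in for air_bnb; prices are invented for illustration.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Brooklyn", "Bronx"],
    "price": [200, 100, 120, 80, 100, 90],
})

# One row per group with the listing count, average price and total price.
summary = toy.groupby("neighbourhood_group")["price"].agg(["count", "mean", "sum"])

print(summary.sort_values("mean", ascending=False))
```

In this toy data Manhattan and Brooklyn tie on total price even though Manhattan's average is higher, which is why the two questions are worth answering separately.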

In [10]:
# Which top 5 hosts have the highest total price?
top_5 = air_bnb.groupby(['host_id', 'host_name'])[['price']].sum().sort_values(by='price', ascending=False)
top_5.head()

| host_id | host_name | price |
|---|---|---|
| 219517861 | Sonder (NYC) | 82795 |
| 107434423 | Blueground | 70331 |
| 156158778 | Sally | 37097 |
| 205031545 | Red Awning | 35294 |
| 30283594 | Kara | 33581 |


In [11]:
# Who currently has no (zero) availability with a review count of 100 or more?
review_query = air_bnb.query('availability_365 == 0 and number_of_reviews >= 100')
review_query[['host_name', 'number_of_reviews', 'availability_365', 'neighbourhood_group']]

| | host_name | number_of_reviews | availability_365 | neighbourhood_group |
|---|---|---|---|---|
| 8 | MaryEllen | 118 | 0 | Manhattan |
| 94 | Christiana | 168 | 0 | Brooklyn |
| 132 | Sol | 193 | 0 | Brooklyn |
| 174 | Coral | 114 | 0 | Manhattan |
| 180 | Doug | 206 | 0 | Brooklyn |
| ... | ... | ... | ... | ... |
| 29581 | Kathleen | 103 | 0 | Manhattan |
| 30461 | Janet | 119 | 0 | Queens |
| 31250 | Albert | 102 | 0 | Brooklyn |
| 32670 | Stephany | 131 | 0 | Brooklyn |
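Each row in this result is a listing, so one host can appear more than once. A minimal sketch of counting matching listings and distinct hosts separately, using a boolean mask on a made-up frame (`toy` and its rows are hypothetical):

```python
import pandas as pd

# Toy stand-in for air_bnb; the rows are invented for illustration.
toy = pd.DataFrame({
    "host_id": [10, 10, 11, 12, 13],
    "host_name": ["Ann", "Ann", "Bo", "Cy", "Di"],
    "availability_365": [0, 0, 0, 30, 0],
    "number_of_reviews": [150, 120, 99, 300, 100],
})

# Same condition as the query: zero availability and at least 100 reviews.
mask = (toy["availability_365"] == 0) & (toy["number_of_reviews"] >= 100)
matches = toy[mask]

n_listings = len(matches)                 # matching rows (listings)
n_hosts = matches["host_id"].nunique()    # distinct hosts among those rows

print(n_listings, n_hosts)
```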


In [12]:
# What host has the highest total of prices and where are they located?
highest_price_host = air_bnb.groupby(['host_name', 'neighbourhood_group']).sum(numeric_only=True).sort_values(by='price', ascending=False)
highest_price_host.head(1)

| host_name | neighbourhood_group | id | host_id | latitude | longitude | price | minimum_nights | number_of_reviews | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sonder (NYC) | Manhattan | 10885561678 | 71782340547 | 13316.25823 | -24198.18856 | 82795 | 4353 | 1281 | 397.56 | 106929 | 98588 |


Sonder (NYC) in Manhattan has the highest total of prices (82,795).

In [13]:
# When did Danielle from Queens last receive a review?
Danielle_last_review = air_bnb.query('host_name == "Danielle"')
Danielle_last_review.sort_values(by= 'last_review', ascending= False)[['last_review']].head(1)


| | last_review |
|---|---|
| 22469 | 2019-07-08 |


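The most recent review for a host named Danielle is dated 2019-07-08. Note that the query above filters on `host_name` only, so it covers every Danielle in the data, not only those in Queens.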
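To restrict the answer to Queens, the filter needs a second condition on `neighbourhood_group`. A minimal sketch under that assumption, on made-up data (the rows in `toy` are hypothetical, so the date it prints says nothing about the real dataset). Converting `last_review` with `pd.to_datetime` makes the comparison a true date comparison rather than a string sort.

```python
import pandas as pd

# Toy stand-in for air_bnb; the rows are invented for illustration.
toy = pd.DataFrame({
    "host_name": ["Danielle", "Danielle", "Danielle", "Laura"],
    "neighbourhood_group": ["Queens", "Queens", "Manhattan", "Queens"],
    "last_review": ["2019-01-05", "2019-03-02", "2019-06-30", None],
})
toy["last_review"] = pd.to_datetime(toy["last_review"])  # missing values become NaT

# Filter on both the host name and the neighbourhood group.
danielle_queens = toy.query('host_name == "Danielle" and neighbourhood_group == "Queens"')
latest = danielle_queens["last_review"].max()  # max() skips NaT

print(latest.date())
```

In the toy data the Manhattan row has the latest date overall, which shows how omitting the Queens condition can change the answer.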

## Further Questions

1. Which host has the most listings?

In [14]:
most_listings = air_bnb.groupby(['host_name']).sum(numeric_only=True).sort_values(by='calculated_host_listings_count', ascending=False)
most_listings[['calculated_host_listings_count']].head(1)

| host_name | calculated_host_listings_count |
|---|---|
| Sonder (NYC) | 106929 |


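Sonder (NYC) has the most listings. The figure 106,929 is not itself a listing count, though: `calculated_host_listings_count` already stores the host's total on every row, so summing it over a host's rows multiplies the count by itself. The value is consistent with 327 listings (327 × 327 = 106,929).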
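A minimal sketch of the difference between summing `calculated_host_listings_count` and counting listings, on made-up data (`toy` and its hosts are hypothetical). It assumes, as the column name suggests, that the column repeats the host's total on each of that host's rows.

```python
import pandas as pd

# Toy stand-in for air_bnb: Ann has 3 listings, Bo has 1 (invented data).
toy = pd.DataFrame({
    "host_id": [1, 1, 1, 2],
    "host_name": ["Ann", "Ann", "Ann", "Bo"],
    "calculated_host_listings_count": [3, 3, 3, 1],
})

by_host = toy.groupby(["host_id", "host_name"])["calculated_host_listings_count"]

summed = by_host.sum()    # Ann: 3 + 3 + 3 = 9, the count squared
listings = by_host.max()  # Ann: 3, the value stored on each row
rows = toy.groupby(["host_id", "host_name"]).size()  # Ann: 3 rows in the data

print(summed, listings, rows, sep="\n\n")
```

Either `max()` on the column or `size()` on the group gives the listing count; `sum()` does not.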

2. How many listings have completely open availability?

In [116]:
open_listings = air_bnb.query('availability_365 == 365')
open_listings.shape

(1295, 16)

There are 1295 listings with completely open availability.

3. What room_types have the highest review numbers?

In [15]:
room_types = air_bnb.groupby(['room_type']).sum(numeric_only=True).sort_values(by='number_of_reviews', ascending=False)
room_types[['number_of_reviews']]

| room_type | number_of_reviews |
|---|---|
| Entire home/apt | 580403 |
| Private room | 538346 |
| Shared room | 19256 |


Entire homes/apartments have the most reviews (580,403), followed by private rooms (538,346) and shared rooms (19,256).

# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered more details that were not asked about above, please describe them here.

-- Add your conclusion --

1. Which hosts are the busiest and why?

Dona and Jj are the busiest hosts based on their reviews.

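Among the five most-reviewed listings shown, **Dona** accounts for 1,205 reviews (629 + 576) and **Jj** for 1,798 (607 + 597 + 594). These are sums over the top five listings only, not full host totals. [Link to calculations](https://colab.research.google.com/github/somas1/CT/blob/main/Pandas/AirBnB_Case_Study_Project_Jupyter_notebook.ipynb#scrollTo=Guf75AFw9WaE&line=1&uniqifier=1)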

Having the most reviews suggests these hosts might be the busiest, but without booking data it's hard to say for sure.

Another way to hypothesize about the busiest host would be to look at who has the [greatest number of listings](https://colab.research.google.com/github/somas1/CT/blob/main/Pandas/AirBnB_Case_Study_Project_Jupyter_notebook.ipynb#scrollTo=UxGx6tUx9WaJ&line=1&uniqifier=1).

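**Sonder (NYC)** has the greatest number of listings. The summed `calculated_host_listings_count` of 106,929 is consistent with 327 listings (327 × 327 = 106,929), because that column repeats the host's total on every row.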






In [41]:
sonders_open_listings = air_bnb.groupby(['host_name']).sum(numeric_only=True).sort_values(by='availability_365')
sonders_open_listings.query("host_name == 'Sonder (NYC)'")[['calculated_host_listings_count', 'availability_365']]


| host_name | calculated_host_listings_count | availability_365 |
|---|---|---|
| Sonder (NYC) | 106929 | 98588 |


In [40]:
sonder = sonders_open_listings.query("host_name == 'Sonder (NYC)'")[['calculated_host_listings_count', 'availability_365']]
sonder['calculated_host_listings_count'] - sonder['availability_365']

host_name
Sonder (NYC)    8341
dtype: int64

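The difference is 8,341, but it cannot be read as a booking count for Sonder (NYC): it subtracts total available days (`availability_365` summed over listings) from a summed listing count, and the two are in different units.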

2. How many neighborhood groups are available and which shows up the most?

There are **five** neighborhood groups:

Bronx

Brooklyn

Manhattan

Queens

Staten Island

**Manhattan** has the largest number of entries with 21,661.

3. Are private rooms the most popular in Manhattan?

Private rooms are not the most popular rental type in Manhattan. [Entire homes/apartments](https://colab.research.google.com/github/somas1/CT/blob/main/Pandas/AirBnB_Case_Study_Project_Jupyter_notebook.ipynb#scrollTo=1og7JwnW9WaE&line=1&uniqifier=1) are the most popular there, with private rooms second.

Across all neighborhood groups, [entire homes/apartments](https://colab.research.google.com/github/somas1/CT/blob/main/Pandas/AirBnB_Case_Study_Project_Jupyter_notebook.ipynb#scrollTo=MeNYjvMT9WaK&line=2&uniqifier=1) are also the most reviewed, with private rooms second.

4. Which hosts are the busiest based on their reviews?

See #1 above. Dona and Jj are busiest based on review counts.

5. Which neighborhood group has the highest average price?

[Manhattan has the highest average price.](https://colab.research.google.com/github/somas1/CT/blob/main/Pandas/AirBnB_Case_Study_Project_Jupyter_notebook.ipynb#scrollTo=Guf75AFw9WaE&line=2&uniqifier=1)

6. Which neighborhood group has the highest total price?

[Manhattan has the highest total price.](https://colab.research.google.com/github/somas1/CT/blob/main/Pandas/AirBnB_Case_Study_Project_Jupyter_notebook.ipynb#scrollTo=mRwmZSrO9WaE&line=3&uniqifier=1)

7. Which top 5 hosts have the highest total price?

[5 hosts with highest total price](https://colab.research.google.com/github/somas1/CT/blob/main/Pandas/AirBnB_Case_Study_Project_Jupyter_notebook.ipynb#scrollTo=2ikp4UFb9WaF&line=2&uniqifier=1)

1. Sonder (NYC)
2. Blueground
3. Sally
4. Red Awning
5. Kara

8. Who currently has no availability with a review count of 100 or more?

[Zero Available Spaces with 100 or More Reviews](https://colab.research.google.com/github/somas1/CT/blob/main/Pandas/AirBnB_Case_Study_Project_Jupyter_notebook.ipynb#scrollTo=fGhuSBpP9WaH&line=3&uniqifier=1)

There are 162 total hosts with no availability and 100 or more reviews.

9. What host has the highest total of prices and where are they located?

[Sonder (NYC) has the highest total of prices at $82,795 and is located in Manhattan.](https://colab.research.google.com/github/somas1/CT/blob/main/Pandas/AirBnB_Case_Study_Project_Jupyter_notebook.ipynb#scrollTo=Ik_HZndC9WaH&line=1&uniqifier=1)