# AirBnB NY Locations Data Case Study

In this final project, your task is to use the data provided to find evidence that answers the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment. 
**Be Advised** I will go dark for the entire assignment period. Any questions you have about the data or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild. 

Happy Coding!

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [7]:
air_bnb = pd.read_csv('AB_NYC_2019.csv')
air_bnb.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [27]:
# How many neighborhood groups are available and which shows up the most?

df = pd.DataFrame(air_bnb)
neighbourhood_groups = df['neighbourhood_group'].nunique()
neighbourhood_groups

neighbourhood_groups_counts = df['neighbourhood_group'].value_counts()
print(neighbourhood_groups_counts)

# There are 5 neighbourhood groups available with Manhattan showing up the most.

neighbourhood_group
Manhattan        21661
Brooklyn         20104
Queens            5666
Bronx             1091
Staten Island      373
Name: count, dtype: int64
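
The two parts of this question can also be read straight off the counts rather than by eye. The following is a minimal sketch on a handful of invented rows (the `toy` frame is hypothetical and not taken from `AB_NYC_2019.csv`); the same calls should apply to `df` unchanged.

```python
import pandas as pd

# Toy rows invented for illustration; they are NOT taken from AB_NYC_2019.csv.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Brooklyn", "Manhattan", "Queens", "Manhattan"],
})

group_counts = toy["neighbourhood_group"].value_counts()
n_groups = toy["neighbourhood_group"].nunique()   # number of distinct groups
most_common_group = group_counts.idxmax()         # label with the largest count

print(f"{n_groups} groups; most frequent: {most_common_group} "
      f"({group_counts.max()} listings)")
```

Printing both values explicitly avoids relying on the notebook displaying only the last expression in a cell.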


In [29]:
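# Are private rooms the most popular in Manhattan?

room_type_count = df['room_type'].value_counts()
room_type_count

# Note: this counts room types across all five neighbourhood groups, not only Manhattan.
# Citywide, private rooms (22326) are not the most popular; entire homes/apartments (25409) are.
# A Manhattan-only answer requires filtering on neighbourhood_group before counting.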

room_type
Entire home/apt    25409
Private room       22326
Shared room         1160
Name: count, dtype: int64
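
Because the question is specifically about Manhattan, the counts need to be restricted to that neighbourhood group first. Here is a minimal sketch of that filter on invented rows (the `toy` frame and its values are hypothetical); on the real data the same two steps would be applied to `df`.

```python
import pandas as pd

# Toy rows invented for illustration; they are NOT taken from AB_NYC_2019.csv.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn", "Brooklyn"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt",
                  "Private room", "Private room"],
})

# Keep only Manhattan listings, then count room types within that subset.
manhattan = toy[toy["neighbourhood_group"] == "Manhattan"]
manhattan_room_counts = manhattan["room_type"].value_counts()

# normalize=True gives each room type's share of Manhattan listings.
manhattan_room_share = manhattan["room_type"].value_counts(normalize=True)

print(manhattan_room_counts)
print(manhattan_room_share.round(2))
```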

In [95]:
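# Which hosts are the busiest based on their reviews?

host_name_and_reviews_count = air_bnb.groupby(['host_name', 'number_of_reviews']).size()
sorted_host_name_and_reviews_count = host_name_and_reviews_count.sort_values(ascending = False)
sorted_host_name_and_reviews_count.head(10)

# Caution: grouping by (host_name, number_of_reviews) and calling size() counts how many
# listings share the same review count. Every row in the output below has number_of_reviews == 0,
# so Blueground and Sonder (NYC) lead in unreviewed listings, which does not show they are the busiest.
# Ranking hosts by reviews requires summing number_of_reviews per host instead.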

host_name     number_of_reviews
Blueground    0                    204
Sonder (NYC)  0                    120
David         0                     94
Kara          0                     84
Michael       0                     82
Sonder        0                     67
Pranjal       0                     64
Daniel        0                     56
Kazuya        0                     52
Ken           0                     52
dtype: int64
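
To rank hosts by review activity, the review counts need to be summed per host rather than counted per (name, review count) pair. A minimal sketch follows, using invented rows (the `toy` frame is hypothetical). It groups on `host_id` together with `host_name`, on the assumption that different hosts can share a first name.

```python
import pandas as pd

# Toy rows invented for illustration; they are NOT taken from AB_NYC_2019.csv.
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 3],
    "host_name": ["Michael", "Michael", "Michael", "Ana", "Ana"],
    "number_of_reviews": [10, 5, 40, 0, 3],
})

# Sum reviews per host; host_id keeps the two different "Michael" hosts apart.
reviews_by_host = (
    toy.groupby(["host_id", "host_name"])
       .agg(total_reviews=("number_of_reviews", "sum"),
            listings=("number_of_reviews", "count"))
       .reset_index()
       .sort_values("total_reviews", ascending=False)
       .reset_index(drop=True)
)

print(reviews_by_host.head(10))
```

Keeping the listing count next to the review total helps separate hosts who are busy per listing from hosts who simply have many listings.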

In [37]:
# Which neighborhood group has the highest average price?

average_price_by_neighbourhood = df.groupby('neighbourhood_group')['price'].mean()
average_price_by_neighbourhood

# Manhattan has the highest average price.

neighbourhood_group
Bronx             87.496792
Brooklyn         124.383207
Manhattan        196.875814
Queens            99.517649
Staten Island    114.812332
Name: price, dtype: float64

In [45]:
# Which neighborhood group has the highest total price?

total_price_by_neighbourhood = df.groupby('neighbourhood_group')['price'].sum()
total_price_by_neighbourhood

# Manhattan has the highest total price.

neighbourhood_group
Bronx              95459
Brooklyn         2500600
Manhattan        4264527
Queens            563867
Staten Island      42825
Name: price, dtype: int64

In [91]:
# Which top 5 hosts have the highest total price?

total_price_by_host = df.groupby('host_name')['price'].sum()
sorted_total_price_by_host = total_price_by_host.sort_values(ascending = False)
sorted_total_price_by_host.head()

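# Sonder (NYC), Blueground, Michael, David, and Alex have the highest total price.
# Note: grouping by host_name may merge different hosts who share a first name (such as Michael or David);
# host_id is a safer key for per-host totals.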

host_name
Sonder (NYC)    82795
Blueground      70331
Michael         66895
David           65844
Alex            52563
Name: price, dtype: int64

In [111]:
# Who currently has no (zero) availability with a review count of 100 or more?

availability_count = df[(df['availability_365'] == 0) & (df['number_of_reviews'] >= 100)]
sorted_availability_count = availability_count[['host_name', 'number_of_reviews', 'availability_365']]
sorted_availability_count

# The filter returns 162 listings (shown below) with zero availability and a review count of 100 or more.

Unnamed: 0,host_name,number_of_reviews,availability_365
8,MaryEllen,118,0
94,Christiana,168,0
132,Sol,193,0
174,Coral,114,0
180,Doug,206,0
...,...,...,...
29581,Kathleen,103,0
30461,Janet,119,0
31250,Albert,102,0
32670,Stephany,131,0
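
Each row of this result is a listing, so the row count and the number of distinct hosts can differ. A minimal sketch of counting both, on invented rows (the `toy` frame is hypothetical):

```python
import pandas as pd

# Toy rows invented for illustration; they are NOT taken from AB_NYC_2019.csv.
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3],
    "host_name": ["Doug", "Doug", "Sol", "Coral"],
    "number_of_reviews": [150, 120, 99, 300],
    "availability_365": [0, 0, 0, 10],
})

# Same filter as the cell above: fully booked and at least 100 reviews.
mask = (toy["availability_365"] == 0) & (toy["number_of_reviews"] >= 100)
booked_out = toy.loc[mask, ["host_id", "host_name", "number_of_reviews", "availability_365"]]

n_listings = len(booked_out)               # rows that pass the filter
n_hosts = booked_out["host_id"].nunique()  # distinct hosts behind those rows

print(f"{n_listings} listings from {n_hosts} distinct hosts")
```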


In [143]:
# What host has the highest total of prices and where are they located?

location_of_highest_total = df[(df['host_name'] == 'Sonder (NYC)')]
prices_by_location = location_of_highest_total[['host_name', 'price', 'neighbourhood', 'latitude', 'longitude']]
prices_by_location

# Sonder (NYC) has the highest total of prices (see the top-5 question above); the rows below list the neighbourhood and coordinates of each of its properties.

Unnamed: 0,host_name,price,neighbourhood,latitude,longitude
38293,Sonder (NYC),302,Financial District,40.70637,-74.00645
38294,Sonder (NYC),229,Financial District,40.70771,-74.00641
38588,Sonder (NYC),232,Financial District,40.70743,-74.00443
39769,Sonder (NYC),262,Murray Hill,40.74792,-73.97614
39770,Sonder (NYC),255,Murray Hill,40.74771,-73.97528
...,...,...,...,...,...
47691,Sonder (NYC),135,Financial District,40.70818,-74.00631
47692,Sonder (NYC),165,Financial District,40.70691,-74.00682
47693,Sonder (NYC),165,Financial District,40.70772,-74.00673
47814,Sonder (NYC),699,Financial District,40.70840,-74.00518
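
Rather than hard-coding the host name and scanning a long table, the top host can be looked up programmatically and its listings summarised by location. The sketch below uses invented rows (the `toy` frame and the names `Host A` and `Host B` are hypothetical) and groups on `host_id`, on the assumption that names are not unique.

```python
import pandas as pd

# Toy rows invented for illustration; they are NOT taken from AB_NYC_2019.csv.
toy = pd.DataFrame({
    "host_id": [10, 10, 10, 20, 30],
    "host_name": ["Host A", "Host A", "Host A", "Host B", "Host A"],
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Queens", "Bronx"],
    "neighbourhood": ["Financial District", "Financial District",
                      "Williamsburg", "Astoria", "Fordham"],
    "price": [300, 250, 100, 400, 90],
})

# Find the host_id with the largest summed price.
total_price_by_host_id = toy.groupby("host_id")["price"].sum()
top_host_id = total_price_by_host_id.idxmax()

# Summarise where that host's listings are: listing count and price total per location.
top_host_listings = toy[toy["host_id"] == top_host_id]
location_summary = (
    top_host_listings.groupby(["neighbourhood_group", "neighbourhood"])["price"]
    .agg(["count", "sum"])
    .sort_values("sum", ascending=False)
)

print(top_host_id)
print(location_summary)
```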


In [149]:
# When did Danielle from Queens last receive a review?

danielle_from_queens = df[(df['host_name'] == 'Danielle') & (df['neighbourhood_group'] == 'Queens')]
danielle_from_queens_review = danielle_from_queens[['host_name', 'neighbourhood_group', 'neighbourhood', 'last_review']]
danielle_from_queens_review

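# Listed below is every listing hosted by a Danielle in Queens, with its last review date.
# The most recent date shown is 2019-07-08 (East Elmhurst); one Astoria listing has no reviews.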

Unnamed: 0,host_name,neighbourhood_group,neighbourhood,last_review
7086,Danielle,Queens,East Elmhurst,2019-07-03
16349,Danielle,Queens,Astoria,
20403,Danielle,Queens,East Elmhurst,2019-07-06
21517,Danielle,Queens,East Elmhurst,2019-07-07
22068,Danielle,Queens,East Elmhurst,2019-07-06
22469,Danielle,Queens,East Elmhurst,2019-07-08
27021,Danielle,Queens,Astoria,2018-01-02
33861,Danielle,Queens,Long Island City,2019-06-20
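
The latest date can also be computed directly by parsing `last_review` as a datetime and taking the maximum. This is a minimal sketch on invented rows (the `toy` frame is hypothetical); it also groups by `host_id`, on the assumption that more than one host named Danielle may appear in Queens.

```python
import pandas as pd

# Toy rows invented for illustration; they are NOT taken from AB_NYC_2019.csv.
toy = pd.DataFrame({
    "host_id": [7, 7, 8],
    "host_name": ["Danielle", "Danielle", "Danielle"],
    "neighbourhood_group": ["Queens", "Queens", "Queens"],
    "last_review": ["2019-07-03", None, "2018-01-02"],
})

# Parse the date strings; missing values become NaT.
toy["last_review"] = pd.to_datetime(toy["last_review"])

danielle = toy[(toy["host_name"] == "Danielle") & (toy["neighbourhood_group"] == "Queens")]

latest_overall = danielle["last_review"].max()                        # NaT is skipped
latest_per_host = danielle.groupby("host_id")["last_review"].max()    # one date per host

print(latest_overall.date())
print(latest_per_host)
```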


## Further Questions

1. Which host has the most listings?

In [153]:
host_listings = df['host_name'].value_counts()
sorted_host_listings = host_listings.sort_values(ascending = False)
sorted_host_listings

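# The name Michael appears on the most listings (417). Because host_name is not unique,
# this may combine several different hosts; counting by host_id would identify a single host.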

host_name
Michael             417
David               403
Sonder (NYC)        327
John                294
Alex                279
                   ... 
Martin & Soledad      1
Soheil                1
Keno                  1
Keagon                1
Ilgar & Aysel         1
Name: count, Length: 11452, dtype: int64

2. How many listings have completely open availability?

In [155]:
completely_open_availability_count = df[(df['availability_365'] == 365)]
sorted_completely_open_availability_count = completely_open_availability_count[['availability_365', 'name', 'host_name', 'neighbourhood_group']]
sorted_completely_open_availability_count

# There are 1295 listings with completely open availability.

Unnamed: 0,availability_365,name,host_name,neighbourhood_group
0,365,Clean & quiet apt home by the park,John,Brooklyn
2,365,THE VILLAGE OF HARLEM....NEW YORK !,Elisabeth,Manhattan
36,365,Clean and Quiet in Brooklyn,Vt,Brooklyn
38,365,Country space in the city,Harriet,Brooklyn
97,365,"Upper Manhattan, New York",Elliott,Manhattan
...,...,...,...,...
48744,365,A BEAUTIFUL SPACE IN HEART OF WILLIAMSBURG,Simon And Julian,Brooklyn
48844,365,West Village Studio on quiet cobblestone street,Will,Manhattan
48868,365,Heaven for you(only for guy),Diana,Brooklyn
48880,365,The Raccoon Artist Studio in Williamsburg New ...,Melki,Brooklyn


3. What room_types have the highest review numbers?

In [157]:
room_type_and_reviews_count = air_bnb.groupby(['room_type', 'number_of_reviews']).size()
sorted_room_type_and_reviews_count = room_type_and_reviews_count.sort_values(ascending = False)
sorted_room_type_and_reviews_count.head(10)

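# Caution: this output counts listings per (room_type, number_of_reviews) pair, not reviews per room type.
# It shows that entire homes/apartments and private rooms account for the most listings at each low review count;
# comparing review totals requires summing number_of_reviews per room_type.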

room_type        number_of_reviews
Entire home/apt  0                    5077
Private room     0                    4661
                 1                    2554
Entire home/apt  1                    2553
                 2                    1787
Private room     2                    1592
Entire home/apt  3                    1349
Private room     3                    1132
Entire home/apt  4                    1121
                 5                     920
dtype: int64
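
To compare review volume across room types, `number_of_reviews` can be aggregated per `room_type`. A minimal sketch on invented rows (the `toy` frame is hypothetical):

```python
import pandas as pd

# Toy rows invented for illustration; they are NOT taken from AB_NYC_2019.csv.
toy = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt",
                  "Private room", "Private room", "Shared room"],
    "number_of_reviews": [10, 0, 30, 2, 1],
})

# Total, average, and maximum reviews per room type, largest total first.
reviews_by_room_type = (
    toy.groupby("room_type")["number_of_reviews"]
       .agg(["sum", "mean", "max"])
       .sort_values("sum", ascending=False)
)

print(reviews_by_room_type)
```

The total favours room types with many listings, so the mean is worth reading alongside it.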

# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered more details that were not asked about above, please describe them here.

-- Add your conclusion. --

In [None]:
# Numbers match the question list at the top of the notebook.
# 1. No separate analysis was run for this question.
# 2. There are 5 neighbourhood groups available, with Manhattan showing up the most.
# 3. Across all neighbourhood groups, private rooms are not the most popular; the entire home/apartment option is. The count was not filtered to Manhattan.
# 4. Blueground and Sonder (NYC) have the most listings with zero reviews; the grouping used does not rank hosts by total reviews.
# 5. Manhattan has the highest average price.
# 6. Manhattan has the highest total price.
# 7. Sonder (NYC), Blueground, Michael, David, and Alex have the highest total price when grouped by host name.
# 8. The table shows 162 listings with zero availability and a review count of 100 or more.
# 9. Sonder (NYC) has the highest total of prices; the table lists the neighbourhood of each of its properties.
# 10. The table lists every Danielle listing in Queens with its last review date; the most recent date shown is 2019-07-08.

# Further Questions

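# 1. The host name Michael appears on the most listings (417); the name may cover more than one host.
# 2. There are 1295 listings with completely open availability.
# 3. Entire homes/apartments and private rooms account for the most listings at each low review count; review totals per room type were not computed.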
