# AirBnB NY Locations Data Case Study

In this final project, you task will be to take the data provided and find evidance to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in manhattan?
4. Which hosts are the busiest and based on their reviews?
5. Which neighorhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment. 
**Be Advised** I will go dark for this intire assignment time period. That said, any questions that you would like to ask about the data, or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to similate what you will face when you are out in the wild. 

Happy Coding!

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [7]:
air_bnb = pd.read_csv(r'C:\Users\jason\Downloads\AB_NYC_2019 - AB_NYC_2019.csv', sep = ',')
air_bnb.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [8]:
# How many neighborhood groups are available and which shows up the most?
df = pd.DataFrame(air_bnb)

df['neighbourhood_group'].value_counts()



Manhattan        21661
Brooklyn         20104
Queens            5666
Bronx             1091
Staten Island      373
Name: neighbourhood_group, dtype: int64

In [9]:
# Are private rooms the most popular in manhattan?
df = pd.DataFrame(air_bnb)

df['room_type'].value_counts()


Entire home/apt    25409
Private room       22326
Shared room         1160
Name: room_type, dtype: int64

In [12]:
# Which hosts are the busiest and based on their reviews?
df = pd.DataFrame(air_bnb)

df['host_rank'] = df.groupby('host_name')['reviews_per_month'].rank(ascending=False)

print(df)

             id                                               name   host_id  \
0          2539                 Clean & quiet apt home by the park      2787   
1          2595                              Skylit Midtown Castle      2845   
2          3647                THE VILLAGE OF HARLEM....NEW YORK !      4632   
3          3831                    Cozy Entire Floor of Brownstone      4869   
4          5022   Entire Apt: Spacious Studio/Loft by central park      7192   
...         ...                                                ...       ...   
48890  36484665    Charming one bedroom - newly renovated rowhouse   8232441   
48891  36485057      Affordable room in Bushwick/East Williamsburg   6570630   
48892  36485431            Sunny Studio at Historical Neighborhood  23492952   
48893  36485609               43rd St. Time Square-cozy single bed  30985759   
48894  36487245  Trendy duplex in the very heart of Hell's Kitchen  68119814   

           host_name neighbourhood_grou

In [14]:
#Which neighorhood group has the highest average price?

df = pd.DataFrame(air_bnb)

result = df.groupby('neighbourhood_group').agg({'price':'mean'}).sort_values(['price'],ascending=False)

print(result)


                          price
neighbourhood_group            
Manhattan            196.875814
Brooklyn             124.383207
Staten Island        114.812332
Queens                99.517649
Bronx                 87.496792


In [15]:
# Which neighbor hood group has the highest total price?
df = pd.DataFrame(air_bnb)

result = df.groupby('neighbourhood_group').agg({'price':'sum'}).sort_values(['price'],ascending=False)

print(result)



                       price
neighbourhood_group         
Manhattan            4264527
Brooklyn             2500600
Queens                563867
Bronx                  95459
Staten Island          42825


In [45]:
#Which top 5 hosts have the highest total price?
df = pd.DataFrame(air_bnb)

result = df.groupby('host_name').agg({'price':'sum'}).sort_values(['price'],ascending=False)

print(result.iloc[:5])

              price
host_name          
Sonder (NYC)  82795
Blueground    70331
Michael       66895
David         65844
Alex          52563


In [120]:
# Who currently has no (zero) availability with a review count of 100 or more?
df = pd.DataFrame(air_bnb)[(air_bnb['number_of_reviews']>=100) & (air_bnb['availability_365']==0)]

print(result)


Empty DataFrame
Columns: [id, name, host_id, host_name, neighbourhood_group, neighbourhood, latitude, longitude, room_type, price, minimum_nights, number_of_reviews, last_review, reviews_per_month, calculated_host_listings_count, availability_365, host_rank]
Index: []


In [112]:
# What host has the highest total of prices and where are they located?
df = pd.DataFrame(air_bnb)

result = df.groupby(['host_name', 'neighbourhood']).agg({'price':'sum'}).sort_values('price', ascending=False)



print(result.iloc[:1])





                                 price
host_name    neighbourhood            
Sonder (NYC) Financial District  57738


In [119]:
# When did Danielle from Queens last receive a review?
df = pd.DataFrame(air_bnb)[(air_bnb['host_name'] =='Danielle') & (air_bnb['neighbourhood_group'] =='Queens')]

print(df['last_review'])




7086     2019-07-03
16349           NaN
20403    2019-07-06
21517    2019-07-07
22068    2019-07-06
22469    2019-07-08
27021    2018-01-02
33861    2019-06-20
Name: last_review, dtype: object


## Further Questions

1. Which host has the most listings?

In [121]:
df = pd.DataFrame(air_bnb)

df['host_rank'] = df.groupby('host_name')['calculated_host_listings_count'].rank(ascending=False)

print(df)

             id                                               name   host_id  \
0          2539                 Clean & quiet apt home by the park      2787   
1          2595                              Skylit Midtown Castle      2845   
2          3647                THE VILLAGE OF HARLEM....NEW YORK !      4632   
3          3831                    Cozy Entire Floor of Brownstone      4869   
4          5022   Entire Apt: Spacious Studio/Loft by central park      7192   
...         ...                                                ...       ...   
48890  36484665    Charming one bedroom - newly renovated rowhouse   8232441   
48891  36485057      Affordable room in Bushwick/East Williamsburg   6570630   
48892  36485431            Sunny Studio at Historical Neighborhood  23492952   
48893  36485609               43rd St. Time Square-cozy single bed  30985759   
48894  36487245  Trendy duplex in the very heart of Hell's Kitchen  68119814   

           host_name neighbourhood_grou

2. How many listings have completely open availability?

In [125]:
df = pd.DataFrame(air_bnb)

df[air_bnb['availability_365'] == 365]

print(df['id'].count())

48895


3. What room_types have the highest review numbers?

In [127]:
df = pd.DataFrame(air_bnb)

df['room_type_rank'] = df.groupby('room_type')['reviews_per_month'].rank(ascending=False)

print(df['room_type'])

0           Private room
1        Entire home/apt
2           Private room
3        Entire home/apt
4        Entire home/apt
              ...       
48890       Private room
48891       Private room
48892    Entire home/apt
48893        Shared room
48894       Private room
Name: room_type, Length: 48895, dtype: object


# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered some more details that were not asked above, please discribe them here.

-- Add your conclusion --