# AirBnB NY Locations Data Case Study

In this final project, your task is to take the data provided and find evidence to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment.
**Be advised:** I will go dark for the entire assignment period. Any questions about the data or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild.

Happy Coding!

In [56]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [70]:
air_bnb = pd.read_csv('/Users/lylelasala/Documents/work/Data Sci/Python/AB_NYC_2019.csv')
air_bnb.head()

# /Users/lylelasala/Documents/work/Data Sci/Python

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [91]:
# Which hosts are the busiest and why?

listing_counts = air_bnb['host_id'].value_counts()
# value_counts() sorts from most to fewest listings, so the first label is the busiest host
busiest_host = listing_counts.index[0]

print(f"The busiest host is Host ID: {busiest_host}.")

The busiest host is Host ID: 219517861.
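
The count alone names the busiest host but says little about why. A minimal sketch of one way to profile that host, using invented toy rows (the names and values in `toy` are hypothetical, only the column names come from the dataset):

```python
import pandas as pd

# Hypothetical toy rows; every value here is invented for illustration.
toy = pd.DataFrame({
    "host_id": [1, 1, 1, 2, 2, 3],
    "host_name": ["Ana", "Ana", "Ana", "Ben", "Ben", "Cy"],
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn",
                            "Queens", "Queens", "Bronx"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Private room", "Private room", "Shared room"],
})

# Count listings per host and take the host with the largest count.
listing_counts = toy["host_id"].value_counts()
busiest_host = listing_counts.idxmax()

# Look at that host's rows to see where and what they list.
busiest_rows = toy[toy["host_id"] == busiest_host]
profile = {
    "host_name": busiest_rows["host_name"].iloc[0],
    "listings": int(listing_counts.loc[busiest_host]),
    "top_group": busiest_rows["neighbourhood_group"].mode().iloc[0],
    "top_room_type": busiest_rows["room_type"].mode().iloc[0],
}
print(profile)
```

Swapping `toy` for `air_bnb` should give the same kind of profile for the real busiest host, which is a starting point for the "why" part of the question.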


In [93]:
# How many neighborhood groups are available and which shows up the most?

neighbourhood_group_count = air_bnb['neighbourhood_group'].nunique()
print("There are {} different neighbourhood groups available in the dataset".format(neighbourhood_group_count))

# value_counts() already sorts from most to least frequent, so the first label is the most common group
neighbourhood_group_freq = air_bnb['neighbourhood_group'].value_counts()
most_common_neighbourhood_group = neighbourhood_group_freq.index[0]
print('The neighbourhood group that appears the most often is: {}'.format(most_common_neighbourhood_group))

There are 5 different neighbourhood groups available in the dataset
The neighbourhood group that appears the most often is: Manhattan


In [14]:
# Are private rooms the most popular in Manhattan?

Manhattan_listings = air_bnb[air_bnb['neighbourhood_group'] == 'Manhattan']
private_room_count = len(Manhattan_listings[Manhattan_listings['room_type'] == 'Private room'])
total_Manhattan_listings = len(Manhattan_listings)

percentage_of_private_rooms = (private_room_count / total_Manhattan_listings) * 100

print('In Manhattan, {}% of listings are private rooms.'.format(percentage_of_private_rooms))


In Manhattan, 36.84963759752551% of listings are private rooms.
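
A single percentage does not say whether private rooms are the *most* popular type; that needs a comparison across all room types. A small sketch of that comparison on invented rows (the data in `toy` is hypothetical):

```python
import pandas as pd

# Hypothetical toy rows; the mix of room types is invented.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 5 + ["Brooklyn"] * 2,
    "room_type": ["Entire home/apt", "Entire home/apt", "Entire home/apt",
                  "Private room", "Shared room",
                  "Private room", "Private room"],
})

manhattan = toy[toy["neighbourhood_group"] == "Manhattan"]

# normalize=True returns fractions; multiply by 100 for percentages.
room_type_share = manhattan["room_type"].value_counts(normalize=True) * 100
most_popular = room_type_share.idxmax()
private_is_top = most_popular == "Private room"

print(room_type_share.round(1))
print(f"Most common room type in Manhattan: {most_popular}")
```

On the real data, `room_type_share` would list every room type's share side by side, so the answer to the question can be read directly from `most_popular`.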


In [71]:
# Which hosts are the busiest based on their reviews?

busy_hosts = {}

for _, row in air_bnb.iterrows():
    host_id = row['host_id']
    if host_id not in busy_hosts:
        busy_hosts[host_id] = {'listings': 0, 'reviews': 0}
    busy_hosts[host_id]['listings'] += 1
    busy_hosts[host_id]['reviews'] += row['number_of_reviews']

max_listings_host = max(busy_hosts, key=lambda x: busy_hosts[x]['listings'])
# number_of_reviews is a count, not a rating, so this ratio is the average number of reviews per listing
max_reviews_host = max(busy_hosts, key=lambda x: busy_hosts[x]['reviews'] / busy_hosts[x]['listings'])

print(f"The host with the most listings is host ID: {max_listings_host}.")
print(f"The host with the highest average number of reviews per listing is Host ID: {max_reviews_host}.")


The host with the most listings is host ID: 219517861.
The host with the highest average number of reviews per listing is Host ID: 47621202.
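
The row-by-row loop can also be written as a single `groupby` aggregation, which tends to be much faster on a dataset of this size. A sketch on invented rows (the values in `toy` and the column names of `host_summary` are hypothetical):

```python
import pandas as pd

# Hypothetical toy rows; the review counts are invented.
toy = pd.DataFrame({
    "host_id": [10, 10, 20, 30, 30, 30],
    "number_of_reviews": [5, 15, 40, 1, 2, 3],
})

# One row per host: listing count, total reviews, mean reviews per listing.
host_summary = toy.groupby("host_id").agg(
    listings=("number_of_reviews", "size"),
    total_reviews=("number_of_reviews", "sum"),
    reviews_per_listing=("number_of_reviews", "mean"),
)

most_listings_host = host_summary["listings"].idxmax()
most_reviewed_host = host_summary["total_reviews"].idxmax()
highest_avg_host = host_summary["reviews_per_listing"].idxmax()

print(host_summary)
```

Keeping total reviews and reviews per listing as separate columns makes it clear which notion of "busiest" is being reported.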


In [69]:
add = lambda x, y: x + y
result = add(5, 3)
print(result)

8


In [65]:
# Which neighborhood group has the highest average price?

average_price_per_neighborhood = air_bnb.groupby('neighbourhood_group')['price'].mean()
highest_average_price_neighborhood = average_price_per_neighborhood.idxmax()
print("The neighborhood group with the highest average price is:", highest_average_price_neighborhood)


The neighborhood group with the highest average price is: Manhattan


In [34]:
# Which neighborhood group has the highest total price?

total_price_per_neighborhood = air_bnb.groupby('neighbourhood_group')['price'].sum()
highest_total_price_neighborhood = total_price_per_neighborhood.idxmax()
print("The neighborhood group with the highest total price is:", highest_total_price_neighborhood)


The neighborhood group with the highest total price is: Manhattan
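
Average and total price happen to point to the same group here, but they answer different questions: the total also grows with the number of listings. A sketch computing both in one pass, on invented prices chosen so the two answers differ (all values in `toy` are hypothetical):

```python
import pandas as pd

# Hypothetical toy rows: few expensive listings versus many cheaper ones.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn", "Brooklyn",
                            "Brooklyn", "Brooklyn"],
    "price": [200, 300, 100, 100, 100, 150, 100],
})

# Mean and sum of price per group, side by side.
price_stats = toy.groupby("neighbourhood_group")["price"].agg(["mean", "sum"])

highest_mean_group = price_stats["mean"].idxmax()
highest_total_group = price_stats["sum"].idxmax()

print(price_stats)
print("Highest average price:", highest_mean_group)
print("Highest total price:", highest_total_group)
```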


In [39]:
# Which top 5 hosts have the highest total price?

host_prices = air_bnb.groupby('host_id')['price'].sum()

# Sort the per-host totals (a Series) in descending order
host_prices_sorted = host_prices.sort_values(ascending=False)

# Display the top 5 hosts with the highest total price
top_hosts_highest_price = host_prices_sorted.head(5)
print(top_hosts_highest_price)

host_id
219517861    82795
107434423    70331
156158778    37097
205031545    35294
30283594     33581
Name: price, dtype: int64
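
Host IDs on their own are hard to read. A possible variation groups by both `host_id` and `host_name` and uses `nlargest`, shown here on invented rows (the values in `toy` and the `total_price` column name are hypothetical). Note that `groupby` drops rows whose key is missing by default, so hosts without a name would be left out of this version.

```python
import pandas as pd

# Hypothetical toy rows; names and prices are invented.
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 3, 4, 5, 6],
    "host_name": ["Ana", "Ana", "Ben", "Cy", "Cy", "Dee", "Eli", "Fay"],
    "price": [100, 200, 50, 400, 100, 75, 20, 250],
})

# Sum prices per host, keep the five largest totals, and flatten to a table.
top_hosts = (
    toy.groupby(["host_id", "host_name"])["price"]
    .sum()
    .nlargest(5)
    .reset_index(name="total_price")
)

print(top_hosts)
```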


In [74]:
# Who currently has no (zero) availability with a review count of 100 or more?

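# Only these five hard-coded host IDs are checked, not every host in the dataset
host_ids_to_check = [7490, 79402, 129352, 193722, 67778]
valid_host_ids = []

for host_id in host_ids_to_check:
    host_data = air_bnb[air_bnb['host_id'] == host_id]
    # Keep hosts whose listings have no availability at all and at least 100 reviews in total
    if host_data['availability_365'].sum() == 0 and host_data['number_of_reviews'].sum() >= 100:
        valid_host_ids.append(host_id)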

print(valid_host_ids)



[7490, 79402, 129352, 193722]
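
The cell above only tests a short list of host IDs. To search every row instead, a boolean mask over the whole DataFrame is one option. The sketch below uses invented rows and applies the two conditions per listing, which is an assumption about how the question should be read (the cell above sums per host instead):

```python
import pandas as pd

# Hypothetical toy rows; IDs, names, and counts are invented.
toy = pd.DataFrame({
    "host_id": [1, 2, 3, 4],
    "host_name": ["Ana", "Ben", "Cy", "Dee"],
    "availability_365": [0, 0, 120, 0],
    "number_of_reviews": [150, 20, 300, 100],
})

# Both conditions must hold for a listing to be kept.
mask = (toy["availability_365"] == 0) & (toy["number_of_reviews"] >= 100)
booked_and_reviewed = toy.loc[mask, ["host_id", "host_name", "number_of_reviews"]]

print(booked_and_reviewed)
```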


In [55]:
# What host has the highest total of prices and where are they located?
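# Pair each price with the host_id on the same row and add it to that host's running total.
# Indexing with air_bnb.index[-1] inside the loop would credit every price to the last row's host (68119814).
host_prices = {}
for host_id, price in zip(air_bnb['host_id'], air_bnb['price']):
    host_prices[host_id] = host_prices.get(host_id, 0) + price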

max_price_host = max(host_prices, key=host_prices.get)
location = air_bnb[air_bnb['host_id'] == max_price_host]['neighbourhood'].values[0]
print(f"The host with the highest total price is Host ID: {max_price_host}. They are located in {location}.")



The host with the highest total price is Host ID: 68119814. They are located in Hell's Kitchen.
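
One way to cross-check this answer is to compute the per-host totals with `groupby` and compare the top host against the top-5 table from the earlier question. A host with many listings may also be spread over several neighbourhoods, so the sketch below counts locations rather than taking the first row. All rows in `toy` are invented:

```python
import pandas as pd

# Hypothetical toy rows; hosts, places, and prices are invented.
toy = pd.DataFrame({
    "host_id": [1, 1, 1, 2, 2],
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Queens", "Queens"],
    "neighbourhood": ["Midtown", "Midtown", "Harlem", "Astoria", "Astoria"],
    "price": [300, 250, 200, 100, 120],
})

# Total price per host, then the host with the largest total.
total_price = toy.groupby("host_id")["price"].sum()
top_host = total_price.idxmax()

# Count that host's listings per (group, neighbourhood), most common first.
locations = (
    toy[toy["host_id"] == top_host]
    .groupby(["neighbourhood_group", "neighbourhood"])
    .size()
    .sort_values(ascending=False)
)

print(f"Top host: {top_host} with total price {total_price.loc[top_host]}")
print(locations)
```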


In [75]:
# When did Danielle from Queens last receive a review?

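# Queens is a neighbourhood_group (borough); the neighbourhood column holds areas such as Kensington or Harlem
danielle_listings = air_bnb[(air_bnb['host_name'] == 'Danielle') & (air_bnb['neighbourhood_group'] == 'Queens')]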

# Check if there are any listings for Danielle from Queens
if not danielle_listings.empty:
    # Sort the DataFrame by the review date in descending order to get the latest review
    latest_review = danielle_listings.sort_values('last_review', ascending=False)
    
    # Print the date of the latest review
    print(latest_review['last_review'].values[0])
else:
    print("No listings found for Danielle from Queens.")

No listings found for Danielle from Queens.
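
In the preview rows at the top of the notebook, boroughs such as Brooklyn and Manhattan sit in `neighbourhood_group`, while `neighbourhood` holds smaller areas. A sketch of the lookup on invented rows, assuming Queens is stored the same way, with `last_review` parsed as dates so that `max()` returns the latest one (all values in `toy` are hypothetical):

```python
import pandas as pd

# Hypothetical toy rows; names, places, and dates are invented.
toy = pd.DataFrame({
    "host_name": ["Danielle", "Danielle", "Danielle", "Marco"],
    "neighbourhood_group": ["Queens", "Queens", "Brooklyn", "Queens"],
    "neighbourhood": ["Astoria", "Flushing", "Bushwick", "Astoria"],
    "last_review": ["2019-03-01", None, "2019-06-30", "2019-07-01"],
})

# Parse the date strings; missing values become NaT and are skipped by max().
toy["last_review"] = pd.to_datetime(toy["last_review"])

mask = (toy["host_name"] == "Danielle") & (toy["neighbourhood_group"] == "Queens")
latest = toy.loc[mask, "last_review"].max()

# Filtering on the neighbourhood column instead finds no rows in this toy data.
wrong_column_matches = int((toy["neighbourhood"] == "Queens").sum())

print("Latest review for Danielle in Queens:", latest.date())
print("Rows where neighbourhood == 'Queens':", wrong_column_matches)
```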


## Further Questions

1. Which host has the most listings?

In [76]:
host_listings_count = air_bnb.groupby('host_id').size()
most_listings_host = host_listings_count.idxmax()

print(f"The host with the most listings is Host ID: {most_listings_host}.")


The host with the most listings is Host ID: 219517861.


2. How many listings have completely open availability?

In [77]:
open_availability_listings = air_bnb[air_bnb['availability_365'] == 365]
num_open_availability_listings = len(open_availability_listings)

print(f"There are {num_open_availability_listings} listings with completely open availability.")


There are 1295 listings with completely open availability.


3. What room_types have the highest review numbers?

In [63]:
room_type_reviews_count = air_bnb.groupby('room_type')['number_of_reviews'].sum()
room_types_sorted = room_type_reviews_count.sort_values(ascending=False)

print(room_types_sorted)


room_type
Entire home/apt    580403
Private room       538346
Shared room         19256
Name: number_of_reviews, dtype: int64


# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered any details that were not asked about above, please describe them here.

-- Add your conclusion --

In [96]:
def AirBnB_Case_Study_Analysis():
    print("\nAirBnB Case Study Analysis:\n")
    
    print("1. The busiest host by number of listings is Host ID: 219517861.")
    print("2. There are 5 neighbourhood groups in the dataset, and Manhattan appears the most often. Counting how often each group occurs showed this.")
    print("3. In Manhattan, about 37 percent of listings are private rooms, a little over a third.")
    print("4. The host with the most listings is Host ID: 219517861. The host with the highest average number of reviews per listing is Host ID: 47621202.")
    print("5. Using the mean function, the neighbourhood group with the highest average price was Manhattan.")
    print("6. Using the sum function, the neighbourhood group with the highest total price was also Manhattan.")
    print("7. These are the top 5 hosts with the highest total price:")
    print("   - Host ID: 219517861, Total Price: 82795")
    print("   - Host ID: 107434423, Total Price: 70331")
    print("   - Host ID: 156158778, Total Price: 37097")
    print("   - Host ID: 205031545, Total Price: 35294")
    print("   - Host ID: 30283594, Total Price: 33581")
    print("8. These hosts have 0 availability and 100 or more reviews: 7490, 79402, 129352, 193722")
    print("9. The host with the highest total price is Host ID: 68119814. They are located in Hell's Kitchen.")
    print("10. No listings were found for Danielle from Queens.")
    print("11. The host with the most listings is Host ID: 219517861.")
    print("12. There are 1295 listings with completely open availability.")
    print("13. Total reviews by room type, highest first:")
    print("   - Entire home/apt: 580403 reviews")
    print("   - Private room: 538346 reviews")
    print("   - Shared room: 19256 reviews\n")

AirBnB_Case_Study_Analysis()



AirBnB Case Study Analysis:

1. The busiest host by number of listings is Host ID: 219517861.
2. There are 5 neighbourhood groups in the dataset, and Manhattan appears the most often. Counting how often each group occurs showed this.
3. In Manhattan, about 37 percent of listings are private rooms, a little over a third.
4. The host with the most listings is Host ID: 219517861. The host with the highest average number of reviews per listing is Host ID: 47621202.
5. Using the mean function, the neighbourhood group with the highest average price was Manhattan.
6. Using the sum function, the neighbourhood group with the highest total price was also Manhattan.
7. These are the top 5 hosts with the highest total price:
   - Host ID: 219517861, Total Price: 82795
   - Host ID: 107434423, Total Price: 70331
   - Host ID: 156158778, Total Price: 37097
   - Host ID: 205031545, Total Price: 35294
   - Host ID: 30283594, Total Price: 33581
8. These hosts have 0 availability and 100 or more reviews: 7