# Data Analysis on Airbnb Listings - New York 

## Import libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
%matplotlib inline

## Read in data

In [None]:
airbnb = pd.read_csv('listings.csv', low_memory=False)
pd.set_option('display.max_columns', None)
airbnb.head()

## About the Data

In [None]:
airbnb.shape

In [None]:
airbnb.dtypes

In [None]:
airbnb.isnull().sum(axis=0)

missing_percentage = airbnb.isnull().mean() * 100
print(missing_percentage)

In [None]:
airbnb.columns

In [None]:
airbnb.duplicated().sum()

In [None]:
airbnb = airbnb.drop_duplicates()

In [None]:
airbnb['neighbourhood_borough'].value_counts()

In [None]:
airbnb['neighbourhood'].value_counts()

## Data Cleaning

Note: some basic cleaning was already done in MS Excel, such as dropping unnecessary columns, dropping rows that have mostly NA values, renaming columns for clarity, changing the format of the date columns, and imputing zeros into the 'host_response_rate' and 'host_acceptance_rate' columns.

In [None]:
# Drop any irrelevant columns
airbnb = airbnb.drop('host_response_time', axis=1)

In [None]:
# Drop rows with missing values in columns where only 1-10% of the data is missing
airbnb = airbnb.dropna(subset=['name', 'host_name','host_since', 'host_listings_count',
                               'host_total_listings_count', 'host_verifications',
                               'host_has_profile_pic', 'host_identity_verified',
                               'host_is_superhost'])

In [None]:
# Fill in missing values for the 'license' column with "No license"
# (assign the result back instead of using inplace=True on a column selection)
airbnb['license'] = airbnb['license'].fillna('No license')
# Fill in missing values for the 'has_availability' column with "NA"
airbnb['has_availability'] = airbnb['has_availability'].fillna('NA')

In [None]:
# Imputation For Missing Values
# Fill NaN values in 'price' with the median
airbnb['price'] = airbnb['price'].fillna(airbnb['price'].median())

# Fill NaN values in 'bedrooms' and 'beds' and 'bathrooms' with median
airbnb['bedrooms'] = airbnb['bedrooms'].fillna(airbnb['bedrooms'].median())
airbnb['bathrooms'] = airbnb['bathrooms'].fillna(airbnb['bathrooms'].median())
airbnb['beds'] = airbnb['beds'].fillna(airbnb['beds'].median())
print(airbnb[['bedrooms', 'beds', 'price']].describe())

In [None]:
# Imputation For Missing Values continued
# Fill NaN values in the rate and review columns with each column's median
median_cols = ['host_acceptance_rate', 'host_response_rate', 'review_scores_rating',
               'review_scores_accuracy', 'review_scores_cleanliness', 'review_scores_checkin',
               'review_scores_communication', 'review_scores_location', 'review_scores_value',
               'reviews_per_month']
airbnb[median_cols] = airbnb[median_cols].fillna(airbnb[median_cols].median())

There are several ways to handle missing values in these columns, such as KNN imputation, dropping the columns altogether, or imputing zeros. Median imputation was chosen here because it keeps every row and, unlike the mean, is not pulled toward extreme values such as very high prices.
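As a rough illustration of how the median and the mean behave differently on a skewed column, the sketch below imputes a toy price column both ways. The DataFrame `toy` and its values are made up for illustration and are not taken from the listings data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one extreme price and one missing value
toy = pd.DataFrame({'price': [80.0, 95.0, 110.0, 120.0, 100000.0, np.nan]})

# Fill the missing value with the mean and with the median of the observed prices
mean_filled = toy['price'].fillna(toy['price'].mean())
median_filled = toy['price'].fillna(toy['price'].median())

# The mean is pulled up by the outlier; the median stays near typical prices
print('Mean-imputed value:  ', mean_filled.iloc[-1])
print('Median-imputed value:', median_filled.iloc[-1])
```

On data like this, the mean-imputed value lands far above any typical listing, which is the main reason to prefer the median for columns with outliers.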

In [None]:
# Convert the data types for specific columns to DateTime format
airbnb['host_since'] = pd.to_datetime(airbnb['host_since'])
airbnb['first_review'] = pd.to_datetime(airbnb['first_review'])
airbnb['last_review'] = pd.to_datetime(airbnb['last_review'])

In [None]:
pd.set_option('display.max_columns', None)
airbnb.head(10)

In [None]:
airbnb.isnull().sum(axis=0)

In [None]:
airbnb.shape

In [None]:
# Save the cleaned DataFrame to a new CSV file
# index=False avoids writing the row index as an extra unnamed column
airbnb.to_csv('cleaned_nyclistings.csv', index=False)

## Exploratory Data Analysis

### Host Performance:

#### Q: What are the top 10 host IDs that get the most Airbnb bookings?

In [None]:
# Count listings per host; the number of listings is used here as a proxy for booking volume
top_hosts = airbnb['host_id'].value_counts().head(10)
top_hosts_id = top_hosts.index.astype(str)
top_hosts_count = top_hosts.values

plt.figure(figsize=(10, 6))
plt.bar(top_hosts_id, top_hosts_count, color='cadetblue')
plt.title('Top 10 Hosts')
plt.xticks(rotation=25)
plt.xlabel('Host ID')
plt.ylabel('Number of Listings')
plt.show()

In [None]:
# Descriptive statistics for 'host acceptance' and 'response rates' variables
print(airbnb[['host_acceptance_rate', 'host_response_rate']].describe())

#### Q: Does host superhost status affect the review score ratings?

In [None]:
# Convert 't/f' to 'True/False' for easier readability
airbnb['host_is_superhost'] = airbnb['host_is_superhost'].map({'t': True, 'f': False})

# 'review_scores_rating' was already median-imputed during cleaning, so no further filling is needed

plt.figure(figsize=(8,4))
sns.barplot(x='host_is_superhost', y='review_scores_rating', data=airbnb, color = 'royalblue')
plt.xlabel('Host Superhost Status')
plt.ylabel('Review Scores Rating')
plt.title('Impact of Host Superhost Status on Review Scores Ratings')
plt.show()

#### Q: How does host verification status and host profile picture impact bookings?

In [None]:
# Convert 't/f' to 'True/False' for easier readability
airbnb['host_identity_verified'] = airbnb['host_identity_verified'].map({'t': True, 'f': False})
airbnb['host_has_profile_pic'] = airbnb['host_has_profile_pic'].map({'t': True, 'f': False})

# 'reviews_per_month' was already median-imputed during cleaning; fill any gaps in 'availability_365'
airbnb['availability_365'] = airbnb['availability_365'].fillna(0)

In [None]:
# Barplot for Host Identity Verified vs Reviews Per Month
plt.figure(figsize=(8,4))
sns.barplot(x='host_identity_verified', y='reviews_per_month', data=airbnb, color = 'lightseagreen')
plt.xlabel('Host Identity Verified')
plt.ylabel('Reviews per Month')
plt.title('Impact of Host Identity Verification on Reviews per Month')
plt.show()

# Barplot for Host Profile Picture vs Reviews Per Month
plt.figure(figsize=(8,4))
sns.barplot(x='host_has_profile_pic', y='reviews_per_month', data=airbnb, color = 'lightseagreen')
plt.xlabel('Host Has Profile Picture')
plt.ylabel('Reviews per Month')
plt.title('Impact of Host Profile Picture on Reviews per Month')
plt.show()

### Property Location and Pricing:

#### Q: Which room type is offered the most? What is the distribution of room types in different boroughs?

In [None]:
# Count of room types
room_type_count = airbnb['room_type'].value_counts().reset_index()
room_type_count

In [None]:
# Count of room types in different boroughs
plt.figure(figsize=(8,4))
sns.countplot(x = 'neighbourhood_borough', data = airbnb, hue='room_type', palette = 'Paired')
plt.title('Distribution of Room Types in Boroughs')
plt.xlabel('Borough')
plt.ylabel('Count')
plt.show()

#### Q: What is the distribution of listing prices for each room type and borough?

In [None]:
# Create a figure with subplots
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Price by room type (Bar Plot)
sns.barplot(x='room_type', y='price', data=airbnb, hue='room_type', palette='GnBu', ax=axes[0])
axes[0].set_title('Price Distribution by Room Type')
axes[0].set_xlabel('Room Type')
axes[0].set_ylabel('Price')

# Add value labels on top of the bars
for p in axes[0].patches:
    axes[0].annotate(f'{p.get_height():,.0f}', 
                     (p.get_x() + p.get_width() / 2., p.get_height()), 
                     ha='center', va='bottom', 
                     fontsize=10, color='black', 
                     xytext=(0, 5), 
                     textcoords='offset points')

# Price by borough (Violin Plot)
# The price column contains an extreme outlier of 100,000.
# Keep only listings priced under 1,000 so that it and other very high values do not skew the plot.
filtered_price = airbnb[airbnb['price'] < 1000]

# Draw explicitly on the second subplot rather than relying on the current axes
sns.violinplot(x='neighbourhood_borough', y='price', data=filtered_price, hue='neighbourhood_borough',
               palette='BuGn', ax=axes[1])
axes[1].set_title('Price Distribution by Borough')
axes[1].set_xlabel('Borough')
axes[1].set_ylabel('Price')
plt.show()
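The fixed cutoff of 1,000 used in the price filter above is a manual choice. As one possible alternative, the sketch below derives an upper cutoff from the data with the 1.5 × IQR rule. This is an assumption for illustration, not the method used in this notebook, and the `prices` DataFrame is made up.

```python
import pandas as pd

# Hypothetical toy prices with one extreme value
prices = pd.DataFrame({'price': [60, 85, 90, 120, 150, 200, 250, 100000]})

# Interquartile range of the price column
q1 = prices['price'].quantile(0.25)
q3 = prices['price'].quantile(0.75)
iqr = q3 - q1

# Treat anything above Q3 + 1.5 * IQR as an outlier for plotting purposes
upper = q3 + 1.5 * iqr
filtered = prices[prices['price'] <= upper]

print(f'Upper cutoff: {upper:.2f}; kept {len(filtered)} of {len(prices)} rows')
```

A data-driven cutoff adapts if the listings file is refreshed, while a fixed threshold is easier to explain; either is a reasonable choice for a plot.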

In [None]:
# Table for prices by room type
price_distribution = airbnb.groupby('room_type')['price'].describe()
print(price_distribution)

# Table for prices by borough
price_dist = airbnb.groupby('neighbourhood_borough')['price'].describe()
print(price_dist)

#### Geospatial Analysis

In [None]:
plt.figure(figsize= (10,6))
sns.scatterplot(x = airbnb.longitude, y = airbnb.latitude, hue = airbnb.neighbourhood_borough, palette = 'viridis')
plt.title('Geographical Distribution of Listing by Borough')
plt.show()

In [None]:
import folium 
from folium.plugins import MarkerCluster

# Initialize the map to the center of NYC
nyc_center = [40.7128, -74.0060]
mymap = folium.Map(location=nyc_center, zoom_start=12)

# Create a marker cluster to handle a large number of points
marker_cluster = MarkerCluster().add_to(mymap)

# Add each listing to the marker cluster with a simplified popup
for index, row in airbnb.iterrows():
    # Popup content
    popup_content = f"""
    <strong>Room Type:</strong> {row['room_type']}<br>
    <strong>Price:</strong> ${row['price']}<br>
    <strong>Availability (365 days):</strong> {row['availability_365']} days<br>
    <strong>Borough:</strong> {row['neighbourhood_borough']}
    """

    # Add a blue marker with the popup for this listing
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=popup_content,
        icon=folium.Icon(color='blue', icon='info-sign')
    ).add_to(marker_cluster)

mymap

### Reviews and Ratings:

#### Q: What factors contribute to review scores (e.g., amenities, price, availability, etc.)?

In [None]:
plt.figure(figsize=(8,6))
# Matrix for important numerical columns
correlation_matrix = ['price', 'availability_365', 'accommodates', 'bathrooms', 'bedrooms', 'beds', 
                      'number_of_reviews', 'review_scores_rating','reviews_per_month']
df_corr = airbnb[correlation_matrix]

# Plot the matrix
sns.heatmap(df_corr.corr(), annot=True, cmap = 'coolwarm')
plt.title('Correlation Matrix')
plt.show()

#### Q: What are the top 10 neighborhoods that have the highest-rated properties?

In [None]:
neighbourhood_ratings = airbnb.groupby('neighbourhood')['review_scores_rating'].mean().reset_index()
neighborhood_ratings_sorted = neighbourhood_ratings.sort_values(by='review_scores_rating', ascending=False)

# Keep the top 10 so the chart matches the question and the title
top_neighbourhoods = neighborhood_ratings_sorted.head(10)

plt.figure(figsize=(10, 6))
sns.barplot(x='review_scores_rating', y='neighbourhood', data=top_neighbourhoods,
            color='mediumseagreen')
plt.xlabel('Review Scores Rating')
plt.ylabel('Neighborhood')
plt.title('Top 10 Neighborhoods with Highest Ratings')
plt.show()

#### Q: Reviews and Ratings Per Borough

In [None]:
plt.figure(figsize=(8,6))
result = airbnb.groupby(["neighbourhood_borough"])['reviews_per_month'].aggregate("median").reset_index().sort_values('reviews_per_month')
sns.barplot(x='neighbourhood_borough', y="reviews_per_month", data=airbnb, order=result['neighbourhood_borough'], hue = 'room_type')
plt.title('Reviews Per Month by Borough')
plt.show()

plt.figure(figsize=(8,6))
result2 = airbnb.groupby(["neighbourhood_borough"])['number_of_reviews'].aggregate("median").reset_index().sort_values('number_of_reviews')
sns.barplot(x='neighbourhood_borough', y="number_of_reviews", data=airbnb, order=result2['neighbourhood_borough'], hue = 'room_type')
plt.title('Number of Reviews by Borough')
plt.show()

plt.figure(figsize=(8,6))
result3 = airbnb.groupby(["neighbourhood_borough"])['review_scores_rating'].aggregate("median").reset_index().sort_values('review_scores_rating')
sns.barplot(x='neighbourhood_borough', y="review_scores_rating", data=airbnb, order=result3['neighbourhood_borough'], hue = 'room_type')
plt.title('Review Score Ratings by Borough')
plt.show()

### Amenities and Listings

#### Q: Do listings with more amenities receive higher ratings? 

In [None]:
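import json

# Count how many amenities each listing has. The 'amenities' column holds strings,
# so .str.len() would count characters rather than items. This assumes each value is a
# JSON-style list such as '["Wifi", "Kitchen"]'; adjust the parsing if the format differs.
airbnb['num_amenities'] = airbnb['amenities'].apply(
    lambda s: len(json.loads(s)) if isinstance(s, str) else np.nan)

# Drop any NA values
airbnb_filter = airbnb[['amenities', 'review_scores_rating', 'num_amenities']].dropna()

# Plot the number of amenities against 'review_scores_rating'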
plt.figure(figsize=(10,6))
sns.scatterplot(x = 'num_amenities', y = 'review_scores_rating', data = airbnb_filter, color = 'steelblue')
plt.xlabel('Number of Amenities')
plt.ylabel('Review Score Ratings')
plt.title('Relationship Between Number of Amenities and Ratings')
plt.show()

### Availability and Booking:

#### Q: What is the average minimum number of nights booked for each type of room?

In [None]:
# Minimum nights booked for each room type
min_nights_grouped = airbnb.groupby('room_type')['minimum_nights'].mean().reset_index()

plt.figure(figsize = (10,6))
plt.title('Minimum Nights Booked for each Room Type')
plt.pie(min_nights_grouped['minimum_nights'], labels = min_nights_grouped['room_type'],
       autopct=lambda p: '{:.1f}'.format(p * sum(min_nights_grouped['minimum_nights']) / 100))
plt.show()

#### Q: Which boroughs are more frequently available (high availability_30, availability_60, etc.)?

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Plot each availability on a separate subplot using KDE plots
sns.kdeplot(ax=axes[0, 1], x='availability_30', data=airbnb, hue='neighbourhood_borough', fill=True)
axes[0, 1].set_title('Availability 30')

sns.kdeplot(ax=axes[1, 0], x='availability_60', data=airbnb, hue='neighbourhood_borough', fill=True)
axes[1, 0].set_title('Availability 60')

sns.kdeplot(ax=axes[1, 1], x='availability_90', data=airbnb, hue='neighbourhood_borough', fill=True)
axes[1, 1].set_title('Availability 90')

sns.kdeplot(ax=axes[0, 0], x='availability_365', data=airbnb, hue='neighbourhood_borough', fill=True)
axes[0, 0].set_title('Availability 365')

plt.tight_layout()
plt.show()

In [None]:
# Continuous variables for pair plotting
continuous_vars = ['price', 'number_of_reviews', 'review_scores_rating', 'accommodates']

# Create the pair plot
sns.pairplot(airbnb[continuous_vars], diag_kind='kde')
plt.suptitle('Pair Plot of Continuous Variables', y=1.02)
plt.show()

### Predictive Modeling

- #### Predict the price of a listing based on features like host features, availability, location, room type, and number of amenities.

In [None]:
X = airbnb[['host_is_superhost', 'host_identity_verified', 'host_total_listings_count', 'longitude', 'latitude', 
            'room_type', 'num_amenities', 'availability_30', 'availability_60', 'availability_90', 'availability_365',
            'accommodates', 'bathrooms', 'bedrooms', 'beds']]
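# Log-transform the target; log1p(x) equals log(x + 1) and reduces the skew in price
airbnb['log_price'] = np.log1p(airbnb['price'])
y = airbnb['log_price']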

# One-hot encoding for categorical variables
X = pd.get_dummies(X, columns=['host_is_superhost', 'host_identity_verified', 'room_type'], drop_first=True)

# Splitting the data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

from sklearn.metrics import mean_absolute_error, r2_score, mean_squared_error, root_mean_squared_error
y_pred = model.predict(X_test)
print('MAE:', mean_absolute_error(y_test, y_pred))
print('RMSE:', root_mean_squared_error(y_test, y_pred))
print('MSE:', mean_squared_error(y_test, y_pred))
print('R-squared:', r2_score(y_test, y_pred))

In [None]:
# Get feature importances from the Random Forest model that predicts the price of a listing based on features like host features, availability, location, room type, and number of amenities. 
# (checks which features influence price predictions the most)
importances = model.feature_importances_
feature_names = X.columns

# Sort feature importances in descending order
indices = np.argsort(importances)[::-1]

# Plot feature importances
plt.figure(figsize=(10, 6))
sns.barplot(x=importances[indices], y=feature_names[indices], hue=feature_names[indices], dodge=False, legend=False)
plt.title('Model 1 (Price of Listing) Feature Importances')
plt.xlabel('Relative Importance')
plt.ylabel('Feature')
plt.show()

Conclusion: The R-squared value indicates that the model explains a substantial share of the variance in log price, with about 30% left unexplained. The MAE is relatively low, but it is measured on the log scale, so it reflects relative errors rather than dollar errors. The model may still be missing some nuances; additional feature engineering or transformations might improve accuracy further.
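Because the target is log(price + 1), the metrics printed above are in log units. A minimal sketch of converting predictions back to the price scale with `np.expm1` is shown below. The arrays `y_test_log` and `y_pred_log` are made-up stand-ins for the notebook's `y_test` and `y_pred`.

```python
import numpy as np

# Hypothetical log1p-scale targets and predictions (stand-ins for y_test and y_pred)
y_test_log = np.log1p(np.array([100.0, 150.0, 80.0, 300.0]))
y_pred_log = np.log1p(np.array([110.0, 140.0, 90.0, 250.0]))

# expm1 inverts log1p, returning values on the original price scale
y_test_usd = np.expm1(y_test_log)
y_pred_usd = np.expm1(y_pred_log)

# Mean absolute error on each scale
mae_log = np.mean(np.abs(y_test_log - y_pred_log))
mae_usd = np.mean(np.abs(y_test_usd - y_pred_usd))

print(f'MAE (log scale): {mae_log:.3f}')
print(f'MAE (price scale): {mae_usd:.2f}')
```

Reporting the error on the price scale alongside the log-scale metrics makes it easier to judge whether the model is accurate enough for practical pricing decisions.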

- #### Predict the booking success based on host-related features (e.g., response rate, acceptance rate, superhost status) and guest review scores.

In [None]:
# 'review_scores_rating' serves as the target, used here as a proxy for booking success
X = airbnb[['host_response_rate', 'host_acceptance_rate', 'host_has_profile_pic', 'host_is_superhost', 
            'host_identity_verified', 'host_total_listings_count', 'availability_365', 
            'review_scores_accuracy', 'review_scores_cleanliness', 'review_scores_checkin',
            'review_scores_communication', 'review_scores_location', 'review_scores_value']].copy()
y = airbnb['review_scores_rating']

# Convert the host flags to 0/1. They were already mapped from 't'/'f' to True/False earlier
# in the notebook, so mapping 't'/'f' again would turn every value into NaN.
bool_cols = ['host_has_profile_pic', 'host_is_superhost', 'host_identity_verified']
X[bool_cols] = X[bool_cols].astype(int)

# Split the data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

from sklearn.metrics import mean_absolute_error, r2_score, mean_squared_error
y_pred = model.predict(X_test)
print('MAE:', mean_absolute_error(y_test, y_pred))
print('MSE:', mean_squared_error(y_test, y_pred))
print('R-squared:', r2_score(y_test, y_pred))

In [None]:
# Get feature importances from the Random Forest model that predicts booking success (review score rating) based on host-related features (e.g., response rate, acceptance rate, superhost status) and guest review scores
# (checks which features influence the rating predictions the most)
importances = model.feature_importances_
feature_names = X.columns

# Sort feature importances in descending order
indices = np.argsort(importances)[::-1]

# Plot feature importances
plt.figure(figsize=(10, 6))
sns.barplot(x=importances[indices], y=feature_names[indices], hue=feature_names[indices], dodge=False, legend=False)
plt.title('Model 2 (Booking Success) Feature Importances')
plt.xlabel('Relative Importance')
plt.ylabel('Feature')
plt.show()

Conclusion: The MAE and MSE are relatively low, meaning that, on average, the predictions are close to the actual review scores. The R-squared value indicates that the model explains most of the variation in ratings from the given host-related features and guest review sub-scores, although some variability remains unexplained.
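One way to put an R-squared value in context is to compare it against a trivial baseline that predicts the mean rating for every listing, which scores zero by construction. The sketch below computes R-squared by hand with NumPy on made-up numbers; `r_squared` is a helper defined here for illustration, not part of the notebook.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical ratings and model predictions
y_true = np.array([4.2, 4.8, 4.5, 4.9, 3.9])
y_model = np.array([4.3, 4.7, 4.5, 4.8, 4.1])

# Baseline: predict the mean rating for every listing
y_baseline = np.full_like(y_true, y_true.mean())

print('Model R-squared:   ', round(r_squared(y_true, y_model), 3))
print('Baseline R-squared:', round(r_squared(y_true, y_baseline), 3))
```

The gap between the two scores is what the features actually contribute beyond simply knowing the average rating.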

### Connect to SQL for Further Analysis

The Python script below reads the cleaned Airbnb CSV into a pandas DataFrame and writes that DataFrame to a MySQL table using SQLAlchemy.

In [None]:
import sqlalchemy as sql

In [None]:
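# Python script to create a table in MySQL for further analysis
import os

# MySQL connection string. Credentials are read from environment variables instead of being
# hardcoded; the variable names MYSQL_USER and MYSQL_PASSWORD are placeholders chosen here.
user = os.environ.get('MYSQL_USER', 'root')
password = os.environ['MYSQL_PASSWORD']
conn = f'mysql+mysqlconnector://{user}:{password}@localhost:3306/airbnb_nyc'

# Creating the engine
engine = sql.create_engine(conn)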

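# Reading CSV file into a pandas DataFrame, parsing the date columns so they are stored as dates in MySQL
df = pd.read_csv('cleaned_nyclistings.csv', parse_dates=['host_since', 'first_review', 'last_review'])
#df = pd.read_csv('reviews.csv')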

# Get the DDL statement (for debugging or schema checking)
ddl = pd.io.sql.get_schema(df, 'data1')
print(ddl)

# Writing the DataFrame to the MySQL database 
df.to_sql("data1", con=engine, schema='airbnb_nyc', if_exists='replace', index=False, chunksize=1000)
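The upload above needs a running MySQL server. To try the same DataFrame-to-SQL round trip without one, the sketch below swaps in an in-memory SQLite database from the standard library. This substitution is an assumption made for portability, and the three-row `df_demo` table is made up.

```python
import sqlite3
import pandas as pd

# Hypothetical miniature version of the cleaned listings table
df_demo = pd.DataFrame({
    'id': [1, 2, 3],
    'room_type': ['Private room', 'Entire home/apt', 'Private room'],
    'price': [90.0, 210.0, 75.0],
})

# In-memory SQLite stands in for the MySQL server here
demo_conn = sqlite3.connect(':memory:')
df_demo.to_sql('data1', demo_conn, if_exists='replace', index=False)

# Read an aggregate back to confirm the rows arrived
check = pd.read_sql(
    'SELECT room_type, AVG(price) AS avg_price FROM data1 '
    'GROUP BY room_type ORDER BY room_type',
    demo_conn,
)
print(check)
demo_conn.close()
```

Reading a small aggregate back right after writing is a quick sanity check that the table name, column names, and row counts match what the later queries expect.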

### Seasonality of Bookings

In [None]:
query = """SELECT * FROM data1;"""

# Use pandas read_sql with the SQLAlchemy engine
new_df = pd.read_sql(query, engine)
print(new_df)

In [None]:
# MySQL query which shows the average price of Airbnbs per month
query = """SELECT EXTRACT(MONTH FROM last_review) AS review_month, 
       AVG(price) AS avg_price
FROM data1
WHERE last_review IS NOT NULL
GROUP BY review_month
ORDER BY review_month;"""

# Use pandas read_sql with the SQLAlchemy engine
new_df = pd.read_sql(query, engine)
#print(new_df)

plt.figure(figsize=(10, 6))
sns.lineplot(data=new_df, x='review_month', y='avg_price', errorbar=None, color='seagreen')
plt.title('Average Price by Month')
plt.xlabel('Month')
plt.ylabel('Average Price')
plt.show()

In [None]:
# MySQL query to see how the number of active listings changed over time. How has the Airbnb platform expanded in terms of listings over the years?
query = """SELECT EXTRACT(YEAR FROM host_since) AS year, 
       COUNT(*) AS num_listings
FROM data1
GROUP BY year
ORDER BY year DESC;"""

# Use pandas read_sql with the SQLAlchemy engine
new_df = pd.read_sql(query, engine)
#print(new_df)

plt.figure(figsize=(10, 6))
sns.barplot(data=new_df, x='year', y='num_listings', errorbar=None, color = 'teal')
plt.title('Number of Listings by Year')
plt.xlabel('Year')
plt.ylabel('Number of Listings')
plt.show()

In [None]:
# Which day of the week receives the most bookings?
query = """SELECT DAYNAME(last_review) AS review_day, 
       COUNT(*) AS total_bookings
FROM data1
WHERE last_review IS NOT NULL
GROUP BY review_day
ORDER BY FIELD(review_day, 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday');"""

# Use pandas read_sql with the SQLAlchemy engine
new_df = pd.read_sql(query, engine)
#print(new_df)

plt.figure(figsize=(10, 6))
sns.barplot(x='review_day', y='total_bookings', data=new_df, errorbar=None, color = 'mediumseagreen')
plt.title('Total Bookings by Day of the Week')
plt.xlabel('Day of the Week')
plt.ylabel('Total Bookings')
plt.show()
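The month and weekday groupings in the queries above can also be cross-checked in pandas without a database. The sketch below uses a made-up `last_review` column; `.dt.month` and `.dt.day_name()` play the roles of `EXTRACT(MONTH ...)` and `DAYNAME(...)`.

```python
import pandas as pd

# Hypothetical rows standing in for the cleaned listings
toy_reviews = pd.DataFrame({
    'last_review': pd.to_datetime(['2024-01-07', '2024-01-08', '2024-10-13', '2024-10-20']),
    'price': [100.0, 140.0, 200.0, 260.0],
})

# Average price per review month (mirrors the EXTRACT(MONTH ...) query)
avg_price_by_month = toy_reviews.groupby(toy_reviews['last_review'].dt.month)['price'].mean()

# Review counts per weekday (mirrors the DAYNAME(...) query)
reviews_by_day = toy_reviews['last_review'].dt.day_name().value_counts()

print(avg_price_by_month)
print(reviews_by_day)
```

If the pandas and SQL results disagree, a likely cause is that the date column was stored as text in the database rather than as a date type.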

### Conclusion:

**Host Performance:**
The host with the most Airbnb bookings is Host ID 107434423. The average host acceptance rate is 83%, and the average host response rate is 95%. Superhosts receive slightly higher review ratings than non-superhosts. Hosts whose identity is verified on Airbnb get more reviews per month. However, having a profile picture on the Airbnb website does not necessarily mean getting more reviews per month. 
<br> **Property Location and Pricing:**
The most commonly offered room type is 'Entire home/apt', particularly in Manhattan. 'Private room' listings are more common in Brooklyn. Hotel rooms are the most expensive room type, averaging around 325 USD. Manhattan is also the most expensive and most preferred borough for an Airbnb. 
<br> **Reviews and Ratings:**
The factors that contribute to reviews are 'accommodates' and 'beds', as well as 'number_of_reviews' and 'reviews_per_month'. Many neighborhoods have high ratings ("5 stars"), such as Woodrow, Bay Terrace in Staten Island, Chelsea in Staten Island, Willowbrook, and Todt Hill. Staten Island has the highest overall number of reviews and reviews per month, but Brooklyn has the highest review score rating for the 'Hotel room' room type.
<br> **Amenities and Listings:**
Listings that have more amenities tend to have higher ratings and overall customer satisfaction. 
<br> **Availability and Booking:**
On average, listings require a minimum stay of about 29 nights (excluding hotel rooms). Manhattan is the most available borough across the 30-, 60-, 90-, and 365-day windows. 
<br> **Predictive Modeling:**
After evaluating the feature importance from the first Random Forest model predicting listing prices, I found that the most influential features are the 'Private Room' type, longitude, host's total listings count, latitude, and availability for 365 days. In contrast, the analysis of the second Random Forest model predicting booking success reveals that the key features are guest review scores, specifically review_scores_value, review_scores_accuracy, and review_scores_cleanliness. This indicates that host-related characteristics have little impact on booking success for the second model. 
<br> **Seasonality of Bookings:**
The seasonality queries show that October has the highest average price for Airbnbs. The years 2014-2016 peaked in terms of how many active listings there were. Finally, the day of the week that receives most bookings is Sunday.  