# Airbnb San Francisco EDA

This notebook explores the cleaned Airbnb "listings" dataset for San Francisco.
Focus areas:
- Price patterns
- Neighborhood differences
- Room types and capacity
- Host behavior and experience
- Review scores and listing quality
- Amenities and value

Goal: uncover meaningful insights that could guide business or host decisions.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

pd.set_option('display.max_columns', None)

# load cleaned dataset
df = pd.read_csv('/Users/mohammedzareef-mustafa/Downloads/Tech Career/Tech Projects/Projects/airbnb-sf-eda/data/clean/listings_clean.csv')

df.head()

In [None]:
df.info()

In [None]:
df.describe(include='all')

## Key Questions to Explore

1. What does the price distribution look like across San Francisco?
2. How do prices vary by:
   - room type
   - property type
   - neighborhood
3. How do listing qualities (ratings, number of reviews) relate to price?
4. Which neighborhoods have the highest-rated listings?
5. How do host attributes (superhost status, host tenure) influence performance?
6. Do listings with more amenities charge higher prices?

In [None]:
plt.figure(figsize=(10,5))
sns.histplot(df['price'], bins=50, kde=True)
plt.title('Price Distribution')
plt.xlabel('Price ($)')
plt.ylabel('Count')
plt.show()
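
The histogram shows the shape of the distribution; a few quantiles make the skew concrete. The block below is a minimal sketch of one way to do that. `price_summary` and its `threshold` parameter are hypothetical helpers that are not part of the original notebook, and the toy values exist only to make the block runnable.

```python
import pandas as pd

def price_summary(prices: pd.Series, threshold: float = 300.0) -> pd.Series:
    """Quantile-based summary of a price column (hypothetical helper)."""
    prices = prices.dropna()
    return pd.Series({
        "median": prices.median(),
        "mean": prices.mean(),
        "p90": prices.quantile(0.90),
        "p99": prices.quantile(0.99),
        "skew": prices.skew(),  # positive values indicate a long right tail
        "share_below_threshold": (prices < threshold).mean(),
    })

# Toy values for illustration only, not taken from the dataset.
toy_prices = pd.Series([80, 120, 150, 200, 250, 2000])
summary = price_summary(toy_prices)
print(summary.round(2))
```

In the notebook itself, the call would be `price_summary(df['price'])`. A mean well above the median, together with a positive skew, is the numeric counterpart of the long tail in the plot.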

In [None]:
plt.figure(figsize=(8,5))
sns.boxplot(data=df, x='room_type', y='price')
plt.title('Price by Room Type')
plt.show()
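
A boxplot is easier to read next to the numbers behind it. The sketch below tabulates listing counts and typical prices per group. `price_by_group` is a hypothetical helper, and the room type labels and prices in the toy frame are assumed values for illustration, not results from the dataset.

```python
import pandas as pd

def price_by_group(data: pd.DataFrame, group_col: str, price_col: str = "price") -> pd.DataFrame:
    """Count, median, and mean price per group, sorted by median (hypothetical helper)."""
    return (
        data.groupby(group_col)[price_col]
        .agg(listings="count", median_price="median", mean_price="mean")
        .sort_values("median_price", ascending=False)
    )

# Toy frame for illustration only.
toy = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Private room"],
    "price": [200, 300, 90, 110],
})
room_summary = price_by_group(toy, "room_type")
print(room_summary)
```

On the real data this would be `price_by_group(df, 'room_type')`. The median is used for sorting because the mean is pulled upward by the luxury tail.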

In [None]:
top_neigh = df['neighbourhood_cleansed'].value_counts().head(10).index

plt.figure(figsize=(12,6))
sns.boxplot(data=df[df['neighbourhood_cleansed'].isin(top_neigh)],
            x='neighbourhood_cleansed', y='price')
plt.xticks(rotation=45)
plt.title('Price by Neighborhood (Top 10)')
plt.show()
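
Neighborhood boxplots mix room types, so a neighborhood dominated by entire homes can look expensive for that reason alone. One simple way to control for room type, sketched below, is to divide each price by the median price of its room type and then compare neighborhoods on that ratio. This is an assumed approach, not necessarily the one behind the findings in this notebook; `neighbourhood_price_index` and `min_listings` are hypothetical names, and the neighborhoods "A" and "B" are placeholders.

```python
import pandas as pd

def neighbourhood_price_index(data: pd.DataFrame, min_listings: int = 2) -> pd.DataFrame:
    """Median of price / room-type median per neighbourhood (hypothetical helper)."""
    cols = ["price", "room_type", "neighbourhood_cleansed"]
    data = data.dropna(subset=cols).copy()
    # Median price of each listing's own room type, aligned row by row.
    room_median = data.groupby("room_type")["price"].transform("median")
    data["price_ratio"] = data["price"] / room_median
    out = data.groupby("neighbourhood_cleansed")["price_ratio"].agg(
        listings="count", median_ratio="median"
    )
    # Drop neighbourhoods with too few listings to give a stable median.
    return out[out["listings"] >= min_listings].sort_values("median_ratio", ascending=False)

# Toy frame for illustration only.
toy = pd.DataFrame({
    "neighbourhood_cleansed": ["A", "A", "B", "B"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt", "Private room"],
    "price": [300, 150, 100, 50],
})
index_table = neighbourhood_price_index(toy)
print(index_table)
```

A ratio above 1 means listings in that neighborhood are typically priced above the citywide median for their room type. On the real data a larger `min_listings` (for example 20) would likely be more appropriate.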

In [None]:
plt.figure(figsize=(10,5))
sns.histplot(df['review_scores_rating'], bins=30, kde=True)
plt.title('Rating Distribution')
plt.show()

In [None]:
sns.scatterplot(data=df, x='review_scores_rating', y='price', alpha=0.4)
plt.title('Rating vs Price')
plt.show()

In [None]:
# Zoomed-in version of the previous plot: only listings priced below $1,000
sns.scatterplot(data=df[df['price'] < 1000], x='review_scores_rating', y='price', alpha=0.4)
plt.title('Rating vs Price (Price < 1,000)')
plt.show()
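
Scatterplots with heavy outliers are hard to judge by eye, so a rank correlation is a useful companion. The sketch below computes Spearman's rho as the Pearson correlation of the ranks, which avoids any dependency beyond pandas. `rank_correlation` is a hypothetical helper and the toy values are made up.

```python
import pandas as pd

def rank_correlation(data: pd.DataFrame, x: str, y: str) -> float:
    """Spearman's rho, computed as Pearson correlation of average ranks (hypothetical helper)."""
    pair = data[[x, y]].dropna()
    return pair[x].rank().corr(pair[y].rank())

# Toy frame for illustration only; the last row mimics a luxury outlier.
toy = pd.DataFrame({
    "review_scores_rating": [4.2, 4.5, 4.8, 5.0],
    "price": [100, 150, 120, 2000],
})
rho = rank_correlation(toy, "review_scores_rating", "price")
print(round(rho, 3))
```

The same helper applies to the other pairs plotted in this notebook, for example `rank_correlation(df, 'host_tenure_days', 'price')` or `rank_correlation(df, 'amenities_count', 'price')`. Because it works on ranks, the result is not dominated by a handful of very expensive listings.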

In [None]:
sns.scatterplot(data=df, x='host_tenure_days', y='price', alpha=0.3)
plt.title('Host Tenure vs Price')
plt.xlabel('Days as Host')
plt.ylabel('Price')
plt.show()

In [None]:
# Zoomed-in version of the previous plot: only listings priced below $1,000
sns.scatterplot(data=df[df['price'] < 1000], x='host_tenure_days', y='price', alpha=0.3)
plt.title('Host Tenure vs Price (Price < 1,000)')
plt.xlabel('Days as Host')
plt.ylabel('Price')
plt.show()

In [None]:
sns.scatterplot(data=df, x='amenities_count', y='price', alpha=0.3)
plt.title('Amenities Count vs Price')
plt.show()

In [None]:
# Zoomed-in version of the previous plot: only listings priced below $1,000
sns.scatterplot(data=df[df['price'] < 1000], x='amenities_count', y='price', alpha=0.3)
plt.title('Amenities Count vs Price (Price < 1,000)')
plt.show()

In [None]:
df.groupby('amenities_count')['price'].median().head(20)
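
Medians per exact amenity count can be noisy when some counts have few listings, and they mix room types. A possible refinement, sketched below, bins the counts into bands and reports the median price per band within each room type. The band edges, the labels, and the helper name `median_price_by_amenity_band` are all assumptions chosen for illustration.

```python
import pandas as pd

def median_price_by_amenity_band(
    data: pd.DataFrame,
    edges=(0, 10, 20, 40, float("inf")),
    labels=("0-10", "11-20", "21-40", "41+"),
) -> pd.DataFrame:
    """Median price per amenity band and room type (hypothetical helper)."""
    data = data.dropna(subset=["price", "amenities_count", "room_type"]).copy()
    data["amenity_band"] = pd.cut(
        data["amenities_count"], bins=list(edges), labels=list(labels), include_lowest=True
    )
    return (
        data.groupby(["amenity_band", "room_type"], observed=True)["price"]
        .median()
        .unstack("room_type")  # rows: amenity band, columns: room type
    )

# Toy frame for illustration only.
toy = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 4 + ["Private room"] * 2,
    "amenities_count": [5, 8, 25, 30, 6, 35],
    "price": [150, 170, 220, 260, 80, 120],
})
band_table = median_price_by_amenity_band(toy)
print(band_table)
```

Reading down a column shows how price changes with amenities for one room type, which is the within-room-type comparison that a single overall median cannot provide.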

In [None]:
plt.figure(figsize=(6,4))
sns.boxplot(data=df, x='host_is_superhost', y='price')
plt.title('Price: Superhost vs Non-Superhost')
plt.show()

In [None]:
sns.boxplot(data=df, x='host_is_superhost', y='review_scores_rating')
plt.title('Review Scores: Superhost vs Non-Superhost')
plt.show()
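
The two boxplots compare superhosts and other hosts across all listings at once. To see whether a gap survives within a room type, the medians can be tabulated per room type, as in the sketch below. It assumes `host_is_superhost` is stored as a boolean; if the cleaned file encodes it differently (for example `'t'`/`'f'` or 0/1), the mapping needs adjusting. `superhost_gap` is a hypothetical helper and the toy values are made up.

```python
import pandas as pd

def superhost_gap(data: pd.DataFrame, value_col: str = "price") -> pd.DataFrame:
    """Median of value_col per room type and host group, plus the difference (hypothetical helper)."""
    labelled = data.dropna(subset=[value_col, "host_is_superhost"]).copy()
    # Assumption: the column is boolean. Adapt this mapping for other encodings.
    labelled["host_group"] = labelled["host_is_superhost"].map(
        {True: "superhost", False: "non_superhost"}
    )
    table = (
        labelled.groupby(["room_type", "host_group"])[value_col]
        .median()
        .unstack("host_group")
    )
    table["gap"] = table["superhost"] - table["non_superhost"]
    return table

# Toy frame for illustration only.
toy = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 4 + ["Private room"] * 4,
    "host_is_superhost": [True, True, False, False] * 2,
    "price": [260, 240, 230, 210, 110, 100, 95, 85],
})
gap_table = superhost_gap(toy)
print(gap_table)
```

Passing `value_col='review_scores_rating'` gives the same breakdown for ratings instead of price.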

## Insights & Key Findings
1. Pricing Patterns
    - Airbnb prices in San Francisco are strongly right-skewed: most listings cluster below $300, and a small number of luxury listings form the long tail.
    - Entire homes/apartments have the highest typical prices, while private rooms are considerably more affordable.
    - When controlling for room type, neighborhoods such as Western Addition, Bernal Heights, Cole Valley, and Alamo Square show noticeably higher median prices.
    - Listings with more capacity tend to cost more, but price per person often decreases as `accommodates` increases, suggesting economies of scale for larger groups.
2. Neighborhood Differences
    - The ten neighborhoods with the most listings show meaningful price variation.
    - Some areas (e.g., Western Addition and Bernal Heights) combine high prices with strong review scores, indicating both demand and customer satisfaction.
    - More central or tourist-friendly neighborhoods exhibit higher prices even when ratings are similar to less central ones.
3. Host Behavior & Experience
    - Superhosts tend to charge slightly higher prices, though the difference narrows when adjusting for room type and location.
    - Hosts with longer tenure (time since joining Airbnb, `host_tenure_days`) often have more optimized pricing. However, there is no strong linear relationship between host tenure and guest ratings.
    - Response and acceptance rates are generally high across the dataset, suggesting competitive host behavior.
4. Review Scores & Listing Quality
    - Ratings are heavily concentrated between 4.5 and 5.0, which is typical for Airbnb.
    - Rating and price are not strongly correlated, although ratings vary slightly less among higher-priced listings.
    - Listings with more reviews tend to be priced closer to the neighborhood median, suggesting that demand helps stabilize pricing over time.
5. Amenities & Perceived Value
    - The number of amenities varies widely, from very minimal offerings to extensive lists.
    - Listings with more amenities generally show higher prices, though the relationship is not perfectly linear.
    - Amenities appear to contribute more to price differentiation within room types, rather than across them.
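
The capacity finding under Pricing Patterns can be checked with a short table of price and price per person by capacity. The sketch below assumes the dataset has an `accommodates` column, as the finding implies; `price_per_person_by_capacity` is a hypothetical helper and the toy values are made up.

```python
import pandas as pd

def price_per_person_by_capacity(data: pd.DataFrame) -> pd.DataFrame:
    """Median price and median price per person for each capacity (hypothetical helper)."""
    data = data.dropna(subset=["price", "accommodates"]).copy()
    data = data[data["accommodates"] > 0]  # avoid division by zero
    data["price_per_person"] = data["price"] / data["accommodates"]
    return data.groupby("accommodates").agg(
        listings=("price", "count"),
        median_price=("price", "median"),
        median_price_per_person=("price_per_person", "median"),
    )

# Toy frame for illustration only.
toy = pd.DataFrame({
    "accommodates": [1, 1, 2, 2, 4, 4],
    "price": [100, 120, 150, 170, 240, 280],
})
capacity_table = price_per_person_by_capacity(toy)
print(capacity_table)
```

If total price rises with capacity while the per-person column falls, the data are consistent with the economies-of-scale reading above.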

## Overall Takeaways
- Location, room type, and accommodation size are the strongest drivers of price in San Francisco’s Airbnb market.
- Host status and experience influence pricing, but not as strongly as structural listing features.
- Reviews and ratings show limited impact on price, reinforcing the idea that guests often prioritize location and property type over marginal differences in rating.
- Amenity count provides additional price lift, especially among mid-range listings competing for visibility.

Overall, the analysis shows that structural listing features (especially room type, neighborhood, and capacity) are the strongest price drivers in San Francisco’s Airbnb market, with amenities adding a smaller, secondary lift.