harshit1326/AirBNB-Data-Analysis-Using-Python

Project Title: Airbnb Listings Data Analysis – Insights & Visualization

Project Summary: I performed an in-depth exploratory data analysis (EDA) on an Airbnb listings dataset containing 102,000+ entries. The goal was to extract actionable insights on listings, pricing, host reliability, and potential revenue opportunities.

Key Steps & Techniques Used:

Data Cleaning & Preprocessing:

Identified missing values using df.isnull().sum(), and replaced string placeholders like "NA", "null", or blank strings with np.nan so that pandas treats them as missing.

Dropped columns with excessive missing values (e.g., house_rules) using df.dropna(axis=1, thresh=int(len(df)*0.6)), which keeps only columns with at least 60% non-null values.

Converted columns like price and availability 365 to numeric using pd.to_numeric after cleaning symbols like $ and ,.

Filled missing values in categorical columns with mode (df['neighbourhood group'].fillna(df['neighbourhood group'].mode()[0])) and numeric columns with median (df['price'].fillna(df['price'].median())).
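The cleaning steps above can be sketched end to end on a small sample. This is a minimal illustration, not the notebook's actual code: the sample rows, the helper name clean_listings, and the min_non_null parameter are made up for the example, and only the column names come from the description above. The threshold-based column drop uses DataFrame.dropna.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample; only the column names follow the project description.
raw = pd.DataFrame({
    "neighbourhood group": ["Brooklyn", "NA", "Manhattan", "Brooklyn", " "],
    "price": ["$1,200", "$85", "null", "$150", "$95"],
    "availability 365": ["365", "120", "30", "", "200"],
    "house_rules": ["No smoking", "NA", "null", "", " "],
})


def clean_listings(df: pd.DataFrame, min_non_null: float = 0.6) -> pd.DataFrame:
    """Hypothetical helper that applies the cleaning steps in order."""
    df = df.copy()
    # Treat placeholder strings and blank or whitespace-only cells as missing.
    df = df.replace(["NA", "null"], np.nan)
    df = df.replace(r"^\s*$", np.nan, regex=True)
    # Keep only columns where at least `min_non_null` of the rows are present.
    df = df.dropna(axis=1, thresh=int(len(df) * min_non_null))
    # Strip "$" and "," from prices, then convert both columns to numbers.
    df["price"] = pd.to_numeric(
        df["price"].str.replace(r"[$,]", "", regex=True), errors="coerce"
    )
    df["availability 365"] = pd.to_numeric(df["availability 365"], errors="coerce")
    # Impute: mode for the categorical column, median for the numeric ones.
    df["neighbourhood group"] = df["neighbourhood group"].fillna(
        df["neighbourhood group"].mode()[0]
    )
    for col in ["price", "availability 365"]:
        df[col] = df[col].fillna(df[col].median())
    return df


cleaned = clean_listings(raw)
print(cleaned)
```

In this sample, house_rules has only one usable value out of five, so it falls below the 60% threshold and is dropped, while the remaining gaps are filled by mode or median.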

Data Analysis Using Pandas & NumPy:

Aggregated data to find average price by room type (df.groupby('room type')['price'].mean()).

Identified top neighbourhoods and most active hosts using value_counts() and groupby().

Calculated potential revenue per listing as df['potential_revenue'] = df['price'] * df['availability 365'], a rough upper-bound estimate that assumes every available day is booked at the listed price.

Summed potential revenue by host to find top-earning hosts (df.groupby('host name')['potential_revenue'].sum().sort_values(ascending=False).head(10)).
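The aggregations above can be tried on a few invented rows. This is a sketch under assumptions: the data values and the variable names (listings, avg_price_by_room, top_neighbourhoods, top_hosts) are illustrative, and the neighbourhood column name is assumed rather than taken from the dataset.

```python
import pandas as pd

# Invented rows for illustration; the real dataset has 102,000+ entries.
listings = pd.DataFrame({
    "host name": ["Ana", "Ana", "Ben", "Cleo", "Ben"],
    "room type": ["Entire home/apt", "Private room", "Private room",
                  "Shared room", "Entire home/apt"],
    "neighbourhood": ["Harlem", "Harlem", "Bushwick", "Astoria", "Harlem"],
    "price": [200.0, 80.0, 100.0, 40.0, 300.0],
    "availability 365": [100, 365, 50, 10, 200],
})

# Average price per room type.
avg_price_by_room = listings.groupby("room type")["price"].mean()

# Neighbourhoods with the most listings.
top_neighbourhoods = listings["neighbourhood"].value_counts().head(3)

# Potential revenue per listing: price times available days.
listings["potential_revenue"] = listings["price"] * listings["availability 365"]

# Total potential revenue per host, largest first.
top_hosts = (
    listings.groupby("host name")["potential_revenue"]
    .sum()
    .sort_values(ascending=False)
    .head(10)
)

print(avg_price_by_room)
print(top_neighbourhoods)
print(top_hosts)
```

Note that grouping by host name merges distinct hosts who share a name; if the dataset provides a unique host identifier, grouping on that would avoid the problem.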

Data Visualization Using Matplotlib & Seaborn:

Bar plots to show listing counts by room type and top neighbourhoods.

Boxplots to visualize price distribution by room type.

Histograms to understand distribution of prices, availability, and review counts.

Scatter plots to explore relationships between price and number of reviews.

Heatmaps to identify missing values and correlations between numeric features.

Horizontal bar charts to highlight top hosts by potential revenue.
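Three of the chart types listed above can be sketched with Matplotlib alone. The data is invented for the example, and the layout and variable names are assumptions rather than the notebook's actual plotting code; Seaborn functions such as sns.boxplot or sns.heatmap would be natural alternatives for the boxplot and heatmap cases.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented rows for illustration only.
listings = pd.DataFrame({
    "host name": ["Ana", "Ana", "Ben", "Cleo", "Ben", "Cleo"],
    "room type": ["Entire home/apt", "Private room", "Private room",
                  "Shared room", "Entire home/apt", "Private room"],
    "price": [200.0, 80.0, 100.0, 40.0, 300.0, 90.0],
    "availability 365": [100, 365, 50, 10, 200, 30],
})
listings["potential_revenue"] = listings["price"] * listings["availability 365"]

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Bar plot: number of listings per room type.
counts = listings["room type"].value_counts()
axes[0].bar(counts.index, counts.values)
axes[0].set_title("Listings by room type")
axes[0].set_ylabel("Number of listings")

# Boxplot: price distribution per room type (one box per group).
grouped = list(listings.groupby("room type")["price"])
labels = [name for name, _ in grouped]
axes[1].boxplot([prices.to_numpy() for _, prices in grouped])
axes[1].set_xticks(range(1, len(labels) + 1))
axes[1].set_xticklabels(labels)
axes[1].set_title("Price by room type")
axes[1].set_ylabel("Price")

# Horizontal bar chart: top hosts by total potential revenue.
top_hosts = (
    listings.groupby("host name")["potential_revenue"]
    .sum()
    .sort_values(ascending=False)
    .head(10)
)
axes[2].barh(top_hosts.index, top_hosts.values)
axes[2].invert_yaxis()  # put the largest value at the top
axes[2].set_title("Top hosts by potential revenue")
axes[2].set_xlabel("Potential revenue")

fig.tight_layout()
```

In a Jupyter notebook the figure renders at the end of the cell; in a script, fig.savefig with a file name of your choice writes it to disk.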

Key Insights Derived:

Entire homes tend to have higher average prices than private or shared rooms.

Certain neighbourhoods dominate the listings, indicating supply concentration and potential competition.

Most hosts are verified, and verified hosts tend to have higher review counts.

The majority of listings have moderate availability, with a few full-time hosts dominating potential revenue.

Potential revenue analysis highlighted top hosts and neighbourhoods that could be targeted for investment or partnership.

Tools & Libraries Used:

Python: Pandas, NumPy

Visualization: Matplotlib, Seaborn

Jupyter Notebook for interactive analysis

Outcome: This project demonstrates practical data cleaning, EDA, aggregation, and visualization skills, providing actionable insights for Airbnb hosts, investors, or analysts to understand pricing trends, host performance, and neighbourhood potential.
