# Airbnb Listings Data Analysis

## Overview
This notebook explores and analyzes Airbnb listing data to uncover insights about pricing, availability, location patterns, and host behavior.

## Dataset Description
The dataset contains information about Airbnb listings including:

| Column | Description |
|--------|-------------|
| id | Unique identifier for the listing |
| name | Title/name of the listing |
| host_id | Unique identifier for the host |
| host_name | Name of the host |
| neighbourhood_group | Broader area/borough |
| neighbourhood | Specific neighbourhood |
| latitude | Latitude coordinate |
| longitude | Longitude coordinate |
| room_type | Type of listing (Entire home/apt, Private room, Shared room) |
| price | Nightly price in local currency |
| minimum_nights | Minimum nights required to book |
| number_of_reviews | Total number of reviews |
| last_review | Date of last review |
| reviews_per_month | Average reviews per month |
| calculated_host_listings_count | Number of listings the host has |
| availability_365 | Days available in next 365 days |

## Objectives
- Understand the distribution of listings across neighbourhoods
- Analyze pricing trends by location and room type
- Identify patterns in host behavior and availability
- Explore relationships between reviews, price, and availability
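
As a rough illustration of how the first two objectives might translate into pandas, here is a minimal sketch on a few hand-typed sample rows. The `sample` frame and the variable names are assumptions for illustration; the actual analysis would run on the full dataset.

```python
import pandas as pd

# Hand-typed sample rows (illustrative only; the notebook would use the full data).
sample = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Manhattan", "Manhattan", "Brooklyn", "Manhattan"],
    "room_type": ["Private room", "Entire home/apt", "Private room",
                  "Entire home/apt", "Entire home/apt"],
    "price": [149, 225, 150, 89, 80],
})

# Objective 1: how listings are distributed across areas.
listings_per_group = sample["neighbourhood_group"].value_counts()

# Objective 2: typical price by location and room type.
# The median is used here because it is less sensitive to extreme prices.
median_price = (
    sample.groupby(["neighbourhood_group", "room_type"])["price"]
    .median()
    .unstack()
)

print(listings_per_group)
print(median_price)
```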

## Tools Used
- Python 3.x
- Pandas, NumPy
- Matplotlib, Seaborn

## AI Usage Disclosure
- AI tools were used for documentation and code assistance. Only column names and the dataset structure were shared; as a best practice, no data values were shared with AI.



In [2]:
pwd

'C:\\Users\\mohit'

In [9]:
import pandas as pd

In [10]:
df = pd.read_csv(r"C:\Users\mohit\Downloads\airbnb-nyc-pandas\data\AB_NYC_2019.csv") 
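
The absolute Windows path above only resolves on one machine. A minimal sketch of a more portable load, assuming the CSV sits in a `data/` folder next to the notebook (the `DATA_PATH` constant and the `load_listings` helper are hypothetical names, not part of the original notebook). It also asks pandas to parse `last_review` as a date:

```python
import io
from pathlib import Path

import pandas as pd

# Hypothetical relative location; adjust to wherever the CSV actually lives.
DATA_PATH = Path("data") / "AB_NYC_2019.csv"


def load_listings(source):
    """Read the listings CSV and parse last_review as a datetime column."""
    return pd.read_csv(source, parse_dates=["last_review"])


# Demo on a two-row in-memory sample so the sketch runs without the real file.
sample_csv = io.StringIO(
    "id,name,price,last_review\n"
    "2539,Clean & quiet apt home by the park,149,2018-10-19\n"
    "3647,THE VILLAGE OF HARLEM....NEW YORK !,150,\n"
)
sample = load_listings(sample_csv)
print(sample.dtypes)
```

With the real file in place, the call would be `df = load_listings(DATA_PATH)`.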

In [12]:
df.columns.tolist()

['id',
 'name',
 'host_id',
 'host_name',
 'neighbourhood_group',
 'neighbourhood',
 'latitude',
 'longitude',
 'room_type',
 'price',
 'minimum_nights',
 'number_of_reviews',
 'last_review',
 'reviews_per_month',
 'calculated_host_listings_count',
 'availability_365']

![image.png](attachment:9e1da60b-09e1-49ab-b842-9f365c411081.png)

![image.png](attachment:98444a73-36f6-4ab6-8e85-c1d52a8f5ea9.png)

In [13]:
# Each listing is identified by an id and a name.
# The host is the person offering the space; host_id and host_name identify them.
# neighbourhood_group (the broader district) and neighbourhood (the immediate area) give the location.
# latitude and longitude place the listing on a map.
# room_type says whether it is an entire home, a private room, or a shared room.
# price is the price per night.
# minimum_nights is the minimum stay required to book (some listings require at least 2 nights, for example).
# number_of_reviews is the total number of reviews received.
# last_review is the date of the most recent review.
# reviews_per_month is the average number of reviews per month (so three columns relate to reviews).
# calculated_host_listings_count is the number of listings the host has (one host may offer several houses, flats, etc.).
# availability_365 is the number of days the listing is available in the next 365 days; it can be lower when, for example, someone has already booked it for a period.

In [14]:
# Quick overview of everything
df.info()

<class 'pandas.DataFrame'>
RangeIndex: 48895 entries, 0 to 48894
Data columns (total 16 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              48895 non-null  int64  
 1   name                            48879 non-null  str    
 2   host_id                         48895 non-null  int64  
 3   host_name                       48874 non-null  str    
 4   neighbourhood_group             48895 non-null  str    
 5   neighbourhood                   48895 non-null  str    
 6   latitude                        48895 non-null  float64
 7   longitude                       48895 non-null  float64
 8   room_type                       48895 non-null  str    
 9   price                           48895 non-null  int64  
 10  minimum_nights                  48895 non-null  int64  
 11  number_of_reviews               48895 non-null  int64  
 12  last_review                     38843 non-n

In [15]:
# The columns use int, str, and float data types.
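
The `df.info()` output above reports fewer non-null entries for `name`, `host_name`, and `last_review` than the 48895 rows, so some values are missing. One way to count missing values per column, sketched on a tiny stand-in frame (`sample` is an assumption; in the notebook the same calls would run on `df`):

```python
import pandas as pd

# Tiny stand-in frame; in the notebook, `df` would take its place.
sample = pd.DataFrame({
    "name": ["Skylit Midtown Castle", None, "Cozy Entire Floor of Brownstone"],
    "number_of_reviews": [45, 0, 270],
    "last_review": ["2019-05-21", None, "2019-07-05"],
})

missing = sample.isna().sum()                          # missing values per column
missing_pct = (missing / len(sample) * 100).round(1)   # share of rows, in percent
summary = pd.DataFrame({"missing": missing, "pct": missing_pct})

# Show only the columns that actually have gaps.
print(summary[summary["missing"] > 0])
```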

In [16]:
# First few rows to see actual values
df.head()

|   | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2539 | Clean & quiet apt home by the park | 2787 | John | Brooklyn | Kensington | 40.64749 | -73.97237 | Private room | 149 | 1 | 9 | 2018-10-19 | 0.21 | 6 | 365 |
| 1 | 2595 | Skylit Midtown Castle | 2845 | Jennifer | Manhattan | Midtown | 40.75362 | -73.98377 | Entire home/apt | 225 | 1 | 45 | 2019-05-21 | 0.38 | 2 | 355 |
| 2 | 3647 | THE VILLAGE OF HARLEM....NEW YORK ! | 4632 | Elisabeth | Manhattan | Harlem | 40.80902 | -73.9419 | Private room | 150 | 3 | 0 |  |  | 1 | 365 |
| 3 | 3831 | Cozy Entire Floor of Brownstone | 4869 | LisaRoxanne | Brooklyn | Clinton Hill | 40.68514 | -73.95976 | Entire home/apt | 89 | 1 | 270 | 2019-07-05 | 4.64 | 1 | 194 |
| 4 | 5022 | Entire Apt: Spacious Studio/Loft by central park | 7192 | Laura | Manhattan | East Harlem | 40.79851 | -73.94399 | Entire home/apt | 80 | 10 | 9 | 2018-11-19 | 0.1 | 1 | 0 |
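
Row 2 above has `number_of_reviews` equal to 0 and no value for `last_review` or `reviews_per_month`, which suggests the review fields are empty exactly when a listing has never been reviewed. A quick way to check that, sketched on the first three rows typed in by hand (whether it holds for the full dataset is not verified here):

```python
import numpy as np
import pandas as pd

# First three rows from the preview above, review-related columns only.
sample = pd.DataFrame({
    "id": [2539, 2595, 3647],
    "number_of_reviews": [9, 45, 0],
    "reviews_per_month": [0.21, 0.38, np.nan],
})

no_reviews = sample["number_of_reviews"] == 0
missing_rpm = sample["reviews_per_month"].isna()

# True only if "zero reviews" and "missing reviews_per_month" mark the same rows.
consistent = bool((no_reviews == missing_rpm).all())
print(consistent)
```

If the check holds on the full data, filling `reviews_per_month` with 0 for those rows would be a defensible choice.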


In [17]:
# The columns fall into three groups: listing identification, host info, and location info; the remaining columns are kept as they are because they are important for the analysis.
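
The grouping described above could be written down as a plain dictionary. This is only a sketch: the group names and the exact assignment of columns to groups are assumptions based on the comment, and the column list is copied from the `df.columns.tolist()` output.

```python
# Column names as listed by df.columns.tolist() earlier in the notebook.
all_columns = [
    "id", "name", "host_id", "host_name", "neighbourhood_group",
    "neighbourhood", "latitude", "longitude", "room_type", "price",
    "minimum_nights", "number_of_reviews", "last_review",
    "reviews_per_month", "calculated_host_listings_count", "availability_365",
]

# Hypothetical group names; the assignment follows the comment above.
column_groups = {
    "listing_identification": ["id", "name"],
    "host_info": ["host_id", "host_name"],
    "location_info": ["neighbourhood_group", "neighbourhood", "latitude", "longitude"],
}

# Everything not assigned to a group is kept as an individual column.
grouped = {col for cols in column_groups.values() for col in cols}
remaining = [col for col in all_columns if col not in grouped]
print(remaining)
```

In the notebook, `df[column_groups["location_info"]]` would then select just the location columns.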

![image.png](attachment:082eceba-68c2-489a-9c12-5c7ed1c62249.png)

In [18]:
# The last few columns are the main focus of the analysis.

![image.png](attachment:e5ce670f-52f2-41bb-a040-3cc8ccd3d025.png)

In [19]:
# Stats for numeric columns (price, latitude, longitude, etc.)
df.describe()

|   | id | host_id | latitude | longitude | price | minimum_nights | number_of_reviews | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 48895.0 | 48895.0 | 48895.0 | 48895.0 | 48895.0 | 48895.0 | 48895.0 | 38843.0 | 48895.0 | 48895.0 |
| mean | 19017140.0 | 67620010.0 | 40.728949 | -73.95217 | 152.720687 | 7.029962 | 23.274466 | 1.373221 | 7.143982 | 112.781327 |
| std | 10983110.0 | 78610970.0 | 0.05453 | 0.046157 | 240.15417 | 20.51055 | 44.550582 | 1.680442 | 32.952519 | 131.622289 |
| min | 2539.0 | 2438.0 | 40.49979 | -74.24442 | 0.0 | 1.0 | 0.0 | 0.01 | 1.0 | 0.0 |
| 25% | 9471945.0 | 7822033.0 | 40.6901 | -73.98307 | 69.0 | 1.0 | 1.0 | 0.19 | 1.0 | 0.0 |
| 50% | 19677280.0 | 30793820.0 | 40.72307 | -73.95568 | 106.0 | 3.0 | 5.0 | 0.72 | 1.0 | 45.0 |
| 75% | 29152180.0 | 107434400.0 | 40.763115 | -73.936275 | 175.0 | 5.0 | 24.0 | 2.02 | 2.0 | 227.0 |
| max | 36487240.0 | 274321300.0 | 40.91306 | -73.71299 | 10000.0 | 1250.0 | 629.0 | 58.5 | 327.0 | 365.0 |
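
The summary above shows a minimum `price` of 0 and a maximum of 10000 against a median of 106, so the column likely contains both invalid and extreme values. One common way to flag them is the 1.5 × IQR rule, sketched here on a small made-up price series. The `iqr_upper_fence` helper and the sample values are assumptions for illustration, and the IQR rule is a swapped-in convention, not something the notebook prescribes. With the quartiles reported above (69 and 175), the same rule would put the upper fence at 175 + 1.5 × 106 = 334.

```python
import pandas as pd


def iqr_upper_fence(series: pd.Series, k: float = 1.5) -> float:
    """Return Q3 + k * IQR, a conventional cutoff for high outliers."""
    q1, q3 = series.quantile([0.25, 0.75])
    return q3 + k * (q3 - q1)


# Made-up prices spanning the same range as the summary (0 to 10000).
prices = pd.Series([0, 69, 80, 89, 106, 149, 150, 175, 225, 10000], name="price")

fence = iqr_upper_fence(prices)
zero_priced = int((prices == 0).sum())   # listings with a price of 0
outliers = prices[prices > fence]        # prices above the upper fence

print(zero_priced)
print(outliers.tolist())
```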

