# Summary of Fixes

- Removed "neighbourhood_group" (all nulls)
- Converted "last_review" to datetime
- Filled missing values in "price", "reviews_per_month", and names
- Cleaned text in "neighbourhood"
- Filtered unrealistic "price" and "minimum_nights"

## 1. Load the Data

In [1]:
import pandas as pd

df = pd.read_csv('listings.csv')

## 2. Initial Inspection

In [7]:
df.info()
df.describe()
df.head()
df.columns

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 447 entries, 0 to 446
Data columns (total 17 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              447 non-null    int64  
 1   name                            447 non-null    object 
 2   host_id                         447 non-null    int64  
 3   host_name                       447 non-null    object 
 4   neighbourhood_group             0 non-null      float64
 5   neighbourhood                   447 non-null    object 
 6   latitude                        447 non-null    float64
 7   longitude                       447 non-null    float64
 8   room_type                       447 non-null    object 
 9   price                           415 non-null    float64
 10  minimum_nights                  447 non-null    int64  
 11  number_of_reviews               447 non-null    int64  
 12  last_review                     380 

Index(['id', 'name', 'host_id', 'host_name', 'neighbourhood_group',
       'neighbourhood', 'latitude', 'longitude', 'room_type', 'price',
       'minimum_nights', 'number_of_reviews', 'last_review',
       'reviews_per_month', 'calculated_host_listings_count',
       'availability_365', 'number_of_reviews_ltm'],
      dtype='object')
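
The `df.info()` output above already hints at which columns need attention. A per-column missing-value summary makes that explicit. This is a minimal sketch on a made-up four-row frame; `sample` and its values are assumptions for illustration, not rows from `listings.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for listings.csv
sample = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "neighbourhood_group": [np.nan, np.nan, np.nan, np.nan],
    "price": [80.0, np.nan, 120.0, 95.0],
    "last_review": ["2023-01-05", None, "2023-03-17", None],
})

# Count and percentage of missing values per column, worst first
missing_summary = pd.DataFrame({
    "missing": sample.isna().sum(),
    "missing_pct": (sample.isna().mean() * 100).round(1),
}).sort_values("missing", ascending=False)

print(missing_summary)
```

Running the same two expressions on `df` would give the equivalent table for the real data.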

## 3. Drop Unnecessary Columns

In [13]:
df.drop(columns=['neighbourhood_group'], inplace=True)
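
The column is dropped by name here because `df.info()` showed it has zero non-null values. If the column names were not known in advance, all-null columns could be detected instead. A small sketch, assuming a made-up `sample` frame:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one column is entirely null
sample = pd.DataFrame({
    "id": [1, 2, 3],
    "neighbourhood_group": [np.nan, np.nan, np.nan],
    "price": [80.0, np.nan, 120.0],
})

# Columns where every value is missing
all_null_cols = sample.columns[sample.isna().all()].tolist()

# Drop them; partially filled columns such as "price" are kept
trimmed = sample.drop(columns=all_null_cols)

print(all_null_cols)
print(trimmed.columns.tolist())
```

`sample.dropna(axis=1, how="all")` would have the same effect in one call.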

## 4. Handle Missing Values

In [14]:
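# Assign the result back instead of calling fillna(inplace=True) on a column
# selection, which warns in recent pandas and does not modify df under Copy-on-Write.
df['name'] = df['name'].fillna('No Name')
df['host_name'] = df['host_name'].fillna('Unknown Host')
df['price'] = df['price'].fillna(df['price'].median())
df['reviews_per_month'] = df['reviews_per_month'].fillna(0)
# Unparseable or missing dates become NaT
df['last_review'] = pd.to_datetime(df['last_review'], errors='coerce')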
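
To see what these fills do, here is the same logic on a tiny made-up frame. The values in `sample` are assumptions chosen so the results are easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with gaps in each column
sample = pd.DataFrame({
    "price": [80.0, np.nan, 120.0, 100.0],
    "reviews_per_month": [1.5, np.nan, 0.3, np.nan],
    "last_review": ["2023-01-05", None, "not a date", "2023-03-17"],
})

# Median of the observed prices (80, 100, 120) is 100
sample["price"] = sample["price"].fillna(sample["price"].median())

# No reviews per month is treated as zero
sample["reviews_per_month"] = sample["reviews_per_month"].fillna(0)

# errors="coerce" turns missing and unparseable entries into NaT
sample["last_review"] = pd.to_datetime(sample["last_review"], errors="coerce")

print(sample)
```

Note that `last_review` still contains missing values (as `NaT`) after this step; only its type changes.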

## 5. Fix Data Types

In [16]:
df['room_type'] = df['room_type'].astype('category')
df['neighbourhood'] = df['neighbourhood'].str.strip().str.lower()
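
A quick illustration of why the text cleanup matters: stray whitespace and mixed case make the same neighbourhood look like several distinct values. The names below are hypothetical examples, not taken from the dataset.

```python
import pandas as pd

# Hypothetical values: the first two differ only by whitespace and case
sample = pd.DataFrame({
    "neighbourhood": ["  Centrum ", "centrum", "De Pijp  "],
    "room_type": ["Private room", "Entire home/apt", "Private room"],
})

unique_before = sample["neighbourhood"].nunique()

# Same normalization as in the cell above
sample["neighbourhood"] = sample["neighbourhood"].str.strip().str.lower()
sample["room_type"] = sample["room_type"].astype("category")

unique_after = sample["neighbourhood"].nunique()

print(unique_before, unique_after)
print(sample["room_type"].cat.categories.tolist())
```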

## 6. Remove Duplicates

In [18]:
df.drop_duplicates(subset='id', inplace=True)
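
Before dropping, it can help to count how many rows share an `id`. A minimal sketch on made-up data; by default `drop_duplicates` keeps the first occurrence of each `id`.

```python
import pandas as pd

# Hypothetical rows: id 2 appears twice with different prices
sample = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "price": [80.0, 95.0, 99.0, 120.0],
})

# Rows flagged as repeats of an earlier id
n_dupes = int(sample.duplicated(subset="id").sum())

# keep="first" is the default, written out here for clarity
deduped = sample.drop_duplicates(subset="id", keep="first")

print(n_dupes)
print(deduped)
```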

## 7. Handle Outliers

In [19]:
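# Keep listings with a plausible nightly price and minimum stay
df = df[(df['price'] > 0) & (df['price'] < 1000)]
# .copy() so the column added in the next step is written to an independent frame
df = df[df['minimum_nights'] < 365].copy()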
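
The thresholds (price strictly between 0 and 1000, minimum stay under 365 nights) are fixed cut-offs rather than statistical ones. A sketch of the same filter on made-up rows, also reporting how many rows it removes:

```python
import pandas as pd

# Hypothetical listings, some with implausible values
sample = pd.DataFrame({
    "price": [0.0, 85.0, 1500.0, 120.0, 999.0],
    "minimum_nights": [2, 400, 1, 3, 30],
})

# Same conditions as above, combined into one boolean mask
mask = (
    sample["price"].between(0, 1000, inclusive="neither")
    & (sample["minimum_nights"] < 365)
)
filtered = sample[mask].copy()
removed = len(sample) - len(filtered)

print(f"Removed {removed} of {len(sample)} rows")
```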

## 8. Feature Engineering (Example: Price per Review)

In [20]:
df['price_per_review'] = df['price'] / (df['number_of_reviews'] + 1)
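
The `+ 1` in the denominator keeps the division defined for listings with zero reviews. A small worked example with made-up numbers:

```python
import pandas as pd

# Hypothetical listings, including one with no reviews
sample = pd.DataFrame({
    "price": [100.0, 100.0, 60.0],
    "number_of_reviews": [0, 4, 29],
})

# 100 / (0 + 1), 100 / (4 + 1), 60 / (29 + 1)
sample["price_per_review"] = sample["price"] / (sample["number_of_reviews"] + 1)

print(sample)
```

A listing with no reviews ends up with `price_per_review` equal to its price.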

## 9. Save Cleaned Data

In [21]:
df.to_csv('cleaned_listings.csv', index=False)
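
CSV does not store column types, so the datetime and category conversions made earlier are lost when the file is read back unless they are requested again. A sketch of the round trip, using an in-memory buffer and made-up rows in place of `cleaned_listings.csv`:

```python
import io

import pandas as pd

# Hypothetical cleaned frame with a category and a datetime column
cleaned = pd.DataFrame({
    "id": [1, 2],
    "room_type": pd.Categorical(["Private room", "Entire home/apt"]),
    "last_review": pd.to_datetime(["2023-01-05", None]),
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)
buffer.seek(0)

# Restore the types explicitly when reading back
reloaded = pd.read_csv(
    buffer,
    parse_dates=["last_review"],
    dtype={"room_type": "category"},
)

print(reloaded.dtypes)
```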