### Feature Engineering

This step derives new columns from existing ones to support exploratory analysis. The goal is not to prepare inputs for a model but to:

- Find useful patterns  
- Make visuals easier to understand  
- Help interpret the data better  

Examples of new features:

- Review activity flags (has reviews or active months)  
- Price categories  
- Availability indicators (for example, available for more than half the year)  
- Groupings by region or room type  

These help create clearer visuals and better comparisons in the data.  
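
Price categories appear in the list above but are not built in the cells of this section. As a minimal sketch, assuming a `price` column, `pd.cut` can turn prices into labelled bands. The bin edges (100 and 200), the labels, and the `price_category` column name are hypothetical and chosen only for illustration; the toy values are made up.

```python
import pandas as pd

# Toy data standing in for the real listings (values are made up)
toy = pd.DataFrame({"price": [45, 100, 150, 200, 520]})

# Hypothetical bin edges and labels, not taken from the dataset
bins = [0, 100, 200, float("inf")]
labels = ["Budget", "Mid-range", "Premium"]

# Bins are right-closed by default: (0, 100], (100, 200], (200, inf].
# A price of exactly 0 would fall outside the first bin and become NaN.
toy["price_category"] = pd.cut(toy["price"], bins=bins, labels=labels)
print(toy)
```

Quantile-based bands (`pd.qcut`) are an alternative when fixed edges are hard to justify.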


In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("AB_NYC_Cleaned.csv")

<div style="background-color: #fffbea; border-left: 4px solid #f9a825; padding: 12px; margin: 10px 0; color: #212121;">
  <h4 style="color: #f9a825; margin: 0;">🔹 Feature Name: <code>has_review</code></h4>
  <p style="margin: 8px 0;"><strong>Logic:</strong><br>
    If a listing has more than <code>0</code> reviews, it is marked as <code>1</code>; otherwise, it is marked as <code>0</code>.
  </p>
  <p style="margin: 8px 0;"><strong>Purpose:</strong><br>
    This feature helps distinguish between listings that have received feedback and those that haven’t. It is useful in understanding listing activity, popularity, and reliability.
  </p>
  <p style="margin: 8px 0;"><strong>Why Useful for Visualization:</strong></p>
  <ul style="margin: 0 0 0 20px; padding: 0; list-style-type: disc;">
    <li>Allows analysis of differences in price, room type, or availability between reviewed and non-reviewed listings.</li>
    <li>Helpful for filtering and grouping during exploratory data analysis.</li>
  </ul>
</div>


In [3]:
# Create a new binary feature: has_review
df['has_review'] = (df['number_of_reviews'] > 0).astype(int)
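
One way the flag might be used for the comparison described above is a grouped summary. This is a sketch on made-up toy data, assuming only the `number_of_reviews` and `price` columns; the `summary` name is illustrative.

```python
import pandas as pd

# Toy data (values are made up)
toy = pd.DataFrame({
    "number_of_reviews": [0, 3, 12, 0],
    "price": [80, 120, 100, 200],
})

# Same rule as the notebook: 1 if the listing has at least one review
toy["has_review"] = (toy["number_of_reviews"] > 0).astype(int)

# Listing count and median price for reviewed vs non-reviewed listings
summary = toy.groupby("has_review")["price"].agg(["count", "median"])
print(summary)
```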

### Feature: `price_per_night`

**Logic:**  
Divide `price` by `minimum_nights`, treating `price` as the total cost of the minimum stay. Listings with `minimum_nights` of 0 or less get `NaN`.

**Purpose:**  
Makes prices comparable across listings with different minimum stays.

**Use in Visualizations:**  
- Compare costs by room type and neighborhood group  
- Identify overpriced or underpriced listings  


In [4]:
import numpy as np

In [6]:
# Vectorized division; rows with minimum_nights <= 0 become NaN,
# which also masks any inf produced by dividing by zero
df['price_per_night'] = (df['price'] / df['minimum_nights']).where(df['minimum_nights'] > 0)
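
The room-type comparison mentioned above could look like the following sketch. It uses made-up toy data and assumes only the `room_type`, `price`, and `minimum_nights` columns; `medians` is an illustrative name. The median is used here because it skips `NaN` values and is less sensitive to extreme prices than the mean.

```python
import numpy as np
import pandas as pd

# Toy data (values are made up); one row has minimum_nights == 0
toy = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Private room"],
    "price": [200, 300, 90, 60],
    "minimum_nights": [2, 0, 3, 1],
})

# Divide, then keep the result only where minimum_nights > 0 (NaN elsewhere)
toy["price_per_night"] = (toy["price"] / toy["minimum_nights"]).where(toy["minimum_nights"] > 0)

# Median per room type; NaN rows are ignored by median()
medians = toy.groupby("room_type")["price_per_night"].median()
print(medians)
```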


### Feature: `active_listing`

**Logic:**  
If `availability_365 > 0`, then `active_listing = 1`, else `0`.

**Purpose:**  
Marks listings that have at least one available day out of 365 (`availability_365 > 0`).

**Use in Visualizations:**  
- Filter listings that are active  
- Study availability by room type or neighborhood  
- Analyze host market presence  


In [7]:
df['active_listing'] = (df['availability_365'] > 0).astype(int)


### Feature: `seasonal_availability`

**Logic:**  
Group `availability_365` into four categories: Not Available (0 days), Low (1 to 120), Medium (121 to 240), and High (more than 240).

**Purpose:**  
Turns numeric availability into simple labels for easier analysis.

**Use in Visualizations:**  
- Show listing counts by availability category  
- Compare patterns by room type and neighborhood  


In [8]:
def categorize_availability(x):
    if x == 0:
        return 'Not Available'
    elif x <= 120:
        return 'Low'
    elif x <= 240:
        return 'Medium'
    else:
        return 'High'

df['seasonal_availability'] = df['availability_365'].apply(categorize_availability)
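
As an alternative sketch, the same thresholds can be expressed with `pd.cut`, which returns an ordered categorical so that plots keep the Not Available, Low, Medium, High order. This assumes values stay within 0 to 365: unlike `categorize_availability`, `pd.cut` would return `NaN` for anything outside the bin edges. The toy values are made up.

```python
import pandas as pd

# Toy values placed on both sides of each threshold
toy = pd.DataFrame({"availability_365": [0, 1, 120, 121, 240, 241, 365]})

# Right-closed bins matching the if/elif thresholds:
# (-1, 0] -> Not Available, (0, 120] -> Low, (120, 240] -> Medium, (240, 365] -> High
toy["seasonal_availability"] = pd.cut(
    toy["availability_365"],
    bins=[-1, 0, 120, 240, 365],
    labels=["Not Available", "Low", "Medium", "High"],
)
print(toy)
```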


### Feature: `is_entire_home`

**Logic:**  
A binary flag: 1 means Entire home/apt, 0 means other types.

**Purpose:**  
Separates entire homes or apartments from all other room types, such as private and shared rooms.

**Use in Visualizations:**  
- Compare price or availability for entire homes vs other room types  
- Filter or group listings by whether the guest gets the whole unit  


In [9]:
df['is_entire_home'] = (df['room_type'] == 'Entire home/apt').astype(int)
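
Because the flag is 0 or 1, its mean within a group is the share of entire homes in that group. The sketch below illustrates this on made-up toy data, grouping by `active_listing`; the `share` name is illustrative.

```python
import pandas as pd

# Toy data (values are made up)
toy = pd.DataFrame({
    "room_type": ["Entire home/apt", "Private room", "Shared room",
                  "Entire home/apt", "Entire home/apt"],
    "availability_365": [0, 10, 200, 365, 0],
})

# Same rules as the notebook cells
toy["is_entire_home"] = (toy["room_type"] == "Entire home/apt").astype(int)
toy["active_listing"] = (toy["availability_365"] > 0).astype(int)

# Mean of a 0/1 flag = proportion of entire homes in each group
share = toy.groupby("active_listing")["is_entire_home"].mean()
print(share)
```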


In [10]:
df.to_csv("AB_NYC_Featured.csv", index=False)

### Conclusion – Feature Engineering

We created new features to better understand the Airbnb data and improve visualizations:

- `has_review` flags listings with at least one review, and `active_listing` flags listings with at least one available day.  
- `price_per_night` adjusts price based on minimum stay for fair comparison.  
- `seasonal_availability` groups listings by how often they are available during the year.  
- Binary features like `is_entire_home` help separate listings by type.

These features deepen our analysis and support clearer insights in later steps.
