# Subsetting in Pandas

## Prof. Dnyanesh Khedekar
This notebook helps you master subsetting (selecting and filtering data) using the `customer_data.csv` dataset created earlier.

## Step 1. Load the Dataset
We'll use the CSV created previously to explore subsetting techniques.

In [None]:
import pandas as pd
df = pd.read_csv('customer_data.csv')
df.head()

## 1️⃣ Select specific columns
Show only the `Customer_ID`, `Age`, and `Region` columns.

In [None]:
df[['Customer_ID', 'Age', 'Region']].head()
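The double brackets above matter: a list of column names returns a DataFrame, while a single name returns a Series. The sketch below illustrates the difference on a small invented frame (`sample` and its values are hypothetical stand-ins for `customer_data.csv`).

```python
import pandas as pd

# Hypothetical stand-in for customer_data.csv; the values are invented.
sample = pd.DataFrame({
    "Customer_ID": [1, 2, 3],
    "Age": [25, 47, 33],
    "Region": ["North", "South", "East"],
})

age_series = sample["Age"]      # single brackets: a Series
age_frame = sample[["Age"]]     # double brackets: a one-column DataFrame

# The same column selection written with .loc (all rows, listed columns).
by_label = sample.loc[:, ["Customer_ID", "Region"]]

print(type(age_series).__name__, type(age_frame).__name__)
```

Using `.loc` is one reasonable choice when the same statement also needs to select rows.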

## 2️⃣ Filter rows based on one condition
Display all customers whose **Age is greater than 40**.

In [None]:
df[df['Age'] > 40].head()

## 3️⃣ Filter rows based on multiple conditions
Find customers from the **South region** whose **Income exceeds ₹50,000**.

In [None]:
df[(df['Region'] == 'South') & (df['Income'] > 50000)].head()
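Each condition is wrapped in parentheses because `&`, `|`, and `~` bind more tightly than comparisons such as `==` and `>`. The following sketch shows the three operators side by side on an invented frame (`sample` and its values are hypothetical, not taken from the dataset).

```python
import pandas as pd

# Hypothetical sample data for illustration only.
sample = pd.DataFrame({
    "Region": ["South", "South", "North", "East"],
    "Income": [62000, 41000, 75000, 30000],
})

is_south = sample["Region"] == "South"
high_income = sample["Income"] > 50000

south_and_high = sample[is_south & high_income]   # both conditions hold
south_or_high = sample[is_south | high_income]    # at least one holds
not_south = sample[~is_south]                     # negation of a condition

print(len(south_and_high), len(south_or_high), len(not_south))
```

Naming the masks first, as done here, is optional but tends to keep longer filters readable.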

## 4️⃣ Subset with `isin()`
Retrieve records of customers from either the **North** or **West** regions.

In [None]:
df[df['Region'].isin(['North', 'West'])].head()

## 5️⃣ Subset using string matching
Find all customers whose **Region name starts with “E”** (like “East”).

In [None]:
df[df['Region'].str.startswith('E', na=False)].head()
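The `na=False` argument makes rows with a missing Region count explicitly as non-matches. The sketch below, on invented data (`sample` is hypothetical), also shows one possible way to make the match case-insensitive by lowercasing first.

```python
import pandas as pd

# Hypothetical sample with a missing value and mixed capitalisation.
sample = pd.DataFrame({"Region": ["East", "east", None, "West"]})

# Case-sensitive: only "East" matches; the missing row is treated as False.
starts_e = sample["Region"].str.startswith("E", na=False)

# Case-insensitive variant: normalise to lower case before matching.
starts_e_any_case = sample["Region"].str.lower().str.startswith("e", na=False)

print(sample[starts_e])
print(sample[starts_e_any_case])
```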

## 6️⃣ Filter using `query()` method
Get customers where **Gender is 'Female' and Age < 30** using the query interface.

In [None]:
df.query("Gender == 'Female' and Age < 30").head()
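When the threshold lives in a Python variable, `query()` can reference it with an `@` prefix instead of building the string by hand. A minimal sketch, assuming an invented frame (`sample`) and a hypothetical variable name `max_age`:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
sample = pd.DataFrame({
    "Gender": ["Female", "Male", "Female"],
    "Age": [24, 28, 41],
})

max_age = 30  # hypothetical threshold held in a local variable

# @max_age refers to the Python variable defined above.
young_women = sample.query("Gender == 'Female' and Age < @max_age")

print(young_women)
```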

## 7️⃣ Subset using index ranges
Display the rows at positions **50 to 60** (inclusive). `iloc` slices by position and excludes the end point, so the slice stops at 61.

In [None]:
df.iloc[50:61]
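Position-based and label-based slicing treat the end point differently: `iloc` excludes it, while `loc` includes it. The sketch below uses an invented frame (`sample`) whose index labels deliberately differ from its positions so the two are easy to tell apart.

```python
import pandas as pd

# Hypothetical frame: 10 rows with index labels 100..109.
sample = pd.DataFrame({"Age": range(20, 30)}, index=range(100, 110))

by_position = sample.iloc[2:5]    # positions 2, 3, 4 (end excluded)
by_label = sample.loc[102:105]    # labels 102 through 105 (end included)

print(len(by_position), len(by_label))
```

With the default `RangeIndex` that `read_csv` produces, labels and positions coincide, so `df.loc[50:60]` would select the same rows as `df.iloc[50:61]`.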

## 8️⃣ Select rows with missing values
Show the first few rows where **Income is missing (NaN)**.

In [None]:
df[df['Income'].isna()].head()
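The same mask can be counted or inverted. As a sketch on invented data (`sample` is hypothetical), the block below counts missing incomes and keeps only complete rows in two equivalent ways.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing Income value.
sample = pd.DataFrame({
    "Customer_ID": [1, 2, 3],
    "Income": [52000.0, np.nan, 61000.0],
})

missing_count = sample["Income"].isna().sum()           # number of NaN incomes
has_income = sample[sample["Income"].notna()]           # rows with an income
has_income_alt = sample.dropna(subset=["Income"])       # same rows via dropna

print(missing_count)
```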

## 9️⃣ Conditional column creation + subsetting
Create a new column `High_Spending` = True if `Purchase_Amount > 3000`, then show only those rows.

In [None]:
df['High_Spending'] = df['Purchase_Amount'] > 3000
df[df['High_Spending']].head()

## 🔟 Combine filtering and sorting
Get top 5 customers (by **Income**) from the **East region** who are older than **35**.

In [None]:
df[(df['Region'] == 'East') & (df['Age'] > 35)].sort_values('Income', ascending=False).head(5)
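An alternative to sorting and then taking the head is `nlargest`, which asks for the top rows directly. A minimal sketch on invented data (`sample` is hypothetical), comparing both approaches for the top 2 rows:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
sample = pd.DataFrame({
    "Region": ["East", "East", "East", "West"],
    "Age": [36, 50, 30, 60],
    "Income": [40000, 90000, 99000, 120000],
})

mask = (sample["Region"] == "East") & (sample["Age"] > 35)

# Sort descending by Income, then keep the first 2 rows.
top_sorted = sample[mask].sort_values("Income", ascending=False).head(2)

# Ask for the 2 largest Income values directly.
top_nlargest = sample[mask].nlargest(2, "Income")

print(top_nlargest)
```

Both give the same rows here; the two may order tied values differently, so either is a reasonable choice when ties are not a concern.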

Great Job!