# Day 3: Pandas for Data Manipulation

Welcome! This notebook walks through the core Pandas operations for data manipulation: building DataFrames, reading and writing CSV files, and filtering and grouping rows.

**Topics:**
- DataFrames and Series
- Reading and Writing Data
- Filtering and Grouping
- Data Cleaner Tool (mini project)

## 1. DataFrames and Series

In [1]:
import pandas as pd

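# Series: a one-dimensional labeled array (here with the name 'Numbers')
s = pd.Series([1, 2, 3], name='Numbers')
display(s)

# DataFrame: a two-dimensional table; each dict key becomes a column
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
display(df)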

0    1
1    2
2    3
Name: Numbers, dtype: int64

|   | A | B |
|---|---|---|
| 0 | 1 | 3 |
| 1 | 2 | 4 |
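
To make the relationship between the two types concrete, here is a minimal sketch that reuses the small frame from the cell above. It uses `print` rather than `display` so it also runs outside a notebook; the variable names `col_a` and `sub` are illustrative only.

```python
import pandas as pd

# Same small frame as in the cell above.
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Selecting one column with single brackets returns a Series.
col_a = df['A']
print(type(col_a).__name__)   # Series

# Selecting with a list of column names returns a DataFrame.
sub = df[['A']]
print(type(sub).__name__)     # DataFrame

# The column keeps its name and shares the frame's row index.
print(col_a.name)                      # A
print(col_a.index.equals(df.index))    # True

# shape is (rows, columns) for a DataFrame and (rows,) for a Series.
print(df.shape, col_a.shape)           # (2, 2) (2,)
```

In short, a DataFrame can be thought of as a set of Series that share one index.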


## 2. Reading and Writing Data

In [2]:
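# Read a CSV file into a DataFrame (the first row is used as column names)
df = pd.read_csv('sample_data.csv')
display(df)

# Write the DataFrame to CSV; index=False leaves out the row index column
df.to_csv('output.csv', index=False)
print('Data written to output.csv')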

|   | Name | Age | City | Salary |
|---|------|-----|------|--------|
| 0 | Alice | 25 | New York | 70000 |
| 1 | Bob | 30 | Los Angeles | 80000 |
| 2 | Charlie | 35 | New York | 90000 |
| 3 | David | 28 | Chicago | 75000 |
| 4 | Eva | 22 | Los Angeles | 68000 |


Data written to output.csv
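
The effect of `index=False` is easiest to see in a round trip. The sketch below is an illustration only: it uses made-up two-row data and an in-memory `io.StringIO` buffer instead of `sample_data.csv`, so it runs without any files on disk.

```python
import io
import pandas as pd

# Illustrative data; the notebook itself reads sample_data.csv instead.
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# With no path argument, to_csv returns the CSV text as a string.
# The default index=True writes the row index as an unnamed first column.
with_index = df.to_csv()
# index=False writes only the data columns.
without_index = df.to_csv(index=False)

# Reading the first version back yields an extra 'Unnamed: 0' column.
back_with = pd.read_csv(io.StringIO(with_index))
back_without = pd.read_csv(io.StringIO(without_index))
print(list(back_with.columns))     # ['Unnamed: 0', 'Name', 'Age']
print(list(back_without.columns))  # ['Name', 'Age']

# index_col=0 tells read_csv to treat that first column as the index.
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
print(list(restored.columns))      # ['Name', 'Age']
```

If a file was already saved with its index, passing `index_col=0` when reading is usually enough to avoid the stray `Unnamed: 0` column.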


## 3. Filtering and Grouping

In [3]:
# Filtering: keep only the rows where Age is greater than 25
filtered = df[df['Age'] > 25]
display(filtered)

# Grouping: average Salary for each City
grouped = df.groupby('City')['Salary'].mean()
display(grouped)

|   | Name | Age | City | Salary |
|---|------|-----|------|--------|
| 1 | Bob | 30 | Los Angeles | 80000 |
| 2 | Charlie | 35 | New York | 90000 |
| 3 | David | 28 | Chicago | 75000 |


City
Chicago        75000.0
Los Angeles    74000.0
New York       80000.0
Name: Salary, dtype: float64
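
Filters and group-bys are often combined or extended. The following sketch rebuilds the same five rows inline (so it does not depend on `sample_data.csv`) and shows two common variations: a filter with more than one condition, and a group-by that computes several statistics at once. The output column names `mean_salary`, `max_age`, and `people` are chosen here for illustration.

```python
import pandas as pd

# Same rows as the sample output shown in this notebook, built inline.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 28, 22],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Los Angeles'],
    'Salary': [70000, 80000, 90000, 75000, 68000],
})

# Combine conditions with & (and) or | (or). Each condition needs its own
# parentheses because & and | bind more tightly than comparisons.
older_ny = df[(df['Age'] > 25) & (df['City'] == 'New York')]
print(older_ny['Name'].tolist())   # ['Charlie']

# Several aggregations per group at once, using named aggregation:
# new_column=(source_column, aggregation_function)
summary = df.groupby('City').agg(
    mean_salary=('Salary', 'mean'),
    max_age=('Age', 'max'),
    people=('Name', 'count'),
)
print(summary)
```

Using Python's `and`/`or` instead of `&`/`|` raises an error here, because a whole boolean Series has no single truth value.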

## 4. Data Cleaner Tool Project

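Let's build a simple data cleaner! It loads the CSV, removes duplicate rows, drops rows with missing values, and saves the result. The sample data has no duplicates or missing values, so the cleaned output below matches the input.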

In [4]:
# Load data
df = pd.read_csv('sample_data.csv')

# Remove duplicates
df_clean = df.drop_duplicates()

# Handle missing values (drop rows with any missing values)
df_clean = df_clean.dropna()

# Save cleaned data
df_clean.to_csv('cleaned_data.csv', index=False)
print('Cleaned data saved to cleaned_data.csv')
display(df_clean)

Cleaned data saved to cleaned_data.csv


|   | Name | Age | City | Salary |
|---|------|-----|------|--------|
| 0 | Alice | 25 | New York | 70000 |
| 1 | Bob | 30 | Los Angeles | 80000 |
| 2 | Charlie | 35 | New York | 90000 |
| 3 | David | 28 | Chicago | 75000 |
| 4 | Eva | 22 | Los Angeles | 68000 |
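
Because the sample file is already clean, the steps above do not visibly change anything. As one possible extension, the sketch below wraps the same steps in a function and runs them on made-up messy data. The function name `clean`, its `fill_values` parameter, and the `messy` frame are all hypothetical and not part of the original project.

```python
import numpy as np
import pandas as pd

def clean(df, fill_values=None):
    """Drop duplicate rows, then fill or drop missing values.

    fill_values is a hypothetical optional dict mapping a column name to
    a replacement value; rows still missing data afterwards are dropped.
    """
    out = df.drop_duplicates()
    if fill_values:
        out = out.fillna(fill_values)
    # reset_index(drop=True) renumbers the rows 0..n-1 after removals.
    return out.dropna().reset_index(drop=True)

# Made-up messy data: one duplicated row, one missing Age, one missing City.
messy = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 30, np.nan, 28],
    'City': ['New York', 'Los Angeles', 'Los Angeles', 'New York', None],
})

# Dropping everything incomplete keeps only Alice and Bob.
print(len(clean(messy)))                        # 2

# Filling the missing City first keeps David as well.
print(len(clean(messy, {'City': 'Unknown'})))   # 3
```

Whether to drop or fill depends on the data: dropping is simplest, while filling keeps more rows at the cost of introducing placeholder values.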
