<a href="https://colab.research.google.com/github/mishra-amit-300266/ML-Engineer-Roadmap/blob/main/Week1_Python_Math_Pandas/Day3_Pandas_Revision.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Day 3 – Pandas Revision: Practice Tasks and Feedback

This notebook summarizes all the tasks performed with the Titanic dataset using Pandas, along with improvements and additional notes.

---

## ✅ Task 1: Load & Explore Data

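```python
import pandas as pd

url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pd.read_csv(url)

# A notebook cell only displays its last expression,
# so print each result explicitly to see all of them.
print(df.head())      # first 5 rows
print(df.tail())      # last 5 rows
print(df.shape)       # (rows, columns)
print(df.columns)     # column labels
print(df.dtypes)      # dtype of each column
df.info()             # prints non-null counts and memory usage itself
print(df.describe())  # summary statistics for numeric columns
```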
> ✔️ Tip: Always check for nulls using `df.isnull().sum()` before cleaning.
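
As a minimal sketch of that check, the snippet below counts missing values per column and also reports them as a percentage. The `toy` frame and its values are invented for illustration and are not taken from the Titanic data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Titanic frame
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, np.nan],
    "Embarked": ["S", "C", None, "S"],
    "Fare": [7.25, 71.28, 8.05, 53.10],
})

null_counts = toy.isnull().sum()        # missing values per column
null_share = toy.isnull().mean() * 100  # percentage missing per column

print(null_counts)
print(null_share.round(1))
```

Seeing the share alongside the count can help decide between filling a column and dropping rows.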

---

## ✅ Task 2: Column Selection & Filtering

```python
# Select specific columns
df[['Name', 'Age', 'Sex']]

# Filter passengers older than 50
df[df['Age'] > 50]

# Filter first-class survivors
df[(df['Survived'] == 1) & (df['Pclass'] == 1)]
```
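
The filters above can also be written with `.loc`, which applies a row mask and a column selection in one step. This is a small sketch on a made-up `toy` frame (names and values are hypothetical), not a replacement for the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the same column names as the Titanic frame
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [62.0, np.nan, 54.0, 30.0],
    "Survived": [1, 1, 0, 1],
    "Pclass": [1, 1, 1, 3],
})

# Each condition needs its own parentheses because & binds tighter than ==
mask = (toy["Survived"] == 1) & (toy["Pclass"] == 1)

# .loc takes the row mask first, then the columns to keep
first_class_survivors = toy.loc[mask, ["Name", "Age"]]

# Comparing NaN with > yields False, so rows with unknown Age are left out
over_50 = toy.loc[toy["Age"] > 50, "Name"].tolist()

print(first_class_survivors)
print(over_50)
```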

---

## ✅ Task 3: Data Cleaning

```python
# Check for nulls
df.isnull().sum()

# Fill missing Age with the median (corrected version).
# Assign the result back: fillna(..., inplace=True) on df['Age'] is chained
# assignment, which recent pandas versions warn about and may not apply to df.
df['Age'] = df['Age'].fillna(df['Age'].median())

# Drop rows with missing Embarked values
df.dropna(subset=['Embarked'], inplace=True)

# Convert Sex to categorical
df['Sex'] = df['Sex'].astype('category')

# Rename Pclass column
df.rename(columns={'Pclass': 'PassengerClass'}, inplace=True)
```

> ⚠️ **Correction**: The original line was `df.fillna(df['Age'].median)`. Without `()`, it passes the `median` method itself rather than its result, and it targets every column instead of just `Age`. The corrected version is `df['Age'] = df['Age'].fillna(df['Age'].median())`.
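
To see why the parentheses matter, here is a small sketch on a made-up `ages` Series (the values are hypothetical). It only demonstrates the corrected pattern and does not reproduce the original bug.

```python
import numpy as np
import pandas as pd

# Hypothetical ages with two missing entries
ages = pd.Series([22.0, np.nan, 35.0, 58.0, np.nan], name="Age")

# Without parentheses, ages.median is the method object, not a number
print(callable(ages.median))

# Calling it returns the median of the non-missing values (NaN is skipped by default)
median_age = ages.median()

# fillna returns a new Series; assign it to keep the result
filled = ages.fillna(median_age)

print(median_age)
print(filled.tolist())
```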

---

## ✅ Task 4: Sorting and Grouping

```python
# Sort by Age descending
df.sort_values(by='Age', ascending=False).head()

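# Group by Sex and average fare.
# Sex is categorical after Task 3; observed=True keeps only categories that
# occur in the data and avoids a FutureWarning in recent pandas versions.
df.groupby('Sex', observed=True)['Fare'].mean()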

# Group by PassengerClass and Survived, count passengers
df.groupby(['PassengerClass', 'Survived'])['Name'].count()
```

> 💡 Try using `.agg()` for multiple aggregations.
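
A minimal sketch of `.agg()` on a made-up `toy` frame follows. The data and the output column names `mean_fare`, `oldest`, and `passengers` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data with Titanic-style column names
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Fare": [10.0, 80.0, 20.0, 30.0],
    "Age": [40.0, 30.0, 20.0, 60.0],
})

# Several aggregations of a single column: one output column per function
fare_stats = toy.groupby("Sex")["Fare"].agg(["mean", "min", "max"])

# Named aggregation: new_column=(source_column, function)
summary = toy.groupby("Sex").agg(
    mean_fare=("Fare", "mean"),
    oldest=("Age", "max"),
    passengers=("Fare", "count"),
)

print(fare_stats)
print(summary)
```

Named aggregation keeps the result columns flat and self-describing, which is often easier to work with than the default multi-level columns.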

---

## ✅ Task 5: Combine and Save

```python
# Create new DataFrame
df_subset = df[['Name', 'Age', 'Fare']]

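# Merge on index (optional). Name, Age and Fare exist in both frames,
# so the result carries each of them twice with _x / _y suffixes.
merged_df = df_subset.merge(df, left_index=True, right_index=True)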

# Save cleaned DataFrame
df.to_csv("cleaned_titanic.csv", index=False)
```
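
Because the subset's columns also exist in the full frame, an index merge like the one above duplicates them. The sketch below shows the resulting column names on a made-up `toy` frame; the data and the suffix names `_subset` and `_full` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data with Titanic-style column names
toy = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [22.0, 38.0, 26.0],
    "Fare": [7.25, 71.28, 7.92],
    "Survived": [0, 1, 1],
})
subset = toy[["Name", "Age", "Fare"]]

# Overlapping column names get the default suffixes _x (left) and _y (right)
merged = subset.merge(toy, left_index=True, right_index=True)

# Explicit suffixes make it clearer where each column came from
labelled = subset.merge(
    toy, left_index=True, right_index=True, suffixes=("_subset", "_full")
)

print(list(merged.columns))
print(list(labelled.columns))
```

In practice a merge is more useful when the two frames contribute different columns, in which case no suffixes are added.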

---

## 🏁 Summary

You've completed a solid round of real-world Pandas tasks:
- ✅ Loading and exploring data
- ✅ Filtering and selecting
- ✅ Handling missing values correctly
- ✅ Grouping, sorting, renaming, and saving

> ⭐ Great job! You're ready to start EDA with Matplotlib & Seaborn.
