# 04: Python & Tooling for Machine Learning
In this session, we’ll build fluency with the core tools used in most ML projects: NumPy, pandas, matplotlib, seaborn, and scikit-learn.

## 🎯 Objectives
- Manipulate data using pandas and NumPy
- Visualize trends and distributions using matplotlib and seaborn
- Load, explore, and summarize datasets
- Perform simple transformations to prepare data for modeling

## 📊 Working with DataFrames

In [None]:
import pandas as pd

# Load the seaborn "tips" sample dataset (requires an internet connection)
df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv')
df.head()

## 📈 Summary Statistics

In [None]:
df.describe()
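
By default, `describe()` summarizes only the numeric columns. The sketch below shows what it returns and how categorical columns can be summarized. It uses a small hand-made DataFrame named `demo` so that it runs offline; the values are illustrative and do not come from the tips dataset.

```python
import pandas as pd

# Tiny illustrative frame; the values are made up for this sketch
demo = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 3.0, 5.0, 7.0],
    "day": ["Sat", "Sun", "Sat", "Sun"],
})

# describe() reports count, mean, std, min, quartiles, and max
# for each numeric column; the text column "day" is skipped
numeric_summary = demo.describe()
print(numeric_summary)

# value_counts() counts how often each category appears
day_counts = demo["day"].value_counts()
print(day_counts)

# groupby() followed by mean() gives one summary value per group
mean_tip_by_day = demo.groupby("day")["tip"].mean()
print(mean_tip_by_day)
```

The same three calls should work on `df`, since the tips dataset also mixes numeric and categorical columns.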

## 🧹 Data Cleaning

In [None]:
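# Drop rows with missing values (if any); .copy() gives an independent
# DataFrame, which avoids a SettingWithCopyWarning on the assignment below
df_clean = df.dropna().copy()
# Create a new column: tip as a percentage of the total bill
df_clean['tip_percent'] = 100 * df_clean['tip'] / df_clean['total_bill']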
df_clean.head()
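
Before dropping rows, it helps to check how many values are actually missing. The sketch below is one way to do that. It uses a hand-made frame with one deliberately missing value; the data and the names `demo` and `demo_clean` are illustrative and do not come from the tips dataset.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing total_bill value
demo = pd.DataFrame({
    "total_bill": [10.0, 20.0, np.nan, 40.0],
    "tip": [1.5, 3.0, 2.0, 8.0],
})

# Count missing values per column before deciding how to handle them
missing_per_column = demo.isna().sum()
print(missing_per_column)

# Drop incomplete rows; .copy() keeps the result independent of `demo`
demo_clean = demo.dropna().copy()

# Derived feature: tip as a percentage of the bill
demo_clean["tip_percent"] = 100 * demo_clean["tip"] / demo_clean["total_bill"]
print(demo_clean)
```

Here one row is dropped and the remaining three rows gain a `tip_percent` column.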

## 🎨 Visualization

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

sns.scatterplot(data=df_clean, x='total_bill', y='tip', hue='sex')
plt.title("Tips vs. Total Bill")
plt.grid(True)
plt.show()
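
A scatter plot shows the trend between two variables, while a histogram shows the distribution of one. The sketch below draws a histogram with plain matplotlib. The array `tip_percent_demo` holds made-up values for illustration; in the notebook you would pass `df_clean['tip_percent']` instead.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up tip percentages, for illustration only
tip_percent_demo = np.array(
    [12.0, 15.0, 15.5, 16.0, 18.0, 18.5, 20.0, 22.0, 25.0, 30.0]
)

fig, ax = plt.subplots()

# hist() returns the count in each bin, the bin edges, and the drawn bars
counts, bin_edges, _ = ax.hist(tip_percent_demo, bins=5)

ax.set_xlabel("Tip (% of total bill)")
ax.set_ylabel("Number of bills")
ax.set_title("Distribution of Tip Percentage")
```

In a notebook the figure renders when the cell finishes; in a script, call `plt.show()` afterwards.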

## 🔍 Simple Model with Scikit-learn

In [None]:
from sklearn.linear_model import LinearRegression

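# scikit-learn expects a 2-D feature table, so the double brackets keep X
# as a DataFrame; the target y is a 1-D Series
X = df_clean[['total_bill']]
y = df_clean['tip']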

model = LinearRegression().fit(X, y)
print("Slope:", model.coef_[0])
print("Intercept:", model.intercept_)
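
`.fit()` estimates the slope and intercept from the data. The fitted model can then make predictions and report how well it fits. The sketch below uses synthetic, noise-free data generated from a known line (tip = 0.15 × bill + 1, an assumption chosen purely for illustration), so the recovered coefficients can be checked against the true ones.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data from a known line (illustrative assumption)
bills = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
X_demo = bills.reshape(-1, 1)  # scikit-learn expects a 2-D feature array
y_demo = 0.15 * bills + 1.0

# fit() learns the coefficients that minimize squared error
demo_model = LinearRegression().fit(X_demo, y_demo)

slope = demo_model.coef_[0]
intercept = demo_model.intercept_

# predict() applies the learned line to new inputs
predicted_tip = demo_model.predict(np.array([[25.0]]))[0]

# score() returns R^2, the fraction of variance explained
r_squared = demo_model.score(X_demo, y_demo)

print("Slope:", slope)
print("Intercept:", intercept)
print("Predicted tip for a 25.00 bill:", predicted_tip)
print("R^2:", r_squared)
```

On the real tips data the fit will not be perfect, so expect an R² below 1.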

## ✅ Summary Quiz
1. What does `df.describe()` show you?
2. Why is feature engineering (such as the `tip_percent` column) helpful?
3. What does the `.fit()` method do in scikit-learn?