# Day 8 - Otto Product Recommendation Challenge

In this notebook, we continue to improve our Otto Product Recommendation system by experimenting with feature scaling and model evaluation strategies.

## Tasks for Day 8
- Task 1: Feature Scaling using `StandardScaler` and `MinMaxScaler`
- Task 2: Train models on scaled data
- Task 3: Compare performance of scaled vs unscaled features



In [None]:
# Load necessary libraries and the dataset
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

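# Load the data
data = pd.read_csv("train.csv")

# Separate features from the label; also drop the row identifier, because in the
# Otto training file the "id" column follows the class order and would leak the target
X = data.drop(columns=["id", "target"], errors="ignore")
y = data["target"]

# Encode labels: "Class_3" -> 3. The raw string avoids an invalid-escape warning,
# and expand=False returns a 1-D Series instead of a one-column DataFrame
y = y.str.extract(r"Class_(\d+)", expand=False).astype(int)


# 💡 Code Explanation:
# - This block loads the Otto dataset and separates the features from the label, dropping both the `target` column and the `id` row identifier.
# - It then converts the `target` strings (e.g. `Class_3`) into integer class labels with a regular expression.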

# 📘 Knowledge Points:
# - `pandas.read_csv` for reading CSV files
# - Regex-based label encoding
# - Feature and label separation
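The label-encoding step can be checked without the full CSV by running it on a few made-up labels in the `Class_N` format the regex expects. With a single capture group, `expand=False` makes `Series.str.extract` return a Series, the 1-D shape scikit-learn expects for `y`; the default (`expand=True`) returns a one-column DataFrame, which triggers a `DataConversionWarning` when passed to `fit`.

```python
import pandas as pd

# Made-up labels in the "Class_N" format the notebook's regex expects
labels = pd.Series(["Class_1", "Class_9", "Class_3", "Class_9"])

# Raw string avoids the invalid-escape warning for "\d";
# expand=False returns a Series because there is exactly one capture group
encoded = labels.str.extract(r"Class_(\d+)", expand=False).astype(int)

# Without expand=False the result is a one-column DataFrame
as_frame = labels.str.extract(r"Class_(\d+)")

print(encoded.tolist())                         # [1, 9, 3, 9]
print(type(as_frame).__name__, as_frame.shape)  # DataFrame (4, 1)
```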


In [None]:
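# Split the data, stratifying on y so every class keeps the same share in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)


# 💡 Code Explanation:
# - This step splits the dataset into training (80%) and test (20%) sets.
# - `stratify=y` preserves the class proportions in both sets, which matters because the Otto classes are imbalanced.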

# 📘 Knowledge Points:
# - `train_test_split` from sklearn
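Because the Otto classes are imbalanced, a purely random 80/20 split can shift the class proportions between the training and test sets. The sketch below uses a made-up label vector rather than the real data and shows that passing `stratify` to `train_test_split` keeps every class in the same 80/20 ratio. The variable names (`y_demo`, `yd_train`, ...) are chosen so the cell does not overwrite the notebook's own `X`, `y`, or split variables.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic imbalanced labels: 70% class 1, 25% class 2, 5% class 3
y_demo = np.array([1] * 700 + [2] * 250 + [3] * 50)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder single feature

_, _, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Per-class counts (index 0 of bincount is dropped because labels start at 1)
print("train counts:", np.bincount(yd_train)[1:].tolist())  # [560, 200, 40]
print("test counts: ", np.bincount(yd_test)[1:].tolist())   # [140, 50, 10]
```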


In [None]:
# Apply StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)


# 💡 Code Explanation:
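# - Standardization rescales each feature to mean 0 and standard deviation 1 via `(x - mean) / std`.
# - The scaler is fit on the training set only (`fit_transform`) and the same statistics are reused on the test set (`transform`), so no test-set information leaks into preprocessing.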

# 📘 Knowledge Points:
# - `StandardScaler` for feature standardization (z-score scaling)
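The two scalers named in Task 1 apply simple per-feature formulas: `StandardScaler` computes `(x - mean) / std` and `MinMaxScaler` computes `(x - min) / (max - min)`, with every statistic estimated on the training data only. The sketch below checks both formulas against scikit-learn on a small made-up matrix (`np.std` uses `ddof=0` by default, which matches `StandardScaler`) and shows that test rows transformed with training statistics can fall outside `[0, 1]`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up data: 4 training rows, 2 test rows, two features on very different scales
demo_train = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 200.0], [4.0, 400.0]])
demo_test = np.array([[2.5, 500.0], [0.0, 250.0]])

# StandardScaler: z = (x - mean) / std, with mean and population std (ddof=0)
# estimated from the training rows only
std_scaler = StandardScaler().fit(demo_train)
mean, std = demo_train.mean(axis=0), demo_train.std(axis=0)
assert np.allclose(std_scaler.transform(demo_test), (demo_test - mean) / std)

# MinMaxScaler: (x - min) / (max - min), with min and max from the training rows
mm_scaler = MinMaxScaler().fit(demo_train)
lo, hi = demo_train.min(axis=0), demo_train.max(axis=0)
assert np.allclose(mm_scaler.transform(demo_test), (demo_test - lo) / (hi - lo))

# Training rows span exactly [0, 1] per feature, but test rows can fall outside it
print(mm_scaler.transform(demo_train).min(axis=0), mm_scaler.transform(demo_train).max(axis=0))
print(mm_scaler.transform(demo_test))  # rows ≈ [0.5, 1.333] and [-0.333, 0.5]
```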


In [None]:
# Train and evaluate model on scaled data
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train_scaled, y_train)
y_pred = clf.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy with StandardScaler:", accuracy)


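# 💡 Code Explanation:
# - Trains a `RandomForestClassifier` on the standardized features and measures its accuracy on the held-out test set.
# - Tree-based models split on feature thresholds, so they depend only on the ordering of each feature's values; per-feature scaling like this should leave the accuracy essentially unchanged. Scaling matters mainly for distance-based models (e.g. k-NN, SVMs) and regularized linear models such as logistic regression.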

# 📘 Knowledge Points:
# - `RandomForestClassifier`
# - `accuracy_score` as evaluation metric
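Task 3 asks for a comparison of scaled and unscaled features, which the cells above do not yet perform, and `MinMaxScaler` is imported but never used. The following is a minimal sketch of that comparison under stated assumptions: the helper `compare_scalers` is hypothetical, each scaler is wrapped in a scikit-learn `Pipeline` so it is fit on the training split only, and `KNeighborsClassifier` (not part of the original notebook) is added as a distance-based model for contrast. The demo runs on synthetic data from `make_classification` plus one deliberately huge noise column, not on the Otto data. Because a random forest's splits depend only on the ordering of each feature's values, its accuracy should be essentially unchanged by scaling (tiny differences can come from floating-point effects), whereas unscaled k-NN distances are dominated by the large-scale column.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def compare_scalers(X_train, X_test, y_train, y_test):
    """Hypothetical helper: test accuracy for every (model, scaling) combination."""
    scalers = {"none": None, "standard": StandardScaler, "minmax": MinMaxScaler}
    models = {
        "random_forest": lambda: RandomForestClassifier(random_state=42),
        "knn": lambda: KNeighborsClassifier(n_neighbors=5),  # not in the original notebook
    }
    results = {}
    for model_name, make_model in models.items():
        for scaler_name, scaler_cls in scalers.items():
            # Fresh scaler and model each run; the pipeline fits the scaler on the
            # training split only and applies the same statistics to the test split
            steps = [make_model()] if scaler_cls is None else [scaler_cls(), make_model()]
            pipe = make_pipeline(*steps).fit(X_train, y_train)
            results[(model_name, scaler_name)] = accuracy_score(y_test, pipe.predict(X_test))
    return results


# Synthetic stand-in for the Otto features: 5 informative features plus one
# pure-noise column with a standard deviation of 1e6 that dominates unscaled distances
X_syn, y_syn = make_classification(
    n_samples=2000, n_features=5, n_informative=5, n_redundant=0, random_state=0
)
rng = np.random.default_rng(0)
X_syn = np.hstack([X_syn, rng.normal(scale=1e6, size=(len(X_syn), 1))])

X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42, stratify=y_syn
)
results = compare_scalers(X_tr, X_te, y_tr, y_te)
for (model_name, scaler_name), acc in results.items():
    print(f"{model_name:>13} | {scaler_name:>8} | accuracy = {acc:.3f}")
```

On this synthetic data the random forest rows should be nearly identical, while k-NN should sit near chance without scaling and improve clearly with either scaler. On the notebook's own split, `compare_scalers(X_train, X_test, y_train, y_test)` runs the same comparison, although k-NN prediction on the full Otto training set can be slow.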
