# 🧠 02 - Model Training Notebook

In this notebook, we'll:
- Preprocess the transaction data
- Train a Logistic Regression model
- Evaluate its performance on unseen data


In [None]:
from src.data_preprocessing import load_data, preprocess_data
from src.model_training import train_model


In [None]:
df = load_data('../data/transactions.csv')
df = preprocess_data(df)
df.head()


In [None]:
feature_columns = ['TX_AMOUNT', 'HIGH_AMOUNT', 'HOUR', 'DAY']
X = df[feature_columns]
y = df['TX_FRAUD']


In [None]:
# Fit on the full dataset; the evaluation section refits on the training split only
model = train_model(X, y)


## 📊 Model Evaluation

After training, we check how well the model predicts fraud on a held-out test set that it did not see during training.

We look at:
- **Confusion Matrix**: Counts of correct and incorrect predictions for each class
- **Classification Report**: Precision, recall, and F1-score per class
- **Accuracy**: Fraction of all predictions that are correct
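
As a rough illustration of how these metrics relate, the sketch below derives precision, recall, F1, and accuracy for the positive (fraud) class from the four cells of a binary confusion matrix. The function name `metrics_from_counts` and the counts in the usage line are hypothetical and are not results from this dataset.

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Derive metrics for the positive (fraud) class from raw counts."""
    total = tn + fp + fn + tp
    # Guard against division by zero when a class is absent or never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts, not taken from the transactions data.
example = metrics_from_counts(tn=90, fp=2, fn=5, tp=3)
```

With these made-up counts, precision is 0.6 and recall is 0.375 while accuracy is 0.93, which is why the three views are reported together.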


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
import seaborn as sns
import matplotlib.pyplot as plt


In [None]:
# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model on the training set only
model = train_model(X_train, y_train)

# Predict on the held-out test set
y_pred = model.predict(X_test)


In [None]:
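# Rows of cm are actual classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)

# Plot the raw counts as an annotated heatmap
plt.figure(figsize=(6, 4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False)
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()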
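
To read the heatmap, it helps to know how the cells are laid out. scikit-learn's `confusion_matrix` puts actual classes on rows and predicted classes on columns, so for labels 0 and 1 the matrix is `[[TN, FP], [FN, TP]]`. The pure-Python sketch below reproduces that layout on a toy example; `binary_confusion` and the label lists are hypothetical and only meant to show the indexing.

```python
def binary_confusion(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]: rows are actual labels, columns are predicted."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


# Toy labels (1 = fraud), not taken from the transactions data.
toy_true = [0, 0, 0, 1, 1, 0]
toy_pred = [0, 1, 0, 1, 0, 0]
toy_cm = binary_confusion(toy_true, toy_pred)
```

In this layout the bottom-left cell holds the missed frauds (false negatives) and the top-right cell holds the legitimate transactions flagged as fraud (false positives).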


In [None]:
print("Classification Report:\n")
print(classification_report(y_test, y_pred))

print("Accuracy Score:", accuracy_score(y_test, y_pred))
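
If fraud is rare in `TX_FRAUD`, as is typical for fraud data, it can help to compare the accuracy score against a trivial baseline that always predicts the most common class. The helper below is a minimal sketch; `majority_baseline_accuracy` and the toy labels are assumptions, and in this notebook one would pass `y_test` instead.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a model that always predicts the most frequent label."""
    labels = list(labels)
    if not labels:
        raise ValueError("labels must not be empty")
    _, majority_count = Counter(labels).most_common(1)[0]
    return majority_count / len(labels)


# Toy labels with 5% positives, not taken from the transactions data.
toy_labels = [0] * 95 + [1] * 5
baseline = majority_baseline_accuracy(toy_labels)
```

A model whose accuracy is close to this baseline may be adding little beyond predicting "not fraud" every time, so the recall for the fraud class in the classification report deserves a closer look.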
