## Modeling (Task 2)

This notebook trains and evaluates two models on each dataset (Fraud_Data and creditcard):

- Baseline: **Logistic Regression**
- Ensemble: **Random Forest**

Metrics for imbalanced classification:

- **AUC-PR** (area under the precision-recall curve, summarized as Average Precision)
- **F1-score**
- **Confusion Matrix**
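
To make the three metrics concrete, here is a minimal pure-Python sketch on made-up toy data. The helper names (`confusion_counts`, `f1_from_counts`, `average_precision`) are illustrative and are not part of `src.modeling.task2_train`, which may compute these differently (for example with scikit-learn's `average_precision_score`, `f1_score`, and `confusion_matrix`). The Average Precision helper assumes all scores are distinct.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels in {0, 1}."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def f1_from_counts(fp, fn, tp):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def average_precision(y_true, scores):
    """Mean of precision@rank taken at each positive, ranking by descending score.

    Assumes distinct scores (tied scores would need to be grouped per threshold).
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos if n_pos else 0.0


# Toy data (hypothetical): 2 positives among 8 samples.
y_true = [0, 0, 0, 0, 1, 0, 1, 0]
scores = [0.1, 0.2, 0.3, 0.4, 0.9, 0.8, 0.7, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
print((tn, fp, fn, tp))                                # (5, 1, 0, 2)
print(f"F1     = {f1_from_counts(fp, fn, tp):.3f}")    # F1     = 0.800
print(f"AUC-PR = {average_precision(y_true, scores):.3f}")  # AUC-PR = 0.833
```

Note that F1 and the confusion matrix depend on the chosen decision threshold, while Average Precision depends only on how the scores rank the samples.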

Cross-validation:

- **Stratified K-Fold (k=5)**
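
The idea behind stratification can be sketched in a few lines of plain Python: split the indices of each class separately, so every fold keeps roughly the same class ratio as the full dataset. The function `stratified_folds` below is a hypothetical illustration, not the splitter used by `train_and_evaluate_task2`; scikit-learn's `StratifiedKFold` provides the same behavior with optional shuffling.

```python
from collections import defaultdict


def stratified_folds(labels, k=5):
    """Assign sample indices to k folds, dealing each class out round-robin."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Toy imbalanced labels (hypothetical): 10 positives, 90 negatives.
labels = [1] * 10 + [0] * 90
folds = stratified_folds(labels, k=5)

for fold_id, test_idx in enumerate(folds):
    # Training indices are everything outside the current test fold.
    train_idx = [i for other in folds if other is not test_idx for i in other]
    positives = sum(labels[i] for i in test_idx)
    print(f"fold {fold_id}: train={len(train_idx)} test={len(test_idx)} positives_in_test={positives}")
```

With a plain (unstratified) split, a fold of a heavily imbalanced dataset could end up with very few or no positive samples, which would make per-fold AUC-PR and F1 unstable or undefined.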



In [None]:
import json
import sys
from pathlib import Path

# Ensure the repo root is on sys.path so `import src...` works in Jupyter.
# Assumes the notebook runs from a directory one level below the repo root.
sys.path.insert(0, str(Path("..").resolve()))

from src.modeling.task2_train import Task2Paths, train_and_evaluate_task2

RAW_DIR = Path("../data/raw")
REPORTS_DIR = Path("../reports")
MODELS_DIR = Path("../models")

paths = Task2Paths(raw_dir=RAW_DIR, reports_dir=REPORTS_DIR, models_dir=MODELS_DIR)



In [None]:
# Run Task 2 for Fraud_Data
# Tip: set sample_frac=0.2 for quicker iteration while experimenting

fraud_results = train_and_evaluate_task2(
    "fraud",
    paths,
    cv_splits=5,
    sample_frac=None,
)

# Show only the first 2,000 characters of the JSON-formatted results
print(json.dumps(fraud_results, indent=2)[:2000])



In [None]:
# Run Task 2 for creditcard
# Tip: set sample_frac=0.2 for quicker iteration while experimenting

cc_results = train_and_evaluate_task2(
    "creditcard",
    paths,
    cv_splits=5,
    sample_frac=None,
)

# Show only the first 2,000 characters of the JSON-formatted results
print(json.dumps(cc_results, indent=2)[:2000])

