# 📝 Baseline Evaluation Notebook

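This notebook demonstrates how to run the **baseline evaluation pipeline** in three steps: build a retrieval dataset, truncate the prompts to a fixed token budget, and run the evaluation.
By default, it uses **DeepSeek-R1-Distill-Qwen-7B**, the baseline model from my MSc project.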

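👉 You can swap in another Hugging Face model by editing the `MODEL_NAME` variable. The same name is also passed as `--tokenizer_name` to Steps 1 and 2, so the dataset is built and truncated with the chosen model's tokenizer. Any replacement must be a model that `src.run_retrieval_eval` can load.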
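The cells below write to fixed `demo_baseline*` file names, so a second run with a different `MODEL_NAME` overwrites the baseline outputs. One way around this, sketched here as a hypothetical helper (`run_paths` is not part of the project's scripts), is to derive the file names from the model name:

```python
from pathlib import PurePosixPath


def run_paths(model_name: str, root: str = "..") -> dict:
    """Build per-model output paths (hypothetical helper, not part of the repo)."""
    # "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B" -> "deepseek-r1-distill-qwen-7b"
    slug = model_name.split("/")[-1].lower()
    base = PurePosixPath(root)
    return {
        "dataset": str(base / "data" / "processed" / f"demo_{slug}.csv"),
        "truncated": str(base / "data" / "processed" / f"demo_{slug}_truncated.csv"),
        "results": str(base / "results" / f"demo_{slug}_results.csv"),
    }
```

For example, assign `RESULTS_CSV = run_paths(MODEL_NAME)["results"]` in a cell and pass `--output_path $RESULTS_CSV` in Step 3; IPython expands `$name` in `!` commands just as it does for `$MODEL_NAME`.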

In [None]:
# --- Parameters ---
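MODEL_NAME = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # baseline (Hugging Face Hub repo ID)
MAX_INPUT_TOKENS = 4096  # prompt token budget, used by Step 2 (truncation) and Step 3 (evaluation)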
NUM_TASKS = 10  # small for demo; increase for full runs

In [None]:
# Install dependencies into this kernel's environment (skip if already installed)
%pip install -r ../requirements.txt
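`%pip` installs into the environment of the kernel running the notebook, whereas `!pip` calls whichever `pip` is first on the shell's `PATH`, which may belong to a different environment. Below is a rough check that the listed packages are visible to this kernel. It uses only the standard library, compares distribution names only (not version specifiers), and skips option lines such as `-e .`:

```python
import re
from importlib import metadata


def missing_requirements(req_path: str) -> list:
    """Return names from a requirements file that are not installed in this kernel.

    Rough check: compares distribution names only, ignoring version specifiers.
    """
    missing = []
    with open(req_path, encoding="utf-8") as f:
        for raw in f:
            line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
            if not line or line.startswith("-"):  # skip blanks and options like -r, -e, --index-url
                continue
            # The distribution name ends at extras, a version specifier, a marker, or a URL
            name = re.split(r"[\[<>=!~;@\s]", line, maxsplit=1)[0]
            try:
                metadata.version(name)
            except metadata.PackageNotFoundError:
                missing.append(name)
    return missing
```

Calling `missing_requirements("../requirements.txt")` after the install should return an empty list; any names it does return were not found in this kernel's environment.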

In [None]:
# Step 1: Build retrieval dataset
!python ../scripts/build_retrieval_dataset.py \
  --tokenizer_name $MODEL_NAME \
  --class_type B --num_tasks $NUM_TASKS --num_distractors 20 \
  --out_csv ../data/processed/demo_baseline.csv
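A quick sanity check on the Step 1 output, assuming only that it is a CSV file with a header row (the column names are whatever `build_retrieval_dataset.py` writes). Counting records with the `csv` module, rather than counting lines, stays correct when prompts contain newlines:

```python
import csv

# Long prompts can exceed the csv module's default 128 KiB field limit
csv.field_size_limit(2**31 - 1)


def summarize_csv(path: str, preview: int = 3):
    """Return the header, the number of data records, and the first `preview` records."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        rows = list(reader)
    return header, len(rows), rows[:preview]
```

For example, `header, n_rows, head = summarize_csv("../data/processed/demo_baseline.csv")`. How `n_rows` relates to `NUM_TASKS` depends on how the script lays out its rows.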

In [None]:
# Step 2: Truncate prompts
!python ../scripts/truncate_prompts.py \
  --in_csv ../data/processed/demo_baseline.csv \
  --out_csv ../data/processed/demo_baseline_truncated.csv \
  --tokenizer_name $MODEL_NAME \
  --max_input_tokens $MAX_INPUT_TOKENS
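To confirm that Step 2 did its job, each truncated prompt can be re-tokenized and compared with `MAX_INPUT_TOKENS`. The helper below is a sketch that accepts any `encode` function, so it can be tried with a toy tokenizer; with `transformers`, `AutoTokenizer.from_pretrained(MODEL_NAME).encode` is a natural choice. That method adds special tokens (such as BOS) by default, and a chat template would add more, so counts may differ slightly from the script's own accounting.

```python
from typing import Callable, Iterable, List, Sequence


def over_budget(
    prompts: Iterable[str],
    encode: Callable[[str], Sequence],
    max_tokens: int,
) -> List[int]:
    """Return the indices of prompts whose encoded length exceeds max_tokens."""
    return [i for i, p in enumerate(prompts) if len(encode(p)) > max_tokens]
```

Read the prompt column of `../data/processed/demo_baseline_truncated.csv` into a list first; its name depends on what `truncate_prompts.py` writes. An empty result means every prompt fits the budget.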

In [None]:
# Step 3: Run evaluation
# Run from the repository root so that `python -m` can import the `src` package
!cd .. && python -m src.run_retrieval_eval \
  --model_name $MODEL_NAME \
  --data_path data/processed/demo_baseline_truncated.csv \
  --output_path results/demo_baseline_results.csv \
  --max_tokens 1024 \
  --max_input_tokens $MAX_INPUT_TOKENS
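A `!` command does not raise an exception when the script exits with an error, so if Step 1 fails, the later cells still run against a missing or stale CSV. A hedged alternative is to drive the same three commands from Python with `subprocess.run(..., check=True)`, which stops at the first failing step. The flags are copied from the cells above; the commands run from the repository root, assumed to be the parent of the notebook's directory as the `../` paths imply, which also lets `python -m` find `src`:

```python
import subprocess
import sys


def build_commands(model_name: str, num_tasks: int, max_input_tokens: int) -> list:
    """Argument lists for the three pipeline steps, with paths relative to the repo root."""
    py = sys.executable  # the interpreter running this code, not whatever "python" is on PATH
    dataset = "data/processed/demo_baseline.csv"
    truncated = "data/processed/demo_baseline_truncated.csv"
    return [
        [py, "scripts/build_retrieval_dataset.py",
         "--tokenizer_name", model_name,
         "--class_type", "B", "--num_tasks", str(num_tasks), "--num_distractors", "20",
         "--out_csv", dataset],
        [py, "scripts/truncate_prompts.py",
         "--in_csv", dataset, "--out_csv", truncated,
         "--tokenizer_name", model_name,
         "--max_input_tokens", str(max_input_tokens)],
        [py, "-m", "src.run_retrieval_eval",
         "--model_name", model_name,
         "--data_path", truncated,
         "--output_path", "results/demo_baseline_results.csv",
         "--max_tokens", "1024",
         "--max_input_tokens", str(max_input_tokens)],
    ]


def run_pipeline(commands: list, repo_root: str = "..") -> None:
    """Run each step from repo_root; check=True raises CalledProcessError on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=repo_root, check=True)
```

Usage: `run_pipeline(build_commands(MODEL_NAME, NUM_TASKS, MAX_INPUT_TOKENS))`. Passing argument lists instead of a shell string also avoids quoting problems if a path or model name contains spaces.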

## ✅ Results
Step 3 writes the evaluation results to the file below (the intermediate datasets from Steps 1 and 2 are in `../data/processed/`):
```
../results/demo_baseline_results.csv
```