# **End‑to‑End ML Pipeline: ArXiv → Q&A → Fine‑Tune → Evaluate**

This notebook walks through the **full lifecycle** of your ML workflow:

1. **Logging Configuration** – Ensures consistent, readable logs across all stages.
2. **Fetch Data from ArXiv** – Uses `ArxivFetcher` to pull recent papers matching a search query, then guides you on how to manually create Q&A pairs.
3. **Generate Synthetic Q&A Dataset** – Converts paper summaries into prompt‑completion JSONL format for fine‑tuning.
4. **Fine‑Tune with QLoRA** – Trains LoRA adapters on top of a 4‑bit quantized base model using the synthetic Q&A dataset.
5. **Evaluate Base vs Fine‑Tuned Models** – Compares outputs using string similarity against reference answers, with full results printed to logs.

The stages are modular: you can run them independently or in sequence.
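
The stages that follow read their settings from a shared `Config` class that this notebook does not show. A minimal sketch of what it might look like is given here. The attribute names are the ones referenced in the sections below, and the values marked "from logs" are read off this run's output. Everything else is an assumption, notably `GROUP_SIZE = 5`, the evaluation file names, and the relative `data/` and `models/` layout.

```python
from pathlib import Path


class Config:
    """Hypothetical settings object; the real Config is not shown in this notebook."""

    # Stage 2: arXiv fetch
    SEARCH_QUERY = "machine learning"  # from logs (search_query=machine+learning)
    MAX_RESULTS = 100                  # from logs (max_results=100)
    GROUP_SIZE = 5                     # assumption: 100 papers saved as 20 groups
    DATA_DIR = Path("data")            # assumption: relative directory layout
    OUTPUT_FILE = DATA_DIR / "arxiv_summaries.json"

    # Stage 3: synthetic Q&A dataset
    QA_INPUT_FILE = DATA_DIR / "summaries_qa.json"
    SYNTH_QA_FILE = DATA_DIR / "synthetic_qa.jsonl"

    # Stage 4: fine-tuning
    # Model name taken from the evaluation log; assumed to be the base model.
    BASE_MODEL = "unsloth/llama-3.1-8b-unsloth-bnb-4bit"
    MODEL_DIR = Path("models") / "llama-3.1-8b-qlora-finetuned"

    # Stage 5: evaluation (both file names are placeholders)
    EVAL_QA_FILE = DATA_DIR / "eval_qa.json"
    EVAL_OUTPUT_FILE = DATA_DIR / "eval_results.json"


print(Config.MAX_RESULTS // Config.GROUP_SIZE, "groups expected")
```

Keeping every path and hyperparameter in one place is what lets each stage run on its own: a stage only needs the files named here to exist.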

---
## **1. Logging Configuration**
We use a centralized `configure_logging()` function from `logging_utils.py` so all scripts share the same format and log level.
This ensures:
- Timestamps for every log entry
- Module name in logs for easy tracing
- Consistent formatting across scripts

In [2]:
from logging_utils import configure_logging

log = configure_logging()
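
`logging_utils.py` is not included in this notebook. The sketch below shows one way `configure_logging()` could produce the line format seen in the cell outputs (`timestamp - LEVEL - module - message`). The function body, its parameters, and the logger name `notebook` are assumptions; only the output format is taken from the logs.

```python
import logging
import sys

# Format inferred from the log lines in this notebook's output.
LOG_FORMAT = "%(asctime)s - %(levelname)-8s - %(name)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def configure_logging(level=logging.INFO, name="notebook", stream=None):
    """Hypothetical stand-in for logging_utils.configure_logging()."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT))

    root = logging.getLogger()
    # Replace existing handlers so re-running the cell does not duplicate lines.
    root.handlers.clear()
    root.addHandler(handler)
    root.setLevel(level)
    return logging.getLogger(name)


log = configure_logging()
log.info("Logging configured.")
```

Configuring the root logger, rather than one named logger, is what lets third‑party modules such as `arxiv` and `accelerate` appear in the same format.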

---
## **2. Fetch Data from ArXiv**
We use `ArxivFetcher` to:
- Query the ArXiv API for recent papers matching `Config.SEARCH_QUERY`
- Limit results to `Config.MAX_RESULTS`
- Group papers into batches of `Config.GROUP_SIZE`
- Save them to `Config.OUTPUT_FILE` in JSON format (e.g., `arxiv_summaries.json`)
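
The grouping and saving steps in the list above can be sketched without touching the network. `group_papers` is a hypothetical helper, not the actual `ArxivFetcher` code, and the `title`/`summary` keys are assumed from the example prompt later in this section.

```python
import json


def group_papers(papers, group_size):
    """Split a list of papers into consecutive groups of at most group_size."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [papers[i:i + group_size] for i in range(0, len(papers), group_size)]


# Stand-in records; a real run would fill these from the arXiv API response.
papers = [{"title": f"Paper {i}", "summary": f"Summary {i}"} for i in range(12)]

groups = group_papers(papers, group_size=5)
payload = json.dumps(groups, indent=2)  # text that would be written to the output file

print(f"Saved {len(groups)} groups ({len(papers)} papers).")
```

The last group is allowed to be shorter than `group_size`, so no paper is dropped when the total is not an exact multiple.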

**Manual Q&A Generation Step:**
After running this cell, you will have a grouped summaries file (e.g., `arxiv_summaries.json`). To create question‑answer pairs for each summary:
1. Open the generated JSON file.
2. Copy the content of each group, **one group at a time**.
3. Paste it into ChatGPT (or another LLM) with a prompt asking it to generate **question and answer pairs** in JSON format.
4. Save the generated Q&A pairs into a new JSON file (e.g., `summaries_qa.json`).

**Example ChatGPT Prompt:**
```
You are given a JSON object containing a list of papers, each with a title and summary.
For each paper, generate 3–5 question and answer pairs that test understanding of the paper's content.
Return the output in JSON format with the following structure:
[
  {
    "title": "<paper title>",
    "qa_pairs": [
      {"question": "...", "answer": "..."},
      {"question": "...", "answer": "..."}
    ]
  },
  ...
]
Do not include any extra commentary — only return valid JSON.
```

This manual step turns each paper summary into context‑specific Q&A pairs for fine‑tuning. Review the generated pairs before saving them, because their quality sets the quality of the training data.
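
Because the Q&A file is assembled by hand from LLM output, it can help to check it against the structure requested in the prompt before moving on. The validator below is a sketch; `validate_qa_payload` is a hypothetical helper and not part of the project's modules.

```python
import json


def validate_qa_payload(text):
    """Check LLM output against the structure requested in the prompt.

    Returns the total number of Q&A pairs; raises ValueError on any mismatch.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("top level must be a list of papers")

    total = 0
    for i, paper in enumerate(data):
        if not isinstance(paper, dict):
            raise ValueError(f"paper {i}: expected an object")
        title = paper.get("title")
        if not isinstance(title, str) or not title.strip():
            raise ValueError(f"paper {i}: missing or empty 'title'")
        pairs = paper.get("qa_pairs")
        if not isinstance(pairs, list) or not pairs:
            raise ValueError(f"paper {i}: 'qa_pairs' must be a non-empty list")
        for j, pair in enumerate(pairs):
            if not isinstance(pair, dict):
                raise ValueError(f"paper {i}, pair {j}: expected an object")
            for key in ("question", "answer"):
                value = pair.get(key)
                if not isinstance(value, str) or not value.strip():
                    raise ValueError(f"paper {i}, pair {j}: missing or empty '{key}'")
        total += len(pairs)
    return total


sample = """[
  {"title": "Example paper",
   "qa_pairs": [{"question": "What is studied?", "answer": "An example topic."},
                {"question": "What is the result?", "answer": "An example result."}]}
]"""
print(validate_qa_payload(sample), "Q&A pairs look well-formed")
```

Running this on each pasted response catches the common failure modes early: commentary wrapped around the JSON, a missing `qa_pairs` key, or empty answers.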

In [3]:
from arxiv_fetcher import ArxivFetcher

try:
    ArxivFetcher().run()
except Exception as e:
    log.error(f"ArXiv fetch failed: {e}", exc_info=True)

2025-09-14 23:18:25 - INFO     - arxiv_fetcher - Fetching papers from ArXiv...
2025-09-14 23:18:25 - INFO     - arxiv - Requesting page (first: True, try: 0): https://export.arxiv.org/api/query?search_query=machine+learning&id_list=&sortBy=submittedDate&sortOrder=descending&start=0&max_results=100
2025-09-14 23:18:25 - INFO     - arxiv - Got first page: 100 of 432377 total results
2025-09-14 23:18:25 - INFO     - arxiv_fetcher - Saving grouped summaries...
2025-09-14 23:18:25 - INFO     - arxiv_fetcher - Saved 20 groups (100 papers).
2025-09-14 23:18:25 - INFO     - arxiv_fetcher - Output file saved at: C:\Projects\MLE Courses\CarlosLao-homework\class7\homework\data\arxiv_summaries.json
2025-09-14 23:18:25 - INFO     - arxiv_fetcher - ArXiv fetch complete.


---
## **3. Generate Synthetic Q&A Dataset**
We use `QAGenerator` to:
- Read grouped Q&A data from `Config.QA_INPUT_FILE` (e.g., `summaries_qa.json`)
- Convert them into **prompt‑completion** format:
  ```
  <|system|>{system_prompt}<|user|>{question}<|assistant|>{answer}
  ```
- Save as JSONL (`Config.SYNTH_QA_FILE`) for fine‑tuning

This step transforms raw Q&A pairs into a **training‑ready dataset**.
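
A sketch of that conversion is shown below. The template string is the one given above; the `prompt`/`completion` key names, the system prompt text, and the helper functions are assumptions, since `QAGenerator` itself is not shown here.

```python
import json

# Placeholder text; the project's actual system prompt is not shown in this notebook.
SYSTEM_PROMPT = "You are a helpful research assistant."


def to_record(question, answer, system_prompt=SYSTEM_PROMPT):
    """Build one training record; prompt + completion equals the full template."""
    prompt = f"<|system|>{system_prompt}<|user|>{question}<|assistant|>"
    return {"prompt": prompt, "completion": answer}


def to_jsonl(papers, system_prompt=SYSTEM_PROMPT):
    """Flatten grouped Q&A data into JSONL text, one JSON object per line."""
    lines = []
    for paper in papers:
        for pair in paper["qa_pairs"]:
            record = to_record(pair["question"], pair["answer"], system_prompt)
            lines.append(json.dumps(record, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


papers = [
    {
        "title": "Example paper",
        "qa_pairs": [
            {"question": "What is studied?", "answer": "An example topic."},
            {"question": "What is the result?", "answer": "An example result."},
        ],
    }
]

jsonl_text = to_jsonl(papers)
print(f"Generated {len(jsonl_text.splitlines())} Q&A entries.")
```

Splitting at `<|assistant|>` keeps the answer as the only text the model is asked to produce, while the concatenation of both fields still reproduces the template exactly.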

In [4]:
from qa_generator import QAGenerator

try:
    QAGenerator().run()
except Exception as e:
    log.error(f"Q&A generation failed: {e}", exc_info=True)

2025-09-14 23:19:02 - INFO     - qa_generator - Starting Q&A generation...
2025-09-14 23:19:02 - INFO     - qa_generator - Generated 500 Q&A entries.
2025-09-14 23:19:02 - INFO     - qa_generator - Output file saved at: C:\Projects\MLE Courses\CarlosLao-homework\class7\homework\data\synthetic_qa.jsonl
2025-09-14 23:19:02 - INFO     - qa_generator - Q&A generation complete.


---
## **4. Fine‑Tune with QLoRA**
We use `QLoRAFineTuner` to:
- Load the base model (`Config.BASE_MODEL`)
- Apply LoRA adapters with parameters from `Config`
- Tokenize the synthetic Q&A dataset
- Train with `SFTTrainer` from Hugging Face TRL
- Save the fine‑tuned model to `Config.MODEL_DIR`

This step adapts the base model to our **domain‑specific Q&A style**.
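
To see why so few weights are trained, it helps to count what a LoRA adapter adds: for a linear layer of shape `d_in × d_out`, rank `r` contributes `r * (d_in + d_out)` parameters. The sketch below applies that to the published Llama 3.1 8B layer shapes. The rank of 16 and the choice of all seven projection matrices are assumptions, not settings confirmed by this notebook; they are simply one configuration that reproduces the trainable parameter count reported in the training output.

```python
def lora_params(d_in, d_out, rank):
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Llama 3.1 8B layer shapes.
HIDDEN = 4096      # model width
KV_DIM = 1024      # 8 key/value heads x 128 dimensions each
MLP_DIM = 14336    # feed-forward width
NUM_LAYERS = 32

# Assumption: adapters on every attention and MLP projection.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}


def total_lora_params(rank):
    """Total adapter parameters across all layers for a given rank."""
    per_layer = sum(lora_params(d_in, d_out, rank) for d_in, d_out in PROJECTIONS.values())
    return per_layer * NUM_LAYERS


trainable = total_lora_params(rank=16)    # rank 16 is an assumption
total = 8_072_204_288                     # total parameter count from the training log
print(f"{trainable:,} trainable ({trainable / total:.2%} of {total:,})")
```

Under these assumptions the count comes to 41,943,040, about 0.52% of the model, which is why the adapters fit in laptop‑class VRAM alongside the 4‑bit base weights.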

In [6]:
from qlora_fine_tuner import QLoRAFineTuner

try:
    QLoRAFineTuner().run()
except Exception as e:
    log.error(f"Fine-tuning failed: {e}", exc_info=True)

2025-09-14 23:20:26 - INFO     - qlora_fine_tuner - Starting fine-tuning...



==((====))==  Unsloth 2025.9.4: Fast Llama patching. Transformers: 4.56.1.
   \\   /|    NVIDIA GeForce RTX 4060 Laptop GPU. Num GPUs = 1. Max memory: 7.996 GB. Platform: Windows.
O^O/ \_/ \    Torch: 2.8.0+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
2025-09-14 23:20:28 - INFO     - accelerate.utils.modeling - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).


Unsloth 2025.9.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
Generating train split: 500 examples [00:00, 18781.92 examples/s]
Map: 100%|██████████| 500/500 [00:00<00:00, 1231.22 examples/s]
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 500 | Num Epochs = 5 | Total steps = 80
O^O/ \_/ \    Batch size per device = 8 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (8 x 4 x 1) = 32
 "-____-"     Trainable parameters = 41,943,040 of 8,072,204,288 (0.52% trained)


Unsloth: Will smartly offload gradients to save VRAM!


| Step | Training Loss |
|-----:|--------------:|
| 10 | 3.6862 |
| 20 | 2.4088 |
| 30 | 1.8357 |
| 40 | 1.6658 |
| 50 | 1.64 |
| 60 | 1.562 |
| 70 | 1.5564 |
| 80 | 1.5067 |


2025-09-14 23:29:35 - INFO     - qlora_fine_tuner - Model directory saved at: C:\Projects\MLE Courses\CarlosLao-homework\class7\homework\models\llama-3.1-8b-qlora-finetuned
2025-09-14 23:29:35 - INFO     - qlora_fine_tuner - Fine-tuning complete.
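
The step count in the training banner can be cross‑checked from the other numbers it reports. The helper below is a back‑of‑the‑envelope sketch, not the trainer's own logic; the exact rounding is decided by the training library, and rounding up per epoch is assumed here because it reproduces the logged total.

```python
import math


def total_optimizer_steps(num_examples, per_device_batch, grad_accum, num_gpus, epochs):
    """Return (effective batch size, total optimizer steps), rounding up per epoch."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs


# Values taken from the training banner above.
effective_batch, steps = total_optimizer_steps(
    num_examples=500, per_device_batch=8, grad_accum=4, num_gpus=1, epochs=5
)
print(f"Effective batch size = {effective_batch}, total steps = {steps}")
```

With 500 examples and an effective batch of 32, each epoch takes 16 optimizer steps, so 5 epochs give the 80 steps shown in the loss table.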


---
## **5. Evaluate Base vs Fine‑Tuned Models**
We use `ModelEvaluator` to:
- Load evaluation Q&A pairs from `Config.EVAL_QA_FILE`
- Generate answers from both the base and fine‑tuned models
- Score each generated answer against its reference answer with `difflib.SequenceMatcher`
- Save results to `Config.EVAL_OUTPUT_FILE`
- **With `print_results=True`, also log the full evaluation results**

This step measures how closely each model's answers match the reference answers, so any **gain from fine‑tuning** shows up directly in the logs.
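
The scoring itself can be sketched with the standard library alone. `similarity` and `compare_models` are hypothetical helpers, the lowercase/strip normalization is an assumption, and the sample answers are made up for illustration; `ModelEvaluator` may differ on all of these.

```python
from difflib import SequenceMatcher


def similarity(candidate, reference):
    """Character-level similarity in [0, 1] after light normalization (an assumption)."""
    a = candidate.strip().lower()
    b = reference.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def compare_models(rows):
    """Average similarity of base and fine-tuned answers against the references."""
    if not rows:
        raise ValueError("no evaluation rows")
    n = len(rows)
    base = sum(similarity(r["base"], r["reference"]) for r in rows) / n
    tuned = sum(similarity(r["fine_tuned"], r["reference"]) for r in rows) / n
    return {"base": base, "fine_tuned": tuned}


# Illustrative rows only; not results from this run.
rows = [
    {
        "reference": "LoRA trains low-rank adapter matrices.",
        "base": "I do not know.",
        "fine_tuned": "LoRA trains low-rank adapters.",
    }
]

scores = compare_models(rows)
print(f"base={scores['base']:.3f} fine_tuned={scores['fine_tuned']:.3f}")
```

`SequenceMatcher.ratio()` returns `2 * M / T`, where `M` is the number of matching characters and `T` is the combined length of both strings. It rewards surface overlap with the reference wording, so a correct answer phrased differently will score lower than a near‑verbatim one.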

In [8]:
from model_evaluator import ModelEvaluator

try:
    ModelEvaluator().run(print_results=True)
except Exception as e:
    log.error(f"Evaluation failed: {e}", exc_info=True)

2025-09-14 23:30:33 - INFO     - model_evaluator - Starting model evaluation...
2025-09-14 23:30:33 - INFO     - model_evaluator - Loaded 10 Q&A pairs.
2025-09-14 23:30:33 - INFO     - model_evaluator - Generating answers with model: unsloth/llama-3.1-8b-unsloth-bnb-4bit
==((====))==  Unsloth 2025.9.4: Fast Llama patching. Transformers: 4.56.1.
   \\   /|    NVIDIA GeForce RTX 4060 Laptop GPU. Num GPUs = 1. Max memory: 7.996 GB. Platform: Windows.
O^O/ \_/ \    Torch: 2.8.0+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
2025-09-14 23:30:34 - INFO     - accelerate.utils.modeling - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk)