# AskQE Pipeline - Qwen2.5-3B-Instruct Baseline

This notebook runs the complete AskQE pipeline using the **Qwen/Qwen2.5-3B-Instruct** model.
All results are saved in the `results Qwen3B baseline/` folder.

**Note:** When running in Colab, models are cached on Google Drive so that later runs skip the download.

## 0. Mount Google Drive & Configure Model Cache

In [None]:
import os
import sys

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    from google.colab import drive
    drive.mount('/content/drive')
    
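    # Keep Hugging Face downloads on Drive so they survive Colab session resets
    DRIVE_CACHE_DIR = '/content/drive/MyDrive/AskQE_Models_Cache'
    os.makedirs(DRIVE_CACHE_DIR, exist_ok=True)

    # These must be set before transformers / sentence_transformers are imported
    os.environ['HF_HOME'] = DRIVE_CACHE_DIR
    # TRANSFORMERS_CACHE is deprecated in recent transformers releases; kept for older versions
    os.environ['TRANSFORMERS_CACHE'] = os.path.join(DRIVE_CACHE_DIR, 'transformers')
    os.environ['SENTENCE_TRANSFORMERS_HOME'] = os.path.join(DRIVE_CACHE_DIR, 'sentence_transformers')

    print(f'Model cache directory: {DRIVE_CACHE_DIR}')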
else:
    print('Not running in Colab - using default cache directories')

## Setup - Install Dependencies

In [None]:
import subprocess

# Install the Python packages the pipeline scripts import
packages = ['transformers', 'torch', 'accelerate', 'nltk', 'sentence-transformers', 'sacrebleu', 'textstat']
subprocess.run([sys.executable, '-m', 'pip', 'install', '-q', *packages], check=True)

if IN_COLAB:
    if not os.path.exists('/content/askqe'):
        subprocess.run(['git', 'clone', 'https://github.com/Simone280802/AskQE_DNLP_2025-2026.git', '/content/askqe'], check=True)
    PROJECT_ROOT = '/content/askqe'
else:
    PROJECT_ROOT = os.getcwd()

RESULTS_DIR = os.path.join(PROJECT_ROOT, 'results Qwen3B baseline')
os.makedirs(RESULTS_DIR, exist_ok=True)

print(f'Project root: {PROJECT_ROOT}')
print(f'Results directory: {RESULTS_DIR}')

## Pre-download Models

Download every model the pipeline needs. On Colab they are cached on Google Drive for reuse; elsewhere they go to the default cache directories.

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, LongformerTokenizer, LongformerForSequenceClassification
from sentence_transformers import SentenceTransformer
import torch

print('=== Downloading/Loading Models ===')
print('This may take a while on first run, but will be cached on Drive for future use.\n')

MODELS = {
    'qwen': 'Qwen/Qwen2.5-3B-Instruct',
    'sbert': 'sentence-transformers/all-MiniLM-L6-v2',
    'answerability': 'potsawee/longformer-large-4096-answerable-squad2'
}

# Download Qwen model
print(f"[1/3] Loading {MODELS['qwen']}...")
tokenizer = AutoTokenizer.from_pretrained(MODELS['qwen'])
model = AutoModelForCausalLM.from_pretrained(MODELS['qwen'], torch_dtype=torch.bfloat16, device_map='auto')
# Release the model; the downloaded weights stay in the on-disk cache
del model, tokenizer
if torch.cuda.is_available():
    torch.cuda.empty_cache()
print('      ✓ Qwen cached')

# Download SBERT model
print(f"[2/3] Loading {MODELS['sbert']}...")
sbert_model = SentenceTransformer(MODELS['sbert'])
del sbert_model
print('      ✓ SBERT cached')

# Download Answerability model (Longformer)
print(f"[3/3] Loading {MODELS['answerability']}...")
try:
    ans_tokenizer = LongformerTokenizer.from_pretrained(MODELS['answerability'])
    ans_model = LongformerForSequenceClassification.from_pretrained(MODELS['answerability'])
    del ans_tokenizer, ans_model
    print('      ✓ Answerability model cached')
except Exception as e:
    print(f'      ⚠ Could not load: {e}')

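# The answerability model may have failed above, so do not claim everything is cached
print('\n=== Model pre-download step finished ===')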
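
To confirm that the cache was actually populated, one option is to measure how much data sits in the cache directory. The sketch below is not part of the pipeline; `dir_size_bytes` is a hypothetical helper, and the fallback path `~/.cache/huggingface` is assumed as the default when `HF_HOME` is unset. Symlinks are skipped so that files linked into snapshot folders are not counted twice.

```python
import os


def dir_size_bytes(path):
    """Return the total size in bytes of regular files under path (0 if it does not exist)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked files are only counted once
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


# Assumption: fall back to the usual default location when HF_HOME is not set
cache_dir = os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface"))
print(f"{cache_dir}: {dir_size_bytes(cache_dir) / 1e9:.2f} GB")
```

A size near zero after the cell above has run suggests the environment variables were set too late or the downloads went elsewhere.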

---

## 1. Question Generation (QG)

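Generate questions for the vanilla, atomic, and semantic pipelines. Each cell writes one JSONL file to `results Qwen3B baseline/QG/`. The atomic and semantic cells rely on the working directory set by the vanilla cell, so run the cells in order.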

In [None]:
os.chdir(os.path.join(PROJECT_ROOT, 'QG', 'code'))
output_path = os.path.join(RESULTS_DIR, 'QG', 'vanilla_qwen-3b.jsonl')
os.makedirs(os.path.dirname(output_path), exist_ok=True)
subprocess.run([sys.executable, '-u', 'qwen-3b.py', '--output_path', output_path, '--prompt', 'vanilla'], check=True)

In [None]:
output_path = os.path.join(RESULTS_DIR, 'QG', 'atomic_qwen-3b.jsonl')
subprocess.run([sys.executable, '-u', 'qwen-3b.py', '--output_path', output_path, '--prompt', 'atomic'], check=True)

In [None]:
output_path = os.path.join(RESULTS_DIR, 'QG', 'semantic_qwen-3b.jsonl')
subprocess.run([sys.executable, '-u', 'qwen-3b.py', '--output_path', output_path, '--prompt', 'semantic'], check=True)
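
Before moving on to QA, it can help to check that each QG output file is non-empty and parses cleanly. The following is a minimal sketch using only the standard library; `read_jsonl` is a hypothetical helper, and nothing is assumed about the field names inside each record.

```python
import json


def read_jsonl(path):
    """Load a JSON Lines file into a list, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

For example, `len(read_jsonl(output_path))` gives the number of records in the file written by the last QG cell, and inspecting the first record shows which fields the script emits.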

## 2. Question Answering (QA)

Run QA for all configurations in a single call:
- 3 pipelines (vanilla, atomic, semantic)
- 5 languages (es, fr, hi, tl, zh)
- 8 perturbations
- Source-based and back-translation-based (BT) answering
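
To get a feel for the size of this grid, the sketch below enumerates it as a full cross product. This is an assumption made for illustration: the `--run_all` script may organize its runs differently, so the count is best read as an upper bound. The perturbation names are hypothetical placeholders, since the real identifiers are defined in the repository.

```python
from itertools import product

pipelines = ["vanilla", "atomic", "semantic"]
languages = ["es", "fr", "hi", "tl", "zh"]
# Hypothetical placeholder names; the real perturbation identifiers live in the repository
perturbations = [f"perturbation_{i}" for i in range(1, 9)]
modes = ["source", "bt"]

# Assumption: every combination is treated as a separate run
configs = list(product(pipelines, languages, perturbations, modes))
print(len(configs))  # 3 * 5 * 8 * 2 = 240
```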

In [None]:
os.chdir(os.path.join(PROJECT_ROOT, 'QA', 'code'))
subprocess.run([sys.executable, '-u', 'qwen-3b.py', '--run_all'], check=True)

## 3. BioMQM Pipeline

Run question generation on the BioMQM data with the atomic prompt.

In [None]:
os.chdir(os.path.join(PROJECT_ROOT, 'biomqm', 'askqe'))
output_path = os.path.join(RESULTS_DIR, 'biomqm', 'askqe_qg_qwen3b.jsonl')
os.makedirs(os.path.dirname(output_path), exist_ok=True)
subprocess.run([sys.executable, '-u', 'qwen-3b.py', '--output_path', output_path, '--prompt', 'atomic'], check=True)

---

## 4. Evaluation Metrics

### 4.1 SBERT

In [None]:
os.chdir(os.path.join(PROJECT_ROOT, 'evaluation', 'sbert'))
output_file = os.path.join(RESULTS_DIR, 'evaluation', 'sbert', 'qwen-3b.csv')
os.makedirs(os.path.dirname(output_file), exist_ok=True)
subprocess.run([sys.executable, 'sbert.py', '--model', 'qwen-3b', '--output_file', output_file], check=True)

### 4.2 String Comparison

In [None]:
os.chdir(os.path.join(PROJECT_ROOT, 'evaluation', 'string-comparison'))
subprocess.run([sys.executable, 'string_comparison.py'], check=True)

### 4.3 BT-Score

In [None]:
os.chdir(os.path.join(PROJECT_ROOT, 'evaluation', 'bt-score'))
subprocess.run([sys.executable, 'run_bt.py'], check=True)

---

## 5. Desiderata Evaluation

In [None]:
os.chdir(os.path.join(PROJECT_ROOT, 'evaluation', 'desiderata'))
subprocess.run([sys.executable, 'i_avg_questions.py'], check=True)
subprocess.run([sys.executable, 'i_duplicate.py'], check=True)
subprocess.run([sys.executable, 'i_diversity.py'], check=True)
subprocess.run([sys.executable, 'q_answerability.py'], check=True)
subprocess.run([sys.executable, 'q_readability.py'], check=True)
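
The cells in this notebook call `os.chdir` before each script, which changes the working directory for the whole kernel. One alternative, assuming the scripts only need to start in the right directory, is to pass `cwd=` to `subprocess.run`. The sketch below shows the idea; `run_script` is a hypothetical helper and not part of the repository.

```python
import subprocess
import sys


def run_script(script, workdir, *args):
    """Run a Python script inside workdir without changing the notebook's own cwd."""
    cmd = [sys.executable, "-u", script, *args]
    # check=True raises CalledProcessError if the script exits with a non-zero status
    result = subprocess.run(cmd, cwd=workdir, check=True)
    return result.returncode
```

With this helper, the desiderata step could be written as a loop over the five script names, each called as `run_script(name, os.path.join(PROJECT_ROOT, 'evaluation', 'desiderata'))`.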

---

## Pipeline Complete!

All results are saved in the `results Qwen3B baseline/` folder.