This notebook shows a minimal, safe, and runnable QLoRA fine-tuning workflow for the BD-NSCA dataset (gatekeeper CSV). It is intentionally conservative so that it can double as a smoke test: training runs for only a handful of steps. Citations: Dettmers et al., 2023 (QLoRA); Zhang et al., 2020 (BD-NSCA dataset description).

Install command for Colab or a local environment: `pip install transformers accelerate bitsandbytes peft datasets`
(It is kept commented out to avoid running heavy installs during tests.)

In [None]:
# Reads a local CSV and writes train.jsonl using scripts/colab_helpers.convert_csv_to_jsonl
from pathlib import Path
import json
from scripts import colab_helpers
root = Path("../")  # notebook parent
csv_path = root / "annotation_pipeline" / "data" / "gatekeeper_dataset.csv"
out_dir = Path("../data")
out_dir.mkdir(parents=True, exist_ok=True)
if csv_path.exists():
    out_path = out_dir / "train.jsonl"
    colab_helpers.convert_csv_to_jsonl(str(csv_path), str(out_path))
    print("Wrote", out_path)
else:
    print("CSV not found; place gatekeeper_dataset.csv at", csv_path)
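The helper `colab_helpers.convert_csv_to_jsonl` is not shown in this notebook. As a rough illustration of what such a conversion typically involves, the sketch below writes each CSV row as one JSON object per line using only the standard library. The function name `csv_to_jsonl_sketch` is hypothetical, and the sketch makes no assumption about the actual gatekeeper CSV columns: it copies whatever header the file has.

```python
import csv
import json


def csv_to_jsonl_sketch(csv_path, out_path):
    """Write each CSV row as one JSON object per line; return the row count.

    Hypothetical stand-in for illustration, not the project's helper.
    """
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        # DictReader uses the first CSV line as the field names.
        for row in csv.DictReader(src):
            # ensure_ascii=False keeps Vietnamese text readable in the output.
            dst.write(json.dumps(row, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Returning the row count makes it easy to assert in a smoke test that the output is not empty.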

The next cell stands in for a minimal PEFT/QLoRA training loop; as written it is a no-op mock. For smoke tests, keep `max_steps` small (1 for a quick check, up to 60 for a short run). Substitute your own compute settings for full runs.

In [None]:
def mock_train(max_steps=1):
    print(f"Mock QLoRA training for {max_steps} steps (no-op)")

mock_train(max_steps=1)
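The mock above stands in for a real run. For orientation, a QLoRA setup with `transformers`, `peft`, and `bitsandbytes` usually looks roughly like the sketch below: load the base model in 4-bit NF4, then attach LoRA adapters. The function names, the LoRA rank and alpha, and the `target_modules` list are assumptions to be replaced with your own values. The heavy imports live inside the function so that defining it stays cheap during smoke tests.

```python
def smoke_test_settings(max_steps=1):
    """Conservative hyperparameters for a smoke test (illustrative values).

    The keys match transformers.TrainingArguments field names.
    """
    return {
        "max_steps": max_steps,
        "per_device_train_batch_size": 1,
        "gradient_accumulation_steps": 1,
        "learning_rate": 2e-4,
        "logging_steps": 1,
    }


def build_qlora_model(base_model_name):
    """Load a 4-bit base model and wrap it with LoRA adapters.

    Requires a CUDA GPU plus torch, transformers, peft, and bitsandbytes.
    """
    # Imported lazily so this cell can be defined without the heavy packages.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        # Use torch.float16 on GPUs without bfloat16 support.
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        base_model_name, quantization_config=bnb_config, device_map="auto"
    )
    model = prepare_model_for_kbit_training(model)
    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # assumption: Llama-style layer names
        task_type="CAUSAL_LM",
    )
    return get_peft_model(model, lora_config)
```

The settings dictionary can be unpacked into `TrainingArguments(output_dir=..., **smoke_test_settings(1))` once a real trainer replaces the mock.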

After training, save the adapter and optionally export it to GGUF/GGML for Ollama or other local inference. See IMPLEMENTATION_CHECKLIST.md for the commands.

Two Vietnamese prompts from the BD-NSCA report:
- "Hãy đóng vai một NPC..." ("Play the role of an NPC...")
- "Sau khi người chơi nói, hãy trả lời..." ("After the player speaks, reply...")
Include examples and the expected behavior for each; keep prompts short for tests.

Groq can be used as an alternative "teacher" to generate prompts and responses. In Colab, supply your API key through an environment variable (do not hard-code secrets):

```
import os
from getpass import getpass
if not os.environ.get("GROQ_API_KEY"):
    os.environ["GROQ_API_KEY"] = getpass("GROQ_API_KEY: ")  # prompted at runtime, never stored in the notebook
```

Example usage (do not execute network calls in tests):

```
from scripts.colab_helpers import call_groq_api
resp = call_groq_api("A short NPC prompt", model_id='llama-3.1-8b', max_tokens=64)
print(resp)
```

Be mindful of rate limits and cost; use `batch_generate_with_groq()` with caching for larger generation runs.
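
`batch_generate_with_groq()` lives in the project helpers and is not shown here. To illustrate the caching idea in isolation, the sketch below wraps any generation function with a JSON file cache keyed by a hash of the prompt, so a repeated prompt never triggers a second request. `cached_batch_generate`, `generate_fn`, and the cache file layout are hypothetical names for illustration, not the helper's actual interface; responses are assumed to be JSON-serializable (for example, plain strings).

```python
import hashlib
import json
from pathlib import Path


def cached_batch_generate(prompts, generate_fn, cache_path):
    """Return one response per prompt, calling generate_fn only on cache misses."""
    cache_file = Path(cache_path)
    cache = {}
    if cache_file.exists():
        cache = json.loads(cache_file.read_text(encoding="utf-8"))

    results = []
    try:
        for prompt in prompts:
            key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
            if key not in cache:
                # The only place a (possibly paid, rate-limited) call is made.
                cache[key] = generate_fn(prompt)
            results.append(cache[key])
    finally:
        # Persist even if a call fails, so completed work is not paid for twice.
        cache_file.write_text(
            json.dumps(cache, ensure_ascii=False), encoding="utf-8"
        )
    return results
```

In tests, `generate_fn` can be a local stub, which keeps the run free of network calls; in a real run it could be a small wrapper around `call_groq_api`.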

In [None]:
# import requests
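# Ollama streams newline-delimited JSON by default; "stream": False returns one JSON object so resp.json() works.
# resp = requests.post("http://localhost:11434/api/generate", json={"model": "your-adapter-model", "prompt": "Xin chào", "stream": False})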
# print(resp.json())
print("Inference demo placeholder")