# Annotated LoRA run (walkthrough)

This short notebook documents a small LoRA smoke run for the project, with annotated steps, commands, and placeholders for screenshots you may capture while running in Colab. It is written so that a beginner can follow along. In Colab, uncomment the install cell and run it; otherwise, run the equivalent shell commands locally.

## 1) Install lightweight deps (Colab)

Uncomment and run the cell below in Colab to install minimal dependencies for a LoRA smoke run. In a local dev environment you can instead run `pip install -r requirements.txt` and `pip install peft` as needed.

In [None]:
# In Colab, uncomment and run this to install peft (LoRA) and any light deps
# %pip install -q peft transformers datasets pandas
print('If running locally, ensure requirements.txt + peft are installed')
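
Before moving on, it can help to confirm which of these packages are actually visible to the current kernel. The cell below is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of the repository, and the package list simply mirrors the install line above.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Same packages as the %pip line in the install cell.
for name, version in installed_versions(
    ["peft", "transformers", "datasets", "pandas"]
).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` should be installed before the LoRA step in section 3.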

## 2) Prepare a tiny dataset (smoke)
We use the repository's smoke generator to create a small CSV you can use to validate the training pipeline without long runs or GPUs. Run this locally or in Colab.

In [None]:
# Run the smoke generator to produce a small CSV
!python scripts/generate_synthetic_smoke.py
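
To check that the generator produced something usable, a quick look at the header and row count is usually enough. The sketch below assumes the CSV is written to the current working directory under the name used in the training commands in section 3 (`synthetic_wifi_5ghz_outdoor_smoke.csv`); adjust the path if your generator writes elsewhere. `summarize_csv` is a hypothetical helper, and no column names are assumed.

```python
import csv
from pathlib import Path


def summarize_csv(path, preview_rows=3):
    """Return column names, data-row count, and a short preview, or None if the file is missing."""
    path = Path(path)
    if not path.exists():
        return None
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # first line is treated as the header
        rows = list(reader)        # the smoke file is small, so reading it all is fine
    return {"columns": header, "num_rows": len(rows), "preview": rows[:preview_rows]}


# Assumed output location; see the note above.
summary = summarize_csv("synthetic_wifi_5ghz_outdoor_smoke.csv")
if summary is None:
    print("CSV not found; run the generator cell first or fix the path")
else:
    print(summary)
```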

## 3) Run a LoRA smoke command (dry run first)
First run with `--dry-run` to confirm that dataset preparation and trainer construction succeed. Then run with `--mode lora` to attach real adapters and train for one brief epoch.

Command (dry-run):

In [None]:
# Dry-run: prepares dataset and skips heavy imports/training
!python scripts/finetune_gemma_from_csv.py --csv synthetic_wifi_5ghz_outdoor_smoke.csv --dry-run --max-rows 50

If the dry-run prints a dataset summary and exits, it's safe to proceed.

Next, run the real LoRA smoke training below (one short epoch). This attaches LoRA adapters and trains them; a GPU is recommended for speed, but a small example like this can run on CPU.

In [None]:
# LoRA smoke training (small)
# For real experiments, increase --num-epochs and remove --max-rows
!python scripts/finetune_gemma_from_csv.py --csv synthetic_wifi_5ghz_outdoor_smoke.csv --mode lora --lora-r 8 --lora-alpha 32 --lora-dropout 0.05 --num-epochs 1 --per-device-batch-size 2 --max-rows 200
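
For intuition about `--lora-r 8` and `--lora-alpha 32`: in standard LoRA, each adapted weight matrix of shape `d_out x d_in` gets two small trainable matrices (`r x d_in` and `d_out x r`), and their product is scaled by `alpha / r`. The sketch below works through that arithmetic. It assumes the script passes these values through to a standard LoRA implementation unchanged, and the 2048 x 2048 layer size is purely hypothetical, not taken from the Gemma configuration.

```python
def lora_scaling(r, alpha):
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / r


def lora_param_count(d_in, d_out, r):
    """Trainable parameters added to one d_out x d_in weight: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


r, alpha = 8, 32          # values from the command above
d_in = d_out = 2048       # hypothetical layer size, for illustration only

frozen = d_in * d_out
added = lora_param_count(d_in, d_out, r)

print(f"scaling alpha/r = {lora_scaling(r, alpha)}")
print(f"adapter params per layer: {added} vs {frozen} frozen ({added / frozen:.2%})")
```

Under these assumptions the adapter adds well under 1% of the layer's parameters, which is why LoRA fits in far less memory than full fine-tuning.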

## 4) Annotate screenshots (placeholders)
If running in Colab, capture screenshots of model-load memory usage, trainer creation, and the final saved checkpoint. Place them in `site/en/gemma/docs/core/screenshots/` and reference them here:

![model-load](/site/en/gemma/docs/core/screenshots/model_load.png)
![trainer-created](/site/en/gemma/docs/core/screenshots/trainer_created.png)
![checkpoint-saved](/site/en/gemma/docs/core/screenshots/checkpoint_saved.png)

Replace those files with real screenshots when available.

## 5) Troubleshooting notes
- If you run out of memory, reduce `--per-device-batch-size` and try `--mode lora`.
- If LoRA adapter attach fails, ensure `peft` is installed and compatible with your `transformers` version.
- For QLoRA, ensure `bitsandbytes` and `peft` are installed in a CUDA-enabled runtime.
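
The checks in the notes above can be partly automated. The following is a minimal sketch, not part of the repository: `qlora_readiness` is a hypothetical helper that reports whether `peft`, `bitsandbytes`, and `torch` are importable and, when `torch` is present, whether it can see a CUDA device.

```python
import importlib.util


def qlora_readiness():
    """Report which QLoRA-related packages are importable and whether CUDA is visible."""
    report = {
        name: importlib.util.find_spec(name) is not None
        for name in ("peft", "bitsandbytes", "torch")
    }
    report["cuda"] = False
    if report["torch"]:
        try:
            import torch  # imported lazily so the check also works without torch
            report["cuda"] = bool(torch.cuda.is_available())
        except ImportError:
            report["torch"] = False  # found on disk but failed to import
    return report


for key, ok in qlora_readiness().items():
    print(f"{key}: {'ok' if ok else 'missing'}")
```

This only confirms that the packages are present; it does not verify that the installed `peft` and `transformers` versions are compatible with each other.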