# Parameter-Efficient Fine-Tuning (PEFT)

PEFT lets you adapt a large model to a new task by training only a small set of *adapter* parameters instead of the full model. This saves GPU memory, disk space, and, most importantly, time, while often matching the performance of full fine‑tuning.

* **Cost‑effective:** only a few million parameters are updated instead of
  billions.
* **Faster iteration:** shorter training cycles enable rapid experimentation.
* **Modularity:** you can keep one base checkpoint and swap in adapters for
  different tasks or domains.
* **Easy deployment:** the frozen base model stays intact; share or combine
  lightweight adapters as needed.

In this notebook you will learn how to:

1. Load a pre‑trained model.
2. Attach a [LoRA](https://arxiv.org/abs/2106.09685) adapter.
3. Train the adapter on your dataset.
4. Save and reuse the adapter without touching the base weights.


```python
from peft import get_peft_model, LoraConfig, TaskType
from transformers import AutoModelForCausalLM

# Load the pre-trained base model.
model = AutoModelForCausalLM.from_pretrained("gpt2")

# LoRA hyperparameters: rank-16 update matrices, scaled by lora_alpha / r = 2.
lora_config = LoraConfig(
    r=16,
    task_type=TaskType.CAUSAL_LM,
    lora_alpha=32,
    lora_dropout=0.05,
)

# Wrap the model so that only the injected LoRA weights are trainable.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

```
trainable params: 589,824 || all params: 125,029,632 || trainable%: 0.4717
```
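The printed numbers can be reproduced by hand. The sketch below assumes PEFT's default LoRA target for GPT-2, the fused attention projection `c_attn` (768 inputs, 2,304 outputs, one per transformer block), and derives the base parameter count by subtracting the adapter parameters from the total reported above. Each adapted layer gains two matrices, `A` of shape `r × in_features` and `B` of shape `out_features × r`.

```python
# GPT-2 (small) dimensions; the choice of c_attn as the only target is an assumption.
hidden_size = 768
n_layers = 12
r = 16

in_features = hidden_size        # c_attn input width
out_features = 3 * hidden_size   # fused query/key/value output width

# LoRA adds A (r x in_features) and B (out_features x r) to each targeted layer.
lora_per_layer = r * (in_features + out_features)
trainable = n_layers * lora_per_layer

base = 124_439_808               # reported total minus the adapter parameters
total = base + trainable

print(
    f"trainable params: {trainable:,} || all params: {total:,} "
    f"|| trainable%: {100 * trainable / total:.4f}"
)
```

Doubling `r` doubles the adapter size, so this arithmetic is a quick way to estimate the cost of a configuration before instantiating it.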


If you are using one of PhyAGI's composable models such as `MixFormerSequential`, you need to tell PEFT **which internal modules should receive LoRA layers**, because PEFT only ships default targets for architectures it already knows. The snippet below shows a sensible default for decoder‑style transformers; adjust the `target_modules` list if your architecture uses different module names.

**Tip:** Iterate over `model.named_modules()` and look for the large `Linear` (projection) layers; the modules that hold most of the model's parameters are usually the best candidates for `target_modules`.
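As a rough illustration of that tip, the helper below ranks the `nn.Linear` layers of any PyTorch module by parameter count. `largest_linear_modules` and `ToyBlock` are hypothetical names introduced only for this sketch; the toy model stands in for a real architecture so the example runs without downloading weights.

```python
import torch.nn as nn


def largest_linear_modules(model: nn.Module, top_k: int = 5):
    """Return (name, parameter_count) for the top_k largest nn.Linear layers."""
    sizes = [
        (name, sum(p.numel() for p in module.parameters()))
        for name, module in model.named_modules()
        if isinstance(module, nn.Linear)
    ]
    # Python's sort is stable, so equally sized layers keep their model order.
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top_k]


class ToyBlock(nn.Module):
    """Hypothetical stand-in for a transformer block (never run forward here)."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.Wqkv = nn.Linear(dim, 3 * dim)   # fused query/key/value projection
        self.out_proj = nn.Linear(dim, dim)   # attention output projection
        self.norm = nn.LayerNorm(dim)         # not a Linear layer, so it is skipped


toy = nn.Sequential(ToyBlock(), ToyBlock())
ranked = largest_linear_modules(toy, top_k=4)
for name, count in ranked:
    print(f"{name:<12} {count:>8,}")
```

`target_modules` matches on the module-name suffix, so a name such as `0.Wqkv` would be targeted with `"Wqkv"`. Some architectures use other layer classes for their projections (Hugging Face's GPT‑2 uses `Conv1D`, for example), in which case the `isinstance` check would need to be widened.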


```python
from peft import get_peft_model, LoraConfig, TaskType
from phyagi.models.mixformer_sequential.modeling_mixformer_sequential import (
    MixFormerSequentialConfig,
    MixFormerSequentialForCausalLM,
)

# Build a randomly initialised model from the default configuration.
config = MixFormerSequentialConfig()
model = MixFormerSequentialForCausalLM(config)

lora_config = LoraConfig(
    r=16,
    task_type=TaskType.CAUSAL_LM,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["Wqkv"],  # inject LoRA into every module whose name ends in "Wqkv"
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

```
trainable params: 1,310,720 || all params: 356,269,184 || trainable%: 0.3679
```


Once the adapter is injected, you can fit it with any training loop:

* `HfTrainer`: quick experiments on a single GPU.
* `PlTrainer`: clean multi‑GPU / TPU scaling.
* `DsTrainer`: when you need to squeeze every last bit of memory.

Because only the LoRA weights are trainable, **use a higher learning rate** than you would for full fine‑tuning (e.g. `2e-4`) and a shorter schedule; adapters typically converge quickly.
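To make the "only the LoRA weights are trainable" point concrete, here is a minimal from-scratch sketch in plain PyTorch. It is not PEFT's implementation: `LoRALinear` is a hypothetical class, the Gaussian initialisation of `lora_A` is an illustrative choice, and the data is random. It shows the optimizer being built over the trainable parameters only, with the learning rate suggested above.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the base weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        # B starts at zero, so the wrapped layer initially matches the base layer.
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


torch.manual_seed(0)
layer = LoRALinear(nn.Linear(64, 64))
frozen_before = layer.base.weight.clone()

# Hand only the trainable (LoRA) parameters to the optimizer.
trainable = [p for p in layer.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=2e-4)

# One optimisation step on random data.
x, target = torch.randn(8, 64), torch.randn(8, 64)
loss = nn.functional.mse_loss(layer(x), target)
loss.backward()
optimizer.step()

print("base weights unchanged:", torch.equal(layer.base.weight, frozen_before))
```

With a PEFT-wrapped model the same filter, `p for p in model.parameters() if p.requires_grad`, selects the adapter weights.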

**Note**: Training never modifies the base checkpoint. After training, only the LoRA weights (≈ 5 MB) are saved, so you can distribute or combine adapters without redistributing the original checkpoint.

For a deeper look at other PEFT methods (IA3, prefix tuning, prompt tuning, etc.), see the official [PEFT documentation](https://huggingface.co/docs/peft).