### let's understand lora (low-rank adaptation)

> **why do we need lora?**
- when we fine-tune large language models, it's incredibly expensive to update all parameters.
- lora provides a memory-efficient alternative that achieves similar results while only training a small fraction of parameters.
- traditional fine-tuning requires storing and updating the entire model, which is impractical for most users without expensive hardware.
- lora introduces a clever "bypass" solution that keeps the original pre-trained weights frozen and only trains small adapter modules.

> **what is lora?**
- lora stands for low-rank adaptation, a technique that makes fine-tuning large models more accessible.
- instead of modifying all weights directly, lora represents each weight update as the product of two much smaller low-rank matrices.
- this dramatically reduces the number of trainable parameters (the original lora paper reports up to 10,000x fewer for gpt-3 175b) while keeping performance close to full fine-tuning.
- example: instead of training billions of parameters in a large model, lora might only train a few million parameters.
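
to make the reduction concrete, here is a minimal sketch of the arithmetic for a single weight matrix. the helper name `lora_param_counts` and the layer size (4096 x 4096, rank 8) are assumptions chosen only for illustration, not figures from any particular model.

```python
def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one d x k weight matrix."""
    full = d * k        # full fine-tuning: every entry of the matrix is trainable
    lora = r * (d + k)  # lora: a d x r up-projection plus an r x k down-projection
    return full, lora


# hypothetical layer size and rank, for illustration only
full, lora = lora_param_counts(d=4096, k=4096, r=8)
print(full, lora, full / lora)  # 16777216 65536 256.0
```

the ratio for one matrix is d * k / (r * (d + k)). reductions over a whole model can be much larger because lora is usually applied to only some of the weight matrices, while everything else stays frozen.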

> **how does lora work?**
- lora freezes the pre-trained model weights completely.
- for each weight matrix we want to adapt, lora adds a parallel "bypass" connection.
- this bypass consists of two smaller matrices: a down-projection and an up-projection.
- the original path: input → original frozen weight → output
- the lora path: input → down-projection → up-projection → output
- the final output is the sum of both paths.
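
the two paths can be sketched with numpy as below. this is a toy illustration, not a library implementation: the sizes, the function name `lora_forward`, and the `scale` argument are assumptions (the original paper scales the bypass by alpha / r; here it is a plain multiplier that defaults to 1).

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 12, 2                 # hypothetical sizes, with r much smaller than d_in and d_out

W = rng.normal(size=(d_out, d_in))         # original weight, frozen
A = rng.normal(scale=0.1, size=(r, d_in))  # down-projection: d_in -> r
B = rng.normal(scale=0.1, size=(d_out, r)) # up-projection: r -> d_out (nonzero here, as if already trained)


def lora_forward(x, W, A, B, scale=1.0):
    """Sum the frozen path and the low-rank bypass path."""
    original = W @ x       # input -> original frozen weight -> output
    bypass = B @ (A @ x)   # input -> down-projection -> up-projection -> output
    return original + scale * bypass


x = rng.normal(size=d_in)
y = lora_forward(x, W, A, B)
print(y.shape)  # (12,)
```

computing `B @ (A @ x)` rather than `(B @ A) @ x` keeps the intermediate result at size r, so the bypass never materializes a full d_out x d_in matrix.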

> **three key steps**
1. decompose each weight matrix update into two smaller matrices (a down-projection and an up-projection)
2. initialize these matrices so their product is zero, typically with a zero up-projection and a small random down-projection (ensuring no change to behavior initially)
3. train only these small matrices while keeping the original weights frozen
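
the three steps can be sketched on a toy problem as follows. everything here is an assumption made for illustration: the sizes, the single input and target, the squared-error loss, the learning rate, and the hand-written gradients. a real setup would use an autograd framework and many training examples.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2                   # hypothetical sizes

W = rng.normal(size=(d_out, d_in))         # pre-trained weight, kept frozen
W_before = W.copy()

# steps 1 and 2: two small factors whose product B @ A starts at zero
A = rng.normal(scale=0.1, size=(r, d_in))  # down-projection, small random values
B = np.zeros((d_out, r))                   # up-projection, all zeros
assert np.allclose(B @ A, 0.0)             # the bypass changes nothing at the start

x = rng.normal(size=d_in)                  # one toy input
t = rng.normal(size=d_out)                 # one toy target


def loss_and_error(A, B):
    """Squared-error loss of the combined output, plus the error vector."""
    y = W @ x + B @ (A @ x)
    err = y - t
    return 0.5 * float(err @ err), err


loss_before, err = loss_and_error(A, B)

# step 3: one gradient-descent step on A and B only; W receives no update
lr = 0.1
grad_B = np.outer(err, A @ x)    # d(loss)/dB
grad_A = np.outer(B.T @ err, x)  # d(loss)/dA, zero on this first step because B is zero
A = A - lr * grad_A
B = B - lr * grad_B

loss_after, _ = loss_and_error(A, B)
print(loss_after < loss_before)
```

on the very first step only the up-projection moves, because the gradient for the down-projection passes through the (still zero) up-projection. from the second step on, both factors receive nonzero gradients.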

> **why is this efficient?**
- the rank of these matrices (r) is tiny compared to the original dimensions, for example 8 versus several thousand.
- for a d x k weight matrix, the two factors hold r x (d + k) trainable parameters instead of d x k, which is far fewer than the original model.
- storage requirements are reduced significantly, often enabling fine-tuning on consumer hardware.
- for inference, the lora matrices can be merged into the original weights, so the adapted model adds no extra latency.
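
a small numpy check of the merge, under assumed toy sizes and random stand-ins for trained matrices: folding the product of the two factors into the frozen weight gives the same output as running the two paths separately.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2            # hypothetical sizes

W = rng.normal(size=(d_out, d_in))  # frozen pre-trained weight
A = rng.normal(size=(r, d_in))      # stand-in for a trained down-projection
B = rng.normal(size=(d_out, r))     # stand-in for a trained up-projection
x = rng.normal(size=d_in)

y_two_paths = W @ x + B @ (A @ x)   # frozen path plus bypass, as during training

W_merged = W + B @ A                # fold the adapter into a single matrix
y_merged = W_merged @ x             # one matrix multiply at inference time

print(np.allclose(y_two_paths, y_merged))  # True

# subtracting the same product recovers the base weight (up to float rounding),
# which is one way to swap in a different adapter later
W_restored = W_merged - B @ A
print(np.allclose(W_restored, W))          # True
```

after merging, the layer has the same shape and cost as the original one, which is why the merged model runs at the same speed as the base model.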

> **benefits of lora**
- dramatically reduced memory requirements for fine-tuning
- faster training times
- lower computational costs
- ability to switch between different adaptations quickly
- preserves the general knowledge of the base model while adding specialized capabilities