## Fine-tune Mistral models using 🤗 [`peft`](https://github.com/huggingface/peft) adapters, [`transformers`](https://github.com/huggingface/transformers) & [`bitsandbytes`](https://github.com/TimDettmers/bitsandbytes)


This notebook follows these tutorials:

https://github.com/huggingface/notebooks/blob/main/sagemaker/28_train_llms_with_qlora/sagemaker-notebook.ipynb

https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/fine-tune-llms-in-2024-with-trl.ipynb

To understand how LoRA works, see:

https://huggingface.co/docs/peft/main/en/conceptual_guides/lora

https://blog.dailydoseofds.com/p/full-model-fine-tuning-vs-lora-vs

## 1-Introduction

In this SageMaker example, we are going to learn how to apply [QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)
to fine-tune a Mistral LLM. QLoRA is an efficient fine-tuning technique that quantizes a pretrained language model to 4 bits, freezes it, and attaches small “Low-Rank Adapters”, which are the only weights that get trained. According to the paper, this reduces memory usage enough to fine-tune a model with 65 billion parameters on a single 48 GB GPU while preserving the task performance of full 16-bit fine-tuning.
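
As a rough, back-of-the-envelope illustration of why 4-bit quantization matters, the sketch below estimates the memory needed for the model weights alone. The helper `weight_memory_gib` and the round parameter counts (7e9 as a stand-in for a 7B-class model, 65e9 for the 65B case mentioned above) are assumptions made for illustration. The estimate deliberately ignores quantization constants, activations, adapter weights, and optimizer state, so real usage will be higher.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    bytes_total = n_params * bits_per_param / 8
    return bytes_total / 1024**3


# Round parameter counts, chosen for illustration only.
for n_params in (7e9, 65e9):
    mem_16bit = weight_memory_gib(n_params, 16)
    mem_4bit = weight_memory_gib(n_params, 4)
    print(
        f"{n_params / 1e9:.0f}B params: "
        f"16-bit ~{mem_16bit:.1f} GiB, 4-bit ~{mem_4bit:.1f} GiB"
    )
```

Going from 16 bits to 4 bits per weight cuts the weight footprint by a factor of four, which is what brings a 65B-parameter model within reach of a single GPU.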

In our example, we are going to leverage Hugging Face [Transformers](https://huggingface.co/docs/transformers/index), [Accelerate](https://huggingface.co/docs/accelerate/index), and [PEFT](https://github.com/huggingface/peft).

In detail, you will learn how to:
1. Set up the development environment
2. Load and prepare the dataset
3. Fine-tune a Mistral LLM with QLoRA on Amazon SageMaker

### Quick intro: PEFT or Parameter Efficient Fine-tuning

[PEFT](https://github.com/huggingface/peft), or Parameter Efficient Fine-tuning, is an open-source library from Hugging Face that adapts pre-trained language models (PLMs) to downstream applications without fine-tuning all of the model's parameters. PEFT currently includes techniques for:

- (Q)LoRA: [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/pdf/2106.09685.pdf)
- Prefix Tuning: [P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks](https://arxiv.org/pdf/2110.07602.pdf)
- P-Tuning: [GPT Understands, Too](https://arxiv.org/pdf/2103.10385.pdf)
- Prompt Tuning: [The Power of Scale for Parameter-Efficient Prompt Tuning](https://arxiv.org/pdf/2104.08691.pdf)
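
To make the LoRA idea from the list above concrete, here is a minimal pure-Python sketch of a single LoRA-adapted linear layer, under stated assumptions. It is not how the `peft` library implements it. The function names (`matvec`, `lora_forward`, `lora_param_counts`) and the toy dimensions are hypothetical. The frozen weight `W` is left untouched, and only the two small matrices `A` and `B` would be trained; `B` starts at zero so that the adapted layer initially behaves exactly like the base layer.

```python
import random


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """Compute h = W x + (alpha / r) * B (A x), with W frozen."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_counts(d_out, d_in, r):
    """Return (parameters in full W, trainable parameters in A and B)."""
    return d_out * d_in, r * (d_in + d_out)


random.seed(0)
d_out, d_in, r, alpha = 6, 4, 2, 4  # toy sizes, for illustration only
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in
B = [[0.0] * r for _ in range(d_out)]  # d_out x r, zero-initialised
x = [1.0, 2.0, 3.0, 4.0]

# With B = 0 the adapter contributes nothing, so output equals the base layer.
print(lora_forward(W, A, B, x, alpha, r) == matvec(W, x))

# Trainable-parameter share for an illustrative 4096 x 4096 layer at rank 8.
full, trainable = lora_param_counts(4096, 4096, 8)
print(f"full: {full:,}  LoRA: {trainable:,}  ({100 * trainable / full:.2f}%)")
```

The parameter count is the point of the exercise: the adapter for one such layer trains well under one percent of the weights that full fine-tuning would update.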


## 2-Environment Setup


Before you begin, make sure you have the necessary environment and dependencies set up. This notebook was tested in Amazon SageMaker Studio with the following configuration:

- **Kernel:** PyTorch 2.1.0, Python 3.10
- **Instance Type:** ml.g5.12xlarge

<!-- %pip install -q -U bitsandbytes==0.41.3
%pip install -q -U transformers==4.36.2 peft==0.7.1 accelerate==0.25.0 trl==0.7.7
%pip install -q -U datasets==2.16.0 wandb==0.16.1 sentencepiece==0.1.99 openpyxl -->
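
One way to confirm the environment before training is to compare installed package versions against the ones this notebook was tested with. The sketch below uses only the standard library's `importlib.metadata`; the `PINNED` dictionary and the `check_versions` helper are hypothetical names introduced here, and the version pins are the ones used in this notebook's install commands.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions this notebook was tested with (a subset, for illustration).
PINNED = {
    "bitsandbytes": "0.41.3",
    "transformers": "4.36.2",
    "peft": "0.7.1",
    "accelerate": "0.25.0",
    "trl": "0.7.7",
    "datasets": "2.16.0",
}


def check_versions(pinned):
    """Map each package to (installed version or None, matches pin?)."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (installed, installed == expected)
    return report


for name, (installed, ok) in check_versions(PINNED).items():
    status = "ok" if ok else "MISMATCH"
    print(f"{name:<14} pinned={PINNED[name]:<8} installed={installed}  {status}")
```

A mismatch is not necessarily an error, but it is worth noting when results differ from those shown in the notebook.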
