# [xturing](https://github.com/stochasticai/xturing) - DistilGPT-2 efficient fine-tuning tutorial

This tutorial shows how easy it is to fine-tune a model with xturing, using DistilGPT-2 with LoRA as the example. If you have access to A100 80GB GPUs, we recommend starting with the LLaMA notebook instead: LLaMA is a much stronger model and its results are noticeably better.

## 1. Install the `xturing` library

In [None]:
!pip install xturing --upgrade

## 2. Download and unzip the dataset

In [None]:
!wget https://d33tr4pxdm6e2j.cloudfront.net/public_content/tutorials/datasets/alpaca_data.zip
!unzip alpaca_data.zip

## 3. Load the dataset and initialize the model

In [1]:
from xturing.datasets.instruction_dataset import InstructionDataset
from xturing.models import BaseModel

# Load the dataset extracted in step 2 (adjust the path if you unzipped it elsewhere)
instruction_dataset = InstructionDataset("./alpaca_data")
# Initialize DistilGPT-2 with LoRA adapters
model = BaseModel.create("distilgpt2_lora")

2023-03-27 16:00:54,235 | DEBUG | xturing.models.causal 44 | Finetuning parameters: learning_rate=0.003 gradient_accumulation_steps=1 batch_size=16 weight_decay=0.01 warmup_steps=50 eval_steps=5000 save_steps=5000 max_length=512 num_train_epochs=3 logging_steps=10 max_grad_norm=2.0 save_total_limit=4 optimizer_name='adamw' output_dir='saved_model'
2023-03-27 16:00:54,236 | DEBUG | xturing.models.causal 45 | Generation parameters: penalty_alpha=0.6 top_k=0 max_new_tokens=256 do_sample=True top_p=0.92


trainable params: 147456 || all params: 82060032 || trainable%: 0.17969283755580304
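
The counts reported above can be checked by hand. The sketch below assumes LoRA adapters of rank 8 on the fused `c_attn` attention projection (768 to 2304 features) in each of DistilGPT-2's 6 transformer blocks. The rank and the target module are assumptions that happen to reproduce the reported number; the log itself does not state them.

```python
# Assumed LoRA setup (not shown in the log): rank-8 adapters on c_attn in every block.
hidden_size = 768              # DistilGPT-2 embedding width
c_attn_out = 3 * hidden_size   # fused query/key/value projection
num_layers = 6                 # DistilGPT-2 transformer blocks
rank = 8                       # assumed LoRA rank

# Each adapter adds two small matrices: A (hidden x rank) and B (rank x out).
lora_per_layer = hidden_size * rank + rank * c_attn_out
trainable_params = lora_per_layer * num_layers

all_params = 82_060_032        # "all params" value from the log above
frozen_params = all_params - trainable_params
trainable_pct = 100 * trainable_params / all_params

print(f"trainable params: {trainable_params}")
print(f"frozen params:    {frozen_params}")
print(f"trainable%:       {trainable_pct:.4f}")
```

Under these assumptions only about 0.18% of the weights receive gradients, which is why this fine-tuning run is so cheap.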


## 4. Start the fine-tuning

Before fine-tuning, you can adjust several training parameters. This example changes `batch_size` from its default of 16 to 64 and `num_train_epochs` from 3 to 1.

In [2]:
finetuning_config = model.finetuning_config()

finetuning_config.batch_size = 64
finetuning_config.num_train_epochs = 1

print(f"Finetuning parameters: {finetuning_config}")

Finetuning parameters: learning_rate=0.003 gradient_accumulation_steps=1 batch_size=64 weight_decay=0.01 warmup_steps=50 eval_steps=5000 save_steps=5000 max_length=512 num_train_epochs=1 logging_steps=10 max_grad_norm=2.0 save_total_limit=4 optimizer_name='adamw' output_dir='saved_model'


In [3]:
# Fine-tune the model on the instruction dataset
model.finetune(dataset=instruction_dataset)

Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
You are using a CUDA device ('NVIDIA A100-SXM4-80GB') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name          | Type                 | Params
-------------------------------------------------------
0 | pytorch_model | PeftModelForCausalLM | 82.1 M
-------------------------------------------------------
147 K     Trainable params
81.9 M    Non-trainable params
82.1 M    Total params
328.240   Total estimated model params size (MB)


Epoch 0:   0%|          | 0/813 [00:00<?, ?it/s] 

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch 0:  75%|███████▌  | 612/813 [02:05<00:41,  4.87it/s, v_num=3, loss=0.977]

Token indices sequence length is longer than the specified maximum sequence length for this model (1469 > 1024). Running this sequence through the model will result in indexing errors


Epoch 0: 100%|██████████| 813/813 [02:44<00:00,  4.93it/s, v_num=3, loss=0.598]

`Trainer.fit` stopped: `max_epochs=1` reached.


Epoch 0: 100%|██████████| 813/813 [02:45<00:00,  4.92it/s, v_num=3, loss=0.598]
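
The 813 steps in the progress bar follow from the dataset size and the batch size. The sketch below assumes the Alpaca dataset holds 52,002 examples (the size of the published dataset; the notebook itself does not print it) and that the last, partially filled batch is kept.

```python
import math

num_examples = 52_002   # assumed size of the Alpaca dataset
batch_size = 64         # value set in finetuning_config above
num_train_epochs = 1

# One optimizer step per batch; the final partial batch still counts as a step.
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_train_epochs

# Throughput implied by the final progress bar (2 min 45 s for the epoch).
elapsed_seconds = 2 * 60 + 45
throughput = steps_per_epoch / elapsed_seconds

# For comparison: the default configuration (batch_size=16, 3 epochs).
default_total_steps = math.ceil(num_examples / 16) * 3

print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"throughput:      {throughput:.2f} it/s")
print(f"default total:   {default_total_steps}")
```

With the defaults the same run would need roughly twelve times as many optimizer steps, which is why the example shortens it.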


## 5. Generate an output text with the fine-tuned model

You can also customize the parameters used for text generation. Start by inspecting the current values:

In [5]:
generation_config = model.generation_config()
generation_config

GenerationConfig(penalty_alpha=0.6, top_k=0, max_new_tokens=256, do_sample=True, top_p=0.92)
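
To give an intuition for `top_p=0.92`, here is a minimal, library-free sketch of nucleus (top-p) filtering: keep the smallest set of most likely tokens whose cumulative probability reaches `top_p`, then renormalize before sampling. The function name `top_p_filter` and the toy distribution are illustrative only; this is not xturing's actual implementation.

```python
def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = {}
    cumulative = 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1 again.
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


# Toy next-token distribution (made up for illustration).
toy_probs = {"healthy": 0.5, "fit": 0.3, "active": 0.15, "purple": 0.05}
filtered = top_p_filter(toy_probs, top_p=0.92)
print(filtered)
```

In this toy case the unlikely token `purple` is dropped, and sampling happens only among the remaining three tokens.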

In [4]:
# Once the model has been fine-tuned, you can run inference
output = model.generate(texts=["Give three tips for staying healthy."])
print("Generated output by the model: {}".format(output))

  0%|          | 0/1 [00:00<?, ?it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
100%|██████████| 1/1 [00:00<00:00,  1.22it/s]

Generated output by the model: ['Give three tips for staying healthy.1. Whether you are new to physical activity or simply want a better life in your home.\n2. The reasons for your lifestyle choices are far more important to you than anything else.\n3. Know that each and every day is different, and that your body, mind, and body is something that both people and those who live in that situation have to work on.<|endoftext|>']
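
As the output above shows, the returned string echoes the prompt and ends with the `<|endoftext|>` marker. A small post-processing helper can strip both. `clean_output` is a hypothetical helper written for this tutorial, not part of xturing, and the sample string below is made up.

```python
EOS_TOKEN = "<|endoftext|>"


def clean_output(prompt, generated, eos_token=EOS_TOKEN):
    """Remove the echoed prompt and anything from the end-of-text marker on."""
    text = generated
    if text.startswith(prompt):
        text = text[len(prompt):]
    # Keep only what comes before the first end-of-text marker, if any.
    text = text.split(eos_token, 1)[0]
    return text.strip()


prompt = "Give three tips for staying healthy."
raw = prompt + "1. Stay active.\n2. Sleep well.<|endoftext|>"  # made-up sample
print(clean_output(prompt, raw))
```

With the real model you would apply it to each element of the list returned by `model.generate`.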





## 6. Save your model

In [6]:
# Save the fine-tuned model to disk
model.save("./distilgpt2")

# To load it again later, call BaseModel.load("./distilgpt2")


## Do you have any questions?

You can open an issue in our [GitHub repo](https://github.com/stochasticai/xturing).
