## Prepare Environment

We first check the OS release and CUDA toolkit version, then create a virtual environment and install the required packages.

```shell
cat /etc/os-release
nvcc -V
cd ../personal_copilot
python3.11 -m venv .copilot
source .copilot/bin/activate
pip install --upgrade pip setuptools wheel
pip install torch torchvision torchaudio
pip install packaging
pip install flash-attn
pip install -r training/requirements.txt
pip install -r dataset_generation/requirements.txt
```
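As a quick sanity check after the installs, the sketch below reports which packages cannot be found in the active environment. The list of import names is an assumption based on the `pip install` lines above (the `flash-attn` distribution is imported as `flash_attn`), and `missing_packages` is a hypothetical helper, not part of the repository.

```python
import importlib.util

# Import names assumed from the pip commands above; adjust as needed.
REQUIRED = ["torch", "torchvision", "torchaudio", "packaging", "flash_attn"]


def missing_packages(names):
    """Return the top-level module names that cannot be located."""
    # find_spec looks a module up without importing it, so this stays fast
    # even for heavy packages such as torch.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("missing packages:", missing if missing else "none")
```

Run it with the `.copilot` environment activated; an empty result only means the packages are present, not that CUDA or flash attention work at runtime.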

## Generate Dataset

Follow the instructions in `personal_copilot/README.md`. Set a GitHub access token first, replacing `xxxx` with your own token:

```shell
export GH_ACCESS_TOKEN=xxxx
```
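Because the cells below run inside the notebook kernel, the variable is only visible if it was exported before the kernel started. A minimal sketch for checking this from Python follows; `get_github_token` is a hypothetical helper, and treating `xxxx` as "not set" is an assumption based on the placeholder above.

```python
import os


def get_github_token(env=os.environ):
    """Return GH_ACCESS_TOKEN, or raise if it is unset or still the placeholder."""
    token = env.get("GH_ACCESS_TOKEN", "")
    if not token or token == "xxxx":
        raise RuntimeError(
            "Set GH_ACCESS_TOKEN to a real GitHub access token before cloning."
        )
    return token
```

Calling `get_github_token()` in a cell before the cloning step fails early with a clear message instead of partway through the clone.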

In [None]:
import os
os.getcwd()

In [None]:
os.chdir("../dataset_generation")
os.getcwd()

Clone the repositories (the command is commented out; uncomment it to run):

In [None]:
# !python clone_hf_repos.py

Check that the repositories were cloned:

In [None]:
!ls hf_public_repos

In [None]:
import nltk
nltk.download('punkt')

Run the data processing pipeline (commented out; uncomment it to run):

In [None]:
# !python pipeline.py

We could collate the processed data and push it to the Hub:

```shell
python prepare_hf_dataset.py
```

However, since the dataset is already public, we can simply download it from the Hub instead.
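A minimal sketch of that download, assuming the `datasets` library is installed and that the public dataset is the `smangrul/hug_stack` repository passed to `--dataset_name` in the training command. `load_hug_stack` is a hypothetical wrapper name.

```python
def load_hug_stack(split="train"):
    """Download the dataset from the Hub, or reuse the local cache."""
    # Imported lazily so that defining the helper does not require `datasets`.
    from datasets import load_dataset

    return load_dataset("smangrul/hug_stack", split=split)


# Usage (needs network access on the first call):
# ds = load_hug_stack()
# print(ds)
```

The training script takes the dataset name directly, so this step is only needed to inspect the data beforehand.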

## Train Model

```shell
python train.py \
    --model_name_or_path "bigcode/starcoder2-7b" \
    --lora_r 32 \
    --lora_alpha 64 \
    --lora_dropout 0.0 \
    --lora_target_modules "c_proj,c_attn,q_attn,c_fc,c_proj" \
    --use_nested_quant \
    --bnb_4bit_compute_dtype "bfloat16" \
    --use_flash_attn \
    --use_peft_lora \
    --use_4bit_quantization \
    --dataset_name "smangrul/hug_stack" \
    --dataset_text_field "text" \
    --max_seq_length 1024 \
    --fim_rate 0.5 \
    --fim_spm_rate 0.5 \
    --splits "train" \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --bf16 \
    --learning_rate 5e-4 \
    --lr_scheduler_type "cosine" \
    --weight_decay 0.01 \
    --max_steps 1000 \
    --warmup_steps 30 \
    --dataloader_num_workers 4 \
    --evaluation_strategy "steps" \
    --eval_steps 50 \
    --save_steps 50 \
    --logging_steps 25 \
    --output_dir "peft-lora-starcoder2-7b-personal-copilot-dual-3090-local" 
```
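To see what these flags imply for batch size, the arithmetic can be sketched as below. The GPU count of 2 is an assumption inferred from "dual-3090" in the output directory name, and the token counts are upper bounds that assume every sequence is filled to `--max_seq_length`.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_gpus):
    """Sequences consumed per optimizer step across all devices."""
    return per_device_batch * grad_accum_steps * num_gpus


per_device_batch = 2   # --per_device_train_batch_size
grad_accum_steps = 4   # --gradient_accumulation_steps
num_gpus = 2           # assumption: two GPUs ("dual-3090")
max_seq_length = 1024  # --max_seq_length
max_steps = 1000       # --max_steps

sequences_per_step = effective_batch_size(per_device_batch, grad_accum_steps, num_gpus)
tokens_per_step = sequences_per_step * max_seq_length
total_tokens = tokens_per_step * max_steps

print(sequences_per_step)  # 16
print(tokens_per_step)     # 16384
print(total_tokens)        # 16384000
```

On a single GPU the same flags give 8 sequences per step, so `--gradient_accumulation_steps` would need to double to keep the effective batch size unchanged.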

If training is interrupted, we can resume it by appending `--resume_from_checkpoint "path/to/checkpoint"` to the same command. With `--save_steps 50`, a checkpoint is written every 50 steps, for example:

```shell
python train.py \
    --model_name_or_path "bigcode/starcoder2-7b" \
    --lora_r 32 \
    --lora_alpha 64 \
    --lora_dropout 0.0 \
    --lora_target_modules "c_proj,c_attn,q_attn,c_fc,c_proj" \
    --use_nested_quant \
    --bnb_4bit_compute_dtype "bfloat16" \
    --use_flash_attn \
    --use_peft_lora \
    --use_4bit_quantization \
    --dataset_name "smangrul/hug_stack" \
    --dataset_text_field "text" \
    --max_seq_length 1024 \
    --fim_rate 0.5 \
    --fim_spm_rate 0.5 \
    --splits "train" \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --bf16 \
    --learning_rate 5e-4 \
    --lr_scheduler_type "cosine" \
    --weight_decay 0.01 \
    --max_steps 1000 \
    --warmup_steps 30 \
    --dataloader_num_workers 4 \
    --evaluation_strategy "steps" \
    --eval_steps 50 \
    --save_steps 50 \
    --logging_steps 25 \
    --output_dir "peft-lora-starcoder2-7b-personal-copilot-dual-3090-local" \
    --resume_from_checkpoint "peft-lora-starcoder2-7b-personal-copilot-dual-3090-local/checkpoint-450"
```
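Rather than looking up the checkpoint number by hand, the most recent one can be located programmatically. The sketch below assumes checkpoints are saved as `checkpoint-<step>` subdirectories of the output directory, as in the path above; `latest_checkpoint` is a hypothetical helper.

```python
import os
import re


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> subdirectory with the highest step, or None."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = pattern.match(name)
        path = os.path.join(output_dir, name)
        # Compare steps numerically: as strings, "checkpoint-50" would sort
        # after "checkpoint-450".
        if match and os.path.isdir(path) and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path
```

For example, `latest_checkpoint("peft-lora-starcoder2-7b-personal-copilot-dual-3090-local")` returns the path to pass to `--resume_from_checkpoint`, or `None` if no checkpoint has been saved yet.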

### Using TensorBoard

```shell
cd personal_copilot/training/peft-lora-starcoder2-7b-personal-copilot-dual-3090-local
tensorboard --logdir=runs --bind_all
```