# Instruction-tune Llama 2

Reference: Philipp Schmid https://www.philschmid.de/instruction-tune-llama-2

## Install dependencies

In [1]:
!pip install "transformers==4.31.0" "datasets==2.13.0" "peft==0.4.0" "accelerate==0.21.0" "bitsandbytes==0.40.2" "trl==0.4.7" "safetensors>=0.3.1" --upgrade

Collecting transformers==4.31.0
  Downloading transformers-4.31.0-py3-none-any.whl (7.4 MB)
Collecting datasets==2.13.0
  Downloading datasets-2.13.0-py3-none-any.whl (485 kB)
Collecting peft==0.4.0
  Downloading peft-0.4.0-py3-none-any.whl (72 kB)
Collecting accelerate==0.21.0
  Downloading accelerate-0.21.0-py3-none-any.whl (244 kB)
Collecting bitsandbytes==0.40.2
  Downloading bitsandbytes-0.40.2-py3-none-any.whl (92.5 MB)

## Dataset

We use the Databricks Dolly dataset, `databricks/databricks-dolly-15k`.

Let's first load the dataset from the hub.

In [2]:
from datasets import load_dataset

# Load the dataset from the hub
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

Downloading readme: 100%|██████████| 8.20k/8.20k [00:00<00:00, 21.1MB/s]


Downloading and preparing dataset json/databricks--databricks-dolly-15k to /home/ec2-user/.cache/huggingface/datasets/databricks___json/databricks--databricks-dolly-15k-7427aa6e57c34282/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4...


Downloading data: 100%|██████████| 13.1M/13.1M [00:00<00:00, 151MB/s]
Downloading data files: 100%|██████████| 1/1 [00:00<00:00,  1.78it/s]
Extracting data files: 100%|██████████| 1/1 [00:00<00:00, 1282.66it/s]
                                                        

Dataset json downloaded and prepared to /home/ec2-user/.cache/huggingface/datasets/databricks___json/databricks--databricks-dolly-15k-7427aa6e57c34282/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4. Subsequent calls will reuse this data.




Now take a look at the dataset. Each record is a JSON object with four fields, for example:

```js
{
    'instruction': 'I am trying to book a flight from Singapore to Sydney, what shall I do if the flight is too expensive?', 
    'context': '', 
    'response': 'You will have the option to choose from local Asian low-cost airlines such as Scoot, Jetstar, or AirAsia which would provide cheaper flights options.', 
    'category': 'general_qa'
}
```

In [3]:
from random import randrange

print(f'dataset size: {len(dataset)}')
print(dataset[randrange(len(dataset))])

dataset size: 15011
{'instruction': 'I am trying to book a flight from Singapore to Sydney, what shall I do if the flight is too expensive?', 'context': '', 'response': 'You will have the option to choose from local Asian low-cost airlines such as Scoot, Jetstar, or AirAsia which would provide cheaper flights options.', 'category': 'general_qa'}
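
Beyond printing one random record, it can help to see how records spread across categories and how many have no `context`. The sketch below is only an illustration: `summarize` and `placeholder_rows` are hypothetical names, and the rows are stand-ins rather than real dataset content. With the real data, passing `dataset` itself should work, since iterating a `Dataset` yields one dict per row.

```python
from collections import Counter


def summarize(records):
    """Count records per category and how many have an empty context."""
    categories = Counter(r["category"] for r in records)
    empty_context = sum(1 for r in records if not r["context"].strip())
    return categories, empty_context


# Placeholder rows for illustration only (not real dataset content).
# With the real data, call summarize(dataset) instead.
placeholder_rows = [
    {"instruction": "q1", "context": "", "response": "a1", "category": "general_qa"},
    {"instruction": "q2", "context": "some passage", "response": "a2", "category": "closed_qa"},
    {"instruction": "q3", "context": "", "response": "a3", "category": "general_qa"},
]

categories, empty_context = summarize(placeholder_rows)
print(categories.most_common())
print(f"records without context: {empty_context}")
```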


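Let's define a function that turns each sample into a training prompt. Note that the task is inverted: the sample's `response` goes into the Input section, and the model learns to produce the `instruction` that could have generated it.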

In [4]:
def format_instructions(sample):
    return f"""### Instruction:
Use the Input below to create an instruction, which could have been used to generate the Input using an LLM.

### Input:
{sample['response']}

### Response:
{sample['instruction']}
"""

Test the `format_instructions` function with a random sample in the dataset.

In [6]:
from random import randrange

sample_idx = randrange(len(dataset))
print(dataset[sample_idx])
print(format_instructions(dataset[sample_idx]))

{'instruction': 'When would a railway be considered a heritage railway?', 'context': "A heritage railway or heritage railroad (US usage) is a railway operated as living history to re-create or preserve railway scenes of the past. Heritage railways are often old railway lines preserved in a state depicting a period (or periods) in the history of rail transport. The British Office of Rail and Road defines heritage railways as follows:...'lines of local interest', museum railways or tourist railways that have retained or assumed the character and appearance and operating practices of railways of former times. Several lines that operate in isolation provide genuine transport facilities, providing community links. Most lines constitute tourist or educational attractions in their own right. Much of the rolling stock and other equipment used on these systems is original and is of historic value in its own right. Many systems aim to replicate both the look and operating practices of historic f
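
The printed output above is cut off, so here is a small self-contained sketch that applies the same template to the flight-booking record shown earlier. The function body is copied from the cell above; nothing here touches the hub, so it runs without downloading the dataset.

```python
def format_instructions(sample):
    # Same template as in the notebook cell: the response is the Input,
    # and the original instruction is what the model should generate.
    return f"""### Instruction:
Use the Input below to create an instruction, which could have been used to generate the Input using an LLM.

### Input:
{sample['response']}

### Response:
{sample['instruction']}
"""


# The record printed earlier in this notebook.
sample = {
    "instruction": "I am trying to book a flight from Singapore to Sydney, what shall I do if the flight is too expensive?",
    "context": "",
    "response": "You will have the option to choose from local Asian low-cost airlines such as Scoot, Jetstar, or AirAsia which would provide cheaper flights options.",
    "category": "general_qa",
}

prompt = format_instructions(sample)
print(prompt)
```

The `context` and `category` fields are not used by this template.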

## Instruction-tune Llama 2

Flash Attention needs an NVIDIA Ampere (or newer) GPU. The check below asserts that the GPU's compute capability is 8.0 or higher.

In [8]:
!python -c "import torch; assert torch.cuda.get_device_capability()[0] >= 8, 'Hardware not supported for Flash Attention'"

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/ec2-user/miniconda3/lib/python3.11/site-packages/torch/cuda/__init__.py", line 381, in get_device_capability
    prop = get_device_properties(device)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/miniconda3/lib/python3.11/site-packages/torch/cuda/__init__.py", line 395, in get_device_properties
    _lazy_init()  # will define _get_device_properties
    ^^^^^^^^^^^^
  File "/home/ec2-user/miniconda3/lib/python3.11/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx


The driver is missing. See how to install it here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html
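
The one-line assert fails with a long traceback when the driver is absent. A gentler variant, sketched below, first asks `torch.cuda.is_available()` and only then reads the compute capability. The helper names `supports_flash_attention` and `describe_gpu` are made up for this example, and the threshold of 8 simply mirrors the assert above.

```python
def supports_flash_attention(capability):
    """Return True if the (major, minor) compute capability is 8.0 or higher."""
    major, _minor = capability
    return major >= 8


def describe_gpu():
    """Return a one-line status message instead of raising."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed."
    if not torch.cuda.is_available():
        return "No usable CUDA device: check the NVIDIA driver and GPU."
    capability = torch.cuda.get_device_capability()
    name = torch.cuda.get_device_name()
    verdict = "supported" if supports_flash_attention(capability) else "not supported"
    return f"{name} (compute capability {capability[0]}.{capability[1]}): Flash Attention {verdict}."


print(describe_gpu())
```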

If you need a password for `ec2-user`, switch to root with `sudo su`, run `sudo passwd ec2-user`, and type the new password twice. Once the password is set, switch back with `su ec2-user`.

To download the driver from AWS (it is hosted in S3), attach an IAM role that can access S3 to the EC2 instance.

After installation, `nvidia-smi` should report the driver and GPU details like this:

```bash
$ nvidia-smi -q | head

==============NVSMI LOG==============

Timestamp                                 : Mon Jul 31 14:13:24 2023
Driver Version                            : 535.54.03
CUDA Version                              : 12.2

Attached GPUs                             : 1
GPU 00000000:00:1E.0
    Product Name                          : NVIDIA A10G
```

With the driver installed, the check now passes without any output, and we can move on to the pip installs.

In [1]:
!python -c "import torch; assert torch.cuda.get_device_capability()[0] >= 8, 'Hardware not supported for Flash Attention'"

In [2]:
!pip install ninja packaging

Collecting ninja
  Downloading ninja-1.11.1-py2.py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (145 kB)
Installing collected packages: ninja
Successfully installed ninja-1.11.1


In [3]:
!MAX_JOBS=4 pip install flash-attn --no-build-isolation

Collecting flash-attn
  Downloading flash_attn-2.0.2.tar.gz (4.2 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [13 lines of output]

      torch.__version__  = 2.0.1+cu117

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-6hpibdd5/flash-attn_ec1990251ecc4243a47398f0a6794995/setup.py", line 108, in <module>
          raise_if_cuda_home_none("flash_attn")
        File "/tmp/pip-install-6hpibdd5/flash-a

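Do I need to install the CUDA toolkit? The traceback ends in `raise_if_cuda_home_none`, which suggests the build could not find one; the NVIDIA driver alone does not include `nvcc`. One option is the runfile installer:

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=CentOS&target_version=7&target_type=runfile_local

Or install CUDA with `conda`? See the installation guide:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html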

```bash
$ conda install cuda -c nvidia
```
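
To see whether a toolkit is visible before retrying the build, a quick check like the following may help. It is a sketch under the assumption that the build looks for `nvcc` on the `PATH` or for a `CUDA_HOME` environment variable; `find_cuda_toolkit` is a hypothetical helper name, not part of any library.

```python
import os
import shutil


def find_cuda_toolkit():
    """Report where (if anywhere) a CUDA toolkit is visible to this process."""
    return {
        "nvcc": shutil.which("nvcc"),              # full path to nvcc, or None
        "CUDA_HOME": os.environ.get("CUDA_HOME"),  # toolkit root, or None
    }


for key, value in find_cuda_toolkit().items():
    print(f"{key}: {value if value else 'not found'}")
```

If both entries come back as not found, the `flash-attn` build will most likely fail the same way again.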

In [5]:
!MAX_JOBS=4 pip install flash-attn --no-build-isolation

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting flash-attn
  Downloading flash_attn-2.0.2.tar.gz (4.2 MB)
  Preparing metadata (setup.py) ... done
Collecting einops (from flash-attn)
  Downloading einops-0.6.1-py3-none-any.whl (42 kB)
Building wheels for collected packages: flash-attn
  Building wheel for flash-attn (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [114 lines of output]

      torch.__version__  = 2.0.1+cu117

      fat

This time the metadata step passed, but the wheel build failed with this message:

```bash
The detected CUDA version (12.2) mismatches the version that was used to compile
      PyTorch (11.7). Please make sure to use the same CUDA versions.
```

So let me try installing CUDA 11.7, the version PyTorch was compiled against:

```bash
# uninstall the current CUDA 12.2 
$ conda remove cuda
# install 11.7
$ conda install cuda -c nvidia/label/cuda-11.7.0
```
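
Before kicking off the long build again, it is worth confirming that the toolkit and PyTorch now agree. The sketch below compares the release reported by `nvcc --version` with `torch.version.cuda`. The helper names are invented for this example, and the regular expression assumes the usual `release X.Y` wording in the `nvcc` output.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract 'X.Y' from nvcc output such as '... release 11.7, V11.7.64'."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def toolkit_version():
    """Return the CUDA toolkit release seen by nvcc, or None if nvcc is missing."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


def torch_cuda_version():
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda  # None for CPU-only builds


toolkit, built_with = toolkit_version(), torch_cuda_version()
print(f"nvcc release: {toolkit}, PyTorch built with CUDA: {built_with}")
if toolkit and built_with and toolkit != built_with:
    print("Versions differ: the flash-attn build may fail with a mismatch error.")
```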

After this, I still needed to install `typing-extensions`:

```bash
$ pip install typing-extensions
```

Then it finally started to build:

```bash
Building wheels for collected packages: flash-attn
  Building wheel for flash-attn (setup.py) ... \
```
