### Flash Attention 2 Install

This short installation guide has two parts: CUDA Toolkit and Pip. The first part covers installing the NVIDIA CUDA Compiler driver (NVCC), which is required to install Flash Attention 2.

If you're using the "Jupyter PyTorch" template from runpod.io and you run the command below, the shell will report that it cannot find the command:

In [4]:
!nvcc --version

/bin/bash: line 1: nvcc: command not found


If, however, you're running this in a different environment, and you get an output like the one below, you already have NVCC installed:

```shell
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Sep_12_02:18:05_PDT_2024
Cuda compilation tools, release 12.6, V12.6.77
Build cuda_12.6.r12.6/compiler.34841621_0
```

If that's the case, please skip directly to the "Pip Install" section.
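The same check can be done from Python, which is handy if you want the notebook to decide which section to follow. The sketch below is only illustrative: the helper names `parse_nvcc_release` and `installed_nvcc_release` are made up for this example, and the regular expression assumes the `release X.Y` wording shown in the output above.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(version_text):
    """Return the release (e.g. "12.6") found in `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", version_text)
    return match.group(1) if match else None


def installed_nvcc_release():
    """Return the installed NVCC release, or None when nvcc is not on PATH."""
    nvcc_path = shutil.which("nvcc")
    if nvcc_path is None:
        return None
    result = subprocess.run(
        [nvcc_path, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


release = installed_nvcc_release()
if release is None:
    print("nvcc not found: follow the CUDA Toolkit Install section")
else:
    print(f"nvcc release {release} found: skip to the Pip Install section")
```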

### CUDA Toolkit Install

The "Jupyter PyTorch" template is based on a Docker image using Ubuntu. First, we need to figure out which Ubuntu version is running:

In [1]:
!lsb_release -a

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.4 LTS
Release:	22.04
Codename:	jammy


That's Ubuntu 22.04, great. We can now head to NVIDIA's [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) page and select the proper configuration:

- **Operating System**: `Linux`
- **Architecture**: `x86_64`
- **Distribution**: `Ubuntu`
- **Version**: `22.04`
- **Install Type**: `deb (local)`

![](https://github.com/dvgodoy/FineTuningLLMs/blob/main/images/appendix/cuda1.png?raw=True)
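The distribution and version you select end up in NVIDIA's URLs as a slug such as `ubuntu2204`. As a rough sketch, that slug can be derived from the contents of `/etc/os-release`. The helper names below are hypothetical, and the "ID plus version without the dot" rule is an assumption inferred from the URLs used in this guide, not a documented NVIDIA convention.

```python
def parse_os_release(text):
    """Parse os-release style KEY=VALUE lines into a dict, stripping quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"')
    return fields


def nvidia_repo_slug(fields):
    """Build a slug such as "ubuntu2204" from the ID and VERSION_ID fields."""
    return fields["ID"] + fields["VERSION_ID"].replace(".", "")


# Sample content mirroring the Ubuntu 22.04 release reported above.
sample = 'NAME="Ubuntu"\nID=ubuntu\nVERSION_ID="22.04"\nVERSION_CODENAME=jammy\n'
print(nvidia_repo_slug(parse_os_release(sample)))  # ubuntu2204
```

On the actual machine, you would pass the text read from `/etc/os-release` instead of the sample string.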

Once you finish selecting the configuration, the page presents a set of installation instructions:

![](https://github.com/dvgodoy/FineTuningLLMs/blob/main/images/appendix/cuda2.png?raw=True)

We're dividing these instructions into four groups:

- **Group 1: Downloading Only**
  - `wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin`
  - `sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600`
  
- **Group 2: Downloading and Installing**
  - `wget https://developer.download.nvidia.com/compute/cuda/12.6.2/local_installers/cuda-repo-ubuntu2204-12-6-local_12.6.2-560.35.03-1_amd64.deb`
  - `sudo dpkg -i cuda-repo-ubuntu2204-12-6-local_12.6.2-560.35.03-1_amd64.deb`
  
- **Group 3: Editing the Command**
  - `sudo cp /var/cuda-repo-ubuntu2204-12-6-local/cuda-XXXXXXXX-keyring.gpg /usr/share/keyrings/`
  
- **Group 4: Installing Only**
  - `sudo apt-get update`
  - `sudo apt-get -y install cuda-toolkit-12-6 --fix-missing`
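Most of these commands differ only in the version strings they embed. The sketch below rebuilds the commands of Groups 1, 2, and 4 from those strings, assuming NVIDIA keeps the URL and file naming pattern seen above (that pattern is inferred from this guide's commands, so treat it as an assumption). The function name `build_install_commands` is hypothetical. Group 3 is left out on purpose, since its keyring ID is only known after running Group 2.

```python
BASE_URL = "https://developer.download.nvidia.com/compute/cuda"


def build_install_commands(cuda_version, driver_version, distro="ubuntu2204"):
    """Return the Group 1, 2, and 4 shell commands for the given versions."""
    major, minor = cuda_version.split(".")[:2]
    series = f"{major}-{minor}"  # e.g. "12-6"
    pin_file = f"cuda-{distro}.pin"
    deb_file = (
        f"cuda-repo-{distro}-{series}-local_"
        f"{cuda_version}-{driver_version}-1_amd64.deb"
    )
    return [
        # Group 1: Downloading Only
        f"wget {BASE_URL}/repos/{distro}/x86_64/{pin_file}",
        f"sudo mv {pin_file} /etc/apt/preferences.d/cuda-repository-pin-600",
        # Group 2: Downloading and Installing
        f"wget {BASE_URL}/{cuda_version}/local_installers/{deb_file}",
        f"sudo dpkg -i {deb_file}",
        # Group 4: Installing Only
        "sudo apt-get update",
        f"sudo apt-get -y install cuda-toolkit-{series} --fix-missing",
    ]


for command in build_install_commands("12.6.2", "560.35.03"):
    print(command)
```

If you use a newer version, compare the printed commands against the ones on NVIDIA's page before running them.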

Let's run them all, one by one.

#### Group 1: Downloading Only

This group is easy, fast, and straightforward. Just run the commands below and wait a few seconds for them to finish:

In [None]:
!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
!sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600

```
--2024-11-14 13:50:06--  https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 190 [application/octet-stream]
Saving to: ‘cuda-ubuntu2204.pin’

cuda-ubuntu2204.pin 100%[===================>]     190  --.-KB/s    in 0s      

2024-11-14 13:50:06 (10.7 MB/s) - ‘cuda-ubuntu2204.pin’ saved [190/190]
```

#### Group 2: Downloading and Installing

The versions used in the commands below (12.6.2 and 12.6.2-560.35.03-1) may have changed between the time of writing and your visit to NVIDIA's page. The commands should work fine "as is" but, if you want the latest version, copy the version strings from the configuration page and update the commands accordingly.

These commands will take a few minutes to run and, at the end, they will print the exact command you need to run in "Group 3":

`sudo cp /var/cuda-repo-ubuntu2204-12-6-local/cuda-F9A63CE3-keyring.gpg /usr/share/keyrings/`

For version 12.6.2, which we're using here, the keyring sequence is `F9A63CE3`. In the previous version, 12.6.0, the output used a different sequence (`3DBA81E7`).

If you're using the latest version from NVIDIA, pay attention to the last line in the output and update the "Group 3" command accordingly:

In [None]:
!wget https://developer.download.nvidia.com/compute/cuda/12.6.2/local_installers/cuda-repo-ubuntu2204-12-6-local_12.6.2-560.35.03-1_amd64.deb
!sudo dpkg -i cuda-repo-ubuntu2204-12-6-local_12.6.2-560.35.03-1_amd64.deb

```
--2024-11-14 13:50:45--  https://developer.download.nvidia.com/compute/cuda/12.6.2/local_installers/cuda-repo-ubuntu2204-12-6-local_12.6.2-560.35.03-1_amd64.deb
Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3420245310 (3.2G) [application/x-deb]
Saving to: ‘cuda-repo-ubuntu2204-12-6-local_12.6.2-560.35.03-1_amd64.deb’

cuda-repo-ubuntu220 100%[===================>]   3.18G  85.1MB/s    in 3m 29s  

2024-11-14 13:54:15 (15.6 MB/s) - ‘cuda-repo-ubuntu2204-12-6-local_12.6.2-560.35.03-1_amd64.deb’ saved [3420245310/3420245310]

Selecting previously unselected package cuda-repo-ubuntu2204-12-6-local.
(Reading database ... 24363 files and directories currently installed.)
Preparing to unpack cuda-repo-ubuntu2204-12-6-local_12.6.2-560.35.03-1_amd64.deb ...
Unpacking cuda-repo-ubuntu2204-12-6-local (12.6.2-560.35.03-1) ...
Setting up cuda-repo-ubuntu2204-12-6-local (12.6.2-560.35.03-1) ...

The public cuda-repo-ubuntu2204-12-6-local GPG key does not appear to be installed.
To install the key, run this command:
sudo cp /var/cuda-repo-ubuntu2204-12-6-local/cuda-F9A63CE3-keyring.gpg /usr/share/keyrings/
```
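If you'd rather not copy the keyring command by hand, it can be pulled out of the captured `dpkg` output with a regular expression. This is a minimal sketch: the pattern assumes the message keeps the wording shown above and that the key ID is a hexadecimal string (as in `F9A63CE3` and `3DBA81E7`). The function name `find_keyring_command` is hypothetical.

```python
import re

# Matches the "sudo cp ...-keyring.gpg /usr/share/keyrings/" line printed by dpkg.
KEYRING_COMMAND = re.compile(
    r"^(sudo cp /var/cuda-repo-\S+/cuda-([0-9A-Fa-f]+)-keyring\.gpg "
    r"/usr/share/keyrings/)[ \t]*$",
    re.MULTILINE,
)


def find_keyring_command(dpkg_output):
    """Return (command, key_id) from dpkg output, or None if no match is found."""
    match = KEYRING_COMMAND.search(dpkg_output)
    if match is None:
        return None
    return match.group(1), match.group(2)


sample_output = """\
The public cuda-repo-ubuntu2204-12-6-local GPG key does not appear to be installed.
To install the key, run this command:
sudo cp /var/cuda-repo-ubuntu2204-12-6-local/cuda-F9A63CE3-keyring.gpg /usr/share/keyrings/
"""

command, key_id = find_keyring_command(sample_output)
print(key_id)   # F9A63CE3
print(command)
```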

#### Group 3: Editing the Command

The only command in this group must be exactly the one listed in the last line of the output from Group 2. It will run instantly since it's only copying a file to a different directory:

In [5]:
!sudo cp /var/cuda-repo-ubuntu2204-12-6-local/cuda-F9A63CE3-keyring.gpg /usr/share/keyrings/

#### Group 4: Installing Only

Now we're ready to install the CUDA Toolkit itself. I've added `--fix-missing` to the last command to make it more robust. These two commands will take quite a while to run:

In [6]:
!sudo apt-get update
!sudo apt-get -y install cuda-toolkit-12-6 --fix-missing

Get:1 file:/var/cuda-repo-ubuntu2204-12-6-local  InRelease [1572 B]
Get:1 file:/var/cuda-repo-ubuntu2204-12-6-local  InRelease [1572 B]
Get:2 file:/var/cuda-repo-ubuntu2204-12-6-local  Packages [41.7 kB]            
Get:3 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]      
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease                         
Get:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1581 B]
Get:6 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]        
Get:7 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages [1108 kB]
Get:8 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]      
Get:9 http://security.ubuntu.com/ubuntu jammy-security/multiverse amd64 Packages [44.7 kB]
Get:10 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [2424 kB]
Get:11 http://archive.ubuntu.com/ubuntu jammy-updates/main i386 Packages [897 kB]
[...]

#### Checking the Installation

After installing everything, you should be able to successfully run the command below:

In [None]:
!nvcc --version

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Sep_12_02:18:05_PDT_2024
Cuda compilation tools, release 12.6, V12.6.77
Build cuda_12.6.r12.6/compiler.34841621_0
```
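If `nvcc` is still not found after the installation, the toolkit's `bin` directory may simply be missing from `PATH`. The sketch below prepends it for the current notebook session. It assumes the default install location `/usr/local/cuda/bin`, which may differ on your system, and `ensure_on_path` is a hypothetical helper written for this example.

```python
import os
import shutil


def ensure_on_path(directory, env=os.environ):
    """Prepend `directory` to PATH in `env` unless it is already listed."""
    entries = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory not in entries:
        env["PATH"] = os.pathsep.join([directory] + entries)
    return env["PATH"]


CUDA_BIN = "/usr/local/cuda/bin"  # assumed default install location

# Only touch PATH when nvcc is missing and the assumed directory exists.
if shutil.which("nvcc") is None and os.path.isdir(CUDA_BIN):
    ensure_on_path(CUDA_BIN)

print(shutil.which("nvcc"))
```

Commands started with `!` inherit the kernel's environment, so the change applies to later cells in the same session only.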

### Pip Install

If you already have the NVIDIA CUDA Compiler driver installed, installing Flash Attention 2 itself is a piece of cake:

In [None]:
!pip install -U flash-attn transformers

Once it's finished installing, you can run Transformers' helper function, `is_flash_attn_2_available()`:

In [9]:
from transformers.utils import is_flash_attn_2_available
is_flash_attn_2_available()

True
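The result of this check can be used to choose the attention implementation when loading a model. The sketch below falls back to PyTorch's scaled dot-product attention (`"sdpa"`) when Flash Attention 2 is unavailable. The fallback choice and the helper name `pick_attn_implementation` are assumptions made for this example, not part of the original guide.

```python
def pick_attn_implementation():
    """Return "flash_attention_2" when available, otherwise fall back to "sdpa"."""
    try:
        from transformers.utils import is_flash_attn_2_available
    except ImportError:
        # Transformers itself is missing; the value is only a placeholder here.
        return "sdpa"
    return "flash_attention_2" if is_flash_attn_2_available() else "sdpa"


attn_implementation = pick_attn_implementation()
print(attn_implementation)

# Typical usage when loading a model (model_id is a placeholder):
#
#   import torch
#   from transformers import AutoModelForCausalLM
#   model = AutoModelForCausalLM.from_pretrained(
#       model_id,
#       attn_implementation=attn_implementation,
#       torch_dtype=torch.bfloat16,  # Flash Attention 2 needs fp16 or bf16
#   )
```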

That's it! Enjoy Flash Attention 2!