# Prerequisites

**Tested on Ubuntu 22.04**
Using VS Code with the Remote - SSH extension.

# Ollama

## Install on Linux

[Ollama Linux installation instructions](https://ollama.com/download/linux)

```sh
curl -fsSL https://ollama.com/install.sh | sh
```

## Try it

```sh
ollama list
ollama pull llama3.1:70b
ollama run llama3.1:70b
```
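
Once a model has been pulled, the notebook can also confirm that the Ollama server is reachable. The following is a minimal sketch, assuming the server listens on Ollama's default address `http://localhost:11434` and exposes the `/api/tags` endpoint for listing local models. The helper names `parse_model_names` and `list_ollama_models` are illustrative, not part of any library.

```python
import json
import urllib.error
import urllib.request

# Assumption: Ollama's default local address; change it if you set OLLAMA_HOST.
OLLAMA_URL = "http://localhost:11434"


def parse_model_names(payload):
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m["name"] for m in payload.get("models", [])]


def list_ollama_models(base_url=OLLAMA_URL, timeout=2.0):
    """Return the pulled model names, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_model_names(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        return None


models = list_ollama_models()
if models is None:
    print("Ollama server not reachable; is `ollama serve` running?")
else:
    print("Pulled models:", models)
```

If `llama3.1:70b` is missing from the printed list, the `ollama pull` step probably did not finish.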



# Setup Python Environment

## Create your virtualenv with conda

```sh
conda create -n plotomatic python=3.12
conda activate plotomatic
```
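
To confirm that a notebook kernel is really running inside the new environment, a quick check like the one below can help. It is a sketch that relies on the `CONDA_DEFAULT_ENV` variable, which `conda activate` sets in a shell; a kernel started by VS Code may not have it, so treat a warning as a hint rather than proof. The function name `describe_environment` is made up for this example.

```python
import os
import sys


def describe_environment(environ=os.environ, version_info=sys.version_info):
    """Summarize the active conda env and Python version."""
    return {
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),  # set by `conda activate`
        "python": f"{version_info[0]}.{version_info[1]}",
        "executable": sys.executable,
    }


info = describe_environment()
print(info)
if info["conda_env"] != "plotomatic":
    print("Warning: the 'plotomatic' env does not appear to be active.")
```

The `executable` path is the most reliable signal: it should point inside the `plotomatic` environment directory.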

## Remove conda packages (if there are issues)
If you see the error below, you have likely hit a bug involving the NVIDIA conda packages. The workaround is to uninstall all of the conda packages with `conda remove`, using the same arguments as the install command, and then reinstall them. I have also seen this error when installing a new VS Code extension.

```InvalidSpec: The package "nvidia/linux-64::cuda-compiler==12.6.2=0" is not available for the specified platform```

```sh
# One of these breaks conda for new install (probably cuda), so remove them first before adding new packages
conda remove \
        cuda cuda-nvcc cuda-cudart cuda-compiler \
        pytorch-cuda pytorch torchvision \
        tensorflow-gpu tensorflow cudnn \
        ipykernel sqlite nbconvert
```

## Install conda packages

In [None]:
# ipykernel is for vscode
# %conda install \
#         cuda cuda-nvcc cuda-cudart cuda-compiler \
#         pytorch-cuda pytorch torchvision \
#         tensorflow-gpu tensorflow cudnn \
#         ipykernel sqlite nbconvert

# Restart the Jupyter Notebook kernel

# CUDA, TensorRT

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

https://docs.nvidia.com/deeplearning/tensorrt/quick-start-guide/index.html#install

Skip the rest of this section if you aren't using CUDA.

# Check if NVIDIA drivers and CUDA are working

In [None]:
# CUDA Version: 12.4
# Driver Version: 550.107.02
!nvidia-smi
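
If you would rather check the driver from Python than read the `nvidia-smi` table by eye, the sketch below runs `nvidia-smi` with its `--query-gpu` and `--format=csv,noheader` options and parses the result. The helper names `parse_gpu_csv` and `query_gpus` are hypothetical, and the code assumes each output line has exactly three comma-separated fields.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.total",
    "--format=csv,noheader",
]


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader` output into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 3:
            continue  # skip lines that do not match the three queried fields
        name, driver, memory = fields
        gpus.append({"name": name, "driver_version": driver, "memory_total": memory})
    return gpus


def query_gpus():
    """Return GPU info, or an empty list if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


print(query_gpus() or "No NVIDIA GPU detected via nvidia-smi")
```

An empty result means the driver is missing or not loaded, in which case continue with the driver installation steps.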

## Install NVIDIA drivers

Only do this if `nvidia-smi` didn't work.

```sh
sudo ubuntu-drivers devices | grep recommended
sudo apt-get install nvidia-driver-550
sudo reboot
```

### Check the driver is installed and working

In [None]:
!apt list --installed | grep nvidia-driver
!nvidia-smi

### Install CUDA
```sh
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-4
sudo apt-get install -y cuda-drivers
```
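
After the toolkit is installed, you can verify which CUDA release `nvcc` reports. This is a rough sketch: it assumes `nvcc` is either on `PATH` or under `/usr/local/cuda/bin` (a common default for the apt packages, but not guaranteed on every system). The function names are illustrative.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return the CUDA release (e.g. '12.4') from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_cuda_release():
    """Locate nvcc and return the release it reports, or None if not found."""
    # Assumption: /usr/local/cuda/bin is where the apt toolkit places nvcc.
    nvcc = shutil.which("nvcc") or shutil.which("nvcc", path="/usr/local/cuda/bin")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


print("CUDA toolkit release:", installed_cuda_release())
```

A result of `None` usually means `nvcc` is not on `PATH` yet rather than that the install failed.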

In [None]:
# Make sure TensorRT is installed

!dpkg-query -W tensorrt 
# 10.5.0.18-1+cuda12.6

!dpkg-query -W cuda-toolkit
# 12.6.1-1

## Python packages

In [None]:
%pip install --upgrade pip wheel setuptools 

# Ninja will help some packages compile faster
%pip install ninja 

# These need to be installed first.
# Version specifiers are quoted so the shell does not treat `>` as a redirect.
%pip install \
    nvidia-tensorrt \
    torch \
    "tensorflow>=2.17.0" \
    "cuda-python>=12.6.0" \
    "torchvision>=0.15.2"

In [None]:
# Make sure they work
import tensorrt
import tensorflow
import cuda
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"Number of CUDA devices: {torch.cuda.device_count()}")

for i in range(torch.cuda.device_count()):
    print(f"Device {i}: {torch.cuda.get_device_name(i)}")
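
When one of the imports above fails, it can be useful to see every missing module at once instead of stopping at the first `ImportError`. The sketch below uses `importlib.util.find_spec`, which looks a module up without importing it. `REQUIRED_MODULES` and `missing_modules` are names invented for this example.

```python
import importlib.util

# Import names used in the check above (not the pip distribution names).
REQUIRED_MODULES = ["tensorrt", "tensorflow", "cuda", "torch"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing or "none")
```

This only confirms that a module can be located. A module that is found may still fail at import time if its native libraries are broken.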

In [None]:
%pip install \
    huggingface_hub \
    "transformers>=4.25.1" \
    diffusers \
    accelerate \
    ipywidgets \
    matplotlib \
    sentencepiece \
    numpy \
    "rembg[GPU]" \
    pydantic \
    unidecode \
    deepdiff \
    json-repair \
    ollama \
    graphviz

In [None]:
# llamaindex
%pip install llama-index-llms-ollama llama_index llama-index-embeddings-ollama

In [None]:
# NVIDIA NeMo Guardrails
%pip install nemoguardrails llama-index-output-parsers-guardrails

In [None]:
# For Cogvideo
# TODO: Move this to a separate notebook
# https://huggingface.co/THUDM/CogVideoX-5b-I2V
# %pip install --upgrade transformers accelerate diffusers imageio-ffmpeg tbb xfuser[flash_attn] onediff

# xfuser is for https://github.com/xdit-project/xDiT to run CogVideoX with parallel inference
# onediff, nexfort is for single gpu acceleration with xdit

# Acceleration for Cogvideo # full options are cpu/cu118/cu121/cu124
# %pip install --pre torchao --index-url https://download.pytorch.org/whl/nightly/cu124 
# %pip install optimum-quanto

In [None]:
# Faster image processing
# %pip uninstall pillow
%pip install pillow-simd

In [None]:
# For llama3-vision local
# %pip install --upgrade "transformers>=4.45.0"

# Note: this transformers version conflicts with Coqui TTS

In [None]:
# See how much memory is available
!free -h

In [None]:
# Check CPU information
!lscpu | head -n 16

## Make sure the GPU works from PyTorch

In [None]:
import torch
from IPython.display import display, Markdown

def show_memory_usage():
    # GPU Info
    if torch.cuda.is_available():
        gpu_table = """### GPU VRAM Usage

| GPU Index | GPU Name | Used VRAM (GB) | Total VRAM (GB) | VRAM Usage (%) |
|-----------|----------|------------------|-------------------|------------------|
"""
        for i in range(torch.cuda.device_count()):
            gpu_name = torch.cuda.get_device_name(i)
            free, total = torch.cuda.mem_get_info(i)
            used = total - free
            used_gb = used / 1024 ** 3
            total_gb = total / 1024 ** 3
            percent_used = used_gb / total_gb * 100.0

            gpu_table += f"| {i} | {gpu_name} | {used_gb:.2f} | {total_gb:.2f} | {percent_used:.2f} |\n"

        display(Markdown(gpu_table))
    else:
        display(Markdown("**No GPU available**"))


show_memory_usage()

### Log into Hugging Face

So we can pull models!

Get a token from https://huggingface.co/settings/tokens, then either set it in the `HF_TOKEN` environment variable or run the following in your shell:

```sh
huggingface-cli login
```

See https://huggingface.co/docs/huggingface_hub/en/quick-start for more information.
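
For unattended runs, the token can be read from `HF_TOKEN` and passed to `huggingface_hub.login` through its `token` argument, which avoids the interactive prompt. This is a minimal sketch; `get_hf_token` and `login_from_env` are hypothetical helper names.

```python
import os


def get_hf_token(environ=os.environ):
    """Return the HF_TOKEN value with whitespace stripped, or None if unset/empty."""
    token = environ.get("HF_TOKEN", "").strip()
    return token or None


def login_from_env():
    """Log in with HF_TOKEN if present; return True when a login was attempted."""
    token = get_hf_token()
    if token is None:
        return False
    # Imported lazily so the token check works even without huggingface_hub.
    from huggingface_hub import login

    login(token=token)
    return True


if get_hf_token() is None:
    print("HF_TOKEN is not set; use login() or `huggingface-cli login` instead.")
```

Call `login_from_env()` in a cell once `HF_TOKEN` is set; it returns `False` without contacting Hugging Face when no token is found.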

In [None]:
from huggingface_hub import login
login()

In [None]:
# Make sure it works by fetching info about a model
from huggingface_hub import ModelCard

model_card = ModelCard.load('black-forest-labs/FLUX.1-dev')
print(model_card.data.tags)

## Flash Attention

Ninja will make it build much faster.

Per [PyPI](https://pypi.org/project/flash-attn/):

> Without ninja, compiling can take a very long time (2h) since it does not use multiple CPU cores. With ninja compiling takes 3-5 minutes on a 64-core machine.

In [None]:
%pip install packaging ninja
!ninja --version

In [None]:
# Adjust MAX_JOBS to suit your machine. I used 8 for 64GB RAM 
%env MAX_JOBS=8

%pip install flash-attn --no-build-isolation
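
If you are unsure what to pick for `MAX_JOBS`, the sketch below derives a starting value from the machine's RAM and core count. It assumes roughly 8 GB of RAM per compile job, a heuristic extrapolated from the "8 jobs for 64GB" setting above and not an official flash-attn recommendation. `suggest_max_jobs` and `total_ram_gb` are names made up for this example.

```python
import os


def suggest_max_jobs(total_ram_gb, cpu_count, gb_per_job=8):
    """Pick a MAX_JOBS value bounded by both RAM and CPU cores (at least 1)."""
    by_ram = int(total_ram_gb // gb_per_job)
    return max(1, min(by_ram, cpu_count))


def total_ram_gb():
    """Physical RAM in GiB via POSIX sysconf (works on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024 ** 3


print("Suggested MAX_JOBS:", suggest_max_jobs(total_ram_gb(), os.cpu_count() or 1))
```

The operating system reports slightly less than the nominal RAM size, so a nominal 64GB machine may get a suggestion of 7 instead of 8. Lower the value further if the build runs out of memory.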

# NIM


# Next Step
On to [Step 2: Title + Plot](./2_title_plot.ipynb)