## <font color='purple'>__KLC Containers__</font>
Much like __conda environments__, <font color='purple'>__containers__</font> enable you to package software and dependencies.  

### <font color='purple'>__Why Containers?__</font>

While __conda environments__ are well suited to managing Python/R packages and their related libraries, they fall short for larger projects that require non-Python dependencies. <font color='purple'>__Containers__</font> __shine__ when you need to: 


- safeguard your project environment __against admin updates/changes__ to KLC/Quest or accidental environment modifications by collaborators

- install something that __requires admin or super user privileges__ (apt-get, or yum) that we don't have on KLC/Quest

- __transfer your complete workflow__ (via a *container image*) from machine to machine (KLC to your computer; MIT server to KLC, etc.) 

- run software built for a __different Linux distribution__ (e.g., Ubuntu) than the one on KLC/Quest (RHEL7)

This guide demonstrates how to install __llama_cpp_python__ on KLC using <font color='purple'>__containers__</font>. 

_Note: __llama_cpp_python__ is a Python wrapper for llama.cpp, a C/C++ implementation of inference for Meta's LLaMA architecture. For more information, see the llama.cpp repo (https://github.com/ggerganov/llama.cpp), the Python wrapper repo (https://github.com/abetlen/llama-cpp-python), or this overview (https://www.datacamp.com/tutorial/llama-cpp-tutorial)._

This guide: 

- 1.) summarizes the __terminology__ and steps used when working with containers; 

- 2.) demonstrates how to copy a <font color='purple'>__Docker image__</font> to use on KLC for __llama_cpp_python__ that runs on CPUs-only.

- 3.) walks through how to install llama_cpp_python that runs on GPUs by:
     
     - a.) <font color='purple'>__Building__</font> a <font color='purple'>__Docker image__</font> that runs on KLC/Quest's GPU nodes
     
     - b.) <font color='purple'>__Pushing__</font> this <font color='purple'>__Docker image__</font> to <font color='purple'>__Docker hub__</font> to access it from KLC
     
     - c.) <font color='purple'>__Building__</font> a <font color='purple'>__Singularity image__</font> from your underlying <font color='purple'>__Docker image__</font>
     
     - d.) Running that <font color='purple'>__Singularity image__</font> on our KLC GPU node (that has only 1 GPU) and a Quest GPU node (multiple GPUs are available through the SLURM scheduler).

- 4.) saves a Singularity image as a KLC __module__ for easy access in the future.

### <font color='purple'>1.) __Terminology and Steps__</font>

Here are some key terms to know when working with containers: 

- <font color='purple'>__Images__</font>: blueprints or templates that define the contents and configuration of a container. You can think of this as the _blueprint_ containing all the components and instructions to build a house.

- <font color='purple'>__Build__</font>: the process of creating a <font color='purple'>__container image__</font> from a set of instructions (a <font color='purple'>__Dockerfile__</font> or <font color='purple'>__Singularity recipe file__</font>).  Think of this as _following a blueprint_ to construct a house. _Note that containers cannot be modified after the `build` step._

- <font color='purple'>__Push__</font>: uploading a <font color='purple'>__container image__</font> built locally to a public registry where it can be accessed and shared.  This is like moving a completed house (<font color='purple'>__container image__</font>) to a neighborhood (<font color='purple'>__Docker Hub__</font>) for others to see.

- <font color='purple'>__Pull__</font>: downloading a <font color='purple'>__container image__</font> from a registry to another system. For example, we can pull a <font color='purple'>__Docker image__</font> from Docker Hub to KLC/Quest.

This guide leverages both <font color='purple'>__Singularity__</font> and <font color='purple'>__Docker__</font>.  While there are multiple differences between the two platforms, our use-case comes down to:

-  adopting <font color='purple'>__Docker__</font> for containers in non-HPC environments.  For instance, we don't have sudo privileges on KLC, so we need to create container images on our own computers.

- leveraging <font color='purple'>__Singularity__</font> for containers in HPC environments, like KLC/Quest.


### <font color='purple'>2.) __Access llama_cpp_python (CPU) Docker image on KLC__</font>

The easiest way to create a <font color='purple'>__Singularity image__</font> is from an existing <font color='purple'>__Docker image__</font>.  For instance, there are many images available for llama_cpp_python (cpu-only) on Docker Hub.  One example is the image we created here: https://hub.docker.com/r/rskellogg/llama-cpp-python/tags

<img src="rs_llama.png" width="700" height="450">

To translate this to a Singularity image on KLC, run: 

``` 
module purge
module load singularity
singularity pull docker://rskellogg/llama-cpp-python:latest
```
_Please note that these images must be available in a public Docker Hub repository.  Also, please specify the tag of the image you want (for example, `:latest`)._


Running this command will create a singularity image file (as shown below):

![alt text](sif_cpu.png)
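
When no output name is given, `singularity pull` names the file after the repository and tag. The following is a minimal sketch of that naming rule, assuming the default `<name>_<tag>.sif` pattern; `default_sif_name` is a hypothetical helper written for illustration, not part of Singularity:

```python
def default_sif_name(docker_uri: str) -> str:
    """Guess the file name `singularity pull` writes for a docker:// URI.

    Assumes the default <name>_<tag>.sif pattern (no explicit output name).
    """
    prefix = "docker://"
    if not docker_uri.startswith(prefix):
        raise ValueError("expected a docker:// URI")
    reference = docker_uri[len(prefix):]
    # Only the last path component (repository[:tag]) is used in the name.
    last = reference.rsplit("/", 1)[-1]
    name, _, tag = last.partition(":")
    # A missing tag falls back to "latest".
    return f"{name}_{tag or 'latest'}.sif"


print(default_sif_name("docker://rskellogg/llama-cpp-python:latest"))
```

Knowing the expected file name makes it easier to move or rename the image afterwards.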

You can run a Singularity container in a few ways:
- (1) from inside an interactive __shell__; 
- (2) through an __exec__ or __run__ command (depending on how the image was built).

#### <font color='purple'>_Interactive Shell_</font>

To run llama_cpp_python (CPU-only) in an interactive shell, type:

``` 
singularity shell /kellogg/software/simages/llama-cpp-python.sif
```
To access a directory other than your home directory from inside your Singularity container, you need to __bind__ that directory to the container with `-B <host_path>:<container_path>`.  Here is an example of binding two directories (one with the model, another with the code) to the Singularity image:

```
singularity shell -B /kellogg/software/llama_cpp/models:/model -B /kellogg/software/llama_cpp/code:/code /kellogg/software/simages/llama-cpp-python.sif
```
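
Long bind lists are easy to mistype. As a minimal sketch, the command above can be assembled from a dictionary of host-to-container paths; `singularity_command` is a hypothetical helper, and only the `-B host:container` syntax shown above is assumed:

```python
import shlex


def singularity_command(action, image, binds, args=()):
    """Assemble a `singularity <action>` command with one -B flag per bind."""
    cmd = ["singularity", action]
    for host_dir, container_dir in binds.items():
        cmd += ["-B", f"{host_dir}:{container_dir}"]
    cmd.append(image)
    cmd += list(args)
    return cmd


binds = {
    "/kellogg/software/llama_cpp/models": "/model",
    "/kellogg/software/llama_cpp/code": "/code",
}
cmd = singularity_command(
    "shell", "/kellogg/software/simages/llama-cpp-python.sif", binds
)
# Print a copy-pasteable command line; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

The same helper can build an `exec` command by passing `"exec"` as the action and the program to run in `args`.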

To run llama_cpp_python, you can download any compatible model and model weights.  For instance, different mistral models can be found here: https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF
The script below uses this model: __mistral-7b-v0.1.Q5_K_S.gguf__ 

```
############################
# Llama_cpp_python test code
############################

# libraries
from llama_cpp import Llama

# Inputs
my_model_path = "/model/mistral-7b-v0.1.Q5_K_S.gguf"
CONTEXT_SIZE = 512

# LOAD THE MODEL
llm = Llama(
  model_path=my_model_path,  # The path to the model file
  n_ctx=CONTEXT_SIZE,  # The max sequence length to use - adjust based on your model's requirements
  n_threads=4,  # The number of CPU threads to use
  n_gpu_layers=0  # Number of layers to offload to the GPU: 0 = CPU only, -1 = offload all layers
)

# send prompts
response = llm("Can you provide a concise summary of Debord's Societe du Spectacle?", max_tokens=1000)
response_text = response['choices'][0]['text']
print(response_text)
```

This code is saved as __llama_cpp_test.py__ in /kellogg/software/llama_cpp/code, which is bound to /code inside the container.  You can launch it from inside the shell with: 

``` 
python3 /code/llama_cpp_test.py
```
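
The test script reads the generated text from `response['choices'][0]['text']`. The sketch below pulls a few more fields out of the same dictionary. It assumes the OpenAI-style completion layout (`choices`, `usage`); `summarize_completion` is a hypothetical helper, and `sample` is a hand-written stand-in for a real model response:

```python
def summarize_completion(response: dict) -> dict:
    """Pull the generated text and a few diagnostics out of a completion dict."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "text": choice["text"].strip(),
        # A finish_reason of "length" usually means generation hit max_tokens.
        "finish_reason": choice.get("finish_reason"),
        "completion_tokens": usage.get("completion_tokens"),
    }


# Hand-written stand-in for the dictionary returned by llm(...)
sample = {
    "choices": [{"text": " A short summary.", "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 18, "completion_tokens": 4, "total_tokens": 22},
}
print(summarize_completion(sample))
```

Checking the finish reason is a quick way to tell whether an answer was cut off and `max_tokens` should be raised.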

#### <font color='purple'>_Execute Command_</font>

We can launch a script inside the Singularity container without entering the Singularity shell by using either `singularity exec` or `singularity run`. `singularity run` requires that the image define a runscript (the `%runscript` section of a Singularity recipe, or the `CMD`/`ENTRYPOINT` of the underlying Dockerfile), whereas `singularity exec` works on any Singularity image.

This command will run the same llama test file using "singularity exec":

``` 
singularity exec -B /kellogg/software/llama_cpp/models:/model -B /kellogg/software/llama_cpp/code:/code /kellogg/software/simages/llama-cpp-python.sif python3 /code/llama_cpp_test.py
```

_If you need more guidance on binding directories, please see NUIT's explanation of "Binding Directories" for singularity use here: https://services.northwestern.edu/TDClient/30/Portal/KB/ArticleDet?ID=1748._

### <font color='purple'>3.) __Create and Run a llama_cpp_python GPU Image on KLC__</font>

To create a llama_cpp_python image that runs on both GPUs and CPUs, we need to create a Docker image that accounts for the architecture of the GPU nodes on KLC and Quest.  We can start by cloning the llama_cpp_python git repo:

```
git clone --depth 1 --branch main https://github.com/abetlen/llama-cpp-python
```

The following Dockerfile enables us to build a Docker image for llama_cpp_python using CUDA 12.1 on our GPU nodes.

```
ARG CUDA_IMAGE="12.1.1-devel-ubuntu22.04"
FROM nvidia/cuda:${CUDA_IMAGE}

# We need to set the host to 0.0.0.0 to allow outside access
ENV HOST 0.0.0.0

RUN apt-get update && apt-get upgrade -y \
    && apt-get install -y git build-essential \
    python3 python3-pip gcc wget \
    ocl-icd-opencl-dev opencl-headers clinfo \
    libclblast-dev libopenblas-dev \
    && mkdir -p /etc/OpenCL/vendors && echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd

# setting build related env vars
ENV CUDA_DOCKER_ARCH=all
ENV LLAMA_CUBLAS=1

# Install dependencies
RUN python3 -m pip install --upgrade pip pytest cmake scikit-build setuptools fastapi uvicorn sse-starlette pydantic-settings starlette-context

# added a line for additional packages
RUN python3 -m pip install numpy pandas parmap nltk statsmodels scipy
RUN python3 -m pip install huggingface_hub notebook jupyterlab
RUN python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
RUN python3 -m pip install 'transformers[torch]'

RUN git clone https://github.com/NVIDIA/apex
WORKDIR /apex
# if pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1) which supports multiple `--config-settings` with the same key... 
RUN git checkout 23.08
RUN python3 -m pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./

RUN rm -rf /apex
WORKDIR /

# Install llama-cpp-python (build with cuda)
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python

#RUN python3 -m pip cache remove

COPY . .

# Run the server
CMD python3 -m llama_cpp.server
```

Please save this file locally on your computer. Then use it to replace the existing Dockerfile in your cloned llama_cpp_python folder: 

```
# overwrite the cuda_simple Dockerfile with our Dockerfile
cp Dockerfile llama-cpp-python/docker/cuda_simple/Dockerfile
```

Next, please make sure you have Docker Desktop installed on your computer and an active Docker account.  You can now build your Docker image with: 

```
# build the container
cd llama-cpp-python 
docker build -f docker/cuda_simple/Dockerfile -t llama:<llama-cpp-version> . --platform=linux/amd64
```
Replace __&lt;llama-cpp-version&gt;__ with the llama-cpp-python version you are building, for example:

```
docker build -f docker/cuda_simple/Dockerfile -t llama:v0.2.38 . --platform=linux/amd64
```

Once the build step is complete, log in to Docker Hub.  You can then tag and push your Docker image with:

```
# Log in to Dockerhub
docker login

# Tag the Docker image
docker tag llama:v0.2.38 ambreenwchaudhri/llama:v0.2.38

# Push the Docker image to Dockerhub
docker push ambreenwchaudhri/llama:v0.2.38
```

Please replace __ambreenwchaudhri__ with your Docker Hub username. 
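
The repository name and tag used in `docker tag` and `docker push` must be the same ones used later in `singularity pull`. As a minimal sketch, the three commands can be generated from one set of values so they cannot drift apart; `publish_commands` and the `your-username` placeholder are hypothetical and only for illustration:

```python
import shlex


def publish_commands(username: str, repo: str, tag: str) -> list:
    """Build matching tag, push, and pull commands for one image reference."""
    local = f"{repo}:{tag}"
    remote = f"{username}/{repo}:{tag}"
    commands = [
        ["docker", "tag", local, remote],               # run on your computer
        ["docker", "push", remote],                     # run on your computer
        ["singularity", "pull", f"docker://{remote}"],  # run on KLC
    ]
    return [shlex.join(command) for command in commands]


for line in publish_commands("your-username", "llama", "v0.2.38"):
    print(line)
```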

Now that the Docker image is built and publicly available on Docker Hub, you can move to KLC and create a Singularity image there.

Please sign in to any KLC node.  From the command line, type:

```
module load singularity/3.8.1
singularity pull docker://ambreenwchaudhri/llama:v0.2.38
```

#### <font color='purple'>_Run the Image on a GPU Node_</font>

To use this Singularity image with GPU support, you will need to access a GPU node on KLC or Quest. 
To start an interactive session on a GPU node, log in to any KLC (or Quest) node and run one of the following commands:

```
# KLC GPU node (kellogg partition)
srun --partition=kellogg --account=kellogg --nodes=1 --ntasks-per-node=1 --gres=gpu:a100:1 --mem=20G --time=05:30:00 --pty bash -l

# Quest GPU node (gengpu partition); use your own Quest allocation for --account
srun --partition=gengpu --account=p30790 --nodes=1 --ntasks-per-node=1 --gres=gpu:a100:1 --mem=20G --time=05:30:00 --pty bash -l
```

Once you are on the GPU node, you can run the same test file through an interactive shell session with the following commands. The `--nv` flag makes the host's NVIDIA driver and GPU devices available inside the container. To offload the model to the GPU, set `n_gpu_layers=-1` in the test script.

```
module load singularity/3.8.1
singularity shell --nv -B /kellogg/software/llama_cpp/models:/model -B /kellogg/software/llama_cpp/code:/code /kellogg/software/llama_cpp/llama_v0.2.38.sif

# then, inside the container shell:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-12.1/compat
python3 /code/llama_cpp_test.py
```

### <font color='purple'>4.) __Save an Image as a KLC Module__</font>

Since calling Singularity containers requires so many steps, we can package all of them into a KLC/Quest module using a .lua file.  Within /kellogg/software/modulefiles/llama_cpp, I created a .lua file called 2.38.lua that looks like this: 

```
-- new lua file
depends_on("singularity/3.8.1")
local python3_str = 'singularity exec --nv -B /kellogg/software/llama_cpp/models:/model -B /kellogg/software/llama_cpp/code:/code --env LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-12.1/compat /kellogg/software/llama_cpp/llama_v0.2.38.sif python3 $@'
set_shell_function("python3", python3_str)
```

This .lua file translates all of our Singularity container calling steps into a KLC module that we can access with the following steps:

```
module purge
module use /kellogg/software/modulefiles
module load llama_cpp/2.38
```

Once the module is loaded, you can simply log in to a GPU node and launch your code file with:

```
python3 /code/llama_cpp_test.py
```
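
Before asking llama_cpp_python to offload layers, it can help to confirm that the job actually sees a GPU. The following is a minimal sketch, not part of the original setup. `choose_n_gpu_layers` is a hypothetical helper, and it assumes that `nvidia-smi` is on the `PATH` when a GPU is usable and that an empty or `-1` value of `CUDA_VISIBLE_DEVICES` means no GPU is exposed to the job:

```python
import os
import shutil


def choose_n_gpu_layers(env=None, which=shutil.which) -> int:
    """Return -1 (offload all layers) if a GPU looks usable, else 0 (CPU only)."""
    env = os.environ if env is None else env
    # Assumption: the NVIDIA tools are visible whenever a GPU is usable.
    has_driver_tools = which("nvidia-smi") is not None
    visible = env.get("CUDA_VISIBLE_DEVICES")
    # An unset variable leaves all GPUs visible; "" or "-1" hides them.
    hidden = visible is not None and visible.strip() in ("", "-1")
    return -1 if has_driver_tools and not hidden else 0


# The result can be passed as n_gpu_layers when constructing Llama(...).
print(choose_n_gpu_layers())
```

With a check like this, the same script can run unchanged on CPU-only and GPU nodes.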