## Monday, February 5, 2024

A quick test to validate this environment is good to go with transformers.

Hmm, I have a local environment variable set for the HuggingFace Transformers model cache folder, and yet when I download a model here it gets loaded into the default '~/.cache/huggingface/hub' folder ... meh.
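One thing worth checking, as a rough sketch: which cache-related variables the notebook kernel actually sees. `HF_HOME`, `HF_HUB_CACHE`, `HUGGINGFACE_HUB_CACHE`, and `TRANSFORMERS_CACHE` are the names I know the Hugging Face libraries to read; which one wins depends on the installed versions, so treat the list as an assumption. `report_hf_cache_env` is just a helper name made up for this check. A kernel launched from an editor or desktop session may not inherit variables exported in a shell profile, which would explain the fallback to the default folder.

```python
import os

# Cache-related variables that Hugging Face libraries have read (assumed list;
# precedence depends on the installed library versions).
HF_CACHE_VARS = ("HF_HOME", "HF_HUB_CACHE", "HUGGINGFACE_HUB_CACHE", "TRANSFORMERS_CACHE")


def report_hf_cache_env(environ=None):
    """Return {variable: value or None} for each cache variable, as seen by this process."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in HF_CACHE_VARS}


# Print what the kernel process itself sees, not what the login shell sees.
for name, value in report_hf_cache_env().items():
    print(f"{name} = {value!r}")
```

If every entry prints `None`, the variable never reached the kernel, and setting it with `os.environ[...]` before importing `transformers` is one way to test that theory.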

In [1]:
!ls ~/.cache/huggingface/hub

models--bert-base-uncased
models--mistralai--Mistral-7B-Instruct-v0.2
models--nomic-ai--nomic-embed-text-v1
models--sentence-transformers--all-mpnet-base-v2
tmp9s591511
version.txt


Always start by making sure any CUDA code will target the 4090.

In [2]:
# only target the 4090 ...
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"

import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

device(type='cuda')
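`device(type='cuda')` only says that some CUDA device is visible, not which one. A small sketch to list the visible devices by name follows; `visible_cuda_devices` is a hypothetical helper, and it returns an empty list when PyTorch or CUDA is unavailable. Two caveats: `CUDA_VISIBLE_DEVICES` has to be set before CUDA is initialized in the process, and index 0 under CUDA's default ordering is not necessarily the same card as index 0 in `nvidia-smi`.

```python
import os


def visible_cuda_devices():
    """List (index, name) for each GPU PyTorch can see; empty if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [(i, torch.cuda.get_device_name(i)) for i in range(torch.cuda.device_count())]


print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
for index, name in visible_cuda_devices():
    print(index, name)
```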

Let's conduct a simple test using the [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) model from HuggingFace.

In [3]:
model_name = "mistralai/Mistral-7B-Instruct-v0.2"

In [4]:
from transformers import AutoModelForCausalLM, AutoTokenizer

Using the default code shown in the model card, the model gets loaded to the CPU RAM, then to the GPU VRAM where it runs out of GPU memory!

Then when I try to load it directly to the GPU, it fails with the error:

'ImportError: Using `low_cpu_mem_usage=True` or a `device_map` requires Accelerate: `pip install accelerate`'

So then I ran 'mamba install conda-forge::accelerate'
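Some back-of-the-envelope arithmetic on why the default load runs out of VRAM. This is a sketch under assumptions: the parameter count is an approximation for a "7B" Mistral model, and I am assuming `from_pretrained` loads weights as float32 when no `torch_dtype` is given. The capacity figure is the 23.65 GiB total reported in the `OutOfMemoryError` further down in this notebook. Only the weights are counted, so real usage will be higher.

```python
# Rough weight-only footprint; ignores activations, KV cache, and CUDA overhead.
N_PARAMS = 7.24e9          # approximate parameter count (assumption)
GPU_CAPACITY_GIB = 23.65   # total capacity reported in this notebook's OutOfMemoryError

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_gib(n_params, bytes_per_param):
    """Size of the weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30


for dtype, nbytes in BYTES_PER_PARAM.items():
    gib = weight_gib(N_PARAMS, nbytes)
    verdict = "fits" if gib < GPU_CAPACITY_GIB else "does NOT fit"
    print(f"{dtype:8s} ~{gib:5.1f} GiB -> {verdict} in {GPU_CAPACITY_GIB} GiB")
```

Under these assumptions the float32 weights alone exceed the card's capacity, while float16 or 8-bit weights leave headroom, which is why the later cells reach for `torch_dtype=torch.float16` and `load_in_8bit=True`.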

In [5]:
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [6]:
# This way of loading the model loads it to the CPU memory, NOT the GPU VRAM memory. 
# And when we try to then load it to the GPU, we run out of VRAM!
# model = AutoModelForCausalLM.from_pretrained(model_name)

# mamba install conda-forge::accelerate

# And when I tried this, after installing accelerate, it still ran out of VRAM!
# model = AutoModelForCausalLM.from_pretrained(model_name, device_map=device)


# And when I run this, I get this error message:
#   ImportError: Using `load_in_8bit=True` requires Accelerate: `pip install accelerate` 
#   and the latest version of bitsandbytes `pip install -i https://test.pypi.org/simple/ bitsandbytes` or `pip install bitsandbytes`.
# model = AutoModelForCausalLM.from_pretrained(model_name, 
#                                              device_map=device,
#                                              load_in_8bit=True)

# mamba install conda-forge::bitsandbytes

# Wow! Now when I run this, I get a ton of error messages related to CUDA ... like the following ...
# CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
# model = AutoModelForCausalLM.from_pretrained(model_name, 
#                                              device_map=device,
#                                              load_in_8bit=True)

# Running this generates the same mess of CUDA errors ... man, I got to wonder, do I need to install the CUDA Toolkit??
# model = AutoModelForCausalLM.from_pretrained(model_name,
#                                               load_in_8bit=True,
#                                               device_map='auto',
#                                               torch_dtype=torch.float16,
#                                               low_cpu_mem_usage=True,
#                                               )


# So yeah, I actually just installed the CUDA 12.3 toolkit and we are still getting these CUDA errors! WTF!?


# This code worked in another notebook, but with a different model and within Docker ...
# I am now thinking this may have to do with 'bitsandbytes' problems ....
model = AutoModelForCausalLM.from_pretrained(model_name,
                                              load_in_8bit=True,
                                              device_map=device,
                                              torch_dtype=torch.float16,
                                              low_cpu_mem_usage=True,
                                              )



False

The following directories listed in your path were found to be non-existent: {PosixPath('/usr/share/gconf/ubuntu.default.path')}
The following directories listed in your path were found to be non-existent: {PosixPath('0'), PosixPath('1')}
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/share/gconf/ubuntu.mandatory.path')}
The following directories listed in your path were found to be non-existent: {PosixPath('@/tmp/.ICE-unix/2836,unix/KAUWITB'), PosixPath('local/KAUWITB')}
The following directories listed in your path were found to be non-existent: {PosixPath('vs/workbench/api/node/extensionHostProcess')}
The following directories listed in your path were found to be non-existent: {PosixPath('/etc/xdg/xdg-ubuntu')}
The following directories listed in your path were found to be non-existent: {PosixPath('/etc/xdg/xdg-ubuntu')}
The following directories listed in your path were found to be non-existent: {PosixPath('/home/rob/miniforge3/



RuntimeError: Failed to import transformers.integrations.bitsandbytes because of the following error (look up to see its traceback):

        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
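The error suggests locating the CUDA libraries and possibly adding them to `LD_LIBRARY_PATH`. A minimal sketch of that search, standard library only: `find_libcudart` is a hypothetical helper, and the directories it checks (`LD_LIBRARY_PATH`, `$CONDA_PREFIX/lib`, plus `/usr/local/cuda/lib64` as the conventional toolkit location on Linux) are my assumptions about where `libcudart.so` might live, not something bitsandbytes documents.

```python
import glob
import os


def find_libcudart(extra_dirs=()):
    """Return sorted paths matching libcudart.so* in LD_LIBRARY_PATH,
    $CONDA_PREFIX/lib, and any extra_dirs."""
    dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if conda_prefix:
        dirs.append(os.path.join(conda_prefix, "lib"))
    dirs.extend(extra_dirs)

    hits = set()
    for d in dirs:
        hits.update(glob.glob(os.path.join(d, "libcudart.so*")))
    return sorted(hits)


# /usr/local/cuda/lib64 is a conventional install location (assumption).
matches = find_libcudart(extra_dirs=["/usr/local/cuda/lib64"])
print(matches or "no libcudart.so* found in the searched directories")
```

If the library only turns up in a directory that is not on `LD_LIBRARY_PATH`, that would be consistent with the "libcudart.so not found in any environmental path" warning above.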

In [6]:
messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]


In [7]:
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

In [8]:
model_inputs = encodeds.to(device)
model.to(device)

OutOfMemoryError: CUDA out of memory. Tried to allocate 224.00 MiB. GPU 0 has a total capacity of 23.65 GiB of which 209.56 MiB is free. Including non-PyTorch memory, this process has 23.43 GiB memory in use. Of the allocated memory 23.05 GiB is allocated by PyTorch, and 1.17 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

In [None]:
# from transformers import AutoModelForCausalLM, AutoTokenizer

# device = "cuda" # the device to load the model onto

# model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
# tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# messages = [
#     {"role": "user", "content": "What is your favourite condiment?"},
#     {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
#     {"role": "user", "content": "Do you have mayonnaise recipes?"}
# ]

# encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

# model_inputs = encodeds.to(device)
# model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
