## Instructions

To run the models below on the cluster, follow these steps:

1. Create a conda environment called `nlp`. It includes `ipykernel` (to make the environment available as a Jupyter kernel) and `ipywidgets` (to show progress bars):
   
    `conda create -n nlp python=3.12 ipykernel ipywidgets`

2. Activate the environment

   `conda activate nlp`

3. Load a specific version of CUDA

    `module load CUDA/12.6.0`

4. Install PyTorch. Make sure that it is compatible with your CUDA version; you can check this at https://pytorch.org/get-started/locally/. For the version selected above, we can just run:

    `pip install torch torchvision torchaudio`

5. Install additional packages to handle deep learning models:

    `pip install transformers accelerate qwen-vl-utils`

6. Make the kernel available:

    `python -m ipykernel install --user --name nlp`
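
Once the steps above are done, a quick check from Python can confirm that the packages are importable and whether PyTorch sees a GPU. The following is a minimal sketch: the helper name `check_packages` is hypothetical, and it relies only on the standard-library `importlib.util.find_spec` and on `torch.cuda.is_available()`.

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each top-level module name to True if it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Module names as imported in Python (qwen-vl-utils is imported as qwen_vl_utils).
status = check_packages(
    ["torch", "transformers", "accelerate", "qwen_vl_utils", "ipykernel"]
)
for name, ok in status.items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")

# Only query CUDA if PyTorch is actually installed.
if status["torch"]:
    import torch

    print("CUDA available:", torch.cuda.is_available())
```

If `CUDA available` prints `False` on a GPU node, the CUDA module may not be loaded or the installed PyTorch build may not match it.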

## Sentiment analysis

Analyse whether a text is positive or negative with https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest

In [1]:
from transformers import pipeline

sentiment_task = pipeline("sentiment-analysis",
                          model="cardiffnlp/twitter-roberta-base-sentiment-latest",
                          tokenizer="cardiffnlp/twitter-roberta-base-sentiment-latest")
sentiment_task("This is a great movie!")

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


[{'label': 'positive', 'score': 0.9849875569343567}]

In [2]:
sentiment_task("One one side the actors were great, on the other the photography is bad.")

[{'label': 'negative', 'score': 0.5444068312644958}]
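
The second score (about 0.54) is only slightly above one half, so that label is a weak call compared with the first one (about 0.98). A minimal sketch of how such results could be separated, assuming the list-of-dicts format shown in the outputs above; the function name `split_by_confidence` and the 0.6 threshold are illustrative assumptions, not part of `transformers`.

```python
def split_by_confidence(results, threshold=0.6):
    """Split pipeline-style results into confident and uncertain predictions."""
    confident, uncertain = [], []
    for result in results:
        if result["score"] >= threshold:
            confident.append(result)
        else:
            uncertain.append(result)
    return confident, uncertain


# The two predictions reported above, copied verbatim.
results = [
    {"label": "positive", "score": 0.9849875569343567},
    {"label": "negative", "score": 0.5444068312644958},
]

confident, uncertain = split_by_confidence(results)
print("confident:", [r["label"] for r in confident])
print("uncertain:", [r["label"] for r in uncertain])
```

The threshold is a design choice: a higher value sends more mixed texts, such as the second example, to manual review.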

## Describing images

Describe an image using a multimodal model: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct

In [1]:
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

In [2]:
# Alternative: the larger 7B variant (left commented out)
# model = Qwen2VLForConditionalGeneration.from_pretrained(
#     "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
# )

In [3]:
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct", torch_dtype="auto", device_map="auto"
)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [4]:
#processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")


messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "ducks.jpg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
# Move the inputs to the device the model was placed on by device_map="auto"
inputs = inputs.to(model.device)

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)

Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.


['The image depicts two ducks in a shallow, rocky body of water. The duck in the foreground is a male mallard, characterized by its green head and white and brown body. The duck in the background is a female mallard, with a similar coloration but with a more muted brown and white plumage. The water is clear, allowing the rocks beneath to be visible, and the ducks are standing on the rocky shore. The scene is serene and natural, with the ducks appearing to be at ease in their environment.']
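
To try other images or prompts, the `messages` structure used above can be produced by a small helper. This is a minimal sketch: `build_messages` is a hypothetical name, and the second prompt is only an example; the dictionary layout is copied from the cell above.

```python
def build_messages(image_path, prompt="Describe this image."):
    """Build a single-turn chat message with one image and one text prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": prompt},
            ],
        }
    ]


# Same image as above, with a different (illustrative) question.
messages = build_messages("ducks.jpg", "How many ducks are in this image?")
print(messages[0]["content"][1]["text"])
```

The returned list can be passed to `processor.apply_chat_template` and `process_vision_info` exactly as in the cell above.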
