## Multilabel classification using the transformers library

In [None]:
import torch
from transformers import pipeline
import time

In [None]:
hf_token = '.............'
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
    token=hf_token
)
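
Hard-coding the token in a cell makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the token has been exported as an `HF_TOKEN` environment variable (the helper name `read_hf_token` is hypothetical, not part of any library):

```python
import os
from typing import Optional

def read_hf_token(env_var: str = "HF_TOKEN") -> Optional[str]:
    """Return the token stored in `env_var`, or None if it is unset or blank."""
    token = os.environ.get(env_var, "").strip()
    return token or None

hf_token = read_hf_token()
if hf_token is None:
    # Gated repositories such as the Llama 3.1 weights need an authenticated download.
    print("HF_TOKEN is not set; downloading gated models will fail.")
```

The resulting `hf_token` can be passed to `pipeline(..., token=hf_token)` exactly as in the cell above.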

Device set to use cpu


In [None]:
valid_labels = ["explicit_nudity", "suggestive", "violence", "disturbing_content", "rude_gestures", "alcohol", "drugs", "tobacco", "hate_speech", "safe"]

In [None]:
labels_for_prompt = ", ".join(f"'{label}'" for label in valid_labels)

In [None]:
text_to_analyze = 'He was drunk. He drew his sword and charged, intending to cut them down where they stood.'

In [None]:
messages = [
  {
    "role": "system",
    "content": f"You are an expert content moderator. Your task is to identify ALL applicable categories for the user's text from the following list: {labels_for_prompt}. Your response MUST be a comma-separated list of the category names. If none of the categories apply, respond with only the word 'safe'."
  },
  {
    "role": "user",
    "content": f"Please classify the following text: \"{text_to_analyze}\""
  },
]

In [None]:
start_time = time.time()
outputs = generator(messages, max_new_tokens=40, return_full_text=False)
end_time = time.time()

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


In [None]:
print(outputs)
print(f"Inference finished in {end_time - start_time:.2f} seconds.")

[{'generated_text': 'alcohol, violence'}]

In [None]:
raw_prediction = outputs[0]['generated_text'].strip().lower()

In [None]:
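potential_labels = [label.strip() for label in raw_prediction.split(',')]
# Keep only known labels, dropping duplicates while preserving order
final_labels = list(dict.fromkeys(label for label in potential_labels if label in valid_labels))
# 'safe' is only meaningful on its own: drop it when other labels are present
final_labels = [label for label in final_labels if label != 'safe'] or ['safe']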

In [None]:
print(final_labels)

['alcohol', 'violence']
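
The parsing steps above can be wrapped in one function so they can be reused for a batch of replies. This is only a sketch: `parse_labels` is a hypothetical helper, and stripping quotes and periods is an assumption about how the model might decorate its answer.

```python
from typing import Iterable, List

def parse_labels(raw_text: str, valid_labels: Iterable[str]) -> List[str]:
    """Turn a comma-separated model reply into a clean list of known labels."""
    allowed = set(valid_labels)
    seen: List[str] = []
    for piece in raw_text.lower().split(","):
        # Remove whitespace, then quotes or a period the model may have added.
        label = piece.strip().strip("'\".").strip()
        if label in allowed and label not in seen:
            seen.append(label)
    # 'safe' only makes sense on its own, so drop it when other labels matched.
    flagged = [label for label in seen if label != "safe"]
    return flagged or ["safe"]

valid_labels = ["explicit_nudity", "suggestive", "violence", "disturbing_content",
                "rude_gestures", "alcohol", "drugs", "tobacco", "hate_speech", "safe"]

print(parse_labels("Alcohol, 'violence', violence, unknown", valid_labels))
# → ['alcohol', 'violence']
```

Returning `['safe']` for an empty or unparseable reply is a design choice carried over from the cell above; a stricter pipeline might flag such replies for review instead.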


## Multilabel classification using vLLM: The Throughput King

**vLLM** is an open-source library, originally developed at UC Berkeley, designed to maximize throughput: the number of requests you can process per second.

**What it is**: A high-performance inference and serving engine. It is used in place of the standard transformers pipeline rather than plugged into it, and it loads the same Hugging Face model checkpoints.

**Key Innovation**: PagedAttention. This is the secret sauce. Inspired by virtual memory and paging in traditional operating systems, PagedAttention breaks the large, clunky KV cache (the attention keys and values stored for every token processed so far) into smaller, fixed-size "pages" or "blocks."

**Analogy**: Imagine trying to park a very long limousine (a request's KV Cache) in a busy parking lot. You need a huge, continuous empty space. With PagedAttention, you can break the limousine into segments and park each segment in any small, available spot. This eliminates wasted memory and makes it much easier to manage thousands of requests.

**Benefit**: Because memory is managed so efficiently, vLLM can batch far more concurrent requests onto a single GPU. The vLLM authors report up to 24x higher throughput than the standard Hugging Face Transformers pipeline; the actual gain depends on the model, the hardware, and the request mix.
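
The paging idea can be illustrated with a toy allocator. This is a simplified sketch of the bookkeeping only, not vLLM's actual data structures; `ToyBlockAllocator` and its methods are hypothetical names, and no real tensors are stored.

```python
class ToyBlockAllocator:
    """Toy model of paged KV-cache bookkeeping (not vLLM's real implementation)."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size              # tokens stored per block
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}                    # request id -> physical block ids
        self.token_counts = {}                    # request id -> tokens stored so far

    def append_token(self, request_id: str) -> None:
        """Reserve room for one more token, taking a new block only when needed."""
        count = self.token_counts.get(request_id, 0)
        if count % self.block_size == 0:          # first token, or last block is full
            if not self.free_blocks:
                raise MemoryError("no free KV-cache blocks")
            block = self.free_blocks.pop(0)
            self.block_tables.setdefault(request_id, []).append(block)
        self.token_counts[request_id] = count + 1

    def release(self, request_id: str) -> None:
        """Return all blocks of a finished request to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(request_id, []))
        self.token_counts.pop(request_id, None)


allocator = ToyBlockAllocator(num_blocks=4, block_size=2)
for _ in range(3):
    allocator.append_token("a")      # "a" takes blocks 0 and 1
allocator.append_token("b")          # "b" takes block 2
allocator.release("a")               # blocks 0 and 1 return to the pool
for _ in range(4):
    allocator.append_token("b")      # "b" grows into block 3, then block 0
print(allocator.block_tables)        # → {'b': [2, 3, 0]}
```

Request "b" ends up in blocks 2, 3, and 0, which are not adjacent. That is the limousine parked in scattered spots: a request never needs one contiguous region, so freed blocks can be reused immediately.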

## <font color="red">Failed</font>

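The vLLM run below failed in this Colab session. The logs point to the runtime rather than to vLLM itself: the transformers pipeline above reported `Device set to use cpu`, and vLLM reported `Automatically detected platform tpu`, so no CUDA GPU was attached to the session.

### The Core Reason: Runtime and Hardware Mismatch
The free GPU provided by Google Colab is usually an NVIDIA T4, which has the "Turing" architecture (Compute Capability 7.5).

vLLM requirement: vLLM's CUDA builds list Compute Capability 7.0 or higher, so a T4 is not excluded by its architecture alone. Two practical limits remain. First, bfloat16 needs Compute Capability 8.0 or higher ("Ampere" A100, "Hopper" H100), so a T4 has to run the model in float16. Second, an 8B-parameter model in 16-bit precision needs roughly 16 GB for the weights alone, which leaves no room on a 16 GB T4 for the KV cache.

In this notebook the session was on a TPU runtime and `torch-xla` was uninstalled afterwards, which most likely left vLLM without a usable backend and led to the `ValidationError` shown below.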
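
Before installing vLLM it helps to check which accelerator the session really has. A small sketch, assuming PyTorch is installed; `supports_bfloat16` and `describe_accelerator` are hypothetical helpers, and the 8.0 threshold reflects that bfloat16 arithmetic is native from the Ampere generation onward.

```python
from typing import Optional, Tuple

def supports_bfloat16(capability: Tuple[int, int]) -> bool:
    """bfloat16 is native on CUDA compute capability 8.0 (Ampere) and newer."""
    return capability >= (8, 0)

def describe_accelerator() -> Optional[Tuple[str, Tuple[int, int]]]:
    """Return (name, compute capability) of CUDA device 0, or None without a GPU."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0)

info = describe_accelerator()
if info is None:
    print("No CUDA GPU visible; check the runtime type of the session.")
else:
    name, capability = info
    print(f"{name}: compute capability {capability[0]}.{capability[1]}, "
          f"bfloat16 supported: {supports_bfloat16(capability)}")
```

On a CPU or TPU runtime this prints the "No CUDA GPU visible" message, which is a cheaper signal than waiting for a model download to fail.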

In [None]:
!pip install vllm



In [None]:
from vllm import LLM, SamplingParams
import time

INFO 08-14 11:36:36 [__init__.py:235] Automatically detected platform tpu.
INFO 08-14 11:36:37 [tpu.py:202] tpu_commons not found, using vLLM's TpuPlatform


In [None]:
texts_to_test = [
  "The violent gang members got into a fight at the bar, shouting hateful slurs while drinking whiskey and smoking cigarettes.",
  "He drew his sword and charged, intending to cut them down where they stood.",
  "The community is organizing a bake sale to raise funds for the local park.",
  "I'm feeling lonely tonight, looking for someone to chat with privately.",
  "The driver got angry and showed him the middle finger before speeding off."
]

In [None]:
prompt_template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are an expert content moderator.
Your task is to identify ALL applicable categories for the user's text from the following list: 'explicit_nudity', 'suggestive', 'violence', 'disturbing_content',
'rude_gestures', 'alcohol', 'drugs', 'tobacco', 'hate_speech', 'safe'. Your response MUST be a comma-separated list of the category names.
If none of the categories apply, respond with only the word 'safe'.<|eot_id|><|start_header_id|>user<|end_header_id|>

Please classify the following text: "{text}"<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"""
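
The label list in the template above is typed out by hand, so it can drift from `valid_labels`. A sketch of building the same prompt from the list instead; `build_prompt` and `SYSTEM_TEMPLATE` are hypothetical names, and the special tokens mirror the hand-written template rather than being taken from the tokenizer's chat template.

```python
valid_labels = ["explicit_nudity", "suggestive", "violence", "disturbing_content",
                "rude_gestures", "alcohol", "drugs", "tobacco", "hate_speech", "safe"]

SYSTEM_TEMPLATE = (
    "You are an expert content moderator. Your task is to identify ALL applicable "
    "categories for the user's text from the following list: {labels}. "
    "Your response MUST be a comma-separated list of the category names. "
    "If none of the categories apply, respond with only the word 'safe'."
)

def build_prompt(text: str, labels: list) -> str:
    """Format one Llama 3 style chat prompt with the label list filled in."""
    label_list = ", ".join(f"'{label}'" for label in labels)
    system = SYSTEM_TEMPLATE.format(labels=label_list)
    user = f'Please classify the following text: "{text}"'
    # Each header is followed by a blank line, matching the template above.
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_prompt("The community is organizing a bake sale.", valid_labels)
print(prompt)
```

With this helper the prompt list could be built as `[build_prompt(t, valid_labels) for t in texts_to_test]`, and adding a category only requires editing `valid_labels`.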


In [None]:
full_prompts = [prompt_template.format(text=text) for text in texts_to_test]

In [None]:
sampling_params = SamplingParams(
  temperature=0.0, # Use 0 for deterministic classification
  max_tokens=40    # Max length of the generated label list
)

In [None]:
from huggingface_hub import login
hf_token = '..................'
login(token=hf_token)

In [None]:
!pip uninstall -y torch-xla

Found existing installation: torch-xla 2.6.0
Uninstalling torch-xla-2.6.0:
  Successfully uninstalled torch-xla-2.6.0


In [None]:
!pip install -U vllm transformers

Collecting transformers
  Downloading transformers-4.55.2-py3-none-any.whl.metadata (41 kB)
Downloading transformers-4.55.2-py3-none-any.whl (11.3 MB)
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.55.0
    Uninstalling transformers-4.55.0:
      Successfully uninstalled transformers-4.55.0
Successfully installed transformers-4.55.2


In [None]:
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", trust_remote_code=True)

ValidationError: 1 validation error for ModelConfig
  Value error, Model architectures ['LlamaForCausalLM', 'TransformersForCausalLM'] failed to be inspected. Please check the logs for more details. [type=value_error, input_value=ArgsKwargs((), {'model': ...attention_dtype': None}), input_type=ArgsKwargs]
    For further information visit https://errors.pydantic.dev/2.11/v/value_error

In [None]:
start_time = time.time()
outputs = llm.generate(full_prompts, sampling_params)
end_time = time.time()

In [None]:
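# Each RequestOutput holds the prompt and a list of completions; print the first completion's text
for output in outputs:
    print(output.outputs[0].text.strip())
print(f"Inference finished in {end_time - start_time:.2f} seconds.")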