In [1]:
import warnings
warnings.filterwarnings("ignore")

In [2]:
!pip install transformers



# LLM Example

Today we'll see how to work with decoder-only models in zero-shot mode. We'll start with a basic ruGPT3 zero-shot example and then switch to a more advanced LLM.

In [3]:
import torch

# If there's a GPU available...
if torch.cuda.is_available():

    # Tell PyTorch to use the GPU.
    device = torch.device("cuda")

    print('There are %d GPU(s) available.' % torch.cuda.device_count())

    print('We will use the GPU:', torch.cuda.get_device_name(0))

# If not...
else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")
device

There are 1 GPU(s) available.
We will use the GPU: Tesla T4


device(type='cuda')

## ruGPT3 example

Load [ruGPT3](https://huggingface.co/ai-forever/rugpt3large_based_on_gpt2).

In [4]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ai-forever/rugpt3large_based_on_gpt2")

model = AutoModelForCausalLM.from_pretrained("ai-forever/rugpt3large_based_on_gpt2")


In [5]:
model.cuda()

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 1536)
    (wpe): Embedding(2048, 1536)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-23): 24 x GPT2Block(
        (ln_1): LayerNorm((1536,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D(nf=4608, nx=1536)
          (c_proj): Conv1D(nf=1536, nx=1536)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((1536,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=6144, nx=1536)
          (c_proj): Conv1D(nf=1536, nx=6144)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((1536,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=1536, out_features=50257, bias=False)
)

# Zero-shot

We use the model loss for zero-shot classification.

GPT-based models are trained with a per-token cross-entropy loss, which reduces to the negative log probability of the correct token because the targets are one-hot encoded. **The idea is to select the target label whose prompt yields the lowest loss, i.e. the lowest negative log probability of its tokens (averaged per token in the code below).**
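
To make the reduction concrete, here is a minimal sketch in plain Python. The probabilities are made-up toy values, not model outputs, and `cross_entropy` is an illustrative helper. With a one-hot target, every term of the cross-entropy sum vanishes except the one for the true token, which leaves its negative log probability.

```python
import math

def cross_entropy(target, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

# Toy next-token distribution over a 3-token vocabulary (assumed values).
probs = [0.1, 0.7, 0.2]
one_hot = [0, 1, 0]  # the true next token is token 1

ce = cross_entropy(one_hot, probs)
nll = -math.log(probs[1])
print(ce, nll)  # the two values coincide
```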



In [6]:
def get_loss_num(text):
    # Tokenize the input text and move it to the selected device
    inputs = tokenizer(text, return_tensors="pt").to(device)

    # Use the input ids as labels; the model shifts them internally
    # so that each position is scored on predicting the next token
    labels = inputs["input_ids"].clone()

    # Forward pass without gradient tracking (inference only);
    # outputs.loss is the mean cross-entropy over the predicted tokens
    with torch.no_grad():
        outputs = model(**inputs, labels=labels)
    return outputs.loss.item()
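
Passing the input ids as `labels` works because the model shifts them internally: each position is scored on how well it predicts the following token, and the returned loss is the mean over those predictions. The plain-Python sketch below mimics that bookkeeping with made-up token ids and probabilities; `next_token_pairs` and `mean_nll` are illustrative helpers, not part of Transformers.

```python
import math

def next_token_pairs(token_ids):
    """Pair each prefix with the token it must predict (labels shifted by one)."""
    return [(token_ids[:i], token_ids[i]) for i in range(1, len(token_ids))]

def mean_nll(target_probs):
    """Average negative log probability over the predicted tokens."""
    return sum(-math.log(p) for p in target_probs) / len(target_probs)

token_ids = [11, 42, 7, 99]          # toy ids, not real vocabulary entries
pairs = next_token_pairs(token_ids)  # 4 tokens give 3 predictions
for context, target in pairs:
    print(context, "->", target)

# Assumed probabilities the model assigns to each correct next token.
target_probs = [0.5, 0.25, 0.125]
print(mean_nll(target_probs))
```

Because the loss is a per-token mean, prompts of slightly different lengths remain comparable; multiplying by the number of predicted tokens would recover the sum.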


### Task: twitter tone analysis

Today we'll solve a sentiment analysis task. Let us start with some toy examples and try to come up with the prompts that can distinguish positive and negative texts.

**Positive prompt example**

In [7]:
text = 'жизнь отличная'
get_loss_num('Позитивный твит: ' + text)

`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.


6.202009201049805

**Negative prompt example**

In [8]:
get_loss_num('Негативный твит: ' + text)

7.3455810546875

Let's add emoticons!

In [9]:
print(get_loss_num('Позитивный твит: ' + text + ')))'))
print(get_loss_num('Негативный твит: ' + text + '((('))

6.151878356933594
7.050114631652832


Now we implement a function that selects the label that yields the lowest loss.

In [10]:
def predict_zero_shot(text, pos = 'Позитивный твит: {})))', neg = 'Негативный твит: {}((('):
  pos_loss = get_loss_num(pos.format(text))
  neg_loss = get_loss_num(neg.format(text))
  if pos_loss < neg_loss:
    return 'positive'
  return 'negative'

predict_zero_shot(text)

'positive'
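
The same rule extends to any number of labels: fill each label's template and take the one with the lowest loss. The sketch below is an assumption-laden generalization, not the notebook's method: `predict_label` and the `neutral` template are hypothetical, and `toy_loss` is a stand-in scorer so the block runs without a model. In the notebook one would pass `get_loss_num` instead.

```python
def predict_label(text, templates, loss_fn):
    """Return the label whose filled-in template has the lowest loss."""
    losses = {label: loss_fn(tpl.format(text)) for label, tpl in templates.items()}
    return min(losses, key=losses.get)

templates = {
    "positive": "Позитивный твит: {})))",
    "negative": "Негативный твит: {}(((",
    "neutral": "Нейтральный твит: {}",  # hypothetical extra label
}

def toy_loss(prompt):
    # Stand-in for get_loss_num: pretends prompts starting with "Позитивный" fit best.
    return 1.0 if prompt.startswith("Позитивный") else 2.0

print(predict_label("жизнь отличная", templates, toy_loss))
```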

Let's apply this approach to the Twitter sentiment classification task.

In [11]:
!wget -O twitter_short.csv "https://drive.usercontent.google.com/download?id=17qSrjy5NyknCfhs1kqGwHcHgml9UzpvS&export=download&authuser=0&confirm=t&uuid=cb32846f-bc96-4eb0-9e29-57d27a89e369&at=AN_67v2rr2Fh_KVc0V-EDJQ7bufm:1729946024386"

--2025-10-11 16:53:48--  https://drive.usercontent.google.com/download?id=17qSrjy5NyknCfhs1kqGwHcHgml9UzpvS
Resolving drive.usercontent.google.com (drive.usercontent.google.com)... 142.250.101.132, 2607:f8b0:4023:c06::84
Connecting to drive.usercontent.google.com (drive.usercontent.google.com)|142.250.101.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14363 (14K) [application/octet-stream]
Saving to: ‘twitter_short.csv’


2025-10-11 16:53:50 (71.1 MB/s) - ‘twitter_short.csv’ saved [14363/14363]



In [12]:
import pandas as pd
df = pd.read_csv('twitter_short.csv', index_col = 0)
df.head()

Unnamed: 0,text,label
0,на работе был полный пиддес :| и так каждое за...,negative
1,"Коллеги сидят рубятся в Urban terror, а я из-з...",negative
2,@elina_4post как говорят обещаного три года жд...,negative
3,"Желаю хорошего полёта и удачной посадки,я буду...",negative
4,"Обновил за каким-то лешим surf, теперь не рабо...",negative


In [13]:
df.tail()

Unnamed: 0,text,label
95,"Встречайте, мои супер одногруппницы, будущие и...",positive
96,"все,я вас покидаю,результаты гляну вечером)#би...",positive
97,RT @Dasha_crazy_69: @DashkaTeddy дыы))) но кто...,positive
98,Почти приехали в родное селенье!) @ москва-рига,positive
99,На*уй ваши Канары и Мальдивы ! Тут новая тема ...,positive


In [14]:
from sklearn.metrics import accuracy_score
df['preds'] = df.text.apply(predict_zero_shot)
accuracy_score(df.label, df.preds)

0.74

In [15]:
from sklearn.metrics import f1_score
def encode_label(x):
  if x == 'negative':
    return 0
  return 1
f1_score(df.label.apply(encode_label), df.preds.apply(encode_label))

0.7868852459016393
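
For reference, F1 for the positive class is the harmonic mean of precision and recall. Below is a minimal plain-Python sketch on made-up toy labels (not the dataset above); `binary_f1` is an illustrative helper that follows the same definition `f1_score` uses for binary 0/1 labels with its default settings.

```python
def binary_f1(y_true, y_pred, positive=1):
    """F1 of the positive class from true-positive, false-positive and false-negative counts."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no correct positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(binary_f1(y_true, y_pred))
```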

## Qwen2.5

[Qwen2.5](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) is a small LLM which can be run in Colab.

In [16]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model.to(device);



First, let's see how it handles a simple text generation task.

In [17]:
text = "Продолжи поговорку:\nБез труда"
print(text)

Продолжи поговорку:
Без труда


In [18]:
tokens = tokenizer(text, add_special_tokens=True, return_tensors="pt").to(device)
tokens

{'input_ids': tensor([[ 53645,   9516,  47081,   1802,   5063,  14497, 125661,  35252,    510,
          60332,  31885,  10813,  19763,  39490]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}

First try:

In [19]:
outputs = model.generate(**tokens, top_k=1).cpu()
print(tokenizer.batch_decode(outputs)[0])


Продолжи поговорку:
Без труда не выйдешь, ни в чем не поверишь.

Вот продолжение этой


In [20]:
outputs = model.generate(**tokens, num_beams=4, max_length=30).cpu()
print(tokenizer.batch_decode(outputs)[0])

Продолжи поговорку:
Без труда ничего не добьешься, но без труда ничего не добь


In [21]:
outputs = model.generate(**tokens, num_beams=4, num_return_sequences=4, max_length=40).cpu()
print("\n\n\n".join(tokenizer.batch_decode(outputs)))

Продолжи поговорку:
Без труда ничего не добьешься, но без труда ничего и не добьешься.

Вот несколько вариантов


Продолжи поговорку:
Без труда ничего не добьешься, но без труда ничего и не добьешься.

Ваш ответ:
Без


Продолжи поговорку:
Без труда ничего не добьешься, но без труда ничего и не добьешься.

Ваш ответ: Без т


Продолжи поговорку:
Без труда ничего не добьешься, но без труда ничего и не добьешься.

Вот продолжение этой п
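
The continuations above stop mid-word because `max_length` in `generate` limits the total sequence length, prompt included, whereas `max_new_tokens` limits only the generated part. A small sketch of the arithmetic, using the 14-token prompt printed earlier; `new_token_budget` is an illustrative helper, not a library function.

```python
def new_token_budget(prompt_len, max_length):
    """Tokens left for generation when max_length counts prompt + output."""
    return max(0, max_length - prompt_len)

prompt_len = 14  # number of input ids in the tokenized prompt above
for max_length in (30, 40):
    print(max_length, "->", new_token_budget(prompt_len, max_length), "new tokens")
```

Setting `max_new_tokens` directly avoids having to account for the prompt length.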


## System prompt

A **system prompt** (or system message) is a special instruction provided to an LLM that defines its behavior, tone, personality, and constraints during interactions with users. It serves as a foundational guideline that sets expectations for how the model should respond to user inputs throughout a session.

But how do we add one? Let's ask [Mistral](https://chat.mistral.ai/), [ChatGPT](https://chatgpt.com), or Gemini! Open a model chat and type:


```
Add system prompt in Qwen 2.5
```

Let's now add a system prompt!



In [22]:
system_prompt = "Ты — помощник, который генерирует пословицы на русском языке."  # Define your system prompt

prompt = "Продолжи поговорку:\nБез труда"
# Combine system prompt and user prompt into a full prompt
full_prompt = f"{system_prompt}\n\n{prompt}"
# Tokenize the full prompt
tokens = tokenizer(full_prompt, return_tensors="pt").to(device)

# Generate the response using the Qwen2.5 model
outputs = model.generate(**tokens, num_beams=4, num_return_sequences=4, max_length=70).cpu()
print("\n\n\n".join([x.split('\n\n')[-1] for x in tokenizer.batch_decode(outputs)]))

Без труда не пришёл, без труда и не


1. Без труда не пришё


Без труда не пришёл, 
без труда


Без труда не пришёл, без труда не у


## ChatTemplate

**ChatTemplate** (chat template) is a mechanism in the Hugging Face Transformers library that controls how the input text (prompt) is constructed for chat-based models.

Essentially, every chat model (such as LLaMA-2-Chat, Qwen-Chat, Mistral-Instruct, and others) has its own dialogue format, with special system tokens, separators between user and assistant messages, and sometimes additional instructions.

To avoid writing all of this manually, `Transformers` stores a chat template with the tokenizer and applies it through `tokenizer.apply_chat_template`.

**In simple terms**, it is a set of rules that turns a list of chat messages (with roles like system, user, assistant) into a single text prompt that the model can actually process.

### Example

```
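messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a joke."}
]

# Illustrative stand-in for a chat template (not a real model's format):
# wrap each message in a role tag, then open the assistant turn.
def format_chat(messages):
    turns = [f"<|{m['role']}|>\n{m['content']}\n" for m in messages]
    return "".join(turns) + "<|assistant|>\n"

text = format_chat(messages)
print(text)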

```
This template might produce something like:

```
<|system|>
You are a helpful assistant.
<|user|>
Tell me a joke.
<|assistant|>
```

That formatted text is what gets tokenized and sent to the model.




### Why it matters
* Different models expect different chat formats (e.g., OpenAI-style, LLaMA-style, ChatML).
* The chat template keeps the prompt format at inference identical to the format the model saw during chat training.
* Using the wrong template can cause degraded performance or hallucinations.

**Warning!** Do not trust LLM recommendations about the chat-template format of a particular model; check the model's documentation.

**Or skip the guesswork and use the predefined template!**
Most modern chat models define their own `chat_template` inside the tokenizer config:

```
tokenizer.apply_chat_template(messages, tokenize=False)
```



In [29]:
prompt = "Продолжи поговорку:\nБез труда"
messages = [
    {"role": "system", "content": "Ты — помощник, который генерирует пословицы на русском языке."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
print(text)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
response

<|im_start|>system
Ты — помощник, который генерирует пословицы на русском языке.<|im_end|>
<|im_start|>user
Продолжи поговорку:
Без труда<|im_end|>
<|im_start|>assistant



'Без труда ничего не достичь.'
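
The prompt printed above follows a ChatML-style layout: each message is wrapped in `<|im_start|>role ... <|im_end|>`, and the generation prompt opens an assistant turn. The sketch below reproduces that layout by hand to show what `add_generation_prompt=True` appends. `render_chatml` is a hypothetical helper; the real template ships with the tokenizer and may handle cases this sketch ignores, so prefer `apply_chat_template` in practice.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Hand-written approximation of a ChatML-style chat template."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open the assistant turn so the model continues with its answer.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a joke."},
]
rendered = render_chatml(messages)
print(rendered)
```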

## Qwen2.5 for sentiment analysis

Now let's see how it handles the sentiment analysis task. First, we try the simple generation approach.



In [23]:
text = 'жизнь отличная'
prompt = "Напиши pos в случае если приведенный текст твита позитивный и neg в случае если негативный. Ничего больше не добавляй. Текст твита:\n{}".format(text)
print(prompt)
# Tokenize the prompt (no system prompt yet)
tokens = tokenizer(prompt, return_tensors="pt").to(device)

outputs = model.generate(**tokens, num_beams=2, num_return_sequences=1, max_length=100).cpu()
print(tokenizer.batch_decode(outputs)[0].replace(prompt,''))

Напиши pos в случае если приведенный текст твита позитивный и neg в случае если негативный. Ничего больше не добавляй. Текст твита:
жизнь отличная
, работа хорошая, семья счастливая, друзья хорошие, планы на будущее интересные. 

pos = 1
neg = 0

pos = 1
neg = 0

pos


Add a system prompt.

In [24]:
system_prompt = "Ты — помощник, который задачу sentiment analysis."  # Define your system prompt
text = 'жизнь отличная'

prompt = "Напиши pos в случае если приведенный текст твита позитивный и neg в случае если негативный. Ничего больше не добавляй. Текст твита:\n{}".format(text)
# Combine system prompt and user prompt into a full prompt
full_prompt = f"{system_prompt}\n\n{prompt}"
# Tokenize the full prompt
tokens = tokenizer(full_prompt, return_tensors="pt").to(device)

outputs = model.generate(**tokens, num_beams=2, num_return_sequences=1, max_length=100).cpu()
print(tokenizer.batch_decode(outputs)[0].replace(full_prompt,''))

, всё в порядке

pos, neg

pos, neg

pos, neg

pos, neg

pos, neg

pos, neg

pos, neg

pos,


In [30]:
text = 'ваще чума=)'
prompt = "Напиши pos в случае если приведенный текст твита позитивный и neg в случае если негативный. Ничего больше не добавляй. Текст твита:\n{}".format(text)

messages = [
    {"role": "system", "content": "Ты — помощник, который задачу sentiment analysis."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
print(text)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
response

<|im_start|>system
Ты — помощник, который задачу sentiment analysis.<|im_end|>
<|im_start|>user
Напиши pos в случае если приведенный текст твита позитивный и neg в случае если негативный. Ничего больше не добавляй. Текст твита:
ваще чума=)<|im_end|>
<|im_start|>assistant



'pos'
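
To score the generation approach on the whole dataset, the free-form answer has to be mapped back to the dataset's `positive` / `negative` labels. A minimal sketch under stated assumptions: `parse_label` is a hypothetical helper, and falling back to a fixed label for unparseable answers is an arbitrary choice.

```python
def parse_label(response, default="negative"):
    """Map a free-form model answer to 'positive' or 'negative'."""
    answer = response.strip().lower()
    if answer.startswith("pos"):
        return "positive"
    if answer.startswith("neg"):
        return "negative"
    return default  # assumption: unparseable answers fall back to a fixed label

print(parse_label("pos"))
print(parse_label(" Neg."))
print(parse_label("не знаю"))
```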

The model is small, and the generation results are not that good. But what about the loss-based variant?

In [25]:
text = 'жизнь отличная'  # reset: `text` was overwritten with the chat-formatted prompt above
print(get_loss_num('Позитивный твит: ' + text))
print(get_loss_num('Негативный твит: ' + text))

3.7961068153381348
4.003837585449219


In [26]:
print(get_loss_num('Позитивный твит: ' + text + ')))'))
print(get_loss_num('Негативный твит: ' + text + '((('))

4.081406116485596
4.601905345916748


In [27]:
from sklearn.metrics import accuracy_score
df['preds_qwen'] = df.text.apply(predict_zero_shot)
accuracy_score(df.label, df.preds_qwen)

0.81

In [28]:
f1_score(df.label.apply(encode_label), df.preds_qwen.apply(encode_label))

0.8347826086956521