<h3 align="center">Status: <span class="badge"><b>In progress</b></span></h3>

<h1 align="center">RL GRPO</h1>

---

<h1 align="center">Training a small Unimarc reasoner with RL</h1>

# Requirements

In [1]:
import torch

print("PyTorch:")
print("PyTorch version is:" + torch.__version__)
if torch.cuda.is_available():
    print("PyTorch is working with CUDA")
    # Only query the device name when a GPU is actually visible.
    print("The GPU model is: " + torch.cuda.get_device_name(0))
else:
    print("Error! It is not working correctly")

PyTorch:
PyTorch version is:2.6.0+cu124
PyTorch is working with CUDA
The GPU model is: NVIDIA A2


In [None]:
!pip install -U --quiet datasets transformers huggingface_hub accelerate flash-attn

# Load model

In [4]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

In [5]:
model_name = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    attn_implementation="flash_attention_2"
)
model.config.sliding_window = None

In [6]:
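device = "cuda"  # use "cpu" for CPU-only runs (flash_attention_2 itself needs a GPU)
model = model.to(device)  # on a single-GPU machine, device_map="auto" has already placed the model here
model.eval()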

Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(151936, 896)
    (layers): ModuleList(
      (0-23): 24 x Qwen2DecoderLayer(
        (self_attn): Qwen2Attention(
          (q_proj): Linear(in_features=896, out_features=896, bias=True)
          (k_proj): Linear(in_features=896, out_features=128, bias=True)
          (v_proj): Linear(in_features=896, out_features=128, bias=True)
          (o_proj): Linear(in_features=896, out_features=896, bias=False)
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=896, out_features=4864, bias=False)
          (up_proj): Linear(in_features=896, out_features=4864, bias=False)
          (down_proj): Linear(in_features=4864, out_features=896, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
        (post_attention_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
      )
    )
    (norm): Qwen2RMSNorm((896,), eps=1e-06)
    (rotary_emb): Qwen2RotaryEmbe

In [17]:
# Important: decoder-only models need left padding for batched generation
tokenizer.padding_side = "left"
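
Left padding matters because a decoder-only model continues each row from its last position. With right padding, the shorter prompts in a batch would end on a pad token. The sketch below illustrates this with plain Python lists. `pad_batch` is a hypothetical helper written for this illustration, not a tokenizer API, and the token ids are the ones this tokenizer reports for "the sky is blue" and for its PAD token.

```python
def pad_batch(sequences, pad_id, side="left"):
    """Pad token-id lists to a common length and build attention masks (illustrative helper)."""
    width = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        padding = [pad_id] * (width - len(seq))
        mask_padding = [0] * len(padding)
        if side == "left":
            input_ids.append(padding + list(seq))
            attention_mask.append(mask_padding + [1] * len(seq))
        else:
            input_ids.append(list(seq) + padding)
            attention_mask.append([1] * len(seq) + mask_padding)
    return input_ids, attention_mask


PAD_ID = 151643  # id of <|endoftext|>, the PAD token of this tokenizer
batch = [[1782, 12884, 374, 6303], [1782, 12884]]  # "the sky is blue" and its first two tokens

left_ids, left_mask = pad_batch(batch, PAD_ID, side="left")
right_ids, right_mask = pad_batch(batch, PAD_ID, side="right")

# Last position of every row: this is where generation continues from.
print([row[-1] for row in left_ids])   # [6303, 12884]  -> real tokens
print([row[-1] for row in right_ids])  # [6303, 151643] -> second row ends on PAD
```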

## Inspect tokenizer

In [7]:
len(tokenizer)

151665

In [16]:
print(f"**EOS**\nEOS token: {tokenizer.eos_token}\n- EOS token id: {tokenizer.eos_token_id}\n\n**PAD**\nPAD token: {tokenizer.pad_token}\n- PAD token id: {tokenizer.pad_token_id}")

**EOS**
EOS token: <|im_end|>
- EOS token id: 151645

**PAD**
PAD token: <|endoftext|>
- PAD token id: 151643


In [18]:
tokenizer("the sky is blue", return_tensors="pt").to(device)

{'input_ids': tensor([[ 1782, 12884,   374,  6303]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1]], device='cuda:0')}

In [19]:
input_ids = tokenizer("the sky is blue", return_tensors="pt").to(device).input_ids[0]

In [20]:
tokenizer.decode(input_ids)

'the sky is blue'

In [21]:
if hasattr(tokenizer, "chat_template"):
    print("Current chat template:", tokenizer.chat_template)

Current chat template: {%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0]['role'] == 'system' %}
        {{- messages[0]['content'] }}
    {%- else %}
        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
    {%- endif %}
    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0]['role'] == 'system' %}
        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
    {%- else %}
        {{- '<|im_start|>system\nYou are Qwe

## Inference

### Without applying chat templating

In [37]:
prompt = "<|im_start|>system: You are a helpful assistant<|im_end|><|im_start|>user: complete this sentence 'the sky is blue and '<|im_end|><|im_start|>assistant: "
inputs = tokenizer(prompt, return_tensors="pt").to(device)

In [38]:
outputs = model.generate(**inputs,
                         max_new_tokens = 4096,
                         use_cache = True,)

In [39]:
print(tokenizer.decode(outputs[0], skip_special_tokens=False))

<|im_start|>system: You are a helpful assistant<|im_end|><|im_start|>user: complete this sentence 'the sky is blue and '<|im_end|><|im_start|>assistant:  the sky is blue and the clouds are white.<|im_end|>
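
The hand-written prompt above uses `system: ` with a colon, whereas the chat template printed earlier puts a newline after the role name and another after `<|im_end|>`. The model still answered sensibly here, but matching the template is the safer choice. The following is a minimal sketch of that layout. `to_chatml` is a hypothetical helper that covers only the no-tools branch with an explicit system message. Extending the system-message pattern to the other roles and ending with `<|im_start|>assistant\n` is an assumption based on the ChatML convention, so `tokenizer.apply_chat_template` remains the reference.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render messages in the ChatML layout (hypothetical helper, no-tools branch only)."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open the assistant turn so the model writes the reply.
        text += "<|im_start|>assistant\n"
    return text


demo_messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "complete this sentence 'the sky is blue and '"},
]
demo_prompt = to_chatml(demo_messages)
print(demo_prompt)
```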


### With applying chat templating

In [58]:
SYSTEM_PROMPT = """
You are an expert in Unimarc/XML bibliographic records.
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

# source: https://arxiv.org/pdf/2503.19470v1
# Raw string, so that the "\b" in "\boxed" is not interpreted as a backspace escape.
SYSTEM_PROMPT_1 = r"""
You are a helpful assistant that can solve the given question step by step. 
Given a question, you need to first think about the reasoning process in
the mind and then provide the answer. The reasoning process and
answer are enclosed within <think> </think> and <answer> </answer> tags respectively. 
For example, <think> This is the reasoning process. </think>
<answer> The final answer is \boxed{answer here}
</answer>. In the last part of the answer, the final exact answer is enclosed within \boxed{}
with xml format.
"""


SYSTEM_PROMPT_2 = (
    "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant "
    "first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning "
    "process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., "
    "<think> reasoning process here </think><answer> answer here </answer>"
)

user_prompt = """
Generate a valid Unimarc/XML record from these unstructured informations:
Title: Electric vehicle tribology
Subtitle: Challenges and opportunities for a sustainable transportation future
Author: Leonardo I. Farfan-Cabrera, Ali Erdemir
Publisher: Elsevier
Year: 2024
ISBN: 978-0-443-14074-7
Language: English
Collection/Series: Elsevier Series on Tribology and Surface Engineering
Edition: Not specified
Material description: 1 vol. (XI-313 p.), couv. ill. en coul., 23 cm
Abstract/Notes: "Electric vehicle tribology, challenges and opportunities for a sustainable transportation future" provides practical, comprehensive guidance on a new and increasingly important area of tribology. Building skills from fundamentals to solution design, this book demonstrates the unique tribological techniques essential to the efficient electrification of transport systems. Led by professors with a combined three decades in industry and academia, and collecting insights from experts around the world, this book begins with the essential knowledge regarding both electric vehicles and tribology. After outlining the unique tribological needs of EVs, the book then breaks down the components and hardware required. It provides detailed protocols and methods for the testing and improvement of lubricants and materials as well as a dedicated section on modern lubrication specific to EVs. Throughout, it considers the critical question of sustainable tribology and the long-term sustainable options for lubrication and materials for electric vehicles.
Source of the abstract/notes: 4e de couverture
Table of contents: Not specified
Keywords: Tribologie (technologie), Tribologie (Technologie)
"""

In [59]:
messages = [
  {"role": "system", "content": SYSTEM_PROMPT_1},
  {"role": "user", "content": user_prompt},
]

In [60]:
inputs = tokenizer.apply_chat_template(
    messages,
    return_dict=True,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to(device)

In [61]:
outputs = model.generate(
    **inputs,
    max_new_tokens = 2048,
    use_cache = True,
)

In [64]:
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

system

You are a helpful assistant that can solve the given question step by step. 
Given a question, you need to first think about the reasoning process in
the mind and then provide the answer. The reasoning process and
answer are enclosed within <think> </think> and <answer> </answer> tags respectively. 
For example, <think> This is the reasoning process. </think>
<answer> The final answer is \boxed{answer here}
</answer>. In the last part of the answer, the final exact answer is enclosed within \boxed{}
with xml format.

user

Generate a valid Unimarc/XML record from these unstructured informations:
Title: Electric vehicle tribology
Subtitle: Challenges and opportunities for a sustainable transportation future
Author: Leonardo I. Farfan-Cabrera, Ali Erdemir
Publisher: Elsevier
Year: 2024
ISBN: 978-0-443-14074-7
Language: English
Collection/Series: Elsevier Series on Tribology and Surface Engineering
Edition: Not specified
Material description: 1 vol. (XI-313 p.), couv. ill. en coul., 23 

In [62]:
# Keep only the newly generated tokens by slicing the prompt off each output row.
generated_ids = [
    output_ids[len(prompt_ids):]
    for prompt_ids, output_ids in zip(inputs.input_ids, outputs)
]

In [63]:
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])

To generate a valid Unimarc/XML record from the given information, we need to follow these steps:

1. Identify the key elements from each piece of information.
2. Determine which XML element best represents the content of the document.
3. Create the XML structure based on the identified elements.

Step 1: Key elements from each piece of information:
- Title: Electric vehicle tribology
- Subtitle: Challenges and opportunities for a sustainable transportation future
- Author: Leonardo I. Farfan-Cabrera, Ali Erdemir
- Publisher: Elsevier
- Year: 2024
- ISBN: 978-0-443-14074-7
- Language: English
- Collection/Series: Elsevier Series on Tribology and Surface Engineering
- Edition: Not specified
- Material description: 1 vol. (XI-313 p.), couv. ill. en coul., 23 cm
- Abstract/Notes: "Electric vehicle tribology, challenges and opportunities for a sustainable transportation future"
- Source of the abstract/notes: 4e de couverture
- Table of contents: Not specified
- Keywords: Tribologie (techn
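
The base model's reply above does not use the `<think>` / `<answer>` tags requested by the system prompt, which is the kind of behavior a rule-based format reward can target. The following is a minimal sketch of such a check, assuming a completion should consist of exactly one `<think>` block followed by one `<answer>` block. `format_reward` and `extract_answer` are hypothetical names, not part of any library, and the regular expression is deliberately lenient.

```python
import re

# One <think> block followed by one <answer> block, with optional surrounding whitespace.
FORMAT_RE = re.compile(
    r"^\s*<think>(.*?)</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL
)


def format_reward(completion: str) -> float:
    """Return 1.0 when the completion follows the requested tag layout, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion) else 0.0


def extract_answer(completion: str):
    """Return the text inside <answer>...</answer>, or None when the layout is wrong."""
    match = FORMAT_RE.match(completion)
    return match.group(2).strip() if match else None


well_formed = "<think> Map the title to field 200. </think>\n<answer> <record/> </answer>"
free_text = "To generate a valid Unimarc/XML record from the given information, we need to follow these steps:"

print(format_reward(well_formed), extract_answer(well_formed))
print(format_reward(free_text), extract_answer(free_text))
```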

# Load data

In [65]:
from datasets import load_dataset, Dataset
import pandas as pd

In [66]:
dataset = load_dataset("Geraldine/Unimarc-rcr061522302-1.5k", split="train")
dataset

Dataset({
    features: ['ppn', 'question', 'answer'],
    num_rows: 1510
})

In [69]:
def format_prompt(example):
    """Turn one dataset row into a chat prompt plus rule-based reward metadata."""
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT_1},
            {"role": "user", "content": example["question"].strip()},
        ],
        "ability": "fact-reasoning",
        "reward_model": {
            "style": "rule",
            "ground_truth": example["answer"].strip(),
        },
        "extra_info": {
            "index": example["ppn"].strip(),
        },
    }

In [71]:
dataset = dataset.map(format_prompt)

In [78]:
dataset

Dataset({
    features: ['ppn', 'question', 'answer', 'prompt', 'ability', 'reward_model', 'extra_info'],
    num_rows: 1510
})
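
The `reward_model` column stores a `ground_truth` record for a rule-based reward. One possible rule is sketched below: compare the subfields of a predicted record with those of the ground truth and score the overlap with an F1 measure. This assumes the answers use MARCXML-style `datafield` / `subfield` elements, which this section does not show. `field_signature` and `record_reward` are hypothetical helpers, and the sample records are made up for illustration.

```python
import xml.etree.ElementTree as ET


def field_signature(xml_text):
    """Return the set of (tag, subfield code, text) triples, or None if the XML is malformed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    triples = set()
    for field in root.iter():
        # endswith() also accepts namespaced tags such as "{ns}datafield".
        if field.tag.endswith("datafield"):
            for sub in field:
                triples.add((field.get("tag"), sub.get("code"), (sub.text or "").strip()))
    return triples


def record_reward(predicted_xml, ground_truth_xml):
    """F1 overlap between predicted and reference subfields; 0.0 for unusable input."""
    predicted = field_signature(predicted_xml)
    reference = field_signature(ground_truth_xml)
    if not predicted or not reference:
        return 0.0
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


# Made-up sample records: the prediction is missing one of the two reference subfields.
reference_xml = (
    '<record><datafield tag="200" ind1="1" ind2=" ">'
    '<subfield code="a">Electric vehicle tribology</subfield>'
    '<subfield code="e">Challenges and opportunities</subfield>'
    "</datafield></record>"
)
partial_xml = (
    '<record><datafield tag="200" ind1="1" ind2=" ">'
    '<subfield code="a">Electric vehicle tribology</subfield>'
    "</datafield></record>"
)

print(record_reward(reference_xml, reference_xml))
print(record_reward(partial_xml, reference_xml))
print(record_reward("<record>", reference_xml))
```

An exact-match rule would be simpler, but a partial-credit score gives a small model a gradient to follow before it can produce a fully correct record.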

In [77]:
dataset[0]["prompt"]

[{'content': '\nYou are a helpful assistant that can solve the given question step by step. \nGiven a question, you need to first think about the reasoning process in\nthe mind and then provide the answer. The reasoning process and\nanswer are enclosed within <think> </think> and <answer> </answer> tags respectively. \nFor example, <think> This is the reasoning process. </think>\n<answer> The final answer is \\boxed{answer here}\n</answer>. In the last part of the answer, the final exact answer is enclosed within \\boxed{}\nwith xml format.\n',
  'role': 'system'},
 {'content': 'Title: Electric vehicle tribology\nSubtitle: Challenges and opportunities for a sustainable transportation future\nAuthor: Leonardo I. Farfan-Cabrera, Ali Erdemir\nPublisher: Elsevier\nYear: 2024\nISBN: 978-0-443-14074-7\nLanguage: English\nCollection/Series: Elsevier Series on Tribology and Surface Engineering\nEdition: Not specified\nMaterial description: 1 vol. (XI-313 p.), couv. ill. en coul., 23 cm\nAbst