# LLM - EVAL
This notebook uses lm-evaluation-harness (lm-eval), an open-source framework developed by EleutherAI, to evaluate large language models (LLMs) on standardized benchmarks.
The tool provides a unified interface for testing different models on a variety of language tasks, including sentence completion, question answering, classification, and reasoning.

The goal is to obtain quantitative metrics on three well-defined datasets, in order to compare different models or to track how performance changes over the course of training.
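
As a rough illustration of the comparison step, the sketch below computes per-metric differences between a baseline report and a fine-tuned report. It assumes each report is the parsed JSON that lm-eval writes under `--output_path`, with a top-level `results` key mapping task names to metric dictionaries. The function names, the metric key, and the numbers in the usage example are placeholders, not measured results from this notebook.

```python
import json
from pathlib import Path


def load_report(path):
    """Parse one lm-eval results file (assumed to be JSON) into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def extract_metrics(report):
    """Return {task: {metric: value}}, keeping numeric non-stderr entries only."""
    metrics = {}
    for task, values in report.get("results", {}).items():
        metrics[task] = {
            name: value
            for name, value in values.items()
            if isinstance(value, (int, float))
            and not isinstance(value, bool)
            and "stderr" not in name
        }
    return metrics


def metric_deltas(baseline, finetuned):
    """Return fine-tuned minus baseline for every metric present in both reports."""
    base, tuned = extract_metrics(baseline), extract_metrics(finetuned)
    deltas = {}
    for task in base.keys() & tuned.keys():
        shared = base[task].keys() & tuned[task].keys()
        deltas[task] = {name: tuned[task][name] - base[task][name] for name in shared}
    return deltas


# Placeholder reports: the values are illustrative, not results from this notebook.
baseline_report = {"results": {"gsm8k": {"alias": "gsm8k", "exact_match,strict-match": 0.25}}}
finetuned_report = {"results": {"gsm8k": {"alias": "gsm8k", "exact_match,strict-match": 0.5}}}
print(metric_deltas(baseline_report, finetuned_report))
```

With real runs, `load_report` would be pointed at the results files produced for the baseline and for the adapter, and a positive delta would indicate an improvement after fine-tuning.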

Three models are evaluated:
1. Llama3.2-3B-it
2. Gemma-2-2b-it
3. Qwen3-4b-it

They are evaluated on the following tasks:
1. GSM8K, to assess mathematical performance;
2. LogiQA 2.0, to assess logical inference;
3. CoQA, a conversational question-answering benchmark, used here to assess common-sense performance.
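
Because every evaluation in this notebook follows the same command pattern, the pattern can be sketched as a small helper that assembles the argument list. This is only an illustration: `build_lm_eval_command` and the `TASKS` mapping are hypothetical names, and the flags are limited to those that appear in the notebook's own `lm_eval` cells.

```python
import shlex

# Task identifiers as they appear in this notebook's lm_eval cells.
TASKS = {"math": "gsm8k", "logic_inference": "logiqa2", "common_sense": "coqa"}


def build_lm_eval_command(pretrained, task, output_path, peft=None,
                          extra_model_args=(), batch_size=8,
                          trust_remote_code=False):
    """Assemble an lm_eval argument list for a base model or a LoRA adapter."""
    model_args = [f"pretrained={pretrained}"]
    if peft:
        # Adapter repository applied on top of the base model.
        model_args.append(f"peft={peft}")
    model_args.extend(extra_model_args)
    command = [
        "lm_eval",
        "--model", "hf",
        "--model_args", ",".join(model_args),
        "--tasks", task,
        "--device", "cuda",
        "--batch_size", str(batch_size),
        "--output_path", output_path,
        "--log_samples",
        "--apply_chat_template",
    ]
    if trust_remote_code:
        command.append("--trust_remote_code")
    return command


# Example: the Gemma baseline on GSM8K, printed as a shell-ready string.
example = build_lm_eval_command(
    "google/gemma-2-2b-it", TASKS["math"], "./gemma",
    extra_model_args=["load_in_4bit=True"],
)
print(shlex.join(example))
```

The returned list could be passed to `subprocess.run(command, check=True)` to launch a run outside the notebook's `!` syntax.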

## Downloading framework

In [2]:
!pip install git+https://github.com/EleutherAI/lm-evaluation-harness.git
!pip install bitsandbytes

Collecting git+https://github.com/EleutherAI/lm-evaluation-harness.git
  Cloning https://github.com/EleutherAI/lm-evaluation-harness.git to /tmp/pip-req-build-tyby1a70
  Running command git clone --filter=blob:none --quiet https://github.com/EleutherAI/lm-evaluation-harness.git /tmp/pip-req-build-tyby1a70
  Resolved https://github.com/EleutherAI/lm-evaluation-harness.git to commit fcddf195ec6bb69c63e36d54d75354f6ecaabab7
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting evaluate (from lm_eval==0.4.9)
  Downloading evaluate-0.4.5-py3-none-any.whl.metadata (9.5 kB)
Collecting jsonlines (from lm_eval==0.4.9)
  Downloading jsonlines-4.0.0-py3-none-any.whl.metadata (1.6 kB)
Collecting pytablewriter (from lm_eval==0.4.9)
  Downloading pytablewriter-1.2.1-py3-none-any.whl.metadata (38 kB)
Collecting rouge-score>=0.0.4 (from lm_eval==0.4.9)
  Downloa

# Gemma

## MATH

### Baseline

In [None]:
!lm_eval --model hf \
    --model_args pretrained=google/gemma-2-2b-it,load_in_4bit=True,dtype="float16" \
    --tasks gsm8k \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./gemma \
    --log_samples \
    --apply_chat_template

2025-05-30 10:37:24.350809: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1748601444.538776     112 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1748601444.599472     112 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 100%|█████████████████████████████| 838/838 [00:00<00:00, 5.12MB/s]
tokenizer_config.json: 100%|███████████████| 47.0k/47.0k [00:00<00:00, 4.88MB/s]
tokenizer.model: 100%|█████████████████████| 4.24M/4.24M [00:00<00:00, 54.8MB/s]
tokenizer.json: 100%|███████████████████████| 17.5M/17.5M [00:00<00:00, 163MB/s]
special_tokens_map.json: 100%|█████████████████| 636/636 [00:00<00:00, 5.66MB/s]
The `load_in_4bit` and `

### Post Fine-Tuning 1

This cell reports the evaluation of the model after the first training run, which used the LoRA configuration ($r=8$, $\alpha=16$).
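
For context on what $r$ and $\alpha$ control, the following is a minimal sketch of the standard LoRA update, $W' = W + \frac{\alpha}{r} BA$, written with plain Python lists. It assumes the conventional $\alpha / r$ scaling; the training code is not part of this notebook, so other variants cannot be ruled out. The helper names and the toy matrices are illustrative only.

```python
def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*right)]
        for row in left
    ]


def lora_effective_weight(w, b, a, r, alpha):
    """Return W + (alpha / r) * B @ A, where B is d_out x r and A is r x d_in."""
    if any(len(row) != r for row in b) or len(a) != r:
        raise ValueError("B must have r columns and A must have r rows")
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example with r=1 and alpha=2, which gives the same scale (2.0) as r=8, alpha=16.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]   # d_out x r
a = [[3.0, 4.0]]     # r x d_in
print(lora_effective_weight(w, b, a, r=1, alpha=2))
```

Only `B` and `A` are trained, so the rank $r$ bounds how many parameters the adapter adds, while $\alpha / r$ sets how strongly the learned update is applied.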

In [None]:
!lm_eval --model hf \
    --model_args pretrained=google/gemma-2-2b-it,load_in_4bit=True,peft=stefra/GEMMA2BITMATHR8A16,dtype="float16" \
    --tasks gsm8k \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./gemma \
    --log_samples \
    --apply_chat_template

2025-05-30 07:18:44.806153: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1748589525.012036     112 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1748589525.071567     112 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 100%|█████████████████████████████| 838/838 [00:00<00:00, 5.46MB/s]
tokenizer_config.json: 100%|███████████████| 47.0k/47.0k [00:00<00:00, 3.95MB/s]
tokenizer.model: 100%|█████████████████████| 4.24M/4.24M [00:00<00:00, 49.1MB/s]
tokenizer.json: 100%|███████████████████████| 17.5M/17.5M [00:00<00:00, 206MB/s]
special_tokens_map.json: 100%|█████████████████| 636/636 [00:00<00:00, 6.38MB/s]
The `load_in_4bit` and `

### Post Fine-Tuning 2

This cell reports the evaluation of the model after the second training run, which used the LoRA configuration ($r=4$, $\alpha=8$).
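
To make the difference between this configuration and the earlier ($r=8$, $\alpha=16$) one concrete, the sketch below compares the two. It assumes the conventional LoRA scaling factor $\alpha / r$ and an adapter of $r \cdot (d_{in} + d_{out})$ parameters per adapted weight matrix. The class name and the 1024 x 1024 layer size are hypothetical and not taken from the models evaluated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSetting:
    r: int
    alpha: int

    @property
    def scaling(self):
        """Factor applied to the low-rank update (conventional alpha / r)."""
        return self.alpha / self.r

    def trainable_params(self, d_in, d_out):
        """Parameters in A (r x d_in) plus B (d_out x r) for one adapted matrix."""
        return self.r * (d_in + d_out)


first = LoraSetting(r=8, alpha=16)
second = LoraSetting(r=4, alpha=8)

# Hypothetical layer shape, used only to show the relative size of the adapters.
for name, setting in (("first", first), ("second", second)):
    print(name, setting.scaling, setting.trainable_params(1024, 1024))
```

Under these assumptions both runs apply the update with the same scale, and the second adapter has half as many trainable parameters per adapted matrix.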

In [3]:
!lm_eval --model hf \
    --model_args pretrained=google/gemma-2-2b-it,load_in_4bit=True,peft=stefra/mathprova \
    --tasks gsm8k \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./gemma \
    --log_samples \
    --apply_chat_template

2025-07-11 14:03:43.528793: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752242623.871056     112 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752242623.971466     112 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 100%|█████████████████████████████| 838/838 [00:00<00:00, 5.06MB/s]
tokenizer_config.json: 100%|███████████████| 47.0k/47.0k [00:00<00:00, 4.67MB/s]
tokenizer.model: 100%|█████████████████████| 4.24M/4.24M [00:00<00:00, 8.05MB/s]
tokenizer.json: 100%|██████████████████████| 17.5M/17.5M [00:00<00:00, 35.9MB/s]
special_tokens_map.json: 100%|█████████████████| 636/636 [00:00<00:00, 4.85MB/s]
The `load_in_4bit` and `

## Logic Inference

### Baseline

In [None]:
!lm_eval --model hf \
    --model_args pretrained=google/gemma-2-2b-it,load_in_4bit=True \
    --tasks logiqa2 \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./gemma \
    --log_samples \
    --apply_chat_template \
    --trust_remote_code 

2025-07-11 09:17:19.411682: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752225439.774111     115 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752225439.880910     115 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 100%|█████████████████████████████| 838/838 [00:00<00:00, 4.64MB/s]
tokenizer_config.json: 100%|███████████████| 47.0k/47.0k [00:00<00:00, 4.65MB/s]
tokenizer.model: 100%|█████████████████████| 4.24M/4.24M [00:00<00:00, 9.17MB/s]
tokenizer.json: 100%|██████████████████████| 17.5M/17.5M [00:00<00:00, 32.1MB/s]
special_tokens_map.json: 100%|█████████████████| 636/636 [00:00<00:00, 4.19MB/s]
The `load_in_4bit` and `

### Post Fine-Tuning 1

This cell reports the evaluation of the model after the first training run, which used the LoRA configuration ($r=8$, $\alpha=16$).

In [None]:
!lm_eval --model hf \
    --model_args pretrained=google/gemma-2-2b-it,load_in_4bit=True,peft=stefra/GEMMA2BITLOGICINFERENCE \
    --tasks logiqa2 \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./gemma \
    --log_samples \
    --apply_chat_template \
    --trust_remote_code 

2025-07-11 08:46:28.853339: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752223589.222103     112 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752223589.326974     112 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 100%|█████████████████████████████| 838/838 [00:00<00:00, 4.00MB/s]
tokenizer_config.json: 100%|███████████████| 47.0k/47.0k [00:00<00:00, 4.87MB/s]
tokenizer.model: 100%|█████████████████████| 4.24M/4.24M [00:00<00:00, 8.43MB/s]
tokenizer.json: 100%|██████████████████████| 17.5M/17.5M [00:00<00:00, 40.6MB/s]
special_tokens_map.json: 100%|█████████████████| 636/636 [00:00<00:00, 3.94MB/s]
The `load_in_4bit` and `

### Post Fine-Tuning 2

This cell reports the evaluation of the model after the second training run, which used the LoRA configuration ($r=4$, $\alpha=8$).

In [None]:
!lm_eval --model hf \
    --model_args pretrained=google/gemma-2-2b-it,peft=stefra/logicinference,load_in_4bit=True \
    --tasks logiqa2 \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./gemma \
    --log_samples \
    --apply_chat_template \
    --trust_remote_code 

2025-07-12 17:05:44.360165: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752339944.558958     115 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752339944.618054     115 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 100%|█████████████████████████████| 838/838 [00:00<00:00, 4.42MB/s]
tokenizer_config.json: 100%|███████████████| 47.0k/47.0k [00:00<00:00, 8.90MB/s]
tokenizer.model: 100%|█████████████████████| 4.24M/4.24M [00:00<00:00, 7.15MB/s]
tokenizer.json: 100%|██████████████████████| 17.5M/17.5M [00:00<00:00, 32.5MB/s]
special_tokens_map.json: 100%|█████████████████| 636/636 [00:00<00:00, 4.29MB/s]
The `load_in_4bit` and `

## Common Sense

### Baseline

In [None]:
!lm_eval --model hf \
    --model_args pretrained=google/gemma-2-2b-it,load_in_4bit=True \
    --tasks coqa \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./gemma \
    --log_samples \
    --apply_chat_template \
    --trust_remote_code 

2025-07-12 09:34:23.254584: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752312863.486427     112 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752312863.550850     112 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 100%|█████████████████████████████| 838/838 [00:00<00:00, 4.30MB/s]
tokenizer_config.json: 100%|███████████████| 47.0k/47.0k [00:00<00:00, 5.09MB/s]
tokenizer.model: 100%|█████████████████████| 4.24M/4.24M [00:00<00:00, 9.40MB/s]
tokenizer.json: 100%|██████████████████████| 17.5M/17.5M [00:00<00:00, 37.1MB/s]
special_tokens_map.json: 100%|█████████████████| 636/636 [00:00<00:00, 5.16MB/s]
The `load_in_4bit` and `

### Post Fine-Tuning 1

This cell reports the evaluation of the model after the first training run, which used the LoRA configuration ($r=8$, $\alpha=16$).

In [None]:
!lm_eval --model hf \
    --model_args pretrained=google/gemma-2-2b-it,peft=stefra/GEMMA2BITCOMMONSENSE,load_in_4bit=True \
    --tasks coqa \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./gemma \
    --log_samples \
    --apply_chat_template \
    --trust_remote_code 

2025-07-12 14:11:14.492645: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752329474.851042     115 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752329474.959658     115 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 100%|█████████████████████████████| 838/838 [00:00<00:00, 5.98MB/s]
tokenizer_config.json: 100%|███████████████| 47.0k/47.0k [00:00<00:00, 4.77MB/s]
tokenizer.model: 100%|█████████████████████| 4.24M/4.24M [00:00<00:00, 9.49MB/s]
tokenizer.json: 100%|██████████████████████| 17.5M/17.5M [00:00<00:00, 40.1MB/s]
special_tokens_map.json: 100%|█████████████████| 636/636 [00:00<00:00, 7.00MB/s]
The `load_in_4bit` and `

### Post Fine-Tuning 2

This cell reports the evaluation of the model after the second training run, which used the LoRA configuration ($r=4$, $\alpha=8$).

In [None]:
!lm_eval --model hf \
    --model_args pretrained=google/gemma-2-2b-it,peft=stefra/commonsense,load_in_4bit=True \
    --tasks coqa \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./gemma \
    --log_samples \
    --apply_chat_template \
    --trust_remote_code 

2025-07-12 12:13:06.383149: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752322386.563280     115 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752322386.611843     115 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 100%|█████████████████████████████| 838/838 [00:00<00:00, 7.83MB/s]
tokenizer_config.json: 100%|███████████████| 47.0k/47.0k [00:00<00:00, 4.71MB/s]
tokenizer.model: 100%|█████████████████████| 4.24M/4.24M [00:00<00:00, 8.26MB/s]
tokenizer.json: 100%|██████████████████████| 17.5M/17.5M [00:00<00:00, 29.1MB/s]
special_tokens_map.json: 100%|█████████████████| 636/636 [00:00<00:00, 3.51MB/s]
The `load_in_4bit` and `

# Llama

## MATH

### Baseline

In [None]:
!lm_eval --model hf \
    --model_args pretrained=unsloth/Llama-3.2-3B-Instruct-bnb-4bit,parallelize=True \
    --tasks gsm8k \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./llama \
    --log_samples \
    --apply_chat_template

2025-07-11 04:44:44.060729: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752209084.439787     112 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752209084.542368     112 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 1.47kB [00:00, 6.96MB/s]
tokenizer_config.json: 54.7kB [00:00, 118MB/s]
tokenizer.json: 100%|██████████████████████| 17.2M/17.2M [00:00<00:00, 28.7MB/s]
special_tokens_map.json: 100%|█████████████████| 454/454 [00:00<00:00, 3.32MB/s]
chat_template.jinja: 3.83kB [00:00, 20.7MB/s]
model.safetensors: 100%|████████████████████| 2.24G/2.24G [00:09<00:00, 243MB/s]
generation_config.json: 100%|██████████████████| 234/2

### Post Fine-Tuning

This cell reports the evaluation of the model after the first training run, which used the LoRA configuration ($r=8$, $\alpha=16$).

In [None]:
!lm_eval --model hf \
    --model_args pretrained=unsloth/Llama-3.2-3B-Instruct-bnb-4bit,peft=francescoocurcio/new_llama3.2-3B-math-ftn-math-3epoch_12.5k-sysprompt_no,parallelize=True \
    --tasks gsm8k \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./llama \
    --log_samples \
    --apply_chat_template

2025-07-10 16:25:40.914093: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752164741.128573     112 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752164741.188146     112 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 1.47kB [00:00, 5.70MB/s]
tokenizer_config.json: 54.7kB [00:00, 104MB/s]
tokenizer.json: 100%|██████████████████████| 17.2M/17.2M [00:00<00:00, 28.1MB/s]
special_tokens_map.json: 100%|█████████████████| 454/454 [00:00<00:00, 2.30MB/s]
chat_template.jinja: 3.83kB [00:00, 14.7MB/s]
model.safetensors: 100%|████████████████████| 2.24G/2.24G [00:09<00:00, 230MB/s]
generation_config.json: 100%|███████████████████| 234/

## Logic Inference

### Baseline

In [None]:
!lm_eval --model hf \
    --model_args pretrained=unsloth/Llama-3.2-3B-Instruct-bnb-4bit,parallelize=True \
    --tasks logiqa2 \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./llama \
    --log_samples \
    --apply_chat_template \
    --trust_remote_code 

2025-07-12 17:13:22.403528: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752340402.767140     112 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752340402.871753     112 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 1.47kB [00:00, 7.35MB/s]
tokenizer_config.json: 54.7kB [00:00, 114MB/s]
tokenizer.json: 100%|██████████████████████| 17.2M/17.2M [00:00<00:00, 28.9MB/s]
special_tokens_map.json: 100%|█████████████████| 454/454 [00:00<00:00, 3.14MB/s]
chat_template.jinja: 3.83kB [00:00, 19.1MB/s]
model.safetensors: 100%|████████████████████| 2.24G/2.24G [00:09<00:00, 225MB/s]
generation_config.json: 100%|██████████████████| 234/2

### Post Fine-Tuning

This cell reports the evaluation of the model after the first training run, which used the LoRA configuration ($r=8$, $\alpha=16$).

In [None]:
!lm_eval --model hf \
    --model_args pretrained=unsloth/Llama-3.2-3B-Instruct-bnb-4bit,peft=francescoocurcio/new_llama3.2-3B-log-ftn-lioa-3epoch_10k-sysprompt_no,parallelize=True \
    --tasks logiqa2 \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./llama \
    --log_samples \
    --apply_chat_template \
    --trust_remote_code 

2025-07-12 18:38:15.279387: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752345495.540621     115 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752345495.621301     115 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 1.47kB [00:00, 7.89MB/s]
tokenizer_config.json: 54.7kB [00:00, 141MB/s]
tokenizer.json: 100%|██████████████████████| 17.2M/17.2M [00:01<00:00, 13.4MB/s]
special_tokens_map.json: 100%|█████████████████| 454/454 [00:00<00:00, 2.29MB/s]
chat_template.jinja: 3.83kB [00:00, 2.50MB/s]
model.safetensors: 100%|████████████████████| 2.24G/2.24G [00:08<00:00, 269MB/s]
generation_config.json: 100%|██████████████████| 234/2

## Common Sense

### Baseline

In [None]:
!lm_eval --model hf \
    --model_args pretrained=unsloth/Llama-3.2-3B-Instruct-bnb-4bit,parallelize=True \
    --tasks coqa \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./llama \
    --log_samples \
    --apply_chat_template \
    --trust_remote_code 

2025-07-12 17:28:33.194022: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752341313.565819     112 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752341313.668111     112 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 1.47kB [00:00, 8.94MB/s]
tokenizer_config.json: 54.7kB [00:00, 151MB/s]
tokenizer.json: 100%|██████████████████████| 17.2M/17.2M [00:00<00:00, 30.5MB/s]
special_tokens_map.json: 100%|█████████████████| 454/454 [00:00<00:00, 3.49MB/s]
chat_template.jinja: 3.83kB [00:00, 2.73MB/s]
model.safetensors: 100%|████████████████████| 2.24G/2.24G [00:11<00:00, 197MB/s]
generation_config.json: 100%|██████████████████| 234/2

### Post Fine-Tuning

This cell reports the evaluation of the model after the first training run, which used the LoRA configuration ($r=8$, $\alpha=16$).

In [None]:
!lm_eval --model hf \
    --model_args pretrained=unsloth/Llama-3.2-3B-Instruct-bnb-4bit,peft=francescoocurcio/new_llama3.2-3B-log-ftn-csqa-3epoch_trainsplit-sysprompt_no,parallelize=True \
    --tasks coqa \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./llama \
    --log_samples \
    --apply_chat_template \
    --trust_remote_code 

2025-07-12 21:45:17.805690: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752356718.175928     112 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752356718.286035     112 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 1.47kB [00:00, 7.04MB/s]
tokenizer_config.json: 54.7kB [00:00, 108MB/s]
tokenizer.json: 100%|██████████████████████| 17.2M/17.2M [00:00<00:00, 28.5MB/s]
special_tokens_map.json: 100%|█████████████████| 454/454 [00:00<00:00, 3.90MB/s]
chat_template.jinja: 3.83kB [00:00, 20.6MB/s]
model.safetensors: 100%|████████████████████| 2.24G/2.24G [00:11<00:00, 191MB/s]
generation_config.json: 100%|██████████████████| 234/2

# QWEN

## MATH

### Baseline

In [None]:
!lm_eval --model hf \
    --model_args pretrained=unsloth/Qwen3-4B-unsloth-bnb-4bit,parallelize=True \
    --tasks gsm8k \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./qwen \
    --log_samples \
    --apply_chat_template \
    --trust_remote_code 

2025-07-13 18:35:50.336811: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752431750.552995     112 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752431750.613277     112 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 1.70kB [00:00, 7.98MB/s]
tokenizer_config.json: 10.5kB [00:00, 36.3MB/s]
vocab.json: 2.78MB [00:00, 40.6MB/s]
merges.txt: 1.67MB [00:00, 143MB/s]
tokenizer.json: 100%|██████████████████████| 11.4M/11.4M [00:00<00:00, 21.5MB/s]
added_tokens.json: 100%|███████████████████████| 707/707 [00:00<00:00, 5.08MB/s]
special_tokens_map.json: 100%|█████████████████| 614/614 [00:00<00:00, 4.44MB/s]
chat_template.jinja: 4.67

### Post Fine-Tuning

This cell reports the evaluation of the model after the first training run, which used the LoRA configuration ($r=8$, $\alpha=16$).

In [None]:
!lm_eval --model hf \
    --model_args pretrained=unsloth/Qwen3-4B-unsloth-bnb-4bit,peft=lorenagullone/QWEN4B_MATH_LoRA_r8_alpha16,parallelize=True \
    --tasks gsm8k \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./qwen \
    --log_samples \
    --apply_chat_template \
    --trust_remote_code 

2025-07-14 15:33:21.625103: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752507201.988085     112 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752507202.094223     112 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 1.70kB [00:00, 9.91MB/s]
tokenizer_config.json: 10.5kB [00:00, 28.2MB/s]
vocab.json: 2.78MB [00:00, 43.1MB/s]
merges.txt: 1.67MB [00:00, 54.5MB/s]
tokenizer.json: 100%|██████████████████████| 11.4M/11.4M [00:00<00:00, 20.5MB/s]
added_tokens.json: 100%|███████████████████████| 707/707 [00:00<00:00, 4.31MB/s]
special_tokens_map.json: 100%|█████████████████| 614/614 [00:00<00:00, 2.82MB/s]
chat_template.jinja: 4.6

## Logic Inference

### Baseline

In [None]:
!lm_eval --model hf \
    --model_args pretrained=unsloth/Qwen3-4B-unsloth-bnb-4bit,parallelize=True \
    --tasks logiqa2 \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./qwen \
    --log_samples \
    --apply_chat_template \
    --trust_remote_code 

2025-07-13 17:58:22.408274: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752429502.773436     115 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752429502.876145     115 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 1.70kB [00:00, 7.16MB/s]
tokenizer_config.json: 10.5kB [00:00, 48.8MB/s]
vocab.json: 2.78MB [00:00, 49.0MB/s]
merges.txt: 1.67MB [00:00, 134MB/s]
tokenizer.json: 100%|██████████████████████| 11.4M/11.4M [00:00<00:00, 21.2MB/s]
added_tokens.json: 100%|███████████████████████| 707/707 [00:00<00:00, 5.19MB/s]
special_tokens_map.json: 100%|█████████████████| 614/614 [00:00<00:00, 4.64MB/s]
chat_template.jinja: 4.67

### Post Fine-Tuning

This cell reports the evaluation of the model after the first training run, which used the LoRA configuration ($r=8$, $\alpha=16$).

In [None]:
!lm_eval --model hf \
    --model_args pretrained=unsloth/Qwen3-4B-unsloth-bnb-4bit,peft=lorenagullone/QWEN4B_LogicInference_LoRA_r8_alpha16,parallelize=True \
    --tasks logiqa2 \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./qwen \
    --log_samples \
    --apply_chat_template \
    --trust_remote_code 

2025-07-14 14:11:14.219628: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752502274.399430     115 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752502274.451035     115 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 1.70kB [00:00, 9.32MB/s]
tokenizer_config.json: 10.5kB [00:00, 27.2MB/s]
vocab.json: 2.78MB [00:00, 46.1MB/s]
merges.txt: 1.67MB [00:00, 127MB/s]
tokenizer.json: 100%|██████████████████████| 11.4M/11.4M [00:00<00:00, 23.7MB/s]
added_tokens.json: 100%|███████████████████████| 707/707 [00:00<00:00, 5.91MB/s]
special_tokens_map.json: 100%|█████████████████| 614/614 [00:00<00:00, 5.53MB/s]
chat_template.jinja: 4.67

## Common Sense

### Baseline

In [None]:
!lm_eval --model hf \
    --model_args pretrained=unsloth/Qwen3-4B-unsloth-bnb-4bit,parallelize=True \
    --tasks coqa \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./qwen \
    --log_samples \
    --apply_chat_template \
    --trust_remote_code 

2025-07-13 17:24:13.676891: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752427454.026618     112 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752427454.128797     112 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 1.70kB [00:00, 7.50MB/s]
tokenizer_config.json: 10.5kB [00:00, 31.9MB/s]
vocab.json: 2.78MB [00:00, 52.2MB/s]
merges.txt: 1.67MB [00:00, 123MB/s]
tokenizer.json: 100%|██████████████████████| 11.4M/11.4M [00:00<00:00, 19.8MB/s]
added_tokens.json: 100%|███████████████████████| 707/707 [00:00<00:00, 7.15MB/s]
special_tokens_map.json: 100%|█████████████████| 614/614 [00:00<00:00, 4.17MB/s]
chat_template.jinja: 4.67

### Post Fine-Tuning

This cell reports the evaluation of the model after the first training run, which used the LoRA configuration ($r=8$, $\alpha=16$).

In [None]:
!lm_eval --model hf \
    --model_args pretrained=unsloth/Qwen3-4B-unsloth-bnb-4bit,peft=lorenagullone/QWEN4B_CommonSenseQA_LoRA_r8_alpha16,parallelize=True \
    --tasks coqa \
    --device "cuda" \
    --batch_size 8 \
    --output_path ./qwen \
    --log_samples \
    --apply_chat_template \
    --trust_remote_code 

2025-07-14 07:03:33.373750: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752476613.560064     112 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752476613.620909     112 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
config.json: 1.70kB [00:00, 9.92MB/s]
tokenizer_config.json: 10.5kB [00:00, 37.1MB/s]
vocab.json: 2.78MB [00:00, 65.7MB/s]
merges.txt: 1.67MB [00:00, 117MB/s]
tokenizer.json: 100%|██████████████████████| 11.4M/11.4M [00:00<00:00, 17.2MB/s]
added_tokens.json: 100%|███████████████████████| 707/707 [00:00<00:00, 5.80MB/s]
special_tokens_map.json: 100%|█████████████████| 614/614 [00:00<00:00, 5.06MB/s]
chat_template.jinja: 4.67