# Evaluation of Fine-Tuned Large Language Models for ILENIA Aguila7B and Latxa Projects

In the field of linguistic resource development, **ILENIA** (*Promoting Languages in Artificial Intelligence*) is part of the **Strategic Project for Economic Recovery and Transformation** (PERTE), within the framework of the **New Language Economy** (NEL). This project aims to boost Spain's new digital economy based on natural language, leveraging the potential of **Spanish** and other official languages as drivers of economic growth and international competitiveness in key areas such as **artificial intelligence**, **translation**, **education**, **cultural production and dissemination**, **research**, and **science**.

ILENIA is a **collaborative initiative** that coordinates the development of **multilingual resources**, with a particular focus on **multilingual models** for text, speech, and machine translation. The goal is to address societal needs and align with contemporary technology, where **multilingualism** and **cross-lingual transfer** play a crucial role.

The project spans **36 months** and is managed through a network comprising four centers that share **methodologies**, **objectives**, and **techniques**. The overall coordination is led by the **Barcelona Supercomputing Center-Centro Nacional de Supercomputación** (BSC-CNS).

Within this framework, ILENIA works with **textual and speech data** from various sources. **Large language models (LLMs)** are essential for creating new applications, enabling ongoing advancements in **natural language processing** in monolingual, multilingual, and multimodal contexts.

The **generation and refinement** of LLMs is a progressive process in which each model builds on earlier ones. This allows **rapid growth** in model creation while reducing the costs and resources required for training, which is crucial for developing **innovative and efficient solutions** in language processing.

In this notebook, we will evaluate several LLMs using **fine-tuning techniques**, focusing on the **ILENIA Aguila7B** and **Latxa** projects. This analysis aims to assess the **performance** and **capabilities** of these models based on the project’s objectives, providing a detailed view of their effectiveness and applicability.

Below, you will find links to the **Hugging Face** platform, which contains datasets, models, and AI resources developed to date within the ILENIA framework. These resources will serve as the basis for the evaluations conducted in this analysis.

## Evaluation of Ǎguila-7B

### Model Description

**Ǎguila-7B** is a transformer-based causal language model designed for Catalan, Spanish, and English. It is derived from the Falcon-7B model and has been trained on a trilingual corpus containing 26 billion tokens collected from publicly available corpora and web crawls.

For more details, you can read the [Ǎguila-7B Hugging Face project description](https://huggingface.co/projecte-aina/aguila-7b).

### Intended Uses and Limitations

Ǎguila-7B is primarily designed for causal language modeling, that is, text generation. It is ready to use for generating text, but it is intended to be fine-tuned on downstream tasks to improve its performance in specific applications.


### How to Use Ǎguila-7B

Imagine you're a writer looking for inspiration. You have a great opening sentence, but you need the model's help to continue your story. This is exactly what the **Ǎguila-7B model can do**. **By providing it with an initial text, the model generates a continuation, helping you craft coherent and engaging narratives**. In this example, we’ll walk through the process of using Ǎguila-7B to generate text, demonstrating its capabilities and providing insight into its performance.

To use the model, you can follow these steps:

#### Import Required Libraries

First, we need to bring in the essential tools for our task. We begin by importing the necessary libraries:

In [1]:
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

Here’s what each import does:

- **torch:** This is the core library for working with neural networks. It handles tensor computations and deep learning operations.
- **pipeline:** A versatile utility from the transformers library that simplifies the use of pre-trained models for various tasks, including text generation.
- **AutoTokenizer:** This class automatically fetches the tokenizer that matches our model. The tokenizer converts text into a format that the model understands.
- **AutoModelForCausalLM:** This class loads the model architecture designed for causal language modeling, which is ideal for generating text.

#### Define the Input Text

Next, we set up the initial text that we want the model to build upon:

In [2]:
input_text = "El mercat del barri és fantàstic, hi pots trobar"

Here, `input_text` is a sample sentence in Catalan. This is the starting point, and the model will generate additional text based on this input.

#### Load the Model and Tokenizer

With the text defined, we now load the model and tokenizer:

In [3]:
model_id  = "projecte-aina/aguila-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)

- **model_id:** Specifies which model we are using from the Hugging Face model hub.
- **AutoTokenizer.from_pretrained(model_id):** Loads the tokenizer associated with the Ǎguila-7B model. The tokenizer breaks down our text into tokens that the model can process.
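
To make the idea of "breaking text into tokens" more concrete, here is a minimal sketch of how a byte-pair-encoding (BPE) style tokenizer could split a word. It assumes a BPE-style tokenizer and uses a tiny hypothetical merge table (`toy_merges`). It is a simplified illustration, not the actual Ǎguila-7B tokenizer or its learned rules.

```python
def apply_merges(word, merges):
    """Split `word` into characters, then apply each merge rule in priority order."""
    symbols = list(word)
    for left, right in merges:
        merged, i = [], 0
        while i < len(symbols):
            # Fuse the pair when both halves match the current rule.
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical merge table; a trained tokenizer learns many more rules from data.
toy_merges = [("m", "e"), ("me", "r"), ("a", "t")]
tokens = apply_merges("mercat", toy_merges)
print(tokens)  # ['mer', 'c', 'at']
```

Frequent character sequences end up as single tokens, while rarer words are split into several pieces. Each piece is then mapped to an integer id before being passed to the model.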

#### Create a Text Generation Pipeline

We then set up the pipeline that will handle the text generation:

In [4]:
generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

This pipeline is like a pre-configured tool for generating text. Here’s how we set it up:

- **"text-generation":** Specifies that we are using the pipeline for generating text.
- **model=model_id:** Tells the pipeline which model to use.
- **tokenizer=tokenizer:** Provides the tokenizer that converts text into tokens.
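- **torch_dtype=torch.bfloat16:** Loads the weights in 16-bit bfloat16 precision, which roughly halves memory use compared with 32-bit floats while keeping the same exponent range.
- **trust_remote_code=True:** Allows `transformers` to download and run the custom modeling code stored in the model repository. Enable it only for repositories you trust, and consider pinning a revision so the code cannot change unexpectedly.
- **device_map="auto":** Lets the Accelerate library place the model weights on the available devices automatically, using GPUs first and falling back to CPU memory if needed.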
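
To see why the data type matters, we can estimate the memory needed just to hold the weights. The sketch below is back-of-the-envelope arithmetic under one assumption: that the model has roughly 7 billion parameters, as the "7B" in its name suggests. The function name `weight_memory_gb` is ours, and the estimate ignores activations and other runtime overhead.

```python
def weight_memory_gb(num_parameters, bytes_per_parameter):
    """Approximate memory needed to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


# Assumption: roughly 7 billion parameters, as the "7B" in the model name suggests.
approx_params = 7_000_000_000

fp32_gb = weight_memory_gb(approx_params, 4)  # float32 uses 4 bytes per value
bf16_gb = weight_memory_gb(approx_params, 2)  # bfloat16 uses 2 bytes per value
print(f"float32: ~{fp32_gb:.0f} GB, bfloat16: ~{bf16_gb:.0f} GB")
# float32: ~28 GB, bfloat16: ~14 GB
```

Under this assumption, bfloat16 brings the weights within reach of a single high-memory GPU, whereas float32 would need about twice as much.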

#### Generate Text

With everything set up, we can now generate text:

In [5]:
generation = generator(
    input_text,
    do_sample=True,
    top_k=50,
    eos_token_id=tokenizer.eos_token_id,
)



Here’s what each parameter does:

- **input_text:** The text that we want to expand.
- **do_sample=True:** Samples each next token from the model's probability distribution instead of always picking the most likely one, which introduces variety into the generated text.
- **top_k=50:** Restricts sampling to the 50 most probable next tokens, which filters out unlikely tokens that tend to derail the text.
- **eos_token_id=tokenizer.eos_token_id:** Tells the generator which token marks the end of a sequence, so generation stops as soon as the model produces it.
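
The effect of `do_sample` and `top_k` can be illustrated with a small, self-contained sketch of top-k sampling. It uses only the Python standard library and a hypothetical five-token vocabulary (`toy_logits`). It mirrors the idea behind the parameters rather than the library's internal implementation.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample one token id from the k highest-scoring entries of `logits`."""
    # Keep the indices of the k largest logits.
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax weights over the kept logits only (shifted by the max for stability).
    max_logit = max(logits[i] for i in top_ids)
    weights = [math.exp(logits[i] - max_logit) for i in top_ids]
    return rng.choices(top_ids, weights=weights, k=1)[0]


rng = random.Random(0)
toy_logits = [2.0, 0.5, 1.5, -1.0, 0.0]  # hypothetical scores for a 5-token vocabulary
samples = [top_k_sample(toy_logits, k=2, rng=rng) for _ in range(100)]
print(sorted(set(samples)))  # only ids 0 and 2 can ever be drawn
```

With `k=1` the function always returns the highest-scoring token, which corresponds to greedy decoding. Larger values of `k` trade some predictability for variety.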

#### Print the Result

Finally, we print the text generated by the model:

In [6]:
print(f"Result: {generation[0]['generated_text']}")

Result: El mercat del barri és fantàstic, hi pots trobar de tot. 

Jo faig un recorregut per
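
Note that the `generated_text` field contains the prompt followed by the continuation. When we only want to inspect what the model added, a small helper such as the following can strip the prompt. The helper name `extract_continuation` is ours and is not part of `transformers`.

```python
def extract_continuation(prompt, generated_text):
    """Return only the text the model added after the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    # Fall back to the full text if the prompt was altered during decoding.
    return generated_text.strip()


prompt = "El mercat del barri és fantàstic, hi pots trobar"
generated = "El mercat del barri és fantàstic, hi pots trobar de tot."
print(extract_continuation(prompt, generated))  # de tot.
```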


### Limitations and Bias

According to the model card, no specific measures had been taken at the time of submission to estimate bias and toxicity in the model. Because it was trained on data collected from various web sources, the model may exhibit biases. The model's authors state that future research will address these issues and that the model card will be updated if necessary.

### Language Adaptation

The Ǎguila-7B model was adapted from the original Falcon-7B model for Spanish and Catalan by swapping the tokenizer and adjusting the embedding layer.
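
The source does not detail how the embedding layer was adjusted. As an illustration only, the sketch below shows one common strategy when swapping tokenizers: tokens present in both vocabularies keep their old vectors, and new tokens start from the mean of all old vectors. This is an assumption on our part, not a confirmed description of the Ǎguila-7B procedure, and the vocabularies and function name are hypothetical.

```python
def adapt_embeddings(old_vocab, old_embeddings, new_vocab):
    """Build an embedding table for `new_vocab` from an existing one.

    old_vocab / new_vocab map token strings to row indices (0..n-1);
    old_embeddings is a list of equal-length float vectors.
    """
    dim = len(old_embeddings[0])
    # Mean of all old vectors, used for tokens the old vocabulary lacks.
    mean_vector = [
        sum(vec[d] for vec in old_embeddings) / len(old_embeddings) for d in range(dim)
    ]
    new_embeddings = [None] * len(new_vocab)
    for token, new_id in new_vocab.items():
        if token in old_vocab:
            # Shared token: reuse the vector the original model already learned.
            new_embeddings[new_id] = list(old_embeddings[old_vocab[token]])
        else:
            # New token: start from the average embedding.
            new_embeddings[new_id] = list(mean_vector)
    return new_embeddings


old_vocab = {"the": 0, "market": 1}
old_embeddings = [[1.0, 0.0], [0.0, 1.0]]
new_vocab = {"the": 0, "mercat": 1}
adapted = adapt_embeddings(old_vocab, old_embeddings, new_vocab)
print(adapted)  # [[1.0, 0.0], [0.5, 0.5]]
```

After such an initialization, the model would still need continued training on the new corpus so that the fresh embeddings become meaningful.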

### Example Evaluation

To evaluate the performance of Ǎguila-7B, we can use the provided example code to generate text based on a given input. This will help us assess the model's ability to generate coherent and contextually relevant text.

In [7]:
input_text = "La tecnologia és fonamental en el desenvolupament de"

# Generate text using the model
generation = generator(
    input_text,
    do_sample=True,
    top_k=10,
    eos_token_id=tokenizer.eos_token_id,
)

# Display the generated text
print(f"Generated Text: {generation[0]['generated_text']}")

Generated Text: La tecnologia és fonamental en el desenvolupament de la intel·ligència humana. 

El coneixement és una


In this example, the model is expected to generate text that logically continues the input provided. Evaluating the relevance and coherence of the generated text will help determine the effectiveness of Ǎguila-7B for text generation tasks.
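
Relevance and coherence ultimately require human judgment, but simple automatic signals can flag obviously degenerate output. As one possible sketch, the function below computes a distinct-n ratio (unique n-grams divided by total n-grams) over whitespace-separated tokens. Low values indicate heavy repetition. The whitespace tokenization and the function name are simplifying assumptions.

```python
def distinct_n(text, n):
    """Fraction of unique n-grams among all whitespace-token n-grams in `text`."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


sample = "La tecnologia és fonamental en el desenvolupament de la intel·ligència humana."
print(distinct_n(sample, 1))         # no repeated words in this short sample
print(distinct_n("la la la la", 1))  # 0.25, a highly repetitive text
```

A score close to 1.0 only rules out repetition. It says nothing about whether the text is factually correct or on topic.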

# Fine-Tuning and Evaluation of Latxa 7B for Basque Language Tasks 🇪🇸🇪🇺

In this notebook, we will explore how to use and evaluate the **Latxa 7B** model, a Basque-specific model based on Llama 2. Latxa was developed by the **HiTZ Research Center & IXA Research Group** with the goal of enhancing AI technologies for low-resource languages like Basque.

## Objectives:
1. Load and use the **Latxa 7B** model for text generation in Basque.
2. Evaluate the model on basic text generation tasks such as summarization and question answering.
3. Discuss the model’s strengths and limitations in generating content in the Basque language.


## Load the Latxa 7B Model

First, let's load the model using the `transformers` library. We'll initialize the text generation pipeline for Basque.

In [None]:
# Install the necessary libraries (if not installed already)
!pip install transformers

# Import the library
from transformers import pipeline

# Load the Latxa 7B model for text generation
pipe = pipeline("text-generation", model="HiTZ/latxa-7b-v1.2")

# Example Basque text
text = "Euskara adimen artifizialera iritsi da!"

# Generate text using the model
result = pipe(text, max_new_tokens=50, num_beams=5)

# Show the generated text
print(result[0]['generated_text'])



## Evaluating the Model for Text Generation

Let’s test the model’s ability to generate text in different scenarios.

### Case 1: Continue a Sentence in Basque

In this section, we give the model a short Basque prompt and observe how it continues the text.

In [None]:
text = "Euskara adimen artifizialera iritsi da!"
result = pipe(text, max_new_tokens=50, num_beams=5)

# Print the generated response
print(result[0]['generated_text'])

### Case 2: Answering Knowledge-Based Questions in Basque

We will ask the model some general knowledge questions in Basque to see how it performs.

In [None]:
# Basque question
question = "Nork idatzi zuen Don Quijote?"

# Generate an answer
result = pipe(question, max_new_tokens=50, num_beams=5)

# Display the result
print(result[0]['generated_text'])
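
Because a base model continues text rather than following instructions, a bare question may be continued with more questions instead of an answer. One possible workaround is a few-shot prompt that shows the expected format. The sketch below builds such a prompt using the Basque labels "Galdera" (question) and "Erantzuna" (answer). The helper name and the solved example are hypothetical and serve only to illustrate the format.

```python
def build_qa_prompt(question, examples):
    """Format solved examples plus the new question as a completion-style prompt."""
    lines = []
    for example_question, example_answer in examples:
        lines.append(f"Galdera: {example_question}")
        lines.append(f"Erantzuna: {example_answer}")
        lines.append("")  # blank line between examples
    lines.append(f"Galdera: {question}")
    lines.append("Erantzuna:")  # the model is expected to continue from here
    return "\n".join(lines)


# Hypothetical solved example used only to show the model the expected format.
examples = [("Zein da Frantziako hiriburua?", "Paris")]
prompt = build_qa_prompt("Nork idatzi zuen Don Quijote?", examples)
print(prompt)
```

The resulting `prompt` string could then be passed to the pipeline in place of the bare question.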

## Evaluating the Model on Specific Basque Tasks

Now, we will evaluate the model on more specific tasks, such as summarizing articles or answering questions related to Basque culture or news.

### Case 3: Summarize a Basque News Article

In [None]:
# Example of a Basque news article
article = """
Euskara hizkuntza gutxitua izan da azken hamarkadetan, baina azken urteotan, hainbat ekimen martxan jarri dira
hizkuntza biziberritzeko. Adimen artifiziala eta teknologia berriak ere lagungarri izan daitezke euskara zabaltzeko eta
erabilera areagotzeko.
"""

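# Latxa 7B is a base model, so we append a Basque "Summary:" cue ("Laburpena:")
# to steer the continuation towards a summary instead of more article text.
prompt = article.strip() + "\n\nLaburpena:"

# Generate a summary; return_full_text=False drops the prompt from the output
summary = pipe(prompt, max_new_tokens=50, num_beams=5, return_full_text=False)

# Display the generated summary
print(summary[0]['generated_text'])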

## Discussion of the Results and Model Limitations

In this section, we evaluate the quality of the generated text. Key questions to consider:

* Is the generated text coherent?
* Is it accurate in its use of the Basque language?
* Are there any grammatical or syntactic errors?
* Does the model reflect any biases in its responses?
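
To record answers to these questions consistently, reviewers could score each generated text on a fixed scale. The sketch below is one possible way to structure such manual ratings. The class, field names, and the 1 to 5 scale are our assumptions, and the example scores are invented for illustration. They are not results from this notebook.

```python
from dataclasses import dataclass


@dataclass
class ManualRating:
    """One reviewer's scores for a single generated text, each from 1 (poor) to 5 (good)."""
    coherence: int
    language_accuracy: int
    grammar: int
    bias_free: int

    def mean_score(self):
        scores = [self.coherence, self.language_accuracy, self.grammar, self.bias_free]
        return sum(scores) / len(scores)


# Hypothetical ratings, for illustration only; they are not results from this notebook.
ratings = [ManualRating(4, 5, 4, 5), ManualRating(3, 4, 4, 5)]
overall = sum(r.mean_score() for r in ratings) / len(ratings)
print(f"{overall:.2f}")  # 4.25
```

Keeping the four criteria separate makes it easier to see whether a low overall score comes from grammar problems, factual issues, or biased content.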

It is important to note that the Latxa 7B model is specifically designed for Basque, and its performance in other languages such as Spanish or English may be limited.

The Latxa 7B model is a significant step forward for low-resource languages like Basque. In our tests:

* **Text Generation:** The model generates coherent and contextually appropriate content in Basque.
* **Question Answering:** While simple factual questions were answered well, the model showed limitations with more complex or open-ended queries.
* **Summarization:** The model generated clear summaries, though there was some loss of detail in certain cases.

Since the model is not fine-tuned for specific tasks like instruction following or chat assistance, additional training such as instruction tuning or reinforcement learning could further improve its performance in these areas.