<img src="../assets/CoLLIE_blue.png" alt="GoLLIE" width="200"/>

# Custom Tasks with GoLLIE

This notebook is an example of how to run Custom Tasks with GoLLIE. This notebook covers:

- How to define the guidelines for a task
- How to load GoLLIE
- How to generate model inputs
- How to parse the output
- How to implement a scorer and evaluate the output

You can modify this notebook to run any task you want.

### Import requirements

See the `requirements.txt` file in the main directory to install the required dependencies.

In [None]:
import sys

sys.path.append("../")  # Add the GoLLIE base directory to sys path

In [None]:
import rich
import logging
from src.model.load_model import load_model
import black
import inspect
from jinja2 import Template as jinja2Template
import tempfile
from src.tasks.utils_typing import AnnotationList

logging.basicConfig(level=logging.INFO)
from typing import Dict, List, Type

## Load GoLLIE

We will load GoLLIE-7B from the Hugging Face Hub.
You can use `AutoModelForCausalLM.from_pretrained` if you prefer. However, we provide a handy `load_model` function with many functionalities already implemented that will help you reproduce our results.

Please note that setting `use_flash_attention=True` is mandatory. Our flash attention implementation has small numerical differences compared to the attention implementation in Hugging Face, so using `use_flash_attention=False` will make the model produce inferior results. Flash attention requires an available CUDA GPU. Running GoLLIE pre-trained models on a CPU is not supported. We plan to address this in future releases.

- Set force_auto_device_map=True to automatically load the model on available GPUs.
- Set quantization=4 if the model doesn't fit in your GPU memory.
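
As a rough guide to choosing `quantization`, the sketch below estimates the memory needed for the model weights alone. It assumes about 7 billion parameters for GoLLIE-7B and that `quantization=4` means 4-bit weights; it ignores activations, the KV cache, and framework overhead, so real usage will be higher. `estimate_weight_memory_gb` is a hypothetical helper, not part of GoLLIE.

```python
def estimate_weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    return num_params * bytes_per_param / 1e9


# Assumption: about 7e9 parameters for a 7B model.
NUM_PARAMS_7B = 7e9

bf16_gb = estimate_weight_memory_gb(NUM_PARAMS_7B, 16)  # bfloat16 weights
int4_gb = estimate_weight_memory_gb(NUM_PARAMS_7B, 4)  # 4-bit quantized weights

print(f"bfloat16: ~{bf16_gb:.1f} GB, 4-bit: ~{int4_gb:.1f} GB")
```

If the bfloat16 estimate is already close to your GPU memory, the 4-bit setting is likely the safer starting point.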

In [None]:
model, tokenizer = load_model(
    inference=True,
    model_weights_name_or_path="HiTZ/GoLLIE-7B",
    quantization=None,
    use_lora=False,
    force_auto_device_map=True,
    use_flash_attention=True,
    torch_dtype="bfloat16",
)

In [None]:
# import torch
# device = "cuda:0" if torch.cuda.is_available() else "cpu"
# device = "cpu"
# model.to(device)

## Define the guidelines

First, we will define the labels and guidelines for the task. We will represent them as Python classes.

The following guidelines have been defined for this example. They were not part of the pre-training dataset. Therefore, we will run GoLLIE in zero-shot settings using unseen labels.

We will use the `Generic` class, which is a versatile class that allows for the implementation of any task you want. However, since the model has never seen the `Generic` label during training, we will rename it to `Template`, which is recognized by the model (as it was used in the TACRED dataset).

We will define a single class, `ValueChainStep`. A class has a definition (its docstring) and a set of slots that the model needs to fill. Each slot also requires a type definition and a short description, which can include examples. For the `ValueChainStep` class, we define one slot:

- The `predicted_steps`, which holds the value chain steps the company belongs to. It is typed as a list of strings, so GoLLIE will fill this slot with a list of strings.

💡 Be creative and try to define your own guidelines to test GoLLIE!

In [None]:
from typing import List

from src.tasks.utils_typing import dataclass
from src.tasks.utils_typing import Generic as Template

"""
Entity definitions
"""


@dataclass
class ValueChainStep(Template):
    """This refers to the position a semiconductor company has in a stylized value chain. This position is determined by the
    company's activities, like what products or services it offers. We want to classify companies into 5 steps and 5 steps only.
    The 5 classes are (separated by semicolon): [material_resource; tool_resource; Chip Design; Fabrication;
    Assembly, Testing & Packaging (ATP)].

    IMPORTANT: choose AT LEAST one class. One class is always correct, sometimes more.

    """

    predicted_steps: List[str]
    """
    Here you should give the step of the semiconductor value chain that you think the company is in. Only use the 5 classes that I
    gave you: [material_resource; tool_resource; Chip Design; Fabrication; Assembly, Testing & Packaging (ATP)]. A firm can be in more
    than one step, and even in all steps.
    """


ENTITY_DEFINITIONS: List[Template] = [ValueChainStep]

if __name__ == "__main__":
    cell_txt = In[-1]
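
A guideline class is an ordinary Python class, so an annotation is just an instance of it. The sketch below uses the standard library `dataclasses.dataclass` as a stand-in for GoLLIE's own `dataclass` and `Generic` (an assumption: the GoLLIE versions may behave differently), and `ValueChainStepSketch` is a hypothetical name. It shows that the `repr` of an instance is itself valid Python.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ValueChainStepSketch:
    """Stand-in guideline class: the docstring plays the role of the label definition."""

    predicted_steps: List[str]  # slot the model is asked to fill


annotation = ValueChainStepSketch(predicted_steps=["Fabrication"])

# The repr is valid Python code that rebuilds the same instance.
print(repr(annotation))
```

This round-trip property is what lets annotations be written into a prompt as code and read back from the model output.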

### Print the guidelines to guidelines.py

Due to IPython limitations, we must write the content of the previous cell to a file and then import the content from that file.

In [None]:
with open("guidelines.py", "w", encoding="utf8") as python_guidelines:
    print(cell_txt, file=python_guidelines)

from guidelines import *

We use `inspect.getsource` to get the guidelines as a string.

In [None]:
guidelines = [inspect.getsource(definition) for definition in ENTITY_DEFINITIONS]
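
The write-then-import step can be illustrated outside a notebook. The sketch below is a minimal stand-alone version using only the standard library: it writes a hypothetical class to a temporary file, imports that file as a module, and then reads the class source back with `inspect.getsource`. The module name `demo_guidelines` and the `Demo` class are made up for illustration.

```python
import importlib.util
import inspect
import sys
import tempfile
from pathlib import Path

# Hypothetical guideline source, standing in for the content of the notebook cell.
SOURCE = '''
class Demo:
    """A demo guideline."""

    value: str
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    module_path = Path(tmp_dir) / "demo_guidelines.py"
    module_path.write_text(SOURCE, encoding="utf8")

    # Import the file as a regular module so that inspect can locate its source.
    spec = importlib.util.spec_from_file_location("demo_guidelines", str(module_path))
    module = importlib.util.module_from_spec(spec)
    sys.modules["demo_guidelines"] = module
    spec.loader.exec_module(module)

    # Read the source while the file still exists on disk.
    demo_source = inspect.getsource(module.Demo)

print(demo_source)
```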

## Define input sentence

Here we define the input sentence and the gold labels.

You can define an empty list as gold labels if you don't have gold annotations.

### Load the input text from Orbis and CC

In [None]:
import pandas as pd

df = pd.read_pickle(
    "/home/zelle/development/projects/ascii/reference-data/data_raw_direct_source_drop/joshua/llm_data/gt_orb.pickle"
)

In [None]:
df

In [None]:
i = 100

In [None]:
# Take the text at index i as the model input
text = df["text"][i]
rich.print(text)

In [None]:
# text = "This company is mainly providing raw materials to customers world wide. It offers the highest quality and clean extraction methods."
gold = [ValueChainStep(predicted_steps=df["class"][i])]

In [None]:
df["class"][i]

## Filling a template

We need to define a template. For this task, we will include only the class definitions and the text to be annotated. However, you can design different templates to incorporate more information (for example, event triggers, as demonstrated in the Event Extraction notebook).

We will use Jinja templates, which are easy to implement and exceptionally fast. For more information, visit: https://jinja.palletsprojects.com/en/3.1.x/api/#high-level-api.



In [None]:
template_txt = """# The following lines describe the task definition
# Here is what you should identify, the segment that a semiconductor company belongs to in the value chain.
{%- for definition in guidelines %}
{{ definition }}
{%- endfor %}

# This is the text to analyze
text = {{ text.__repr__() }}

# Now here you should say what segments or value chain steps the company belongs to:
result = [
{%- for ann in annotations %}
    {{ ann }},
{%- endfor %}
]
"""

In [None]:
template = jinja2Template(template_txt)
# Fill the template
formated_text = template.render(
    guidelines=guidelines, text=text, annotations=gold, gold=gold
)
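
The template inserts the input with `text.__repr__()` rather than as raw text. A small sketch of why, using only the standard library and no Jinja: `repr` escapes quotes and newlines, so the text becomes a valid Python string literal and the prompt remains parseable code. The sample sentence is made up.

```python
import ast

# Hypothetical input containing characters that would break a naive text = "..." line.
sample_text = 'The firm says: "We design chips."\nIt also packages them.'

prompt_line = f"text = {sample_text!r}"
print(prompt_line)

# The line parses as Python, and the literal evaluates back to the original text.
parsed = ast.parse(prompt_line)
recovered = ast.literal_eval(parsed.body[0].value)
```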

### Black Code Formatter

We use the Black Code Formatter to automatically unify all the prompts to the same format. 

https://github.com/psf/black

In [None]:
black_mode = black.Mode()
formated_text = black.format_str(formated_text, mode=black_mode)

### Print the filled and formatted template

In [None]:
rich.print(formated_text)

## Prepare model inputs

We remove everything after `result =` to run inference with the model.

In [None]:
prompt, _ = formated_text.split("result =")
prompt = prompt + "result ="
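
Unpacking the result of `split("result =")` into two names fails if the marker also occurs in the input text. A more defensive variant could look like the sketch below, which cuts at the last occurrence. `build_prompt` is a hypothetical helper, and it assumes the marker does not appear inside the annotations that follow it.

```python
def build_prompt(filled_template: str, marker: str = "result =") -> str:
    """Keep everything up to and including the last occurrence of `marker`."""
    head, found, _ = filled_template.rpartition(marker)
    if not found:
        raise ValueError(f"marker {marker!r} not found in the template")
    return head + marker


# Hypothetical filled template in which the input text also contains the marker.
example = 'text = "the result = unknown"\n\nresult = [\n    Label(),\n]\n'
prompt_example = build_prompt(example)
print(prompt_example)
```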

Tokenize the input sentence

In [None]:
model_input = tokenizer(prompt, add_special_tokens=True, return_tensors="pt")

Remove the `eos` token from the input

In [None]:
model_input["input_ids"] = model_input["input_ids"][:, :-1]
model_input["attention_mask"] = model_input["attention_mask"][:, :-1]
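
The slicing above drops the last token unconditionally, which is only correct when the tokenizer actually appended an `eos` token. A guarded version is sketched below on plain Python lists; `strip_trailing_eos` and the token IDs are hypothetical, and the same check would need to be adapted to tensors.

```python
from typing import List


def strip_trailing_eos(input_ids: List[int], eos_token_id: int) -> List[int]:
    """Drop the last token only if it is the end-of-sequence token."""
    if input_ids and input_ids[-1] == eos_token_id:
        return input_ids[:-1]
    return input_ids


# Hypothetical token IDs; 2 stands in for the EOS ID.
with_eos = strip_trailing_eos([1, 529, 1204, 2], eos_token_id=2)
without_eos = strip_trailing_eos([1, 529, 1204], eos_token_id=2)
print(with_eos, without_eos)
```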

## Run GoLLIE

We generate the predictions using GoLLIE.

We use `num_beams=1` and `do_sample=False` in our experiments, but feel free to experiment with different decoding strategies 😊

In [None]:
%%time

model_ouput = model.generate(
    **model_input.to(model.device),
    max_new_tokens=128,
    do_sample=False,
    min_new_tokens=1,
    num_beams=1,
    num_return_sequences=1,
)

### Print the results

In [None]:
for y, x in enumerate(model_ouput):
    print(f"Answer {y}")
    rich.print(tokenizer.decode(x, skip_special_tokens=True).split("result = ")[-1])

## Parse the output

The output is a Python list of instances, so we can execute it 🤯

We define the AnnotationList class to parse the output with a single line of code. The `AnnotationList.from_output` function filters any label that we did not define (hallucinations) to prevent getting an `undefined class` error. 

In [None]:
result = AnnotationList.from_output(
    tokenizer.decode(model_ouput[0], skip_special_tokens=True).split("result = ")[-1],
    task_module="guidelines",
)
rich.print(result)
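
To make the filtering idea concrete, here is a minimal sketch of parsing such an output with the standard library `ast` module instead of executing it. This is an illustration only, not how `AnnotationList.from_output` is implemented; `parse_output` is hypothetical, the `ValueChainStep` class here is a stand-in, and positional arguments are ignored for brevity.

```python
import ast
from dataclasses import dataclass
from typing import Dict, List, Type


@dataclass
class ValueChainStep:
    """Stand-in for the guideline class defined earlier."""

    predicted_steps: List[str]


def parse_output(output: str, allowed: Dict[str, Type]) -> list:
    """Build instances from a Python list literal, skipping undefined classes."""
    tree = ast.parse(output.strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a Python list")

    instances = []
    for node in tree.body.elts:
        # Keep only calls such as ClassName(slot=...), where ClassName is known.
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        cls = allowed.get(node.func.id)
        if cls is None:
            continue  # label that was never defined (hallucination)
        kwargs = {
            kw.arg: ast.literal_eval(kw.value)
            for kw in node.keywords
            if kw.arg is not None
        }
        instances.append(cls(**kwargs))
    return instances


# Hypothetical model output containing one undefined label.
raw = "[ValueChainStep(predicted_steps=['Fabrication']), Unknown(name='x')]"
parsed = parse_output(raw, {"ValueChainStep": ValueChainStep})
print(parsed)
```

Because slot values are read with `ast.literal_eval`, only plain literals are accepted and no arbitrary code from the model output is run.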

Labels are an instance of the defined classes:

In [None]:
type(result[0])

In [None]:
result[0].predicted_steps

## Evaluate the result

Finally, we will evaluate the outputs from the model.

First, we define a scorer. Since our guidelines are `Template` classes, we will use the `TemplateScorer` class.

We need to define the `valid_types` for the scorer, which will be the labels that we have defined. 

In [None]:
from src.tasks.utils_scorer import TemplateScorer


class MyScorer(TemplateScorer):
    """Compute the F1 score for Generic Task"""

    valid_types: List[Type] = ENTITY_DEFINITIONS

### Instantiate the scorer

In [None]:
scorer = MyScorer()

### Compute F1 

In [None]:
scorer_results = scorer(reference=[gold], predictions=[result])
rich.print(scorer_results)
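
For intuition about the reported numbers, the sketch below computes set-based precision, recall, and F1 over predicted labels. It is a simplified illustration and not the `TemplateScorer` implementation, which may match slots differently; `label_f1` and the example labels are hypothetical.

```python
from typing import Dict, Iterable


def label_f1(gold_labels: Iterable[str], predicted_labels: Iterable[str]) -> Dict[str, float]:
    """Set-based precision, recall and F1 over predicted labels."""
    gold_set, pred_set = set(gold_labels), set(predicted_labels)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold and predicted steps for one company.
scores = label_f1(["Chip Design", "Fabrication"], ["Fabrication"])
print(scores)
```

In this example the single prediction is correct (precision 1.0) but one gold step is missed (recall 0.5), so F1 falls between the two.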

GoLLIE has successfully labeled a sentence using a set of labels that were not part of the pre-training dataset 🎉🎉🎉

GoLLIE will perform well on labels with well-defined and clearly bounded guidelines. 

Please share your cool experiments with us; we'd love to see what everyone is doing with GoLLIE!
- [@iker_garciaf](https://twitter.com/iker_garciaf)
- [@osainz59](https://twitter.com/osainz59)