<img src="../assets/CoLLIE_blue.png" alt="GoLLIE" width="200"/>

# Event Extraction with GoLLIE

This notebook provides an example of how to conduct Event Extraction using GoLLIE.

In the paper, to compare GoLLIE with the previous state-of-the-art (SOTA), we divided the Event Extraction task into two sub-tasks: **Event Extraction** (EE) and **Event Argument Extraction** (EAE). The former focuses on detecting event instances in a text, while the latter, given an event instance, predicts its attributes, such as the persons involved and their roles. Additionally, GoLLIE is capable of performing **End-to-End Event Extraction**. In this notebook, we will demonstrate all three scenarios. This notebook covers:

- How to define guidelines for a task
- How to load GoLLIE
- How to conduct Event Extraction (EE) using GoLLIE
- How to perform Event Argument Extraction (EAE) with GoLLIE
- How to execute end-to-end Event Extraction with GoLLIE
- How to implement a scorer and evaluate the results

You can modify this notebook to run any Event Extraction task you want.

### Import requirements

See the requirements.txt file in the main directory to install the required dependencies.

In [168]:
import sys

sys.path.append("../")  # Add the GoLLIE base directory to sys path

In [169]:
import rich
import logging
from src.model.load_model import load_model
import black
import inspect
from jinja2 import Template
import tempfile
from src.tasks.utils_typing import AnnotationList

logging.basicConfig(level=logging.INFO)
from typing import Dict, List, Type
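Before going further, it can help to confirm that the third-party modules imported above can be found. The sketch below is an illustrative helper, not part of GoLLIE: `find_missing_modules` is a hypothetical name, and the module list simply mirrors the imports in the previous cell.

```python
import importlib.util
from typing import Iterable, List


def find_missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located on this system."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Top-level third-party modules imported in the previous cell.
missing = find_missing_modules(["rich", "black", "jinja2"])
if missing:
    print(f"Missing modules: {', '.join(missing)}")
else:
    print("All required modules are importable.")
```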

## Load GoLLIE

We will load GoLLIE-7B from the Hugging Face Hub.
You can use the AutoModelForCausalLM.from_pretrained function if you prefer. However, we provide a handy load_model function with many functionalities already implemented that will assist you in reproducing our results.

Please note that setting use_flash_attention=True is mandatory. Our flash attention implementation has small numerical differences compared to the attention implementation in Hugging Face, so using use_flash_attention=False will make the model produce inferior results. Flash attention requires an available CUDA GPU; running GoLLIE pre-trained models on a CPU is not supported. We plan to address this in future releases.

- Set force_auto_device_map=True to automatically load the model on available GPUs.
- Set quantization=4 if the model doesn't fit in your GPU memory.

In [3]:
model, tokenizer = load_model(
    inference=True,
    model_weights_name_or_path="HiTZ/GoLLIE-7B",
    quantization=None,
    use_lora=False,
    force_auto_device_map=True,
    use_flash_attention=True,
    torch_dtype="bfloat16",
)

INFO:root:Loading model model from HiTZ/GoLLIE-7B
INFO:root:We will load the model using the following device map: auto and max_memory: None
Loading the tokenizer from the `special_tokens_map.json` and the `added_tokens.json` will be removed in `transformers 5`,  it is kept for forward compatibility, but it is recommended to update your `tokenizer_config.json` by uploading it again. You will see the new `added_tokens_decoder` attribute that will store the relevant information.
INFO:root:Loading model with dtype: torch.bfloat16


>>>> Flash Attention installed
>>>> Flash RoPE installed


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

INFO:root:Model dtype: torch.bfloat16
INFO:root:Total model memory footprint: 13477.101762 MB
INFO:root:Quantization is enabled, we will not merge LoRA layers into the model. Inference will be slower.
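The footprint reported in the log above (about 13477 MB in bfloat16) can be approximated with simple arithmetic, which helps when deciding whether `quantization=4` is needed. This is a rough sketch: the parameter count is an assumed approximation for a 7B model, and the estimate covers weights only, ignoring activations, the KV cache and quantization overhead.

```python
def estimate_weight_memory_mb(num_parameters: int, bits_per_parameter: int) -> float:
    """Estimate weight storage in MB (1 MB = 10**6 bytes); activations are not included."""
    return num_parameters * bits_per_parameter / 8 / 1e6


# Assumption: roughly 6.74 billion parameters for a 7B model.
APPROX_7B_PARAMETERS = 6_740_000_000

for label, bits in [("bfloat16", 16), ("4-bit", 4)]:
    estimate = estimate_weight_memory_mb(APPROX_7B_PARAMETERS, bits)
    print(f"{label}: ~{estimate:,.0f} MB")
```

If the bfloat16 estimate is close to or above your free GPU memory, 4-bit loading is the safer choice.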


# Event Extraction

## Define the guidelines

First, we will define the labels and guidelines for the task. We will represent them as Python classes.

We will use the ACE05 dataset guidelines for this example. You can find more information about the dataset and the guidelines here: https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/english-entities-guidelines-v6.6.pdf

💡 Be creative and try to define your own guidelines to test GoLLIE!

In [170]:
from typing import List

from src.tasks.utils_typing import Event, dataclass


# The following lines describe the task definition
@dataclass
class JusticeEvent(Event):
    """A JusticeEvent refers to any judicial action such as: arresting, jailing, releasing, granting parole, trial
    starting, hearing, charging, indicting, suing, convicting, sentencing, fine, executing, extraditing, acquitting,
    appealing or pardoning a Person entity."""

    mention: str
    """The text span that most clearly expresses the event.
        Such as: "parole", "sued", "sentenced", "appeal", "charged" 
    """


@dataclass
class PersonnelEvent(Event):
    """A PersonnelEvent occurs when a Person entity changes its job position (JobTitle entity) with respect to an
    Organization entity. It includes when a person starts working, ends working, changes offices within, gets nominated or is
    elected for a position in an Organization."""

    mention: str
    """The text span that most clearly expresses the event.
        Such as: "won", "appoint", "retired", "fired", "appointed" 
    """


@dataclass
class BusinessEvent(Event):
    """A BusinessEvent refers to actions related to Organizations such as: creating, merging, declaring bankruptcy or
    ending organizations (including government agencies)."""

    mention: str
    """The text span that most clearly expresses the event.
        Such as: "started", "open", "create", "closing", "merged" 
    """


@dataclass
class LifeEvent(Event):
    """A LifeEvent occurs whenever a Person Entity is born, dies, gets married, divorced or gets injured."""

    mention: str
    """The text span that most clearly expresses the event.
        Such as: "wounded", "divorce", "birth", "born", "marriage" 
    """


@dataclass
class MovementEvent(Event):
    """A TransportEvent occurs whenever an Artifact (Weapon or Vehicle) or a Person is moved from one Place (GPE, Facility,
    Location) to another. This event requires the explicit mention of the Artifact or Person.
    """

    mention: str
    """The text span that most clearly expresses the event.
        Such as: "travel", "arrived", "going", "moving", "take" 
    """


@dataclass
class TransactionEvent(Event):
    """A TransactionEvent refers to buying, selling, loaning, borrowing, giving, or receiving of Artifacts or
    Organizations; or giving, receiving, borrowing, or lending Money."""

    mention: str
    """The text span that most clearly expresses the event.
        Such as: "donate", "received", "paying", "seize", "contributions" 
    """


@dataclass
class ContactEvent(Event):
    """A ContactEvent occurs whenever two or more entities (persons or organization's representatives) come together at
    a single location and interact with one another face-to-face or directly engage in discussion via written or
    telephone communication."""

    mention: str
    """The text span that most clearly expresses the event.
        Such as: "meetings", "conference", "talked", "met", "letters" 
    """


@dataclass
class ConflictEvent(Event):
    """A ConflictEvent refers to either violent physical acts causing harm or damage, but are not covered by Life events
    (conflicts, clashes, fighting, gunfire, ...) or demonstrations (protests, sit-ins, strikes, riots, ...).
    """

    mention: str
    """The text span that most clearly expresses the event.
        Such as: "terrorism", "combat", "hit", "fight", "bombing" 
    """


EVENTS_DEFINITIONS: List[Event] = [
    LifeEvent,
    MovementEvent,
    TransactionEvent,
    BusinessEvent,
    ConflictEvent,
    ContactEvent,
    PersonnelEvent,
    JusticeEvent,
]

if __name__ == "__main__":
    cell_txt = In[-1]
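To try a label of your own, a new class can follow the same pattern as the definitions above: a docstring with the definition, plus a `mention` field documented with example triggers. The sketch below uses the standard-library `dataclass` and a stand-in `Event` base class so that it runs outside the GoLLIE repository; in this notebook you would import `Event` and `dataclass` from `src.tasks.utils_typing` instead. `CyberAttackEvent` and its wording are hypothetical and not part of ACE05.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Stand-in for src.tasks.utils_typing.Event (assumption: only `mention` is needed here)."""

    mention: str


@dataclass
class CyberAttackEvent(Event):
    """A CyberAttackEvent (hypothetical label) refers to any attempt to gain unauthorized
    access to, disrupt or damage a computer system or network."""

    mention: str
    """The text span that most clearly expresses the event.
        Such as: "hacked", "breach", "phishing", "ransomware"
    """


example = CyberAttackEvent(mention="breach")
print(example)
```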

### Write the guidelines to guidelines_ee.py

Due to IPython limitations, we must write the content of the previous cell to a file and then import the content from that file.

In [171]:
with open("guidelines_ee.py", "w", encoding="utf8") as python_guidelines:
    print(cell_txt, file=python_guidelines)

import guidelines_ee

We use inspect.getsource to get the guidelines as a string

In [172]:
guidelines = [inspect.getsource(definition) for definition in guidelines_ee.EVENTS_DEFINITIONS]
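The reason for the file round trip is that `inspect.getsource` reads the class source from the file of the module that defines it. The following self-contained sketch reproduces the same write, import and `getsource` steps in a temporary directory. The module name `demo_guidelines` and the class `DemoEvent` are made up for illustration.

```python
import importlib
import inspect
import sys
import tempfile
from pathlib import Path

# Source code of a tiny, hypothetical guidelines module.
source = '''from dataclasses import dataclass


@dataclass
class DemoEvent:
    """A DemoEvent is a hypothetical event used only for illustration."""

    mention: str
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    # 1. Write the definitions to a real .py file.
    Path(tmp_dir, "demo_guidelines.py").write_text(source, encoding="utf8")
    sys.path.insert(0, tmp_dir)
    try:
        # 2. Import the file as a module.
        importlib.invalidate_caches()
        demo_guidelines = importlib.import_module("demo_guidelines")
        # 3. Recover the class definition as a string, docstring included.
        recovered = inspect.getsource(demo_guidelines.DemoEvent)
    finally:
        sys.path.remove(tmp_dir)

print(recovered)
```

The recovered string contains the docstring, which is what carries the guideline text into the prompt.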

## Define input sentence

Here we define the input sentence and the gold labels.

You can define an empty list as gold labels if you don't have gold annotations.

In [173]:
text = (
    "The Times reported that Vivendi Universal was negotiating to sell its flagship theme parks to the New York"
    " investment firm, Blackstone Group, for 8 billion as the first step toward dismantling its entertainment empire."
)
gold = [
    guidelines_ee.TransactionEvent(mention="sell"),
]

## Filling a template

For EE we will use the following prompt template.
We use Jinja templates, which are easy to implement and exceptionally fast. For more information, visit: https://jinja.palletsprojects.com/en/3.1.x/api/#high-level-api.

```Python
# The following lines describe the task definition
{%- for definition in guidelines %}
{{ definition }}
{%- endfor %}

# This is the text to analyze
text = {{ text.__repr__() }}

# The annotation instances that take place in the text above are listed here
result = [
{%- for ann in annotations %}
    {{ ann }},
{%- endfor %}
]

```

This template is stored in `templates/prompt.txt`.

In [174]:
# Read template
with open("../templates/prompt.txt", "rt") as f:
    template = Template(f.read())
# Fill the template
formated_text = template.render(guidelines=guidelines, text=text, annotations=gold, gold=gold)
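To make the shape of the filled prompt concrete, here is a plain-Python approximation of what the template produces. It is only a sketch: it does not use Jinja, whitespace may differ slightly from the real rendering, and `render_prompt` and `DemoEvent` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DemoEvent:
    """Hypothetical event class used only for this illustration."""

    mention: str


def render_prompt(guidelines: Sequence[str], text: str, annotations: Sequence[object]) -> str:
    """Build a prompt with the same three parts as the template: guidelines, text, result."""
    lines: List[str] = ["# The following lines describe the task definition"]
    lines.extend(guidelines)
    lines += ["", "# This is the text to analyze", f"text = {text!r}", ""]
    lines.append("# The annotation instances that take place in the text above are listed here")
    lines.append("result = [")
    lines.extend(f"    {annotation!r}," for annotation in annotations)
    lines.append("]")
    return "\n".join(lines) + "\n"


demo_prompt = render_prompt(
    guidelines=["class DemoEvent: ..."],
    text="Vivendi agreed to sell its parks.",
    annotations=[DemoEvent(mention="sell")],
)
print(demo_prompt)
```

The gold annotations end up inside the `result` list, which is why everything after `result =` is removed before inference.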

### Black Code Formatter

We use the Black Code Formatter to automatically unify all the prompts to the same format. 

https://github.com/psf/black

In [175]:
black_mode = black.Mode()
formated_text = black.format_str(formated_text, mode=black_mode)

### Print the filled and formatted template

In [176]:
rich.print(formated_text)

## Prepare model inputs

We remove everything after `result =` to run inference with the model.

In [177]:
prompt, _ = formated_text.split("result =")
prompt = prompt + "result ="
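Note that unpacking `split("result =")` into two names raises a `ValueError` if the marker appears more than once, for example when the input text itself contains `result =`. A slightly more defensive variant, sketched here with a hypothetical helper name, cuts at the last occurrence instead:

```python
def build_inference_prompt(filled_prompt: str, marker: str = "result =") -> str:
    """Keep everything up to and including the last `marker`, dropping what follows it."""
    head, found, _ = filled_prompt.rpartition(marker)
    if not found:
        raise ValueError(f"Marker {marker!r} not found in the prompt")
    return head + found


# Hypothetical filled prompt whose text also contains the marker.
filled = (
    "text = 'The result = 4 was reported.'\n"
    "\n"
    "result = [\n"
    "    DemoEvent(mention='reported'),\n"
    "]\n"
)
demo_inference_prompt = build_inference_prompt(filled)
print(demo_inference_prompt)
```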

Tokenize the input sentence

In [178]:
model_input = tokenizer(prompt, add_special_tokens=True, return_tensors="pt")

Remove the `eos` token from the input

In [179]:
model_input["input_ids"] = model_input["input_ids"][:, :-1]
model_input["attention_mask"] = model_input["attention_mask"][:, :-1]
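The slicing above drops the last position of every row unconditionally. If you are unsure whether your tokenizer appends an `eos` token, a guarded version is safer. The sketch below uses plain Python lists as a stand-in for tensors; the helper name and the token ids are hypothetical, except for `2`, which the generation log in this notebook reports as the `eos_token_id`.

```python
from typing import List, Tuple


def strip_trailing_eos(
    input_ids: List[List[int]], attention_mask: List[List[int]], eos_token_id: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Drop the final position of each row only when it holds the EOS token."""
    stripped_ids, stripped_mask = [], []
    for ids, mask in zip(input_ids, attention_mask):
        if ids and ids[-1] == eos_token_id:
            ids, mask = ids[:-1], mask[:-1]
        stripped_ids.append(list(ids))
        stripped_mask.append(list(mask))
    return stripped_ids, stripped_mask


# Made-up ids: the first row ends with EOS (2), the second does not.
demo_ids = [[1, 450, 1121, 353, 2], [1, 450, 1121, 353]]
demo_mask = [[1, 1, 1, 1, 1], [1, 1, 1, 1]]
clean_ids, clean_mask = strip_trailing_eos(demo_ids, demo_mask, eos_token_id=2)
print(clean_ids, clean_mask)
```

Leaving the `eos` token in place would tell the model that the sequence is already finished, so it should not be the last token of the prompt.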

## Run GoLLIE

We generate the predictions using GoLLIE.

We use `num_beams=1` and `do_sample=False` in our experiments, but feel free to experiment with different decoding strategies 😊

In [180]:
%%time

model_ouput = model.generate(
    **model_input.to(model.device),
    max_new_tokens=128,
    do_sample=False,
    min_new_tokens=0,
    num_beams=1,
    num_return_sequences=1,
)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


CPU times: user 600 ms, sys: 5.04 ms, total: 606 ms
Wall time: 608 ms
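With `num_beams=1` and `do_sample=False`, generation is greedy: at every step the single most likely next token is appended. The toy sketch below illustrates that loop with a made-up scoring table standing in for the model; it is not how `model.generate` is implemented, and all names in it are hypothetical.

```python
from typing import Callable, Dict, List


def greedy_decode(
    next_token_scores: Callable[[List[str]], Dict[str, float]],
    prompt: List[str],
    eos_token: str,
    max_new_tokens: int,
) -> List[str]:
    """Repeatedly append the single highest-scoring next token (no sampling, one beam)."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        best_token = max(scores, key=scores.get)
        tokens.append(best_token)
        if best_token == eos_token:
            break
    return tokens


# Hypothetical toy "model": scores depend only on the last token.
TOY_SCORES = {
    "result =": {"[": 0.9, "]": 0.1},
    "[": {"TransactionEvent(mention='sell')": 0.7, "]": 0.3},
    "TransactionEvent(mention='sell')": {"]": 0.8, ",": 0.2},
    "]": {"</s>": 1.0},
}


def toy_model(tokens: List[str]) -> Dict[str, float]:
    return TOY_SCORES[tokens[-1]]


decoded = greedy_decode(toy_model, ["result ="], eos_token="</s>", max_new_tokens=8)
print(decoded)
```

Because no randomness is involved, running the same prompt twice yields the same output.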


### Print the results

In [181]:
for y, x in enumerate(model_ouput):
    print(f"Answer {y}")
    rich.print(tokenizer.decode(x, skip_special_tokens=True).split("result = ")[-1])

Answer 0


## Parse the output

The output is a Python list of instances, so we can execute it 🤯

We define the AnnotationList class to parse the output with a single line of code. The `AnnotationList.from_output` function filters any label that we did not define (hallucinations) to prevent getting an `undefined class` error. 

In [182]:
result = AnnotationList.from_output(
    tokenizer.decode(model_ouput[0], skip_special_tokens=True).split("result = ")[-1], task_module="guidelines_ee"
)
rich.print(result)
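The filtering idea can be sketched with the standard-library `ast` module, which parses the generated list without executing it. This is an illustration under simplifying assumptions (literal arguments only), not the actual `AnnotationList.from_output` implementation; `parse_result` and `MadeUpEvent` are hypothetical names.

```python
import ast
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TransactionEvent:
    """Stand-in for the guideline class defined earlier."""

    mention: str


def parse_result(output: str, allowed: Dict[str, type]) -> List[object]:
    """Instantiate only the calls whose class name is in `allowed`; skip everything else."""
    tree = ast.parse(output.strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("Expected the model output to be a Python list")
    instances = []
    for node in tree.body.elts:
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue  # not a plain `ClassName(...)` call
        cls = allowed.get(node.func.id)
        if cls is None or any(keyword.arg is None for keyword in node.keywords):
            continue  # undefined label (hallucination) or unsupported **kwargs
        try:
            args = [ast.literal_eval(arg) for arg in node.args]
            kwargs = {keyword.arg: ast.literal_eval(keyword.value) for keyword in node.keywords}
            instances.append(cls(*args, **kwargs))
        except (ValueError, TypeError):
            continue  # non-literal arguments or wrong fields
    return instances


demo_output = "[TransactionEvent(mention='sell'), MadeUpEvent(mention='reported')]"
parsed = parse_result(demo_output, {"TransactionEvent": TransactionEvent})
print(parsed)
```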

Labels are instances of the defined classes:

In [183]:
type(result[0])

guidelines_ee.TransactionEvent

In [184]:
result[0].mention

'sell'

# Evaluate the result

Finally, we will evaluate the outputs from the model.

First, we define a scorer. For EE, we will use the `EventScorer` class.

We need to define the `valid_types` for the scorer, which will be the labels that we have defined. 

In [185]:
from src.tasks.utils_scorer import EventScorer


class MyScorer(EventScorer):
    """Event scorer."""

    valid_types: List[Type] = guidelines_ee.EVENTS_DEFINITIONS

### Instantiate the scorer

In [186]:
scorer = MyScorer()

### Compute F1 

In [187]:
scorer_results = scorer(reference=[gold], predictions=[result])
rich.print(scorer_results)

GoLLIE has successfully predicted the `TransactionEvent` in the sentence. The argument F1-score is 0.0 because we are only predicting event instances. We will predict the arguments in the next step.
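For readers unfamiliar with the metric, event-level precision, recall and F1 can be sketched as follows. This is a generic micro-F1 over (event type, trigger) pairs and may differ in detail from what `EventScorer` computes; the function name and the second predicted event are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def micro_f1(gold: Iterable[Tuple[str, str]], predicted: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Compute precision, recall and F1 over (event type, trigger) pairs."""
    gold_counts, predicted_counts = Counter(gold), Counter(predicted)
    true_positives = sum((gold_counts & predicted_counts).values())
    total_predicted = sum(predicted_counts.values())
    total_gold = sum(gold_counts.values())
    precision = true_positives / total_predicted if total_predicted else 0.0
    recall = true_positives / total_gold if total_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1-score": f1}


gold_pairs = [("TransactionEvent", "sell")]
# Hypothetical prediction with one correct event and one spurious event.
predicted_pairs = [("TransactionEvent", "sell"), ("ContactEvent", "reported")]
event_scores = micro_f1(gold_pairs, predicted_pairs)
print(event_scores)
```

In this made-up case the spurious event lowers precision to 0.5 while recall stays at 1.0.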

# Event Argument Extraction

In this step, we will predict the arguments for the `TransactionEvent`. `TransactionEvent` is a coarse category that encompasses two fine-grained categories: `TransferMoney` and `TransferOwnership`. Each of these fine-grained categories has distinct arguments. First, we define both classes and their respective arguments.

In [188]:
from typing import List

from src.tasks.utils_typing import Event, dataclass
from src.tasks.ace.prompts import TransactionEvent


# The following lines describe the task definition
@dataclass
class TransferMoney(TransactionEvent):
    """TransferMoney Events refer to the giving, receiving, borrowing, or lending money when it is not in the context of
    purchasing something. The canonical examples are: (1) people giving money to organizations (and getting nothing tangible in
    return); and (2) organizations lending money to people or other orgs."""

    mention: str  # The text span that most clearly expresses (triggers) the event
    giver: List[str]  # The donating agent
    recipient: List[str]  # The recipient agent
    beneficiary: List[str]  # The agent that benefits from the transfer
    money: List[str]  # The amount given, donated or loaned
    time: List[str]  # When the amount is transferred
    place: List[str]  # Where the transaction takes place


@dataclass
class TransferOwnership(TransactionEvent):
    """TransferOwnership Events refer to the buying, selling, loaning, borrowing, giving, or receiving of artifacts or
    organizations."""

    mention: str  # The text span that most clearly expresses (triggers) the event
    buyer: List[str]  # The buying agent
    seller: List[str]  # The selling agent
    beneficiary: List[str]  # The agent that benefits from the transaction
    artifact: List[str]  # The item or Organization that was bought or sold
    price: List[str]  # The sale price of the artifact
    time: List[str]  # When the sale takes place
    place: List[str]  # Where the sale takes place


EVENTS_EAE_DEFINITIONS: List[Event] = [TransferMoney, TransferOwnership]

if __name__ == "__main__":
    cell_txt = In[-1]

As before, we write the content of the previous cell to a file and then import its content. We export the guidelines to a separate file and load them under a different name; otherwise, Python will not load them correctly.

In [189]:
with open("guidelines_eae.py", "w", encoding="utf8") as python_guidelines:
    print(cell_txt, file=python_guidelines)

import guidelines_eae

In [190]:
guidelines = [inspect.getsource(definition) for definition in guidelines_eae.EVENTS_EAE_DEFINITIONS]

Here we define the input sentence and the gold labels.

You can define an empty list as gold labels if you don't have gold annotations.

In [202]:
text = (
    "The Times reported that Vivendi Universal was negotiating to sell its flagship theme parks to the New York"
    " investment firm, Blackstone Group, for 8 billion as the first step toward dismantling its entertainment empire."
)
gold = [
    guidelines_eae.TransferOwnership(
        mention="sell",
        buyer=["Blackstone Group"],
        seller=["Vivendi Universal"],
        beneficiary=[],
        artifact=["theme parks"],
        price=["8 billion"],
        time=[],
        place=[],
    )
]

## Filling a template

For EAE, we will use a distinct template. This template incorporates both the event category and the trigger. As a result, GoLLIE only needs to predict the arguments corresponding to the specified events.

We use Jinja templates, which are easy to implement and exceptionally fast. For more information, visit: https://jinja.palletsprojects.com/en/3.1.x/api/#high-level-api.

```Python
# The following lines describe the task definition
{%- for definition in guidelines %}
{{ definition }}
{%- endfor %}

# This is the text to analyze
text = {{ text.__repr__() }}

# The list called result contains the instances for the following events according to the guidelines above:
{%- for ann in gold %}
#    - "{{ann.mention}}" triggers a {{ann.__class__.__name__}} event.
{%- endfor %}
# 
result = [
{%- for ann in annotations %}
    {{ ann }},
{%- endfor %}
]

```

This template is stored in `templates/prompt_ace_eae.txt`.

In [203]:
# Read template
with open("../templates/prompt_ace_eae.txt", "rt") as f:
    template = Template(f.read())
# Fill the template
formated_text = template.render(guidelines=guidelines, text=text, annotations=gold, gold=gold)
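The only difference from the EE template is the comment block listing the gold triggers. As a plain-Python approximation (not the Jinja rendering itself; `trigger_hints` is a hypothetical helper and the class below is a minimal stand-in), those lines are built like this:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TransferOwnership:
    """Minimal stand-in for the guideline class; only the trigger is needed here."""

    mention: str


def trigger_hints(gold_events: Sequence[object]) -> str:
    """Build the comment block that tells the model which trigger starts which event."""
    lines = [
        "# The list called result contains the instances for the following events"
        " according to the guidelines above:"
    ]
    for event in gold_events:
        lines.append(f'#    - "{event.mention}" triggers a {type(event).__name__} event.')
    lines.append("#")
    return "\n".join(lines)


hints = trigger_hints([TransferOwnership(mention="sell")])
print(hints)
```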

### Black Code Formatter

We use the Black Code Formatter to automatically unify all the prompts to the same format. 

https://github.com/psf/black

In [204]:
black_mode = black.Mode()
formated_text = black.format_str(formated_text, mode=black_mode)

In [205]:
rich.print(formated_text)

As observed in the previous cell, there's an additional comment preceding the `result` variable that indicates the event trigger and the category of the event.

## Prepare model inputs

We remove everything after `result =` to run inference with the model.

In [206]:
prompt, _ = formated_text.split("result =")
prompt = prompt + "result ="

Tokenize the input sentence

In [207]:
model_input = tokenizer(prompt, add_special_tokens=True, return_tensors="pt")

Remove the `eos` token from the input

In [208]:
model_input["input_ids"] = model_input["input_ids"][:, :-1]
model_input["attention_mask"] = model_input["attention_mask"][:, :-1]

## Run GoLLIE

We generate the predictions using GoLLIE.

We use `num_beams=1` and `do_sample=False` in our experiments, but feel free to experiment with different decoding strategies 😊

In [209]:
%%time

model_ouput = model.generate(
    **model_input.to(model.device),
    max_new_tokens=128,
    do_sample=False,
    min_new_tokens=0,
    num_beams=1,
    num_return_sequences=1,
)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


CPU times: user 2.49 s, sys: 6.28 ms, total: 2.5 s
Wall time: 2.51 s


### Print the results


In [210]:
for y, x in enumerate(model_ouput):
    print(f"Answer {y}")
    rich.print(tokenizer.decode(x, skip_special_tokens=True).split("result = ")[-1])

Answer 0


## Parse the output

The output is a Python list of instances, so we can execute it 🤯

We define the AnnotationList class to parse the output with a single line of code. The `AnnotationList.from_output` function filters any label that we did not define (hallucinations) to prevent getting an `undefined class` error. 

In [211]:
result = AnnotationList.from_output(
    tokenizer.decode(model_ouput[0], skip_special_tokens=True).split("result = ")[-1], task_module="guidelines_eae"
)
rich.print(result)

Labels are instances of the defined classes:

In [212]:
type(result[0])

guidelines_eae.TransferOwnership

In [213]:
result[0].mention

'sell'

# Evaluate the result

Finally, we will evaluate the outputs from the model.

First, we define a scorer. For EAE, we will again use the `EventScorer` class.

We need to define the `valid_types` for the scorer, which will be the labels that we have defined. 


In [214]:
from src.tasks.utils_scorer import EventScorer


class MyScorer(EventScorer):
    """Event scorer."""

    valid_types: List[Type] = guidelines_eae.EVENTS_EAE_DEFINITIONS

### Compute F1 

In [215]:
scorer = MyScorer()

In [216]:
scorer_results = scorer(reference=[gold], predictions=[result])
rich.print(scorer_results)
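Argument-level scores compare individual (event type, role, value) triples rather than whole events. The sketch below flattens dataclass instances into such triples and computes precision, recall and F1. It is a simplified illustration that may not match `EventScorer` exactly, and the prediction shown is hypothetical (it omits the price on purpose), not GoLLIE's actual output.

```python
from dataclasses import dataclass, field, fields
from typing import List, Sequence, Set, Tuple


@dataclass
class TransferOwnership:
    """Reduced stand-in for the guideline class, with a subset of its roles."""

    mention: str
    buyer: List[str] = field(default_factory=list)
    seller: List[str] = field(default_factory=list)
    artifact: List[str] = field(default_factory=list)
    price: List[str] = field(default_factory=list)


def argument_triples(events: Sequence[object]) -> Set[Tuple[str, str, str]]:
    """Flatten events into (event type, role, value) triples, skipping the trigger."""
    triples = set()
    for event in events:
        for role in fields(event):
            if role.name == "mention":
                continue
            for value in getattr(event, role.name):
                triples.add((type(event).__name__, role.name, value))
    return triples


gold_events = [
    TransferOwnership("sell", ["Blackstone Group"], ["Vivendi Universal"], ["theme parks"], ["8 billion"])
]
# Hypothetical prediction that misses the price argument.
predicted_events = [TransferOwnership("sell", ["Blackstone Group"], ["Vivendi Universal"], ["theme parks"])]

gold_triples, predicted_triples = argument_triples(gold_events), argument_triples(predicted_events)
correct = len(gold_triples & predicted_triples)
precision = correct / len(predicted_triples) if predicted_triples else 0.0
recall = correct / len(gold_triples) if gold_triples else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```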

GoLLIE has successfully predicted the event arguments for the `TransferOwnership` event in the sentence. In the EAE task, we can expect the F1 score for the events to always be `1.0`, as we are providing the model with the gold triggers and classes. While it's possible to implement the EE and EAE tasks as a pipeline, let's venture into something even more exciting! Next, we will explore end-to-end event extraction with GoLLIE. 🚀🚀🚀



# End-to-End Event Extraction

In the EAE task, we provided the model with the gold triggers and classes. Now, let's modify the template to remove that information.

We will use the same guidelines as in the EAE task, meaning that we will define the `TransferMoney` and `TransferOwnership` events. It's worth noting that the ACE05 dataset includes 32 fine-grained event classes, but incorporating all of them would result in a huge prompt with a lot of tokens. While GoLLIE can natively handle inputs of up to `4096` tokens (which can be expanded using RoPE scaling to `64000` tokens or even more), we will keep this example simple by focusing on only 2 event classes. Feel free to experiment by adding more event definitions!
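When adding more definitions, a quick budget check shows whether the prompt plus the generated tokens still fits in the context window. The sketch below is a rough illustration: `fits_in_context` is a hypothetical helper, and the whitespace counter is a crude stand-in for a real tokenizer, which usually produces more tokens than there are words.

```python
from typing import Callable


def fits_in_context(
    prompt: str,
    count_tokens: Callable[[str], int],
    context_window: int = 4096,
    max_new_tokens: int = 128,
) -> bool:
    """Check that the prompt leaves room for `max_new_tokens` generated tokens."""
    return count_tokens(prompt) + max_new_tokens <= context_window


def whitespace_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer; real subword counts are usually higher."""
    return len(text.split())


short_prompt = "word " * 100
long_prompt = "word " * 4000
print(fits_in_context(short_prompt, whitespace_token_count))
print(fits_in_context(long_prompt, whitespace_token_count))
```

In the notebook, the length of `model_input["input_ids"]` gives the exact prompt size to compare against the limit.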

It's important to mention that GoLLIE has not been pre-trained for end-to-end Event Extraction tasks. Nonetheless, its robust generalization capabilities make such a task feasible.

In [217]:
template_txt = """# The following lines describe the task definition
{%- for definition in guidelines %}
{{ definition }}
{%- endfor %}

# This is the text to analyze
text = {{ text.__repr__() }}

# The annotation instances that take place in the text above are listed here
result = [
{%- for ann in annotations %}
    {{ ann }},
{%- endfor %}
]
"""

In [218]:
template = Template(template_txt)
# Fill the template
formated_text = template.render(guidelines=guidelines, text=text, annotations=gold, gold=gold)

In [219]:
black_mode = black.Mode()
formated_text = black.format_str(formated_text, mode=black_mode)
rich.print(formated_text)

As observed in the previous cell, we no longer provide the model with the gold trigger or event class. Therefore, GoLLIE will attempt to predict the correct trigger, event class, and arguments.

In [220]:
prompt, _ = formated_text.split("result =")
prompt = prompt + "result ="
model_input = tokenizer(prompt, add_special_tokens=True, return_tensors="pt")
model_input["input_ids"] = model_input["input_ids"][:, :-1]
model_input["attention_mask"] = model_input["attention_mask"][:, :-1]

Let's run the model 👀

In [221]:
%%time

model_ouput = model.generate(
    **model_input.to(model.device),
    max_new_tokens=128,
    do_sample=False,
    min_new_tokens=0,
    num_beams=1,
    num_return_sequences=1,
)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


CPU times: user 2.47 s, sys: 7.25 ms, total: 2.48 s
Wall time: 2.49 s


And parse the output 🤞

In [224]:
result = AnnotationList.from_output(
    tokenizer.decode(model_ouput[0], skip_special_tokens=True).split("result = ")[-1], task_module="guidelines_eae"
)
rich.print(result)

In [225]:
from src.tasks.utils_scorer import EventScorer


class MyScorer(EventScorer):
    """Event scorer."""

    valid_types: List[Type] = guidelines_eae.EVENTS_EAE_DEFINITIONS


scorer = MyScorer()
scorer_results = scorer(reference=[gold], predictions=[result])
rich.print(scorer_results)

GoLLIE has successfully performed end-to-end event extraction, even though it was not pre-trained for this task. The predicted trigger, event class, and all the arguments are correct. This demonstrates the strong generalization capabilities of the model 🎉🎉🎉


We have not extensively tested GoLLIE's capabilities to perform end-to-end event extraction in zero-shot settings (unseen event types). However, we plan to do so in the near future. Feel free to venture into the unknown, and please share your exciting experiments with us; we'd love to see what everyone is achieving with GoLLIE!
- [@iker_garciaf](https://twitter.com/iker_garciaf)
- [@osainz59](https://twitter.com/osainz59)