# Session 2 - Python GATEnlp

Python GATEnlp is a Python library for defining pipelines that produce annotations using gazetteers and rules; annotated documents are displayed graphically in the notebook.

The goal of this session is to explore what GATE can do and to build the resources needed for knowledge-based entity detection. We will work through an example text step by step.

First we install GATEnlp and stanza, and download stanza's Spanish model.

In [None]:
# Silence log messages such as DEBUG and INFO
import logging, sys
logging.disable(sys.maxsize)


!pip3 install gatenlp[all]

!pip3 install stanza
import stanza
stanza.download('es')




## Section 1.1 (Solved)

We load an English document and annotate it with the GateCloudAnnotator service (https://cloud.gate.ac.uk/), specifically the ANNIE pipeline. This component detects several annotation types, including:
* :Person
* :Location
* :Organization
* :Date
* :Address
* :Money
* :Percent
* :Token
* :SpaceToken
* :Sentence


In [None]:
# Define an example text in English.
texto_en= """
Roger Federer (German: [ˈrɔdʒər ˈfeːdərər]; born 8 August 1981) is a Swiss professional tennis player. He is ranked No. 9 in the world by the Association of Tennis Professionals (ATP). He has won 20 Grand Slam men's singles titles, an all-time record shared with Rafael Nadal and Novak Djokovic. Federer has been world No. 1 in the ATP rankings a total of 310 weeks – including a record 237 consecutive weeks – and has finished as the year-end No. 1 five times. Federer has won 103 ATP singles titles, the second most of all-time behind Jimmy Connors, including a record six ATP Finals.

Federer has played in an era where he dominated men's tennis together with Rafael Nadal and Novak Djokovic, who have been collectively referred to as the Big Three and are widely considered three of the greatest tennis players of all-time.[c] A Wimbledon junior champion in 1998, Federer won his first Grand Slam singles title at Wimbledon in 2003 at age 21. In 2004, he won three out of the four major singles titles and the ATP Finals,[d] a feat he repeated in 2006 and 2007. From 2005 to 2010, Federer made 18 out of 19 major singles finals. During this span, he won his fifth consecutive titles at both Wimbledon and the US Open. He completed the career Grand Slam at the 2009 French Open after three previous runner-ups to Nadal, his main rival up until 2010. At age 27, he also surpassed Pete Sampras's then-record of 14 Grand Slam men's singles titles at Wimbledon in 2009.

Although Federer remained in the top 3 through most of the 2010s, the success of Djokovic and Nadal in particular ended his dominance over grass and hard courts. From mid-2010 through the end of 2016, he only won one major title. During this period, Federer and Stan Wawrinka led the Switzerland Davis Cup team to their first title in 2014, adding to the gold medal they won together in doubles at the 2008 Beijing Olympics. Federer also has a silver medal in singles from the 2012 London Olympics, where he finished runner-up to Andy Murray. After taking half a year off in late 2016 to recover from knee surgery, Federer had a renaissance at the majors. He won three more Grand Slam singles titles over the next two years, including the 2017 Australian Open over Nadal and a men's singles record eighth Wimbledon title later in 2017. He also became the oldest ATP world No. 1 in 2018 at age 36.

A versatile all-court player, Federer's perceived effortlessness has made him highly popular among tennis fans. Originally lacking self-control as a junior, Federer transformed his on-court demeanor to become well-liked for his general graciousness, winning the Stefan Edberg Sportsmanship Award 13 times. He has also won the Laureus World Sportsman of the Year award a record five times. Outside of competing, he played an instrumental role in the creation of the Laver Cup team competition. Federer is also an active philanthropist. He established the Roger Federer Foundation, which targets impoverished children in southern Africa, and has raised funds in part through the Match for Africa exhibition series. Federer is routinely one of the top ten highest-paid athletes in any sport and ranked first among all athletes with $100 million in endorsement income in 2020."""

from gatenlp import Document
from gatenlp.processing.client.gatecloud import GateCloudAnnotator

# Define the cloud annotator
annotator = GateCloudAnnotator(
    url="https://cloud-api.gate.ac.uk/process-document/annie-named-entity-recognizer",
    outset_name="ANNIE",
    ann_types=":Address,:Date,:Location,:Organization,:Person,:Money,:Percent,:Token,:SpaceToken,:Sentence"
)

doc = Document(texto_en)
# Run the annotator and display the annotated document
doc = annotator(doc)
doc


## Section 1.2 (Solved)

Next we work through an example of rule-based entity detection using gazetteers, which are sets of word lists that GATE looks up in the text. To do so, we download the example archive provided on AulaVirtual and unzip it.

We then load and display the Spanish example text included in the archive.

In [None]:
# Download the example files
!wget --no-check-certificate -q https://valencia.inf.um.es/valencia-tgine/gatenlpUM.zip -O gatenlpUM.zip
!unzip -o gatenlpUM.zip

import os
from gatenlp import Document
from gatenlp.processing.gazetteer import TokenGazetteer, StringGazetteer
from gatenlp.processing.tokenizer import NLTKTokenizer

# Load a document from a file and display it
doc = Document.load("rafa_nadal.txt")
doc

Archive:  gatenlpUM.zip
  inflating: gazetteer/lists.def     
  inflating: gazetteer/loc_spanish_city.lst  
  inflating: gazetteer/spanish_firstname.lst  
  inflating: rafa_nadal.txt          



## Section 1.3 (Solved)

The Gazetteer and PAMPAC modules of GATE require tokens to be defined first. We use the Stanza tokenizer, and in addition to tokenizing we take the part-of-speech categories from its POS tagger and the entities from its named-entity recognizer. The function *obtenerAnotacionesStanzaEnGate*, defined below, does this.

In [None]:
# Define a function that adds the Stanza annotations to the document so they are displayed in GATE
def obtenerAnotacionesStanzaEnGate (doc):
  import string
  spanish_punctuation = string.punctuation + '¿'+'¡'

  nlp = stanza.Pipeline(lang='es', processors='tokenize,pos,ner')
  doctext = nlp(doc.text)

  annset = doc.annset()
  for sent in doctext.sentences:
    for tok in sent.tokens:
      kind = "word"
      orth = "lowercase"
      if tok.text.isupper():
        orth = "uppercase"
      elif tok.text[0].isupper():
        orth = "upperInitial"
      if tok.text.isnumeric():
        kind = "number"
      elif tok.text in spanish_punctuation:
        kind = "punctuation"
      ann = annset.add(tok.start_char,tok.end_char,"Token",{'string':tok.text, 'kind':kind, 'orth':orth, 'length':tok.end_char-tok.start_char, 'pos': tok.words[0].upos if tok.words[0].upos else ""})
  for ent in doctext.ents:
      ann = annset.add(ent.start_char,ent.end_char, ent.type,{'string': ent.text})
  return doc

# Clear the existing annotations
doc.annset().clear()
# Run the Stanza annotation function
doc=obtenerAnotacionesStanzaEnGate(doc)
# Display the document
doc
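
The `kind` and `orth` features assigned above decide which tokens the later rules can match, so it can help to check them in isolation. The sketch below copies that classification into a standalone helper; `token_features` is a hypothetical name introduced here for illustration, and it needs neither Stanza nor GATEnlp.

```python
import string

# Same punctuation set as in the function above: ASCII punctuation plus the Spanish opening marks
SPANISH_PUNCTUATION = string.punctuation + "¿" + "¡"


def token_features(text):
    """Return the kind/orth/length features the notebook assigns to a token string (hypothetical helper)."""
    kind = "word"
    orth = "lowercase"
    if text.isupper():
        orth = "uppercase"
    elif text[0].isupper():
        orth = "upperInitial"
    if text.isnumeric():
        kind = "number"
    elif text in SPANISH_PUNCTUATION:
        kind = "punctuation"
    return {"string": text, "kind": kind, "orth": orth, "length": len(text)}


for sample in ["Rafael", "ATP", "2008", "¿", "tenista"]:
    print(token_features(sample))
```

Note that an all-caps token such as "ATP" is classified as `uppercase`, not `upperInitial`, so a rule that matches `orth="upperInitial"` does not match it.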



## Section 1.4 - Gazetteer (Solved)

Gazetteers are lists of text expressions that represent a category, such as city names, first names or months of the year. All of these lists are declared in the file lists.def, an index with the following format:

```
loc_spanish_city.lst:location:city
spanish_firstname.lst:person_first
```
The first column gives the name of the file that contains the list, followed by its **majorType** and, optionally, its **minorType**.

The resulting annotations are usually stored under the annotation type **Lookup**.
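
As a rough illustration of this index format, the sketch below splits each line into its fields. `ListEntry` and `parse_def_line` are hypothetical helpers, not part of GATEnlp, and the sketch assumes only the `file:majorType[:minorType]` columns shown above; any further columns are ignored.

```python
from typing import NamedTuple, Optional


class ListEntry(NamedTuple):
    """One line of the gazetteer index (hypothetical container for this sketch)."""
    filename: str
    major_type: str
    minor_type: Optional[str]


def parse_def_line(line: str) -> ListEntry:
    """Split a 'file:majorType[:minorType]' line into its fields."""
    parts = line.strip().split(":")
    if len(parts) < 2:
        raise ValueError(f"expected 'file:majorType[:minorType]', got {line!r}")
    # The minorType column is optional; treat a missing or empty one as None
    minor = parts[2] if len(parts) > 2 and parts[2] else None
    return ListEntry(parts[0], parts[1], minor)


index_lines = [
    "loc_spanish_city.lst:location:city",
    "spanish_firstname.lst:person_first",
]
for entry in map(parse_def_line, index_lines):
    print(entry)
```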

In [None]:
# Create a gazetteer from the downloaded lists.def file
gazetteer = StringGazetteer(source="gazetteer/lists.def", source_fmt="gate-def", outset_name="", ann_type="Lookup")

# Remove all existing Lookup annotations
doc.annset("").remove(doc.annset("").with_type("Lookup"))

# Run the gazetteer
doc = gazetteer(doc)
doc


## Section 1.5 (Solved)

PAMPAC ("PAttern Matching with PArser Combinators") lets us define complex rules that annotate entities in the text based on patterns over text and annotations.

Rules are built from patterns that work much like regular expressions. For example, the following rule matches every **Lookup** annotation whose majorType is *"location"* and creates a new annotation of type **LOC**.

```
r1 = Rule(
    # first the pattern
    AnnAt("Lookup", features=dict(majorType="location"),name="location1"),
    # then the action for the pattern
    AddAnn(name="location1", type="LOC", features=dict(rule="location1"))
    )
```



In [None]:
from gatenlp.pam.pampac import Ann, AnnAt, Rule, Pampac, AddAnn, N, Seq, Or
from gatenlp.pam.matcher import FeatureMatcher

# Clear all annotations in the "Out1" set
doc.annset("Out1").clear()

r1 = Rule(
    # first the pattern
    AnnAt("Lookup", features=dict(majorType="location"),name="location1"),
    # then the action for the pattern
    AddAnn(name="location1", type="LOC", features=dict(rule="location1"))
    )


# Create the annotation set to match against (tokens and gazetteer lookups)
anns2match = doc.annset(name="").with_type("Token", "Lookup")

# Get the annotation set where we want to put new annotations
outset = doc.annset("Out1")

# Create the Pampac instance from the single rule and run it on the annotations, also specify output set
# The run method returns the list of offsets and the action return values where the rule matches in the doc
rules =[r1]
Pampac(*rules).run(doc, anns2match, outset=outset)
doc


## Section 1.6
We now create a set of rules to identify Spanish person names:
* Create a new gazetteer list, "surname.lst".
* Add "Nadal", "Parera", "Djokovic" and "Ferrer" to the list.
* Add a new line to the lists.def file:
  * surname.lst:surname
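
These resources can also be created from Python instead of by hand. The following is a minimal sketch using only the standard library: `add_gazetteer_list` is a hypothetical helper, it assumes one entry per line in the list file and the `file:majorType[:minorType]` index layout described earlier, and the demo writes to a temporary directory rather than to the real `gazetteer/` folder.

```python
import tempfile
from pathlib import Path


def add_gazetteer_list(gazetteer_dir, list_file, entries, major_type, minor_type=None):
    """Write a list file and register it in lists.def (hypothetical helper)."""
    gazetteer_dir = Path(gazetteer_dir)
    gazetteer_dir.mkdir(parents=True, exist_ok=True)

    # One entry per line, UTF-8 so accented names survive
    (gazetteer_dir / list_file).write_text("\n".join(entries) + "\n", encoding="utf-8")

    # Build the index line, leaving out the minorType column when it is not given
    def_line = ":".join(p for p in (list_file, major_type, minor_type) if p)

    # Append the line only if it is not already there, so re-running the cell is harmless
    def_path = gazetteer_dir / "lists.def"
    lines = def_path.read_text(encoding="utf-8").splitlines() if def_path.exists() else []
    if def_line not in lines:
        lines.append(def_line)
        def_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return def_line


demo_dir = Path(tempfile.mkdtemp())
line = add_gazetteer_list(demo_dir, "surname.lst", ["Nadal", "Parera", "Djokovic", "Ferrer"], "surname")
print(line)
print((demo_dir / "lists.def").read_text(encoding="utf-8"))
```

In the notebook, passing `"gazetteer"` as the directory would update the unzipped example resources in place.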

We create new rules to identify persons (**PER**):
* Rule 2:
  * A person consists of a "person_first" followed by a "surname"

```
r2 = Rule(
    # first the pattern
    Seq(
      AnnAt("Lookup", features=dict(majorType="person_first")),
      AnnAt("Lookup", features=dict(majorType="surname")),
      name="person1"
      ),
    # then the action for the pattern
    AddAnn(name="person1", type="PER", features=dict(rule="person1"))
    )
```

* Rule 3:
  * A person consists of a "person_first" followed by a *Token* whose first character is *uppercase*

* Rule 4:
  * A person consists of a *Token* whose first character is *uppercase* followed by a *"surname"*


In [None]:
# Create a gazetteer from lists.def, which must now also declare surname.lst
gazetteer = StringGazetteer(source="gazetteer/lists.def", source_fmt="gate-def", outset_name="", ann_type="Lookup")

# Remove all existing Lookup annotations
doc.annset("").remove(doc.annset("").with_type("Lookup"))

# Run the gazetteer
doc = gazetteer(doc)

# Clear all annotations in the "Out1" set
doc.annset("Out1").clear()

r1 = Rule(
    # first the pattern
    AnnAt("Lookup", features=dict(majorType="location"),name="location1"),
    # then the action for the pattern
    AddAnn(name="location1", type="LOC", features=dict(rule="location1"))
    )

r2 = Rule(
    # first the pattern
    Seq(
        AnnAt("Lookup", features=dict(majorType="person_first")),
        AnnAt("Lookup", features=dict(majorType="surname")),
        name="person1"
        ),
    # then the action for the pattern
    AddAnn(name="person1", type="PER", features=dict(rule="person1"))
    )

# New rule r3: a first name followed by a capitalized token
r3 = Rule(
    # first the pattern
    Seq(
        AnnAt("Lookup", features=dict(majorType="person_first")),
        AnnAt("Token", features=dict(orth="upperInitial")),
        name="person2"
        ),
    # then the action for the pattern
    AddAnn(name="person2", type="PER", features=dict(rule="person2"))
    )

# New rule r4: a capitalized token followed by a surname
r4 = Rule(
    # first the pattern
    Seq(
        AnnAt("Token", features=dict(orth="upperInitial")),
        AnnAt("Lookup", features=dict(majorType="surname")),
        name="person3"
        ),
    # then the action for the pattern
    AddAnn(name="person3", type="PER", features=dict(rule="person3"))
    )


# Create the annotation set to match against (tokens and gazetteer lookups)
anns2match = doc.annset(name="").with_type("Token", "Lookup")

# Get the annotation set where we want to put new annotations
outset = doc.annset("Out1")

# Create the Pampac instance from the list of rules and run it on the annotations, also specify output set
# The run method returns the list of offsets and the action return values where the rule matches in the doc
rules =[r1, r2, r3, r4]
Pampac(*rules).run(doc, anns2match, outset=outset)
doc




## Section 1.7
We modify rule 3 so that a **PER** can start with one or two *"person_first"* entries

```
# Rule 3
r3 = Rule(
    # first the pattern
    Seq(
        N(
            AnnAt("Lookup", features=dict(majorType="person_first")),
            min=1, max=2
          ),
        AnnAt("Token", features=dict(orth="upperInitial")),
        name="person2"
      ),
    # then the action for the pattern
    AddAnn(name="person2", type="PER", features=dict(rule="person2"))
    )
```

We modify rule 4 so that a **PER** can end with one or two *"surname"* entries


```
# Rule 4
r4 = Rule(
    # first the pattern
    Seq(
        AnnAt("Token", features=dict(orth="upperInitial")),
        N(
            AnnAt("Lookup", features=dict(majorType="surname")),
            min=1, max=2
            ),
        name="person3"
        ),
    # then the action for the pattern
    AddAnn(name="person3", type="PER", features=dict(rule="person3"))
    )
```




In [None]:
# Create a gazetteer from the lists.def file
gazetteer = StringGazetteer(source="gazetteer/lists.def", source_fmt="gate-def", outset_name="", ann_type="Lookup")

# Remove all existing Lookup annotations
doc.annset("").remove(doc.annset("").with_type("Lookup"))

# Run the gazetteer
doc = gazetteer(doc)

# Clear all annotations in the "Out1" set
doc.annset("Out1").clear()

r1 = Rule(
    # first the pattern
    AnnAt("Lookup", features=dict(majorType="location"),name="location1"),
    # then the action for the pattern
    AddAnn(name="location1", type="LOC", features=dict(rule="location1"))
    )

r2 = Rule(
    # first the pattern
    Seq(N(AnnAt("Lookup", features=dict(majorType="person_first")),min=1, max=2), N(AnnAt("Lookup", features=dict(majorType="surname")),min=1,max=2), name="person1"),
    # then the action for the pattern
    AddAnn(name="person1", type="PER", features=dict(rule="person1"))
    )

# Rule r3: one or two first names followed by a capitalized token
r3 = Rule(
    # first the pattern
    Seq(
        N(
            AnnAt("Lookup", features=dict(majorType="person_first")),
            min=1, max=2
          ),
        AnnAt("Token", features=dict(orth="upperInitial")),
        name="person2"
      ),
    # then the action for the pattern
    AddAnn(name="person2", type="PER", features=dict(rule="person2"))
    )

# Rule r4: a capitalized token followed by one or two surnames
r4 = Rule(
    # first the pattern
    Seq(
        AnnAt("Token", features=dict(orth="upperInitial")),
        N(
            AnnAt("Lookup", features=dict(majorType="surname")),
            min=1, max=2
            ),
        name="person3"
        ),
    # then the action for the pattern
    AddAnn(name="person3", type="PER", features=dict(rule="person3"))
    )


# Create the annotation set to match against (tokens and gazetteer lookups)
anns2match = doc.annset(name="").with_type("Token", "Lookup")

# Get the annotation set where we want to put new annotations
outset = doc.annset("Out1")

# Create the Pampac instance from the list of rules and run it on the annotations, also specify output set
# The run method returns the list of offsets and the action return values where the rule matches in the doc
rules =[r1, r2, r3, r4]
Pampac(*rules).run(doc, anns2match, outset=outset)
doc


## Exercise 1
Create the resources needed to identify dates in Spanish with the following patterns (rules):

* Number + "de" + Month + "de" + Number
  * 12 de agosto de 2006
* Number
  * 2008
* Month + "de" + Number
  * diciembre de 2023

Then change the first rule so that it can also identify the following:
* [Day] + Number + "de" + Month + ["de" + Number]
  * Lunes 15 de marzo, martes 12 de junio de 2023, 12 de junio.
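
The date rules depend on gazetteer lists for month and day names. As a sketch of what those resources might contain: the file names `month.lst` and `day.lst` and the majorType `date` are assumptions made here, while `month` and `day` are the minorTypes a rule would look up. Capitalized variants are listed explicitly because the examples include both "Lunes" and "martes", and the sketch does not assume case-insensitive matching.

```python
MONTHS = [
    "enero", "febrero", "marzo", "abril", "mayo", "junio",
    "julio", "agosto", "septiembre", "octubre", "noviembre", "diciembre",
]
DAYS = ["lunes", "martes", "miércoles", "jueves", "viernes", "sábado", "domingo"]


def with_capitalized(words):
    """Return every word followed by its capitalized variant, e.g. 'lunes', 'Lunes'."""
    result = []
    for word in words:
        result.extend([word, word.capitalize()])
    return result


# Contents of each list file, one entry per line when written to disk (file names are assumptions)
resources = {
    "month.lst": with_capitalized(MONTHS),
    "day.lst": with_capitalized(DAYS),
}

# Lines that would be added to lists.def: file:majorType:minorType (majorType is an assumption)
def_lines = ["month.lst:date:month", "day.lst:date:day"]

for filename, words in resources.items():
    print(filename, len(words), "entries, starting with", words[:4])
print("\n".join(def_lines))
```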


In [None]:
# Create a gazetteer from lists.def; the date rules below assume it declares lists with minorType "month" and "day"
gazetteer = StringGazetteer(source="gazetteer/lists.def", source_fmt="gate-def", outset_name="", ann_type="Lookup")

# Remove all existing Lookup annotations
doc.annset("").remove(doc.annset("").with_type("Lookup"))

# Run the gazetteer
doc = gazetteer(doc)

# Clear all annotations in the "Out1" set
doc.annset("Out1").clear()

r1 = Rule(
    # first the pattern
    AnnAt("Lookup", features=dict(majorType="location"),name="location1"),
    # then the action for the pattern
    AddAnn(name="location1", type="LOC", features=dict(rule="location1"))
    )

r2 = Rule(
    # first the pattern
    Seq(N(AnnAt("Lookup", features=dict(majorType="person_first")),min=1, max=2), N(AnnAt("Lookup", features=dict(majorType="surname")),min=1,max=2), name="person1"),
    # then the action for the pattern
    AddAnn(name="person1", type="PER", features=dict(rule="person1"))
    )

# Rule r3: one or two first names followed by a capitalized token
r3 = Rule(
    # first the pattern
    Seq(
        N(
            AnnAt("Lookup", features=dict(majorType="person_first")),
            min=1, max=2
          ),
        AnnAt("Token", features=dict(orth="upperInitial")),
        name="person2"
      ),
    # then the action for the pattern
    AddAnn(name="person2", type="PER", features=dict(rule="person2"))
    )

# Rule r4: a capitalized token followed by one or two surnames
r4 = Rule(
    # first the pattern
    Seq(
        AnnAt("Token", features=dict(orth="upperInitial")),
        N(
            AnnAt("Lookup", features=dict(majorType="surname")),
            min=1, max=2
            ),
        name="person3"
        ),
    # then the action for the pattern
    AddAnn(name="person3", type="PER", features=dict(rule="person3"))
    )

# New rule r5: [day] + number + "de" + month + ["de" + number]
r5 = Rule(
    # first the pattern
    Seq(
        N(
          AnnAt("Lookup", features=dict(minorType="day")),
          min=0, max=1
        ),
        AnnAt("Token", features=dict(kind="number")),
        AnnAt("Token", features=dict(string="de")),
        AnnAt("Lookup", features=dict(minorType="month")),
        N(
          Seq(
              AnnAt("Token", features=dict(string="de")),
              AnnAt("Token", features=dict(kind="number")),
          ), min=0, max=1
        ),
        name="date1"
        ),
    # then the action for the pattern
    AddAnn(name="date1", type="DATE", features=dict(rule="date1"))
    )

# New rule r6: a four-digit number greater than 1900, treated as a year
r6 = Rule(
    # first the pattern
    AnnAt("Token", features=dict(kind="number", length=4, string=lambda x: int(x) > 1900),name="date2"),
    # then the action for the pattern
    AddAnn(name="date2", type="DATE", features=dict(rule="date2"))
    )

# New rule r7: month + "de" + year
r7 = Rule(
    # first the pattern
    Seq(
        AnnAt("Lookup", features=dict(minorType="month")),
        AnnAt("Token", features=dict(string="de")),
        AnnAt("Token", features=dict(kind="number", length=4, string=lambda x: int(x) > 1900)),
        name="date3"
        ),
    # then the action for the pattern
    AddAnn(name="date3", type="DATE", features=dict(rule="date3"))
    )

# Create the annotation set to match against (tokens and gazetteer lookups)
anns2match = doc.annset(name="").with_type("Token", "Lookup")

# Get the annotation set where we want to put new annotations
outset = doc.annset("Out1")

# Create the Pampac instance from the list of rules and run it on the annotations, also specify output set
# The run method returns the list of offsets and the action return values where the rule matches in the doc
rules =[r1, r2, r3, r4, r5, r6, r7]
Pampac(*rules).run(doc, anns2match, outset=outset)
doc



## Exercise 2 - To hand in

Create the resources needed to identify locations in Spanish with the following patterns (rules):

* "en" + Location
  * en Murcia, en Orizaba
* "en" + Token(upperInitial)
  * en Murcia, en Orizaba
* "en" + "el"|"la"|"los"|"las" + Token
  * en el colegio, en la clase, en los botes, en las camas


In [38]:
# Create a gazetteer from the lists.def file
gazetteer = StringGazetteer(source="gazetteer/lists.def", source_fmt="gate-def", outset_name="", ann_type="Lookup")

# Remove all existing Lookup annotations
doc.annset("").remove(doc.annset("").with_type("Lookup"))

# Run the gazetteer
doc = gazetteer(doc)

# Clear all annotations in the "Out1" set
doc.annset("Out1").clear()

# Rule 1: "en" followed by a gazetteer location
r1 = Rule(
    Seq(
        AnnAt("Token", features=dict(string="en")),
        AnnAt("Lookup", features=dict(majorType="location")),
        name="location1"
      ),
    AddAnn(name="location1", type="LOC", features=dict(rule="location1"))
    )

r2 = Rule(
    Seq(
        AnnAt("Token", features=dict(string="en")),
        AnnAt("Token", features=dict(orth="upperInitial")),
        name="location2"
      ),
    AddAnn(name="location2", type="LOC", features=dict(rule="location2"))
    )

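# Rule 3: "en" followed by "el", "la", "los" or "las" and then any token
r3 = Rule(
    Seq(
        AnnAt("Token", features=dict(string="en")),
        AnnAt("Token", features=dict(string=lambda x: x.lower() in ("el", "la", "los", "las"))),
        AnnAt("Token", features=dict()),
        name="location3"
      ),
    AddAnn(name="location3", type="LOC", features=dict(rule="location3"))
    )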

# Create the annotation set to match against (tokens and gazetteer lookups)
anns2match = doc.annset(name="").with_type("Token", "Lookup")

# Get the annotation set where we want to put new annotations
outset = doc.annset("Out1")

# Create the Pampac instance from the list of rules and run it on the annotations, also specify output set
# The run method returns the list of offsets and the action return values where the rule matches in the doc
rules =[r1, r2, r3]
Pampac(*rules).run(doc, anns2match, outset=outset)
doc


## Exercise 3 - To hand in

Create the resources needed to identify amounts of money in Spanish with the following patterns (rules):

* Token.pos="NUM" + currency name (euros, dólares)
  * 100.000 euros, 200.000 dólares
* "\$" + Token.pos="NUM"
  * $ 100.000
* Token.pos="NUM" + "€"
  * 200.000 €
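
Before writing the PAMPAC rules, the three patterns can be sanity-checked on plain strings. The sketch below uses ordinary regular expressions instead of PAMPAC, so it is only an approximation: it has no POS information and assumes numbers written with `.` as thousands separator and an optional `,` decimal part. `find_money` is a hypothetical helper.

```python
import re

# Digits, optional ".ddd" thousands groups, optional ",dd" decimals (an assumption about number format)
NUMBER = r"\d+(?:\.\d{3})*(?:,\d+)?"

MONEY = re.compile(
    rf"(?:\$\s?{NUMBER})"                  # "$" + number
    rf"|(?:{NUMBER}\s?€)"                  # number + "€"
    rf"|(?:{NUMBER}\s(?:euros|dólares))"   # number + currency word
)


def find_money(text):
    """Return every substring of text that matches one of the three money patterns."""
    return [match.group(0) for match in MONEY.finditer(text)]


print(find_money("Ganó 100.000 euros, 200.000 dólares, $ 100.000 y 200.000 €."))
```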


In [46]:
# Create a gazetteer from the lists.def file
gazetteer = StringGazetteer(source="gazetteer/lists.def", source_fmt="gate-def", outset_name="", ann_type="Lookup")

# Remove all existing Lookup annotations
doc.annset("").remove(doc.annset("").with_type("Lookup"))

# Run the gazetteer
doc = gazetteer(doc)

# Clear all annotations in the "Out1" set
doc.annset("Out1").clear()

r1 = Rule(
    Seq(
        AnnAt("Token", features=dict(pos="NUM")),
        AnnAt("Token", features=dict(string=lambda x: x in ("euros", "dólares"))),
        name="dinero1"
      ),
    AddAnn(name="dinero1", type="MON", features=dict(rule="dinero1"))
)

r2 = Rule(
    Seq(
        AnnAt("Token", features=dict(string="$")),
        AnnAt("Token", features=dict(pos="NUM")),
        name="dinero2"
        ),
    AddAnn(name="dinero2", type="MON", features=dict(rule="dinero2"))
)

r3 = Rule(
    Seq(
        AnnAt("Token", features=dict(pos="NUM")),
        AnnAt("Token", features=dict(string="€")),
        name="dinero3"
    ),
    AddAnn(name="dinero3", type="MON", features=dict(rule="dinero3"))
)


# Create the annotation set to match against (tokens and gazetteer lookups)
anns2match = doc.annset(name="").with_type("Token", "Lookup")

# Get the annotation set where we want to put new annotations
outset = doc.annset("Out1")

# Create the Pampac instance from the list of rules and run it on the annotations, also specify output set
# The run method returns the list of offsets and the action return values where the rule matches in the doc
rules =[r1, r2, r3]
Pampac(*rules).run(doc, anns2match, outset=outset)
doc
