# 🎳 bumpner example - hierarchical entities with YAML

Sometimes we have entity taxonomies that are inherently hierarchical.

In these cases, we want a representation of the taxonomy that allows the model to first generate the top-level entity, and then keep predicting child entities until we hit a 'leaf' entity (one with no children).

I like YAML for this representation, since the indentation makes it clearer which entities are nested, as opposed to something like Pydantic objects or JSON.
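To make the top-down idea concrete, here is a minimal sketch of the descent over a nested dict. It is not bumpner's implementation: `TAXONOMY`, `RESERVED`, `children`, `descend`, and the `choose` callback (a stand-in for the model's constrained choice) are all hypothetical names, and I'm assuming that every key other than `description` and `rules` is a child entity.

```python
from typing import Callable

# Toy taxonomy as a nested dict (hypothetical, for illustration only).
TAXONOMY = {
    "PERSON": {
        "description": "People, including fictional.",
        "CELEBRITY": {
            "description": "A famous person.",
            "SINGER": {"description": "A well-known singer."},
            "ACTOR": {"description": "A well-known actor."},
        },
        "GENERIC_PERSON": {"description": "A person we don't recognize."},
    },
    "ORG": {"description": "Companies, agencies, institutions, etc."},
}

# Assumption: these keys hold metadata; every other key is a child entity.
RESERVED = {"description", "rules"}


def children(node: dict) -> list[str]:
    """Names of the child entities directly under this node."""
    return [key for key in node if key not in RESERVED]


def descend(taxonomy: dict, choose: Callable[[list[str], list[str]], str]) -> str:
    """Pick one entity per level until a leaf is reached; return the dotted label."""
    path: list[str] = []
    node = taxonomy
    options = children(node)
    while options:
        pick = choose(path, options)  # in practice, the model would choose here
        path.append(pick)
        node = node[pick]
        options = children(node)
    return ".".join(path)


# Always taking the first option walks PERSON -> CELEBRITY -> SINGER.
print(descend(TAXONOMY, lambda path, options: options[0]))
```

The point of descending level by level is that each choice is made among a handful of siblings rather than across every leaf in the taxonomy at once.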

In [1]:
import time
import pandas as pd
from functools import partial
from tabulate import tabulate
from guidance import gen
from guidance.models import Transformers, LlamaCpp

from bumpner import Bumpner

tabulate = partial(
    tabulate, headers="keys", showindex="never", tablefmt="simple_outline"
)

## Load model & motivating example

In [2]:
# You can use the below model from huggingface if you don't want to download the gguf
# model = Transformers(
#     "microsoft/Phi-3-mini-128k-instruct",
#     trust_remote_code=True,
#     device_map='cuda'
# )
model = LlamaCpp(
    'examples/Phi-3-mini-4k-instruct-q4.gguf',
    n_gpu_layers=-1,
    n_ctx=1028
)
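# Warm-up call: generate a single token so first-call overhead doesn't skew the timing later.
# Note that the first positional argument of gen() is the capture name, not a prompt,
# so the prompt text is appended to the model separately.
wakeup = model + "Say hi" + gen(max_tokens=1)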
few_shot = """
Input: John worked at Apple.
Output:
John: PERSON.GENERIC_PERSON
worked: O
at: O
Apple: ORG
.: O
"""
text = """
I work at Aperture Science with John.
We work on the portal gun product together.
He loves listening to elton john, drake, and lady gaga.
His num is s23AHG.
"""

## Hierarchical Entities from YAML

```yaml
PERSON:
  description: People, including fictional.
  CELEBRITY:
    description: A famous person.
    SINGER:
      description: A musician who is well-known as a singer, like Michael Jackson
      rules:
        keyword:
          - elton john
          - drake
    ACTOR:
      description: A person who is well-known for their acting in movies or TV shows.
  GENERIC_PERSON:
    description: A person, who we don't really recognize

PRODUCT:
  description: Products offered by a company.
  rules:
    keyword:
      - conversion gel
      - handheld portal gun

ORG:
  description: Companies, agencies, institutions, etc.

IDNUMBER:
  description: Identifier for Aperture Science employees
  rules:
    regex:
      - s\d{2}\w{3}
```
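The `rules` entries above read like deterministic shortcuts: a `keyword` list and a `regex` list per entity. I don't want to claim this is how bumpner applies them internally, but a rough sketch of one plausible reading (case-insensitive substring match for keywords, plain `re` search for regexes) looks like this. `RULES` and `rule_matches` are hypothetical names, with the rule values copied from the YAML.

```python
import re

# Rules copied from the YAML above, keyed by dotted label (hypothetical layout).
RULES = {
    "PERSON.CELEBRITY.SINGER": {"keyword": ["elton john", "drake"]},
    "PRODUCT": {"keyword": ["conversion gel", "handheld portal gun"]},
    "IDNUMBER": {"regex": [r"s\d{2}\w{3}"]},
}


def rule_matches(text: str) -> list[tuple[str, str]]:
    """Return (matched text, label) pairs found by keyword or regex rules."""
    hits = []
    lowered = text.lower()
    for label, rules in RULES.items():
        # Assumption: keywords match case-insensitively as literal substrings.
        for keyword in rules.get("keyword", []):
            for match in re.finditer(re.escape(keyword), lowered):
                hits.append((text[match.start():match.end()], label))
        # Assumption: regexes are searched against the original text.
        for pattern in rules.get("regex", []):
            for match in re.finditer(pattern, text):
                hits.append((match.group(0), label))
    return hits


sample = "He loves listening to elton john, drake, and lady gaga.\nHis num is s23AHG."
for span, label in rule_matches(sample):
    print(f"{span!r} -> {label}")
```

Under this reading, "portal gun" on its own would not trigger the PRODUCT rule, since the keyword is the full phrase "handheld portal gun".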

In [3]:
bumpner = Bumpner(
    model,
    "./entities.yaml", 
    few_shot=few_shot
)
start = time.time()
result = bumpner(text)
print(f"Took {time.time() - start} seconds")
del bumpner

Took 0.804356575012207 seconds
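Judging by how `result` is passed to `pd.DataFrame(..., columns=["word", "label"])` in the next cell, it looks like a sequence of (word, label) pairs, one per token. If you'd rather have multi-word entities, a small sketch like the one below could merge consecutive tokens that share a label. `merge_spans` is a hypothetical helper, not part of bumpner, and it will also glue together two distinct entities if they sit directly next to each other with the same label.

```python
from itertools import groupby


def merge_spans(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Join runs of consecutive tokens with the same non-O label into one span."""
    spans = []
    for label, group in groupby(pairs, key=lambda pair: pair[1]):
        if label == "O":
            continue  # skip tokens outside any entity
        spans.append((" ".join(word for word, _ in group), label))
    return spans


# Hand-written pairs in the assumed (word, label) shape.
pairs = [
    ("aperture", "ORG"),
    ("science", "ORG"),
    ("with", "O"),
    ("john", "PERSON.GENERIC_PERSON"),
]
print(merge_spans(pairs))
```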


In [4]:
print(tabulate(pd.DataFrame(result, columns=["word", "label"])))

┌───────────┬─────────────────────────┐
│ word      │ label                   │
├───────────┼─────────────────────────┤
│ i         │ O                       │
│ work      │ O                       │
│ at        │ O                       │
│ aperture  │ ORG                     │
│ science   │ ORG                     │
│ with      │ O                       │
│ john      │ PERSON.GENERIC_PERSON   │
│ .         │ O                       │
│ we        │ O                       │
│ work      │ O                       │
│ on        │ O                       │
│ the       │ O                       │
│ portal    │ O                       │
│ gun       │ O                       │
│ product   │ O                       │
│ together  │ O                       │
│ .         │ O                       │
│ he        │ O                       │
│ loves     │ O                       │
│ listening │ O                       │
│ to        │ O                       │
│ elton     │ PERSON.CELEBRITY.SINGER │
