# NER and aspect extraction with an open-source LLM (Mixtral 8x7b)


Large Language Models (LLMs) have reshaped natural language processing (NLP), offering powerful capabilities in tasks like information extraction (IE) from historical texts. Chat-based generative models completely change the way we can interact with and analyse our corpora. These models enable users to engage with training data using natural language, revolutionizing communication paradigms and propagating a wide adoption of AI-tools across text-based tasks. However, concerns about **data privacy**, and **access** have arisen due to the dominance of closed-source models from industry giants like OpenAI and Google. To address these issues, there's a growing interest in open-source alternatives, which provide transparency and control over models and data.

This Jupyter Notebook explores the potential of open-source LLMs for NER and aspect recognition in historical texts. We'll showcase zero- and few-shot learning to overcome **data scarcity**, a pivotal problem in applying IE in literary-historical contexts. We aim to showcase how open-source LLMs can illuminate the past and shape the future of historical scholarship!


The Notebook showcases the following procedures:



1.  **Zero-shot NER/aspect extraction.**

    *   With [Mixtral 8x7b](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) (**multilingual model trained on French, English, German**), with (at the time of writing) a leading position on **Dutch** benchmarks too, despite not having been trained for this purpose.



2.   **Few-shot NER/aspect extraction**
    *   With Mixtral 8x7b


3.   **Evaluation**
    * Sample through the results manually.
    * Calculate F1 (if training data available)


We implement the code using the package **[LangChain](https://www.langchain.com/)**, a popular wrapper around both closed and open-source LLMs.








## Required background knowledge 🧠

❗🎓 To adapt and use this Notebook to produce entities for your own texts, you need to have an intuitive understanding of the following concepts:



*   [named entity recognition](https://en.wikipedia.org/wiki/Named-entity_recognition)
*   few-shot modelling
*   zero-shot modelling
*   prompting
*   Large Language Models (generative AI)
*   [HuggingFace model hub](https://huggingface.co/)
*   [BIO-labels / span evaluation](https://pypi.org/project/nervaluate/)
*   Evaluation metrics (F1, accuracy, precision, recall)
*   GitHub
*   [LangChain](https://www.langchain.com/)


To adapt the code, you need to know about:


* Functions and classes in Python
* Pandas dataframe operations
* Jupyter Notebooks


# Load packages 📚


In [1]:
!pip install langchain langchain-huggingface langchain-community langchain-core  nervaluate pandas torch transformers datasets huggingface_hub numpy scikit-learn



In [2]:
import pandas as pd
import torch
from transformers import BitsAndBytesConfig
from langchain import HuggingFacePipeline
from langchain_community.llms import HuggingFaceHub
from langchain_huggingface.llms.huggingface_endpoint import HuggingFaceEndpoint
from langchain import PromptTemplate, LLMChain
import os
import json
from nervaluate import Evaluator
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report
import time
import ast

from langchain.schema.output_parser import StrOutputParser
from pydantic_core import from_json
from pydantic import BaseModel

import glob
from typing import List, Optional
from pydantic import BaseModel, ValidationError
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts.few_shot import FewShotPromptTemplate
from langchain_core.prompts.prompt import PromptTemplate

In [3]:
!pip install session-info



In [4]:
import session_info
session_info.show()

  mod_version = _find_version(mod.__version__)


# Load example data 📜

In this code snippet we collect a multilingual corpus of travel literature from the GitHub repository pertaining to GhentCDH. You can find more information on this example corpus on our [GitHub repository](https://github.com/GhentCDH/CLSinfra).

To show you how this workflow can work for different languages, we'll load in our **Dutch** and **English** annotations. We annotated two aspects in these texts: **fauna** 🐱 and **flora** 🌺. These include common names and scientific denominations.


In [5]:
path = "./"

In [6]:
EN_fauna_flora = pd.read_csv("EN_fauna_flora_df.csv")
NL_fauna_flora = pd.read_csv("NL_fauna_flora_df.csv")

In [7]:
NL_fauna_flora.sample(10)

Unnamed: 0,sentence,text,_sentence_text,aspect_cat
381,BHL_7_sample_Dutch_19.0.txt_1058-1188,randdorens,Brongniartii Miq . en Af . hystrix Parm . ) wa...,FLORA
551,DBNL-10_sample20 (1).txt_9729-9791,paard,In 't midden is een wacht te paard om voor de ...,FAUNA
490,BHL_7_sample_Dutch_19.0.txt_19462-19642,arme flora,Wat evenwel voorloopig nog zonder merkbaar gev...,FLORA
522,BHL_7_sample_Dutch_19.0.txt_25740-25906,dorengroepen,Op die takken staan de dorengroepen meer gedro...,FLORA
254,BHL_794_sample_Dutch_18.0.txt_2047-2156,wal,"Men verzamelt die zeer fterk , droogt ze in hu...",FLORA
515,BHL_7_sample_Dutch_19.0.txt_24806-24918,parenchijm,"Uitgenomen eenige vaatbundelstrengen , diè in ...",FLORA
257,BHL_794_sample_Dutch_18.0.txt_2579-2643,appelen,Ik heb 'er velerlei foorten van ichone appelen...,FLORA
320,BHL_794_sample_Dutch_18.0.txt_7373-7654,wortelen,Ook hebben zy menigen goeden maal- tyd van vel...,FLORA
435,BHL_7_sample_Dutch_19.0.txt_8371-8863,middendorens,In zeer enkele gevallen komt er een tweede der...,FLORA
342,BHL_794_sample_Dutch_18.0.txt_12673-12699,kalkoen,"Sols , en een kalkoen 20 .",FAUNA


## Set environment ❗


**IMPORTANT STEP**: before you can proceed with the code in this Notebook, you have to [request an API token](https://huggingface.co/docs/api-inference/quicktour) from the HuggingFace model hub. Make an account on the website, and follow their directions to create a token. This ensures that HuggingFace controls how many API calls you can make.

In [8]:
#Enter the necessary API token to gain access to the HuggingFace API
HUGGINGFACEHUB_API_TOKEN = "token"
os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN

In [9]:
#State the model id on HuggingFace for Mixtral (French, Italian, German, Spanish, English)
repo_id = "meta-llama/Llama-3.1-8B-Instruct"

# Zero-shot NER/aspect extraction


Here, we'll use the framework LangChain to send a request to the open-source generative LLM to extract aspects from our texts.

The model choice is a repo id which the user can adjust according to their needs.
As an example, we're using the multilingual generative LLM **mistralai/Mixtral-8x7B-Instruct-v0.1**, which is fine-tuned for multiple languages except for Dutch.

We construct a structured prompt example for the model to extract entities/aspects from the texts in several categories, which you can fully adapt to your needs and texts.

**CATEGORIES**


---


The entities and categories we will focus on in this notebook are the following:
- **FAUNA**
- **FLORA**
- **PERSON**
- **LOCATION**
- **ORGANISATION**

## Validate the output of the LLM

We want our prompt to return our NER-results as a valid JSON output. However, LLMs tend to output incomplete or invalid JSON-schemas, or hallucinates output. Luckily, **Pydantic** is a library which can fix these issues.

First, we'll construct a **Pydantic Class** to assert which data types we expect from the model output for each entity.

In [10]:
class NER(BaseModel):
    """

    This class asserts the data types we expect from the output of the LLM.
      person: Optionally a list, otherwise None.
      organisation: Optionally a list, otherwise None
      location:
      fauna:
      flora:
    """
    person: Optional[list] = None
    organisation: Optional[list] = None
    location: Optional[list] = None
    fauna: Optional[list] = None
    flora: Optional[list] = None

Let's test this out! We'll try to simulate an incomplete JSON output and feed it to our class.

In [11]:
partial_json = '{"location": ["Rome"], "person": ["John", "Capt. Cook"], "random": ["hallucination"]'

In [12]:
validator = NER.model_validate(from_json(partial_json, allow_partial=True))

print(repr(validator))

NER(person=['John', 'Capt. Cook'], organisation=None, location=['Rome'], fauna=None, flora=None)


As you can see, the class helps us to parse out the objects which are interesting to our use-case. As you can see, **hallucinations in the output are ignored**, and **the partial JSON-object is validated** automatically!


Now we can easily take the attributes from our validator!

In [13]:
validator.person

['John', 'Capt. Cook']

# Build a prompt

By means of experiment, we will feed several pieces of information to the LLM which we deem interesting to our use-case.
Similar to modelling, there are no clear-cut ways to build a prompt; and it's **all a matter of experimentation**!

🧠❗ Play around with the question, personality, and template!


In [14]:
# specify the question/request posed to the LLM

question = "Extract the relevant entities from the given sentence."

In [15]:
# specify the personality you expect from the LLM

personality = "You are a named entity recognizer trained to recognize entities in travelogues."

In [16]:
# add a JSON object with the category names followed by the expected data type

schema_entity = {
        "person": ["string"],
        "location": ["string"],
        "fauna": ["string"],
        "flora": ["string"],
        "organisation": ["string"],
        }

In [17]:
# add the category names with small global introduction/definition as a string

categories =  """person: proper names of people,
location: proper names of locations,
fauna: common and scientific names of animals and fauna,
flora: common and scientific names of vegetation, plants, flowers and flora,
organisation: proper names of organisations"""

In [18]:
# This brings all the elements above together in a template.
# The sentence is clearly indicated by <<<>>>, which helps the model to stick to the text given.

template = """ {personality}.
Your task is to identify the named entities in a sentence.
Structure the answer according to {schema_entity}. Do not deviate from this schema.
The sentence is indicated by <<<>>>.

Question: {question}
Sentence: <<<{sentence}>>>

Answer: """

In [19]:
sentence = "I was walking in Rome when I saw a beautiful deer and rabbits. I wanted to touch it but it ran through the dandelions."

In [20]:
# define the prompt

prompt = PromptTemplate.from_template(template)

In [21]:
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate

llm = HuggingFaceEndpoint(
  repo_id=repo_id,
  huggingfacehub_api_token = HUGGINGFACEHUB_API_TOKEN,
)

chat_model = ChatHuggingFace(llm=llm)

chain = prompt | chat_model

response = chain.invoke({
    "question": question,
    "schema_entity": schema_entity,
    "personality": personality,
    "sentence": sentence,
    "categories": categories,
})


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [22]:
# print the raw response from the LLM
print(response.content)

{
  'person': ['I'],
  'location': ['Rome'],
  'fauna': ['deer', 'rabbits'],
  'flora': ['dandelions'],
  'organisation': []
}


In [23]:
import re
main_response = response.content.split("You are a named entity recognizer trained to recognize entities in travelogues.")[0]
# parse the response content to extract the first JSON part between the curly braces

In [24]:
main_response = main_response.replace("\n", '')

import ast
ast_response = ast.literal_eval(main_response)

result = NER.model_validate(ast_response)
print(repr(result))



NER(person=['I'], organisation=[], location=['Rome'], fauna=['deer', 'rabbits'], flora=['dandelions'])


In [25]:
result.fauna

['deer', 'rabbits']

In [26]:
result.location

['Rome']

## Functions

In this section, we write functions for making calls to the LLM and parsing the output.


1.  In our function *parse_llm_response*, we split the output and only take the element after our "Answer:"-section in our prompt. Then, we cast the result to JSON by applying [Pydantic](https://docs.pydantic.dev/latest/concepts/json/) to transform partial JSON outputs to a valid JSON object, parse the entity text and their labels.

2.   In our function *llm_output*, we call the LLM and apply our parsing function to the output.



---

❗💭 **Mind you that these functions will have to be adapted according to the output of your LLM of choice, given that the output is unpredictable and changes when your prompt does.**






In [27]:
# parse the llm response
# cast to json
# parse all the entities and their categories

def parse_llm_response(response, basemodel_class = NER):
  try:
    main_response = main_response.replace("{{", "{")
    main_response = main_response.replace("}}", "}")
    json_response = main_response.content.split("{")[1]  # Ensure we close the JSON object properly
    json_response = "{" + json_response.split("}")[0] + "}"  # Ensure we close the JSON object properly
    print("JSON response:")
    print(json_response)
    ast_response = ast.literal_eval(json_response)
    result = NER.model_validate(ast_response)

    category_entity = []
    for entity in result:
      if entity[1] != None: #if the model returned a valid result for the categories which is not None
        category = entity[0]
        entity_text_list = entity[1]

        for ent in entity_text_list:
          category_entity.append((ent, category))

    return category_entity

  except:
    print(response.content)
    return {(None, None)}



In [28]:
### call to the LLM and parse the response

def llm_output(sentence):
    llm = HuggingFaceEndpoint(
        repo_id=repo_id,
        huggingfacehub_api_token=os.getenv("HUGGINGFACEHUB_API_TOKEN"),
    )

    chat_model = ChatHuggingFace(llm=llm)

    chain = prompt | chat_model

    response = chain.invoke({
        "question": question,
        "schema_entity": schema_entity,
        "personality": personality,
        "sentence": sentence,
        "categories": categories,
    })

    return parse_llm_response(response)

## Apply the LLM to a Pandas DataFrame

Here, we take a sample of our corpus to showcase a possible approach.

In [29]:
English_corpus_sample = EN_fauna_flora[1:2]

In [30]:
English_corpus_sample

Unnamed: 0.1,Unnamed: 0,sentence,text,_sentence_text,aspect_cat
1,1,GB-117_sample_English_18.txt_7251-7283,flying fish,Some flying fish seen this day .,FAUNA


### Chunk the text into smaller parts

Most models takes a maximum input of **32768** tokens. Therefore, we make sure to split up the text in smaller bits before you proceed. In this case, our text is already in small pieces.

The model is probably more inclined to make mistakes when the text chunks are too large. One of the reasons for this is that the models have a tendency to focus on the beginning or the end of an input, and pay less attention to the middle part (this paper [linktekst](https://arxiv.org/pdf/2307.03172) expertly explains it!). On the other hand, there is a [strict rate limit](https://huggingface.co/docs/api-inference/faq) on the HuggingFace API. Experiment with these settings to see if this approach is useful for your use-case!

In [31]:
sample_text = English_corpus_sample.iloc[0]["text"]

In [32]:
#pick a sample of the first ten sentences in our corpus

test = EN_fauna_flora[:10]
#keep only unique _sentence_text
test = test.drop_duplicates(subset = ["_sentence_text"])


In [33]:
test

Unnamed: 0.1,Unnamed: 0,sentence,text,_sentence_text,aspect_cat
0,0,GB-117_sample_English_18.txt_4026-4114,Tropic Bird,"In the A.M. saw a Tropic Bird , which , I beli...",FAUNA
1,1,GB-117_sample_English_18.txt_7251-7283,flying fish,Some flying fish seen this day .,FAUNA
2,2,GB-117_sample_English_18.txt_7364-7486,Egg Bird,"In the P.M. saw an Egg Bird , and yesterday a ...",FAUNA
5,5,GB-117_sample_English_18.txt_7915-8097,land bird,Before 5 o'Clock we were obliged to close reef...,FAUNA
6,6,GB-117_sample_English_18.txt_9838-9965,Port Egmont Hen,"Last night we saw a Port Egmont Hen , and this...",FAUNA


In [34]:
# We apply our llm to a subset of our chunk to show how it works!

test["NER_results"] = test._sentence_text.apply(llm_output)

{
  'person': [],
  'location': ['A.M.', 'Latitudes'],
  'fauna': ['Tropic Bird'],
  'flora': [],
  'organisation': []
}
{'person': [], 'location': [], 'fauna': ['flying fish'], 'flora': [], 'organisation': []}
{
  'person': [],
  'location': [],
  'fauna': ['Egg Bird', 'Gannet', 'Birds'],
  'flora': [],
  'organisation': []
}
{'person': [], 'location': [], 'fauna': ['bird'], 'flora': [], 'organisation': []}
{
  'person': [],
  'location': ['Port Egmont'],
  'fauna': ['Port Egmont Hen', 'More', 'Pintado bird', 'Albetrosses'],
  'flora': [],
  'organisation': []
}


In [35]:
# Let's check the output of our NER-analysis!

test

Unnamed: 0.1,Unnamed: 0,sentence,text,_sentence_text,aspect_cat,NER_results
0,0,GB-117_sample_English_18.txt_4026-4114,Tropic Bird,"In the A.M. saw a Tropic Bird , which , I beli...",FAUNA,"{(None, None)}"
1,1,GB-117_sample_English_18.txt_7251-7283,flying fish,Some flying fish seen this day .,FAUNA,"{(None, None)}"
2,2,GB-117_sample_English_18.txt_7364-7486,Egg Bird,"In the P.M. saw an Egg Bird , and yesterday a ...",FAUNA,"{(None, None)}"
5,5,GB-117_sample_English_18.txt_7915-8097,land bird,Before 5 o'Clock we were obliged to close reef...,FAUNA,"{(None, None)}"
6,6,GB-117_sample_English_18.txt_9838-9965,Port Egmont Hen,"Last night we saw a Port Egmont Hen , and this...",FAUNA,"{(None, None)}"


In [36]:
#First we make a new representation of the dataframe where each extracted entity-label tuple is put on a separate row.
test = test.explode("NER_results")
#remove empty rows where NER_results is an empty list
# Then, we convert the results into two separate columns: entity and label
test['entity'], test['label'] = zip(*test.NER_results)

In [37]:
test

Unnamed: 0.1,Unnamed: 0,sentence,text,_sentence_text,aspect_cat,NER_results,entity,label
0,0,GB-117_sample_English_18.txt_4026-4114,Tropic Bird,"In the A.M. saw a Tropic Bird , which , I beli...",FAUNA,"(None, None)",,
1,1,GB-117_sample_English_18.txt_7251-7283,flying fish,Some flying fish seen this day .,FAUNA,"(None, None)",,
2,2,GB-117_sample_English_18.txt_7364-7486,Egg Bird,"In the P.M. saw an Egg Bird , and yesterday a ...",FAUNA,"(None, None)",,
5,5,GB-117_sample_English_18.txt_7915-8097,land bird,Before 5 o'Clock we were obliged to close reef...,FAUNA,"(None, None)",,
6,6,GB-117_sample_English_18.txt_9838-9965,Port Egmont Hen,"Last night we saw a Port Egmont Hen , and this...",FAUNA,"(None, None)",,


In [38]:
test.drop(columns = ["Unnamed: 0", "sentence", "text", "NER_results"], inplace = True)

### Inspect the results manually

There you go! We extracted entities and labels using a large open-source LLM.

Have a look at the results! They are pretty impressive, given that we never gave the LLM any examples. The results are fully **zero-shot**. However, due to the validation approach, we are aware of the fact that some results **may be missing**.

1.   Check if there are any hallucinated entities/aspects present in the output which were not in the sentence.
2.   Manually check some automatically labelled samples by sampling through your output.
3. Adapt your prompt accordingly.




---

🧠❗
Perhaps we can improve the prompt by applying a **few-shot approach**, where we feed the LLM some examples of text chunks from our corpus and the expected response in our prompt.



In [39]:
test.sample(5)

Unnamed: 0,_sentence_text,aspect_cat,entity,label
5,Before 5 o'Clock we were obliged to close reef...,FAUNA,,
0,"In the A.M. saw a Tropic Bird , which , I beli...",FAUNA,,
1,Some flying fish seen this day .,FAUNA,,
2,"In the P.M. saw an Egg Bird , and yesterday a ...",FAUNA,,
6,"Last night we saw a Port Egmont Hen , and this...",FAUNA,,


### Save results to a DataFrame

If we're satisfied with the results, we can eventually save them to a Dataframe.

In [40]:
path = "Llama_NER_aspect.csv"
test.to_csv(path, index = False)

# Few-shot NER/aspect extraction

In a few-shot approach, we **add some examples of expected output to the prompt.** This is different from fine-tuning a model, as we are not updating the weights of our model (which is what happens during training). For our use-case, we will do this in a couple of steps:

1. We choose some samples from our gold-standard labelled dataset to add to the prompt and save these examples in a list called *examples*.
2. We make a PromptTemplate object with our examples and our template.
3. We feed this PromptTemplate object to LangChain's [FewShotPromptTemplate](https://https://python.langchain.com/v0.1/docs/modules/model_io/prompts/few_shot_examples/) object.
4.  This FewShotPromptTemplate is then added to our prompt when we construct the llm chain.

Our chain, like, in our example above, can then be applied to a DataFrame and saved as a .csv-file.

---
❗ To add examples to our prompt, it needs to be clear for the model which elements are placeholders and which elements are strings. If we want to add a JSON-object to our prompt to show the model which output we expect, we need to put this object between 4 curly brackets. Placeholders need to be put inside of 1 curly bracket pair.

❗ If we want to compare the evaluation of our zero-shot approach to our few-shot approach, we apply both approaches to the same test set - cast the result to IOB-labels and calculate and compare the F1-scores for both approaches. Make sure that the examples you include in your few-shot approach are **not part of the test set**!

❗ Experiment with adding more information to the prompt! We could add more information on the entities we expect the model to extract (annotation guidelines) or give the model more context by adding a texts in the form of a vector database as background information!


In [41]:
examples = [
    {"input": """The hill-city of Perugia supplies an instructive contrast with the
hill-city of Cortona. The obvious contrast in the matter of modern
prosperity and importance is an essential part of the comparative
history. Cortona has through all ages lived on, but not much more
than lived on. Perugia has, through all ages, kept, if not a place
in the first rank of Italian cities, yet at any rate a high place in
the second rank. She never had the European importance of Venice,
Genoa, Florence, Naples, and Milan, or of Pisa in her great days.
But in the purely Italian history of all ages Perugia keeps herself
before our eyes, as a city of mark, from the wars of the growing Roman
commonwealth down to the struggle which in our own days freed her from
a second Roman yoke. In the civil wars of the old Rome, in the wars
between the Goth and the New Rome, in the long tale of the troubled
greatness of mediæval Italy, Etruscan Perusia, Roman Augusta Perusia,
mediæval and modern Perugia, holds no mean place. And the last act
in the long drama is not the least notable. It sounds like a bit out
of Plutarch's "Life of Timoleôn," when we read or when we remember
how, twice within our own days, little more than twenty and thirty
years back, the fortress of the tyrants was swept away, as the great
symbolic act which crowned the winning back of freedom in its newest
form. When a city has such a tale as this to tell, we do not expect, we
do not wish, that its only or its chief interest should gather round
the monuments of an early and almost præhistoric day of greatness. At
Cortona we are glad that things Etruscan are undoubtedly uppermost.
At Perugia we are glad that things Etruscan are there to be seen in
abundance; but we also welcome the monuments of Roman days, pagan
and Christian; we welcome the streets, the churches, and palaces of
mediæval times, and even the works of recent times indeed. The Place of
Victor Emmanuel with the modern buildings which crown it, supplanting
the fortress of Pope Paul, as that supplanted the houses, churches, and
palaces of earlier times, is as much a part of the history of Perugia
as the Arch of Augustus or the Etruscan wall itself.""",
     "answer": """{{{{'person': ['Plutarch', 'Victor Emmanuel', 'Pope Paul', 'Augustus'],
     'location': ['Perugia', 'Cortona', 'Venice', 'Genoa', 'Florence', 'Naples', 'Milan', 'Pisa', 'Rome', 'Italy', 'Etruscan Perusia', 'Roman Augusta Perusia'],}}}}""",
     "personality": personality,
     "schema_entity": """{{{{'person': ['string'],
 'location': ['string'],
 'fauna': ['string'],
 'flora': ['string'],
 'organisation': ['string']}}}}""",
     "question": question,},

    {"input": """If the journey be made on a market or fair day, the space between the
walls and the station at Arezzo may be seen crowded with white oxen,
suggesting the thought of triumphs and triumphal sacrifices. Their
race, it was said, prayed to the gods that Marcus and Julian might not
win victories which would lead to their destruction. And the prayer
seems to have been answered, as the breed specially connected with
Clitumnus has clearly not died out, even by the banks of Clanis. The
journey is not a long one; yet, if we had time to see everything, we
might well wish to break it, as we pass by the hill of Castiglione
Fiorentino, with its walls and towers. That strong and stern
hill-fortress comes in well between Arezzo and Cortona. Arezzo covers
a hill, but it can hardly be said to stand on a hill-top; Castiglione
distinctly does stand on a hill-top; Cortona sits enthroned on a height
which it would hardly be straining language to speak of as a mountain.
We have now come to a site of the oldest class, the stronghold on the
height, like Akrokorinthos and the Larissa of Argos. But at Argos and
Corinth the mountain-fortress became, at a later stage, the citadel of
the younger city which grew up at the mountain's foot. But at Cortona,
as at greater Perugia, the city still abides on the height; it has
never come down into the plain. So it has remained at Laon; so it has
become at Girgenti, where the vast lower space of the later Akragas
is forsaken, and the modern town has shrunk up within the lines of
the ancient acropolis. From the ground below Cortona we look up to a
city like those of old, great and fenced up to heaven; the "diadem
of towers" is there still, though it is now made up of a group of
towers, ecclesiastical, municipal, and military, none of them of any
account in itself, but each of which joins with its fellows to make
up an effective whole. At Cortona indeed, as at Argos and Corinth,
there is an upper and a lower city, and the upper city is pretty well
forsaken. But while at Argos and Corinth the lower city stands in the
plain, and the acropolis soars far above it, at Cortona the lower city
itself stands so high up the hill that it is only when we reach it
that we fully understand that there is a higher city still. The site
itself belongs so thoroughly to the oldest days of our European world
that there is a certain kind of satisfaction in finding that the main
interest of the place belongs to those oldest days. We are well pleased
that everything of later times is of quite a secondary character, and
that the distinctive character of Cortona is to be the city of the
Etruscan walls.""",
     "answer": """{{{{'person': ['Marcus', 'Julian', 'Clitumnus'],
     'location': ['Arezzo', 'Castiglione Fiorentino', 'Cortona', 'Clanis', 'Akrokorinthos', 'Argos', 'Corinth', 'Laon', 'Girgenti', 'Perugia', 'Akragas'],
 'fauna': ['white oxen'],}}}}""",
 "personality": personality,
     "schema_entity": """{{{{'person': ['string'],
 'location': ['string'],
 'fauna': ['string'],
 'flora': ['string'],
 'organisation': ['string']}}}}""",
     "question": question,},
]

In [42]:
template = """ {personality}.
Your task is to identify the named entities in a sentence.
Structure the answer according to {schema_entity}. Do not deviate from this schema.
The sentence is indicated by <<<>>>.

Question: {question}
Sentence: <<<{input}>>>

Answer: {answer} """

In [43]:
example_prompt = PromptTemplate(
    input_variables=["input", "answer", "personality", "question", "schema_entity"], template = template
)

example_prompt.format(**examples[0])

' You are a named entity recognizer trained to recognize entities in travelogues..\nYour task is to identify the named entities in a sentence.\nStructure the answer according to {{{{\'person\': [\'string\'],\n \'location\': [\'string\'],\n \'fauna\': [\'string\'],\n \'flora\': [\'string\'],\n \'organisation\': [\'string\']}}}}. Do not deviate from this schema.\nThe sentence is indicated by <<<>>>.\n\nQuestion: Extract the relevant entities from the given sentence.\nSentence: <<<The hill-city of Perugia supplies an instructive contrast with the\nhill-city of Cortona. The obvious contrast in the matter of modern\nprosperity and importance is an essential part of the comparative\nhistory. Cortona has through all ages lived on, but not much more\nthan lived on. Perugia has, through all ages, kept, if not a place\nin the first rank of Italian cities, yet at any rate a high place in\nthe second rank. She never had the European importance of Venice,\nGenoa, Florence, Naples, and Milan, or of 

In [44]:
#don't forget to put <<<>>> around the input to indicate that this is the sentence you want to focus on!

input = """<<<In the general view of Arezzo there can be hardly said to be any one
dominant object. If the castle made any show, it and the cathedral
church, standing nearly on the same level on the highest ground in the
town, would stand well side by side. As it is, the body of the duomo
is the prominent feature in the view. But it is hardly a dominant
feature. It is the only building whose body shows itself, but it rises
among a crowd of towers, ecclesiastical and municipal, and one of
them, the great campanile of St. Mary della Pieve , though the body
of its church does not show itself far below, is a distinct rival to
the cathedral, and utterly dwarfs its small and modern, though not
ungraceful, octagon tower.>>>"""

In [45]:
prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    suffix="{input}",
    input_variables=["input"],
)

In [46]:
def fewshot_llm_response(response, basemodel_class = NER):
    # try:
      main_response = response.content.replace("{{", "{")
      main_response = main_response.replace("}}", "}")
      json_response = main_response.split("{")[1]  # Ensure we close the JSON object properly
      json_response = "{" + json_response.split("}")[0] + "}"  # Ensure we close the JSON object properly
      print(json_response)

      ast_response = ast.literal_eval(json_response)
      result = NER.model_validate(ast_response)

      category_entity = []
      for entity in result:
        if entity[1] != None: #if the model returned a valid result for the categories which is not None
          category = entity[0]
          entity_text_list = entity[1]

          for ent in entity_text_list:
            category_entity.append((ent, category))

      return category_entity

    # except:
    #   print(response.content)
    #   return {(None, None)}

In [47]:
def few_shot_llm_output(sentence):
    llm = HuggingFaceEndpoint(
        repo_id=repo_id,
        huggingfacehub_api_token=os.getenv("HUGGINGFACEHUB_API_TOKEN"),
    )

    chat_model = ChatHuggingFace(llm=llm)

    chain = prompt | chat_model

    response = chain.invoke({
        "input": input,
        "schema_entity": schema_entity,
        "personality": personality,
        "sentence": sentence,
        "categories": categories,
    })
    # print(response.content)
    return fewshot_llm_response(response)

In [48]:
response = few_shot_llm_output(input)


{'person': [],
 'location': ['Arezzo', 'Akrokorinthos', 'Argos', 'Corinth', 'Laon', 'Girgenti', 'Perugia', 'Akragas'],
 'fauna': [],
 'organisation': []}


In [49]:
print(response)

[('Arezzo', 'location'), ('Akrokorinthos', 'location'), ('Argos', 'location'), ('Corinth', 'location'), ('Laon', 'location'), ('Girgenti', 'location'), ('Perugia', 'location'), ('Akragas', 'location')]


# Quantitative evaluation (custom case)


We're interested to build a NER/aspect recognizer which extracts both scientific and common names of fauna and flora.
We made an annotated test set of fauna and flora instances in our corpus. We will take the following steps:

1. Apply the LLM to all unique annotated sentences in our test set and save the results to a .csv-file.
2. Cast the results to IOB-labels using our [IOB-notebook](https://colab.research.google.com/drive/1IqA9gi6ExtFQspg4dmzTvmXCupP7KY35?usp=sharing).
3. Compare the lists of IOB-labels for both the gold standard data and the LLM output, and evaluate with [Nervaluate](https://pypi.org/project/nervaluate/).

In [50]:
#We have a couple of duplicates in our dataset. Let's get rid of them first!
NL_fauna_flora.drop_duplicates(inplace = True)

In [51]:
# We have 541 unique fauna and flora annotations for Dutch texts to evaluate our approach!
len(NL_fauna_flora)

541

In [52]:
NL_fauna_flora.head()

Unnamed: 0,sentence,text,_sentence_text,aspect_cat
0,DBNL-151_sample_IAA_19.txt_1626-1841,kaaiman,Op onze vaart daarheen hadden wij nog het voor...,FAUNA
1,DBNL-151_sample_IAA_19.txt_365-705,suikerriet,Van tijd tot tijd valt een spreker er tusschen...,FLORA
2,DBNL-151_sample_IAA_19.txt_365-705,zoethout,Van tijd tot tijd valt een spreker er tusschen...,FLORA
7,BHL_957_sample_Dutch_19.0.txt_306-596,bamboes,"Het werktuig , waarmede de bewoners van deze k...",FLORA
8,BHL_957_sample_Dutch_19.0.txt_609-741,bamboes,"' ) Door dit bamboes sterk te slingeren , word...",FLORA


In [53]:
training_df = NL_fauna_flora.groupby(["sentence"]).agg(list)

In [54]:
training_df["_sentence_text"] = training_df["_sentence_text"].apply(lambda x: x[0])

In [55]:
training_df

Unnamed: 0_level_0,text,_sentence_text,aspect_cat
sentence,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
BHL_61_sample_Dutch_19.0.txt_18415-18641,[dieren],"Vetb werkt overal met zijn instrumenten , Snel...",[FAUNA]
BHL_794_sample_Dutch_18.0.txt_1004-1033,[mout],Men maakt 'er geen mout van .,[FLORA]
BHL_794_sample_Dutch_18.0.txt_1034-1105,"[haver, voeder, paar- den]","De haver wordt fterk gezaid , dog alleen tot v...","[FLORA, FLORA, FAUNA]"
BHL_794_sample_Dutch_18.0.txt_10949-11061,[fparreboom],"'T is waar , men brouwt hier uit een ^y foort ...",[FLORA]
BHL_794_sample_Dutch_18.0.txt_1106-1417,"[bladeren, geboomte, voeder, vee, bomen, blad,...",Men wirt hier de bladeren van 't geboomte niet...,"[FLORA, FLORA, FLORA, FAUNA, FLORA, FLORA, FLO..."
...,...,...,...
DBNL-10_sample20 (1).txt_9491-9642,[hout],"De Corso is een gedeelte van de boulevard , zo...",[FLORA]
DBNL-10_sample20 (1).txt_9729-9791,[paard],In 't midden is een wacht te paard om voor de ...,[FAUNA]
DBNL-151_sample_IAA_19.txt_1626-1841,[kaaiman],Op onze vaart daarheen hadden wij nog het voor...,[FAUNA]
DBNL-151_sample_IAA_19.txt_365-705,"[suikerriet, zoethout]",Van tijd tot tijd valt een spreker er tusschen...,"[FLORA, FLORA]"


In [56]:
test_set = training_df.sample(5)

In [57]:
test_set

Unnamed: 0_level_0,text,_sentence_text,aspect_cat
sentence,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
BHL_794_sample_Dutch_18.0.txt_940-1003,"[garft, vee]","Men zait maar wei- nig garft , en dat nog alle...","[FLORA, FAUNA]"
BHL_957_sample_Dutch_19.0.txt_17482-17504,[Macrolobium excelsum],Macrolobium excelsum .,[FLORA]
BHL_794_sample_Dutch_18.0.txt_12348-12378,[koe],Een koe wei-d gekoft voor 50 .,[FAUNA]
BHL_957_sample_Dutch_19.0.txt_14236-14267,[Paradisea apoda],Paradisea apoda Lin . siangu d.,[FAUNA]
BHL_794_sample_Dutch_18.0.txt_16745-16798,[linden],Dat van de Amerikaanfche linden was zeer befch...,[FLORA]


## Build prompt and apply to test set

In [58]:
class fauna_flora(BaseModel):
    """
    This class asserts the data types we expect from the output of the LLM.
      fauna:  Optionally a list, otherwise None
      flora: Optionally a list, otherwise None.
    """
    FAUNA: Optional[list] = None
    FLORA: Optional[list] = None

In [59]:
# add a JSON object with the category names followed by the expected data type

schema_entity = {
        "FAUNA": ["string"],
        "FLORA": ["string"],
        }

In [60]:
# add the category names with small global introduction/definition as a string

categories =  """FAUNA: common and scientific names of animals and fauna,
FLORA: common and scientific names of vegetation, plants, flowers and flora"""

In [61]:
# specify the question/request posed to the LLM

question = "Extract the relevant named entities from the given sentence."


# specify the personality you expect from the LLM

personality = "You are a named entity recognizer trained to recognize and extract instances of fauna and flora in travelogues."

In [62]:
# This brings all the elements above together in a template.
# The sentence is clearly indicated by <<<>>>, which helps the model to stick to the text given.

template = """ {personality}.
Your task is to identify the named entities in a sentence.
Structure the answer according to {schema_entity}. Do not deviate from this JSON schema.
The sentence is indicated by <<<>>>.

Question: {question}
Sentence: <<<{sentence}>>>

Answer: """

In [63]:
# define the prompt

prompt = PromptTemplate.from_template(template)

In [131]:
def parse_llm_response(response, basemodel_class = NER):
  # try:
    main_response = response.replace("{{", "{")
    main_response = main_response.replace("}}", "}")
    json_response = main_response.split("{")[1]  # Ensure we close the JSON object properly
    json_response = "{" + json_response.split("}")[0] + "}"  # Ensure we close the JSON object properly
    ast_response = ast.literal_eval(json_response)
    print(ast_response)
    result = fauna_flora.model_validate(ast_response )

    category_entity = []
    for entity in result:
      if entity[1] != None: #if the model returned a valid result for the categories which is not None
        category = entity[0]
        entity_text_list = entity[1]

        for ent in entity_text_list:
          category_entity.append((ent, category))
    print(category_entity)
    return category_entity
  # except:
  #   print(response)
  #   return {(None, None)}

In [132]:
### call to the LLM and parse the response

def llm_output_fauna_flora(sentence):
    llm = HuggingFaceEndpoint(
        repo_id=repo_id,
        huggingfacehub_api_token=os.getenv("HUGGINGFACEHUB_API_TOKEN"),
    )

    chat_model = ChatHuggingFace(llm=llm)

    chain = prompt | chat_model

    response = chain.invoke({
        "question": question,
        "schema_entity": schema_entity,
        "personality": personality,
        "sentence": sentence,
        "categories": categories,
    })
    return parse_llm_response(response.content, basemodel_class = fauna_flora) #add the basemodel class we created as an argument to tell the LLM how to validate the output

In [134]:
test_set["NER_results"] = test_set._sentence_text.apply(llm_output_fauna_flora)

{'FAUNA': ['vee'], 'FLORA': []}
[('vee', 'FAUNA')]
{'FAUNA': [], 'FLORA': ['Macrolobium excelsum']}
[('Macrolobium excelsum', 'FLORA')]
{'FAUNA': [], 'FLORA': []}
[]
{'FAUNA': ['Paradisea apoda'], 'FLORA': ['siangu d']}
[('Paradisea apoda', 'FAUNA'), ('siangu d', 'FLORA')]
{'FAUNA': [], 'FLORA': ['linden']}
[('linden', 'FLORA')]


In [135]:
test_set.head()

Unnamed: 0_level_0,text,_sentence_text,aspect_cat,NER_results,text_llm,aspect_cat_llm
sentence,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
BHL_794_sample_Dutch_18.0.txt_940-1003,"[garft, vee]","Men zait maar wei- nig garft , en dat nog alle...","[FLORA, FAUNA]","[(vee, FAUNA)]",[None],[None]
BHL_957_sample_Dutch_19.0.txt_17482-17504,[Macrolobium excelsum],Macrolobium excelsum .,[FLORA],"[(Macrolobium excelsum, FLORA)]",[None],[None]
BHL_794_sample_Dutch_18.0.txt_12348-12378,[koe],Een koe wei-d gekoft voor 50 .,[FAUNA],[],[None],[None]
BHL_957_sample_Dutch_19.0.txt_14236-14267,[Paradisea apoda],Paradisea apoda Lin . siangu d.,[FAUNA],"[(Paradisea apoda, FAUNA), (siangu d, FLORA)]",[None],[None]
BHL_794_sample_Dutch_18.0.txt_16745-16798,[linden],Dat van de Amerikaanfche linden was zeer befch...,[FLORA],"[(linden, FLORA)]",[None],[None]


In [136]:
def results_to_cols(results):
  ent_texts = []
  ent_labels = []

  for tup in results:
    text = tup[0]
    label = tup[1]

    ent_texts.append(text)
    ent_labels.append(label)

  return ent_texts, ent_labels

In [137]:
test_set["NER_results"] = test_set.NER_results.apply(results_to_cols)

In [138]:
test_set.head()

Unnamed: 0_level_0,text,_sentence_text,aspect_cat,NER_results,text_llm,aspect_cat_llm
sentence,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
BHL_794_sample_Dutch_18.0.txt_940-1003,"[garft, vee]","Men zait maar wei- nig garft , en dat nog alle...","[FLORA, FAUNA]","([vee], [FAUNA])",[None],[None]
BHL_957_sample_Dutch_19.0.txt_17482-17504,[Macrolobium excelsum],Macrolobium excelsum .,[FLORA],"([Macrolobium excelsum], [FLORA])",[None],[None]
BHL_794_sample_Dutch_18.0.txt_12348-12378,[koe],Een koe wei-d gekoft voor 50 .,[FAUNA],"([], [])",[None],[None]
BHL_957_sample_Dutch_19.0.txt_14236-14267,[Paradisea apoda],Paradisea apoda Lin . siangu d.,[FAUNA],"([Paradisea apoda, siangu d], [FAUNA, FLORA])",[None],[None]
BHL_794_sample_Dutch_18.0.txt_16745-16798,[linden],Dat van de Amerikaanfche linden was zeer befch...,[FLORA],"([linden], [FLORA])",[None],[None]


In [139]:
# parse the results to two separate columns in lists "text_llm" ["squalus", "appelboom"] and "aspect_cat_llm" [FAUNA, FLORA]

test_set['text_llm'], test_set['aspect_cat_llm'] = zip(*test_set.NER_results)

In [140]:
test_set.head()

Unnamed: 0_level_0,text,_sentence_text,aspect_cat,NER_results,text_llm,aspect_cat_llm
sentence,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
BHL_794_sample_Dutch_18.0.txt_940-1003,"[garft, vee]","Men zait maar wei- nig garft , en dat nog alle...","[FLORA, FAUNA]","([vee], [FAUNA])",[vee],[FAUNA]
BHL_957_sample_Dutch_19.0.txt_17482-17504,[Macrolobium excelsum],Macrolobium excelsum .,[FLORA],"([Macrolobium excelsum], [FLORA])",[Macrolobium excelsum],[FLORA]
BHL_794_sample_Dutch_18.0.txt_12348-12378,[koe],Een koe wei-d gekoft voor 50 .,[FAUNA],"([], [])",[],[]
BHL_957_sample_Dutch_19.0.txt_14236-14267,[Paradisea apoda],Paradisea apoda Lin . siangu d.,[FAUNA],"([Paradisea apoda, siangu d], [FAUNA, FLORA])","[Paradisea apoda, siangu d]","[FAUNA, FLORA]"
BHL_794_sample_Dutch_18.0.txt_16745-16798,[linden],Dat van de Amerikaanfche linden was zeer befch...,[FLORA],"([linden], [FLORA])",[linden],[FLORA]


In [141]:
# export test_set and transform to iob-labels

test_set.to_csv("test_set_llm.csv")

## Calculate evaluation metrics with Nervaluate

Evaluate how good the LLM is at recognizing entities/aspects through a span evaluation. We compare the gold standard data with examples extracted by the LLM.

1.   Gold labels and output LLM to BIO-format to allow for span evaluation
2.   Calculate metrics using nervaluate (F1).




In [150]:
test_set = pd.read_csv("test_set_results.csv")

In [151]:
test_set.head()

Unnamed: 0,text,_sentence_text,aspect_cat,NER_results,text_llm,aspect_cat_llm,output_gs,output_llm,sentence_split,iob_gs,iob_llm
0,['bloesems'],In de eerste plaats noem ik de bloesems .,['FLORA'],"(['bloesems'], ['FLORA'])",['bloesems'],['FLORA'],"(['In', 'de', 'eerste', 'plaats', 'noem', 'ik'...","(['In', 'de', 'eerste', 'plaats', 'noem', 'ik'...","['In', 'de', 'eerste', 'plaats', 'noem', 'ik',...","['O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-FLORA',...","['O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-FLORA',..."
1,"['Centroplites', 'Padden', 'Schildpadden']",Centroplites . y Kik- ( 177 ) Kikvorscheiij Pa...,"['FAUNA', 'FAUNA', 'FAUNA']","(['Centroplites', 'Kikvorscheiij Padden', 'Sch...","['Centroplites', 'Kikvorscheiij Padden', 'Schi...","['FAUNA', 'FAUNA', 'FAUNA']","(['Centroplites', '.', 'y', 'Kik', '177', 'Kik...","(['Centroplites', '.', 'y', 'Kik', '177', 'Kik...","['Centroplites', '.', 'y', 'Kik', '177', 'Kikv...","['B-FAUNA', 'O', 'O', 'O', 'O', 'O', 'B-FAUNA'...","['B-FAUNA', 'O', 'O', 'O', 'O', 'B-FAUNA', 'I-..."
2,"['tuin', 'veldvrugten']","Het antwoord was , dat zy die ver naar 't Noor...","['FLORA', 'FLORA']","([], [])",[],[],"(['Het', 'antwoord', 'was', ',', 'dat', 'zy', ...","(['Het', 'antwoord', 'was', ',', 'dat', 'zy', ...","['Het', 'antwoord', 'was', ',', 'dat', 'zy', '...","['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', ...","['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', ..."
3,['M. communis'],"Suppl . ) , en volgende schrijvers , geven een...",['FLORA'],"(['M. communis'], ['FLORA'])",['M. communis'],['FLORA'],"(['Suppl', '.', ',', 'en', 'volgende', 'schrij...","(['Suppl', '.', ',', 'en', 'volgende', 'schrij...","['Suppl', '.', ',', 'en', 'volgende', 'schrijv...","['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', ...","['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', ..."
4,"['Cereus', 'Melocactus']","Eene bizonderheid , die ook op dat eiland bij ...","['FLORA', 'FLORA']","([], [])",[],[],"(['Eene', 'bizonderheid', ',', 'die', 'ook', '...","(['Eene', 'bizonderheid', ',', 'die', 'ook', '...","['Eene', 'bizonderheid', ',', 'die', 'ook', 'o...","['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', ...","['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', ..."


In [152]:
true = test_set["iob_gs"].to_list() #get the gold labels
true = [ast.literal_eval(x) for x in true] #make sure every element in your list is also of type 'list'

predicted = test_set["iob_llm"].to_list()
predicted = [ast.literal_eval(x) for x in predicted]

In [153]:
evaluator = Evaluator(true, predicted, tags=['FAUNA', 'FLORA'], loader="list")
results = evaluator.evaluate()

  results = evaluator.evaluate()


In [154]:
results

({'ent_type': {'correct': 42,
   'incorrect': 5,
   'partial': 0,
   'missed': 42,
   'spurious': 7,
   'possible': 89,
   'actual': 54,
   'precision': 0.7777777777777778,
   'recall': 0.47191011235955055,
   'f1': 0.5874125874125875},
  'partial': {'correct': 35,
   'incorrect': 0,
   'partial': 12,
   'missed': 42,
   'spurious': 7,
   'possible': 89,
   'actual': 54,
   'precision': 0.7592592592592593,
   'recall': 0.4606741573033708,
   'f1': 0.5734265734265734},
  'strict': {'correct': 32,
   'incorrect': 15,
   'partial': 0,
   'missed': 42,
   'spurious': 7,
   'possible': 89,
   'actual': 54,
   'precision': 0.5925925925925926,
   'recall': 0.3595505617977528,
   'f1': 0.44755244755244755},
  'exact': {'correct': 35,
   'incorrect': 12,
   'partial': 0,
   'missed': 42,
   'spurious': 7,
   'possible': 89,
   'actual': 54,
   'precision': 0.6481481481481481,
   'recall': 0.39325842696629215,
   'f1': 0.48951048951048953}},
 {'FAUNA': {'ent_type': {'correct': 13,
    'incorrect

# More information 🎓

## Large Language Models for literary-historical research
On issues of **privacy, hallucination reproducibility and bias**.

*  Jon Chun and Katherine Elkins. 2023. eXplainable
AI with GPT4 for story analysis and generation: A novel framework for diachronic sentiment analysis. International Journal of Digital Humanities, 5(2):507–532.
*  Eva A. M. van Dis, Johan Bollen, Willem Zuidema,
Robert van Rooij, and Claudi L. Bockting. 2023.
ChatGPT: five priorities for research. Nature, 614(7947):224–226.
*  Emily M. Bender, Timnit Gebru, Angelina McMillan-
Major, and Shmargaret Shmitchell. 2021. On
the Dangers of Stochastic Parrots: Can Language Models Be Too Big? . In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, pages 610–623, New York, NY, USA. Association for Computing Machinery.

## Other interesting tools

LLMs and tools are currently spreading like wildfire. We chose the Mixtral 8x7b-model for its good performance on English, Dutch, German and French texts, but there's a lot of other open-source models and tools out there.

*   [GoLLIE](https://hitz-zentroa.github.io/GoLLIE/) provides a framework for guideline-following NER!


# Food for thought

*  Was your corpus heavily impacted by OCR-errors? Have a look at LLMs for text preprocessing!
*  Check out the possibilities of Retrieval Augmented Generation (RAG) for information extraction.