## Use human annotations of the "Polygon" Wikipedia article to assess LLM-powered knowledge graph extraction

We extract the "is-a" relationship between geometric entities and represent the graph as a list of pairs (X, Y) such that X is a kind of Y.

Two main metrics:
- "recall": the fraction of ground-truth pairs that the LLM retrieved correctly;
- "trash rate": the fraction of LLM answers that do not match any ground-truth pair (i.e. 1 - precision).
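
As a toy illustration of how the two metrics behave, here is a minimal sketch on made-up pairs (these sets are hypothetical and are not taken from the annotations):

```python
# Hypothetical ground truth and LLM output, already lowercased and deduplicated.
ground_truth = {
    ("triangle", "polygon"),
    ("square", "polygon"),
    ("vertex", "point"),
    ("edge", "segment"),
}
llm_answers = {
    ("triangle", "polygon"),
    ("vertex", "point"),
    ("polygon", "edge"),  # a reversed/incorrect pair
}

# Only exact (X, Y) matches count.
matches = ground_truth & llm_answers

recall = len(matches) / len(ground_truth)                   # 2 of 4 retrieved
trash_rate = len(llm_answers - matches) / len(llm_answers)  # 1 of 3 is wrong

print(recall, trash_rate)
```

Because matching is exact, any difference in spelling or word form between an LLM pair and the annotation turns a correct relationship into a miss and a "trash" answer at the same time.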

In [1]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.messages import SystemMessage, HumanMessage

import pandas as pd
from io import StringIO

import os
import json
import pprint
from collections import namedtuple

ChatGPTAnswer = namedtuple("ChatGPTAnswer", ["thoughts", "csv_output"])

In [2]:
SYS_PROMPT = (
    "You are a professional network graph maker who extracts "
    "ontology from a given text on the topic of Euclidean geometry. "
    "You are provided with a context chunk (delimited by +++++). "
    "You MUST extract key geometric entities from the context that are "
    "connected by an 'is-a', 'is a kind of' or 'is a type of' relationship.\n"
    "Thought 1: While traversing through each sentence, think about the key geometric entities mentioned in it.\n"
    "   Entities are typically geometrical objects such as points, lines, squares, circles, elliptic curves, or their constituents.\n"
    "   Also think of the terms or concepts that might be implicit in the text.\n"
    "   Terms must be as atomistic as possible, usually in singular rather than plural.\n"
    "Thought 2: Think about the 'is-a' hierarchy between the entities you extracted.\n"
    "   Make sure you are not confusing 'is-a' with 'has-a'. For instance, a center is not a circle, but rather a circle has a center.\n"
    "   Which of the entities fit nicely into the '<X> is a kind of <Y>' template? Are there hyponym-hypernym pairs among them?\n"
    "   X might be a subtype or subvariety of Y, like inheritance in object-oriented programming languages.\n\n"
)

CONTEXT_USER_PROMPT = "Context:\n\n+++++\n{context}\n+++++\n\nYour thoughts:\n"
FORMATTING_USER_PROMPT = (
    "Now format your final response to make it suitable for parsing in the simple CSV format. "
    "For each pair of entities such that X is a kind of Y, put them on a single line, separated by a comma:\n"
    "X,Y\n"
    "You will be PENALIZED for every deviation from the CSV format.\n"
    "You will be PENALIZED for false or lousy relationships, so ENSURE that\n"
    "    - every X and Y are indeed a legitimate geometrical entity\n"
    "    - every X is indeed a kind of Y, meaning that X derives all properties of Y.\n"
    "Your output in the CSV format:\n"
)

def get_x_is_a_kind_of_y_from_chunk(context_chunk: str, openai_model: str = "gpt-3.5-turbo") -> ChatGPTAnswer:
    # Two-step call: first ask for free-form reasoning, then ask for the pairs in CSV format.
    chat = ChatOpenAI(model=openai_model)
    messages = [
        SystemMessage(content=SYS_PROMPT),
        HumanMessage(content=CONTEXT_USER_PROMPT.format(context=context_chunk)),
    ]
    thoughts = chat.invoke(messages)
    csv_output = chat.invoke(messages + [thoughts, HumanMessage(content=FORMATTING_USER_PROMPT)])
    return ChatGPTAnswer(thoughts.content, csv_output.content)

In [3]:
XIsAYPairs = list[tuple[str, str]]

def parse_chatgpt_csv_answer(csv_answer: str) -> XIsAYPairs:
    df = pd.read_csv(StringIO(csv_answer), names=["x_is_a_kind", "of_y"])
    return [tuple(row) for row in df.itertuples(index=False, name=None)]

def calculate_metrics(ground_truth: XIsAYPairs, llm_answers: XIsAYPairs) -> dict:
    # Normalize case and surrounding whitespace so that matching is less brittle.
    ground_truth = [(x.lower().strip(), y.lower().strip()) for x, y in ground_truth]
    llm_answers = [(x.lower().strip(), y.lower().strip()) for x, y in llm_answers]
    
    # Validation step (skipped)
    # Duplicates removed when set is constructed.
    ground_truth = set(ground_truth)
    llm_answers = set(llm_answers)
    
    # Exact matches are the pairs present in both sets.
    exact_matches = ground_truth & llm_answers
    
    wrong_answers = llm_answers.difference(exact_matches)
    unpaired_ground_truth = ground_truth.difference(exact_matches)
    
    if len(ground_truth) == 0:
        recall = None
    else:
        recall = len(exact_matches) / len(ground_truth)
    
    if len(llm_answers) == 0:
        trash_rate = None
    else:
        trash_rate = len(wrong_answers) / len(llm_answers)
    
    metrics = {
        "exact_matches" : exact_matches,
        "wrong_answers" : wrong_answers,
        "unpaired_ground_truth" : unpaired_ground_truth,
        "recall" : recall,
        "trash_rate" : trash_rate,
    }
    return metrics

In [4]:
with open("annotations.json") as file:
    annotated_chunks = json.load(file)["annotated_chunks"]

In [5]:
first = annotated_chunks[0]
chunk, xisypairs = first["chunk"], first["x_is_a_y"]
xisypairs

[['polygon', 'plane figure'],
 ['closed polygonal chain', 'polygon'],
 ['vertex', 'point'],
 ['edge', 'segment'],
 ['side', 'segment'],
 ['n-gon', 'polygon'],
 ['triangle', 'n-gon'],
 ['triangle', 'polygon'],
 ['3-gon', 'n-gon'],
 ['simple polygon', 'polygon'],
 ['solid polygon', 'region'],
 ['solid polygon', 'polygon'],
 ['skew polygon', 'polygon'],
 ['self-intersecting polygon', 'polygon'],
 ['star polygon', 'polygon'],
 ['polygon', 'polytope']]

In [6]:
res = get_x_is_a_kind_of_y_from_chunk(chunk)
res

ChatGPTAnswer(thoughts="Key geometric entities extracted:\n\n1. Polygon\n2. Plane figure\n3. Line segment\n4. Closed polygonal chain\n5. Edge\n6. Side\n7. Vertex\n8. Corner\n9. n-gon\n10. Triangle\n11. Simple polygon\n12. Intersection\n13. Shared endpoint\n14. Solid polygon\n15. Body\n16. Polygonal region\n17. Polygonal area\n18. Skew polygon\n19. Star polygon\n20. Euclidean space\n21. Polytope\n22. Dimension\n23. Generalization\n\nHierarchical relationships:\n- A polygon is a plane figure.\n- A polygon is made up of line segments connected to form a closed polygonal chain.\n- A closed polygonal chain consists of edges or sides.\n- The points where two edges meet are the polygon's vertices or corners.\n- An n-gon is a type of polygon with n sides (e.g., a triangle is a 3-gon).\n- A simple polygon is a type of polygon which does not intersect itself.\n- A solid polygon is a type of polygonal region with a boundary defined by a simple polygon.\n- A skew polygon is a type of closed polygo

In [7]:
print(res.csv_output)

Polygon,Plane figure
Polygon,Line segment
Polygon,Closed polygonal chain
Closed polygonal chain,Edge
Closed polygonal chain,Side
Edge,Vertex
Edge,Corner
Polygon,n-gon
n-gon,Triangle
Polygon,Simple polygon
Simple polygon,Intersection
Simple polygon,Shared endpoint
Simple polygon,Solid polygon
Solid polygon,Body
Solid polygon,Polygonal region
Solid polygon,Polygonal area
Closed polygonal chain,Skew polygon
Skew polygon,Polygon
Polygon,Euclidean space
Polygon,Polytope
Polytope,Dimension
Polygon,Generalization


In [8]:
llm_answers = parse_chatgpt_csv_answer(res.csv_output)
llm_answers

[('Polygon', 'Plane figure'),
 ('Polygon', 'Line segment'),
 ('Polygon', 'Closed polygonal chain'),
 ('Closed polygonal chain', 'Edge'),
 ('Closed polygonal chain', 'Side'),
 ('Edge', 'Vertex'),
 ('Edge', 'Corner'),
 ('Polygon', 'n-gon'),
 ('n-gon', 'Triangle'),
 ('Polygon', 'Simple polygon'),
 ('Simple polygon', 'Intersection'),
 ('Simple polygon', 'Shared endpoint'),
 ('Simple polygon', 'Solid polygon'),
 ('Solid polygon', 'Body'),
 ('Solid polygon', 'Polygonal region'),
 ('Solid polygon', 'Polygonal area'),
 ('Closed polygonal chain', 'Skew polygon'),
 ('Skew polygon', 'Polygon'),
 ('Polygon', 'Euclidean space'),
 ('Polygon', 'Polytope'),
 ('Polytope', 'Dimension'),
 ('Polygon', 'Generalization')]

In [9]:
calculate_metrics(xisypairs, llm_answers)

{'exact_matches': {('polygon', 'plane figure'),
  ('polygon', 'polytope'),
  ('skew polygon', 'polygon')},
 'wrong_answers': {('closed polygonal chain', 'edge'),
  ('closed polygonal chain', 'side'),
  ('closed polygonal chain', 'skew polygon'),
  ('edge', 'corner'),
  ('edge', 'vertex'),
  ('n-gon', 'triangle'),
  ('polygon', 'closed polygonal chain'),
  ('polygon', 'euclidean space'),
  ('polygon', 'generalization'),
  ('polygon', 'line segment'),
  ('polygon', 'n-gon'),
  ('polygon', 'simple polygon'),
  ('polytope', 'dimension'),
  ('simple polygon', 'intersection'),
  ('simple polygon', 'shared endpoint'),
  ('simple polygon', 'solid polygon'),
  ('solid polygon', 'body'),
  ('solid polygon', 'polygonal area'),
  ('solid polygon', 'polygonal region')},
 'unpaired_ground_truth': {('3-gon', 'n-gon'),
  ('closed polygonal chain', 'polygon'),
  ('edge', 'segment'),
  ('n-gon', 'polygon'),
  ('self-intersecting polygon', 'polygon'),
  ('side', 'segment'),
  ('simple polygon', 'polygon'

In [10]:
chatgpt_answers = []
for anck in annotated_chunks:
    answer = get_x_is_a_kind_of_y_from_chunk(anck["chunk"])
    chatgpt_answers.append(answer)

In [11]:
import pickle

with open("20240227_chatgpt_answers.pickle", "wb") as file:
    pickle.dump(chatgpt_answers, file)

In [12]:
import pickle

with open("20240227_chatgpt_answers.pickle", "rb") as file:
    chatgpt_answers = pickle.load(file)

In [13]:
all_metrics = []
for i, answer in enumerate(chatgpt_answers):
    try:
        llm_answers = parse_chatgpt_csv_answer(answer.csv_output)
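    except Exception as e:
        # Parsing failed: show the error and the raw answer, then score the chunk as having no answers.
        print(f"Chunk {i}: could not parse the CSV answer ({e})")
        print(answer.csv_output)
        llm_answers = []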
        
    ground_truth = annotated_chunks[i]["x_is_a_y"]
    m = calculate_metrics(ground_truth, llm_answers)
    all_metrics.append(m)

In [14]:
[m["recall"] for m in all_metrics]

[0.375,
 None,
 0.8571428571428571,
 0.14285714285714285,
 0.3333333333333333,
 0.0,
 0.25,
 0.6,
 0.8333333333333334,
 0.6428571428571429,
 0.875,
 0.0,
 0.0,
 0.2]

In [15]:
[m["trash_rate"] for m in all_metrics]

[0.6,
 1.0,
 0.07692307692307693,
 0.875,
 0.8,
 1.0,
 0.8181818181818182,
 0.6666666666666666,
 0.2857142857142857,
 0.25,
 0.46153846153846156,
 1.0,
 1.0,
 0.8]

In [16]:
for i, m in enumerate(all_metrics):
    print(f"{i} chunk")
    pprint.pprint(m)
    print("\n========================\n")

0 chunk
{'exact_matches': {('n-gon', 'polygon'),
                   ('polygon', 'plane figure'),
                   ('simple polygon', 'polygon'),
                   ('skew polygon', 'polygon'),
                   ('solid polygon', 'polygon'),
                   ('triangle', 'polygon')},
 'recall': 0.375,
 'trash_rate': 0.6,
 'unpaired_ground_truth': {('3-gon', 'n-gon'),
                           ('closed polygonal chain', 'polygon'),
                           ('edge', 'segment'),
                           ('polygon', 'polytope'),
                           ('self-intersecting polygon', 'polygon'),
                           ('side', 'segment'),
                           ('solid polygon', 'region'),
                           ('star polygon', 'polygon'),
                           ('triangle', 'n-gon'),
                           ('vertex', 'point')},
 'wrong_answers': {('closed polygonal chain', 'edges or sides'),
                   ('closed polygonal chain', 'vertices or corners'

## Typical problems

- Plurals: e.g. `('regular polygons', 'polygons')` counts as a wrong answer only because of the "s" at the end of "polygons".
- Entities unrelated to geometry, e.g. Greek words, wax honeycomb, bees, or geology.
- The LLM proposed a relationship I didn't think of when annotating the text, e.g. "radius is a kind of length".
- CSV parsing: an extra comma inside a field, as in `πολύς (polús) 'much', 'many'`, breaks the two-column format.
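
One possible mitigation for the plural and CSV problems is sketched below. It is not part of the evaluation above: `naive_singular` and `parse_pairs_tolerantly` are hypothetical helpers, and the plural rule is a deliberately crude heuristic (it will not fix irregular plurals such as "vertices", which would need a proper lemmatizer).

```python
import csv
from io import StringIO


def naive_singular(term: str) -> str:
    """Lowercase a term and strip a trailing plural 's' (crude heuristic)."""
    term = term.lower().strip()
    # Leave words like "radius", "axis", or "cross" untouched.
    if term.endswith("s") and not term.endswith(("ss", "us", "is")):
        return term[:-1]
    return term


def parse_pairs_tolerantly(csv_answer: str):
    """Keep rows with exactly two non-empty fields; collect the rest separately."""
    pairs, rejected = [], []
    for row in csv.reader(StringIO(csv_answer), skipinitialspace=True):
        fields = [field.strip() for field in row]
        if len(fields) == 2 and all(fields):
            pairs.append((naive_singular(fields[0]), naive_singular(fields[1])))
        elif fields:  # skip blank lines silently, record malformed rows
            rejected.append(row)
    return pairs, rejected


sample = "Regular polygons,Polygons\nπολύς (polús) 'much', 'many',Word\nRadius,Length\n"
pairs, rejected = parse_pairs_tolerantly(sample)
print(pairs)
print(rejected)
```

Rows with a stray comma end up in `rejected` instead of being silently mis-split, so they can be inspected by hand. Whether to apply the same singularization to the ground truth is a separate choice; doing so on both sides keeps the comparison symmetric.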