# Building a KG with an LLM - Part 1: extracting entities

Some references:
- https://www.youtube.com/watch?v=Hg4ahTQlBm0




In [2]:
#!pip install python-dotenv openai

In [1]:
import os
from dotenv import load_dotenv
load_dotenv()

OPEN_API_KEY = os.getenv("OPEN_API_KEY")

In [2]:
from openai import OpenAI

client = OpenAI(
    # Passed explicitly: by default the client reads OPENAI_API_KEY, not OPEN_API_KEY
    api_key=OPEN_API_KEY,
)

In [8]:
text = """
Edward Jones (7 April 1824 – c. 1893 or 1896), also known as "the boy Jones", was an English stalker who became notorious for breaking into Buckingham Palace several times between 1838 and 1841.

Jones was fourteen years old when he first broke into the palace in December 1838. He was found in possession of some items he had stolen, but was acquitted at his trial. He broke in again in 1840, ten days after Queen Victoria had given birth to Princess Victoria. Staff found him hiding under a sofa and he was arrested and subsequently questioned by the Privy Council—the monarch's formal body of advisers. He was sentenced to three months' hard labour at Tothill Fields Bridewell prison. He was released in March 1841 and broke back into the palace two weeks later, where he was caught stealing food from the larders. He was again arrested and sentenced to three months' hard labour at Tothill Fields.
"""
# https://en.wikipedia.org/wiki/The_boy_Jones


## GOAL


The goal is to extract entities and relations in a parsable format that we can then import into a knowledge graph (KG) with Cypher, for example:
{
   "entities": [{"type": "Person", "value": "XXX", "id": 1}, ...],
   "relations": [{"type": "KNOWS", "from_entity": 1, "to_entity": 2, "since": "..."}, ...]
}
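
To make the target concrete, here is a minimal sketch of how a payload in that shape could be turned into Cypher statements. The helper name `to_cypher`, the `value` property and matching nodes on an `id` property are illustrative assumptions, not something the notebook defines. Labels and relationship types are interpolated as-is, so this assumes they are trusted identifiers; a real import would more likely pass query parameters to a driver than build strings.

```python
import json


def to_cypher(payload: dict) -> list[str]:
    """Build one Cypher statement per entity and per relation (illustrative only)."""
    statements = []
    for entity in payload["entities"]:
        # json.dumps yields a double-quoted, escaped string literal
        statements.append(
            f"MERGE (n:{entity['type']} {{id: {entity['id']}}}) "
            f"SET n.value = {json.dumps(entity['value'])}"
        )
    for rel in payload["relations"]:
        # Everything except the type and the two endpoints becomes a relationship property
        props = {
            k: v for k, v in rel.items()
            if k not in ("type", "from_entity", "to_entity")
        }
        prop_str = ", ".join(f"{k}: {json.dumps(v)}" for k, v in props.items())
        rel_pattern = f":{rel['type']}" + (f" {{{prop_str}}}" if prop_str else "")
        statements.append(
            f"MATCH (a {{id: {rel['from_entity']}}}), (b {{id: {rel['to_entity']}}}) "
            f"MERGE (a)-[{rel_pattern}]->(b)"
        )
    return statements


payload = {
    "entities": [
        {"type": "Person", "value": "Edward Jones", "id": 1},
        {"type": "Location", "value": "Buckingham Palace", "id": 2},
    ],
    "relations": [{"type": "BROKE_INTO", "from_entity": 1, "to_entity": 2}],
}
statements = to_cypher(payload)
for statement in statements:
    print(statement)
```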

## Simple prompts


In [9]:
prompt_template = """Extract the entities from the following text:

{text}
"""
prompt = prompt_template.format(text=text)


completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": prompt,
        }
    ],
    model="gpt-3.5-turbo",
    temperature=0,
)

print(completion.choices[0].message.content)

Entities:
1. Edward Jones
2. Buckingham Palace
3. Queen Victoria
4. Princess Victoria
5. Tothill Fields Bridewell prison
6. Privy Council
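
The cells that follow repeat the same call and change only the prompt. One way to cut the repetition is a small wrapper around `client.chat.completions.create`. This is a sketch; the name `ask` and its parameters are hypothetical, and the defaults simply mirror the values used in this notebook.

```python
def ask(client, prompt_template: str, text: str, model: str = "gpt-3.5-turbo") -> str:
    """Fill the template with the text, send it as a single user message, return the reply."""
    completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt_template.format(text=text),
            }
        ],
        model=model,
        temperature=0,  # keep the output as repeatable as possible
    )
    return completion.choices[0].message.content
```

With the `client`, `prompt_template` and `text` defined above, a cell would then reduce to `print(ask(client, prompt_template, text))`.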


In [10]:
prompt_template = """Extract the entities and specify their type from the following text:

{text}
"""
prompt = prompt_template.format(text=text)


completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": prompt,
        }
    ],
    model="gpt-3.5-turbo",
    temperature=0,
)

print(completion.choices[0].message.content)

Entities:
1. Edward Jones (Person)
2. Buckingham Palace (Location)
3. Queen Victoria (Person)
4. Princess Victoria (Person)
5. Tothill Fields Bridewell prison (Location)
6. Privy Council (Organization)


In [31]:
prompt_template = """Extract the entities and specify their type from the following text. Return result as JSON.

{text}
"""
prompt = prompt_template.format(text=text)


completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": prompt,
        }
    ],
    model="gpt-3.5-turbo",
    temperature=0,
)

print(completion.choices[0].message.content)

{
  "entities": [
    {
      "name": "Edward Jones",
      "type": "Person"
    },
    {
      "name": "7 April 1824",
      "type": "Date"
    },
    {
      "name": "1893 or 1896",
      "type": "Date"
    },
    {
      "name": "the boy Jones",
      "type": "Nickname"
    },
    {
      "name": "Buckingham Palace",
      "type": "Location"
    },
    {
      "name": "England",
      "type": "Location"
    },
    {
      "name": "Queen Victoria",
      "type": "Person"
    },
    {
      "name": "Princess Victoria",
      "type": "Person"
    },
    {
      "name": "Privy Council",
      "type": "Organization"
    },
    {
      "name": "Tothill Fields Bridewell prison",
      "type": "Location"
    },
    {
      "name": "March 1841",
      "type": "Date"
    }
  ]
}
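
The reply is still a plain string, and nothing in the prompt guarantees that it is valid JSON or free of surrounding text. A minimal sketch of a defensive parse is shown below; `parse_json_reply` is a hypothetical helper, and the `reply` string is a made-up stand-in for a model answer so the example runs without an API call.

```python
import json


def parse_json_reply(reply: str) -> dict:
    """Parse the outermost {...} span of a reply, ignoring any text around it."""
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    # json.loads raises json.JSONDecodeError (a ValueError) if the span is malformed
    return json.loads(reply[start : end + 1])


# Made-up reply with some chatter in front of the JSON
reply = 'Sure, here are the entities:\n{"entities": [{"name": "Edward Jones", "type": "Person"}]}'
result = parse_json_reply(reply)
print(result["entities"][0]["name"])
```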


In [22]:
prompt_template = """Extract the entities and specify their type from the following text. Also extract relationships between these entities. Return result as JSON.

{text}
"""
prompt = prompt_template.format(text=text)


completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": prompt,
        }
    ],
    model="gpt-3.5-turbo",
    temperature=0,
)

print(completion.choices[0].message.content)

{
  "entities": [
    {
      "name": "Edward Jones",
      "type": "Person"
    },
    {
      "name": "Buckingham Palace",
      "type": "Location"
    },
    {
      "name": "Queen Victoria",
      "type": "Person"
    },
    {
      "name": "Princess Victoria",
      "type": "Person"
    },
    {
      "name": "Tothill Fields Bridewell prison",
      "type": "Location"
    },
    {
      "name": "Privy Council",
      "type": "Organization"
    }
  ],
  "relationships": [
    {
      "source": "Edward Jones",
      "relation": "was known as",
      "target": "the boy Jones"
    },
    {
      "source": "Edward Jones",
      "relation": "broke into",
      "target": "Buckingham Palace"
    },
    {
      "source": "Edward Jones",
      "relation": "was questioned by",
      "target": "Privy Council"
    },
    {
      "source": "Queen Victoria",
      "relation": "gave birth to",
      "target": "Princess Victoria"
    },
    {
      "source": "Edward Jones",
      "relation": "was 

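- Entities extracted
- Relationships extracted

The result is not yet in the target shape: relationships refer to entities by name rather than by id, and "the boy Jones" appears as a relationship target without being listed as an entity. We can either post-process the output to make it look like a property graph, or tune the prompt.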
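
As a rough illustration of the post-processing route, the sketch below assigns numeric ids to the entities and rewrites each relationship to point at those ids. The function name `to_property_graph`, the upper-snake-case relation types and the `unresolved` bucket are assumptions made for this example; `sample` is a shortened copy of the output above.

```python
def to_property_graph(result: dict) -> dict:
    """Convert name-based model output into id-based entities and relations."""
    # Number the entities in order of appearance, starting at 1
    ids = {e["name"]: i for i, e in enumerate(result["entities"], start=1)}
    entities = [
        {"type": e["type"], "value": e["name"], "id": ids[e["name"]]}
        for e in result["entities"]
    ]
    relations, unresolved = [], []
    for r in result["relationships"]:
        if r["source"] in ids and r["target"] in ids:
            relations.append(
                {
                    # "broke into" -> "BROKE_INTO"
                    "type": r["relation"].upper().replace(" ", "_"),
                    "from_entity": ids[r["source"]],
                    "to_entity": ids[r["target"]],
                }
            )
        else:
            # An endpoint is missing from the entity list; keep it for review
            unresolved.append(r)
    return {"entities": entities, "relations": relations, "unresolved": unresolved}


sample = {
    "entities": [
        {"name": "Edward Jones", "type": "Person"},
        {"name": "Buckingham Palace", "type": "Location"},
    ],
    "relationships": [
        {"source": "Edward Jones", "relation": "was known as", "target": "the boy Jones"},
        {"source": "Edward Jones", "relation": "broke into", "target": "Buckingham Palace"},
    ],
}
graph = to_property_graph(sample)
print(graph["relations"])
print(graph["unresolved"])
```

Keeping unresolved relationships separate, rather than dropping them silently, makes it easier to see when the prompt needs tuning.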

## Let's focus on entities for a while

In [27]:
prompt_template = """Extract the entities and specify their type from the following text. Return result as JSON.
Use only the following entity types:

Entities:
- Person
- Organization
- Event
- Location

Text:
{text}
"""
prompt = prompt_template.format(text=text)


completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": prompt,
        }
    ],
    model="gpt-3.5-turbo",
    temperature=0,
)

print(completion.choices[0].message.content)

{
  "entities": [
    {
      "text": "Edward Jones",
      "type": "Person"
    },
    {
      "text": "Buckingham Palace",
      "type": "Location"
    },
    {
      "text": "Queen Victoria",
      "type": "Person"
    },
    {
      "text": "Princess Victoria",
      "type": "Person"
    },
    {
      "text": "Privy Council",
      "type": "Organization"
    },
    {
      "text": "Tothill Fields Bridewell prison",
      "type": "Location"
    }
  ]
}


In [30]:
prompt_template = """Extract the entities and specify their type from the following text. Return result as JSON.
Use only the following entities and properties:
Entities:
- Person: name, dateOfBirth, gender (M or F), nationality, nickname
- Organization: name, description
- Event: name, date
- Location: name
All properties but the first one are optional.

Text:
{text}
"""
prompt = prompt_template.format(text=text)


completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": prompt,
        }
    ],
    model="gpt-3.5-turbo",
    temperature=0,
)

print(completion.choices[0].message.content)

{
  "entities": [
    {
      "type": "Person",
      "name": "Edward Jones",
      "dateOfBirth": "7 April 1824",
      "nickname": "the boy Jones",
      "nationality": "English"
    },
    {
      "type": "Organization",
      "name": "Buckingham Palace",
      "description": "royal residence"
    },
    {
      "type": "Event",
      "name": "breaking into Buckingham Palace",
      "date": "between 1838 and 1841"
    },
    {
      "type": "Location",
      "name": "Tothill Fields Bridewell prison"
    },
    {
      "type": "Person",
      "name": "Queen Victoria",
      "gender": "F"
    },
    {
      "type": "Person",
      "name": "Princess Victoria",
      "gender": "F"
    },
    {
      "type": "Organization",
      "name": "Privy Council",
      "description": "monarch's formal body of advisers"
    }
  ]
}
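
The model follows the requested structure here, but the result still deserves a check: Buckingham Palace is now typed as an Organization, whereas earlier runs typed it as a Location. A structural check cannot catch that kind of semantic drift, but it can at least confirm that only the allowed types and properties are used. The sketch below is one possible way to do that; `SCHEMA` restates the prompt above, and `check_entity` is a hypothetical helper.

```python
# Allowed properties per entity type, as listed in the prompt.
# The first property of each type is required, the others are optional.
SCHEMA = {
    "Person": ["name", "dateOfBirth", "gender", "nationality", "nickname"],
    "Organization": ["name", "description"],
    "Event": ["name", "date"],
    "Location": ["name"],
}


def check_entity(entity: dict) -> list[str]:
    """Return a list of problems found in one extracted entity (empty if none)."""
    allowed = SCHEMA.get(entity.get("type"))
    if allowed is None:
        return [f"unknown type: {entity.get('type')!r}"]
    problems = []
    if allowed[0] not in entity:
        problems.append(f"missing required property: {allowed[0]}")
    extra = sorted(set(entity) - set(allowed) - {"type"})
    if extra:
        problems.append(f"unexpected properties: {extra}")
    if entity["type"] == "Person" and entity.get("gender") not in (None, "M", "F"):
        problems.append(f"invalid gender: {entity['gender']!r}")
    return problems


entities = [
    {"type": "Person", "name": "Queen Victoria", "gender": "F"},
    {"type": "Date", "name": "March 1841"},
    {"type": "Location", "name": "Tothill Fields Bridewell prison", "description": "prison"},
]
for entity in entities:
    print(entity["name"], check_entity(entity))
```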
