# [Workshop] Knowledge Graph (KG) Reasoner

### Ground Work:

Read the novel << Animal Farm >> by George Orwell (pseudonym of Eric Blair, 1903-1950): http://gutenberg.net.au/ebooks01/0100011.txt


<img src="https://upload.wikimedia.org/wikipedia/commons/f/fb/Animal_Farm_-_1st_edition.jpg" width=300>

# 0. Package Installation (one-time job)

### Neo4j: (Knowledge) Graph Database - How to install and configure the Neo4j graph database on Ubuntu 20.04?

https://www.digitalocean.com/community/tutorials/how-to-install-and-configure-neo4j-on-ubuntu-20-04


In [1]:
# Install the Neo4j Python driver (library for talking to Neo4j from Python)
# !pip install neo4j

### OpenNRE: Open-source toolkit for relation extraction - How to install 'OpenNRE' when '!pip install opennre' does not work?

https://opennre-docs.readthedocs.io/en/latest/get_started/install.html



In [2]:
# !pip install opennre

# 1. Import Libraries

In [3]:
from neo4j import GraphDatabase
from neo4j.exceptions import ServiceUnavailable
import logging
import spacy
import opennre



# 2. Initialize working environment

### Connect Neo4j

In [4]:
# Connect to the local Neo4j instance (replace "xxx" with your own password)
graph = GraphDatabase.driver(
    "neo4j://localhost:7687",
    auth=("neo4j", "xxx")
)

In [5]:
# Delete all nodes together with their relationships so that each run starts from an empty graph
query = "MATCH (n) DETACH DELETE n"
with graph.session() as session:
    result = session.run(query)

### Load OpenNRE

In [6]:
# https://github.com/thunlp/OpenNRE

# https://github.com/thunlp/OpenNRE/issues/312
# !pip install transformers==3.4.0
import transformers

# model = opennre.get_model('wiki80_cnn_softmax')
model = opennre.get_model('wiki80_bert_softmax')
# model = opennre.get_model('wiki80_bertentity_softmax')
# model = opennre.get_model('tacred_bert_softmax')
# model = opennre.get_model('tacred_bertentity_softmax')


2023-08-26 18:10:14,150 - root - INFO - Loading BERT pre-trained checkpoint.
Some weights of the model checkpoint at /home/ubuntu/.opennre/pretrain/bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


# 3. Create Knowledge Graph

In [7]:
# !python -m spacy download en
# !python -m spacy download en_core_web_lg

## 3.1 Build the KG by extracting named entities (vertices/nodes) & relations (edges/links/predicates) from a text corpus.

In [8]:
# Read the corpus: one sentence per line (trailing newlines are kept)
with open("./corpus/Family.txt", "r") as f:
    sentences = f.readlines()

sentences[0:5]

['Happy Village is located in Florida, USA.\n',
 'Mr. Wilson is a well-known local silversmith in Happy Village.\n',
 'He is very very old.\n',
 'When Mr. Wilson was young, he came to Happy Village to seek refuge from a war.\n',
 'Mr. Wilson found his wife Mary here and raised three children.\n']

### spaCy: Named Entity Recognition:

https://spacy.io/api/annotation#named-entities


### OpenNRE: Relation Extraction:

https://github.com/thunlp/OpenNRE#what-is-relation-extraction
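OpenNRE's `model.infer`, as called in the cells below, takes a dict holding the sentence under `'text'` and the head and tail entities as character spans under `'h'` and `'t'`, and returns a (relation, score) pair. The minimal sketch below only builds that input dict, so it runs without OpenNRE installed; `make_infer_input` is a hypothetical helper, not part of OpenNRE. Spans are end-exclusive character offsets, matching how the notebook computes `(loc, loc + len(text))`.

```python
def make_infer_input(sentence, head_span, tail_span):
    """Build the input dict for OpenNRE's model.infer (hypothetical helper).

    head_span / tail_span are (start, end) character offsets into `sentence`,
    end-exclusive, i.e. sentence[start:end] is the entity text.
    """
    return {"text": sentence, "h": {"pos": head_span}, "t": {"pos": tail_span}}


sentence = "Mr. Wilson found his wife Mary here and raised three children."
head, tail = "Wilson", "Mary"
h_start, t_start = sentence.find(head), sentence.find(tail)
item = make_infer_input(sentence,
                        (h_start, h_start + len(head)),
                        (t_start, t_start + len(tail)))
print(item["h"], item["t"])  # {'pos': (4, 10)} {'pos': (26, 30)}
# With the model loaded: relation, score = model.infer(item)
```

Note that `str.find` returns the first occurrence only; if a name appears twice in a sentence, spaCy's entity offsets are the safer source for these spans.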


In [9]:
nlp = spacy.load('en_core_web_lg') # !python -m spacy download en_core_web_lg


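Load spaCy's pre-trained English pipeline "en_core_web_lg".
The pre-trained model encodes rich linguistic and semantic information and supports many natural language processing tasks, such as dependency parsing and named entity recognition (NER). Once loaded, it can be used to run these analyses on any text.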

In [10]:

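# Recognize the entities in each sentence and the relations between them, and store both in the Neo4j graph database:

# 1. Iterate over every sentence.
# 2. Parse the sentence with spaCy and extract its named entities (people, places, organizations).
# 3. For each entity not seen before, add it to the dictionary of known entities and create a new node in the graph database, using the entity label as node label and the entity text as its name.
# 4. For each pair of entities, use the OpenNRE model to infer their relation and its confidence, in both directions.
# 5. Record each new relation in the list of known relations, and create a relationship in the graph database if its confidence is above 0.8.

# The result is a graph of the entities found in the text and the relations between them, which can be used to analyze how the entities are connected.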

exist_ent = {}
exist_relationship = []

for sentence in sentences:
    # Run the spaCy pipeline on the current sentence.
    doc = nlp(sentence)
    entities = []
    names_of_entities = []
    for ent in doc.ents:
        # Keep only people (PERSON), geopolitical entities such as countries and cities (GPE)
        # and organizations (ORG). ent.text is the entity's surface text and ent.label_ its type;
        # each name is kept only once per sentence.
        if ent.label_ in ['PERSON', 'GPE', 'ORG'] and ent.text not in names_of_entities:
            names_of_entities.append(ent.text)
            entities.append(ent)
            
    for ent in entities:
        # Store every new entity in exist_ent and create a node for it in the graph database.
        if exist_ent.get(ent.text) is None:
            # First occurrence of this entity text: remember it.
            exist_ent[ent.text] = ent.text
            # Build a Cypher query that uses the entity label as node label and the entity text as name.
            # MERGE creates the node only if an identical node does not already exist.
            query = (
                "MERGE (node:" + ent.label_ + " {name: $name}) "
                "RETURN node"
            )
            # Open a database session with graph.session() and run the query with session.run().
            with graph.session() as session:
                result = session.run(query, name=ent.text)
            print("create new node with label as {0} and name as {1}".format(ent.label_, ent.text))

    for i in range(len(entities)):
        for j in range(i + 1, len(entities)):
            text_i = entities[i].text
            text_j = entities[j].text
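            # Use spaCy's character offsets: sentence.find() always returns the first
            # occurrence, which can point at a different mention of the same name.
            loc_h = entities[i].start_char
            loc_t = entities[j].start_char
            # Infer the relation with entity i as head and entity j as tail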
            result = model.infer({'text': sentence, 'h': {'pos': (loc_h, loc_h + len(text_i))},
                               't': {'pos': (loc_t, loc_t + len(text_j))}})
           
            (rel, confidence) = result[0].replace(' ', '_'), result[1]

            record = (text_i, text_j, rel, confidence)  # head text, tail text, relation name, confidence

            # Also infer the relation in the reverse direction (entity j as head, entity i as tail)
            result_rev = model.infer({'text': sentence, 'h': {'pos': (loc_t, loc_t + len(text_j))},
                               't': {'pos': (loc_h, loc_h + len(text_i))}})
            (rel_rev, confidence) = result_rev[0].replace(' ', '_'), result_rev[1]

            record_rev = (text_j, text_i, rel_rev, confidence)

            # Skip records that were already seen; otherwise remember them in exist_relationship.
            if record not in exist_relationship:
                exist_relationship.append(record)
                if record[3] > 0.8:
                    # Only relations with a confidence above 0.8 are written to the graph
                    query = (
                        "MATCH (n1 {name: $name1})"
                        "MATCH (n2 {name: $name2})"
                        "MERGE (n1) - [r:"+record[2]+"] -> (n2)"
                        "RETURN n1, n2, r"
                    )
                    # Create the new relationship in the graph database with a Cypher query
                    with graph.session() as session:
                        result = session.run(query, name1=exist_ent[text_i], name2=exist_ent[text_j])
                    print("create new relationship {0} - {1} -> {2} with confidence of {3}".format(record[0], record[2], record[1], record[3]))

            if record_rev not in exist_relationship:
                exist_relationship.append(record_rev)
                if record_rev[3] > 0.8:
                    query = (
                        "MATCH (n1 {name: $name1})"
                        "MATCH (n2 {name: $name2})"
                        "MERGE (n1) - [r:"+record_rev[2]+"] -> (n2)"   
                        "RETURN n1, n2, r"
                    )

                    with graph.session() as session:
                        result = session.run(query, name1=exist_ent[text_j], name2=exist_ent[text_i])
                    print("create new relationship {0} - {1} -> {2} with confidence of {3}".format(record_rev[0], record_rev[2], record_rev[1], record_rev[3]))


create new node with label as GPE and name as Florida
create new node with label as GPE and name as USA
create new relationship Florida - country -> USA with confidence of 0.9941097497940063
create new node with label as PERSON and name as Wilson
create new node with label as GPE and name as Happy Village
create new relationship Wilson - residence -> Happy Village with confidence of 0.9967474937438965
create new relationship Happy Village - residence -> Wilson with confidence of 0.9943844079971313
create new node with label as PERSON and name as Mary
create new relationship Wilson - spouse -> Mary with confidence of 0.9981299042701721
create new relationship Mary - spouse -> Wilson with confidence of 0.9981393814086914
create new node with label as PERSON and name as James
create new relationship James - father -> Wilson with confidence of 0.9941914677619934
create new relationship Wilson - child -> James with confidence of 0.9942625164985657
create new node with label as PERSON and na
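The relation-extraction part of the loop above can be condensed into a pure-Python function, which makes the control flow easier to test. This is a sketch under assumptions: entities are given as `(text, start_char, end_char)` tuples (the offsets spaCy spans expose), and `fake_infer` is a hypothetical stand-in for `model.infer` so the code runs without OpenNRE. One deliberate difference from the notebook: duplicates are detected on `(head, relation, tail)` only, so the same fact found again with a slightly different confidence is not reported twice.

```python
from itertools import combinations

CONFIDENCE_THRESHOLD = 0.8  # same cut-off as the notebook


def extract_triples(sentence, entities, infer, seen):
    """Return the new (head, relation, tail, score) triples found in one sentence.

    entities: list of (text, start_char, end_char) tuples, e.g. from spaCy spans.
    infer:    callable with the same input/output shape as OpenNRE's model.infer.
    seen:     set of (head, relation, tail) keys already accepted; updated in place.
    """
    new_triples = []
    for a, b in combinations(entities, 2):      # every unordered pair once
        for head, tail in ((a, b), (b, a)):      # ... queried in both directions
            relation, score = infer({
                "text": sentence,
                "h": {"pos": (head[1], head[2])},
                "t": {"pos": (tail[1], tail[2])},
            })
            relation = relation.replace(" ", "_")
            key = (head[0], relation, tail[0])
            if score > CONFIDENCE_THRESHOLD and key not in seen:
                seen.add(key)
                new_triples.append((head[0], relation, tail[0], score))
    return new_triples


def fake_infer(item):
    """Hypothetical stand-in for model.infer so the sketch runs without OpenNRE."""
    text = item["text"]
    h = text[slice(*item["h"]["pos"])]
    t = text[slice(*item["t"]["pos"])]
    return ("spouse", 0.99) if {h, t} == {"Wilson", "Mary"} else ("residence", 0.5)


sentence = "Mr. Wilson found his wife Mary here and raised three children."
entities = [("Wilson", 4, 10), ("Mary", 26, 30)]
seen = set()
print(extract_triples(sentence, entities, fake_infer, seen))
# [('Wilson', 'spouse', 'Mary', 0.99), ('Mary', 'spouse', 'Wilson', 0.99)]
print(extract_triples(sentence, entities, fake_infer, seen))  # [] (already seen)
```

Each returned triple would then be written with the same `MERGE (n1)-[r:...]->(n2)` query as in the cell above.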

### Visualize the created knowledge graph at: http://localhost:7474/browser/

* Username: neo4j

* Password: ai-user

## 3.2 Extend/expand the KG by inferring (auto-reasoning) new relations/entities using universal rules (static knowledge / common sense).

Automatic Reasoner

In [11]:
# from neo4j import GraphDatabase
# from neo4j.exceptions import ServiceUnavailable
# import logging
# import spacy
# import opennre

### Practise graph modification:

In [12]:
graph = GraphDatabase.driver(
    "neo4j://localhost:7687",
    auth=("neo4j", "xxx")
)

In [13]:
# Modify KG, e.g. Delete all link(s) from 'Happy Village' --to--> 'Wilson'
query = (
        "MATCH (n1:GPE {name:'Happy Village'})"
        "MATCH (n2:PERSON {name:'Wilson'})"
        "MATCH (n1)-[r]->(n2)"
        "DELETE r"
)

# Execute the above query to modify the Neo4j DB:
with graph.session() as session:
    results = session.run(query)

Your exercise: Modify KG, e.g. Delete all link(s) from 'Florida' --to--> 'Happy Village'

In [14]:
# Your exercise: Modify KG, e.g. Delete all link(s) from 'Florida' --to--> 'Happy Village'
query2 = (
    "MATCH (n1:GPE {name:'Florida'})"
    "MATCH (n2:GPE {name:'Happy Village'})"
    "MATCH (n1)-[r]->(n2)"
    "DELETE r"
)

with graph.session() as session:
    results = session.run(query2)

### Add rule set:

In [15]:
# Rule 1: Siblings relationship/link

# WHEN:
# (n1:PERSON)-[r:father]->(n2:PERSON) # n1 has a father : n2
# (n2:PERSON)-[r2:child]->(n3:PERSON) # n2 has a child  : n3
# THEN:
# n1 and n3 are siblings (bidirectional)

query = (
        "MATCH (n1:PERSON)-[r:father]->(n2:PERSON)-[r2:child]->(n3:PERSON) "
        "WHERE n1 <> n3 "
        "MERGE (n1)-[r3:sibling]->(n3) "
        "MERGE (n3)-[r4:sibling]->(n1) "
        "RETURN n1, n3"
)

In [16]:
# Execute the above query to modify the Neo4j DB:
with graph.session() as session:
    results = session.run(query)

In [17]:
# Rule 2: Mother relationship/link

# WHEN:
# (n1:PERSON)-[r:father]->(n2:PERSON)  # n1 has a father : n2
# (n2:PERSON)-[r2:spouse]->(n3:PERSON) # n2 has a spouse : n3
# THEN:
# (n1)-[r3:mother]->(n3) # n1 has a mother : n3
# (n3)-[r4:child]->(n1)  # n3 has a child  : n1

query = (
        "MATCH (n1:PERSON)-[r:father]->(n2:PERSON)-[r2:spouse]->(n3:PERSON)"
        "MERGE (n1)-[r3:mother]->(n3)"
        "MERGE (n3)-[r4:child]->(n1)"
        "RETURN n1, n3"
)

In [18]:
# Execute the above query to modify the Neo4j DB:
with graph.session() as session:
    results = session.run(query)

In [19]:
# Rule 3: Grandfather relationship/link

# WHEN:
# (n1:PERSON)-[r:father]->(n2:PERSON)  # n1 has a father : n2
# (n2:PERSON)-[r2:father]->(n3:PERSON) # n2 has a father : n3
# THEN:
# (n1)-[r3:grandfather]->(n3) # n1 has a grandfather : n3
# (n3)-[r4:grandchild]->(n1)  # n3 has a grandchild  : n1

query = (
        "MATCH (n1:PERSON)-[r:father]->(n2:PERSON)-[r2:father]->(n3:PERSON)"
        "MERGE (n1)-[r3:grandfather]->(n3)"
        "MERGE (n3)-[r4:grandchild]->(n1)"
        "RETURN n1, n3"
)

In [20]:
# Execute the above query to modify the Neo4j DB:
with graph.session() as session:
    results = session.run(query)
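Rules 1-3 above can also be mimicked outside Neo4j, for example to check what they ought to derive. The sketch below is plain forward chaining over a set of `(subject, relation, object)` triples, read as "subject has relation object" like the Cypher patterns; a Python set plays the role of `MERGE` by ignoring duplicates. Unlike the notebook, which runs each rule once, it repeats until no new triple appears (a fixpoint). The facts are a toy set modelled on the family example, and `("William", "father", "James")` is an assumption.

```python
def apply_rules(triples):
    """Forward-chain Rules 1-3 over (subject, relation, object) triples.

    A triple such as ("James", "father", "Wilson") reads "James has father Wilson".
    Returns a new set containing the input triples plus everything derived.
    """
    facts = set(triples)
    while True:
        derived = set()
        for n1, r1, n2 in facts:
            if r1 != "father":
                continue  # every rule starts from (n1)-[:father]->(n2)
            for m, r2, n3 in facts:
                if m != n2:
                    continue
                if r2 == "child" and n1 != n3:   # Rule 1: siblings
                    derived |= {(n1, "sibling", n3), (n3, "sibling", n1)}
                elif r2 == "spouse":             # Rule 2: mother
                    derived |= {(n1, "mother", n3), (n3, "child", n1)}
                elif r2 == "father":             # Rule 3: grandfather
                    derived |= {(n1, "grandfather", n3), (n3, "grandchild", n1)}
        if derived <= facts:   # nothing new: fixpoint reached
            return facts
        facts |= derived


facts = {
    ("James", "father", "Wilson"),
    ("Wilson", "child", "James"),
    ("Wilson", "spouse", "Mary"),
    ("William", "father", "James"),  # assumed for illustration
}
for triple in sorted(apply_rules(facts) - facts):
    print(triple)
# ('James', 'mother', 'Mary')
# ('Mary', 'child', 'James')
# ('William', 'grandfather', 'Wilson')
# ('Wilson', 'grandchild', 'William')
```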

## 3.3 Retrieve information by querying / reasoning over the KG.

### Our query: 'Deep Thought, what's the relationship between Wilson and William?'

In [21]:
query = (
        "MATCH (n1 {name:'Wilson'})-[r]->(n2 {name:'William'})"
        "RETURN n1, r, n2"
)
with graph.session() as session:
    results = session.run(query)
    for result in results:
        print("Deep Thought: {0} has {1} {2}.".format(result['n1']['name'], 
                                                           result['r'].type, 
                                                           result['n2']['name']))

Deep Thought: Wilson has grandchild William.


### Our query: 'Deep Thought, what's the relationship between James and Mary?'

In [22]:
query = (
        "MATCH (n1 {name:'James'})-[r]->(n2 {name:'Mary'})"
        "RETURN n1, r, n2"
)
with graph.session() as session:
    results = session.run(query)
    for result in results:
        print("Deep Thought: {0} has {1} {2}.".format(result['n1']['name'], 
                                                           result['r'].type, 
                                                           result['n2']['name']))

Deep Thought: James has mother Mary.
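The two query cells above differ only in the names. A small helper could pass the names as query parameters (`$name1`, `$name2`, the same mechanism the node-creation cell uses for `$name`) instead of pasting them into the Cypher string, which also keeps names containing apostrophes from breaking the query. `relationships_between` is a hypothetical helper: it expects a `neo4j` driver such as `graph` and uses Cypher's `type(r)` to return the relationship type directly.

```python
def relationships_between(driver, name1, name2):
    """Return the types of all relationships pointing from node `name1` to node `name2`."""
    query = (
        "MATCH (n1 {name: $name1})-[r]->(n2 {name: $name2}) "
        "RETURN type(r) AS rel"
    )
    with driver.session() as session:
        # Consume the records inside the session, before it is closed.
        return [record["rel"] for record in session.run(query, name1=name1, name2=name2)]


# Usage with the driver created earlier (requires a running Neo4j instance):
# for rel in relationships_between(graph, "Wilson", "William"):
#     print("Deep Thought: Wilson has {0} William.".format(rel))
```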


# 4. Knowledge Graph [Workshop]
### Build a knowledge graph of << Animal Farm >> by George Orwell
### Then query various entities and relationships

In [23]:
# Read the corpus: one sentence per line (trailing newlines are kept)
# with open("./corpus/Family.txt", "r") as f:
with open("./corpus/Animal Farm by George Orwell.txt", "r") as f:
    sentences = f.readlines()

sentences[0:5]

['Title: Animal Farm \n',
 'Author: George Orwell (pseudonym of Eric Blair) (1903-1950)\n',
 'Chapter I\n',
 'Mr Jones, of the Manor Farm, had locked the hen-houses for the night, but was too drunk to remember to shut the pop-holes.\n',
 'With the ring of light from his lantern dancing from side to side, he lurched across the yard, kicked off his boots at the back door, drew himself a last glass of beer from the barrel in the scullery, and made his way up to bed, where Mrs Jones was already snoring.\n']

In [25]:
# Your code:

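# Recognize the entities in each sentence and the relations between them, and store both in the Neo4j graph database:

# 1. Iterate over every sentence.
# 2. Parse the sentence with spaCy and extract its named entities (people, places, organizations).
# 3. For each entity not seen before, add it to the dictionary of known entities and create a new node in the graph database, using the entity label as node label and the entity text as its name.
# 4. For each pair of entities, use the OpenNRE model to infer their relation and its confidence, in both directions.
# 5. Record each new relation in the list of known relations, and create a relationship in the graph database if its confidence is above 0.8.

# The result is a graph of the entities found in the text and the relations between them, which can be used to analyze how the entities are connected.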
# nlp = spacy.load('en_core_web_lg')
exist_ent = {}
exist_relationship = []

for sentence in sentences:
    # Run the spaCy pipeline on the current sentence.
    doc = nlp(sentence)
    entities = []
    names_of_entities = []
    for ent in doc.ents:
        # Keep only people (PERSON), geopolitical entities such as countries and cities (GPE)
        # and organizations (ORG). ent.text is the entity's surface text and ent.label_ its type;
        # each name is kept only once per sentence.
        if ent.label_ in ['PERSON', 'GPE', 'ORG'] and ent.text not in names_of_entities:
            names_of_entities.append(ent.text)
            entities.append(ent)
            
    for ent in entities:
        # Store every new entity in exist_ent and create a node for it in the graph database.
        if exist_ent.get(ent.text) is None:
            # First occurrence of this entity text: remember it.
            exist_ent[ent.text] = ent.text
            # Build a Cypher query that uses the entity label as node label and the entity text as name.
            # MERGE creates the node only if an identical node does not already exist.
            query = (
                "MERGE (node:" + ent.label_ + " {name: $name}) "
                "RETURN node"
            )
            # Open a database session with graph.session() and run the query with session.run().
            with graph.session() as session:
                result = session.run(query, name=ent.text)
            print("create new node with label as {0} and name as {1}".format(ent.label_, ent.text))

    for i in range(len(entities)):
        for j in range(i + 1, len(entities)):
            text_i = entities[i].text
            text_j = entities[j].text
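            # Use spaCy's character offsets: sentence.find() always returns the first
            # occurrence, which can point at a different mention of the same name.
            loc_h = entities[i].start_char
            loc_t = entities[j].start_char
            # Infer the relation with entity i as head and entity j as tail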
            result = model.infer({'text': sentence, 'h': {'pos': (loc_h, loc_h + len(text_i))},
                               't': {'pos': (loc_t, loc_t + len(text_j))}})
           
            (rel, confidence) = result[0].replace(' ', '_').replace('/', '_'), result[1]

            record = (text_i, text_j, rel, confidence)  # head text, tail text, relation name, confidence

            # Also infer the relation in the reverse direction (entity j as head, entity i as tail)
            result_rev = model.infer({'text': sentence, 'h': {'pos': (loc_t, loc_t + len(text_j))},
                               't': {'pos': (loc_h, loc_h + len(text_i))}})
            (rel_rev, confidence) = result_rev[0].replace(' ', '_').replace('/', '_'), result_rev[1]

            record_rev = (text_j, text_i, rel_rev, confidence)

            # Skip records that were already seen; otherwise remember them in exist_relationship.
            if record not in exist_relationship:
                exist_relationship.append(record)
                if record[3] > 0.8:
                    # Only relations with a confidence above 0.8 are written to the graph
                    query = (
                        "MATCH (n1 {name: $name1})"
                        "MATCH (n2 {name: $name2})"
                        "MERGE (n1) - [r:"+record[2]+"] -> (n2)"
                        "RETURN n1, n2, r"
                    )
                    # Create the new relationship in the graph database with a Cypher query
                    with graph.session() as session:
                        result = session.run(query, name1=exist_ent[text_i], name2=exist_ent[text_j])
                    print("create new relationship {0} - {1} -> {2} with confidence of {3}".format(record[0], record[2], record[1], record[3]))

            if record_rev not in exist_relationship:
                exist_relationship.append(record_rev)
                if record_rev[3] > 0.8:
                    query = (
                        "MATCH (n1 {name: $name1})"
                        "MATCH (n2 {name: $name2})"
                        "MERGE (n1) - [r:"+record_rev[2]+"] -> (n2)"   
                        "RETURN n1, n2, r"
                    )

                    with graph.session() as session:
                        result = session.run(query, name1=exist_ent[text_j], name2=exist_ent[text_i])
                    print("create new relationship {0} - {1} -> {2} with confidence of {3}".format(record_rev[0], record_rev[2], record_rev[1], record_rev[3]))



create new node with label as ORG and name as Animal Farm
create new node with label as PERSON and name as George Orwell
create new node with label as PERSON and name as Eric Blair
create new relationship George Orwell - said_to_be_the_same_as -> Eric Blair with confidence of 0.9982547163963318
create new relationship Eric Blair - said_to_be_the_same_as -> George Orwell with confidence of 0.9979104399681091
create new node with label as PERSON and name as Jones
create new node with label as ORG and name as the Manor Farm
create new relationship Jones - residence -> the Manor Farm with confidence of 0.9298855066299438
create new node with label as PERSON and name as Willingdon Beauty
create new node with label as ORG and name as Bluebell
create new node with label as PERSON and name as Jessie
create new node with label as PERSON and name as Pincher
create new relationship Bluebell - sibling -> Jessie with confidence of 0.9978225231170654
create new relationship Jessie - sibling -> Blueb

### Hints: You might hit errors such as CypherSyntaxError (invalid input '/') in relationship types (predicates), etc.


I am confident you can solve these yourself, i.e. please don't email me about them.
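One possible way around the hint above, offered as a sketch rather than the intended solution: a relationship type pasted unquoted into Cypher must be a valid identifier, so either replace every non-word character with `_` (generalizing the `.replace(' ', '_').replace('/', '_')` used in the workshop cell), or keep the label verbatim and quote it with backticks, doubling any backtick inside it. Both helper names are hypothetical.

```python
import re


def safe_rel_type(relation):
    """Turn a relation label into an unquoted, Cypher-safe relationship type.

    Every character that is not a letter, digit or underscore becomes '_';
    a leading digit gets an '_' prefix because identifiers cannot start with one.
    """
    rel = re.sub(r"\W", "_", relation.strip())
    if not rel or rel[0].isdigit():
        rel = "_" + rel
    return rel


def quoted_rel_type(relation):
    """Alternative: keep the label verbatim and quote it with backticks.

    A literal backtick inside the name is escaped by doubling it.
    """
    return "`" + relation.replace("`", "``") + "`"


print(safe_rel_type("said to be the same as"))  # said_to_be_the_same_as
print(quoted_rel_type("said to be the same as"))  # `said to be the same as`
# e.g. "MERGE (n1)-[r:" + safe_rel_type(record[2]) + "]->(n2)"
```

Sanitizing keeps relationship types easy to type in later queries; backtick quoting preserves the original label exactly but every later query must quote it the same way.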

---

In [26]:
query = (
        "MATCH (n1 {name:'Napoleon'})-[r]->(n2 {name:'Animal Farm'})"
        "RETURN n1, r, n2"
)
with graph.session() as session:
    results = session.run(query)
    for result in results:
        print("Deep Thought: {0} is {1} {2}.".format(result['n1']['name'], 
                                                           result['r'].type, 
                                                           result['n2']['name']))

Deep Thought: Napoleon is field_of_work Animal Farm.
Deep Thought: Napoleon is residence Animal Farm.
Deep Thought: Napoleon is owned_by Animal Farm.


In [27]:
query = (
        "MATCH (n1 {name:'Napoleon'})-[r]->(n2 {name:'Snowball'})"
        "RETURN n1, r, n2"
)
with graph.session() as session:
    results = session.run(query)
    for result in results:
        print("Deep Thought: {0} has {1} {2}.".format(result['n1']['name'], 
                                                           result['r'].type, 
                                                           result['n2']['name']))

Deep Thought: Napoleon has spouse Snowball.
Deep Thought: Napoleon has characters Snowball.
Deep Thought: Napoleon has sibling Snowball.


In [28]:
query = (
        "MATCH (n1 {name:'Rebellion'})-[r]->(n2)"
        "RETURN n1, r, n2"
)
with graph.session() as session:
    results = session.run(query)
    for result in results:
        print("Deep Thought: {0} has {1} {2}.".format(result['n1']['name'], 
                                                           result['r'].type, 
                                                           result['n2']['name']))

---
`The end is called a new start.` --- ISS: **I** **S**(elf) **S**(tudy)

---