# Create a Graph from a text

This notebook demonstrates how to extract a graph from any text using the graph maker.

Steps:
- Define an Ontology
- Load a list of example text chunks. We will use the Lord of the Rings summary from this Wikipedia page.
- Create a graph with an open-source model via the Groq APIs.
- Save the graph to a Neo4j database
- Visualise



Loading the graph maker functions ->

In [1]:
from knowledge_graph_maker import GraphMaker, Ontology, LocalClient, GroqClient, OpenAIClient
from knowledge_graph_maker import Document

# Define the Ontology. 

The ontology is a pydantic model with the following schema. 

```python
class Ontology(BaseModel):
    labels: List[Union[str, Dict]]
    relationships: List[str]
```
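
To make the shape of the label list concrete, here is a minimal stdlib-only sketch of how mixed entries (plain strings and `{name: description}` dicts) could be normalised into bare label names. `OntologySketch` and `label_names` are hypothetical stand-ins written for illustration, using the `labels` and `relationships` arguments seen in this notebook; they are not part of `knowledge_graph_maker`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class OntologySketch:
    """Hypothetical stand-in with the same two fields as the ontology."""
    labels: List[Union[str, Dict[str, str]]] = field(default_factory=list)
    relationships: List[str] = field(default_factory=list)

    def label_names(self) -> List[str]:
        """Return bare label names, whether given as str or {name: description}."""
        names = []
        for entry in self.labels:
            if isinstance(entry, dict):
                names.extend(entry.keys())
            else:
                names.append(entry)
        return names


sketch = OntologySketch(
    labels=[{"Person": "Person name without any adjectives"}, "Document"],
    relationships=["Relation between any pair of Entities"],
)
print(sketch.label_names())
```

A dict entry lets you attach guidance for the LLM to a label, while a plain string is enough when the label name is self-explanatory.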



In [18]:
import json
import os

from tavily import TavilyClient

# Read the Tavily API key from the environment rather than hard-coding it
tavily_key = os.environ['TAVILY_KEY']

client = TavilyClient(api_key=tavily_key)

In [19]:
# simple query using tavily's advanced search
response = client.search("Tell me about the Latest Emerging Threats from Iran and the Middle East",
                         search_depth="advanced",
                         max_results=5,
                         include_answer=False,
                         include_images=False,
                         #include_domains=[""],
                         )

response

{'query': 'Tell me about the Latest Emerging Threats from Iran and the Middle East',
 'follow_up_questions': None,
 'answer': None,
 'images': None,
 'results': [{'title': 'The Emerging Iranian Military Threat in the Middle East',
   'url': 'https://www.aei.org/articles/the-emerging-iranian-military-threat-in-the-middle-east/',
   'content': "April 30, 2024. The failure of Iran's recent drone and missile attack on Israel is obscuring a critical inflection in the threat that Iran poses to the United States and its regional partners ...",
   'score': 0.97079,
   'raw_content': None},
  {'title': 'Violence in the Mideast, rising threats from Islamic State group in ...',
   'url': 'https://apnews.com/article/iran-middle-east-mideast-attack-treat-bc88b3364ebc2cf7dfbfea4568eb4d96',
   'content': 'The top U.S. commander for the Mideast told a Senate committee Thursday, March 7, 2024, that exploding violence in the Middle East, fueled by Iran, presents the most likely threat to the U.S. homela

In [20]:
news_docs = []

for doc in response['results']:
    #print(doc)
    title = doc['title']
    content = doc['content']

    compiled_doc = title + " | " + content

    print(compiled_doc)
    news_docs.append(compiled_doc)

    print(" ")


print(len(news_docs))

The Emerging Iranian Military Threat in the Middle East | April 30, 2024. The failure of Iran's recent drone and missile attack on Israel is obscuring a critical inflection in the threat that Iran poses to the United States and its regional partners ...
 
Violence in the Mideast, rising threats from Islamic State group in ... | The top U.S. commander for the Mideast told a Senate committee Thursday, March 7, 2024, that exploding violence in the Middle East, fueled by Iran, presents the most likely threat to the U.S. homeland, and the risk of an attack by violent extremists in Afghanistan on American and Western interests abroad is increasing. (AP Photo, File)
 
Iran's Attack and the New Escalatory Cycle in the Middle East | The Middle East is entering a new phase after unprecedented attacks by Israel and Iran during the first two weeks of April. Robin Wright, a senior fellow at USIP and the Woodrow Wilson Center who has covered the region for a half century, explores what happened, the
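
The loop above can also be wrapped in a small helper that tolerates missing fields and skips duplicates. This is only a sketch: `compile_docs` is a hypothetical helper, and the sample input is shortened from the search results shown earlier.

```python
from typing import Dict, List


def compile_docs(results: List[Dict]) -> List[str]:
    """Join title and content of each search result, skipping empties and duplicates."""
    compiled = []
    seen = set()
    for item in results:
        title = (item.get("title") or "").strip()
        content = (item.get("content") or "").strip()
        if not content:
            continue  # nothing to extract a graph from
        text = f"{title} | {content}" if title else content
        if text not in seen:
            seen.add(text)
            compiled.append(text)
    return compiled


# Shortened sample shaped like the 'results' list of the search response
sample_results = [
    {"title": "The Emerging Iranian Military Threat in the Middle East",
     "content": "April 30, 2024. The failure of Iran's recent drone and missile attack ..."},
    {"title": "No content here", "content": None},
]
print(compile_docs(sample_results))
```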

Here is the ontology we will use for the text chunks collected above ->

In [21]:

ontology = Ontology(
    labels=[
        {"Person": "Person name without any adjectives, Remember a person may be referenced by their name or using a pronoun"},
        {"Object": "Do not add the definite article 'the' in the object name"},
        {"Event": "Event involving multiple people. Do not include qualifiers or verbs like gives, leaves, works etc."},
        {"Place": "Geographic location, country, city, or region."},
        "Document",
        {"Organization": "Representing a group of people with common ideals or interests."},
        {"Action": "Activities taken by a Person or Organization."},
        {"Miscellaneous": "Any important concept that cannot be categorised with any other given label"},
    ],
    relationships=[
        "Relation between any pair of Entities"
        ],
)


## Select a Model

Groq supports the following models at present.

*LLaMA3 8b*
Model ID: llama3-8b-8192

*LLaMA3 70b*
Model ID: llama3-70b-8192

*Mixtral 8x7b*
Model ID: mixtral-8x7b-32768

*Gemma 7b*
Model ID: gemma-7b-it


Selecting a model for this example ->


In [22]:

## Groq models
#model = "mixtral-8x7b-32768"
#model ="llama3-8b-8192"
# model = "llama3-70b-8192"
# model="gemma-7b-it"

## Open AI models
#oai_model="gpt-3.5-turbo"

## Use Groq
#llm = GroqClient(model=model, temperature=0.1, top_p=0.5)
## OR Use OpenAI
#llm = OpenAIClient(model=oai_model, temperature=0.1, top_p=0.5)
## OR Use Local Ollama Docker Server
llm = LocalClient(model="llama3", temperature=0.1, top_p=0.5)


## Local Llama
#from langchain_community.llms import Ollama
#from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
#llm = Ollama(model="llama3", base_url="http://localhost:11434", verbose=True)


## Create documents out of text chunks.
Document is a pydantic model with the following schema:

```python
class Document(BaseModel):
    text: str
    metadata: dict
```

The metadata we add to the document here is copied to every relation extracted from the document. More often than not, node pairs have multiple relations with each other. The metadata helps add more context to these relations.

In this example I am generating a summary of the text chunk, and the timestamp of the run, to be used as metadata. 
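
As a rough illustration of that copying behaviour, the sketch below attaches a document's metadata to each relation extracted from it. `DocumentSketch`, `EdgeSketch` and `attach_metadata` are hypothetical, stdlib-only stand-ins; the library's internal implementation may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DocumentSketch:
    text: str
    metadata: Dict[str, str]


@dataclass
class EdgeSketch:
    node_1: str
    node_2: str
    relationship: str
    metadata: Dict[str, str] = field(default_factory=dict)


def attach_metadata(doc: DocumentSketch,
                    triples: List[Tuple[str, str, str]]) -> List[EdgeSketch]:
    """Copy the document metadata onto every relation extracted from that document."""
    return [
        # dict(...) makes a copy so the edges do not share one mutable dict
        EdgeSketch(n1, n2, rel, metadata=dict(doc.metadata))
        for n1, n2, rel in triples
    ]


doc = DocumentSketch(
    text="Robin Wright is a senior fellow at USIP and the Woodrow Wilson Center.",
    metadata={"summary": "Robin Wright's affiliations", "generated_at": "2024-05-24"},
)
# Triples are written by hand here; in the notebook they come from the LLM.
sketch_edges = attach_metadata(doc, [
    ("Robin Wright", "USIP", "senior fellow at"),
    ("Robin Wright", "Woodrow Wilson Center", "senior fellow at"),
])
for edge in sketch_edges:
    print(edge.node_1, "->", edge.node_2, "|", edge.metadata["generated_at"])
```

Because every edge carries its source summary and timestamp, two relations between the same node pair can still be told apart later.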


In [23]:
import datetime
current_time = str(datetime.datetime.now())


graph_maker = GraphMaker(ontology=ontology, llm_client=llm, verbose=False)

def generate_summary(text):
    SYS_PROMPT = (
        "Succinctly summarise the text provided by the user. "
        "Respond only with the summary and no other comments"
    )
    try:
        return llm.generate(user_message=text, system_message=SYS_PROMPT)
    except Exception:
        # Fall back to an empty summary so one failed call does not stop the run.
        # Catching Exception (not a bare except) keeps Ctrl-C working.
        return ""


docs = map(
    lambda t: Document(text=t, metadata={"summary": generate_summary(t), 'generated_at': current_time}),
    news_docs
)


## Create Graph
Finally, run the Graph Maker to generate the graph.

In [24]:

graph = graph_maker.from_documents(
    list(docs),
    delay_s_between=5,  # pause between documents; otherwise the Groq API rate limit is hit quickly
)
print("Total number of Edges", len(graph))

[92m[39m
[92m▶︎ GRAPH MAKER LOG - 2024-05-24 20:31:26 - INFO [39m
[92mDocument: 1[39m
[92m[39m
[34m[39m
[34m▶︎ GRAPH MAKER VERBOSE - 2024-05-24 20:31:26 - INFO [39m
[34mUsing Ontology:
labels=[{'Person': 'Person name without any adjectives, Remember a person may be referenced by their name or using a pronoun'}, {'Object': "Do not add the definite article 'the' in the object name"}, {'Event': 'Event event involving multiple people. Do not include qualifiers or verbs like gives, leaves, works etc.'}, {'Place': 'Geographic location, country, city, or region.'}, 'Document', {'Organization': 'Reprsenting a group of people with common ideals or interests.'}, {'Action': 'Activities taken by a Person or Organization.'}, {'Miscellaneous': 'Any important concept can not be categorised with any other given label'}] relationships=['Relation between any pair of Entities'][39m
[34m[39m


Using Model:  llama3
Using Model:  llama3
Using Model:  llama3
Using Model:  llama3
Using Model:  llama3
Using Model:  llama3


ValueError: Invalid input type <class 'dict'>. Must be a PromptValue, str, or list of BaseMessages.
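
The `delay_s_between` argument spaces out requests so that a rate-limited API is not overwhelmed. The general pattern can be sketched as below, with a simple retry added. `process_with_delay` and `fake_extract` are hypothetical names, and this is not a description of how `GraphMaker.from_documents` works internally.

```python
import time
from typing import Callable, Iterable, List


def process_with_delay(items: Iterable[str],
                       worker: Callable[[str], list],
                       delay_s: float = 5.0,
                       max_retries: int = 2) -> List:
    """Call worker on each item, pausing between calls and retrying failures."""
    results = []
    for index, item in enumerate(items):
        if index > 0:
            time.sleep(delay_s)  # pause between documents to stay under rate limits
        for attempt in range(max_retries + 1):
            try:
                results.extend(worker(item))
                break
            except Exception as err:
                if attempt == max_retries:
                    print(f"Giving up on item {index}: {err}")
                else:
                    time.sleep(delay_s * (attempt + 1))  # wait longer before each retry
    return results


def fake_extract(text: str) -> list:
    """Stand-in for an LLM call: returns one 'edge' per sentence."""
    return [s.strip() for s in text.split(".") if s.strip()]


throttled = process_with_delay(["A meets B. B leaves.", "C joins A."],
                               fake_extract, delay_s=0.01)
print(len(throttled))
```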

In [9]:
for edge in graph:
    # pydantic expects a set (or dict) of field names for exclude
    print(edge.model_dump(exclude={'metadata'}), "\n\n")

{'node_1': {'label': 'Organization', 'name': 'U.S. Department of Defense'}, 'node_2': {'label': 'Organization', 'name': 'CENTCOM'}, 'relationship': 'The U.S. Department of Defense provides additional resources to CENTCOM.', 'order': 0} 


{'node_1': {'label': 'Person', 'name': 'Iran'}, 'node_2': {'label': 'Person', 'name': 'United States'}, 'relationship': 'Iran has a 40-year pattern of violence against the United States.', 'order': 0} 


{'node_1': {'label': 'Person', 'name': 'United States'}, 'node_2': {'label': 'Person', 'name': 'Iran'}, 'relationship': 'The United States and its allies have been victims of violence by Iran.', 'order': 0} 


{'node_1': {'label': 'Organization', 'name': 'USIP'}, 'node_2': {'label': 'Organization', 'name': 'Woodrow Wilson Center'}, 'relationship': 'Robin Wright is a senior fellow at USIP and the Woodrow Wilson Center.', 'order': 1} 


{'node_1': {'label': 'Person', 'name': 'Robin Wright'}, 'node_2': {'label': 'Organization', 'name': 'USIP'}, 'relation
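
Once the edges are available as dictionaries like the ones printed above, simple statistics are easy to compute. The sketch below counts how many relations touch each node. `node_degrees` is a hypothetical helper, and the sample edges are copied from the output above.

```python
from collections import Counter
from typing import Dict, List


def node_degrees(edge_dicts: List[Dict]) -> Counter:
    """Count how many edges each node name participates in."""
    degrees = Counter()
    for edge in edge_dicts:
        degrees[edge["node_1"]["name"]] += 1
        degrees[edge["node_2"]["name"]] += 1
    return degrees


sample_edges = [
    {"node_1": {"label": "Organization", "name": "U.S. Department of Defense"},
     "node_2": {"label": "Organization", "name": "CENTCOM"},
     "relationship": "The U.S. Department of Defense provides additional resources to CENTCOM.",
     "order": 0},
    {"node_1": {"label": "Person", "name": "Iran"},
     "node_2": {"label": "Person", "name": "United States"},
     "relationship": "Iran has a 40-year pattern of violence against the United States.",
     "order": 0},
    {"node_1": {"label": "Person", "name": "United States"},
     "node_2": {"label": "Person", "name": "Iran"},
     "relationship": "The United States and its allies have been victims of violence by Iran.",
     "order": 0},
]
for name, degree in node_degrees(sample_edges).most_common():
    print(degree, name)
```

A listing like this is also a quick way to review labels: in the output above, Iran and United States were tagged as `Person`, which suggests the ontology descriptions may need tightening.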

# Save the Graph to Neo4j 

In [10]:
from knowledge_graph_maker import Neo4jGraphModel

create_indices = False
neo4j_graph = Neo4jGraphModel(edges=graph, create_indices=create_indices)

neo4j_graph.save()


aenter called
aexit called

632
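
For readers curious about what writing an edge to Neo4j can look like, here is a sketch that builds a parameterised Cypher `MERGE` statement for one edge. It is an assumption about one reasonable approach, not the query that `Neo4jGraphModel` actually runs. `edge_to_cypher` and the `RELATED_TO` relationship type are hypothetical.

```python
import re
from typing import Dict, Tuple


def edge_to_cypher(edge: Dict) -> Tuple[str, Dict[str, str]]:
    """Build a parameterised Cypher MERGE statement for a single edge dict."""
    def safe_label(label: str) -> str:
        # Labels cannot be passed as query parameters, so keep only safe characters.
        cleaned = re.sub(r"[^A-Za-z0-9_]", "", label)
        return cleaned or "Miscellaneous"

    label_1 = safe_label(edge["node_1"]["label"])
    label_2 = safe_label(edge["node_2"]["label"])
    query = (
        f"MERGE (a:{label_1} {{name: $name_1}}) "
        f"MERGE (b:{label_2} {{name: $name_2}}) "
        "MERGE (a)-[r:RELATED_TO {description: $description}]->(b)"
    )
    params = {
        "name_1": edge["node_1"]["name"],
        "name_2": edge["node_2"]["name"],
        "description": edge["relationship"],
    }
    return query, params


cypher_query, cypher_params = edge_to_cypher({
    "node_1": {"label": "Organization", "name": "U.S. Department of Defense"},
    "node_2": {"label": "Organization", "name": "CENTCOM"},
    "relationship": "The U.S. Department of Defense provides additional resources to CENTCOM.",
})
print(cypher_query)
print(cypher_params)
```

Using `MERGE` rather than `CREATE` means that re-running the save does not duplicate nodes with the same label and name.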

## Georeference Entities

In [None]:
import requests
import time

# Search input terms
key_terms = ['South Korea', 'United States']  # should come from named_places
layer_0 = "country"
layer_1 = "city"
limit = 1

# Data collection and extraction
url = "https://photon.komoot.io/api/"
for term in key_terms:
    print(term)
    # Pass the query as parameters so requests URL-encodes terms containing spaces;
    # the repeated "layer" key restricts results to countries and cities
    params = [("q", term), ("layer", layer_0), ("layer", layer_1), ("limit", limit)]
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()

    # Parse the GeoJSON response
    features = response.json()['features']
    print(len(features))
    print(features)
    if features:
        # GeoJSON coordinates are ordered [longitude, latitude]
        print(features[0]['geometry']['coordinates'])

    time.sleep(1)  # pause between requests to the public endpoint
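
Photon returns GeoJSON, where point coordinates are ordered `[longitude, latitude]`. The sketch below pulls named coordinates out of a response of that shape while tolerating empty results. `extract_coordinates` is a hypothetical helper, and the sample payload is a hand-written, abbreviated example with approximate values, not real API output.

```python
from typing import Dict, List, Optional, Tuple


def extract_coordinates(data: Dict) -> List[Tuple[Optional[str], float, float]]:
    """Return (name, latitude, longitude) for each point feature in a GeoJSON response."""
    places = []
    for feature in data.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue
        lon, lat = geometry["coordinates"][:2]  # GeoJSON order is longitude, latitude
        name = (feature.get("properties") or {}).get("name")
        places.append((name, lat, lon))
    return places


# Hand-written sample with approximate values, for illustration only
sample_response = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [127.8, 36.6]},
         "properties": {"name": "South Korea", "type": "country"}},
    ],
}
print(extract_coordinates(sample_response))
print(extract_coordinates({"features": []}))
```

Returning latitude before longitude matches what most mapping libraries expect, so the swap is done once here rather than at every call site.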