If you are running in Google Colab, please run the following cell:

In [None]:
%pip install agraph-python langchain pandas shortuuid

# AllegroGraph LLM Embedding Examples

In [1]:
from franz.openrdf.connect import ag_connect
from franz.openrdf.vocabulary import RDF, RDFS
from llm_utils import BufferTriples, addArbitraryTextString, read_text, FindNearestNeighbors, AskMyDocuments
from franz.openrdf.model.value import URI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema.document import Document
import shortuuid
import pandas as pd
import urllib
import textwrap
import os

#set your environment variables here
os.environ['AGRAPH_USER'] = 'your username'
os.environ['AGRAPH_PASSWORD'] = 'your password'
os.environ['AGRAPH_HOST'] = 'your host'
os.environ['AGRAPH_PORT'] = 'your port'

Before starting any other work, it is very important to set your OpenAI API key in your AG server. The directions are in the README, but they are repeated here because of their importance.

1. Please navigate to your local installation of the new webview
2. Go to the repository where your data is stored (`llm-philosophy` if you're using the repo created from this demo)
3. Go to `Repository Control` in the left column under `Repository` and search for `query execution options`. Select it.
4. Select `+ New Query Option` and add **openaiApiKey** as the _name_, and your OpenAI API key as the _value_. Set the `Scope` to **Repository**.
5. Don't forget to save by hitting `Save Query Options`!

## Import Philosophy Books into AllegroGraph

In this example we read in a few philosophy books from [Project Gutenberg](https://www.gutenberg.org/) and then show the power of two new AllegroGraph magic predicates, `llm:nearestNeighbor` and `llm:askMyDocuments`. First we select a few books from the Gutenberg library, gather some data from the website manually, and add it to the following dictionary. Each key of the dictionary is the URI we want the book to have in the graph. The author and title speak for themselves, and `contents` is a link to a plain-text version of the book ([example here](https://www.gutenberg.org/cache/epub/7370/pg7370.txt)):

In [2]:
documents = {
    "http://franz.com/llm/SecondTreatiseOfGovernment":{
        "author": "John Locke",
        "title": "Second Treatise of Government",
        "contents": "https://www.gutenberg.org/cache/epub/7370/pg7370.txt",
    },
    "http://franz.com/llm/TheRepublic":{
        "author": "Plato",
        "title": "The Republic",
        "contents": "https://www.gutenberg.org/cache/epub/1497/pg1497.txt",
    },
    "http://franz.com/llm/CritiqueOfPureReason":{
        "author": "Immanuel Kant",
        "title": "The Critique of Pure Reason",
        "contents": "https://www.gutenberg.org/cache/epub/4280/pg4280.txt"
    },
    "http://franz.com/llm/TreatiseOfHumanNature":{
        "author": "David Hume",
        "title": "A Treatise of Human Nature",
        "contents": "https://www.gutenberg.org/cache/epub/4705/pg4705.txt"
    }
}

Then we connect to a new AllegroGraph repository and declare some local namespaces we will use to add the documents to the graph as triples. Please add the necessary parameters to connect to your AllegroGraph server.

In [3]:
conn = ag_connect('llm-philosophy')
conn.setNamespace('', 'http://franz.com/llm/')
f = conn.namespace('http://franz.com/llm/')
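
One way to supply those connection parameters is sketched below, assuming the `AGRAPH_*` environment variables set in the first cell hold real values. The helper `connection_kwargs` is a hypothetical name, not part of the AllegroGraph client; it only collects the variables into keyword arguments (`host`, `port`, `user`, `password`) that `ag_connect` accepts.

```python
import os

def connection_kwargs(env=os.environ):
    """Collect AllegroGraph connection settings from AGRAPH_* variables."""
    mapping = {
        "host": "AGRAPH_HOST",
        "port": "AGRAPH_PORT",
        "user": "AGRAPH_USER",
        "password": "AGRAPH_PASSWORD",
    }
    # Skip variables that are unset or empty so ag_connect can use its defaults.
    kwargs = {name: env[var] for name, var in mapping.items() if env.get(var)}
    if "port" in kwargs:
        kwargs["port"] = int(kwargs["port"])  # raises ValueError on a non-numeric port
    return kwargs

# Usage (needs a running server, so it is left commented out):
# conn = ag_connect('llm-philosophy', **connection_kwargs())
```

Keeping the credentials in environment variables rather than in the call itself means the notebook can be shared without editing the connection cell.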

We then loop through the documents and grab the contents using the `urllib` library. We split each document using [Langchain's Recursive Text Splitter](https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter) and add the chunks of text to the graph. You can examine how the text is split in the `addArbitraryTextString` code in the `llm_utils.py` file in this same directory.

In [None]:
buffer = BufferTriples(conn)
for document in documents:
    id_uri = conn.createURI(document)
    buffer.add((id_uri, RDF.TYPE, f.Document, id_uri))
    buffer.add((id_uri, f.title, documents[document]['title'], id_uri))
    buffer.add((id_uri, RDFS.LABEL, documents[document]['title'], id_uri))
    text = read_text(documents[document]['contents'])
    buffer = addArbitraryTextString(conn, f, buffer, text, id_uri)
buffer.flush_triples()
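
To make the chunking step concrete, here is a minimal stand-in written with only the standard library. It is not the `addArbitraryTextString` implementation and not Langchain's splitter: it greedily packs paragraphs into chunks of at most `max_chars` characters and names each chunk `<document URI>_<index>`, a pattern assumed from the chunk URIs that appear in the query results in this notebook. `chunk_text` and `chunk_uris` are hypothetical names.

```python
def chunk_text(text, max_chars=1000):
    """Greedily pack paragraphs into chunks no longer than max_chars."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        paragraph = " ".join(paragraph.split())  # collapse internal whitespace
        if not paragraph:
            continue
        # Hard-split paragraphs that are themselves too long (may cut mid-word).
        pieces = [paragraph[i:i + max_chars] for i in range(0, len(paragraph), max_chars)]
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def chunk_uris(document_uri, chunks):
    """Pair each chunk with a URI of the form <document_uri>_<index>."""
    return [(f"{document_uri}_{i}", chunk) for i, chunk in enumerate(chunks)]
```

A real splitter such as Langchain's tries several separators in turn and can overlap adjacent chunks, so its boundaries will differ from this sketch.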

Note that the following image may not be the same for you depending on the splitter used.

![philosophy-books](images/philosophy-books.png)

## Indexing the text fragments with OpenAI

In order to ask questions of your own documents, you first need to index all the text fragments using agtool and OpenAI embeddings. The documentation on how to do this and how it works can be found [here](https://internal.franz.com/people/dm/docs/llmembed.html).

For agtool to work, it needs some metadata that we put in a .def file, as described in the link above. In this tutorial we define `philosophy.def`:
```text
gpt
 openai-api-key "your-openai-api-key-here"
 vector-database-name "philosophy"
 limit 10000
 vector-database-dim 1536
 include-predicates <http://franz.com/llm/text>
```
The explanations of these parameters can be found in the documentation linked above.
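
Rather than pasting the key into the file by hand, the `.def` file can be generated. The sketch below renders the same settings shown above from Python. `render_def_file` is a hypothetical helper, and reading the key from an `OPENAI_API_KEY` environment variable is an assumption of this example, not something agtool requires.

```python
import os

def render_def_file(api_key, database_name, include_predicate,
                    limit=10000, dim=1536):
    """Return the text of an agtool .def file with the settings used above."""
    lines = [
        "gpt",
        f' openai-api-key "{api_key}"',
        f' vector-database-name "{database_name}"',
        f" limit {limit}",
        f" vector-database-dim {dim}",
        f" include-predicates <{include_predicate}>",
    ]
    return "\n".join(lines) + "\n"

# Usage (writes philosophy.def in the current directory):
# key = os.environ.get("OPENAI_API_KEY", "your-openai-api-key-here")
# with open("philosophy.def", "w") as handle:
#     handle.write(render_def_file(key, "philosophy", "http://franz.com/llm/text"))
```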

We run the following command to index the documents (the command will change depending on the location of your documents, server, etc.). This command assumes you created an alias for agtool and that your server runs on localhost:10035.

```shell
agtool llm index localhost:10035/llm-philosophy philosophy.def 
```
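
If you prefer to launch the indexing step from Python, for example from a notebook cell, the command can be assembled as an argument list and handed to `subprocess.run`. This is a sketch: `build_index_command` and `run_index` are hypothetical helpers, and they assume `agtool` is on your `PATH`.

```python
import subprocess

def build_index_command(repository, def_file, server="localhost:10035", agtool="agtool"):
    """Assemble the `agtool llm index` call as an argument list."""
    return [agtool, "llm", "index", f"{server}/{repository}", def_file]

def run_index(repository, def_file, **kwargs):
    """Run the indexing command and raise CalledProcessError if it fails."""
    command = build_index_command(repository, def_file, **kwargs)
    return subprocess.run(command, check=True)

# run_index("llm-philosophy", "philosophy.def")  # needs agtool and a running server
```

Passing a list instead of a single shell string avoids quoting problems if the `.def` path contains spaces.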

Once the embedding is done we can start querying the graph!

### Nearest Neighbor SPARQL Query

The general syntax for the query clause of `llm:nearestNeighbor` is as follows:
```
(?uri ?score ?originalText) llm:nearestNeighbor (?text ?vector-database ?topN ?minScore)
```
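
Because the four arguments are embedded in the query text, it can help to build the query with a small function. The sketch below is one way to do that. `sparql_string` and `nearest_neighbor_query` are hypothetical helpers, not part of `llm_utils.py`; the first escapes backslashes, double quotes, and line breaks so that a phrase containing them does not break the SPARQL string literal.

```python
LLM_PREFIX = "PREFIX llm: <http://franz.com/ns/allegrograph/8.0.0/llm/>"

def sparql_string(value):
    """Quote a Python string as a SPARQL double-quoted literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'

def nearest_neighbor_query(text, vector_db, top_n=10, min_score=0.5):
    """Build a SELECT query around llm:nearestNeighbor."""
    args = f"{sparql_string(text)} {sparql_string(vector_db)} {int(top_n)} {float(min_score)}"
    return (
        f"{LLM_PREFIX}\n"
        "SELECT * WHERE {\n"
        f"  (?uri ?score ?originalText) llm:nearestNeighbor ({args})\n"
        "}"
    )

print(nearest_neighbor_query("government", "philosophy", top_n=10, min_score=0.1))
```

The returned string can be passed to `conn.executeTupleQuery` in the same way as a hand-written query.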

Please make sure you have set your OpenAI API key in Webview before attempting the following queries.

We will start with a sample SPARQL query for nearest neighbors.

In [4]:
query_string = """
        PREFIX llm: <http://franz.com/ns/allegrograph/8.0.0/llm/> 
        select * where { 
            (?uri ?score ?originalText) llm:nearestNeighbor ("government" "philosophy" 10 0.1)  }"""
with conn.executeTupleQuery(query_string) as result:
    df = result.toPandas()
df.head()

Unnamed: 0,originalText,score,uri
0,"private education of parents, contribute to th...",0.820638,<http://franz.com/llm/TreatiseOfHumanNature_1186>
1,"to a kind of natural authority, that the chief...",0.818856,<http://franz.com/llm/SecondTreatiseOfGovernme...
2,"government, and do either promote, or not, wha...",0.817867,<http://franz.com/llm/SecondTreatiseOfGovernme...
3,"world, government would still be necessary in ...",0.814602,<http://franz.com/llm/TreatiseOfHumanNature_1213>
4,money which they prize; they will spend that w...,0.814055,<http://franz.com/llm/TheRepublic_1087>


### Wrapping nearestNeighbor in a function
We wrote a sample class that allows users to find nearest neighbors and also perform some additional tasks with the response object. The parameters are:
- `conn` - the connection object
- `phrase` - the phrase for which you want to find the nearest neighbors
- `vector_db` - the vector database
- `number` - (optional) the maximum number of neighbors to return; defaults to 10
- `confidence` - (optional) the minimum matching score for all returned vectors; defaults to .5


In [5]:
nn = FindNearestNeighbors(conn, 'government', 'philosophy')

private education of parents, contribute to the giving us a sense of honour and duty in the strict
regulation of our actions with regard to the properties of others.     SECT. VII OF THE ORIGIN OF
GOVERNMENT   Nothing is more certain, than that men are, in a great measure, governed by interest,
and that even when they extend their concern beyond themselves, it is not to any great distance; nor
is it usual for them, in common life, to look farther than their nearest friends and acquaintance.
It is no less certain, that it is impossible for men to consult, their interest in so effectual a
manner, as by an universal and inflexible observance of the rules of justice, by which alone they
can preserve society, and keep themselves from falling into that wretched and savage condition,
which is commonly represented as the state of nature. And as this interest, which all men have in
the upholding of society, and the observation of the rules of justice, is great, so is


The following method simply prints the URI of the nearest vector, the matching score, and the text of the matching vector. 

In [6]:
nn.proof()

0 <http://franz.com/llm/TreatiseOfHumanNature_1186> 0.8206376
private education of parents, contribute to the giving us a sense of honour and duty in the strict
regulation of our actions with regard to the properties of others.     SECT. VII OF THE ORIGIN OF
GOVERNMENT   Nothing is more certain, than that men are, in a great measure, governed by interest,
and that even when they extend their concern beyond themselves, it is not to any great distance; nor
is it usual for them, in common life, to look farther than their nearest friends and acquaintance.
It is no less certain, that it is impossible for men to consult, their interest in so effectual a
manner, as by an universal and inflexible observance of the rules of justice, by which alone they
can preserve society, and keep themselves from falling into that wretched and savage condition,
which is commonly represented as the state of nature. And as this interest, which all men have in
the upholding of society, and the observation of the

The `add_neighbors_to_graph` method creates a connection between each of the "neighbors". It does this with the following steps:
- It creates a node for the request, which stores the phrase, the minimum matching score, and when the query was run.
- It creates a blank node for each neighbor, which stores the matching score and the index.
- It connects each blank node to its respective text chunk.

In [7]:
nn.add_neighbors_to_graph()
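
The shape of the resulting subgraph can be sketched with plain tuples. Everything named here is hypothetical: the predicate names (`phrase`, `minScore`, `ranAt`, `link`, `score`, `index`, `neighbor`), the `request:` prefix, and the `_:b` blank-node labels are placeholders for illustration only. The actual vocabulary is whatever `llm_utils.py` defines.

```python
from datetime import datetime, timezone

def neighbor_triples(request_id, phrase, min_score, neighbors, now=None):
    """Return (subject, predicate, object) tuples linking a request to its neighbors.

    `neighbors` is a list of (chunk_uri, score) pairs, best match first.
    """
    now = now or datetime.now(timezone.utc)
    request = f"request:{request_id}"
    # 1. The request node stores the phrase, the score threshold, and a timestamp.
    triples = [
        (request, "phrase", phrase),
        (request, "minScore", min_score),
        (request, "ranAt", now.isoformat()),
    ]
    for index, (chunk_uri, score) in enumerate(neighbors):
        blank = f"_:b{index}"
        # 2. One blank node per neighbor stores the matching score and the rank.
        triples += [(request, "link", blank), (blank, "score", score), (blank, "index", index)]
        # 3. The blank node points at the matched text chunk.
        triples.append((blank, "neighbor", chunk_uri))
    return triples
```

Routing each match through its own blank node keeps the score attached to one specific request, so the same text chunk can be a neighbor of many different phrases without the scores colliding.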

The following image was generated from an older example, so yours will look slightly different.

![nearest neighbors](images/nearestneighbors.png)

In [6]:
nn = FindNearestNeighbors(conn, 'What is Human Nature', 'philosophy', number=10, confidence=.2)

which will bear the examination of the latest posterity. For my part, my only hope is, that I may
contribute a little to the advancement of knowledge, by giving in some particulars a different turn
to the speculations of philosophers, and pointing out to them more distinctly those subjects, where
alone they can expect assurance and conviction. Human Nature is the only science of man; and yet has
been hitherto the most neglected. It will be sufficient for me, if I can bring it a little more into
fashion; and the hope of this serves to compose my temper from that spleen, and invigorate it from
that indolence, which sometimes prevail upon me. If the reader finds himself in the same easy
disposition, let him follow me in my future speculations. If not, let him follow his inclination,
and wait the returns of application and good humour. The conduct of a man, who studies philosophy in
this careless manner, is more truly sceptical than that of one, who feeling in himself


In [7]:
nn.proof()

0 <http://franz.com/llm/TreatiseOfHumanNature_618> 0.8471893
which will bear the examination of the latest posterity. For my part, my only hope is, that I may
contribute a little to the advancement of knowledge, by giving in some particulars a different turn
to the speculations of philosophers, and pointing out to them more distinctly those subjects, where
alone they can expect assurance and conviction. Human Nature is the only science of man; and yet has
been hitherto the most neglected. It will be sufficient for me, if I can bring it a little more into
fashion; and the hope of this serves to compose my temper from that spleen, and invigorate it from
that indolence, which sometimes prevail upon me. If the reader finds himself in the same easy
disposition, let him follow me in my future speculations. If not, let him follow his inclination,
and wait the returns of application and good humour. The conduct of a man, who studies philosophy in
this careless manner, is more truly sceptical t

In [8]:
nn = FindNearestNeighbors(conn, 'Philosopher King', 'philosophy', confidence=.8)

State, and until a like necessity be laid on the State to obey them; or until kings, or if not
kings, the sons of kings or princes, are divinely inspired with a true love of true philosophy. That
either or both of these alternatives are impossible, I see no reason to affirm: if they were so, we
might indeed be justly ridiculed as dreamers and visionaries. Am I not right?  Quite right.  If
then, in the countless ages of the past, or at the present hour in some foreign clime which is far
away and beyond our ken, the perfected philosopher is or has been or hereafter shall be compelled by
a superior power to have the charge of the State, we are ready to assert to the death, that this our
constitution has been, and is—yea, and will be whenever the Muse of Philosophy is queen. There is no
impossibility in all this; that there is a difficulty, we acknowledge ourselves.  My opinion agrees
with yours, he said.


### askMyDocuments SPARQL Query

This magic predicate makes ChatGPT read the topN nearest neighbors found by `llm:nearestNeighbor` and then give an answer using only the output of that function. The syntax of this magic predicate is as follows; see also documentation <here>:
```
(?response ?score ?citation ?content) llm:askMyDocuments (?query ?vectorDatabase ?topN ?minScore)
```

In [8]:
query_string = """
    PREFIX llm: <http://franz.com/ns/allegrograph/8.0.0/llm/>
    select * where {
        (?response ?score ?citation ?content) llm:askMyDocuments ("cite two opposing views on government" "philosophy" 10 .5) }"""
with conn.executeTupleQuery(query_string) as result:
    df = result.toPandas()
df.head()

Unnamed: 0,citation,content,response,score
0,<http://franz.com/llm/TheRepublic_483>,"regarded from another point of view, is a mili...",One perspective views the government as a nece...,0.824885
1,<http://franz.com/llm/TreatiseOfHumanNature_1227>,their history: and nothing but the most violen...,One perspective views the government as a nece...,0.824368


We have created another class as an example that shows some possible functionality. Again, the code for this can be found in `llm_utils.py`. Creating an `AskMyDocuments` object always prints the response, for ease of use in this notebook. The arguments are as follows:
- `conn` - the connection object
- `question` - the question to your documents
- `vector_db` - the vector database where indexed text is stored
- `number` - the maximum number of responses
- `confidence` - the minimum matching score

In [8]:
response = AskMyDocuments(conn, 'cite two opposing views on government', 'philosophy', number=20, confidence=0)

RESPONSE: One view of government, as discussed in 'The Republic' (http://franz.com/llm/TheRepublic_483), is that it is all-encompassing and absorbs all other desires and affections of the citizens. It is seen as an institution that provides stability and order, focusing primarily on war and philosophy. In times of peace, citizens' duties to the State, which are also their duties to one another, take up their whole life and time. 

An opposing view from 'Treatise Of Human Nature' (http://franz.com/llm/TreatiseOfHumanNature_1227) argues that government is a human invention for the interest of society, and when the tyranny of the governor removes this interest, it also removes the natural obligation to obedience. This view suggests that resistance is not only justified but also morally obligatory in instances of tyranny and oppression. 

In summary, one view sees government as a necessary and omnipresent force for order and stability, while the other sees it as a human construct that can 

The `df` attribute simply gives the user access to the complete response of the SPARQL query.

In [5]:
response.df

Unnamed: 0,citation,content,response,score
0,<http://franz.com/llm/TheRepublic_483>,"regarded from another point of view, is a mili...","One perspective on government, as derived from...",0.824915
1,<http://franz.com/llm/TreatiseOfHumanNature_1227>,their history: and nothing but the most violen...,"One perspective on government, as derived from...",0.824385
2,<http://franz.com/llm/SecondTreatiseOfGovernme...,"government, and do either promote, or not, wha...","One perspective on government, as derived from...",0.823392


As in the `llm:nearestNeighbor` example, we also created a method that links all evidence chunks of the response to a newly created response object, which stores the metadata of the `AskMyDocuments` class.

In [7]:
response.add_evidence_to_graph()

![ask my documents](images/ask_my_documents.png)

In [10]:
response = AskMyDocuments(conn, 'What is the purpose of humanity', 'philosophy')

RESPONSE: The purpose of humanity, as gleaned from these texts, seems to encompass several intertwined facets. One key purpose is the continued enlightenment and growth of the human mind, striving for a more perfect realization of our present life and the potential future of the human race. This involves a persistent pursuit of the common interest, facilitated by education and the breaking away from traditional constraints. Furthermore, humanity is also tasked with the moral obligation of ensuring the happiness and well-being of all individuals, with the highest and noblest motivations rooted in virtue and goodness. This moral obligation extends beyond mere earthly utility, as humanity is gifted with a consciousness of probity and a moral law that transcends immediate advantageous consequences. Finally, there's an inherent responsibility towards self-regulation and moderation in our appetites and health, maintaining a delicate balance between our intellectual and sensory experiences.


In [9]:
response = AskMyDocuments(conn, "What state are humans naturally in", 'philosophy')

RESPONSE: Humans are naturally in a state of perfect freedom and equality, able to order their actions and dispose of their possessions as they see fit within the bounds of natural law, without requiring permission from or dependence on any other person. This is the 'state of nature'. However, this condition does not seem to last for a significant period, as humans tend to form societies and enter into a 'state of society' where there are agreed rules and governance. This transition is driven by the drawbacks of the state of nature, including the lack of a common superior or authority to judge between individuals, making the enforcement of rights and justice problematic.


## Asking Questions of a Contract

In this example we will ask questions of a contract and show the section(s) of the contract where the answers can be found. [Here](https://sccrtc.org/wp-content/uploads/2010/09/SampleContract-Shuttle.pdf) is a link to the contract in question.

We start by creating a new repository and adding the previously parsed contract triples (please add your parameters to connect to the server).

In [13]:
conn = ag_connect('llm-contracts')
conn.addFile('contract.nt')
conn.size()

1024

![contract gruff image](images/contract-compensation.png)

Now we have to index the text again. We define `contract.def` as follows:
```
gpt
 openai-api-key "your-openai-api-key-here"
 vector-database-name "contract"
 vector-database-dim 1536
 limit 1000000
 splitter list
 include-predicates <http://franz.com/hasContent>
```

And then again we run the `agtool llm index` command as follows:

```shell
./agtool llm index localhost:10035/llm-contracts ../../demos/llm/contract.def 
```

Once all text has been indexed we can start asking the document questions!

In [14]:
response = AskMyDocuments(conn, 'Can we pay the consultant a bonus?', 'contract')

RESPONSE: Based on the provided content, there is no explicit prohibition or allowance for paying the consultant a bonus in the agreement. The compensation is stated to be for the consultant accomplishing a certain result (2), and progress payments are made based on satisfactory services provided and actual allowable incurred costs (2._E). However, the consultant is deemed an independent consultant and not an employee (11), and the agreement does not mention any employee benefits or bonuses. Any additional compensation would likely need to be detailed and agreed upon in a separate arrangement or amendment to the existing agreement, ensuring that it does not fall under rebates, kickbacks or other unlawful consideration (22).


In [4]:
response.proof()

0 0.8274215 <http://franz.com/2.>  2. COMPENSATION. In consideration for CONSULTANT accomplishing said result, COMMISSION agrees to pay CONSULTANT as follows: 

1 0.7981638 <http://franz.com/2._E.>  E. Progress payments will be made no less than monthly in arrears based on satisfactory services provided and actual allowable incurred costs. A pro rata portion of the CONSULTANTs fixed fee, if applicable, will be included in the monthly progress payments. If CONSULTANT fails to submit the required  Page 2 deliverable items according to the schedule set forth in the Scope of Services, the COMMISSION may delay payment and/or terminate this Agreement in accordance with the provisions of Section 4 of this Agreement. 

2 0.8200693 <http://franz.com/22.>  22. REBATES, KICKBACKS OR OTHER UNLAWFUL CONSIDERATION. The CONSULTANT warrants that this Agreement was not obtained or secured through rebates kickbacks or other unlawful consideration, either promised or paid to any COMMISSION employee. For 
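
The citation URIs printed above appear to encode the contract section in their last path segment (`2.`, `2._E.`, `22.`). Assuming that pattern holds for the rest of the contract triples, a short helper can turn a citation into a readable section label. `section_label` is a hypothetical name, and the pattern is inferred only from the URIs shown in this notebook.

```python
def section_label(citation):
    """Turn a citation such as <http://franz.com/2._E.> into a label like '2.E'."""
    local = citation.strip("<>").rsplit("/", 1)[-1]  # e.g. '2._E.'
    # Drop the underscore separators and the empty piece after the trailing dot.
    parts = [part for part in local.replace("_", "").split(".") if part]
    return ".".join(parts)

for uri in ["<http://franz.com/2.>", "<http://franz.com/2._E.>", "<http://franz.com/22.>"]:
    print(uri, "->", section_label(uri))
```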

In [5]:
response.add_evidence_to_graph()

![ask my contract](images/ask_my_contract.png)

In [6]:
response = AskMyDocuments(conn, "What should the consultant submit with each invoice?", "contract")

RESPONSE: With each invoice, the consultant should submit a written progress report detailing their performance, any interim findings, difficulties encountered, and proposed remedies. The invoice should be itemized and include information such as labor details (staff name, hours charged, hourly billing rate, current charges and cumulative charges), expenses incurred during the billing period, total invoice/payment requested, and total amount previously paid under the agreement. The consultant should also provide a report of expenditures for each task, subtask or milestone, and estimate the percentage completion for each division of work. Along with the invoice, the consultant should also submit any deliverables as per the agreement. If there are any insurance-related payments due, these should also be submitted. All information derived from these deliverables is deemed confidential and may not be disclosed to any other party without prior written consent of the commission.


In [7]:
response = AskMyDocuments(conn, "A third party sued the contractor and tried to collect money from the city.", "contract")

RESPONSE: In such a situation, the contractor, known as the consultant in the content, is expected to defend and hold harmless the Commission, which includes its officers, agents, employees and volunteers from any legal actions as stipulated in the 'Indemnification for Damages, Taxes and Contributions' clause. This means the contractor would be responsible for any legal ramifications. Additionally, if the contractor is found in default of the agreement for any reason, the Commission has the right to terminate the agreement without liability as per the 'Termination' clause. It is also important to note that all activities and files related to the project are subject to review and inspection by the Commission, the state, and the Federal Highway Administration if federal participating funds are used. This is to ensure transparency and compliance with all laws and regulations.


In [8]:
response.proof()

0 0.77711004 <http://franz.com/5.>  5. INDEMNIFICATION FOR DAMAGES, TAXES AND CONTRIBUTIONS. CONSULTANT shall exonerate, indemnify, defend, and hold harmless the COMMISSION (which for the purpose of this Agreement shall include, without limitation, its officers, agents, employees and volunteers) from and against: 

1 0.77066624 <http://franz.com/4._B.>  B. COMMISSION may terminate this Agreement for CONSULTANT's default if a federal or state proceeding for the relief of debtors is undertaken by or against CONSULTANT, or CONSULTANT's  Page 3 principal, or if CONSULTANT or CONSULTANT's principal makes an assignment for the benefit of creditors, or if CONSULTANT breaches any term(s) or violates any provision(s) of this Agreement and does not cure such breach or violation within ten (10) days after written notice thereof by COMMISSION. CONSULTANT shall be liable for any and all reasonable costs incurred by COMMISSION as a result of such default, including but not limited to reprocurement c

In [None]:
def print_text(string: str):
    """Print `string` wrapped to 100 columns, with line breaks removed first."""
    string = string.replace('\n', '').replace('\r', '')
    wrapper = textwrap.TextWrapper(width=100)
    word_list = wrapper.wrap(text=string)
    for element in word_list:
        print(element)
