We start with a general initialization of our system: the IP address and the ports of the embedding server and the LLM server.

In [11]:
ip = '127.0.0.1'
embedding_port = '8082'
llm_port = '8081'

In the first step we import a few libraries:
- *wrap* from textwrap splits the original text into chunks of at most a given number of characters
- *requests* allows us to perform REST API queries
- *json* allows us to format REST payloads
- *numpy* provides us with handy math and array functions
- and we also need the square root *sqrt* for vector normalizations

In [13]:
from textwrap import wrap
import requests
import json
import numpy as np
from math import sqrt

# read the whole article; the file is closed automatically
with open('s12879-022-07821-5.txt','r') as f:
    text = f.read()
wrapped_text = wrap(text,1000)

We wrapped the text into chunks containing a maximum of 1000 characters each. *wrap* only breaks at whitespace, so most chunks are slightly shorter than 1000 characters. We have obtained 53 such chunks.

In [14]:
len(wrapped_text)

53
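To illustrate how *wrap* chunks a text, here is a minimal sketch on made-up input (the sample string and the width of 40 are assumptions chosen for illustration, not the article text). By default *wrap* also replaces newlines and tabs with single spaces before splitting.

```python
from textwrap import wrap

# Made-up sample text: the word "word" repeated 50 times.
sample = "word " * 50

# Split into chunks of at most 40 characters, breaking only at whitespace.
chunks = wrap(sample, 40)

# No chunk is longer than the requested width ...
print(max(len(chunk) for chunk in chunks))
# ... and no word has been cut in half.
print(all(set(chunk.split()) == {"word"} for chunk in chunks))
```

Joining the chunks with single spaces gives back the original words in order, which is why the chunks can later be pasted into a prompt as readable passages.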

In the next step we query our embedding server with one of the text chunks and look at the response that we get.

In [15]:
url = 'http://' + ip + ':' + embedding_port + '/embeddings'

payload = { 'content': wrapped_text[0] }
resp = requests.post(url,json=payload)
embedding_json = resp.json()
embedding_json

[{'index': 0,
  'embedding': [[-0.04028487205505371,
    0.04802819713950157,
    -0.02903229370713234,
    -0.06878755241632462,
    0.11609193682670593,
    -0.06789445877075195,
    -0.0677323266863823,
    0.0940631628036499,
    0.050849925726652145,
    0.06822194159030914,
    0.009372207336127758,
    -0.012343090958893299,
    0.01357260625809431,
    0.04474422708153725,
    0.034388575702905655,
    -0.04073406755924225,
    -0.10449543595314026,
    -0.10939788073301315,
    -0.030441967770457268,
    0.008723844774067402,
    -0.016679581254720688,
    0.06274183094501495,
    0.03222104534506798,
    -0.032788459211587906,
    -0.07420777529478073,
    0.0008840003283694386,
    -0.07952026277780533,
    0.02536260522902012,
    -0.011875877156853676,
    0.003985620103776455,
    0.027599969878792763,
    0.043289702385663986,
    -0.10317948460578918,
    0.010762388817965984,
    -0.022555503994226456,
    -0.05548817664384842,
    -0.014536811038851738,
    0.03213461

Our first goal is to create a vector database from the text chunks. To this end we create an embedding for every text chunk that we have obtained. We pack each embedding into a numpy.array (vector). These embeddings are added to a list called embeddings.

In [16]:
import requests
import json
import numpy as np 

url = 'http://' + ip + ':' + embedding_port + '/embeddings'

embeddings = []
for text in wrapped_text:
    payload = { 'content': text }
    resp = requests.post(url,json=payload)
    embedding_json = resp.json()
    embedding = np.array(embedding_json[0]['embedding'][0])
    embeddings.append(embedding)
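The loop above indexes directly into the response. As a hedged sketch, the helper below (the name `extract_embedding` is hypothetical) checks that a response has the shape shown in the output above, `[{'index': 0, 'embedding': [[...]]}]`, before converting it. Other servers or server versions may return a different layout, so the checks are an assumption tied to this particular response.

```python
import numpy as np

def extract_embedding(response_json):
    """Return the embedding vector from a response shaped like
    [{'index': 0, 'embedding': [[...]]}] (assumed layout)."""
    if not isinstance(response_json, list) or not response_json:
        raise ValueError("expected a non-empty list in the response")
    vectors = response_json[0].get("embedding")
    if not vectors:
        raise ValueError("response contains no embedding")
    # The server nests the vector in an outer list; take the first entry.
    return np.asarray(vectors[0], dtype=float)

# Made-up response with a 3-dimensional embedding, for illustration only.
fake_response = [{"index": 0, "embedding": [[0.1, -0.2, 0.3]]}]
vector = extract_embedding(fake_response)
print(vector.shape)
```

Failing early with a clear message is easier to debug than an IndexError in the middle of a loop over 53 chunks.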

We see that each embedding vector contains 384 values.

In [17]:
len(embeddings[0])

384

Since we have obtained a single embedding for each text chunk, we have a
total of 53 embedding vectors.

In [18]:
len(embeddings)

53

For simplicity, our vector database is represented as a pandas DataFrame.

In [19]:
import pandas as pd

vector_data_base = pd.DataFrame({ 'text': wrapped_text,
                                  'embeddings': embeddings })

In [20]:
vector_data_base.head()

Unnamed: 0,text,embeddings
0,(2022) 22:815 Haschka et al. BMC Infectious Di...,"[-0.04028487205505371, 0.04802819713950157, -0..."
1,variant model accounting for cross-immunity th...,"[0.032607097178697586, 0.05761970952153206, 0...."
2,"R̂omicron , was found to range between 1.51 an...","[0.014789286069571972, -0.013547075912356377, ..."
3,December 2019 and subsequently spread all over...,"[-0.041069965809583664, 0.0405275784432888, -0..."
4,"holder. To view a copy of this licence, visit ...","[-0.08132666349411011, 0.008967764675617218, -..."


In [21]:
vector_data_base['embeddings'][0]

array([-4.02848721e-02,  4.80281971e-02, -2.90322937e-02, -6.87875524e-02,
        1.16091937e-01, -6.78944588e-02, -6.77323267e-02,  9.40631628e-02,
        5.08499257e-02,  6.82219416e-02,  9.37220734e-03, -1.23430910e-02,
        1.35726063e-02,  4.47442271e-02,  3.43885757e-02, -4.07340676e-02,
       -1.04495436e-01, -1.09397881e-01, -3.04419678e-02,  8.72384477e-03,
       -1.66795813e-02,  6.27418309e-02,  3.22210453e-02, -3.27884592e-02,
       -7.42077753e-02,  8.84000328e-04, -7.95202628e-02,  2.53626052e-02,
       -1.18758772e-02,  3.98562010e-03,  2.75999699e-02,  4.32897024e-02,
       -1.03179485e-01,  1.07623888e-02, -2.25555040e-02, -5.54881766e-02,
       -1.45368110e-02,  3.21346149e-02, -4.09230143e-02,  4.85786237e-02,
        7.10645989e-02, -1.77366249e-02,  3.98786850e-02, -3.51186283e-02,
       -1.45529043e-02, -3.84208001e-02, -8.25383663e-02,  2.68522352e-02,
        3.74357961e-03, -2.62036324e-02, -2.66808756e-02,  1.12981992e-02,
       -6.22144453e-02,  

Now we need to query the vector database.
In order to perform such a query we create a few utility functions.
We define the cosine distance. The query should return the n passages
of the article text whose embeddings have the shortest cosine distances
to a given embedding.

In [22]:
def cosine_distance(a,b):
    # 1 - cos(angle between a and b); 0 means both point in the same direction
    return float(1. - np.dot(a,b) / (sqrt(np.dot(a,a)) * sqrt(np.dot(b,b))))

def best_answers(n,query,database):
    # distance of every stored embedding to the query embedding
    distances = [cosine_distance(embedding,query)
                 for embedding in database['embeddings']]
    # work on a copy so that the DataFrame passed in is not modified
    local_db = database.copy()
    local_db['distances'] = distances
    local_db = local_db.nsmallest(n,'distances')
    return list(local_db['text'])
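As a small worked example of the cosine distance and the nearest-passage lookup, the sketch below ranks four made-up 2-dimensional vectors against a query. The function name `rank_by_cosine_distance` and the toy vectors are assumptions for illustration. It computes all distances in one vectorized step instead of a Python loop, which is an alternative to the loop above rather than the notebook's own method.

```python
import numpy as np

def rank_by_cosine_distance(query, matrix, n):
    """Return the indices and distances of the n rows of `matrix`
    with the smallest cosine distance to `query`."""
    matrix = np.asarray(matrix, dtype=float)
    query = np.asarray(query, dtype=float)
    # Cosine similarity of every row with the query.
    similarities = matrix @ query / (
        np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    distances = 1.0 - similarities
    order = np.argsort(distances)[:n]
    return order, distances[order]

# Toy data: orthogonal, parallel, at 45 degrees, and opposite to the query.
toy_vectors = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0], [-1.0, 0.0]]
toy_query = [1.0, 0.0]

indices, dists = rank_by_cosine_distance(toy_query, toy_vectors, 2)
print(indices, dists)
```

The parallel vector comes first with distance 0 even though its length is 2: the cosine distance depends only on direction, not on magnitude. The opposite vector has the maximum distance of 2.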

We also have to encode a question given to the LLM into an embedding. This embedding is used to find semantically close passages in our article text.
If the query to our RAG-enabled LLM is longer than 1000 characters, it is split into several chunks, each of which yields its own embedding. In that case we add the embeddings up and renormalize the sum.

In [23]:
def encode_into_single_embedding(intext):
    embeddings = []
    wrapped_text = wrap(intext,1000)

    # one embedding per chunk; url is the embedding endpoint defined above
    for text in wrapped_text:
        payload = { 'content': text }
        resp = requests.post(url,json=payload)
        embedding_json = resp.json()
        embedding = np.array(embedding_json[0]['embedding'][0])
        embeddings.append(embedding)

    # add up the embeddings of all chunks
    embeddings_length = len(embeddings[0])
    added_embeddings = np.zeros(embeddings_length)

    for i in range(len(embeddings)):
        added_embeddings = np.add(added_embeddings,embeddings[i])

    # renormalize to unit length: divide by the Euclidean norm
    added_embeddings = added_embeddings / sqrt(np.dot(added_embeddings,
                                                      added_embeddings))
    return(added_embeddings)
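The add-and-renormalize step can be checked without a server. In the minimal sketch below, renormalizing is taken to mean dividing the summed vector by its Euclidean norm, so that the result has length 1. The helper name `combine_embeddings` and the 2-dimensional input vectors are made up for illustration.

```python
import numpy as np

def combine_embeddings(vectors):
    """Sum a list of embedding vectors and scale the sum to unit length."""
    total = np.sum(np.asarray(vectors, dtype=float), axis=0)
    norm = np.linalg.norm(total)  # Euclidean length, sqrt(total . total)
    if norm == 0.0:
        raise ValueError("embeddings cancel out; cannot normalize")
    return total / norm

# Two made-up unit vectors standing in for the embeddings of two chunks.
parts = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
combined = combine_embeddings(parts)
print(combined, np.linalg.norm(combined))
```

Since the cosine distance ignores vector length, the scaling does not change which passages are retrieved. It mainly keeps the combined query embedding on the same scale as the embeddings of single chunks.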

In [24]:
encoded = encode_into_single_embedding("What is the article's title")
encoded

array([-2.91900433e-02,  9.84097507e-02,  1.69230869e-02,  7.30054163e-02,
        1.21640704e-02,  2.23560112e-02, -4.12694323e-03,  1.22863540e-02,
        8.77115140e-03,  1.10309628e-02, -1.43572978e-02,  5.87994463e-02,
       -3.54272355e-02, -1.38455379e-02, -2.48810130e-02, -9.81730606e-03,
       -2.84182582e-02,  3.84686221e-02,  2.71460063e-02,  5.72059496e-02,
       -3.19690969e-02,  4.93710717e-02,  2.83320270e-02, -9.28254809e-03,
       -2.68279447e-02,  9.88050934e-03, -3.22751220e-02, -6.11688716e-03,
       -6.34040914e-02, -5.42492128e-02,  2.50429966e-03,  1.93842037e-02,
        8.07313034e-02,  3.97206959e-02, -4.49994910e-03, -4.05552131e-02,
       -9.06956858e-03, -1.00290152e-02,  9.76103481e-02,  5.16673817e-02,
        5.25072782e-02, -1.60725342e-01, -7.07712219e-03, -2.51862652e-02,
        9.63989470e-03,  5.44269613e-02, -2.31511819e-02, -1.67143421e-02,
       -8.34758664e-02,  3.72869973e-02, -6.78082638e-02, -7.16736399e-02,
       -7.29872481e-03, -

In [25]:
uberbest = best_answers(5, encoded, vector_data_base)
uberbest[1]

'data initiative https://\u200bwww.\u200bdata.\u200bgouv.\u200bfr, and as such this section is not applicable. Consent for publication where human participants can be identified: Not applicable, see above. Consent for publication The authors declare that they all agree with the publication of this paper. No further consent is needed as no further identifiable individuals have participated in this study, and the data used for this study was made publicly available by Santé Publique France and the French open data initiative https://\u200b www.\u200bdata.\u200bgouv.\u200bfr.  References 1. Ge Y, Zhang W-B, Liu H, Ruktanonchai CW, Hu M, Wu X, Song Y, Ruktanonchai NW, Yan W, Cleary E, Feng L, Li Z, Yang W, Liu M, Tatem AJ, Wang J-F, Lai S. Impacts of worldwide individual non-pharmaceutical interventions on COVID-19 transmission across waves and space. Int J Appl Earth Obs Geoinf. 2022;106: 102649. https://\u200bdoi.\u200borg/\u200b10.\u200b1016/j.\u200bjag.\u200b2021.\u200b102649. 2. Viana

RAG is essentially an advanced form of prompt engineering, so we have to
build a prompt ourselves. To build it we use the following helper strings,
which contain the LLM's special tokens. To understand how the prompt should
be structured, it helps to look at Llama 3's chat template.

In [26]:
system_msg_start = \
    "<|start_header_id|>system<|end_header_id|> \n\n" + \
    "You are a helpful AI assistent in answering prompt, " + \
    "taking the following contexts into account " + \
    "as good as you can as you answer. \n"

user_msg_start = \
    "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"

# the role name in Llama 3's chat template is spelled "assistant"
assistent_msg_start = \
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

prompt_counter = 0
prompt = ""
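For reference, the sketch below assembles a conversation in the Llama 3 chat format: each message is a role header, a blank line, the content, and `<|eot_id|>`, and the prompt ends with an open assistant header that cues the model to answer. The function name `format_llama3_prompt` is hypothetical. Whether `<|begin_of_text|>` has to be written explicitly is an assumption that depends on the server, since some servers add this token themselves.

```python
def format_llama3_prompt(system, turns, add_bos=True):
    """Format a system message and a list of (role, content) turns
    in the Llama 3 chat layout. Roles must be 'user' or 'assistant'."""
    def block(role, content):
        return ("<|start_header_id|>" + role + "<|end_header_id|>\n\n"
                + content + "<|eot_id|>")

    # add_bos is an assumption: set it to False if the server adds the token.
    prompt = "<|begin_of_text|>" if add_bos else ""
    prompt += block("system", system)
    for role, content in turns:
        if role not in ("user", "assistant"):
            raise ValueError("unknown role: " + role)
        prompt += block(role, content)
    # Open assistant header: the model continues from here.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

example = format_llama3_prompt(
    "Answer using the given context.",
    [("user", "What is the article's title?")])
print(example)
```

The helper strings above follow the same layout, except that each `<|eot_id|>` is attached to the start of the next header string instead of the end of the previous message.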

The prompt contains the passages retrieved for the *first* query. It further contains the chat history, so that the LLM knows its previous answers. With the above utility functions and strings at hand, we build the prompt as shown below. On later calls the same function appends the new user message to the existing prompt, so the preceding conversation (*history*) is preserved.

In [27]:
def build_prompt_from_query(n,prompt,user_query):

    global prompt_counter
    if prompt_counter == 0:
        encoded = encode_into_single_embedding(user_query)
        uberbest = best_answers(n, encoded, vector_data_base)

        prompt = system_msg_start
        for i in range(n):
            prompt = prompt + '\n' + uberbest[i] + '\n'

        prompt = prompt + user_msg_start + '\n'
        prompt = prompt + user_query + '\n'
        prompt = prompt + assistent_msg_start + '\n'

    else:
        prompt = prompt + user_msg_start + '\n'
        prompt = prompt + user_query + '\n'
        prompt = prompt + assistent_msg_start + '\n'

    prompt_counter = prompt_counter + 1
    return(prompt)

def add_llm_answer(prompt, llm_answer):
    prompt = prompt + llm_answer + '\n'
    return(prompt)

Next we would like to get a response from the LLM to our engineered prompts. To query the LLM and return the result we use the following utility function.

In [28]:
def get_llm_response(n,tokens,prompt):

    url = 'http://' + ip + ':' + llm_port + '/completion'
    payload = { "prompt": str(prompt),
                "n_predict": int(tokens),
                "stream": False }

    resp = requests.post(url,json=payload)
    answer = resp.json()

    return(answer['content'])
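The function above assumes that the server answers and that the reply contains a `content` field. A hedged variant with basic error handling is sketched below. The name `get_llm_response_checked`, the `post` parameter and the timeout value are assumptions for illustration. The `post` argument is expected to behave like `requests.post`, which also makes the function testable without a running server.

```python
def get_llm_response_checked(post, url, prompt, tokens, timeout=120.0):
    """Send a completion request and return the generated text.

    `post` is any callable with the interface of requests.post.
    """
    payload = {"prompt": str(prompt),
               "n_predict": int(tokens),
               "stream": False}
    resp = post(url, json=payload, timeout=timeout)
    # Raise an exception on HTTP error codes instead of parsing an error page.
    resp.raise_for_status()
    answer = resp.json()
    if "content" not in answer:
        raise KeyError("response has no 'content' field")
    return answer["content"]

# Usage with the real server (not executed here):
# import requests
# text = get_llm_response_checked(requests.post, url, prompt, -1)
```

A timeout is worth setting because generation with `n_predict` set to -1 is not limited to a fixed number of tokens, so a stuck request would otherwise block the notebook cell indefinitely.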

Finally we put everything together in the chat function, which we can call multiple times in order to converse with our LLM.

In [29]:
def chat(n,usermsg,prompt):
    prompt = build_prompt_from_query(n,prompt,usermsg)
    llm_answer = get_llm_response(n, -1, prompt)
    prompt = add_llm_answer(prompt,llm_answer)
    print(llm_answer)
    return(prompt)

We have now arrived at the final cell, which we can execute multiple times to converse with the LLM.

In [30]:
n = 5 # documents to be retrieved for the rag
your_message = "What do you know about the OMICRON Variant in France"
prompt = chat(n,your_message,prompt)

Based on the provided text, here's what I know about the OMICRON variant in France:

1. **Detection**: The detection of the OMICRON variant in France was noticed at the start of December 2021.
2. **Rapid growth**: Early assessments of its dynamics pointed to a rapid growth and a substantial spreading advantage over the delta variant, which was the circulating variant in France at that time.
3. **High infection rates**: The rate of daily detected cases underwent unprecedented growth, with over 300,000 detected cases per day registered in the first half of January 2022.
4. **Circulation**: The OMICRON variant, specifically the lineage BA.1, was found to be circulating in all regions of metropolitan France within three weeks of its introduction.
5. **Co-circulation**: The OMICRON variant co-circulated with the delta and beta/gamma variants in France.
6. **Surveillance**: A dedicated surveillance protocol was established in France to detect the OMICRON variant, which targeted a set of muta

In [31]:
prompt = chat(n,"Tell me more about Point 6 - Mutations!", prompt)

Based on the provided text, here's more information about the mutations related to the OMICRON variant in France:

1. **Initial set of mutations**: The initial set of mutations targeted by the surveillance protocol in France was specific to the OMICRON variant. However, this set was soon updated.
2. **Updated set of mutations**: The updated set of mutations in late December 2021 included:
	* Deletion of site 69/70
	* Substitutions K417N and/or S371L-S373P and/or Q493R
3. **L452R mutation**: The L452R mutation was used as an indicator to describe the delta variant, but not the OMICRON variant. However, the OMICRON lineage BA.5 could exert the L452R mutation, but it was not present in metropolitan France in the investigated period.
4. **Alternative indicator**: In the sensitivity analysis, the alternative indicator was used to describe the OMICRON variant, which included the absence of the L452R mutation.

These mutations were used to develop a surveillance protocol to detect the OMICRON

In [32]:
prompt

"<|start_header_id|>system<|end_header_id|> \n\nYou are a helpful AI assistent in answering prompt, taking the following contexts into account as good as you can as you answer. \n\npossibilities. C As an illustration, percentages of delta and omicron obtained from Santé Publique France for the Ile-de-France region with the 20 day window of opportunity around the inflection point as it has been chosen for this modelling study  during the period from the end of November and beginning of January when such an indicator was adjusted and gradually adopted throughout the French territory. Given our interest on the entire replacement period between the beginning of December and the end of January, we assumed in the baseline analysis omicron to be described by the absence of L452R and E484K mutations. Still, we considered the alternative indicator in the sensitivity analysis. Visual inspection of the time series between 1st December 2021 and 31th January 2022 reveals that the invasion of omicro

In [9]:
embedding_process.terminate()

In [10]:
llm_process.terminate()