# Business Understanding

Document parsers are a powerful tool that can help enterprises automate the process of extracting data from documents. This can save a significant amount of time and money, and can also help to improve the accuracy and efficiency of business processes.

Document parsers can be used to extract data from a wide variety of document types, including employee handbooks, catalogs, invoices, purchase orders, sales orders, shipping and delivery orders, form-based contracts, HR and admin documents, bank and credit card statements, fillable PDF forms, and Word documents.

LLMs are particularly powerful for querying against documents because they can understand the context of the documents and can generate responses that are relevant and informative. For example, if you ask an LLM to summarize the main points of a document, it will be able to identify the most important information in the document and present it in a concise and easy-to-understand way.

LLMs are still under development, but they have the potential to revolutionize the way we interact with documents. LLMs can help us to find information more quickly and easily, understand documents more deeply, and generate new content based on the information in documents.

The purpose of this exercise is to implement LLM question answering over local documents. In lieu of personal documents to parse, I have pulled in some freely available books from the fantastic Project Gutenberg:
> https://www.gutenberg.org/ebooks/search/?sort_order=downloads

We will try to implement an LLM querying system that answers natural language questions using only the documents provided.

**An Important Note**: Since these are well-known, freely available texts, the LLM might already be aware of their contents, so we will prompt the model to answer using only the text passed to it as part of the prompt.

# Import

In [6]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

import os      # needed for os.walk in the Document Load step
import pprint

# Document Load

In [7]:
for (root, folders, files) in os.walk(top = 'data'):
    
    print([f"{root}/{file}"  for file in files])
    textFiles = [TextLoader(file_path = f"{root}/{file}",autodetect_encoding=True).load() for file in files]

['data/AliceInWonderland.txt', 'data/DollsHouse.txt', 'data/Dracula.txt', 'data/Frankenstein.txt', 'data/LettersFromACat.txt', 'data/Metamorphosis.txt', 'data/PictureOfDorianGray.txt', 'data/PrideAndPrejudice.txt', 'data/RomeoAndJuliet.txt', 'data/ScarletLetter.txt']


In [8]:
for textFile in textFiles:
    print(textFile[0].metadata)

{'source': 'data/AliceInWonderland.txt'}
{'source': 'data/DollsHouse.txt'}
{'source': 'data/Dracula.txt'}
{'source': 'data/Frankenstein.txt'}
{'source': 'data/LettersFromACat.txt'}
{'source': 'data/Metamorphosis.txt'}
{'source': 'data/PictureOfDorianGray.txt'}
{'source': 'data/PrideAndPrejudice.txt'}
{'source': 'data/RomeoAndJuliet.txt'}
{'source': 'data/ScarletLetter.txt'}


The metadata can be better. Instead of having a giant string of the `file path`, I will instead convert it to `file location`, `file type` and `title`. I am most interested in keeping the `title`, the others I'll track primarily for posterity. This can provide one extra datapoint when we are trying to match our queries to the right file. 

In [9]:
def cleanMetaData(metadata):
    import re
    source = metadata['source']
    metadataStrSplit = re.split(r'/|\.', source)  # raw string so that \. is a regex escape, not a string escape

    fileType = metadataStrSplit.pop()
    title = metadataStrSplit.pop()
    fileLocation = '/'.join(metadataStrSplit)

    return {
        'file location':fileLocation,
        'file type':fileType,
        'file title':title
    }

for textFile in textFiles:
    metaDataKeys = textFile[0].metadata.keys()
    if 'source' in metaDataKeys:
        textFile[0].metadata = cleanMetaData(textFile[0].metadata)

In [10]:
for textFile in textFiles:
    print(textFile[0].metadata)

{'file location': 'data', 'file type': 'txt', 'file title': 'AliceInWonderland'}
{'file location': 'data', 'file type': 'txt', 'file title': 'DollsHouse'}
{'file location': 'data', 'file type': 'txt', 'file title': 'Dracula'}
{'file location': 'data', 'file type': 'txt', 'file title': 'Frankenstein'}
{'file location': 'data', 'file type': 'txt', 'file title': 'LettersFromACat'}
{'file location': 'data', 'file type': 'txt', 'file title': 'Metamorphosis'}
{'file location': 'data', 'file type': 'txt', 'file title': 'PictureOfDorianGray'}
{'file location': 'data', 'file type': 'txt', 'file title': 'PrideAndPrejudice'}
{'file location': 'data', 'file type': 'txt', 'file title': 'RomeoAndJuliet'}
{'file location': 'data', 'file type': 'txt', 'file title': 'ScarletLetter'}
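As a side note, the same split can be sketched with the standard library's `pathlib` instead of a regular expression. The function name `cleanMetaDataPathlib` is hypothetical and not part of the notebook; it assumes forward-slash paths like the ones printed above.

```python
from pathlib import PurePosixPath

def cleanMetaDataPathlib(metadata):
    """Split the 'source' path into location, type and title using pathlib."""
    path = PurePosixPath(metadata['source'])
    return {
        'file location': str(path.parent),       # everything before the file name
        'file type': path.suffix.lstrip('.'),    # extension without the leading dot
        'file title': path.stem,                 # file name without the extension
    }

print(cleanMetaDataPathlib({'source': 'data/AliceInWonderland.txt'}))
```

One difference worth knowing: a file name that itself contains a dot (say `data/Mr.Darcy.txt`) keeps its full title here, whereas splitting on every `.` would cut it in two.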


# Text Splitter: Chunkify

In [11]:
splitter = RecursiveCharacterTextSplitter(
    separators = ['\n\n','\n', '.',' '],
    keep_separator=False,
    chunk_size = 1000,
    chunk_overlap  = 100,
    length_function = len,
    is_separator_regex = False,
)

In [12]:
chunks = splitter.split_documents(documents=[textFile[0] for textFile in textFiles])

print(len(chunks))

5161
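To build some intuition for `chunk_size` and `chunk_overlap`, here is a deliberately simplified sketch that cuts a string into fixed-size windows. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter tries the separators in order and merges the pieces); the function name `slidingWindowChunks` is made up for illustration.

```python
def slidingWindowChunks(text, chunkSize, chunkOverlap):
    """Cut text into windows of chunkSize characters; neighbouring windows share chunkOverlap characters."""
    if not 0 <= chunkOverlap < chunkSize:
        raise ValueError("chunkOverlap must be non-negative and smaller than chunkSize")
    step = chunkSize - chunkOverlap   # how far each new window starts after the previous one
    return [text[start:start + chunkSize] for start in range(0, len(text), step)]

sample = "abcdefghijklmnopqrstuvwxyz"
for piece in slidingWindowChunks(sample, chunkSize=10, chunkOverlap=3):
    print(piece)
```

Each window starts with the last 3 characters of the previous one, so a sentence that straddles a boundary still appears mostly intact in at least one chunk.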


Let's have a look at one of the chunks.

In [60]:
def printChunkInfo(doc, strLen = 300):   
    """
    This function pretty prints a LangChain document by printing the first `strLen` characters of the page content along with the metadata.

    Args:
    doc: LangChain document object.
    strLen: The number of characters to print from the page_content string.
    """
    def pretty_print_dict(dict1):
        print('{')
        for key, value in dict1.items():
            print(f'  {key}: {value}')
        print('}')
    
    pagecontent = doc.page_content
    metadata    = doc.metadata
    
    print(f"Printing chunk (First {strLen} chars and the metadata)")
    print('-'*100)
    print(f"Page Content:\n{pagecontent[:strLen]}")
    print('-'*100)
    print(f"MetaData:")
    pretty_print_dict(metadata)

In [61]:
import random
idx = random.randrange(len(chunks))  # randint(0, len(chunks)) is inclusive and could return an out-of-range index
randomchunk = chunks[idx]

printChunkInfo(randomchunk)

Printing chunk (First 300 chars and the metadata)
----------------------------------------------------------------------------------------------------
Page Content:
I used my knowledge of this phase of spiritual pathology, and laid down
a rule that she should not be present with Lucy or think of her illness
more than was absolutely required. She assented readily, so readily that
I saw again the hand of Nature fighting for life. Van Helsing and I were
shown up t
----------------------------------------------------------------------------------------------------
MetaData:
{
  file location: data
  file type: txt
  file title: Dracula
}


Looks like we have our corpus ready: a list of chunks made from all the documents in our data folder. Now we could pass each and every `chunk` to the `LLM` and have it answer the question using each chunk as a source.

However, passing every chunk to the `LLM` has a number of drawbacks. Firstly, it is computationally expensive, as the model has to process a large amount of data. Secondly, it is time-consuming, as we have to make a large number of calls to the `LLM` API. Thirdly, it can be inaccurate, as the model may be misled by irrelevant chunks.

A better approach is to pass only the relevant chunks to the LLM. This reduces the number of calls to the LLM API, as well as the amount of data that the model has to process, which should make the approach more efficient and cost-effective. It should also improve accuracy, as the model is not misled by irrelevant chunks. We can then use the LLM to summarize the answers it gave for the top *n* chunks into a final answer.
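The flow described above (retrieve the relevant chunks, answer from each one, then summarize) can be sketched with plain functions. Everything here is a stand-in for illustration: `answerFromRelevantChunks` and the three `toy...` helpers are hypothetical names, and the toy retriever just counts shared words rather than doing anything an LLM or embedding model would do.

```python
def answerFromRelevantChunks(query, chunks, retrieve, answerFromChunk, summarize, n=5):
    """Retrieve the n most relevant chunks, answer from each one, then summarize the partial answers."""
    relevantChunks = retrieve(query, chunks, n)
    partialAnswers = [answerFromChunk(query, chunk) for chunk in relevantChunks]
    return summarize(query, partialAnswers)

# Toy stand-ins so the sketch runs without any model or API
def toyRetrieve(query, chunks, n):
    words = set(query.lower().split())
    ranked = sorted(chunks, key=lambda c: len(words & set(c.lower().split())), reverse=True)
    return ranked[:n]

def toyAnswer(query, chunk):
    return chunk.upper()          # a real system would call the LLM here

def toySummarize(query, answers):
    return " | ".join(answers)    # a real system would call the LLM here as well

corpus = ["the castle stood on a hill", "alice fell down the rabbit hole", "tea was served at noon"]
print(answerFromRelevantChunks("where is the castle", corpus, toyRetrieve, toyAnswer, toySummarize, n=1))
```

Only `n` chunks ever reach the answering step, which is where the savings in calls and tokens come from.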

There are a number of ways to find relevant chunks within a corpus. 

- We could have a `human-in-the-loop` system. In this system, a human would identify the relevant chunks, which would then be passed to the LLM. This is normally the least time-efficient solution, even for a modestly sized corpus.
- One common automated approach is to use `keyword search`. This involves searching the corpus for chunks that contain specific keywords or phrases. For example, if we are interested in finding chunks about the topic of "natural language processing," we could search the corpus for chunks that contain the keywords "natural language processing," "NLP," or "machine translation."
- The more modern approach is to use `embeddings similarity` measures. These measure how similar two pieces of text are by comparing their embedding vectors. For example, we could use `cosine similarity` to compare the embedding vectors of the two words "king" and "emperor". Cosine similarity returns a value between -1 and 1, where a higher value indicates greater similarity between the two vectors. We can also use `L2` distance (`euclidean distance`) as a measure, where a smaller value indicates greater similarity.
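To make the two measures concrete, here is a small pure-Python sketch. The three vectors are made up for illustration and are not real embeddings of these words; the helper names are hypothetical.

```python
import math

def cosineSimilarity(a, b):
    """Cosine of the angle between a and b: 1 = same direction, -1 = opposite direction."""
    dot = sum(x * y for x, y in zip(a, b))
    normA = math.sqrt(sum(x * x for x in a))
    normB = math.sqrt(sum(y * y for y in b))
    return dot / (normA * normB)

def l2Distance(a, b):
    """Euclidean distance between a and b: 0 = identical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(a):
    """Scale a to unit length."""
    norm = math.sqrt(sum(x * x for x in a))
    return [x / norm for x in a]

# Made-up 3-dimensional "embeddings"
king = [0.9, 0.1, 0.4]
emperor = [0.8, 0.2, 0.5]
banana = [-0.7, 0.6, -0.2]

print(cosineSimilarity(king, emperor))   # close to 1
print(cosineSimilarity(king, banana))    # negative
print(l2Distance(king, emperor))         # small
```

For unit-length vectors the two measures agree on ranking, because the squared L2 distance equals `2 - 2 * cosine similarity`. So for normalized embeddings, sorting by smallest L2 distance gives the same order as sorting by largest cosine similarity.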



# Embeddings

To use embeddings similarity measures to find relevant chunks within a corpus, we first need to generate embeddings for all of the chunks in the corpus. This can be done using a pre-trained embedding model, such as `Word2Vec`, `GloVe`, `BERT` etc. 

Once we have generated embeddings for all of the chunks, we can use a similarity measure to compare the embedding vector of any chunk to the rest of the corpus. More directly, we can generate the embedding for the question itself and measure its similarity to our corpus of chunks. The chunks with the highest similarity scores are the most relevant to the question prompt.
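As a rough sketch of that ranking step, the snippet below scores every chunk embedding against a query embedding and keeps the best `k`. The 2-dimensional vectors are toy values and `topKChunks` is a hypothetical helper; it scans the whole corpus, which is fine for illustration but is exactly the work a vector index is meant to speed up.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def topKChunks(queryEmbedding, chunkEmbeddings, k):
    """Return (chunk index, score) pairs for the k chunks most similar to the query, best first."""
    scores = [(i, cosine(queryEmbedding, emb)) for i, emb in enumerate(chunkEmbeddings)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

# Toy embeddings: one row per chunk
toyChunkEmbeddings = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
toyQueryEmbedding = [0.9, 0.1]

for chunkIndex, score in topKChunks(toyQueryEmbedding, toyChunkEmbeddings, k=2):
    print(chunkIndex, round(score, 3))
```

The returned indices point back into the list of chunks, so the matching text and metadata can be looked up directly.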

## Hugging Face  🤗 

The latest in the field of embeddings and sentence similarity matching are transformer models such as `BERT`, and new ones are popping up every day. The 🤗 Transformers library provides a number of pre-trained models, including `BERT`, `RoBERTa`, `DistilBERT`, `ALBERT`, `XLNet`, and `T5`, which have been trained on large datasets to help perform NLP tasks more accurately.

All of these models return an embedding, or vector, of a fixed dimension for every `token` they process. In essence, multiple embeddings would be generated for every chunk. Since we are more interested in the similarity between `chunk`s and not `token`s, these embeddings would then have to be combined into a single vector using techniques like `mean pooling` or `max pooling`, while accounting for `padding` tokens, much like pooling in computer vision problems.
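A minimal sketch of mean pooling that accounts for padding, written with plain lists so it runs without any model. The attention mask marks real tokens with 1 and padding with 0; the numbers and the name `meanPool` are made up for illustration.

```python
def meanPool(tokenEmbeddings, attentionMask):
    """Average the token vectors of one sequence, skipping positions where the mask is 0 (padding)."""
    dim = len(tokenEmbeddings[0])
    kept = [vec for vec, keep in zip(tokenEmbeddings, attentionMask) if keep]
    if not kept:
        raise ValueError("attentionMask marks no real tokens")
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]

# Three token positions, the last one is padding and must not influence the average
tokens = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
mask = [1, 1, 0]

print(meanPool(tokens, mask))
```

Without the mask the padding row would drag the average toward zero, which is why the pooling step has to know which positions are real tokens.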



## Sentence Transformers
However, the easier way is to leverage the `sentence-transformers` framework on 🤗, which does a lot of the above work for us. Looking at the top downloads on the `sentence-transformers` page, it seems like the most popular model is [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2).

![title](img/TopSentenceTransformers.png)


To help speed up the process I am going to place the model on a GPU

In [17]:
import torch
from sentence_transformers import SentenceTransformer
device = torch.device("cuda")

STmodel = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')
STmodel.to(device)

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Normalize()
)

## Corpus Embeddings

### Test Case
Our corpus is a list of `langchain` Document objects. Each document has 2 parts:
- page_content: the text section of the chunk
- metadata: citation


We are primarily going to rely on the `page_content` section of a document. However, the metadata section is useful to keep around for citations, and can be another data point to match on. Let's append the two together and test on the same random chunk as before.

In [18]:
def createChunkDocInfo(chunk):
    return chunk.page_content +' \nMetadata: '+ str(chunk.metadata)

createChunkDocInfo(randomchunk)

"was by good luck, for I am sure she did not listen. I was sometimes\n     quite provoked; but then I recollected my dear Elizabeth and Jane,\n     and for their sakes had patience with her. Mr. Darcy was punctual\n     in his return, and, as Lydia imformed you, attended the wedding. He\n     dined with us the next day, and was to leave town again on\n     Wednesday or Thursday. Will you be very angry with me, my dear\n     Lizzy, if I take this opportunity of saying (what I was never bold\n     enough to say before) how much I like him? His behaviour to us has,\n     in every respect, been as pleasing as when we were in Derbyshire.\n     His understanding and opinions all please me; he wants nothing but\n     a little more liveliness, and _that_, if he marry _prudently_, his\n     wife may teach him. I thought him very sly; he hardly ever\n     mentioned your name. But slyness seems the fashion. Pray forgive\n     me, if I have been very presuming, or at least do not punish me so \nMe

In [19]:
def getEmbeddings(model, listOfText):
    return model.encode(listOfText)

In [20]:
embeddings = getEmbeddings(STmodel, createChunkDocInfo(randomchunk))
print(embeddings[:100]) #Only printing the first 100 values of the embeddings vector

[ 1.79663170e-02  1.14518935e-02 -2.46319156e-02  7.18887374e-02
 -3.33716646e-02 -1.29302498e-02 -1.84203815e-02  5.69653362e-02
  3.54221798e-02 -1.70748737e-02 -7.93525949e-03 -3.31766382e-02
  3.05590569e-03 -5.00677414e-02 -5.41818514e-03  1.09316520e-02
  2.73657981e-02  4.52942308e-03  4.29288223e-02 -6.39499724e-03
  6.42575473e-02  3.71954404e-03  2.06486471e-02 -1.61153916e-02
  2.64957491e-02  2.37572044e-02  1.81206397e-03  7.30221719e-02
  7.53033301e-03 -1.93524174e-02  1.16033228e-02  9.86518897e-03
 -6.01424091e-02 -2.88560446e-02  2.28276440e-06 -7.92555790e-03
 -1.77070927e-02 -1.91813447e-02 -1.49291465e-02  1.20882848e-02
  7.28819817e-02 -1.79723240e-02  2.43832693e-02 -7.76596228e-03
 -2.64713038e-02  1.28358090e-02 -1.74699295e-02  3.36028822e-02
  3.23083140e-02  5.69985770e-02  1.19521702e-02 -2.66817063e-02
 -5.29866957e-04 -4.72773835e-02  1.27919778e-01 -2.15720888e-02
 -1.89321227e-02 -6.92544058e-02 -4.09197137e-02  4.42662574e-02
  3.18731964e-02 -1.68058

In [21]:
# printing the shape of the embedding vector
embeddings.shape

(768,)

Each embedding is a vector that is 768 elements long. The first 100 elements look like the print out above.\
Now to create embeddings for our entire corpus

In [22]:
# creating a list of strings using the chunks of LangChain documents
sentences = [createChunkDocInfo(chunk) for chunk in chunks]

#creating embeddings for the resulting sentences
embeddings = getEmbeddings(STmodel,sentences)

In [23]:
embeddings.shape

(5161, 768)

# FAISS

> [FAISS](https://ai.meta.com/tools/faiss/#:~:text=FAISS%20(Facebook%20AI%20Similarity%20Search,more%20scalable%20similarity%20search%20functions) (Facebook AI Similarity Search) is a library that allows developers to quickly search for embeddings of multimedia documents that are similar to each other. It solves limitations of traditional query search engines that are optimized for hash-based searches, and provides more scalable similarity search functions.\
> *-- FAISS page*

`FAISS` can effectively act like a local vector DB: it stores all our embeddings and efficiently retrieves similar embeddings and, consequently, semantically similar sentences.

An example on how to implement FAISS using  🤗  transformers can be found [here](https://huggingface.co/learn/nlp-course/chapter5/6?fw=pt)

In [24]:
import faiss
from faiss import write_index, read_index

If you're running the indexing for the first time, the `FAISS` index has to be built, which can take a while depending on how large the corpus is and how it has been chunked. On future runs, however, the index can be loaded directly from your local hard drive, provided we wrote it there.

The following cell tries to load the `FAISS` index from your local hard drive and, failing that, builds the index.

In [27]:
indexPath = "faiss_index/FAISS_Embeddings.index"   # same path for reading and writing

try:
    # reuse the CPU index saved by an earlier run
    index_flat = read_index(indexPath)
    print("FAISS index successfully loaded from local machine")
except Exception:
    # the file could not be read, so build the index from the embeddings
    print("Creating FAISS index")
    # build a flat (CPU) index
    index_flat = faiss.IndexFlatL2(embeddings.shape[1])
    index_flat.add(embeddings)         # add vectors to the index before saving it

    print("Writing index to local machine.")
    write_index(index_flat, indexPath)

# make it into a gpu index; the search cells below use gpu_index_flat
res = faiss.StandardGpuResources()
gpu_index_flat = faiss.index_cpu_to_gpu(res, 0, index_flat)

Creating FAISS index
Writing index to local machine.


### Sample Query
We'll use the same function as before to generate embeddings for potential queries. `encode` accepts a list of strings and embeds them as a batch, and our function leverages that. As a result we can encode multiple queries at once; even a single query should be enclosed in a list, so that the result is a 2D array with one row per query, which is what the `FAISS` search expects.

Let's try a semantic search on a few sample queries.

In [29]:
queries = ["where does the story of frakenstein take place?","where does Alice in Wonderland take place?"]
queriesEmb = getEmbeddings(STmodel,queries)

queriesEmb.shape

(2, 768)

The `search` method of a `FAISS` index finds the nearest neighbors of one or more query vectors in that index. It takes the following arguments:
- query: A 2D array of query vectors, one row per query.
- k: The number of nearest neighbors to return.

and returns:
- distances: An array of shape `(number of queries, k)` with the distance from each query to each of its nearest neighbors, smallest first. For `IndexFlatL2` these are squared L2 distances.
- indexes: An array of the same shape with the positions of those neighbors in the index, which here are also positions in `chunks`.
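To show what those two return values look like, here is a brute-force sketch of the same idea in pure Python: for each query, compute the squared L2 distance to every stored vector and keep the `k` smallest. It only illustrates the shape of the result; `flatL2Search` and the toy vectors are made up, and `FAISS` itself does this far more efficiently.

```python
def flatL2Search(queryVectors, storedVectors, k):
    """Brute-force k-nearest-neighbour search by squared L2 distance, one result row per query."""
    allDistances, allIndexes = [], []
    for query in queryVectors:
        # (distance, position) for every stored vector, sorted so the closest comes first
        scored = sorted(
            (sum((q - s) ** 2 for q, s in zip(query, stored)), i)
            for i, stored in enumerate(storedVectors)
        )
        allDistances.append([dist for dist, _ in scored[:k]])
        allIndexes.append([i for _, i in scored[:k]])
    return allDistances, allIndexes

storedToy = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
queriesToy = [[0.9, 0.1], [0.0, 1.9]]

distancesToy, indexesToy = flatL2Search(queriesToy, storedToy, k=2)
print(distancesToy)
print(indexesToy)
```

Both results have one row per query and `k` columns, which is why the outputs are indexed by query number first and neighbor rank second.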

In [62]:
numNeighbors = 5                          
Distances, Indexes = gpu_index_flat.search(x=queriesEmb, k=numNeighbors)  

print('-'*50)
print(f"Distance of neighbors to query:\n {Distances[:5]}")      
print('-'*50)             
print(f"Index of neighbors:\n{Indexes[:5]}")                   
 
print('-'*50)
print('Note: There were 2 queries')
print(queries)

--------------------------------------------------
Distance of neighbors to query:
 [[1.0184469  1.0542233  1.082648   1.0858334  1.090111  ]
 [0.67878747 0.7000439  0.81062174 0.9521235  0.96657526]]
--------------------------------------------------
Index of neighbors:
[[1659 1836 1990 1604 1985]
 [ 179    0    1  178  107]]
--------------------------------------------------
Note: There were 2 queries
['where does the story of frakenstein take place?', 'where does Alice in Wonderland take place?']


In [63]:
c = chunks[idx]
printChunkInfo(c)

Printing chunk (First 300 chars and the metadata)
----------------------------------------------------------------------------------------------------
Page Content:
I used my knowledge of this phase of spiritual pathology, and laid down
a rule that she should not be present with Lucy or think of her illness
more than was absolutely required. She assented readily, so readily that
I saw again the hand of Nature fighting for life. Van Helsing and I were
shown up t
----------------------------------------------------------------------------------------------------
MetaData:
{
  file location: data
  file type: txt
  file title: Dracula
}


In [70]:
for q_num,q_str in enumerate(queries):
    print('*'*100)
    print(f"For the query:    '{q_str}':")
    print('*'*100)
    print('\n')
    for idx in Indexes[:5][q_num]:
        print(f"Chunk Index: {idx}")
        printChunkInfo(chunks[idx])
        print('\n')
        print('='*100)

****************************************************************************************************
For the query:    'where does the story of frakenstein take place?':
****************************************************************************************************


Chunk Index: 1659
Printing chunk (First 300 chars and the metadata)
----------------------------------------------------------------------------------------------------
Page Content:
I am already far north of London, and as I walk in the streets of
Petersburgh, I feel a cold northern breeze play upon my cheeks, which
braces my nerves and fills me with delight. Do you understand this
feeling? This breeze, which has travelled from the regions towards
which I am advancing, gives me
----------------------------------------------------------------------------------------------------
MetaData:
{
  file location: data
  file type: txt
  file title: Frankenstein
}


Chunk Index: 1836
Printing chunk (First 300 chars and the me

Looks like our semantic search is working reasonably well, although some errors are clearly being made. For example:
- The first query was about `Frankenstein`'s story, but the search pulled information from `Dracula`.
- For the second query, all the retrieved chunks are from `Alice in Wonderland` as requested, but the top chunk has more to do with the publishing metadata included in the text file than with the story.

The model is probably relying fairly heavily on the title in the metadata. If the filenames were completely random, or named using some kind of hashing, it may struggle to find relevant chunks.

We'll see whether the LLM can judge the relevance of each chunk, that is, whether it recognizes which chunks are passages from the story and which are not.

# LLM  

I am going to use `OpenAI`'s LLM models, but even there you have a few different choices. `Da-Vinci` can help keep your costs down. I am going to opt for the ChatGPT 3.5 model because it is more performant. One can even 

For my purposes, I am going to pass a query and a list of relevant docs for the model to create responses. For this step I'd rather have a more precise and succinct model, while using a more descriptive and creative model to summarize the final answer. So even though both use the same endpoint (`gpt-3.5-turbo`), I create 2 separate configurations: a temperature of 0.0 for the per-chunk answers and 0.7 for the summary, each with its own set of rules.

Both of them are instructed to answer based on only the information provided.

Note: Until this step, all the work has been performed on local hardware, and no costs were incurred. API calls to `OpenAI` will incur costs based on the number of tokens passed in each call. A valid API key is also necessary, which can be created at the [OpenAI platform website](https://platform.openai.com/).
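Since cost scales with calls and tokens, a back-of-the-envelope estimate can be useful before running anything. The sketch below assumes the answer-per-chunk-then-summarize flow described earlier (one call per retrieved chunk plus one summarizing call per query) and a rule of thumb of roughly 4 characters per token for English text. Both the helper `estimateUsage` and the 4-characters figure are assumptions for illustration; it counts only the chunk text, not the question, the rules or the responses, and it deliberately leaves prices out.

```python
def estimateUsage(numQueries, numNeighbors, chunkSize=1000, charsPerToken=4):
    """Rough count of API calls and chunk tokens sent for a batch of queries."""
    answerCalls = numQueries * numNeighbors    # one call per retrieved chunk
    summaryCalls = numQueries                  # one summarizing call per query
    approxChunkTokens = answerCalls * chunkSize // charsPerToken   # upper bound: chunks are at most chunkSize characters
    return {
        'calls': answerCalls + summaryCalls,
        'approxChunkTokens': approxChunkTokens,
    }

print(estimateUsage(numQueries=2, numNeighbors=5))
```

Doubling `numNeighbors` roughly doubles both numbers, which is the trade-off to keep in mind when retrieving more chunks per query.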

In [72]:
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.schema.messages import HumanMessage, SystemMessage

In [73]:
# Load API Key
import os
from dotenv import load_dotenv

load_dotenv()

openai_api_key = os.environ['API_KEY']

In [97]:
chatllm_precise = ChatOpenAI(
    temperature=0.0, 
    model="gpt-3.5-turbo",
    openai_api_key=openai_api_key
)
chatllm_descriptive = ChatOpenAI(
    temperature=0.7, 
    model="gpt-3.5-turbo", 
    openai_api_key=openai_api_key
)


def queryLLM(query, 
             excerpt, 
             llm=chatllm_precise):    
    """
    Queries a large language model (LLM) to answer a question using an excerpt of text.

    Args:
        query: The question to be answered.
        excerpt: The excerpt of text to use to answer the question.
        llm: The ChatOpenAI object to use.

    Returns:
        The LLM's response to the query.
    """
    assistantRules = ("Answer the question using only the excerpt provided. Do not make up information. Be precise and succinct. "+
    "Do not include the question. Do not include any sort of introduction in your response. Do not mention the excerpt.")
    
    msg = f"Q: {query}" + \
          f" Excerpt: {excerpt}"
    
    # the rules go in as the system message, the question and excerpt as the human message
    messages = [
        SystemMessage(content=assistantRules), 
        HumanMessage(content=msg)
    ]
    
    return llm(messages)

def summarizeReponsesLLM(query, 
                         responses, 
                         llm=chatllm_descriptive):
    """
    Summarizes the given responses using the given LLM model.

    Args:
        query: The query that was asked.
        responses: A list of response objects.
        llm: The ChatOpenAI object to use.

    Returns:
        A summary of the given responses.
    """
    
    concat_responses = ' '.join([response.content for response in responses])
    
    assistantRules = ("Answer the question by summarizing the responses. Do not make up information. "+
    "Do not include the question. Do not include any sort of introduction in your response. " +
    "Do not mention excerpts that provide no information.") 
    
    
    msg = f"Q: {query}" + \
          f" Responses: {concat_responses}"
    
    # the rules go in as the system message, the question and responses as the human message
    messages = [
        SystemMessage(content=assistantRules), 
        HumanMessage(content=msg)
    ]
    
    return llm(messages)

### Test API call

In [98]:
queryLLM(query = "whats the temperature today in Dallas?",
         excerpt= "A warm weekend is expected. Highs today will be mainly in the 90s with 80s on Sunday. \
         A pattern shift will occur next week resulting in slightly cooler high temps along with scattered showers and thunderstorms each day. Severe weather is not expected")

AIMessage(content='The temperature today in Dallas is expected to be mainly in the 90s.')

In [99]:
queryLLM(query = "whats the temperature today in Dallas?",
         excerpt= "")

AIMessage(content='The excerpt does not provide any information about the temperature in Dallas.')

In [100]:
queryLLM(query = "whats the temperature today in Dallas?",
         excerpt= "Dallas (/ˈdæləs/) is a city in Texas and the most populous in the Dallas–Fort Worth metroplex, the fourth-largest metropolitan area in the United States at 7.5 million people.\
         It is the most populous city in and seat of Dallas County with portions extending into Collin, Denton, Kaufman, and Rockwall counties.")

AIMessage(content='The excerpt does not provide information about the temperature in Dallas.')

Ok. Looks like our API call functions are working as intended: the model answers from the excerpt when it can, and says so when the excerpt has no answer.

### Generate Responses

In [88]:
queries

['where does the story of frakenstein take place?',
 'where does Alice in Wonderland take place?']

In [115]:
def extractRelevantChunksByIndex(chunks, indexes):
    extracted_chunks = []
    for index in indexes:
        chunkInfo = createChunkDocInfo(chunks[index])
        extracted_chunks.append(chunkInfo)

    return extracted_chunks


def getMostRelevantChunks_perQuery( queries, 
                                    numNeighbors, 
                                    chunks,
                                    faiss_index = gpu_index_flat,
                                    sentenceTransformer = STmodel):
    
    queriesEmb = getEmbeddings(sentenceTransformer,queries)    
    Distances, mostRelevantChunkIndexes_perQuery = faiss_index.search(queriesEmb,
                                                                k = numNeighbors) 
        
    mostRelevantChunks_perQuery = [extractRelevantChunksByIndex(chunks, indexes)
                                   for indexes in mostRelevantChunkIndexes_perQuery]
        
    return queries, queriesEmb, mostRelevantChunks_perQuery

In [119]:
from collections import defaultdict

def queryAndSummarize(queries, 
                      chunks = chunks,
                      numNeighbors=5):
    responses_perQuery = defaultdict(list)
    Results = {}
    
    queries, queriesEmb, relevantChunks_perQuery = getMostRelevantChunks_perQuery(queries,
                                                                                 numNeighbors = numNeighbors,
                                                                                 chunks = chunks)

    for query, relevantChunks in zip(queries, relevantChunks_perQuery):
        for chunk in relevantChunks:
            response = queryLLM(query, chunk)
            responses_perQuery[query].append(response)

        finalAnswer = summarizeReponsesLLM(query=query,
                                        responses=responses_perQuery[query]).content

        Results[query] = {
            'Responses': responses_perQuery[query],
            'Summary':   finalAnswer
        }
        
    return Results

In [120]:
responses = queryAndSummarize(queries)

In [127]:
def pprintResponses(responses):
    for k,v in responses.items():
        print(f"Q: {k}")
        print(f"A: {v['Summary']}")

In [128]:
pprintResponses(responses)



Let's try it with different questions.

In [123]:
queries=[
    "Which families did Romeo and Juliet belong to?",
    "Who were some of the victims of Dracula?",
    "Who were Alice's friends in Alice in Wonderland?"
]


responses = queryAndSummarize(queries,numNeighbors=10)

pprintResponses(responses)

Q: Which families did Romeo and Juliet belong to?
A: Romeo belonged to the Montague family and Juliet belonged to the Capulet family.
Q: Who were some of the victims of Dracula?
A: The responses do not provide any information about the victims of Dracula.
Q: Who were Alice's friends in Alice in Wonderland?
A: Alice's friends in Alice in Wonderland include the ten soldiers, the ten courtiers, the ten royal children, the guests (including the White Rabbit), and the Knave of Hearts.


Although this results in reasonably good answers, it clearly has limitations.

Our embedding-based semantic search may not be able to identify who Alice's friends are: the Cheshire Cat and the Mad Hatter don't make an appearance, but the Knave of Hearts, who is an enemy, does.

Similarly, our Dracula victims couldn't be identified.

In [129]:
queries=[
    "Describe the castle of Dracula.",
    "Why did Dorian Gray want his portrait made?",
]


responses = queryAndSummarize(queries,numNeighbors=10)

pprintResponses(responses)

Q: Describe the castle of Dracula.
A: The Castle Dracula is located in the Carpathian mountains and is described as being on the edge of a precipice with a view of green treetops and deep rifts. It is described as having doors everywhere that are locked and bolted, with the only available exits being the windows. The castle is old and big, with broken walls and cold wind. The narrator explores various stairs, passages, and doors, finding one that is not locked. The castle is situated deep under a hill and can be seen in all its grandeur from a distance. The Count of Dracula is seen crawling down the castle wall with his cloak spread out like wings. The narrator also sees what appears to be the evil face of Count Dracula in the shadows of a dark passage. The landlord receives a letter from Count Dracula instructing him to secure the best place on the coach for the narrator, but refuses to provide details or speak about Count Dracula and his castle.
Q: Why did Dorian Gray want his portra

# References and Citations:
- [Understanding Neural Network Embeddings](https://towardsdatascience.com/understanding-neural-network-embeddings-851e94bc53d2) - Frank Liu
- [FAISS](https://github.com/facebookresearch/faiss/)
- [Hugging Face](https://huggingface.co/)
    - [Sentence Transformers](https://huggingface.co/sentence-transformers)