# Memory and Vector Database

## Copy from day 1 - 05-example-entity-extraction.ipynb

In [45]:
document = '<document>'
template_prompt=f'''Extract key pieces of information from this regulation document.
If a particular piece of information is not present, output \"Not specified\".
When you extract a key piece of information, include the closest page number.
Use the following format:\n0. What's the name of the company?\n1. Who are the founders of the company?\n2. When is it founded?\n3. When did it raise series A?\nDocument: \"\"\"{document}\"\"\"\n\n0.'''
print(template_prompt)

Extract key pieces of information from this regulation document.
If a particular piece of information is not present, output "Not specified".
When you extract a key piece of information, include the closest page number.
Use the following format:
0. What's the name of the company?
1. Who are the founders of the company?
2. When is it founded?
3. When did it raise series A?
Document: """<document>"""

0.


In [46]:
select_document = """
Overview:
Tech Solutions Inc. is a leading technology consulting firm specializing in providing innovative solutions to businesses across various industries. We offer a comprehensive range of services including software development, IT consulting, project management, and cybersecurity solutions. 
With a strong focus on delivering exceptional quality and customer satisfaction, we have established ourselves as a trusted partner for organizations seeking digital transformation.
They were founded on April 12, 2005 and raise their first seed in 2007 and series A on May 25, 2007.

Founders:

Background: John Smith is a visionary entrepreneur with over 20 years of experience in the technology industry. He has a deep understanding of market trends and has successfully led several software development projects for multinational corporations.
Role in the Company: As a co-founder of Tech Solutions Inc., John Smith plays a pivotal role in shaping the company's strategic direction. His expertise in software development and leadership skills have been instrumental in driving the company's growth.
Sarah Johnson:

Background: Sarah Johnson is a highly accomplished technologist with a strong background in software engineering. She has extensive experience in managing complex IT projects and has a proven track record of delivering innovative solutions.
Role in the Company: As a co-founder of Tech Solutions Inc., Sarah Johnson leads the company's technical operations. Her deep knowledge of software engineering principles and commitment to excellence have been crucial in establishing the company as a leader in the industry.
Together, John Smith and Sarah Johnson founded Tech Solutions Inc. with the aim of providing cutting-edge technology solutions to help businesses thrive in the digital age. Their combined expertise and passion for innovation have been instrumental in the company's success.
"""


### Let's add some nonsense

In [47]:
random_text = """
There once lived an old man and an old woman who were peasants and had to work hard to earn their daily bread. The old man used to go to fix fences and do other odd jobs for the farmers around, and while he was gone the old woman, his wife, did the work of the house and worked in their own little plot of land.
It was always the Monday mornings. It never seemed to happen on Tuesday morning, Wednesday morning, or any other morning during the week. But it happened every Monday morning like clockwork. He mentally prepared himself to once again deal with what was about to happen, but this time he also placed a knife in his pocket just in case.
There was nothing to indicate Nancy was going to change the world. She looked like an average girl going to an average high school. It was the fact that everything about her seemed average that would end up becoming her superpower.
He wondered if he should disclose the truth to his friends. It would be a risky move. Yes, the truth would make things a lot easier if they all stayed on the same page, but the truth might fracture the group leaving everything in even more of a mess than it was not telling the truth. It was time to decide which way to go.
He had three simple rules by which he lived. The first was to never eat blue food. There was nothing in nature that was edible that was blue. People often asked about blueberries, but everyone knows those are actually purple. He understood it was one of the stranger rules to live by, but it had served him well thus far in the 50+ years of his life.
He sat staring at the person in the train stopped at the station going in the opposite direction. She sat staring ahead, never noticing that she was being watched. Both trains began to move and he knew that in another timeline or in another universe, they had been happy together.
Do you think you're living an ordinary life? You are so mistaken it's difficult to even explain. The mere fact that you exist makes you extraordinary. The odds of you existing are less than winning the lottery, but here you are. Are you going to let this extraordinary opportunity pass?
The wave crashed and hit the sandcastle head-on. The sandcastle began to melt under the waves force and as the wave receded, half the sandcastle was gone. The next wave hit, not quite as strong, but still managed to cover the remains of the sandcastle and take more of it away. The third wave, a big one, crashed over the sandcastle completely covering and engulfing it. When it receded, there was no trace the sandcastle ever existed and hours of hard work disappeared forever.
There are different types of secrets. She had held onto plenty of them during her life, but this one was different. She found herself holding onto the worst type. It was the type of secret that could gnaw away at your insides if you didn't tell someone about it, but it could end up getting you killed if you did.
Welcome to my world. You will be greeted by the unexpected here and your mind will be challenged and expanded in ways that you never thought possible. That is if you are able to survive...
"""

select_document += random_text

## Concept: Memory and Vector Search
A common problem with LLM applications is that we want them to work through large amounts of text: summarizing it, searching it, or combining information that is spread across it.

Although model context windows keep growing, context size is still a major limitation, especially when you want to deploy these models at scale.

There are many ways to address this, and most of them add some form of memory or search to the model.

Here is a link to a more [comprehensive list of these techniques](TODO).
Here, we'll demonstrate a simple way to use LLM embeddings with a vector database to achieve this.
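
To make the idea concrete before bringing in a real embedding model, here is a minimal sketch of what a vector search does: turn each chunk and the query into a vector, score every chunk against the query, and keep the top `k`. The `embed`, `cosine_similarity`, and `top_k` helpers are hypothetical names for illustration, and the word-count "embedding" is only a stand-in for a learned embedding; a real vector database also uses far more efficient indexing than this linear scan.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a learned embedding)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


chunks = [
    "the company was founded in 2005",
    "the wave crashed and hit the sandcastle",
    "the founders of the company raised a series a",
]
results = top_k("who founded the company", chunks, k=2)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

The two company-related chunks outrank the sandcastle sentence, which is the behavior we want from retrieval: only the relevant pieces get passed on to the LLM.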

In [48]:
from typing import List, Union
import openai
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings

class TextSplitter:
    def __init__(self, max_tokens_in_chunk: int = 2000, chars_per_token: int = 4, separator: str = "\n") -> None:
        """ Splits a text input into chunks, staying underneath the max_tokens_in_chunk.

        Args:
            max_tokens_in_chunk (int, optional): Max number of tokens to aim for in chunks. Defaults to 2000.
            chars_per_token (int, optional): The approximate number of characters per token. Defaults to 4.
            separator (str, optional): A preferred separator for chunks. Defaults to "\n".
        """

        self.max_tokens_in_chunk = max_tokens_in_chunk
        self.chars_per_token = chars_per_token
        self.separator = separator
        # NOTE: a token budget converts to characters as max_tokens_in_chunk * chars_per_token.
        # The division below gives a much smaller chunk_size (500 characters with the defaults),
        # which keeps chunks well under the budget and is consistent with the outputs shown below.
        chunk_size = int(max_tokens_in_chunk / chars_per_token)

        self.text_splitter = CharacterTextSplitter(
            separator=separator,  # use the configured separator instead of a hard-coded "\n"
            chunk_size=chunk_size,
            chunk_overlap=int(chunk_size * 0.2),  # 20% overlap between neighboring chunks
            length_function=len,
        )
    
    def __call__(self, text: Union[str, List[str]]) -> list:
        """Split one or more texts into LangChain Document chunks (each exposes `page_content`)."""
        if isinstance(text, str):
            text = [text]
        return self.text_splitter.create_documents(text)
        
class DocRetrival:
    def __init__(self, docs: list, k: int = 3):
        """
        A wrapper for FAISS that allows for easy document retrieval.
        
        Args:
            docs (list): LangChain Document chunks to index (for example, the output of TextSplitter).
            k (int, optional): The number of documents to return. Defaults to 3.
        """
        embeddings = OpenAIEmbeddings()
        self.db = FAISS.from_documents(docs, embeddings)
        self.retriever = self.db.as_retriever(search_kwargs=dict(k=k))
    
    def __call__(self, search_string: str) -> list:
        """
        Returns the top k documents that are most similar to the search string.
        
        Args:
            search_string (str): The search string.

        Returns:
            list: The matching LangChain Document objects.
        """
        return self.retriever.get_relevant_documents(search_string)
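
To build intuition for the `chunk_size` and `chunk_overlap` settings used above, here is a rough sketch of fixed-size chunking with overlap. It is not the algorithm `CharacterTextSplitter` uses (that splits on the separator and then merges pieces); `chunk_with_overlap` is a hypothetical helper that only illustrates how overlapping windows repeat some text between neighbors.

```python
from typing import List


def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Cut text into fixed-size character windows; consecutive windows share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_with_overlap(sample, chunk_size=10, overlap=2)
print(pieces)
print(sum(len(piece) for piece in pieces), "characters in chunks vs", len(sample), "in the source")
```

Because neighboring chunks share characters, the total character count across chunks can be larger than the source text itself.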



In [49]:
# Initialize the splitter
splitter = TextSplitter()

# Split large document into chunks
docs = splitter(select_document)
print(f'Num Docs: {len(docs)}\nTotal Chars: {sum([len(doc.page_content) for doc in docs])}')

# Initialize the retriever
retriver = DocRetrival(docs, k=2)

# Get relevant documents
docs = retriver(template_prompt)
print(f'Num Docs: {len(docs)}\nTotal Chars: {sum([len(doc.page_content) for doc in docs])}')

context = '\n'.join([doc.page_content for doc in docs])


Num Docs: 16
Total Chars: 5006
Num Docs: 2
Total Chars: 631
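
As a back-of-the-envelope check on what retrieval saves, the character counts above can be converted to approximate token counts. This sketch assumes the same rough heuristic of 4 characters per token used by `TextSplitter`; `estimate_tokens` is a hypothetical helper, and a real tokenizer would give somewhat different numbers.

```python
def estimate_tokens(num_chars: int, chars_per_token: int = 4) -> int:
    """Rough token estimate from a character count (heuristic, not a real tokenizer)."""
    return num_chars // chars_per_token


all_chunks_tokens = estimate_tokens(5006)  # every chunk produced by the splitter
retrieved_tokens = estimate_tokens(631)    # only the 2 chunks returned by the retriever

print(f"All chunks: ~{all_chunks_tokens} tokens")
print(f"Retrieved:  ~{retrieved_tokens} tokens")
print(f"Share of text sent to the model: {631 / 5006:.0%}")
```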


In [50]:
# From 05-example-entity-extraction.ipynb
prompt = template_prompt.replace('<document>', context)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]

chatbot_response = openai.ChatCompletion.create(
  model="gpt-4",
  messages=messages,
  temperature=0,
  max_tokens=1500,
)

print("0. " + chatbot_response.choices[0].message["content"])

0. Tech Solutions Inc.
1. John Smith and Sarah Johnson
2. April 12, 2005
3. May 25, 2007
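
If the extracted fields need to be used downstream, the numbered response can be turned into a dictionary. The following is a minimal sketch that assumes the model sticks to the `N. answer` format requested in the prompt; `parse_numbered_answers` is a hypothetical helper, and the response text is copied from the output above.

```python
import re
from typing import Dict


def parse_numbered_answers(response_text: str) -> Dict[int, str]:
    """Map each 'N. answer' line to {N: answer}; lines without a leading number are skipped."""
    answers = {}
    for line in response_text.splitlines():
        match = re.match(r"\s*(\d+)\.\s*(.*)", line)
        if match:
            answers[int(match.group(1))] = match.group(2).strip()
    return answers


response_text = """0. Tech Solutions Inc.
1. John Smith and Sarah Johnson
2. April 12, 2005
3. May 25, 2007"""

answers = parse_numbered_answers(response_text)
print(answers[0])
print(answers[3])
```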
