## Imports

- Embeddings = turning text into vectors (numeric representations that make text searchable by meaning)
- FAISS = Facebook AI's similarity-search library, used here as the vector store
- VectorDBQA = chain that takes a pre-trained language model and a vector store and answers questions over the retrieved chunks
- get_openai_callback = context manager that tracks token usage for the OpenAI calls made inside it
- pickle = for saving and loading the vector store (so I don't have to recompute it for every question)
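
To make the "text to vectors" idea concrete, here is a toy sketch of embedding plus nearest-neighbour search. It is not how OpenAI embeddings or FAISS work internally: `toy_embed`, `nearest_chunk`, and the five-word vocabulary are hypothetical stand-ins that use plain word counts and cosine similarity.

```python
import math
from collections import Counter


def toy_embed(text, vocabulary):
    """Hypothetical stand-in for an embedding model: word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_chunk(query, chunks, vocabulary):
    """Return the chunk whose toy embedding is most similar to the query's."""
    query_vec = toy_embed(query, vocabulary)
    scored = [(cosine_similarity(query_vec, toy_embed(c, vocabulary)), c) for c in chunks]
    return max(scored, key=lambda pair: pair[0])[1]


# Made-up vocabulary and chunks, for illustration only
vocabulary = ["etf", "fees", "launch", "tax", "dividend"]
chunks = [
    "launch costs and legal fees for a new etf",
    "dividend tax treatment for investors",
]
best = nearest_chunk("what fees apply when i launch an etf", chunks, vocabulary)
print(best)
```

A real embedding model captures meaning rather than exact word overlap, but the search step is the same in spirit: embed the query, then return the stored chunks whose vectors are closest.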

In [3]:
import os
import pickle

from langchain import OpenAI, VectorDBQA
from langchain.callbacks import get_openai_callback
from langchain.chains.question_answering import load_qa_chain
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms.loading import load_llm
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.vectorstores.faiss import FAISS

openai_api_key = "API_KEY"  # placeholder: replace with your own key

## Taking Text and turning it into a vector DB (and saving that to a file)

In [5]:
dir_path = '/Users/sommohapatra/Desktop/AI Stuff/ETF_Dictionary/text files'
files = [f for f in os.listdir(dir_path) if f.endswith('.txt')]
file_contents = []

# Read each txt file into its own entry (one string per file)
for file in files:
    with open(os.path.join(dir_path, file), "r") as f:
        file_contents.append(f.read())

# One metadata dict per file, in the same order as file_contents
metadatas = [{"title": f} for f in files]
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.create_documents(file_contents, metadatas=metadatas)

embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

docsearch = FAISS.from_documents(documents, embeddings)

with open("vectordb.pkl", 'wb') as f:
    pickle.dump(docsearch, f)
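
A rough sketch of what `chunk_size=1000, chunk_overlap=200` means in the cell above. This is a plain sliding window over characters, not the actual `RecursiveCharacterTextSplitter` algorithm (which prefers to split on separators such as paragraph and line breaks); `sliding_window_chunks` is a hypothetical helper for illustration.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size chunks where each chunk repeats the tail of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reached the end of the text
    return chunks


# 2500 characters of dummy text
sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_window_chunks(sample)
print([len(c) for c in chunks])
print(chunks[0][-200:] == chunks[1][:200])
```

The overlap means a sentence that straddles a chunk boundary still shows up whole in at least one chunk, at the cost of embedding some text twice.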


## Starting the model

Chain types: stuff, map_reduce, map_rerank, refine.

Not sure yet which is best for what. For question-answering, performance seems to rank: stuff, refine, map_reduce, map_rerank

Potential use-cases (GPT-generated, so take with grain of salt):
- *Stuff* does no intermediate processing: it just stuffs all the related chunks into a single prompt for the LLM to handle. Simple and brute-force, but the combined prompt sometimes goes over the LLM's token limit.

- *Map-Reduce* is for large datasets. It takes two steps: the first applies the prompt/query to each chunk (calling the LLM each time) to generate intermediate outputs, and the second reduces those to a single answer.

- *Refine* is iterative: it applies the prompt to one input chunk, then passes that answer along with the next chunk and asks the LLM to refine it. Well-suited for datasets that require a high level of precision and control. Takes more time and involves more LLM calls than the *stuff* chain.

- *Map-Rerank* runs the prompt/query on each chunk, then scores each chunk's answer on certainty and returns the most certain one. Can't combine information between documents, though.
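
The four strategies above can be sketched in plain Python to show how the LLM calls are arranged. This is only an illustration of the control flow, not LangChain's implementation: `CountingLLM` is a hypothetical stand-in that counts calls and returns canned text, the prompt wording is made up, and `map_rerank` takes a separate hypothetical `score` function (in the real chain, as far as I understand it, the score comes back as part of the LLM's own response).

```python
class CountingLLM:
    """Hypothetical stand-in for a real LLM: counts calls and returns a canned string."""

    def __init__(self):
        self.calls = 0

    def __call__(self, prompt):
        self.calls += 1
        return f"answer {self.calls}"


def stuff(llm, chunks, question):
    # One call: every chunk goes into a single prompt
    return llm("\n\n".join(chunks) + f"\n\nQuestion: {question}")


def map_reduce(llm, chunks, question):
    # Map: one call per chunk. Reduce: one more call to combine the partial answers
    partials = [llm(f"{chunk}\n\nQuestion: {question}") for chunk in chunks]
    return llm("Combine these answers:\n" + "\n".join(partials) + f"\n\nQuestion: {question}")


def refine(llm, chunks, question):
    # First chunk produces a draft; each later chunk is used to refine it
    answer = llm(f"{chunks[0]}\n\nQuestion: {question}")
    for chunk in chunks[1:]:
        answer = llm(f"Existing answer: {answer}\nNew context: {chunk}\nRefine for: {question}")
    return answer


def map_rerank(llm, score, chunks, question):
    # One call per chunk; keep only the answer with the highest score
    answers = [llm(f"{chunk}\n\nQuestion: {question}") for chunk in chunks]
    return max(answers, key=score)


chunks = ["chunk A", "chunk B", "chunk C"]
question = "What are the costs?"
for name, chain in [("stuff", stuff), ("map_reduce", map_reduce), ("refine", refine)]:
    llm = CountingLLM()
    chain(llm, chunks, question)
    print(name, "LLM calls:", llm.calls)
```

With three chunks this makes 1 call for stuff, 4 for map_reduce (3 map + 1 reduce), and 3 for refine, which lines up with the note above that refine costs more calls than stuff.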

In [3]:
llm = load_llm("llm.json")

qa_chain = load_qa_chain(llm, chain_type="refine")

with open("vectordb.pkl", 'rb') as f:
    docsearch = pickle.load(f)
    
qa = VectorDBQA(combine_documents_chain=qa_chain, vectorstore=docsearch, return_source_documents=True)

## Querying the Question-Answering model

In [None]:
template = "Answer with as much detail as possible. If you do not know the answer, say 'I don't know':"
query = f"{template} In bullet point format, what are all the costs of launching an ETF?"
result = qa({"query": query})
answer = result['result']
print(answer)

## Get source documents for answer

In [23]:
result['source_documents']

[Document(page_content='We get the following question at least 1x a day: “How do I start an ETF?“\nThe good news is we can help you access the ETF market with a low-cost high-quality service offering that spans the complexity spectrum from standard ETF launches to highly complex tax-free conversion transactions.\nThe bad news is launching an ETF is an inherently challenging task with a lot of moving parts. However, this post is meant to be a resource that will help you become an educated consumer that has the capacity to make an informed decision.\nAs many readers are aware, we are active in the so-called “ETF white-label” business via ETF Architect. We hope this is helpful and if there are additional questions/thoughts you’d like us to add — contact us — we’ll add them to the list.(1)\nYou want to start an ETF…are you sure that’s a good idea?\nWe have a good assortment of materials that address most questions we hear regarding ETF formations:\n- An Introduction to ETF White Label Serv

## Generating Journal Entries from Vector DB

In [53]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

with open("vectordb.pkl", 'rb') as f:
    search_index = pickle.load(f)

prompt_template = """Use the context below to write a 100 word episode script with the following general plot:
    Context: {context}
    Plot: {plot}
    Script:"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "plot"]
)

llm = load_llm("llm_story.json")

chain = LLMChain(llm=llm, prompt=PROMPT)

def generate_script(plot):
    docs = search_index.similarity_search(plot, k=1)
    inputs = [{"context": doc.page_content, "plot": plot} for doc in docs]
    return chain.apply(inputs)


script = generate_script("Finn and Jake find themselves in the middle of the Civil War.")
script = script[0]['text'].split("\n")
print(*script, sep="\n")


[Scene opens with Finn and Jake walking through a forest. They come to a clearing and see a battlefield in the distance.]
Finn: Whoa! This is it!
Jake: Yeah, looks like it.
[They approach the battlefield and see two armies facing each other.]
Finn: So this is the Civil War, huh?
Jake: Sure looks like it.
[They approach the front lines of the armies. A soldier notices them and runs up to them.]
Soldier: Hey, what are you two doing here?
Finn: We're just here to watch.
Soldier: Watch? This isn't a show! This is a real war!
Jake: We know. We just want to observe.
[The soldier looks to his commanding officer for direction. The officer nods, and the soldier steps aside.]
Soldier: Alright. But you better stay out of the way.
[Finn and Jake move to the side of the battlefield and watch as the armies charge each other. The battle is fierce and chaotic. Finn and Jake watch in awe as the armies clash. After what seems like an eternity, the battle is over. The Union army emerges victorious.]
Fin