## Imports

- Embeddings = turning text into vectors (a way to compress text into numbers that can be searched by similarity)
- FAISS = Facebook AI's similarity-search library, used here as the vector database
- VectorDBQA = chain that takes a pre-trained language model and a vector database, retrieves the relevant chunks, and answers questions over them
- get_openai_callback = a context manager that reports token usage for the OpenAI calls made inside it
- pickle = for saving and loading the vector database (so I don't have to recompute it for every question)
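
To make the "text to vectors, then search" idea concrete, here is a minimal sketch in plain Python. It is not how OpenAIEmbeddings or FAISS work internally: `toy_embed` is a hypothetical stand-in that just counts words from a small, made-up vocabulary, and `search` is a brute-force cosine-similarity ranking.

```python
import math
from collections import Counter

def toy_embed(text, vocab):
    """Hypothetical stand-in for an embedding model: word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query, chunks, vocab, k=1):
    """Return the k chunks whose vectors are most similar to the query vector."""
    q = toy_embed(query, vocab)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c, vocab)), reverse=True)
    return ranked[:k]

# Made-up vocabulary and chunks, for illustration only
vocab = ["proxy", "voting", "etf", "fees", "adviser"]
chunks = [
    "proxy voting is hard for an adviser",
    "etf fees keep falling",
]
top = search("adviser proxy voting", chunks, vocab)
print(top)  # ['proxy voting is hard for an adviser']
```

A real embedding model captures meaning rather than exact word overlap, and a library like FAISS avoids comparing the query against every vector, but the retrieve-by-similarity step has the same shape.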

In [10]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms.loading import load_llm
from langchain.vectorstores.faiss import FAISS
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain import OpenAI, VectorDBQA
from langchain.chains.question_answering import load_qa_chain
from langchain.callbacks import get_openai_callback
import pickle

openai_api_key = "API_KEY"  # placeholder; replace with your own key

## Starting the model

Chain types: stuff, map_reduce, map_rerank, refine.

I'm not sure yet which is best for which task. For question answering, performance so far seems to rank (best first): stuff, refine, map_reduce, map_rerank.

Potential use-cases (GPT-generated, so take with a grain of salt):
- *Stuff* puts ("stuffs") all the related chunks into a single prompt for the LLM to handle in one call. Simple and brute-force, but the combined prompt sometimes goes over the LLM's context limit.

- *Map-Reduce* is for large datasets. It takes two steps: the first applies the prompt/query to each chunk separately (one LLM call per chunk) to generate intermediate outputs, and the second reduces those outputs to a single answer.

- *Refine* is iterative: it applies the prompt to the first chunk, then passes that answer along with the next chunk and asks the LLM to refine it, and so on through the chunks. Well suited to datasets that require a high level of precision and control. Takes more time and involves more LLM calls than the *stuff* chain.

- *Map-Rerank* runs the prompt/query on each chunk, then scores each chunk's answer on certainty and returns the most certain one. It can't combine information between documents, though.
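
The four strategies above can be sketched in plain Python to compare how many LLM calls each one makes. This is an illustration only, not LangChain's implementation: `fake_llm` is a hypothetical stub that records prompts instead of calling a model, the prompt wording is made up, and in `map_rerank` the certainty score comes from a caller-supplied `score` function rather than from the LLM itself.

```python
calls = []  # every prompt sent to the stub LLM

def fake_llm(prompt):
    """Hypothetical stand-in for an LLM call: logs the prompt, returns a numbered answer."""
    calls.append(prompt)
    return f"answer{len(calls)}"

def stuff(question, chunks):
    # One call: every chunk goes into a single prompt.
    return fake_llm("\n".join(chunks) + "\n" + question)

def map_reduce(question, chunks):
    # One call per chunk, then one more call to combine the partial answers.
    partials = [fake_llm(chunk + "\n" + question) for chunk in chunks]
    return fake_llm("\n".join(partials) + "\n" + question)

def refine(question, chunks):
    # One call per chunk; each call revises the answer from the previous one.
    answer = ""
    for chunk in chunks:
        answer = fake_llm(f"Existing answer: {answer}\nContext: {chunk}\n{question}")
    return answer

def map_rerank(question, chunks, score):
    # One call per chunk; keep only the answer with the highest score.
    scored = [(score(chunk), fake_llm(chunk + "\n" + question)) for chunk in chunks]
    return max(scored, key=lambda pair: pair[0])[1]

def count_calls(strategy, *args):
    """Run one strategy from a clean log and report how many LLM calls it made."""
    calls.clear()
    strategy(*args)
    return len(calls)

chunks = ["chunk a", "chunk b", "chunk c"]
question = "Is proxy voting difficult?"
print(count_calls(stuff, question, chunks))            # 1
print(count_calls(map_reduce, question, chunks))       # 4
print(count_calls(refine, question, chunks))           # 3
print(count_calls(map_rerank, question, chunks, len))  # 3
```

With three chunks, *stuff* makes a single call but has the longest prompt, while the other three trade extra calls for shorter prompts.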

In [48]:
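# Load the LLM from its saved config file
llm = load_llm("llm.json")

# Chain that combines the retrieved chunks into one answer
qa_chain = load_qa_chain(llm, chain_type="refine")

# Load the pickled vector database instead of recomputing it
with open("vectordb.pkl", 'rb') as f:
    docsearch = pickle.load(f)

# Retrieval + Q&A; also return the chunks the answer was based on
qa = VectorDBQA(combine_documents_chain=qa_chain, vectorstore=docsearch, return_source_documents=True)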
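
The cell above assumes `vectordb.pkl` already exists. A minimal sketch of the "compute once, then reuse" pattern with pickle follows; `load_or_build` and `build_index` are hypothetical names, and the small dict stands in for the real vector database.

```python
import os
import pickle
import tempfile

def load_or_build(path, build):
    """Return the object cached at `path`; if the file is missing, build and save it first."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)  # only unpickle files you created yourself
    obj = build()
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return obj

def build_index():
    # Hypothetical stand-in for the expensive step (embedding every chunk)
    build_index.runs += 1
    return {"chunk a": [0.1, 0.2], "chunk b": [0.3, 0.4]}

build_index.runs = 0  # counts how often the expensive step actually ran

# Use a temporary directory so the sketch doesn't touch real files
path = os.path.join(tempfile.mkdtemp(), "vectordb.pkl")
first = load_or_build(path, build_index)   # builds and saves
second = load_or_build(path, build_index)  # loads from disk
print(build_index.runs)  # 1
```

The second call reads the file instead of rebuilding, which is the saving the pickle step is there for.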

## Setting up the Question-Answering model

In [49]:
template = "Answer with as much detail as possible. If you do not know the answer, say 'I don't know':"

## INPUT QUESTION HERE

In [52]:
question = "Is managing proxy voting a difficult process for advisers?"

query_setup = f"{template} {question}"
result = qa({"query": query_setup})
answer = result['result']
print(answer)


Yes, managing proxy voting can be a difficult process for advisers. Advisers must ensure that they are voting proxies in the best interest of their clients, which can be difficult to determine. They must also be aware of any potential conflicts of interest that may arise in connection with proxy voting. Additionally, advisers must be able to provide clients with a concise summary of their proxy voting process and offer to provide copies of the complete proxy voting policy and procedures upon request. Furthermore, advisers must be able to disclose to clients how they may obtain information regarding voted proxies. Additionally, advisers must be able to provide investors with adequate disclosure as to how final votes are aligned with any proxy advisor employed by their investment manager. This includes disclosing how often their final votes aligned with any proxy advisor they employed, as well as to disclose what percentage of proxy advisor recommendations were reviewed internally by an

## Get source documents for answer

In [41]:
result['source_documents']

[Document(page_content='Patrick: Yeah. Like, we did a proxy with a fund, to your point, Meb, it was brutal. But there’s a process here, there’s a cookbook now. And what I like about this industry is this cookbook was written in the ’90s, where everybody just paid lawyers and paid third parties and margins were much fatter, so no one looked at it. So, you have the way, “Things are supposed to be done.” And then you say, “Well, I know that doesn’t take that much time. I know this can be done with a computer, I know this part I can do myself, and you can get a proxy process that’s supposed to cost a quarter-million dollars, you can get that down to 80. That’s what I like about my job is you can find these things that there’s just so much fat on them. And with a little bit of wringing the towel, you can get it down to a more affordable.', lookup_str='', metadata={'title': 'all_things_etfs_meb_faber.txt'}, lookup_index=0),
 Document(page_content='Meb: The proxy is like the most antiquated. 