# Q/A over documents

This example shows how a document store can be combined with an LLM to answer questions from a specific body of knowledge. Here the documents are loaded from https://12factor.net/.

In [26]:
import getpass
import os

from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

In [19]:
OPENAI_API_KEY = getpass.getpass('Key') 
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

Key ········


In [20]:
loader = TextLoader("12fa.txt")
documents = loader.load()

# Split at periods into chunks of roughly 512 characters, with no overlap.
text_splitter = CharacterTextSplitter(
    separator=".",
    chunk_size=512,
    chunk_overlap=0,
    length_function=len,
)
texts = text_splitter.split_documents(documents)

# Embed each chunk and index it in a Chroma vector store.
embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings)

Created a chunk of size 637, which is longer than the specified 512
Using embedded DuckDB without persistence: data will be transient
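
The "Created a chunk of size 637" warning above is what you would expect from a splitter that only cuts at the separator: a single run of text with no `.` in it cannot be made shorter than it already is. The block below is a minimal pure-Python sketch of that behaviour. The helper `split_on_separator` and the sample text are hypothetical and are not LangChain's implementation.

```python
def split_on_separator(text, separator=".", chunk_size=512):
    """Greedy separator-based chunking (illustrative only, not LangChain's code)."""
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + " " + piece
        if current and len(candidate) > chunk_size:
            # Adding this piece would overflow, so close the current chunk.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# The middle "sentence" is 30 characters long, above the limit of 25.
sample = "Short one. " + "x" * 30 + ". Another short one."
chunks = split_on_separator(sample, chunk_size=25)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

The oversized piece ends up as its own chunk because there is no separator inside it to cut at.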


In [21]:
len(texts)

68

In [22]:
def format_chat(chat_history):
    """Render the chat history as 'Role: content' turns separated by blank lines."""
    formatted_history = ''
    for message in chat_history:
        formatted_history += f"{message['role']}: {message['content']} \n\n"
    # strip() already removes the trailing whitespace and newlines.
    return formatted_history.strip()
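
As a quick check of what `format_chat` produces, the sketch below repeats the function and runs it on a two-turn history. The sample messages are made up for illustration.

```python
def format_chat(chat_history):
    formatted_history = ''
    for message in chat_history:
        formatted_history += f"{message['role']}: {message['content']} \n\n"
    formatted_history = formatted_history.strip().rstrip("\r\n")
    return formatted_history


# Hypothetical two-turn history, in the same dict shape the notebook uses.
history = [
    {"role": "User", "content": "Is hardcoding config in code ok?"},
    {"role": "Wyl", "content": "No, keep config out of the code."},
]
print(format_chat(history))
```

Each turn is followed by a space and a blank line, and the final turn has its trailing whitespace stripped.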

In [23]:
from langchain.prompts import PromptTemplate
prompt_template = """
You are Wyl. You can perform a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on many topics.
Use the following pieces of context to respond to the user's question or statement at the end.

{context}

If you don't know the answer from the context above, just say that you don't know; don't try to make up an answer.
Keep in mind the following chat history between the user and Wyl:

{chat_history}
Wyl:
"""

question_prompt = PromptTemplate(
    template=prompt_template, input_variables=["context", "chat_history"]
)
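
To see roughly what the model receives, the two placeholders can be filled by hand. This sketch uses plain `str.format` as a stand-in for rendering the prompt template. The shortened template and the sample values are assumptions for illustration only.

```python
# Shortened, hypothetical version of the prompt with the same two placeholders.
template = (
    "Use the following pieces of context to respond to the user.\n\n"
    "{context}\n\n"
    "Chat history:\n\n"
    "{chat_history}\n"
    "Wyl:"
)

filled = template.format(
    context="Store config in environment variables.",
    chat_history="User: Is hardcoding config in code ok?",
)
print(filled)
```

The prompt ends with the assistant's name, which cues the model to write the next turn in that role.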

In [24]:
llm = ChatOpenAI(temperature=0)  # alternative: OpenAI(temperature=0, model_name="text-davinci-003")

llm_chain = LLMChain(prompt=question_prompt, llm=llm, verbose=False)
chat_history = []

def respond(llm_chain, docsearch, question, chat_history):
    """Answer `question` from the closest chunks and record both turns in chat_history."""
    # Retrieve the 3 chunks closest to the question; each result is a (document, score) pair.
    docs = docsearch.similarity_search_with_score(question, 3)
    context = "\n".join(doc.page_content.replace("\n", " ") for doc, _score in docs)

    chat_history.append({"role": "User", "content": question})
    formatted_history = format_chat(chat_history)

    output = llm_chain.predict(context=context, chat_history=formatted_history)
    chat_history.append({"role": "Wyl", "content": output})
    return chat_history
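
The control flow of `respond` is: retrieve, build the context, append the user turn, call the model, append the reply. The sketch below mirrors that flow with hypothetical stand-ins (`FakeStore`, `fake_llm`, `respond_sketch`) so it runs without an API key. The fake store ranks by word overlap, where a higher score is better; real vector stores define their own scoring.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    page_content: str


class FakeStore:
    """Hypothetical stand-in for the vector store: ranks documents by shared words."""

    def __init__(self, texts):
        self.docs = [Doc(t) for t in texts]

    def similarity_search_with_score(self, query, k=3):
        words = set(query.lower().split())
        scored = [
            (doc, len(words & set(doc.page_content.lower().split())))
            for doc in self.docs
        ]
        # Highest overlap first; ties keep their original order.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


def fake_llm(context, chat_history):
    """Hypothetical stand-in for the LLM call: echoes the first context line."""
    return "Based on the context: " + context.split("\n")[0]


def respond_sketch(store, question, chat_history):
    docs = store.similarity_search_with_score(question, 2)
    context = "\n".join(doc.page_content.replace("\n", " ") for doc, _score in docs)
    chat_history.append({"role": "User", "content": question})
    history_text = "\n\n".join(f"{m['role']}: {m['content']}" for m in chat_history)
    output = fake_llm(context=context, chat_history=history_text)
    chat_history.append({"role": "Wyl", "content": output})
    return chat_history


store = FakeStore([
    "Store config in the environment",
    "Execute the app as stateless processes",
    "Treat logs as event streams",
])
history = respond_sketch(store, "Where should config live", [])
print(history[-1]["content"])
```

Because the history list is mutated and returned, passing the same list into successive calls accumulates the whole conversation, which is why each later output repeats the earlier turns.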

In [25]:
chat_history = respond(llm_chain, docsearch, "Is hardcoding config in code ok?", chat_history)
print(format_chat(chat_history))

User: Is hardcoding config in code ok? 

Wyl: Hardcoding config in code is not recommended according to the twelve-factor methodology, which emphasizes strict separation of config from code. Config should be factored out of the codebase and stored in environment variables for better management and security. However, there are other approaches to config such as using config files that are not checked into revision control.


In [12]:
chat_history = respond(llm_chain, docsearch, "What are the benefits of using environment variables as opposed to hardcoding?", chat_history)
print(format_chat(chat_history))

User: Is hardcoding config in code ok? 
Wyl: No, hardcoding config in code is not recommended according to the twelve-factor app methodology. This methodology recommends storing config in environment variables, as this is more secure and easier to manage. 
User: What are the benefits of using environment variables as opposed to hardcoding? 
Wyl: The main benefits of using environment variables as opposed to hardcoding config are that they are easy to change between deploys without changing any code, there is little chance of them being checked into the code repo accidentally, and they are a language- and OS-agnostic standard. Additionally, env vars are granular controls, each fully orthogonal to other env vars, and they are never grouped together as “environments”, making it easier to manage all the config in one place.
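
The benefits listed in the answer above can be made concrete with a small sketch that reads settings from environment variables at startup. This is an editorial illustration, not output from the notebook's model. The variable names `DATABASE_URL` and `DEBUG` and the helper `load_config` are hypothetical.

```python
import os


def load_config(env=None):
    """Read settings from environment variables (the names are hypothetical)."""
    env = os.environ if env is None else env
    return {
        # Required setting: raises KeyError if the variable is unset.
        "database_url": env["DATABASE_URL"],
        # Optional setting: defaults to False when the variable is unset.
        "debug": env.get("DEBUG", "false").lower() == "true",
    }


# A plain dict stands in for the process environment so the example is repeatable.
config = load_config({"DATABASE_URL": "postgres://localhost/app", "DEBUG": "true"})
print(config)
```

Changing a deploy then means changing the environment, not the code, and nothing sensitive needs to be committed to the repository.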


In [13]:
chat_history = respond(llm_chain, docsearch, "I think we could just run with a stateful service, Wyl what do you think? ", chat_history)
print(format_chat(chat_history))

User: Is hardcoding config in code ok? 
Wyl: No, hardcoding config in code is not recommended according to the twelve-factor app methodology. This methodology recommends storing config in environment variables, as this is more secure and easier to manage. 
User: What are the benefits of using environment variables as opposed to hardcoding? 
Wyl: The main benefits of using environment variables as opposed to hardcoding config are that they are easy to change between deploys without changing any code, there is little chance of them being checked into the code repo accidentally, and they are a language- and OS-agnostic standard. Additionally, env vars are granular controls, each fully orthogonal to other env vars, and they are never grouped together as “environments”, making it easier to manage all the config in one place. 
User: I think we could just run with a stateful service, Wyl what do you think?  
Wyl: Using a stateful service is a good option for storing session state data. It offe

In [14]:
chat_history = respond(llm_chain, docsearch, "Write some example python code on how we can externalise config", chat_history)
print(format_chat(chat_history))
