# [OpenAI LangChain QA over Documents](https://python.langchain.com/docs/use_cases/question_answering/)

## The credentials.env file should contain at least the following:
```
AZURE_OPENAI_ENDPOINT     = "https://abcabcabcabc.openai.azure.com/"
AZURE_OPENAI_API_KEY      = "a92800000000000000a"
AZURE_OPENAI_API_VERSION  = "2023-05-15"
OPENAI_API_TYPE           = "azure"
```
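
Before starting the notebook, it can help to confirm that these variables are actually set. The following is a minimal sketch using only the standard library. The helper name `missing_settings` is hypothetical, and the list of names simply mirrors the file above.

```python
import os

# Names copied from the credentials file above; extend the tuple as needed.
REQUIRED = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "OPENAI_API_TYPE",
)

def missing_settings(env=None, required=REQUIRED):
    """Return the required names that are absent or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# With no argument the check runs against the current process environment.
print("Missing settings:", missing_settings() or "none")
```

Run it after `load_dotenv(...)` so the values from the env file are already in `os.environ`.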

## Save the following lines into **conda_langchain.yml**:
```
name: langchain

dependencies:
  - python=3.10.12
  - pip
  - pip:
    - langchain==0.0.334
    - openai==1.2.3
    - jupyter==1.0.0
    - python-dotenv==1.0.0
    - chromadb==0.4.17
    - tiktoken==0.5.1
    - azure-cosmos==4.5.1
```

### Then run:
```
conda env create -f conda_langchain.yml
conda activate langchain
jupyter kernelspec uninstall langchain
python -m ipykernel install --name langchain --user
jupyter kernelspec list
jupyter notebook
```

### Notes for chromadb installation:
If the installation fails with an error such as "Microsoft Visual C++ 14.0 or greater is required", follow [this article](https://stackoverflow.com/questions/64261546/how-to-solve-error-microsoft-visual-c-14-0-or-greater-is-required-when-inst), or download [vs_buildtools.exe](https://visualstudio.microsoft.com/visual-cpp-build-tools/) and then run:
```
vs_buildtools.exe --norestart --passive --downloadThenInstall --includeRecommended --add Microsoft.VisualStudio.Workload.NativeDesktop --add Microsoft.VisualStudio.Workload.VCTools --add Microsoft.VisualStudio.Workload.MSBuildTools
```

In [1]:
import os
from dotenv import load_dotenv
# Load the Azure OpenAI settings (endpoint, API key, API version and API type),
# plus the "GPT4-0613-8k" variable, whose value (a deployment name) is assigned to MODEL,
# and the EMBEDDING_DEPLOYMENT and Cosmos DB settings used in later steps.

load_dotenv("./../credentials_my.env")
MODEL = os.environ["GPT4-0613-8k"]

from langchain.chat_models import AzureChatOpenAI
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0, max_tokens=1000)

In [2]:
# STEP 1: LOAD
from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()
data

[Document(page_content='\n\n\n\n\n\nLLM Powered Autonomous Agents | Lil\'Log\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nLil\'Log\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nPosts\n\n\n\n\nArchive\n\n\n\n\nSearch\n\n\n\n\nTags\n\n\n\n\nFAQ\n\n\n\n\nemojisearch.app\n\n\n\n\n\n\n\n\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\n \n\n\nTable of Contents\n\n\n\nAgent System Overview\n\nComponent One: Planning\n\nTask Decomposition\n\nSelf-Reflection\n\n\nComponent Two: Memory\n\nTypes of Memory\n\nMaximum Inner Product Search (MIPS)\n\n\nComponent Three: Tool Use\n\nCase Studies\n\nScientific Discovery Agent\n\nGenerative Agents Simulation\n\nProof-of-Concept Examples\n\n\nChallenges\n\nCitation\n\nReferences\n\n\n\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer an

In [3]:
len(data)

1

In [4]:
len(data[0].page_content)

43595

In [5]:
# Print only the source metadata, because page_content is too long to display
data[0].metadata['source']

'https://lilianweng.github.io/posts/2023-06-23-agent/'

In [6]:
# STEP 2: SPLIT
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
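
To make the splitting step less of a black box, here is a simplified sketch of the idea behind recursive character splitting: try the coarsest separator first and fall back to finer ones only for pieces that are still too long. It is not LangChain's implementation; the function name `split_text` and the separator list are assumptions, and overlap is ignored (which matches `chunk_overlap=0` above).

```python
def split_text(text, chunk_size=500, separators=("\n\n", "\n", " ", "")):
    """Split `text` into chunks of at most `chunk_size` characters."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    sep, finer = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces
            continue
        if current.strip():
            chunks.append(current)
        if len(piece) > chunk_size:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(split_text(piece, chunk_size, finer))
            current = ""
        else:
            current = piece
    if current.strip():
        chunks.append(current)
    return chunks

sample = "aaa bbb ccc\n\nddd eee fff\n\nggg"
print(split_text(sample, chunk_size=12))
```

Every chunk stays within `chunk_size`, and paragraph boundaries are preferred over mid-sentence cuts whenever the text allows it.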

In [7]:
print(f"Number of splits: {len(all_splits)}. Here are the first two:\n")
print(f"- first split ({len(all_splits[0].page_content)}): {all_splits[0]}\n\n- second split ({len(all_splits[1].page_content)}): {all_splits[1]}")

Number of splits: 130. Here are the first two:

- first split (492): page_content="LLM Powered Autonomous Agents | Lil'Log\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nLil'Log\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nPosts\n\n\n\n\nArchive\n\n\n\n\nSearch\n\n\n\n\nTags\n\n\n\n\nFAQ\n\n\n\n\nemojisearch.app\n\n\n\n\n\n\n\n\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\n \n\n\nTable of Contents\n\n\n\nAgent System Overview\n\nComponent One: Planning\n\nTask Decomposition\n\nSelf-Reflection\n\n\nComponent Two: Memory\n\nTypes of Memory\n\nMaximum Inner Product Search (MIPS)" metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': "LLM Powered Autonomous Agents | Lil'Log", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI,

In [8]:
# STEP 3: STORE

from langchain.embeddings import AzureOpenAIEmbeddings
from langchain.vectorstores import Chroma

embeddings_model = AzureOpenAIEmbeddings(deployment=os.environ["EMBEDDING_DEPLOYMENT"])

my_embedded_words = embeddings_model.embed_documents(['hamburger', 'elephant']) # just for testing

vectorstore = Chroma.from_documents(documents=all_splits[:16], embedding=embeddings_model)

In [9]:
import numpy as np
print(f"shape: {np.array(my_embedded_words).shape}")
print(f"First element: {my_embedded_words[0]}")

shape: (2, 1536)
First element: [-0.014050570133534048, -0.010806334995488017, 0.007523200842313588, -0.03023284261768427, -0.008697971588287598, 0.026576270934358, -0.03759266387116834, -0.038806337357561, -0.020741317150938906, -0.016462350049687508, 0.00029417895328643766, 0.008503473253016636, 4.8199154943603565e-05, -0.02083467664989219, -0.004578493941522241, 0.004096137772027037, 0.030279522367160912, 0.010697415890483376, 0.03581884084163649, -0.01986996431090178, -0.013482633802449961, -0.003930814047348335, 0.021799388988882597, 0.007087524422295024, -0.01431508809301997, -0.00334342867436133, 0.008316755186432637, -0.004100027906370517, -0.018749652186107527, 0.020787996900415547, 0.004629063825342362, -0.00441511528401528, -0.02600055526590951, -0.009631564715175287, 0.009374826093053764, 0.011825507352642027, 0.007153653912166505, -0.020570158690406265, 0.01783161959659376, 0.013467074196398601, 0.03560100076898208, 0.002804667885192065, -0.0043995556779639206, 0.008332314

In [10]:
list(vectorstore.get().keys())

['ids', 'embeddings', 'metadatas', 'documents', 'uris', 'data']

In [11]:
# STEP 4: RETRIEVE
# Retrieve relevant splits for any question using similarity search.

question = "What are the approaches to Task Decomposition?"
docs = vectorstore.similarity_search(question, k=4)  # k=4 is also the default
print(f"Documents retrieved: {len(docs)}")
docs

Documents retrieved: 4


[Document(page_content='Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.', metadata={'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': "LLM Powered Autonomous Agents | Lil'Log"}),
 Document(page_content='Fig. 1. Overview of a LLM-powered auto
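
Conceptually, similarity search embeds the question and returns the `k` stored chunks whose vectors are closest to it. The sketch below illustrates this with cosine similarity on toy two-dimensional vectors. The function names are hypothetical, and the vector store's actual distance metric and index structure may differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]

# Toy "embeddings": index 0 matches the query exactly, index 2 is close to it.
toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
print(top_k([1.0, 0.0], toy_docs, k=2))
```

In the notebook the vectors have 1536 dimensions instead of two, but the ranking principle is the same.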

In [12]:
# STEP 5: GENERATE

from langchain.chains import RetrievalQA
#from langchain.chat_models import AzureChatOpenAI
#llm = AzureChatOpenAI(deployment_name="gpt-35-turbo", temperature=1, max_tokens=1000)

qa_chain = RetrievalQA.from_chain_type(llm, retriever=vectorstore.as_retriever(), return_source_documents=True)
result = qa_chain({"query": question})
print(result["query"], result["result"])

What are the approaches to Task Decomposition? The approaches to Task Decomposition include:

1. Decomposition by a Language Learning Model (LLM) with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?".
2. Using task-specific instructions; for example, "Write a story outline." for writing a novel.
3. With human inputs.
4. Chain of Thought (CoT) technique, where the model is instructed to “think step by step” to decompose hard tasks into smaller and simpler steps.
5. Tree of Thoughts (ToT) method, which extends CoT by exploring multiple reasoning possibilities at each step. It decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
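
The answer above is grounded in the retrieved chunks because the chain places them in the prompt ahead of the question (the default "stuff" strategy). Here is a minimal sketch of that assembly step. The function name `build_stuffed_prompt` and the template wording are assumptions for illustration, not the exact prompt LangChain sends.

```python
def build_stuffed_prompt(question, chunks):
    """Concatenate retrieved chunks into one context block before the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question at the end. "
        "If you don't know the answer, say that you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )

example_chunks = [
    "Task decomposition can be done (1) by LLM with simple prompting ...",
    "Tree of Thoughts extends CoT by exploring multiple reasoning possibilities ...",
]
print(build_stuffed_prompt("What are the approaches to Task Decomposition?", example_chunks))
```

Because every retrieved chunk goes into a single prompt, the number of chunks (`k`) and their size (`chunk_size`) together have to fit within the model's context window.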


In [13]:
print(f"Question: {result['query']}")
print(f"Answer: {result['result']}")
print(f"Nr. of source documents: {len(result['source_documents'])}")
print(f"First source document:\n{result['source_documents'][0]}")

from IPython.display import display, HTML, Markdown
display(Markdown(f'''
Source of first source document: <a href={result["source_documents"][0].metadata["source"]}>
{result["source_documents"][0].metadata["title"]}</a>
'''))

Question: What are the approaches to Task Decomposition?
Answer: The approaches to Task Decomposition include:

1. Decomposition by a Language Learning Model (LLM) with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?".
2. Using task-specific instructions; for example, "Write a story outline." for writing a novel.
3. With human inputs.
4. Chain of Thought (CoT) technique, where the model is instructed to “think step by step” to decompose hard tasks into smaller and simpler steps.
5. Tree of Thoughts (ToT) method, which extends CoT by exploring multiple reasoning possibilities at each step. It decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Nr. of source documents: 4
First source document:
page_content='Task decomposition can 


Source of first source document: <a href=https://lilianweng.github.io/posts/2023-06-23-agent/>
LLM Powered Autonomous Agents | Lil'Log</a>


In [None]:
# STEP 6: MEMORY

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

In [None]:
from langchain.chains import ConversationalRetrievalChain

retriever = vectorstore.as_retriever()
chat = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, memory=memory)

In [None]:
result = chat({"question": "What are some of the main ideas in self-reflection?"})
result

In [None]:
result = chat({"question": "Give me an example"})
result = chat({"question": "Last answer in Italian, please"})
result

In [None]:
memory.buffer
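
A follow-up such as "Give me an example" only makes sense together with the earlier turns, which is why the chain needs memory. The sketch below shows the idea: keep every message in order, then combine the history with the follow-up so an LLM can rewrite it as a standalone question. The class, function and prompt wording are hypothetical; the real chain uses its own prompt.

```python
class BufferMemory:
    """Hypothetical in-memory chat buffer that keeps every message in order."""

    def __init__(self):
        self.messages = []  # list of (role, text) tuples

    def save_turn(self, question, answer):
        self.messages.append(("Human", question))
        self.messages.append(("AI", answer))

    def as_text(self):
        return "\n".join(f"{role}: {text}" for role, text in self.messages)

def condense_prompt(memory, follow_up):
    """Build a prompt asking an LLM to make the follow-up self-contained."""
    return (
        "Given the conversation below, rephrase the follow-up question "
        "as a standalone question.\n\n"
        f"{memory.as_text()}\n\n"
        f"Follow-up: {follow_up}\n"
        "Standalone question:"
    )

memory_sketch = BufferMemory()
memory_sketch.save_turn("What are some of the main ideas in self-reflection?", "(model answer)")
print(condense_prompt(memory_sketch, "Give me an example"))
```

The standalone question is what gets sent to the retriever, so the search is not run on a context-free phrase like "Give me an example".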

In [None]:
# STEP 7: PERSISTENT STORAGE FOR MEMORY (Azure Cosmos DB)
# Create a chat message history backed by Azure Cosmos DB, using LangChain's CosmosDBChatMessageHistory class.

import random
from langchain.memory import CosmosDBChatMessageHistory

cosmos = CosmosDBChatMessageHistory(
    cosmos_endpoint=os.environ['AZURE_COSMOSDB_ENDPOINT'],
    cosmos_database=os.environ['AZURE_COSMOSDB_NAME'],
    cosmos_container=os.environ['AZURE_COSMOSDB_CONTAINER_NAME'],
    connection_string=os.environ['AZURE_COMOSDB_CONNECTION_STRING'],
    session_id="Agent-Test-Session" + str(random.randint(1, 1000)),
    user_id="Agent-Test-User" + str(random.randint(1, 1000))
    )

# prepare the cosmosdb instance
cosmos.prepare_cosmos()

In [None]:
# create a memory buffer that stores conversations into cosmosdb
memory_cosmos = ConversationBufferMemory(memory_key="chat_history", return_messages=True, chat_memory=cosmos)

retriever = vectorstore.as_retriever()
chat_cosmos = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, memory=memory_cosmos)

answer1 = chat_cosmos({"question": "What are some of the main ideas in self-reflection?"})
answer2 = chat_cosmos({"question": "Give me an example"})
answer3 = chat_cosmos({"question": "Last answer in Italian, please"})
cosmosdb_docs = len(cosmos.messages)  # one human message plus one AI message per exchange
print(f"The user {cosmos.user_id} has stored {cosmosdb_docs // 2} OpenAI question/answer exchanges ({cosmosdb_docs} messages in total) in Azure Cosmos DB.")
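
The point of a persistent history is that a conversation survives a kernel restart, as long as the same user and session identifiers are used again. The following sketch illustrates this with a local JSON file standing in for the database. `FileChatHistory` is a hypothetical class for illustration only and does not reflect how the Cosmos DB integration stores its data.

```python
import json
import os
import tempfile

class FileChatHistory:
    """Hypothetical file-backed history, keyed by user and session."""

    def __init__(self, path, user_id, session_id):
        self.path = path
        self.key = f"{user_id}/{session_id}"

    def _load(self):
        # Read the whole store; an absent file means no history yet.
        if not os.path.exists(self.path):
            return {}
        with open(self.path, encoding="utf-8") as fh:
            return json.load(fh)

    @property
    def messages(self):
        return self._load().get(self.key, [])

    def add_message(self, role, text):
        # Write through on every message so a restart loses nothing.
        store = self._load()
        store.setdefault(self.key, []).append({"role": role, "text": text})
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(store, fh)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "history.json")
    first = FileChatHistory(path, "user-1", "session-1")
    first.add_message("human", "What is self-reflection?")
    first.add_message("ai", "(model answer)")
    # A new object with the same ids sees the stored messages.
    reopened = FileChatHistory(path, "user-1", "session-1")
    print(len(reopened.messages) // 2, "exchange(s) stored")
```

Note that the notebook cell above builds `session_id` and `user_id` with random suffixes, so each run starts a fresh history; reusing fixed identifiers would be needed to resume an earlier conversation.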

In [None]:
answer3

In [None]:
!pip install langchainhub

In [None]:
from langchain import hub

prompt = hub.pull("rlm/rag-prompt")
prompt