<a href="https://colab.research.google.com/github/ryderwishart/biblical-machine-learning/blob/main/ai_bible_query.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Empowering Bots with Data

This notebook shows how to build a question-answering system using an OpenAI large language model (LLM) augmented with a specific knowledge base, in this case the Bible. The aim is to answer Bible-related queries accurately and to demonstrate what LLMs can do when paired with a retrievable data source that gives them a form of long-term memory and specificity.

The main components of this notebook are:

1. **Building a Bible QA System**: A question-answering system grounded in the Bible is created. No model is trained or fine-tuned here; instead, the Bible text is split into chunks, embedded as vectors, and stored in a Chroma vector store. The QA system is built with the `RetrievalQA` chain from the `langchain.chains` module, which retrieves relevant passages from the vector store and passes them to the LLM to reason about.

2. **Initializing an LLM Agent**: An LLM Agent is initialized with the Bible QA system as a tool. This allows the agent to utilize the Bible QA system when answering questions, essentially enabling it to "remember" information from the Bible and reason about it. The LLM Agent acts as a router and decision-maker, determining how and when to use the Bible QA system based on the input question.

3. **Running Queries**: The combined system is then used to answer complex, multi-step questions related to the Bible. The LLM Agent's ability to perform multi-step reasoning is showcased, with the agent using the Bible QA system to answer individual parts of the question and then combining those answers into a final response.

This notebook thus exemplifies the use of LLMs as a reasoning tool, enhanced by a specific knowledge database. This approach not only improves the accuracy of responses to specific queries but also provides the LLM with a form of long-term memory, allowing it to consistently access and reason about a large, fixed body of knowledge.
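
To make the retrieval step concrete, here is a minimal sketch of vector search over a few verses. It is not what LangChain, OpenAI embeddings, or Chroma do internally: `embed`, `cosine`, and `retrieve` are hypothetical helpers, and the bag-of-words vector is a toy stand-in for a learned embedding. The sample verses are abbreviated from the KJV.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (assumption: stands in for a learned embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# A tiny in-memory "vector store": three abbreviated KJV verses.
chunks = [
    "In the beginning God created the heaven and the earth.",
    "And they shall make the ephod of gold, of blue, and of purple.",
    "There was a man in the land of Uz, whose name was Job.",
]

top = retrieve("Where was Job from, the land of Uz?", chunks)
```

In the notebook itself, the retrieved chunks are then placed into the LLM prompt (the "stuff" chain type), so the model answers from the retrieved text rather than from memory alone.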

In [15]:
!pip install langchain openai chromadb tiktoken

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tiktoken
  Downloading tiktoken-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)
Installing collected packages: tiktoken
Successfully installed tiktoken-0.4.0


In [7]:
# Download the KJV text file
import os

filename = 'pg10.txt'
if not os.path.exists(filename):
    !wget 'https://www.gutenberg.org/cache/epub/10/pg10.txt'
os.listdir()

['.config', 'pg10.txt', 'sample_data']

In [8]:
import getpass
secret_key = getpass.getpass('Enter OpenAI secret key: ')
os.environ['OPENAI_API_KEY'] = secret_key

Enter OpenAI secret key: ··········
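
When re-running the notebook, it can be convenient to prompt for the key only if it is not already set. The helper below is a sketch under that assumption; `ensure_api_key` is a hypothetical name, not part of LangChain or the OpenAI client.

```python
import getpass
import os


def ensure_api_key(env_var="OPENAI_API_KEY", prompt=getpass.getpass):
    """Return the key stored in env_var, prompting for it only when it is missing."""
    key = os.environ.get(env_var)
    if not key:
        # Nothing set yet: ask the user and store the key for later cells.
        key = prompt(f"Enter {env_var}: ")
        os.environ[env_var] = key
    return key


# Usage in the notebook (commented out so the sketch does not block on input):
# ensure_api_key()
```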


In [9]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType
from pathlib import Path

# Load Language Model
llm = OpenAI(temperature=0)

In [11]:
# Load and process the Bible text
doc_path = filename
loader = TextLoader(doc_path)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
texts = text_splitter.split_documents(documents)
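
The `chunk_size` and `chunk_overlap` parameters are easiest to picture as a sliding window over the text. The sketch below is a plain character window, not `CharacterTextSplitter`'s actual algorithm (which first splits on a separator and then merges the pieces); `sliding_chunks` is a hypothetical helper used only for illustration.

```python
def sliding_chunks(text, chunk_size=500, chunk_overlap=50):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


# 1000 characters of repeating digits, so overlaps are easy to inspect.
sample = "".join(str(i % 10) for i in range(1000))
pieces = sliding_chunks(sample)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.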



In [16]:
# Create embeddings and store in a vectorstore
embeddings = OpenAIEmbeddings()
bible_chroma = Chroma.from_documents(texts, embeddings, collection_name="kjv-bible")



In [17]:
# Create a RetrievalQA tool with this vectorstore
bible_tool = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=bible_chroma.as_retriever())

tools = [
    Tool(
        name="Bible QA System",
        func=bible_tool.run,
        description="useful for when you need to answer questions about the Bible. Input should be a fully formed question.",
        return_direct=True  # If you want to use the agent as a router and return results directly
    ),
]

In [18]:
# Initialize the agent
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)


In [19]:
agent.run("What does Genesis 1:1 say?")



> Entering new AgentExecutor chain...
 I need to find out what the Bible says.
Action: Bible QA System
Action Input: What does Genesis 1:1 say?
Observation:  Genesis 1:1 says "In the beginning God created the heaven and the earth."

> Finished chain.


' Genesis 1:1 says "In the beginning God created the heaven and the earth."'

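Here is a two-step question. The zero-shot ReAct agent cannot handle the second part properly: it sends the whole question to the tool in a single call, and because the tool is configured with `return_direct=True`, the first observation is returned as the final answer.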

In [30]:
# agent.run("Where was Job from? Does that place get mentioned anywhere else in the Bible?")
agent.run("What is an ephod? Who wears one?")





> Entering new AgentExecutor chain...
 I need to find out what an ephod is and who wears one.
Action: Bible QA System
Action Input: What is an ephod? Who wears one?
Observation:  An ephod is a type of garment worn by priests in the Bible. It is described as being made of gold, blue, purple, scarlet, and fine twined linen, and having two onyx stones with the names of the children of Israel engraved on them.

> Finished chain.


' An ephod is a type of garment worn by priests in the Bible. It is described as being made of gold, blue, purple, scarlet, and fine twined linen, and having two onyx stones with the names of the children of Israel engraved on them.'

However, the self-ask-with-search agent can, because it breaks the question into follow-up questions and answers each one separately.

In [31]:
from langchain import OpenAI
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType

# Build a QA chain on the vector store created above
bible_qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=bible_chroma.as_retriever())

tools = [
    Tool(
        name="Intermediate Answer",
        func=bible_qa.run,
        description="useful for when you need to answer questions about the Bible. Input should be a fully formed question."
    )
]

self_ask_with_search = initialize_agent(tools, llm, agent=AgentType.SELF_ASK_WITH_SEARCH, verbose=True)
# self_ask_with_search.run("Where was Job from? Does that place get mentioned in the Bible?")
self_ask_with_search.run("What is an ephod? Who wears one?")




> Entering new AgentExecutor chain...
 Yes.
Follow up: What is an ephod?
Intermediate answer:  An ephod is a type of garment worn by priests in the Bible. It was made of gold, blue, purple, scarlet, and fine twined linen, and was often decorated with stones or other ornaments.
Follow up: Who wears an ephod?
Intermediate answer:  Aaron.
So the final answer is: An ephod is a type of garment worn by priests in the Bible, and it was worn by Aaron.

> Finished chain.


'An ephod is a type of garment worn by priests in the Bible, and it was worn by Aaron.'

Much better.
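
The difference between the two runs comes down to how many times the QA tool is called. The sketch below imitates the self-ask pattern with no LLM involved: the follow-up questions are hard-coded (in the real agent the LLM generates them), and `toy_bible_qa`, `CANNED_ANSWERS`, and `self_ask` are hypothetical stand-ins rather than LangChain APIs.

```python
# Hypothetical stand-in for the Bible QA tool: a fixed lookup table.
CANNED_ANSWERS = {
    "What is an ephod?": "A garment worn by priests.",
    "Who wears an ephod?": "Aaron.",
}


def toy_bible_qa(question):
    """Answer a single question from the lookup table."""
    return CANNED_ANSWERS.get(question, "I don't know.")


def self_ask(follow_ups, answer_fn):
    """Ask each follow-up separately and keep the (question, answer) pairs."""
    return [(question, answer_fn(question)) for question in follow_ups]


# One tool call per follow-up, then combine the intermediate answers.
steps = self_ask(["What is an ephod?", "Who wears an ephod?"], toy_bible_qa)
final_answer = " ".join(answer for _, answer in steps)

# A single call with the compound question finds no matching entry.
single_shot = toy_bible_qa("What is an ephod? Who wears one?")
```

Each follow-up gets its own retrieval pass, so the passages fetched for "Who wears an ephod?" are not crowded out by those fetched for the definition.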

## Next steps:

- Combine the self-ask agent with the Bible vector retrieval agent.
- Think through some custom tooling for exciting new applications such as:
  - Bible translation functionality in low- or no-resource languages
  - Extracting new structured datasets using text plus existing structured data that can be retrieved on the fly
  - Evaluating Bible translations using qualitative "metrics" (as opposed to more conventional ones) on the basis of structured data extracted from the back translation

Questions? Ideas? Reach out to ryderwishart (at) gmail dot com!