# MosesAI project

1. Read all Talmud pages in a directory
2. Embed them and send them to Pinecone
3. Load the Pinecone index
4. For a query, find relevant documents
5. Using LangChain, send the query and relevant documents to ChatGPT
6. Get the answer

In [1]:
import openai
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file
openai.api_key = os.getenv('OPENAI_API_KEY')
MODEL = "gpt-4"
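
`os.getenv` returns `None` when a variable is missing, so a missing key only surfaces later as an authentication error. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper, not part of `openai` or `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Raises RuntimeError if the variable is unset or empty.
    (Hypothetical helper, introduced here for illustration.)
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value
```

It could be used as `openai.api_key = require_env('OPENAI_API_KEY')`, and in the same way for `PINECONE_API_KEY`.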

In [2]:
import pinecone
from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain
#from PIL import Image               # to load images
#from IPython.display import display # to display images
#pil_im = Image.open('/content/langchain and pinecone.png')
#display(pil_im)


In [3]:
# !pip install --upgrade langchain openai  -q

In [4]:
#!pip install unstructured -q
#!pip install unstructured[local-inference] -q
#!pip install detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.6#egg=detectron2 -q

In [5]:
#!apt-get install poppler-utils  

https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/directory_loader.html

In [6]:
from langchain.document_loaders import DirectoryLoader

directory = '../data/talmud-pages/'

def load_docs(directory):
  loader = DirectoryLoader(directory)
  documents = loader.load()
  return documents

documents = load_docs(directory)
len(documents)

2297
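
`DirectoryLoader` relies on the `unstructured` package to parse each file. If the pages are already plain text, a dependency-free loader may be enough. The sketch below assumes UTF-8 `.txt` files and returns plain dictionaries rather than LangChain `Document` objects; `load_text_files` is a hypothetical name.

```python
from pathlib import Path


def load_text_files(directory: str, pattern: str = "*.txt") -> list[dict]:
    """Read every file matching `pattern` in `directory` (non-recursive).

    Each result mirrors the shape of a LangChain Document:
    the text under "page_content" and the file path under metadata["source"].
    Assumes the files are UTF-8 encoded.
    """
    docs = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            text = path.read_text(encoding="utf-8")
            docs.append({"page_content": text, "metadata": {"source": str(path)}})
    return docs
```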

https://python.langchain.com/en/latest/modules/indexes/text_splitters/getting_started.html

In [8]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

def split_docs(documents,chunk_size=1000,chunk_overlap=20):
  text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
  docs = text_splitter.split_documents(documents)
  return docs

docs = split_docs(documents)
print(len(docs))

# Override the chunked list: index the unsplit pages instead
docs = documents
print(len(docs))

3928
2297
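
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-width chunker. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which prefers to break on separators such as paragraph and line breaks; it only illustrates how consecutive chunks share `chunk_overlap` characters. `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 20) -> list[str]:
    """Split `text` into windows of at most `chunk_size` characters.

    Each window starts `chunk_size - chunk_overlap` characters after the
    previous one, so neighbouring chunks share `chunk_overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```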


In [9]:
print(docs[0].page_content)

Nazir 9 - Nazir who did not like figs If one says, "I am a nazir, so I cannot eat figs," - this is a strange statement. Being a nazir means specifically abstaining from grapes, nothing else. However, Beit Shammai says that he does become a nazir nevertheless. How so? People usually do not make nonsensical statements. This one probably wanted to become a nazir but added that he meant figs. He could have made a mistake, thinking there was such a thing. Or, he really could have changed his mind and was preparing a loophole for himself. But the problem is that Beit Shammai does not accept the idea of changing one's mind regarding Temple-related things. So either way, he becomes a nazir. What about Beit Hillel? They say that the man is not a nazir. He made a statement, that is true, but it was not a valid legal statement about becoming a nazir. So it did not take effect at all. Art: Melon And Bowl Of Figs by Gustave Caillebotte Talk to MosesAI about it


In [10]:
# required for OpenAI embeddings
#!pip install tiktoken -q

In [11]:
import openai
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(openai_api_key=openai.api_key)

query_result = embeddings.embed_query("Hello world")
len(query_result)

1536
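
Each text becomes a vector of 1536 floats, and texts with related meaning are expected to map to vectors that point in similar directions. A common way to compare two such vectors is cosine similarity. The sketch below is plain Python for illustration; whether the Pinecone index actually uses the cosine metric depends on how the index was created, which this notebook does not show.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between vectors `a` and `b`.

    1.0 means same direction, 0.0 orthogonal, -1.0 opposite.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

With the objects above, `cosine_similarity(embeddings.embed_query("Shema"), embeddings.embed_query("evening prayer"))` would compare two queries.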

In [12]:
#!pip install pinecone-client -q

https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/pinecone.html

In [14]:
import pinecone 
from langchain.vectorstores import Pinecone
# initialize pinecone
pinecone.init(
    api_key=os.getenv('PINECONE_API_KEY'), 
    # environment="us-west4-gcp-free"  # next to api key in console
    environment="us-central1-gcp"  # next to api key in console
)

#index_name = "langchain-demo"
index_name = "talmud-pages"

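# Embed the documents and upload them to the index.
# Skip this line on later runs to avoid uploading the documents again.
index = Pinecone.from_documents(docs, embeddings, index_name=index_name)

# If the index is already populated, load it like this instead
index = Pinecone.from_existing_index(index_name, embeddings)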

dc875eaf-0570-4d9c-aab6-9672545edcf1


In [15]:
def get_similiar_docs(query,k=5,score=False):
  if score:
    similar_docs = index.similarity_search_with_score(query,k=k)
  else:
    similar_docs = index.similarity_search(query,k=k)
  return similar_docs

query = "When do you say Shema?"
similar_docs = get_similiar_docs(query)
len(similar_docs)
similar_docs[0]

Document(page_content='However, the Shema is not said when one actually goes to sleep or wakes up, but rather in the general period when people lie down and get up. When is this? In the evening - when the Kohanim, who were impure, are returning from the mikveh to eat priestly portion (terumah), that is, at nightfall. That is when the time to say the Shema in the evening begins, but when does it end? Rabbi Eliezer said, "Until the end of the first watch, that is, the first part of the night." Rabbi Eliezer understands "when you lie down" as the time when people go to sleep. The Sages say until midnight, and Rabban Gamliel says until dawn. We can understand Rabban Gamliel, \'when people lie down" means when people are asleep. But the opinion of the Sages will require clarification.', metadata={'source': '../data/talmud-paragraphs/brachot2.html-paragraph-1.txt'})
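
With `score=True`, the search returns `(document, score)` pairs, which makes it possible to drop weak matches before they reach the model. The sketch below assumes that a higher score means a closer match, which holds for a cosine-metric index but should be checked against the index configuration. `filter_by_score` and the threshold value are hypothetical.

```python
def filter_by_score(results, min_score):
    """Keep the (document, score) pairs whose score is at least `min_score`.

    Assumes higher scores mean closer matches. The input order is preserved.
    """
    return [(doc, score) for doc, score in results if score >= min_score]


# Stand-in data; real results would come from get_similiar_docs(query, score=True).
sample = [("passage A", 0.91), ("passage B", 0.62), ("passage C", 0.85)]
print(filter_by_score(sample, min_score=0.8))
# [('passage A', 0.91), ('passage C', 0.85)]
```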

In [16]:
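from langchain.llms import OpenAI

# Note: gpt-4 is a chat model; LangChain also provides
# langchain.chat_models.ChatOpenAI for chat models.
llm = OpenAI(model_name=MODEL)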



https://python.langchain.com/en/latest/use_cases/question_answering.html

In [17]:
from langchain.chains.question_answering import load_qa_chain
chain = load_qa_chain(llm, chain_type="stuff")

def get_answer(query):
  similar_docs = get_similiar_docs(query)
  # print(similar_docs)
  answer = chain.run(input_documents=similar_docs, question=query)
  return answer

query = "When to say Shema?"
get_answer(query)

'The Shema prayer should be said twice a day: once in the evening and once in the morning. In the evening, the time to say the Shema begins at nightfall, when the Kohanim who were impure, are returning from the mikveh to eat their priestly portion (terumah). The time to say the Shema in the evening ends, according to Rabbi Eliezer, at the end of the first watch or the first part of the night. The Sages say it ends at midnight, and Rabban Gamliel says it ends at dawn. In the morning, the Shema should be said "when people rise," which is when there is enough light to distinguish between the blue and the white wool, or when one can distinguish between a wolf and a dog, or recognize a friend from four steps away. The time to say the Shema ends, according to Rabbi Eliezer, at sunrise, while Rabbi Yehoshua says it ends three hours into the day.'
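
The `"stuff"` chain type places all retrieved documents into a single prompt together with the question. The sketch below shows the idea in plain Python. The prompt wording is an assumption made for illustration and is not LangChain's actual template; `build_stuff_prompt` is a hypothetical name.

```python
def build_stuff_prompt(passages: list[str], question: str) -> str:
    """Combine all passages and the question into one prompt string.

    Empty passages are skipped. The instruction text is illustrative only.
    """
    context = "\n\n".join(p.strip() for p in passages if p.strip())
    return (
        "Use the following passages to answer the question. "
        "If the answer is not in the passages, say that you do not know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Because every passage goes into one prompt, the combined text must fit in the model's context window, which is one reason to keep `k` small.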

In [18]:
query = "what are the sacrifices for? \
Sacrifices are typically brought for mistakes or unintentional transgressions, as stated in Keritot 9. \
However, there are cases when one brings an offering for intentional acts, such as relations with a slavewoman designated for another, a nazir who went to the cemetery, and one who swore a false oath of testimony (also mentioned in Keritot 9) \
what about bird sacrifices?"
get_answer(query)

'Bird sacrifices are brought in some situations depending on the financial status of the individual. As mentioned in Keritot 10, for example, if a woman who gave birth is poor, she brings a pair of birds as her sacrifice. Similarly, a spiritual leper (metzora) who is poor brings one animal and two birds as his sacrifice.'
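
The query above carries the previous question and answer along with the new question, assembled by hand. A small helper can build such a follow-up query from a list of earlier exchanges. `build_followup_query` is a hypothetical name, and joining with newlines is an arbitrary choice.

```python
def build_followup_query(history: list[tuple[str, str]], question: str) -> str:
    """Prepend earlier (question, answer) pairs to a new question.

    Returns one string with each part on its own line, oldest first.
    """
    parts = []
    for past_question, past_answer in history:
        parts.append(past_question.strip())
        parts.append(past_answer.strip())
    parts.append(question.strip())
    return "\n".join(parts)
```

Since `get_answer` passes the whole string to the similarity search as well, the earlier exchange also influences which documents are retrieved.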