# MosesAI project

1. Read all Talmud pages from a directory
2. Embed them and send them to Pinecone
3. Open the Pinecone index
4. For a query, find the most relevant documents
5. Using LangChain, send the query and the relevant documents to ChatGPT
6. Return the answer

In [8]:
from openai import OpenAI
import os
from dotenv import load_dotenv, find_dotenv

# Load .env
_ = load_dotenv(find_dotenv())

# Create a client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

MODEL = "gpt-4"

In [9]:
import pinecone
from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone
# Aliased so it does not shadow openai.OpenAI imported in the first cell
from langchain.llms import OpenAI as LangChainOpenAI
from langchain.chains.question_answering import load_qa_chain

In [10]:
from langchain.document_loaders import DirectoryLoader

directory = 'data/talmud-pages/'

def load_docs(directory):
  loader = DirectoryLoader(directory)
  documents = loader.load()
  return documents

documents = load_docs(directory)
len(documents)

2297
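
For readers without LangChain installed, the loading step can be approximated with the standard library. This is only a sketch: it assumes every page is a plain UTF-8 text file, and `load_text_pages` is a hypothetical helper, not part of LangChain. The `page_content` and `metadata["source"]` keys mirror the fields visible on the `Document` objects used in this notebook.

```python
from pathlib import Path


def load_text_pages(directory: str) -> list[dict]:
    """Read every regular file under `directory` into a simple page record."""
    pages = []
    # sorted() keeps the order stable between runs
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            pages.append({
                "page_content": path.read_text(encoding="utf-8"),
                "metadata": {"source": str(path)},
            })
    return pages
```

Calling `load_text_pages('data/talmud-pages/')` would return one record per file, so its length should match the document count when the directory holds only text files.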

In [11]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

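def split_docs(documents, chunk_size=1000, chunk_overlap=20):
  # Split each page into chunks of roughly chunk_size characters that overlap by up to chunk_overlap
  text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
  docs = text_splitter.split_documents(documents)
  return docs

docs = split_docs(documents)
print(len(docs))

# Use the unsplit pages from here on; this replaces the chunked list above
docs = documents
print(len(docs))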

3929
2297
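
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window splitter. It is not what `RecursiveCharacterTextSplitter` does internally (that class prefers to break on separators such as paragraph breaks); `split_with_overlap` is a hypothetical helper that only illustrates how consecutive chunks share characters.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 20) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters that overlap by `chunk_overlap`."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop early enough that the last window always adds new characters
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


chunks = split_with_overlap("abcdefghijklmnopqrstuvwxy", chunk_size=10, chunk_overlap=2)
print(chunks)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxy']
```

Each chunk starts with the last two characters of the previous one, which is the role the overlap plays in keeping context across chunk boundaries.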


In [12]:
print(docs[0].page_content)

Nazir 9 - Nazir who did not like figs If one says, "I am a nazir, so I cannot eat figs," - this is a strange statement. Being a nazir means specifically abstaining from grapes, nothing else. However, Beit Shammai says that he does become a nazir nevertheless. How so? People usually do not make nonsensical statements. This one probably wanted to become a nazir but added that he meant figs. He could have made a mistake, thinking there was such a thing. Or, he really could have changed his mind and was preparing a loophole for himself. But the problem is that Beit Shammai does not accept the idea of changing one's mind regarding Temple-related things. So either way, he becomes a nazir. What about Beit Hillel? They say that the man is not a nazir. He made a statement, that is true, but it was not a valid legal statement about becoming a nazir. So it did not take effect at all. Art: Melon And Bowl Of Figs by Gustave Caillebotte Talk to MosesAI about it


In [14]:
from langchain.embeddings.openai import OpenAIEmbeddings
import os
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())  # optional, if using .env

# Pass the API key explicitly
api_key = os.getenv("OPENAI_API_KEY")
embeddings = OpenAIEmbeddings(openai_api_key=api_key)

# Embed a sample query and check the length of the resulting vector
query_result = embeddings.embed_query("Hello world")
print(len(query_result))




1536


https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/pinecone.html

In [17]:
"""
Notebook cell ― Pinecone 3.x + LangChain 0.2
-------------------------------------------
Requires:
  pip install --upgrade pinecone langchain-pinecone langchain-openai
Environment:
  OPENAI_API_KEY, PINECONE_API_KEY
Inputs:
  docs  # list[langchain.schema.Document] if you plan to ingest
"""

import os
from pinecone import Pinecone, ServerlessSpec
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

# 1) Embedding model
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    api_key=os.getenv("OPENAI_API_KEY"),
)

# 2) Decide the vector dimension
DIMENSION = embeddings.dimensions or 1536   # text-embedding-3-small defaults to 1536 dimensions

# 3) Connect to Pinecone
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
index_name = "talmud-pages"

# 4) Create the index once
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=DIMENSION,
        metric="cosine",
        spec=ServerlessSpec(cloud="gcp", region="us-central1")
    )

# 5) Open the index and wrap it for LangChain
pc_index = pc.Index(index_name)
vectorstore = PineconeVectorStore(index=pc_index, embedding=embeddings)

# To ingest data if the index is empty:
# vectorstore.add_documents(docs)

# Reuse the vector store created above under the name used by the query cells below
index = vectorstore
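
The index above is created with `metric="cosine"`, so query and document vectors are compared by the angle between them rather than by their length. A minimal standard-library sketch of that measure follows; `cosine_similarity` is a hypothetical helper for illustration, and Pinecone computes the score server-side.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Vectors pointing the same way score close to 1, orthogonal vectors score 0, and opposite vectors score close to -1. The dimension check mirrors the requirement that the index `dimension` match the embedding model's output size.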


In [18]:
def get_similiar_docs(query, k=5, score=False):
  # Return the k nearest documents; with score=True each item is a (document, score) pair
  if score:
    similar_docs = index.similarity_search_with_score(query, k=k)
  else:
    similar_docs = index.similarity_search(query, k=k)
  return similar_docs

query = "When do you say Shema?"
similar_docs = get_similiar_docs(query)
len(similar_docs)
similar_docs[0]

Document(id='f359ae85-d02e-41ca-8324-9175f7dac703', metadata={'source': '../data/talmud-paragraphs/bava_batra35.html-paragraph-2.txt'}, page_content='Once one gets the object, can the other one take it back? Some say "no" - because the court would not allow a perpetual feud. Others say that "yes, he can take it back."')
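
The top hit above comes from Bava Batra 35 and does not appear to address the Shema, so it can help to inspect scores before passing documents to the model. The sketch below filters the `(document, score)` pairs that `similarity_search_with_score` returns. It assumes that a higher score means a closer match, as with a cosine index; `filter_by_score` and the `0.8` threshold are hypothetical and would need tuning against real results.

```python
from typing import Any


def filter_by_score(results: list[tuple[Any, float]], min_score: float) -> list[tuple[Any, float]]:
    """Keep (document, score) pairs scoring at least `min_score`, best match first."""
    kept = [(doc, score) for doc, score in results if score >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy data standing in for real search results
sample = [("page A", 0.91), ("page B", 0.42), ("page C", 0.85)]
print(filter_by_score(sample, min_score=0.8))  # [('page A', 0.91), ('page C', 0.85)]
```

In the notebook this would be applied as `filter_by_score(get_similiar_docs(query, score=True), min_score=0.8)`.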

In [19]:
from langchain.llms import OpenAI

llm = OpenAI(model_name=MODEL)



https://python.langchain.com/en/latest/use_cases/question_answering.html

In [20]:
from langchain.chains.question_answering import load_qa_chain
chain = load_qa_chain(llm, chain_type="stuff")

def get_answer(query):
  # Retrieve the closest documents, then let the chain answer from them
  similar_docs = get_similiar_docs(query)
  # print(similar_docs)
  answer = chain.run(input_documents=similar_docs, question=query)
  return answer

query = "When to say Shema?"
get_answer(query)

stuff: https://python.langchain.com/docs/versions/migrating_chains/stuff_docs_chain
map_reduce: https://python.langchain.com/docs/versions/migrating_chains/map_reduce_chain
refine: https://python.langchain.com/docs/versions/migrating_chains/refine_chain
map_rerank: https://python.langchain.com/docs/versions/migrating_chains/map_rerank_docs_chain

See also guides on retrieval and question-answering here: https://python.langchain.com/docs/how_to/#qa-with-rag


APIRemovedInV1: 

You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface. 

Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`

A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742
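
The traceback above indicates that the legacy wrapper ended up calling `openai.ChatCompletion`, which `openai>=1.0.0` no longer provides. Whichever client makes the call, the `stuff` chain type amounts to concatenating the retrieved passages into a single prompt. Below is a dependency-free sketch of that idea. The prompt wording and both helper names are hypothetical, and `llm` is a stand-in for whatever chat client is in use (for example `ChatOpenAI` from `langchain-openai`, assuming that package is installed).

```python
from typing import Callable, Sequence


def build_stuff_prompt(question: str, passages: Sequence[str]) -> str:
    """Place all retrieved passages into one prompt ahead of the question."""
    context = "\n\n".join(passages)
    return (
        "Use the following passages to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer_with_stuffing(question: str, passages: Sequence[str], llm: Callable[[str], str]) -> str:
    """Send the stuffed prompt to `llm`, any callable that maps a prompt to text."""
    return llm(build_stuff_prompt(question, passages))
```

Because every passage goes into one request, this approach only works while the combined text fits in the model's context window.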


In [21]:
query = (
    "what are the sacrifices for? "
    "Sacrifices are typically brought for mistakes or unintentional transgressions, as stated in Keritot 9. "
    "However, there are cases when one brings an offering for intentional acts, such as relations with a slavewoman designated for another, a nazir who went to the cemetery, and one who swore a false oath of testimony (also mentioned in Keritot 9) "
    "what about bird sacrifices?"
)
get_answer(query)

APIRemovedInV1: 

You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface. 

Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`

A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742
