Description
Overview
I wanted to test out this library for a project, but I have hit so many roadblocks that I do not think this library is even functional. Here is the code I have been testing with, built following the documentation in this repo (I have replaced sensitive information throughout with ...):
import dspy
from dspy.evaluate import Evaluate
from dspy.evaluate.metrics import answer_exact_match
from dspy.retrieve.pinecone_rm import PineconeRM
from dspy.teleprompt import BootstrapFewShot, BootstrapFewShotWithRandomSearch
class RAG(dspy.Module):
    def __init__(self, num_passages: int = 3):
        super().__init__()
        # declare three modules: the retriever, a query generator, and an answer generator
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_query = dspy.ChainOfThought("question -> search_query")
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question: str):
        # generate a search query from the question, and use it to retrieve passages
        search_query = self.generate_query(question=question).search_query
        passages = self.retrieve(query_or_queries=search_query).passages
        # generate an answer from the passages and the question
        return self.generate_answer(context=passages, question=question)
turbo = dspy.OpenAI(
    model="gpt-4-1106-preview",
    api_key="...")
retriever_model = PineconeRM(
    pinecone_index_name="...",
    pinecone_api_key="...",
    pinecone_env="...",
    openai_embed_model="text-embedding-3-small",
    openai_api_key="...",
    k=3)
dspy.settings.configure(
    lm=turbo,
    rm=retriever_model,
)
train = [('...', '...'), ('...', '...'), ('...', '...'), ('...', '...'), ('...', '...'), ('...', '...'), ('...', '...')]
train = [dspy.Example(question=question, answer=answer).with_inputs('question') for question, answer in train]
dev = [('...', '...'), ('...', '...'), ('...', '...'), ('...', '...'), ('...', '...'), ('...', '...'), ('...', '...'), ('...', '...'), ('...', '...'), ('...', '...')]
dev = [dspy.Example(question=question, answer=answer).with_inputs('question') for question, answer in dev]
teleprompter = BootstrapFewShot(metric=answer_exact_match, max_bootstrapped_demos=2)
teleprompter2 = BootstrapFewShotWithRandomSearch(metric=answer_exact_match, max_bootstrapped_demos=2, num_candidate_programs=8, num_threads=32)
rag_compiled = teleprompter.compile(RAG(), trainset=train)
rag_compiled2 = teleprompter2.compile(RAG(), trainset=train, valset=dev)
evaluate_hotpot = Evaluate(devset=dev, metric=answer_exact_match, num_threads=32, display_progress=True, display_table=15)
evaluate_hotpot(rag_compiled)
evaluate_hotpot(rag_compiled2)
rag_compiled("...")
rag_compiled2("...")

Shortlist of errors encountered
Improper check
Filename: dspy/retrieve/pinecone_rm.py
Function name: _get_embeddings()
Description: The function checks whether torch is installed before it checks whether the user has set self.use_local_model. This dependency is not needed when the user is using OpenAI embeddings, but with the current logic DSPy forces users to have it installed no matter what. Reordering the function as below solves the problem:
if not self.use_local_model:
    if OPENAI_LEGACY:
        embedding = openai.Embedding.create(
            input=queries, model=self._openai_embed_model
        )
    else:
        embedding = openai.embeddings.create(
            input=queries, model=self._openai_embed_model
        ).model_dump()
    return [embedding["embedding"] for embedding in embedding["data"]]
try:
    import torch
except ImportError as exc:
    raise ModuleNotFoundError(
        "You need to install torch to use a local embedding model with PineconeRM."
    ) from exc

Parameters being passed incorrectly
Filename: dspy/retrieve/pinecone_rm.py
Function name: forward()
Description: Somewhere up the chain of calls (I believe in dsp/primitives/search.py -> retrieveEnsemble()), k is being passed to the forward() function of PineconeRM, which does not accept it. I remedied this by adding an unused k to the function definition:
def forward(self, query_or_queries: Union[str, List[str]], k: int) -> dspy.Prediction:
But this was just a short-term fix, as I was only trying to get this script functional for testing.
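A slightly more robust version of this workaround might make k optional and actually honor it instead of ignoring it. This is only a sketch of the idea: SketchRetriever and its _search method are hypothetical stand-ins I made up for illustration, not the real PineconeRM, and the word-overlap ranking is a placeholder for the actual index query.

```python
from typing import List, Optional, Union


class SketchRetriever:
    """Hypothetical stand-in for a retriever; not the real PineconeRM."""

    def __init__(self, corpus: List[str], k: int = 3):
        self.corpus = corpus
        self.k = k  # default number of passages per query

    def _search(self, query: str, k: int) -> List[str]:
        # Placeholder ranking: keep passages that share a word with the query.
        words = set(query.lower().split())
        hits = [p for p in self.corpus if words & set(p.lower().split())]
        return hits[:k]

    def forward(
        self,
        query_or_queries: Union[str, List[str]],
        k: Optional[int] = None,
    ) -> List[str]:
        # Fall back to the instance default when the caller does not pass k.
        k = self.k if k is None else k
        if isinstance(query_or_queries, str):
            queries = [query_or_queries]
        else:
            queries = query_or_queries
        passages: List[str] = []
        for query in queries:
            passages.extend(self._search(query, k))
        return passages
```

With an optional k, callers that pass k and callers that do not would both work, which is the behavior I expected from forward() in the first place.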
Passages has no value long_text
Filename: dsp/primitives/search.py
Function name: retrieve()
Description: This line was failing when accessing the attribute .long_text:
passages = [psg.long_text for psg in passages]
If the passages in this list are dictionaries, shouldn't we be using .get("long_text")? Either way, at this point each passage in the list was a plain string, and I am no longer sure whether my debugging is making things better or worse.
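To keep debugging without guessing at the passage type, a small normalizing helper along these lines might work. It assumes a passage can be a plain string, a dict with a "long_text" key, or an object with a .long_text attribute. The names passage_text and passage_texts are my own and are not part of DSPy.

```python
from typing import Any, List


def passage_text(psg: Any) -> str:
    """Return the text of a passage that may be a str, a dict, or an object."""
    if isinstance(psg, str):
        return psg
    if isinstance(psg, dict):
        # Assumption: dict passages store their text under "long_text".
        return psg["long_text"]
    # Otherwise assume attribute access, as the original line does.
    return psg.long_text


def passage_texts(passages: List[Any]) -> List[str]:
    # Normalize every passage in the list to its text.
    return [passage_text(psg) for psg in passages]
```

This does not explain why the retriever returns strings here, but it would at least let the rest of the pipeline run while the real cause is tracked down.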
I am looking for some guidance on whether I am wildly off base or whether PineconeRM is simply not operational. Thanks.
Additional Info
My environment is using Python 3.10.13 and I have dspy-ai==2.1.6 installed.