(250) - Add MongoDB Atlas Retrieval Model #257

jhyearsley · 2023-12-26T06:16:15Z

Added MongoDB Atlas Retrieval Model. Currently only supports OpenAI embedding provider.

okhat · 2023-12-28T01:13:25Z

Hey @jhyearsley , I want to merge this. It's probably affected by the recent OpenAI update though, any changes needed here?

@DanielUH2019 some changes probably also still needed for 1-2 of the retrievers, from my comments in your (merged) PR

jhyearsley · 2023-12-28T05:47:16Z

@okhat thanks for the ping, should be good to go now. I reran the script I've been testing against and it's working as expected

jhyearsley · 2023-12-28T05:51:51Z

Attaching the script in case others find it helpful, want to test themselves, or have any feedback to share. As written the script implements RAG in less than 30 lines of code!!!

The script requires a MongoDB Atlas cluster which stores data in the namespace kb.embedded_content. The embedded_content collection stores chunked data which was embedded with OpenAI text-embedding-ada-002 embedding model. The question in the script is also embedded with text-embedding-ada-002 and Atlas Vector Search is used to retrieve the approximate KNNs which are fed back into GPT as domain specific context to answer the question.

Really cool to see the power and simplicity of DSPy in action 🎉 thanks @okhat for all the recent help!

import dspy
from dspy.retrieve.mongodb_atlas_rm import MongoDBAtlasRM

lm = dspy.OpenAI(model="gpt-3.5-turbo", model_type="chat", max_tokens=1500)
rm = MongoDBAtlasRM(
    db_name="kb",
    collection_name="embedded_content",
    index_name="embedded_content_vector_index",
    k=5,
)

dspy.settings.configure(lm=lm, rm=rm)


## Basic Q&A
class BasicQA(dspy.Signature):
    """Answer MongoDB questions with paragraph answers."""

    question = dspy.InputField()
    answer = dspy.OutputField(
        desc="Focus on MongoDB. Answers should be 1 to 3 paragraphs"
    )


question = "Why is MongoDB Atlas the best way to run MongoDB in the cloud?"
generate_basic_answer = dspy.Predict(BasicQA)
pred = generate_basic_answer(question=question)
print(f"Question: {question}")
print(f"Predicted Answer: {pred.answer}")
lm.inspect_history(n=1)

## Chain of Thought
generate_answer_with_chain_of_thought = dspy.ChainOfThought(BasicQA)
pred = generate_answer_with_chain_of_thought(question=question)
print(f"Question: {question}")
print(f"Thought: {pred.rationale.split('.', 1)[1].strip()}")
print(f"Predicted Answer: {pred.answer}")

## Inspecting semantic retrieval results
topK_passages = rm(question).passages
print(f"Top {rm.k} passages for question: {question} \n", "-" * 30, "\n")
for idx, passage in enumerate(topK_passages):
    print(f"{idx+1}]", passage["text"], "\n")


## RAG
class GenerateAnswer(dspy.Signature):
    """Answer MongoDB questions with thorough and in-depth answers."""

    context = dspy.InputField(
        desc="may contain relevant data for answering the question"
    )
    question = dspy.InputField()
    answer = dspy.OutputField(desc="Answers should be 1 to 3 paragraphs")


class RAG(dspy.Module):
    def __init__(self, num_passages=5):
        super().__init__()

        self.retrieve = rm
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)


rag = RAG()
predRAG = rag(question=question)
print(f"Question: {question}")
print(f"Answer: {predRAG.answer}")
lm.inspect_history(n=1)

jhyearsley added 2 commits December 25, 2023 23:10

(250) - Add MongoDB Atlas Retrieval Model

0e96712

(250) - Add MongoDB Atlas Retrieval Model

8b2f767

jhyearsley added 3 commits December 27, 2023 21:22

Merge remote-tracking branch 'upstream/main' into mongodb-atlas-rm

8e71272

Update openai imports and client methods to match upgraded python client

1864168

Merge branch 'mongodb-atlas-rm'

797cf94

okhat merged commit f94861f into stanfordnlp:main Jan 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

(250) - Add MongoDB Atlas Retrieval Model #257

(250) - Add MongoDB Atlas Retrieval Model #257

Uh oh!

jhyearsley commented Dec 26, 2023

Uh oh!

okhat commented Dec 28, 2023

Uh oh!

jhyearsley commented Dec 28, 2023 •

edited

Loading

Uh oh!

jhyearsley commented Dec 28, 2023 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

(250) - Add MongoDB Atlas Retrieval Model #257

(250) - Add MongoDB Atlas Retrieval Model #257

Uh oh!

Conversation

jhyearsley commented Dec 26, 2023

Uh oh!

okhat commented Dec 28, 2023

Uh oh!

jhyearsley commented Dec 28, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jhyearsley commented Dec 28, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jhyearsley commented Dec 28, 2023 •

edited

Loading

jhyearsley commented Dec 28, 2023 •

edited

Loading