<a href="https://colab.research.google.com/github/prakul/mongoDB_atlas_vector_search_sample/blob/main/MongoDB_Semantic_kernel_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!python -m pip install semantic-kernel==0.3.14.dev openai

In [2]:
import getpass

OPENAI_API_KEY = getpass.getpass("OpenAI API Key:")
MONGODB_ATLAS_CLUSTER_URI = getpass.getpass("MongoDB Atlas Cluster URI:")
MONGODB_ATLAS_VECTOR_SEARCH_INDEX = "default"
MONGODB_DATABASE = "semantic-kernel-test"
MONGODB_COLLECTION = "test"

OpenAI API Key:··········
MongoDB Atlas Cluster URI:··········


In [3]:
import semantic_kernel as sk

kernel = sk.Kernel()

### Register the OpenAI models with the kernel


In [4]:
import openai
from semantic_kernel.connectors.ai.open_ai import (OpenAIChatCompletion, OpenAITextEmbedding)

openai.api_key = OPENAI_API_KEY

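# Chat model that generates the answers
kernel.add_chat_service("chat-gpt", OpenAIChatCompletion("gpt-3.5-turbo", openai.api_key))
# Embedding model that turns text into vectors for memory storage and search
kernel.add_text_embedding_generation_service("ada", OpenAITextEmbedding("text-embedding-ada-002", openai.api_key))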


<semantic_kernel.kernel.Kernel at 0x7cd24ebf0490>

### Register MongoDBAtlasMemoryStore with the kernel



In [5]:
from semantic_kernel.connectors.memory.mongodb_atlas import MongoDBAtlasMemoryStore

mongodb_atlas_store = MongoDBAtlasMemoryStore(
    index_name=MONGODB_ATLAS_VECTOR_SEARCH_INDEX,
    connection_string=MONGODB_ATLAS_CLUSTER_URI,
    database_name=MONGODB_DATABASE,
)
kernel.register_memory_store(memory_store=mongodb_atlas_store)
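
The memory store relies on an Atlas Search index (named `default` here) already existing on the collection. The following is only a sketch of what such an index definition might look like. It assumes the connector stores vectors in a field named `embedding` and that `dotProduct` similarity is acceptable; the 1536 dimensions match the output size of `text-embedding-ada-002`. Check the connector and Atlas documentation before relying on it.

```python
import json

# Assumption: the connector stores each vector in a field named "embedding".
EMBEDDING_DIMENSIONS = 1536  # output size of text-embedding-ada-002

# Hypothetical index definition, shown as a Python dict for readability.
vector_index_definition = {
    "mappings": {
        "dynamic": True,
        "fields": {
            "embedding": {
                "type": "knnVector",
                "dimensions": EMBEDDING_DIMENSIONS,
                "similarity": "dotProduct",  # assumption: "cosine" would also be a valid choice
            }
        },
    }
}

# Print as JSON so it can be pasted into the Atlas Search index editor.
print(json.dumps(vector_index_definition, indent=2))
```

The dimension count has to agree with the embedding model registered on the kernel; switching to a different embedding model would mean changing this value as well.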


### Register TextMemorySkill

In [6]:
kernel.import_skill(sk.core_skills.TextMemorySkill())

{'recall': <semantic_kernel.orchestration.sk_function.SKFunction at 0x7cd230259420>,
 'save': <semantic_kernel.orchestration.sk_function.SKFunction at 0x7cd230288400>}

## The need for RAG

LLMs such as OpenAI's GPT-3.5 exhibit an impressively wide range of skills. Because they are trained on internet data, they know about a wide range of topics and can often answer general-knowledge questions accurately.

In [11]:
# Wrap your prompt in a function
prompt = kernel.create_semantic_function("""
As a friendly AI Copilot answer the question: Did Albert Einstein have pets?
""")

print(prompt())

Yes, Albert Einstein did have pets. He had a few cats throughout his life, and his most famous cat was named Tiger. Einstein was known to have a great fondness for animals, particularly cats, and he often spoke about his love for them.


But LLMs also have limitations: they have a knowledge cutoff (September 2021 for GPT-3.5), and they do not know about proprietary or personal data. They also tend to hallucinate; that is, they may confidently make up facts and give answers that seem accurate but are actually incorrect. The following example, which asks about personal data the model has never seen, demonstrates this:

In [12]:
prompt = kernel.create_semantic_function("""
As a friendly AI Copilot answer the question: Did we have pets?
""")

print(prompt())

Yes, humans have had pets for thousands of years. Pets are animals that are kept for companionship, entertainment, or emotional support. They can be found in various forms, such as dogs, cats, birds, fish, and even more exotic animals like reptiles or rodents. Pets have become an integral part of many households, providing love, companionship, and joy to their owners.


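The model has no idea who "we" are, so it falls back on a generic answer about humans and pets. Next, we augment the LLM's knowledge with our own data by saving a few facts to the MongoDB Atlas memory store.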

In [16]:
async def populate_memory(kernel: sk.Kernel) -> None:
    # Add some documents to the semantic memory
    await kernel.memory.save_information_async(
        collection=MONGODB_COLLECTION, id="1", text="We enjoy coffee and Starbucks"
    )
    await kernel.memory.save_information_async(
        collection=MONGODB_COLLECTION, id="2", text="We are Associate Developer Advocates at MongoDB"
    )
    await kernel.memory.save_information_async(
        collection=MONGODB_COLLECTION, id="3", text="We have great coworkers and we love our teams!"
    )
    await kernel.memory.save_information_async(
        collection=MONGODB_COLLECTION, id="4", text="Our names are Anaiya and Tim"
    )
    await kernel.memory.save_information_async(
        collection=MONGODB_COLLECTION, id="5", text="We have been to New York City and Dublin"
    )

In [17]:
print("Populating memory...aka adding in documents")
await populate_memory(kernel)
print(kernel)

Populating memory...aka adding in documents
<semantic_kernel.kernel.Kernel object at 0x7cd24ebf0490>
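
Conceptually, each `save_information_async` call in `populate_memory` asks the registered embedding service for a vector and stores it alongside the text; a later search embeds the query and ranks the stored vectors by similarity. The sketch below illustrates that ranking step only. It is not how Atlas implements vector search: it uses brute-force cosine similarity, and the 3-dimensional vectors and the names `cosine_similarity`, `top_k`, `toy_store`, and `toy_query` are all hypothetical stand-ins for real 1536-dimensional embeddings.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: List[float], store: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy "embeddings" keyed by document id (not real model output).
toy_store = {
    "1": [0.9, 0.1, 0.0],
    "4": [0.1, 0.9, 0.1],
    "5": [0.0, 0.2, 0.9],
}
toy_query = [0.2, 0.8, 0.1]  # points in nearly the same direction as document "4"

results = top_k(toy_query, toy_store, k=2)
print(results)  # document "4" ranks first
```

The retrieved texts are what get added to the prompt, so the LLM can answer from them instead of guessing.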
