# Building Semantic Memory with Embeddings

So far, we've mostly been treating the kernel as a stateless orchestration engine.
We send text into a model API and receive text out.

In a [previous notebook](04-kernel-arguments-chat.ipynb), we used `kernel arguments` to pass additional
text into prompts and enrich them with more data. This allowed us to create a basic chat experience.

However, if you relied solely on kernel arguments, your prompt would eventually grow so large
that you would run into the model's token limit. What we need is a way to persist state
and build both short-term and long-term memory to empower even more intelligent applications.

To do this, we dive into the key concept of `Semantic Memory` in the Semantic Kernel.


Install the Semantic Kernel SDK from pypi.org, along with the other dependencies for this example.

In [None]:
# Note: if using a virtual environment, do not run this cell
%pip install -U semantic-kernel[azure]
from semantic_kernel import __version__

__version__

Initial configuration for the notebook to run properly.

In [1]:
# Make sure paths are correct for the imports

import os
import sys

notebook_dir = os.path.abspath("")
parent_dir = os.path.dirname(notebook_dir)
grandparent_dir = os.path.dirname(parent_dir)


sys.path.append(grandparent_dir)

### Configuring the Kernel

Let's get started with the necessary configuration to run Semantic Kernel. For Notebooks, we require a `.env` file with the proper settings for the model you use. Create a new file named `.env` and place it in this directory. Copy the contents of the `.env.example` file from this directory and paste it into the `.env` file that you just created.

**NOTE: Please make sure to include `GLOBAL_LLM_SERVICE` set to either OpenAI, AzureOpenAI, or HuggingFace in your .env file. If this setting is not included, the Service will default to AzureOpenAI.**

#### Option 1: using OpenAI

Add your [OpenAI Key](https://openai.com/product/) to your `.env` file (org Id only if you have multiple orgs):

```
GLOBAL_LLM_SERVICE="OpenAI"
OPENAI_API_KEY="sk-..."
OPENAI_ORG_ID=""
OPENAI_CHAT_MODEL_ID=""
OPENAI_TEXT_MODEL_ID=""
OPENAI_EMBEDDING_MODEL_ID=""
```
The names should match the names used in the `.env` file, as shown above.

#### Option 2: using Azure OpenAI

Add your [Azure OpenAI Service key](https://learn.microsoft.com/azure/cognitive-services/openai/quickstart?pivots=programming-language-studio) settings to the `.env` file in the same folder:

```
GLOBAL_LLM_SERVICE="AzureOpenAI"
AZURE_OPENAI_API_KEY="..."
AZURE_OPENAI_ENDPOINT="https://..."
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME="..."
AZURE_OPENAI_TEXT_DEPLOYMENT_NAME="..."
AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME="..."
AZURE_OPENAI_API_VERSION="..."
```
The names should match the names used in the `.env` file, as shown above.

For more advanced configuration, please follow the steps outlined in the [setup guide](./CONFIGURING_THE_KERNEL.md).

We will load our settings and get the LLM service to use for the notebook.

In [2]:
from services import Service

from samples.service_settings import ServiceSettings

service_settings = ServiceSettings()

# Select a service to use for this notebook (available services: OpenAI, AzureOpenAI, HuggingFace)
selectedService = (
    Service.AzureOpenAI
    if service_settings.global_llm_service is None
    else Service(service_settings.global_llm_service.lower())
)
print(f"Using service type: {selectedService}")

Using service type: Service.OpenAI


In order to use memory, we need to set up the Kernel with an embedding service and create a memory store. In this example, we make use of the `InMemoryStore`, which can be thought of as temporary in-memory storage. This memory is not written to disk and is only available during the app session.

When developing your app, you will have the option to plug in persistent storage such as Azure AI Search, Azure Cosmos DB, PostgreSQL, SQLite, etc. Semantic Memory also lets you index external data sources without duplicating all the information, as you will see further down in this notebook.


In [3]:
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import (
    AzureChatCompletion,
    AzureTextEmbedding,
    OpenAIChatCompletion,
    OpenAITextEmbedding,
)

kernel = Kernel()

chat_service_id = "chat"

# Configure AI service used by the kernel
if selectedService == Service.AzureOpenAI:
    azure_chat_service = AzureChatCompletion(
        service_id=chat_service_id,
    )
    embedding_gen = AzureTextEmbedding(
        service_id="embedding",
    )
    kernel.add_service(azure_chat_service)
    kernel.add_service(embedding_gen)
elif selectedService == Service.OpenAI:
    oai_chat_service = OpenAIChatCompletion(
        service_id=chat_service_id,
    )
    embedding_gen = OpenAITextEmbedding(
        service_id="embedding",
    )
    kernel.add_service(oai_chat_service)
    kernel.add_service(embedding_gen)

In [None]:
from dataclasses import dataclass, field
from typing import Annotated
from uuid import uuid4

from semantic_kernel.data.vector import VectorStoreField, vectorstoremodel


@vectorstoremodel(collection_name="simple-model")
@dataclass
class SimpleModel:
    """Simple model to store some text with an ID."""

    text: Annotated[str, VectorStoreField("data", is_full_text_indexed=True)]
    id: Annotated[str, VectorStoreField("key")] = field(default_factory=lambda: str(uuid4()))
    embedding: Annotated[
        list[float] | str | None, VectorStoreField("vector", dimensions=1536, embedding_generator=embedding_gen)
    ] = None

    def __post_init__(self):
        if self.embedding is None:
            self.embedding = self.text

At its core, Semantic Memory is a set of data structures that allow you to store the meaning of text that comes from different data sources, and optionally to store the source text too. These texts can come from the web, e-mail providers, chats, a database, or your local directory, and are hooked up to the Semantic Kernel through data source connectors.

The texts are embedded, or compressed, into a vector of floats that mathematically represents their contents and meaning. You can read more about embeddings [here](https://aka.ms/sk/embeddings).
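
To make the idea of "text as a vector of floats" concrete, here is a deliberately simplified sketch. It is not how a real embedding model works: `toy_embed` and the fixed `vocabulary` are hypothetical, and the function only counts vocabulary words and normalizes the result. A real model produces dense vectors (1536 dimensions in this notebook) that capture meaning rather than word counts.

```python
import math
import re
from collections import Counter


def toy_embed(text: str, vocabulary: list[str]) -> list[float]:
    """Hypothetical stand-in for an embedding model: unit-length word counts."""
    # Lowercase the text and split it into alphanumeric tokens.
    words = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    # One dimension per vocabulary term, holding how often it occurs.
    vector = [float(words[term]) for term in vocabulary]
    # Normalize to unit length so that only the direction matters.
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


# Assumed toy vocabulary, chosen only to fit the example sentences.
vocabulary = ["budget", "savings", "investments", "2023", "2024"]

budget_vec = toy_embed("Your budget for 2024 is $100,000", vocabulary)
savings_vec = toy_embed("Your savings from 2023 are $50,000", vocabulary)

print(budget_vec)
print(savings_vec)
```

The two sentences end up pointing in different directions because they share no vocabulary terms, which is the property a similarity search relies on.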


### Manually adding memories

Let's create some initial memories "About Me". We can add memories to our `InMemoryStore` by creating records and calling `upsert` on a collection obtained from the store.


In [5]:
records = [
    SimpleModel(text="Your budget for 2024 is $100,000"),
    SimpleModel(text="Your savings from 2023 are $50,000"),
    SimpleModel(text="Your investments are $80,000"),
]

In [6]:
from semantic_kernel.connectors.in_memory import InMemoryStore

in_memory_store = InMemoryStore()

collection = in_memory_store.get_collection(record_type=SimpleModel)
await collection.ensure_collection_exists()
# Add records to the collection
await collection.upsert(records)

['e6f2c669-0996-4cc5-8312-59e7d996a32b',
 '9042c465-115d-422d-b5f7-5ce35bd36455',
 'ecc3ec62-b1b5-4a41-a761-63280d3a4329']

Let's try searching the memory:


In [None]:
from semantic_kernel.data.vector import VectorSearchProtocol


async def search_memory_examples(collection: VectorSearchProtocol, questions: list[str]) -> None:
    for question in questions:
        print(f"Question: {question}")
        results = await collection.search(question, top=1)
        async for result in results.results:
            print(f"Answer: {result.record.text}")
            print(f"Score: {result.score}\n")

The default distance metric for the InMemoryCollection is `cosine`, which means that the closer the vectors are, the more similar they are, so for this collection a lower score indicates a closer match. The `search` method returns the top `3` results by default, but you can change this by passing in the `top` parameter, which is set to `1` here to get only the most relevant result.
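
As a rough illustration of ranking by cosine distance with a `top` limit, here is a minimal sketch. It assumes cosine distance is defined as one minus cosine similarity; `toy_search` and the three-dimensional vectors are made up for the example and do not reflect the collection's internals.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Assumed definition: 1 - cosine similarity, so 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def toy_search(
    query: list[float], records: dict[str, list[float]], top: int = 3
) -> list[tuple[str, float]]:
    """Hypothetical search: score every record, return the `top` closest."""
    scored = [(name, cosine_distance(query, vec)) for name, vec in records.items()]
    # Smaller distance means a closer match, so sort in ascending order.
    scored.sort(key=lambda pair: pair[1])
    return scored[:top]


# Made-up vectors standing in for stored embeddings.
records = {
    "budget": [1.0, 0.0, 0.2],
    "savings": [0.1, 1.0, 0.0],
    "investments": [0.0, 0.2, 1.0],
    "taxes": [0.5, 0.5, 0.5],
}
query = [0.9, 0.1, 0.1]

print(toy_search(query, records))         # three closest records
print(toy_search(query, records, top=1))  # only the closest record
```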

In [8]:
await search_memory_examples(
    collection,
    questions=[
        "What is my budget for 2024?",
        "What are my savings from 2023?",
        "What are my investments?",
    ],
)

Question: What is my budget for 2024?
Answer: Your budget for 2024 is $100,000
Score: 0.20661429378068685

Question: What are my savings from 2023?
Answer: Your savings from 2023 are $50,000
Score: 0.2028375831967465

Question: What are my investments?
Answer: Your investments are $80,000
Score: 0.33724588631042385



Next, we will add a search function to our kernel that will allow us to search the memory store for relevant information. This function will use the `search` method of the collection to find the most relevant memories based on a query.

In [9]:
func = kernel.add_function(
    plugin_name="memory",
    function=collection.create_search_function(
        function_name="recall",
        description="Searches the memory for relevant information based on the input query.",
    ),
)

Then we will create a prompt that will use the search function to find relevant information in the memory store and return it as part of the prompt.
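
Before looking at the real prompt, it may help to see the general idea of calling a function from inside a template. The sketch below is a toy renderer, not Semantic Kernel's template engine: `render`, the regular expressions, and the keyword-based `recall` stand-in are all assumptions made for illustration.

```python
import re
from typing import Callable


def render(
    template: str,
    functions: dict[str, Callable[[str], str]],
    variables: dict[str, str],
) -> str:
    """Toy renderer: resolve {{name 'arg'}} calls, then {{$variable}} lookups."""

    def call_function(match: re.Match) -> str:
        return functions[match.group(1)](match.group(2))

    def read_variable(match: re.Match) -> str:
        return variables[match.group(1)]

    rendered = re.sub(r"\{\{\s*(\w+)\s+'([^']*)'\s*\}\}", call_function, template)
    return re.sub(r"\{\{\s*\$(\w+)\s*\}\}", read_variable, rendered)


# Stand-in memory: a real setup would run a vector search instead.
memories = {"budget": "Your budget for 2024 is $100,000"}


def recall(query: str) -> str:
    # Return the first memory whose keyword appears in the query text.
    for keyword, text in memories.items():
        if keyword in query:
            return text
    return ""


template = "- {{recall 'budget by year'}} What is my budget for 2024?\n{{$request}}"
prompt = render(template, {"recall": recall}, {"request": "talk to me about my finances"})
print(prompt)
```

The model only ever sees the rendered text, which is why the recalled memories can shape its answer.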

In [10]:
from semantic_kernel.functions import KernelFunction
from semantic_kernel.prompt_template import PromptTemplateConfig


async def setup_chat_with_memory(
    kernel: Kernel,
    service_id: str,
) -> KernelFunction:
    prompt = """
    ChatBot can have a conversation with you about any topic.
    It can give explicit instructions or say 'I don't know' if
    it does not have an answer.

    Information about me, from previous conversations:
    - {{recall 'budget by year'}} What is my budget for 2024?
    - {{recall 'savings from previous year'}} What are my savings from 2023?
    - {{recall 'investments'}} What are my investments?

    {{$request}}
    """.strip()

    prompt_template_config = PromptTemplateConfig(
        template=prompt,
        execution_settings={
            service_id: kernel.get_service(service_id).get_prompt_execution_settings_class()(service_id=service_id)
        },
    )

    return kernel.add_function(
        function_name="chat_with_memory",
        plugin_name="chat",
        prompt_template_config=prompt_template_config,
    )

Now that we've included our memories, let's chat!


In [11]:
print("Setting up a chat (with memory!)")
chat_func = await setup_chat_with_memory(kernel, chat_service_id)

print("Begin chatting (type 'exit' to exit):\n")
print(
    "Welcome to the chat bot!\
    \n  Type 'exit' to exit.\
    \n  Try asking a question about your finances (i.e. \"talk to me about my finances\")."
)


async def chat(user_input: str):
    print(f"User: {user_input}")
    answer = await kernel.invoke(chat_func, request=user_input)
    print(f"ChatBot:> {answer}")

Setting up a chat (with memory!)
Begin chatting (type 'exit' to exit):

Welcome to the chat bot!    
  Type 'exit' to exit.    
  Try asking a question about your finances (i.e. "talk to me about my finances").


In [12]:
await chat("What is my budget for 2024?")

User: What is my budget for 2024?
ChatBot:> Your budget for 2024 is $100,000.


In [13]:
await chat("talk to me about my finances")

User: talk to me about my finances
ChatBot:> Based on our previous conversations, your investments amount to $80,000. Your financial overview includes a budget for 2024 of $100,000 and savings from 2023 totaling $50,000. If you'd like, I can help you analyze your financial goals, suggest investment strategies, or assist with budgeting. Let me know how you'd like to proceed!


### Adding documents to your memory

Many times in your applications, you'll want to bring external documents into your memory. Let's see how we can do this using our `InMemoryStore`.

Let's first get some data using some of the links in the Semantic Kernel repo.


In [14]:
@vectorstoremodel(collection_name="github-files")
@dataclass
class GitHubFile:
    """
    Model to store GitHub file URLs and their descriptions.
    """

    url: Annotated[str, VectorStoreField("data", is_full_text_indexed=True)]
    text: Annotated[str, VectorStoreField("data", is_full_text_indexed=True)]
    key: Annotated[str, VectorStoreField("key")] = field(default_factory=lambda: str(uuid4()))
    embedding: Annotated[
        list[float] | str | None, VectorStoreField("vector", dimensions=1536, embedding_generator=embedding_gen)
    ] = None

    def __post_init__(self):
        if self.embedding is None:
            self.embedding = f"{self.url} {self.text}"

In [15]:
github_files = []
github_files.append(
    GitHubFile(
        url="https://github.com/microsoft/semantic-kernel/blob/main/README.md",
        text="README: Installation, getting started, and how to contribute",
    )
)
github_files.append(
    GitHubFile(
        url="https://github.com/microsoft/semantic-kernel/blob/main/dotnet/notebooks/02-running-prompts-from-file.ipynb",
        text="Jupyter notebook describing how to pass prompts from a file to a semantic plugin or function",
    )
)
github_files.append(
    GitHubFile(
        url="https://github.com/microsoft/semantic-kernel/blob/main/dotnet/notebooks/00-getting-started.ipynb",
        text="Jupyter notebook describing how to get started with Semantic Kernel",
    )
)

github_memory = in_memory_store.get_collection(record_type=GitHubFile)
await github_memory.ensure_collection_exists()

# Add records to the collection
await github_memory.upsert(github_files)

['81585815-6be9-4b45-b608-af1f9918567f',
 '83d5e6ba-a6f9-4d05-aa27-0b5470c1aa83',
 'e015e196-ddb3-4157-9291-dcef201d5308']

In [16]:
ask = "I love Jupyter notebooks, how should I get started?"
print("===========================\n" + "Query: " + ask + "\n")

memories = await github_memory.search(ask, top=5)

async for result in memories.results:
    memory = result.record
    print(f"Result {memory.key}:")
    print(f"  URL        : {memory.url}")
    print(f"  Text       : {memory.text}")
    print(f"  Relevance  : {result.score}")
    print()

Query: I love Jupyter notebooks, how should I get started?

Result e015e196-ddb3-4157-9291-dcef201d5308:
  URL        : https://github.com/microsoft/semantic-kernel/blob/main/dotnet/notebooks/00-getting-started.ipynb
  Text       : Jupyter notebook describing how to get started with Semantic Kernel
  Relevance  : 0.38310321217805676

Result 83d5e6ba-a6f9-4d05-aa27-0b5470c1aa83:
  URL        : https://github.com/microsoft/semantic-kernel/blob/main/dotnet/notebooks/02-running-prompts-from-file.ipynb
  Text       : Jupyter notebook describing how to pass prompts from a file to a semantic plugin or function
  Relevance  : 0.5391694801842719

Result 81585815-6be9-4b45-b608-af1f9918567f:
  URL        : https://github.com/microsoft/semantic-kernel/blob/main/README.md
  Text       : README: Installation, getting started, and how to contribute
  Relevance  : 0.6495028830870797



Now you might be wondering what happens if you have so much data that it doesn't fit into your RAM. That's where you want to make use of an external Vector Database made specifically for storing and retrieving embeddings. Fortunately, Semantic Kernel makes this easy thanks to an extensive list of available connectors. In the following section, we will connect to an existing Azure AI Search service that we will use as an external Vector Database to store and retrieve embeddings.


_Please note you will need an Azure AI Search api_key or token credential and endpoint for the following example to work properly._

In [None]:
from semantic_kernel.connectors.azure_ai_search import AzureAISearchCollection

azs_memory = AzureAISearchCollection(record_type=GitHubFile)
# We explicitly delete the collection if it exists to ensure a clean state
await azs_memory.ensure_collection_deleted()
await azs_memory.ensure_collection_exists()
# Add records to the collection
await azs_memory.upsert(github_files)

['81585815-6be9-4b45-b608-af1f9918567f',
 '83d5e6ba-a6f9-4d05-aa27-0b5470c1aa83',
 'e015e196-ddb3-4157-9291-dcef201d5308']

The implementation of Semantic Kernel makes it easy to swap one memory store for another. Here, we will re-use the functions we initially created for `InMemoryStore` with our new external Vector Store, which leverages Azure AI Search.


Let's now try to query from Azure AI Search! Note that the `score` might be different because the AzureAISearchCollection uses a different default distance metric than the InMemoryCollection. If you specify the `distance_function` in the model, you can get the same results as with the InMemoryCollection.
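
The following sketch illustrates both points in miniature: code written against a shared search interface works with either store, and the reported scores depend on the metric each store uses. The classes `DistanceStore` and `SimilarityStore`, the `ToySearchable` protocol, and the choice of metrics are hypothetical and chosen only for illustration; they say nothing about what Azure AI Search uses by default.

```python
import math
from typing import Protocol

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class ToySearchable(Protocol):
    """Hypothetical interface shared by both stores."""

    def search(self, query: Vector, top: int = 3) -> list[tuple[str, float]]: ...


class DistanceStore:
    """Scores with cosine distance: lower is better."""

    def __init__(self, records: dict[str, Vector]) -> None:
        self.records = records

    def search(self, query: Vector, top: int = 3) -> list[tuple[str, float]]:
        scored = [(k, 1.0 - cosine_similarity(query, v)) for k, v in self.records.items()]
        return sorted(scored, key=lambda pair: pair[1])[:top]


class SimilarityStore:
    """Scores with cosine similarity: higher is better."""

    def __init__(self, records: dict[str, Vector]) -> None:
        self.records = records

    def search(self, query: Vector, top: int = 3) -> list[tuple[str, float]]:
        scored = [(k, cosine_similarity(query, v)) for k, v in self.records.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


def best_match(store: ToySearchable, query: Vector) -> tuple[str, float]:
    # Works with any store that implements `search`.
    return store.search(query, top=1)[0]


records = {"readme": [1.0, 0.2], "notebook": [0.1, 1.0]}
query = [0.2, 0.9]

print(best_match(DistanceStore(records), query))
print(best_match(SimilarityStore(records), query))
```

Both stores pick the same record, but the score attached to it differs, which is why scores from different stores should not be compared directly.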


In [18]:
await search_memory_examples(
    azs_memory,
    questions=[
        "What is Semantic Kernel?",
        "How do I get started on it with notebooks?",
        "Where can I find more info on prompts?",
    ],
)

Question: What is Semantic Kernel?
Answer: README: Installation, getting started, and how to contribute
Score: 0.72694075

Question: How do I get started on it with notebooks?
Answer: Jupyter notebook describing how to get started with Semantic Kernel
Score: 0.68709856

Question: Where can I find more info on prompts?
Answer: Jupyter notebook describing how to pass prompts from a file to a semantic plugin or function
Score: 0.6572231



Make sure to clean up!

In [19]:
await azs_memory.ensure_collection_deleted()

We have laid the foundation that will allow us to store an arbitrary amount of data in an external Vector Store, above and beyond what could fit in memory, at the expense of a little more latency.
