# Step 11: Semantic Memory and RAG (Retrieval-Augmented Generation)

This notebook demonstrates how to store facts in memory, search them semantically, and use them to improve LLM responses using RAG.

In [None]:
import asyncio
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import (
    AzureTextEmbedding,
    AzureChatCompletion,
)
from semantic_kernel.memory import SemanticTextMemory, VolatileMemoryStore

## Initialize Kernel and Services

We set up the Semantic Kernel with:
1. **AzureTextEmbedding**: Converts text into embeddings (vector representations)
2. **AzureChatCompletion**: Chat-based LLM for generating responses

In [None]:
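kernel = Kernel()

# No endpoint, API key, or deployment name is passed here, so both connectors
# are expected to read them from environment variables or a .env file.
embedding_service = AzureTextEmbedding(service_id="embedding")
chat_service = AzureChatCompletion(service_id="chat")

# Register both services so the kernel can resolve them by service_id.
kernel.add_service(embedding_service)
kernel.add_service(chat_service)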

## Understanding Embeddings

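Embeddings map text to vectors so that texts with similar meaning end up close together. Here we embed two sentences that describe the same scene in different words and check the size of the resulting vectors.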

In [None]:
TEXTS = [
    "A dog ran joyfully through the green field, chasing after butterflies in the warm afternoon sun.",
    "A happy puppy sprinted across the grassy meadow, playfully pursuing insects under the bright sky.",
]

# Generate embeddings for similar texts
text_embedded = await embedding_service.generate_embeddings(TEXTS)
print("üî¢ Embeddings generated for similar texts")
print(f"Embedding dimensions: {len(text_embedded[0])}")
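
The cell above only reports the vector size. To see why the two sentences count as "similar", it helps to compare their vectors with cosine similarity. The sketch below is a minimal illustration that uses made-up 3-dimensional vectors in place of real embeddings; the function name `cosine_similarity` and the sample values are assumptions, not part of the notebook.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (values are invented for illustration).
dog_vec = [0.9, 0.1, 0.3]
puppy_vec = [0.8, 0.2, 0.35]
invoice_vec = [-0.2, 0.9, -0.4]

print(f"dog vs puppy:   {cosine_similarity(dog_vec, puppy_vec):.3f}")
print(f"dog vs invoice: {cosine_similarity(dog_vec, invoice_vec):.3f}")
```

Assuming `generate_embeddings` returns one vector per input text, the same function could likely be applied as `cosine_similarity(text_embedded[0], text_embedded[1])`, and the two sentences above should score close to 1.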

## Set Up Semantic Memory

We use `SemanticTextMemory` with `VolatileMemoryStore` (in-memory, temporary storage).

In [None]:
memory = SemanticTextMemory(
    storage=VolatileMemoryStore(), 
    embeddings_generator=embedding_service
)

## Save Facts to Memory

Let's store some travel-related facts.

In [None]:
await memory.save_information(
    collection="travel_notes",  
    id="note1", 
    text="User is currently in Barcelona.", 
)
await memory.save_information(
    collection="travel_notes",
    id="note2",
    text="User enjoys modern art museums and seaside walks.",
)
await memory.save_information(
    collection="travel_notes",
    id="note3",
    text="Today is Saturday and the user is free in the afternoon.",
)

print("✅ Facts saved to memory!")

## Semantic Search

Now we'll search memory for relevant facts using semantic similarity.

In [None]:
query = "What should I recommend for this afternoon?"

results = await memory.search(collection="travel_notes", query=query, limit=3)

print(f"\nüîç Semantic Query: {query}")
for r in results:
    print(f"‚úÖ Match: '{r.text}' (score: {r.relevance:.2f})")
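
Conceptually, saving a fact stores its text next to a vector, and searching embeds the query and ranks the stored vectors by similarity. The sketch below imitates that flow without calling any service. It is an illustration only: `ToyMemory`, `toy_embed`, and `cosine` are hypothetical names, and the "embedding" is a plain word count, so it matches shared words rather than meaning. It is not how `SemanticTextMemory` is implemented.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: word counts, not real semantics."""
    words = "".join(ch if ch.isalnum() else " " for ch in text.lower()).split()
    return Counter(words)


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyMemory:
    """Illustrative in-memory store keyed by collection name."""

    def __init__(self):
        self._collections = {}  # collection name -> list of (id, text, vector)

    def save(self, collection, id, text):
        self._collections.setdefault(collection, []).append((id, text, toy_embed(text)))

    def search(self, collection, query, limit=1):
        query_vec = toy_embed(query)
        scored = [
            (cosine(query_vec, vec), record_id, text)
            for record_id, text, vec in self._collections.get(collection, [])
        ]
        # Highest similarity first, then keep only the top `limit` records.
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:limit]


toy_memory = ToyMemory()
toy_memory.save("travel_notes", "note1", "User is currently in Barcelona.")
toy_memory.save("travel_notes", "note2", "User enjoys modern art museums and seaside walks.")
toy_memory.save("travel_notes", "note3", "Today is Saturday and the user is free in the afternoon.")

toy_results = toy_memory.search(
    "travel_notes", "What should I recommend for this afternoon?", limit=3
)
for score, record_id, text in toy_results:
    print(f"{record_id}: {score:.2f}  {text}")
```

With word counts, only the note that shares the word "afternoon" with the query gets a non-zero score. A real embedding model is what allows a query to match facts that share no words with it.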

## RAG: Response WITH Memory Context

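Let's use the top match as context for the LLM. Note that only the single highest-scoring fact is passed to the model; the other matches are ignored.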

In [None]:
context_info = results[0].text
prompt_with_context = f"Based on this context: '{context_info}', what can I suggest to do this afternoon?"
response_with_context = await kernel.invoke_prompt(prompt_with_context)

print("\n--- üß† LLM Response WITH Memory Context ---")
print(response_with_context)
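
Because the cell above keeps only one fact, the model may not learn, for example, both the user's location and their interests. One possible variation is to fold every sufficiently relevant match into the prompt. The sketch below assumes result objects with `text` and `relevance` attributes, as used in the search cell; the `Match` class, the `build_rag_prompt` helper, the threshold, and the sample scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical stand-in for a search result exposing `text` and `relevance`."""
    text: str
    relevance: float


def build_rag_prompt(question, matches, min_relevance=0.0, max_facts=3):
    """Combine the most relevant facts into one prompt; fall back to the bare question."""
    ranked = sorted(matches, key=lambda m: m.relevance, reverse=True)
    facts = [m.text for m in ranked if m.relevance >= min_relevance][:max_facts]
    if not facts:
        return question
    context = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Use the following facts about the user when answering.\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Sample matches with invented relevance scores, for illustration only.
demo_matches = [
    Match("User is currently in Barcelona.", 0.77),
    Match("Today is Saturday and the user is free in the afternoon.", 0.84),
    Match("User enjoys modern art museums and seaside walks.", 0.79),
]

demo_prompt = build_rag_prompt(
    "What can I suggest to do this afternoon?", demo_matches, min_relevance=0.7
)
print(demo_prompt)
```

In the notebook, the resulting string could be passed to `kernel.invoke_prompt` in place of `prompt_with_context`. The threshold is a judgment call: set it too low and irrelevant facts dilute the prompt, too high and useful ones are dropped.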

## Response WITHOUT Memory Context

For comparison, let's ask the same question without context.

In [None]:
prompt_without_context = "What can I suggest to do this afternoon?"
response_without_context = await kernel.invoke_prompt(prompt_without_context)

print("\n--- ❓ LLM Response WITHOUT Memory ---")
print(response_without_context)

## Summary

You've now seen Retrieval-Augmented Generation (RAG) in action:
1. ✅ Store facts in semantic memory
2. 🔍 Search using semantic similarity
3. 🧠 Use retrieved context to improve LLM responses

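Grounding the prompt in retrieved facts lets the model tailor its answer to the user's situation instead of giving a generic one, as the two responses above show.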