
---

### Reminder: This 📘 `.NET Interactive` notebook needs to be run from VS Code with [these prerequisites](../PREREQS.md).

#### How to use this notebook: 

* Just read the text and scroll along until you run into code blocks.
* Code blocks have computer code inside them — hover over the block and you can run the code.
* Run the code by hitting the ▶️ "play" button to the left. If the code runs you'll see a ✔️. If not, you'll get a ❌.
* The output and status of a code block appear just below it, so you may need to scroll down to see them.
* Sometimes a code block will ask you for input in a hard-to-notice dialog box 👆 at the top of your notebook window. 

---

# Recipe IV. 🥑 Memories Maximized
## 🧑‍🍳 Cook well beyond the model's memory limits


The maximum length of a prompt depends on the LLM you are using. Newer models can take longer prompts; older models can only take shorter ones. As a result, there's a limit to how much context you can provide within any given prompt.

| Model | Maximum Tokens** |
|---|---|
| ada | 2049 |
| babbage | 2049 |
| curie-001 | 2049 |
| davinci-003 | 4097 |
| GPT-4 | 8192 |

** _1 token is approximately 4 characters of English text; 1 page of a book is roughly 500 tokens_
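
To make the table concrete, here is a minimal sketch of a character-based token estimate. The helper names `EstimateTokens` and `FitsInWindow` are hypothetical, and the characters-per-token ratio is passed in as a parameter because it is only a rule of thumb; real tokenizers vary by model and by language, so treat the result as a ballpark figure.

```csharp
using System;

// Hypothetical helper: estimates a token count from a character count.
// The ratio is a parameter because it is only a rule of thumb.
static int EstimateTokens(string text, double charsPerToken)
{
    if (charsPerToken <= 0) throw new ArgumentOutOfRangeException(nameof(charsPerToken));
    return (int)Math.Ceiling(text.Length / charsPerToken);
}

// Hypothetical helper: does the prompt, plus the room reserved for the reply, fit in the window?
static bool FitsInWindow(string prompt, int maxTokens, int reservedForReply, double charsPerToken)
{
    return EstimateTokens(prompt, charsPerToken) + reservedForReply <= maxTokens;
}

// Stand-in for 8,000 characters of text.
string sample = new string('x', 8000);

// About 4 characters per token is a commonly cited figure for English text.
int estimate = EstimateTokens(sample, charsPerToken: 4);
Console.WriteLine($"Estimated tokens: {estimate}");
Console.WriteLine($"Fits a 4097-token window? {FitsInWindow(sample, 4097, 100, 4)}");
Console.WriteLine($"Fits a 2049-token window? {FitsInWindow(sample, 2049, 100, 4)}");
```

The reply needs room too, which is why the sketch reserves some tokens for the completion rather than filling the whole window with the prompt.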

A method that is growing in popularity is to use "embeddings": high-dimensional numerical representations (vectors) of a piece of text. Texts with similar meanings get vectors that lie close together, which makes it possible to search by meaning rather than by exact wording. You can generate an embedding for a short or a long piece of text; the maximum length is set by the specific embedding model that you use.
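
To give a feel for how two embeddings can be compared, here is a minimal sketch using cosine similarity, a common way to score how close two vectors are. It is shown as an illustration only, not as a statement of what Semantic Kernel does internally. The helper `CosineSimilarity` and the three-dimensional vectors are made up for this example; real embeddings have far more dimensions (1,536 for `text-embedding-ada-002`).

```csharp
using System;

// Cosine similarity: dot(a, b) / (|a| * |b|).
// Ranges from -1 to 1; a higher value means the vectors point in a more similar direction.
static double CosineSimilarity(double[] a, double[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    // Convention chosen for this sketch: a zero vector is similar to nothing.
    if (normA == 0 || normB == 0) return 0;

    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Made-up "embeddings" for three words.
double[] dog = { 0.9, 0.1, 0.0 };
double[] puppy = { 0.8, 0.2, 0.1 };
double[] bank = { 0.0, 0.2, 0.9 };

Console.WriteLine($"dog vs puppy: {CosineSimilarity(dog, puppy):F3}");
Console.WriteLine($"dog vs bank:  {CosineSimilarity(dog, bank):F3}");
```

With these made-up numbers, "dog" scores much closer to "puppy" than to "bank", which is the behavior a similarity search relies on.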

When using OpenAI or Azure OpenAI Service models, the `text-embedding-ada-002` model is an inexpensive and good-enough choice for most use cases. Let's start by generating some embeddings and see how they work in practice.

## Step 1. Instantiate a 🔥 kernel for both completions and generating embeddings

Note that the code below includes a few new lines that may be unfamiliar to you. They configure the `text-embedding-ada-002` model, which generates the vector of numbers for a piece of text.

In [1]:
#r "nuget: Microsoft.SemanticKernel, 0.9.61.1-preview"

#!import ../config/Settings.cs

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.KernelExtensions;
using System.IO;
using Microsoft.SemanticKernel.Configuration;
using Microsoft.SemanticKernel.SemanticFunctions;
using Microsoft.SemanticKernel.CoreSkills;
using Microsoft.SemanticKernel.Memory;

var (useAzureOpenAI, model, azureEndpoint, apiKey, orgId) = Settings.LoadFromFile();

var kernel = Microsoft.SemanticKernel.Kernel.Builder
    .Configure(c =>
    {
        // Register one model for text completion and one for generating embeddings
        if (useAzureOpenAI) {
            c.AddAzureOpenAITextCompletion("davinci", model, azureEndpoint, apiKey);
            c.AddAzureOpenAIEmbeddingGeneration("ada", "text-embedding-ada-002", azureEndpoint, apiKey);
        } else {
            c.AddOpenAITextCompletion("davinci", model, apiKey, orgId);
            c.AddOpenAIEmbeddingGeneration("ada", "text-embedding-ada-002", apiKey, orgId);
        }
    })
    // Keep memories in RAM; they are lost when the kernel goes away
    .WithMemoryStorage(new VolatileMemoryStore())
    .Build();


## Step 2. Add 🥑 memories to let the 🔥 kernel cook richer meals

Imagine that the following facts about you have been collected from the Internet:

In [3]:
const string memoryCollectionName = "Facts About Me";

await kernel.Memory.SaveInformationAsync(memoryCollectionName, id: "LinkedIn Bio", 
    text: "I currently work in the hotel industry at the front desk. I won the best team player award.");

await kernel.Memory.SaveInformationAsync(memoryCollectionName, id: "LinkedIn History", 
    text: "I have worked as a tourist operator for 8 years. I have also worked as a banking associate for 3 years.");

await kernel.Memory.SaveInformationAsync(memoryCollectionName, id: "Recent Facebook Post", 
    text: "My new dog Trixie is the cutest thing you've ever seen. She's just 2 years old.");
    
await kernel.Memory.SaveInformationAsync(memoryCollectionName, id: "Old Facebook Post", 
    text: "Can you believe the size of the trees in Yellowstone? They're huge! I'm so committed to forestry concerns.");

Console.WriteLine("Four GIGANTIC vectors were generated just now from those 4 pieces of text above.");

Four GIGANTIC vectors were generated just now from those 4 pieces of text above.


> ✅ You'll need access to the `text-embedding-ada-002` model for the above to run correctly. Note that Step 1 of this notebook differs from Step 1 in the other notebooks because of this extra requirement.

Next, imagine that you wanted to ask your LLM a question about you. What would it do? Well, given that it doesn't know anything about you out-of-the-box, it will simply make things up about you.

In [7]:
var myFunction = kernel.CreateSemanticFunction(@"
Tell me about me and {{$input}} in less than 70 characters.
", maxTokens: 100, temperature: 0.8, topP: 1);
var result = await myFunction.InvokeAsync("my work history");
Console.WriteLine(result);


For example, the semantic function above might say:

`You are a creative problem solver with a varied work history.`

That could apply to anybody, of course :+).

Instead of hoping that the LLM comes up with the correct answer, we can use memories to craft a more accurate completion. We search the memories we've stored for the ones most similar to our question, using `limit` to set the maximum number of hits we want back and `minRelevanceScore` to set how relevant a memory must be to be returned.

In [8]:
string ask = "Tell me about me and my work history.";
var relatedMemory = "I know nothing.";
var counter = 0;

var memories = kernel.Memory.SearchAsync(memoryCollectionName, ask, limit: 5, minRelevanceScore: 0.77);

await foreach (MemoryQueryResult memory in memories)
{
    if (counter == 0) { relatedMemory = memory.Text; }
    Console.WriteLine($"Result {++counter}:\n  >> {memory.Id}\n  Text: {memory.Text}  Relevance: {memory.Relevance}\n");
}

Result 1:
  >> LinkedIn History
  Text: I have worked as a tourist operator for 8 years. I have also worked as a banking associate for 3 years.  Relevance: 0.8243955422094129

Result 2:
  >> LinkedIn Bio
  Text: I currently work in the hotel industry at the front desk. I won the best team player award.  Relevance: 0.8020284063459443
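
To see why only two of the four stored facts came back, here is a minimal sketch of the idea behind `limit` and `minRelevanceScore`: drop weak matches, sort best-first, and keep at most `limit` results. The helper `TopMatches` is hypothetical and is not how the library is implemented. The two LinkedIn scores are rounded from the output above; the two Facebook scores are made up, and treating the threshold as inclusive is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: filter by a relevance threshold, rank best-first, cap the count.
static List<(string Id, double Relevance)> TopMatches(
    IEnumerable<(string Id, double Relevance)> scored, int limit, double minRelevanceScore)
{
    return scored
        .Where(m => m.Relevance >= minRelevanceScore)   // inclusive threshold (assumption)
        .OrderByDescending(m => m.Relevance)
        .Take(limit)
        .ToList();
}

var scores = new List<(string Id, double Relevance)>
{
    ("Old Facebook Post", 0.70),    // made-up score
    ("LinkedIn Bio", 0.802),        // rounded from the output above
    ("Recent Facebook Post", 0.71), // made-up score
    ("LinkedIn History", 0.824),    // rounded from the output above
};

foreach (var match in TopMatches(scores, limit: 5, minRelevanceScore: 0.77))
{
    Console.WriteLine($"{match.Id}: {match.Relevance}");
}
```

Only the two LinkedIn entries are printed, best match first. Lowering `minRelevanceScore` lets weaker matches through, while lowering `limit` trims the list from the bottom.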



Now we can ask the same question again, this time giving the LLM the most relevant memory, which we stored in `relatedMemory`, as context for a more accurate response:

In [9]:
var myFunction = kernel.CreateSemanticFunction(@"
{{$input}}
Tell me about me and my work history in less than 70 characters.
", maxTokens: 100, temperature: 0.1, topP: .1);

var result = await myFunction.InvokeAsync(relatedMemory);

Console.WriteLine(result);


8 yrs tourist operator, 3 yrs banking associate.


## Step 3. Feel the "a-ha" moment

### Manipulating 🥑 memories is how the token window limitation is addressed.

Recall the table showing the maximum tokens that can be used per model:

| Model | Maximum Tokens** |
|---|---|
| ada | 2049 |
| babbage | 2049 |
| curie-001 | 2049 |
| davinci-003 | 4097 |
| GPT-4 | 8192 |

** _1 token is approximately 4 characters of English text; 1 page of a book is roughly 500 tokens_

Using this same basic technique of gathering the memories most similar to a prompt, it's possible to keep many more memories stored and on hand to compare with a given prompt. And you don't have to include only the top hit; you can also include other hits that are nearly as relevant.

This is how an entire book can be used by Semantic Kernel as a memory source to feed into a prompt: by selecting only the relevant chunks of text, i.e. those that relate to the prompt. To do so you would:

1. Generate embeddings for each of the paragraphs in the book.
2. For a given prompt, find the most similar paragraphs within the book.
3. Staying within the limitation of the token size window, gather all the related paragraphs.
4. You now have a prompt with a great deal of relevant 🥑 context to send to the model.
5. Reap the benefits of an "informed" LLM AI weighing in on a particular subject for you.
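
Step 3 above can be sketched as a simple greedy selection: walk the search hits from most to least relevant and keep each one that still fits in the token budget. The helper `SelectWithinBudget` and the rough token counter are hypothetical; a real implementation would use the tokenizer that matches the chosen model.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: keep the best-ranked chunks that fit within a token budget.
static List<string> SelectWithinBudget(
    IEnumerable<string> rankedChunks, int tokenBudget, Func<string, int> countTokens)
{
    var selected = new List<string>();
    int used = 0;

    foreach (string chunk in rankedChunks)
    {
        int cost = countTokens(chunk);

        // Skip a chunk that is too big; a smaller, lower-ranked chunk may still fit.
        if (used + cost > tokenBudget) continue;

        selected.Add(chunk);
        used += cost;
    }

    return selected;
}

// Rough estimate (about 4 characters per token); an assumption, not a real tokenizer.
Func<string, int> roughTokens = text => (int)Math.Ceiling(text.Length / 4.0);

// Stand-ins for search hits, already ordered from most to least relevant.
var ranked = new List<string>
{
    new string('a', 400),  // about 100 tokens
    new string('b', 800),  // about 200 tokens
    new string('c', 200),  // about 50 tokens
};

List<string> context = SelectWithinBudget(ranked, tokenBudget: 160, countTokens: roughTokens);
Console.WriteLine($"Selected {context.Count} of {ranked.Count} chunks.");
```

Skipping an oversized chunk instead of stopping at it is a design choice: it packs more context into the window at the cost of occasionally preferring a less relevant chunk.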

Let's review this in practice. Say I have a 500-page book. 

1. I take each page and generate its embedding with `Memory.SaveInformationAsync`.
2. I then take my prompt, `the best scenes are ones with flowers in it and deserve to be summarized`, and use `Memory.SearchAsync` to locate the pages with flower scenes in them.
3. Let's say there are three pages that are relevant. Those three pages will be used to compose a new prompt that's simply the three pages appended to each other along with the original prompt. If instead you need to include ten pages and that would exceed the token window, summarize each of the ten pages separately into ten shorter passages. Repeat this until the prompt fits within the token window.
4. You have the prompt to give to the model you've chosen. It has pulled the relevant information out of the 500-page book, and will do its best to summarize what you care about the most.
5. Ta-da! You'll get what you've asked for.
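
The "summarize until it fits" loop from step 3 can be sketched as follows. Everything here is hypothetical: `BuildPromptAsync` is an illustrative helper, the token counter is a rough estimate, and the summarizer is a fake that just keeps the first half of each page. In a real notebook the summarizer would be a call to the model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: append the pages and the ask; if the result is too long,
// summarize each page separately and try again, up to maxRounds times.
static async Task<string> BuildPromptAsync(
    IReadOnlyList<string> pages, string ask, int tokenBudget,
    Func<string, int> countTokens, Func<string, Task<string>> summarizeAsync, int maxRounds = 3)
{
    List<string> current = pages.ToList();

    for (int round = 0; round <= maxRounds; round++)
    {
        string candidate = string.Join("\n\n", current) + "\n\n" + ask;
        if (countTokens(candidate) <= tokenBudget) return candidate;
        if (round == maxRounds) break;

        // Summarize page by page so that every page still contributes something.
        for (int i = 0; i < current.Count; i++)
        {
            current[i] = await summarizeAsync(current[i]);
        }
    }

    throw new InvalidOperationException("Could not fit the prompt within the token budget.");
}

// Rough estimate (about 4 characters per token); an assumption, not a real tokenizer.
Func<string, int> roughTokens = text => (int)Math.Ceiling(text.Length / 4.0);

// Fake summarizer for this sketch: keeps the first half of the text.
Func<string, Task<string>> fakeSummarize = text => Task.FromResult(text.Substring(0, text.Length / 2));

var bookPages = new List<string> { new string('a', 400), new string('b', 400), new string('c', 400) };
string finalPrompt = await BuildPromptAsync(
    bookPages, "Summarize the flower scenes.", 200, roughTokens, fakeSummarize);

Console.WriteLine($"Prompt fits: {roughTokens(finalPrompt)} estimated tokens.");
```

The cap on rounds keeps the loop from running forever when a summarizer fails to shorten its input; each round also costs one model call per page, so fewer rounds is cheaper.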

To illustrate this point, we can take Abraham Lincoln's famous Gettysburg Address, break it up into chunks of text, and store each chunk in memory along with its embedding. We can then search those chunks for the ones most relevant to a question and hand them to the model as context. We asked a GPT-3 model to write a simple text chunking procedure that takes a recommended length for each chunk (a chunk may run slightly longer so that a word is not split). Chunking is still more of an art than a science, so you can see the result isn't as perfect as we'd like. But this will give you a sense of how a large text file can be processed into smaller pieces of text that get used by an LLM AI model.

In [11]:
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
public static List<string> ChunkTextFile(string filePath, int recommendedLength)
{
    List<string> chunks = new List<string>();

    // Read in the text file
    string text = File.ReadAllText(filePath);

    // Break the text into chunks of the recommended length
    int startIndex = 0;
    while (startIndex < text.Length)
    {
        int endIndex = startIndex + recommendedLength;
        if (endIndex > text.Length)
        {
            endIndex = text.Length;
        }

        // Extend the chunk to the next whitespace so that a word is never split in two
        while (endIndex < text.Length && !char.IsWhiteSpace(text[endIndex]))
        {
            endIndex++;
        }

        // Get the chunk of text
        string chunk = text.Substring(startIndex, endIndex - startIndex);

        // Strip the whitespace at the start and end of the string
        chunk = chunk.Trim();

        // Add the chunk to the list, skipping chunks that contain only whitespace
        if (chunk.Length > 0)
        {
            chunks.Add(chunk);
        }

        // Move to the next chunk
        startIndex = endIndex;
    }

    return chunks;
}

// Get the list of chunks from the file
List<string> chunks = ChunkTextFile("./lincoln.txt", 140);

const string lincolnMemoryCollectionName = "Abe's Words";

// Add the chunks to memory
int counter = 0;
foreach (string chunk in chunks)
{
    Console.WriteLine($"Chunk {counter}: {chunk}");

    await kernel.Memory.SaveInformationAsync(lincolnMemoryCollectionName, id: $"Chunk {counter++}", 
        text: chunk);
}


Chunk 0: Four score and seven years ago our fathers brought forth upon this continent a new nation, conceived in liberty, and dedicated to the proposition
Chunk 1: that all men are created equal. (Applause.) Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived
Chunk 2: and so dedicated, can long endure. We are met on a great battle field of that war; we are met to dedicate a portion of it as the final resting
Chunk 3: place of those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this, but in a larger


Error: Microsoft.SemanticKernel.AI.AIException: Throttling: Too many requests, HTTP status: TooManyRequests
   at Microsoft.SemanticKernel.AI.OpenAI.Clients.OpenAIClientAbstract.ExecutePostRequestAsync[T](String url, String requestBody, CancellationToken cancellationToken)
   at Microsoft.SemanticKernel.AI.OpenAI.Clients.OpenAIClientAbstract.ExecuteTextEmbeddingRequestAsync(String url, String requestBody, CancellationToken cancellationToken)
   at Microsoft.SemanticKernel.AI.OpenAI.Services.OpenAITextEmbeddings.GenerateEmbeddingsAsync(IList`1 data)
   at Microsoft.SemanticKernel.AI.Embeddings.EmbeddingGeneratorExtensions.GenerateEmbeddingAsync[TValue,TEmbedding](IEmbeddingGenerator`2 generator, TValue value)
   at Microsoft.SemanticKernel.Memory.SemanticTextMemory.SaveInformationAsync(String collection, String text, String id, String description, CancellationToken cancel)
   at Submission#13.<<Initialize>>d__0.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.CodeAnalysis.Scripting.ScriptExecutionState.RunSubmissionsAsync[TResult](ImmutableArray`1 precedingExecutors, Func`2 currentExecutor, StrongBox`1 exceptionHolderOpt, Func`2 catchExceptionOpt, CancellationToken cancellationToken)
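
Saving many chunks in a tight loop can run into rate limits, as the `TooManyRequests` error above shows. One common remedy is to retry with an increasing delay. Below is a minimal sketch of that idea; `RetryAsync` is a hypothetical helper, not part of Semantic Kernel, and it retries on any exception for simplicity, whereas real code would retry only on throttling errors. The failing action is simulated so the block runs on its own.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: run an async action, retrying with exponential backoff on failure.
static async Task RetryAsync(Func<Task> action, int maxAttempts, TimeSpan initialDelay)
{
    TimeSpan delay = initialDelay;

    for (int attempt = 1; ; attempt++)
    {
        try
        {
            await action();
            return;
        }
        catch (Exception) when (attempt < maxAttempts)
        {
            // Wait, then double the delay before the next attempt.
            await Task.Delay(delay);
            delay = TimeSpan.FromTicks(delay.Ticks * 2);
        }
    }
}

// Simulated flaky call: fails twice, then succeeds.
int calls = 0;
await RetryAsync(() =>
{
    calls++;
    if (calls < 3) throw new InvalidOperationException("Simulated throttling");
    return Task.CompletedTask;
}, maxAttempts: 5, initialDelay: TimeSpan.FromMilliseconds(10));

Console.WriteLine($"Succeeded after {calls} calls.");
```

In the loop above, the call to `kernel.Memory.SaveInformationAsync` could be wrapped in such a helper so that a throttled request is retried instead of aborting the whole cell. Once the last attempt fails, the exception propagates to the caller.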

We can now query these chunks for the most similar ones that match a simple question: `"What should the people do?"`

In [12]:
var aCounter = 0;
var myPrompt = "What should the people do?";
var myMemory = "";
var memories = kernel.Memory.SearchAsync(lincolnMemoryCollectionName, myPrompt, limit: 5, minRelevanceScore: 0.77);

await foreach (MemoryQueryResult memory in memories) {
    Console.WriteLine($"Result {++aCounter}:\n  >> {memory.Id}\n  Text: {memory.Text}  Relevance: {memory.Relevance}\n");
    myMemory += memory.Text + " ";
}

Console.WriteLine("Memory to feed back into the prompt will be:\n  >> " + myMemory+ "\n");
var myLincolnFunction = kernel.CreateSemanticFunction(@"
Lincoln said:
---
{{$input}}
---
So what should the people do?
", maxTokens: 100, temperature: 0.1, topP: .1);

var lincolnResult = await myLincolnFunction.InvokeAsync(myMemory);

Console.WriteLine("Generated response ... 'according to Lincoln':\n" + lincolnResult);


Memory to feed back into the prompt will be:
  >> 

Generated response ... 'according to Lincoln':

The people should continue to strive for a more perfect union, uphold the principles of democracy and equality, and work towards the common good of all citizens. They should also remain vigilant and actively participate in the democratic process to ensure that their voices are heard and their rights are protected.


To see this run at scale, check out the GitHub Q&A Sample app available at [https://aka.ms/sk/repo](https://aka.ms/sk/repo). It takes an entire code repo, converts it to embeddings, and lets you "chat" with the repo itself. Keep in mind that it is generally impossible to feed an entire repo into an LLM AI's token window, and that's where 🥑 memories come in.

# ⏭️ Next Steps

Run through more advanced examples in the notebooks that are available in our GitHub repo at [https://aka.ms/sk/repo](https://aka.ms/sk/repo).

[Learn about 🍋 connectors!](../e5-connectors/notebook.ipynb)

Or stay a while longer and add more facts about yourself to the `memoryCollectionName` collection.