# **🧪  Lab 1 - Prototype**

This notebook walks through building a simple Microsoft Fabric-based Copilot application.

### **🔥 Prerequisites**

1. [Visual Studio Code](https://code.visualstudio.com/) or [GitHub Codespaces](https://github.com/features/codespaces)
2. [Azure](https://azure.com/free) or [Azure for Student](https://aka.ms/studentgetazure)
3. Apply for access to [Azure OpenAI Service](https://customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR7en2Ais5pxKtso_Pz4b1_xUOFA5Qk1UWDRBMjg0WFhPMkIzTzhKQ1dWNyQlQCN0PWcu)
4. [.NET 7+](https://dotnet.microsoft.com/en-us/)
5. [Docker](https://www.docker.com/)
6. [Qdrant](https://qdrant.tech/)

### **📚 Intro**

We use embeddings to inject relevant knowledge into the Azure OpenAI Service model through a vector database. This is how we give the model its "magical" skills. The following steps show how.

First, reference the Semantic Kernel library and its connector for the Qdrant vector database.

In [None]:
#r "nuget: Microsoft.SemanticKernel, *-*"
#r "nuget: Microsoft.SemanticKernel.Connectors.Memory.Qdrant, *-*"

In [None]:
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.Memory.Qdrant;
using Microsoft.SemanticKernel.Memory;

In [None]:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

A kernel in Semantic Kernel (SK) is the orchestrator for a user's request. The kernel combines skills, memory, and connectors to fulfil what the user asks for. In addition to holding the basic configuration (Azure OpenAI / OpenAI endpoints, models, parameters, and the vector database), it can also match skills to user requirements and chain skills into workflows.

The following example binds Semantic Kernel to the gpt-3.5-turbo-16k and text-embedding-ada-002 models in Azure OpenAI Service and connects to a Qdrant vector database running locally.

**Note**

*1. The model name you bind must match the name of the model deployment you created in Azure AI Studio.*

*2. The Azure OpenAI Service endpoint and key are available in the Azure Portal.*

*3. Azure OpenAI Service embeddings have 1536 dimensions, which is the default value passed to the Qdrant memory store. Start the Qdrant service before running the next cell; see [here](https://qdrant.tech/).*



In [None]:
IKernel kernel = Kernel.Builder
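            // Chat completion deployment (for example, a gpt-3.5-turbo-16k deployment)
            .WithAzureChatCompletionService("Your Deployment model name", "Azure OpenAI Endpoint", "Azure OpenAI Key")
            // Embedding deployment (for example, a text-embedding-ada-002 deployment)
            .WithAzureTextEmbeddingGenerationService("Your Deployment model name", "Azure OpenAI Endpoint", "Azure OpenAI Key")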
            .WithQdrantMemoryStore("http://localhost:6333", 1536)
            .Build();
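
Because the memory store above points at `http://localhost:6333`, it can help to confirm that Qdrant is reachable before continuing. The following is a minimal sketch and not part of the original lab: the helper name `IsQdrantReachableAsync` is hypothetical, and it assumes Qdrant's REST API answers `GET /collections` on the configured port.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: returns true if an HTTP GET to {baseUrl}/collections succeeds.
async Task<bool> IsQdrantReachableAsync(string baseUrl)
{
    using var client = new HttpClient { Timeout = TimeSpan.FromSeconds(5) };
    try
    {
        using var response = await client.GetAsync(baseUrl.TrimEnd('/') + "/collections");
        return response.IsSuccessStatusCode;
    }
    catch (HttpRequestException)
    {
        return false; // connection refused, DNS failure, and similar
    }
    catch (TaskCanceledException)
    {
        return false; // request timed out
    }
}

Console.WriteLine(await IsQdrantReachableAsync("http://localhost:6333")
    ? "Qdrant is reachable."
    : "Qdrant is not reachable; start the service before saving to memory.");
```

If the check reports that Qdrant is not reachable, start the service (for example through Docker, as listed in the prerequisites) and run the check again.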

## **🔑 Key: Chunking**

As Azure OpenAI Service models accept more tokens, chunking is less of a constraint, but organizing the knowledge points is still an important step. For the Microsoft Fabric documents, the headers and footers can be removed.

In [None]:
string markdownFile = @"../docs/microsoft-fabric-overview.md";
// string markdownFile = @"../docs/data-science-overview.md";
// string markdownFile = @"../docs/data-engineering-overview.md";
// string markdownFile = @"../docs/data-factory-overview.md";
// string markdownFile = @"../docs/data-warehousing.md";
// string markdownFile = @"../docs/fabric-terminology.md";
// string markdownFile = @"../docs/end-to-end-tutorials.md";


In [None]:
string learnContent = File.ReadAllText(markdownFile);

In [None]:
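// Keep only the body: from the first "# " marker up to (not including) "## Next steps".
int content_start = Math.Max(learnContent.IndexOf("# "), 0);
int content_end = learnContent.IndexOf("## Next steps");
if (content_end < content_start) content_end = learnContent.Length; // footer marker missing: keep the rest
learnContent = learnContent.Substring(content_start, content_end - content_start);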

In [None]:
int chunkSize = learnContent.Length / 600;

In [None]:
var skillsDirectory = Path.Combine(System.IO.Directory.GetCurrentDirectory(), "..", "skills");

***Import the skill into the kernel and use its prompt to extract the relevant knowledge points***

In [None]:
var read_skill = kernel.ImportSemanticSkillFromDirectory(skillsDirectory, "ReadSkill");

In [None]:
public class KBContent
{
    public string KB { get; set; }
    public string Content { get; set; }
}


In [None]:
public async Task<string> GetKBContent(IKernel kernel,string content)
{
    var kbContent = await kernel.RunAsync(content, read_skill["KB"]);

    return kbContent.ToString();
}

In [None]:
bool checkStr = false;
string kbContent = "";
var kbList = new List<KBContent>();

In [None]:
kbContent = await GetKBContent(kernel,learnContent.Replace("\\","\\\\").Replace("\"","\'").Replace(":::","").Replace("\n",""));

In [None]:
string setKBContent = kbContent.Replace("[OUTPUT]","").Replace("[END OUTPUT]","").Trim();

Using the prompt, we extract the knowledge points as JSON, which makes them easy to import into the vector database.

In [None]:
setKBContent

In [None]:
var jsonKBContent = System.Text.Json.JsonSerializer.Deserialize<List<KBContent>>(setKBContent);

In [None]:
foreach(var item in jsonKBContent)
{
    kbList.Add(item);
}
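
Model output is not guaranteed to be valid JSON, so it may be worth guarding the deserialization step. The sketch below is illustrative only: `KBEntry` is a hypothetical stand-in for the notebook's `KBContent` class, and `sampleJson` is a made-up example of the expected shape (an array of objects with `KB` and `Content` properties), not real model output.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical sample of the expected JSON shape; real model output will differ.
string sampleJson = "[{\"KB\":\"Microsoft Fabric\",\"Content\":\"An all-in-one analytics solution.\"}]";

List<KBEntry> entries;
try
{
    // Property names are matched case-sensitively by default, so "KB" and "Content" must match exactly.
    entries = JsonSerializer.Deserialize<List<KBEntry>>(sampleJson) ?? new List<KBEntry>();
}
catch (JsonException ex)
{
    // Fall back to an empty list instead of stopping the notebook.
    Console.WriteLine("Could not parse model output: " + ex.Message);
    entries = new List<KBEntry>();
}

foreach (var entry in entries)
{
    Console.WriteLine(entry.KB + " => " + entry.Content);
}

// Hypothetical stand-in for the notebook's KBContent class.
public class KBEntry
{
    public string KB { get; set; } = "";
    public string Content { get; set; } = "";
}
```

Falling back to an empty list keeps the rest of the notebook running; depending on the use case, retrying the prompt may be the better choice.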

With the larger token limit we usually do not need to split the content. Sometimes, however, something unexpected happens and you need to cut the content into segments and send each segment to the model separately.

In [None]:
// for(int i = 0; i < chunkSize+1; i++)
// {
                 
//     if(checkStr)
//     {
//                //Console.WriteLine(strTmp);
//         kbContent = await GetKBContent(kernel,learnContent.Substring(i*600).Replace("\\","\\\\").Replace("\"","\'").Replace(":::",""));
//     }
//     else
//     {
//         // strTmp = saveContent.Substring(i*600);
//                //Console.WriteLine(strTmp.Substring(0,600));
//         kbContent = await GetKBContent(kernel,learnContent.Substring(i*600,600).Replace("\\","\\\\").Replace("\"","\'").Replace(":::",""));
//     }


//     string setKBContent = kbContent.Replace("[OUTPUT]","").Replace("[END OUTPUT]","").Trim();


//     Console.WriteLine(setKBContent);


//     var jsonKBContent = System.Text.Json.JsonSerializer.Deserialize<List<KBContent>>(setKBContent);

//     foreach(var item in jsonKBContent)
//     {
//         kbList.Add(item);
//     }

//     if(i!=chunkSize)
//     {

//         string strTmp = learnContent.Substring((i+1)*600);

//         if(strTmp.Length <= 600)
//             checkStr = true;
//         else
//             checkStr = false;
//     }

// }
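
The fixed-size splitting performed by the commented-out loop above can be sketched more compactly as a helper. This is a minimal illustration under assumptions: the name `SplitIntoChunks` is hypothetical, and the 600-character length simply mirrors the value used in the loop.

```csharp
using System;
using System.Collections.Generic;

// Split text into consecutive chunks of at most chunkLength characters.
List<string> SplitIntoChunks(string text, int chunkLength)
{
    if (chunkLength <= 0) throw new ArgumentOutOfRangeException(nameof(chunkLength));

    var chunks = new List<string>();
    for (int start = 0; start < text.Length; start += chunkLength)
    {
        // The last chunk may be shorter than chunkLength.
        int length = Math.Min(chunkLength, text.Length - start);
        chunks.Add(text.Substring(start, length));
    }
    return chunks;
}

// A 1500-character string yields chunks of 600, 600, and 300 characters.
var demoChunks = SplitIntoChunks(new string('a', 1500), 600);
Console.WriteLine(demoChunks.Count); // prints 3
```

Splitting by character count can cut a sentence in half, so splitting on headings or paragraphs may give the model cleaner input.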

In [None]:
kbList

In [None]:
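var vectorData = new List<KBContent>();

foreach (var item in kbList)
{
    if (item.KB != "")
    {
        // Entry has a title: keep it as its own knowledge point.
        vectorData.Add(item);
    }
    else if (vectorData.Count != 0)
    {
        // Entry has no title: append its content to the previous knowledge point.
        vectorData[vectorData.Count - 1].Content += item.Content;
    }
}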

In [None]:
vectorData

## **🔑 Key: VectorDB**

We need to store the knowledge in the vector database and query it from there. Here is a simple example.

In [None]:
string conceptCollectionName = "fbkb-concept";

In [None]:
int vectorCount = 1000;

foreach(var item in vectorData)
{
  await kernel.Memory.SaveInformationAsync(conceptCollectionName, id: "id"+vectorCount.ToString(), text: item.KB+" "+ item.Content);
  vectorCount++;
}

In [None]:
string questionText = "What is Microsoft Fabric?";

The key to vector search is finding the stored vectors that most closely match the query vector.

In [None]:
var searchResults =  kernel.Memory.SearchAsync(conceptCollectionName, questionText, limit: 1, minRelevanceScore: 0.7);

In [None]:

string result = "";
await foreach (var item in searchResults)
{
    result = item.Metadata.Text;
    Console.WriteLine(item.Metadata.Text + " : " + item.Relevance);
}
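
To make "approximate matching" concrete, one common similarity measure between embedding vectors is cosine similarity. The sketch below is an illustration only: the function name `CosineSimilarity` is hypothetical, and whether it matches the relevance score returned by the search depends on how the Qdrant collection is configured.

```csharp
using System;

// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    // Treat a zero vector as having no similarity to anything.
    if (normA == 0 || normB == 0) return 0;
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

Console.WriteLine(CosineSimilarity(new float[] { 1, 0 }, new float[] { 1, 0 })); // same direction
Console.WriteLine(CosineSimilarity(new float[] { 1, 0 }, new float[] { 0, 1 })); // orthogonal
```

Under this measure, a `minRelevanceScore` such as 0.7 would keep only stored texts whose vectors point in nearly the same direction as the question's vector.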

## **Summary**

We can inject domain knowledge into an LLM by combining chunking with a vector database. What remains to consider is how to combine the retrieved content with the model for the best results, which mostly comes down to the content of the prompt. Learn more in the [Semantic Kernel prompt engineering documentation](https://learn.microsoft.com/en-us/semantic-kernel/prompt-engineering/).
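
As a rough illustration of combining retrieved content with a prompt, the sketch below builds a grounded prompt from a search result and a question using plain string handling. The helper name `BuildGroundedPrompt` and the prompt wording are assumptions for illustration, not part of the lab or of Semantic Kernel.

```csharp
using System;

// Hypothetical helper: places the retrieved knowledge ahead of the user's question.
string BuildGroundedPrompt(string context, string question) =>
    "Answer the question using only the context below.\n" +
    "If the context does not contain the answer, say that you do not know.\n\n" +
    "Context:\n" + context.Trim() + "\n\n" +
    "Question: " + question.Trim() + "\n" +
    "Answer:";

// Made-up context text standing in for a vector search result.
string demoPrompt = BuildGroundedPrompt(
    "Microsoft Fabric is an all-in-one analytics solution.",
    "What is Microsoft Fabric?");
Console.WriteLine(demoPrompt);
```

In the notebook, the `result` and `questionText` variables from the search cells would take the place of the made-up strings.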