# RAG
Retrieval Augmented Generation (RAG)

This is one of the most common LLM applications. Let's demonstrate it with a simplified PDF question-answering example.

![Diagram](./img/chat-to-pdf-llm.png)


In [3]:
// Install the NuGet packages
#r "nuget: Microsoft.SemanticKernel, 1.0.0-beta8"
#r "nuget: System.Linq.Async, 6.0.1"
#r "nuget: Microsoft.Extensions.Logging.Console, 6.0.0.0"


## Step 1: Create the Kernel and memory


In [6]:
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Plugins.Memory;
using Microsoft.SemanticKernel.Connectors.AI.OpenAI;
using Microsoft.Extensions.Logging;

var builder = new KernelBuilder();

var chatmodel = "gpt-4";
var apiKey = Environment.GetEnvironmentVariable("OPENAI__APIKEY");

ArgumentNullException.ThrowIfNull(apiKey);

var loggerFactory = LoggerFactory.Create(logging =>
{
    logging.SetMinimumLevel(LogLevel.Information);
    logging.AddConsole();
});

var kernel = builder
    .WithLoggerFactory(loggerFactory)
    .WithOpenAIChatCompletionService(chatmodel, apiKey)
    .Build();

var embeddingModel = "text-embedding-ada-002";
var memoryBuilder = new MemoryBuilder();
memoryBuilder.WithOpenAITextEmbeddingGenerationService(embeddingModel, apiKey);
memoryBuilder.WithMemoryStore(new VolatileMemoryStore());

var memory = memoryBuilder.Build();


## Step 2: Extract the text

In [12]:
#r "nuget: PdfPig, 0.1.8" 
using Microsoft.SemanticKernel.Text;
using UglyToad.PdfPig;
using UglyToad.PdfPig.DocumentLayoutAnalysis.TextExtractor;

In [19]:
using System.IO;

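// Collect the text of every page; a StringBuilder avoids re-copying the whole string on each page.
var textBuilder = new System.Text.StringBuilder();
using (PdfDocument document = PdfDocument.Open(Path.Combine("pdf", "drivers-handbook.pdf")))
{
    foreach (var page in document.GetPages())
    {
        // Separate pages with a single space, as before.
        textBuilder.Append(' ').Append(page.Text);
    }
}

var text = textBuilder.ToString();
Console.WriteLine(text);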





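## Step 3: Split the text into chunks
Chunking serves several purposes.
First, because of the model's context window limit, the whole text often cannot be sent to the AI service. Second, retrieving only the relevant passages and passing those to the model can improve accuracy (to some extent) and reduces cost.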

In [23]:
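// Size limits for the two splitting passes, plus the overlap between consecutive chunks.
const int DocumentLineSplitMaxTokens = 75;
const int DocumentChunkMaxTokens = 512;
const int DocumentChunkOverlapCount = 75;

// First split the raw text into short lines, then group the lines into overlapping chunks.
var lines = TextChunker.SplitPlainTextLines(text, DocumentLineSplitMaxTokens);
var chunks = TextChunker.SplitPlainTextParagraphs(lines, DocumentChunkMaxTokens, DocumentChunkOverlapCount);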
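
To make the idea of size-limited, overlapping chunks concrete, here is a minimal sketch. It is not how `TextChunker` is implemented: it counts whitespace-separated words instead of model tokens, and the helper name `ChunkWithOverlap` and its parameters are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: sliding window over words, where each chunk repeats
// the last `overlapWords` words of the previous chunk.
List<string> ChunkWithOverlap(string input, int maxWords, int overlapWords)
{
    if (maxWords <= 0) throw new ArgumentOutOfRangeException(nameof(maxWords));
    if (overlapWords < 0 || overlapWords >= maxWords)
        throw new ArgumentOutOfRangeException(nameof(overlapWords));

    var words = input.Split(new[] { ' ', '\n', '\r', '\t' }, StringSplitOptions.RemoveEmptyEntries);
    var result = new List<string>();
    var step = maxWords - overlapWords; // how far the window advances each time

    for (var start = 0; start < words.Length; start += step)
    {
        var length = Math.Min(maxWords, words.Length - start);
        result.Add(string.Join(" ", words, start, length));
        if (start + length >= words.Length) break; // this window reached the end
    }
    return result;
}

var sample = "one two three four five six seven eight nine ten";
foreach (var piece in ChunkWithOverlap(sample, maxWords: 4, overlapWords: 1))
{
    Console.WriteLine(piece);
}
// one two three four
// four five six seven
// seven eight nine ten
```

The overlap means a sentence that straddles a chunk boundary still appears in full in at least one chunk, at the cost of storing some text twice.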

## Step 4: Save the chunks to memory

In [25]:
foreach (var chunk in chunks)
{
    var recordID = await memory.SaveInformationAsync(
        collection: "drivers-handbook",
        text: chunk,
        id: Guid.NewGuid().ToString()
    );     
}
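
Each saved chunk is stored together with its embedding vector, so a later question can be matched against the chunks by vector similarity. The sketch below shows that idea with cosine similarity, a result limit, and a minimum score, mirroring the `limit` and `minRelevanceScore` arguments used in Step 5. It is an illustration under assumptions, not the library's implementation: `CosineSimilarity`, `Search`, and the 3-dimensional toy vectors are all made up here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity between two vectors of equal length.
double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same length.");
    double dot = 0, normA = 0, normB = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA == 0 || normB == 0) return 0; // treat a zero vector as unrelated
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Return up to `limit` stored texts whose similarity to the query is at least `minScore`,
// best match first.
List<(string Text, double Score)> Search(
    IEnumerable<(string Text, float[] Vector)> store, float[] query, int limit, double minScore)
{
    return store
        .Select(item => (item.Text, Score: CosineSimilarity(item.Vector, query)))
        .Where(candidate => candidate.Score >= minScore)
        .OrderByDescending(candidate => candidate.Score)
        .Take(limit)
        .ToList();
}

// Toy "embeddings", invented for illustration; real ones have many more dimensions.
var toyStore = new List<(string Text, float[] Vector)>
{
    ("red arrow", new[] { 1f, 0f, 0f }),
    ("red light", new[] { 0.8f, 0.6f, 0f }),
    ("parking rules", new[] { 0f, 0f, 1f }),
};

foreach (var match in Search(toyStore, new[] { 1f, 0f, 0f }, limit: 3, minScore: 0.7))
{
    Console.WriteLine($"{match.Text}: {match.Score}");
}
```

With this query, "red arrow" and "red light" pass the 0.7 threshold, while "parking rules" scores 0 and is dropped even though the limit of 3 would have allowed it.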

## Step 5: Generate the answer

In [33]:
var generateAnswerFuncPrompt = "Given the following context: {{ $facts}}, answer this question: {{ $input}}. If you don't know the answer, return I don't know";
var generateAnswerFunc = kernel.CreateSemanticFunction(
    generateAnswerFuncPrompt,
    requestSettings: new OpenAIRequestSettings { MaxTokens = 200, Temperature = 0 });

// Fetch the stored chunks most relevant to the question and join them into one context string.
var retrieveFunc = async (string question) => {
    var paragraphs = memory.SearchAsync("drivers-handbook", question, limit: 3, minRelevanceScore: 0.7);
    var facts = string.Empty;
    await foreach (var p in paragraphs)
    {
        facts = $"{facts}\n {p.Metadata.Text}";
    }
    return facts;
};

In [36]:
using Microsoft.SemanticKernel.Orchestration;

// The question, in Chinese: "What does a red arrow on a traffic light mean?"
var q1 = "交通灯中红色箭头是什么意思?";

var retrieved = await retrieveFunc(q1);
var answer = await kernel.RunAsync(generateAnswerFunc, new ContextVariables() { ["input"] = q1, ["facts"] = retrieved });
Console.WriteLine(answer.GetValue<string>());

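红色箭头在交通信号中表示禁止驾驶员朝箭头所指的方向行驶。

(In English: a red arrow on a traffic signal means drivers must not proceed in the direction the arrow points.)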
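
Because the search uses `minRelevanceScore: 0.7`, it may return no chunks at all for an unrelated question, and the context passed to the model is then empty. One option, sketched below, is to check for that case and skip the model call. `TryBuildPrompt` is a hypothetical helper, not part of Semantic Kernel, and it builds the prompt with plain string interpolation rather than the kernel's template engine.

```csharp
using System;

// Hypothetical helper: builds the prompt only when there is retrieved context to ground it.
bool TryBuildPrompt(string facts, string question, out string prompt)
{
    if (string.IsNullOrWhiteSpace(facts))
    {
        prompt = string.Empty;
        return false; // nothing retrieved, so there is nothing to answer from
    }

    prompt = $"Given the following context: {facts.Trim()}, answer this question: {question}. " +
             "If you don't know the answer, return I don't know";
    return true;
}

// With empty context, answer directly instead of calling the model.
if (TryBuildPrompt("", "What does a red arrow mean?", out var groundedPrompt))
{
    Console.WriteLine(groundedPrompt);
}
else
{
    Console.WriteLine("I don't know");
}
```

This saves a request when retrieval comes back empty; whether that is preferable to letting the model reply "I don't know" itself is a design choice.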
