# **Data preparation**

This notebook contains the code to preprocess lecture transcripts and course notes and convert them into embeddings for later use in the student copilot application.

### **What is Semantic Kernel?**

Semantic Kernel is an open-source SDK that lets you easily combine AI services like OpenAI, Azure OpenAI, and Hugging Face with conventional programming languages like C# and Python. By doing so, you can create AI apps that combine the best of both worlds.

![Copilot Stack](./imgs/mind-and-body-of-semantic-kernel.png)


Semantic Kernel has been engineered to allow developers to flexibly integrate AI services into their existing apps. To do so, Semantic Kernel provides a set of connectors that make it easy to add memories and models. In this way, Semantic Kernel is able to add a simulated "brain" to your app.

Additionally, Semantic Kernel makes it easy to add skills to your applications with AI plugins that allow you to interact with the real world. These plugins are composed of prompts and native functions that can respond to triggers and perform actions. In this way, plugins are like the "body" of your AI app.

Because of the extensibility Semantic Kernel provides with connectors and plugins, you can use it to orchestrate AI plugins from both OpenAI and Microsoft on top of nearly any model. For example, you can use Semantic Kernel to orchestrate plugins built for ChatGPT, Bing, and Microsoft 365 Copilot on top of models from OpenAI, Azure, or even Hugging Face.



Use NuGet to add the Microsoft.SemanticKernel and Microsoft.SemanticKernel.Connectors.Memory.Qdrant packages:

In [2]:
#r "nuget: Microsoft.SemanticKernel, *-*"
#r "nuget: Microsoft.SemanticKernel.Connectors.Memory.Qdrant, *-*"

In [3]:
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Plugins.Core;
using Microsoft.SemanticKernel.Connectors.AI.OpenAI;
using Microsoft.SemanticKernel.Connectors.AI.OpenAI.TextEmbedding;

using Microsoft.SemanticKernel.Plugins.Memory;
using Microsoft.SemanticKernel.Connectors.Memory.Qdrant;

In [4]:
using System;
using System.IO;
using System.Text.Json;

### **Creating the kernel runtime environment**

The constructor of the Kernel class shows the settings you can configure to run both native and semantic functions. These include:

- The default AI service that will power your semantic functions.

- The template engine used to render prompt templates.

- The logger used to log messages from functions.

- The plugins available to be executed by the kernel.

- Additional configuration used by the kernel via the KernelConfig class.

In [16]:
var AZURE_OPENAI_ENDPOINT = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT");
var AZURE_OPENAI_KEY = Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY");
var AZURE_OPENAI_EMBEDDING_DEPLOYMENT = Environment.GetEnvironmentVariable("AZURE_OPENAI_EMBEDDING_DEPLOYMENT");
var AZURE_OPENAI_CHAT_DEPLOYMENT = Environment.GetEnvironmentVariable("AZURE_OPENAI_CHAT_DEPLOYMENT");
var QDRANT_HOST = Environment.GetEnvironmentVariable("QDRANT_HOST");
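
If any of these variables is unset, the later cells fail with less obvious errors, so it can help to check them up front. The following is a minimal sketch of such a check; `FindMissingSettings` is a hypothetical helper written for this illustration and is not part of Semantic Kernel.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not a Semantic Kernel API): returns the names of the
// settings whose value is missing, empty, or only whitespace.
List<string> FindMissingSettings(IEnumerable<string> names, Func<string, string> lookup) =>
    names.Where(name => string.IsNullOrWhiteSpace(lookup(name))).ToList();

// The variable names used by this notebook.
var required = new[]
{
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_KEY",
    "AZURE_OPENAI_EMBEDDING_DEPLOYMENT",
    "AZURE_OPENAI_CHAT_DEPLOYMENT",
    "QDRANT_HOST",
};

var missing = FindMissingSettings(required, Environment.GetEnvironmentVariable);
if (missing.Count > 0)
{
    Console.WriteLine($"Missing environment variables: {string.Join(", ", missing)}");
}
```

The lookup is passed in as a delegate so the check can be exercised with a plain dictionary instead of the real process environment.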

In [6]:
// Build a kernel that uses the Azure OpenAI chat deployment for semantic functions.
IKernel kernel = new KernelBuilder()
    .WithAzureChatCompletionService(AZURE_OPENAI_CHAT_DEPLOYMENT, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_KEY)
    .Build();

### **What is a plugin?**

To drive alignment across the industry, we've adopted the OpenAI plugin specification as the standard for plugins. This will help create an ecosystem of interoperable plugins that can be used across all of the major AI apps and services like ChatGPT, Bing, and Microsoft 365.


In [7]:
var pluginDirectory = Path.Combine(Directory.GetCurrentDirectory(), "Plugins");

Import the File and Answer plugins:

In [8]:
var filePlugin = kernel.ImportSemanticFunctionsFromDirectory(pluginDirectory, "FilePlugin");
var answerPlugin = kernel.ImportSemanticFunctionsFromDirectory(pluginDirectory, "AnswerPlugin");

In [9]:
public class KB
{
    public string kb { get; set; }
    public string content { get; set; }
}

Extract knowledge from the transcripts:

In [10]:
DirectoryInfo transcriptsFolder = new DirectoryInfo(@"./data/transcripts");    
FileInfo[] transcriptsFiles = transcriptsFolder.GetFiles();

In [11]:
IList<KB> kbList = new List<KB>();

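foreach (var file in transcriptsFiles)
{
    string content = File.ReadAllText(file.FullName);
    // Run the transcript through the plugin function, which returns JSON text.
    var jsonResult = await kernel.RunAsync(content, filePlugin["Transcrips"]);
    var result = JsonSerializer.Deserialize<List<KB>>(jsonResult.ToString());
    // Deserialize returns null when the output is the JSON literal null.
    foreach (var item in result ?? new List<KB>())
    {
        kbList.Add(item);
    }
}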
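
The loop above assumes the model always returns a well-formed JSON array whose items have `kb` and `content` properties. Model output is not guaranteed to follow that shape, so a more defensive parse may be useful. The sketch below uses `JsonDocument` from System.Text.Json; `ParseKnowledgeItems` is a hypothetical helper introduced here, not something the notebook or Semantic Kernel provides.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: parses a JSON array of {"kb": ..., "content": ...} objects.
// Returns an empty list instead of throwing when the text is not valid JSON,
// and skips array entries that do not have both string properties.
List<(string Kb, string Content)> ParseKnowledgeItems(string json)
{
    var items = new List<(string Kb, string Content)>();
    try
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        if (doc.RootElement.ValueKind != JsonValueKind.Array)
        {
            return items;
        }

        foreach (JsonElement element in doc.RootElement.EnumerateArray())
        {
            if (element.ValueKind == JsonValueKind.Object
                && element.TryGetProperty("kb", out JsonElement kb)
                && element.TryGetProperty("content", out JsonElement content)
                && kb.ValueKind == JsonValueKind.String
                && content.ValueKind == JsonValueKind.String)
            {
                items.Add((kb.GetString(), content.GetString()));
            }
        }
    }
    catch (JsonException)
    {
        // Malformed model output: return what was collected (nothing) rather than crash.
    }
    return items;
}

// Made-up sample: the second entry has no "content" and is skipped.
var sample = "[{\"kb\":\"Intro\",\"content\":\"hello\"},{\"kb\":\"No content\"}]";
var parsed = ParseKnowledgeItems(sample);
Console.WriteLine(parsed.Count); // prints 1
```

Silently skipping bad entries keeps a long batch running; logging the skipped file name would be a reasonable addition if you need to audit what was dropped.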

In [12]:
kbList

kb,Intro
content,in this video I want to give you a bit of Background by walking through the major milestones in the history of machine learning and artificial intelligence

kb,Alan Turing and the Turing test
content,intelligence really begun in 1950s though it's based on mathematical and statistical developments over many centuries Alan Turing is credited with helping to lay the foundation for the concept of a machine they can think in his quest to Define machine intelligence he achieved a crucial Milestone by creating the Turing test in 1950. in this test an interrogator questions both a human and a computer and tries to determine which one is which if the interrogator cannot tell the difference then the computer can be considered intelligent

kb,The Dartmouth Summer Research Project on AI
content,artificial intelligence was coined with a small group of scientists gathered at Dartmouth College in the U.S for an event called the summer research project on artificial intelligence this conference was the birth of the field of research we know as AI

kb,The golden years of AI
content,the years from 1956 to 1974 are known as The Golden Ears of AI optimism ran high in the hope that AI could solve many problems in 1967 Marvin Minsky the co-founder of the mitai lab stated confidently and incorrectly that within a generation the problem of creating artificial intelligence will substantially be solved natural language processing research flourished search was refined and made more powerful and the concept of micro worlds was created where simple tasks were completed using plain language instructions research was well funded by government agencies advances were made in computation and algorithms and prototypes of intelligent machines were built some of these machines include shaky the robot who could maneuver and decide how to perform tasks intelligently Eliza and gnarly charabot that could converse with people and act as a primitive therapist Blocksworld an example of a micro world where blogs get could be stacked and sorted and decision-making experiments could be tested by the mid-1970s it had become apparent that the complexity of making intelligent machines had been understated and that its promise had been overblown compute power was too limited there was a lack of data to train and test AIS and there were questions around the ethics of introducing AI systems like the therapist Eliza into society funding dried up and confidence in the field slowed marking the beginning of what is called an AI winter

kb,The AI winter
content,in the 1980s as computers became more powerful expert systems became more successful there was a Resurgence in optimism about AI as businesses found practical applications of these rule-based inference systems by the late 80s it was becoming apparent that expert systems had become too specialized and were unlikely to achieve machine intelligence the rise of personal computers also competed with these large specialized centralized systems this led to another chill in the AI field

kb,Resurgence and fall of AI for expert systems
content,things began to change in the mid-1990s as compute and storage capabilities grew exponentially making it possible to process much larger data sets than ever before the rise of the internet and the popularity of smartphones both contributed to increasing amounts of data a new experiments in machine learning became possible throughout the 2000s significant advancements were made in computer vision and natural language processing by training machine learning models on Big Data

kb,Growth in AI driven by more data and more powerful hardware
content,in the past decade compute power and the size of data sets have continued to grow and machine learning has become capable of solving even more problems as a result today machine learning touches almost every part of our Lives sometimes we're well aware of it like when we interact with chat TPT in the browser or see a self-driving car go by but most of the time it's seamlessly woven into familiar experiences of our everyday life such as when we're approved for a new loan or get a catalog at home

kb,Increased awareness of ethical and responsible AI
content,this era has also been marked by an increased awareness of potential ethical issues in machine learning and by significant research in the field of responsible AI we want the benefits of AI but we also want AI that is responsible and doesn't amplify human bias in the next video we will introduce techniques for building using and maintaining machine learning models I'll see you there

kb,Intro
content,the process of building machine learning models is very different from any other development workflow in this video you'll learn about that process more specifically you learn about deciding whether AI is the right approach for your problem collecting and preparing your data training your model evaluating your model tuning the hyper parameters and testing the trained model in the real world

kb,Decide if AI is the right approach
content,traditional software is well suited to solve problems where the solution can be described as a formal set of rules in contrast AI shines in solving problems where the solution can be extracted from data many of the problems we encountered in our daily life can be efficiently solved with traditional programming if an engineer can break up the solution of a problem and Define it using precise rules then traditional programming is a great tool to use but many of the problems we encounter in our day-to-day aren't quite as easy to Define as a set of rules thankfully for many of those problems we have access to plenty of real life data containing useful information which means that AI can help us find a solution one good example is translating from one language to another writing a set of rules that full encodes all the parallels between two languages is not easy but there are many examples of translation online so AI has been able to do a much better job of translation than previous attempts so our first step when we're starting a new project should be to analyze the problem and determine which technique is best to solve it if you're able to obtain plenty of data that contains useful information about your solution then AI is a promising approach once you decided that

kb,Collect and prepare data
content,AI is the right method for you you need to collect and prepare your data for example you may need to normalize it or convert it to a different form or remove rows that are missing certain Fields once your data is clean you need to decide about which aspects of your data or features you're going to use as input to your prediction and which feature you want to predict for example if you have medical data you may decide to use features that describe the patient's medical history as input and a chance of a particular disease as the output feature you want to predict and finally you need to split your data into training and test sets a usual split is 80 for your training data and 20 for test

kb,Train your model
content,next you need to choose a machine learning algorithm which you'll learn a lot about in the coming videos if you're undecided between a few good algorithms you may want to try them all and see which one performs best then you need to train your model using the training set you collected earlier and the algorithm you chose training a model may take a while especially if the model is large

kb,Evaluate your model
content,once the model is trained you can test it using the test data set that you split earlier it's important that you test the algorithm with data that it hasn't seen during training to ensure that it generalizes well to new scenarios

kb,Tune the hyperparameters
content,some algorithms contain hyper parameters which are settings that control key aspects of their inner workings choosing good hyper parameters is important because they can make a big difference in your results if you want to be systematic about your hyper parameter search you can write code that tries lots of different combinations and helps you discover the best values for your data once you get good test results it's time to see how well your model performs within the context of its intended use for example this could involve collecting live data from a sensor and using it to make predictions or deploying a model to a few users of your application if it all looks good then you're ready to release it to production and enjoy its benefits

kb,Test the model in the real world
content,make sure you watch the next video where we'll start getting Hands-On with machine learning by configuring all the tools we'll use in the rest of the series I'll see you there

kb,Introducing ML for Beginners
content,hello and welcome to this course on classical machine learning for beginners whether you're completely new to the topic or an experienced ml practitioner looking to brush up on an area we're happy to have you join us this course is based on the free open source 26 lesson ml for beginners curriculum from Microsoft which can be found at AKA dot Ms slash ml-beginners machine learning is one of the most popular Technologies these days I'm sure you've heard this term if you have any sort of familiarity with technology no matter what domain you work in however the mechanics of machine learning are a mystery to most people and the subject can sometimes feel overwhelming in this course you'll start right from the beginning and you'll learn about it step by step to practical Hands-On coding examples let's

kb,The difference between AI and ML
content,start by talking about the difference between artificial intelligence and machine learning AI is a science of getting machines to accomplish tasks that typically require human level intelligence many different techniques have been proposed for AI but the most successful and popular approach these days is machine learning unlike other AI techniques ml uses specialized algorithms to make decisions by learning from data so machine learning is really a subset of artificial intelligence you've also probably heard of deep learning which is a subset of machine learning that relies on neural networks to learn from data in this course we're

kb,What you'll learn in this course
content,going to cover what we call classical machine learning you'll learn some Core Concepts of ml a bit of History statistical techniques like regression classification clustering and more the concepts you'll learn here will serve you well as you progress to more

kb,Advanced Techniques
content,keep in mind that this course won't cover data science deep learning neural networks and AI techniques other than ml Microsoft offers two additional courses for you to learn more about these areas data science for beginners available at AKA dot Ms slash data science beginners and AI for beginners available at aka.ms Ai and beginners machine learning is a

kb,Why study Machine Learning
content,Hot Topic because it's solving complex real-world problems in so many areas Finance earth science space exploration cognitive science and many more Fields have adopted machine learning to solve problems specific to their domains for example you can use machine learning to predict the likelihood of disease from a patient's medical history to anticipate weather events to understand the sentiment of a text and to detect fake news and stop the spread of propaganda applications of machine learning are almost everywhere and are as ubiquitous as the data that is Flowing from our devices and systems because of how useful it is understanding the basics of machine learning is going to help you no matter what domain you're coming from in the next video in the series I'll give an overview of the history of ml I'll see you there


Extract knowledge from the notes:

In [13]:
DirectoryInfo notesFolder = new DirectoryInfo(@"./data/notes");    
FileInfo[] notesFiles = notesFolder.GetFiles();

In [14]:

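foreach (var file in notesFiles)
{
    string content = File.ReadAllText(file.FullName);
    // Each note file yields a single KB item rather than a list.
    var jsonResult = await kernel.RunAsync(content, filePlugin["Notes"]);
    var result = JsonSerializer.Deserialize<KB>(jsonResult.ToString());
    // Deserialize returns null when the output is the JSON literal null.
    if (result != null)
    {
        kbList.Add(result);
    }
}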

### **What are embeddings?**

An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors. Embeddings make it easier to do machine learning on large inputs like sparse vectors representing words. Ideally, an embedding captures some of the semantics of the input by placing semantically similar inputs close together in the embedding space. An embedding can be learned and reused across models.
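
To make "close together" concrete, the sketch below computes cosine similarity, a common closeness measure for embedding vectors. The three-dimensional vectors are invented for illustration; real embedding vectors are much longer and come from a model, and the metric a given vector database uses depends on its configuration.

```csharp
using System;

// Cosine similarity: dot(a, b) / (|a| * |b|).
// Values near 1 mean the vectors point in nearly the same direction.
double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
    {
        throw new ArgumentException("Vectors must have the same length.");
    }

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    // A zero vector has no direction; treat it as unrelated to everything.
    if (normA == 0 || normB == 0)
    {
        return 0;
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Toy vectors, made up for this example.
var cat = new float[] { 0.9f, 0.1f, 0.0f };
var kitten = new float[] { 0.8f, 0.2f, 0.1f };
var invoice = new float[] { 0.0f, 0.1f, 0.9f };

// The two related items score higher than the unrelated pair.
Console.WriteLine(CosineSimilarity(cat, kitten) > CosineSimilarity(cat, invoice)); // prints True
```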

**Note:** This sample uses Qdrant as the vector database. Open a terminal and run the following command to start Qdrant:

```bash

docker run -p 6333:6333 qdrant/qdrant

```

In [17]:
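var qdrantMemoryBuilder = new MemoryBuilder();

// Generate embeddings with the Azure OpenAI embedding deployment.
var textEmbedding = new AzureTextEmbeddingGeneration(AZURE_OPENAI_EMBEDDING_DEPLOYMENT, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_KEY);
qdrantMemoryBuilder.WithTextEmbeddingGeneration(textEmbedding);
// 1536 is the vector size of the Qdrant collection; it must match the dimension of the
// embedding model's output (for example, text-embedding-ada-002 returns 1536 values).
qdrantMemoryBuilder.WithQdrantMemoryStore(QDRANT_HOST, 1536);

// Despite its name, builder holds the built memory instance used for saving and searching.
var builder = qdrantMemoryBuilder.Build();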

Save the knowledge to the vector database:

In [18]:
string MemoryCollectionName = "kb_collection";
int index = 1;


foreach(var item in kbList)
{
    await builder.SaveInformationAsync(MemoryCollectionName, id: "index"+(index++).ToString(), text: item.kb + " -  " + item.content);
}

### **Testing**

In [19]:
var searchResults = builder.SearchAsync(MemoryCollectionName, "can you tell me what is different ML and AI", limit: 1, minRelevanceScore: 0.8);


In [20]:

await foreach (var item in searchResults)
{
    var answer = await kernel.RunAsync(item.Metadata.Text, answerPlugin["Summary"]);
    Console.WriteLine(answer.ToString());
}

going to focus on machine learning and its relationship to artificial intelligence. Machine learning is a branch of AI that uses algorithms to make decisions by learning from data. It is a popular and successful approach in AI. Deep learning, on the other hand, is a subset of machine learning that relies on neural networks to learn from data. In this course, we will primarily explore machine learning and its connection to artificial intelligence.


In [21]:
var searchResults = builder.SearchAsync(MemoryCollectionName, "do you know 1956: Dartmouth Summer Research Project", limit: 1, minRelevanceScore: 0.7);


In [22]:
await foreach (var item in searchResults)
{
    var answer = await kernel.RunAsync(item.Metadata.Text, answerPlugin["Summary"]);
    Console.WriteLine(answer.ToString());
}

The Dartmouth Summer Research Project on AI, held at Dartmouth College in the U.S., marked the beginning of the field of artificial intelligence (AI). It brought together a small group of scientists who coined the term AI. This conference laid the foundation for the research and development of AI as we know it today.
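
Both searches above pass `limit` and `minRelevanceScore`. The sketch below illustrates what these two parameters mean using a tiny in-memory store and cosine similarity. It is an illustration only, assuming a similarity score where higher is better; it is not how Qdrant or Semantic Kernel implement search, and `Search`, `Cosine`, and the two-dimensional vectors are all made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity between two vectors of equal length.
double Cosine(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return normA == 0 || normB == 0 ? 0 : dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Hypothetical brute-force search: score every entry, drop those below the
// threshold, then keep the best `limit` hits in descending score order.
List<(string Id, double Score)> Search(
    IEnumerable<(string Id, float[] Vector)> store,
    float[] query,
    int limit,
    double minRelevanceScore)
{
    return store
        .Select(entry => (entry.Id, Score: Cosine(entry.Vector, query)))
        .Where(hit => hit.Score >= minRelevanceScore)
        .OrderByDescending(hit => hit.Score)
        .Take(limit)
        .ToList();
}

// Toy store with made-up vectors.
var store = new List<(string Id, float[] Vector)>
{
    ("index1", new float[] { 1f, 0f }),
    ("index2", new float[] { 0.6f, 0.8f }),
    ("index3", new float[] { 0f, 1f }),
};
var query = new float[] { 1f, 0f };

var hits = Search(store, query, limit: 1, minRelevanceScore: 0.5);
Console.WriteLine(string.Join(", ", hits.Select(h => h.Id))); // prints index1
```

Raising `minRelevanceScore` trades recall for precision: a query that matches nothing well enough returns no results, in which case the summary loop simply prints nothing.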
