# 🎁 NumFlat

<img src=images/euclidean-manhattan.png width=400>
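
As a quick check by hand (not library output), the two vectors used in the first code cell of this section, $x = (1, 2, 3)$ and $y = (1, 3, 7)$, give:

$$
d_{\text{Euclidean}}(x, y) = \sqrt{(1-1)^2 + (2-3)^2 + (3-7)^2} = \sqrt{17} \approx 4.123
$$

$$
d_{\text{Manhattan}}(x, y) = |1-1| + |2-3| + |3-7| = 5
$$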

In [None]:
#r "nuget: NumFlat, 1.0.0"

In [None]:
// https://github.com/sinshu/numflat/blob/main/NumFlatTest/DistanceTests.cs

using NumFlat;

Vec<double> x = [1, 2, 3]; // caveat: libraries that ship their own Vector and Tensor types
Vec<double> y = [1, 3, 7]; // often do not benefit from newer runtime optimizations or support

var euclidean = Distance.Euclidean.GetDistance(x, y); //(x - y).Norm()
var manhattan = Distance.Manhattan.GetDistance(x, y); //(x - y).L1Norm()

In [None]:
string[] words = { "clinic", "hospital" };
string userInput = "hopital";

Func<string, int, Vec<double>> vectorize = (word, maxLength) =>
{
    var paddedWord = word.PadRight(maxLength, '\0'); // Pad with null characters
    return new Vec<double>(paddedWord.Select(c => (double)c).ToArray()); // each character's UTF-16 code becomes a number
};

int maxLength = Math.Max(userInput.Length, words.Max(w => w.Length)); // Determining the maximum length
var wordVectors = words.Select(word => vectorize(word, maxLength)).ToArray();
var userVector = vectorize(userInput, maxLength);

// Calculate Euclidean distances
var distances = wordVectors.Select(
    (vector, index) => (Word: words[index], Distance: Distance.Euclidean.GetDistance(userVector, vector)))
    .OrderBy(result => result.Distance);

foreach(var distance in distances)
    Console.WriteLine($"Candidate: {distance.Word}, Distance: {distance.Distance}"); // sorted, so the first line is the closest match

In [None]:
static int DamerauLevenshteinDistance(this string s, string t)
{
    var bounds = new { Height = s.Length + 1, Width = t.Length + 1 };

    int[,] matrix = new int[bounds.Height, bounds.Width];

    // Base cases: turning a prefix into the empty string costs one edit per character
    for (int height = 0; height < bounds.Height; height++) { matrix[height, 0] = height; }
    for (int width = 0; width < bounds.Width; width++) { matrix[0, width] = width; }

    for (int height = 1; height < bounds.Height; height++)
    {
        for (int width = 1; width < bounds.Width; width++)
        {
            int cost = (s[height - 1] == t[width - 1]) ? 0 : 1;
            int insertion = matrix[height, width - 1] + 1;
            int deletion = matrix[height - 1, width] + 1;
            int substitution = matrix[height - 1, width - 1] + cost;

            int distance = Math.Min(insertion, Math.Min(deletion, substitution));

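            // Adjacent transposition (e.g. "ab" vs "ba"); this makes it the restricted
            // "optimal string alignment" form of Damerau-Levenshtein
            if (height > 1 && width > 1 && s[height - 1] == t[width - 2] && s[height - 2] == t[width - 1])
                distance = Math.Min(distance, matrix[height - 2, width - 2] + cost);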

            matrix[height, width] = distance;
        }
    }

    return matrix[bounds.Height - 1, bounds.Width - 1];
}

string[] words = { "clinic", "hospital" };
string userInput = "hopital";

var q = from w in words
        select new { word = w, Distance = w.DamerauLevenshteinDistance(userInput) };
foreach(var r in q.OrderBy(w => w.Distance))
    Console.WriteLine($"{r.word} {r.Distance}");

- ChatGPT agrees; you don't need to be a math guru, a few basic prompts are enough to figure this out
- https://chatgpt.com/share/678e0c58-c820-800b-aff3-7a2a6ebfd8a2
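
The two cells above can be compared side by side without any library. The sketch below is an illustration, not the notebook's own code: `CharCodeDistance` and `EditDistance` are hypothetical helper names, and `EditDistance` is plain Levenshtein without the transposition step. It suggests why character-code vectors are a poor fit for typos: a single missing letter shifts every later character, so "hopital" ends up slightly farther from "hospital" (about 111) than from "clinic" (about 109), while the edit distance to "hospital" is 1.

```csharp
using System;

// Euclidean distance over zero-padded UTF-16 character codes
// (same idea as the vectorize cell, written without NumFlat).
static double CharCodeDistance(string a, string b)
{
    int n = Math.Max(a.Length, b.Length);
    string pa = a.PadRight(n, '\0');
    string pb = b.PadRight(n, '\0');
    double sum = 0;
    for (int i = 0; i < n; i++)
    {
        double d = pa[i] - pb[i];
        sum += d * d;
    }
    return Math.Sqrt(sum);
}

// Plain Levenshtein distance using two rolling rows (no transpositions).
static int EditDistance(string s, string t)
{
    var prev = new int[t.Length + 1];
    var curr = new int[t.Length + 1];
    for (int j = 0; j <= t.Length; j++) prev[j] = j;
    for (int i = 1; i <= s.Length; i++)
    {
        curr[0] = i;
        for (int j = 1; j <= t.Length; j++)
        {
            int cost = s[i - 1] == t[j - 1] ? 0 : 1;
            curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
        }
        (prev, curr) = (curr, prev); // the row just filled becomes the previous row
    }
    return prev[t.Length];
}

string typo = "hopital";
foreach (var word in new[] { "clinic", "hospital" })
    Console.WriteLine($"{word}: char-code = {CharCodeDistance(typo, word):F2}, edit = {EditDistance(typo, word)}");
```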

<img src=images/levenshtein-distance.png>

# 🧠 Generative AI

<img src=images/model-function-calling.jpg width=800>

- https://github.com/xoredge/ai-tools-demo
- Ollama
    - https://ollama.com
- Semantic Kernel
    - https://learn.microsoft.com/en-us/semantic-kernel/overview
    - https://github.com/microsoft/semantic-kernel
- Function Calling
    - https://platform.openai.com/docs/guides/function-calling
    - https://ollama.com/blog/tool-support
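
To make function calling concrete, here is a minimal sketch of the host side only. It assumes the model has replied with a JSON tool call of the shape `{"name": ..., "arguments": {...}}`; the tool names, the `Dispatch` helper and the hard-coded `modelOutput` are all hypothetical stand-ins, and no real OpenAI, Ollama or Semantic Kernel API is used.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical tool registry: tool name -> local implementation that receives the parsed arguments.
var tools = new Dictionary<string, Func<JsonElement, string>>
{
    ["get_weather"] = args => "Sunny in " + args.GetProperty("city").GetString(), // stubbed result
    ["add"] = args => (args.GetProperty("a").GetInt32() + args.GetProperty("b").GetInt32()).ToString(),
};

// Parse one tool call emitted by the model and run the matching local function.
string Dispatch(string toolCallJson)
{
    using var doc = JsonDocument.Parse(toolCallJson);
    var root = doc.RootElement;
    var name = root.GetProperty("name").GetString() ?? "";
    return tools.TryGetValue(name, out var tool)
        ? tool(root.GetProperty("arguments"))
        : "Unknown tool: " + name;
}

// Stand-in for what a model might return when asked "what is 2 + 40?" (assumed shape).
var modelOutput = "{\"name\":\"add\",\"arguments\":{\"a\":2,\"b\":40}}";

// In a real chat loop this result would be sent back to the model as a tool message.
Console.WriteLine(Dispatch(modelOutput));
```

The model never executes anything itself; it only picks a function name and arguments, and the application decides whether and how to run them.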

## 🎁 Examples

- Multi Agent
    - https://github.com/khurram-uworx/dotnetdebrief/blob/main/src/ChatBots/AgentTicketHandler.cs
    - YAML
    - We can also define agents in code
        - We can have parameters
    - Chain of Thought
    - Function Calling and why we need "Vectorization" (Plugin)
- Notepad
    - https://github.com/khurram-uworx/dotnetdebrief/tree/main/src/Notepad
    - Try prompt engineering to simulate Notepad AI use cases
    - Show the System Prompt

## 📖 Grounding

In [None]:
you are a sql expert who creates sql server queries to answer the user inputs

use the following sql server database schema information to create queries
- database has Users table that has id (int), username (string), password (string) and isactive(bool) fields
- database has Transactions table that has id(int) foreign key to Users id field, date (datetime), amount (int) and remark (string) fields

Important: write only the SQL query in the answer; don't explain anything
Very important: don't forget the underlying database is SQL Server; use appropriate T-SQL syntax

User input: Who are my active users that have made some transactions in the last month?

In [None]:
User input: How many active users do we have?
User input: Who are my active users that have made some transactions in the last month?
User input: Who are my top 5 active users by transaction amount?

In [None]:
The movie Inception (2010), directed by Christopher Nolan, is about a skilled thief named Dom Cobb who specializes in extracting secrets
from people's subconscious during dreams. The film explores the concept of dreams within dreams and questions the nature of reality.

Given this information, write a short review of Inception that highlights its themes and storytelling

In [None]:
The Matrix (1999) is about a computer hacker, Neo, who discovers that reality is a simulated world controlled by machines.
The Truman Show (1998) follows Truman Burbank, who slowly realizes his entire life has been a reality TV show. Both films explore the theme of questioning reality.

Using this information, compare how The Matrix and The Truman Show depict the idea of escaping a false reality

In [None]:
You are an AI that recommends family-friendly movies for a weekend watch. Here are five movies retrieved from a database based on relevance to the request

Finding Nemo (2003) - A heartwarming animated film about a father fish searching for his lost son.
Toy Story 3 (2010) - A story about friendship, change, and growing up, featuring beloved toy characters.
Paddington 2 (2017) - A charming and humorous film about a bear in London with a strong message of kindness.
The Incredibles (2004) - A superhero movie with action, humor, and family dynamics.
How to Train Your Dragon (2010) - A touching story about friendship between a boy and a dragon.

Based on this retrieved list, craft a response recommending the best option for a family movie night.
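
The grounding prompts above share one shape: a role, the facts the model should rely on, a few rules, and the user's question. A minimal sketch of assembling such a prompt in code follows; `BuildGroundedPrompt` is a hypothetical helper written for illustration, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Assemble a grounded prompt: role, facts to rely on, rules, then the user's question.
static string BuildGroundedPrompt(string role, IEnumerable<string> facts, IEnumerable<string> rules, string userInput)
{
    var sb = new StringBuilder();
    sb.AppendLine(role).AppendLine();
    sb.AppendLine("Use the following information:");
    foreach (var fact in facts) sb.AppendLine("- " + fact);
    sb.AppendLine();
    foreach (var rule in rules) sb.AppendLine(rule);
    sb.AppendLine();
    sb.Append("User input: ").Append(userInput);
    return sb.ToString();
}

var prompt = BuildGroundedPrompt(
    "You are a SQL expert who writes SQL Server queries to answer user inputs.",
    new[]
    {
        "Users table: id (int), username (string), password (string), isactive (bool)",
        "Transactions table: id (int, foreign key to Users.id), date (datetime), amount (int), remark (string)",
    },
    new[] { "Important: reply with the SQL query only." },
    "How many active users do we have?");

Console.WriteLine(prompt);
```

Keeping the facts as data makes it easy to swap the hard-coded schema or movie list for content retrieved at run time, which is the step RAG adds.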

## 💡 Retrieval Augmented Generation (RAG)

<img src=images/rag-sequence-diagram.png width=700>

__Resumes RAG__

<img src=images/resumes-rag.png>

- https://github.com/khurram-uworx/dotnetdebrief/blob/main/src/ChatBots/KernelMemoryPgRagSK.cs

In [None]:
Using the education details and graduation year from the provided resumes, estimate each candidate’s years of experience.
If a candidate graduated in 2020, assume they have been working since then. Given that the current year is 2025,
this would mean they have approximately 5 years of experience.

Identify and list candidates who have 5 or more years of experience based on this approach. If other work experience
details are available, incorporate them to refine the estimation. Provide a ranked list of qualified candidates along
with their estimated years of experience.

__Concepts__

- Structured Outputs
    - https://platform.openai.com/docs/guides/structured-outputs
    - https://ollama.com/blog/structured-outputs
    - https://devblogs.microsoft.com/semantic-kernel/using-json-schema-for-structured-output-in-net-for-openai-models
- Embeddings
    - https://openai.com/index/introducing-text-and-code-embeddings
    - https://openai.com/index/new-embedding-models-and-api-updates

- Importance of Vector Search / Semantic Search
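
A rough sketch of what vector search does with embeddings: rank stored texts by cosine similarity to the query vector and keep the top few. The three-dimensional vectors below are made up for illustration (a real embedding model returns hundreds or thousands of dimensions), and `Cosine` is a hypothetical helper, not a library call.

```csharp
using System;
using System.Linq;

// Cosine similarity: 1 means same direction, 0 means unrelated (orthogonal).
static double Cosine(double[] a, double[] b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
}

// Made-up "embeddings" standing in for the output of an embedding model.
var documents = new (string Text, double[] Vector)[]
{
    ("hospital opening hours", new[] { 0.9, 0.1, 0.0 }),
    ("clinic appointment booking", new[] { 0.8, 0.3, 0.1 }),
    ("pizza delivery menu", new[] { 0.0, 0.2, 0.9 }),
};
var query = new[] { 0.85, 0.2, 0.05 }; // pretend embedding of "where can I see a doctor?"

// Keep the two most similar documents; these would be pasted into the prompt as grounding.
var top2 = documents
    .OrderByDescending(d => Cosine(query, d.Vector))
    .Take(2)
    .Select(d => d.Text)
    .ToArray();

Console.WriteLine(string.Join(" | ", top2));
```

Unlike the character-code vectors earlier, embeddings place texts with similar meaning close together, so the query can match documents that share no words with it.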


<img src=images/agentic-rag.png width=700>

- https://vectorize.io/how-i-finally-got-agentic-rag-to-work-right

__Resources__
- https://help.openai.com/en/articles/8868588-retrieval-augmented-generation-rag-and-semantic-search-for-gpts
- https://cloud.google.com/use-cases/retrieval-augmented-generation
- https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation
- https://en.wikipedia.org/wiki/Retrieval-augmented_generation

## 🚀 Gen AI Applications

<img src=images/genai-apps.png>

- AlphaFold Protein Structure Database https://alphafold.ebi.ac.uk
    - https://www.youtube.com/watch?v=P_fHJIYENdI The Most Useful Thing AI Has Ever Done

# ⏭️ Where To Next?

## 🧠 RAG: Where we are

<img src=images/sell-me-this-pen.jpg width=400>
<img src=images/ai-startup.webp width=400>

## 💡 Where to next

<img src=images/rag-overview.png width=1200>

<img src=images/lego-pieces-basic.JPG>

- Foundation Models
- Grounding
- Function Calling
- RAG; Vector Database, Embedding Models

<img src=images/sk-arch-1.png width=800><br>
<img src=images/sk-arch-2.png width=800>

- https://learn.microsoft.com/en-us/semantic-kernel/frameworks/agent
- https://learn.microsoft.com/en-us/semantic-kernel/frameworks/process/process-framework ⚠️

- https://learn.microsoft.com/en-us/dotnet/ai/tutorials/llm-eval
- https://learn.microsoft.com/en-us/shows/mr-maedas-cozy-ai-kitchen 👈
    - https://www.youtube.com/playlist?list=PLlrxD0HtieHjHoXHYSiSvpTp_sE5JhNEE

- https://azure.github.io/ai-app-templates 👈

<img src=images/langchain-arch.webp width=800>

## 🎈 Software Engineering Spin

### Semantic Kernel Plugins

- Semantic Kernel Search Plugin using Vector Database
    - https://learn.microsoft.com/en-us/semantic-kernel/concepts/text-search/text-search-plugins
    - https://learn.microsoft.com/en-us/semantic-kernel/concepts/text-search/text-search-vector-stores

<img src=images/sk-search.png>

### Multi Agents

- We explored "AutoGen", a Microsoft Research project for agentic AI scenarios
    - It was recently revamped and v0.4 is out, but the new version is Python-only for now; a .NET version is expected to arrive soon
- Some elements of it are already in Semantic Kernel, along with some nice higher-order concepts: we can define agents in YAML files, have multiple agents participate in a single chat, and so on. Unfortunately it is still in "Preview" and works best with OpenAI; I tried it with Ollama / Llama 3.2 and was able to work through a few initial issues.
 
- https://devblogs.microsoft.com/semantic-kernel/empowering-ai-agents-with-tools-via-openapi-a-hands-on-guide-with-microsoft-semantic-kernel-agents
- https://devblogs.microsoft.com/semantic-kernel/guest-blog-building-multi-agent-systems-with-multi-models-in-semantic-kernel-part-1/
- https://devblogs.microsoft.com/all-things-azure/agentic-philosophers
    - https://github.com/microsoft/all-things-azure/tree/main/agentic-philosophers 👈
    - https://github.com/khurram-uworx/dotnetdebrief/blob/main/src/ChatBots/AgentDebate.cs
    - Multi-agent chat
    - Chat with documents
    - LLM Functions/Inference

<img src=images/agentic-philosphers.png>

- https://www.developerscantina.com/p/semantic-kernel-multiagents/
    - https://github.com/qmatteoq/SemanticKernel-Demos/tree/main/SemanticKernel.Agents

- https://devblogs.microsoft.com/semantic-kernel/guest-blog-creative-writing-assistant-a-multi-agent-app-sample-with-semantic-kernel-net-aspire
    - https://github.com/Azure-Samples/contoso-creative-writer
    - https://github.com/Azure-Samples/aspire-semantic-kernel-creative-writer
- https://cookbook.openai.com/examples/orchestrating_agents
    - https://github.com/openai/swarm

__Additional Resources__
- https://learn.microsoft.com/en-us/azure/ai-services/agents/overview
- https://devblogs.microsoft.com/all-things-azure/how-to-develop-ai-apps-and-agents-in-azure-a-visual-guide

### Process Framework

- https://learn.microsoft.com/en-us/semantic-kernel/frameworks/process/process-framework
    - https://github.com/microsoft/semantic-kernel/tree/main/dotnet/samples/GettingStartedWithProcesses
    - https://devblogs.microsoft.com/semantic-kernel/integrating-ai-into-business-processes-with-the-process-framework

## 🎈 GraphRAG

- Scattered information
    - Relevant chunks are spread all over the source material
- Noise
- Context Loss
- Scalability

- https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data
- https://microsoft.github.io/graphrag
    - https://github.com/microsoft/graphrag

__C#__
- https://jasonhaley.com/2024/08/06/study-notes-graph-rag1-code-sample-notebook
    - https://github.com/JasonHaley/semantic-kernel-getting-started/blob/main/notebooks/1_PropertyGraphRAG.ipynb
    - https://github.com/JasonHaley/semantic-kernel-getting-started

## 🎈 StructRAG

- https://arxiv.org/pdf/2410.08815

<img src=images/ragstruct-overview.png width=1200>

- https://github.com/kbeaugrand/KernelMemory.StructRAG

## 🎈 Beyond API Calls & Chatbots

<img src=images/this-is-not-enough.jpg>

<img src=images/useful-ai-webapps.png width=1000>

https://www.youtube.com/watch?v=TSNAvFJoP4M
- How to add genuinely useful AI to your webapp (not just chatbots) - Steve Sanderson
- __NDC Conferences__, July 2024

# 🔢 ML

<img src=images/ai-ml-dl.png width=500>

Machine Learning (ML) is __a subset of artificial intelligence__ (AI) that focuses on the development of __algorithms__ that allow systems to __learn from data__, identify patterns, and make decisions with __minimal human intervention__
- It involves creating __mathematical models__ that can improve their performance as they are exposed to more data over time, effectively adapting and __improving without being explicitly programmed__ for each task

- Temperature Monitoring / Load Shedding
- User Behavior
    - Fair Use
    - Planning to Leave
- Predictive maintenance system

__Software Engineer's View__
- Data Collection and Preprocessing: The system needs __clean, structured, and high-quality__ data. This involves gathering relevant data from sensors, databases, or other sources, followed by preprocessing steps like normalization, missing value imputation, and feature engineering
- Model Selection and Training: Different types of __machine learning models__, such as regression, classification, or clustering algorithms, need to be selected based on the nature of the problem. The model is __trained__ using historical data, and its performance is __evaluated__ using metrics such as accuracy, precision, recall, or F1 score
- Model Deployment: Once trained, the model must be __integrated into the software ecosystem for real-time use__. This could involve deploying the model to the cloud, edge devices, or even embedded systems, depending on the requirements
- Monitoring and Maintenance: ML models can degrade over time as the data distribution changes, leading to a phenomenon called __model drift__. Regular monitoring and retraining of the model with new data are essential to __maintain the system’s effectiveness__
- Scalability and Optimization: Machine learning systems often need to handle large volumes of data in real time, which requires __careful engineering to ensure scalability__. Optimizations such as model compression, parallelization, or distributed computing may be necessary
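
The evaluation metrics named above can be computed by hand from a confusion matrix. A small sketch follows, using made-up labels for a predictive maintenance scenario (all data here is illustrative):

```csharp
using System;
using System.Linq;

// Toy labels: true = "machine will fail", false = "machine is healthy" (made-up data).
bool[] actual    = { true, true,  true, false, false, false, false, true };
bool[] predicted = { true, false, true, false, true,  true,  false, true };

// Confusion-matrix counts.
int tp = actual.Zip(predicted, (a, p) => a && p).Count(x => x);   // failures correctly flagged
int fp = actual.Zip(predicted, (a, p) => !a && p).Count(x => x);  // false alarms
int fn = actual.Zip(predicted, (a, p) => a && !p).Count(x => x);  // missed failures
int tn = actual.Length - tp - fp - fn;                            // healthy, correctly ignored

double accuracy  = (double)(tp + tn) / actual.Length;
double precision = (double)tp / (tp + fp);  // of everything flagged, how much was right
double recall    = (double)tp / (tp + fn);  // of all real failures, how many were caught
double f1        = 2 * precision * recall / (precision + recall);

Console.WriteLine($"accuracy={accuracy:F3} precision={precision:F3} recall={recall:F3} f1={f1:F3}");
```

Which metric matters depends on the cost of each mistake: a missed failure (low recall) is usually more expensive in predictive maintenance than a false alarm (low precision).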

<img src=images/ml-ai-roles.png>

- https://mbmlbook.com: "Model-Based Machine Learning" is a comprehensive online book that introduces readers to building machine learning models for real-world problems, written in a Pragmatic Programmer style

- Domain Expert
- Security; data privacy
- Ethical considerations
- Bias in models, transparency of decision-making

# 📚 Resources

<img src=images/ml-models-everywhere.png>

- https://huggingface.co/models
- https://onnx.ai/models
- https://ollama.com/library
- https://ai.azure.com/explore/models
- https://aws.amazon.com/marketplace/solutions/machine-learning/pre-trained-models
- https://hub.docker.com/catalogs/gen-ai