# 🧠 AI and Generative AI

<img src=images/suno.jpeg height=400><br>

<img src=images/neither-a-nor-i.webp><br>

<img src=images/ai-ml-dl.png height=400>

## LLMs and SLMs

- Size and Complexity
    - More Parameters: Higher Capacity (**Maturity**)
        - Generalization vs. Specialization
        - Reasoning Depth
        - Memory & Knowledge Storage
        - Compute & Cost Implications
    - Training Data (**Experience**)
        - Diversity & Volume
        - Domain-Specific vs. General Data
        - Recency & Relevance
    - Architecture (**Formal Structure**)
        - Model Design
        - Optimization Techniques
        - Fine-Tuning & Adaptability
    - Context Length | Tool Use | Fine Tuning 
- Resource Efficiency
    - Quantization
        - FP32, FP16/BF16, INT8, INT4, INT2
        - Memory footprint
        - Inference Speed/Energy
        - Quality | Robustness | Stability
- Knowledge Scope
    - General Purpose vs. Domain Specific
    - Textbooks are all you need
    - Post Training Quantization (PTQ) vs. Quantization Aware Training (QAT)
- Performance
    - The Engine
        - PyTorch | TensorFlow | ONNX Runtime | TensorRT | llama.cpp
        - Same model / same hardware: 2-10x speed difference
- Use Cases
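
The quantization levels in the outline map directly onto memory footprint. A minimal sketch, assuming weight storage only (no KV cache, activations, or per-block scale overhead); the helper name `weight_footprint_gib` and the 1-billion-parameter count are illustrative assumptions, not measurements of any specific model:

```python
# Bits used to store one weight at each precision listed in the outline.
BITS_PER_WEIGHT = {"FP32": 32, "FP16/BF16": 16, "INT8": 8, "INT4": 4, "INT2": 2}

def weight_footprint_gib(num_params: int, bits: int) -> float:
    """Approximate size of the weights alone, in GiB (hypothetical helper)."""
    return num_params * bits / 8 / 2**30

num_params = 1_000_000_000  # illustrative: a 1B-parameter model
for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:>9}: {weight_footprint_gib(num_params, bits):.2f} GiB")
```

Halving the bit width halves the weight memory; the effect on quality, robustness, and stability has to be measured separately for each model.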

## The Science of Language Models

- https://en.wikipedia.org/wiki/Attention_Is_All_You_Need
- https://cohere.com/llmu/what-are-transformer-models
- https://www.youtube.com/watch?v=zjkBMFhNj_g [1hr Talk] Intro to Large Language Models

<img src=images/transformer-architecture.webp width=700>

# 🚀 Inference

## 🧩 Integration

- Web APIs
    - OpenAI | Azure OpenAI Service
        - JSON based request/response
        - Clear endpoints for chat completion, embeddings, audio and image generation
    - Anthropic, Cohere and other vendor APIs
    - Inference APIs from Hugging Face and other model hosts

### 🐍 Python

In [None]:
#!connect jupyter --kernel-name pythonkernel --kernel-spec python3

In [None]:
!pip install openai

In [None]:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1/",
    api_key="ollama" # ignored, but required by SDK
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "what is the area of Pakistan?",
        }
    ],
    model="llama3.2:1b"
)

print(chat_completion.choices[0].message.content)

### #️⃣ C#

In [None]:
#r "nuget: OpenAI"

In [None]:
using OpenAI;
using OpenAI.Chat;
using System.ClientModel;

var client = new OpenAIClient(new ApiKeyCredential("ollama"), new OpenAIClientOptions
{
    Endpoint = new Uri("http://localhost:11434/v1/")
});
var chat = client.GetChatClient("llama3.2:1b");
var response = await chat.CompleteChatAsync([
    new UserChatMessage("what is the area of Pakistan?")
]);

response.Value.Content[0].Text

<img src=images/the-curly-languages.jpg height=700>

## 🧠 Model Runners

- Ollama
    - llama.cpp => CUDA | ROCm | Vulkan | Metal (Apple) | CPU (AVX/NEON)
- AI Toolkit | Foundry
    - Windows ML => ONNX Execution Providers
    - GPU (CUDA, TensorRT, ROCm, QNN, OpenVINO) | NPU (OpenVINO) | CPU (oneDNN)
- LM Studio
    - llama.cpp
- Docker
- vLLM
    - PyTorch / Triton (CUDA)

- https://github.com/dotnet/ai-samples/blob/main/src/mlnet-gen-ai 👈
    - https://github.com/dotnet/ai-samples/blob/main/src/mlnet-gen-ai/LLaMA/LLaMA.csproj
        - TensorFlow | Safetensors (Hugging Face)
- https://github.com/dotnet/machinelearning/tree/main/src/Microsoft.ML.GenAI.Phi 👈
- https://github.com/dotnet/machinelearning/tree/main/src/Microsoft.ML.GenAI.LLaMA

- Videos üëà
    - NPU | GPU

# ⚡ Function Calling

- https://openai.com/index/introducing-structured-outputs-in-the-api
- https://platform.openai.com/docs/guides/function-calling
    - https://openai.com/index/function-calling-and-other-api-updates

<img src=images/model-function-calling.jpg width=800><br>

<img src=images/model-inference.png><br>

- Gen AI / Chat Bots / OpenAITools 👈

<img src=images/slm-tools-1.png><br>
<img src=images/slm-tools-2.png>
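
The function-calling flow can be sketched without a live model: the application advertises a tool schema, the model replies with a tool name plus JSON-encoded arguments, and the application runs the matching function and sends the result back as a `tool` message. A minimal sketch, assuming the OpenAI-style chat tool schema; `get_weather`, its canned data, `run_tool_call`, and the hand-written `tool_call` dict (standing in for a real model response) are all hypothetical:

```python
import json

# Hypothetical local function the model is allowed to call.
def get_weather(city: str) -> dict:
    canned = {"Lahore": 31, "Karachi": 29}  # made-up sample data
    return {"city": city, "temp_c": canned.get(city)}

# Tool schema in the JSON Schema shape used by OpenAI-style chat APIs;
# this list is what would be passed as the `tools` argument of a request.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current temperature for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Maps tool names the model may request to local callables.
REGISTRY = {"get_weather": get_weather}

def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call and build the 'tool' message to send back."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])  # arguments arrive as a JSON string
    if name not in REGISTRY:
        raise ValueError(f"model requested unknown tool: {name}")
    result = REGISTRY[name](**args)
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": json.dumps(result)}

# Hand-written stand-in for the tool call a model would return.
tool_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Lahore"}'},
}
print(run_tool_call(tool_call))
```

The model never executes anything itself; it only proposes the call, so the registry lookup is the place to validate names and arguments before running code.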

# 📚 RAG

## 🔢 Embeddings

<img src=images/embeddings-3d.png width=800>

- https://en.wikipedia.org/wiki/Word_embedding

In [None]:
!pip install ollama
!pip install numpy

In [None]:
import ollama
import numpy as np
from numpy.linalg import norm

# ollama.embeddings returns a dictionary; the 'embedding' key holds the vector
e1 = ollama.embeddings(
  model='all-minilm',
  prompt='A man is eating food',
)['embedding']
e2 = ollama.embeddings(
  model='all-minilm',
  prompt='A man is eating pasta',
)['embedding']
e3 = ollama.embeddings(
  model='all-minilm',
  prompt='A man is riding a horse',
)['embedding']

a = np.array(e1) # convert each Python list to a NumPy array
b = np.array(e2)
c = np.array(e3)
 
# cosine similarities
cosineAA = np.dot(a, a)/(norm(a) * norm(a))
cosineAB = np.dot(a, b)/(norm(a) * norm(b))
cosineBC = np.dot(b, c)/(norm(b) * norm(c))

print("Cosine Similarities:", cosineAA, cosineAB, cosineBC)

## 🗃️ Vector Database

### Overview / Chroma

In [None]:
!pip install chromadb

In [None]:
import ollama
import chromadb

documents = [
  "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
  "Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands",
  "Llamas can grow as much as 6 feet tall though the average llama is between 5 feet 6 inches and 5 feet 9 inches tall",
  "Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight",
  "Llamas are vegetarians and have very efficient digestive systems",
  "Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old",
]

client = chromadb.Client()
collection = client.create_collection(name="docs")

# store each document in a vector embedding database
for i, d in enumerate(documents):
  response = ollama.embeddings(model="all-minilm", prompt=d)
  embedding = response["embedding"]
  collection.add(
    ids=[str(i)],
    embeddings=[embedding],
    documents=[d]
  )

# an example prompt
prompt = "What animals are llamas related to?"

# generate an embedding for the prompt and retrieve the most relevant doc
response = ollama.embeddings(
  prompt=prompt,
  model="all-minilm"
)
results = collection.query(
  query_embeddings=[response["embedding"]],
  n_results=1
)
data = results['documents'][0][0]

# generate a response combining the prompt and data we retrieved in step 2
output = ollama.generate(
  model="llama3.2:1b",
  prompt=f"Using this data: {data}. Respond to this prompt: {prompt}"
)

print(output['response'])

### 🧮 Qdrant and pgvector

- https://qdrant.tech/documentation/quickstart 👈
- https://github.com/pgvector/pgvector

To get pgvector running quickly, use Docker with one of the following commands:
- docker run -e POSTGRES_PASSWORD=uworx -p 5432:5432 -d ankane/pgvector:latest
- docker run -e POSTGRES_PASSWORD=uworx -p 5432:5432 -d pgvector/pgvector:pg17 👈

In [None]:
create extension vector;
select * from pg_extension;

create table Items (Id bigserial primary key, embedding vector(3));
insert into Items (embedding) values ('[1, 2, 3]'), ('[4, 5, 6]');

select * from Items order by embedding <-> '[3,1,2]' limit 5;
/*
    <-> is L2 distance, <#> is negative inner product, <=> is cosine distance
*/
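
To see what the three pgvector operators compute, the same query can be reproduced in plain Python. A sketch assuming the two rows inserted into `Items` and the query vector `[3, 1, 2]`; the helper names are illustrative, and `<#>` is modeled as the negative inner product so that ascending order puts the best match first:

```python
import math

items = [[1, 2, 3], [4, 5, 6]]  # rows inserted into Items
query = [3, 1, 2]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def l2_distance(u, v):          # <->
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def neg_inner_product(u, v):    # <#>
    return -dot(u, v)

def cosine_distance(u, v):      # <=>
    return 1 - dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Same ordering as: select * from Items order by embedding <-> '[3,1,2]'
ranked = sorted(items, key=lambda row: l2_distance(row, query))

for row in ranked:
    print(row,
          round(l2_distance(row, query), 4),
          neg_inner_product(row, query),
          round(cosine_distance(row, query), 4))
```

The three operators can rank rows differently: here `[1, 2, 3]` is nearest by L2 distance, while `[4, 5, 6]` wins on inner product and cosine distance.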

## RAG++

- CAG: Cache Augmented Generation
- GraphRAG
- Struct RAG

<img src=images/graph-rag-code-translation.png width=1000>

# 🤖 Chatbots

## Overview

<img src=images/rag.png width=800><br>
<img src=images/rag-localdb.avif width=800><br>
<img src=images/rag-use-cases.png width=800>

- AutoGen | **Semantic Kernel** > **Agent Framework**
    - Microsoft Bot Framework
    - Teams SDK | **M365 Agent SDK** | Copilot SDK
- LangChain
- LlamaIndex

## .NET üíñ AI

- Microsoft.Extensions.AI
    - IChatClient, IEmbeddingGenerator and IImageGenerator
    - OpenAI, Anthropic, Gemini and others
    - ML.NET (ONNX) and LLamaSharp offer IChatClient
    - OllamaSharp for local/on-prem inference (Foundry Local)
    - Prompt/Response Interception, Rate Limiting, Retries
    - Telemetry, Cached Client, Evaluations
    - src / Chat Bots / AIExtensionTools 👈
    - **Resources**
        - https://devblogs.microsoft.com/dotnet/introducing-microsoft-extensions-ai-preview
        - https://devblogs.microsoft.com/semantic-kernel/microsoft-extensions-ai-simplifying-ai-integration-for-net-partners
        - https://learn.microsoft.com/en-us/dotnet/ai/ai-extensions 👈
        - https://github.com/dotnet/ai-samples/tree/main/src/microsoft-extensions-ai 👈
        - https://devblogs.microsoft.com/dotnet/e-shop-infused-with-ai-comprehensive-intelligent-dotnet-app-sample 👈
- Microsoft.Extensions.VectorData
    - Azure AI Search, Cosmos DB
    - In Memory, Volatile (In Memory)
    - Elasticsearch, MongoDB, Pinecone, Qdrant, Redis, SQLite, Weaviate
    - Chroma, Milvus, Postgres, SQL Server coming soon
    - **Resources**
        - https://devblogs.microsoft.com/dotnet/introducing-pinecone-dotnet-sdk 👈
        - https://devblogs.microsoft.com/dotnet/vector-data-qdrant-ai-search-dotnet 👈
        - https://devblogs.microsoft.com/dotnet/introducing-microsoft-extensions-vector-data
        - https://devblogs.microsoft.com/semantic-kernel/microsoft-extensions-vectordata-abstractions-now-available
        - https://learn.microsoft.com/en-us/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors
- Microsoft Agent Framework
    - Semantic Kernel
    - AutoGen

<img src=images/sk-tool-km.png>

- https://devblogs.microsoft.com/dotnet/e-shop-infused-with-ai-comprehensive-intelligent-dotnet-app-sample
- https://devblogs.microsoft.com/dotnet/local-ai-models-with-dotnet-aspire

- https://learn.microsoft.com/en-us/dotnet/ai/conceptual/data-ingestion
    - https://devblogs.microsoft.com/dotnet/introducing-data-ingestion-building-blocks-preview
- https://learn.microsoft.com/en-us/dotnet/ai/quickstarts/ai-templates
    - src / Ai Chat Web 👈

# 🧠 Attention is all you need

| **Year** | **Milestone / Developer Impact**                                                  |
| -------- | --------------------------------------------------------------------------------- |
| **2017** | Attention Is All You Need (Transformer)                                           |
| **2018** | BERT; GPT-1                                                                       |
| **2019** | GPT-2                                                                             |
| **2020** | GPT-3                                                                             |
| **2022** | ChatGPT launch                                                                    |
| **2023** | Function Calling in OpenAI API                                                    |
| **2024** | GPT-4o multimodal; Structured Outputs API                                         |
| **2025** | Responses API; MCP standard; ChatGPT Agents; GPT-4.1 / GPT-4.5; GPT-5.1 / GPT-5.2 |


<img src=images/sell-me-this-pen.jpg height=500>
<img src=images/ai-startup.webp height=500>


# 🤖 Agents

<img src=images/this-is-not-enough.jpg>

## Model Context Protocol

- https://www.anthropic.com/news/model-context-protocol
- https://github.com/modelcontextprotocol
    - https://github.com/modelcontextprotocol/csharp-sdk
    - https://github.com/modelcontextprotocol/servers

<img src=images/mcp2.png height=400><br>
<img src=images/mcp.png>

## 🤖 Agents

<img src=images/agents.png width=800>

- https://devblogs.microsoft.com/semantic-kernel/customer-case-story-creating-a-semantic-kernel-agent-for-automated-github-code-reviews

- https://www.anthropic.com/news/3-5-models-and-computer-use
- src / ComputerUse 👈

<img src=images/computer-use-chat-loop.png width=800>

- src / ChatBots / Agents.cs 👈
- https://github.com/microsoft/Agent-Framework-Samples

**CLI Agents**
- Copilot
    - https://www.youtube.com/watch?v=UMz8aQ4lOtE Demo: Using GitHub Copilot CLI and yolo mode
        - WorkIQ 👈
    - https://github.blog/ai-and-ml/github-copilot/power-agentic-workflows-in-your-terminal-with-github-copilot-cli
    - https://github.blog/news-insights/company-news/build-an-agent-into-any-app-with-the-github-copilot-sdk
- Rovo
    - Pictures

<img src=images/ai-protocol-stack.avif>

- https://www.youtube.com/watch?v=4CrxcdNbRFY
    - 45 min, ASP.NET Community Standup - Build agentic UI with AG-UI and Blazor
    - https://learn.microsoft.com/en-us/agent-framework/overview/agent-framework-overview
    - https://learn.microsoft.com/en-us/agent-framework/integrations/ag-ui
    - https://docs.copilotkit.ai/microsoft-agent-framework

<img src=images/gen-ai.png>

- https://openai.com/index/introducing-apps-in-chatgpt 🚀
- https://blog.modelcontextprotocol.io/posts/2026-01-26-mcp-apps 🚀