---
## 1. Architecture Overview

The voice chat application uses a **three-tier architecture**:

```
┌─────────────────┐     WebSocket    ┌──────────────────┐     WebSocket    ┌─────────────────────┐
│   Browser UI    │ ◄──────────────► │  Backend Server  │ ◄──────────────► │ Azure OpenAI        │
│  (JavaScript)   │                  │   (ASP.NET Core) │                  │ Realtime API        │
└─────────────────┘                  └──────────────────┘                  └─────────────────────┘
        │                                     │                                       │
        │  • Captures microphone              │  • HttpContext.WebSockets             │  • Speech-to-Text
        │  • Plays audio response             │  • ClientWebSocket                    │  • LLM Processing
        │  • Manages UI state                 │  • Bidirectional proxy                │  • Text-to-Speech
        │                                     │  • DI & Configuration                 │
```

### Why ASP.NET Core?

1. **Built-in WebSocket Support**: First-class WebSocket handling via middleware
2. **Dependency Injection**: Clean service architecture
3. **Configuration System**: Options pattern for settings
4. **Performance**: `async`/`await` I/O lets the server hold many long-lived socket connections without dedicating a thread to each one

---
## 2. WebSocket Fundamentals

### What is a WebSocket?

WebSocket is a **bidirectional, full-duplex communication protocol** over a single TCP connection. Unlike HTTP (request-response), WebSockets allow both client and server to send messages at any time.
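
Every WebSocket connection begins as an ordinary HTTP/1.1 request carrying `Upgrade: websocket` and a random `Sec-WebSocket-Key` header; the server answers `101 Switching Protocols` with a `Sec-WebSocket-Accept` header derived from that key. ASP.NET Core's `AcceptWebSocketAsync` and `ClientWebSocket` perform this handshake for you, so the sketch below is purely illustrative: it computes the accept value the way RFC 6455 defines it (SHA-1 over the key concatenated with a fixed GUID, then Base64), using the RFC's own sample key. `HandshakeDemo` is a hypothetical name.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HandshakeDemo
{
    // Fixed GUID defined by RFC 6455, section 1.3
    private const string WebSocketGuid = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    /// <summary>
    /// Compute the Sec-WebSocket-Accept value a server returns for a client's Sec-WebSocket-Key.
    /// </summary>
    public static string ComputeAcceptKey(string secWebSocketKey)
    {
        byte[] hash = SHA1.HashData(Encoding.ASCII.GetBytes(secWebSocketKey + WebSocketGuid));
        return Convert.ToBase64String(hash);
    }
}

// Sample key taken from RFC 6455, section 1.3
var accept = HandshakeDemo.ComputeAcceptKey("dGhlIHNhbXBsZSBub25jZQ==");
Console.WriteLine(accept);  // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After this exchange the same TCP connection stops speaking HTTP and carries WebSocket frames in both directions.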

### .NET WebSocket Classes

| Class | Purpose |
|-------|--------|
| `System.Net.WebSockets.WebSocket` | Base abstract class |
| `ClientWebSocket` | Connect to external WebSocket servers |
| `WebSocketManager` (exposed as `HttpContext.WebSockets`) | Accept incoming WebSocket connections in ASP.NET Core |

### WebSocket States in .NET

```csharp
public enum WebSocketState
{
    None = 0,
    Connecting = 1,
    Open = 2,
    CloseSent = 3,
    CloseReceived = 4,
    Closed = 5,
    Aborted = 6
}
```
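
In practice, these states matter mostly when you need to decide whether a send or a receive is still allowed. The helper below is a small sketch (`WebSocketStateRules` is a hypothetical name). It encodes the closing-handshake rules as we understand them from RFC 6455 and .NET's managed implementation: after the peer's Close frame arrives, you can still send your own Close reply. After you send Close, you can still receive until the peer's reply arrives.

```csharp
using System;
using System.Net.WebSockets;

public static class WebSocketStateRules
{
    // Sending is valid while open, or after the peer's Close arrived (we still owe our Close reply)
    public static bool CanSend(WebSocketState state) =>
        state is WebSocketState.Open or WebSocketState.CloseReceived;

    // Receiving is valid while open, or after our Close was sent (we are waiting for the peer's reply)
    public static bool CanReceive(WebSocketState state) =>
        state is WebSocketState.Open or WebSocketState.CloseSent;
}

foreach (var state in Enum.GetValues<WebSocketState>())
{
    Console.WriteLine(
        $"{state,-14} send={WebSocketStateRules.CanSend(state),-5} receive={WebSocketStateRules.CanReceive(state)}");
}
```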

In [None]:
// WebSocket State Machine Demonstration
using System;
using System.Net.WebSockets;

// Simulating WebSocket state transitions
public class WebSocketStateDemo
{
    public static void ShowStateTransitions()
    {
        Console.WriteLine("WebSocket State Transitions:");
        Console.WriteLine(new string('=', 50));
        
        var states = new[]
        {
            (WebSocketState.None, "Initial state"),
            (WebSocketState.Connecting, "Handshake in progress"),
            (WebSocketState.Open, "Connection established - can send/receive"),
            (WebSocketState.CloseSent, "Close frame sent, waiting for response"),
            (WebSocketState.CloseReceived, "Close frame received"),
            (WebSocketState.Closed, "Connection closed gracefully"),
            (WebSocketState.Aborted, "Connection terminated unexpectedly")
        };
        
        foreach (var (state, description) in states)
        {
            Console.WriteLine($"{state,-15} → {description}");
        }
    }
}

WebSocketStateDemo.ShowStateTransitions();

---
## 3. Azure OpenAI Realtime API

### What is the Realtime API?

Azure OpenAI's **Realtime API** provides:
- 🎤 **Speech-to-Text**: Transcribes audio in real time
- 🧠 **LLM Processing**: Generates the assistant's response
- 🔊 **Text-to-Speech**: Converts the response to natural speech

All three run over a **single WebSocket connection**. This avoids separate round trips to distinct speech and chat services and keeps conversational latency low.

### API Endpoint Structure

```
wss://{endpoint}/openai/realtime
    ?api-version={version}
    &deployment={deployment-name}
    &api-key={your-api-key}
```
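
Putting the key in the query string works, but it also puts a secret into anything that logs URLs. Azure's Realtime documentation also describes sending the key as an `api-key` header on the handshake request. Browsers cannot set that header, but a .NET backend can. Please verify this against the current Azure docs for your API version. The sketch below (`RealtimeConnectionSketch` is a hypothetical name) builds the URL with escaped query values and no key, and attaches the key through `ClientWebSocket.Options.SetRequestHeader`. It does not connect.

```csharp
using System;
using System.Net.WebSockets;

public static class RealtimeConnectionSketch
{
    /// <summary>
    /// Build the Realtime URI without the key; query values are escaped.
    /// </summary>
    public static Uri BuildUri(string endpoint, string deployment, string apiVersion)
    {
        // https://host -> wss://host on the scheme's default port
        var builder = new UriBuilder(endpoint) { Scheme = "wss", Port = -1 };
        builder.Path = builder.Path.TrimEnd('/') + "/openai/realtime";
        builder.Query = $"api-version={Uri.EscapeDataString(apiVersion)}" +
                        $"&deployment={Uri.EscapeDataString(deployment)}";
        return builder.Uri;
    }

    /// <summary>
    /// Create a socket that sends the key in an api-key handshake header instead of the URL.
    /// </summary>
    public static ClientWebSocket CreateSocket(string apiKey)
    {
        var socket = new ClientWebSocket();
        socket.Options.SetRequestHeader("api-key", apiKey);
        socket.Options.KeepAliveInterval = TimeSpan.FromSeconds(30);
        return socket;
    }
}

var uri = RealtimeConnectionSketch.BuildUri(
    "https://my-openai.openai.azure.com", "gpt-4o-realtime", "2024-10-01-preview");

// The URL carries no secret, so it can be logged as-is
Console.WriteLine(uri);

using (var socket = RealtimeConnectionSketch.CreateSocket("<your-api-key>"))
{
    // await socket.ConnectAsync(uri, CancellationToken.None);  // needs a real Azure resource
}
```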

In [None]:
// Building the Azure Realtime API URL
using System;

public class AzureRealtimeUrlBuilder
{
    /// <summary>
    /// Build the WebSocket URL for Azure OpenAI Realtime API.
    /// </summary>
    public static string BuildUrl(
        string endpoint,
        string deployment,
        string apiKey,
        string apiVersion = "2024-10-01-preview")
    {
        // Convert HTTPS to WSS (secure WebSocket)
        var wsEndpoint = endpoint
            .Replace("https://", "wss://")
            .TrimEnd('/');
        
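        // Escape query values so characters such as '&' or '+' cannot corrupt the URL
        return $"{wsEndpoint}/openai/realtime" +
               $"?api-version={Uri.EscapeDataString(apiVersion)}" +
               $"&deployment={Uri.EscapeDataString(deployment)}" +
               $"&api-key={Uri.EscapeDataString(apiKey)}";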
    }
    
    /// <summary>
    /// Get a safe version of the URL for logging (without API key).
    /// </summary>
    public static string GetSafeUrl(string fullUrl)
    {
        var keyIndex = fullUrl.IndexOf("&api-key=");
        return keyIndex > 0 
            ? fullUrl.Substring(0, keyIndex) + "&api-key=***" 
            : fullUrl;
    }
}

// Example
var url = AzureRealtimeUrlBuilder.BuildUrl(
    endpoint: "https://my-openai.openai.azure.com",
    deployment: "gpt-4o-realtime",
    apiKey: "abc123xyz"
);

Console.WriteLine("Azure Realtime API URL Structure:");
Console.WriteLine(new string('=', 50));
Console.WriteLine(AzureRealtimeUrlBuilder.GetSafeUrl(url));

---
## 4. Audio Encoding & Processing

### Audio Format Requirements

This application uses the Realtime API's `pcm16` audio format in both directions:

| Property | Value |
|----------|-------|
| Format | PCM (Pulse Code Modulation) |
| Sample Rate | 24000 Hz (24 kHz) |
| Bit Depth | 16-bit signed (Int16), little-endian |
| Channels | Mono (1 channel) |
| Encoding | Base64 (for JSON messages) |
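
These parameters fix the data rate, which helps when sizing buffers and chunks. The rate is 24,000 samples/s × 2 bytes × 1 channel = 48,000 bytes per second of audio, and Base64 inflates that by 4/3. The sketch below does the arithmetic. `Pcm16Math` is a hypothetical helper, and the chunk durations are arbitrary examples rather than values the API requires.

```csharp
using System;

public static class Pcm16Math
{
    public const int SampleRate = 24_000;  // Hz, from the table above
    public const int BytesPerSample = 2;   // 16-bit samples
    public const int Channels = 1;         // mono

    // Raw PCM bytes for a chunk of the given duration
    public static int PcmBytes(int milliseconds) =>
        SampleRate * Channels * BytesPerSample * milliseconds / 1000;

    // Base64 length: 4 characters per 3-byte group, last group padded
    public static int Base64Chars(int byteCount) => (byteCount + 2) / 3 * 4;
}

foreach (var ms in new[] { 20, 100, 1000 })
{
    int pcmBytes = Pcm16Math.PcmBytes(ms);
    Console.WriteLine($"{ms,5} ms -> {pcmBytes,6} PCM bytes -> {Pcm16Math.Base64Chars(pcmBytes),6} Base64 chars");
}
```

For example, a 20 ms chunk is 480 samples, which is 960 PCM bytes or 1,280 Base64 characters inside the JSON message.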

### Audio Pipeline

```
Browser               .NET Server           Azure
   │                       │                   │
   │ Float32 → Int16       │                   │
   │ Base64 encode         │                   │
   ▼                       │                   │
JSON message ─────────────►│                   │
                           │ Forward as-is     │
                           ├──────────────────►│
                           │                   │
                           │◄──────────────────┤
                           │ Forward response  │
◄──────────────────────────┤                   │
Decode & play              │                   │
```
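
The last step of the pipeline, "Decode & play", reverses the browser-side encoding. The browser does this in JavaScript. The C# sketch below only mirrors the same arithmetic for illustration, and `Pcm16Decoder` is a hypothetical name. Dividing by 32768 maps the full Int16 range into [-1, 1). Whether the real browser code divides by 32768 or 32767 is not specified here.

```csharp
using System;
using System.Buffers.Binary;

public static class Pcm16Decoder
{
    // Decode Base64 PCM16 (little-endian, mono) into float samples in [-1, 1)
    public static float[] Base64ToFloat32(string base64Audio)
    {
        byte[] pcm = Convert.FromBase64String(base64Audio);
        var samples = new float[pcm.Length / 2];
        for (int i = 0; i < samples.Length; i++)
        {
            short value = BinaryPrimitives.ReadInt16LittleEndian(pcm.AsSpan(i * 2, 2));
            samples[i] = value / 32768f;
        }
        return samples;
    }
}

// Round trip: Int16 samples -> little-endian bytes -> Base64 -> floats
short[] original = { 0, 16384, -32768, 32767 };
var bytes = new byte[original.Length * 2];
for (int i = 0; i < original.Length; i++)
    BinaryPrimitives.WriteInt16LittleEndian(bytes.AsSpan(i * 2, 2), original[i]);

float[] decoded = Pcm16Decoder.Base64ToFloat32(Convert.ToBase64String(bytes));
Console.WriteLine(string.Join(", ", decoded));
```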

---
## 4a. Microsoft Agent Framework (Text Mode)

### What is Microsoft Agent Framework?

The **Microsoft Agent Framework** provides a unified way to build AI agents across .NET and Python:

- 🤖 **AIAgent**: High-level abstraction for chat-based AI interactions
- 🧵 **AgentSession**: Manages conversation history across multiple turns (RC1)
- 🔧 **Tool Support**: Native functions, OpenAPI, and MCP (Model Context Protocol)
- ☁️ **Multi-Provider**: Azure OpenAI, OpenAI, Microsoft Foundry, and more

### Why Use Agent Framework?

| Feature | Direct API Calls | Agent Framework |
|---------|-----------------|-----------------|
| Conversation Memory | Manual management | Built-in sessions |
| Tool/Function Calling | Complex setup | Declarative |
| Multi-turn Context | Implement yourself | Automatic |
| Streaming Responses | Manual parsing | Built-in support |
| Cross-platform | Separate implementations | Same patterns (.NET & Python) |
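
This makes the "Manual management" row concrete. A chat completions request does not remember earlier requests, so without a session object the caller must keep the history and resend it every turn. The hypothetical `ManualConversationHistory` class below sketches that bookkeeping, including a simple message cap. It is not part of Agent Framework, and it does not show how `AgentSession` stores or trims history.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical: what conversation memory looks like when you manage it yourself
public sealed class ManualConversationHistory
{
    private readonly List<(string Role, string Content)> _messages = new();
    private readonly int _maxMessages;

    public ManualConversationHistory(string systemPrompt, int maxMessages = 20)
    {
        if (maxMessages < 2) throw new ArgumentOutOfRangeException(nameof(maxMessages));
        _messages.Add(("system", systemPrompt));
        _maxMessages = maxMessages;
    }

    public void AddUser(string text) => Add("user", text);
    public void AddAssistant(string text) => Add("assistant", text);

    // Copy of the history to send with the next request
    public IReadOnlyList<(string Role, string Content)> Snapshot() => _messages.ToList();

    private void Add(string role, string content)
    {
        _messages.Add((role, content));
        // Keep the system prompt at index 0; drop the oldest turns beyond the cap
        while (_messages.Count > _maxMessages)
            _messages.RemoveAt(1);
    }
}

// A tiny cap shows the pitfall: the turn containing the user's name gets trimmed away
var history = new ManualConversationHistory("You are a helpful assistant.", maxMessages: 3);
history.AddUser("My name is Alice");
history.AddAssistant("Nice to meet you, Alice!");
history.AddUser("What is my name?");

foreach (var (role, content) in history.Snapshot())
    Console.WriteLine($"{role}: {content}");
```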

### Package Installation (.NET)

```xml
<PackageReference Include="Microsoft.Agents.AI.AzureAI" Version="*-*" />
<PackageReference Include="Microsoft.Agents.AI.OpenAI" Version="*-*" />
<PackageReference Include="Microsoft.Extensions.AI" Version="*-*" />
```

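> **Note**: While the Agent Framework packages are prerelease, you must opt in to prerelease versions. Use either the `*-*` version shown above or the `--prerelease` flag on the command line (for example, `dotnet add package Microsoft.Agents.AI.OpenAI --prerelease`).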

In [None]:
// Microsoft Agent Framework - AIAgent Pattern (as used in AzureChatService)
using System;

// This demonstrates the AIAgent pattern used in the Voice Chat backend
// Actual implementation requires the Microsoft.Agents.AI NuGet packages

var agentServiceCode = @"
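using System.Net.WebSockets;
using Azure;
using Azure.AI.OpenAI;
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;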

public class AzureChatService
{
    private readonly AIAgent? _agent;
    
    public AzureChatService(IOptions<AzureOpenAISettings> settings, ILoggerFactory loggerFactory)
    {
        var config = settings.Value;
        
        // Initialize the AIAgent with Azure OpenAI
        if (!string.IsNullOrEmpty(config.Endpoint) && !string.IsNullOrEmpty(config.ApiKey))
        {
            var client = new AzureOpenAIClient(
                new Uri(config.Endpoint),
                new AzureKeyCredential(config.ApiKey));

            var chatClient = client.GetChatClient(config.ChatDeployment);

            // Create AIAgent using extension method (RC1)
            _agent = chatClient.AsAIAgent(
                name: ""ChatAssistant"",
                instructions: ""You are a helpful assistant. Respond naturally and concisely."",
                description: ""A helpful chat assistant powered by Azure OpenAI"",
                loggerFactory: loggerFactory);
        }
    }

    public async Task HandleTextSession(WebSocket clientWs, string sessionId)
    {
        if (_agent is null)
            throw new InvalidOperationException(""Azure OpenAI is not configured"");

        // Create a new session for this conversation (RC1)
        var session = await _agent.CreateSessionAsync();
        
        // ... receive user message ...
        
        // Run the agent with the user's message
        var response = await _agent!.RunAsync(
            messages: [new ChatMessage(ChatRole.User, userMessage)],
            session: session);

        var responseText = response.Text ?? string.Empty;
        
        // Session maintains conversation history across multiple turns
    }
}
";

Console.WriteLine("AIAgent Pattern - AzureChatService:");
Console.WriteLine(new string('=', 60));
Console.WriteLine(agentServiceCode);

In [None]:
// Agent Session for Multi-turn Conversations (RC1)
using System;

// AgentSession maintains conversation context across multiple user interactions
// This is the key benefit over simple API calls

var sessionPatternCode = @"
// Multi-turn conversation using AgentSession (RC1)
public async Task DemoMultiTurnConversation(AIAgent agent)
{
    // Create a new session for conversation history
    var session = await agent.CreateSessionAsync();
    
    // First turn
    var response1 = await agent.RunAsync(
        messages: [new ChatMessage(ChatRole.User, ""My name is Alice"")],
        session: session);
    Console.WriteLine($""Agent: {response1.Text}"");
    // Agent: ""Nice to meet you, Alice!""
    
    // Second turn - agent remembers the context
    var response2 = await agent.RunAsync(
        messages: [new ChatMessage(ChatRole.User, ""What is my name?"")],
        session: session);
    Console.WriteLine($""Agent: {response2.Text}"");
    // Agent: ""Your name is Alice!""
    
    // Without session, each call would be independent
    // The session automatically manages the conversation history
}

// Streaming responses for real-time UI updates
public async Task DemoStreamingResponse(AIAgent agent, AgentSession session, string userMessage)
{
    Console.Write(""Agent: "");
    
    await foreach (var chunk in agent.RunStreamingAsync(
        messages: [new ChatMessage(ChatRole.User, userMessage)],
        session: session))
    {
        if (!string.IsNullOrEmpty(chunk.Text))
        {
            Console.Write(chunk.Text);  // Print as tokens arrive
        }
    }
    
    Console.WriteLine();  // Final newline
}
";

Console.WriteLine("AgentSession - Multi-turn Conversations (RC1):");
Console.WriteLine(new string('=', 60));
Console.WriteLine(sessionPatternCode);

In [None]:
// Audio Processing Concepts in C#
using System;
using System.Linq;

public static class AudioProcessor
{
    /// <summary>
    /// Convert float32 audio samples (-1.0 to 1.0) to Int16 PCM.
    /// This is what happens in the browser's AudioWorklet.
    /// </summary>
    public static byte[] Float32ToInt16Pcm(float[] samples)
    {
        var pcmBytes = new byte[samples.Length * 2]; // 2 bytes per Int16
        
        for (int i = 0; i < samples.Length; i++)
        {
            // Clamp to valid range
            var clamped = Math.Max(-1.0f, Math.Min(1.0f, samples[i]));
            
            // Scale to the Int16 range; multiplying by 32767 keeps +1.0 in range (yields -32767..32767)
            var int16Value = (short)(clamped * 32767);
            
            // Write as little-endian
            pcmBytes[i * 2] = (byte)(int16Value & 0xFF);
            pcmBytes[i * 2 + 1] = (byte)((int16Value >> 8) & 0xFF);
        }
        
        return pcmBytes;
    }
    
    /// <summary>
    /// Convert PCM bytes to Base64 for JSON transport.
    /// </summary>
    public static string PcmToBase64(byte[] pcmBytes) 
        => Convert.ToBase64String(pcmBytes);
    
    /// <summary>
    /// Decode Base64 audio back to PCM bytes.
    /// </summary>
    public static byte[] Base64ToPcm(string base64Audio) 
        => Convert.FromBase64String(base64Audio);
}

// Demo: Generate a sine wave
const int sampleRate = 24000;  // 24 kHz as required by Azure
const int frequency = 440;     // Hz (A4 note)
const double duration = 0.01;  // 10 milliseconds

int numSamples = (int)(sampleRate * duration);
var sineWave = Enumerable.Range(0, numSamples)
    .Select(i => (float)Math.Sin(2 * Math.PI * frequency * i / sampleRate))
    .ToArray();

// Convert to Int16 PCM
var pcmData = AudioProcessor.Float32ToInt16Pcm(sineWave);

// Convert to Base64 for JSON transport
var base64Audio = AudioProcessor.PcmToBase64(pcmData);

Console.WriteLine("Audio Encoding Example (440 Hz sine wave, 10ms):");
Console.WriteLine(new string('=', 50));
Console.WriteLine($"Sample rate: {sampleRate} Hz");
Console.WriteLine($"Number of samples: {numSamples}");
Console.WriteLine($"PCM bytes: {pcmData.Length} bytes");
Console.WriteLine($"Base64 length: {base64Audio.Length} characters");
Console.WriteLine($"\nBase64 preview: {base64Audio.Substring(0, Math.Min(50, base64Audio.Length))}...");

---
## 5. ASP.NET Core WebSocket Handling

### WebSocket Middleware Setup

ASP.NET Core provides built-in WebSocket support through middleware:

```csharp
// In Program.cs
var app = builder.Build();

// Enable WebSocket middleware
app.UseWebSockets(new WebSocketOptions
{
    KeepAliveInterval = TimeSpan.FromSeconds(30)
});

// Handle WebSocket requests
app.Map("/ws/voice", async context =>
{
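    if (context.WebSockets.IsWebSocketRequest)
    {
        using var webSocket = await context.WebSockets.AcceptWebSocketAsync();
        // Handle the connection...
    }
    else
    {
        // Plain HTTP requests to a WebSocket endpoint are rejected
        context.Response.StatusCode = StatusCodes.Status400BadRequest;
    }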
});
```
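
The snippet above leaves out one detail. A single `ReceiveAsync` call returns at most one buffer's worth of data, so a large JSON event can arrive across several calls, with `result.EndOfMessage` set only on the last one. Below is a minimal reassembly sketch that assumes a caller-chosen size cap. `MessageAssembler`, `maxMessageBytes`, and the fake receive function are hypothetical. In real code you would pass `webSocket.ReceiveAsync` as the `receive` argument.

```csharp
using System;
using System.IO;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class MessageAssembler
{
    /// <summary>
    /// Receive frames until EndOfMessage; returns null if a Close frame arrives.
    /// </summary>
    public static async Task<(WebSocketMessageType Type, byte[] Payload)?> ReceiveFullMessageAsync(
        Func<ArraySegment<byte>, CancellationToken, Task<WebSocketReceiveResult>> receive,
        int bufferSize,
        int maxMessageBytes,
        CancellationToken ct)
    {
        var buffer = new byte[bufferSize];
        using var message = new MemoryStream();
        WebSocketReceiveResult result;
        do
        {
            result = await receive(new ArraySegment<byte>(buffer), ct);
            if (result.MessageType == WebSocketMessageType.Close)
                return null;

            message.Write(buffer, 0, result.Count);
            // Guard against a peer that never ends its message
            if (message.Length > maxMessageBytes)
                throw new InvalidDataException("Message exceeds size limit");
        }
        while (!result.EndOfMessage);

        return (result.MessageType, message.ToArray());
    }
}

// Fake receive: delivers one JSON message in 8-byte frames to simulate fragmentation
var payload = Encoding.UTF8.GetBytes("{\"type\":\"response.done\"}");
int offset = 0, frames = 0;
Task<WebSocketReceiveResult> FakeReceive(ArraySegment<byte> segment, CancellationToken _)
{
    int n = Math.Min(segment.Count, payload.Length - offset);
    Array.Copy(payload, offset, segment.Array!, segment.Offset, n);
    offset += n;
    frames++;
    return Task.FromResult(new WebSocketReceiveResult(n, WebSocketMessageType.Text, offset == payload.Length));
}

var assembled = await MessageAssembler.ReceiveFullMessageAsync(FakeReceive, 8, 1024 * 1024, CancellationToken.None);
Console.WriteLine($"{frames} frames -> {Encoding.UTF8.GetString(assembled!.Value.Payload)}");
```

A proxy that forwards each frame one-to-one, passing `EndOfMessage` through unchanged, does not need this step. Reassembly matters wherever the server parses messages itself.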

In [None]:
// ASP.NET Core WebSocket Handler Pattern
using System;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

/// <summary>
/// Demonstrates the WebSocket handler pattern used in ASP.NET Core.
/// This is conceptual code showing the structure.
/// </summary>
public class WebSocketHandlerPattern
{
    // Buffer sizes for audio data
    public const int BufferSize = 64 * 1024;  // 64KB for audio chunks
    
    public static void ShowReceiverPattern()
    {
        var code = @"
// Pattern: Receiving WebSocket Messages
async Task ReceiveMessages(WebSocket webSocket, CancellationToken ct)
{
    var buffer = new byte[64 * 1024];  // 64KB buffer
    
    while (webSocket.State == WebSocketState.Open && !ct.IsCancellationRequested)
    {
        var result = await webSocket.ReceiveAsync(
            new ArraySegment<byte>(buffer),
            ct);
        
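        if (result.MessageType == WebSocketMessageType.Close)
        {
            // Client initiated close: answer with our own Close frame, then stop
            await webSocket.CloseAsync(WebSocketCloseStatus.NormalClosure, ""Closing"", ct);
            break;
        }
        
        if (result.MessageType == WebSocketMessageType.Text)
        {
            // A large message may span several receives; check result.EndOfMessage
            var message = Encoding.UTF8.GetString(buffer, 0, result.Count);
            // Process JSON message...
        }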
        else if (result.MessageType == WebSocketMessageType.Binary)
        {
            // Process binary audio data...
        }
    }
}";
        Console.WriteLine(code);
    }
    
    public static void ShowSenderPattern()
    {
        var code = @"
// Pattern: Sending WebSocket Messages
async Task SendMessage(WebSocket webSocket, string message, CancellationToken ct)
{
    if (webSocket.State != WebSocketState.Open)
        return;
    
    var buffer = Encoding.UTF8.GetBytes(message);
    
    await webSocket.SendAsync(
        new ArraySegment<byte>(buffer),
        WebSocketMessageType.Text,
        endOfMessage: true,
        ct);
}";
        Console.WriteLine(code);
    }
}

Console.WriteLine("WebSocket Receiver Pattern:");
Console.WriteLine(new string('=', 50));
WebSocketHandlerPattern.ShowReceiverPattern();

Console.WriteLine("\n\nWebSocket Sender Pattern:");
Console.WriteLine(new string('=', 50));
WebSocketHandlerPattern.ShowSenderPattern();

---
## 6. Session Management

### Why Session Management?

Real-time voice applications need to:
- Track active connections
- Associate users with their sessions
- Clean up resources when connections close
- Implement rate limiting per user
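
The cleanup duty can be sketched as an idle sweep. The `SessionManager` below declares a 30-minute `SessionTimeout` but never applies it. One way to apply it is a background loop, for example one driven by .NET's `PeriodicTimer`, that evicts sessions whose last activity is too old. This sketch assumes timestamps are stored as dictionary values, which differs from the mutable `LastActivity` property used below. `IdleSweeper` is a hypothetical name.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class IdleSweeper
{
    /// <summary>
    /// Remove sessions whose last activity is older than the timeout; returns the removed IDs.
    /// </summary>
    public static List<string> RemoveIdle(
        ConcurrentDictionary<string, DateTime> lastActivityBySession,
        DateTime nowUtc,
        TimeSpan timeout)
    {
        var removed = new List<string>();
        // Enumerating a ConcurrentDictionary is safe while other threads modify it
        foreach (var (sessionId, lastActivity) in lastActivityBySession)
        {
            // Remove only if the timestamp is unchanged, so a session whose activity
            // was updated concurrently is not evicted by mistake
            if (nowUtc - lastActivity > timeout &&
                lastActivityBySession.TryRemove(new KeyValuePair<string, DateTime>(sessionId, lastActivity)))
            {
                removed.Add(sessionId);
            }
        }
        return removed;
    }
}

var now = new DateTime(2025, 1, 1, 12, 0, 0, DateTimeKind.Utc);
var activity = new ConcurrentDictionary<string, DateTime>
{
    ["active"] = now.AddMinutes(-5),
    ["stale"] = now.AddMinutes(-45)
};

var evicted = IdleSweeper.RemoveIdle(activity, now, TimeSpan.FromMinutes(30));
Console.WriteLine($"Evicted: {string.Join(", ", evicted)}");
```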

### .NET Session Manager Design

Using `ConcurrentDictionary` for thread-safe operations:

In [None]:
using System;
using System.Collections.Concurrent;
using System.Linq;

/// <summary>
/// Session information for a voice chat connection.
/// </summary>
public class VoiceSession
{
    public string SessionId { get; init; } = Guid.NewGuid().ToString();
    public string UserId { get; init; } = string.Empty;
    public string Mode { get; init; } = "voice";
    public DateTime CreatedAt { get; init; } = DateTime.UtcNow;
    public DateTime LastActivity { get; set; } = DateTime.UtcNow;
    public int MessageCount { get; set; }
}

/// <summary>
/// Thread-safe session manager for voice chat.
/// </summary>
public class SessionManager
{
    private readonly ConcurrentDictionary<string, VoiceSession> _sessions = new();
    // userId -> that user's session IDs (a ConcurrentDictionary used as a set, so IDs can be removed)
    private readonly ConcurrentDictionary<string, ConcurrentDictionary<string, byte>> _userSessions = new();
    
    public const int MaxConnectionsPerUser = 3;
    public static readonly TimeSpan SessionTimeout = TimeSpan.FromMinutes(30);
    
    /// <summary>
    /// Create a new session for a user, enforcing the per-user connection limit.
    /// </summary>
    public (bool Success, string SessionId, string Message) CreateSession(string userId, string mode)
    {
        var userSet = _userSessions.GetOrAdd(userId, _ => new ConcurrentDictionary<string, byte>());
        
        // Lock per user so the limit check and the insert happen atomically
        lock (userSet)
        {
            if (userSet.Count >= MaxConnectionsPerUser)
            {
                return (false, string.Empty, $"Max {MaxConnectionsPerUser} connections exceeded");
            }
            
            var session = new VoiceSession
            {
                UserId = userId,
                Mode = mode
            };
            
            _sessions[session.SessionId] = session;
            userSet[session.SessionId] = 0;
            
            return (true, session.SessionId, "Session created");
        }
    }
    
    /// <summary>
    /// Get a session by ID.
    /// </summary>
    public VoiceSession? GetSession(string sessionId)
        => _sessions.TryGetValue(sessionId, out var session) ? session : null;
    
    /// <summary>
    /// Update session activity timestamp.
    /// </summary>
    public void UpdateActivity(string sessionId)
    {
        if (_sessions.TryGetValue(sessionId, out var session))
        {
            session.LastActivity = DateTime.UtcNow;
        }
    }
    
    /// <summary>
    /// Remove a session and release its slot in the per-user limit.
    /// </summary>
    public void RemoveSession(string sessionId)
    {
        if (_sessions.TryRemove(sessionId, out var session) &&
            _userSessions.TryGetValue(session.UserId, out var userSet))
        {
            userSet.TryRemove(sessionId, out _);
        }
    }
    
    /// <summary>
    /// Get statistics.
    /// </summary>
    public object GetStats() => new
    {
        TotalSessions = _sessions.Count,
        UniqueUsers = _sessions.Values.Select(s => s.UserId).Distinct().Count(),
        VoiceSessions = _sessions.Values.Count(s => s.Mode == "voice"),
        TextSessions = _sessions.Values.Count(s => s.Mode == "text")
    };
}

// Demo
var manager = new SessionManager();

Console.WriteLine("Session Management Demo:");
Console.WriteLine(new string('=', 50));

// Create sessions
var (s1, id1, msg1) = manager.CreateSession("user-alice", "voice");
var (s2, id2, msg2) = manager.CreateSession("user-alice", "text");
var (s3, id3, msg3) = manager.CreateSession("user-bob", "voice");

Console.WriteLine($"Alice voice session: {id1.Substring(0, 8)}... ({msg1})");
Console.WriteLine($"Alice text session: {id2.Substring(0, 8)}... ({msg2})");
Console.WriteLine($"Bob voice session: {id3.Substring(0, 8)}... ({msg3})");

Console.WriteLine($"\nStatistics: {System.Text.Json.JsonSerializer.Serialize(manager.GetStats())}");

// Cleanup
manager.RemoveSession(id1);
Console.WriteLine($"\nAfter removing Alice's voice session: {System.Text.Json.JsonSerializer.Serialize(manager.GetStats())}");

---
## 7. Message Protocol

### Azure Realtime API Message Types

The Realtime API uses JSON messages for control and base64-encoded audio.

#### Client ‚Üí Azure Messages

| Message Type | Purpose |
|--------------|--------|
| `session.update` | Configure session (voice, instructions) |
| `input_audio_buffer.append` | Send audio chunks |
| `input_audio_buffer.commit` | Commit buffered audio as a user turn (not needed with server VAD, which commits automatically) |
| `response.create` | Request a response |

#### Azure ‚Üí Client Messages

| Message Type | Purpose |
|--------------|--------|
| `session.created` | Session initialized |
| `response.audio.delta` | Audio chunk of response |
| `response.audio_transcript.delta` | Transcript of response |
| `response.done` | Response complete |
| `error` | Error occurred |
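
With `server_vad` turn detection (as in the `session.update` example below), the service detects when the user has stopped speaking, commits the buffer, and starts a response on its own. If turn detection is disabled, the client drives turns itself: it appends audio, then sends `input_audio_buffer.commit` followed by `response.create`. Below is a minimal sketch of those payloads. `ManualTurnEvents` is hypothetical. The claim that `turn_detection: null` disables VAD reflects our reading of the API reference, so verify it for your API version.

```csharp
using System;
using System.Text.Json;

public static class ManualTurnEvents
{
    // Switch the session to manual turns (assumption: a null turn_detection disables server VAD)
    public static string DisableServerVad() =>
        JsonSerializer.Serialize(new { type = "session.update", session = new { turn_detection = (object?)null } });

    // Sent after the last input_audio_buffer.append: "this audio is the user's turn"
    public static string Commit() =>
        JsonSerializer.Serialize(new { type = "input_audio_buffer.commit" });

    // Then ask the model to answer that turn
    public static string CreateResponse() =>
        JsonSerializer.Serialize(new { type = "response.create" });
}

Console.WriteLine(ManualTurnEvents.DisableServerVad());
Console.WriteLine(ManualTurnEvents.Commit());          // {"type":"input_audio_buffer.commit"}
Console.WriteLine(ManualTurnEvents.CreateResponse());  // {"type":"response.create"}
```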

In [None]:
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Message Protocol Examples using System.Text.Json

// Session Update Message
public class SessionUpdateMessage
{
    [JsonPropertyName("type")]
    public string Type { get; set; } = "session.update";
    
    [JsonPropertyName("session")]
    public SessionConfig Session { get; set; } = new();
}

public class SessionConfig
{
    [JsonPropertyName("modalities")]
    public string[] Modalities { get; set; } = { "text", "audio" };
    
    [JsonPropertyName("instructions")]
    public string Instructions { get; set; } = "You are a helpful assistant.";
    
    [JsonPropertyName("voice")]
    public string Voice { get; set; } = "alloy";
    
    [JsonPropertyName("input_audio_format")]
    public string InputAudioFormat { get; set; } = "pcm16";
    
    [JsonPropertyName("output_audio_format")]
    public string OutputAudioFormat { get; set; } = "pcm16";
    
    [JsonPropertyName("turn_detection")]
    public TurnDetectionConfig TurnDetection { get; set; } = new();
}

public class TurnDetectionConfig
{
    [JsonPropertyName("type")]
    public string Type { get; set; } = "server_vad";
    
    [JsonPropertyName("threshold")]
    public double Threshold { get; set; } = 0.5;
    
    [JsonPropertyName("silence_duration_ms")]
    public int SilenceDurationMs { get; set; } = 200;
}

// Audio Buffer Append Message
public class AudioAppendMessage
{
    [JsonPropertyName("type")]
    public string Type { get; set; } = "input_audio_buffer.append";
    
    [JsonPropertyName("audio")]
    public string Audio { get; set; } = string.Empty;  // Base64
}

// Demonstrate serialization
var options = new JsonSerializerOptions { WriteIndented = true };

var sessionUpdate = new SessionUpdateMessage();
var audioAppend = new AudioAppendMessage { Audio = "SGVsbG8gV29ybGQ=" };

Console.WriteLine("Message Protocol Examples (.NET):");
Console.WriteLine(new string('=', 50));

Console.WriteLine("\n1. Session Update Message:");
Console.WriteLine(JsonSerializer.Serialize(sessionUpdate, options));

Console.WriteLine("\n2. Audio Append Message:");
Console.WriteLine(JsonSerializer.Serialize(audioAppend, options));

In [None]:
// Message Parser for incoming Azure messages
using System;
using System.Text.Json;

public static class MessageParser
{
    /// <summary>
    /// Parse an incoming message and extract its type.
    /// </summary>
    public static (string Type, JsonDocument Doc) Parse(string json)
    {
        var doc = JsonDocument.Parse(json);
        var type = doc.RootElement.TryGetProperty("type", out var typeElement)
            ? typeElement.GetString() ?? "unknown"
            : "unknown";
        return (type, doc);
    }
    
    /// <summary>
    /// Check if this is an important event worth logging.
    /// </summary>
    public static bool IsImportantEvent(string type)
    {
        return type is "session.created" 
                    or "session.updated"
                    or "response.created" 
                    or "response.done" 
                    or "error"
                    or "input_audio_buffer.speech_started"
                    or "input_audio_buffer.speech_stopped";
    }
    
    /// <summary>
    /// Check if this is an audio streaming event (high frequency).
    /// </summary>
    public static bool IsAudioEvent(string type)
    {
        return type.StartsWith("response.audio", StringComparison.Ordinal) 
            || type == "input_audio_buffer.append";
    }
}

// Demo parsing
var sampleMessages = new[]
{
    "{\"type\":\"session.created\",\"session\":{\"id\":\"abc123\"}}",
    "{\"type\":\"response.audio.delta\",\"delta\":\"audio_data_here\"}",
    "{\"type\":\"response.done\",\"response\":{\"id\":\"resp_1\"}}",
    "{\"type\":\"error\",\"error\":{\"message\":\"Invalid format\"}}"
};

Console.WriteLine("Message Parsing Demo:");
Console.WriteLine(new string('=', 50));

foreach (var msg in sampleMessages)
{
    var (type, doc) = MessageParser.Parse(msg);
    doc.Dispose();  // JsonDocument rents pooled memory; dispose it once you are done reading
    var important = MessageParser.IsImportantEvent(type) ? "[IMPORTANT]" : "";
    var audio = MessageParser.IsAudioEvent(type) ? "[AUDIO]" : "";
    Console.WriteLine($"{type,-35} {important} {audio}");
}

---
## 8. Code Examples

### Complete Bidirectional Proxy Pattern
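
The shutdown coordination at the heart of this pattern can be exercised without any sockets. In the sketch below, two delegates stand in for the client-to-Azure and Azure-to-client loops; `CoordinatedShutdown` is a hypothetical helper. Whichever delegate finishes first triggers cancellation of the other. The resulting `OperationCanceledException` is treated as the expected outcome, not as an error.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CoordinatedShutdown
{
    /// <summary>
    /// Start both pumps; when either finishes, cancel the other and wait for it.
    /// </summary>
    public static async Task<(string First, bool OtherCancelled)> RunAsync(
        Func<CancellationToken, Task> pumpA,
        Func<CancellationToken, Task> pumpB)
    {
        using var cts = new CancellationTokenSource();
        Task a = pumpA(cts.Token);
        Task b = pumpB(cts.Token);

        Task first = await Task.WhenAny(a, b);
        cts.Cancel();                 // ask the other direction to stop

        Task other = first == a ? b : a;
        bool otherCancelled = false;
        try
        {
            await other;
        }
        catch (OperationCanceledException)
        {
            otherCancelled = true;    // expected result of cts.Cancel()
        }

        await first;                  // surface any exception from the first pump
        return (first == a ? "A" : "B", otherCancelled);
    }
}

// Pump A stands in for "client sent Close after 50 ms"; pump B for an open-ended receive loop
var (firstDone, otherCancelled) = await CoordinatedShutdown.RunAsync(
    ct => Task.Delay(50, ct),
    ct => Task.Delay(Timeout.Infinite, ct));

Console.WriteLine($"First to finish: {firstDone}; other cancelled: {otherCancelled}");
```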

The core pattern for proxying WebSocket connections between a client and Azure:

In [None]:
// Complete WebSocket Proxy Service Pattern

var proxyServiceCode = @"
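using System.Net.WebSockets;
using System.Text;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;

public class AzureRealtimeService
{
    private readonly AzureOpenAISettings _settings;
    private readonly ILogger<AzureRealtimeService> _logger;

    public AzureRealtimeService(IOptions<AzureOpenAISettings> settings, ILogger<AzureRealtimeService> logger)
    {
        _settings = settings.Value;
        _logger = logger;
    }
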
    /// <summary>
    /// Handle a voice session by proxying between client and Azure.
    /// </summary>
    public async Task HandleVoiceSession(WebSocket clientWs, string sessionId)
    {
        var azureWsUrl = BuildAzureRealtimeUrl();
        
        using var azureWs = new ClientWebSocket();
        
        try
        {
            // Connect to Azure Realtime API
            await azureWs.ConnectAsync(new Uri(azureWsUrl), CancellationToken.None);
            _logger.LogInformation(""Connected to Azure Realtime API"");

            // Create cancellation for coordinated shutdown
            using var cts = new CancellationTokenSource();

            // Start bidirectional proxying
            var clientToAzure = ProxyClientToAzure(clientWs, azureWs, cts.Token);
            var azureToClient = ProxyAzureToClient(azureWs, clientWs, cts.Token);

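            // Wait for either direction to complete
            await Task.WhenAny(clientToAzure, azureToClient);
            
            // Cancel the other direction
            cts.Cancel();
            
            // Wait for both to finish. The cancelled direction normally ends with
            // OperationCanceledException, which is expected here and not an error.
            try
            {
                await Task.WhenAll(clientToAzure, azureToClient);
            }
            catch (OperationCanceledException)
            {
                // Expected after cts.Cancel()
            }
        }
        catch (WebSocketException ex)
        {
            _logger.LogError(ex, ""WebSocket error in voice session"");
            await SendErrorToClient(clientWs, ex.Message);
        }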
    }

    /// <summary>
    /// Forward messages from browser client to Azure.
    /// </summary>
    private async Task ProxyClientToAzure(
        WebSocket clientWs, 
        WebSocket azureWs, 
        CancellationToken ct)
    {
        var buffer = new byte[64 * 1024];  // 64KB buffer

        while (clientWs.State == WebSocketState.Open && 
               azureWs.State == WebSocketState.Open &&
               !ct.IsCancellationRequested)
        {
            var result = await clientWs.ReceiveAsync(
                new ArraySegment<byte>(buffer), ct);

            if (result.MessageType == WebSocketMessageType.Close)
                break;

            // Forward to Azure exactly as received
            await azureWs.SendAsync(
                new ArraySegment<byte>(buffer, 0, result.Count),
                result.MessageType,
                result.EndOfMessage,
                ct);
        }
    }

    /// <summary>
    /// Forward messages from Azure back to browser client.
    /// </summary>
    private async Task ProxyAzureToClient(
        WebSocket azureWs, 
        WebSocket clientWs, 
        CancellationToken ct)
    {
        var buffer = new byte[64 * 1024];  // 64KB buffer

        while (azureWs.State == WebSocketState.Open && 
               clientWs.State == WebSocketState.Open &&
               !ct.IsCancellationRequested)
        {
            var result = await azureWs.ReceiveAsync(
                new ArraySegment<byte>(buffer), ct);

            if (result.MessageType == WebSocketMessageType.Close)
                break;

            // Forward to client exactly as received
            await clientWs.SendAsync(
                new ArraySegment<byte>(buffer, 0, result.Count),
                result.MessageType,
                result.EndOfMessage,
                ct);
        }
    }

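    private string BuildAzureRealtimeUrl()
    {
        var wsEndpoint = _settings.Endpoint
            .Replace(""https://"", ""wss://"")
            .TrimEnd('/');
            
        return $""{wsEndpoint}/openai/realtime"" +
               $""?api-version={_settings.ApiVersion}"" +
               $""&deployment={_settings.RealtimeDeployment}"" +
               $""&api-key={Uri.EscapeDataString(_settings.ApiKey)}"";
    }

    /// <summary>
    /// Best-effort error notification to the browser (the client may already be gone).
    /// </summary>
    private static async Task SendErrorToClient(WebSocket clientWs, string message)
    {
        if (clientWs.State != WebSocketState.Open)
            return;

        // Same shape as Azure's error events, so the browser can handle both alike
        var json = System.Text.Json.JsonSerializer.Serialize(new { type = ""error"", error = new { message } });
        await clientWs.SendAsync(
            new ArraySegment<byte>(Encoding.UTF8.GetBytes(json)),
            WebSocketMessageType.Text,
            endOfMessage: true,
            CancellationToken.None);
    }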
}
";

Console.WriteLine("Complete Bidirectional Proxy Service:");
Console.WriteLine(new string('=', 50));
Console.WriteLine(proxyServiceCode);

In [None]:
// ASP.NET Core Program.cs Setup Pattern

var programCode = @"
using VoiceChat.Backend.Services;

var builder = WebApplication.CreateBuilder(args);

// Configuration
builder.Services.Configure<AzureOpenAISettings>(options =>
{
    options.Endpoint = Environment.GetEnvironmentVariable(""AZURE_ENDPOINT"") ?? """";
    options.ApiKey = Environment.GetEnvironmentVariable(""AZURE_API_KEY"") ?? """";
    options.RealtimeDeployment = Environment.GetEnvironmentVariable(""AZURE_REALTIME_DEPLOYMENT"") ?? ""gpt-4o-realtime"";
});

// Services
builder.Services.AddSingleton<SessionManager>();
builder.Services.AddSingleton<AzureRealtimeService>();
builder.Services.AddSingleton<AzureChatService>();

// CORS for local development
builder.Services.AddCors(options =>
{
    options.AddDefaultPolicy(policy =>
    {
        policy.WithOrigins(""http://localhost:3000"")
              .AllowAnyHeader()
              .AllowAnyMethod();
    });
});

var app = builder.Build();

// Middleware
app.UseCors();
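app.UseWebSockets(new WebSocketOptions
{
    KeepAliveInterval = TimeSpan.FromSeconds(30),
    // CORS does not apply to WebSocket handshakes; restrict browser origins here instead
    AllowedOrigins = { ""http://localhost:3000"" }
});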

// WebSocket endpoint for Voice mode
app.Map(""/ws/voice"", async (HttpContext context) =>
{
    if (!context.WebSockets.IsWebSocketRequest)
    {
        context.Response.StatusCode = 400;
        return;
    }
    
    var sessionManager = context.RequestServices.GetRequiredService<SessionManager>();
    var realtimeService = context.RequestServices.GetRequiredService<AzureRealtimeService>();
    
    // Create session
    var userId = context.Request.Query[""user""].FirstOrDefault() ?? ""anonymous"";
    var (success, sessionId, message) = sessionManager.CreateSession(userId, ""voice"");
    
    if (!success)
    {
        context.Response.StatusCode = 429;  // Too Many Requests
        return;
    }
    
    try
    {
        var webSocket = await context.WebSockets.AcceptWebSocketAsync();
        await realtimeService.HandleVoiceSession(webSocket, sessionId);
    }
    finally
    {
        sessionManager.RemoveSession(sessionId);
    }
});

// Health check endpoint
app.MapGet(""/health"", () => Results.Ok(new { status = ""healthy"" }));

app.Run();
";

Console.WriteLine("ASP.NET Core Program.cs Setup:");
Console.WriteLine(new string('=', 50));
Console.WriteLine(programCode);

---
## 📝 Summary

### Key Concepts Learned

1. **WebSocket in .NET**: Using `ClientWebSocket` and ASP.NET Core's `HttpContext.WebSockets`

2. **Bidirectional Proxy Pattern**: `Task.WhenAny` + `CancellationToken` for coordinated shutdown

3. **Audio Processing**: PCM16 at 24kHz, base64 encoded, using `ArraySegment<byte>`

4. **Session Management**: `ConcurrentDictionary` for thread-safe session tracking

5. **Message Protocol**: `System.Text.Json` for serialization with `JsonPropertyName` attributes

6. **Configuration**: Options pattern with `IOptions<T>` and environment variables

7. **Microsoft Agent Framework**: `AIAgent` and `AgentSession` for text mode chat with conversation memory

### .NET-Specific Advantages

- **Type Safety**: Strongly-typed message classes
- **Async/Await**: First-class support for asynchronous I/O
- **DI Container**: Clean service architecture
- **Performance**: Efficient memory management with `ArraySegment<T>`
- **Agent Framework**: Same patterns available in Python for cross-platform consistency

### Agent Framework Packages

```xml
<PackageReference Include="Microsoft.Agents.AI.AzureAI" Version="*-*" />
<PackageReference Include="Microsoft.Agents.AI.OpenAI" Version="*-*" />
<PackageReference Include="Microsoft.Extensions.AI" Version="*-*" />
```

### Next Steps

- Run the actual voice chat application to see these concepts in action
- Explore the browser-side code for audio capture and playback
- Experiment with different voice settings and system prompts
- Try adding tools/functions to the agent for enhanced capabilities

### Resources

- [Azure OpenAI Realtime API Documentation](https://learn.microsoft.com/azure/ai-services/openai/realtime-audio-quickstart)
- [ASP.NET Core WebSockets](https://learn.microsoft.com/aspnet/core/fundamentals/websockets)
- [Microsoft Agent Framework](https://github.com/microsoft/agent-framework)
- [System.Net.WebSockets Namespace](https://learn.microsoft.com/dotnet/api/system.net.websockets)