
# Minimal Local Intent Training with ML.NET (No Cloud)

**Goal:** Train an **offline** intent classifier for short user queries using **ML.NET**, save it as `intent-model.zip`, and show how to load and use it in your .NET app.  
This notebook runs entirely **locally** and installs required packages per‑cell via .NET Interactive.

**What you'll get:**
- A tiny labeled dataset (CSV) of queries → intents
- An ML.NET pipeline (featurize text → train classifier)
- Metrics (micro/macro accuracy, log loss)
- A saved model: `model/intent-model.zip`
- A simple **runtime** `IntentClassifier` you can drop into your app
- Notes that translate the C# concepts for JavaScript developers



## 0) Requirements

- **.NET 8 SDK** installed
- **VS Code** with the **Polyglot Notebooks** extension (formerly **.NET Interactive Notebooks**)
- This notebook uses per-cell package references (`#r "nuget: ..."`) so you don't need a `.csproj` here.


In [None]:

// Install packages just for this notebook session
#r "nuget: Microsoft.ML, 4.0.2"
#r "nuget: Microsoft.ML.FastTree, 4.0.2" // optional: only needed to try tree-based trainers; not used below
using System;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;


## 1) Create a tiny labeled dataset (CSV)

We’ll start with a **small** dataset you can expand later.  
Each row is `"Text,Intent"`. Keep intent names **UPPER_SNAKE_CASE** so they map nicely into enums later.


In [None]:

// Create a minimal dataset on disk: data/intents.csv
Directory.CreateDirectory("data");
var csv = """
Text,Intent
"what is rick's email",GET_CONTACT_INFO
"emails for morty and summer",GET_CONTACT_INFO
"show managers hired after 2020",FILTER_BY_ROLE_AND_DATE
"who works in engineering",SEARCH_BY_DEPARTMENT
"hired before 2024",FILTER_BY_HIRE_DATE
"filter by department sales",SEARCH_BY_DEPARTMENT
"list staff engineer emails",GET_CONTACT_INFO
"anyone hired after jan 2021",FILTER_BY_HIRE_DATE
"find directors in operations",SEARCH_BY_DEPARTMENT
"get contact for summer smith",GET_CONTACT_INFO
""";
File.WriteAllText("data/intents.csv", csv);
Console.WriteLine("Wrote data/intents.csv");
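
Before training, it helps to check how many examples each intent has. The sketch below is plain C# (no ML.NET) and inlines the same ten rows so it runs on its own. The `rows` and `counts` names are only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same rows as data/intents.csv above, inlined so this snippet is self-contained.
string[] rows =
{
    "\"what is rick's email\",GET_CONTACT_INFO",
    "\"emails for morty and summer\",GET_CONTACT_INFO",
    "\"show managers hired after 2020\",FILTER_BY_ROLE_AND_DATE",
    "\"who works in engineering\",SEARCH_BY_DEPARTMENT",
    "\"hired before 2024\",FILTER_BY_HIRE_DATE",
    "\"filter by department sales\",SEARCH_BY_DEPARTMENT",
    "\"list staff engineer emails\",GET_CONTACT_INFO",
    "\"anyone hired after jan 2021\",FILTER_BY_HIRE_DATE",
    "\"find directors in operations\",SEARCH_BY_DEPARTMENT",
    "\"get contact for summer smith\",GET_CONTACT_INFO",
};

// The label is everything after the last comma (the quoted text could itself contain commas).
Dictionary<string, int> counts = rows
    .Select(r => r[(r.LastIndexOf(',') + 1)..].Trim())
    .GroupBy(label => label)
    .ToDictionary(g => g.Key, g => g.Count());

// Print intents from most to least represented.
foreach (var (label, n) in counts.OrderByDescending(kv => kv.Value))
    Console.WriteLine($"{label,-26} {n}");
```

`FILTER_BY_ROLE_AND_DATE` has a single example, so after a train/test split it lands entirely on one side. That is a good reason to add more rows for it first.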



## 2) Train a model

**Pipeline (high level):**
1. Map the label (intent string) → key (required by many trainers)
2. Featurize text (`TextFeaturizingEstimator` → bag of words + n-grams, etc.)
3. Train a **multiclass classifier** (we’ll use `SdcaMaximumEntropy` for simplicity)
4. Map predicted key → string label

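We’ll also do a quick **train/test split** to see accuracy. With only 10 rows the test set holds just a few examples, so treat the numbers as a smoke test of the pipeline rather than a real measure of quality.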


In [None]:

// Data contracts (C# POCOs)
public class QueryRow
{
    [LoadColumn(0)] public string Text { get; set; } = "";
    [LoadColumn(1)] public string Intent { get; set; } = "";
}

public class IntentPrediction
{
    [ColumnName("PredictedLabel")] public string PredictedIntent { get; set; } = "";
    public float[] Score { get; set; } = Array.Empty<float>();
}

var ml = new MLContext(seed: 42);

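// Load CSV. The Text column is wrapped in double quotes, so enable quote handling;
// otherwise the quote characters become part of the text being featurized.
var dataPath = Path.Combine("data", "intents.csv");
IDataView data = ml.Data.LoadFromTextFile<QueryRow>(dataPath, hasHeader: true, separatorChar: ',', allowQuoting: true);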

// Train/Test split
var split = ml.Data.TrainTestSplit(data, testFraction: 0.25);

// Build pipeline
var pipeline =
    ml.Transforms.Conversion.MapValueToKey("Label", nameof(QueryRow.Intent))
    .Append(ml.Transforms.Text.FeaturizeText("Features", nameof(QueryRow.Text)))
    .Append(ml.MulticlassClassification.Trainers.SdcaMaximumEntropy(labelColumnName: "Label", featureColumnName: "Features"))
    .Append(ml.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

// Train
Console.WriteLine("Training model...");
var model = pipeline.Fit(split.TrainSet);

// Evaluate
var scored = model.Transform(split.TestSet);
var metrics = ml.MulticlassClassification.Evaluate(scored, labelColumnName: "Label", scoreColumnName: "Score");
Console.WriteLine($"MicroAcc={metrics.MicroAccuracy:F3}  MacroAcc={metrics.MacroAccuracy:F3}  LogLoss={metrics.LogLoss:F3}");
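
To make the three metrics concrete, here is a hand computation on a made-up two-class evaluation set. It is a sketch in plain C# and follows the usual definitions as I understand them: micro accuracy over all examples, macro accuracy as the unweighted mean of per-class accuracy, and log loss as the mean negative log-probability of the true class. The data and names (`examples`, `ArgMax`) are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical evaluation set: true class index plus predicted probability per class.
// Class 0 has three examples, class 1 has one. All numbers are made up.
(int Truth, double[] Probs)[] examples =
{
    (0, new[] { 0.9, 0.1 }),
    (0, new[] { 0.6, 0.4 }),
    (0, new[] { 0.3, 0.7 }),   // misclassified as class 1
    (1, new[] { 0.2, 0.8 }),
};

// Index of the largest probability, i.e. the predicted class.
int ArgMax(double[] xs) => Array.IndexOf(xs, xs.Max());

// Micro accuracy: fraction of all examples classified correctly.
double micro = examples.Average(e => ArgMax(e.Probs) == e.Truth ? 1.0 : 0.0);

// Macro accuracy: accuracy within each true class, then an unweighted mean over classes.
double macro = examples
    .GroupBy(e => e.Truth)
    .Average(g => g.Average(e => ArgMax(e.Probs) == e.Truth ? 1.0 : 0.0));

// Log loss: mean negative log of the probability assigned to the true class.
double logLoss = examples.Average(e => -Math.Log(e.Probs[e.Truth]));

Console.WriteLine($"Micro={micro:F3}  Macro={macro:F3}  LogLoss={logLoss:F3}");
```

Here micro accuracy is 3/4 = 0.75, while macro accuracy is (2/3 + 1) / 2 ≈ 0.833 because the small class counts as much as the large one. When intents are imbalanced, the gap between the two is worth watching.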



## 3) Save the trained model to `intent-model.zip`


In [None]:

Directory.CreateDirectory("model");
var modelPath = Path.Combine("model", "intent-model.zip");
ml.Model.Save(model, split.TrainSet.Schema, modelPath);
Console.WriteLine($"Saved -> {modelPath}");



## 4) Quick inference demo (using the in-memory model)

We’ll predict an intent for a new query and show raw scores for each class.


In [None]:

var engine = ml.Model.CreatePredictionEngine<QueryRow, IntentPrediction>(model);

string[] tests =
{
    "find emails for summer and morty",
    "who are the managers hired after 2021",
    "show engineering staff",
    "hired before 2020 in sales"
};

foreach (var t in tests)
{
    var pred = engine.Predict(new QueryRow { Text = t });
    Console.WriteLine($"{t}");
    Console.WriteLine($"  Predicted: {pred.PredictedIntent}");
    if (pred.Score != null && pred.Score.Length > 0)
        Console.WriteLine($"  Scores: [{string.Join(", ", pred.Score.Select(s => s.ToString("F2")))}]");
    Console.WriteLine();
}
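
The raw `Score` array is easier to read once each value is paired with its label. The sketch below uses hypothetical labels and scores in plain C#. In a real run the order of `Score[]` follows the model's label-key mapping, not your enum order; ML.NET typically exposes that order through the `SlotNames` annotation on the `Score` column, but check this against your version.

```csharp
using System;
using System.Linq;

// Hypothetical label order and scores, for illustration only.
string[] labels =
{
    "GET_CONTACT_INFO", "FILTER_BY_ROLE_AND_DATE", "SEARCH_BY_DEPARTMENT", "FILTER_BY_HIRE_DATE"
};
float[] scores = { 0.12f, 0.55f, 0.08f, 0.25f };

// Pair each label with its score and sort from most to least likely.
var ranked = labels
    .Zip(scores, (label, score) => (Label: label, Score: score))
    .OrderByDescending(x => x.Score)
    .ToArray();

foreach (var (label, score) in ranked)
    Console.WriteLine($"{label,-26} {score:F2}");

// The gap between the top two scores is a cheap signal for ambiguous queries.
float margin = ranked[0].Score - ranked[1].Score;
Console.WriteLine($"Top: {ranked[0].Label}  margin over runner-up: {margin:F2}");
```

A small margin suggests the query sits between two intents, which can be a better trigger for a clarification prompt than the top score alone.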



## 5) Load the saved model (from `intent-model.zip`) and predict

This simulates how your **app** will load the model at runtime.


In [None]:

// The path overload opens and closes the file itself, so no stream handling is needed.
var loadedModel = ml.Model.Load(Path.Combine("model", "intent-model.zip"), out var schema);
var loadedEngine = ml.Model.CreatePredictionEngine<QueryRow, IntentPrediction>(loadedModel);

var p = loadedEngine.Predict(new QueryRow { Text = "get contact info for rick sanchez" });
Console.WriteLine($"Predicted: {p.PredictedIntent}");



## 6) A tiny reusable `IntentClassifier` (drop this into your app)

This class wraps loading and predicting, adds a **confidence threshold**, and maps the string label into a C# enum.


In [None]:

public enum Intent
{
    GET_CONTACT_INFO,
    FILTER_BY_HIRE_DATE,
    SEARCH_BY_DEPARTMENT,
    FILTER_BY_ROLE_AND_DATE,
    UNKNOWN
}

public sealed class IntentClassifier
{
    private readonly PredictionEngine<QueryRow, IntentPrediction> _engine;
    private readonly float _minConfidence;

    public IntentClassifier(string modelPath, float minConfidence = 0.60f)
    {
        var ml = new MLContext();
        using var fs = File.OpenRead(modelPath);
        var model = ml.Model.Load(fs, out _);
        _engine = ml.Model.CreatePredictionEngine<QueryRow, IntentPrediction>(model);
        _minConfidence = minConfidence;
    }

    // Note: PredictionEngine is not thread-safe; do not share one classifier across threads.
    public (Intent intent, float confidence) Predict(string text)
    {
        var pred = _engine.Predict(new QueryRow { Text = text });

        // PredictedLabel is the class with the highest score, so the maximum of Score[]
        // is the predicted class's probability. This avoids relying on the order of
        // Score[], which follows the model's label-key mapping rather than the enum.
        var confidence = pred.Score is { Length: > 0 } ? pred.Score.Max() : 0f;

        if (confidence < _minConfidence) return (Intent.UNKNOWN, confidence);

        if (Enum.TryParse<Intent>(pred.PredictedIntent, out var parsed))
            return (parsed, confidence);

        return (Intent.UNKNOWN, confidence);
    }
}

// Demo:
var clf = new IntentClassifier(Path.Combine("model","intent-model.zip"), minConfidence: 0.65f);
var (intent, conf) = clf.Predict("show managers hired after 2020");
Console.WriteLine($"Intent: {intent}  Confidence: {conf:F2}");



## 7) C# for JS developers — quick translation notes

- **Classes/POCOs**: `QueryRow` and `IntentPrediction` play the role of TypeScript interfaces, but they are real compiled classes that exist at runtime. Attributes like `[LoadColumn(0)]` tell ML.NET how to map CSV columns.
- **`MLContext`**: Think of it like creating a service container/entry point for ML pipelines.
- **Pipelines**: Similar to composing functions/middlewares.  
  ```text
  MapValueToKey -> FeaturizeText -> TrainClassifier -> MapKeyToValue
  ```
- **Train/Test split**: Same idea as JS ML libraries; we keep some data to evaluate model performance.
- **Saving model**: `ml.Model.Save(...)` writes a portable ZIP. Your app loads this at runtime—no cloud.
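- **`PredictionEngine`**: A convenience helper for one-at-a-time predictions. It is not thread-safe, so don't share one instance across threads. For batch scoring, use `Transform` on an `IDataView`; for web apps, use a `PredictionEnginePool` (from the `Microsoft.Extensions.ML` package).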
- **Enums**: We keep intents as an `enum` to avoid stringly-typed code at runtime.
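- **Confidence**: We read `Score[]` and take the entry for the predicted label, which is also the largest value. `Score[]` is ordered by the model's label-key mapping, not by your enum. Use a threshold to decide when to fall back to rules or ask for clarification.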

### Where do slots/entities fit?
- Keep **dates/numbers** deterministic with `Microsoft.Recognizers.Text` (e.g., “before 2024” → operator `<`, date `2024-01-01`).
- Use **controlled vocab + fuzzy match** for department/role/name canonicalization.
- Let the **intent model** decide the action; let **slot extractors** provide parameters.
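
As a rough illustration of a deterministic slot extractor, here is a minimal sketch that uses a plain regular expression instead of `Microsoft.Recognizers.Text`. The helper name `ExtractHireDateFilter` is hypothetical, it only handles "before/after" followed by a four-digit year, and mapping "after 2020" to `> 2020-12-31` is an assumed convention.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts a comparison operator and boundary date from
// phrases like "before 2024" or "after 2020". Returns null when nothing matches.
(string Op, DateOnly Date)? ExtractHireDateFilter(string text)
{
    var m = Regex.Match(text, @"\b(before|after)\s+((?:19|20)\d{2})\b", RegexOptions.IgnoreCase);
    if (!m.Success) return null;

    int year = int.Parse(m.Groups[2].Value);
    return m.Groups[1].Value.ToLowerInvariant() == "before"
        ? ("<", new DateOnly(year, 1, 1))      // before 2024 -> < 2024-01-01
        : (">", new DateOnly(year, 12, 31));   // after 2020  -> > 2020-12-31 (assumed convention)
}

foreach (var q in new[] { "hired before 2024", "show managers hired after 2020", "who works in engineering" })
{
    var slot = ExtractHireDateFilter(q);
    Console.WriteLine(slot is { } s ? $"{q} -> {s.Op} {s.Date:yyyy-MM-dd}" : $"{q} -> (no date filter)");
}
```

A pattern this narrow misses phrasings such as "after jan 2021", which is where a dedicated recognizer library earns its place. The point is only that the same input always yields the same operator and date.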



## 8) Next steps (expand as you learn)

- Add more intents and **20–50 examples per intent** to improve accuracy.
- Version models: `model/intent-model.v1.zip`, `v2`, etc., and keep a small README with metrics.
- Wire the classifier into your service and route **low-confidence** predictions to a clarification prompt or a narrow LLM fallback.
- Keep slot extraction deterministic to avoid LLM math/date mistakes.
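
Routing on confidence can be sketched in a few lines. The action names and the 0.65 default threshold below are placeholders, and `Route` is a hypothetical helper, not part of ML.NET.

```csharp
using System;

// Hypothetical router: maps a predicted intent plus confidence to an action name.
string Route(string intent, float confidence, float minConfidence = 0.65f)
{
    // Low-confidence or unknown predictions go to a clarification step
    // (or a narrow fallback) instead of being executed.
    if (intent == "UNKNOWN" || confidence < minConfidence)
        return "ASK_FOR_CLARIFICATION";

    return intent switch
    {
        "GET_CONTACT_INFO"        => "LOOKUP_CONTACT",
        "FILTER_BY_HIRE_DATE"     => "FILTER_EMPLOYEES",
        "FILTER_BY_ROLE_AND_DATE" => "FILTER_EMPLOYEES",
        "SEARCH_BY_DEPARTMENT"    => "SEARCH_DEPARTMENT",
        _                         => "ASK_FOR_CLARIFICATION",
    };
}

Console.WriteLine(Route("GET_CONTACT_INFO", 0.91f));    // LOOKUP_CONTACT
Console.WriteLine(Route("FILTER_BY_HIRE_DATE", 0.40f)); // ASK_FOR_CLARIFICATION
```

Keeping the threshold as a parameter makes it easy to tune against logged predictions once the classifier is wired into a service.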
