# ML.NET: Intent Classification + Slot Extraction (C#)

Welcome! This notebook gets you from **zero** to a **trained classifier** for intents, with clear comments aimed at a JS/React dev.

### What you'll do
1. Install packages right from the notebook
2. Create a tiny labeled dataset of **queries ➜ intent**
3. Train a **multi-class text classifier** (ML.NET)
4. Evaluate it and make predictions
5. Do basic **slot extraction** with `Microsoft.Recognizers.Text`

**Kernel:** Use the **.NET (C#)** kernel (aka `dotnet-csharp`). If you don't see it, install the **Polyglot Notebooks** extension (formerly **.NET Interactive Notebooks**) in VS Code.


In [1]:
// ML.NET: Intent Classification + Slot Extraction (C#)
// Combined setup (original cells 1–5):
// - Notebook intro (summary)
// - Install NuGet packages
// - Usings
// - Environment sanity prints

#r "nuget: Microsoft.Recognizers.Text, 1.8.13"
#r "nuget: Microsoft.Recognizers.Text.DateTime, 1.8.13"
#r "nuget: Microsoft.ML, 4.0.2"
#r "nuget: Microsoft.ML.FastTree, 4.0.2"
#r "nuget: Microsoft.ML.LightGbm, 4.0.2"

using System;
using System.IO;
using System.Linq;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms.Text;
using Microsoft.ML.Trainers;
using Microsoft.Recognizers.Text;
using Microsoft.Recognizers.Text.DateTime;
using Microsoft.Recognizers.Text.Number;

Console.WriteLine("Recognizers OK");
Console.WriteLine("DateTime OK");
Console.WriteLine(Environment.UserName);
Console.WriteLine("ML.NET OK");
Console.WriteLine("Packages loaded ✔");

Recognizers OK
DateTime OK
maneki-neko
ML.NET OK
Packages loaded ✔


## 1) Define data models
Think of this like TypeScript interfaces:
- **`QueryRecord`**: training rows with `Text` and `Label`
- **`IntentPrediction`**: output shape from the model


In [2]:
public class QueryRecord
{
    // Raw user query text
    public string Text { get; set; } = string.Empty;

    // Intent label (string) – e.g. "GET_CONTACT_INFO"
    public string Label { get; set; } = string.Empty;
}

public class IntentPrediction
{
    // The predicted label (string) after mapping from key to label
    [ColumnName("PredictedLabel")]
    public string PredictedLabel { get; set; } = string.Empty;

    // Raw scores per class – useful for debugging
    public float[] Score { get; set; } = Array.Empty<float>();
}

Console.WriteLine("Models defined ✔");


Models defined ✔


## 2) Create a tiny labeled dataset
This is your seed data. In a real app, you'll grow this set over time as you see real user queries.


In [3]:
var seed = new List<QueryRecord>
{
    // GET_CONTACT_INFO
    new() { Text = "what is rick's email?", Label = "GET_CONTACT_INFO" },
    new() { Text = "show me morty's email address", Label = "GET_CONTACT_INFO" },
    new() { Text = "give me summer's contact info", Label = "GET_CONTACT_INFO" },
    new() { Text = "emails for rick and morty", Label = "GET_CONTACT_INFO" },
    new() { Text = "how do I contact beth", Label = "GET_CONTACT_INFO" },

    // FILTER_BY_HIRE_DATE
    new() { Text = "employees hired before 2021", Label = "FILTER_BY_HIRE_DATE" },
    new() { Text = "anyone joined after 2020?", Label = "FILTER_BY_HIRE_DATE" },
    new() { Text = "show hires between 2019 and 2022", Label = "FILTER_BY_HIRE_DATE" },
    new() { Text = "who started prior to 2020", Label = "FILTER_BY_HIRE_DATE" },
    new() { Text = "hire date after jan 2023", Label = "FILTER_BY_HIRE_DATE" },

    // FILTER_BY_ROLE
    new() { Text = "list engineers", Label = "FILTER_BY_ROLE" },
    new() { Text = "show managers in any department", Label = "FILTER_BY_ROLE" },
    new() { Text = "find staff engineers", Label = "FILTER_BY_ROLE" },
    new() { Text = "who are the directors?", Label = "FILTER_BY_ROLE" },
    new() { Text = "any senior engineers?", Label = "FILTER_BY_ROLE" },

    // SEARCH_BY_DEPARTMENT
    new() { Text = "engineering team members", Label = "SEARCH_BY_DEPARTMENT" },
    new() { Text = "folks in hr", Label = "SEARCH_BY_DEPARTMENT" },
    new() { Text = "anyone from finance dept?", Label = "SEARCH_BY_DEPARTMENT" },
    new() { Text = "people working in marketing", Label = "SEARCH_BY_DEPARTMENT" },
    new() { Text = "show sales department", Label = "SEARCH_BY_DEPARTMENT" },
};

Console.WriteLine($"Seed rows: {seed.Count}");


Seed rows: 20


## 3) Build the ML pipeline

Overview
1. `MapValueToKey` — convert string labels to numeric keys
2. `FeaturizeText` — turn text into numeric features (uses sensible defaults)
3. `SdcaMaximumEntropy` — train a multi-class classifier
4. `MapKeyToValue` — convert the predicted key back to the original label

### Step 1 — Label to key
- ML.NET's multiclass trainers expect the label column to be a key type (a numeric class index), so string labels are mapped to keys first.
- Example mapping: `GET_CONTACT_INFO → 0`, `FILTER_BY_HIRE_DATE → 1`, ...
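
As a rough mental model of the label-to-key step, the sketch below assigns each distinct label an index in order of first appearance and keeps a reverse lookup for turning predictions back into strings. It is a conceptual illustration in plain C#, not ML.NET's internal implementation; ML.NET's key type has its own conventions (for example around missing values), so treat the exact indices as illustrative.

```csharp
using System;
using System.Collections.Generic;

// Conceptual sketch only: this is NOT how ML.NET implements MapValueToKey internally.
var labels = new[]
{
    "GET_CONTACT_INFO", "FILTER_BY_HIRE_DATE", "GET_CONTACT_INFO",
    "FILTER_BY_ROLE", "SEARCH_BY_DEPARTMENT", "FILTER_BY_ROLE"
};

var labelToKey = new Dictionary<string, int>(); // string label -> numeric index
var keyToLabel = new List<string>();            // numeric index -> string label

foreach (var label in labels)
{
    // Assign the next free index the first time a label is seen.
    if (!labelToKey.ContainsKey(label))
    {
        labelToKey[label] = keyToLabel.Count;
        keyToLabel.Add(label);
    }
}

for (int key = 0; key < keyToLabel.Count; key++)
    Console.WriteLine($"{keyToLabel[key]} -> {key}");
```

The reverse list (`keyToLabel`) plays the role that `MapKeyToValue` plays at the end of the pipeline.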

### Step 2 — Text featurization
- `FeaturizeText` tokenizes text and builds features (word/character based) using built-in defaults.
- In ML.NET 4.x, pass column names as method parameters:
  - `outputColumnName: "Features"`
  - `inputColumnName: nameof(QueryRecord.Text)`
- You don’t need to specify `WordBagEstimator`/`CharBagEstimator` manually; defaults are usually strong.

Example
```
Text:    "show me the department list"
Unigrams: ["show","me","the","department","list"]
Bigrams:  ["show me","me the","the department","department list"]
```
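
The unigram/bigram example above can be reproduced with a few lines of plain C#. This is a sketch for intuition only: whitespace tokenization and the helper name `Ngrams` are assumptions made here, and `FeaturizeText` does more (normalization, character n-grams, weighting) than this shows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Return every contiguous window of n tokens, joined with a space.
List<string> Ngrams(IReadOnlyList<string> tokens, int n)
{
    var result = new List<string>();
    for (int i = 0; i + n <= tokens.Count; i++)
        result.Add(string.Join(" ", tokens.Skip(i).Take(n)));
    return result;
}

// Simplifying assumption: split on spaces only.
var tokens = "show me the department list".Split(' ', StringSplitOptions.RemoveEmptyEntries);

var unigrams = Ngrams(tokens, 1);
var bigrams = Ngrams(tokens, 2);

Console.WriteLine(string.Join(" | ", unigrams)); // show | me | the | department | list
Console.WriteLine(string.Join(" | ", bigrams));  // show me | me the | the department | department list
```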

Note
- Older samples may set `TextFeaturizingEstimator.Options.OutputColumnName/InputColumnName` or use `CharBagEstimator`.
- In ML.NET 4.x, provide column names via parameters to `FeaturizeText(...)`, and some older option types are no longer needed.

### Step 3 — Train the classifier
- Multi-class = choose one label out of many.
- `SdcaMaximumEntropy` is a strong, fast baseline for text.
- Alternatives: `LbfgsMaximumEntropy`, `LightGbm`.
- Reproducibility: `new MLContext(seed: 42)`.
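
To make "choose one label out of many" concrete: a maximum-entropy (multinomial logistic) classifier produces one raw score per class, turns the scores into probabilities with softmax, and picks the largest. The sketch below shows only that final step in plain C#; the scores are made-up numbers, not output from the notebook's model.

```csharp
using System;
using System.Linq;

var labels = new[] { "GET_CONTACT_INFO", "FILTER_BY_HIRE_DATE", "FILTER_BY_ROLE", "SEARCH_BY_DEPARTMENT" };

// Hypothetical raw scores (one per class); illustrative values only.
var rawScores = new[] { 2.0, 0.5, 0.1, -1.0 };

// Softmax: subtract the max for numerical stability, exponentiate, then normalize to sum to 1.
double max = rawScores.Max();
var exps = rawScores.Select(z => Math.Exp(z - max)).ToArray();
double total = exps.Sum();
var probs = exps.Select(x => x / total).ToArray();

// The predicted class is the one with the highest probability.
int best = Array.IndexOf(probs, probs.Max());
Console.WriteLine($"Predicted: {labels[best]} (p={probs[best]:F3})");
```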

### Step 4 — Key back to label
- Predictions are numeric keys; map them back to original strings for display and downstream logic.

### Minimal pipeline sketch (ML.NET 4.x)
```csharp
var pipeline =
    ml.Transforms.Conversion.MapValueToKey(
        inputColumnName: nameof(QueryRecord.Label),
        outputColumnName: "Label")
      .Append(ml.Transforms.Text.FeaturizeText(
        outputColumnName: "Features",
        inputColumnName: nameof(QueryRecord.Text)))
      .Append(ml.MulticlassClassification.Trainers.SdcaMaximumEntropy(
        labelColumnName: "Label",
        featureColumnName: "Features"))
      .Append(ml.Transforms.Conversion.MapKeyToValue(
        outputColumnName: nameof(IntentPrediction.PredictedLabel),
        inputColumnName: "PredictedLabel"));
```

### Optional: explicit n-gram features (advanced)
If you want full control over word/char n-grams, build them explicitly and concatenate:

```csharp
var textNorm   = "TextNorm";
var wordTokens = "WordTokens";
var wordKeys   = "WordKeys";     // word tokens mapped to keys
var wordNgrams = "WordNgrams";   // word uni/bi-grams
var charTokens = "CharTokens";
var char3grams = "Char3Grams";   // character tri-grams

var textFeatures =
    ml.Transforms.Text.NormalizeText(
        outputColumnName: textNorm,
        inputColumnName: nameof(QueryRecord.Text))
      .Append(ml.Transforms.Text.TokenizeIntoWords(
        outputColumnName: wordTokens,
        inputColumnName: textNorm))
      // ProduceNgrams expects a vector of keys, so map the string tokens to keys first.
      .Append(ml.Transforms.Conversion.MapValueToKey(
        outputColumnName: wordKeys,
        inputColumnName: wordTokens))
      .Append(ml.Transforms.Text.ProduceNgrams(
        outputColumnName: wordNgrams,
        inputColumnName: wordKeys,
        ngramLength: 2,
        useAllLengths: true))
      .Append(ml.Transforms.Text.TokenizeIntoCharactersAsKeys(
        outputColumnName: charTokens,
        inputColumnName: textNorm))
      .Append(ml.Transforms.Text.ProduceNgrams(
        outputColumnName: char3grams,
        inputColumnName: charTokens,
        ngramLength: 3,
        useAllLengths: false))
      .Append(ml.Transforms.Concatenate(
        outputColumnName: "Features",
        inputColumnNames: new[] { wordNgrams, char3grams }));

var pipeline =
    ml.Transforms.Conversion.MapValueToKey(
        inputColumnName: nameof(QueryRecord.Label),
        outputColumnName: "Label")
      .Append(textFeatures)
      .Append(ml.MulticlassClassification.Trainers.SdcaMaximumEntropy(
        labelColumnName: "Label",
        featureColumnName: "Features"))
      .Append(ml.Transforms.Conversion.MapKeyToValue(
        outputColumnName: nameof(IntentPrediction.PredictedLabel),
        inputColumnName: "PredictedLabel"));
```

- Word n-grams: `ngramLength: 2`, `useAllLengths: true` → unigrams + bigrams.
- Char n-grams: `ngramLength: 3`, `useAllLengths: false` → strictly tri-grams.
- This mirrors the intent of the older Options example but with ML.NET 4.x APIs.
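
Why bother with character tri-grams at all? They let near-identical words such as "engineer" and "engineers" share most of their features. The plain C# sketch below illustrates that overlap; the helper name `CharNgrams` is hypothetical, and ML.NET's character tokenizer has details (such as optional marker characters) that this ignores.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Return every contiguous run of n characters in the text.
List<string> CharNgrams(string text, int n)
{
    var result = new List<string>();
    for (int i = 0; i + n <= text.Length; i++)
        result.Add(text.Substring(i, n));
    return result;
}

var singular = CharNgrams("engineer", 3);   // eng, ngi, gin, ine, nee, eer
var plural = CharNgrams("engineers", 3);    // same six, plus "ers"

// Count how many tri-grams the two words have in common.
int shared = singular.Intersect(plural).Count();
Console.WriteLine($"Shared tri-grams: {shared} of {plural.Count}");
```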

In [4]:
// The answer to the Ultimate Question of Life, the Universe, and Everything: 42.
var ml = new MLContext(seed: 42);

// Load the in-memory list (seed is a List<QueryRecord>) as an IDataView.
// IDataView is ML.NET's core tabular data abstraction, loosely comparable to a DataFrame;
// trainers consume it rather than plain lists or arrays.
var data = ml.Data.LoadFromEnumerable(seed);

// Train/test split
// - Training set: used to fit the model.
// - Test set: held out to estimate how well the model generalizes to unseen data.
// - testFraction: 0.25 holds out roughly 25% of the rows for testing.
var split = ml.Data.TrainTestSplit(data, testFraction: 0.25);

// Pipeline
var pipeline = ml.Transforms.Conversion.MapValueToKey(
                        inputColumnName: nameof(QueryRecord.Label),
                        outputColumnName: "Label")
              .Append(ml.Transforms.Text.FeaturizeText(
                        outputColumnName: "Features",
                        inputColumnName: nameof(QueryRecord.Text)))
              .Append(ml.MulticlassClassification.Trainers.SdcaMaximumEntropy(
                        labelColumnName: "Label", featureColumnName: "Features"))
              .Append(ml.Transforms.Conversion.MapKeyToValue(
                        outputColumnName: nameof(IntentPrediction.PredictedLabel),
                        inputColumnName: "PredictedLabel"));

ITransformer model;
try
{
    model = pipeline.Fit(split.TrainSet);
    Console.WriteLine("Model trained ✔");
}
catch (Exception ex)
{
    Console.WriteLine("Training failed: " + ex.Message);
    throw;
}

Console.WriteLine("Model trained ✔");

Model trained ✔
Model trained ✔


## 4) Evaluate the model
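Micro accuracy is the fraction of all test rows classified correctly; macro accuracy averages the per-class accuracy, so every intent counts equally regardless of how many rows it has. With tiny seed data (a 25% split of 20 rows leaves only about five test rows), don't expect the numbers to mean much: the point is to see the end-to-end flow.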
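
To see how micro and macro accuracy can diverge, here is a small plain C# sketch over hand-written (actual, predicted) pairs. The rows are hypothetical and are not the notebook's real test set. With these rows, micro accuracy is 4/5 = 0.8, while macro accuracy is (1 + 0 + 1)/3 ≈ 0.667 because the single misclassified `FILTER_BY_ROLE` row drags its whole class to zero.

```csharp
using System;
using System.Linq;

// Hypothetical evaluation rows, for illustration only.
var rows = new (string Actual, string Predicted)[]
{
    ("GET_CONTACT_INFO", "GET_CONTACT_INFO"),
    ("GET_CONTACT_INFO", "GET_CONTACT_INFO"),
    ("GET_CONTACT_INFO", "GET_CONTACT_INFO"),
    ("FILTER_BY_ROLE", "SEARCH_BY_DEPARTMENT"),       // the only mistake
    ("SEARCH_BY_DEPARTMENT", "SEARCH_BY_DEPARTMENT"),
};

// Micro: pool every row, then take the fraction that is correct.
double micro = rows.Count(r => r.Actual == r.Predicted) / (double)rows.Length;

// Macro: accuracy within each actual class, then an unweighted mean across classes.
double macro = rows
    .GroupBy(r => r.Actual)
    .Select(g => g.Count(r => r.Actual == r.Predicted) / (double)g.Count())
    .Average();

Console.WriteLine($"Micro: {micro:F3}, Macro: {macro:F3}");
```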


In [5]:
var testPredictions = model.Transform(split.TestSet);
var metrics = ml.MulticlassClassification.Evaluate(testPredictions, labelColumnName: "Label", scoreColumnName: "Score");

Console.WriteLine($"MicroAccuracy: {metrics.MicroAccuracy:F3}");
Console.WriteLine($"MacroAccuracy: {metrics.MacroAccuracy:F3}");
Console.WriteLine($"LogLoss:       {metrics.LogLoss:F3}");
Console.WriteLine($"PerClassLogLoss: [{string.Join(", ", metrics.PerClassLogLoss.Select(v => v.ToString("F3")))}]");


MicroAccuracy: 1.000
MacroAccuracy: 1.000
LogLoss:       0.422
PerClassLogLoss: [0.000, 0.384, 0.000, 0.498]
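
Accuracy can be perfect while log-loss is still above zero, because log-loss measures how confident the model was in the correct class, not just whether it picked it. A minimal plain C# sketch of the idea, using made-up probabilities rather than the notebook's actual scores:

```csharp
using System;
using System.Linq;

// Hypothetical probabilities the model assigned to the CORRECT class for four test rows.
var probabilityOfTrueClass = new[] { 0.9, 0.6, 0.99, 0.5 };

// Log-loss: the average of -ln(p) over rows, where p is the probability given to the true class.
// A fully confident, correct prediction (p = 1) contributes 0; hesitant ones contribute more.
double logLoss = probabilityOfTrueClass.Select(p => -Math.Log(p)).Average();

Console.WriteLine($"LogLoss: {logLoss:F3}");
```

Every row here would count as correct if its class had the highest probability, yet the rows with p = 0.6 and p = 0.5 still push the loss up.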


## 5) Use the model for predictions
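Now we simulate new user queries. The model tends to handle rephrasing better than simple `if (text.Contains(...))` rules, though with only 20 training rows its predictions on unfamiliar wording should be treated with caution.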
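
For contrast, here is what a keyword-rule baseline might look like, and where it falls over. The rules and the `UNKNOWN` fallback label are hypothetical, written only to illustrate the brittleness; they are not part of the notebook's pipeline.

```csharp
using System;

// Hypothetical keyword rules: each intent fires only if its exact keyword appears.
string RuleBasedIntent(string text)
{
    var t = text.ToLowerInvariant();
    if (t.Contains("email")) return "GET_CONTACT_INFO";
    if (t.Contains("hired")) return "FILTER_BY_HIRE_DATE";
    return "UNKNOWN";
}

Console.WriteLine(RuleBasedIntent("what is rick's email?"));          // GET_CONTACT_INFO
Console.WriteLine(RuleBasedIntent("morty's contact details please")); // UNKNOWN (no "email" keyword)
Console.WriteLine(RuleBasedIntent("who started before 2020?"));       // UNKNOWN (no "hired" keyword)
```

Each missed phrasing needs another hand-written rule, whereas a trained classifier can pick up on overlapping words and character patterns from the examples.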


In [6]:
var engine = ml.Model.CreatePredictionEngine<QueryRecord, IntentPrediction>(model);

string[] samples = new []
{
    "emails for rick and summer",
    "who started before 2020?",
    "any senior managers?",
    "show folks in engineering",
    "morty's contact details please"
};

foreach (var text in samples)
{
    var pred = engine.Predict(new QueryRecord { Text = text });
    Console.WriteLine($"{text} -> {pred.PredictedLabel}");
}


emails for rick and summer -> GET_CONTACT_INFO
who started before 2020? -> FILTER_BY_HIRE_DATE
any senior managers? -> FILTER_BY_ROLE
show folks in engineering -> SEARCH_BY_DEPARTMENT
morty's contact details please -> GET_CONTACT_INFO


## 6) Save & load the model (production-style)
You'd do this to ship the trained model with your app or cache it locally.


In [7]:
var modelPath = Path.Combine(Directory.GetCurrentDirectory(), "intent_model.zip");
using (var fs = File.Create(modelPath))
{
    ml.Model.Save(model, split.TrainSet.Schema, fs);
}

Console.WriteLine($"Saved: {modelPath}");

// Reload
ITransformer reloadedModel;
using (var fs = File.OpenRead(modelPath))
{
    reloadedModel = ml.Model.Load(fs, out var schema);
}

var engine2 = ml.Model.CreatePredictionEngine<QueryRecord, IntentPrediction>(reloadedModel);
var check = engine2.Predict(new QueryRecord { Text = "list engineers" });
Console.WriteLine($"Reloaded model prediction: {check.PredictedLabel}");


Saved: /Users/maneki-neko/learning/NLP/intent_model.zip
Reloaded model prediction: FILTER_BY_ROLE


## 7) Slot extraction with Microsoft.Recognizers.Text
For structured values like dates and numbers, prefer deterministic extractors.
Below are quick examples:


In [8]:
// Helper to pretty print extractor results
void Dump(string title, IEnumerable<ModelResult> results)
{
    Console.WriteLine($"\n== {title} ==");
    foreach (var r in results)
    {
        var len = r.Text?.Length ?? 0;
        Console.WriteLine($"Text='{r.Text}' Type={r.TypeName} Value={(r.Resolution != null && r.Resolution.ContainsKey("value") ? r.Resolution["value"] : "")} Start={r.Start} Len={len}");
        if (r.Resolution != null)
        {
            foreach (var kv in r.Resolution)
                Console.WriteLine($"  {kv.Key}: {kv.Value}");
        }
    }
}

// Example query strings
var q1 = "employees hired before 2024 in engineering";
var q2 = "show hires between 2019 and 2022";
var q3 = "anyone joined after 2020?";

// DateTime model (English)
var dtModel = new DateTimeRecognizer(Culture.English).GetDateTimeModel();
Dump(q1, dtModel.Parse(q1));
Dump(q2, dtModel.Parse(q2));
Dump(q3, dtModel.Parse(q3));

// Number model example
var numModel = new NumberRecognizer(Culture.English).GetNumberModel();
Dump("numbers in: 'list top 3 engineers'", numModel.Parse("list top 3 engineers"));


== employees hired before 2024 in engineering ==
Text='before 2024' Type=datetimeV2.daterange Value= Start=16 Len=11
  values: System.Collections.Generic.List`1[System.Collections.Generic.Dictionary`2[System.String,System.String]]

== show hires between 2019 and 2022 ==
Text='between 2019 and 2022' Type=datetimeV2.daterange Value= Start=11 Len=21
  values: System.Collections.Generic.List`1[System.Collections.Generic.Dictionary`2[System.String,System.String]]

== anyone joined after 2020? ==
Text='after 2020' Type=datetimeV2.daterange Value= Start
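
The `values:` lines above print a .NET type name because the resolution entry is a nested list of dictionaries and `ToString()` does not expand it. One way to render it readably is sketched below. `FormatResolutionValue` is a hypothetical helper, and the sample data is a hand-built stand-in shaped like a date-range resolution (its field values are illustrative), so the block runs without the recognizer packages.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Expand nested "values" lists instead of printing their type name;
// anything else falls back to ToString().
string FormatResolutionValue(object value)
{
    if (value is IEnumerable<Dictionary<string, string>> items)
    {
        return string.Join("; ", items.Select(d =>
            "{" + string.Join(", ", d.Select(kv => $"{kv.Key}={kv.Value}")) + "}"));
    }
    return value?.ToString() ?? "";
}

// Hand-built stand-in for a resolution entry; not captured recognizer output.
var sample = new List<Dictionary<string, string>>
{
    new() { ["type"] = "daterange", ["start"] = "2019-01-01", ["end"] = "2022-01-01" }
};

Console.WriteLine(FormatResolutionValue(sample));
```

Inside `Dump`, calling this helper on `kv.Value` in place of printing `kv.Value` directly would show the start and end fields.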

## 8) Putting it together (mini pipeline)
Here's a tiny function that calls the classifier **and** tries a bit of slot extraction.
For real apps, you'd expand the role/department dictionaries and add more robust date handling.
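
One piece of "more robust date handling" worth sketching: open-ended phrases such as "before 2024" or "after 2020" may resolve with only an `end` or only a `start` field, so code that requires both will drop them. That resolution shape is an assumption here, so inspect the recognizer's actual output before relying on it. The hypothetical helper `ParseRange` below keeps whichever side is present.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Parse "start" and "end" independently so a one-sided range is not discarded.
(DateTime? Start, DateTime? End) ParseRange(IReadOnlyDictionary<string, string> v)
{
    DateTime? Parse(string key) =>
        v.TryGetValue(key, out var s) &&
        DateTime.TryParse(s, CultureInfo.InvariantCulture, DateTimeStyles.None, out var d)
            ? d
            : (DateTime?)null;

    return (Parse("start"), Parse("end"));
}

// Hand-built stand-ins for resolution entries (assumed shapes, not captured output).
var before = new Dictionary<string, string> { ["type"] = "daterange", ["end"] = "2024-01-01" };
var between = new Dictionary<string, string> { ["type"] = "daterange", ["start"] = "2019-01-01", ["end"] = "2022-01-01" };

var openEnded = ParseRange(before);
var closed = ParseRange(between);

Console.WriteLine($"before:  {openEnded.Start?.ToString("yyyy-MM-dd") ?? "-"}..{openEnded.End?.ToString("yyyy-MM-dd") ?? "-"}");
Console.WriteLine($"between: {closed.Start?.ToString("yyyy-MM-dd") ?? "-"}..{closed.End?.ToString("yyyy-MM-dd") ?? "-"}");
```

Parsing with `CultureInfo.InvariantCulture` avoids surprises when the machine's locale uses a different date format.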


In [10]:
public record Slots(string? Department = null, string? Role = null, DateTime? Date = null, (DateTime Start, DateTime End)? Range = null);

var departments = new[] { "engineering", "hr", "finance", "sales", "marketing" };
var roles = new[] { "engineer", "staff engineer", "senior engineer", "manager", "director" };

Slots ExtractSlots(string text)
{
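    // Department (naive: plain substring match, first listed department wins)
    var dept = departments.FirstOrDefault(d => text.ToLowerInvariant().Contains(d));

    // Role (naive: plain substring match, so "engineering" also matches "engineer",
    // and "engineer" wins over "senior engineer" because it is listed first)
    var role = roles.FirstOrDefault(r => text.ToLowerInvariant().Contains(r));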

    // Date or range via Recognizers
    var dtModel = new DateTimeRecognizer(Culture.English).GetDateTimeModel();
    var results = dtModel.Parse(text);

    DateTime? singleDate = null;
    (DateTime Start, DateTime End)? range = null;

    foreach (var r in results)
    {
        if (r.TypeName.Contains("daterange") && r.Resolution != null && r.Resolution.TryGetValue("values", out var valuesObj))
        {
            // values is a list of dicts with timex/start/end types; we pick first
            if (valuesObj is List<Dictionary<string, string>> values && values.Count > 0)
            {
                var v = values[0];
                if (v.TryGetValue("start", out var startStr) && v.TryGetValue("end", out var endStr))
                {
                    if (DateTime.TryParse(startStr, out var s) && DateTime.TryParse(endStr, out var e))
                    {
                        range = (s, e);
                        break;
                    }
                }
            }
        }
        else if (r.TypeName.Contains("date") && r.Resolution != null && r.Resolution.TryGetValue("values", out var valuesObj2))
        {
            if (valuesObj2 is List<Dictionary<string, string>> values && values.Count > 0)
            {
                var v = values[0];
                if (v.TryGetValue("value", out var dateStr) && DateTime.TryParse(dateStr, out var d))
                {
                    singleDate = d;
                    break;
                }
            }
        }
    }

    return new Slots(Department: dept, Role: role, Date: singleDate, Range: range);
}

void Inspect(string text)
{
    var intent = engine.Predict(new QueryRecord { Text = text }).PredictedLabel;
    var slots = ExtractSlots(text);
    Console.WriteLine($"\nQuery: {text}\nIntent: {intent}\nSlots: {{ Department={slots.Department ?? "-"}, Role={slots.Role ?? "-"}, Date={(slots.Date.HasValue ? slots.Date.Value.ToString("yyyy-MM-dd") : "-")}, Range={(slots.Range.HasValue ? $"{slots.Range.Value.Start:yyyy-MM-dd}..{slots.Range.Value.End:yyyy-MM-dd}" : "-")} }}");
}

Inspect("employees hired before 2024 in engineering");
Inspect("emails for rick and morty");
Inspect("show hires between 2019 and 2022");
Inspect("list senior engineers in finance");



Query: employees hired before 2024 in engineering
Intent: SEARCH_BY_DEPARTMENT
Slots: { Department=engineering, Role=engineer, Date=-, Range=- }

Query: emails for rick and morty
Intent: GET_CONTACT_INFO
Slots: { Department=-, Role=-, Date=-, Range=- }

Query: show hires between 2019 and 2022
Intent: FILTER_BY_HIRE_DATE
Slots: { Department=-, Role=-, Date=-, Range=2019-01-01..2022-01-01 }

Query: list senior engineers in finance
Intent: FILTER_BY_ROLE
Slots: { Department=finance, Role=engineer, Date=-, Range=- }
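
Two results above show the limits of substring matching: "engineering" yields `Role=engineer`, and "senior engineers" also yields `Role=engineer` because that entry is listed first. A possible refinement, sketched under the assumption that whole-word matching with an optional plural "s" is good enough, is to try longer role names first and require word boundaries. `MatchRole` is a hypothetical helper, not part of the notebook's cells.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var roles = new[] { "engineer", "staff engineer", "senior engineer", "manager", "director" };

// Longest role names are tried first; \b requires whole-word matches and "s?" allows a plural.
string? MatchRole(string text) =>
    roles.OrderByDescending(r => r.Length)
         .FirstOrDefault(r => Regex.IsMatch(text, $@"\b{Regex.Escape(r)}s?\b", RegexOptions.IgnoreCase));

Console.WriteLine(MatchRole("employees hired before 2024 in engineering") ?? "-"); // -
Console.WriteLine(MatchRole("list senior engineers in finance") ?? "-");           // senior engineer
```

The same pattern could be applied to department names; synonyms ("dept", "team") would still need a dictionary.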


---
## Where to go next
1. **Grow the dataset** with real user phrasing; retrain periodically.
2. Swap in other trainers (e.g., `LbfgsMaximumEntropy`, `LightGbm`) and compare metrics.
3. Add **domain dictionaries** (role hierarchies, department synonyms) to improve slot mapping.
4. Persist model + build a small API that loads the ZIP and exposes `/predict`.

_You’ve now trained and used a model entirely in C# with ML.NET._ ✅
