# ML.NET: Intent Classification + Slot Extraction (C#)

Welcome! This notebook gets you from **zero** to a **trained classifier** for intents, with clear comments aimed at a JS/React dev.

### What you'll do
1. Install packages right from the notebook
2. Create a tiny labeled dataset of **queries ➜ intent**
3. Train a **multi-class text classifier** (ML.NET)
4. Evaluate it and make predictions
5. Do basic **slot extraction** with `Microsoft.Recognizers.Text`

**Kernel:** Use the **.NET (C#)** kernel (aka `dotnet-csharp`). If you don't see it, install the **Polyglot Notebooks** extension (formerly **.NET Interactive Notebooks**) in VS Code.


In [ ]:
// === Install NuGet packages used in this notebook ===
// Microsoft.ML already ships the standard trainers (including SDCA); they are not a separate package
#r "nuget: Microsoft.ML, 4.0.2"
#r "nuget: Microsoft.ML.FastTree, 4.0.2" // optional, for alternative trainers
#r "nuget: Microsoft.Recognizers.Text.DateTime, 1.8.13"
#r "nuget: Microsoft.Recognizers.Text.Number, 1.8.13"

using System;
using System.IO;
using System.Linq;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms.Text;
using Microsoft.Recognizers.Text.DateTime;
using Microsoft.Recognizers.Text.Number;
using Microsoft.Recognizers.Text;

Console.WriteLine("Packages loaded ✔");


## 1) Define data models
Think of this like TypeScript interfaces:
- **`QueryRecord`**: training rows with `Text` and `Label`
- **`IntentPrediction`**: output shape from the model


In [ ]:
public class QueryRecord
{
    // Raw user query text
    public string Text { get; set; } = string.Empty;

    // Intent label (string) – e.g. "GET_CONTACT_INFO"
    public string Label { get; set; } = string.Empty;
}

public class IntentPrediction
{
    // The predicted label (string) after mapping from key to label
    [ColumnName("PredictedLabel")]
    public string PredictedLabel { get; set; } = string.Empty;

    // Raw scores per class – useful for debugging
    public float[] Score { get; set; } = Array.Empty<float>();
}

Console.WriteLine("Models defined ✔");


## 2) Create a tiny labeled dataset
This is your seed data. In a real app, you'll grow this set over time as you see real user queries.


In [ ]:
var seed = new List<QueryRecord>
{
    // GET_CONTACT_INFO
    new() { Text = "what is rick's email?", Label = "GET_CONTACT_INFO" },
    new() { Text = "show me morty's email address", Label = "GET_CONTACT_INFO" },
    new() { Text = "give me summer's contact info", Label = "GET_CONTACT_INFO" },
    new() { Text = "emails for rick and morty", Label = "GET_CONTACT_INFO" },
    new() { Text = "how do I contact beth", Label = "GET_CONTACT_INFO" },

    // FILTER_BY_HIRE_DATE
    new() { Text = "employees hired before 2021", Label = "FILTER_BY_HIRE_DATE" },
    new() { Text = "anyone joined after 2020?", Label = "FILTER_BY_HIRE_DATE" },
    new() { Text = "show hires between 2019 and 2022", Label = "FILTER_BY_HIRE_DATE" },
    new() { Text = "who started prior to 2020", Label = "FILTER_BY_HIRE_DATE" },
    new() { Text = "hire date after jan 2023", Label = "FILTER_BY_HIRE_DATE" },

    // FILTER_BY_ROLE
    new() { Text = "list engineers", Label = "FILTER_BY_ROLE" },
    new() { Text = "show managers in any department", Label = "FILTER_BY_ROLE" },
    new() { Text = "find staff engineers", Label = "FILTER_BY_ROLE" },
    new() { Text = "who are the directors?", Label = "FILTER_BY_ROLE" },
    new() { Text = "any senior engineers?", Label = "FILTER_BY_ROLE" },

    // SEARCH_BY_DEPARTMENT
    new() { Text = "engineering team members", Label = "SEARCH_BY_DEPARTMENT" },
    new() { Text = "folks in hr", Label = "SEARCH_BY_DEPARTMENT" },
    new() { Text = "anyone from finance dept?", Label = "SEARCH_BY_DEPARTMENT" },
    new() { Text = "people working in marketing", Label = "SEARCH_BY_DEPARTMENT" },
    new() { Text = "show sales department", Label = "SEARCH_BY_DEPARTMENT" },
};

Console.WriteLine($"Seed rows: {seed.Count}");


## 3) Build the ML pipeline

High level:
1. Convert labels (strings) to keys → `MapValueToKey`
2. Featurize text with `FeaturizeText` (word n-grams + character n-grams)
3. Train a multi-class classifier (SDCA Maximum Entropy)
4. Convert predicted keys back to string labels


In [ ]:
var ml = new MLContext(seed: 42);

// Load in-memory list as an IDataView
var data = ml.Data.LoadFromEnumerable(seed);

// Train/test split
var split = ml.Data.TrainTestSplit(data, testFraction: 0.25);

// Pipeline
var pipeline = ml.Transforms.Conversion.MapValueToKey(
                        inputColumnName: nameof(QueryRecord.Label),
                        outputColumnName: "Label")
              // The options overload takes (outputColumnName, options, params inputColumnNames)
              .Append(ml.Transforms.Text.FeaturizeText(
                        "Features",
                        new TextFeaturizingEstimator.Options {
                            // Word unigrams + bigrams and character trigrams; small n-grams suit a tiny demo dataset
                            WordFeatureExtractor = new WordBagEstimator.Options { NgramLength = 2, UseAllLengths = true },
                            CharFeatureExtractor = new WordBagEstimator.Options { NgramLength = 3, UseAllLengths = false }
                        },
                        nameof(QueryRecord.Text)))
              .Append(ml.MulticlassClassification.Trainers.SdcaMaximumEntropy(
                        labelColumnName: "Label", featureColumnName: "Features"))
              .Append(ml.Transforms.Conversion.MapKeyToValue(
                        inputColumnName: "PredictedLabel", outputColumnName: nameof(IntentPrediction.PredictedLabel)));

// Train
var model = pipeline.Fit(split.TrainSet);

Console.WriteLine("Model trained ✔");


## 4) Evaluate the model
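We report micro accuracy (the share of all test rows predicted correctly) and macro accuracy (per-class accuracy, averaged across classes). With roughly five held-out rows the numbers are noisy, and some intents may be missing from the test set entirely; the point is to see the end-to-end flow.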
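
To make the two accuracy figures concrete, here is a small hand computation in plain C#. The (actual, predicted) pairs are made up for illustration and are not output from the model in this notebook.

```csharp
using System;
using System.Linq;

// Hypothetical (actual, predicted) pairs for an imbalanced test set
var results = new (string Actual, string Predicted)[]
{
    ("GET_CONTACT_INFO", "GET_CONTACT_INFO"),
    ("GET_CONTACT_INFO", "GET_CONTACT_INFO"),
    ("GET_CONTACT_INFO", "GET_CONTACT_INFO"),
    ("GET_CONTACT_INFO", "FILTER_BY_ROLE"),
    ("FILTER_BY_ROLE", "GET_CONTACT_INFO"),
};

// Micro accuracy: every row counts equally (3 of 5 rows are correct here)
double micro = results.Count(r => r.Actual == r.Predicted) / (double)results.Length;

// Macro accuracy: accuracy per actual class, then averaged: (3/4 + 0/1) / 2
double macro = results
    .GroupBy(r => r.Actual)
    .Select(g => g.Count(r => r.Actual == r.Predicted) / (double)g.Count())
    .Average();

Console.WriteLine($"Micro: {micro:F3}");
Console.WriteLine($"Macro: {macro:F3}");
```

Macro accuracy drops well below micro accuracy here because the one-row class is always wrong. That gap is the signal to look for when a rare intent is underperforming.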


In [ ]:
var testPredictions = model.Transform(split.TestSet);
var metrics = ml.MulticlassClassification.Evaluate(testPredictions, labelColumnName: "Label", scoreColumnName: "Score");

Console.WriteLine($"MicroAccuracy: {metrics.MicroAccuracy:F3}");
Console.WriteLine($"MacroAccuracy: {metrics.MacroAccuracy:F3}");
Console.WriteLine($"LogLoss:       {metrics.LogLoss:F3}");
Console.WriteLine($"PerClassLogLoss: [{string.Join(", ", metrics.PerClassLogLoss.Select(v => v.ToString("F3")))}]");


## 5) Use the model for predictions
Now we simulate new user queries. A trained classifier can generalize to rephrasings that simple `if (text.Contains(...))` rules would miss, though with only a handful of training rows you should expect some wrong guesses.
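
Since wrong guesses are likely at this data size, one option is to treat low-confidence predictions as "unknown". The sketch below assumes you have the `Score` vector from `IntentPrediction` and a `labels` array in the same order as that vector; `PickIntent`, the `0.5` threshold, and the `"UNKNOWN"` fallback are hypothetical names and values chosen for illustration.

```csharp
using System;

// Hypothetical helper: pick the top-scoring intent, or fall back when confidence is low.
// Assumes `labels` is in the same order as the model's Score vector.
(string Intent, float Confidence) PickIntent(string[] labels, float[] scores, float threshold = 0.5f)
{
    if (labels.Length != scores.Length || scores.Length == 0)
        throw new ArgumentException("labels and scores must be non-empty and the same length.");

    // Find the index of the highest score
    int best = 0;
    for (int i = 1; i < scores.Length; i++)
        if (scores[i] > scores[best]) best = i;

    // "UNKNOWN" is a placeholder fallback, not one of the trained intents
    return scores[best] >= threshold ? (labels[best], scores[best]) : ("UNKNOWN", scores[best]);
}

var labels = new[] { "GET_CONTACT_INFO", "FILTER_BY_HIRE_DATE", "FILTER_BY_ROLE", "SEARCH_BY_DEPARTMENT" };

// Made-up score vectors standing in for IntentPrediction.Score
var confident = PickIntent(labels, new[] { 0.82f, 0.06f, 0.07f, 0.05f });
var unsure = PickIntent(labels, new[] { 0.30f, 0.28f, 0.22f, 0.20f });

Console.WriteLine(confident.Intent); // GET_CONTACT_INFO
Console.WriteLine(unsure.Intent);    // UNKNOWN
```

In an app, the "unknown" branch is a natural place to ask the user to rephrase, and to log the query as a candidate for the training set.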


In [ ]:
var engine = ml.Model.CreatePredictionEngine<QueryRecord, IntentPrediction>(model);

string[] samples = new []
{
    "emails for rick and summer",
    "who started before 2020?",
    "any senior managers?",
    "show folks in engineering",
    "morty's contact details please"
};

foreach (var text in samples)
{
    var pred = engine.Predict(new QueryRecord { Text = text });
    Console.WriteLine($"{text} -> {pred.PredictedLabel}");
}


## 6) Save & load the model (production-style)
You'd do this to ship the trained model with your app or cache it locally.


In [ ]:
var modelPath = Path.Combine(Directory.GetCurrentDirectory(), "intent_model.zip");
using (var fs = File.Create(modelPath))
{
    ml.Model.Save(model, split.TrainSet.Schema, fs);
}

Console.WriteLine($"Saved: {modelPath}");

// Reload
ITransformer reloadedModel;
using (var fs = File.OpenRead(modelPath))
{
    reloadedModel = ml.Model.Load(fs, out var schema);
}

var engine2 = ml.Model.CreatePredictionEngine<QueryRecord, IntentPrediction>(reloadedModel);
var check = engine2.Predict(new QueryRecord { Text = "list engineers" });
Console.WriteLine($"Reloaded model prediction: {check.PredictedLabel}");


## 7) Slot extraction with Microsoft.Recognizers.Text
For structured values like dates and numbers, prefer deterministic extractors.
Below are quick examples:


In [ ]:
// Helper to pretty print extractor results
void Dump(string title, IEnumerable<ModelResult> results)
{
    Console.WriteLine($"\n== {title} ==");
    foreach (var r in results)
    {
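        // Number results carry a "value" key; date results use "values" instead, so look it up safely
        var value = r.Resolution != null && r.Resolution.TryGetValue("value", out var v) ? v : "";
        Console.WriteLine($"Text='{r.Text}' Type={r.TypeName} Value={value} Start={r.Start} End={r.End}");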
        if (r.Resolution != null)
        {
            foreach (var kv in r.Resolution)
                Console.WriteLine($"  {kv.Key}: {kv.Value}");
        }
    }
}

// Example query strings
var q1 = "employees hired before 2024 in engineering";
var q2 = "show hires between 2019 and 2022";
var q3 = "anyone joined after 2020?";

// DateTime model (English)
var dtModel = new DateTimeRecognizer(Culture.English).GetDateTimeModel();
Dump(q1, dtModel.Parse(q1));
Dump(q2, dtModel.Parse(q2));
Dump(q3, dtModel.Parse(q3));

// Number model example
var numModel = new NumberRecognizer(Culture.English).GetNumberModel();
Dump("numbers in: 'list top 3 engineers'", numModel.Parse("list top 3 engineers"));


## 8) Putting it together (mini pipeline)
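Here's a tiny function that calls the classifier **and** attempts basic slot extraction.
For real apps, you'd expand the role/department dictionaries and add more robust date handling: this sketch only keeps ranges that have both a start and an end, so an open-ended phrase such as "before 2024" may come back with no date slot.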
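
As one step toward more robust date handling, the sketch below turns a single resolution entry into optional bounds, so a range with only one side is still usable. It assumes the entry is a string dictionary with `start` and/or `end` keys holding ISO-style dates; the sample dictionaries are hand-written to resemble recognizer output, not captured from a run, and `ToBounds` and `Show` are hypothetical helper names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Sketch: read optional "start"/"end" bounds from one resolution entry
(DateTime? Start, DateTime? End) ToBounds(IReadOnlyDictionary<string, string> entry)
{
    // Returns null when the key is absent or the text is not a parseable date
    DateTime? Parse(string key) =>
        entry.TryGetValue(key, out var raw) &&
        DateTime.TryParse(raw, CultureInfo.InvariantCulture, DateTimeStyles.None, out var parsed)
            ? parsed
            : (DateTime?)null;

    return (Parse("start"), Parse("end"));
}

string Show(DateTime? d) =>
    d.HasValue ? d.Value.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) : "open";

// Illustrative entries (assumed shape): one open-ended, one closed
var before2024 = new Dictionary<string, string> { ["type"] = "daterange", ["end"] = "2024-01-01" };
var closed = new Dictionary<string, string> { ["type"] = "daterange", ["start"] = "2019-01-01", ["end"] = "2022-01-01" };

var open = ToBounds(before2024);
var both = ToBounds(closed);

Console.WriteLine($"{Show(open.Start)}..{Show(open.End)}"); // open..2024-01-01
Console.WriteLine($"{Show(both.Start)}..{Show(both.End)}"); // 2019-01-01..2022-01-01
```

Using nullable bounds instead of a `(DateTime, DateTime)` pair lets the caller distinguish "no lower limit" from "no date found at all".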


In [ ]:
public record Slots(string? Department = null, string? Role = null, DateTime? Date = null, (DateTime Start, DateTime End)? Range = null);

var departments = new[] { "engineering", "hr", "finance", "sales", "marketing" };
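// Most specific first: FirstOrDefault returns the first hit, and "senior engineer" also contains "engineer"
var roles = new[] { "staff engineer", "senior engineer", "engineer", "manager", "director" };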

Slots ExtractSlots(string text)
{
    // Department (very naive)
    var dept = departments.FirstOrDefault(d => text.ToLowerInvariant().Contains(d));

    // Role (very naive)
    var role = roles.FirstOrDefault(r => text.ToLowerInvariant().Contains(r));

    // Date or range via Recognizers
    var dtModel = new DateTimeRecognizer(Culture.English).GetDateTimeModel();
    var results = dtModel.Parse(text);

    DateTime? singleDate = null;
    (DateTime Start, DateTime End)? range = null;

    foreach (var r in results)
    {
        if (r.TypeName.Contains("daterange") && r.Resolution != null && r.Resolution.TryGetValue("values", out var valuesObj))
        {
            // values is a list of dicts with timex/start/end types; we pick first
            if (valuesObj is List<Dictionary<string, string>> values && values.Count > 0)
            {
                var v = values[0];
                if (v.TryGetValue("start", out var startStr) && v.TryGetValue("end", out var endStr))
                {
                    if (DateTime.TryParse(startStr, out var s) && DateTime.TryParse(endStr, out var e))
                    {
                        range = (s, e);
                        break;
                    }
                }
            }
        }
        else if (r.TypeName.Contains("date") && r.Resolution != null && r.Resolution.TryGetValue("values", out var valuesObj2))
        {
            if (valuesObj2 is List<Dictionary<string, string>> values && values.Count > 0)
            {
                var v = values[0];
                if (v.TryGetValue("value", out var dateStr) && DateTime.TryParse(dateStr, out var d))
                {
                    singleDate = d;
                    break;
                }
            }
        }
    }

    return new Slots(Department: dept, Role: role, Date: singleDate, Range: range);
}

void Inspect(string text)
{
    var intent = engine.Predict(new QueryRecord { Text = text }).PredictedLabel;
    var slots = ExtractSlots(text);
    Console.WriteLine($"\nQuery: {text}\nIntent: {intent}\nSlots: {{ Department={slots.Department ?? "-"}, Role={slots.Role ?? "-"}, Date={(slots.Date.HasValue ? slots.Date.Value.ToString("yyyy-MM-dd") : "-")}, Range={(slots.Range.HasValue ? $"{slots.Range.Value.Start:yyyy-MM-dd}..{slots.Range.Value.End:yyyy-MM-dd}" : "-")} }}");
}

Inspect("employees hired before 2024 in engineering");
Inspect("emails for rick and morty");
Inspect("show hires between 2019 and 2022");
Inspect("list senior engineers in finance");


---
## Where to go next
1. **Grow the dataset** with real user phrasing; retrain periodically.
2. Swap in other trainers (e.g., `LbfgsMaximumEntropy`, or `LightGbm` from the separate `Microsoft.ML.LightGbm` package) and compare metrics.
3. Add **domain dictionaries** (role hierarchies, department synonyms) to improve slot mapping.
4. Persist model + build a small API that loads the ZIP and exposes `/predict`.
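
As a starting point for the domain-dictionary idea in item 3, here is a minimal sketch of a department synonym table with whole-word matching. The synonym entries and the `FindDepartment` helper are hypothetical; a real table would come from your own org data.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical synonym table: surface form -> canonical department name
var departmentSynonyms = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["engineering"] = "engineering",
    ["eng"] = "engineering",
    ["hr"] = "hr",
    ["human resources"] = "hr",
    ["people ops"] = "hr",
    ["finance"] = "finance",
    ["accounting"] = "finance",
};

// Match whole words only, so "hr" does not fire inside "three"
string? FindDepartment(string text)
{
    foreach (var (surface, canonical) in departmentSynonyms)
    {
        var pattern = $@"\b{Regex.Escape(surface)}\b";
        if (Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase))
            return canonical;
    }
    return null;
}

Console.WriteLine(FindDepartment("anyone in Human Resources?") ?? "-"); // hr
Console.WriteLine(FindDepartment("list three engineers") ?? "-");       // -
```

The plain `Contains` check used earlier would report `hr` for "list three engineers"; the word-boundary pattern avoids that class of false positive.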

_You’ve now trained and used a model entirely in C# with ML.NET._ ✅
