
# Slot Extraction + End‑to‑End Query Pipeline (Deterministic + ML.NET)

This notebook shows how to **extract slots/entities deterministically** (dates, ranges, departments, roles, names) and then **combine** that with the **intent model** you trained in `Intent_Training_Minimal.ipynb` to produce a structured `QuerySpec` you can run in C#.

**You will get:**
- Deterministic date/range parsing using `Microsoft.Recognizers.Text.*`
- Department/role/name canonicalization with a tiny vocab + `FuzzySharp`
- A `SlotsExtractor` and `QueryUnderstanding` orchestrator
- Integration with `intent-model.zip` (fallback to rule-based intent if not found)
- A tiny in-memory dataset to demo filtering
- JS→C# explanations inline



## 0) Packages (local, no cloud)


In [None]:

// Per-cell package refs for .NET Interactive
#r "nuget: Microsoft.Recognizers.Text, 1.8.13"
#r "nuget: Microsoft.Recognizers.Text.DateTime, 1.8.13"
#r "nuget: Microsoft.Recognizers.Text.Number, 1.8.13"
#r "nuget: FuzzySharp, 2.0.2"
#r "nuget: Microsoft.ML, 4.0.2"

using System;
using System.IO;
using System.Linq;
using System.Collections.Generic;
using System.Globalization;
using Microsoft.Recognizers.Text;
using Microsoft.Recognizers.Text.DateTime;
using Microsoft.Recognizers.Text.Number;
using FuzzySharp;
using Microsoft.ML;
using Microsoft.ML.Data;



## 1) Domain types (C# classes like TS interfaces)


In [None]:

public enum Intent
{
    GET_CONTACT_INFO,
    FILTER_BY_HIRE_DATE,
    SEARCH_BY_DEPARTMENT,
    FILTER_BY_ROLE_AND_DATE,
    UNKNOWN
}

public record Slots(
    string[]? Names = null,
    DateTime? Date = null,
    (DateTime Start, DateTime End)? Range = null,
    string? Operator = null,
    string? Department = null,
    string? Role = null
);

public record QuerySpec(Intent Intent, Slots Slots);

public record Employee(
    string DisplayName,
    string Email,
    string Department,
    string Role,
    DateTime OriginalHireDate
);



## 2) Sample data (so we can actually filter something)


In [None]:

var employees = new List<Employee> {
    new("Rick Sanchez",   "rick.sanchez@company.com",   "Engineering", "Staff Engineer",  new DateTime(2015,  5, 10)),
    new("Morty Smith",    "morty.smith@company.com",    "Engineering", "Engineer I",      new DateTime(2022, 11, 01)),
    new("Summer Smith",   "summer.smith@company.com",   "Sales",       "Account Manager", new DateTime(2021,  7, 14)),
    new("Beth Smith",     "beth.smith@company.com",     "Operations",  "Director",        new DateTime(2018,  3, 22)),
    new("Jerry Smith",    "jerry.smith@company.com",    "Operations",  "Manager",         new DateTime(2020,  1,  9)),
};
Console.WriteLine($"Employees: {employees.Count}");



## 3) Vocab + fuzzy helpers (departments, roles, names)

We keep small **canonical vocabularies** and match user text to them with fuzzy search. This is fast, predictable, and easy to explain.
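FuzzySharp's `Process.ExtractOne` uses a weighted ratio scorer by default, which combines several similarity measures. The sketch below makes the threshold-based picker concrete without the package. It swaps in a different technique: plain normalized Levenshtein similarity. `Levenshtein`, `Similarity`, and `Pick` are hypothetical names, and the scores will not match FuzzySharp's.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Classic dynamic-programming edit distance (insert, delete, substitute all cost 1), two rolling rows.
int Levenshtein(string a, string b)
{
    var prev = new int[b.Length + 1];
    var curr = new int[b.Length + 1];
    for (int j = 0; j <= b.Length; j++) prev[j] = j;
    for (int i = 1; i <= a.Length; i++)
    {
        curr[0] = i;
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
        }
        (prev, curr) = (curr, prev);
    }
    return prev[b.Length];
}

// Similarity in [0, 100]; 100 means identical after trimming and lower-casing.
int Similarity(string a, string b)
{
    a = a.Trim().ToLowerInvariant();
    b = b.Trim().ToLowerInvariant();
    int maxLen = Math.Max(a.Length, b.Length);
    if (maxLen == 0) return 100;
    return (int)Math.Round(100.0 * (maxLen - Levenshtein(a, b)) / maxLen);
}

// Best-scoring choice, or null when nothing reaches minScore (same contract as FuzzyPick).
string? Pick(string text, IEnumerable<string> choices, int minScore = 80)
{
    var best = choices
        .Select(c => (Choice: c, Score: Similarity(text, c)))
        .OrderByDescending(x => x.Score)
        .FirstOrDefault();
    return best.Choice is not null && best.Score >= minScore ? best.Choice : null;
}

var departments = new[] { "Engineering", "Sales", "Operations" };
Console.WriteLine(Pick("engneering", departments) ?? "(none)"); // Engineering
Console.WriteLine(Pick("in", departments) ?? "(none)");         // (none)
```

Whole-string edit distance scores a short word like "in" low against "Engineering". Partial-ratio style scorers can rate such substrings much higher, so short tokens need extra care with them.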


In [None]:

// Canonical vocab
var Departments = new[] { "Engineering", "Sales", "Operations" };
var Roles = new[] { "Engineer I", "Staff Engineer", "Manager", "Director", "Account Manager" };

// Synonyms map (lowercase keys)
var RoleSynonyms = new Dictionary<string,string>(StringComparer.OrdinalIgnoreCase) {
    ["engineer"] = "Engineer I",
    ["staff engineer"] = "Staff Engineer",
    ["managers"] = "Manager",
    ["manager"] = "Manager",
    ["director"] = "Director",
    ["account manager"] = "Account Manager"
};

string? FuzzyPick(string text, IEnumerable<string> choices, int minScore = 80)
{
    if (string.IsNullOrWhiteSpace(text)) return null;
    var best = Process.ExtractOne(text, choices);
    return (best != null && best.Score >= minScore) ? best.Value : null;
}

string? MapRole(string raw)
{
    if (string.IsNullOrWhiteSpace(raw)) return null;
    if (RoleSynonyms.TryGetValue(raw.Trim(), out var canonical)) return canonical;
    return FuzzyPick(raw, Roles);
}

string? MapDepartment(string raw) => FuzzyPick(raw, Departments);



## 4) Date/number parsing with Microsoft.Recognizers.Text

We detect forms like:
- **before 2024**, **after 2020**
- **between 2020 and 2023**
- **last year**, **past 2 years**, **from Jan 2021 to Mar 2022**
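The sketch below shows the interval semantics these phrases map to. It is a hypothetical regex stand-in, not Microsoft.Recognizers.Text, and handles only four-digit years. `ParseYearRange` and `InRange` are illustrative names.

Every case becomes a half-open interval [Start, End). Open-ended phrases use `DateTime.MinValue`/`MaxValue` for the missing side. Counting all of 2019 as part of "between 2017 and 2019" is a choice made in this sketch; the recognizer's own resolution may differ.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical year-only parser; returns [Start, End) or null when no supported phrase is found.
(DateTime Start, DateTime End)? ParseYearRange(string text)
{
    var m = Regex.Match(text, @"\bbetween\s+(\d{4})\s+and\s+(\d{4})\b", RegexOptions.IgnoreCase);
    if (m.Success)
        return (new DateTime(int.Parse(m.Groups[1].Value), 1, 1),
                new DateTime(int.Parse(m.Groups[2].Value) + 1, 1, 1)); // includes the whole second year

    m = Regex.Match(text, @"\b(before|after)\s+(\d{4})\b", RegexOptions.IgnoreCase);
    if (m.Success)
    {
        int year = int.Parse(m.Groups[2].Value);
        return m.Groups[1].Value.Equals("before", StringComparison.OrdinalIgnoreCase)
            ? (DateTime.MinValue, new DateTime(year, 1, 1))      // strictly before Jan 1 of that year
            : (new DateTime(year + 1, 1, 1), DateTime.MaxValue); // after that year has ended
    }
    return null;
}

// One filter rule for every case: inclusive start, exclusive end.
bool InRange(DateTime d, (DateTime Start, DateTime End) r) => d >= r.Start && d < r.End;

var hired = new DateTime(2018, 3, 22);
Console.WriteLine(InRange(hired, ParseYearRange("directors between 2017 and 2019").Value)); // True
Console.WriteLine(InRange(hired, ParseYearRange("hired after 2020").Value));                 // False
```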


In [None]:

// Returns (single date, date range, operator). Open-ended phrases such as "before 2024" usually resolve
// to a daterange with only one side ("end" or "start") plus a "Mod" key, so the missing side is filled
// with DateTime.MinValue/MaxValue and every range can be filtered the same way.
(DateTime? date, (DateTime Start, DateTime End)? range, string? op) ParseDateOps(string text, string culture = Culture.English)
{
    if (string.IsNullOrWhiteSpace(text)) return (null, null, null);

    var results = DateTimeRecognizer.RecognizeDateTime(text, culture, DateTimeOptions.None, DateTime.Today);
    DateTime? singleDate = null;
    (DateTime Start, DateTime End)? dateRange = null;

    // Resolution values are ISO-style strings ("2021-01-01"), so parse them culture-invariantly.
    static DateTime? ParseValue(Dictionary<string, string> v, string key) =>
        v.TryGetValue(key, out var s) && DateTime.TryParse(s, CultureInfo.InvariantCulture, DateTimeStyles.None, out var dt)
            ? dt
            : (DateTime?)null;

    foreach (var r in results)
    {
        if (!r.Resolution.TryGetValue("values", out var valuesObj)) continue;
        if (valuesObj is not IList<Dictionary<string, string>> values) continue;

        foreach (var v in values)
        {
            if (!v.TryGetValue("type", out var type)) continue;

            if (type == "date")
            {
                singleDate = ParseValue(v, "value") ?? singleDate;
            }
            else if (type == "daterange" || type == "datetimerange")
            {
                var start = ParseValue(v, "start");
                var end = ParseValue(v, "end");
                if (start is not null || end is not null)
                    dateRange = (start ?? DateTime.MinValue, end ?? DateTime.MaxValue);
            }
        }
    }

    // Simple operator heuristics ("from ... to" is parenthesized as a single condition)
    var lower = text.ToLowerInvariant();
    string? op = null;
    if (lower.Contains("before") || lower.Contains("earlier than") || lower.Contains("<")) op = "<";
    else if (lower.Contains("after") || lower.Contains("later than") || lower.Contains(">")) op = ">";
    else if (lower.Contains("between") || (lower.Contains("from") && lower.Contains(" to "))) op = "between";

    return (singleDate, dateRange, op);
}



## 5) Name extraction (very simple demo)

For now we just fuzzy-match tokens against known employee names. In production, you might maintain a separate index keyed by first/last names.
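Below is a minimal sketch of such a first/last-name index. It assumes exact, case-insensitive token lookup is acceptable. `BuildNameIndex` and `LookupNames` are hypothetical helpers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Map every first or last name (case-insensitive) to the full display names it could refer to.
Dictionary<string, List<string>> BuildNameIndex(IEnumerable<string> displayNames)
{
    var index = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
    foreach (var full in displayNames)
        foreach (var part in full.Split(' ', StringSplitOptions.RemoveEmptyEntries))
        {
            if (!index.TryGetValue(part, out var list)) index[part] = list = new List<string>();
            list.Add(full);
        }
    return index;
}

// Exact token lookup; strips a trailing possessive "'s" so "morty's" finds "Morty Smith".
string[] LookupNames(string text, Dictionary<string, List<string>> index) =>
    text.Split(new[] { ' ', ',', ';', '?', '!' }, StringSplitOptions.RemoveEmptyEntries)
        .Select(t => t.EndsWith("'s", StringComparison.OrdinalIgnoreCase) ? t[..^2] : t)
        .Where(index.ContainsKey)
        .SelectMany(t => index[t])
        .Distinct()
        .ToArray();

var index = BuildNameIndex(new[] { "Rick Sanchez", "Morty Smith", "Summer Smith" });
Console.WriteLine(string.Join(", ", LookupNames("what are rick, summer, and morty's emails?", index)));
// Rick Sanchez, Summer Smith, Morty Smith
Console.WriteLine(string.Join(", ", LookupNames("email smith", index)));
// Morty Smith, Summer Smith
```

An ambiguous last name returns every matching employee. A caller could treat that as a cue to ask a clarifying question instead of guessing.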


In [None]:

string[] ExtractNames(string text, IEnumerable<Employee> employees, int minScore = 80)
{
    // Drop tokens shorter than 3 characters: with partial-ratio scoring, a token like "a"
    // can score highly inside a longer name such as "Sanchez".
    var tokens = text.Split(new[] { ' ', ',', ';', '?', '!' }, StringSplitOptions.RemoveEmptyEntries)
                     .Where(t => t.Length >= 3)
                     .ToArray();
    if (tokens.Length == 0) return Array.Empty<string>();

    var names = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    foreach (var emp in employees)
    {
        // Try the full display name first, then the first or last name alone.
        foreach (var candidate in emp.DisplayName.Split(' ').Prepend(emp.DisplayName))
        {
            var hit = Process.ExtractOne(candidate, tokens);
            if (hit != null && hit.Score >= minScore) { names.Add(emp.DisplayName); break; }
        }
    }
    return names.ToArray();
}



## 6) Slots extractor

Combines date parsing, role/department mapping, and name extraction.
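Each slot is filled by a cascade: an exact match first, and a fuzzier fallback only if that fails. Below is a minimal sketch of that cascade as a reusable helper. `FirstMatch` is a hypothetical name, and `Exact`/`Fuzzy` are stand-ins for the real lookups (the "fuzzy" one is just a hard-coded misspelling check). Because LINQ's `Select` is lazy, the fallback is never invoked once an earlier step succeeds.

```csharp
#nullable enable
using System;
using System.Linq;

// Run extraction steps in priority order and stop at the first non-null result.
string? FirstMatch(string text, params Func<string, string?>[] steps) =>
    steps.Select(step => step(text)).FirstOrDefault(result => result is not null);

int fuzzyCalls = 0;
// Hypothetical steps: an exact lookup and a (fake) costlier fallback that counts its invocations.
string? Exact(string t) => t.Contains("Sales", StringComparison.OrdinalIgnoreCase) ? "Sales" : null;
string? Fuzzy(string t) { fuzzyCalls++; return t.Contains("sails", StringComparison.OrdinalIgnoreCase) ? "Sales" : null; }

Console.WriteLine(FirstMatch("who is in sales", Exact, Fuzzy)); // Sales
Console.WriteLine(fuzzyCalls);                                  // 0
Console.WriteLine(FirstMatch("who is in sails", Exact, Fuzzy)); // Sales
Console.WriteLine(fuzzyCalls);                                  // 1
```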


In [None]:

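using System.Text.RegularExpressions;

// Whole-word, case-insensitive phrase match that also accepts a trailing plural "s":
// "director" matches "directors", but "engineer" does NOT match inside "engineering".
bool ContainsPhrase(string text, string phrase) =>
    Regex.IsMatch(text, $@"\b{Regex.Escape(phrase)}s?\b", RegexOptions.IgnoreCase);

Slots ExtractSlots(string text)
{
    var (date, range, op) = ParseDateOps(text);

    // Fuzzy fallbacks only look at words of 4+ characters: partial-ratio scoring lets short words
    // such as "in" score highly inside longer names like "Engineering" or "Engineer I".
    var words = text.Split(new[] { ' ', ',', ';', '?', '!' }, StringSplitOptions.RemoveEmptyEntries)
                    .Where(w => w.Length >= 4)
                    .ToArray();

    // Department: canonical name as a whole word first, then a fuzzy match on single words.
    string? dept = Departments.FirstOrDefault(d => ContainsPhrase(text, d))
                   ?? words.Select(w => MapDepartment(w)).FirstOrDefault(c => c != null);

    // Role: longest synonym first, so "staff engineer" wins over "engineer"
    // and "account manager" wins over "manager".
    string? role = RoleSynonyms.OrderByDescending(kv => kv.Key.Length)
                               .Where(kv => ContainsPhrase(text, kv.Key))
                               .Select(kv => kv.Value)
                               .FirstOrDefault();

    // Fuzzy role fallback skips department words ("engineering" is otherwise close to "Engineer I").
    role ??= words.Where(w => !Departments.Contains(w, StringComparer.OrdinalIgnoreCase))
                  .Select(w => MapRole(w))
                  .FirstOrDefault(c => c != null);

    var names = ExtractNames(text, employees);

    return new Slots(
        Names: names.Length > 0 ? names : null,
        Date: date,
        Range: range,
        Operator: op,
        Department: dept,
        Role: role
    );
}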



## 7) Intent classification (load ML.NET model if available; else rule fallback)
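For a multiclass ML.NET model, `PredictedLabel` is the class with the highest value in `Score`, so that maximum is the prediction's confidence. The slot order of `Score` follows the label-key mapping built during training, not the declaration order of the `Intent` enum. Below is a minimal sketch of the argmax step. `ArgMax` is a hypothetical helper, and the labels and scores are made-up illustrative numbers, not output from the trained model.

```csharp
using System;
using System.Globalization;

// Return the winning label and its score, given labels in the model's score-slot order.
(string Label, float Confidence) ArgMax(string[] labels, float[] scores)
{
    if (labels.Length != scores.Length || scores.Length == 0)
        throw new ArgumentException("labels and scores must be non-empty and the same length");
    int best = 0;
    for (int i = 1; i < scores.Length; i++)
        if (scores[i] > scores[best]) best = i;
    return (labels[best], scores[best]);
}

// Illustrative values only.
var labels = new[] { "FILTER_BY_HIRE_DATE", "GET_CONTACT_INFO", "SEARCH_BY_DEPARTMENT", "FILTER_BY_ROLE_AND_DATE" };
var scores = new[] { 0.10f, 0.72f, 0.08f, 0.10f };
var (label, confidence) = ArgMax(labels, scores);
var shown = confidence.ToString("0.00", CultureInfo.InvariantCulture);
Console.WriteLine(label + " " + shown); // GET_CONTACT_INFO 0.72
```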


In [None]:

// Types for ML.NET load (must match training)
public class QueryRow { public string Text { get; set; } = ""; }
public class IntentPrediction { [ColumnName("PredictedLabel")] public string PredictedIntent { get; set; } = ""; public float[] Score { get; set; } = Array.Empty<float>(); }

// Cache the prediction engine after the first successful load instead of reloading the zip on every query.
PredictionEngine<QueryRow, IntentPrediction>? intentEngine = null;

Intent PredictIntent(string text, out float confidence)
{
    var modelPath = Path.Combine("model", "intent-model.zip");
    if (intentEngine is null && File.Exists(modelPath))
    {
        var ml = new MLContext();
        using var fs = File.OpenRead(modelPath);
        var model = ml.Model.Load(fs, out _);
        intentEngine = ml.Model.CreatePredictionEngine<QueryRow, IntentPrediction>(model);
    }

    if (intentEngine is not null)
    {
        var pred = intentEngine.Predict(new QueryRow { Text = text });

        // PredictedLabel is the highest-scoring class, so its confidence is the max of Score.
        // (Score slots follow the label-key order learned at training time, not the Intent enum order.)
        confidence = pred.Score is { Length: > 0 } ? pred.Score.Max() : 0f;

        return Enum.TryParse<Intent>(pred.PredictedIntent, out var parsed) ? parsed : Intent.UNKNOWN;
    }

    // Rule-based fallback. The confidence must exceed Understand's default minConfidence (0.60f),
    // otherwise every rule hit would be downgraded to UNKNOWN.
    const float ruleConfidence = 0.65f;
    var q = text.ToLowerInvariant();
    if (q.Contains("email") || q.Contains("contact")) { confidence = ruleConfidence; return Intent.GET_CONTACT_INFO; }
    // Role runs before hire date so "managers hired after 2020" gets the more specific intent.
    if (q.Contains("manager") || q.Contains("director")) { confidence = ruleConfidence; return Intent.FILTER_BY_ROLE_AND_DATE; }
    if (q.Contains("hire") && (q.Contains("before") || q.Contains("after") || q.Contains("between"))) { confidence = ruleConfidence; return Intent.FILTER_BY_HIRE_DATE; }
    if (q.Contains("engineering") || q.Contains("sales") || q.Contains("operations") || q.Contains("department")) { confidence = ruleConfidence; return Intent.SEARCH_BY_DEPARTMENT; }
    confidence = 0.3f;
    return Intent.UNKNOWN;
}



## 8) Orchestrator: Query → QuerySpec → Filtering
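The filtering half of the orchestrator composes LINQ filters: each slot that is present adds one more `Where`, and missing slots add nothing. Below is a minimal sketch of that pattern. It uses tuples instead of the `Employee` record and assumes a half-open [Start, End) range; `Filter` is a hypothetical name.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Each non-null slot narrows the query; null slots leave it unchanged.
IEnumerable<(string Name, string Dept, DateTime Hired)> Filter(
    IEnumerable<(string Name, string Dept, DateTime Hired)> source,
    string? dept,
    (DateTime Start, DateTime End)? range)
{
    var q = source;
    if (dept is not null)
        q = q.Where(r => r.Dept.Equals(dept, StringComparison.OrdinalIgnoreCase));
    if (range is { } rg)
        q = q.Where(r => r.Hired >= rg.Start && r.Hired < rg.End); // half-open [Start, End)
    return q;
}

var rows = new[]
{
    (Name: "Rick Sanchez", Dept: "Engineering", Hired: new DateTime(2015, 5, 10)),
    (Name: "Morty Smith",  Dept: "Engineering", Hired: new DateTime(2022, 11, 1)),
    (Name: "Beth Smith",   Dept: "Operations",  Hired: new DateTime(2018, 3, 22)),
};

var before2021 = (DateTime.MinValue, new DateTime(2021, 1, 1));
foreach (var row in Filter(rows, "engineering", before2021))
    Console.WriteLine(row.Name); // Rick Sanchez
```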


In [None]:

QuerySpec Understand(string text, float minConfidence = 0.60f)
{
    var slots = ExtractSlots(text);
    var intent = PredictIntent(text, out var conf);
    if (conf < minConfidence) intent = Intent.UNKNOWN;
    return new QuerySpec(intent, slots);
}

IEnumerable<Employee> Apply(QuerySpec spec, IEnumerable<Employee> data)
{
    IEnumerable<Employee> q = data;

    if (spec.Slots.Names is not null && spec.Slots.Names.Length > 0)
        q = q.Where(e => spec.Slots.Names.Contains(e.DisplayName, StringComparer.OrdinalIgnoreCase));

    if (!string.IsNullOrWhiteSpace(spec.Slots.Department))
        q = q.Where(e => e.Department.Equals(spec.Slots.Department, StringComparison.OrdinalIgnoreCase));

    if (!string.IsNullOrWhiteSpace(spec.Slots.Role))
        q = q.Where(e => e.Role.Equals(spec.Slots.Role, StringComparison.OrdinalIgnoreCase) || e.Role.Contains(spec.Slots.Role, StringComparison.OrdinalIgnoreCase));

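    if (spec.Slots.Range is not null)
    {
        // Recognizer ranges are typically half-open ("2020" resolves to 2020-01-01 .. 2021-01-01),
        // so the end date itself is excluded.
        var (start, end) = spec.Slots.Range.Value;
        q = q.Where(e => e.OriginalHireDate >= start && e.OriginalHireDate < end);
    }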
    else if (spec.Slots.Date is not null && !string.IsNullOrWhiteSpace(spec.Slots.Operator))
    {
        var d = spec.Slots.Date.Value;
        if (spec.Slots.Operator == "<") q = q.Where(e => e.OriginalHireDate < d);
        else if (spec.Slots.Operator == ">") q = q.Where(e => e.OriginalHireDate > d);
    }

    return q;
}

// Demo: try a few
string[] tests = {
    "what are rick, summer, and morty's emails?",
    "show managers hired after 2020",
    "employees hired before 2021 in engineering",
    "who is in operations",
    "directors between 2017 and 2019"
};

foreach (var t in tests)
{
    var spec = Understand(t);
    var filtered = Apply(spec, employees).ToList();
    Console.WriteLine($"Q: {t}");
    Console.WriteLine($"  Intent: {spec.Intent}");
    var rangeText = spec.Slots.Range is { } rg ? $"{rg.Start:yyyy-MM-dd}..{rg.End:yyyy-MM-dd}" : "∅";
    Console.WriteLine($"  Slots: names=[{string.Join(", ", spec.Slots.Names ?? Array.Empty<string>())}] dept={spec.Slots.Department ?? "∅"} role={spec.Slots.Role ?? "∅"} op={spec.Slots.Operator ?? "∅"} date={spec.Slots.Date?.ToString("yyyy-MM-dd") ?? "∅"} range={rangeText}");
    Console.WriteLine($"  Results: {filtered.Count}");
    foreach (var e in filtered) Console.WriteLine($"    - {e.DisplayName} | {e.Email} | {e.Department} | {e.Role} | {e.OriginalHireDate:yyyy-MM-dd}");
    Console.WriteLine();
}



## 9) (Optional) "Formatter" step

In a real service you would pass **only the filtered rows** into an LLM prompt to format a friendly answer. Keep logic in C#; let the LLM do phrasing only.
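Below is a minimal sketch of that hand-off. It assumes a prompt that embeds the filtered rows verbatim and tells the model not to alter them. `BuildFormatterPrompt` and the prompt wording are hypothetical. No LLM client is called; only the string is built.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Serialize only the already-filtered rows into the prompt; the LLM is asked to phrase, not to decide.
string BuildFormatterPrompt(string question, IEnumerable<(string Name, string Email)> rows)
{
    var sb = new StringBuilder();
    sb.AppendLine("Answer the question using ONLY the rows below. Do not add, infer, or change any values.");
    sb.AppendLine($"Question: {question}");
    sb.AppendLine("Rows (name | email):");
    foreach (var (name, email) in rows)
        sb.AppendLine($"- {name} | {email}");
    return sb.ToString();
}

var prompt = BuildFormatterPrompt(
    "what are rick and morty's emails?",
    new[] { ("Rick Sanchez", "rick.sanchez@company.com"), ("Morty Smith", "morty.smith@company.com") });
Console.WriteLine(prompt);
```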



## 10) C# notes for JS devs (quick mapping)

- **records**: Lightweight immutable types; think `type`/`interface` with value equality.
- **LINQ**: `Where`, `Select`, etc. resemble JS array methods, but execution is deferred: nothing runs until the query is enumerated (e.g. with `ToList()` or `foreach`).
- **`StringComparison.OrdinalIgnoreCase`**: case-insensitive equals/contains.
- **Pipelines**: We keep **intent** and **slot** extraction separate; then compose.
- **Confidence threshold**: Low-confidence predictions become `UNKNOWN`, so the caller can ask a clarifying question or fall back.
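Deferred execution is the LINQ behavior most likely to surprise a JS developer. Here is a small self-contained demonstration; the list contents are arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var hireYears = new List<int> { 2015, 2018, 2020 };

// Defining the query runs nothing yet; it only describes the work (unlike JS's eager Array.filter).
var recent = hireYears.Where(y => y >= 2018);

hireYears.Add(2022);               // mutate the source after defining the query
Console.WriteLine(recent.Count()); // 3: the query sees 2018, 2020, and 2022

// ToList() materializes a snapshot, which is what JS's filter() gives you.
var snapshot = recent.ToList();
hireYears.Add(2024);
Console.WriteLine(snapshot.Count); // 3: the snapshot does not change
Console.WriteLine(recent.Count()); // 4: the query re-runs against the current list
```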


In [None]:

// Sanity: no-op cell (for visibility)
Console.WriteLine("Notebook loaded. Ready.");
