[Repo Assist] perf: HtmlCharRefs entity lookup uses Dictionary for O(1) instead of Map O(log n)#1753

Merged
dsyme merged 2 commits into main from
repo-assist/perf-htmlcharrefs-dict-2026-04-20-4e4998b335019aeb
Apr 20, 2026

Conversation

@github-actions
Contributor

🤖 This is an automated pull request from Repo Assist, an AI assistant for this repository.

Summary

HtmlCharRefs.refs previously used an F# Map<string, string>, which gives O(log n) lookups over its ~2230 entries (≈ 11 string comparisons per lookup). This PR replaces it with a Dictionary<string, string> initialised with StringComparer.Ordinal, giving O(1) lookups.
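A minimal sketch of the two container choices, assuming a hypothetical three-entry table in place of the real ~2230-entry one (the key shape shown here is illustrative, not taken from the source file). It is only meant to show that both containers return the same values while differing in lookup cost:

```fsharp
open System
open System.Collections.Generic

// Hypothetical stand-in for the real entity table; names and key shape are assumptions.
let sampleEntries =
    [| "&amp;", "&"
       "&lt;", "<"
       "&nbsp;", "\u00A0" |]

// Before: persistent balanced tree, O(log n) key comparisons per lookup.
let refsMap : Map<string, string> = Map.ofArray sampleEntries

// After: hash table sized up front, with ordinal (culture-insensitive, case-sensitive) keys.
let refsDict =
    let d = Dictionary<string, string>(sampleEntries.Length, StringComparer.Ordinal)

    for (key, value) in sampleEntries do
        d.[key] <- value

    d

// Both containers agree on every key in the sample.
for (key, _) in sampleEntries do
    if refsMap.[key] <> refsDict.[key] then
        failwithf "mismatch for %s" key

// Depth of a balanced tree over 2230 keys: log2(2230), roughly 11 comparisons.
printfn "%.1f" (Math.Log(2230.0, 2.0))
```

The capacity argument avoids rehashing while the table is filled, and the ordinal comparer keeps key matching independent of the current culture.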

A secondary micro-optimisation avoids allocating a new char[] on every call to TrimEnd inside the (|Number|Lookup|) active pattern by sharing a private static semiColonChars array.
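The shared-array idea can be sketched as follows. The module name `TrimSketch` and the helper `trimSemicolons` are hypothetical; only `semiColonChars` is named in this PR. The saving applies on targets where `TrimEnd` is resolved to the `params char[]` overload, so that each call with a literal character would otherwise build a one-element array:

```fsharp
module TrimSketch =
    // Allocated once when the module initialises, then reused by every call.
    let private semiColonChars = [| ';' |]

    // Strips trailing semicolons without building a fresh char[] per call.
    let trimSemicolons (s: string) = s.TrimEnd(semiColonChars)

// Example: a numeric character reference with its trailing ';' removed.
printfn "%s" (TrimSketch.trimSemicolons "&#169;")
```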

Root Cause

The entity table was populated with |> Map.ofArray (an F# persistent balanced-BST map). For HTML documents with many character references (e.g. &amp;, &lt;, &nbsp;), every entity name was resolved via a tree traversal. A Dictionary with ordinal key comparison returns the same results with constant-time lookup.

Changes

  • src/FSharp.Data.Html.Core/HtmlCharRefs.fs
    • Split the ~2230-entry array into a private binding refsEntries
    • Build refs : Dictionary<string, string> from that array with capacity hint and StringComparer.Ordinal
    • Add private semiColonChars = [| ';' |] (reused by TrimEnd)
    • Replace defaultArg (refs.TryFind ref) ref with refs.TryGetValue pattern match
  • RELEASE_NOTES.md — new 8.1.10 entry
  • src/AssemblyInfo*.fs — regenerated with version 8.1.10.0
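The last change to HtmlCharRefs.fs listed above, swapping `defaultArg (refs.TryFind ref) ref` for a `TryGetValue` pattern match, might look roughly like this. The function name `resolve` and the single-entry table are hypothetical, and the parameter is called `name` here rather than `ref`; the point is only the fall-back-to-input behaviour:

```fsharp
open System
open System.Collections.Generic

// Hypothetical one-entry table standing in for the real refs dictionary.
let refs =
    let d = Dictionary<string, string>(StringComparer.Ordinal)
    d.["&amp;"] <- "&"
    d

// Map version (before): defaultArg (refs.TryFind name) name
// Dictionary version (after): F# exposes the out parameter as a (bool * value) tuple.
let resolve (name: string) =
    match refs.TryGetValue(name) with
    | true, value -> value
    | false, _ -> name // unknown entity: return the input unchanged

printfn "%s" (resolve "&amp;")
printfn "%s" (resolve "&unknown;")
```

Matching on the tuple avoids allocating an option value for each lookup, which `TryFind` on a Map would do.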

Test Status

  • ✅ Build succeeded (0 errors, 34 pre-existing warnings)
  • dotnet test tests/FSharp.Data.Core.Tests --configuration Release: 2957 passed, 0 failed
  • dotnet fantomas --check src/FSharp.Data.Html.Core/HtmlCharRefs.fs — formatting compliant

Generated by 🌈 Repo Assist, see workflow run.

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/repo-assist.md@96b9d4c39aa22359c0b38265927eadb31dcf4e2a

…Map O(log n)

Replace the F# Map<string, string> (O(log n) per lookup) with a
Dictionary<string, string> (O(1) per lookup) for the ~2230 HTML
named-entity table. Also introduce a static semiColonChars array to
avoid allocating a new char[] on every call to TrimEnd inside the
(|Number|Lookup|) active pattern.

Bump version to 8.1.10.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dsyme dsyme marked this pull request as ready for review April 20, 2026 08:51
@dsyme dsyme merged commit e45a7b6 into main Apr 20, 2026
3 checks passed
@dsyme dsyme deleted the repo-assist/perf-htmlcharrefs-dict-2026-04-20-4e4998b335019aeb branch April 20, 2026 08:51