perf: checking the syntax contains/cluster list is slow #29

Open
h-east wants to merge 1 commit into syntax-leadbyte-prefilter from syntax-idlist-cache

@h-east h-east commented May 29, 2026

Problem:  Deciding whether a group is in a "contains"/cluster list scans
          the list and expands clusters on every check, which is slow for
          syntaxes with large lists (e.g. plugins such as netrw).
Solution: Resolve each list once into a sorted, cluster-expanded set of
          group IDs and use a binary search; cache it per syntax block and
          drop the cache when syntax definitions change.
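
The lookup described in the Solution can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `idset_T`, `idset_normalize()` and `idset_contains()` are hypothetical names, and cluster expansion is assumed to have already flattened the list into plain group IDs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical resolved form of a "contains" list: sorted, unique IDs. */
typedef struct {
    short *ids;
    int    len;
} idset_T;

/* Three-way comparison for qsort()/bsearch(), avoids overflow tricks. */
static int idset_cmp(const void *a, const void *b)
{
    short x = *(const short *)a;
    short y = *(const short *)b;

    return (x > y) - (x < y);
}

/* Sort the IDs in place and drop duplicates; returns the new length. */
static int idset_normalize(short *ids, int len)
{
    int i;
    int n = 0;

    if (len <= 0)
        return 0;
    qsort(ids, (size_t)len, sizeof(short), idset_cmp);
    for (i = 0; i < len; ++i)
        if (n == 0 || ids[n - 1] != ids[i])
            ids[n++] = ids[i];
    return n;
}

/* Membership test by binary search: O(log n) instead of a linear scan. */
static int idset_contains(const idset_T *set, short id)
{
    assert(set != NULL);
    if (set->len == 0)
        return 0;
    return bsearch(&id, set->ids, (size_t)set->len, sizeof(short),
                   idset_cmp) != NULL;
}
```

Normalizing once and searching many times is what makes the trade worthwhile: the sort cost is paid per list, while each later check only costs a handful of comparisons.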

h-east commented May 29, 2026

Measurement results

Build: ./configure default CFLAGS (-g -O2), no profiling.

Method: clean A/B on the same binary using test_override('syn_idlist_cache',
{0|1}) to enable (0) or disable (1) the cache; force full-buffer highlighting
by calling synID() on every line; measure wall time with reltime() and take the
median of 5 runs.
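
The actual measurements were taken in Vim script with reltime(). Purely to illustrate the "median of 5 runs" idea, here is a small C11 sketch; every name in it (`demo_now()`, `demo_median()`, `demo_time_median()`) is hypothetical and none of it is part of the PR.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define DEMO_MAX_RUNS 15    /* hypothetical upper bound on repetitions */

/* Wall-clock time in seconds (C11 timespec_get; not a monotonic clock). */
static double demo_now(void)
{
    struct timespec ts;

    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

static int demo_cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;

    return (x > y) - (x < y);
}

/* Median of an odd number of samples; sorts the array in place. */
static double demo_median(double *samples, int n)
{
    assert(n > 0 && n % 2 == 1);
    qsort(samples, (size_t)n, sizeof(double), demo_cmp_double);
    return samples[n / 2];
}

/* Run the workload `runs` times and report the median elapsed time. */
static double demo_time_median(void (*workload)(void), int runs)
{
    double samples[DEMO_MAX_RUNS];
    int    i;

    assert(runs > 0 && runs <= DEMO_MAX_RUNS);
    for (i = 0; i < runs; ++i)
    {
        double start = demo_now();

        workload();
        samples[i] = demo_now() - start;
    }
    return demo_median(samples, runs);
}
```

Taking the median rather than the mean keeps a single slow run (for example one hit by background load) from skewing the reported figure.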

| file (filetype) | lines | cache off | cache on | change |
|---|---|---|---|---|
| netrw.vim, Vim plugin | 9,717 | ~5.5 s | ~3.7 s | ~33% faster |
| src/evalfunc.c, C | 12,919 | ~0.41 s | ~0.35 s | ~14% faster |
| big.c, C (concatenated) | 99,192 | ~2.7 s | ~2.5 s | ~10% faster |

in_id_list() self time drops from ~10.7% to ~6.6% (perf). The gain is largest
for syntaxes built around "contains"/cluster lists (real-world plugins such as
netrw); for keyword-heavy or small buffers it is smaller.

Correctness: byte-for-byte identical synID() output with the cache on and off
across C, C++, Vim script, Python, Ruby, Lua, JavaScript, shell, HTML and CSS
(millions of cells), including netrw.vim and the Vim syntax file.

Tests: test_syntax (incl. a new Test_syntax_idlist_cache_unchanged differential
test), test_highlight and test_functions all pass.

Maintainability

  • The cache is a speed optimization only. test_override('syn_idlist_cache', 1)
    disables it at runtime (same mechanism as nfa_fail), and
    Test_syntax_idlist_cache_unchanged asserts the highlighting is identical with
    the cache on and off, so a future bug surfaces as a CI failure rather than
    silently wrong highlighting.
  • Lists that cannot be represented as a plain set (a nested
    ALLBUT/TOP/CONTAINED marker) fall back to the original linear scan.
  • The cache is dropped in syn_stack_free_block(), which runs on every syntax
    definition change, so it also covers list pointer reuse.
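
The three points above (runtime switch, fallback to the linear scan, dropping the cache on definition changes) could fit together roughly like this. It is a simplified sketch under stated assumptions, not the PR's implementation: all `demo_*` names, the marker threshold `DEMO_MARKER_MIN` and the fixed cache size are invented for illustration, and the linear scan is only a stand-in for the original lookup logic.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_MARKER_MIN 20000  /* hypothetical: IDs >= this are special markers */
#define DEMO_CACHE_MAX  8      /* hypothetical tiny cache capacity */

typedef struct {
    const short *key;  /* list pointer this entry was resolved from */
    short       *ids;  /* sorted copy of the list's plain IDs */
    int          len;
} demo_entry_T;

typedef struct {
    demo_entry_T entries[DEMO_CACHE_MAX];
    int          used;
    int          enabled;  /* runtime switch, like a test override */
} demo_cache_T;

static int demo_cmp(const void *a, const void *b)
{
    short x = *(const short *)a;
    short y = *(const short *)b;

    return (x > y) - (x < y);
}

/* Stand-in for the original path: scan the zero-terminated list. */
static int demo_linear(const short *list, short id)
{
    for (; *list != 0; ++list)
        if (*list == id)
            return 1;
    return 0;
}

/* Find or build the cache entry; NULL means "use the linear scan". */
static demo_entry_T *demo_resolve(demo_cache_T *c, const short *list)
{
    demo_entry_T *e;
    int           i;
    int           n = 0;

    for (i = 0; i < c->used; ++i)
        if (c->entries[i].key == list)
            return &c->entries[i];
    for (; list[n] != 0; ++n)
        if (list[n] >= DEMO_MARKER_MIN)
            return NULL;               /* not representable as a plain set */
    if (c->used == DEMO_CACHE_MAX)
        return NULL;                   /* cache full: fall back */
    e = &c->entries[c->used];
    e->ids = malloc((size_t)(n > 0 ? n : 1) * sizeof(short));
    if (e->ids == NULL)
        return NULL;
    for (i = 0; i < n; ++i)
        e->ids[i] = list[i];
    qsort(e->ids, (size_t)n, sizeof(short), demo_cmp);
    e->key = list;
    e->len = n;
    ++c->used;
    return e;
}

static int demo_in_list(demo_cache_T *c, const short *list, short id)
{
    demo_entry_T *e;

    assert(c != NULL && list != NULL);
    e = c->enabled ? demo_resolve(c, list) : NULL;
    if (e == NULL || e->len == 0)
        return demo_linear(list, id);
    return bsearch(&id, e->ids, (size_t)e->len, sizeof(short),
                   demo_cmp) != NULL;
}

/* Call on every definition change: entries are keyed by pointer, so a
 * reused address would otherwise hit a stale entry. */
static void demo_cache_drop(demo_cache_T *c)
{
    int i;

    for (i = 0; i < c->used; ++i)
        free(c->entries[i].ids);
    c->used = 0;
}
```

Because entries are keyed by the list's address, changing a list without calling `demo_cache_drop()` would leave the old sorted copy in place; dropping the whole cache on any change is the simplest way to rule that out.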

Relationship to the lead-byte prefilter PR (vim#20371)

This is a separate, complementary optimization: the prefilter speeds up
regexp-pattern-heavy syntaxes (C/C++), while this speeds up
contains/cluster-heavy ones (real Vim plugins). As currently written the work
is stacked on the prefilter branch (it reuses the s:SynDumpBuffer test helper),
so it is most naturally a follow-up/stacked PR; making it fully independent of
vim#20371 would require duplicating that helper.

@h-east h-east force-pushed the syntax-idlist-cache branch from ef0af30 to 256d550 Compare May 30, 2026 12:15
@h-east h-east changed the title checking the syntax contains/cluster list is slow perf: checking the syntax contains/cluster list is slow May 30, 2026
@h-east h-east force-pushed the syntax-leadbyte-prefilter branch 2 times, most recently from 64b37b3 to 32a75fe Compare June 1, 2026 02:13
@h-east h-east force-pushed the syntax-idlist-cache branch from 256d550 to 719e174 Compare June 1, 2026 03:03
Problem:  Deciding whether a group is in a "contains"/cluster list scans
          the list and expands clusters on every check, which is slow for
          syntaxes with large lists (e.g. plugins such as netrw).
Solution: Resolve each list once into a sorted, cluster-expanded set of
          group IDs and use a binary search; cache it per syntax block and
          drop the cache when syntax definitions change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@h-east h-east force-pushed the syntax-idlist-cache branch from 719e174 to 9991544 Compare June 1, 2026 06:28