
Add still-nanoda checker with some ongoing optimization efforts#38

Merged
nomeata merged 2 commits into leanprover:master from SchrodingerZhu:add-still-nanoda-checker
Apr 26, 2026
Conversation

@SchrodingerZhu
Contributor

still-nanoda is my fork of nanoda/sonanoda to examine some new optimization efforts

@nomeata
Collaborator

nomeata commented Apr 26, 2026

Thanks! Based on the readme this is a similar idea as @datokrat's sonanoda. Maybe you want to join forces? Or is the difference worth keeping both?

(I'm unsure about the value of having several nanoda forks with small variations on the arena. Maybe unless we start seeing people shooting holes into them, and that's how you put up a target for investigation, but so far that hasn't happened a lot)

It would be preferable if the checker description included a list of changes relative to nanoda, or at least the high-level ideas.

@SchrodingerZhu
Contributor Author

SchrodingerZhu commented Apr 26, 2026

This is an experimental fork on top of:

The purpose of this fork is to evaluate some optimizations around the costly
interning cache accesses and other engineering aspects. The changes are not
intended to alter the underlying typechecking logic, and we may contribute them
back once the evaluation stabilizes.

Current Change Set

  1. Local-First Expr Allocation Check: the imported expressions table is so
    large that lookups in it become costly. We adjusted the order of checks to
    consult the local interning cache first, reducing visits to the global cache.
    We also adjusted the default IndexSet API to allow holding an entry
    first and deciding the interning position in a second step. If the expr really
    needs to be allocated locally, this removes one extra lookup. Overall, this
    optimization yields a ~5% speedup on Cedar and Mathlib.

  2. Local-only Expression Filtering: another optimization is to do a filtering
    scan of the expression to skip the global cache lookup if the expression is local-only.

    • Local/Var are apparently local-only;
    • Nodes tied to TcCtx are also local-only.

    This change brings another 10% speedup in Cedar and Mathlib.
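The local-first check described in item 1 can be sketched as below. This is an illustrative reconstruction, not the fork's actual code: the real checker uses `IndexSet` (from the indexmap crate) and arena pointers, while this sketch uses std `HashMap` and made-up names (`Caches`, `ExprKey`); only the shape of `alloc_expr` is taken from the discussion. The point is that the `Entry` API lets a miss in the local cache proceed to insertion without paying for a second hash lookup.

```rust
use std::collections::HashMap;

// Hypothetical structural key of an Expr node; purely illustrative.
type ExprKey = (u8, u32, u32);
type ExprPtr = u32;

struct Caches {
    global: HashMap<ExprKey, ExprPtr>, // large, shared imported-expressions table
    local: HashMap<ExprKey, ExprPtr>,  // small per-thread interning cache
    next_local: ExprPtr,
}

impl Caches {
    // Local-first lookup: consult the small per-thread cache before the
    // large global table; on a local miss, hold the vacant entry so the
    // eventual insert needs no second lookup.
    fn alloc_expr(&mut self, key: ExprKey) -> ExprPtr {
        use std::collections::hash_map::Entry;
        match self.local.entry(key) {
            Entry::Occupied(e) => *e.get(), // local hit: cheapest path
            Entry::Vacant(v) => {
                if let Some(&p) = self.global.get(&key) {
                    // global hit: memoize locally so the next lookup stays local
                    *v.insert(p)
                } else {
                    // true miss: allocate a fresh local slot via the held entry
                    let p = self.next_local;
                    self.next_local += 1;
                    *v.insert(p)
                }
            }
        }
    }
}
```

Whether a global hit should also be memoized locally (as done here) is an assumption on my part; it trades per-thread memory for fewer visits to the contended global table.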
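The filtering scan in item 2 amounts to a predicate like the following. The `Expr` shape here is a minimal stand-in (the real type carries levels, binder info, and cached hashes), and treating `Const` as the only possibly-global leaf is my simplification: if any subterm is local-only (a `Var`/`Local`, or in the real checker a node tied to `TcCtx`), the whole expression cannot appear in the imported table, so the global lookup can be skipped.

```rust
// Minimal expression shape for illustration only.
enum Expr {
    Var(u32),                   // bound variable: never in the global table
    Local(u32),                 // free local: tied to the current context
    Const(&'static str),        // imported constant: may be global
    App(Box<Expr>, Box<Expr>),
}

// Quick pre-scan: an expression containing any local-only leaf cannot be
// in the imported (global) table, so its global lookup can be skipped.
fn is_local_only(e: &Expr) -> bool {
    match e {
        Expr::Var(_) | Expr::Local(_) => true,
        Expr::Const(_) => false,
        Expr::App(f, a) => is_local_only(f) || is_local_only(a),
    }
}
```

In a hash-consed arena this flag would presumably be cached per node at construction time rather than recomputed by a recursive walk, making the check O(1).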

@SchrodingerZhu
Contributor Author

Even after these changes, it appears that alloc_expr alone still takes up to 1/3 of the CPU cycles in each thread. We are still looking into it.

@SchrodingerZhu
Contributor Author

> (I'm unsure about the value of having several nanoda forks with small variations on the arena. Maybe unless we start seeing people shooting holes into them, and that's how you put up a target for investigation, but so far that hasn't happened a lot)

My fork is more of a temporary thing that will turn into a PR if it proves valuable; it is not a variant of the core algorithm. However, I'd like to have a separate tree to evaluate some ideas.

@nomeata
Collaborator

nomeata commented Apr 26, 2026

Ok, so how about we feature it on the arena until it either gets merged, or it gets rejected and is no longer worked on?

@nomeata nomeata merged commit 5fd2bee into leanprover:master Apr 26, 2026
3 checks passed
@SchrodingerZhu SchrodingerZhu deleted the add-still-nanoda-checker branch April 26, 2026 18:17
@SchrodingerZhu
Contributor Author

SchrodingerZhu commented Apr 26, 2026

It seems that I get a wall-time improvement from 6.3m to 4.9m, but not a clear CPU instruction count improvement.

@nomeata
Collaborator

nomeata commented Apr 27, 2026

(screenshot: arena benchmark results)

Looks pretty impressive to me, also the memory footprint improvement. Note that what is shown as time on the arena is actually instruction counts (more reliable to measure).

@nomeata
Collaborator

nomeata commented Apr 27, 2026

And all of that purely from optimizing cache access patterns! Not bad.

@nomeata
Collaborator

nomeata commented Apr 27, 2026

Ah, never mind, you actually forked sonanoda, and that's where it is from.

The wall time improvement is real, though, just not well shown on the arena.

@SchrodingerZhu
Contributor Author

SchrodingerZhu commented Apr 27, 2026

Haha yeah, I mainly want to examine some common patterns that occur in both sonanoda and nanoda. Sorry for not clarifying this.

This is one of the outputs from the trace analysis at our lab. Hopefully, the wall-time improvement still means something helpful.

Another finding we are exploring is that most of the costly lookups are App terms. Adding an additional layer of quick lookup cache helps reduce time by another 1~5%, but that part is relatively small and unstable. Also, the eviction policy is hard to decide...
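One way such a quick App cache could look, under my own assumptions (a small direct-mapped table keyed by the child pointers, with collisions simply overwriting the slot so no explicit eviction policy is needed): the `AppCache` name, the slot count, and the hash mix are all made up for illustration, not taken from the fork.

```rust
// A tiny direct-mapped cache in front of the App interning path.
// Collisions overwrite the slot (implicit eviction), avoiding any
// policy bookkeeping at the cost of occasional false misses.
const WAYS: usize = 1024;

struct AppCache {
    slots: Vec<Option<(u32, u32, u32)>>, // (fun ptr, arg ptr, interned result)
}

impl AppCache {
    fn new() -> Self {
        AppCache { slots: vec![None; WAYS] }
    }

    // Cheap mix of the two child ids; illustrative, not tuned.
    fn index(f: u32, a: u32) -> usize {
        ((f as usize).wrapping_mul(0x9E37_79B9) ^ a as usize) % WAYS
    }

    fn get(&self, f: u32, a: u32) -> Option<u32> {
        match self.slots[Self::index(f, a)] {
            Some((cf, ca, r)) if cf == f && ca == a => Some(r),
            _ => None,
        }
    }

    fn put(&mut self, f: u32, a: u32, r: u32) {
        self.slots[Self::index(f, a)] = Some((f, a, r));
    }
}
```

A direct-mapped design sidesteps the eviction question raised above, but it is also why the gain can be unstable: whether it pays off depends entirely on how often hot App pairs collide in the table.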

@SchrodingerZhu
Contributor Author

Apart from the global imported expr cache, another 10% of the time is spent solely in inst_aux.
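To see why instantiation is hot: substituting bound variables rebuilds every node on the path to each variable, and in a hash-consed arena every rebuilt node is another cache lookup. The sketch below only illustrates that shape; it is not nanoda's actual inst_aux (which works over arena pointers and handles binders by adjusting the offset under each lambda, omitted here since this toy `Expr` has no binders).

```rust
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Var(u32),                   // de Bruijn index
    Const(&'static str),
    App(Box<Expr>, Box<Expr>),
}

// Replace Var(offset + i) with subst[i]; every App on the way down is
// rebuilt, which in the real checker means one arena allocation (and
// hence one interning lookup) per rebuilt node.
fn inst_aux(e: &Expr, subst: &[Expr], offset: u32) -> Expr {
    match e {
        Expr::Var(i) => {
            if *i >= offset && ((*i - offset) as usize) < subst.len() {
                subst[(*i - offset) as usize].clone()
            } else {
                Expr::Var(*i)
            }
        }
        Expr::Const(c) => Expr::Const(c),
        Expr::App(f, a) => Expr::App(
            Box::new(inst_aux(f, subst, offset)),
            Box::new(inst_aux(a, subst, offset)),
        ),
    }
}
```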

@nomeata
Collaborator

nomeata commented Apr 27, 2026

This reminds me a lot of what Claude worked on when creating https://github.com/nomeata/nanobruijn. I think it even added a special fast cache for creating App nodes.

