Prove correctness of HashMap.findEntry?/insert/erase
#38
have ⟨p, hMem, hP⟩ := any_eq_true.mp (AssocList.contains_eq a _ ▸ hContains)
simp [eraseP_append_right _ hL₁,
  eraseP_append_left (p := fun ab => ab.fst == a) hP _ hMem]
-- begin cursed manual proofs
This is the gnarliest proof. Golfing welcome!
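For readers unfamiliar with the two `eraseP_append_*` lemmas in the `simp` call above, here is a small sketch of the behaviour they describe, checked on concrete data rather than proved. It assumes a toolchain where `List.eraseP` is available; the lists and the names `leftPart` and `rightPart` are made up for illustration.

```lean
-- Hypothetical association lists, chosen only for illustration.
def leftPart : List (Nat × String) := [(1, "a"), (2, "b")]
def rightPart : List (Nat × String) := [(2, "c"), (3, "d")]

-- Key 2 occurs in `leftPart`, so erasing from the concatenation
-- only touches the left list. Both lines print the same list.
#eval (leftPart ++ rightPart).eraseP (fun ab => ab.fst == 2)
#eval leftPart.eraseP (fun ab => ab.fst == 2) ++ rightPart

-- Key 3 does not occur in `leftPart`, so the erase passes through
-- to the right list. Again both lines print the same list.
#eval (leftPart ++ rightPart).eraseP (fun ab => ab.fst == 3)
#eval leftPart ++ rightPart.eraseP (fun ab => ab.fst == 3)
```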
Ok @digama0, I backported the |
Std/Data/HashMap/Basic.lean
Outdated
⟨u % n, USize.modn_lt _ h⟩
/-- Calculates the bucket index from a `hash` value. -/
/- Remark: we use a C implementation because this function is performance critical. -/
@[extern c inline "(size_t)(#2) & (lean_unbox(#1) - 1)"]
I'd be extremely surprised if this makes a difference in practice, and wouldn't believe it without actual benchmarks. Leo hasn't seen any performance improvements from this change in Lean 4 either. (It's also incorrect, because `sz` might be boxed in general.)
The enormous practical downside of this change is of course that you can no longer use std4 without compiling it. We didn't require that before.
(If you're concerned about performance, I'd be more concerned about using hashes directly. The Hashable instances are not particularly clever and often return results divisible by powers of two. See also leanprover/lean4#1840)
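A minimal sketch of both points, using plain `Nat` arithmetic instead of the real `USize` code. The bucket count and the hash values are made up for illustration, and `numBuckets` is a hypothetical name.

```lean
-- Illustrative bucket count; the mask trick assumes a power of two.
def numBuckets : Nat := 8

-- For a power-of-two size, masking with `numBuckets - 1` gives the
-- same index as `% numBuckets`, so the two variants only differ in cost.
#eval ([5, 13, 29, 64] : List Nat).map
  (fun h => (h &&& (numBuckets - 1), h % numBuckets))

-- Hashes that are all multiples of the bucket count collide in
-- bucket 0, whichever of the two index computations is used.
#eval ([8, 16, 24, 32] : List Nat).map (fun h => h &&& (numBuckets - 1))
```

This is why the quality of the hash function matters more than the choice between `&` and `%` when the table size is a power of two.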
It's also incorrect, because `sz` might be boxed in general.
I'm confused by this point; there is a `lean_unbox` call, is that insufficient?
The enormous practical downside of this change is of course that you can no longer use std4 without compiling it.
It's true! It is surprising to me that the interpreter does not fall back on the Lean implementation when it encounters an `extern` symbol with no code loaded. Maybe it should do this instead of crashing? Anyhow, I will revert the change.
I'm confused by this point; there is a `lean_unbox` call, is that insufficient?
I'm sorry, I picked the wrong terminology here. The `sz` argument might not be a scalar number (directly stored in the argument word using tagging), but it could be a heap-allocated bignum (where the argument stores the pointer). In that case `lean_unbox` would return the pointer (right-shifted by one).
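The failure mode can be sketched with a simplified model of the tagging scheme. This is an assumption-laden illustration in Lean, not the runtime's C code: `boxModel`, `unboxModel`, and `fakePointer` are hypothetical names, and machine words are modelled as `Nat`.

```lean
-- Model: a small scalar `n` is stored in the argument word as `2 * n + 1`
-- (shift left by one, set the low tag bit).
def boxModel (n : Nat) : Nat := (n <<< 1) ||| 1

-- Model: unboxing shifts the word right by one, dropping the tag bit.
-- It does not check whether the word was a tagged scalar at all.
def unboxModel (w : Nat) : Nat := w >>> 1

-- Round trip on a genuine scalar recovers the original value.
#eval unboxModel (boxModel 8)

-- Made-up heap address standing in for a pointer to a bignum object.
def fakePointer : Nat := 0x7f0000001000

-- Unboxing a pointer yields half the address, which is not a size.
#eval unboxModel fakePointer
```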
That's true, we are assuming the map has fewer than 2^63 (or whatever the largest directly storable value is) entries, so perhaps `sz : USize` would be better.
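A rough sketch of what a `USize`-typed index computation could look like. The name `bucketIdxSketch` is hypothetical and this is not the PR's definition; it returns a bare `USize` rather than a `Fin`, carries no bound proof, and assumes `sz` is a nonzero power of two.

```lean
-- With `sz : USize` the size is always an unboxed machine word,
-- so no scalar-versus-bignum case distinction arises.
def bucketIdxSketch (sz hash : USize) : USize :=
  hash &&& (sz - 1)

-- 29 = 0b11101 and 8 - 1 = 0b111, so the index is 0b101.
#eval bucketIdxSketch 8 29   -- 5
```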
I added some higher-level lemmas which however need |
praise be to mario
Co-authored-by: Mario Carneiro <di.gama@gmail.com>
41418a1 to 6cac10c
This PR has rotted; I will re-add hashmaps in a new one after #89.
`List.Perm`. I was initially hoping to do a more principled port to mathport to begin with, but its dependency tree turned out to be a bit deep, so that might have to wait.