Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prove correctness of HashMap.findEntry?/insert/erase #38

Closed
wants to merge 26 commits into from

Conversation

Vtec234
Copy link
Member

@Vtec234 Vtec234 commented Nov 27, 2022

  • The PR includes a partial port of List.Perm. I was initially hoping to do a more principled port to mathport to begin with, but its dependency tree turned out to be a bit deep, so that might have to wait.

Std/Data/HashMap/Lemmas.lean Outdated Show resolved Hide resolved
have ⟨p, hMem, hP⟩ := any_eq_true.mp (AssocList.contains_eq a _ ▸ hContains)
simp [eraseP_append_right _ hL₁,
eraseP_append_left (p := fun ab => ab.fst == a) hP _ hMem]
-- begin cursed manual proofs
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the gnarliest proof. Golfing welcome!

This was referenced Nov 27, 2022
@Vtec234
Copy link
Member Author

Vtec234 commented Nov 29, 2022

Ok @digama0, I backported the extern implementation of mkIdx using & but not the specification. I think this is justifiable because they are provably equal under the assumption that sz.isPowerOfTwo, but I left a TODO to fix the spec up.

⟨u % n, USize.modn_lt _ h⟩
/-- Calculates the bucket index from a `hash` value. -/
/- Remark: we use a C implementation because this function is performance critical. -/
@[extern c inline "(size_t)(#2) & (lean_unbox(#1) - 1)"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be extremely surprised if this makes a difference in practice, and wouldn't believe it without actual benchmarks. Leo hasn't seen any performance improvements from this change in Lean 4 either. (It's also incorrect, because sz might be boxed in general.)

The enormous practical downside of this change is of course that you can no longer use std4 without compiling it. We didn't require that before.

(If you're concerned about performance, I'd be more concerned about using hashes directly. The Hashable instances are not particularly clever and often return results divisible by powers of two. See also leanprover/lean4#1840)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also incorrect, because sz might be boxed in general.

I'm confused by this point; there is a lean_unbox call, is that insufficient?

The enormous practical downside of this change is of course that you can no longer use std4 without compiling it.

It's true! It is surprising to me that the interpreter does not fallback on the Lean implementation when it encounters an extern symbol with no code loaded. Maybe it should do this instead of crashing? Anyhow I will revert the change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this point; there is a lean_unbox call, is that insufficient?

I'm sorry, picked the wrong terminogy here. The sz argument might not be a scalar number (directly stored in the argument word using tagging), but it could be a heap-allocated bignum (where the argument stores the pointer). In that case lean_unbox would return the pointer (right-shifted by one).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, we are assuming the map has fewer than 2^63 (or whatever the largest directly storable value is) entries, so perhaps sz : USize would be better.

@Vtec234 Vtec234 marked this pull request as draft December 6, 2022 20:47
@Vtec234
Copy link
Member Author

Vtec234 commented Dec 6, 2022

I added some higher-level lemmas which however need List theorems that are mostly already in mathlib. Rather than porting more stuff adhoc, I am marking this as draft since it may as well wait for Data.List.Perm to be mathported in full.

@semorrison semorrison added the WIP work in progress label Aug 8, 2023
@leanprover-community-mathlib4-bot leanprover-community-mathlib4-bot added the merge-conflict This PR has merge conflicts with the `main` branch which must be resolved by the author. label Aug 21, 2023
@Vtec234
Copy link
Member Author

Vtec234 commented Sep 17, 2023

This PR has rotten, I will readd hashmaps in a new one after #89.

@Vtec234 Vtec234 closed this Sep 17, 2023
@Vtec234 Vtec234 deleted the hashmap-op-lemmas branch September 24, 2023 19:47
@Vtec234 Vtec234 mentioned this pull request Oct 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-conflict This PR has merge conflicts with the `main` branch which must be resolved by the author. WIP work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants