feat: Splay Tree Formalisation#568
Conversation
|
Link to discussion thread: https://leanprover.zulipchat.com/#narrow/channel/513188-CSLib/topic/Splay.20tree.20PR/with/595534315 |
|
|
||
| variable {α : Type} | ||
|
|
||
| inductive BinaryTree (α : Type) where |
There was a problem hiding this comment.
It seems this is the same as Mathlib.Data.Tree.Basic?
There was a problem hiding this comment.
There is a stylistic difference.
| node (left : BinaryTree α) (key : α) (right : BinaryTree α)
The mathlib version looks like:
| node (key : α) (left : BinaryTree α) (right : BinaryTree α)
So it is about pre-order vs. in-order representation. When I formalized BST stuff, I found it more convenient to think about it in an in-order version.
There was a problem hiding this comment.
For example, writing rotation
def rotateLeft : BinaryTree α → BinaryTree α
| .node a x (.node b y c) => .node (.node a x b) y c
| t => t
It is immediately clear from the code that you perform a left rotation. Having a pre-order version in mathlib does not look good when you write about rotations or double rotations.
There was a problem hiding this comment.
I understand your argument, but I don't think having a different order of constructors warrants this duplication.
| /-! ### BST Structure -/ | ||
| section BSTStructure | ||
|
|
||
| structure BST (α : Type) [LinearOrder α] where |
There was a problem hiding this comment.
I'm not convinced by the bundling here.
There was a problem hiding this comment.
Can you elaborate? What is bad? What should be a better choice?
There was a problem hiding this comment.
Chris is right. This is an unnecessary bundling. You can operate solely on BinaryTree and insert IsBST as a hypothesis in those theorems that need it.
There was a problem hiding this comment.
Most of the proofs in the PR were based on a separate tree and IsBST property.
We create an API specifically for people who can use BST. In some use cases, it is more convenient to refer to a BST as a type rather than as a single tree with its properties. Sometimes you don't want to carry these properties around all the time.
There was a problem hiding this comment.
Please do ask on Zulip if you'd like to hear other opinions, but unbundling these sort of propositions is a well-established best practice. I think this should be a Prop on Mathlib's (nearly identical) definition of trees.
There was a problem hiding this comment.
Most of the proofs in the PR were based on a separate tree and IsBST property.
We create an API specifically for people who can use BST. In some use cases, it is more convenient to refer to a BST as a type rather than as a single tree with its properties. Sometimes you don't want to carry these properties around all the time.
Consider the fact that a library PR must be built with future re-use in mind, beyond just this PR. If you create a tree definition it can be reused in several places where it need not be a bst. Secondly, this means you can actually work with trees from mathlib.
There was a problem hiding this comment.
These files need to be modules.
Shreyas4991
left a comment
There was a problem hiding this comment.
Summary : When writing functional data structures, it is good practice to provide standard functional data structure API such as maps and folds, and then API lemmas over them. Also you are restating an induction principle of BinaryTree. This is redundant.
| /-! ### Tree Invariants and BST Properties -/ | ||
| section Invariants | ||
|
|
||
| inductive ForallTree (p : α → Prop) : BinaryTree α → Prop |
There was a problem hiding this comment.
This is just the induction principle of BinaryTree stated in a convoluted way.
There was a problem hiding this comment.
I dont think that ForallTree p _ is the induction principle of BinaryTree. it is the tree analogue of List.Forall p _ (for example see ForallTree_iff_toKeyList in Correctness.lean for the equivalence with the list-based characterisation.)
Keeping it makes pattern-matching easier on the tree constructors and goes well with cases/induction tactics in rotation and BST-preservation proofs.
There was a problem hiding this comment.
I think the main purpose of List.Forall is to play fun defeq tricks; so I think keeping this is defensible, but perhaps choosing \forall x \mem t, p x as the simp-normal form is better than using t.Forall p
There was a problem hiding this comment.
I will keep this definition as Eric suggested.
| ForallTree p r → | ||
| ForallTree p (.node l key r) | ||
|
|
||
| inductive IsBST [LinearOrder α] : BinaryTree α → Prop |
There was a problem hiding this comment.
See the design of Batteries RBMap for defining these kinds of functions. Ideally this should be defined through a fold function. The first step is of course to write the map and fold functions and API lemmas for them. See RBMap in Batteries and lean core for examples.
There was a problem hiding this comment.
When you instruct someone to do X, please justify/explain why X is better. In particular, I would appreciate #mwe to illustrate your point. Otherwise, I don't understand why the fold function is better than the ForAll version.
There was a problem hiding this comment.
It is not common practice to offer MWEs in PR reviews. The purpose of MWEs is to present debugging samples in one's own code. Not suggest designs in a review.
Happy to elaborate on folds in the thread.
Briefly, fold functions are a general class of functions that capture the following common recursion pattern on recursive data types: given a binary operation, traverse the data structure recursively and accumulate the results of applying the binary operation. In Mathlib is rich in examples of using foldl and foldr. They are derived by defining instances of the 'Traversable' typeclass
There was a problem hiding this comment.
I don't quite see what you're asking for @Shreyas4991; the most obvious fold function for trees is just the builtin recursion principle provided by inductive. If you're looking for a linear fold that traverses the tree in pre/post/infix order, then I would lean towards skipping it and just writing the toPreList, toPostList, and toInfixList operators and let the user fold on those.
There was a problem hiding this comment.
Concretely I am asking for Forall and co to be replaced by the fold version of the recursive functions in the linked thread below.
https://leanprover.zulipchat.com/#narrow/channel/513188-CSLib/topic/Splay.20tree.20PR/near/595950421
They all follow the same pattern of fold (except fold has one accumulator parameter which makes defining them with foldl/foldr simpler)
| end Invariants | ||
|
|
||
|
|
||
| /-! ### Accessor Lemmas for ForallTree -/ |
There was a problem hiding this comment.
This entire section should fall out from API lemmas for fold and map.
| /-! ### BST Structure -/ | ||
| section BSTStructure | ||
|
|
||
| structure BST (α : Type) [LinearOrder α] where |
There was a problem hiding this comment.
Chris is right. This is an unnecessary bundling. You can operate solely on BinaryTree and insert IsBST as a hypothesis in those theorems that need it.
| Caution: If the search fails, we do not rotate (as currently | ||
| defined in splay) the empty leaf and start to rotate from | ||
| its ancestor, so the cost is path.length - 1. -/ | ||
| def splay.cost [LinearOrder α] (t : BinaryTree α) (q : α) : ℝ := |
There was a problem hiding this comment.
| def splay.cost [LinearOrder α] (t : BinaryTree α) (q : α) : ℝ := | |
| def splay.cost [LinearOrder α] (t : BinaryTree α) (q : α) : Nat := |
no need to make this unprintable.
- Removed the BSTAPI import in Cslib.lean. Leftover from when the file was split, and it's what was breaking the build (the file doesn't exist). - Made Basic/Correctness/Complexity modules. - Cost is ℕ now instead of ℝ. - Dropped a few unused helper lemmas that were just proof leftovers. I also found some lemmas that aren't used yet but kept them: the potential monotonicity ones and the search-path-length lemmas. They're clean general API rather than scaffolding, so I'd rather leave them in. Happy to discuss removing them if you disagree. Builds clean locally, mk_all --check passes.
fix lint errors surfaced once the build passed - renamed defs with underscores: num_nodes→nodeCount, search_path_len→ searchPathLen, BST_contains→bstContains, sequence_cost→sequenceCost - named the anonymous Membership instance - added missing docstrings (rotateRight/rotateLeft, IsBST, Dir, Frame fields, descend.go, nodeCount) - dropped two redundant @[simp]s (nodeCount/toKeyList_applyChild_bringUp) - moved the △[] notation inside namespace Tree
|
I pushed the fixes from the review, CI's green now. Here is what we changed:
On I also left in a few currently-unused lemmas (potential monotonicity, search-path length) since they read as general API. Happy to drop them if you'd rather. |
|
Edited the original PR description to drop the now-removed BSTAPI section and fix the module count to three. |
This PR introduces Splay Trees to CSLib, the algorithmic definitions, the correctness proofs, and the (amortised) complexity analysis.
Design & Architecture: The implementation is partitioned into three modules to isolate dependencies.
Basic: Core definitions (splay, splayUp, descend, Frame). Primitive rotations are upstreamed to BinaryTree.
Correctness:
Complexity: We formalise the Sleator-Tarjan potential method.
Why Bottom-Up? (Comparison with Top-Down):
There is a complementary top-down implementation available for reference here. This PR utilises a bottom-up approach because it reduces the length of the formalisation:
Co-authored-by: Anton Kovsharov antonkov@google.com
Co-authored-by: Antoine du Fresne von Hohenesche antoine@du-fresne.ch
Co-authored-by: Sorrachai Yingchareonthawornchai sorrachai.cp@gmail.com