Make Tree Borrows Provenance GC compact the tree #3837

JoJoDeveloping · 2024-08-23T20:38:13Z

Follow-up on #3833 and #3835. In these PRs, the TB GC was fixed to no longer cause a stack overflow. One test that motivated them was the test fill::horizontal_line in tiny-skia. But not causing stack overflows was not a large improvement, since it did not fix the fundamental issue: The tree was too large. The test now ran, but it required gigabytes of memory and hours of time (only for it to be OOM-killed 🤬), whereas it finishes within 24 seconds in Stacked Borrows. With this merged, it finishes in about 40 seconds under TB.

The problem in that test was that it used slice::chunks to iterate a slice in chunks. That iterator is written to reborrow at each call to next, which creates a linear tree with a bunch of intermediary nodes, which also fragments the RangeMap for that allocation.
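For readers unfamiliar with the pattern, here is a minimal sketch of the kind of loop that triggers it. It uses the standard `chunks` method on slices; the function name `sum_in_chunks` is hypothetical, and this is neither the tiny-skia code nor the benchmark added in this PR.

```rust
/// Sums a slice by walking it in fixed-size chunks.
/// Hypothetical helper; panics if `chunk_len` is 0, like `chunks` itself.
fn sum_in_chunks(data: &[u64], chunk_len: usize) -> u64 {
    let mut total = 0;
    // Each iteration calls `next` on the `Chunks` iterator. Per the
    // description above, every such call reborrows, which adds one more
    // node to the borrow tree of the underlying allocation.
    for chunk in data.chunks(chunk_len) {
        total += chunk.iter().sum::<u64>();
    }
    total
}
```

Under Miri, a long slice with a small `chunk_len` should give many iterations and therefore a deep tree.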

The solution is to now compact the tree, so that these interior nodes are removed. Care is taken to not remove nodes that are protected, or that otherwise restrict their children.

I am currently only 99% sure that this is sound, and I do also think that this could compact even more. So @Vanille-N please also have a look at whether I got the compacting logic right.

For a more visual comparison, here is a gist of what the tree looks like at one point during that test, with and without compacting.

This new GC requires a different iteration order during accesses (since the current one can make the error messages non-deterministic), so it is rebased on top of #3843 and requires that PR to be merged first.

JoJoDeveloping · 2024-08-23T20:40:25Z

src/borrow_tracker/tree_borrows/perms.rs

+    pub fn can_be_replaced_by_child(self, child: Self) -> bool {
+        match self.inner.partial_cmp(&child.inner) {
+            Some(Ordering::Less) | Some(Ordering::Equal) => true,
+            Some(Ordering::Greater) | None => false,
+        }
+    }


This is the magic function. I think that this is sound, i.e. if it returns true, then the child can indeed replace the parent.

Here is an example: The parent is Active, and the child is Frozen. On a foreign write, both go to Disabled. On a foreign read, both go to Frozen. On a child read, both remain; and the Frozen tag blocks child writes. So the Active has no effect and can be compacted by replacing it with its child.

JoJoDeveloping · 2024-08-23T20:41:29Z

src/borrow_tracker/tree_borrows/perms.rs

+    pub fn can_be_replaced_by_child(self, child: Self) -> bool {
+        match self.inner.partial_cmp(&child.inner) {
+            Some(Ordering::Less) | Some(Ordering::Equal) => true,
+            Some(Ordering::Greater) | None => false,
+        }
+    }


I also don't think this function is complete. For one, if the parent is ReservedIM, and the child is Reserved, we can still replace the parent by the child, whereas partial_cmp says that they are incomparable.

Note that the handling of Reserved {conflicted: true} parents is very subtle:

Usually, it's not OK to replace a conflicted Reserved by an Active. But a Reserved never has Active children anyways (as all Active have Active parents), and further, since we know the parent is not protected, we know that the conflictedness does not matter anyway.

This exposes another way the method is incomplete: a conflicted Reserved parent can be replaced by a non-conflicted Reserved child, since the parent's conflictedness no longer matters, as the parent is not protected. But this is not how partial_cmp works.

(Note that I am highlighting all these cases to show that the reasoning here is indeed subtle in many cases)

partial_cmp on permissions denotes reachability, and is not by definition tied to the amount of UB that the permission can produce. I like the idea of can_be_replaced_by_child but I think the above implementation suggests that the replaceability derives from partial_cmp when it's actually very dependent on invariants of the tree. I think it should be decoupled from partial_cmp and hand-written to explicitly show all compactable cases with their justification.

For example the fact that you can compact an Active parent with a Frozen child is entirely dependent on whether the Active is protected, so there are at minimum many unwritten assumptions in can_be_replaced_by_child about the fact that protected tags can't be GC'd and the valid parent-child combinations.
If we make the match explicit we can add a sanity assert(!parent_prot) in the parent: Active, child: Frozen case.

Alright, the diff had cut off the comment above so I hadn't seen that the parent not being protected is at least documented. I still think we should decouple replaceability from reachability.

Yes, we do not remove protected parents. Such parents should never be GC'd anyways since they still have a protector end write coming.

Note that I do think there's a deeper connection between reachability and replaceability: Permissions that come "later" always have "less" permissions than those that come earlier. It should always be sound to replace a node by one that's further ahead in the state machine (protectors notwithstanding). While this is not "by definition," it is the underlying principle making TB work. But I agree that it should probably be decoupled in code.
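To make the "decoupled, hand-written" variant concrete, here is a sketch on a simplified model. The enum `Perm` is an assumption for illustration: it has only four states and omits ReservedIM, the conflicted flag, and protectors, so it is not Miri's actual `Permission` type. The Active and Frozen rows follow the match arms quoted in the review; the Reserved row follows the "further ahead in the state machine" principle stated above.

```rust
/// Simplified permission model (assumption: no ReservedIM, no conflicted
/// flag, no protectors).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Perm {
    Reserved,
    Active,
    Frozen,
    Disabled,
}

/// Whether an unprotected parent can be dropped in favor of its only child.
fn can_be_replaced_by_child(parent: Perm, child: Perm) -> bool {
    use Perm::*;
    match (parent, child) {
        // Every other state lies further along the state machine.
        (Reserved, _) => true,
        // A foreign read freezes the Active parent but would leave a
        // Reserved child writable, so the parent still restricts the child.
        (Active, Reserved) => false,
        (Active, Active) => true,
        // Sound only because the parent is not protected.
        (Active, Frozen) => true,
        (Active, Disabled) => true,
        // Frozen can only be replaced by Disabled (and itself).
        (Frozen, Frozen) => true,
        (Frozen, Disabled) => true,
        (Frozen, _) => false,
        // Disabled can not be replaced by anything else.
        (Disabled, Disabled) => true,
        (Disabled, _) => false,
    }
}
```

Writing out every pair makes each justification visible and gives a natural place for an assertion that the parent is unprotected.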

JoJoDeveloping · 2024-08-23T20:44:09Z

src/borrow_tracker/tree_borrows/tree.rs

+        if global.borrow().protected_tags.contains_key(&child.tag) {
+            // todo think really hard whether we can't just handle this as we would regularly
+            return None;
+        }


I am not sure if this check is necessary. I reckon that it is not, i.e. we can merge protected tags up. Part of the intuition is:

Almost all the protector does is make more transitions UB. Since we do not remove the protector, these same transitions remain UB

The one exception is that Reserved becomes conflicted on foreign accesses -- but this should not cause any issues, since the foreign read still happens similarly on the child.

We still know the parent is not protected

JoJoDeveloping · 2024-08-23T20:58:12Z

We should probably have a definition of replaceable(parent, child) and then exhaustively test that:

Let p_parent and p_child be two permissions such that replaceable(p_parent, p_child), and where p_parent is not protected. Then all accesses to them either

* move them further along such that afterwards replaceable(p_parent', p_child') (they remain in simulation), or
* cause UB. In that case, the UB must also be caused if only p_child is accessed in that manner.

~~@Vanille-N can you add such a test? Note that replaceable is this method.~~ This has been added now.
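The criterion above can be checked exhaustively on a small model. The following is a sketch, not the test that was added to Miri: `Perm`, `Access`, `step`, and `is_sound` are hypothetical names, and the transition table is a simplified, protector-free reading of Tree Borrows without ReservedIM or conflictedness.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Perm { Reserved, Active, Frozen, Disabled }

#[derive(Clone, Copy, Debug)]
enum Access { ChildRead, ChildWrite, ForeignRead, ForeignWrite }

const ALL_PERMS: [Perm; 4] = [Perm::Reserved, Perm::Active, Perm::Frozen, Perm::Disabled];
const ALL_ACCESSES: [Access; 4] =
    [Access::ChildRead, Access::ChildWrite, Access::ForeignRead, Access::ForeignWrite];

/// One step of the simplified state machine; `None` means UB.
fn step(perm: Perm, access: Access) -> Option<Perm> {
    use Access::*;
    use Perm::*;
    match (access, perm) {
        (ChildRead, Disabled) => None,
        (ChildRead, p) => Some(p),
        (ChildWrite, Reserved | Active) => Some(Active),
        (ChildWrite, Frozen | Disabled) => None,
        (ForeignRead, Active) => Some(Frozen),
        (ForeignRead, p) => Some(p),
        (ForeignWrite, _) => Some(Disabled),
    }
}

/// Candidate relation under test (simplified model).
fn can_be_replaced_by_child(parent: Perm, child: Perm) -> bool {
    use Perm::*;
    matches!(
        (parent, child),
        (Reserved, _)
            | (Active, Active | Frozen | Disabled)
            | (Frozen, Frozen | Disabled)
            | (Disabled, Disabled)
    )
}

/// Checks the criterion for a candidate relation `rel(parent, child)`.
fn is_sound(rel: impl Fn(Perm, Perm) -> bool) -> bool {
    for parent in ALL_PERMS {
        for child in ALL_PERMS {
            if !rel(parent, child) {
                continue;
            }
            for access in ALL_ACCESSES {
                // The parent tag itself is unused, so both nodes see every
                // access from the same side (both child or both foreign).
                match (step(parent, access), step(child, access)) {
                    // No UB: the pair must stay in the relation.
                    (Some(p2), Some(c2)) => {
                        if !rel(p2, c2) {
                            return false;
                        }
                    }
                    // UB at the child: the compacted tree reports it too.
                    (_, None) => {}
                    // UB only at the parent: compaction would hide it.
                    (None, Some(_)) => return false,
                }
            }
        }
    }
    true
}
```

In this model the relation passes, while a relation that always returns true fails: a Frozen parent with an Active child produces UB on a child write only at the parent.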

JoJoDeveloping · 2024-08-23T23:47:12Z

The tests are failing for several reasons, all related to the fact that it's a "GC-stress" mode test where the GC runs every tick.

First, linux-futex.rs fails because (presumably) the GC now does a bunch more work, so things get delayed, and the time taken for an operation is outside of the acceptable range.

The other tests fail because the GC messes with the tree structure, combined with a suboptimal tree visiting order when checking accesses. To check an access, one currently starts at the root, and first checks that all nodes from there downwards support a child access. If the violating node (i.e. one not supporting the access) is such a parent, the error message will point this out (and print information about both tags). But quite often, this parent is simply the immediate parent of the node that was accessed, and not used otherwise, so the GC has merged it with the child. Since then it's not a violating parent but the node itself, the error message changes. And this makes the UI tests fail.

My solution would be to change the iteration order to go upwards but not downwards, which would be resilient to GC tree compactings. The comments currently in code seem to imply that the current iteration order has a deeper meaning, but sadly without going any deeper. @Vanille-N do you know more?
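The dependence of the reported tag on the traversal order can be reproduced on a toy model. This is only a sketch: the path representation and the names `Perm`, `first_violation_top_down`, and `first_violation_bottom_up` are hypothetical and do not mirror Miri's data structures.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Perm {
    Active,
    Disabled,
}

/// In this toy model, only Disabled rejects a child read.
fn allows_child_read(perm: Perm) -> bool {
    perm != Perm::Disabled
}

/// `path` lists (tag, permission) from the root down to the accessed node.
/// Returns the tag that would be blamed when starting at the root.
fn first_violation_top_down(path: &[(u32, Perm)]) -> Option<u32> {
    path.iter()
        .find(|(_, perm)| !allows_child_read(*perm))
        .map(|(tag, _)| *tag)
}

/// Same check, but starting at the accessed node and walking upwards.
fn first_violation_bottom_up(path: &[(u32, Perm)]) -> Option<u32> {
    path.iter()
        .rev()
        .find(|(_, perm)| !allows_child_read(*perm))
        .map(|(tag, _)| *tag)
}
```

With the path `[(0, Active), (1, Disabled), (2, Disabled)]`, the top-down check blames tag 1. If the GC has merged the unused tag 1 into tag 2, the path is `[(0, Active), (2, Disabled)]` and the top-down check blames tag 2 instead, so the message depends on whether the GC ran. The bottom-up check blames tag 2 in both cases.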

Vanille-N · 2024-08-24T06:18:37Z

My solution would be to change the iteration order to go upwards but not downwards, which would be resilient to GC tree compactings. The comments currently in code seem to imply that the current iteration order has a deeper meaning, but sadly without going any deeper. @Vanille-N do you know more?

Sure. The current iteration order is of course a heuristic (TB doesn't specifically require an order to these) in an attempt to quickly identify the "root cause" of the issue, with a priority for more straightforward errors.

Insufficient permissions are more straightforward than protector violations, so we check parents before children.
The hidden assumption here is that parents can only trigger UB of the kind "insufficient permissions", and children can only trigger UB of the kind "protector violation".

As for the iteration order (top-down), the reasoning is that if we have x parent of y parent of z all Disabled and we do a child access through z, we are going to get the same kind of UB whichever we choose to report, but if we report z and the user fixes just z then we still get UB on y on the next run. At the time of writing this code I guessed that z was more likely to be the root cause of the bug and get the user to fix the UB in fewer steps. I don't have concrete evidence; this is mostly guesswork.

I recognize that the combination of this iteration order and this tree compacting strategy can lead to nondeterministic error messages. Fixing that should take absolute priority over whatever diagnostic heuristic is implemented, and any iteration order will probably still give good enough messages.

Vanille-N · 2024-08-24T06:41:30Z

Finally, from what I understand (I haven't looked at the detail of your compaction logic yet), you seem to be keeping the child and deleting the parent. This is not the only compacting logic possible.

The one I had in mind originally was to do the opposite: look at the parent to determine if we can delete the child. This removes the constraint of "if the parent is a singleton" because it preserves the ancestry relation even in branching trees.
For example it seems to me that your approach is not capable of compacting

0: Frozen
   |-- 1: Frozen (GC)
   |      |-- 2: Frozen
   |      '-- 3: Frozen
   '-- 4: Frozen (GC)
          |-- 5: Frozen
          '-- 6: Frozen

into

0: Frozen
   |-- 2: Frozen
   |-- 3: Frozen
   |-- 5: Frozen
   '-- 6: Frozen

I don't doubt that your bottom-up compacting can merge some configurations that a top-down approach cannot, and maybe those are more frequent, but the point is that the iteration order that I chose is compatible with the compacting logic that I initially had in mind.

Vanille-N · 2024-08-24T06:48:16Z

Also since wide trees cannot get very deep, it's probably fine to have a compacting logic that can't handle branching if we observe that issues come mostly from concrete examples where the tree is deep.

JoJoDeveloping · 2024-08-24T10:12:00Z

The one I had in mind originally was to do the opposite: look at the parent to determine if we can delete the child. This removes the constraint of "if the parent is a singleton" because it preserves the ancestry relation even in branching trees.

Compared with that approach, doing it like I do here has several advantages:

It is in-place: Since we only ever move a child up, we don't need to extend the array of children
It is faster: Since we only check single-child nodes, there will never be a case where we check a long array only to realize we can't compact at the last node.
It preserves the branching structure: In other words, not flattening nodes with multiple children can be seen as a feature.

Also note that the number of additional nodes that your approach would remove, but mine would not, is bounded: My approach produces trees at most twice as large as yours. The reason is that in a binary tree, the number of interior nodes is at most the number of leaves. So the efficiency gains offered by an even more collapsing GC are constant.
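As an illustration of the single-child strategy, here is a sketch on a plain owned tree. The `Node` type, its `removable` flag, and the helpers are hypothetical. The permission check is left out, which matches the all-Frozen example above, and the in-place array handling of the real implementation is not modelled.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Node {
    tag: u32,
    /// Hypothetical flag: the tag is unused, so the GC may drop the node.
    removable: bool,
    children: Vec<Node>,
}

fn node(tag: u32, removable: bool, children: Vec<Node>) -> Node {
    Node { tag, removable, children }
}

/// Bottom-up compaction: below `n`, every removable node with exactly one
/// child is replaced by that child. Nodes with several children are kept,
/// so the branching structure is preserved. The root is never removed.
fn compact(mut n: Node) -> Node {
    n.children = n
        .children
        .into_iter()
        .map(compact)
        .map(|mut c| {
            // `c`'s subtree is already compacted, so one check suffices.
            if c.removable && c.children.len() == 1 {
                c = c.children.pop().unwrap();
            }
            c
        })
        .collect();
    n
}

/// Number of nodes in the tree rooted at `n`.
fn size(n: &Node) -> usize {
    1 + n.children.iter().map(size).sum::<usize>()
}
```

A linear chain 0 → 1 (GC) → 2 (GC) → 3 collapses to 0 → 3, whereas the branching example with the two removable nodes 1 and 4 keeps all seven nodes.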

JoJoDeveloping · 2024-08-24T17:17:22Z

Also since wide trees cannot get very deep, it's probably fine to have a compacting logic that can't handle branching if we observe that issues come mostly from concrete examples where the tree is deep.

Note that further optimizations are possible elsewhere to handle wide trees. For instance, those implemented in this branch, which makes reads and writes skip more subtrees. But I did not see them cause a huge speedup, and they also affect diagnostics in rare cases, so I did not PR them yet.

RalfJung · 2024-08-24T17:23:32Z

We should probably have a definition of replaceable(parent, child) and then exhaustively test that:

Let p_parent and p_child be two permissions such that replaceable(p_parent, p_child), and where p_parent is not protected. Then all accesses to them either
* move them further along such that afterwards `replaceable(p_parent', p_child')` (they remain in simulation)

* cause UB. In that case, the UB must also be caused if only `p_child` is accessed in that manner.
@Vanille-N can you add such a test? Note that replaceable is this method.

Is this the right check? "Remain in simulation" is very weak. For instance, just making replaceable always return true seems to satisfy this check, but is obviously not a valid implementation for replaceable.

JoJoDeveloping · 2024-08-24T17:25:58Z

Is this the right check? "Remain in simulation" is very weak. For instance, just making replaceable always return true seems to satisfy this check, but is obviously not a valid implementation for replaceable.

Such an implementation would fail the second condition (e.g. if the parent is Frozen and to be replaced by a child being Active, child writes will be UB only at the parent). And I think that the relation we are looking for here is precisely "the largest simulation where UB at the child implies UB at the parent."

Note that whether or not something causes UB is the only observable action here. So formally, the parent must simulate the child, including the child's steps to UB.

In preparation for rust-lang#3837, the tree traversal needs to be made bottom-up, because the current top-down tree traversal, coupled with that PR's changes to the garbage collector, can introduce non-deterministic error messages if the GC removes a parent tag of the accessed tag that would have triggered the error first. This is a breaking change for the diagnostics emitted by TB. The implemented semantics stay the same.

RalfJung · 2024-08-27T11:25:50Z

Such an implementation would fail the second condition (e.g. if the parent is Frozen and to be replaced by a child being Active, child writes will be UB only at the parent).

Your comment says they have to satisfy one of the conditions, and the first one is trivially satisfied. So it's okay to violate the second condition, the entire soundness criterion you are suggesting is still satisfied as far as I can tell.

JoJoDeveloping · 2024-08-27T11:36:23Z

Your comment says they have to satisfy one of the conditions, and the first one is trivially satisfied. So it's okay to violate the second condition, the entire soundness criterion you are suggesting is still satisfied as far as I can tell.

I mean you either move along or cause UB, you can't have both at the same time, so that's why it's "either."

RalfJung · 2024-08-27T11:53:03Z

Ah, "move along" implies "without UB". :)

bors · 2024-08-27T12:50:47Z

☔ The latest upstream changes (presumably #3847) made this pull request unmergeable. Please resolve the merge conflicts.

JoJoDeveloping · 2024-08-27T18:27:38Z

I squashed all the commits here. The reason is that I made some changes to the overall garbage collector infrastructure in the first commit here, which I undid in one of the later commits (because it turned out to be unnecessary). When rebasing onto the master branch, where the GC had also changed, git got confused, and I could not be bothered to correctly replay my changes since I knew that a later commit would undo them.

bench-cargo-miri/slice-chunked/src/main.rs

RalfJung

I still have to check the tests

RalfJung · 2024-08-28T10:18:47Z

src/borrow_tracker/tree_borrows/perms.rs

+            // Active can be replaced by Frozen, since it is not protected
+            (Active, Frozen) => true,
+            (Active, Disabled) => true,
+            // Frozen can only be replaced by Disabled


Suggested change

-            // Frozen can only be replaced by Disabled
+            // Frozen can only be replaced by Disabled (and itself).

RalfJung · 2024-08-28T10:18:54Z

src/borrow_tracker/tree_borrows/perms.rs

+            (Active, ReservedIM) => false,
+            (Active, ReservedFrz { .. }) => false,
+            (Active, Active) => true,
+            // Active can be replaced by Frozen, since it is not protected


Suggested change

-            // Active can be replaced by Frozen, since it is not protected
+            // Active can be replaced by Frozen, since it is not protected.

RalfJung · 2024-08-28T10:19:00Z

src/borrow_tracker/tree_borrows/perms.rs

+            (Frozen, Frozen) => true,
+            (Frozen, Disabled) => true,
+            (Frozen, _) => false,
+            // Disabled can not be replaced by anything else


Suggested change

-            // Disabled can not be replaced by anything else
+            // Disabled can not be replaced by anything else.

RalfJung · 2024-08-28T10:20:00Z

src/borrow_tracker/tree_borrows/tree.rs

@@ -128,6 +128,22 @@ impl LocationState {
        Ok(transition)
    }

+    /// Like `perform_access`, but ignores the diagnostics, and also is pure.


Suggested change

-    /// Like `perform_access`, but ignores the diagnostics, and also is pure.
+    /// Like `perform_access`, but ignores the concrete error cause and also uses state-passing
+    /// rather than a mutable reference.

RalfJung · 2024-08-28T10:26:04Z

src/borrow_tracker/tree_borrows/tree.rs

+    /// should have no children, but this is not checked, so that nodes
+    /// whose children were rotated somewhere else can be deleted without
+    /// having to first modify them to clear that array.
+    /// otherwise (i.e. the GC should have marked it as removable).


The last line doesn't sound like a sentence...?

RalfJung · 2024-08-28T12:21:09Z

Looks good, thanks. :)

@bors r+

bors · 2024-08-28T12:21:12Z

📌 Commit ff0bc0f has been approved by RalfJung

It is now in the queue for this repository.

bors · 2024-08-28T12:22:20Z

⌛ Testing commit ff0bc0f with merge d1aa077...

bors · 2024-08-28T12:49:02Z

☀️ Test successful - checks-actions
Approved by: RalfJung
Pushing d1aa077 to master...

JoJoDeveloping mentioned this pull request Aug 25, 2024

Make TB tree traversal bottom-up #3843

Merged

JoJoDeveloping force-pushed the tb-compacting-provenance-gc branch from 96e6c48 to 8d9e92b, August 25, 2024 15:47

JoJoDeveloping force-pushed the tb-compacting-provenance-gc branch from 8d9e92b to 2f98d27, August 25, 2024 16:53

JoJoDeveloping force-pushed the tb-compacting-provenance-gc branch from 2f98d27 to 3acb87a, August 27, 2024 18:22

JoJoDeveloping force-pushed the tb-compacting-provenance-gc branch from 3acb87a to 1cd7317, August 27, 2024 18:26

JoJoDeveloping marked this pull request as ready for review August 27, 2024 18:59

Add benchmark for TB slowdown

203a7a3

JoJoDeveloping force-pushed the tb-compacting-provenance-gc branch from a395a0d to 203a7a3, August 28, 2024 10:21

address nits

ff0bc0f

bors merged commit d1aa077 into rust-lang:master Aug 28, 2024
8 checks passed
