Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce RefCell::try_borrow_unguarded #59211

Merged
merged 4 commits into from Apr 11, 2019

Conversation

@nox
Copy link
Contributor

commented Mar 15, 2019

Come sit next to the fireplace with me, this is going to be a long story.

So, you may already be aware that Servo has weird design constraints that forces us developers working on it to do weird things. The thing that interests us today is that we do layout on a separate thread with its own thread pool to do some things in parallel, whereas the data it uses comes from the script thread, which implements the entire DOM and related pieces, with !Sync data types such as RefCell<T>.

The invariant we maintain is that script does not do anything ever with the DOM data as long as layout is doing its job. That's all nice and all, but one thing we don't ensure is that we don't actually know if script was currently mutably borrowing some RefCell<T> prior to starting layout, which may lead to aliasing mutable memory and obviously undefined behaviour.

This PR reinstates RefCell::borrow_state so that this method can make use of it and return None if the cell was mutably borrowed.

Cc @SimonSapin

nox added some commits Mar 15, 2019

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

commented Mar 15, 2019

r? @KodrAus

(rust_highfive has picked a reviewer for you, use r? to override)

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

commented Mar 15, 2019

I might be misunderstanding, but is there a reason you can't have the function call borrow or try_borrow? And then do the unsafe cast to extend the lifetime of the reference returned as appropriate?

@nox

This comment has been minimized.

Copy link
Contributor Author

commented Mar 15, 2019

We call DomRefCell::borrow_for_layout from multiple threads in parallel. The layout threads. Using RefCell::try_borrow would potentially change the borrow flag from multiple threads at once non-atomically. I need to observe it without changing it.

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

commented Mar 15, 2019

Ah, I see. On the other hand, RefCell doesn't store the state atomically/via locking so I'm not sure it's safe to read the borrow state without some form of locking either?

I think the only reason the as_ptr call technically doesn't cause any sort of problem is that we're only returning a shared reference.

cc @RalfJung as well, might have relevance for stacked borrows and UCG in general

@nox

This comment has been minimized.

Copy link
Contributor Author

commented Mar 15, 2019

To send a value over a channel to a different thread, that channel already must do some synchronisation to ensure that memory accesses aren't observed in the wrong order etc, that's independent from what RefCell::borrow_state would allow me to do here.

@nox

This comment has been minimized.

Copy link
Contributor Author

commented Mar 15, 2019

Let me try to explain the use case in depth:

We can't get &mut values for the entire DOM at once

In Servo, all of DOM (Node, Element, NodeList, etc) is written in Rust but the actual struct values are stored and owned by JavaScript objects, which obviously are owned by the JavaScript engine. The lifecycle of the values is totally managed by the engine, and they will be freed only when no other JavaScript value points to them; furthermore, a lot of these values define cycles on the Rust side (children point to parents and parents point to children), let alone on the JS side where the sky is the limit. I hope that makes it clear that we can't avoid inner mutability.

Layout needs Sync data

Layout is a core piece of a Web browser, and one of the things that define Servo is that our layout component does its job parallelising a damn lot of things, which by definition means it needs Sync data. The problem is that one of its inputs is the DOM itself.

DOM cannot use expensive atomic data structures

DOM is the damn thing that JS accesses whenever it uses Element.getAttribute etc, and attributes and many other things in DOM are mutable. Given how crucial it is for DOM operations to be fast, we cannot afford to use atomic data structures when the JS itself is single-threaded and we only need Sync things when running layout.

So what do we do?

Servo currently defines a lot of Sync wrappers around non-Sync values, and provides methods on these wrappers such that the invariant maintained is "when those wrappers are used from the layout threads, we are sure script doesn't go behind our back and starts doing things again with the DOM".

Why do you need borrow_state then?

Because bugs can happen, and there is a single scenario that scares me:

  1. JavaScript calls DOM code.
  2. DOM code mutably borrows something in the DOM and calls back into JS.
  3. JS calls DOM code which triggers a layout operation.
  4. layout calls DomRefCell::borrow_for_layout on the thing that was mutably borrowed in step 2.

During step 4, script is ensured to not run anymore and thus cannot ever change any refcell's borrow state while layout runs, but layout cannot do anything defensively to avoid the UB caused by accessing a refcell that was mutably borrowed prior to running layout.

Does that make sense to you as to why I need RefCell::borrow_state now?

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

commented Mar 15, 2019

I think my confusion lies not in the need for borrow_state -- I see that -- but why would borrow_state be considered safe?

I'm envisioning that the layout thread calls borrow_state to check if it's currently mutably borrowed, and during that call/after it JS goes in and borrow_mut's the RefCell. Maybe that can't happen though?

@nox

This comment has been minimized.

Copy link
Contributor Author

commented Mar 15, 2019

I'm envisioning that the layout thread calls borrow_state to check if it's currently mutably borrowed, and during that call/after it JS goes in and borrow_mut's the RefCell. Maybe that can't happen though?

That's the responsibility of the script/layout interface that ensures that script doesn't run while layout is going on. It indeed can't happen.

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

commented Mar 15, 2019

Sounds good.

I think the concern/part of the reason we deprecated and removed this originally is that getting it right is hard; and this API is actually not useful for anyone who's not doing correct but hard to get right (i.e., using RefCell from multiple threads) things.

Maybe we should add to the docs on borrow_state that "you really shouldn't use this unless you really know what you are doing."

@nox

This comment has been minimized.

Copy link
Contributor Author

commented Mar 15, 2019

I think the concern/part of the reason we deprecated and removed this originally is that getting it right is hard; and this API is actually not useful for anyone who's not doing correct but hard to get right (i.e., using RefCell from multiple threads) things.

Yeah I can sympathise with that. I'm not even entirely sure that we 100% got it correctly in Servo, given there is too much unsafe code all over the place related to the script/layout invariant I described earlier, and nothing to try to avoid the specific scenario I told of. On a side note that's why I'm working on inert these days.

Maybe we should add to the docs on borrow_state that "you really shouldn't use this unless you really know what you are doing."

Sounds good to me, though if someone is calling it in multiple threads at once they already should be aware of being careful because they must have written an unsafe Sync implementation somewhere.

@KodrAus

This comment has been minimized.

Copy link
Contributor

commented Mar 15, 2019

Can we think of any other use-case for this that doesn't involve pretending RefCell is Sync?

@Centril Centril requested a review from RalfJung Mar 16, 2019

Show resolved Hide resolved src/libcore/cell.rs Outdated
Show resolved Hide resolved src/libcore/cell.rs Outdated
@Centril

This comment has been minimized.

Copy link
Contributor

commented Mar 16, 2019

As an alternative approach re. #59211 (comment), you may consider two `methods:

fn will_borrow_succeed(&self) -> bool { ... }
fn will_borrow_mut_succeed(&self) -> bool { ... }

These would correspond to the actual checks done for borrow and borrow_mut.

@RalfJung

This comment has been minimized.

Copy link
Member

commented Mar 16, 2019

The method itself seems fine, just from a "is this a safe extension of RefCell" standpoint. But of course that's not the question here.

However, I'm afraid I don't understand enough about how Servo uses this to evaluate whether Servo's use of this is correct. Is it possible to produce a small amount of code demonstrating this?

JavaScript calls DOM code.
DOM code mutably borrows something in the DOM and calls back into JS.
JS calls DOM code which triggers a layout operation.
layout calls DomRefCell::borrow_for_layout on the thing that was mutably borrowed in step 2.

In the last step, which thread will this call happen in? You mentioned layout is in a different thread, so it seems to me that there's a race here between DOM code releasing the mutable borrow and layout calling borrow_state. You said above it is the responsibility of the layout code to make sure that does not happen, but then it seems like this check does not actually help to avoid UB. What am I missing?

@nox

This comment has been minimized.

Copy link
Contributor Author

commented Mar 16, 2019

In the last step, which thread will this call happen in? You mentioned layout is in a different thread, so it seems to me that there's a race here between DOM code releasing the mutable borrow and layout calling borrow_state. You said above it is the responsibility of the layout code to make sure that does not happen, but then it seems like this check does not actually help to avoid UB. What am I missing?

There is no race condition because we prevent them by doing the necessary synchronisation. That method prevents UB in the case we had a bug where a refcell was mutably borrowed before layout started. This check does avoid UB, because we wouldn't look into a refcell that was been mutably borrowed.

@RalfJung

This comment has been minimized.

Copy link
Member

commented Mar 18, 2019

I see, so the "doing the synchronization" part is separate from the "prevent reentrant accesses" part, and borrow_state lets you catch bugs in the latter, but you still rely on the former?

@nox

This comment has been minimized.

Copy link
Contributor Author

commented Mar 18, 2019

@RalfJung Yes.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

commented Mar 18, 2019

This method has existed before, and was removed because:

We've added/stabilized try_borrow and try_borrow_mut, these should no longer be needed

But now there is a new use case that was not known at the time and that is not covered by the try_ methods. (More on this below.) Usually we tend to accept PRs for new unstable APIs without more process, but since this reverts a previous decision let’s get team sign off:

@rfcbot fcp merge

Having two methods that return booleans instead of one that returns an enum sounds fine in terms of functionality. (Or maybe just one? If there is no use case without UB for the other one…) Naming them might be awkward, but .borrow() would succeed might be easier to document than “state” semantics of each enum variant.


Imagine that some code has access to &mut RefCell<T>. By definition, no access to the refcell other than through this borrow is possible, as long as this borrow exists. Through .get_mut(), we can obtain &mut T and from that &T. If T: Sync, we can send that &T to multiple threads and operate on the value from there.

The Servo use case is similar: in that we want to do multi-threaded computation on &T with T: Sync, but different in that we start from only &RefCell<T>. We have external synchronization ensuring that no other access to that refcell are going to be made for the duration of this computation, because the thread that the refcell belongs to is blocked while waiting for those computation results. But it remains that accessing &T is still UB if there is cell::RefMut<T> (effectively &mut T) somewhere on the stack of the original thread. There shouldn’t be, but bugs happen. A runtime check helps catch this kind of bug before it causes UB.

Additionally there isn’t just one refcell, but a whole tree of nodes that each contain multiple refcells. Tree traversal is where we have parallelism, so we can’t do the runtime check on each refcell on the original thread. (What we send to computation threads is struct WrapperWithCarefullyLimitedApi<'a>(&'a Node).) Therefore, to avoid a need for atomic access or other synchronization, the check can only read the refcell’s borrow flag/counter, without writing like refcell.try_borrow().is_ok() would.

@rfcbot

This comment has been minimized.

Copy link

commented Mar 18, 2019

Team member @SimonSapin has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

commented Mar 18, 2019

I'd be against the method names that @Centril proposed.

The primary reason is that will_borrow_succeed is misleading in that the "true" result is meaningless.

There seems to be no reason for the mut variant (will_borrow_mut_succeed) as there's no case where you can obtain a RefMut from a RefCell but can't use borrow_mut or try_borrow_mut.

Perhaps the API should instead be unsafe fn borrow_unchecked(&self) -> &T, and would panic if currently borrowed but not increment the borrow counter? That seems to fit Servo's use case and is more obviously not the API to use, unlike borrow_state.

I'm not a fan of the "unchecked" since it does check that it is currently safe; but because it doesn't increment the ref-count it would potentially be unsafe in the future.

To be clear, I'm opposed to re-adding borrow_state as-is because it feels like introducing a safe API that has no use case in safe code, as well as having a bunch of edge-cases that are difficult to properly document. The unsafe fn borrow_unchecked(&self) -> &T API seems better suited for our purposes.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

commented Mar 18, 2019

This borrow is not completely unchecked, the point is to check that the refcell is in an appropriate state when the borrow starts. But then we don’t track this new borrow or when it ends.

Maybe unsafe fn borrow_untracked(&self) -> Option<&T>?

@KodrAus

This comment has been minimized.

Copy link
Contributor

commented Mar 20, 2019

I'm not really convinced about this API or exposing the borrow state. The potential for abuse seems high. I think the root of my reservations are that just based on the context here I'm not sure RefCell is the most appropriate tool for the job in the first place.

On the other hand, it looks like we had almost exactly the same discussion already in #27708 and decided to stabilize RefCell::as_ptr.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

commented Mar 20, 2019

I'm not sure RefCell is the most appropriate tool for the job

If so, the appropriate tool is some hypothetical type that has most of its methods copy-pasted from RefCell, just so that additional methods can access private fields. This seems counter-productive.

@KodrAus

This comment has been minimized.

Copy link
Contributor

commented Mar 20, 2019

@SimonSapin I don't think that has to be as counter-productive as you suggest. There's value in having the freedom to implement exactly the thing you need to solve the problems you have. If that coincidentally looks a lot like RefCell then that's not necessarily important.

I guess my question is whether RefCell::try_borrow_unguarded is broadly useful enough outside Servo to support in std when it could suggest you don't want RefCell to begin with. Maybe that's a question for stabilization though. I'd be interested to see what valid usecases pop up.

@nox

This comment has been minimized.

Copy link
Contributor Author

commented Mar 20, 2019

I guess my question is whether RefCell::try_borrow_unguarded is broadly useful enough outside Servo to support in std when it could suggest you don't want RefCell to begin with. Maybe that's a question for stabilization though. I'd be interested to see what valid usecases pop up.

Just because only Servo uses it doesn't mean it shouldn't be in libstd. You say "coincidentally looks a lot like RefCell", but it would be exactly like RefCell, with exactly one more method: try_borrow_unguarded.

@emilio

This comment has been minimized.

Copy link
Contributor

commented Mar 20, 2019

Also, fwiw, in the past bindgen (before I rewrote it) kept a forked refcell just to be able to access the borrow state.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

commented Mar 20, 2019

Can you say more about what bindgen did with the borrow sate? Could it have used try_borrow instead, if that was available at the time? Or was it something more complex?

@emilio

This comment has been minimized.

Copy link
Contributor

commented Mar 20, 2019

This is the commit that removed the RefCell: rust-lang/rust-bindgen@cfdf15f

Looking at the BorrowState usage in the deleted files (rust-lang/rust-bindgen@cfdf15f#diff-2c09afcdc3c420ab0678ba9b5e83959cL421) looks like it wanted try_borrow_mut.

@rfcbot

This comment has been minimized.

Copy link

commented Mar 21, 2019

🔔 This is now entering its final comment period, as per the review above. 🔔

@diwic

This comment has been minimized.

Copy link
Contributor

commented Mar 22, 2019

@nox

  • DOM code mutably borrows something in the DOM and calls back into JS.

I'm not sure what that "something" is, but it sounds like a recipe for disaster in itself, because that JS code can in turn call into DOM trying to change the very same thing, causing a RefCell panic?

@nox

This comment has been minimized.

Copy link
Contributor Author

commented Mar 22, 2019

I'm not sure what that "something" is, but it sounds like a recipe for disaster in itself, because that JS code can in turn call into DOM trying to change the very same thing, causing a RefCell panic?

Something here is indeed any DOM call, and that is indeed a bug if we call JS while a RefCell from within the DOM is mutably borrowed, because that's a case that will eventually cause a panic as you describe. This PR is about being able to panic from layout too instead of causing straight UB if the JS code calls a DOM method that calls layout.

@nox

This comment has been minimized.

Copy link
Contributor Author

commented Mar 22, 2019

Err, misread your comment, "something" here is anything that needs to be mutated sometimes in DOM, be it element's attributes, ids, urls, whatever. Indeed, if we keep that mutably borrowed while we reenter into JS code, that will ultimately lead to a panic. This is by design and not going to change any time soon.

@diwic

This comment has been minimized.

Copy link
Contributor

commented Mar 22, 2019

@nox

Okay, so when Layout threads call try_borrow_unguarded, they would typically unwrap the result, causing a panic rather than UB, since mutably borrowing something and then calling into JS would be a bug anyway?

Fwiw, if my two cents are worth anything, it seems to me like what you want to do (putting RefCell into some parallel-sync-mode when script is not running) is so niche that maybe you need something else than RefCell to suit your use case. (Maybe something where the borrow flag can be in "Sync" mode as well as "Reading" and "Writing" modes? Not sure.) And if you like to implement this on top of RefCell like you do today with DOMRefCell then borrow_state is at least a little less niche than try_borrow_unguarded.

@nox

This comment has been minimized.

Copy link
Contributor Author

commented Mar 22, 2019

I'm 100% fine with your two cents, and I would be ok with exposing borrow_state directly, I changed it to try_borrow_unguarded because other people said borrow_state is too vague and too general-without-a-purpose, if I may say.

@rfcbot

This comment has been minimized.

Copy link

commented Mar 31, 2019

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

@KodrAus

This comment has been minimized.

Copy link
Contributor

commented Apr 11, 2019

Thanks for your patience @nox! I'll re-open the tracking issue and push this one through.

@bors r+

@bors

This comment has been minimized.

Copy link
Contributor

commented Apr 11, 2019

📌 Commit 38811a1 has been approved by KodrAus

@bors

This comment has been minimized.

Copy link
Contributor

commented Apr 11, 2019

⌛️ Testing commit 38811a1 with merge 8509127...

bors added a commit that referenced this pull request Apr 11, 2019

Auto merge of #59211 - nox:refcell-borrow-state, r=KodrAus
Introduce RefCell::try_borrow_unguarded

*Come sit next to the fireplace with me, this is going to be a long story.*

So, you may already be aware that Servo has weird design constraints that forces us developers working on it to do weird things. The thing that interests us today is that we do layout on a separate thread with its own thread pool to do some things in parallel, whereas the data it uses comes from the script thread, which implements the entire DOM and related pieces, with `!Sync` data types such as `RefCell<T>`.

The invariant we maintain is that script does not do anything ever with the DOM data as long as layout is doing its job. That's all nice and all, but one thing we don't ensure is that we don't actually know if script was currently mutably borrowing some `RefCell<T>` prior to starting layout, which may lead to aliasing mutable memory and obviously undefined behaviour.

This PR reinstates `RefCell::borrow_state` so that [this method](https://github.com/servo/servo/blob/master/components/script/dom/bindings/cell.rs#L23-L30) can make use of it and return `None` if the cell was mutably borrowed.

Cc @SimonSapin
@bors

This comment has been minimized.

Copy link
Contributor

commented Apr 11, 2019

☀️ Test successful - checks-travis, status-appveyor
Approved by: KodrAus
Pushing 8509127 to master...

@bors bors added the merged-by-bors label Apr 11, 2019

@bors bors merged commit 38811a1 into rust-lang:master Apr 11, 2019

1 check passed

homu Test successful
Details
@nox

This comment has been minimized.

Copy link
Contributor Author

commented Apr 13, 2019

Just wanted to say that try_borrow_unguarded is now used in Servo and, believe it or not, all tests passed without a problem, which means that the soundness issue this method prevents is not currently present in our codebase, woohoo.

@nox nox deleted the nox:refcell-borrow-state branch Apr 13, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.