Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend Cell to non-Copy types #1651

Merged
merged 5 commits into from Jan 23, 2017
Merged

Extend Cell to non-Copy types #1651

merged 5 commits into from Jan 23, 2017

Conversation

Amanieu
Copy link
Member

@Amanieu Amanieu commented Jun 15, 2016

This RFC extends the Cell type to work with non-Copy types.

Rendered

@alexcrichton
Copy link
Member

Oddly enough @SimonSapin, @nox, and I were just talking about this, and we ended up concluding that we probably wouldn't want to extend the Cell type itself primarily for consistency with the other trait impls on Cell. For example the PartialEq and other assorted traits require T: Copy whereas this means that not all Cell types would be equatable/comparable.

I'd also personally err on the side of being conservative and drop take in favor of replace, but other than that this all seems great to me!

# Motivation
[motivation]: #motivation

It allows safe inner-mutability of non-`Copy` types without the overhead of `RefCell`'s reference counting.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which types? What is a concrete use case for this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@durka
Copy link
Contributor

durka commented Jun 15, 2016

@alexcrichton

For example the PartialEq and other assorted traits require T: Copy

Can you explain this remark? As far as I can tell it's not true for PartialEq, so I want to know what assorted traits you refer to :)

@alexcrichton
Copy link
Member

@durka ah yeah sure! So right now the standard library has:

impl<T: PartialEq + Copy> PartialEq for Cell<T> { ... }

If, however, Cell can accept an arbitrary T it'll be expected for that to look like:

impl<T: PartialEq> PartialEq for Cell<T> { ... }

Unfortunately, though, we can't implement that. This means that we'd probably have to have PartialEq only defined for Copy types, which seems inconsistent enough that I'd prefer to have a separate cell type.

@durka
Copy link
Contributor

durka commented Jun 15, 2016

Oh I see. I thought you were saying that it was trait PartialEq<Rhs>: Copy. I agree it'd be kinda weird if not all Cells were comparable.

On Wed, Jun 15, 2016 at 1:11 PM, Alex Crichton notifications@github.com
wrote:

@durka https://github.com/durka ah yeah sure! So right now the standard
library has:

impl<T: PartialEq + Copy> PartialEq for Cell { ... }

If, however, Cell can accept an arbitrary T it'll be expected for that to
look like:

impl<T: PartialEq> PartialEq for Cell { ... }

Unfortunately, though, we can't implement that. This means that we'd
probably have to have PartialEq only defined for Copy types, which seems
inconsistent enough that I'd prefer to have a separate cell type.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#1651 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAC3nyQm7s0NykjupPLUmUqZcCn7Gtgsks5qMDIngaJpZM4I2Wj9
.

@nox
Copy link
Contributor

nox commented Jun 15, 2016

I actually formalised MoveCell in Mitochondria. You can find another Cell idea in it (OnceCell) for which I should probably write a RFC myself.

I have various use cases for both in Servo.

https://github.com/nox/mitochondria/blob/master/src/move.rs#L3-L46

```rust
impl<T> Cell<T> {
fn set(&mut self, val: T);
fn replace(&mut self, val: T) -> T;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should all be &self right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops

@glaebhoerl
Copy link
Contributor

glaebhoerl commented Jun 15, 2016

Along with

fn replace(&self, other: T) -> T

we could also have

fn swap(&self, other: &mut T)
fn swap_with(&self, other: &Cell<T>) // TODO name

(the first two are symmetric with std::mem)

cc #1106

@strega-nil
Copy link

strega-nil commented Jun 15, 2016

@alexcrichton You should be able to implement PartialEq, at least, unless I'm not thinking right. It takes an &T parameter.

fn eq (&self, other: &Self) {
  let self_ptr = self.as_unsafe_cell().get();
  let other_ptr = other.as_unsafe_cell().get();
  unsafe { &*self_ptr == &*other_ptr }
}

Ah, wait, you could have some sort of way to access &self as an &Cell and write to it, perhaps through some sort of thread local static, which would then violate aliasing.

@nox
Copy link
Contributor

nox commented Jun 16, 2016

That is unsound because you call PartialEq from inside the UnsafeCell.

}

impl<T: Copy> Cell<T> {
fn get(&self);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> T

@SimonSapin
Copy link
Contributor

I think this needs to be in the detailed design:

Cell::set is currently:

    pub fn set(&self, value: T) {
        unsafe {
            *self.value.get() = value;
        }
    }

But for T: Drop this can be unsound:

    struct Evil(Box<u32>, Rc<MoveCell<Option<Evil>>>);
    impl Drop for Evil {
        fn drop(&mut self) {
            mem::drop(self.1.take());  // Mess with the "other" node, which might be `self`.
            self.0.clone();  // use after free!
        }
    }
    let a = Rc::new(MoveCell::new(None));
    a.replace(Some(Evil(Box::new(5), a.clone())));  // Make a reference cycle.
    a.set(None);  // Trigger Evil::drop while in the cell

To avoid this, the method needs to be rewritten so that Drop is not called "from inside" the cell:

    pub fn set(&self, value: T) {
        self.value.replace(value);  // semi-colon to drop the return value of replace
    }

@aturon aturon added T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC. labels Jun 21, 2016
@aturon aturon self-assigned this Jun 23, 2016
@nikomatsakis
Copy link
Contributor

@ubsan indeed, that is not sound, as the whole point of Cell is that you never get a reference to the interior. (Hence the name Cell vs RefCell.)

@nikomatsakis
Copy link
Contributor

I'm in favor of having something like Cell where I can swap the contents -- but I guess having a separate type (to resolve the PartialEq discrepancy) might also be OK (e.g., MoveCell) due to the PartialEq discrepancy. Regardless it galls me to use RefCell in this case and pay the price for flags and unnecessary runtime checks.

(I'd also like RefCell to have a take option... which in many cases might obviate the need for me to use Cell, since I suspect I would often write Cell<Option<Vec<...>>> so I can swap with None, and in that case I'd probably rather use RefCell<Vec<...>> if it had a take() method -- since it already has the flags.)

@alexcrichton
Copy link
Member

@rfcbot fcp close

My personal preference would be to not enhance Cell with these methods, and if we did anything to go in the direction of #1659.

The second drawback mentioned is a showstopper for me. In general if you're working with a Cell<T> type you might expect common traits like Debug, PartialEq, etc to work by default. When you run into that one Cell<T> that works differently, however, I feel like it's too big of a surprise factor to be worth it for a niche method like this.

I'm going to propose that the libs team closes this RFC as a result of that drawback, but I'm curious to hear others' thoughts!

@rfcbot
Copy link
Collaborator

rfcbot commented Oct 10, 2016

Team member @alexcrichton has proposed to close this. The next step is review by the rest of the tagged teams:

Concerns:

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@nikomatsakis
Copy link
Contributor

Hmm. I am sad, because I think that having a plethora of Cell types is kind of confusing; but the PartialEq impl does seem to sort of box us in here. 👍

@glaebhoerl
Copy link
Contributor

I don't see why the PartialEq thing is such a big deal at all. The PartialEq impl stays the same as it always was, and independently of it, some new capabilities get added.

There's no principle saying that every method/impl of a type should require the same trait bounds. I don't even understand why you would expect that.

@alexcrichton
Copy link
Member

For me at least it's all about the surprise factor of using Cell. It's documented and intended to be primarily used as Copy types, so then it can be very surprising to see it with a non-copy type or where functionality no longer works.

@nikomatsakis
Copy link
Contributor

@glaebhoerl

I don't see why the PartialEq thing is such a big deal at all. The PartialEq impl stays the same as it always was, and independently of it, some new capabilities get added.....There's no principle saying that every method/impl of a type should require the same trait bounds. I don't even understand why you would expect that.

I agree with you on the general principle; but of course one has to evaluate each case on its merits. In this case particular case I feel conflicted. You're making me rethink my opinion a bit. Let me just throw out some various chaotic thoughts.

Combining sharing and mutability is still a weak spot

I feel like in general we need to smooth out our story on sharing + mutability. This begins with education: explaining better when it is legal to mutate (the naming of &mut T strongly suggests to people that it is the only way to mutate, reasonably enough, and I think our messaging on sharing + mutability needs to get more crisp).

But also just looking at the APIs: The cell vs ref-cell split is logical, but not ideal. I think it's not well-explained, for one thing, but also it is often stricter than normal. Many uses of RefCell that I find myself doing in practice would be perfectly safe as a Cell that called clone. For example, RefCell<Rc<T>>, which is known not to invoke any methods of T; similarly I often want ref-cells with more complex types where I do a lot of swapping (as well as things like a RefCell<&mut Vec<T>> where I just push to accumulate, and then later recover mutable access).

Moreover, and most importantly I think, RefCell is really hard to use right. It's all too easy to hold the lock longer than you mean to, and tracking down a "double borrow" is more annoying than I would like (that's something else we could try to address, at least in debug builds).

I'd prefer if we had a widely usable alternative that avoided these pitfalls, so that we could promote it as the preferred way to have sharing and mutability (something like "avoid it if you can, but if not, use Cell").

Note that there is nothing stopping me from prototyping these ideas in a crates.io crate except for laziness. =)

What I would prefer: an expanded Cell

I would prefer if we could expand on Cell so that it can do a wide range of operations on a wide range of types. Ideally we would maintain the invariant that none of these operations will lead to a borrow error or other cell-related panic at runtime, but it might be better to loosen that to being "extremely unlikely" to lead to a borrow error.

Since we want to eliminate cell-related panics, we can't permit open-ended borrows like foo.borrow(). That requires a ref-cell still. But we can permit a bunch of operations:

- `get` -- clones the value (requires `T: Clone`, but see below)
- `set` -- overwrites the value (works for all `T`)
- `swap` -- swaps the value (works for all `T`)
- `take` for `Cell<Option<T>>`, which swaps with `None`
- `push` for `Cell<Vec<T>>`, which just pushes an element
- `pop` for `Cell<Vec<T>>`, which just pops an element

The last few might seem surprising. The idea here is that we know that push() on Vec<T> doesn't invoke any user-defined code, and that it doesn't mutate any cells, so this should be ok. (I think?) We can basically grow the set of operations here over time.

Note that I said get could work any T: Clone. Obviously in the general case this would require a safety flag and could panic -- but it's very unlikely to happen in practice unless you have some wacky clone impls. I think I'd prefer to support it for all T: Clone and use specialization to optimize away the flag, but I could see it also being nice to never have a flag and only support get() for T: Copy or types (notably Rc<T> and Arc<T>) that can be cloned without risking mutating the cell (this would be an unsafe trait).

Why partial eq bothers me

OK, without that context, why do I care about PartialEq? I'm not sure if I should, but I guess it just feels like a surprising case of the API changing (more so than other things). It's because the reason it can't work for arbitrary T is tied to this sort of subtle fact that PartialEq might do random things and bad code could cause problems (but in practice such problems are very unlikely).

Random middle ground

I wonder if it'd be a good first step to just extend Cell to copy types or those that implement an unsafe CellSafeClone trait (notably Rc and Arc). This would basically be things that promise that their clone method cannot affect any cell that contains self.

An irrelevant, whiny aside: why I wish cell did not have PartialEq

In retrospect, my preference would be that Cell<T> does not implement PartialEq under any circumstances. Cell is the marker that converts types from being "a value" into something with their own identity, and that is precisely where the semantics of PartialEq get murky. I'd rather we had no default, so that you can write your own wrappers. If we were to have behavior, I would have chosen pointer identity.

My analogy is to things like Python:

class Counter:
    def __init__(value):
        self.value = 0

x = Counter(0)
y = Counter(0)
x == x // True
x == y // False
x.value += 1
x == x // still True
x == y // still False

I consider this Rust code to be a direct translation, but of course it behaves differently:

let x = Rc::new(Cell::new(0));
let y = Rc::new(Cell::new(0));
x == x // true
x == y // true!
x.set(x.get() + 1);
x == x // still true, of course
x == y // now false, hmm.

This is because == in Python defaults to identity. This is a safer default when you have sharing+mutability, since it means that if x == y is true, it remains true no matter what you do.

OTOH, there are times when you want a value-like semantics. Usually if you are "freezing" a value. But in rust those are much rarer, I think, because you would usually use ownership for such scenarios and not require Cell. So I'd rather that people "opt-in" to that by wrapping the Cell with their own type.

But ... yeah, water under the bridge.

@glaebhoerl
Copy link
Contributor

What I would prefer: an expanded Cell

I feel like there's a conceptual difference here between replace, swap, and swap_with, which work with any type and rely on nothing besides Rust's own semantics for their safety, and the others, which are more ad hoc extensions to support specific operations on specific std library types whose implementations we happen to control. In other words, replace and friends feel to me like basic primitives which have been left out accidentally. Another qualitative difference is that replace & co. are "memcpy-only" operations - which they have in common with Cell's existing operations - while the others are not.

(Note that I'm not arguing against potentially adding the others as well at some point. Just against lumping them in with replace.)


It's because the reason it can't work for arbitrary T is tied to this sort of subtle fact that PartialEq might do random things and bad code could cause problems (but in practice such problems are very unlikely).

I feel like the rationale here is much more straightforward than that:

"Hey how come there isn't a PartialEq on Cell for non-Copy types?"

"Try implementing it and find out."

"Oh, I see. It seems like you can't express it using the API."


In retrospect, my preference would be that Cell does not implement PartialEq under any circumstances. Cell is the marker that converts types from being "a value" into something with their own identity, and that is precisely where the semantics of PartialEq get murky. I'd rather we had no default, so that you can write your own wrappers. If we were to have behavior, I would have chosen pointer identity.

I agree with this completely and the same thought has occurred to me as well. (Another reference point is Haskell where instance Eq (IORef a) compares by address - obviously, because the contents of the reference cannot even be accessed from pure code. And I haven't checked but I would be almost shocked if ML's refs didn't have the same behavior as well.)

@glaebhoerl
Copy link
Contributor

Yep, good point.

@antrik
Copy link

antrik commented Nov 16, 2016

Regarding mental model / documentation ergonomics, I believe this will actually improve the situation: I was recently thinking about how to best explain Cell<> and related types; and I was really struggling with explaining why Cell<> is only for Copy types... Because really I could not think of a fundamental reason.

I'm surprised that some people are so vehemently contesting the usefulness of this. (On the now-closed MoveCell<> RFC even more than here...) To me, the use cases seem pretty much the same as for the "regular" Cell<> type; and questioning the usefulness of Cell<> on move types doesn't feel much different than questioning the usefulness of Cell<> altogether... I believe it's a basic language building block, that is likely to come up in pretty much any sufficiently complex program -- relegating this to some obscure external library, or to another type that is both less efficient and less ergonomic (run-time vs. compile-time checking) in cases where it's not really needed, affects the efficiency and ergonomics of the language as a whole. It kinda breaks Rust's promise that you never have to pay for abstractions you are not using...

The specific use case that brought me here is for Unix file descriptors, which I want to wrap in a non-copy type. (While technically they are just a c_int, semantically they are actually references to kernel data structures -- and need to be treated just like other references regarding aliasing/cloning/dropping, to avoid undue overhead, potential leaking, or use-after-free...) However, sometimes we need to put file descriptors into a Cell<> for interior mutability. (In the specific case of ipc-channel, because of the way the Serde API works.)

This is an interesting example I think, because it demonstrates that the difference between a low-cost abstraction and a zero-cost abstraction can be fundamental: if we could continue using Cell<>, the non-copy wrapper would be a no-brainer, as it enables better employing the type system for more robust and ergonomic code at no cost. If it requires a switch to RefCell<> on the other hand, it suddenly becomes an unfortunate tradeoff between avoiding overhead or improving robustness...

@aturon
Copy link
Member

aturon commented Dec 21, 2016

Apologies for the long delay moving forward on this!

Given the thrust of the overall discussion here, it seems both that there's a pretty decent motivation, and a reasonable mental model.

@Amanieu, can you update the RFC to try to incorporate some of the discussion, in particular around the motivation and mental model issues? (No changes needed to the actual design.)

Meanwhile, I'll kick off the process for heading toward FCP.

@rfcbot fcp merge

@rfcbot
Copy link
Collaborator

rfcbot commented Dec 21, 2016

Team member @aturon has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@pnkfelix
Copy link
Member

@rfcbot reviewed

@nrc
Copy link
Member

nrc commented Jan 6, 2017

ping @BurntSushi, @brson, @sfackler, @withoutboats please review!

@solson
Copy link
Member

solson commented Jan 15, 2017

ping @withoutboats, only review left

@rfcbot
Copy link
Collaborator

rfcbot commented Jan 15, 2017

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Jan 15, 2017
@aturon aturon merged commit 3e925a5 into rust-lang:master Jan 23, 2017
@aturon
Copy link
Member

aturon commented Jan 23, 2017

The FCP period has elapsed without any further discussion, so the RFC has now been merged! Thanks @Amanieu!

Tracking issue

@rfcbot
Copy link
Collaborator

rfcbot commented Jan 25, 2017

The final comment period is now complete.

@RalfJung
Copy link
Member

Not sure if this is the right place to ask, but I have a question concerning this RFC: Why is get_mut still restricted to Copy types? As far as I can see, it would be perfectly fine to allow get_mut for any T. (Actually, I have proven that this is the case. ;)

@Manishearth
Copy link
Member

Yeah, it seems sound to do that, it's probably still restricted because nobody thought of it. Might be worth opening up a quick rfc for this, or perhaps just bringing it up on the tracking issue.

@alexcrichton
Copy link
Member

@RalfJung sounds good to me! (there's probably a few more methods that can be generalized as well)

@alexcrichton
Copy link
Member

alexcrichton commented Feb 13, 2017

Also I don't think such a change would require an RFC any more

@aturon
Copy link
Member

aturon commented Feb 13, 2017

Agreed, a PR is fine.

@RalfJung
Copy link
Member

All right, will prepare one.

@alexcrichton well, the other two are taking a shared borrow of the Cell, so they are way more complicated to talk about. On the other hand, they are returning &UnsafeCell<T> and *mut T respectively, so they don't make any promises anyway -- they are trivially safe to call on non-Copy types, but what you can do with the resulting pointer is a whole different story.

@SimonSapin
Copy link
Contributor

I think that only get really requires Copy. Everything else can be generalized. (Though I don’t care so much for as_unsafe_cell since it’s deprecated.)

@F001
Copy link
Contributor

F001 commented Jan 15, 2018

I did a comparison of the methods of Cell and RefCell, and the current situation confuses me. The differences are:

  1. Cell::as_ptr requires Sized, RefCell::as_ptr does not.
  2. Cell implements Clone when T: Copy, RefCell implements Clone when T: Clone.
  3. RefCell has these methods borrow try_borrow borrow_mut try_borrow_mut.
  4. Cell has these methods: Cell::get when T: Copy, Cell::take when T: Default.

The main questions on top of my mind are:

  1. is it still necessary to split std::cell to two flavors ?
  2. why not merge these two types into one ?

It is totally fine to me if Cell also has borrow_* methods, and RefCell has a take method if T: Default, and of course RefCell has a get method if T: Copy.

As niko stated above, the mental model of "Cell supports moving bits around, but never gives a reference to the inside" seems pretty coherent, before this RFC is merged. But after this RFC is merged, the distinction between these two types are very vague, and the motivation of this separation seems not strong enough.

To make cell types as easy as it can be, I'm proposing to merge the methods together and make these two types exactly identical.

We can update the mental model to: if you want to move bits around, use get replace take methods, if you want to reference to the inside, use borrow_* methods.

@Manishearth
Copy link
Member

@F001 yes, it is necessary to have two, the inner implementation differs and the guarantees provided are different.

@SimonSapin
Copy link
Contributor

@F001 To expand a bit on Manish’s explanation: an important invariant of safe Rust is that mutation cannot happen while there multiple ways (aliasing) to access something. This leads to the separation of &T and &mut T. Cells (of various kinds) allow mutation through &T, so they have to be careful in their safe public API to maintain the invariant. Cell and RefCell are two ways to achieve that, with different tradeoff.

  • Cell is less flexible: it never provides to arbitrary code (which might include a Clone for T impl) references to the inside of the cell. You can only copy or swap entire T values. See https://github.com/kuchiki-rs/kuchiki/blob/v0.6.0/src/move_cell.rs#L99-L127 for an example of why Clone for Cell<T> where T: Clone would be bad.
  • RefCell can provide inner references, but need to count them to make sure that a RefMut is the only reference as long as it exists. So it has space overhead (an extra usize) to store that counter, and time overhead (for these counter increments and decrements).

This tradeoff is why they both exist. And the different constraints explain the different API. Except for as_ptr, I think. Maybe this is just an oversight and Cell::as_ptr can be extended to T: ?Sized. Wanna file a PR for that one?

@F001
Copy link
Contributor

F001 commented Jan 15, 2018

@SimonSapin I see. set take should not coexist with borrow_*. If the refcell is borrowed, and take is called in this period, it can only panic. The Clone for Cell<T> where T: Clone is much trickier.

Thank you all for clarification.

@SimonSapin
Copy link
Contributor

set and take could be added to RefCell, but they’d not be super useful since they’re equivalent to *refcell.borrow_mut() = new_value and refcell.borrow_mut().take() respectively. borrow_mut already panics when the refcell is already borrowed.

@Centril Centril added A-cell Proposals relating to interior mutability. A-machine Proposals relating to Rust's abstract machine. labels Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cell Proposals relating to interior mutability. A-machine Proposals relating to Rust's abstract machine. final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet