Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add std::cell::Ref::map and friends #25747

Merged
merged 2 commits into from May 29, 2015

Conversation

Projects
None yet
9 participants
@SimonSapin
Copy link
Contributor

SimonSapin commented May 24, 2015

For slightly complex data structures like rustc_serialize::json::Json, it is often convenient to have helper methods like Json::as_string(&self) -> Option<&str> that return a borrow of some component of &self.

However, when RefCells are involved, keeping a Ref around is required to hold a borrow to the insides of a RefCell. But Ref so far only references the entirety of the contents of a RefCell, not a component. But there is no reason it couldn’t: Ref internally contains just a data reference and a borrow count reference. The two can be dissociated.

This adds a map_ref function that creates a new Ref for some other data, but borrowing the same RefCell as an existing Ref.

Example:

struct RefCellJson(RefCell<Json>);

impl RefCellJson {
    fn as_string(&self) -> Option<Ref<str>> {
        map_ref(self.borrow(), |j| j.as_string())
    }
}

r? @alexcrichton

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented May 24, 2015

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way Github handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see CONTRIBUTING.md for more information.

@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

SimonSapin commented May 24, 2015

https://github.com/rust-lang/rfcs#when-you-need-to-follow-this-process lists "Additions to std", but maybe it should be "#[stable] additions to std"? Anyway let me know if I should write an RFC.

Unresolved questions:

  • Is there a better name for this?
  • Should it be a method instead? It would shadow T methods that could otherwise be used through Deref.

@SimonSapin SimonSapin force-pushed the SimonSapin:map_ref branch 4 times, most recently from 62e8200 to 2b49fc6 May 24, 2015

@bluss

This comment has been minimized.

Copy link
Contributor

bluss commented May 24, 2015

I've added and experimented with this once. I scrapped the idea for my use case, because this makes it easy to return Ref, at which point you push the responsibility to not do a borrowing error onto the user of your API. I didn't like that, but if you have a use case for it, I suspect this is so useful we should have it available for when you need it.

@bluss

This comment has been minimized.

Copy link
Contributor

bluss commented May 24, 2015

Wait, why don't you use a closure here?

@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

SimonSapin commented May 24, 2015

What borrowing error do you mean?

This design of taking &U doesn't work. The usage I want of iit doesn't borrow-check. Trying something with a closure.

@bluss

This comment has been minimized.

Copy link
Contributor

bluss commented May 24, 2015

By borrowing error I mean a panic in borrow() or (more likely) borrow_mut.

@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

SimonSapin commented May 24, 2015

Updated

@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

SimonSapin commented May 25, 2015

Sigh. I realized the PR as-is introduces memory unsafety: the closure can keep around the pointer it’s given longer than the lifetime of the Ref. This test case prints false then true, meaning we’re accessing Foo after its destructor has run.

#![feature(core)]

use std::cell::{RefCell, Ref, map_ref};

struct Foo {
    destroyed: bool
}

impl Drop for Foo {
    fn drop(&mut self) {
        self.destroyed = true;
    }
}

fn main() {
    let a = RefCell::new(Box::new(Foo { destroyed: false }));
    let mut b = None;
    let _: Option<Ref<()>> = map_ref(a.borrow(), |s| {
        b = Some(&**s);
        None
    });
    println!("{}", b.unwrap().destroyed);
    *a.borrow_mut() = Box::new(Foo { destroyed: false });
    println!("{}", b.unwrap().destroyed);  // Use after free
}

I think making map_ref accept a more restrictive Fn closure instead of FnOnce would patch this particular hole, but is it enough?

@bluss

This comment has been minimized.

Copy link
Contributor

bluss commented May 25, 2015

It looks like the closure idea was broken then. Sorry about that, my bad.

@huonw

This comment has been minimized.

Copy link
Member

huonw commented May 25, 2015

cc #19220

I think making map_ref accept a more restrictive Fn closure instead of FnOnce would patch this particular hole, but is it enough?

I suspect not; couldn't one make b a Cell<Option<...>> instead?

@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

SimonSapin commented May 25, 2015

Right. Here is an alternative idea.

@jdm

This comment has been minimized.

Copy link
Contributor

jdm commented May 25, 2015

There's prior art that should definitely be considered at #19220.

@bluss

This comment has been minimized.

Copy link
Contributor

bluss commented May 25, 2015

I believe this version is sound, it looks good. Why use an optional return value though?

@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

SimonSapin commented May 25, 2015

My use case it returning a new Ref to some component that may or may not be there. But checking whether it’s there is the same amount of work as accessing it. If map_ref didn’t involve options, I’d have to do this work twice.

Example in Kuchiki:

pub struct ElementData {
    pub name: QualName,
    pub attributes: RefCell<HashMap<QualName, String>>,
}

impl ElementData {
    fn get_attribute(&self, name: Atom) -> Option<Ref<str>> {
        // With `map_ref` as proposed:
        map_ref(self.attributes.borrow(), |attrs| {
            attrs.get(&QualifiedName { ns: ns!(""), local: name }).map(|s| &**s)
        })

        // With a simplified `map_ref`, two HashMap lookups:
        let borrow = self.attributes.borrow();
        let key = QualifiedName { ns: ns!(""), local: name };
        if borrow.contains_key(&key) {
            Some(map_ref(borrow, |attrs| &*attrs[&key]))
        } else {
            None
        }
    }
}
@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

SimonSapin commented May 25, 2015

Unlike #19220, this doesn’t change the semantic of existing items (it just adds a new one), so I don’t think it has the same issues. The example program at #19220 (comment) correctly fails to build with a lifetime error (after updating it for language changes).

@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

SimonSapin commented May 25, 2015

Maybe my get_attribute method could return Ref<Option<_>> instead of Option<Ref<_>>, but that seems more awkward to deal with.

@alexcrichton alexcrichton added the T-libs label May 26, 2015

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented May 26, 2015

I, too, have wanted this kind of functionality from time to time before, and I'm totally fine seeing it prototyped! This is a minor enough API that I don't believe it will require an RFC, but I'll cc @rust-lang/libs to see if other have opinons as well :)

I have the same question as @bluss as well in that from an API perspective, could you elaborate on why the closure returns Option<&U>? I'd expect it to return &U.

Is there a better name for this?

Naming-wise this may also wish to consider RefMut which may also want the capability to map itself to another reference.

Should it be a method instead? It would shadow T methods that could otherwise be used through Deref.

We've been quite wary of adding methods on types that implement Deref, so I think for now it'd want to stick around as a global function. I think when considering the name for how to map a RefMut the names map_ref and map_ref_mut do mirror each other well. Another option could possibly be an associated function Ref::map and RefMut::map, although I'm not sure if there's a way to force the compiler to not think that they're available as methods as well.

Apart from the Option return value, I agree with @bluss that this appears sound to me.

@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

SimonSapin commented May 26, 2015

Regarding Option<&U>, my use case is that I want to return Option<Ref<_>>, an optional reference to a component that may or may not be there. Another example:

pub enum NodeData {
    Element(ElementData),
    Text(String),
    Comment(String),
    Doctype(Doctype),
    Document(DocumentData),
}

pub struct Node {
    // ...
    pub data: RefCell<NodeData>,
}

impl Node {
    fn as_element(&self) -> Option<Ref<ElementData>> {
        map_ref(self.data.borrow(), |data| match *data {
            Element(ref element) => Some(element),
            _ => None
        })
    }
}

If map_ref was simplified to

pub fn map_ref<'b, T: ?Sized, U: ?Sized, F>(orig: Ref<'b, T>, f: F) -> Ref<'b, U>
where F: FnOnce(&T) -> &U

… this would be more tricky to implement and require matching twice:

impl Node {
    fn as_element(&self) -> Option<Ref<ElementData>> {
        let borrow = self.data.borrow();
        match *borrow {
            Element(_) => map_ref(borrow, |data| match *data {
                Element(ref element) => element,
                _ => unreachable!()
            }),
            _ => None
        }
    }
}

I mentioned before that maybe I could instead return Ref<Option<_>> but that doesn’t work: There is no existing Option I can return a &Option reference to, from the closure.

@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

SimonSapin commented May 26, 2015

Added map_ref_mut.

How can I run the doctests?

@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

SimonSapin commented May 26, 2015

I’ve found out (thanks eddyb!) that I can run the doctests with make check-stage1-doc-crate-core TESTNAME=map_ref NO_REBUILD=1. But they’re failing and I have no idea why :( The output does not include the build error or the panic! message that make the test fail, and I didn’t manage to fix it by guessing and making changes blindly.

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented May 26, 2015

Regarding Option<&U>, my use case is that I want to return Option<Ref<_>>, an optional reference to a component that may or may not be there.

I'm worried about the compositionality of this API, however. For example if I have a Ref<Result<T, E>> I may want to turn it into a Result<Ref<U>, &E> (or something like that). Singling out Option seems like it's addressing a problem, but perhaps tackling it from the wrong angle. For example Ref<Option<T>> is very similar to &Option<T> on which we use the as_ref method to convert to Option<&T> (or in this case Option<Ref<T>>).

I would personally rather see this as &T -> &U (aka as zero-cost as possible) and explore an ergonomic transformation of Ref<Option<T>> to Option<Ref<T>> later on.


Also, it looks like associated functions can indeed be defined that aren't methods:

struct A;
impl A {
    fn foo(a: &A) {}
}

let a = A;
a.foo(); // error
A::foo(&a); // ok

Perhaps static map functions could be used? You've already imported Ref and RefMut most likely if you're returning them, so it avoids the import of std::cell as well.

@bluss

This comment has been minimized.

Copy link
Contributor

bluss commented May 26, 2015

Using static functions sounds great unless there is any plan to expand UFCS and make them resolve with method syntax.

@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

SimonSapin commented May 26, 2015

I would personally rather see this as &T -> &U (aka as zero-cost as possible) and explore an ergonomic transformation of Ref<Option> to Option<Ref> later on.

I agree that using a &T -> Option<&U> closure as proposed is not a great solution, but Ref<Option<T>> doesn’t actually work:

#![feature(core)]
use std::cell::*;

// So that I don’t have to re-bootrap...
fn simplified_map_ref<'b, T, U, F>(orig: Ref<'b, T>, f: F) -> Ref<'b, U>
where F: FnOnce(&T) -> &U {
    map_ref(orig, move |v| Some(f(v))).unwrap()
}

fn main() {
    let x: RefCell<Result<u32, ()>> = RefCell::new(Ok(5));
    simplified_map_ref(x.borrow(), |r| &r.ok());
}
a.rs:11:41: 11:47 error: borrowed value does not live long enough
a.rs:11     simplified_map_ref(x.borrow(), |r| &r.ok());
                                                ^~~~~~
note: in expansion of closure expansion
a.rs:11:36: 11:47 note: expansion site
a.rs:11:40: 11:47 note: reference must be valid for the anonymous lifetime #1 defined on the block at 11:39...
a.rs:11     simplified_map_ref(x.borrow(), |r| &r.ok());
                                               ^~~~~~~
a.rs:11:40: 11:47 note: ...but borrowed value is only valid for the block at 11:39
a.rs:11     simplified_map_ref(x.borrow(), |r| &r.ok());
                                               ^~~~~~~
error: aborting due to previous error
@sfackler

This comment has been minimized.

Copy link
Member

sfackler commented May 27, 2015

I like the idea of using static methods on Arc etc instead of free functions in the arc module. We should make sure that future UFCS work won't make those methods callable with the dot syntax though.

@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

SimonSapin commented May 27, 2015

We should make sure that future UFCS work won't make those methods callable with the dot syntax though.

If we decide to do this I don’t see an obstacle to this: we can tell such functions apart from methods by them not using self.

@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

SimonSapin commented May 27, 2015

As discussed on IRC with @alexcrichton, I modified the PR to have both map functions that take &T -> &U closures, and filter_map that take &T -> Option<&U>. The naming is similar to Iterator methods.

#[unstable(feature = "core", reason = "recently added")]
#[inline]
pub fn map<U: ?Sized, F>(orig: RefMut<'b, T>, f: F) -> RefMut<'b, U>
where F: FnOnce(&mut T) -> &mut U {

This comment has been minimized.

@alexcrichton

alexcrichton May 27, 2015

Member

Stylistically we tend to format functions like this as:

pub fn foo()
    where ...
{
    // ...
}

This comment has been minimized.

@SimonSapin

SimonSapin May 27, 2015

Author Contributor

Done.

@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

SimonSapin commented May 27, 2015

Here is an attempt at generalizing filter_map:

impl<'b: 'c, 'c, T: ?Sized> Ref<'b, T> {
    pub fn generalized_map<U: ?Sized, F, R>(orig: Ref<'b, T>, f: F) -> R
    where F: FnOnce(&T, &FnOnce(&'c U) -> Ref<'c, U>) -> R {
        f(orig._value, &move |new| Ref {
            _value: new,
            _borrow: orig._borrow,
       })
    }
}

But using it doesn’t borrow-check:

    let x: RefCell<Result<u32, ()>> = RefCell::new(Ok(5));
    let b: Option<Ref<u32>> = Ref::generalized_map(x.borrow(), |r, new_ref| match *r {
        Ok(ref value) => Some(new_ref(value)),
        Err(_) => None,
    });
    assert_eq!(*b.unwrap(), 5);
}

I’m not sure if I’m doing something wrong or if this is not possible to express in current Rust. Regardless, a function that takes a closure that takes another closure is quite convoluted, not a great API.

/// This is an associated function that needs to be used as `Ref::clone(...)`.
/// A `Clone` implementation or a method would interfere with the widespread
/// use of `r.borrow().clone()` to clone the contents of a `RefCell`.
#[unstable(feature = "core",

This comment has been minimized.

@alexcrichton

alexcrichton May 27, 2015

Member

Could these features switch to something other than "core"? Something like cell_extras would be fine.

This comment has been minimized.

@SimonSapin

SimonSapin May 27, 2015

Author Contributor

Done.

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented May 27, 2015

This API looks good to me, and I also believe the implementation is sound, so I'm all good with this!

I'd like to hear more opinions about Type::foo as opposed to std::type::foo as well as how fancy we want to get with the extra methods on Ref{,Mut}.

@aturon

This comment has been minimized.

Copy link
Member

aturon commented May 28, 2015

+1 to the basic idea here.

@alexcrichton

I'd like to hear more opinions about Type::foo as opposed to std::type::foo

This one is tricky. In the limit, with full inherent associated items, many "single type" modules could be flattened to just the type, and this is one step further in that direction. The main advantage is that you're usually importing the type already, so this saves you an additional import of the module/function. To be honest, I think it's inevitable that APIs will drift in this direction, and don't see much reason to fight it in std.

(In the long run, this means that true free functions are predominantly going to be code that is not clearly associated with any particular type.)

as well as how fancy we want to get with the extra methods on Ref{,Mut}.

I think the level of fanciness in the current PR is appropriate.

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented May 28, 2015

Ok, thanks again for the PR @SimonSapin! r=me with a squash

@SimonSapin SimonSapin changed the title Add std::cell::map_ref Add std::cell::Ref::map and friends May 28, 2015

Move std::cell::clone_ref to a clone associated function on std::cell…
…::Ref

... and generalize the bounds on the value type.

@SimonSapin SimonSapin force-pushed the SimonSapin:map_ref branch from 04f106f to 64e72b0 May 28, 2015

SimonSapin added a commit to SimonSapin/rust that referenced this pull request May 28, 2015

@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

SimonSapin commented May 28, 2015

Squashed into two commits.

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented May 28, 2015

@bors

This comment has been minimized.

Copy link
Contributor

bors commented May 28, 2015

⌛️ Testing commit 64e72b0 with merge b597347...

bors added a commit that referenced this pull request May 28, 2015

Auto merge of #25747 - SimonSapin:map_ref, r=alexcrichton
For slightly complex data structures like `rustc_serialize::json::Json`, it is often convenient to have helper methods like `Json::as_string(&self) -> Option<&str>`  that return a borrow of some component of `&self`.

However, when `RefCell`s are involved, keeping a `Ref` around is required to hold a borrow to the insides of a `RefCell`. But `Ref` so far only references the entirety of the contents of a `RefCell`, not a component. But there is no reason it couldn’t: `Ref` internally contains just a data reference and a borrow count reference. The two can be dissociated.

This adds a `map_ref` function that creates a new `Ref` for some other data, but borrowing the same `RefCell` as an existing `Ref`.

Example:

```rust
struct RefCellJson(RefCell<Json>);

impl RefCellJson {
    fn as_string(&self) -> Option<Ref<str>> {
        map_ref(self.borrow(), |j| j.as_string())
    }
}
```

r? @alexcrichton
@bors

This comment has been minimized.

Copy link
Contributor

bors commented May 29, 2015

💔 Test failed - auto-linux-64-nopt-t

@SimonSapin SimonSapin force-pushed the SimonSapin:map_ref branch from 64e72b0 to d0afa6e May 29, 2015

@SimonSapin

This comment has been minimized.

Copy link
Contributor Author

SimonSapin commented May 29, 2015

Oh noes.

Should be fixed, with this diff before squashing:

diff --git a/src/libcoretest/lib.rs b/src/libcoretest/lib.rs
index 78c7215..3d14b3f 100644
--- a/src/libcoretest/lib.rs
+++ b/src/libcoretest/lib.rs
@@ -24,6 +24,7 @@
 #![feature(step_by)]
 #![feature(slice_patterns)]
 #![feature(float_from_str_radix)]
+#![feature(cell_extras)]

 extern crate core;
 extern crate test;
@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented May 29, 2015

@bors

This comment has been minimized.

Copy link
Contributor

bors commented May 29, 2015

⌛️ Testing commit d0afa6e with merge 25fc917...

bors added a commit that referenced this pull request May 29, 2015

Auto merge of #25747 - SimonSapin:map_ref, r=alexcrichton
For slightly complex data structures like `rustc_serialize::json::Json`, it is often convenient to have helper methods like `Json::as_string(&self) -> Option<&str>`  that return a borrow of some component of `&self`.

However, when `RefCell`s are involved, keeping a `Ref` around is required to hold a borrow to the insides of a `RefCell`. But `Ref` so far only references the entirety of the contents of a `RefCell`, not a component. But there is no reason it couldn’t: `Ref` internally contains just a data reference and a borrow count reference. The two can be dissociated.

This adds a `map_ref` function that creates a new `Ref` for some other data, but borrowing the same `RefCell` as an existing `Ref`.

Example:

```rust
struct RefCellJson(RefCell<Json>);

impl RefCellJson {
    fn as_string(&self) -> Option<Ref<str>> {
        map_ref(self.borrow(), |j| j.as_string())
    }
}
```

r? @alexcrichton

@bors bors merged commit d0afa6e into rust-lang:master May 29, 2015

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details

@SimonSapin SimonSapin deleted the SimonSapin:map_ref branch May 29, 2015

XMPPwocky added a commit to XMPPwocky/rust that referenced this pull request May 29, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.