Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make set_iterate_upper_bound safe #377

Merged
merged 1 commit into from Apr 8, 2020
Merged

Conversation

wqfish
Copy link
Contributor

@wqfish wqfish commented Jan 17, 2020

This function is probably the only unsafe public API of this library. The unsafe version is not super easy to use since it requires the user to carefully make sure the buffer is alive. For example, the following
usage would be incorrect:

let mut opts = ReadOptions::default();
let upper_bound = b"k4".to_vec();
unsafe {
    opts.set_iterate_upper_bound(upper_bound);
}

A relatively easy way to make it safe is to store the buffer inside ReadOptions and store ReadOptions inside the iterator, so the buffer will not be dropped when the iterator is being used.

Fixes #299
Fixes #329

@wqfish
Copy link
Contributor Author

wqfish commented Feb 7, 2020

@aleksuss @vitvakatu thoughts on this one? I'd love to get a few remaining PRs merged so I can start using this library. Also happy to discuss alternative approaches.

@dralves
Copy link

dralves commented Feb 12, 2020

I'd be interested in seeing this change go in as well. Anything I can do to help?

@aleksuss
Copy link
Member

aleksuss commented Feb 12, 2020

@wqfish @dralves sorry for a silence. Just a little busy right now. I remember and will review this PR ASAP 👍🏻

src/db.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@vitvakatu vitvakatu left a comment

LGTM, thank you for your contribution

src/db.rs Outdated
/// When iterate_upper_bound is set, the inner C iterator keeps a pointer to the upper bound
/// inside `_readopts`. Storing this makes sure the upper bound is always alive when the
/// iterator is being used.
_readopts: ReadOptions,
Copy link
Member

@aleksuss aleksuss Feb 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest it looks redundantly a bit.

src/db.rs Outdated
) -> Result<DBRawIterator<'a>, Error> {
unsafe {
Ok(DBRawIterator {
inner: ffi::rocksdb_create_iterator_cf(db.inner, readopts.inner, cf_handle.inner),
_readopts: readopts,
Copy link
Member

@aleksuss aleksuss Feb 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we clone readopts here in favour to leave public API as is ?

Copy link
Contributor Author

@wqfish wqfish Feb 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can clone it if we want to avoid API changes. I'm just curious what the plan is for the public API changes? It seems that you just merged #383 which already changed some of these APIs. Maybe we could make some public API changes in the next release anyway?

Copy link
Contributor

@vitvakatu vitvakatu Feb 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wqfish the problem with current API is that readopts are not clonable, so you're forced to construct them every time you want to use them for a new iterator.
We should try to avoid unnecessary API changes whenever possible, because each API change is an additional work for our users.

Copy link
Contributor Author

@wqfish wqfish Feb 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I see ReadOptions doesn't implement Clone, I don't feel this can be done in a backward compatible way. If we still let readopts own the buffer, we either need to take ownership of readopts like the current PR, or add some lifetime restriction to ensure that readopts outlives the iterator. Both would break some existing code and it seems that the changes caused by taking ownership of readopts might actually be minimal compared to other options.

Also note that the APIs for Snapshot currently take readopts by value, so this could be a bit consistency improvements as well.

Thoughts?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened with this discussion?
At least making ReadOptions clonnable would be very desirable - right now this API got broken if one just has ReadOptions passed by reference / stored once in the wrapper struct.

@thefallentree
Copy link

thefallentree commented Mar 2, 2020

hello, can we merge this?

@wqfish
Copy link
Contributor Author

wqfish commented Apr 2, 2020

@aleksuss how can we move forward on this one?

@aleksuss
Copy link
Member

aleksuss commented Apr 3, 2020

@wqfish first of all we need to resolve the merge conflict.

This function is probably the only unsafe public API of this library.
The unsafe version is not super easy to use since it requires the user
to carefully make sure the buffer is alive. For example, the following
usage would be incorrect:

```
let mut opts = ReadOptions::default();
let upper_bound = b"k4".to_vec();
unsafe {
    opts.set_iterate_upper_bound(upper_bound);
}
```

A relatively easy way to make it safe is to store the buffer inside
`ReadOptions` and store `ReadOptions` inside the iterator, so the buffer
will not be dropped when the iterator is being used.

Fixes rust-rocksdb#299
Fixes rust-rocksdb#329
@wqfish
Copy link
Contributor Author

wqfish commented Apr 3, 2020

@aleksuss I rebased :)

@@ -130,6 +130,7 @@ pub struct BlockBasedOptions {

pub struct ReadOptions {
pub(crate) inner: *mut ffi::rocksdb_readoptions_t,
iterate_upper_bound: Option<Vec<u8>>,
Copy link
Contributor

@ordian ordian Apr 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest using Option<Box<[u8]>> instead. First, it occupies less space (16 bytes vs 24 bytes) and second, it guarantees the pointer will remain the same (vec can reallocate, it doesn't in this case, but still it's not guaranteed at compile-time).

/// When iterate_upper_bound is set, the inner C iterator keeps a pointer to the upper bound
/// inside `_readopts`. Storing this makes sure the upper bound is always alive when the
/// iterator is being used.
_readopts: ReadOptions,
Copy link
Contributor

@ordian ordian Apr 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why ReadOptions needs to outlive the iterator, it seems to me that only upper_bound needs that.
You can see e.g. that this test passes

#[test]
fn iterator_test_upper_bound() {
    let path = "_rust_rocksdb_iteratortest_upper_bound";
    {
        let db = DB::open_default(path).unwrap();
        db.put(b"k1", b"v1").unwrap();
        db.put(b"k2", b"v2").unwrap();
        db.put(b"k3", b"v3").unwrap();

        let mut readopts = ReadOptions::default();
        unsafe { readopts.set_iterate_upper_bound(b"k3"); }

        let iter = db.iterator_opt(IteratorMode::Start, &readopts);

        drop(readopts);
        // ^^^^^^^^^^^^ NOTE THE DROP ABOVE

        let expected: Vec<_> = vec![(b"k1", b"v1"), (b"k2", b"v2")]
            .into_iter()
            .map(|(k, v)| (k.to_vec().into_boxed_slice(), v.to_vec().into_boxed_slice()))
            .collect();
        assert_eq!(expected, iter.collect::<Vec<_>>());
    }
    let opts = Options::default();
    DB::destroy(&opts, path).unwrap();
}

EDIT: simplified the test.

Copy link
Contributor Author

@wqfish wqfish Apr 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that only upper_bound needs that and this test passes. However, the goal of this PR is to make set_iterate_upper_bound safe, so users can use this API without writing unsafe code, and reduce the possibility of user messing up by calling this API incorrectly. And this test above doesn't show how we can achieve that goal without making any changes to the APIs.

Let's say we do not make changes to the existing APIs, so user would still pass a reference to functions like iterator_opt. Can you elaborate how the user code would look like in this approach?

Copy link
Contributor

@ordian ordian Apr 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, I haven't really thought this through. One alternative would be to tie the lifetime of &ReadOptions to the iterator (&'b ReadOptions or &'a ReadOptions), but I'm not sure it's better than passing it by value. The problem with passing it by value though is that a user can't reuse the same ReadOptions as it's not clonable.

Or maybe it's possible to do some trickery with Pin<Box<[u8]>>?

Copy link
Contributor

@ordian ordian Apr 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea is to store the upper_bound as Option<Arc<[u8]>> in both ReadOptions and the iterator. I think this could work.

Copy link
Contributor Author

@wqfish wqfish Apr 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I thought about doing the lifetime stuff, but I kind of feel that it's not worth the effort. Adding lifetime restriction is not going to be a backward compatible change anyway, since the users might have the following code:

struct MyDB {
    inner: rocksdb::DB,
}

impl MyDB {
    fn get_iter(&self) -> DBIterator {
        let readopts = ...;
        DB::iterator_opt(&readopts)
    }
}

Also note that the snapshot APIs take read options by value:

pub fn iterator_opt(&self, mode: IteratorMode, mut readopts: ReadOptions) -> DBIterator<'a> {
So we could make some consistency improvements :) I honestly think passing by value is gonna be a small change on the user side.

I'll give the Arc idea a try and see how the code looks like.

}

pub fn iterator_opt<'a: 'b, 'b>(
&'a self,
mode: IteratorMode,
readopts: &ReadOptions,
readopts: ReadOptions,
Copy link
Contributor

@ordian ordian Apr 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as noted in #377 (comment), this API pessimization is not necessary

@ordian
Copy link
Contributor

ordian commented Apr 8, 2020

@aleksuss my concern about API pessimization shouldn't be a blocker for merging, it's fine as is

@aleksuss aleksuss merged commit 81aa016 into rust-rocksdb:master Apr 8, 2020
1 check passed
@wqfish wqfish deleted the unsafe branch Apr 9, 2020
@ilblackdragon
Copy link

ilblackdragon commented May 25, 2020

@wqfish This broke the usage like this - https://github.com/nearprotocol/nearcore/blob/master/core/store/src/db.rs#L169, when this is wrapped and ReadOptions are created once.

Can we make ReadOptions clonnable at least?
But also consistency is broken between get_cf_opt taking read options by reference and iterator_cf_opt by value.

@aleksuss
Copy link
Member

aleksuss commented May 25, 2020

Can we make ReadOptions clonnable at least?

It's not trivial AFAIK. Now we can create a shallow copy of the ReadOptions only. The reason is because of rocksdb C API doesn't provide a function for creating deep copy of rocksdb_readoptions_t which ReadOptions includes.

@wqfish
Copy link
Contributor Author

wqfish commented May 25, 2020

I can check if it's easy to add a C API to copy the options.

@aleksuss
Copy link
Member

aleksuss commented May 25, 2020

@wqfish FYI @DarkEld3r is investigating it now.

@stanislav-tkach
Copy link
Contributor

stanislav-tkach commented May 25, 2020

@wqfish Originally I was interested in implementing copy for Options, but as far as I can see, all options kinds share the same issue. To fix this a copy constructor should be added and then exported in the C API. I don't know the reasons for not providing such constructors, so I decided to ask about that before submitting the pull request: https://groups.google.com/forum/#!topic/rocksdb/lsNm9PG3wp4

@ilblackdragon
Copy link

ilblackdragon commented May 26, 2020

How about original suggestion @ordian to add lifetime to the iterator to call iterator_cf_opt with reference to options? This would be consistent with got get_cf_opt and other functions are. Also would make 0.13->0.14 migration a bit simpler :)

rleungx pushed a commit to rleungx/rust-rocksdb that referenced this pull request Jun 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Store upper_bound in DBIterator Dangling pointers with set_iterate_upper_bound
8 participants