Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an API for Consumer to read all slots? #97

Open
eval-exec opened this issue Dec 21, 2022 · 12 comments
Open

Add an API for Consumer to read all slots? #97

eval-exec opened this issue Dec 21, 2022 · 12 comments

Comments

@eval-exec
Copy link

eval-exec commented Dec 21, 2022

Now, I read all slots from Consumer by this:

  let result = consumer.read_chunk(consumer.slots());

But sometimes I don't care how many slots hold by RingBuffer, I just want to read all slots.

Do you think it's a good idea to add a method for Consumer like this:

  let (all_slots_size, chunks) = consumer.read_all();
@mgeier
Copy link
Owner

mgeier commented Dec 21, 2022

I'm aware that calling slots() before getting a ReadChunk adds an extra step in some use cases.
But we are intentionally not automatically getting all slots, because this might avoid an unnecessary access of an atomic variable.

We could of course add a new function that returns a ReadChunk containing all available slots, but there are some potential problems to be aware of:

  • adding code adds maintenance burden, we should only do it if it's worth it
  • adding a function increases the API surface, maybe making it harder to learn?
  • how should we call this function?

I think read_all() is not a good name, because it will not necessarily read all slots.
Granted, read_chunk() is also not great, because it doesn't read a chunk, but it returns a ReadChunk from where we can read if we so choose.
But since read_chunk() is already not great, it might be even harder to find a good name for the new function.
Because this new function would also return a ReadChunk right?

I'm not against adding convenience functions, but the names must be good.
This is also why I haven't merged #66 yet, because I'm not convinced that the names are good.

 let (all_slots_size, chunks) = consumer.read_all();

I don't think that all_slots_size is necessary, because chunks have a .len() method.
And is chunks intentionally plural? I would expect one chunk to be returned, or am I missing something?

@mgeier
Copy link
Owner

mgeier commented Dec 21, 2022

You could also have a look at https://github.com/agerasev/ringbuf, which has a very different API, and it does not require specifying the number of slots, AFAICT.

@eval-exec
Copy link
Author

eval-exec commented Dec 22, 2022

I don't think that all_slots_size is necessary, because chunks have a .len() method. And is chunks intentionally plural? I would expect one chunk to be returned, or am I missing something?

I agree with you, all_slots_size is not necessary. And it should be chunk instead of chunks.

@eval-exec
Copy link
Author

You could also have a look at https://github.com/agerasev/ringbuf, which has a very different API, and it does not require specifying the number of slots, AFAICT.

Thank you. 😄

@Mek101
Copy link

Mek101 commented Jan 25, 2023

I would suggest a couple of things:

  • Instead of forcing the user to choose between getting a chunk of a fixed size (the current read_chunk implementation) or getting a slice of any size (the proposal), why add instead of the latter a function that returns a chunk of at least a given size?
    This way if one doesn't care about the chunk's size (like @eval-exec) one could just specify 0 as a minimum, while if one requires to have at least a certain size (while also accepting more), they can have the option to do so.
    I'm not a native english speaker, but I would suggest as a name for such method read_chunk_least(min: usize).
    Such a function would behave like @eval-exec's read_all() proposal when min is 0, return an error if there weren't enough available slots and return a chunk whose length is >= min in any other case.
  • Add a symmetric interface to the producer.

@mgeier
Copy link
Owner

mgeier commented Jan 26, 2023

Thanks for the suggestions!

if one requires to have at least a certain size (while also accepting more)

Is this a real use case that you have?
If yes, can you please elaborate?

@Mek101
Copy link

Mek101 commented Jan 26, 2023

Is this a real use case that you have? If yes, can you please elaborate?

I'm using this library in a small side project of mine as a channel between an IO-thread and a parser thread.
In order to keep the commits and system calls low, I would like to do vectored reads with the slices as large as possible.
And since doing a sytem-call with an excessively small buffer would be counterproductive, I would like to specify a minimum size.

@mgeier
Copy link
Owner

mgeier commented Jan 27, 2023

Thanks, that's an interesting use case!

So if I understand correctly, you need something like this (using existing API):

let chunk = c.read_chunk(c.slots()).unwrap();
if chunk.len() >= 25 {
    // do something
}

... but with less noise:

if let Ok(chunk) = c.read_chunk_least(25) {
    // do something
}

This new function could then also be used for simply getting all slots by using 0 (but this would still need unwrap()):

let chunk = c.read_chunk_least(0).unwrap();
// do something

I don't love the name, and I think it's not good to have two similar functions that both take the same arguments (a single number). I think this might be confusing.

For comparison, if we had a function to get all slots unconditionally (let's call it max_read_chunk() for now), your use case would look like this:

let chunk = c.max_read_chunk();
if chunk.len() >= 25 {
    // do something
}

This would still be more than a one-liner, but at least the .unwrap() could be avoided.

I think having to type a few characters less is not reason enough to add a new function, but avoiding unwrap() might be?

If we have to decide between the two options shown above, I think I would vote for the second one, which avoids unwrap() in all cases.

Another way to look at it is that both read_chunk() and read_chunk_least() would have the same signature (usize) -> Result<ReadChunk, ChunkError>, while max_read_chunk() would have () -> ReadChunk. Looking at this, I think that the function signatures on their own are more understandable than the names of the functions!

@mgeier
Copy link
Owner

mgeier commented Jan 27, 2023

There is also a subtle performance-related question: should read_chunk_least(25) try to return as many slots as possible (which requires touching an atomic variable) or would it be OK to return fewer slots (but of course still at least 25) if that could be done without an atomic operation?

Latter cannot be achieved with the current API.

@Mek101
Copy link

Mek101 commented Jan 27, 2023

So if I understand correctly, you need something like this (using existing API):

let chunk = c.read_chunk(c.slots()).unwrap();
if chunk.len() >= 25 {
    // do something
}

Mostly, but as the slots() documentation mentions there's a data race in there, which could mean I would get less than all availabe slots

... but with less noise:

if let Ok(chunk) = c.read_chunk_least(25) {
    // do something
}

Yes.

For comparison, if we had a function to get all slots unconditionally (let's call it max_read_chunk() for now), your use case would look like this:

let chunk = c.max_read_chunk();
if chunk.len() >= 25 {
    // do something
}

Doesn't seem like a bad idea to me. It would even work for my usecase since I'm wrapping it all anyway

There is also a subtle performance-related question: should read_chunk_least(25) try to return as many slots as possible (which requires touching an atomic variable) or would it be OK to return fewer slots (but of course still at least 25) if that could be done without an atomic operation?

A bool perhaps? It's a simple solution but probably not exactly elegant

@mgeier
Copy link
Owner

mgeier commented Jan 28, 2023

as the slots() documentation mentions there's a data race in there

Just to clarify, that's not a data race, at least I hope not!
If it were a data race (as Rust uses the term: https://doc.rust-lang.org/nomicon/races.html), it would be Undefined Behavior and rtrb would be unsound.

It is natural that once one thread has finished reading an atomic variable, the other thread can change it again. Whether this happens within a single function or whether another function is called (in this case c.slots()) doesn't make any real difference.

So even if we implement a new function read_chunk_least() or max_read_chunk() or whatever, the fact would remain that at some point we have to read from an atomic variable, and from that point on, we don't have any knowledge what happens to that variable.

From a thread-communication point of view, there is no difference between c.read_chunk(c.slots()).unwrap() and c.max_read_chunk(). In both cases, the atomic tail variable is read exactly once and when we get our ReadChunk, the Producer might have already committed new elements.

In the current implementation, c.read_chunk(c.slots()).unwrap() might result in a few redundant instructions, because none of the functions is marked #[inline]. It would be interesting to benchmark this and see if it makes a difference.

@mgeier
Copy link
Owner

mgeier commented Dec 9, 2023

Is there still interest in this feature?

I'm still not convinced that it is worth implementing, since it can be written as c.read_chunk(c.slots()).unwrap(), which doesn't seem that bad to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants