Add tests & more #34
Merged · 1 commit · Sep 20, 2023

Conversation

@QuarticCat (Contributor) commented on Aug 24, 2023

I'm auditing Kanal. So far I've looked through most of the blocking implementation except oneshot. Here's some feedback:

  • I'm worried about the fairness of your backoff spinlock implementation. The longer a thread waits, the harder it becomes for it to get the lock. An MCS lock might be a better alternative: we could store MCS nodes in senders/receivers, and it can use the backoff strategy as well.
  • The underlying VecDeque can cause high latency and memory peaks when resizing, and some async tasks are sensitive to latency. Both tokio and monoio use block lists to implement their channels. In addition, this design might enable some lock-free operations.
  • signal.rs generally looks good to me, but auditing atomics is really hard. Maybe you can use https://github.com/tokio-rs/loom to do some tests (see the loom sketch after this list).
  • I added some tests regarding overaligned ZSTs (a sketch of the idea follows this list). The rationale of these tests is that if the alignment is not respected, the debug assertions inside std::ptr::{read, write, copy_nonoverlapping} will fail.
  • I have an idea about recording the precise state of the channel to reduce the number of branches. For example, if we know there are already waiting senders when sending, we can directly create a signal and wait rather than check the queue and waitlist first. A prototype can be found in my chan_state branch; the code is a bit messy and the benchmark results don't seem good.
  • I don't understand the purpose of send_option_timeout & try_send_option.
  • I think KanalPointer is entirely redundant. Why don't you directly store the final buffer in Signal, i.e., signal_terminator -> signal.buffer rather than signal_terminator -> signal.ptr -> buffer?
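
For context, here is a minimal sketch of the kind of overaligned-ZST test described above (the type name is illustrative, not the exact test in this PR); the alignment violation only trips std's debug assertions, so it must run in a debug build:

```rust
// A zero-sized type with an unusually large alignment. If the channel's
// internal ptr::read/ptr::write/copy_nonoverlapping calls ignore alignment,
// std's debug assertions fire when this value moves through the channel.
#[repr(align(1024))]
#[derive(Debug, PartialEq)]
struct OveralignedZst;

#[test]
fn overaligned_zst_roundtrip() {
    let (tx, rx) = kanal::bounded::<OveralignedZst>(1);
    tx.send(OveralignedZst).unwrap();
    assert_eq!(rx.recv().unwrap(), OveralignedZst);
}
```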
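
And a minimal loom sketch of the kind of atomics audit suggested above. Loom runs a model closure once per possible interleaving, so the flag logic from signal.rs would be rewritten against loom's shadow types; this is only an illustration of the workflow, not kanal's actual signal code:

```rust
use loom::sync::atomic::{AtomicBool, Ordering};
use loom::sync::Arc;
use loom::thread;

#[test]
fn store_visible_after_join() {
    // loom explores every legal thread interleaving of this closure.
    loom::model(|| {
        let flag = Arc::new(AtomicBool::new(false));
        let flag2 = flag.clone();
        let th = thread::spawn(move || {
            flag2.store(true, Ordering::Release);
        });
        th.join().unwrap();
        // join() establishes happens-before, so this must hold on
        // every interleaving; loom fails the test if any violates it.
        assert!(flag.load(Ordering::Acquire));
    });
}
```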

@fereidani (Owner) commented on Aug 26, 2023

Hi, thank you for your contribution! I will review this ASAP.
Edit: The test part looks okay, but Signal::wake seems to contain a race condition; I need to think about it.

@fereidani (Owner) commented:

I think your code could cause a race condition in this part, because it does not clone the parked thread's handle before changing the flag. Since std::thread::park() does not guarantee that the thread blocks and may return immediately, the parked thread can wake instantly, read the flag, and destroy the signal's thread handle before the waking process gets a chance to clone it.
https://github.com/fereidani/kanal/blob/0d3a3e1e17ddb55280f4cc4d032ebd1e922a7bc0/src/signal.rs#L154C1-L160C23
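
In other words, the waking side must copy the handle before publishing the state change. A hedged sketch of that ordering (struct and field names are illustrative, not kanal's actual signal.rs):

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::thread::Thread;

const WAITING: u8 = 0;
const WOKEN: u8 = 1;

struct ThreadSignal {
    state: AtomicU8,
    thread: Thread, // handle of the parked thread
}

impl ThreadSignal {
    fn wake(&self) {
        // Clone the handle FIRST: once `state` becomes WOKEN, the parked
        // thread may return from park() at any moment (park() permits
        // spurious wakeups), observe the flag, and drop the whole signal,
        // leaving `self.thread` dangling for the waker.
        let thread = self.thread.clone();
        self.state.store(WOKEN, Ordering::Release);
        thread.unpark();
    }
}
```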

  • I don't understand your statement about why the longer a thread waits, the harder it gets the lock: in the implementation, spin time increases with each failed attempt. I agree that there is more work to be done on the lock part of Kanal, but Kanal prioritizes short accesses, as it assumes channel lock time is small. In practice we don't only send/recv messages between threads as in our benchmark; we do some processing in between, so the lock is not as contended as in our benchmark. Keep in mind that there is no requirement or specification about channel fairness in the original channel article; it is only conventional. If someone really needs fairness, we have the std-lock feature, which provides that.
  • Adding a custom queue would be a good idea, but it would require benchmarks and extensive tests.
  • Unfortunately, I am currently busy with another project, so I can't add loom to the project right now, but a PR would be welcome.
  • The chan_state idea looks cool and neat, but I'm unsure it is possible to achieve the same performance with it. I put a lot of time into optimizing branches and jumps in Kanal, but I am willing to change it if you provide an implementation with the same or better performance.
  • send_option_timeout and try_send_option are for the rare case when a programmer wants to send something but is not willing to wait for a receiver to appear and take the data: if the channel queue is not full or there is an available receiver, the call finishes its job; in other cases it returns the data back to the caller (a usage sketch follows this list). I know it's not common, but I have seen some use cases for it in practice.
  • KanalPointer optimizes data accesses and eliminates multiple reads and writes for types with size(T) > pointer_size by using a direct stack copy from the sender. The idea of the stack-to-stack copy is from Golang, but I optimized it further by skipping the pointer access for types smaller than a pointer using KanalPointer.
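
To illustrate the send_option semantics described above, a hedged usage sketch. It assumes a shape along the lines of try_send_option(&mut Option<T>) -> Result<bool, SendError>, where the value stays in the Option when it cannot be delivered; check the crate docs for the exact signature:

```rust
fn demo(tx: &kanal::Sender<String>) {
    let mut payload = Some(String::from("expensive-to-rebuild"));

    // Sketch: on success the Option is taken (None afterwards); if the
    // queue is full and no receiver is waiting, the value stays inside
    // `payload`, so the caller keeps ownership without cloning.
    match tx.try_send_option(&mut payload) {
        Ok(true) => assert!(payload.is_none()),  // delivered
        Ok(false) => assert!(payload.is_some()), // handed back to caller
        Err(_) => { /* channel closed; the value is also handed back */ }
    }
}
```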

@QuarticCat (Contributor, Author) commented:

> Since std::thread::park() does not guarantee that the thread blocks and may return immediately, the parked thread can wake instantly, read the flag, and destroy the signal's thread handle before the waking process gets a chance to clone it.

You're right. I'll revert that commit tomorrow.

> I don't understand your statement about why the longer a thread waits, the harder it gets the lock.

Think about a high-contention scenario where multiple threads are contending for one mutex. As a thread's spin interval increases, there is a greater chance that some other thread grabs the lock before its next spin. Practically this rarely happens, though, I suppose.
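
A minimal sketch of that effect, using a generic test-and-set lock with exponential backoff (illustrative, not kanal's actual lock code): the probe interval doubles after each failure, so a long-waiting thread checks the lock ever more rarely, while a newcomer probes almost continuously.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

fn lock_with_backoff(locked: &AtomicBool) {
    let mut failures = 0u32;
    loop {
        if locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
        {
            return;
        }
        // Exponential backoff: a thread that has failed k times waits
        // ~2^k spins before its next attempt. Under high contention, a
        // fresh thread (failures == 0) is far more likely to catch the
        // moment of release -- hence the unfairness.
        for _ in 0..(1u32 << failures.min(10)) {
            std::hint::spin_loop();
        }
        failures += 1;
    }
}
```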

> KanalPointer optimizes data accesses and eliminates multiple reads and writes for types with size(T) > pointer_size by using a direct stack copy from the sender.

I know, but the signal is also stored on the stack, isn't it? It's still a stack-to-stack copy.
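
For the record, a sketch of the two layouts under discussion (type and field names are illustrative): the current design dereferences signal.ptr to reach the buffer, while the proposed one embeds the buffer in the signal itself.

```rust
use std::mem::MaybeUninit;

// Current design: the signal holds a pointer to a buffer that itself
// lives on the sender's stack (signal_terminator -> signal.ptr -> buffer).
struct SignalIndirect<T> {
    ptr: *mut T, // KanalPointer, in kanal's terms
}

// Proposed design: the buffer is embedded in the signal, so the
// terminator reaches the data with one less indirection
// (signal_terminator -> signal.buffer).
struct SignalInline<T> {
    buffer: MaybeUninit<T>,
}
```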

I'm experimenting with some of the ideas I mentioned above. If I get any progress, I'll send you a PR. :)

@fereidani (Owner) commented:

Try changing and experimenting with it, but in my experience, putting it outside of the signal helps the compiler generate more optimized code, as it doesn't need to extract the struct field from the signal structure.
Let me know when your PR is ready for merge, thanks.

@QuarticCat (Contributor, Author) commented:

> In my experience, putting it outside of the signal helps the compiler generate more optimized code, as it doesn't need to extract the struct field from the signal structure.

Sounds fair.

> Let me know when your PR is ready for merge, thanks.

It's ready.

@QuarticCat (Contributor, Author) commented:

> An MCS lock might be a better alternative: we could store MCS nodes in senders/receivers, and it can use the backoff strategy as well.

It turns out that an MCS lock isn't applicable here. Since senders/receivers are Sync, we cannot store MCS nodes inside them; otherwise multiple threads could lock on one node at the same time, which is an error. 😢
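
A sketch of the obstacle (hypothetical shape, not working lock code): an MCS node must belong to exactly one waiting thread at a time, but a Sync sender can be used by many threads concurrently.

```rust
use std::sync::atomic::{AtomicBool, AtomicPtr};

// In an MCS lock, each waiting thread enqueues its own node; the node
// must be exclusively owned by that thread until the lock is released.
struct McsNode {
    locked: AtomicBool,
    next: AtomicPtr<McsNode>,
}

struct Sender /* <T> */ {
    // BROKEN IDEA: Sender is Sync, so two threads calling send() on the
    // same shared Sender would both enqueue this single node into the
    // lock's wait queue, corrupting it. The node would have to live on
    // each locking thread's own stack instead.
    node: McsNode,
}
```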

fereidani merged commit 2dc6a1e into fereidani:main on Sep 20, 2023
@fereidani (Owner) commented:

Thank you for your contribution!
