Add tests & more #34
Merged · 1 commit · Sep 20, 2023

Conversation

@QuarticCat (Contributor) commented on Aug 24, 2023

I'm auditing Kanal. So far I've looked through most of the blocking implementation except oneshot. Here's some feedback:

  • I'm worried about the fairness of your backoff spinlock implementation. The longer a thread waits, the harder it becomes for it to get the lock. An MCS lock might be a better alternative: we could store MCS nodes in senders/receivers, and it can use the backoff strategy as well.
  • The underlying VecDeque can cause high latency and memory peaks when resizing, and some async tasks are sensitive to latency. Both tokio and monoio use block lists to implement their channels. In addition, this design might enable some lock-free operations.
  • signal.rs generally looks good to me, but auditing atomics is really hard. Maybe you can use https://github.com/tokio-rs/loom to do some tests (see the loom sketch after this list).
  • I added some tests regarding overaligned ZSTs (a sketch of the idea follows this list). The rationale of these tests is that if the alignment is not respected, the debug assertions inside std::ptr::{read, write, copy_nonoverlapping} will fail.
  • I have an idea about recording the precise state of the channel to reduce the number of branches. For example, if we know there are already waiting senders when sending, we can directly create a signal and wait rather than check the queue and waitlist first. A prototype can be found in my chan_state branch; the code is a bit messy and the benchmark results don't seem good.
  • I don't understand the purpose of send_option_timeout & try_send_option.
  • I think KanalPointer is entirely redundant. Why don't you directly store the final buffer in Signal, i.e., signal_terminator -> signal.buffer rather than signal_terminator -> signal.ptr -> buffer?
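
For context, here is a minimal sketch of the kind of overaligned-ZST test described above (the type name is illustrative, not the exact test in this PR); the alignment violation only trips std's debug assertions, so it must run in a debug build:

```rust
// A zero-sized type with an unusually large alignment. If the channel's
// internal ptr::read/ptr::write/copy_nonoverlapping calls ignore alignment,
// std's debug assertions fire when this value moves through the channel.
#[repr(align(1024))]
#[derive(Debug, PartialEq)]
struct OveralignedZst;

#[test]
fn overaligned_zst_roundtrip() {
    let (tx, rx) = kanal::bounded::<OveralignedZst>(1);
    tx.send(OveralignedZst).unwrap();
    assert_eq!(rx.recv().unwrap(), OveralignedZst);
}
```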
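
And a minimal loom sketch of the kind of atomics audit suggested above. Loom runs a model closure once per possible interleaving, so the flag logic from signal.rs would be rewritten against loom's shadow types; this is only an illustration of the workflow, not kanal's actual signal code:

```rust
use loom::sync::atomic::{AtomicBool, Ordering};
use loom::sync::Arc;
use loom::thread;

#[test]
fn store_visible_after_join() {
    // loom explores every legal thread interleaving of this closure.
    loom::model(|| {
        let flag = Arc::new(AtomicBool::new(false));
        let flag2 = flag.clone();
        let th = thread::spawn(move || {
            flag2.store(true, Ordering::Release);
        });
        th.join().unwrap();
        // join() establishes happens-before, so this must hold on
        // every interleaving; loom fails the test if any violates it.
        assert!(flag.load(Ordering::Acquire));
    });
}
```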

@fereidani (Owner) commented on Aug 26, 2023

Hi, thank you for your contribution! I will review this ASAP.
Edit: The test part looks okay, but Signal::wake seems to contain a race condition; I need to think about it.

@fereidani (Owner) commented:

I think your code could cause a race condition in this part, because it does not clone the parked thread's handle before changing the flag. Since std::thread::park() does not guarantee that the thread blocks and may return immediately, the parked thread can wake instantly, read the flag, and destroy the signal's thread handle before the waking process gets a chance to clone it.
https://github.com/fereidani/kanal/blob/0d3a3e1e17ddb55280f4cc4d032ebd1e922a7bc0/src/signal.rs#L154C1-L160C23
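
In other words, the waking side must copy the handle before publishing the state change. A hedged sketch of that ordering (struct and field names are illustrative, not kanal's actual signal.rs):

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::thread::Thread;

const WAITING: u8 = 0;
const WOKEN: u8 = 1;

struct ThreadSignal {
    state: AtomicU8,
    thread: Thread, // handle of the parked thread
}

impl ThreadSignal {
    fn wake(&self) {
        // Clone the handle FIRST: once `state` becomes WOKEN, the parked
        // thread may return from park() at any moment (park() permits
        // spurious wakeups), observe the flag, and drop the whole signal,
        // leaving `self.thread` dangling for the waker.
        let thread = self.thread.clone();
        self.state.store(WOKEN, Ordering::Release);
        thread.unpark();
    }
}
```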

  • I don't understand your statement about why the longer a thread waits, the harder it gets the lock: in the implementation, spin time increases with each failed attempt. I agree that there is more work to be done on the lock part of Kanal, but Kanal prioritizes short accesses, as it assumes channel lock time is small. In practice we don't only send/recv messages between threads as in our benchmark; we do some processing in between, so the lock is not as contended as in our benchmark. Keep in mind that there is no requirement or specification about channel fairness in the original channel article; it is only conventional. If someone really needs fairness, we have the std-lock feature, which provides that.
  • Adding a custom queue would be a good idea, but it would require benchmarks and extensive tests.
  • Unfortunately, I am currently busy with another project, so I can't add loom to the project right now, but a PR would be welcome.
  • The chan_state idea looks cool and neat, but I'm unsure it is possible to achieve the same performance with it. I put a lot of time into optimizing branches and jumps in Kanal, but I am willing to change it if you provide an implementation with the same or better performance.
  • send_option_timeout and try_send_option are for the rare case when a programmer wants to send something but is not willing to wait for a receiver to appear and take the data: if the channel queue is not full or there is an available receiver, the call finishes its job; in other cases it returns the data back to the caller (a usage sketch follows this list). I know it's not common, but I have seen some use cases for it in practice.
  • KanalPointer optimizes data accesses and eliminates multiple reads and writes for types with size(T) > pointer_size by using a direct stack copy from the sender. The idea of the stack-to-stack copy is from Golang, but I optimized it further by skipping the pointer access for types smaller than a pointer using KanalPointer.
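
To illustrate the send_option semantics described above, a hedged usage sketch. It assumes a shape along the lines of try_send_option(&mut Option<T>) -> Result<bool, SendError>, where the value stays in the Option when it cannot be delivered; check the crate docs for the exact signature:

```rust
fn demo(tx: &kanal::Sender<String>) {
    let mut payload = Some(String::from("expensive-to-rebuild"));

    // Sketch: on success the Option is taken (None afterwards); if the
    // queue is full and no receiver is waiting, the value stays inside
    // `payload`, so the caller keeps ownership without cloning.
    match tx.try_send_option(&mut payload) {
        Ok(true) => assert!(payload.is_none()),  // delivered
        Ok(false) => assert!(payload.is_some()), // handed back to caller
        Err(_) => { /* channel closed; the value is also handed back */ }
    }
}
```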

@QuarticCat (Contributor, Author) commented:

> Since std::thread::park() does not guarantee that the thread blocks and may return immediately, the parked thread can wake instantly, read the flag, and destroy the signal's thread handle before the waking process gets a chance to clone it.

You're right. I'll revert that commit tomorrow.

> I don't understand your statement about why the longer a thread waits, the harder it gets the lock.

Think about a high-contention scenario where multiple threads are contending for one mutex. As a thread's spin interval increases, there is a greater chance that some other thread grabs the lock before its next spin. Practically this rarely happens, though, I suppose.
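
A minimal sketch of that effect, using a generic test-and-set lock with exponential backoff (illustrative, not kanal's actual lock code): the probe interval doubles after each failure, so a long-waiting thread checks the lock ever more rarely, while a newcomer probes almost continuously.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

fn lock_with_backoff(locked: &AtomicBool) {
    let mut failures = 0u32;
    loop {
        if locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
        {
            return;
        }
        // Exponential backoff: a thread that has failed k times waits
        // ~2^k spins before its next attempt. Under high contention, a
        // fresh thread (failures == 0) is far more likely to catch the
        // moment of release -- hence the unfairness.
        for _ in 0..(1u32 << failures.min(10)) {
            std::hint::spin_loop();
        }
        failures += 1;
    }
}
```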

> KanalPointer optimizes data accesses and eliminates multiple reads and writes for types with size(T) > pointer_size by using a direct stack copy from the sender.

I know, but the signal is also stored on the stack, isn't it? It's still a stack-to-stack copy.
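
For the record, a sketch of the two layouts under discussion (type and field names are illustrative): the current design dereferences signal.ptr to reach the buffer, while the proposed one embeds the buffer in the signal itself.

```rust
use std::mem::MaybeUninit;

// Current design: the signal holds a pointer to a buffer that itself
// lives on the sender's stack (signal_terminator -> signal.ptr -> buffer).
struct SignalIndirect<T> {
    ptr: *mut T, // KanalPointer, in kanal's terms
}

// Proposed design: the buffer is embedded in the signal, so the
// terminator reaches the data with one less indirection
// (signal_terminator -> signal.buffer).
struct SignalInline<T> {
    buffer: MaybeUninit<T>,
}
```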

I'm experimenting with some of the ideas I mentioned above. If I get any progress, I'll send you a PR. :)

@fereidani (Owner) commented:

Try changing and experimenting with it, but in my experience, putting it outside of the signal helps the compiler generate more optimized code, as it doesn't need to extract the struct field from the signal structure.
Let me know when your PR is ready for merge, thanks.

@QuarticCat (Contributor, Author) commented:

> In my experience, putting it outside of the signal helps the compiler generate more optimized code, as it doesn't need to extract the struct field from the signal structure.

Sounds fair.

> Let me know when your PR is ready for merge, thanks.

It's ready.

@QuarticCat (Contributor, Author) commented:

> An MCS lock might be a better alternative: we could store MCS nodes in senders/receivers, and it can use the backoff strategy as well.

It turns out that an MCS lock isn't applicable here. Since senders/receivers are Sync, we cannot store MCS nodes inside them; otherwise multiple threads could lock on one node at the same time, which is an error. 😢
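
A sketch of the obstacle (hypothetical shape, not working lock code): an MCS node must belong to exactly one waiting thread at a time, but a Sync sender can be used by many threads concurrently.

```rust
use std::sync::atomic::{AtomicBool, AtomicPtr};

// In an MCS lock, each waiting thread enqueues its own node; the node
// must be exclusively owned by that thread until the lock is released.
struct McsNode {
    locked: AtomicBool,
    next: AtomicPtr<McsNode>,
}

struct Sender /* <T> */ {
    // BROKEN IDEA: Sender is Sync, so two threads calling send() on the
    // same shared Sender would both enqueue this single node into the
    // lock's wait queue, corrupting it. The node would have to live on
    // each locking thread's own stack instead.
    node: McsNode,
}
```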

fereidani merged commit 2dc6a1e into fereidani:main on Sep 20, 2023
@fereidani (Owner) commented:

Thank you for your contribution!
