
Solution 1# for client halt bug #56

Closed
wants to merge 1 commit

Conversation

KolbyML
Member

@KolbyML commented May 23, 2023

fixes #54 and #44

The issue, and how to replicate the bug

        // Write as much data as possible into send buffer.
        while let Some((data, ..)) = self.pending_writes.front() {
            if data.len() <= send_buf.available() {
                let (data, tx) = self.pending_writes.pop_front().unwrap();
                send_buf.write(&data).unwrap();
                let _ = tx.send(Ok(data.len()));
                self.writable.notify_one();
            } else {
                break;
            }
        }

If data.len() is greater than send_buf.available(), the code will "lock"/stall and the data can never be sent. In other words, the data will never be added to the send buffer, resulting in a timeout with 0 packets sent.
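For illustration, here is a minimal, self-contained sketch of the stall. The SendBuf type and the sizes below are simplified stand-ins, not the real utp-rs types; only the shape of the loop matches the snippet above.

    use std::collections::VecDeque;

    // Simplified stand-in for the real send buffer: a fixed capacity plus an
    // available() method, mirroring the snippet above.
    struct SendBuf {
        data: Vec<u8>,
        capacity: usize,
    }

    impl SendBuf {
        fn available(&self) -> usize {
            self.capacity - self.data.len()
        }
        fn write(&mut self, chunk: &[u8]) {
            self.data.extend_from_slice(chunk);
        }
    }

    fn main() {
        // A single pending write that is larger than the buffer's capacity.
        let mut pending_writes: VecDeque<Vec<u8>> = VecDeque::new();
        pending_writes.push_back(vec![0u8; 2048]);

        let mut send_buf = SendBuf { data: Vec::new(), capacity: 1024 };

        // Same shape as the loop above: the first (and only) chunk never fits,
        // so the loop breaks immediately and nothing is ever written.
        while let Some(data) = pending_writes.front() {
            if data.len() <= send_buf.available() {
                let data = pending_writes.pop_front().unwrap();
                send_buf.write(&data);
            } else {
                break;
            }
        }

        // The write is stuck forever: no bytes made it into the send buffer,
        // so the peer sees 0 data packets and the connection times out.
        assert_eq!(send_buf.data.len(), 0);
        assert_eq!(pending_writes.len(), 1);
    }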

Wireshark captures demonstrating this bug can be seen in #54 (comment).

Where did this bug come from?

I believe it is because the receive buffer was written first, where having a limit makes sense (for sending the final packet). The code was then copy-pasted to create the send buffer; the limit was kept even though it had no use case there, but the checks on it remained, and those checks cause the stall.

How was the bug fixed?

By removing the availability/size limit from the send buffer, and hence the unneeded checks. This resolves the stalled-send bug.

To reiterate, there isn't a reason why we should have that limit on the send buffer. There is already code that handles windowing packets.
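For illustration, this is roughly what the drain loop looks like once the capacity check is gone. Again, the types below are simplified stand-ins rather than the actual utp-rs code: every pending write is moved into the send buffer, and the existing windowing code decides how much goes onto the wire per packet.

    use std::collections::VecDeque;

    // Simplified stand-in for a send buffer with no capacity limit.
    struct SendBuf {
        data: Vec<u8>,
    }

    impl SendBuf {
        fn write(&mut self, chunk: &[u8]) {
            self.data.extend_from_slice(chunk);
        }
    }

    fn main() {
        let mut pending_writes: VecDeque<Vec<u8>> = VecDeque::new();
        pending_writes.push_back(vec![0u8; 2048]);
        pending_writes.push_back(vec![0u8; 512]);

        let mut send_buf = SendBuf { data: Vec::new() };

        // With the availability check gone, every pending write is drained into
        // the send buffer; the existing windowing/congestion-control code still
        // decides how much of it actually goes onto the wire per packet.
        while let Some(data) = pending_writes.pop_front() {
            send_buf.write(&data);
        }

        assert_eq!(send_buf.data.len(), 2048 + 512);
        assert!(pending_writes.is_empty());
    }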

@KolbyML
Member Author

KolbyML commented May 23, 2023

@jacobkaufmann @emhane

ping for review

@emhane
Member

emhane commented May 24, 2023

I pulled your fix; I can send at least 10 MB no problem now.

Could you link the code lines you base your assumptions on? It helps to argue for your case by speeding up the process for readers trying to recreate your trail of thought. I'm interested in "There is already code that handles windowing packets", because many times there are reasons to manage congestion control on multiple levels.

Right now multiplexing is like this:
one udp socket to one utp socket
one utp socket to many streams
one stream to one connection
one connection to one congestion control

One congestion controller per stream exists (conn acceptor and initiator), hence it seems the congestion controller is accounting for the concurrent sending and receiving of packets on that stream, but isn't managing congestion control of concurrent streams, at least not in an informed way since that would require the congestion controller to know the state of the other congestion controllers (other streams). We only have one udp socket to receive and send packets from many streams on and we want those streams to progress concurrently, so congestion control of streams does matter. Therefore we may need the check if data.len() <= send_buf.available() { after all for this purpose.
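Purely as an illustration of what "congestion control on multiple levels" could mean here (hypothetical names and numbers, not the actual utp-rs internals): a per-stream buffer cap limits how much one stream may queue while sharing the socket, and the congestion window limits what actually goes onto the wire.

    // Illustrative only: hypothetical fields, not the utp-rs connection type.
    struct Connection {
        cwnd: usize,         // bytes the congestion controller allows in flight
        in_flight: usize,    // bytes sent but not yet acked
        buffered: usize,     // bytes queued in this stream's send buffer
        buf_capacity: usize, // per-stream cap on queued data
    }

    impl Connection {
        // Level 1: may this stream accept more data from the application?
        fn can_buffer(&self, len: usize) -> bool {
            self.buffered + len <= self.buf_capacity
        }

        // Level 2: may another `len` bytes be put onto the wire right now?
        fn can_transmit(&self, len: usize) -> bool {
            self.in_flight + len <= self.cwnd
        }
    }

    fn main() {
        let conn = Connection {
            cwnd: 16_384,
            in_flight: 12_000,
            buffered: 900_000,
            buf_capacity: 1_048_576,
        };
        assert!(conn.can_buffer(100_000)); // still room in this stream's buffer
        assert!(!conn.can_transmit(8_000)); // but the congestion window is nearly full
    }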

Nonetheless, this fix doesn't let me send 100 MB, which it should to be stable. Running your fix on my error-handling branch I get this informative error: thread 'stream::test::test_transfer_100_megabyte' panicked at 'Should read 100 megabyte: Conn(ConnClosed("idle uTP connection time out"))', which leads me to think the error originates somewhere here and here where the idle time out is reset. Adding the connection endpoint to that error tells me it is the acceptor of the connection that times out which in my test only receives data: thread 'stream::test::test_transfer_100_megabyte' panicked at 'Should read 100 megabyte: Conn(ConnClosed("idle uTP connection time out, Acceptor((10328, 10650))"))'.

Running this test for 10 MB without your fix Kolby, I get the same idle time out error, but from the initiator side, which in my test only sends data: thread 'stream::test::test_transfer_10_megabyte' panicked at 'Should send 10 megabyte: Conn(ConnClosed("idle uTP connection time out, Initiator((36072, 1))")).

By this information, I think that use of the idle time out is funky and needs to be inspected.

Great work!

@KolbyML
Member Author

KolbyML commented May 24, 2023

:shipit: good detective work, I will debug with 100 MB now. Receiving a packet should reset the idle timer in theory, so maybe I will look at that.
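In theory the reset looks something like this sketch (hypothetical IdleTimer type, not the actual utp-rs code): every received packet pushes the idle deadline forward, so a connection that is still receiving data should never hit the idle timeout.

    use std::time::{Duration, Instant};

    // Hypothetical idle-timeout tracker, not the real utp-rs implementation.
    struct IdleTimer {
        last_activity: Instant,
        max_idle: Duration,
    }

    impl IdleTimer {
        fn new(max_idle: Duration) -> Self {
            Self { last_activity: Instant::now(), max_idle }
        }

        // Called whenever a packet is received (and arguably when one is sent).
        fn reset(&mut self) {
            self.last_activity = Instant::now();
        }

        // True once the connection has been idle for longer than max_idle.
        fn expired(&self) -> bool {
            self.last_activity.elapsed() >= self.max_idle
        }
    }

    fn main() {
        let mut idle = IdleTimer::new(Duration::from_secs(10));
        // ... on every received packet the connection loop would call:
        idle.reset();
        assert!(!idle.expired());
    }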

@KolbyML
Member Author

KolbyML commented May 24, 2023

Could you link the code lines you base your assumptions on? It helps to argue for your case by speeding up the process for readers trying to recreate your trail of thought. I'm interested in "There is already code that handles windowing packets", because many times there are reasons to manage congestion control on multiple levels.

It would probably just be quicker to Ctrl+F send_buffer and then Ctrl+F receive_buffer. The limit shouldn't affect congestion control, but I guess I should look into it more.

@KolbyML
Member Author

KolbyML commented May 24, 2023

The 100 MB failure is caused by a timeout bug, because when I run Wireshark the data is being sent fine.

So this PR fixes a stall bug; I will try to find the timeout bug.

@jacobkaufmann
Collaborator

this is not a bug, and the limit on the send buffer is deliberate. there is a TODO to make the capacity configurable.

@KolbyML
Member Author

KolbyML commented May 24, 2023

this is not a bug, and the limit on the send buffer is deliberate. there is a TODO to make the capacity configurable.

I am not saying having a limit is a bug. I am saying:

        // Write as much data as possible into send buffer.
        while let Some((data, ..)) = self.pending_writes.front() {
            if data.len() <= send_buf.available() {
                let (data, tx) = self.pending_writes.pop_front().unwrap();
                send_buf.write(&data).unwrap();
                let _ = tx.send(Ok(data.len()));
                self.writable.notify_one();
            } else {
                break;
            }
        }

If this else condition is hit, the client will stall, which doesn't seem to be intentional? Hence the stall is unintentional?

[screenshot: what happens when the else condition is hit]
[screenshot: what is supposed to happen]

So the limit itself isn't the problem; it is how it is handled. I proposed this as a solution because, given the way the library is architected, there isn't an easy way to resolve this differently. Also, pending_writes and the send buffer are practically the same thing.

I don't see how this isn't a bug, seeing as it will halt your client and kill your connection.

@KolbyML
Member Author

KolbyML commented May 24, 2023

Here is an alternate solution that solves the halt problem, #59, and gives the responsibility to the user (this also aligns with your comment).

I think this solution is flawed, though. If someone sends 1.5 MB and then 0.2 MB, and only 1 MB of the 1.5 MB fits, the 1.5 MB write will be cut off and the user will try to resend the part that was cut, but then the data will arrive unordered: 1 MB, 0.2 MB, 0.5 MB (the 0.5 MB being the tail of the original send).

A way to solve this would be to repush the data onto pending writes, but that has the downside of more complexity, since we would need to remember we already pushed 1 MB for the rx channel.
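Sketched out (hypothetical types and sizes, not the actual implementation), the repush idea and the extra bookkeeping it requires look roughly like this:

    use std::collections::VecDeque;

    fn main() {
        // (data, bytes_already_written) -- the extra counter is the added complexity:
        // it remembers how much of the original write was already accepted so the
        // caller is only notified on the rx channel once the whole write is in.
        let mut pending_writes: VecDeque<(Vec<u8>, usize)> = VecDeque::new();
        pending_writes.push_back((vec![0u8; 1_500_000], 0)); // 1.5 MB write
        pending_writes.push_back((vec![0u8; 200_000], 0));   // 0.2 MB write

        let mut send_buf: Vec<u8> = Vec::new();
        let mut available = 1_000_000; // pretend only 1 MB currently fits

        while available > 0 {
            let Some((data, written)) = pending_writes.pop_front() else { break };
            if data.len() <= available {
                // Whole write fits: notify the caller of written + data.len() bytes.
                available -= data.len();
                send_buf.extend_from_slice(&data);
            } else {
                // Only a prefix fits: write it, repush the remainder at the *front*
                // so ordering is preserved, and carry the running count forward.
                let (fits, rest) = data.split_at(available);
                send_buf.extend_from_slice(fits);
                pending_writes.push_front((rest.to_vec(), written + fits.len()));
                available = 0;
            }
        }

        // 1 MB of the 1.5 MB write made it in; the remaining 0.5 MB is queued
        // ahead of the 0.2 MB write, so ordering stays correct.
        assert_eq!(send_buf.len(), 1_000_000);
        assert_eq!(pending_writes[0].0.len(), 500_000);
        assert_eq!(pending_writes[0].1, 1_000_000);
        assert_eq!(pending_writes[1].0.len(), 200_000);
    }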

So a solution to that would be to remove pending writes altogether, since it is redundant, and then have the user interact like

User -> send buffer

instead of

User -> pending_writes -> send buffer

This would resolve the complexity and give the responsibility to the user. Why do we need 2 write buffers?

@KolbyML changed the title from "Remove limit on send buffer" to "Solution 1# for client halt bug" May 25, 2023
@jacobkaufmann
Collaborator

yea sorry I think I misinterpreted your comment. what I think should happen is that an attempt to write returns an error when the send buffer (or the sum of send buffer and pending writes) would exceed the capacity. the capacity exists to limit the amount of resource consumption. relatedly, the capacity should be set to a value that is greater than or equal to the largest conceivable write your program will perform.

the separation between pending writes and send buffer exists so that send buffer can be quite simple. I'm not saying a simpler design isn't available, but the solution here where the capacity of the send buffer is removed is not in the right direction.
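As a rough sketch of that behaviour (hypothetical types, not the actual utp-rs API): a write that would push the buffered data past the configured capacity is rejected with an error instead of silently stalling.

    // Hypothetical error type for illustration only.
    #[derive(Debug, PartialEq)]
    enum WriteError {
        CapacityExceeded { requested: usize, available: usize },
    }

    struct SendBuf {
        data: Vec<u8>,
        capacity: usize,
    }

    impl SendBuf {
        fn available(&self) -> usize {
            self.capacity - self.data.len()
        }

        // Reject writes that do not fit instead of letting them sit forever.
        fn write(&mut self, chunk: &[u8]) -> Result<usize, WriteError> {
            if chunk.len() > self.available() {
                return Err(WriteError::CapacityExceeded {
                    requested: chunk.len(),
                    available: self.available(),
                });
            }
            self.data.extend_from_slice(chunk);
            Ok(chunk.len())
        }
    }

    fn main() {
        let mut buf = SendBuf { data: Vec::new(), capacity: 1024 };
        assert_eq!(buf.write(&[0u8; 512]), Ok(512));
        // A write larger than the remaining capacity returns an error to the
        // caller, who can split or retry it, rather than stalling the client.
        assert!(buf.write(&[0u8; 4096]).is_err());
    }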

@KolbyML
Member Author

KolbyML commented May 26, 2023

the separation between pending writes and send buffer exists so that send buffer can be quite simple. I'm not saying a simpler design isn't available, but the solution here where the capacity of the send buffer is removed is not in the right direction.

I think that is fair. The short-term solution would then be to return an error like you said, and then add a TODO to possibly simplify that.

@KolbyML
Member Author

KolbyML commented May 26, 2023

closed in favor of #61
