Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EOF returns earlier than the last message on receipt of Reset #652

Closed
enobufs opened this issue May 2, 2019 · 3 comments · Fixed by pion/sctp#44, pion/datachannel#18 or #653
Closed

Comments

@enobufs
Copy link
Member

enobufs commented May 2, 2019

Your environment.

  • Version: sctp e0c13 (webrtc@v2.0.10, sctp@v1.6.1)
  • Browser: n/a

What did you do?

libp2p test (SubTestPingPong) is failing.
The test uses ioutil.ReadAll() in datachannel's detached mode which reads data until EOF is received.
See slack thread for more details: https://gophers.slack.com/archives/CAK2124AG/p1556658527035200

What did you expect?

When the sender side's SCTP stream is closed, Reset chunk will be received with indicating the last TSN in the chunk. Receiver side should forward all the data received that are equal to or smaller than the reset's TSN.

What happened?

stream.ReadSCTP() returns EOF before the last message (which has TSN <= the last TSN reset chunk indicates) becomes available to read.

@enobufs
Copy link
Member Author

enobufs commented May 2, 2019

Related issues:

Race condition in exsiting tests (in pion/webrtc - datachannel tests)

// Note(albrow): We need to create a data channel in order to trigger ICE
// candidate gathering in the background for the JavaScript/Wasm bindings. If
// we don't do this, the complete offer including ICE candidates will never be
// generated.
if _, err := pcOffer.CreateDataChannel("initial_data_channel", nil); err != nil {
return err
}

Some tests expects the first datachannel opened is the one the test tried to open and cash the pointer in the test, but the first one opened may be the one created by the above code, causing unexpected behavior with race conditions. All the tests using the above code should check the data channel label always.

On receipt of EOS, pion/datachannel closes the outgoing channel

This code should be removed:
https://github.com/pion/datachannel/blob/29ea294b9d9e6af5fd37cfc3f035a4d38b48cba5/datachannel.go#L178-L185 (edited)

It is the requirement from libp2p that even EOF is received on a datachannel, the datachannel be available for write until the app calls Close on the datachannel. (in the context of detached mode)

@enobufs
Copy link
Member Author

enobufs commented May 3, 2019

Current pion/sctp has a few problems which prevent 'reset' transmission from correctly causing EOF on the other end:

(1) Bug with a comparison between senderLastTSN and peerLastTSN.
(2) for loop inside stream.ReadSCTP() has a multiple of lock/unlock phases, which is causing s.readErr was evaluated before checking reassembly queue when it acquires a lock again. Consequently, io.EOF was returned when there's a data in the reassembly queue.
(3) Related to (2), the condition variable was not associated with the mutex. Probably this is why multiple of lock/unlock were needed. (at least, read from the queue and evaluate error when no data, has to be atomic!)
(4) When sender side closes a stream and sends a reset, if there’s a DATA chunk in the pending queue (TSN not assigned yet), the data will be lost.
(5) There is only one instance of paramOutgoingResetRequest and chunkReconfig at a time. Therefore, it would not work if there are multiple streams and they are resetting at the same time (with different TSN).

A PR will follow!

enobufs added a commit to pion/sctp that referenced this issue May 3, 2019
enobufs added a commit to pion/datachannel that referenced this issue May 3, 2019
enobufs added a commit to pion/datachannel that referenced this issue May 3, 2019
@enobufs
Copy link
Member Author

enobufs commented May 3, 2019

pingpong_test.zip

The fixes yield much higher success ratio (90%) but it still occasionally fail.
I noticed dc.Detach() is returning an error (the server(offerer side).

OnDataChannel was called
Detach failed: datachannel not opened yet, try calling Detach from OnOpensctp TRACE: 10:49:06.163881 association.go:1441: [0xc00038c000] handleReconfig

@backkem do you have any thoughts? (I feel like I've been in a long and deep rabbit hole - just digging deeper and never been able to come out of it... lol)

enobufs added a commit to pion/sctp that referenced this issue May 3, 2019
enobufs added a commit that referenced this issue May 3, 2019
enobufs added a commit that referenced this issue May 3, 2019
enobufs added a commit that referenced this issue May 5, 2019
Added another ping-pong test without using detach
Resolves #652
enobufs added a commit that referenced this issue May 5, 2019
backkem pushed a commit to pion/sctp that referenced this issue May 7, 2019
backkem pushed a commit to pion/sctp that referenced this issue May 7, 2019
backkem added a commit to pion/sctp that referenced this issue May 7, 2019
backkem added a commit to pion/sctp that referenced this issue May 7, 2019
backkem pushed a commit to pion/datachannel that referenced this issue May 7, 2019
backkem pushed a commit to pion/datachannel that referenced this issue May 7, 2019
backkem added a commit that referenced this issue May 7, 2019
enobufs added a commit that referenced this issue May 7, 2019
Use datachannel@v1.4.1
Resolves #652
enobufs added a commit that referenced this issue May 7, 2019
enobufs added a commit that referenced this issue May 7, 2019
enobufs added a commit that referenced this issue May 8, 2019
enobufs added a commit that referenced this issue May 8, 2019
enobufs added a commit that referenced this issue May 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
1 participant