
Regression: connection failed #78

Closed · bdandy opened this issue Aug 15, 2018 · 23 comments · Fixed by #130
Labels: bug (Something isn't working)

bdandy commented Aug 15, 2018

The latest working commit is c87c3ca. Current master doesn't establish a connection at all; I think the issue is related to the latest ICE changes.
Tested with the Chrome browser.
Network map: client -> NAT -> server with real IP

Sean-Der commented:

Oof, sorry about that @bogdan-d

Will debug+fix after work, thanks!

Sean-Der commented:

Hey @bogdan-d, can you cherry-pick c525537?

If that works I can push it to master. Right now we are discarding STUN candidates. (I will work on getting E2E testing working; it would have really helped in this situation.)

thanks!

Sean-Der commented:

Hey @bogdan-d, I pushed that to master, just to be safe.

Can you test again and see if that fixes it for you? I haven't had a chance to confirm that it does.


bdandy commented Aug 16, 2018

@Sean-Der I can't build master because ICE was moved to internal.
The ice package is required for this code:

peerConnection.OnICEConnectionStateChange = func(connectionState ice.ConnectionState) {
	fmt.Printf("Connection State has changed %s \n", connectionState.String())
	switch connectionState {
	case ice.ConnectionStateCompleted:
		streamer.SetVideoOut(h264track.Samples)
		streamer.SetAudioOut(opusTrack.Samples)

	case ice.ConnectionStateFailed:
		streamer.SetVideoOut(nil)
		streamer.SetAudioOut(nil)
	}
}


bdandy commented Aug 16, 2018

Tried commit 12e7e87 - still the same


Sean-Der commented Aug 16, 2018

@bogdan-d I reverted moving ICE to internal.

Are both the client and server using STUN servers? Before, pion-WebRTC would just accept any ICE traffic (because candidate parsing wasn't done), but now it filters.

So make sure the IPs each side communicates over are in the SDP. It worked for me when I tried, but I will try a couple of different scenarios.


Sean-Der commented Aug 16, 2018

Hey @bogdan-d

I updated the examples; they should be more useful. I just tested, and the following scenario works using master. I'd like to fix up your scenario; I am hoping that the ICE security checks (and the lack of STUN on one side) are what is breaking you.

AWS (pion-WebRTC) -> NAT -> Public Internet <- NAT <- Home (Chromium)


bdandy commented Aug 17, 2018

Hey @Sean-Der
Both the client and server are using the same STUN server, as in the examples:

// Create a new RTCPeerConnection
peerConnection, err := webrtc.New(webrtc.RTCConfiguration{
	ICEServers: []webrtc.RTCICEServer{
		{
			URLs: []string{"stun:stun.l.google.com:19302"},
		},
	},
})

let pc = new RTCPeerConnection({iceServers: [{urls: "stun:stun.l.google.com:19302"}]});


bdandy commented Aug 17, 2018

Here are the SDPs of both the request (offer) and the response (answer):

v=0
o=- 8636688666754455556 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio video
a=msid-semantic: WMS
m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:0qPb
a=ice-pwd:Gz1OFI3T9XqFYjwV/cOSR5Jy
a=ice-options:trickle
a=fingerprint:sha-256 AC:14:1C:7E:D9:E4:94:34:CC:CE:42:4C:65:A7:40:BD:7C:3E:A6:AA:AD:40:02:73:80:91:6A:82:33:25:83:1F
a=setup:actpass
a=mid:audio
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=recvonly
a=rtcp-mux
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000
m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 102 123 127 122 125 107 108 109 124
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:0qPb
a=ice-pwd:Gz1OFI3T9XqFYjwV/cOSR5Jy
a=ice-options:trickle
a=fingerprint:sha-256 AC:14:1C:7E:D9:E4:94:34:CC:CE:42:4C:65:A7:40:BD:7C:3E:A6:AA:AD:40:02:73:80:91:6A:82:33:25:83:1F
a=setup:actpass
a=mid:video
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:4 urn:3gpp:video-orientation
a=extmap:5 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:6 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay
a=extmap:7 http://www.webrtc.org/experiments/rtp-hdrext/video-content-type
a=extmap:8 http://www.webrtc.org/experiments/rtp-hdrext/video-timing
a=recvonly
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96
a=rtpmap:98 VP9/90000
a=rtcp-fb:98 goog-remb
a=rtcp-fb:98 transport-cc
a=rtcp-fb:98 ccm fir
a=rtcp-fb:98 nack
a=rtcp-fb:98 nack pli
a=rtpmap:99 rtx/90000
a=fmtp:99 apt=98
a=rtpmap:100 H264/90000
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=fmtp:100 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42001f
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100
a=rtpmap:102 H264/90000
a=rtcp-fb:102 goog-remb
a=rtcp-fb:102 transport-cc
a=rtcp-fb:102 ccm fir
a=rtcp-fb:102 nack
a=rtcp-fb:102 nack pli
a=fmtp:102 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
a=rtpmap:123 rtx/90000
a=fmtp:123 apt=102
a=rtpmap:127 H264/90000
a=rtcp-fb:127 goog-remb
a=rtcp-fb:127 transport-cc
a=rtcp-fb:127 ccm fir
a=rtcp-fb:127 nack
a=rtcp-fb:127 nack pli
a=fmtp:127 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=4d0032
a=rtpmap:122 rtx/90000
a=fmtp:122 apt=127
a=rtpmap:125 H264/90000
a=rtcp-fb:125 goog-remb
a=rtcp-fb:125 transport-cc
a=rtcp-fb:125 ccm fir
a=rtcp-fb:125 nack
a=rtcp-fb:125 nack pli
a=fmtp:125 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=640032
a=rtpmap:107 rtx/90000
a=fmtp:107 apt=125
a=rtpmap:108 red/90000
a=rtpmap:109 rtx/90000
a=fmtp:109 apt=108
a=rtpmap:124 ulpfec/90000

v=0
o=- 805394319 1534481289 IN IP4 0.0.0.0
s=-
t=0 0
a=fingerprint:sha-256 7F:9C:8E:82:5A:73:F6:F3:A4:B8:EB:D6:50:18:3F:13:29:11:FC:49:9D:2E:70:08:13:65:33:A5:C5:E9:45:AF
a=group:BUNDLE audio video
m=audio 9 UDP/TLS/RTP/SAVPF 111 111
c=IN IP4 0.0.0.0
a=setup:active
a=mid:audio
a=ice-ufrag:HBNNkOEZNWBHDjlW
a=ice-pwd:HBNNkOEZNWBHDjlWHhVllIhsweHsuqKh
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10;useinbandfec=1
a=ssrc:2872755213 cname:pion1
a=ssrc:2872755213 msid:pion1 pion1
a=ssrc:2872755213 mslabel:pion1
a=ssrc:2872755213 label:pion1
a=sendonly
a=candidate:udpcandidate 1 udp 2848 10.0.0.14 57219 typ host generation 0
a=candidate:udpcandidate 2 udp 29996 10.0.0.14 57219 typ host generation 0
a=candidate:udpcandidate 1 udp 15416 40.118.62.201 57220 typ srflx raddr 10.0.0.14 rport 57220 generation 0
a=candidate:udpcandidate 2 udp 29584 40.118.62.201 57220 typ srflx raddr 10.0.0.14 rport 57220 generation 0
a=end-of-candidates
m=video 9 UDP/TLS/RTP/SAVPF 96 100 98 96 100 98
c=IN IP4 0.0.0.0
a=setup:active
a=mid:video
a=ice-ufrag:HBNNkOEZNWBHDjlW
a=ice-pwd:HBNNkOEZNWBHDjlWHhVllIhsweHsuqKh
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:96 VP8/90000
a=rtpmap:100 H264/90000
a=fmtp:100 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
a=rtpmap:98 VP9/90000
a=rtpmap:96 VP8/90000
a=rtpmap:100 H264/90000
a=fmtp:100 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42001f
a=rtpmap:98 VP9/90000
a=ssrc:477128949 cname:pion2
a=ssrc:477128949 msid:pion2 pion2
a=ssrc:477128949 mslabel:pion2
a=ssrc:477128949 label:pion2
a=sendonly
a=candidate:udpcandidate 1 udp 2848 10.0.0.14 57219 typ host generation 0
a=candidate:udpcandidate 2 udp 29996 10.0.0.14 57219 typ host generation 0
a=candidate:udpcandidate 1 udp 15416 40.118.62.201 57220 typ srflx raddr 10.0.0.14 rport 57220 generation 0
a=candidate:udpcandidate 2 udp 29584 40.118.62.201 57220 typ srflx raddr 10.0.0.14 rport 57220 generation 0
a=end-of-candidates

40.118.62.201 is the real IP of the server.

Sean-Der commented:

Hey @bogdan-d, the request doesn't have any candidates, so pion-WebRTC is not going to allow any inbound traffic.

This worked before because we just allowed all traffic (ice-lite just isn't that smart). Can you update your requester to only send the SDP after onicecandidate emits a null candidate, like here?


bdandy commented Aug 17, 2018

Thanks, that fixed the issue. One remaining problem is that collecting ICE candidates takes a long time (about 30 seconds), but that's not related to this issue.

bdandy closed this as completed Aug 17, 2018

Sean-Der commented:

Wohoo! Glad that fixed it.

Is it slow in pion-WebRTC or in your browser? Can you maybe drop slow ICE candidates (pion-WebRTC doesn't support TURN, so you can drop those) or choose better STUN servers?

trickle-ice is on our roadmap, but not in the near future :( Thanks again for using pion-WebRTC and dealing with all the breakage! I know it is frustrating, but hopefully it will get better quickly.


bdandy commented Aug 17, 2018

Thanks a lot for your help!

It's slow in the browser, and it doesn't depend on the STUN server: discovering all ICE candidates always takes about 39 seconds (using Chrome). You can check it here: https://webrtc.github.io/samples/src/content/peerconnection/trickle-ice/


bdandy commented Aug 17, 2018

Here is a trick I've found to work around that:

let timeout;
pc.onicecandidate = e => {
    console.log("New candidate", e.candidate);
    if (timeout === undefined) {
        // wait 1 second for additional ice candidates
        timeout = setTimeout(() => {
            let xhr = new XMLHttpRequest();
            xhr.open("POST", window.location.href + "stream", true);
            xhr.setRequestHeader('Content-Type', 'text/plain');
            xhr.onload = function () {
                if (this.status === 200) {
                    pc.setRemoteDescription(new RTCSessionDescription({
                        type: 'answer',
                        sdp: atob(this.responseText)
                    }));
                } else {
                    alert("Failed to create new session");
                }
            };

            xhr.onerror = function () {
                document.getElementById("message").innerHTML = "Failed to load " + window.location.href + "stream";
            };
            xhr.send(btoa(pc.localDescription.sdp));
        }, 1000);
    }
};


bdandy commented Aug 17, 2018

One more problem related to the issue: now I don't see the completed event in pion-WebRTC, so the stream starts successfully but then stops after a few seconds.
On the browser side I see the failed state in that case.
Also, sometimes the stream just doesn't start and ends up in the failed state.

bdandy reopened this Aug 17, 2018

bdandy commented Aug 17, 2018

It looks like a deadlock; after the connected state there are no other state changes, not even failed.

Sean-Der commented:

Hey @bogdan-d, I pushed a small fix to master (unrelated to this issue, I believe).

I am not able to reproduce the deadlock; I just ran the save-to-disk example for 15 minutes and it seems to be doing fine :/ Do you have a reproduction?

There was a deadlock in the ICEAgent, but it was fixed. I don't have any other ideas except pprof. I am here to help; I really want to fix up all your issues and get you back up and running.
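
Roughly, something like this would let you grab a goroutine dump while the stream is stuck (just a sketch; the port and layout here are arbitrary, not part of your app):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve pprof on a side port; fetch a full goroutine dump with:
	//   curl http://localhost:6060/debug/pprof/goroutine?debug=2
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... set up the pion-WebRTC peer connection as usual ...
	select {}
}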

Sean-Der commented:

Also @bogdan-d, connected is the final state; we currently don't transition to completed (full ICE is still being worked on).
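
So, adapting your callback from above, keying off connected instead of completed would look roughly like this. This is only a sketch: streamer, h264track and opusTrack come from your earlier snippet, and it assumes the ice package exposes a ConnectionStateConnected constant alongside the states you already use.

peerConnection.OnICEConnectionStateChange = func(connectionState ice.ConnectionState) {
	fmt.Printf("Connection State has changed %s \n", connectionState.String())
	switch connectionState {
	case ice.ConnectionStateConnected:
		// connected is currently the last state pion-WebRTC reaches, so start streaming here
		streamer.SetVideoOut(h264track.Samples)
		streamer.SetAudioOut(opusTrack.Samples)

	case ice.ConnectionStateFailed:
		streamer.SetVideoOut(nil)
		streamer.SetAudioOut(nil)
	}
}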

Sean-Der commented:

Hey @bogdan-d just checking in, anything I can do here to help?


bdandy commented Aug 20, 2018

goroutine 315 [chan receive, 8 minutes]:
github.com/pions/webrtc/internal/network.(*port).networkLoop(0xc00071c080)
	C:/Users/user/Documents/Go/pkg/mod/github.com/pions/webrtc@v0.0.0-20180818083340-bae8deefdc5f/internal/network/port-receive.go:138 +0x93
created by github.com/pions/webrtc/internal/network.newPort
	C:/Users/user/Documents/Go/pkg/mod/github.com/pions/webrtc@v0.0.0-20180818083340-bae8deefdc5f/internal/network/port.go:38 +0x16d

goroutine 1837 [semacquire, 6 minutes]:
sync.runtime_SemacquireMutex(0xc00018cc34, 0xc0000fe600)
	C:/Go/src/runtime/sema.go:71 +0x44
sync.(*Mutex).Lock(0xc00018cc30)
	C:/Go/src/sync/mutex.go:134 +0x106
sync.(*RWMutex).Lock(0xc00018cc30)
	C:/Go/src/sync/rwmutex.go:93 +0x34
github.com/pions/webrtc/internal/network.(*Manager).SendRTP(0xc00018cbb0, 0xc0000fe630)
	C:/Users/user/Documents/Go/pkg/mod/github.com/pions/webrtc@v0.0.0-20180818083340-bae8deefdc5f/internal/network/manager.go:147 +0x44
github.com/pions/webrtc.(*RTCPeerConnection).NewRTCTrack.func1(0xd95e08d7001adb6f, 0xc0001ae480, 0xc0001adb60, 0xc000378000)
	C:/Users/user/Documents/Go/pkg/mod/github.com/pions/webrtc@v0.0.0-20180818083340-bae8deefdc5f/media.go:163 +0xcf
created by github.com/pions/webrtc.(*RTCPeerConnection).NewRTCTrack
	C:/Users/user/Documents/Go/pkg/mod/github.com/pions/webrtc@v0.0.0-20180818083340-bae8deefdc5f/media.go:150 +0xe0

goroutine 318 [chan receive, 8 minutes]:
github.com/pions/webrtc/internal/network.(*port).networkLoop(0xc00071c240)
	C:/Users/user/Documents/Go/pkg/mod/github.com/pions/webrtc@v0.0.0-20180818083340-bae8deefdc5f/internal/network/port-receive.go:138 +0x93
created by github.com/pions/webrtc/internal/network.newPort
	C:/Users/user/Documents/Go/pkg/mod/github.com/pions/webrtc@v0.0.0-20180818083340-bae8deefdc5f/internal/network/port.go:38 +0x16d

goroutine 316 [chan receive, 8 minutes]:
github.com/pions/webrtc/internal/network.(*port).networkLoop(0xc00071c100)
	C:/Users/user/Documents/Go/pkg/mod/github.com/pions/webrtc@v0.0.0-20180818083340-bae8deefdc5f/internal/network/port-receive.go:138 +0x93
created by github.com/pions/webrtc/internal/network.newPort
	C:/Users/user/Documents/Go/pkg/mod/github.com/pions/webrtc@v0.0.0-20180818083340-bae8deefdc5f/internal/network/port.go:38 +0x16d

goroutine 299 [IO wait, 8 minutes]:
internal/poll.runtime_pollWait(0x5184820, 0x72, 0xcb8840)
	C:/Go/src/runtime/netpoll.go:173 +0x6d
internal/poll.(*pollDesc).wait(0xc000141248, 0x72, 0xaad400, 0x0, 0x0)
	C:/Go/src/internal/poll/fd_poll_runtime.go:85 +0xa1
internal/poll.(*ioSrv).ExecIO(0x1099af0, 0xc000141098, 0xc52820, 0x407078, 0xc000130118, 0xc0001300c0)
	C:/Go/src/internal/poll/fd_windows.go:222 +0x137
internal/poll.(*FD).ReadFrom(0xc000141080, 0xc0000d4000, 0x2000, 0x2000, 0x0, 0x0, 0x0, 0x0, 0x0)
	C:/Go/src/internal/poll/fd_windows.go:620 +0xe4
net.(*netFD).readFrom(0xc000141080, 0xc0000d4000, 0x2000, 0x2000, 0x0, 0xc000329f60, 0xc000329e40, 0x40d74a, 0x5181ee0)
	C:/Go/src/net/fd_windows.go:158 +0x62
net.(*UDPConn).readFrom(0xc00030c058, 0xc0000d4000, 0x2000, 0x2000, 0x10100c000329ea8, 0x0, 0x2000, 0xc0000d4000)
	C:/Go/src/net/udpsock_posix.go:47 +0x71
net.(*UDPConn).ReadFrom(0xc00030c058, 0xc0000d4000, 0x2000, 0x2000, 0x444d3e, 0x2000, 0xb56080, 0xc000329f01, 0xc0000d4000)
	C:/Go/src/net/udpsock.go:121 +0x74
golang.org/x/net/ipv4.(*payloadHandler).ReadFrom(0xc000190100, 0xc0000d4000, 0x2000, 0x2000, 0x2000, 0x2000, 0x5ae181, 0x1, 0xc000329fc0, 0x5ae181)
	C:/Users/user/Documents/Go/pkg/mod/golang.org/x/net@v0.0.0-20180808004115-f9ce57c11b24/ipv4/payload_nocmsg.go:19 +0x7f
github.com/pions/webrtc/internal/network.(*port).networkLoop.func1(0xc00071c100, 0xc0001ae720)
	C:/Users/user/Documents/Go/pkg/mod/github.com/pions/webrtc@v0.0.0-20180818083340-bae8deefdc5f/internal/network/port-receive.go:121 +0xb6
created by github.com/pions/webrtc/internal/network.(*port).networkLoop
	C:/Users/user/Documents/Go/pkg/mod/github.com/pions/webrtc@v0.0.0-20180818083340-bae8deefdc5f/internal/network/port-receive.go:118 +0x72

Here it looks like a deadlock: semacquire, 6 minutes.


Sean-Der commented Sep 5, 2018

Hey @bogdan-d, sorry for the delay; I've just been hit with so many different issues.

Do you have an easy reproduction?

I don't understand how the lock in the portsManager could be the issue. @kc5nra @trivigy @backkem, any ideas? We could make that an atomic really easily, but I would rather fully understand the issue.

Sean-Der self-assigned this Sep 5, 2018
Sean-Der added the bug (Something isn't working) and blocked labels Sep 5, 2018

bdandy commented Sep 5, 2018

Hey @Sean-Der, I haven't looked into it in depth yet; I was on vacation. I also have no idea why it's blocking. Maybe it was related to Go 1.11beta3; I will try building master with the new stable release.

Sean-Der commented:

Good news @bogdan-d, I think your issue was debugged and fixed in #108. Sorry I wasn't able to get to the bottom of it. Lots of issues, not enough time!

Sean-Der added a commit that referenced this issue Sep 14, 2018
PortsManager + IceAgent are both protected by a mutex, and they both
have the ability to call into and lock the other.

In a reproducible deadlock the IceAgent would send a message via the
ports manager, and at the same time the portsManager was attempting to
query the IceAgent for valid candidates.

This is fixed by moving the portsManager to a ReadLock. The other places
where we lock don't have this interaction.

Resolves #78, Resolves #108

Sean-Der added two more commits referencing this issue on Sep 14, 2018, with the same commit message.
Antonito pushed a commit to Antonito/webrtc referencing this issue on Feb 26, 2019, with the same commit message.
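
The commit message above describes a classic lock-ordering (ABBA) deadlock, which matches the goroutine stuck in "semacquire" in the dump earlier in the thread. A stripped-down sketch of the pattern (the types and methods here are illustrative stand-ins, not the actual pion internals):

package main

import (
	"sync"
	"time"
)

type iceAgent struct{ mu sync.Mutex }
type portsManager struct{ mu sync.Mutex }

func main() {
	a, m := &iceAgent{}, &portsManager{}

	// Goroutine 1: the agent sends a message via the ports manager,
	// so it holds a.mu and then waits for m.mu.
	go func() {
		a.mu.Lock()
		defer a.mu.Unlock()
		time.Sleep(10 * time.Millisecond)
		m.mu.Lock() // blocks forever once goroutine 2 holds m.mu
		m.mu.Unlock()
	}()

	// Goroutine 2: the manager queries the agent for valid candidates,
	// so it holds m.mu and then waits for a.mu -- the reverse order.
	go func() {
		m.mu.Lock()
		defer m.mu.Unlock()
		time.Sleep(10 * time.Millisecond)
		a.mu.Lock() // blocks forever: both goroutines now show "semacquire"
		a.mu.Unlock()
	}()

	time.Sleep(time.Second)
}

The fix referenced above removes the conflicting acquisition on the manager side; more generally, agreeing on a single lock-acquisition order (or not calling into another locked component while holding your own lock) avoids this pattern.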