Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite loop -> OOM in SampleBuilder edge case #1810

Closed
kcking opened this issue May 13, 2021 · 10 comments
Closed

Infinite loop -> OOM in SampleBuilder edge case #1810

kcking opened this issue May 13, 2021 · 10 comments

Comments

@kcking
Copy link

kcking commented May 13, 2021

If the s.buildSample call inside of purgeBuffers happens to find a complete packet, s.active is updated to s.consume.tail

s.active.head = consume.tail
, but then the loop condition is triggered again in purgeBuffers and s.active is reset back to s.filled, causing the same packet to be unmarhsaled over and over in a tight loop. I changed the continue to return which seems to fix this behavior for me, but there might be a more correct solution

Your environment.

  • Version: webrtc master d625f6f
  • Browser: Chrome
  • Using samplebuilder in ion-avp

What did you do?

Saw this issue in SampleBuilder used in ion-avp connected to ion-sfu, caused by a vp8 stream sent from chrome specifically when some minor packet reordering occurred.

What did you expect?

Reconstruct packets even with reordering

What happened?

Infinite loop + OOM

@Sean-Der
Copy link
Member

Nice catch @kcking ! Sorry about hitting this bug

@robin-raymond you have the bandwidth to look into this? If not I can grab?

@kcking if you are interested in contributing a unit test that fails would be a really great first contribution. Would love to work together on this. Would be nice if @robin-raymond has a quick fix, but if not this could be ours to take.

@robin-raymond
Copy link
Contributor

I will take a look.

@danj565
Copy link

danj565 commented May 17, 2021

Hi @kcking and @robin-raymond. Thank you for your contributions! I believe I ran across the same issue. The program panics on line 246 with an "out of memory" error occasionally.

data = append(data, p...)

webrtc Version: v3.0.29
Browser: Chrome

Here is the trace if you are interested:

`

  • : fatal error: runtime: out of memory
  • : runtime stack:
  • : runtime.throw(0x17f1101, 0x16)
  • : /usr/local/go/src/runtime/panic.go:1117 +0x72
  • : runtime.sysMap(0xc1bc000000, 0x4000000, 0x2613130)
  • : /usr/local/go/src/runtime/mem_linux.go:169 +0xc6
  • : runtime.(*mheap).sysAlloc(0x25f9780, 0x400000, 0x42c2b7, 0x25f9788)
  • : /usr/local/go/src/runtime/malloc.go:729 +0x1e5
  • : runtime.(*mheap).grow(0x25f9780, 0x9, 0x0)
  • : /usr/local/go/src/runtime/mheap.go:1346 +0x85
  • : runtime.(*mheap).allocSpan(0x25f9780, 0x9, 0x100, 0x7f6af2ae8d38)
  • : /usr/local/go/src/runtime/mheap.go:1173 +0x609
  • : runtime.(*mheap).alloc.func1()
  • : /usr/local/go/src/runtime/mheap.go:910 +0x59
  • : runtime.systemstack(0x0)
  • : /usr/local/go/src/runtime/asm_amd64.s:379 +0x66
  • : runtime.mstart()
  • : /usr/local/go/src/runtime/proc.go:1246
  • : goroutine 1334506 [running]:
  • : runtime.systemstack_switch()
  • : /usr/local/go/src/runtime/asm_amd64.s:339 fp=0xc00f37b6a0 sp=0xc00f37b698 pc=0x46f860
  • : runtime.(*mheap).alloc(0x25f9780, 0x9, 0x7f6af2d00001, 0x7f6af2ae8d38)
  • : /usr/local/go/src/runtime/mheap.go:904 +0x85 fp=0xc00f37b6f0 sp=0xc00f37b6a0 pc=0x427f65
  • : runtime.(*mcache).allocLarge(0x7f6b2a424108, 0x12000, 0x100, 0x7f6af2ae8d38)
  • : /usr/local/go/src/runtime/mcache.go:224 +0x97 fp=0xc00f37b748 sp=0xc00f37b6f0 pc=0x4185d7
  • : runtime.mallocgc(0x12000, 0x0, 0x9d00, 0xc1bbff0000)
  • : /usr/local/go/src/runtime/malloc.go:1078 +0x925 fp=0xc00f37b7d0 sp=0xc00f37b748 pc=0x40e165
  • : runtime.growslice(0x151db20, 0xc1bbff0000, 0xddd0, 0xe000, 0xe26f, 0x49f, 0x5a4, 0x0)
  • : /usr/local/go/src/runtime/slice.go:224 +0x154 fp=0xc00f37b838 sp=0xc00f37b7d0 pc=0x4507f4
  • : github.com/pion/webrtc/v3/pkg/media/samplebuilder.(*SampleBuilder).buildSample(0xc01b880000, 0x681a6701, 0xc0e4f95700)
  • : /Users/dj/go/pkg/mod/github.com/pion/webrtc/v3@v3.0.29/pkg/media/samplebuilder/samplebuilder.go:246 +0x30b fp=0xc00f37b8d8 sp=0xc00f37b838 pc=0
  • : github.com/pion/webrtc/v3/pkg/media/samplebuilder.(*SampleBuilder).purgeBuffers(0xc01b880000)
  • : /Users/dj/go/pkg/mod/github.com/pion/webrtc/v3@v3.0.29/pkg/media/samplebuilder/samplebuilder.go:135 +0x106 fp=0xc00f37b900 sp=0xc00f37b8d8 pc=0
  • : github.com/pion/webrtc/v3/pkg/media/samplebuilder.(*SampleBuilder).Push(0xc01b880000, 0xc015bd63f0)
  • : /Users/dj/go/pkg/mod/github.com/pion/webrtc/v3@v3.0.29/pkg/media/samplebuilder/samplebuilder.go:167 +0xa5 fp=0xc00f37b920 sp=0xc00f37b900 pc=0x
  • : my_package/pion.(*WebmSaver).PushVP8(0xc00e8df4d0, 0xc015bd63f0)
  • : /Users/dj/go/src/my_package/pion/saver.go:136 +0x58 fp=0xc00f37b9a0 sp=0xc00f37b920 pc=0x1272958
  • : my_package/pion.(*StreamService).runMain(0xc0007bf830, 0xc01ce3b830, 0xc018e34c00, 0xc01533b200, 0x4, 0x4, 0x0, 0x0, 0x0, 0x0, ...)
  • : /Users/dj/go/src/my_package/pion/stream.go:728 +0x166f fp=0xc00f37bf48 sp=0xc00f37b9a0 pc=0x1278eaf
  • : runtime.goexit()
    `

@robin-raymond
Copy link
Contributor

Ok, I understand what's happening. Active becomes "empty" after consuming the packets, but instead of the "filled" getting purged by "purgeConsumedBuffers", it doesn't because active is now empty and empty buffers cannot cause any purging. So the solution is to purge the "consumed" area first, then "active". That will resolve the issue. I'm going to link a PR to give this a test if you like since you've been hitting this issue.

@robin-raymond
Copy link
Contributor

@danj565 You might want to check if your issue is related or not. I'm uncertain how it would cause memory overflow, unless the infinite loop causes a backlog in the builder which indirectly causes it.

@danj565
Copy link

danj565 commented May 26, 2021

@robin-raymond Thanks again for your contribution. I will replicate and report back if I hit the OOM issue again.

@danj565
Copy link

danj565 commented Jun 22, 2021

I can confirm the samplebuilder fix works in production! Thanks again @robin-raymond .. @Sean-Der I noticed this was not apart of any release. Let me know what I can do to help. Thank you for the great library!

@Sean-Der
Copy link
Member

Hey @danj565 thanks for using it :)

Mind if I add you to the Pion organization? You can push a new minor tag whenever you like git tag v3.0.30 && git push --tags

The only restriction we have is that pushing to master requires PR + one review. We really need more contributors/more maintainers so your help would be highly appreciated :)

@danj565
Copy link

danj565 commented Jun 24, 2021

@Sean-Der sorry for the delayed response. I would love to contribute. Please add me to the organization.

@Sean-Der
Copy link
Member

@danj565 done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

4 participants