GC_realloc / GC_register_finalizer_ignore_self abort due to mmap(PROT_NONE) #334

Closed
bcardiff opened this issue Oct 7, 2020 · 5 comments

@bcardiff
Contributor

bcardiff commented Oct 7, 2020

Environment: Arch, 5.8.13-arch1-1, x86_64

We are experiencing crashes in GC_realloc and GC_register_finalizer_ignore_self.

The GC_register_finalizer_ignore_self trace below was captured without debug information.

Crash on GC_register_finalizer_ignore_self

Initiating full world-stop collection!

124010496 bytes in heap blacklisted for interior pointers

--> Marking for collection #30 after 240657568 allocated bytes

Mark stack overflow; current size = 4096 entries
... stripped multiple Mark stack overflow; current size = 4096 entries ...
Mark stack overflow; current size = 4096 entries

Pushed 8 thread stacks
Pushed 8 thread stacks
Starting marking for mark phase number 29
Starting mark helper 0
Starting mark helper 1
Starting mark helper 2
Starting mark helper 3
Finished mark helper 1
Finished mark helper 3
Finished mark helper 2
Finished mark helper 0
Finished marking for mark phase number 29

GC #30 freed 219632432 bytes, heap 2751364 KiB (+ 2958460 KiB unmapped)

World-stopped marking took 510 msecs (332 in average)

Bytes recovered before sweep - f.l. count = -3577248

In-use heap: 31% (192855 KiB pointers + 676400 KiB other)

Immediately reclaimed 192710496 bytes, heapsize: 5846859776 bytes (3029463040 unmapped)

mmap(PROT_NONE) failed

Invalid memory access (signal 11) at address 0x0

[0x55e48b8d9cd0] print_backtrace at /usr/lib/crystal/exception/call_stack.cr:121:5
[0x55e48ac0c823] __crystal_sigfault_handler at /usr/lib/crystal/signal.cr:348:3
[0x7f1fdfac10f0] ???
[0x7f1fdf87f925] abort +473
[0x7f1fdfae1970] ???
[0x7f1fdfae1a49] ???
[0x7f1fdfae445a] ???
[0x7f1fdfae9aed] ???
[0x7f1fdfaed704] ???
[0x7f1fdfaeddea] ???
[0x55e48af5c615] add_finalizer_impl at /usr/lib/crystal/gc/boehm.cr:163:5

With the GC compiled with -g3, we got the following trace.

Crash on GC_realloc

Initiating full world-stop collection!

67641344 bytes in heap blacklisted for interior pointers

--> Marking for collection #29 after 171774736 allocated bytes

Mark stack overflow; current size = 4096 entries
... stripped multiple Mark stack overflow; current size = 4096 entries ...
Mark stack overflow; current size = 4096 entries
Pushed 8 thread stacks
Pushed 8 thread stacks
Starting marking for mark phase number 28
Starting mark helper 0
Starting mark helper 1
Starting mark helper 2
Starting mark helper 3
Finished mark helper 3
Finished mark helper 2
Finished mark helper 1
Finished mark helper 0
Finished marking for mark phase number 28

GC #29 freed 128550240 bytes, heap 2035076 KiB (+ 2257532 KiB unmapped)

World-stopped marking took 542 msecs (325 in average)

Bytes recovered before sweep - f.l. count = -4970016

In-use heap: 27% (136287 KiB pointers + 423040 KiB other)

Immediately reclaimed 79725536 bytes, heapsize: 4395630592 bytes (2311712768 unmapped)

mmap(PROT_NONE) failed

Invalid memory access (signal 11) at address 0x0

[0x55fec67d8620] print_backtrace at /usr/lib/crystal/exception/call_stack.cr:121:5
[0x55fec5afb753] __crystal_sigfault_handler at /usr/lib/crystal/signal.cr:348:3
[0x7fc5475c10f0] ???
[0x7fc54737f925] abort +473
[0x7fc5475f7830] GC_unmap +157
[0x7fc5475e3405] GC_unmap_old +153
[0x7fc5475e5c74] GC_finish_collection +1020
[0x7fc5475e4f15] GC_try_to_collect_inner +423
[0x7fc5475e66ff] GC_collect_or_expand +250
[0x7fc5475eb94e] GC_alloc_large +365
[0x7fc5475ec12c] GC_generic_malloc +384
[0x7fc5475ec3a9] GC_malloc_kind_global +361
[0x7fc5475f8acb] GC_malloc_kind +230
[0x7fc5475ecaf8] GC_generic_or_special_malloc +59
[0x7fc5475ecd01] GC_realloc +481
[0x55fec5b5cffd] realloc at /usr/lib/crystal/gc/boehm.cr:120:5

Both crashes share the "mmap(PROT_NONE) failed" message.

What other information should I provide to diagnose these crashes further? Is there a workaround to try out?

I don't have a short reproducible example so far.
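
Background on the message shared by both traces: the GC_unmap frame in the second trace indicates that the abort is raised while the collector returns unused heap memory to the OS. The C sketch below is not bdwgc code. It assumes, based only on the message text, that pages are released by remapping them with PROT_NONE, and it shows that on Linux doing this in the middle of a mapping splits that mapping into several entries in /proc/self/maps. The helper names count_self_mappings and punch_hole_demo are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the lines of /proc/self/maps, i.e. the number of memory
   mappings of the calling process. Returns -1 if the file cannot
   be opened (for example, on a non-Linux system). */
static long count_self_mappings(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    long n = 0;
    int c;

    if (f == NULL)
        return -1;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            n++;
    fclose(f);
    return n;
}

/* Map three pages, then remap only the middle page with PROT_NONE.
   Stores the mapping count before and after the remap.
   Returns 0 on success, -1 on failure. */
static int punch_hole_demo(long *before, long *after)
{
    long page = sysconf(_SC_PAGESIZE);
    char *base;
    void *hole;

    /* Warm-up call so that stdio's own allocations already exist
       and do not disturb the two measurements below. */
    if (count_self_mappings() < 0)
        return -1;

    base = mmap(NULL, 3 * (size_t)page, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;
    *before = count_self_mappings();

    /* Replace the middle page with an inaccessible one. The pages on
       either side keep their read/write protection, so the kernel
       must now track them as separate mappings. */
    hole = mmap(base + page, (size_t)page, PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (hole == MAP_FAILED) {
        munmap(base, 3 * (size_t)page);
        return -1;
    }
    *after = count_self_mappings();

    munmap(base, 3 * (size_t)page);
    return 0;
}
```

Calling punch_hole_demo(&before, &after) from a small main and printing both values should show a larger count after the remap. If the collector behaves as assumed, a large heap with many scattered unmapped blocks could accumulate a correspondingly large number of mappings.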

@ivmai
Owner

ivmai commented Oct 9, 2020

Is it reproducible with bdwgc master?

This could be related to issue #324.
Is it reproducible if you increase vm.max_map_count?
@wangp has created a patch to avoid hitting the limit. I am reviewing it now, but you can already try it.
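
One way to check whether the limit mentioned above is plausibly involved is to compare the number of mappings a process holds with the value of vm.max_map_count. The following is a minimal C sketch for Linux, assuming /proc is mounted. The helper names (count_lines, read_long, report_map_usage) are hypothetical, and whether this limit actually causes the crash in this issue is only the hypothesis raised in this comment.

```c
#include <stdio.h>

/* Count newline-terminated lines in a file.
   Returns -1 if the file cannot be opened. */
static long count_lines(const char *path)
{
    FILE *f = fopen(path, "r");
    long n = 0;
    int c;

    if (f == NULL)
        return -1;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            n++;
    fclose(f);
    return n;
}

/* Read a single decimal number from a file.
   Returns -1 if the file cannot be opened or parsed. */
static long read_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Print how many mappings the calling process uses relative to the
   system-wide per-process limit. Returns 0 on success, -1 on failure. */
static int report_map_usage(void)
{
    /* Each line of /proc/self/maps describes one mapping. */
    long used = count_lines("/proc/self/maps");
    /* This file exposes the current vm.max_map_count value. */
    long limit = read_long("/proc/sys/vm/max_map_count");

    if (used < 0 || limit <= 0)
        return -1;
    printf("mappings: %ld of %ld (%.1f%%)\n",
           used, limit, 100.0 * (double)used / (double)limit);
    return 0;
}
```

To inspect another running process instead, count_lines can be pointed at /proc/<pid>/maps for that process. A count close to the limit shortly before the abort would support the hypothesis; a much lower count would argue against it.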

@bararchy

@ivmai We've been tackling this issue for a few days, and I'm testing your suggestions. Sadly, I can't quickly verify a simple sysctl change because we are running on Docker, and it's quite tricky to change those settings there.

Do you know when a release including this is expected? Could we get a pre-release (just a tarball) of master with this patch included, so we can test whether it works?

Thanks! :)

@ivmai
Owner

ivmai commented Oct 12, 2020

Could we get a pre-release (just a tarball) of master with this patch included, so we can test whether it works?

Please use a snapshot of https://github.com/wangp/bdwgc/tree/unmap-limit

@bararchy

@ivmai 🎉 Seems like it's working, no more crashes, woohoo!

@ivmai
Owner

ivmai commented Jun 9, 2021

Duplicates #324 (solved)
