Multithreaded Compute Segfaults in CI on GL/Linux #4285

Closed
Tracked by #3678
cwfitzgerald opened this issue Oct 23, 2023 · 6 comments · Fixed by #5129
Labels
api: gles Issues with GLES or WebGL area: infrastructure Testing, building, coordinating issues type: bug Something isn't working

Comments

@cwfitzgerald
Member

The multithreaded compute test segfaults in CI on GL/Linux. Which run hits it is random, but within an affected run it fails consistently. This doesn't happen when I run it locally against llvmpipe, and it doesn't happen when running it in CI on Windows against llvmpipe.

I suspect this might be an issue with EGL, but I have no idea how to reproduce it.

ci-log.txt

@cwfitzgerald cwfitzgerald added type: bug Something isn't working area: infrastructure Testing, building, coordinating issues api: gles Issues with GLES or WebGL labels Oct 23, 2023
@jimblandy
Member

jimblandy commented Jan 10, 2024

Running as:

$ LD_PRELOAD=/usr/lib64/libc_malloc_debug.so.0 MALLOC_CHECK_=1 jb with-vulkan --lavapipe cargo nextest run -p wgpu-test --test wgpu-test map_offset

I get the following interesting message:

free(): invalid pointer

This seems to be here:

#1  DestroyThreadState (threadState=0x7ffff2715b30) at /usr/src/debug/libglvnd-1.6.0-2.fc38.x86_64/src/EGL/libeglcurrent.c:141

Edit: jb with-vulkan CMD is a personal utility of mine that runs CMD with environment variables set that point it at my local Mesa build. I don't think it should be necessary, so you can just omit those two words from the command.

@jimblandy jimblandy self-assigned this Jan 10, 2024
@jimblandy
Member

Note that this issue affects many wgpu tests, not just the compute tests:

     Summary [   3.389s] 76 tests run: 52 passed, 24 failed, 0 skipped
     SIGABRT [   1.689s] wgpu-test::wgpu-test [Executed Failure: ALWAYS] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::device::cross_device_bind_group_usage
     SIGABRT [   1.682s] wgpu-test::wgpu-test [Executed Failure: ALWAYS] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::pipeline::pipeline_default_layout_bad_module
     SIGABRT [   1.845s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::bind_group_layout_dedup::bind_group_layout_deduplication
     SIGABRT [   1.894s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::buffer::map_offset
     SIGABRT [   1.118s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::clear_texture::clear_texture_depth
     SIGABRT [   1.271s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::clear_texture::clear_texture_uncompressed
     SIGABRT [   1.686s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::device::device_destroy_then_more
     SIGABRT [   1.064s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::device::request_device_error_message_native
     SIGABRT [   1.690s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::encoder::drop_encoder_after_error
     SIGABRT [   1.727s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::instance::initialize
     SIGABRT [   1.605s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::nv12_texture::nv12_texture_bad_format_view_plane
     SIGABRT [   1.840s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::regression::issue_3349::multi_stage_data_binding
     SIGABRT [   1.879s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::regression::issue_3457::pass_reset_vertex_buffer
     SIGABRT [   1.830s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::regression::issue_4024::queue_submitted_callback_ordering
     SIGABRT [   1.659s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::resource_descriptor_accessor::buffer_size_and_usage
     SIGABRT [   1.778s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::resource_error::bad_buffer
     SIGABRT [   1.865s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::scissor_tests::scissor_test_empty_rect_with_offset
     SIGABRT [   1.878s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::shader_primitive_index::draw
     SIGABRT [   1.827s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::shader_primitive_index::draw_indexed
     SIGABRT [   0.584s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::write_texture::write_texture_subset_2d
     SIGABRT [   0.538s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::write_texture::write_texture_subset_3d
     SIGABRT [   0.526s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::zero_init_texture_after_discard::discarding_color_target_resets_texture_init_state_check_visible_on_copy_after_submit
     SIGABRT [   0.412s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::zero_init_texture_after_discard::discarding_color_target_resets_texture_init_state_check_visible_on_copy_in_same_encoder
     SIGABRT [   0.556s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon Pro WX 3200 Series (RADV POLARIS12)/0] wgpu_test::zero_init_texture_after_discard::discarding_depth_target_resets_texture_init_state_check_visible_on_copy_in_same_encoder
error: test run failed

@jimblandy

This comment was marked as off-topic.

@jimblandy

This comment was marked as off-topic.

@jimblandy
Member

Anyway, that's probably entirely different from the original issue, since it's all Vulkan. Filed separate issue #5084.

@jimblandy jimblandy removed their assignment Jan 18, 2024
@cwfitzgerald
Member Author

It's not impossible that these are the same problem: Vulkan is still fully initialized in the GL tests.

jimblandy added a commit to jimblandy/wgpu that referenced this issue Jan 24, 2024
Join all threads before returning from the test case, to ensure that
we don't return from `main` until all open `Device`s have been
dropped.

This avoids a race condition in glibc in which a thread calling
`dlclose` can unmap a shared library's code even while the main thread
is still running its finalization functions. (See gfx-rs#5084 for details.)
Joining all threads before returning from the test ensures that the
Vulkan loader has finished `dlclose`-ing the Vulkan validation layer
shared library before `main` returns.

Remove `skip` for this test on GL/llvmpipe. With this change, that has
not been observed to crash. Without it, the test crashes within ten
runs or so.

Fixes gfx-rs#5084.
Fixed gfx-rs#4285.
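
A minimal sketch of the "join all threads before returning" pattern that the commit message describes. It is not the actual wgpu test code: `FakeDevice` and `run_workers_and_join` are hypothetical names, and the `Drop` impl only stands in for the real teardown work (such as the Vulkan loader `dlclose`-ing the validation layer).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a `Device`. Dropping it represents the
/// teardown that must finish before `main` returns.
struct FakeDevice {
    dropped: Arc<AtomicUsize>,
}

impl Drop for FakeDevice {
    fn drop(&mut self) {
        // Record that this device's teardown has run.
        self.dropped.fetch_add(1, Ordering::SeqCst);
    }
}

/// Spawns `n` worker threads that each own a device, then joins every
/// thread before returning. Returns how many devices had been dropped
/// by the time the function returns.
fn run_workers_and_join(n: usize) -> usize {
    let dropped = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..n)
        .map(|_| {
            let dropped = Arc::clone(&dropped);
            thread::spawn(move || {
                let _device = FakeDevice { dropped };
                // Per-thread work would go here; `_device` is dropped
                // when the closure returns.
            })
        })
        .collect();

    // Joining guarantees each thread, including its drop glue, has
    // finished before we return to the caller.
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    dropped.load(Ordering::SeqCst)
}
```

Because every handle is joined, a call such as `run_workers_and_join(4)` cannot return while any worker is still tearing down its device. Detached threads would give no such guarantee.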