Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subprocess: Protect proc->ret with a semaphore #4545

Merged
merged 17 commits into from
Jun 19, 2024
Merged

Conversation

kazarmy
Copy link
Member

@kazarmy kazarmy commented Jun 9, 2024

Your checklist for this pull request

  • I've read the guidelines for contributing to this repository
  • I made sure to follow the project's coding style
  • I've documented or updated the documentation of every function and struct this PR changes. If not so I've explained why.
  • I've added tests that prove my fix is effective or that my feature works (if possible)
  • I've updated the rizin book with the relevant information (if needed)

Detailed description

This pr fixes the following proc->ret data race, as mentioned on Mattermost:

WARNING: ThreadSanitizer: data race (pid=18332)
  Read of size 4 at 0x7b2800014fb8 by thread T3:
    #0 rz_subprocess_ret ../librz/util/subprocess.c:1551 (librz_util.so.0.8+0xa4d10)
    #1 rz_test_run_asm_test ../binrz/rz-test/run.c:355 (rz-test+0xda11)
    #2 rz_test_run_test ../binrz/rz-test/run.c:511 (rz-test+0xe34e)
    #3 worker_th ../binrz/rz-test/rz-test.c:686 (rz-test+0x5796)
    #4 thread_main_function ../librz/util/thread.c:21 (librz_util.so.0.8+0xaeb37)

  Previous write of size 4 at 0x7b2800014fb8 by thread T1 (mutexes: write M0):
    #0 sigchld_th ../librz/util/subprocess.c:807 (librz_util.so.0.8+0xa2467)
    #1 thread_main_function ../librz/util/thread.c:21 (librz_util.so.0.8+0xaeb37)

  Location is heap block of size 152 at 0x7b2800014fa0 allocated by thread T3:
    #0 calloc ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:672 (libtsan.so.0+0x31edc)
    #1 rz_subprocess_start_opt ../librz/util/subprocess.c:1047 (librz_util.so.0.8+0xa37bc)
    #2 rz_subprocess_start ../librz/util/subprocess.c:1614 (librz_util.so.0.8+0xa50e4)
    #3 rz_test_run_asm_test ../binrz/rz-test/run.c:349 (rz-test+0xd9e8)
    #4 rz_test_run_test ../binrz/rz-test/run.c:511 (rz-test+0xe34e)
    #5 worker_th ../binrz/rz-test/rz-test.c:686 (rz-test+0x5796)
    #6 thread_main_function ../librz/util/thread.c:21 (librz_util.so.0.8+0xaeb37)

  Mutex M0 (0x7b0c00000060) created at:
    #0 pthread_mutex_init ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1227 (libtsan.so.0+0x4bee1)
    #1 rz_th_lock_new ../librz/util/thread_lock.c:28 (librz_util.so.0.8+0xb1f87)
    #2 rz_subprocess_init ../librz/util/subprocess.c:819 (librz_util.so.0.8+0xa2df9)
    #3 rz_test_main ../binrz/rz-test/rz-test.c:357 (rz-test+0x7885)
    #4 main ../binrz/rz-test/rz-test.c:1406 (rz-test+0x52d5)

  Thread T3 (tid=18338, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:969 (libtsan.so.0+0x605b8)
    #1 rz_th_new ../librz/util/thread.c:211 (librz_util.so.0.8+0xaee4c)
    #2 rz_test_main ../binrz/rz-test/rz-test.c:536 (rz-test+0x8482)
    #3 main ../binrz/rz-test/rz-test.c:1406 (rz-test+0x52d5)

  Thread T1 (tid=18334, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:969 (libtsan.so.0+0x605b8)
    #1 rz_th_new ../librz/util/thread.c:211 (librz_util.so.0.8+0xaee4c)
    #2 rz_subprocess_init ../librz/util/subprocess.c:828 (librz_util.so.0.8+0xa2e3b)
    #3 rz_test_main ../binrz/rz-test/rz-test.c:357 (rz-test+0x7885)
    #4 main ../binrz/rz-test/rz-test.c:1406 (rz-test+0x52d5)

SUMMARY: ThreadSanitizer: data race ../librz/util/subprocess.c:1551 in rz_subprocess_ret

(The above was obtained using gcc TSan on Ubuntu 22.04 -- this build is broken regarding subprocesses in places not related to this pr, such as in !)

Hopefully, this fixes the problem with macos asm tests as well.

Test plan

All builds are green.

Closing issues

...

@kazarmy kazarmy marked this pull request as ready for review June 9, 2024 06:39
librz/util/subprocess.c Outdated Show resolved Hide resolved
librz/util/subprocess.c Outdated Show resolved Hide resolved
librz/util/subprocess.c Outdated Show resolved Hide resolved
@@ -41,6 +41,7 @@ struct rz_subprocess_t {
HANDLE stderr_read;
HANDLE proc;
int ret;
RzThreadSemaphore *ret_sem;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not so sure this is the correct way. I would expect this to be not thread safe and have a semaphore/lock where this is tossed into a threaded code.
Also this change should be done in rz_sys_fork etc.. Where we use a mutex but it should be a semaphore

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this to be not thread safe and have a semaphore/lock where this is tossed into a threaded code.

With d2ad89d, there is a semaphore whether or not the calling code is multithreaded.

Also this change should be done in rz_sys_fork etc.. Where we use a mutex but it should be a semaphore

The mutex in rz_sys_fork seems to be for a purpose unrelated to the data race in the description. There might be a race there too though.

@kazarmy kazarmy changed the title subprocess: Protect proc->ret with a semaphore if needed subprocess: Protect proc->ret with a semaphore Jun 9, 2024
Copy link
Member

@XVilka XVilka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from me, but please wait for @wargio first, since he is more familiar with this code.

@kazarmy
Copy link
Member Author

kazarmy commented Jun 9, 2024

298605f 98f728c tested via rz-test -t 1 db/archos/linux-x64/dbg_gdbserver_rebase ... 298605f caused the data race warning to reappear

@XVilka
Copy link
Member

XVilka commented Jun 9, 2024

Doesn't seem to help the assembly test on macOS:

[XX] db/asm/java <asm> newarray int
-- <asm> newarray int <--> bc0a
-- assembly

@kazarmy
Copy link
Member Author

kazarmy commented Jun 9, 2024

Just noticed this killpipe pairing:

rz_xwrite(proc->killpipe[1], &r, 1);

rizin/librz/util/subprocess.c

Lines 1328 to 1331 in a3ffd6e

if (FD_ISSET(proc->killpipe[0], &rfds)) {
timedout = false;
child_dead = true;
}

so there does appear to be read/write synchronization on proc->ret despite the protestations of TSan, so closing this.

@kazarmy kazarmy closed this Jun 9, 2024
@kazarmy kazarmy reopened this Jun 14, 2024
Comment on lines 116 to 118
do {
ret = sem_wait(sem->sem);
} while (ret == -1 && errno == EINTR);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe do while(sem_wait(sem->sem) < 0 && errno == EINTR);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok 5b7f8ee

@kazarmy
Copy link
Member Author

kazarmy commented Jun 14, 2024

[XX] db/asm/tricore <asm> add.f d1, d7, d12
-- <asm> add.f d1, d7, d12 <--- 6b0c2117 ---> <IL>
-- IL
--- expected
+++ actual

@kazarmy kazarmy marked this pull request as draft June 14, 2024 14:45
@github-actions github-actions bot removed the RzCore label Jun 18, 2024
@kazarmy kazarmy marked this pull request as ready for review June 18, 2024 11:47
@kazarmy
Copy link
Member Author

kazarmy commented Jun 18, 2024

I think that's it for now ... 85089d0 is inelegant but the CI failure of 7b8ed9e suggests that there's something wrong with the original way of doing things (i think). Even if this pr doesn't fix the macos asm test problem, it does have 2 things going for it:

  1. The warning in the pr description goes away.
  2. Isn't a semaphore more lightweight than a pipe?

@wargio
Copy link
Member

wargio commented Jun 19, 2024

I think that's it for now ... 85089d0 is inelegant but the CI failure of 7b8ed9e suggests that there's something wrong with the original way of doing things (i think). Even if this pr doesn't fix the macos asm test problem, it does have 2 things going for it:

1. The warning in the [pr description](https://github.com/rizinorg/rizin/pull/4545#issue-2342038472) goes away.

2. Isn't a semaphore more lightweight than a pipe?

i strongly think the main issue is how we fork and execute. the whole thing in sys.c is very nasty

@kazarmy
Copy link
Member Author

kazarmy commented Jun 19, 2024

i did consider whether pipes were being closed a bit too soon, but that's a quest for another day i suppose

@kazarmy kazarmy merged commit 3a33626 into dev Jun 19, 2024
47 checks passed
@kazarmy kazarmy deleted the subprocess-ret_sem branch June 19, 2024 10:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants