Stuck on select() syscall #12
Comments
pgreplay doesn't block on
I'd assume that the problem is not that the system call gets stuck or takes too long, but that there is no response from the PostgreSQL server. pgreplay just keeps running
Perhaps you are suffering from the following problem that is mentioned in the “limitations” section of the README:
This can easily happen if many concurrent sessions change the same data concurrently, as can often happen in an artificial load test:
Session 1, time x:
Now pgreplay will faithfully replay the statements in that order and send the second statement 0.002 milliseconds after the first, then it waits for the first statement to return before it can send the third statement (because they use the same database session).
Now it can happen that the second statement gets executed first, since they get sent almost at the same time. Then session 2 has the row locked, and the update of session 1 is blocked. So pgreplay will wait in vain for the first statement to finish, and it won't send the fourth statement (which would remove the lock and allow processing to proceed) unless it has first sent the third statement.
In that situation, pgreplay has deadlocked with the database and is stuck. It has no way to detect that: it cannot tell whether the first statement is blocked or just takes a long time to complete. On the other hand, it cannot arbitrarily reorder the statements it sends, because they might depend on each other.
Here is a link to a thread on the (retired) mailing list that had that very problem. You could try the SQL statement mentioned there:
If that gets replay unstuck, you are hitting this problem. Unfortunately there is not much else that can be done about this, except to use a less artificial workload that does not modify the same table rows over and over in concurrent sessions.
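The hang described above can be illustrated with a toy model. This is only a sketch, not pgreplay's actual code: the `Stmt` type, the `replay` function, and the `server_runs_first` parameter are hypothetical names, and the four-statement log (two updates of the same row followed by two commits, with statement 3 assumed to be session 1's commit) is an assumption reconstructed from the description. The model enforces the two rules mentioned: statements are sent strictly in log order, and a session cannot receive a new statement while its previous one is still running.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    num: int       # position in the log
    session: int   # database session that issued it
    kind: str      # "update" (takes the row lock) or "commit" (releases it)


# Assumed log: both sessions update the same row, then commit.
LOG = [
    Stmt(1, 1, "update"),
    Stmt(2, 2, "update"),
    Stmt(3, 1, "commit"),
    Stmt(4, 2, "commit"),
]


def replay(log, server_runs_first=None):
    """Return (completed statement numbers, stuck flag) for a toy replay.

    server_runs_first names a session whose pending statement the server
    happens to execute before the others, i.e. it "overtakes" them.
    """
    lock_holder = None
    completed = []
    in_flight = []   # sent to the server, not yet finished
    next_idx = 0
    while True:
        progressed = False
        # Replayer: send in log order, stop at the first busy session.
        while next_idx < len(log):
            stmt = log[next_idx]
            if any(s.session == stmt.session for s in in_flight):
                break
            in_flight.append(stmt)
            next_idx += 1
            progressed = True
        # Server: try every pending statement, preferred session first.
        for stmt in sorted(in_flight, key=lambda s: s.session != server_runs_first):
            if stmt.kind == "update":
                if lock_holder not in (None, stmt.session):
                    continue  # row is locked by the other session
                lock_holder = stmt.session
            elif lock_holder == stmt.session:
                lock_holder = None  # commit releases the row lock
            in_flight.remove(stmt)
            completed.append(stmt.num)
            progressed = True
        if not progressed:
            stuck = bool(in_flight) or next_idx < len(log)
            return completed, stuck


print(replay(LOG))                       # statements run in the logged order
print(replay(LOG, server_runs_first=2))  # statement 2 overtakes statement 1
```

In the second call only statement 2 completes: statement 1 waits for the row lock, statement 3 cannot be sent while statement 1 is running, and statement 4 (which would release the lock) cannot be sent before statement 3.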
Thanks for your (long) explanation! As there is nothing that can be done, I'll close this issue.
Hello,
I tried to replay a log after a pgbench test. pgreplay successfully parsed the log files.
But when I replay, it seems to be stuck in the do_select call:
Here is a perf report for the specific pid:
With gdb, it seems to be stuck at
pgreplay/database.c
Line 146 in c9e93ea
I first tried with master, compiled manually on Arch Linux with kernel 4.20.6-arch1-1-ARCH.
Then I tried with Debian stretch inside an LXC container (so, same kernel as above).
Maybe there is something wrong with this syscall under recent kernels? I read in this post that select performs worse than epoll: https://jvns.ca/blog/2017/06/03/async-io-on-linux--select--poll--and-epoll/
Note: my goal is to sample an SQL workload with pg_sampletolog and replay it with pgreplay by adjusting the speed factor.
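For reference, a process that sits in select() is usually just waiting for a peer that has not answered yet, independent of how select compares to epoll in speed. The following minimal sketch shows this with a local socket pair standing in for the client/server connection; `wait_readable` is a hypothetical helper name, and nothing here is taken from pgreplay's source.

```python
import select
import socket


def wait_readable(sock, timeout):
    """Return True if sock has data to read within timeout seconds."""
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)


# A connected pair of sockets stands in for client and database server.
client, server = socket.socketpair()
try:
    # The "server" stays silent: select() times out without any data.
    silent = wait_readable(client, 0.2)
    # Once the "server" answers, select() reports the socket as readable.
    server.sendall(b"done")
    answered = wait_readable(client, 0.2)
finally:
    client.close()
    server.close()

print(silent, answered)
```

Without a timeout, the first call would wait for as long as the peer stays silent, which from the outside looks like a stuck syscall.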
Thanks