Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pg_qualstats causing coredump and db crash #3

Closed
bbotev01 opened this issue Apr 4, 2016 · 7 comments
Closed

pg_qualstats causing coredump and db crash #3

bbotev01 opened this issue Apr 4, 2016 · 7 comments

Comments

@bbotev01
Copy link

bbotev01 commented Apr 4, 2016

I have been using pg_qualstats version 0.0.7 for several months without issues, but since we upgraded to Postgres 9.5.0 in late January one of our particularly load intensive databases has experienced 2 crashes related to pg_qualstats. Here is what the postgres log looks like:

2016-03-31 10:09:50.216 EDT [18427]: [10-1] user=,db=,host=,sessionid=56bf3b24.47fb LOG: server process (PID 13930) was terminated by signal 11: Segmentation fault
2016-03-31 10:09:50.216 EDT [18427]: [11-1] user=,db=,host=,sessionid=56bf3b24.47fb DETAIL: Failed process was running: COMMIT
2016-03-31 10:09:50.216 EDT [18427]: [12-1] user=,db=,host=,sessionid=56bf3b24.47fb LOG: terminating any other active server processes

And here is the corresponding message in the var log messages:
kernel: postgres[13930]: segfault at 8 ip 00007f9fb41ca664 sp 00007ffe696f7b70 error 4 in pg_qualstats.so[7f9fb41c5000+7000]

The issue occurs even when the extension is not created in any of the databases. The above crash occurred just having the pg_qualstats library loaded with shared_preload_libraries. The crash is infrequent and I am not able to reproduce the condition causing the crash. I hope this is helpful in figuring out the issue causing this behavior.
Thanks,
Boyan Botev

@rjuju
Copy link
Member

rjuju commented Apr 7, 2016

Ouch :(

The fact that the extension isn't created in any database should not have any impact, it only add SQL access functions to see the content of pg_qualstats shared memory.

Is it a linux x86 64 bits server (which distribution), and do you have a core dump by any chance? Could you install the postgresql and pg_qualstats debug information package, that'd help to get more useful logs, and know where the crash happened in pg_qualstats.

What's the result of addr2line -e postgres 00007f9fb41ca664 ?

What kind of load is there on the database (read and/write queries, kind of WHERE or JOIN clauses, datatype, operators or any relevant detail).

@bbotev01
Copy link
Author

bbotev01 commented Apr 7, 2016

It's 64bit RHEL. In particular 2.6.32-573.12.1.el6.x86_64. Unfortunately
this issue only occurred on the production server. It happened 5 times in
total. I had to take the extension out completely yesterday by removing it
from the shared_preload_libraries. I'll install the debug package on a
lower environment, but don't want to risk more crashes in the prod
environment for the moment. I'll do my best to try to reproduce it in
another place. I noticed that the process which caused the crash was
running a nasty SQL select statement with 1750 bind variables all inside a
where clause. It seems to be coming from an ORM tool because I don't think
a person could write something like this. I also don't have the coredumps
because the dumps could not fit in the location where they were being
dumped.

The result of
$ addr2line -e postgres 00007f9fb41ca664
??:0

On Thu, Apr 7, 2016 at 4:34 AM, Julien Rouhaud notifications@github.com
wrote:

Ouch :(

The fact that the extension isn't created in any database should not have
any impact, it only add SQL access functions to see the content of
pg_qualstats shared memory.

Is it a linux x86 64 bits server (which distribution), and do you have a
core dump by any chance? Could you install the postgresql and pg_qualstats
debug information package, that'd help to get more useful logs, and know
where the crash happened in pg_qualstats.

What's the result of addr2line -e postgres 00007f9fb41ca664 ?

What kind of load is there on the database (read and/write queries, kind
of WHERE or JOIN clauses, datatype, operators or any relevant detail).


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
https://github.com/dalibo/pg_qualstats/issues/3#issuecomment-206759961

@rjuju
Copy link
Member

rjuju commented Apr 12, 2016

Yes, it's better to avoid crashing production server again.

Could you send the query and definition of involved tables? If you need privacy, you can send it to powa(AT)dalibo.com. If you can't share them, can you specify the datatypes and operators used, and if they are ANDed or Ored? Also if it was a SELECT or other statement kind.

@rjuju
Copy link
Member

rjuju commented May 31, 2016

Hello,

As I couldn't reproduce this issue with the queries you provided off-list, I tried sqlsmith to find issues, and found this one: https://github.com/dalibo/pg_qualstats/issues/5

Do you also use this kind of JOIN on expression?

@bbotev01
Copy link
Author

bbotev01 commented Jun 1, 2016

You may be onto something here. The query preceding each crash has a view
with a bunch of subselects and expression joins in it. It is very possible
this is the same issue.

On Tue, May 31, 2016 at 4:07 PM, Julien Rouhaud notifications@github.com
wrote:

Hello,

As I couldn't reproduce this issue with the queries you provided off-list,
I tried sqlsmith to find issues, and found this one: #5
https://github.com/dalibo/pg_qualstats/issues/5

Do you also use this kind of JOIN on expression?


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
https://github.com/dalibo/pg_qualstats/issues/3#issuecomment-222804254,
or mute the thread
https://github.com/notifications/unsubscribe/AP70HfYc73kSO84SAj0evAtZDIDtY4loks5qHJTwgaJpZM4H_VGA
.

@rdunklau
Copy link
Collaborator

rdunklau commented Aug 8, 2016

Could you try against the latest release (1.0.0) and let us know if it fixes it ? Thanks.

@rjuju
Copy link
Member

rjuju commented Jun 29, 2017

Hello @bbotev01, did you have the opportunity to test a newer release (1.0.2 is highly recomended).

For reference, your problem should have been fixed by dalibo@e0ddf70.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants