New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rsyslog (8.29.0) segfaults when using mmexternal plugin #1822

Closed
larsks opened this Issue Oct 7, 2017 · 17 comments

Comments

Projects
None yet
2 participants
@larsks

larsks commented Oct 7, 2017

I'm running rsyslog 8.24.0 on CentOS 7 (so, rsyslog-8.24.0-12.el7.x86_64). I'm trying to write a filtering plugin in Python. Starting with the example code, I modified onReceive so that it looks like:

def onReceive(msg):
        data = json.loads(msg)
        print json.dumps({"$!": {"sometag": "somevalue"}})

That causes a segfault as soon as any message comes through.

Responses that don't attempt to modify $! don't cause a segfault, but the updated keys do not end up in $!all-json.

@larsks

This comment has been minimized.

Show comment
Hide comment
@larsks

larsks Oct 7, 2017

This also happens with 8.29.0 from http://rpms.adiscon.com/v8-stable.

larsks commented Oct 7, 2017

This also happens with 8.29.0 from http://rpms.adiscon.com/v8-stable.

@rgerhards

This comment has been minimized.

Show comment
Hide comment
@rgerhards

rgerhards Oct 7, 2017

Member

Can you run rsyslog under valgrind and post the result?

Member

rgerhards commented Oct 7, 2017

Can you run rsyslog under valgrind and post the result?

@larsks

This comment has been minimized.

Show comment
Hide comment
@larsks

larsks Oct 7, 2017

The results from running valgrind rsyslogd -n are here. Let me know if you'd like me to paste them into this issue instead.

A side effect of running under valgrind is that the filter works correctly (the output of $!all-json includes ..., "sometag": "somevalue", ...).

larsks commented Oct 7, 2017

The results from running valgrind rsyslogd -n are here. Let me know if you'd like me to paste them into this issue instead.

A side effect of running under valgrind is that the filter works correctly (the output of $!all-json includes ..., "sometag": "somevalue", ...).

@larsks

This comment has been minimized.

Show comment
Hide comment
@larsks

larsks Oct 7, 2017

I left a comment here a moment ago that you may see in your email, but it was lie. Even with libfastjson-0.99.6 I still see the segfaults.

larsks commented Oct 7, 2017

I left a comment here a moment ago that you may see in your email, but it was lie. Even with libfastjson-0.99.6 I still see the segfaults.

@larsks

This comment has been minimized.

Show comment
Hide comment
@larsks

larsks Oct 7, 2017

A gdb backtrace, if that's of any interest:

Program received signal SIGSEGV, Segmentation fault.
(gdb) bt
#0  0x00007fffe4000078 in ?? ()
#1  0x00007ffff718d6b9 in fjson_object_put (jso=0x7fffe4003ae0) at json_object.c:201
#2  0x00007ffff718dd08 in fjson_object_object_delete (jso=0x7fffe4003550) at json_object.c:404
#3  0x00007ffff718d6b9 in fjson_object_put (jso=0x7fffe4003550) at json_object.c:201
#4  0x00005555555943cc in MsgSetPropsViaJSON_Object (pMsg=pMsg@entry=0x7fffec0048b0, json=json@entry=0x7fffe4003550) at msg.c:4563
#5  0x000055555559467b in MsgSetPropsViaJSON (pMsg=pMsg@entry=0x7fffec0048b0, jsonstr=<optimized out>) at msg.c:4537
#6  0x00007ffff4aed75d in processProgramReply (pMsg=<optimized out>, pWrkrData=<optimized out>) at mmexternal.c:257
#7  callExtProg (pWrkrData=pWrkrData@entry=0x7fffe4002440, pMsg=pMsg@entry=0x7fffec0048b0) at mmexternal.c:503

larsks commented Oct 7, 2017

A gdb backtrace, if that's of any interest:

Program received signal SIGSEGV, Segmentation fault.
(gdb) bt
#0  0x00007fffe4000078 in ?? ()
#1  0x00007ffff718d6b9 in fjson_object_put (jso=0x7fffe4003ae0) at json_object.c:201
#2  0x00007ffff718dd08 in fjson_object_object_delete (jso=0x7fffe4003550) at json_object.c:404
#3  0x00007ffff718d6b9 in fjson_object_put (jso=0x7fffe4003550) at json_object.c:201
#4  0x00005555555943cc in MsgSetPropsViaJSON_Object (pMsg=pMsg@entry=0x7fffec0048b0, json=json@entry=0x7fffe4003550) at msg.c:4563
#5  0x000055555559467b in MsgSetPropsViaJSON (pMsg=pMsg@entry=0x7fffec0048b0, jsonstr=<optimized out>) at msg.c:4537
#6  0x00007ffff4aed75d in processProgramReply (pMsg=<optimized out>, pWrkrData=<optimized out>) at mmexternal.c:257
#7  callExtProg (pWrkrData=pWrkrData@entry=0x7fffe4002440, pMsg=pMsg@entry=0x7fffec0048b0) at mmexternal.c:503

@larsks

This comment has been minimized.

Show comment
Hide comment
@larsks

larsks Oct 7, 2017

Stumbling around a bit, but it looks like something is either not being initialized properly or there is some sort of use-after-free going on. fjson_object_object_delete is being called with pg != NULL, pg->children[0].v != NULL, but values in .v seem uninitialized (e.g., .v->o_type == 134290912).

larsks commented Oct 7, 2017

Stumbling around a bit, but it looks like something is either not being initialized properly or there is some sort of use-after-free going on. fjson_object_object_delete is being called with pg != NULL, pg->children[0].v != NULL, but values in .v seem uninitialized (e.g., .v->o_type == 134290912).

@larsks

This comment has been minimized.

Show comment
Hide comment
@larsks

larsks Oct 10, 2017

The corruption seems to happen after jsonMerge calls json_object_put(json), here. And indeed, if I comment out that line, everything works fine, although in theory I am now leaking memory something fierce.

@rgerhards, let me know if you have any thoughts on this. I'm not at all familiar with either rsyslog or libfastjson, so I'm not sure what's going on.

larsks commented Oct 10, 2017

The corruption seems to happen after jsonMerge calls json_object_put(json), here. And indeed, if I comment out that line, everything works fine, although in theory I am now leaking memory something fierce.

@rgerhards, let me know if you have any thoughts on this. I'm not at all familiar with either rsyslog or libfastjson, so I'm not sure what's going on.

@larsks

This comment has been minimized.

Show comment
Hide comment
@larsks

larsks Oct 10, 2017

I wonder:

MsgSetPropsViaJSON_object calls MsgSetPropsViaJSON which calls MsgAddJSON which calls jsonMerge which calls fjson_object_put...but after percolating back up the stack, MsgSetPropsViaJSON_object also calls fjson_object_put on an object that is an ancestor of the JSON object seen by jsonMerge. Is this the source of the problem? It looks like this behavior was reported to the mailing list by @bdolez last december, but it doesn't look as if there was any response to that message.

larsks commented Oct 10, 2017

I wonder:

MsgSetPropsViaJSON_object calls MsgSetPropsViaJSON which calls MsgAddJSON which calls jsonMerge which calls fjson_object_put...but after percolating back up the stack, MsgSetPropsViaJSON_object also calls fjson_object_put on an object that is an ancestor of the JSON object seen by jsonMerge. Is this the source of the problem? It looks like this behavior was reported to the mailing list by @bdolez last december, but it doesn't look as if there was any response to that message.

larsks added a commit to larsks/rsyslog that referenced this issue Oct 10, 2017

do not call fjson_object_put from jsonMerge
MsgSetPropsViaJSON_Object will call fjson_object_put on an ancestor of
the tree seen by jsonMerge.

Closes rsyslog#1822
@larsks

This comment has been minimized.

Show comment
Hide comment
@larsks

larsks Oct 10, 2017

If anybody finds themselves here via Google, I've put together a package that includes #1837 at https://copr.fedorainfracloud.org/coprs/larsks/rsyslog/. This is probably not the correct fix for the problem, but it will let you successfully experiment with external message modification plugins.

larsks commented Oct 10, 2017

If anybody finds themselves here via Google, I've put together a package that includes #1837 at https://copr.fedorainfracloud.org/coprs/larsks/rsyslog/. This is probably not the correct fix for the problem, but it will let you successfully experiment with external message modification plugins.

@larsks

This comment has been minimized.

Show comment
Hide comment
@larsks

larsks Oct 12, 2017

@rgerhards Can we tag this for the v8.31 milestone? The mmexternal module isn't functional without the fix.

larsks commented Oct 12, 2017

@rgerhards Can we tag this for the v8.31 milestone? The mmexternal module isn't functional without the fix.

@larsks larsks changed the title from rsyslog (8.24.0) segfaults when using mmexternal plugin to rsyslog (8.29.0) segfaults when using mmexternal plugin Oct 12, 2017

@rgerhards rgerhards added this to the v8.31 milestone Oct 13, 2017

larsks added a commit to larsks/rsyslog that referenced this issue Oct 16, 2017

do not call fjson_object_put from jsonMerge
MsgSetPropsViaJSON_Object will call fjson_object_put on an ancestor of
the tree seen by jsonMerge.

Closes rsyslog#1822

larsks added a commit to larsks/rsyslog that referenced this issue Oct 17, 2017

do not call fjson_object_put from jsonMerge
MsgSetPropsViaJSON_Object will call fjson_object_put on an ancestor of
the tree seen by jsonMerge.

Closes rsyslog#1822

PascalWithopf added a commit to PascalWithopf/rsyslog that referenced this issue Nov 6, 2017

mmexternal: test for Segmentation Fault
test for the Segmentation Fault that appears upon using mmexternal as described in rsyslog#1822
@rgerhards

This comment has been minimized.

Show comment
Hide comment
@rgerhards

rgerhards Nov 6, 2017

Member

@PascalWithopf has created a testbench test that reproduces this: PascalWithopf@1744dde. As it turns out, the problem seems to be even simpler and not be bound to using the variable root ($!). It seems to reproduce even with the unmodified sample script. That said, we seem to have a regression, what might be useful to know hunting down the best fix.

Member

rgerhards commented Nov 6, 2017

@PascalWithopf has created a testbench test that reproduces this: PascalWithopf@1744dde. As it turns out, the problem seems to be even simpler and not be bound to using the variable root ($!). It seems to reproduce even with the unmodified sample script. That said, we seem to have a regression, what might be useful to know hunting down the best fix.

PascalWithopf added a commit to PascalWithopf/rsyslog that referenced this issue Nov 6, 2017

mmexternal: test for Segmentation Fault
test for the Segmentation Fault that appears upon using mmexternal as described in rsyslog#1822
@rgerhards

This comment has been minimized.

Show comment
Hide comment
@rgerhards

rgerhards Nov 6, 2017

Member

Things begin to become clearer now.

First of all, there was a config error (typo) in the initial test @PascalWithopf created, and that error also lead to a segfault (so we can create a new issue) - good to have caught this as a side effect.

I now also know why it is hard to reproduce the issue: the non-standard config option interface.input="fulljson" is required in mmnormalize configuration! Otherwise it kind of works (the script gets into trouble. It's detectable if you take sufficient attention, though... ;-)

Anyhow, I think we are now on a good path forward - and have also found a couple of things very useful!

Member

rgerhards commented Nov 6, 2017

Things begin to become clearer now.

First of all, there was a config error (typo) in the initial test @PascalWithopf created, and that error also lead to a segfault (so we can create a new issue) - good to have caught this as a side effect.

I now also know why it is hard to reproduce the issue: the non-standard config option interface.input="fulljson" is required in mmnormalize configuration! Otherwise it kind of works (the script gets into trouble. It's detectable if you take sufficient attention, though... ;-)

Anyhow, I think we are now on a good path forward - and have also found a couple of things very useful!

PascalWithopf added a commit to PascalWithopf/rsyslog that referenced this issue Nov 10, 2017

mmexternal: test for Segmentation Fault
test for the Segmentation Fault that appears upon using mmexternal as described in rsyslog#1822
@rgerhards

This comment has been minimized.

Show comment
Hide comment
@rgerhards

rgerhards Nov 10, 2017

Member

We actually have two different error pathes. If no message variables at all exist, json root is null, which triggers a different code path than when at least one message var exists. Both result in different error locations. In the former case, jsonMerge() is not called.

From testing:

Path 1: no msg var set
==5481==    by 0x451A86: msgAddJSON (msg.c:4796)                 "FREE"
... (optimized-out ... even though -O0, but seen in debug log)
==5665==    by 0x4512D9: MsgSetPropsViaJSON_Object (msg.c:4600)  FREE
==5665==    by 0x4511D0: MsgSetPropsViaJSON (msg.c:4572)
==5665==    by 0x70A0543: processProgramReply (mmexternal.c:257)
==5665==    by 0x709FD2D: callExtProg (mmexternal.c:503)

PATH 2: msg var set
==5481==    by 0x451EAE: jsonMerge (msg.c:4738)			FREE
==5481==    by 0x451A86: msgAddJSON (msg.c:4799)
==5481==    by 0x4517B2: msgSetPropViaJSON (msg.c:4517)
==5481==    by 0x4512BF: MsgSetPropsViaJSON_Object (msg.c:4596)  FREE
==5481==    by 0x4511D0: MsgSetPropsViaJSON (msg.c:4572)
==5481==    by 0x70A0543: processProgramReply (mmexternal.c:257)
==5481==    by 0x709FD2D: callExtProg (mmexternal.c:503)
==5481==    by 0x709EAFC: doAction (mmexternal.c:526)
Member

rgerhards commented Nov 10, 2017

We actually have two different error pathes. If no message variables at all exist, json root is null, which triggers a different code path than when at least one message var exists. Both result in different error locations. In the former case, jsonMerge() is not called.

From testing:

Path 1: no msg var set
==5481==    by 0x451A86: msgAddJSON (msg.c:4796)                 "FREE"
... (optimized-out ... even though -O0, but seen in debug log)
==5665==    by 0x4512D9: MsgSetPropsViaJSON_Object (msg.c:4600)  FREE
==5665==    by 0x4511D0: MsgSetPropsViaJSON (msg.c:4572)
==5665==    by 0x70A0543: processProgramReply (mmexternal.c:257)
==5665==    by 0x709FD2D: callExtProg (mmexternal.c:503)

PATH 2: msg var set
==5481==    by 0x451EAE: jsonMerge (msg.c:4738)			FREE
==5481==    by 0x451A86: msgAddJSON (msg.c:4799)
==5481==    by 0x4517B2: msgSetPropViaJSON (msg.c:4517)
==5481==    by 0x4512BF: MsgSetPropsViaJSON_Object (msg.c:4596)  FREE
==5481==    by 0x4511D0: MsgSetPropsViaJSON (msg.c:4572)
==5481==    by 0x70A0543: processProgramReply (mmexternal.c:257)
==5481==    by 0x709FD2D: callExtProg (mmexternal.c:503)
==5481==    by 0x709EAFC: doAction (mmexternal.c:526)

rgerhards added a commit to rgerhards/rsyslog that referenced this issue Nov 10, 2017

mmexternal: test for Segmentation Fault
test for the Segmentation Fault that appears upon using mmexternal as described in rsyslog#1822
@rgerhards

This comment has been minimized.

Show comment
Hide comment
@rgerhards

rgerhards Nov 10, 2017

Member

The issue is indeed pretty complex. It is related to the inability of libfastjson/json-c to put (free) a subordinate object. If so, the upper layer object, if deconstructed, tries to access the pointer to the already free'd subordinate. This code path is used inside msg.c when json strings are processed. It looks like we need to take the subordinate, which contains the actual content from the tree root, which contains the field to be updated.

Member

rgerhards commented Nov 10, 2017

The issue is indeed pretty complex. It is related to the inability of libfastjson/json-c to put (free) a subordinate object. If so, the upper layer object, if deconstructed, tries to access the pointer to the already free'd subordinate. This code path is used inside msg.c when json strings are processed. It looks like we need to take the subordinate, which contains the actual content from the tree root, which contains the field to be updated.

@rgerhards

This comment has been minimized.

Show comment
Hide comment
@rgerhards

rgerhards Nov 10, 2017

Member

I should have more carefully looked at the backtrace @larsks posted: it actually points right into that spot of libfastjson:

#2  0x00007ffff718dd08 in fjson_object_object_delete (jso=0x7fffe4003550) at json_object.c:404

I was soo sure that it was an object free error on the top level object...

Member

rgerhards commented Nov 10, 2017

I should have more carefully looked at the backtrace @larsks posted: it actually points right into that spot of libfastjson:

#2  0x00007ffff718dd08 in fjson_object_object_delete (jso=0x7fffe4003550) at json_object.c:404

I was soo sure that it was an object free error on the top level object...

rgerhards added a commit to rgerhards/rsyslog that referenced this issue Nov 10, 2017

core/msg: fix segfault in MsgSetPropsViaJSON
which most importantly was triggerred by mmexternal (but could
potentially be triggered by other uses as well)

closes rsyslog#1822

rgerhards added a commit to rgerhards/rsyslog that referenced this issue Nov 10, 2017

testbench: add test for segfault in MsgSetPropsViaJSON
This was fixed in previous commit.

See also rsyslog#1822

rgerhards added a commit to rgerhards/rsyslog that referenced this issue Nov 10, 2017

core/msg: fix segfault in MsgSetPropsViaJSON
which most importantly was triggerred by mmexternal (but could
potentially be triggered by other uses as well)

closes rsyslog#1822

rgerhards added a commit to rgerhards/rsyslog that referenced this issue Nov 10, 2017

testbench: add test for segfault in MsgSetPropsViaJSON
This was fixed in previous commit.

See also rsyslog#1822

rgerhards added a commit to rgerhards/rsyslog that referenced this issue Nov 10, 2017

testbench: add tests for segfault in MsgSetPropsViaJSON
Test for the root problem fixed in previous commit.

See also rsyslog#1822

rgerhards added a commit to rgerhards/rsyslog that referenced this issue Nov 10, 2017

testbench: add tests for segfault in MsgSetPropsViaJSON
Test for the root problem fixed in previous commit.

See also rsyslog#1822

rgerhards added a commit to rgerhards/rsyslog that referenced this issue Nov 11, 2017

testbench: add tests for segfault in MsgSetPropsViaJSON
Test for the root problem fixed in previous commit.

See also rsyslog#1822
@larsks

This comment has been minimized.

Show comment
Hide comment
@larsks

larsks commented Nov 11, 2017

Thanks!

@rgerhards

This comment has been minimized.

Show comment
Hide comment
@rgerhards

rgerhards Nov 11, 2017

Member

It was really weired -- I have also created an issue in libfastjson to better support detecting these types of errors. They are very unintitive. @larsks Thanks for the initial good bug report and follow on work! Would be great if you could give master also a try to double check all is well ;-)

Member

rgerhards commented Nov 11, 2017

It was really weired -- I have also created an issue in libfastjson to better support detecting these types of errors. They are very unintitive. @larsks Thanks for the initial good bug report and follow on work! Would be great if you could give master also a try to double check all is well ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment