Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong semantics in omelasticsearch createMsgFromRequest() #3573

Closed
snaix opened this issue Mar 28, 2019 · 7 comments

Comments

Projects
None yet
4 participants
@snaix
Copy link

commented Mar 28, 2019

Expected behavior

As createMsgFromRequest() comment said in plugins/omelasticsearch.c :

request string looks like this:
"{"create":{"_index": "rsyslog_testbench","_type":"test-type",
"_id":"FAEAFC0D17C847DA8BD6F47BC5B3800A"}}\n
{"msgnum":"x00000000","viaq_msg_id":"FAEAFC0D17C847DA8BD6F47BC5B3800A"}\n"
we don't want the meta header, only the data part
start = first \n + 1
end = last \n

An expected behavior is rsyslog can get rawmsg as data part.

Actual behavior

Only message format that contains "message" filed can be extracted from request, but not for custom message format.
If there is no message field, all request text will be filled for rawmsg.
It's confusing when use omelasticsearch retry. user can not get a correct rawmsg from rsyslog, and could lead a data lost because a wrong json message(contain meta header) would be sent to ES.

Steps to reproduce the behavior

Environment

The latest code shows that this problem still exists.

@davidelang

This comment has been minimized.

Copy link
Contributor

commented Mar 28, 2019

@snaix

This comment has been minimized.

Copy link
Author

commented Mar 28, 2019

can you please show a message written with the RSYSLOG_DebugFormat that has the problem you are describing? I am not understanding the issue. David Lang

For instance:
I have a ruleset A for a normal omelasticsearch action, and another ruleset B for retryRuleset when omelasticsearch failed for 429 status .
Obviously, the original, failed message from ruleset A is needed in ruleset B .
Current, ruleset B extract the original message from elasticsearch reqeuset like:

"{"create":{"_index": "rsyslog_testbench","_type":"test-type",
"_id":"FAEAFC0D17C847DA8BD6F47BC5B3800A"}}\n
{"msgnum":"x00000000","viaq_msg_id":"FAEAFC0D17C847DA8BD6F47BC5B3800A"}\n"

the request have two part separated by "\n"
The first part is meta header, the second part is data part.

The original message is the data part, we needed:

{"msgnum":"x00000000","viaq_msg_id":"FAEAFC0D17C847DA8BD6F47BC5B3800A"}\n"

But only data part contains "message" field can be extracted, or omelasticsearch will keep the whole request in rawmsg.
so we get a rawmsg like this:

"{"create":{"_index": "rsyslog_testbench","_type":"test-type",
"_id":"FAEAFC0D17C847DA8BD6F47BC5B3800A"}}\n
{"msgnum":"x00000000","viaq_msg_id":"FAEAFC0D17C847DA8BD6F47BC5B3800A"}\n"

but not:

{"msgnum":"x00000000","viaq_msg_id":"FAEAFC0D17C847DA8BD6F47BC5B3800A"}\n"

@rgerhards

This comment has been minimized.

Copy link
Member

commented Mar 28, 2019

I admit I do not fully understand the question. @richm I guess you can shed some light?

@richm

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2019

The problem is here:

	if (fjson_object_object_get_ex(jo_request, "message", &jo_msg)) {
		const char *rawmsg = json_object_get_string(jo_msg);
		const size_t msgLen = (size_t)json_object_get_string_len(jo_msg);
		MsgSetRawMsg(*msg, rawmsg, msgLen);
	} else {
		MsgSetRawMsg(*msg, request, strlen(request));
	}

If the original request has a field "message" we use the value of that field as the $rawmsg value for the record. If it did not have "message", we just use the entire request - the metadata + the data. I'm not sure, but I'm guessing the presence of the extra newline \n in the rawmsg causes problems when the data is resent to Elasticsearch? Are you sending the rawmsg field to Elasticsearch?

@snaix is asking that instead, the rawmsg should be only the data part, not the metadata, so something like this:

	if (fjson_object_object_get_ex(jo_request, "message", &jo_msg)) {
		const char *rawmsg = json_object_get_string(jo_msg);
		const size_t msgLen = (size_t)json_object_get_string_len(jo_msg);
		MsgSetRawMsg(*msg, rawmsg, msgLen);
	} else {
		MsgSetRawMsg(*msg, datastart, datalen);
	}

richm added a commit to richm/rsyslog that referenced this issue Mar 29, 2019

omelasticsearch - retry: set rawmsg to data from original request
rsyslog#3573
Previously, when constructing the message to submit for a retry
for an original request, if the original request did not contain
the field `message`, the system property `rawmsg` was set to
the entire metadata + data from the original request.  This was
causing problems with Elasticsearch.  This patch changes
the code so that the `rawmsg` will be set to only the data part
of the original request if there is no `message` field.
@richm

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2019

@snaix if you are able to run from source, please try #3577

@rgerhards rgerhards added this to the v8.1904 milestone Apr 8, 2019

@rgerhards

This comment has been minimized.

Copy link
Member

commented Apr 8, 2019

closed via #3577

@rgerhards rgerhards closed this Apr 8, 2019

@snaix

This comment has been minimized.

Copy link
Author

commented Apr 10, 2019

@richm Yes, that's what I want to say. I want to get the data part but not entire raw message (meta + data), because I can't parse it.
Thank you for your excellent work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.