Cannot assign boolean, null or numeric types for JSON properties or local variables which encumbers JSON template output #2827

Closed
JPvRiel opened this issue Jul 9, 2018 · 14 comments · Fixed by #3663

@JPvRiel

JPvRiel commented Jul 9, 2018

I might be missing something fundamental, but it seems cumbersome or complicated to output JSON formats with non-string data types for JSON fields? After reading documentation, it's unclear how JSON boolean, null or numeric types can be coerced (other than via very explicit / verbose 'hand crafted' templates):

  • JSON message property values can be set via $!, but only strings can be directly assigned (not numbers, boolean or null)?
  • RainerScript is a typeless language
  • RainerScript ABNF
  • I tried a hacky trick of assigning an expression (which works partly), but the underlying C representation has false represented as 0 and true represented as 1, and this is produced in the JSON object instead of false and true.

Expected behavior

I wish it were possible to assign values in $!var as boolean or null types and, via JSON template options, output JSON data types such as true, false, null. Instead of having to build static templates, it would be ideal to output a dynamically built JSON structure, e.g. %jsonmesg%, or the subtree %$!:::jsonr%. Often, some messages will have extra JSON fields extracted that other messages don't, so this dynamic behaviour is desirable. However, local variables or JSON property assignment appears to be limited to literal/quoted strings. E.g. I want to, but can't do the following (even when null is a valid JSON type):

set $!rfc5424-sd = null;

When attempting to set a JSON property to true, false or null, a syntax error on token ';' results, because rsyslog RainerScript syntax expects every JSON property or local variable assignment to be a quoted literal string. This seems to imply that RainerScript doesn't directly allow values that are not of type string?

E.g. template output with JSON output options won't produce "$!": { "rfc5424-sd": null }

Actual behavior

Only the literal quoted string is accepted without error:

set $!rfc5424-sd = "null";

E.g. template output with JSON options produces "$!": { "rfc5424-sd": "null" }

Steps to reproduce the behavior

Given example to illustrate (/etc/rsyslog.d/test.conf):

module(load="mmpstrucdata")

input(
  type="imuxsock"
  Socket="/var/lib/rsyslog/test"
  ruleset="test"
)

template(
  name="TmplRSyslogJSON"
  type="string"
  string="%jsonmesg%\n"
)
template(
  name="TmplJSON"
  type="string"
  string="{%msg:::jsonf%, %rawmsg:::jsonf%, %timereported:::jsonf%, %hostname:::jsonf%, %syslogtag:::jsonf%, %inputname:::jsonf%, %fromhost:::jsonf%, %fromhost-ip:::jsonf%, %pri:::jsonf%, %syslogfacility:::jsonf%, %syslogseverity:::jsonf%, %timegenerated:::jsonf%, %programname:::jsonf%, %protocol-version:::jsonf%, %structured-data:::jsonf%, %app-name:::jsonf%, %procid:::jsonf%, %msgid:::jsonf%, %uuid:::jsonf%, \"$!\": { \"rfc5424-sd\": %$!rfc5424-sd%, \"bool_true\": %$!bool_true%, \"bool_false\": %$!bool_false%, \"bool_true_coerced\": %$!bool_true_coerced% } }\n"
)
template(name="set_true" type="list") {
  constant(value="true")
}

ruleset(name="test") {
  # Parse RFC5424 structured elements into JSON
  action(type="mmpstrucdata")
  # Set proper JSON null representation if there was no structured data
  if ($structured-data == "-") then {
    set $!rfc5424-sd = "null";
  }
	
  set $!bool_true = exec_template("set_true");
  set $!bool_false = "false";
  set $!bool_true_coerced = (1 == 1);
  action(
    type="omfile"
    template="TmplRSyslogJSON"
    file="/tmp/rsyslog.json"
  )
  action(
    type="omfile"
    template="TmplJSON"
    file="/tmp/coerced.json"
  )
}

Test log message

logger -u /var/lib/rsyslog/test -t test "test JSON types"

Compare the outputs below and note that the JSON types null, true or false are not possible without heavy-handed static template definitions. TmplJSON is much more work to maintain than TmplRSyslogJSON.

/tmp/rsyslog.json is the output of internal rsyslog JSON representation:

{ "msg": " test JSON types", "rawmsg": "<5>Jul  9 14:37:19 test: test JSON types", "timereported": "2018-07-09T14:37:19.050931+00:00", "hostname": "centos7", "syslogtag": "test:", "inputname": "imuxsock", "fromhost": "centos7", "fromhost-ip": "127.0.0.1", "pri": "5", "syslogfacility": "0", "syslogseverity": "5", "timegenerated": "2018-07-09T14:37:19.050931+00:00", "programname": "test", "protocol-version": "0", "structured-data": "-", "app-name": "test", "procid": "-", "msgid": "-", "uuid": null, "$!": { "rfc5424-sd": "null", "bool_true": "true", "bool_false": "false", "bool_true_coerced": 1 } }

/tmp/coerced.json is the output of the fully defined template:

{"msg":" test JSON types", "rawmsg":"<5>Jul  9 14:37:19 test: test JSON types", "timereported":"Jul  9 14:37:19", "hostname":"centos7", "syslogtag":"test:", "inputname":"imuxsock", "fromhost":"centos7", "fromhost-ip":"127.0.0.1", "pri":"5", "syslogfacility":"0", "syslogseverity":"5", "timegenerated":"Jul  9 14:37:19", "programname":"test", "protocol-version":"0", "structured-data":"-", "app-name":"test", "procid":"-", "msgid":"-", "uuid":"B441575B4FBC4382BBBD2692AFCE9893", "$!": { "rfc5424-sd": null, "bool_true": true, "bool_false": false, "bool_true_coerced": 1 } }

Also, no clue why one message had UUID output but not the other (incidental, not my concern in reporting this issue/request to enhance supporting JSON data types).

Environment

  • rsyslog version: 8.36.0
  • platform: CentOS 7.5
@JPvRiel
Author

JPvRiel commented Jul 9, 2018

By the way, a major motivation for this is because of how elastic does not like differing types for fields. E.g. if the structured data element doesn't parse due to - in the RFC5424 field, then the JSON null type for $!rfc5424-sd is more technically correct than the single character string of - or empty string. Elasticsearch data mappings will either expect a string or an object (nested) but it will cause a mapping conflict to produce both types for a given JSON 'field' within the same index.

I also like including metadata in the JSON object about if the source sending the message had a valid syslog header or not, e.g. setting $!syslog-meta.header-valid to true or false, to help exclude data that might not be as trustworthy...

@davidelang
Contributor

davidelang commented Jul 9, 2018 via email

@JPvRiel
Author

JPvRiel commented Jul 9, 2018

Fair comment that this is a corner case. Unsetting a var/JSON subtree is a good suggestion as an alternative to null. But then I can't reference that value in a template (unless I also conditionally apply different templates). The result is that, say for 4 fields that may or may not exist, one then has to use at least 4 templates (more, accounting for possible combinations). Your suggestion is a good option if applied to carefully manage and prune $! and use that to get dynamic output of JSON fields (without needing to write too many templates).
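
A minimal sketch of the unset approach, reusing the names from the test config in the original report (the ruleset "test", the template "TmplRSyslogJSON" and the output path). It assumes that unsetting a key that was never set is harmless; that has not been verified here.

```
ruleset(name="test") {
  # Parse RFC5424 structured elements into JSON
  action(type="mmpstrucdata")
  # No structured data: make sure the key is absent instead of storing
  # the string "null", so the dynamically built output simply omits it.
  if ($structured-data == "-") then {
    unset $!rfc5424-sd;
  }
  action(
    type="omfile"
    template="TmplRSyslogJSON"
    file="/tmp/rsyslog.json"
  )
}
```

With this, a consumer such as Elasticsearch sees either an object or no field at all, never a string in the same position.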

Nonetheless, there's still the case of booleans. E.g. I like adding a boolean to indicate if the message 1) had a valid priority, 2) appeared to have a valid syslog header, 3) was secured over TLS with client auth, etc. (Provenance, i.e. how much can this log event/source be trusted).

It's feasible to simply use "true"/"false" strings in such cases, just not as neat/correct compared to using actual JSON types. Argument might also apply to numbers.

This might be a reason to use logstash in front of elastic, as it can fix/set types, or to take care with elasticsearch field mappings to coerce types at index time, but my preference is to try to get the data format correct/neat in rsyslog when possible.

If rsyslog had more flexibility with types and setting properties, then logstash, etc isn't needed as much.

If after a while, no one else cares for this flexibility, I'll happily close the issue (understandable that it's not good to adapt a product to fringe requirements).

@rgerhards
Member

rgerhards commented Jul 9, 2018 via email

@snaix

snaix commented Jul 16, 2018

I have another question:
Can I know a variable is null?
for example:
if $!foo == null then unset $!foo;

But when I tested it, rsyslog reported a syntax error.

@JPvRiel
Author

JPvRiel commented Jul 16, 2018

Can I know a variable is null?

In my limited testing, I think I recall that it'll likely evaluate to whatever the underlying C code representation is, so understanding that, try this:

if ($!foo == 0) then { unset $!foo; }

However, currently (v8.36) rsyslog rainerscript configuration syntax does not permit directly assigning or using analogous C/C++ keywords like null, true or false. As far as I can guess, only string literals (for assignment, since assignments always have to be quoted) and string literals or numbers (in expressions) are permitted by the config/syntax parsing. Maybe an assignment to represent/fudge a setting for null or false could be achieved via set $!foo = cnum(0); even if you cannot do set $!foo = 0;, but it will still be seen as a number and the output template will still treat this as a number, not a boolean or null value.

Also note, in C/C++, on a raw implementation level, null is indistinguishable from false. false == 0 and true == 1. For a conditional statement, anything that evaluates to a non-zero value is also considered true.

JPvRiel added a commit to JPvRiel/docker-rsyslog that referenced this issue Jul 24, 2018
See:
- rsyslog/rsyslog#2827
- rsyslog/rsyslog#2873

$!syslog-relay!* is replaced with $.syslog-relay!* local vars since $! 
output does not honour formatting in templates or the format_date 
function
@dch

dch commented Feb 13, 2019

I could create a new issue if needed, but this seems appropriate already.

There is no way to output a number field directly, without wrapping quotes:

For example:

property(outname="timestamp" name="timereported" dateFormat="unixtimestamp" format="jsonf")

produces "timestamp": "123456789" which is then not parsed as a timestamp by tools like graylog's GELF.

Obviously the much more error-prone solution below does work, but then the option.jsonf can't be used via template(name="JSON" type="list" option.jsonf="on") any more:

constant(value="\",\"timestamp\":")
property(name="timegenerated" dateformat="unixtimestamp")
constant(value=",\...

This would be a very useful change.

@JPvRiel
Author

JPvRiel commented Feb 20, 2019

@dch yep, I too have resorted to horribly complicated, hand-crafted JSON formatting to get rsyslog to produce JSON number and null types (it is indeed error-prone).

In our case, the only other option was to set up something like logstash or an elasticsearch indexing template to coerce/convert the field from a string to a number type. Not sure if GELF has that kind of option?

@rgerhards
Member

I thought I had something implemented last year. Let me check. Else ACK that it would be "good to have" ;-). That said, my TODO unfortunately is long...

@rgerhards
Member

mhhh, quick look doesn't reveal anything. I thought we had a (list) template option for "output datatype", but I find no trace.

@rgerhards rgerhards self-assigned this Feb 20, 2019
@rgerhards rgerhards modified the milestones: v8.1903, v8.1904 Feb 20, 2019
@davidelang
Contributor

davidelang commented Feb 20, 2019 via email

@rgerhards rgerhards modified the milestones: v8.1904, v8.1905 Apr 15, 2019
rgerhards added a commit to rgerhards/rsyslog that referenced this issue May 14, 2019
The new "datatype" template option permits to generate non-string
data rather easily. It works together with jsonf formatting, which
is what people should use nowadays.

closes rsyslog#2827
rgerhards added a commit to rgerhards/rsyslog that referenced this issue May 15, 2019
The new "datatype" and "onEmpty"  template options permits to
generate non-string data rather easily. It works together with
jsonf formatting, which is what people should use nowadays.

closes rsyslog#2827
rgerhards added a commit to rgerhards/rsyslog that referenced this issue May 16, 2019
The new "datatype" and "onEmpty"  template options permits to
generate non-string data rather easily. It works together with
jsonf formatting, which is what people should use nowadays.

closes rsyslog#2827
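
A hedged sketch of how the "datatype" and "onEmpty" options named in the commit message might be used in a list template with jsonf formatting. The template name and the first property line come from dch's comment above. The option values shown ("number", "bool", "null") are my understanding of the rsyslog template documentation, not something stated in this thread, so check them against the docs for your version. The variables are the ones set in the original test config.

```
template(name="JSON" type="list" option.jsonf="on") {
  # Emit the UNIX timestamp as a JSON number instead of a quoted string
  property(outname="timestamp" name="timereported"
           dateFormat="unixtimestamp" format="jsonf" datatype="number")
  property(outname="hostname" name="hostname" format="jsonf")
  # Assumption: "bool" yields false for an empty or 0 value and true
  # otherwise, so the string "false" would not come out as false
  property(outname="bool_true_coerced" name="$!bool_true_coerced"
           format="jsonf" datatype="bool")
  # Assumption: onEmpty="null" emits JSON null when the value is empty
  property(outname="rfc5424-sd" name="$!rfc5424-sd"
           format="jsonf" onEmpty="null")
  property(outname="msg" name="msg" format="jsonf")
}
```

This only affects how properties are rendered by the template. Values stored under $! remain strings or numbers as before, which is the limitation raised in the last comment of this thread.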
@dch

dch commented May 22, 2019

*** thank-you *** looking forwards to applying this diff already!

@andrewwade

I don't think this should be closed. There is still no way to assign the json variables under $! to be number formatted. They all are strings.

@lock

lock bot commented Dec 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Dec 28, 2019