snmptrap -> redis JSON encoding bug #8

Open
jordansissel opened this issue May 18, 2015 · 4 comments

Comments

@jordansissel

(This issue was originally filed by @nvx at elastic/logstash#1636)


I have a simple Logstash configuration that reads SNMP traps and writes them to Redis as JSON.
Some messages never reach Redis, and the Logstash log reports "Failed to convert event to JSON. Invalid UTF-8, maybe?". Looking at the code, this message appears to originate from the redis output.

input {
  snmptrap {
    type => "snmptrap"
    codec => plain {
      charset => "BINARY"
    }
  }
}
output {
  redis {
    host => "127.0.0.1"
    data_type => "list"
    key => "logstash-raw"
  }
}

I have tried without specifying a codec, as well as explicitly setting the charset to BINARY. The SNMP traps do contain some non-ASCII characters (binary representations of MAC addresses and IP addresses) but they appear to be properly escaped with \xHH style notation in the output log.
The only difference I can spot between messages that make it to Redis and ones that fail is the MAC address field. An example of a failing message has this field in the message part (note that it appears to be doubly escaped, because this is taken from the error log, which itself appears to be JSON-encoded):

@value=\"\\xFC\\xF8\\xAE<.\\x18\"

And the same value again as parsed by the MIBs:

"MERU-WLAN-MIB::mwlApMacAddr" => "\xFC\xF8\xAE<.\x18",

Other MAC addresses that start with FC:F8:AE work, so I can only assume it is the latter half (3C:2E:18) that is breaking the encoding.
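
A quick way to confirm that the reported value is not well-formed UTF-8 is to test it in plain Ruby. This is only a sketch: `RAW_MAC` and `valid_utf8?` are names made up for illustration, and the check says nothing about why other addresses with the same prefix went through.

```ruby
# The MAC address bytes from the failing trap, as a raw (binary) Ruby string.
RAW_MAC = "\xFC\xF8\xAE<.\x18".b

# Hypothetical helper: true when the bytes form well-formed UTF-8.
def valid_utf8?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

puts valid_utf8?(RAW_MAC)        # false
puts valid_utf8?("plain ascii")  # true
```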

@GregMefford

I'm hitting this bug as well and was trying to think about the "right" way to fix it. The problem is that some of the SNMP values are going to contain arbitrary binary strings. As far as I can tell, there's not a way to convert arbitrary binary strings to UTF-8 in a reversible way.

I can see where @jordansissel is saying that adding codec support would help with the encoding, but I'm not convinced it would help for arbitrary binary strings as it would not guarantee that you can still recover the same binary on the other side of Redis, for example.

So I'm thinking the best solution is to change the behavior of the snmptrap input so that binary fields are converted to hex strings or base64 strings or something (e.g. using String#unpack and Array#pack), so that they can be manipulated more easily without causing problems.
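
For reference, a minimal sketch of the reversible hex and base64 conversions with `String#unpack` and `Array#pack`. The helper names are hypothetical and are not part of the snmptrap input; the sample bytes are the MAC address from the original report.

```ruby
# Hypothetical helpers, not part of any Logstash plugin.

# Binary string -> lowercase hex string (plain ASCII, safe to put in JSON).
def bin_to_hex(bytes)
  bytes.unpack("H*").first
end

# Hex string -> original binary string.
def hex_to_bin(hex)
  [hex].pack("H*")
end

# Binary string -> base64 without line breaks.
def bin_to_base64(bytes)
  [bytes].pack("m0")
end

# Base64 string -> original binary string.
def base64_to_bin(b64)
  b64.unpack("m0").first
end

raw = "\xFC\xF8\xAE<.\x18".b
hex = bin_to_hex(raw)
puts hex                                        # fcf8ae3c2e18
puts hex_to_bin(hex) == raw                     # true
puts base64_to_bin(bin_to_base64(raw)) == raw   # true
```

Both encodings are lossless, so the original bytes can be recovered on the other side of Redis. Hex is easier to read and search; base64 is more compact.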

Obviously, this would break compatibility with anyone using the binary strings directly, but would alleviate the problem where people expect it to 'just work' and it currently doesn't in some cases.

Does this seem like a terrible idea?
(mentioning @nvx and @simmel from the old thread to bring them back into the discussion here)

@simmel

simmel commented Mar 17, 2016

On Wed, 2016-03-16 at 13:55:28 -0700, Greg Mefford wrote:

> I'm hitting this bug as well and was trying to think about the "right" way to fix it. The problem is that some of the SNMP values are going to contain arbitrary binary strings. As far as I can tell, there's not a way to convert arbitrary binary strings to UTF-8 in a reversible way.

Would it be possible to store them in two fields? One UTF-8 'replace' encoded and one binary?

> I can see where @jordansissel is saying that adding codec support would help with the encoding, but I'm not convinced it would help for arbitrary binary strings as it would not guarantee that you can still recover the same binary on the other side of Redis, for example.

IIRC Redis can handle binary data? I'm not sure though.

> So I'm thinking the best solution is to change the behavior of the snmptrap input so that binary fields are converted to hex strings or base64 strings or something (e.g. using String#unpack and Array#pack), so that they can be manipulated more easily without causing problems.

How would you manipulate that from Logstash without having to resort to the ruby filter to unpack and convert them?
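
A rough sketch of the two-field idea in plain Ruby. `split_binary_field` and the `"text"` / `"hex"` keys are hypothetical names. Note one deliberate change from the suggestion: the lossless copy is stored as hex rather than raw binary, because a raw binary string would hit the same JSON encoding failure.

```ruby
require "json"

# Hypothetical helper: returns a lossy, readable copy plus a lossless copy.
def split_binary_field(bytes)
  raw = bytes.b
  {
    # Lossy: bytes with no UTF-8 mapping are replaced with U+FFFD.
    "text" => raw.encode(Encoding::UTF_8, invalid: :replace, undef: :replace),
    # Lossless: hex digits that can be turned back into the original bytes.
    "hex"  => raw.unpack("H*").first
  }
end

fields = split_binary_field("\xFC\xF8\xAE<.\x18")
puts fields["text"].valid_encoding?   # true
puts JSON.generate(fields)            # serializes without raising
```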

@nvx

nvx commented Mar 17, 2016

I guess the other question is how to properly represent this data once it's in Elasticsearch anyway? Perhaps the real solution is to convert early (e.g. in the input plugin) from binary MACs to, say, hex, possibly as a configurable option?

@GregMefford

Yeah, I think the only generic solution is to convert early in the input
plugin, because the MIB only specifies that it's a binary field, with prose
to describe how to decode the bytes in the binary. It's not just about MAC
addresses and it will vary from one trap to the next, so I don't think an
automatic solution is possible.

As far as how to process it downstream, it seems like a ruby filter is the
only option today, but I can imagine a binary-manipulation filter similar
to mutate as a possibility.
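
To illustrate why the decoding has to be configured per field, here is a sketch in plain Ruby that uses a Hash as a stand-in for the event. Everything in it is hypothetical: `DECODERS`, `decode_fields`, and the `EXAMPLE-MIB::exampleIpAddr` field are invented for the example, and none of this is an existing Logstash filter or option.

```ruby
# Hypothetical per-field decoders; the MIB prose decides which one applies.
DECODERS = {
  # Six raw bytes -> colon-separated MAC address.
  "MERU-WLAN-MIB::mwlApMacAddr" =>
    ->(b) { b.unpack("C*").map { |x| format("%02X", x) }.join(":") },
  # Four raw bytes -> dotted-quad IPv4 address (made-up field name).
  "EXAMPLE-MIB::exampleIpAddr" =>
    ->(b) { b.unpack("C4").join(".") }
}.freeze

# Returns a copy of the event in which every string is safe to JSON-encode.
def decode_fields(event, decoders = DECODERS)
  event.each_with_object({}) do |(name, value), out|
    decoder = decoders[name]
    out[name] =
      if decoder
        decoder.call(value.b)
      elsif value.is_a?(String) &&
            !value.dup.force_encoding(Encoding::UTF_8).valid_encoding?
        value.unpack("H*").first   # fallback: plain hex for unknown binary
      else
        value
      end
  end
end

event = {
  "type" => "snmptrap",
  "MERU-WLAN-MIB::mwlApMacAddr" => "\xFC\xF8\xAE<.\x18".b
}
puts decode_fields(event)["MERU-WLAN-MIB::mwlApMacAddr"]   # FC:F8:AE:3C:2E:18
```

Fields without a configured decoder fall back to hex only when they are not valid UTF-8, so ordinary text fields pass through unchanged.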


