Parsing error - value including a space followed by a token with a dot in it #54

MattCarothers · 2018-07-13T13:42:47Z

This line:

CEF:0|TheVendor|TheProduct|1|1|TheName|1| query_string=key1\=value1&key2\=value3 aa.bc&key3\=value4

Should result in this field:

          "query_string" => "key1=value1&key2=value3 aa.bc&key3=value4"

But instead it results in these two fields:

          "aa.bc&key3\\" => "value4",
          "query_string" => "key1=value1&key2=value3",

The issue is this regex on line 174 of cef.rb:

 message = message.gsub((/(\s+(\w+\.[^\s]\w+[^\|\s\.\=]+\=))/),'|^^^\2')

It breaks for the case where a value has a space in it followed by a token with a dot in it. That line of code seems specific to one ArcSight mode. Perhaps add a configuration flag for people who aren't using ArcSight to skip it?
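To see the effect in isolation, here is a minimal sketch that applies the quoted substitution to just the extension portion of the sample line, outside the codec. The variable names are illustrative and not taken from cef.rb.

```ruby
# Extension portion of the sample line from this issue.
# Single quotes keep the backslashes literal.
extensions = ' query_string=key1\=value1&key2\=value3 aa.bc&key3\=value4'

# The substitution quoted above, run on its own.
rewritten = extensions.gsub(/(\s+(\w+\.[^\s]\w+[^\|\s\.\=]+\=))/, '|^^^\2')

puts rewritten
# prints:  query_string=key1\=value1&key2\=value3|^^^aa.bc&key3\=value4
```

The space before `aa.bc` is followed by a word, a dot, more characters, and eventually an `=`, so the pattern treats `aa.bc&key3\` as the start of a new dotted key and inserts the `|^^^` marker in the middle of the value.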

yaauie · 2018-08-16T17:06:28Z

The parser is pretty naive, doing a whole lot of superfluous splitting and replacing and then more work to put things back how it found them when it hits escapes and stuff, all of which could be done more simply with a scanning parser that has simple lookaheads that account for escaping.

This is quite similar to recent work that I did in the KV Filter Plugin, so I'm assigning this ticket to myself.
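As a rough illustration of the scanning idea, and not the implementation that was eventually merged, the sketch below walks the extension string with a single pattern: a key, an `=`, then a lazily matched value that stops only where a lookahead sees the next `key=` or the end of input. The pattern, the constant name, and the helper `scan_extensions` are all assumptions made for this example, including the simplification that keys consist only of word characters and dots.

```ruby
# Hypothetical pattern for illustration only; the key syntax is an assumption.
EXTENSION_PAIR = /
  (?<key>[\w.]+)=             # key made of word characters and dots, then '='
  (?<value>(?:\\.|[^\\])*?)   # value: backslash-escaped pairs or plain characters
  (?=\s+[\w.]+=|\s*\z)        # lookahead: next key or end of input
/x

# Hypothetical helper: returns a Hash of key => unescaped value.
def scan_extensions(extensions)
  extensions.scan(EXTENSION_PAIR).each_with_object({}) do |(key, value), fields|
    # Only the \= escape is undone here, matching the example in this issue.
    fields[key] = value.gsub(/\\=/, '=')
  end
end

sample = ' query_string=key1\=value1&key2\=value3 aa.bc&key3\=value4'
puts scan_extensions(sample).inspect
```

Because `\=` is consumed as an escaped pair and `aa.bc&key3\` is not a valid key under the assumed key syntax, the lookahead never fires inside the value, and the sample line yields a single `query_string` field.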

MattCarothers · 2018-08-16T19:08:51Z

You may find this helpful. I haven't tested it extensively, but I replaced everything from line 167 to line 188 with one message.split():

    # Strip any whitespace from the message
    if not message.nil? and message.include? '='
      message = message.strip

      # If the last KVP has no value, add an empty string, this prevents hash errors below
      if message.end_with?('=')
        message = message + ' ' unless message.end_with?('\=')
      end

      message.split(/\s+(?=[^=\\\s]*[^\\]=)/).each{ |kv_pair|
        key, value = kv_pair.gsub(/\\=/, '=').split(/=/, 2)
        event.set(MAPPINGS[key] || key, value)
      }
    end

As far as I can tell, it works for everything except one corner case where the key has an escaped = sign in it.
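For anyone who wants to try the split outside the codec, here is a small standalone adaptation of the snippet above. It is only a sketch: `parse_extensions` is a hypothetical name, it returns a plain Hash instead of calling `event.set`, and it skips the `MAPPINGS` lookup.

```ruby
# Hypothetical standalone helper adapted from the snippet above.
def parse_extensions(message)
  fields = {}
  return fields if message.nil? || !message.include?('=')

  message = message.strip
  # As in the snippet: append a space after a trailing unescaped '='.
  message += ' ' if message.end_with?('=') && !message.end_with?('\=')

  # Split only on whitespace directly followed by a key and an '=' that is
  # not preceded by a backslash.
  message.split(/\s+(?=[^=\\\s]*[^\\]=)/).each do |kv_pair|
    key, value = kv_pair.gsub(/\\=/, '=').split(/=/, 2)
    fields[key] = value
  end
  fields
end

sample = ' query_string=key1\=value1&key2\=value3 aa.bc&key3\=value4'
puts parse_extensions(sample).inspect
puts parse_extensions('src=10.0.0.1 msg=hello world dst=10.0.0.2').inspect
```

On the sample line the space before `aa.bc` is not a split point, because the only `=` that follows it is escaped, so the whole string stays in `query_string`.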

Resolves: logstash-plugins#54

breml · 2018-08-18T09:29:35Z

@yaauie I have thought about your comment above, as well as the comment on PR 55, and I have to admit that I feel offended. I do not think those comments are in good accordance with the Elastic Community Code of Conduct. It is OK to propose improvements, and we are all happy if the code becomes better (bug-free, maintainable, more performant). But there is no value (nor need) in judging the existing code in such a negative way (e.g. "naive" and "superfluous").
I don't feel that these comments are very respectful towards the developers who helped improve the codec to the state we have today. Keep in mind that this codec has already served well in a lot of cases (Elastic alone released a six-part series of blog posts using this codec, starting here). Additionally, most of the developers built the existing state of the codec in their free time, whereas it looks like you are on the payroll of Elastic. In my opinion, this should raise the bar even higher for the way you communicate with the community.
I am one of the developers who helped improve the codec to its current state. When I joined, only one side of encode/decode existed, and there were many incompatibilities with the specification, which I helped remove. Last but not least, the unit tests that are in place are what allowed you to implement your scanner-based approach in such a short period of time.

CC: @jordansissel, @suyograo

yaauie · 2018-08-19T17:30:55Z

@breml I did not mean to insult you or the other contributors to this project, and appreciate that you were willing to call out that the effect of my words was offensive. I am sorry.

You are absolutely right that this codec has been useful to many people, and that the extensive tests that you and others put in the effort to maintain are what allowed me to confidently build a scanning parser that would be non-breaking.

In retrospect, my tone was not respectful of the other contributors and their efforts; in future I will be more intentional with how I communicate.

breml · 2018-08-20T08:41:52Z

@yaauie thanks for the apology, I really appreciate it.
And also thanks for your work on the scanning parser, this will clearly improve this codec.

Resolves: logstash-plugins#54

yaauie self-assigned this Aug 16, 2018

yaauie added a commit to yaauie/logstash-codec-cef that referenced this issue Aug 17, 2018

implement scanning parser for CEF Extension Fields to catch edge-cases

6f4099b

Resolves: logstash-plugins#54

yaauie mentioned this issue Aug 17, 2018

Implement Scanning Parsers #55

Merged

yaauie added a commit to yaauie/logstash-codec-cef that referenced this issue Aug 20, 2018

implement scanning parser for CEF Extension Fields to catch edge-cases

317fef6

Resolves: logstash-plugins#54

yaauie closed this as completed in 77e6973 Aug 22, 2018

This was referenced Sep 10, 2018

CEF codec parses requestClientApplication when it shouldn't #56

Closed

fix: prevent preemptive field splitting on malformed input #57

Merged
