multiple multiline messages testcases don't work with --sockets flag #39

Closed
dawi opened this issue May 7, 2017 · 14 comments

@dawi

dawi commented May 7, 2017

I have a problem testing multiline messages with Logstash Filter Verifier, and I am not sure whether it is a bug or intended behaviour. Either way, a section in the README about testing multiline messages would help a lot.

I am using the "json" codec to test multiline messages.

The issue is that if you use the --sockets flag to speed up the tests, you cannot have more than one multiline test case per test file.

In this case you currently have two options:

  1. Don't use the --sockets flag (which will result in slow tests)
  2. Put each multiline test case in a separate file.

Is there a reason why it is not possible to have multiple multiline test cases in one file when the --sockets flag is used?

@magnusbaeck
Owner

Could you supply an example testcase file that exhibits the problem?

@dawi
Author

dawi commented May 7, 2017

Yes of course, I will create one.

@dawi
Author

dawi commented May 7, 2017

The attached testcases.zip contains one pipeline configuration and two test directories.

Directory tests1 contains one test file with two test cases.
Directory tests2 contains the same two test cases but in two separate files.

testcases.zip

tests1 will run successfully without --sockets, but will fail with --sockets.
tests2 will always run successfully.
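
For illustration, a stripped-down version of such a test file looks roughly like this (a simplified sketch, not the literal content of the zip; the field names follow my reading of the LFV README and may differ from the actual format):

```json
{
  "codec": "json",
  "input": [
    "{\"message\": \"first line of event one\\nsecond line of event one\"}",
    "{\"message\": \"first line of event two\\nsecond line of event two\"}"
  ],
  "expected": [
    { "message": "first line of event one\nsecond line of event one" },
    { "message": "first line of event two\nsecond line of event two" }
  ]
}
```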

magnusbaeck self-assigned this on May 7, 2017
@magnusbaeck
Owner

Thanks, I'll have a look as soon as I can.

@dawi
Author

dawi commented May 7, 2017

Many thanks for your efforts. :)

@breml
Collaborator

breml commented May 8, 2017

@dawi could you please try again with the json_lines codec instead of json? If it is still not working, please provide the error messages (set --loglevel to DEBUG and add --logstash-output).

I tried to quickly run your tests, but failed because you are using a fairly new feature of the grok filter (pattern_definitions) and I don't have a recent enough version of Logstash at hand to run them.
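
For reference, the kind of invocation I have in mind looks something like this (the file names are placeholders and the argument order follows my reading of the README):

```console
$ logstash-filter-verifier --sockets --loglevel DEBUG --logstash-output \
    tests1/testcase.json pipeline.conf
```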

@dawi
Author

dawi commented May 8, 2017

Ok, it works with json_lines.

Initially I wanted to use json_lines, but I may have written json_line instead of json_lines (which obviously cannot work), and I came to the conclusion that I had to use the json codec to be able to test multiline messages.

Anyway, the problem does exist with the json codec.

@breml
Collaborator

breml commented May 8, 2017

@dawi true, but this is not resolvable because of the way the logstash-input-unix plugin works. The difference between logstash-input-stdin and logstash-input-unix is that in https://github.com/logstash-plugins/logstash-input-stdin/blob/master/lib/logstash/inputs/stdin.rb#L37 the stdin plugin reads the input line by line (regardless of the codec in use), whereas in https://github.com/logstash-plugins/logstash-input-unix/blob/master/lib/logstash/inputs/unix.rb#L88 the unix input reads whatever data is available in chunks of up to 16384 bytes and leaves the identification of events within those chunks entirely to the codec. The json codec does not delimit events on a line-by-line basis; the stdin input compensates for that as described above, but the unix input does not.
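
To illustrate the difference, here is a simplified sketch in Go (not the actual Ruby plugin code, just the idea):

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
)

// decode stands in for the codec. A codec like json has no notion of a
// newline delimiter, so it cannot recover event boundaries from a raw chunk.
func decode(payload string) {
	fmt.Println("codec received:", payload)
}

// readLineByLine mimics logstash-input-stdin: the input itself splits the
// stream into lines, so the codec always receives exactly one event's worth
// of data at a time.
func readLineByLine(r *os.File) {
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		decode(scanner.Text())
	}
}

// readChunks mimics logstash-input-unix: the input hands whatever bytes are
// currently available (up to 16384 here) to the codec, which has to find the
// event boundaries on its own.
func readChunks(conn net.Conn) {
	buf := make([]byte, 16384)
	for {
		n, err := conn.Read(buf)
		if err != nil {
			return
		}
		decode(string(buf[:n])) // may contain several events, or only part of one
	}
}

func main() {
	readLineByLine(os.Stdin)
}
```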

I suggest closing this issue, as it works fine with the json_lines codec.

@dawi
Author

dawi commented May 8, 2017

OK, I agree, but it would be good if the README were more explicit about this. I am wondering whether there is any reason to use json instead of json_lines with logstash-filter-verifier at all. If not, then maybe the use of this codec should be forbidden in logstash-filter-verifier, or a warning could be printed.

@breml
Collaborator

breml commented May 8, 2017

@dawi the README currently states that the codec should normally be one of line or json_lines (https://github.com/magnusbaeck/logstash-filter-verifier/blame/master/README.md#L202). Additionally, there is a hint for usage with --sockets saying that in this case it is especially important to use either line or json_lines (https://github.com/magnusbaeck/logstash-filter-verifier/blame/master/README.md#L251).
Also, LFV defaults to the line codec, which works in both cases (with and without --sockets).

What else do you have in mind? If you want the README to be more explicit about this issue, maybe you could create a PR.

@dawi
Author

dawi commented May 8, 2017

@breml Yes, I will think about it. But I find it difficult to decide what makes sense and what doesn't, since I have only been using Logstash for two weeks now. I am currently wondering whether it makes sense to use LFV with any codec other than line or json_lines at all. And if not, why not forbid the use of codecs that are known to cause errors in some cases?

@magnusbaeck
Owner

Issuing a moratorium on other codecs is probably a mistake since someone's bound to figure out clever ways to make use of other codecs (possibly custom ones that we don't even know exist). However, warning users that the codec they've configured most likely isn't the best choice would be totally doable. What do you think?

@breml
Collaborator

breml commented May 9, 2017

TL;DR: I think it is safe to raise a warning if a user uses a codec other than logstash-codec-line or logstash-codec-json_lines together with --sockets.

In my opinion the main issue with logstash-input-unix (as well as logstash-input-tcp) is that it is not an application-level protocol, which would include a definition of a message, but rather a transport protocol that carries a stream of data (message = log event in this case). It is the responsibility of the application-layer protocol to define when one message ends and the next one starts. So we actually use the codecs logstash-codec-line and logstash-codec-json_lines to split our data stream into messages (our "protocol", from LFV's point of view, is that messages are separated by newlines).
In this regard logstash-input-stdin acts quite similarly to an application-layer protocol, because every line of input is automatically considered a message.

This means that all codecs that expect to receive already properly separated messages (e.g. logstash-codec-csv, logstash-codec-compress_spooler) will not work in our current setup.

There is another problem: LFV does not allow configuring the codec plugin, which means our "application-layer protocol" (each message on its own line) must be supported by the codec's defaults. For example, logstash-codec-cef allows configuring a delimiter (which could be \n), but none is set by default, which means this codec also does not work with LFV (see the sketch at the end of this comment).
So in the end, I think there are only a few codecs that could possibly work with LFV at the moment:

  • logstash-codec-gzip_lines
  • logstash-codec-es_bulk
  • logstash-codec-graphite
  • logstash-codec-edn_lines

So, I do not expect the majority of the codecs to currently work with LFV.
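
As an illustration of the cef point above: if LFV allowed codec options to be passed through, the generated unix input would have to look something like the following, which it currently does not (the socket path is a placeholder):

```
input {
  unix {
    path  => "/tmp/lfv.sock"            # placeholder
    codec => cef { delimiter => "\n" }  # no delimiter is set by default
  }
}
```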

@magnusbaeck
Owner

Thanks for the analysis @breml! I've pushed a commit that adds a warning when select codecs are used.
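
Conceptually the check is along these lines (a simplified sketch in Go, not the literal code from the commit):

```go
package main

import "log"

// recommendedCodecs lists the codecs that delimit events by newline, which is
// what the socket-based input relies on.
var recommendedCodecs = map[string]bool{
	"line":       true,
	"json_lines": true,
}

// warnAboutCodec logs a warning if the configured codec is not one of the
// recommended ones. (Illustrative only; the actual implementation differs.)
func warnAboutCodec(codec string) {
	if !recommendedCodecs[codec] {
		log.Printf("The %q codec is not known to work well with "+
			"logstash-filter-verifier; consider \"line\" or \"json_lines\".", codec)
	}
}

func main() {
	warnAboutCodec("json") // emits the warning
	warnAboutCodec("line") // stays silent
}
```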
