Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add multiline.flush_pattern option #4019

Merged
merged 1 commit into from
May 11, 2017

Conversation

TheoAndersen
Copy link
Contributor

This allows for specifying a regex, which will flush the current multiline, thus ending the current multiline. Useful for using multiline to capture application events with 'start' and 'end' lines.

Example configuration
multiline.pattern: 'start'
multiline.negate: true
multiline.match: after
multiline.flush_pattern: 'end'

(#3964)

@elasticmachine
Copy link
Collaborator

Jenkins standing by to test this. If you aren't a maintainer, you can ignore this comment. Someone with commit access, please review this and clear it for Jenkins to run.

1 similar comment
@elasticmachine
Copy link
Collaborator

Jenkins standing by to test this. If you aren't a maintainer, you can ignore this comment. Someone with commit access, please review this and clear it for Jenkins to run.

@TheoAndersen
Copy link
Contributor Author

I have signed the CLA now.. can somebody trigger it to check again? :)

@TheoAndersen
Copy link
Contributor Author

TheoAndersen commented Apr 15, 2017

ahhr... need to run it through gofmt... will fix and amend

@ruflin
Copy link
Member

ruflin commented Apr 19, 2017

@TheoAndersen Thanks a lot for the contribution. I' m thinking if we should have a separate config option for start and end/flush pattern so we don't mix it with the other configs. The part I worry is that "misconfiguration" of all together could lead to strange side effects.

At the same time the implementation looks rather simple you did

multiline.start_pattern:
multiline.end_pattern:

@monicasarbu monicasarbu requested a review from tsg April 19, 2017 10:57
@TheoAndersen
Copy link
Contributor Author

@ruflin Thanks for the comments.

This PR is just the addition of the extra optional parameter flush_pattern, so i don't think it will affect existing configurations, unless you actively use this configuration.

One thing i have found though, is that the way i use this multiline configuration will force everything to be gathered into events - wheres i really just wanted to merge specific lines (matched with a start and end pattern) into a single event - and leave the rest as they are.

Ex if you have a log like this

Line 1
Line 2
Line 3 - Some event starts
Line 4 - Content of event
Line 5 - End of event
Line 6
Line 7

and use a configuration like this

multiline.pattern: 'Some event starts'
multiline.negate: true
multiline.match: after
multiline.flush_pattern: 'End of event'

then it will create 3 events.

  • one with line 1 to 2 merged
  • one with 3 to 5 (the one i really only wanted merged)
  • one with line 6 to 7

Where i really only wanted line 3 to 5 merged.

I'm not sure if this is possible to do, by continuing to hack or add on the current multiline implementation? using the multiline.negate: true will kindof make multiline try to merge all lines in the log wont it?

Do you have any suggestions for this?

Also i would ideally like to be able to specify multiple start-end patterns, if there was multiple kinds of events in there (but this could be done by piping in the regular expression i guess).

@ruflin
Copy link
Member

ruflin commented Apr 24, 2017

I feel like this PR is a good kick of to discuss how we should extend multiline and which use cases it should covered. Unfortunately the initial design doc is now in the filebeat repo which we made private to not have people opening issues there all the time. But in summary there are 2 things in there which we haven't implemented yet:

  • Support for multiple patterns instead of just one
  • Start / End patterns

What you implemented here is kind of the end pattern, but I agree with you that I would also expect the behaviour that one event for each line is sent, if no start pattern is started.

The initial implementation was done based on some example logs from java stack traces to json logs etc. I would suggest we proceed like this and having directly some real world examples that we want to work on. Perhaps what you did above is already all we need as the other format is more a theoretical one, but not sure.

One of the suggestion from the "old" issue looks as following:

# Enable mutline feature
multiline:
  # See description of start_pattern. Can be used alone and in combination with end_pattern
  start_pattern: regexp
  start_pattern_negate: bool

  # See description of end_pattern. Can be used alone and in combination with start_pattern
  end_pattern: regexp
  end_pattern_negate: bool

  # See description of after_pattern. Can only be used alone
  after_pattern: regexp
  after_pattern_negate: bool

  # See description of before_pattern. Can only be used alone
  before_pattern: regexp
  before_pattern_negate: true

This can probably be expressed in more compact way.

Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We quickly discussed this internally and decided to move forward with this and add more feature to multiline at a later stage. This already solves lots of potential use cases.

I suggest we add a few system-tests to the mix (check tests/system/test_multiline.py) to make sure the behaviour is exactly as expected.

Could you also add a changelog entry?

@@ -60,16 +60,20 @@ func TestMultilineAfterNegateOK(t *testing.T) {
)
}

func TestMultilineBeforeNegateOK(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not overwrite an existing test but add a new one.

@TheoAndersen
Copy link
Contributor Author

This should have added the missing test, and the changelog

Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I left you two minor comments.

// handle case when endPattern is reached
if mlr.flushMatcher != nil {
endPatternReached := (mlr.flushMatcher.Match(message.Content))
// fmt.Println("Next line: " + string(message.Content))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line should be removed.

flushMatcher := match.MustCompile(`EventEnd`)

testMultilineOK(t,
MultilineConfig{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add one more test where first there are lines which do not match the pattern EventStart?

@TheoAndersen TheoAndersen force-pushed the multiline_flush_pattern branch 2 times, most recently from adc6097 to 31a8a5c Compare May 8, 2017 12:18
@TheoAndersen
Copy link
Contributor Author

Can anybody help me with this build-error?

Can't seem to figure it out ( @ruflin? )

@ruflin
Copy link
Member

ruflin commented May 8, 2017

@TheoAndersen Bad timing. There was one commit today which was merged into master and was failing and you exactly got that one :-( Can you rebase on top of master and run make update and make fmt on the top beats level?

@TheoAndersen
Copy link
Contributor Author

Thats how it goes. Trying again :)

@ruflin
Copy link
Member

ruflin commented May 8, 2017

jenkins, test it

@TheoAndersen
Copy link
Contributor Author

Was that just thinking out loud, or does jenkins actually parse these comments?

if so..

Jenkins, test faster ;o)

@ruflin
Copy link
Member

ruflin commented May 8, 2017

Yes, jenkins parses the comments, but there are only a few and not all users can trigger the events ;-)

@@ -343,6 +343,12 @@ https://github.com/elastic/beats/compare/v5.2.2...v5.3.0[View commits]
- The `symlinks` and `harverster_limit` settings are now GA, instead of experimental. {pull}3525[3525]
- close_timeout is also applied when the output is blocking. {pull}3511[3511]
- Improve handling of different path variants on Windows. {pull}3781[3781]
- Restructure input.Event to be inline with outputs.Data {pull}3823[3823]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to bug you again, but this seems to have been messed up during rebasing. Unfortunately happens from time to time :-(

@ruflin
Copy link
Member

ruflin commented May 9, 2017

All good from my side and ready to merge as soon as changelog is updated.

This allows for specifying a regex, which will flush the current multiline, thus ending the current multiline. Useful for using multiline to capture application events with 'start' and 'end' lines.

Example configuration
  multiline.pattern: 'start'
  multiline.negate: true
  multiline.match: after
  multiline.flush_pattern: 'end'

(elastic#3964)
@TheoAndersen
Copy link
Contributor Author

Updated changelog

@ruflin ruflin merged commit 04cf26d into elastic:master May 11, 2017
@ruflin
Copy link
Member

ruflin commented May 11, 2017

@TheoAndersen Merged. Thanks a lot for this contribution.

@ppf2
Copy link
Member

ppf2 commented May 8, 2018

The changelog indicates that this feature is available in 5.3.0 and above.

https://github.com/elastic/beats/blob/master/CHANGELOG.asciidoc#beats-version-530

But the actual code does not appear until 6.0. Should we fix the changelog? @ruflin

@ruflin
Copy link
Member

ruflin commented May 11, 2018

@ppf2 It definitively seems like this changelog ended up in the completely wrong place.

@monicasarbu @tsg What should we do here as I remember our CHANGELOG is also in some release etc. I assume it's not solved by just removing it in 5.x.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants