Add refiltering option for created events #793

Closed · wants to merge 8 commits into master

Conversation

20goto10 commented Nov 22, 2013

Currently in logstash, an event created by a filter (e.g. "split") is not processed a second time by the filters. This PR adds a "refilter" option which will reinsert new events into the chain for further processing. Tags and conditionals in the configuration can be used to ensure that a filter is only applied to the created events. Refilter is available to all filters (though it only makes sense for filters that yield events back to the pipeline).

Note that this PR includes the split changes from PR #787 .

Example filter configuration: in this example, my log data arrives as one large JSON array under "message" => "Records", which must be split to be handled correctly. On the second pass, which only filters the created events, the date filter is used to correct the @timestamp.

filter {
  if !("splitted" in [tags]) {
    json {
      source => "message"
    }
    split {
      field => "Records"
      reuse_element => false
      refilter => true
      add_tag => ["splitted"]
    }
  }
  if ("splitted" in [tags]) {
    date {
      add_tag => ["dated"]
      match => ["eventTime", "ISO8601"]
    }
  }
}
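Conceptually, the refilter behavior can be sketched like this. This is an illustrative Ruby sketch only, not logstash's actual pipeline code: the filter shapes and the hash-based events are simplified assumptions, with only the tag names ("splitted", "dated") taken from the config above.

```ruby
# Illustrative sketch only -- not logstash's actual pipeline code.
# Shows the idea behind "refilter": events generated by a filter are
# pushed back onto the queue so the whole filter chain sees them again.
# Filters here are simplified lambdas: event in, [event_or_nil, new_events] out.

split_filter = lambda do |event|
  return [event, []] if event[:tags].include?("splitted")
  pieces = event[:message].split(",").map { |p| { message: p, tags: ["splitted"] } }
  [nil, pieces] # cancel the original, emit one event per piece
end

date_filter = lambda do |event|
  event[:tags] << "dated" if event[:tags].include?("splitted")
  [event, []]
end

def run_filters(events, filters)
  queue = events.dup
  output = []
  until queue.empty?
    event = queue.shift
    cancelled = false
    filters.each do |f|
      event, created = f.call(event)
      queue.concat(created) # refilter: created events re-enter the chain
      if event.nil?
        cancelled = true
        break
      end
    end
    output << event unless cancelled
  end
  output
end

result = run_filters([{ message: "hello,world", tags: [] }],
                     [split_filter, date_filter])
# result: two events, each tagged ["splitted", "dated"]
```

Without the `queue.concat(created)` re-insertion, the two split events would skip `date_filter` entirely, which is exactly the behavior this PR works around.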


Contributor

jordansissel commented Nov 22, 2013

Thanks for helping improve logstash!

The behavior you're describing is actually a bug, not a feature: logstash is not filtering events that go through the split filter. This patch introduces several changes, most relevantly the 'refilter' setting. Given this is a bug, adding a feature to work around it seems not like the best approach. :)

We will work on fixing this bug and will accept patches to fix it as well, but this approach feels more like a workaround (and also a feature addition) than a bug fix.

Does this make sense? Happy to discuss further! :)

Contributor

jordansissel commented Nov 22, 2013

For posterity, here's a way to reproduce the bug:

 % bin/logstash agent -e 'filter { split { terminator => "," } mutate { rename => { "type" => "fancypants" } } }'
Using milestone 2 filter plugin 'split'. This plugin should be stable, but if you see strange behavior, please let us know! For more information on plugin milestones, see http://logstash.net/docs/1.2.3.dev/plugin-milestones {:level=>:warn}
hello,world
{
  "message"    => "hello",
  "@timestamp" => "2013-11-22T17:59:39.932Z",
  "@version"   => "1",
  "type"       => "stdin",
  "host"       => "porkrevenge"
}
{
  "message"    => "world",
  "@timestamp" => "2013-11-22T17:59:39.932Z",
  "@version"   => "1",
  "type"       => "stdin",
  "host"       => "porkrevenge"
}


# This one works since it is not split.
hello
{
  "message"    => "hello",
  "@timestamp" => "2013-11-22T17:59:47.370Z",
  "@version"   => "1",
  "host"       => "porkrevenge",
  "fancypants" => "stdin"
}

20goto10 commented Nov 22, 2013

That does make sense. Removing the "refilter" config option from these changes (effectively setting it to "true" in all cases) seems like a viable bugfix at first glance.


gposton commented Feb 4, 2014

+1 thanks 20goto10


erez-rabih commented Feb 19, 2014

Hi,
Thanks for the fixes, @20goto10; I think many people will enjoy this.
Are there plans to merge this PR into the next version of Logstash?


20goto10 commented Feb 19, 2014

It doesn't look like this will be merged into Logstash. Hopefully someone will patch the bug instead. In the meantime, you can get a build that includes this PR from the link here: http://mdsol.github.io/2014/01/27/parsing-amazon-cloudtrail-json-logs-with-a-customized-logstash-build.html


erez-rabih commented Feb 19, 2014

@20goto10 Thanks for clarifying that.


erez-rabih commented Feb 23, 2014

Hi, just wanted to update this thread with a possible easy workaround for this common problem:

The idea is to redirect the exec output into a log file and then read that log using the file input. My config ended up like this:

    exec {
            command => "ps ax | grep unicorn >> unicorn_ps.log"
            interval => 5
            type => "garbage"
    }
    file {
            path => "unicorn_ps.log"
            type => "unicorn_ps"
    }

Because I am redirecting output with >> in the exec command, that input produces no output at all, meaning no garbage is sent to logstash.
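A tiny shell sketch of why the >> redirection above prevents garbage events: the exec input captures the command's stdout, and the redirection diverts that stdout to the file, so the capture is empty. (The /tmp path and the "unicorn" process name are illustrative.)

```shell
# Sketch: why the >> workaround produces no "garbage" events.
# The >> redirection consumes the command's stdout, so the exec input's
# capture of that stdout comes back empty. (Paths are illustrative.)

log=/tmp/unicorn_ps_demo.log
rm -f "$log"

# What logstash's exec input would receive: the command's stdout.
captured=$(ps ax | grep unicorn >> "$log")

# stdout went to the log file instead, so nothing was captured.
if [ -z "$captured" ]; then
  echo "exec input receives nothing"
fi
```

The file input then tails the log on its own schedule, decoupled from the exec interval.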

@colinsurprenant colinsurprenant added the bug label May 1, 2014

@colinsurprenant colinsurprenant added this to the v1.4.2 milestone May 1, 2014


cliff-wakefield commented May 21, 2014

Any updates on this bug?

The behaviour described is still present in v1.4.1.

This entry implies it will be addressed in v1.4.2; is that really going to be the case?


Contributor

colinsurprenant commented Aug 8, 2014

this is fixed by #1545; @jordansissel's posterity example passes


Contributor

colinsurprenant commented Aug 8, 2014

And thanks for your work on this, @20goto10!
It would be really appreciated if any of you could run tests on #1545.

colinsurprenant added a commit to colinsurprenant/logstash that referenced this pull request Aug 8, 2014

@jordansissel jordansissel added the O(2) label Aug 19, 2014


elasticsearch-release commented Aug 26, 2014

Can one of the admins verify this patch?

colinsurprenant added a commit that referenced this pull request Sep 5, 2014

jordansissel added a commit that referenced this pull request Sep 5, 2014

Add recurse method for doing breadth-first traversal of the AST
This will be used by the filter flush compiler

Add generation of a flush lambda for each filter.

This allows filters to flush and have any generated events proceed
downward through the config as you would expect, respecting any
branches or future plugins.

Die on IOError which occurs when reading from a closed STDIN

Make filter_flusher invoke the new (and correct, I hope!) way to flush.

- On shutdown, we will also flush all filters.
- The flusher thread will terminate if we are shutting down.

Clarify the comment

Fix comment generation in the code to avoid newlines.

Add 'max_age' setting to multiline for flushing.

This setting chooses how long (in seconds) an event is considered to be
fresh before it will be automatically flushed.

This is useful for:
* slow log sources to get the 'last event' flushed,
* transaction-id-style events that have no obvious "end" event and also
  are mixed among other-id events in the same stream.

Also:
- Make filters have no teardown by default.
- Remove 'enable_flush' since it is not needed anymore; flush is always
  enabled.

refactor flush

new spool filter and specs, mainly for testing flushing

turn off unrelated test error for now

fix the flush logic, fix the flush compiled code to not include the output section

synchronize cross-thread  access to @pending

refactor for performance and readability

synchronize cross-thread access to @spool

unused code

input:udp removed bogus ShutdownSignal handling, morphed loop do into while true, cosmetic reformat

use transient events and not exceptions for in-flow pipeline signaling

inline flushing into  filterworker

removed now unnecessary flushing thread safety

fix conditionals bug for new events generated by filters & specs

spec for issue #793

performance tweaks

simplify filter handling of events and new_events

this removes unnecessary duplication when treating the original event as
a special case (different from new_events generated by a filter).
Also, since @filter_func only outputs non-cancelled events, some checks
were also removed.

Move multiple filter specs to a filter_chains file

append events generated by a filter using unshift instead of insert

closes #793, closes #1429, closes #1431, closes #1548

@jsvd jsvd added the duplicate label Sep 9, 2014


Skavenger0 commented Oct 27, 2014

Does anyone have a working 1.4.2 config file now that this issue is fixed? I've tried the one from @20goto10's blog, but obviously refilter doesn't exist in a standard 1.4.2 build.


Contributor

colinsurprenant commented Oct 28, 2014

@Skavenger0 the fix for this in #1545 will be included in the next release (1.5) of logstash

@colinsurprenant colinsurprenant removed this from the 1.4.3 milestone Oct 28, 2014

@jordansissel jordansissel added this to the v1.5.0 milestone Nov 11, 2014

ajchalk added a commit to catalyst/logstash that referenced this pull request Dec 2, 2014


Conflicts:
	spec/conditionals/test.rb