plugin: Added filter feature per record. #42

oza · 2012-03-01T14:48:54Z

fluentd currently cannot apply a filter per record.
This patch provides a user-defined per-record filter feature,
even when using MultiEventStream.

This patch is one proposal for solving issue #39 (https://github.com/fluent/fluentd/issues/39).

Signed-off-by: UEMURA Yuichi y.uemura@ntt.com


frsyuki · 2012-03-02T02:08:36Z

It must have been tough work to clear the legal procedures needed to open up your patch :-)

I understand that filtering/validating at the input side is important.
But your proposal is too big just to add a filtering feature to the in_tail plugin. I believe the plugin mechanism should stay simple to keep Fluentd simple.
Adding a new type of plugin needs more examples or estimates of use cases.

You can do the same thing by extending TailInput and overriding the parse_line method (ref. http://fluentd.org/doc/devel.html#custom-parser-for-tail-input-plugin).
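A minimal sketch of that approach follows. It assumes, as we understand the devel document referenced above, that parse_line(line) returns a [time, record] pair, and it further assumes that a nil result causes the line to be skipped. Because Fluentd is not loaded here, `TailInputStub` is a hypothetical stand-in for Fluent::TailInput that only mimics the parse-and-collect loop; `SizeLimitedTailInput` and `MAX_LINE_BYTES` are likewise illustrative names.

```ruby
require 'time'

# Hypothetical stand-in for Fluent::TailInput: it only mimics the
# "parse each line, keep the ones that parse" loop so the sketch runs
# without Fluentd installed.
class TailInputStub
  # Returns the [time, record] pairs that would be emitted for +lines+.
  def receive_lines(lines)
    lines.map { |line|
      time, record = parse_line(line.chomp)
      [time, record] if time && record # a nil result means the line is skipped
    }.compact
  end

  def parse_line(_line)
    raise NotImplementedError
  end
end

# Subclass that overrides parse_line: it drops overly long lines and
# parses "%Y-%m-%d %H:%M:%S\tkey1\tvalue1\tkey2\tvalue2..." lines.
class SizeLimitedTailInput < TailInputStub
  MAX_LINE_BYTES = 64 # illustrative limit, not a Fluentd parameter

  def parse_line(line)
    return nil if line.bytesize > MAX_LINE_BYTES

    time_str, *fields = line.split("\t")
    time = Time.strptime(time_str, '%Y-%m-%d %H:%M:%S').to_i
    # [k1, v1, k2, v2, ...] -> {k1=>v1, k2=>v2, ...}; a dangling key is ignored
    record = fields.each_slice(2).select { |pair| pair.size == 2 }.to_h
    [time, record]
  end
end
```

The filter lives entirely in the subclass, which is why no new plugin type is needed for tail-time filtering.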

kzk · 2012-03-02T02:19:00Z

Unfortunately, as an OSS product, it's hard to accept a patch that includes a company's copyright.

That could become a problem if we change the license or donate the code somewhere.

Please reconsider it. Sorry about this.

Thanks - K

oza · 2012-03-02T03:54:41Z

@frsyuki Yes, TailInput is enough for filtering when tailing.
Therefore, my patch is unnecessary if only tail-time filtering is needed.

Our problem is that some input plugins of fluentd cannot restrict the size of their input.
For example, the ForwardInput plugin has no such mechanism:

module Fluent
class ForwardInput < Input
  # ...elision...
  def on_message(msg)
    # ...elision...
    elsif entries.class == Array
      # Forward
      es = MultiEventStream.new
      entries.each {|e|
        time = e[0].to_i
        time = (now ||= Engine.now) if time == 0
        record = e[1]
        # We'd like to hook each record here to reduce output traffic.
        es.add(time, record)
      }
      Engine.emit_stream(tag, es)
      # ...elision...
  end

Input-time restriction is useful when the owners of fluentd's instances are different.
However, one difficult problem comes up when implementing it: the definition of log size depends on the application.
Thus, a user-defined filtering mechanism is essential.

I think my filtering patch is one solution, but I don't know whether my idea is the best one.
Do you have any ideas?
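To make the idea concrete, here is a minimal sketch of a user-defined per-record filter applied inside a forward-style loop like the one above. `RecordFilter`, `MaxRecordSizeFilter`, `forward_entries` and the byte limit are hypothetical names, not Fluentd APIs. Record size is approximated by the length of its JSON encoding, which is only one possible application-specific definition, and a plain array of [time, record] pairs stands in for MultiEventStream.

```ruby
require 'json'

# Hypothetical filter interface: return the (possibly modified) record
# to keep it, or nil to drop it.
class RecordFilter
  def filter(_tag, _time, record)
    record
  end
end

# Example user-defined policy: drop records whose JSON encoding exceeds
# a byte limit. Other applications could measure "size" differently.
class MaxRecordSizeFilter < RecordFilter
  def initialize(max_bytes)
    @max_bytes = max_bytes
  end

  def filter(_tag, _time, record)
    JSON.generate(record).bytesize <= @max_bytes ? record : nil
  end
end

# Mirrors the "Forward" branch of on_message: each entry is [time, record].
def forward_entries(tag, entries, filters, now: Time.now.to_i)
  es = []
  entries.each do |e|
    time = e[0].to_i
    time = now if time == 0
    record = e[1]
    # The per-record hook: run every filter; a nil short-circuits the chain.
    record = filters.reduce(record) { |r, f| r && f.filter(tag, time, r) }
    es << [time, record] if record
  end
  es
end
```

Keeping the policy in a small pluggable object lets each deployment decide what "too large" means without changing the input plugin itself.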

oza · 2012-03-02T04:01:44Z

@kzk Yeah, okay, I'll remove it in the next patch. Thank you for pointing it out :-)

oza · 2012-03-02T04:17:52Z

@frsyuki Note that this patch is just a prototype... the goal of this patch is to establish a generic method for limiting input size.

frsyuki · 2012-03-06T01:11:05Z

> Input-time restriction is useful when the owners of fluentd's instances are different.

So, you can't trust the sender of logs at all, right? I imagine you need "multi-tenant" capability.

To satisfy your needs, the input plugin will need the following features:
a) authentication
b) size/structure/semantics validation of records (validation per record)
c) bandwidth limitation (validation over multiple records)
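Item c) differs from b) in that it needs state across records. One common way to express it, shown here only as a hypothetical sketch, is a token bucket per sender: each accepted record consumes its byte size from a budget that refills at a fixed rate. `TokenBucket`, its parameters and the injectable clock are illustrative assumptions, not part of in_forward.

```ruby
# Hypothetical per-sender bandwidth limiter (token bucket).
class TokenBucket
  # rate_bytes_per_sec: refill speed; burst_bytes: bucket capacity.
  # clock: callable returning the current time in seconds (injectable for tests).
  def initialize(rate_bytes_per_sec, burst_bytes,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @rate = rate_bytes_per_sec.to_f
    @capacity = burst_bytes.to_f
    @tokens = @capacity
    @clock = clock
    @last = @clock.call
  end

  # Returns true and consumes tokens if +bytes+ fit in the current budget,
  # false otherwise (the caller would then drop or reject the record).
  def allow?(bytes)
    refill
    return false if bytes > @tokens

    @tokens -= bytes
    true
  end

  private

  # Add tokens for the elapsed time, never exceeding the capacity.
  def refill
    now = @clock.call
    @tokens = [@capacity, @tokens + (now - @last) * @rate].min
    @last = now
  end
end
```

With a rate of 100 bytes/s and a burst of 200 bytes, a 150-byte record is accepted immediately, a following 100-byte record is refused, and after one more second the bucket has refilled enough to accept it.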

In my opinion, a) should be done at the boundary between the "untrusted zone" and the "trusted zone" so that unauthorized users cannot consume resources of the "trusted zone". And b) should be done at the boundary between the "frontend system" and the "backend system", because these validations tend to be complex and slow: they need to be strict to prevent potential security holes.
So they have different requirements from the backend aggregation system, which needs simplicity and high performance.
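For a), a boundary check might look like the sketch below. It assumes, hypothetically, that each sender shares a secret key with the trusted side and attaches an HMAC-SHA256 of its payload. `SharedKeyAuthenticator` and this message layout are illustrative only; this is not the in_forward protocol, and a real design would also need replay protection (for example, nonces).

```ruby
require 'openssl'

# Hypothetical shared-key check performed before any record reaches the
# trusted zone; unauthenticated payloads are rejected without parsing.
class SharedKeyAuthenticator
  def initialize(keys_by_sender)
    @keys = keys_by_sender # e.g. { "tenant-a" => "secret" }
  end

  # Digest a sender would attach to +payload+.
  def self.sign(key, payload)
    OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), key, payload)
  end

  def authentic?(sender, payload, digest)
    key = @keys[sender]
    return false unless key

    secure_compare(self.class.sign(key, payload), digest)
  end

  private

  # Constant-time comparison so timing does not leak how many bytes matched.
  def secure_compare(a, b)
    return false unless a.bytesize == b.bytesize

    diff = 0
    a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
    diff.zero?
  end
end
```

Rejecting a payload here costs one digest computation, which is the point of doing authentication at the edge: unauthorized senders never consume resources behind it.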

in/out_forward plugins and Fluentd's emit/write mechanisms are optimized for availability and performance. I think it's difficult to add validation features without degrading these benefits.
So it will be another input plugin like in_secure_forward rather than a kind of add-on.

oza · 2012-03-06T15:34:53Z

> So, you can't trust the sender of logs at all, right? I imagine you need "multi-tenant" capability.

Yes, you're right.

> These validations tend to be complex and slow because they need to be strict to prevent potential security holes.
> They have different requirements from the backend aggregation system, which needs simplicity and high performance.

Layering between the trusted and untrusted zones seems like the best solution.

> So it will be another input plugin like in_secure_forward rather than a kind of add-on.

Realizing that layering is our next work :-)
We're going to share it as soon as possible.

plugin: Added filter feature per record.

629e0ae


frsyuki mentioned this pull request Mar 6, 2012

Fluentd ignores buffer_chunk_limit when MultiEventStream is larger than buffer_chunk_limit. #39

Closed

frsyuki closed this Mar 7, 2012

kvborodin mentioned this pull request Jun 18, 2019

Fluentd stuck/hangs because of infinity regexp (99.9%) please improve detection/validation #2464

Closed
