plugin: Added filter feature per record. #42

Closed
oza wants to merge 1 commit

oza
Member

oza commented Mar 1, 2012

Fluentd currently cannot apply a filter per record.
This patch provides a user-defined, per-record filter feature
that works even when using MultiEventStream.

This patch is one proposal for solving issue #39 (https://github.com/fluent/fluentd/issues/39).

Signed-off-by: UEMURA Yuichi <y.uemura@ntt.com>

@frsyuki
Member

frsyuki commented Mar 2, 2012

I imagine it was tough work to clear the legal procedures needed to open your patch :-)

I understand that filtering/validation on the input side is important.
But your proposal is too big just to add a filtering feature to the in_tail plugin. I believe the plugin mechanism should stay simple to preserve Fluentd's simplicity.
Adding a new type of plugin would need more examples or estimates of use cases.

You can do the same thing by extending TailInput and overriding the parse_line method (see http://fluentd.org/doc/devel.html#custom-parser-for-tail-input-plugin).
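For illustration, a minimal sketch of that approach under stated assumptions: `BaseTailInput` is a hypothetical stand-in for Fluent::TailInput so the code runs without Fluentd, its `receive_lines` only approximates the tail loop (assuming, as this stand-in does, that lines for which parse_line returns nil are skipped), and the 1 KB limit is arbitrary. The point is that a subclass can filter per line just by overriding parse_line.

```ruby
require 'json'

# Hypothetical stand-in for Fluent::TailInput, so this sketch runs on its own.
class BaseTailInput
  # Default parser: each line is a JSON object stamped with the current time.
  def parse_line(line)
    [Time.now.to_i, JSON.parse(line)]
  end

  # Approximates the tail loop: parse every line and drop the ones
  # for which parse_line returned nil.
  def receive_lines(lines)
    lines.map { |line| parse_line(line.chomp) }.compact
  end
end

# Custom input that filters per line by overriding parse_line.
class SizeLimitedTailInput < BaseTailInput
  MAX_LINE_BYTES = 1024 # hypothetical per-line limit

  def parse_line(line)
    return nil if line.bytesize > MAX_LINE_BYTES # drop oversized lines
    super
  end
end

input = SizeLimitedTailInput.new
events = input.receive_lines(['{"msg":"ok"}', '{"msg":"' + ('x' * 2000) + '"}'])
puts events.size # prints 1
```

In a real plugin the subclass would extend Fluent::TailInput itself and be registered as an input plugin as described in the linked documentation.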

@kzk
Member

kzk commented Mar 2, 2012

Unfortunately, as an OSS project, it's hard for us to accept a patch that includes a company's copyright.

That could become a problem if we change the license or donate the code somewhere.

Please reconsider it. Sorry about this.

Thanks - K

@oza
Member Author

oza commented Mar 2, 2012

@frsyuki Yes, TailInput is sufficient for filtering while tailing.
So my patch is unnecessary if only tail-time filtering is needed.

Our problem is that some of Fluentd's input plugins cannot restrict the size of their input.
For example, the ForwardInput plugin doesn't have such a mechanism:

```ruby
module Fluent
  class ForwardInput < Input
    # ...elision...
    def on_message(msg)
      # ...elision (including the opening "if" branch)...
      elsif entries.class == Array
        # Forward
        es = MultiEventStream.new
        entries.each {|e|
          time = e[0].to_i
          time = (now ||= Engine.now) if time == 0
          record = e[1]
          # We'd like to hook each record here to reduce output traffic.
          es.add(time, record)
        }
        Engine.emit_stream(tag, es)
        # ...elision...
      end
    end
  end
end
```

Input-time restriction is useful when the owners of the Fluentd instances are different.
However, implementing it raises a difficult problem: the definition of log size depends on the application.
Thus, a user-defined filtering mechanism is essential.
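To make the idea concrete, the ForwardInput loop above could pass each record through a chain of user-defined filters before `es.add`, so that each deployment decides what "too large" means. This is a hypothetical sketch: `ForwardLoopSketch`, `MaxJsonSizeFilter`, and the convention that a filter returns the record (keep) or nil (drop) are illustrations, not Fluentd APIs, and a plain Array stands in for MultiEventStream so it runs without Fluentd.

```ruby
require 'json'

# Hypothetical user-defined filter: keeps a record only if its JSON encoding
# fits in max_bytes. Another application could define "size" differently.
class MaxJsonSizeFilter
  def initialize(max_bytes)
    @max_bytes = max_bytes
  end

  # Returns the record to keep it, or nil to drop it.
  def call(_tag, _time, record)
    JSON.generate(record).bytesize <= @max_bytes ? record : nil
  end
end

# Mirrors the "Forward" branch of ForwardInput#on_message, plus a per-record hook.
class ForwardLoopSketch
  def initialize(filters, now: -> { Time.now.to_i })
    @filters = filters # any objects responding to call(tag, time, record)
    @now = now
  end

  # entries: [[time, record], ...] as sent by the client.
  # Returns the [time, record] pairs that passed every filter.
  def process(tag, entries)
    es = [] # stand-in for MultiEventStream
    now = nil
    entries.each do |e|
      time = e[0].to_i
      time = (now ||= @now.call) if time == 0
      record = e[1]
      @filters.each do |f|
        record = f.call(tag, time, record)
        break if record.nil? # a filter rejected the record
      end
      es << [time, record] unless record.nil?
    end
    es
  end
end

sketch = ForwardLoopSketch.new([MaxJsonSizeFilter.new(64)])
kept = sketch.process('app.access', [[0, { 'msg' => 'hi' }], [5, { 'msg' => 'x' * 100 }]])
puts kept.size # prints 1
```

Because filters are plain callables, the size rule stays outside the input plugin, which is the property the comment above argues for.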

I think my filtering patch is one solution, but I don't know whether it is the best one.
Do you have any ideas?

@oza
Member Author

oza commented Mar 2, 2012

@kzk Yeah, okay, I'll remove it in the next patch. Thank you for pointing that out :-)

@oza
Member Author

oza commented Mar 2, 2012

@frsyuki Note that this patch is just a prototype; its goal is to establish a generic method for limiting input size.

@frsyuki
Member

frsyuki commented Mar 6, 2012

> Input-time restriction is useful when the owners of the Fluentd instances are different.

So, you can't trust the sender of logs at all, right? I imagine you need "multi-tenant" capability.

To satisfy your needs, the input plugin will need the following features:
a) authentication
b) size/structure/semantics validation of records (per-record validation)
c) bandwidth limitation (validation over multiple records)
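The distinction between b) and c) matters for implementation: b) can be decided by looking at one record at a time, while c) needs state carried across records. A token bucket is one common way to implement c). The sketch below is hypothetical (`ByteTokenBucket` and its rate/burst parameters are illustrative, not a Fluentd API), and its clock is injectable so the example is deterministic.

```ruby
# Token bucket that limits how many bytes are accepted per second,
# across all records from one sender (validation over multiple records).
class ByteTokenBucket
  def initialize(rate_bytes_per_sec:, burst_bytes:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @rate   = rate_bytes_per_sec.to_f
    @burst  = burst_bytes.to_f
    @clock  = clock
    @tokens = @burst # start with a full budget
    @last   = @clock.call
  end

  # Consumes `bytes` from the budget and returns true if it fits;
  # otherwise consumes nothing and returns false.
  def allow?(bytes)
    now = @clock.call
    # Refill in proportion to elapsed time, capped at the burst size.
    @tokens = [@burst, @tokens + (now - @last) * @rate].min
    @last = now
    return false if bytes > @tokens
    @tokens -= bytes
    true
  end
end

t = 0.0
bucket = ByteTokenBucket.new(rate_bytes_per_sec: 100, burst_bytes: 200, clock: -> { t })
bucket.allow?(150) # true: 50 bytes of budget left
bucket.allow?(100) # false: only 50 bytes left
t = 1.0
bucket.allow?(100) # true: refilled to 150 after one second
```

A frontend would call `allow?` with each record's encoded size and drop (or delay) records once a sender's budget is exhausted; dropping is simply the easiest policy to show in a sketch.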

In my opinion, a) should be done at the boundary between the "untrusted zone" and the "trusted zone", so that unauthorized users cannot consume resources in the "trusted zone". And b) should be done at the boundary between the "frontend system" and the "backend system", because these validations tend to be complex and slow: they need to be strict to prevent potential security holes.
So they have different requirements from the backend aggregation system, which needs simplicity and high performance.

The in/out_forward plugins and Fluentd's emit/write mechanisms are optimized for availability and performance. I think it's difficult to add validation features without sacrificing those benefits.
So it will be another input plugin, like in_secure_forward, rather than a kind of add-on.
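The layering described above can be sketched as a chain of stages. Everything here is a hypothetical illustration rather than a Fluentd API: the shared-key HMAC scheme, the message layout, and the `AuthStage`/`ValidateStage` classes are assumptions, and a plain lambda stands in for the backend forwarder. Stage a) sits at the untrusted/trusted boundary and stage b) at the frontend/backend boundary, so the backend itself never pays for validation.

```ruby
require 'openssl'
require 'json'

# Stage a): untrusted/trusted boundary. Rejects unauthenticated batches
# before they consume trusted-zone resources. Shared-key HMAC is an assumption.
class AuthStage
  def initialize(shared_keys, nxt)
    @shared_keys = shared_keys # sender id => secret (hypothetical)
    @next = nxt
  end

  # msg: { 'sender' => id, 'payload' => json_string, 'signature' => hex_hmac }
  # Returns the number of records that reached the backend.
  def call(tag, msg)
    key = @shared_keys[msg['sender']]
    return 0 unless key
    payload = msg['payload'].to_s
    expected = OpenSSL::HMAC.hexdigest('SHA256', key, payload)
    return 0 unless secure_compare(expected, msg['signature'].to_s)
    @next.call(tag, payload)
  end

  private

  # Constant-time comparison to avoid leaking timing information.
  def secure_compare(a, b)
    return false unless a.bytesize == b.bytesize
    a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
  end
end

# Stage b): frontend/backend boundary. Strict, possibly slow, per-record checks.
class ValidateStage
  def initialize(max_record_bytes, backend)
    @max = max_record_bytes
    @backend = backend
  end

  def call(tag, payload)
    records = JSON.parse(payload)
    return 0 unless records.is_a?(Array)
    ok = records.select { |r| r.is_a?(Hash) && JSON.generate(r).bytesize <= @max }
    @backend.call(tag, ok) unless ok.empty?
    ok.size
  rescue JSON::ParserError
    0 # malformed batch: reject it entirely
  end
end

received = []
backend  = ->(_tag, recs) { received.concat(recs) } # stays simple and fast
pipeline = AuthStage.new({ 'tenant-a' => 's3cret' }, ValidateStage.new(64, backend))

payload = JSON.generate([{ 'msg' => 'ok' }, { 'msg' => 'x' * 100 }])
sig = OpenSSL::HMAC.hexdigest('SHA256', 's3cret', payload)
pipeline.call('app.log', { 'sender' => 'tenant-a', 'payload' => payload, 'signature' => sig })   # returns 1
pipeline.call('app.log', { 'sender' => 'tenant-a', 'payload' => payload, 'signature' => 'bad' }) # returns 0
```

Keeping each stage a separate object mirrors the argument above: the strict, slower checks live in front, and the backend can remain a simple, high-performance forwarder.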

@oza
Member Author

oza commented Mar 6, 2012

> So, you can't trust the sender of logs at all, right? I imagine you need "multi-tenant" capability.

Yes, you're right.

> These validations tend to be complex and slow because they need to be strict to prevent potential security holes.
> They have different requirements from the backend aggregation system, which needs simplicity and high performance.

Adding a layer between the trusted and untrusted zones seems like the best solution.

> So it will be another input plugin, like in_secure_forward, rather than a kind of add-on.

Implementing this layering is our next task :-)
We'll share it as soon as possible.
