Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion about new filtering mechanism of v11 #216

Closed
frsyuki opened this issue Nov 21, 2013 · 33 comments
Closed

Discussion about new filtering mechanism of v11 #216

frsyuki opened this issue Nov 21, 2013 · 33 comments
Labels

Comments

@frsyuki
Copy link
Member

frsyuki commented Nov 21, 2013

I have several ideas for the design and need feedback about it. Here is the design document:
https://github.com/fluent/fluentd/wiki/V11-filter-syntax

  • Which idea works well for you the best?
  • Why does it work well for you? How are you using/want to use Fluentd?
  • Which idea doesn't work for you?
@frsyuki frsyuki closed this as completed Nov 21, 2013
@tagomoris
Copy link
Member

Why closed?

@frsyuki frsyuki reopened this Nov 22, 2013
@frsyuki
Copy link
Member Author

frsyuki commented Nov 22, 2013

It's mistake :(

@sonots
Copy link
Member

sonots commented Nov 22, 2013

TL;NR

Idea1

Idea1 looks difficult to grasp. Idea1 passes copied messages like

<filter **>
  type copy
   # message goes here
<filter>
<match **>
   # message goes here
</match>

But, in v10, type copy passes messages to the nested store like

<match **>
   type copy
   <store>
   # message goes here
   </store>
   <store>
   # message goes here
   </store>
</match>

The behavior is completely different, and porting configuration of fluentd v10 to v11 looks hard to do.

Idea2

I simply don't understand what's going on on the example B of Idea2. Where does the label @aggregate_and_archive come from? So, It looked that it is hard to grasp.

<label @aggregate_and_archive>
  # forwarded data go to aggregate, then to archive
  <source>
    type forward
    # source can't have nested match
  </source>

Idea3

Idea3 can not be. It is not completely compatible with v10.

Idea4

This looks good, but I do not like copy directive because both copy and match directive are actually doing the similar thing, activating output plugins.

Why don't you add an option copy to match directive? Like

<match copy **>
  type groupcounter
  <match>
    type mongodb
  </match>
</copy>

# archive
<match **>
  type s3
</match>

I don't know how to add options to apache-like configuration, though....

Idea5

There was no explanation of Idea5, but is this a modified version of Idea4 on the point of to_label?

I feel

source => @label

is a good feature although I think we still should discuss on => symbol.
It is very nice if match section also can have => @label, but it looks difficult because match allows nesting. hmm.

Conclusion

It looks Idea4 is good. I suggested match copy for Idea4. For idea5, I felt we have to think of it more.

@frsyuki
Copy link
Member Author

frsyuki commented Nov 22, 2013

@sonots thank you for your feedback!
Idea 2 is very similar to Idea 3. Instead of having to_lavel parameter, <source> inputs records to the parent <label>.

@tagomoris
Copy link
Member

@sonots I added descriptions about Idea 5.

@tagomoris
Copy link
Member

I think, idea1 is a broken syntax. <filter> type copy </filter> changes what <filter> tag does (drain events or not).
To duplicate events in idea1, we should use <filter **> type copy </filter>.
Processings for same events will be written in different nest level. This is very confusing.

<filter **>
  type copy
  <match **>
    type file
    # 2 level nested
    path /path/to/x.log
  </match>
</filter>

<match **>
  type file
  # 1 level nested
  path /path/to/y.log
</match>

And we must modify tags to judge what to do for each events, with idea1.

@tagomoris
Copy link
Member

NOTE: we can use 'apache' syntax highlighting with GitHub flavored markdown.

@sonots
Copy link
Member

sonots commented Nov 22, 2013

Thank you for description for Idea5 and letting me know about apache syntax :)

Idea 5: <source> is not allowed to have nested matches. Multi-<match> tags in same nested level always get copied events. All matches can emit events to nested child matches only. And configuration syntax is extended for labels.

I before came up with the same thing with Multi-<match> tags in same nested level always get copied events. But, it breaks the compatibility with v10, so I suggested copy option for match directive.

@tagomoris
Copy link
Member

Breaking compatibilities on idea5 is not problem for me, because:

  • v11 configuration syntax and v10 configuration syntax are explicitly different by <worker> tag
  • major version ups exists to break compatibilities

But, copy option for match directive is acceptable for me, too.

<label @procedure>
  <match ** copy>
    type file
  </match>
  <match ** copy>
    type forward
  </match>
</label>

It may better to use other directive name to explain pass to next match, instead of match + copy.

@frsyuki
Copy link
Member Author

frsyuki commented Nov 26, 2013

It may better to use other directive name to explain pass to next match, instead of match + copy.

Do you mean <copy **> may be better? Or <pass **> for example?

@frsyuki
Copy link
Member Author

frsyuki commented Nov 26, 2013

Having "most of" compatibility is still important. It's easy to say "add directive" in a document but difficult to explain if matching rules changed.

@frsyuki
Copy link
Member Author

frsyuki commented Nov 26, 2013

Only reason why current code requires <worker> directive is that it has backward-compatibility mode. If it doesn't have <worker> directive, it uses backward-compatible mode to read the file.
If we have better way to solve it, we don't have to require <worker> directive. Actually I think it's better to NOT require <worker>.

@tagomoris
Copy link
Member

Do you mean <copy *> may be better? Or <pass *> for example?

Alternative directives seems fine, but copy/pass are not so good (probably you think so too)... I cannot find good word for match and pass to next match or to match and not to drain.

@mingqi
Copy link

mingqi commented Nov 26, 2013

I look through 5 ideas and examples. I felt all of them are hard understand
and confused. Could you consider below syntax which implement a new filter
plugin like copy.

For the example A:

type forward

<match **>
type copy

<store>
     type filter
     <store>
         type groupcounter
    </store>
    <store>
        type mongodb
    </store>
</store>

<store>
    type s3
</store>

I don't if this syntax is feasible understand current framework, but I like
this because:

  1. we already have a copy out plugin, we don't need more new copy tag, like
    <copy **>
  2. current fluent design, only two tag(source and match) on the top level
    is very concise. We shouldn't break it. more complex syntax should be
    implement inside plugin.

What's you thoughts?

@frsyuki
Copy link
Member Author

frsyuki commented Nov 26, 2013

Goal of idea 2 and 4 is to make plugin development easy to create a new filter plugin. With idea 2 or 4, Fluentd core can support plugins (type filter in your example) to route messages to inner plugins (groupcounter and mongo).
Idea 1 is more radical because it adds new syntax <filter> for the new filter plugins.

Regarding <copy>, for example, I had this configuration:

<source>
  type forward
</source>

# forward s3.** logs to s3
<match s3.**>
  type s3
</match>

This receives logs and forward "some" logs to S3.
After several months, I wanted to monitor how many records this server received in total. We'll add a copy plugin like this:

<source>
  type forward
</source>

<match **>
  type copy

  # I want to monitor all in-comming logs
  <store>
    type groupcounter
  </store>

  # forward s3.** logs to s3
  <store s3.**>
    type s3
  </store>
</match>

This doesn't work because we can't use match pattern to <store>. If we have <copy> section, it will be simple like this:

<source>
  type forward
</source>

# I want to monitor all in-comming logs
<copy **>
  type groupcounter
</copy>

# forward s3.** logs to s3
<match s3.**>
  type s3
</match>

@frsyuki
Copy link
Member Author

frsyuki commented Nov 26, 2013

@tagomoris do you mean having concept of <copy> is good, but its tag name is not good? How about these names, for example?

  1. <copy **>
  2. <pass **>
  3. <subscribe **>
  4. <sub **>
  5. <monitor **>
  6. <sniff **>
  7. <branch **>
  8. <observe **>

@frsyuki
Copy link
Member Author

frsyuki commented Nov 26, 2013

I removed idea 1 and 3. I agreed idea 1 is semantically broken. Idea 3 won't be better than 2 or 4.

@tagomoris
Copy link
Member

@frsyuki Yes, tag name <copy> seems misunderstanding.
<observe> looks good to mean 'to process events and not to block from matches below'.

@mingqi
Copy link

mingqi commented Nov 27, 2013

@frsyuki . I understand what your meaning. That's my another suggestions: let all matched out plugin can received the message instead of only the first one. Things will become easy if that. For your example in comments, we can:

<source>
  type forward
</source>

# I want to monitor all in-comming logs
<match **>
  type groupcounter
</copy>

# forward s3.** logs to s3
<match s3.**>
  type s3
</match>

@frsyuki
Copy link
Member Author

frsyuki commented Nov 27, 2013

@mingqi Yes, that's a possible idea. But users already have configuration file like this:

# monitor throughout of logs from certain application locally
<match app1.**>
  type groupcounter
  ...
</match>

# forward everything else (logs not from app1) to remote aggregation server
<match **>
  type forward
  <server>
     ...
  </server>
</match>

I think it's confusing for existent users if we change the meaning of <match> tag.
Thus here is an idea to introduce another tag, such as <copy>, <observer>, or <subscribe>.

New users can use the new tag, and existent users can keep using <match>.

@frsyuki
Copy link
Member Author

frsyuki commented Nov 27, 2013

@tagomoris How about <subscribe>? I think "subscribe" has similar meaning.

@frsyuki
Copy link
Member Author

frsyuki commented Nov 27, 2013

@masa please comment here if you have ideas. I think you know a lot of actual users of Fluentd.

@tagomoris
Copy link
Member

@frsyuki <subscribe> seems not to be written with <match>. Target of this tag is events, not event stream abstraction.

<subscribe **>
  type forward
</subscribe>
<match **>
  type file
</match>

Of course, we should hear voices of native English talkers.

@frsyuki
Copy link
Member Author

frsyuki commented Dec 10, 2013

I asked some people around me (physically) about naming. Comments were almost same with yours:

  • subscribe sounds confusing because semantics is different from match
  • observer is ok
  • match with option is clear in terms of meaning of the element, even though it doesn't look short or beautiful

@tagomoris
Copy link
Member

Some more topic about this issue:

  • source/match sections can have nested match?
  • output plugin can emits events?
  • events from output plugin emits are captured with only nested match? same level match? or global match?
  • How about extended syntax for labels instead of nested match?

I'll write some comments for each topics above.

@frsyuki
Copy link
Member Author

frsyuki commented Dec 10, 2013

Here is my idea:

  • source can't have nested match
    • source can emit records to the global (top-level) matches (by default), or a label (by to_label parameter)
  • match can have nested match
  • output plugin can emit records but all of them go to a nested match
    • in other words, match can't emit records to global (top-level) matches. only source can emit to the top-level matches
  • What do you mean extended syntax?

@tagomoris
Copy link
Member

I think that match should be able to emit events to both of nested matches and specified label.
It is needed to aggregate some plugin's output events.

@frsyuki
Copy link
Member Author

frsyuki commented Dec 10, 2013

Sounds good idea.
Example:

# emit to the top-level
<source>
  type forward
</source>

# filter & emit to a nested match
<match ** copy>
  type some_filter
  <match **>
    type file
  </match>
</match>
# pass-through because match has copy option

# filter & emit to a label
<match **>
  type some_filter
  to_label @xyz
</match>

# emit to a label
<source>
  type http
  to_label @xyz
</source>

<label xyz>
  <match **>
    ...
  </match>
</label>

@tagomoris
Copy link
Member

What I mean with extended syntax for labels (instead of nested match is wrong...) is like this:

<match servicename.** copy => @label>
  type foo_filter
</match>

This => @label means to_label @label, and plugin code cannot access this attribute.
By this way, we, plugin developers does not need to consider whether emit() bring events to nested child or specified label.
It's just same as copy specifier about that input events are copied or not.

@tagomoris
Copy link
Member

Syntax should be changed:

<match idea1.** copy => @label >
  type foo_filter
</match>

<match idea2.** copy : @label >
  type foo_filter
</match>

<match idea3.** copy @label >
  type foo_filter
</match>

@frsyuki
Copy link
Member Author

frsyuki commented Dec 10, 2013

By this way, we, plugin developers does not need to consider whether emit() bring events to nested child or specified label.

Base class (Agent class (for now)) will handle to_label parameter. Plugins inherit the base class and call super in configure method. In other words, anyways plugin developers don't have to consider whether events go to a nested match, or a label.

This is good but trivial I think > plugin code cannot access this attribute.

For me, your extra syntax looks confusing. I was confused that the copied events go to the specified label directly.

Syntax should be either of:

<match idea0.** copy>
  type foo_filter
  to_label @label
</match>

or

<match idea4.** copy>
  type foo_filter
  <match>
    type redirect
    to_label @label
  </match>
</match>

or possibly,

<match idea5.** copy>
  type foo_filter
  <match>
    to_label @label  # assuming default type is out_redirect
  </match>
</match>

@YueHonghui
Copy link

How about this example?

<match s3.**>
  type copy
  <store>
    type stdout
  </store>
  <store s3.type1.**>
    type file
    path /path/to/file
  </store>
</match>

@tagomoris
Copy link
Member

v12 is already released!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants