
Tags in Object Key #31

Closed
sgessa opened this issue Jul 11, 2013 · 21 comments

@sgessa

sgessa commented Jul 11, 2013

is it possible to have tags in the object key?

@repeatedly
Member

Currently no.
Do you want to specify %{tag} in s3_object_key_format?

@sgessa
Author

sgessa commented Jul 11, 2013

Yes, please! How can I achieve this? I could implement it locally, but I just started playing with fluentd and plugins.

@repeatedly
Member

How would this be implemented?

We can't assume a single tag in the s3 plugin because each event has its own tag.
For example:

<match foo.**>
  type s3
  # ...
</match>

In this case, events matched by the s3 plugin may have the tags foo.bar, foo.baz, and so on.

@sgessa
Author

sgessa commented Jul 11, 2013

Yep, I just want to add ${tag} to the object key like this:

<match foo.**>
  type s3
  s3_object_key_format %{time_slice}_${tag}_%{index}.%{file_extension}
  # ...
</match>

If an event has the tag foo.bar, for example, I expect to find it in the object key.
Also, if I want to keep only "bar", I should be able to set remove_tag_prefix foo.

Thanks

@repeatedly
Member

Hm.

In your approach, the S3 plugin stores multiple objects into S3 at the same time, right?

@sgessa
Author

sgessa commented Jul 11, 2013

Yes. I need %{tag} because I'm storing access logs grouped by domain, and I'm passing the domain name in the tag.

@repeatedly
Member

Okay. Could you send a pull request?
Maybe error handling and breaking idempotence are important factors.

@sgessa
Author

sgessa commented Jul 11, 2013

I don't know how to implement this; that's why I asked here :(
I started playing with fluentd yesterday :D

@repeatedly
Member

I see. We will need some time to implement it.

@dave7373

There is another plugin that adds this feature to the s3 plugin. Please check it out here:
https://github.com/campanja/fluent-output-router

@jsermeno

We just began using fluentd in production. Right now we're using the plugin that dave7373 mentioned to store logs for each event in a different folder. It's working, although I would like to explore whether there is a more efficient way. The fluent-output-router starts a new fluent-plugin-s3 for every event. This creates a lot of threads if you have a lot of events. Is it because fluentd has a single buffer queue structure that new outputs must be instantiated if you want separate chunks for each event?

In your approach, S3 plugin stores multiple objects into S3 at the same time, right?

In the approach you discussed above, did you mean that in the write method you would split the chunk into separate pieces based on tag and then write each sub-chunk to a different S3 file? The only problem I see here is that you may get very small S3 files if an event only occurs a few times within a chunk, whereas with an individual chunk for each event this would occur less often. Maybe that is not a problem and can be mitigated by making the chunk size larger? Perhaps it is also more efficient than creating a new output for each event. Are there downsides to making the chunk size larger? According to the documentation, the default chunk size is 8m.

I should also mention that I would love to work on the implementation once we agree on the best solution.

Thanks!
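For illustration, here is a minimal plain-Ruby sketch of the split-by-tag idea being discussed for the write method. This is not fluentd or fluent-plugin-s3 code; the event structure and the commented store/key_for helpers are made up for the example:

```ruby
require 'json'

# Hypothetical sketch: group a chunk's events by tag, then upload each
# group as a separate S3 object (one PUT per tag).
def split_by_tag(events) # events: array of [tag, record] pairs
  events.group_by { |tag, _record| tag }
end

events = [
  ['foo.bar', { 'code' => 200 }],
  ['foo.baz', { 'code' => 404 }],
  ['foo.bar', { 'code' => 500 }],
]

split_by_tag(events).each do |tag, group|
  body = group.map { |_tag, record| record.to_json }.join("\n")
  # A real plugin would do one S3 PUT per tag here, e.g.
  # store(key_for(tag), body) -- a partial failure across these
  # multiple PUTs is exactly what breaks idempotence, as noted above.
  puts "#{tag}: #{group.size} records"
end
```

The multiple-PUT loop is where the retry concern raised earlier in the thread comes in: retrying the whole chunk after one failed PUT would re-upload the tags that already succeeded.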

@repeatedly
Member

@jsermeno

We just began using fluentd in production.

Coool 👍

The fluent-output-router starts a new fluent-plugin-s3 for every event

The forest and router plugins create a new output when they receive a new tag, not for every event, so the number of outputs/threads doesn't explode in most cases.

The only problem I see here is that you may get very small S3 files if an event only occurs a few times within a chunk.

Hmm... my concern is error handling.
The S3 plugin and forest-based tag separation use fluentd's retry mechanism when an error occurs.

On the other hand, if we support tag separation in the S3 plugin, we would have to implement our own retry mechanism similar to fluentd's, because tag separation often executes multiple requests to S3. I already mentioned this point:

"Maybe error handling and breaking idempotence are important factors."

Maintaining a duplicated retry feature seems like a high cost for not many advantages, I think.

@jsermeno

The forest and router plugins create a new output when they receive a new tag, not for every event, so the number of outputs/threads doesn't explode in most cases.

Oops, sorry, I did mean a new tag.

Hmm... my concern is error handling.
The S3 plugin and forest-based tag separation use fluentd's retry mechanism when an error occurs.

I see. Do you believe this optimization would be better suited to become part of fluentd itself? Perhaps there could be a configuration option that limits the number of threads somehow. Scribe, for example, has a configuration option to prevent creating a new thread for each category/tag.

Maintaining a duplicated retry feature seems like a high cost for not many advantages, I think.

The cost does seem larger than I initially thought. There are many advantages, though. A number of use cases require a high number of tags, particularly when handling multiple applications. The number of tags in our case could easily exceed 1000 in the near future and could grow larger; we are already at several hundred. The main benefit I see in storing that many tags in separate folders is that if you want to perform analytics on a small subset of events, you do not have to open every file to search for them, which can potentially speed up queries by quite a bit.

@ryanc4

ryanc4 commented Nov 2, 2013

Can we follow the same approach as in this plugin?

https://github.com/fluent/fluent-plugin-mongo/blob/master/lib/fluent/plugin/out_mongo.rb#L93

@repeatedly
Member

Sorry for the late reply.

@jsermeno

Scribe, for example, has a configuration option to prevent creating a new thread for each category/tag.

This is interesting. I will check Scribe source code later.

@ryanc4

Currently no, because the S3 plugin already uses the same approach to separate records by event time.
For most users, forest plus the S3 plugin is enough.
But for jsermeno's case above, we need a better-performing option.

@ryanc4

ryanc4 commented Nov 6, 2013

@repeatedly I don't see the s3 plugin using emit to split by tag. I think allowing splitting by tag would let us do log analysis in S3 (with EMR) more quickly.

@repeatedly
Member

@ryanc4 The S3 plugin itself doesn't extend emit; TimeSlicedOutput, the superclass of the S3 plugin, sets the time-slice string as the key in emit. To support a tag-based key in the S3 plugin, we would need to extend TimeSlicedOutput#emit. The forest plugin is probably the better choice for now unless the user has a special reason.
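To make the keying idea concrete, here is a self-contained sketch in plain Ruby. It does not use fluentd internals; chunk_key and the event layout are invented for the example. It shows the extension being discussed: keying buffered chunks by time slice plus tag instead of time slice alone, so each (slice, tag) pair becomes its own chunk:

```ruby
require 'time'

# Build a buffer key from the event time and tag. TimeSlicedOutput keys
# chunks by the time slice alone; appending the tag gives one chunk per
# (time slice, tag) pair.
def chunk_key(unix_time, tag, slice_format = '%Y%m%d%H')
  "#{Time.at(unix_time).utc.strftime(slice_format)}_#{tag}"
end

buffers = Hash.new { |h, k| h[k] = [] }

events = [
  [Time.utc(2013, 7, 11, 10, 5).to_i,  'foo.bar', { 'msg' => 'a' }],
  [Time.utc(2013, 7, 11, 10, 30).to_i, 'foo.baz', { 'msg' => 'b' }],
  [Time.utc(2013, 7, 11, 11, 0).to_i,  'foo.bar', { 'msg' => 'c' }],
]

events.each do |time, tag, record|
  buffers[chunk_key(time, tag)] << record
end

# buffers now holds three separate chunks:
# "2013071110_foo.bar", "2013071110_foo.baz", "2013071111_foo.bar"
```

Each of those chunks would then map to its own S3 object, which is why flushing can require multiple S3 requests per time slice.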

@repeatedly
Member

@jsermeno I checked Scribe's newThreadPerCategory and now I understand Scribe's buffer and thread management. I will think about implementing the same feature on top of fluentd.

@dieend
Contributor

dieend commented Oct 21, 2014

Is there any update on using tags in the object key?

@repeatedly
Member

You can use fluent-plugin-forest to achieve this: https://github.com/tagomoris/fluent-plugin-forest
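A minimal sketch of such a configuration, based on the forest plugin's documented pattern of wrapping an output type in a <template> section where ${tag} is expanded per tag. The bucket name and path are placeholders, and the exact s3 parameter names should be checked against the s3 plugin version in use:

```
<match foo.**>
  type forest
  subtype s3
  <template>
    s3_bucket my-log-bucket
    path logs/${tag}/
    # ... other fluent-plugin-s3 options ...
  </template>
</match>
```

With this, forest instantiates one s3 output per distinct tag, so events tagged foo.bar and foo.baz end up under different paths without any change to the s3 plugin itself.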

@prtk-ngm

Please provide an example of how to integrate the forest plugin with the s3 plugin to get dynamic tag support in the path.
