Add a feature to clean up position informations on in_tail #1126

Open
thkoch2001 opened this Issue Jul 27, 2016 · 4 comments

@thkoch2001
  • fluentd version: 0.12.20 - commit aee8086
  • environment: fluentd pods running on Google Container Engine / kubernetes

Google Container Engine uses fluentd pods to collect container log files via the in_tail plugin and forward the logs to Stackdriver logging.

When a container is deleted, kubernetes also deletes the container's log file, and there will never be a log file at this filesystem path again.

However, the obsolete line is never removed from the position file, even though its position value is ffffffffffffffff.

We see production clusters with position files of over 10000 lines.
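
To gauge how much of a position file is dead weight, a read-only check along these lines may help. This is a minimal sketch, not part of fluentd: it assumes each line holds tab-separated fields with the path first and the hexadecimal position second, and that ffffffffffffffff marks an entry that is no longer watched, as described above. The function name `count_stale_entries` is hypothetical.

```python
# Assumption: position-file lines look like "<path>\t<position hex>\t<inode hex>".
# Assumption: this position value marks an entry that is no longer watched.
UNWATCHED_POSITION = "ffffffffffffffff"


def count_stale_entries(lines):
    """Return (stale, total) for an iterable of position-file lines."""
    stale = 0
    total = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            # Skip blank or malformed lines instead of guessing their meaning.
            continue
        total += 1
        if fields[1].lower() == UNWATCHED_POSITION:
            stale += 1
    return stale, total


if __name__ == "__main__":
    import sys

    # Usage: python count_stale.py /var/log/gcp-containers.log.pos
    with open(sys.argv[1]) as pos_file:
        stale, total = count_stale_entries(pos_file)
    print(f"{stale} of {total} entries are marked unwatched")
```

The script only reads the file, so it should be safe to run while fluentd is active.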

Can this cause performance problems with fluentd? Should this be fixed?

The config stanza for the container log files is:

<source>
  type tail
  format json
  time_key time
  path /var/log/containers/*.log
  pos_file /var/log/gcp-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag reform.*
  read_from_head true
</source>
@repeatedly
Member
repeatedly commented Jul 27, 2016 edited

in_tail removes untracked file positions during its start phase.
So if you restart fluentd, the pos_file is updated.
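
For readers who want to see what such a cleanup amounts to, here is a rough sketch of an offline compaction step. It is not fluentd's implementation. It assumes tab-separated lines with the position in the second field, treats ffffffffffffffff as the unwatched marker mentioned in this issue, and uses hypothetical names (`live_entries`, `compact_pos_file`). Because it replaces the file, it should only be run while fluentd is stopped.

```python
import os
import tempfile

# Assumption: this position value marks an entry that is no longer watched.
UNWATCHED_POSITION = "ffffffffffffffff"


def live_entries(lines):
    """Return the lines whose position field is not the unwatched marker."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1].lower() == UNWATCHED_POSITION:
            continue  # drop the stale entry
        kept.append(line)
    return kept


def compact_pos_file(path):
    """Rewrite the file at `path` without stale entries; return lines kept.

    Run this only while fluentd is stopped: the file is replaced, so a
    running process would keep writing to the old copy.
    """
    with open(path) as src:
        kept = live_entries(src)
    # Write to a temporary file in the same directory, then swap it in,
    # so an interrupted run does not leave a half-written position file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as dst:
        dst.writelines(kept)
    os.replace(tmp_path, path)
    return len(kept)
```

Restarting fluentd, as described above, remains the supported way to get the file cleaned up; the sketch only illustrates the idea.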

> Can this cause performance problems with fluentd?

I'm not sure. I haven't received any reports of pos_file-related performance issues.

https://github.com/fluent/fluentd/blob/9d0a8dc963d5e10355e3183dff73d198a711a3e8/lib/fluent/plugin/in_tail.rb#L746

You can see the pos_file implementation at the link above.
I think this IO cost is negligible on a normal file system.

@tagomoris
Member
tagomoris commented Jul 29, 2016 edited

A file that grows without bound sounds terrible in a production environment (even if it grows slowly, and even if there are no reports of performance regressions).
I think we should add a feature to the tail plugin to clean it up.

However, we currently plan to switch from the pos file to a storage plugin for that purpose.
I think we can add this cleanup feature at the same time as the switch to storage plugins.

@tagomoris tagomoris changed the title from in_tail plugin does not clean deleted log files from position file to Add a feature to clean up position informations on in_tail Jul 29, 2016
@repeatedly
Member

I added a note about this issue to http://docs.fluentd.org/articles/in_tail.
