Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple log rotations detected in logs for only 1 log rotation #3425

Closed
agniveshadhikari opened this issue Jun 19, 2021 · 14 comments
Closed
Assignees
Labels
stale waiting-for-user Similar to "moreinfo", but especially need feedback from user

Comments

@agniveshadhikari
Copy link

Check CONTRIBUTING guideline first and here is the list to help us investigate the problem.

Describe the bug

We are trying to investigate why we are missing logs, While at it, we noticed that for each actual log rotation that happens, fluentd is logging multiple "detected log rotation lines".

This following three lines were printed when one log rotation happened on this container.

2021-06-19 15:51:47 +0000 [info]: #0 detected rotation of /var/log/containers/data-plane--dataplane-service-798ff9cdff-gtqzh_default_main-bcc4aa2379c08a281bc008248b839d4c845fe95d2f1a55dd4fb676b2d8c5df20.log; waiting 5 seconds
2021-06-19 15:51:47 +0000 [info]: #0 detected rotation of /var/log/containers/data-plane--dataplane-service-798ff9cdff-gtqzh_default_main-bcc4aa2379c08a281bc008248b839d4c845fe95d2f1a55dd4fb676b2d8c5df20.log; waiting 5 seconds
2021-06-19 15:51:47 +0000 [info]: #0 following tail of /var/log/containers/data-plane--dataplane-service-798ff9cdff-gtqzh_default_main-bcc4aa2379c08a281bc008248b839d4c845fe95d2f1a55dd4fb676b2d8c5df20.log

To Reproduce

Expected behavior

Only one log line for [info]: #0 detected rotation is printed.

Your Environment

We are using fluentd on AKS v1.19

  • Fluentd or td-agent version: fluentd --version or td-agent --version
    fluentd 1.12.3
  • Operating system: cat /etc/os-release
    NAME="Ubuntu"
    VERSION="18.04.5 LTS (Bionic Beaver)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 18.04.5 LTS"
    VERSION_ID="18.04"
    HOME_URL="https://www.ubuntu.com/"
    SUPPORT_URL="https://help.ubuntu.com/"
    BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
    PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    VERSION_CODENAME=bionic
    UBUNTU_CODENAME=bionic
    
  • Kernel version: uname -r
    5.4.0-1047-azure

If you hit the problem with older fluentd version, try latest version first.

Your Configuration
Only pasting the kubernetes.conf file here

<label @FLUENT_LOG>
  <match fluent.**>
    @type null
  </match>
</label>

<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type multi_format
    # Read logs in JSON format for Kubernetes v1.18-
    <pattern>
      format json
      time_format "%Y-%m-%dT%H:%M:%S.%NZ"
      keep_time_key true
    </pattern>
    # Reads logs in CRI format for Kubernetes v1.19+
    # The CRI format is documented here: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/kubelet-cri-logging.md
    <pattern>
      format regexp
      expression /^(?<time>.+) (?<stream>stdout|stderr)( (?<logtag>.))? (?<log>.*)$/
      time_format '%Y-%m-%dT%H:%M:%S.%N%:z'
      keep_time_key true
    </pattern>
  </parse>
</source>

# Used for collecting fluentd metrics
<source>
  @type prometheus
  bind 0.0.0.0
  port 24231
  metrics_path /metrics
</source>

<source>
  @type prometheus_output_monitor
  interval 10
  <labels>
    hostname ${hostname}
  </labels>
</source>

<filter kubernetes.var.log.containers.**.log>
  @type kubernetes_metadata
</filter>

# Exclude events from Geneva containers since they just seem to echo events from other containers
<filter kubernetes.var.log.containers.geneva**.log>
  @type grep
  <exclude>
    key log
    pattern .*
  </exclude>
</filter>

<filter kubernetes.var.log.containers.**.log>
  @type prometheus
  <metric>
    name fluentd_input_status_num_records_total
    type counter
    desc The total number of incoming records
    <labels>
      tag ${tag}
      hostname ${hostname}
    </labels>
  </metric>
</filter>

# Make the IfxAuditLogs JSON parsable
<filter kubernetes.var.log.containers.**.log>
  @type record_modifier
  <record>
    _temp_ ${ if record.has_key?("log"); record["log"] = record["log"].to_s.gsub(/ifxaudit(appl|mgmt|fail):*/,''); end; nil }
  </record>
  remove_keys _temp_
</filter>

# Flatten fields nested within the 'log' field if it is JSON
<filter kubernetes.var.log.containers.**.log>
  @type parser
  key_name log
  <parse>
    @type json
    json_parser json
  </parse>
  reserve_data true # this preserves fields from the original record
  remove_key_name_field true # this removes the log field if successfully parsed as JSON
  reserve_time # the time was already parsed in the source, we don't want to overwrite it with current time.
  emit_invalid_record_to_error false # In case of unparsable log lines or CRI logs. Keep fluentd's error log clean
</filter>

# Flatten fields nested within the 'kubernetes' field and remove unnecessary fields
<filter kubernetes.var.log.containers.**.log>
  @type record_transformer
  enable_ruby
  <record>
    ContainerName ${record["kubernetes"]["container_name"]}
    NamespaceName ${record["kubernetes"]["namespace_name"]}
    PodName ${record["kubernetes"]["pod_name"]}
    Node ${record["kubernetes"]["host"]}
  </record>
  # https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/kubelet-cri-logging.md
  remove_keys docker,kubernetes,stream,logtag
</filter>

Your Error Log
See descirption

<!-- Write your **ALL** error log here -->

Additional context

@ashie
Copy link
Member

ashie commented Jun 19, 2021

Could you try v1.12.4 or later?
We fixed tail watcher leaks at: #3365 #3391

@shenmuxiaosen
Copy link

@ashie Looks like the latest td-agent 4.1.1 is still referring the fluentd v.1.12.3. Any idea when td-agent can have a new release for v.1.12.4 or v.1.13.0?

@ashie
Copy link
Member

ashie commented Jun 23, 2021

We'll prepare td-agent 4.2.0 which will bundle fluentd v1.13.x.
Until releasing it, please use td-agent-gem to update it.

$ sudo td-agent-gem install fluentd --version=x.y.z

@justinko
Copy link

@ashie I am experiencing this on 1.13.1. I think there may still be watcher leaks.

2021-06-28 19:20:48 +0000 [info]: #0 [containers.log] detected rotation of /var/log/containers/core-mpe-web-deployment-576fb65b75-m8pkh_core_core-mpe-web-8b7d873d3ef935863469847e07b2957acab0d65f1704a76c50ff830b9342d570.log; waiting 5 seconds
2021-06-28 19:20:48 +0000 [info]: #0 [containers.log] detected rotation of /var/log/containers/core-mpe-web-deployment-576fb65b75-m8pkh_core_core-mpe-web-8b7d873d3ef935863469847e07b2957acab0d65f1704a76c50ff830b9342d570.log; waiting 5 seconds

@cosmo0920
Copy link
Contributor

cosmo0920 commented Jul 1, 2021

Hmm..., the following line

rotated_tw = @tails[rotated_target_info]

should use Hash#delete like this?

rotated_tw = @tails.delete(rotated_target_info)

Hash#delete will remove the corresponding element by key immediately.
Current implementation depends on implicit GC mechanism.

@justinko
Copy link

justinko commented Jul 1, 2021

@cosmo0920 will that fix just remove the duplicate log line or will it potentially fix my log loss?

@cosmo0920
Copy link
Contributor

I think I might do. This issue might be caused by non-GCed elements on @tails.

@ashie
Copy link
Member

ashie commented Jul 1, 2021

should use Hash#delete like this?

rotated_tw = @tails.delete(rotated_target_info)

I'm not sure but it doesn't seem the cause of this issue, I think it's not needed.

@cosmo0920
Copy link
Contributor

cosmo0920 commented Jul 1, 2021

Hmm.., this issue should be occurred in the different place where we already know....

@ashie ashie self-assigned this Jul 12, 2021
@ashie
Copy link
Member

ashie commented Jul 24, 2021

should use Hash#delete like this?

rotated_tw = @tails.delete(rotated_target_info)

I'm not sure but it doesn't seem the cause of this issue, I think it's not needed.

* `rotated_tw` will be unwatched by `detach_watcher_after_rotate_wait`

* Since `rotated_target_info` and `new_target_info` will provide same hash key by [in_tail: Fluentd v1.12.3 fails to handle file rotations on Windows #3365](https://github.com/fluent/fluentd/issues/3365), the value will be replaced by trailing `@tails[new_target_info] = `.

While I was investigating #3464, I confirmed that @tails.delete(rotated_target_info) is needed (although it's not cause of this issue #3425).
Otherwise keys in @tails won't be updated even if they have different inodes for same paths.
When follow_inode true, it will cause detecting multiple rotation (#3464).
On the other hand, when follow_inode is false, multiple rotation won't be occurred because it detected by path and inode won't be used.
So I think it's not cause of this issue #3425 because it doesn't use follow_inode.

@ashie ashie mentioned this issue Jul 27, 2021
@ashie
Copy link
Member

ashie commented Jul 30, 2021

As in #3464, it was reproducible on follow_inode true (already addressed).
But I can't yet reproduce it on follow_inode false case even when I change various conditions...

@ashie
Copy link
Member

ashie commented Sep 6, 2021

@agniveshadhikari It might be already fixed in the latest version (1.13.3 or 1.14.0).
Please let me know if your issue still reproduced on the latest versions.

@ashie ashie added the waiting-for-user Similar to "moreinfo", but especially need feedback from user label Sep 6, 2021
@github-actions
Copy link

github-actions bot commented Dec 5, 2021

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days

@github-actions github-actions bot added the stale label Dec 5, 2021
@github-actions
Copy link

github-actions bot commented Jan 4, 2022

This issue was automatically closed because of stale in 30 days

@github-actions github-actions bot closed this as completed Jan 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale waiting-for-user Similar to "moreinfo", but especially need feedback from user
Projects
None yet
Development

No branches or pull requests

5 participants