Filebeat stops harvesting logs #13607

Open
cdalexndr opened this issue Sep 11, 2019 · 29 comments
Labels
bug, Stalled, Team:Integrations, Team:Platforms

Comments

@cdalexndr

cdalexndr commented Sep 11, 2019

  1. Filebeat harvested logs correctly.
  2. Restarted Logstash.
  3. New logs appeared in Kibana.
  4. After some time, logs from one Docker container stopped being harvested, although new log entries were still written to its log file.

Note that Logstash correctly receives logs from other containers harvested by Filebeat.
The Filebeat harvester didn't recover even after a few hours.

Filebeat configuration:

output.logstash:
  hosts: ["logstash:5044"]
setup.kibana.host: http://kibana:5601
filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true

Docker compose service:

web:
    image: app:latest
    expose:
      - 80
    labels:
      co.elastic.logs/multiline.pattern: '^[[:space:]]+(at|\\.{3})\\b|^Caused by:'
      co.elastic.logs/multiline.negate: false
      co.elastic.logs/multiline.match: after
      app: web
    environment:
      ELASTIC_APM_SERVER_URLS: http://apm:8200

My docker container that is not being harvested has id: 3252b7646a23293b6728941769f0412e2bd4b74b801ee09ab747c7cdfa74550c

This container was also restarted at 2019-09-11 15:19:03,983.
The last correctly processed log entry has timestamp 2019-09-11 15:19:37,235.
The next log entry, with timestamp 2019-09-11 15:20:15,276, and the following ones are missing.

Filebeat relevant log:

2019-09-11T15:19:57.988Z        INFO    [monitoring]    log/log.go:145  Non-zero metrics in the last 30s        {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":28150,"time":{"ms":120}},"total":{"ticks":54510,"time":{"ms":150},"value":54510},"user":{"ticks":26360,"time":{"ms":30}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":13},"info":{"ephemeral_id":"17c964bc-009e-4840-89b6-4b9c511cdf14","uptime":{"ms":15900511}},"memstats":{"gc_next":8939216,"memory_alloc":4663776,"memory_total":6966547888},"runtime":{"goroutines":93}},"filebeat":{"events":{"active":-2,"added":114,"done":116},"harvester":{"open_files":5,"running":5}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":116,"batches":13,"total":116},"read":{"bytes":78},"write":{"bytes":31559}},"pipeline":{"clients":8,"events":{"active":0,"published":114,"total":114},"queue":{"acked":116}}},"registrar":{"states":{"current":10,"update":116},"writes":{"success":13,"total":13}},"system":{"load":{"1":1.19,"15":0.58,"5":0.86,"norm":{"1":0.2975,"15":0.145,"5":0.215}}}}}}
2019-09-11T15:20:04.554Z        INFO    input/input.go:149      input ticker stopped
2019-09-11T15:20:04.554Z        INFO    input/input.go:167      Stopping Input: 9894227653855818847
2019-09-11T15:20:04.554Z        INFO    log/harvester.go:274    Reader was closed: /var/lib/docker/containers/3252b7646a23293b6728941769f0412e2bd4b74b801ee09ab747c7cdfa74550c/3252b7646a23293b6728941769f0412e2bd4b74b801ee09ab747c7cdfa74550c-json.log. Closing.
2019-09-11T15:20:27.995Z        INFO    [monitoring]    log/log.go:145  Non-zero metrics in the last 30s        {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":28160,"time":{"ms":10}},"total":{"ticks":54610,"time":{"ms":100},"value":54610},"user":{"ticks":26450,"time":{"ms":90}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":12},"info":{"ephemeral_id":"17c964bc-009e-4840-89b6-4b9c511cdf14","uptime":{"ms":15930517}},"memstats":{"gc_next":5878912,"memory_alloc":7557664,"memory_total":6986384152,"rss":-2998272},"runtime":{"goroutines":82}},"filebeat":{"events":{"added":31,"done":31},"harvester":{"closed":1,"open_files":4,"running":4}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":30,"batches":12,"total":30},"read":{"bytes":72},"write":{"bytes":17806}},"pipeline":{"clients":7,"events":{"active":0,"filtered":1,"published":30,"total":31},"queue":{"acked":30}}},"registrar":{"states":{"current":10,"update":31},"writes":{"success":13,"total":13}},"system":{"load":{"1":1.24,"15":0.6,"5":0.89,"norm":{"1":0.31,"15":0.15,"5":0.2225}}}}}}

Logstash restart timeline log:

[2019-09-11T15:16:36,298][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"7.3.0"}
..................
[2019-09-11T15:16:45,397][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}

Filebeat registry entry:

  {
    "source": "/var/lib/docker/containers/3252b7646a23293b6728941769f0412e2bd4b74b801ee09ab747c7cdfa74550c/3252b7646a23293b6728941769f0412e2bd4b74b801ee09ab747c7cdfa74550c-json.log",
    "offset": 794342,
    "timestamp": "2019-09-11T15:20:04.5542604Z",
    "ttl": -1,
    "type": "container",
    "meta": null,
    "FileStateOS": {
      "inode": 802588,
      "device": 2049
    }
  }

Log file stats:

sh-4.2# stat 3252b7646a23293b6728941769f0412e2bd4b74b801ee09ab747c7cdfa74550c-json.log
  File: '3252b7646a23293b6728941769f0412e2bd4b74b801ee09ab747c7cdfa74550c-json.log'
  Size: 911932          Blocks: 1784       IO Block: 4096   regular file
Device: 801h/2049d      Inode: 802588      Links: 1
Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-09-11 17:39:48.711393000 +0000
Modify: 2019-09-11 17:19:37.112086700 +0000
Change: 2019-09-11 17:19:37.112086700 +0000
 Birth: -

Hope I provided enough details to investigate this problem.


Elastic stack 7.3.0

@cdalexndr
Author

cdalexndr commented Sep 11, 2019

After restarting the Filebeat Docker service, the missing logs are populated correctly.
Filebeat should have harvested these logs without needing a restart.

@r2r200

r2r200 commented May 15, 2020

I have the same issue on versions from 7.2 through 7.6.2. I have 3 similar containers with Filebeat in Nomad. One of them has a high load (~5000 messages per minute), and it gets stuck about once a day. Restarting the Filebeat container helps.

@henriqueyun

I am also facing this problem with Filebeat 7.0.1; it runs and harvests for a while and eventually stops with those messages.

@Zstro

Zstro commented Jul 22, 2020

I am also facing the same problem with Filebeat 7.7.1. It harvested logs after a restart, and only once.

@nerddelphi

I'm facing the same issue on 7.6.1, 7.8.1, 7.8.2-SNAPSHOT and now 7.9.0, even after a lot of PRs trying to fix it.

My issue is related to k8s autodiscover.

@pauvos

pauvos commented Sep 21, 2020

Happens to me as well (7.9.0, docker autodiscover). Filebeat closes and reopens the missing container's log file several times due to inactivity and then simply stops watching the file after a while.

Just curious, do you guys have log rotation enabled for docker (I do...)?

$ cat /etc/docker/daemon.json 
{
"log-driver": "json-file",
    "log-opts": {
        "max-size": "100m", 
        "max-file": "3"
    }
}

@jsoriano added the bug and Team:Platforms labels Nov 4, 2020
@elasticmachine
Collaborator

Pinging @elastic/integrations-platforms (Team:Platforms)

@jsoriano
Member

The original issue reported in 7.3.0, of inputs being incorrectly stopped on container restarts, was fixed by #13127 in 7.4.0.

@jsoriano
Member

@pauvos could you share the autodiscover configuration you are using?

@cdalexndr
Author

cdalexndr commented Jul 10, 2021

Just happened again on 7.10.1

I'm no longer using Logstash, and the problem was with the nginx (swag) container.
Logs are stored in files inside the container, and I attach a volume to Filebeat so it can access them.
Paths are correct and the files exist; they just aren't harvested.
After a restart, the missing logs are populated.

My config:

output.elasticsearch:
  hosts: ["http://elasticsearch:9200"]
  index: "log-%{[docker][container][labels][app]}"
  pipeline: pipeline_logs
  ilm.enabled: false

filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true
      templates:
        - condition:
            contains:
              docker.container.image: swag
          config:
            - module: nginx
              access:
                enabled: true
                var.paths: [ "/config/log/nginx/access.log*" ]
              error:
                enabled: true
                var.paths: [ "/config/log/nginx/error.log*" ]

@jsoriano added the Team:Integrations label Jul 12, 2021
@jsoriano removed their assignment Jul 12, 2021
@Majroch

Majroch commented Jan 13, 2022

I have the same problem, but with Elastic Agent and Filebeat managed by a Fleet server. After some time (at random), Filebeat just stops sending logs to my Elasticsearch cluster.

I'm using:

  • ElasticSearch version: 7.16.2
  • Elastic Agent version: 7.16.2
  • Filebeat version: 7.16.2

I configure every Elastic Agent on all servers through the Fleet Management tab inside Kibana. In the Filebeat log files under /opt/Elastic/Agent/data/elastic-agent-XXXXXX/logs/default/filebeat-json.log*, I see harvester messages like this many times:

{"log.level":"info","@timestamp":"2022-01-13T06:00:24.459+0100","log.logger":"input.harvester","log.origin":{"file.name":"log/harvester.go","file.line":336},"message":"Reader was closed. Closing.","service.name":"filebeat","input_id":"6f476387-59f8-4ccc-8baa-38f0c646fd3d","source":"/var/log/apt/history.log.1.gz","state_id":"native::142434-64768","finished":false,"os_id":"142434-64768","old_source":"/var/log/apt/history.log.1.gz","old_finished":true,"old_os_id":"142434-64768","harvester_id":"4a471e46-b22a-42af-8d83-df53309cc81f","ecs.version":"1.6.0"}

Until today I've just reinstalled the Elastic Agents with an Ansible playbook whenever that happened.

@Avers90

Avers90 commented Mar 30, 2022

I had a similar issue. It helped me to reduce the logging level to warning:

    logging:
      level: warning
      metrics.enabled: false

@MonicaMagoniCom

I'm experiencing the same issue with Filebeat 8.4.3 using Kubernetes as the autodiscover provider. Are there any updates?

@euskadi31

euskadi31 commented Mar 27, 2023

We have the same problem with autodiscover. Filebeat is installed on the host machine and reads the container logs.

filebeat version 7.16.2 (amd64), libbeat 7.16.2 [3c518f4 built 2021-12-18 21:04:19 +0000 UTC]

@MonicaMagoniCom

I had a similar issue. It helped me to reduce the logging level to warning:

    logging:
      level: warning
      metrics.enabled: false

This does not solve the issue

@belimawr
Contributor

belimawr commented May 3, 2023

This seems to be related to #34388, probably the same issue caused by autodiscover with the Kubernetes provider.

@MonicaMagoniCom Could you provide your configuration and some debug logs? A logging configuration like this should give enough information (I'm making some assumptions about your configuration here) and will not log any event ingested by Filebeat.

logging:
  level: debug
  selectors:
    - autodiscover
    - autodiscover.bus-filebeat
    - autodiscover.pod
    - beat
    - cfgwarn
    - crawler
    - hints.builder
    - input
    - input.filestream
    - input.harvester
    - kubernetes
    - modules
    - seccomp
    - service

@belimawr
Contributor

belimawr commented May 3, 2023

Regarding updates on this issue: we are aware of issues with autodiscover like the one I linked above, and it is on our backlog.

@MonicaMagoniCom

We removed the Kubernetes autodiscover provider from our Filebeat configuration, since we were experiencing the issue. We replaced it with static filebeat.inputs and it is working correctly. So yes, the issue seems to be related to the Kubernetes autodiscover provider.
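
For anyone trying the same workaround, a static container input along these lines should be roughly equivalent (paths and processor settings are illustrative, not our exact config; NODE_NAME is assumed to be provided via the pod spec):

filebeat.inputs:
  - type: container
    paths:
      - /var/log/containers/*.log          # pod logs on the Kubernetes node
    processors:
      - add_kubernetes_metadata:           # re-attach pod/namespace metadata without autodiscover
          host: ${NODE_NAME}
          matchers:
            - logs_path:
                logs_path: "/var/log/containers/"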

@euskadi31

We have the same issue with docker autodiscover

@belimawr
Contributor

belimawr commented May 4, 2023

@euskadi31

We have the same issue with docker autodiscover

Could you provide some debug logs following this configuration:

logging:
  level: debug
  selectors:
    - autodiscover
    - autodiscover.bus-filebeat
    - autodiscover.pod
    - beat
    - cfgwarn
    - crawler
    - hints.builder
    - input
    - input.filestream
    - input.harvester
    - kubernetes
    - docker
    - modules
    - seccomp
    - service

Or at least look at your debug logs and see if you find a message like this:

Error creating runner from config: failed to create input: Can only start an input when all related states are finished: {Id: ea745ab688be85a9-native::3884203-2049, Finished: false, Fileinfo: &{frontend-86c8579b5b-mhnpg_helpdesk-frontend_frontend-mgmt-1cc73434a92abe9b93d9a3d971cfc4182e8ce64ac0e03f0c6e395875236fd514.log 374 416 {204820038 63804978609 0x56347552d700} {2049 3884203 1 33184 0 0 0 0 374 4096 8 {1669381808 728813408} {1669381809 204820038} {1669381809 204820038} [0 0 0]}}, Source: /var/log/containers/frontend-86c8579b5b-mhnpg_helpdesk-frontend_frontend-mgmt-1cc73434a92abe9b93d9a3d971cfc4182e8ce64ac0e03f0c6e395875236fd514.log, Offset: 0, Timestamp: 2023-01-19 13:38:27.166489276 +0000 UTC m=+58865.698641043, TTL: -1ns, Type: container, Meta: map[stream:stdout], FileStateOS: 3884203-2049}

@MonicaMagoniCom

Yes, when we had the problem we were seeing this error with debug logs.

@euskadi31

@belimawr Yes, I will watch it next week and will let you know.

@toms-place

Hi, is there any progress?

We are using Beats version 8.4.3 and experiencing the same problem.
We have a lot of messages saying "End of file reached. Closing because close_eof is enabled.", although we have already manually set close_eof to false.

@belimawr
Contributor

belimawr commented Oct 6, 2023

@toms-place the issue I mentioned (#34388) has been fixed in v8.8.2. Could you update to the latest version and verify if the issue still persists?

Regarding close_eof: are you using the log input, the container input, or the filestream input? The log and container inputs use the close_eof configuration key (docs); the filestream input uses close.reader.on_eof (docs).
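
For reference, the setting differs by input type; a minimal sketch (paths and ids are illustrative):

filebeat.inputs:
  # log and container inputs use close_eof
  - type: container
    paths:
      - /var/log/containers/*.log
    close_eof: false

  # filestream uses the close.reader.* namespace instead
  - type: filestream
    id: app-logs
    paths:
      - /var/log/app/*.log
    close.reader.on_eof: false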

@gaima8

gaima8 commented Oct 6, 2023

We're a paying customer and have a support ticket open with Elastic. We've been running 8.8.2 for a while now, sent debug logs from 8.8.2 about a month ago, and still have the problem.
Today we upgraded to 8.10.2.

@toms-place

@belimawr I will report back once we have updated our prod systems.

Regarding close_eof: we are currently using the container input.

@gaima8

gaima8 commented Nov 7, 2023

FWIW, we've been on Filebeat 8.10.2 for a month now and haven't seen any instances of harvesting stopping.
It hasn't been a solid month of pods running without their log harvesting stopping, since the pods most obviously affected get updated from time to time, but in the last week, for example, nothing was lost.

@belimawr
Contributor

Only with debug logs will I be able to dig into it. A few things that can cause Filebeat to miss some logs:

  1. Log rotation happens too fast and close.on_state_change.removed (for filestream, or the equivalent for the log/container input) is enabled.
  2. Inode reuse. If log rotation reuses an inode, Filebeat can start reading the new file from the wrong offset. The fingerprint file identity can help with that (see the sketch below).
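
A minimal sketch of enabling the fingerprint file identity on a filestream input, assuming a recent 8.x release (the paths and input id are illustrative):

filebeat.inputs:
  - type: filestream
    id: container-logs
    paths:
      - /var/log/containers/*.log
    # identify files by a hash of their first bytes rather than inode+device,
    # so an inode reused after rotation is treated as a new file
    file_identity.fingerprint: ~
    prospector.scanner.fingerprint.enabled: true
    prospector.scanner.fingerprint.offset: 0
    prospector.scanner.fingerprint.length: 1024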

@botelastic

botelastic bot commented Nov 9, 2024

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

@botelastic bot added the Stalled label Nov 9, 2024