[Filebeat] Memory leak associated with failure to setup harvesters #6797

Closed
jeremydonahue opened this Issue Apr 6, 2018 · 3 comments

jeremydonahue commented Apr 6, 2018

Filebeat has a memory leak that is exposed by repeated failures to set up harvesters (e.g., permission denied). This ticket is a result of the original discussion on the community support site.

From what I can tell, this comes down to channel.SubOutlet being created for the new harvester but not cleaned up properly when h.Setup() returns an error. I think this is because the outlet is only cleaned up by something that happens in harvester.Run(). Another possibility is that the new harvester is added to a state or registry structure somewhere and forgotten, so the garbage collector never reclaims it.
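
To make the suspected mechanism concrete, here is a minimal, self-contained sketch. It is not Filebeat's code: `outlet`, `newOutlet`, `setup`, `startHarvester`, and `leaked` are hypothetical stand-ins. It assumes, as the profile below suggests, that the outlet constructor starts a goroutine that only exits once its channel is closed, and that nothing closes it when setup fails.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"time"
)

// outlet is a simplified, hypothetical stand-in for a sub-outlet: its
// constructor starts a goroutine that only exits once ch is closed.
type outlet struct {
	ch chan string
}

func newOutlet() *outlet {
	o := &outlet{ch: make(chan string)}
	go func() {
		for range o.ch { // blocks until Close is called
		}
	}()
	return o
}

func (o *outlet) Close() { close(o.ch) }

// setup is a hypothetical setup step that always fails, e.g. with
// "permission denied" when opening the file.
func setup() error { return errors.New("permission denied") }

// startHarvester creates the outlet before setup. With closeOnError set
// to false, the error path returns without closing the outlet, so its
// goroutine stays blocked forever.
func startHarvester(closeOnError bool) error {
	o := newOutlet()
	if err := setup(); err != nil {
		if closeOnError {
			o.Close()
		}
		return err
	}
	defer o.Close()
	return nil
}

// leaked reports how many extra goroutines remain after n failed starts.
func leaked(n int, closeOnError bool) int {
	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		_ = startHarvester(closeOnError)
	}
	time.Sleep(100 * time.Millisecond) // give closed goroutines time to exit
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("leaked without cleanup:", leaked(100, false))
	fmt.Println("leaked with cleanup:", leaked(100, true))
}
```

If the assumption holds, the first count should be roughly the number of failed starts and the second close to zero. Each scan that retries the unreadable file would then add another stuck goroutine, which would match a slow, steady leak.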

Here's some output from pprof. I've also attached the pprof file here.

33159.58kB of 33159.58kB total (  100%)
Dropped 308 nodes (cum <= 165.80kB)
Showing top 10 nodes out of 51 (cum >= 512.04kB)
      flat  flat%   sum%        cum   cum%
19975.92kB 60.24% 60.24% 19975.92kB 60.24%  runtime.malg
 7680.66kB 23.16% 83.40%  7680.66kB 23.16%  github.com/elastic/beats/filebeat/channel.SubOutlet
 2048.19kB  6.18% 89.58%  2048.19kB  6.18%  github.com/elastic/beats/filebeat/prospector/log.NewHarvester
 1357.91kB  4.10% 93.68%  1357.91kB  4.10%  runtime.allgadd
 1024.08kB  3.09% 96.76%  1024.08kB  3.09%  runtime.acquireSudog
  544.67kB  1.64% 98.41%   544.67kB  1.64%  github.com/elastic/beats/libbeat/publisher/queue/memqueue.NewBroker
  528.17kB  1.59%   100%   528.17kB  1.59%  regexp.(*bitState).reset
         0     0%   100%   528.17kB  1.59%  github.com/elastic/beats/filebeat/beater.(*Filebeat).Run
         0     0%   100%   512.04kB  1.54%  github.com/elastic/beats/filebeat/channel.CloseOnSignal.func1
         0     0%   100%   512.04kB  1.54%  github.com/elastic/beats/filebeat/channel.SubOutlet.func1
(pprof) list SubOutlet
Total: 32.38MB
ROUTINE ======================== github.com/elastic/beats/filebeat/channel.SubOutlet in /home/jeremy/src/go/src/github.com/elastic/beats/filebeat/channel/util.go
    7.50MB     7.50MB (flat, cum) 23.16% of Total
         .          .     15:// SubOutlet create a sub-outlet, which can be closed individually, without closing the
         .          .     16:// underlying outlet.
         .          .     17:func SubOutlet(out Outleter) Outleter {
         .          .     18:	s := &subOutlet{
         .          .     19:		isOpen: atomic.MakeBool(true),
       1MB        1MB     20:		done:   make(chan struct{}),
       2MB        2MB     21:		ch:     make(chan *util.Data),
    4.50MB     4.50MB     22:		res:    make(chan bool, 1),
         .          .     23:	}
         .          .     24:
         .          .     25:	go func() {
         .          .     26:		for event := range s.ch {
         .          .     27:			s.res <- out.OnEvent(event) 

profile001
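
As far as I understand, runtime.malg is where the Go runtime allocates a new goroutine, so seeing it at the top of the heap profile points at goroutines piling up rather than at plain data. A rough sketch of how the goroutine count can be checked from inside a Go program with the standard runtime/pprof package follows; `goroutineCount` is a hypothetical helper and none of this is Filebeat code.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// goroutineCount returns the number of goroutines currently recorded in
// the built-in "goroutine" profile.
func goroutineCount() int {
	return pprof.Lookup("goroutine").Count()
}

func main() {
	block := make(chan struct{})
	before := goroutineCount()

	// Start a few goroutines that stay blocked, like a forwarder that is
	// waiting on a channel nobody closes.
	for i := 0; i < 5; i++ {
		go func() { <-block }()
	}
	fmt.Println("goroutines added:", goroutineCount()-before)

	// Dump the stacks of all goroutines; identical stacks are grouped,
	// which makes a repeated leak site easy to spot.
	_ = pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)

	close(block) // release the blocked goroutines
}
```

Sampling this count over time while the permission error keeps occurring is one way to confirm that the growth comes from goroutines and not from something else.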

Config (abridged):

filebeat.modules:
  - module: system

# yes, we know fqdn is in the `beats` field
fields_under_root: true
fields:
  source_host: {{ fqdn }}

filebeat.shutdown_timeout: 10s
filebeat.registry_flush: 30s

output.kafka:
  hosts: ["kafka-01..."]

  topic: {{ topic }}

  required_acks: 1
  compression: snappy
  client_id: '{{ client_id }}'
  keep_alive: 10m

  partition.round_robin:
    reachable_only: true

logging.level: info

Note, in the above scenario, /var/log/auth.log is the file which Filebeat can't access. We can, of course, fix the permissions to alleviate this problem, but that doesn't really fix anything.

  • Version: Filebeat 6.2.2
  • Operating System: Ubuntu 14.04. Kernel: 4.4.0-111-generic
  • Steps to Reproduce: Run Filebeat with the above config. Make sure at least 1 file found by the prospector generates a permission denied error when opening the file.
  • Notes: This is a slow leak, which makes testing it harder. I think lowering scan_frequency from its default (e.g., to 1s) will make it leak memory faster.

Please let me know if there are more details I can provide, and I'm happy to test any potential solutions when we narrow down the problem.

Thanks,
Jeremy

@adriansr adriansr assigned adriansr and unassigned adriansr Apr 9, 2018

adriansr added a commit to adriansr/beats that referenced this issue Apr 11, 2018

Filebeat: Fix leak in log harvester (elastic#6797)
This patch reorganizes a little bit how the log harvester works, so that
suboutlets are only created when the harvester is ready to use them
(inside Run()), instead of being passed during constructor.

Fixes elastic#6797

adriansr added a commit to adriansr/beats that referenced this issue Apr 11, 2018

Filebeat: Fix leak in log harvester (elastic#6797)
This patch reorganizes a little bit how the log harvester works, so that
suboutlets are only created when the harvester is ready to use them
(inside Run()), instead of being passed during constructor.

This prevents a memory leak caused by some internal goroutines not
stopping if the harvester Setup() fails, for example when files cannot
be read.

Fixes elastic#6797

adriansr added a commit to adriansr/beats that referenced this issue Apr 13, 2018

Filebeat: Fix leak in log harvester (elastic#6797)
This patch reorganizes a little bit how the log harvester works, so that
suboutlets are only created when the harvester is ready to use them
(inside Run()), instead of being passed during constructor.

This prevents a memory leak caused by some internal goroutines not
stopping if the harvester Setup() fails, for example when files cannot
be read.

Fixes elastic#6797

@urso urso closed this in #6829 Apr 13, 2018

urso added a commit that referenced this issue Apr 13, 2018

Filebeat: Fix leak in log harvester (#6797) (#6829)
This patch reorganizes a little bit how the log harvester works, so that
suboutlets are only created when the harvester is ready to use them
(inside Run()), instead of being passed during constructor.

This prevents a memory leak caused by some internal goroutines not
stopping if the harvester Setup() fails, for example when files cannot
be read.

Fixes #6797
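
The approach described in the commit message can be sketched roughly as follows. This is a simplified illustration under assumptions, not the actual patch: the `harvester` type, its `connect` field, `newOutlet`, `start`, and the `started` counter are all hypothetical names. The point is only that the outlet, and the goroutine behind it, is created inside Run(), so a failing Setup() leaves nothing behind to clean up.

```go
package main

import (
	"errors"
	"fmt"
)

var errDenied = errors.New("permission denied")

// started counts how many outlets (and forwarding goroutines) were
// created. It exists only for this illustration.
var started int

// outlet is a hypothetical stand-in for a sub-outlet.
type outlet struct {
	ch   chan string
	done chan struct{}
}

func newOutlet() *outlet {
	o := &outlet{ch: make(chan string), done: make(chan struct{})}
	started++
	go func() {
		defer close(o.done)
		for range o.ch { // forward events until the channel is closed
		}
	}()
	return o
}

// Close stops the forwarding goroutine and waits for it to exit.
func (o *outlet) Close() {
	close(o.ch)
	<-o.done
}

// harvester receives a function that creates the outlet instead of an
// already-created outlet.
type harvester struct {
	readable bool
	connect  func() *outlet
}

func (h *harvester) Setup() error {
	if !h.readable {
		return errDenied
	}
	return nil
}

// Run creates the outlet only once the harvester is ready to publish.
func (h *harvester) Run() error {
	out := h.connect()
	defer out.Close()
	out.ch <- "line"
	return nil
}

func start(h *harvester) error {
	if err := h.Setup(); err != nil {
		return err // no outlet exists yet, so nothing can leak here
	}
	return h.Run()
}

func main() {
	for i := 0; i < 3; i++ {
		err := start(&harvester{readable: false, connect: newOutlet})
		fmt.Println("start failed:", err)
	}
	fmt.Println("outlets started after failures:", started)
	_ = start(&harvester{readable: true, connect: newOutlet})
	fmt.Println("outlets started after one success:", started)
}
```

Passing a constructor function rather than a ready-made outlet keeps ownership simple: whichever code creates the outlet is also the code that closes it.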

andrewkroh commented Apr 18, 2018

Fixed in #6829 (for master).

adriansr added a commit to adriansr/beats that referenced this issue May 18, 2018

Filebeat: Fix leak in log harvester (elastic#6797)
This patch reorganizes a little bit how the log harvester works, so that
suboutlets are only created when the harvester is ready to use them
(inside Run()), instead of being passed during constructor.

This prevents a memory leak caused by some internal goroutines not
stopping if the harvester Setup() fails, for example when files cannot
be read.

Fixes elastic#6797

ph added a commit that referenced this issue May 24, 2018

Filebeat: Fix leak in log harvester (#6797)
This patch reorganizes a little bit how the log harvester works, so that
suboutlets are only created when the harvester is ready to use them
(inside Run()), instead of being passed during constructor.

This prevents a memory leak caused by some internal goroutines not
stopping if the harvester Setup() fails, for example when files cannot
be read.

Fixes #6797

jo3rg commented Jun 13, 2018

@andrewkroh is this already fixed in any stable release?

ph commented Jun 13, 2018

This will be fixed in the 6.3.0 release and in the upcoming 6.2.5, which should happen any day.

mpfz0r added a commit to Graylog2/collector-sidecar that referenced this issue Sep 21, 2018

Update filebeat and winlogbeat to 6.4.1
Our bundled filebeat had a memory leak (elastic/beats#6797).

Fixes #283

While here:
 The "-configtest" option is deprecated since Beats 6.0.
 Add a version switch to avoid warning.

mariussturm added a commit to Graylog2/collector-sidecar that referenced this issue Oct 2, 2018

Update filebeat and winlogbeat to 6.4.1 (#296)
Our bundled filebeat had a memory leak (elastic/beats#6797).

Fixes #283

While here:
 The "-configtest" option is deprecated since Beats 6.0.
 Add a version switch to avoid warning.