Possible memory issues with logstash 1.5.1 and higher (1.5.0 is OK) #3754

Closed
mzubal opened this Issue Aug 19, 2015 · 7 comments


mzubal commented Aug 19, 2015

Hi,
there have been several other bug reports about logstash consuming too much memory, and I have run into similar issues.
Environment:

  • windows server 2012
  • JRE 1.8.0_60
  • logstash configuration is pasted below
  • installed logstash-filter-aggregate
  • ES_MAX_MEM = 384m

Running 1.5.1 or higher (I have tried all of them) causes the Java process to slowly consume more and more memory (as can be seen in the screenshot below), ending up with GBs of memory used. The Java heap is fine, as the screenshots show - I suspect native memory usage is the problem here. I tried removing all filters and the elasticsearch output (keeping just the inputs and the stdout output) with the same result. On 1.5.0 this doesn't happen (again, see the screenshot below).
This leads me to the conclusion that something changed in 1.5.1 to cause this behavior.
I would be very glad if you could check this. Thanks!

logstash-1.5.0: [screenshot of memory usage]
logstash-1.5.3: [screenshot of memory usage]

logstash config:

input {
  file {
    path => [
      "C:/inetpub/wwwroot/**/logs/**/ALL.log*",
      "C:/inetpub/wwwroot/**/logs/**/FATAL.log*",
      "C:/DAE_JMS/**/logs/**/ALL.log*",
      "C:/DAE_JMS/**/logs/**/FATAL.log*"          
    ]
    type => damas_log
  }
  file {
    path => [

      "C:/inetpub/wwwroot/**/logs/**/RequestPerformance.csv*"

    ]
    type => damas_perf_log
  }
}

filter {
    mutate {
      add_field => [ "environment", "NBS10" ]
      add_field => [ "application", "DAMAS" ]
    }

    if [type] == "damas_log" {

        if ([message] =="[Header]") {
            drop{}
        }
        if ([message] =="[Footer]") {
            drop{}
        }   

        grok {
            match => { "message" => "%{DATA}OID:\s*%{NUMBER:oid} %{DATA}" }     
            tag_on_failure => []
        }

        grok {
            match => { "message" => "%{DATA}Message ID:\s*%{WORD:messageId} %{DATA}" }      
            tag_on_failure => []
        }

        csv{
            columns => ["dateTime", "threadOrJob", "requestId", "logLevel", "messageType", "message", "UC", "RequestId", "DurationMs", "oid"]
            separator => "; "
        }   

        if ([message] == "BSS_INTERFACE: Perf") {
            mutate {
                add_field => [ "meter", "METER_DF_BO" ]
            }
        }

        multiline {
            pattern => "(^.+Exception: .+)|(^...at.+)|(^...---.+---)"
            what => "previous"
        }

        date {
          match => [ "dateTime" , "yyyy-MM-dd HH:mm:ss,SSS"]      
        }
    } else {

        mutate {
            add_field => [ "meter", "METER_GUI_BO" ]
        }
        csv {
            columns => ["userLogin", "userName", "requestDateBegin", "requestTimeBegin", "serverName", "requestID", "ucCode", "pageUrl", "pageUrlQuery", "dbDuration", "serverDuration", "clientDuration", "clientDuration1", "clientDuration2", "clientDuration3", "clientDuration4", "column17", "column18", "column19", "column20", "userCode", "subjectCode", "objectOID", "objectFullCode", "isAsync", "isError", "viewAction", "column28", "column29", "column30", "column31", "column32", "column33", "column34"]
            separator => ";"
        }
        mutate {
            add_field => [ "dateTime", "%{requestDateBegin} %{requestTimeBegin}"]
        }

        ruby { code => "
            event['dbDuration'] = event['dbDuration'].to_s.gsub(',','.').to_f * 1000;
            event['serverDuration'] = event['serverDuration'].to_s.gsub(',','.').to_f * 1000;
            event['clientDuration1'] = event['clientDuration1'].to_s.gsub(',','.').to_f * 1000;
            event['clientDuration2'] = event['clientDuration2'].to_s.gsub(',','.').to_f * 1000;
            event['clientDuration3'] = event['clientDuration3'].to_s.gsub(',','.').to_f * 1000;
            event['clientDuration4'] = event['clientDuration4'].to_s.gsub(',','.').to_f * 1000;

            event['clientDuration'] = (event['clientDuration1'] || 0) + (event['clientDuration2'] || 0) + (event['clientDuration3'] || 0) + (event['clientDuration4'] || 0);

            event['DurationMs'] = (event['clientDuration'] || 0) + (event['serverDuration'] || 0);
        " 
        }

        if ([viewAction]) {
             aggregate {
                 task_id => "%{requestID}"
                 code => "
                    map['dbDuration'] = event['dbDuration'];
                    map['serverDuration'] = event['serverDuration'];
                    map['objectOID'] = event['objectOID'];
                    map['objectFullCode'] = event['objectFullCode'];
                    map['viewAction'] = event['viewAction'];
                 "
             }
             drop{}
        }

        if (![viewAction]) {
             aggregate {
                 task_id => "%{requestID}"
                 code => "
                    event['DurationMs'] = (map['serverDuration'] || 0) + (event['clientDuration'] || 0);

                    event['dbDuration'] = map['dbDuration'];
                    event['serverDuration'] = map['serverDuration'];
                    event['objectOID'] = map['objectOID'];
                    event['objectFullCode'] = map['objectFullCode'];
                    event['viewAction'] = map['viewAction'];
                 "
                 end_of_task => true
                 timeout => 120
             }
        }

        mutate {
            convert => { "DurationMs" => "integer" }
        }

        date {
          match => [ "dateTime" , "dd.MM.yyyy HH:mm:ss,SSS"]      
        }
    }
}

output {
  elasticsearch {   
    host => "192.168.1.7"  
    cluster => "nbselasticcluster"
    protocol => "http" 
    index => "logstash-nbs-damas-%{+YYYY.MM.dd}" 
  }
  stdout {
    codec => rubydebug
  }
}
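For readers skimming the config: the ruby filter normalizes decimal-comma duration strings into milliseconds, and the two aggregate filters correlate a request's pair of events by requestID. A minimal plain-Ruby sketch of both steps (field names are taken from the config above; the `Correlator` class is a hypothetical stand-in for the aggregate filter's internal per-task map, not the actual logstash-filter-aggregate API):

```ruby
# Decimal-comma seconds -> float milliseconds; nil input becomes 0.0.
def to_millis(value)
  value.to_s.gsub(',', '.').to_f * 1000
end

to_millis("0,125")  # => 125.0

# Stand-in for the aggregate filter's task map: the first event (the
# `if [viewAction]` branch) stashes its fields under the requestID;
# the matching event (the `if ![viewAction]` branch) pulls them back
# and computes the total duration, ending the task.
class Correlator
  def initialize
    @maps = Hash.new { |h, k| h[k] = {} }
  end

  def first_event(event)
    map = @maps[event['requestID']]
    %w[dbDuration serverDuration objectOID objectFullCode viewAction].each do |f|
      map[f] = event[f]
    end
  end

  def second_event(event)
    map = @maps.delete(event['requestID']) || {}
    event['DurationMs'] = (map['serverDuration'] || 0) + (event['clientDuration'] || 0)
    map.each { |k, v| event[k] = v }
    event
  end
end
```

The real filter additionally expires unmatched tasks after `timeout => 120` seconds, which this sketch omits.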

elvarb commented Aug 19, 2015

Could you check out how the total thread count is over time?


mzubal commented Aug 20, 2015

It is visible in the bottom-right corner of both screenshots: the thread count remains practically constant over time.


coffeepac commented Oct 20, 2015

I'm seeing the same behavior on win2k8r2 + logstash 1.5.4 + JRE 1.8.0_60, with a file input and kafka output. Going to try downgrading to logstash 1.5.0 and see if that fixes things.

EDIT: it's better, but I'm still seeing an inexorable climb in usage as reported by perfmon.


jsvd commented Nov 5, 2015

This problem has been identified as a leak in JRuby >= 1.7.20, which started shipping with logstash 1.5.1.
More info in jruby/jruby#3446.


mzubal commented Nov 10, 2015

Thanks to all of you for digging this out. Are there any plans to fix this in 1.x, or only in 2.1?


suyograo commented Nov 13, 2015

@mzubal this will be backported to 1.5


suyograo commented Mar 1, 2016

Fixed in jruby/jruby#3446; included in LS versions 1.5.6 and 2.1.0.

suyograo closed this Mar 1, 2016
