Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue after upgrading from 2.3.x to 5.6.3 (70k -> 8k eps) #8560

Closed
marioluan opened this issue Oct 31, 2017 · 13 comments
Closed
Assignees

Comments

@marioluan
Copy link

marioluan commented Oct 31, 2017

After upgrading from 2.3.4 to 5.6.3, I noticed two performance issues:

  1. Grok filter using array is slower
  2. Overall performance is slower
  • Version:
    5.6.3

  • Operating System:
    RHEL 5.3

  • Config File :

# MULTIPLE grok filters
input {
    generator {
        type => foo
        message => "random message, la la la"
    }
}

filter {
    grok {
        match => {
            "message" => "^1 foo 1 bar$"
        }
    }
    grok {
        match => {
            "message" => "^2 foo 2 bar$"
        }
    }
    grok {
        match => {
            "message" => "^3 foo 3 bar$"
        }
    }
    grok {
        match => {
            "message" => "^4 foo 4 bar$"
        }
    }
    grok {
        match => {
            "message" => "^5 foo 5 bar$"
        }
    }
    grok {
        match => {
            "message" => "^6 foo 6 bar$"
        }
    }
    grok {
        match => {
            "message" => "^7 foo 7 bar$"
        }
    }
    grok {
        match => {
            "message" => "^8 foo 8 bar$"
        }
    }
    grok {
        match => {
            "message" => "^9 foo 9 bar$"
        }
    }
    grok {
        match => {
            "message" => "^10 foo 10 bar$"
        }
    }
    grok {
        match => {
            "message" => "^11 foo 11 bar$"
        }
    }
    grok {
        match => {
            "message" => "^12 foo 12 bar$"
        }
    }
    grok {
        match => {
            "message" => "^13 foo 13 bar$"
        }
    }
    grok {
        match => {
            "message" => "^14 foo 14 bar$"
        }
    }
    grok {
        match => {
            "message" => "^15 foo 15 bar$"
        }
    }
    grok {
        match => {
            "message" => "^16 foo 16 bar$"
        }
    }
    grok {
        match => {
            "message" => "^17 foo 17 bar$"
        }
    }
    grok {
        match => {
            "message" => "^18 foo 18 bar$"
        }
    }
    grok {
        match => {
            "message" => "^19 foo 19 bar$"
        }
    }
    grok {
        match => {
            "message" => "^20 foo 20 bar$"
        }
    }

    metrics {
        meter => "events"
        add_tag => "metric"
    }
}

output {
    if "metric" in [tags] {
        stdout {
            codec => line {
                format => "rate_1m: %{[events][rate_1m]}, rate_5m: %{[events][rate_5m]}"
            }
        }
    }
}
# SINGLE grok filter (array)
input {
    generator {
        type => foo
        message => "random message, la la la"
    }
}

filter {

    grok {
        match => {
            "message" => [
                "^1 foo 1 bar$",
                "^2 foo 2 bar$",
                "^3 foo 3 bar$",
                "^4 foo 4 bar$",
                "^5 foo 5 bar$",
                "^6 foo 6 bar$",
                "^7 foo 7 bar$",
                "^8 foo 8 bar$",
                "^9 foo 9 bar$",
                "^10 foo 10 bar$",
                "^11 foo 11 bar$",
                "^12 foo 12 bar$",
                "^13 foo 13 bar$",
                "^14 foo 14 bar$",
                "^15 foo 15 bar$",
                "^16 foo 16 bar$",
                "^17 foo 17 bar$",
                "^18 foo 18 bar$",
                "^19 foo 19 bar$",
                "^20 foo 20 bar$"
            ]
        }
    }

    metrics {
        meter => "events"
        add_tag => "metric"
    }
}

output {
    if "metric" in [tags] {
        stdout {
            codec => line {
                format => "rate_1m: %{[events][rate_1m]}, rate_5m: %{[events][rate_5m]}"
            }
        }
    }
}
  • Sample Data:
version 5.6.3 2.3.4
single grok (array) 8k 70k
multiple grok 23k 40k
  • Steps to Reproduce:
    Run logstash with the provided configuration for 5 minutes and check rate_1m.
@marioluan marioluan changed the title Performance issue after upgrading from 2.3.x to 5.6.3 Performance issue after upgrading from 2.3.x to 5.6.3 (70k -> 8k eps) Oct 31, 2017
@andrewvc
Copy link
Contributor

andrewvc commented Nov 1, 2017

Thanks for reporting this. After investigating this, the cause is the new timeout enforcement feature in grok. While this is quite useful, it is inherently expensive. I think we should add a flag to disable that if users want it.

@andrewvc
Copy link
Contributor

andrewvc commented Nov 2, 2017

I verified this, BTW, by commenting the watchdog code in/out in the vendor directory. Throughput on my box went from 14k/thread to 30k/thread.

A few attempted optimizations only yielded a 5k/s improvement.

This needs to be researched and improved in a more methodical way than my own ad-hoc experiment.

@IrlJidel
Copy link
Contributor

IrlJidel commented Nov 2, 2017

@andrewvc Thanks for looking at this.

What watchdog code did you comment out - was this in grok plugin or in core? Would setting timeout_millis => 0 have the same affect?

One thing that was really surprising in perf testing was that single grok with 20 patterns is much worse than 20 groks with one pattern. Is this a seperate issue or still related to the timeout enforcement?

@original-brownbear
Copy link
Member

original-brownbear commented Nov 5, 2017

@andrewvc see logstash-plugins/logstash-filter-grok#125, could make it more than twice as fast for the posted config without changes to functionality.

If we want more, I'd go with your suggestion to simply bypass the timeout logic when the timeout is set to 0. That seems to be the best value per unit of effort :) Let me know if you want me to make a PR for that too.

@marioluan
Copy link
Author

@andrewvc thanks for your feedback.

@original-brownbear thanks for digging into this :).

I rerun the benchmark with your patch, and the results look great:

version 5.6.3 2.3.4 5.6.3 (patched)
single grok (array) 8k 70k 54k
multiple grok 23k 40k 32k

@original-brownbear
Copy link
Member

@marioluan nice, thanks a lot for benchmarking this on 5.6.3!

I admit I only ran it against master (which likely explains the larger gain I saw, master has a somewhat lower framework overhead than 5.6.3 :) => more relative change there)

@marioluan
Copy link
Author

@original-brownbear rerun the benchmark with your new patch (logstash-plugins/logstash-filter-grok@1601d93).

version 5.6.3 2.3.4 5.6.3 (patch logstash-plugins/logstash-filter-grok@e38113b) 5.6.3 (patch logstash-plugins/logstash-filter-grok@1601d93)
single grok (array) 8k 70k 54k 60k
multiple grok 23k 40k 32k 32k

@original-brownbear
Copy link
Member

@marioluan ah nice, thanks :)
@andrewvc got some time to look at this again? :)

@marioluan
Copy link
Author

marioluan commented Nov 22, 2017

hey guys, do you have any expectation when this will be fixed/pushed/merged?

@original-brownbear
@andrewvc

@andrewvc
Copy link
Contributor

@marioluan we have just published version 3.4.5 of the grok plugin, that should correctly handle this issue.

@marioluan
Copy link
Author

marioluan commented Nov 28, 2017

That's wonderful news!
I've tested against the released version (4.0.0) and it fixed the issue.
Thanks a million! ;) @andrewvc @original-brownbear

@jmreymond
Copy link

How to install this new version ?
I have tried the following statements but it does not work :-(

bin/logstash-plugin install --version 3.4.5 logstash-filter-grok

Validating logstash-filter-grok-3.4.5
Plugin logstash-filter-grok version 3.4.5 does not exist
ERROR: Installation aborted, verification failed for logstash-filter-grok 3.4.5

@original-brownbear
Copy link
Member

@jmreymond the version is 4.0.0 not 3.4.5. Sorry for the mixup above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants