feat: Improved Drain pattern extraction algorithm (+optimization pass) #13003

Closed
wants to merge 15 commits into from


benclive
Contributor

What this PR does / why we need it:
Introduces a slightly modified Drain algorithm to extract patterns with the pattern-ingester.

  • Instead of pre-processing the whole string for all placeholders, this modification first extracts certain common pieces (TIMESTAMP, DURATION), and then parses out further values only if we see multiple NUM or HEX attributes at the same position in a log line. This retains the static value if it only appears once (e.g. logs.go:48), whereas if we see many values appear together (e.g. logs.go:48, logs.go:23, logs.go:352), they are grouped under logs.go:, which massively aids readability.
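
As a rough illustration of the "only generalize when multiple values share a position" idea, here is a minimal sketch. It is not the code from this PR: the names `mergeTokens` and `extractPattern`, the whitespace tokenization, the `<NUM>` and `<_>` placeholder spellings, and the rule that every digit run counts as a NUM candidate are all assumptions made for the example. The TIMESTAMP/DURATION pre-processing and HEX handling are left out.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// digits matches a run of decimal digits. Treating every digit run as a NUM
// candidate is a simplification made for this sketch.
var digits = regexp.MustCompile(`[0-9]+`)

// mergeTokens combines the tokens observed at one position across a group of
// lines. Identical tokens stay static; tokens that differ only in their digit
// runs get those runs replaced by <NUM>; anything else becomes a wildcard.
func mergeTokens(tokens []string) string {
	same := true
	for _, t := range tokens[1:] {
		if t != tokens[0] {
			same = false
			break
		}
	}
	if same {
		// Only one distinct value was seen, so keep it verbatim.
		return tokens[0]
	}
	masked := digits.ReplaceAllString(tokens[0], "<NUM>")
	for _, t := range tokens[1:] {
		if digits.ReplaceAllString(t, "<NUM>") != masked {
			return "<_>" // hypothetical catch-all placeholder
		}
	}
	return masked
}

// extractPattern builds one pattern from lines assumed to belong to the same
// cluster. Lines with differing token counts are out of scope for this sketch
// and collapse to a single wildcard.
func extractPattern(lines []string) string {
	if len(lines) == 0 {
		return ""
	}
	rows := make([][]string, len(lines))
	for i, l := range lines {
		rows[i] = strings.Fields(l)
	}
	width := len(rows[0])
	for _, r := range rows {
		if len(r) != width {
			return "<_>"
		}
	}
	out := make([]string, width)
	column := make([]string, len(rows))
	for pos := 0; pos < width; pos++ {
		for i, r := range rows {
			column[i] = r[pos]
		}
		out[pos] = mergeTokens(column)
	}
	return strings.Join(out, " ")
}

func main() {
	// A value seen once stays static.
	fmt.Println(extractPattern([]string{"caller=logs.go:48 msg=started"}))
	// Several values at the same position are generalized.
	fmt.Println(extractPattern([]string{
		"caller=logs.go:48 msg=started",
		"caller=logs.go:23 msg=started",
		"caller=logs.go:352 msg=started",
	}))
}
```

With a single line the pattern keeps `logs.go:48` verbatim; with three lines that differ only in the line number, the sketch yields `caller=logs.go:<NUM> msg=started`. The extra per-line comparison work is in keeping with the higher CPU cost noted in the comparison.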

Special notes:
Comparison between previous algorithm & this one:

  • Understandably higher CPU & allocations due to doing more work for each log line.

  • Total memory usage seems slightly lower, likely due to generating a better set of patterns & flatter tree
    image

  • Tradeoff is better quality patterns (and there are probably a few more things we can try)
    image

  • I've done an optimisation pass to streamline my initial naive implementation and make this reasonable, but it's still about 2x the previous resource usage.

    • I've already cut the CPU down by 25%. There may be more to do here, so I can iterate further in follow-up PRs to avoid making this PR (even) larger.

New-algorithm vs New-algorithm-with-optimizations:

$ benchstat string-tokens.txt byte-tokens.txt
goos: darwin
goarch: arm64
pkg: github.com/grafana/loki/v3/pkg/pattern/drain
                                                                               │ string-tokens.txt │           byte-tokens.txt           │
                                                                               │      sec/op       │   sec/op     vs base                │
Drain_TrainExtractsPatterns/Generate_patterns_on_high_variation_logfmt_logs-14         3.762m ± 0%   2.781m ± 2%  -26.07% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Generate_patterns_on_low_variation_logfmt_logs-14          277.6µ ± 1%   212.1µ ± 0%  -23.59% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Generate_patterns_on_json_formatted_logs-14                378.8µ ± 1%   236.4µ ± 0%  -37.60% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Patterns_for_distributor_logs-14                          13.331m ± 0%   9.680m ± 0%  -27.39% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Patterns_for_journald_logs-14                              3.680m ± 0%   2.771m ± 0%  -24.69% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Patterns_for_kafka_logs-14                                 2.514m ± 0%   2.222m ± 0%  -11.60% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Patterns_for_kubernetes_logs-14                            2.264m ± 0%   1.702m ± 0%  -24.84% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Patterns_for_vault_logs-14                                 1.687m ± 0%   1.368m ± 0%  -18.96% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Patterns_for_calico_logs-14                                3.380m ± 0%   2.729m ± 0%  -19.26% (p=0.000 n=10)
_logfmtTokenizer_Marshal/distributor-14                                                9.779m ± 1%   6.045m ± 0%  -38.18% (p=0.000 n=10)
geomean                                                                                2.393m        1.780m       -25.62%

                                                                               │ string-tokens.txt │           byte-tokens.txt            │
                                                                               │       B/op        │     B/op      vs base                │
Drain_TrainExtractsPatterns/Generate_patterns_on_high_variation_logfmt_logs-14        7.215Mi ± 0%   5.477Mi ± 0%  -24.08% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Generate_patterns_on_low_variation_logfmt_logs-14         638.9Ki ± 0%   521.7Ki ± 0%  -18.33% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Generate_patterns_on_json_formatted_logs-14               302.0Ki ± 0%   132.6Ki ± 0%  -56.10% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Patterns_for_distributor_logs-14                          29.29Mi ± 0%   23.88Mi ± 0%  -18.45% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Patterns_for_journald_logs-14                             6.965Mi ± 0%   5.523Mi ± 0%  -20.71% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Patterns_for_kafka_logs-14                                5.877Mi ± 0%   4.954Mi ± 0%  -15.70% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Patterns_for_kubernetes_logs-14                           5.699Mi ± 0%   4.661Mi ± 0%  -18.22% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Patterns_for_vault_logs-14                                5.519Mi ± 0%   4.538Mi ± 0%  -17.79% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Patterns_for_calico_logs-14                               6.457Mi ± 0%   5.459Mi ± 0%  -15.47% (p=0.000 n=10)
_logfmtTokenizer_Marshal/distributor-14                                               27.60Mi ± 0%   21.52Mi ± 0%  -22.03% (p=0.000 n=10)
geomean                                                                               4.955Mi        3.774Mi       -23.83%

                                                                               │ string-tokens.txt │           byte-tokens.txt           │
                                                                               │     allocs/op     │  allocs/op   vs base                │
Drain_TrainExtractsPatterns/Generate_patterns_on_high_variation_logfmt_logs-14         44.78k ± 0%   10.93k ± 0%  -75.58% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Generate_patterns_on_low_variation_logfmt_logs-14          3.969k ± 0%   1.279k ± 0%  -67.78% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Generate_patterns_on_json_formatted_logs-14                2378.0 ± 0%    860.0 ± 0%  -63.84% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Patterns_for_distributor_logs-14                          186.05k ± 0%   58.77k ± 0%  -68.41% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Patterns_for_journald_logs-14                              33.99k ± 0%   15.56k ± 0%  -54.21% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Patterns_for_kafka_logs-14                                 21.68k ± 0%   12.41k ± 0%  -42.76% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Patterns_for_kubernetes_logs-14                           17.342k ± 0%   9.425k ± 0%  -45.65% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Patterns_for_vault_logs-14                                18.063k ± 0%   6.060k ± 0%  -66.45% (p=0.000 n=10)
Drain_TrainExtractsPatterns/Patterns_for_calico_logs-14                                34.54k ± 0%   19.20k ± 0%  -44.42% (p=0.000 n=10)
_logfmtTokenizer_Marshal/distributor-14                                               165.03k ± 0%   20.00k ± 0%  -87.88% (p=0.000 n=10)
geomean                                                                                25.20k        8.864k       -64.83%

@benclive benclive requested a review from a team as a code owner May 21, 2024 12:51
@benclive benclive changed the title Optimised Drain while using new algorithm feat: Improved Drain pattern extraction algorithm (+optimization pass) May 21, 2024
@benclive
Contributor Author

After some discussion, I've decided to split this PR into several incremental improvements, each with its own perf analysis. This PR combines a whole load of optimizations and tokenization improvements in one place, which makes it hard to analyze.

@benclive benclive closed this May 29, 2024
@benclive benclive deleted the optimise-better-parse-tree branch October 16, 2024 15:55