feat: Generic logline placeholder replacement and tokenization #12799

Merged: 4 commits, Apr 26, 2024

Commits on Apr 26, 2024

  1. feat: Generic logline placeholder replacement and tokenization

    This code preprocesses generic logs, replacing highly variable dynamic data (timestamps, IPs, numbers, UUIDs, hex values, byte sizes, and durations) with static placeholders. This makes pattern extraction easier and makes matching by the Drain algorithm more efficient and user-friendly.
    
    Additionally, there is logic that splits generic log lines into discrete tokens for use with Drain, which gives better results than naively splitting the logs on every whitespace character. The tokenizer counts quotes and emits a quoted string as part of a single token. It also handles likely JSON logs that contain no whitespace by trying to split `{"key":value}` pairs, without actually parsing the JSON.
    
    All of this is done without regular expressions and without parsing the log lines in any specific format. As a result, it is very efficient in terms of CPU usage and allocations, and should handle all log formats and unformatted logs equally well.
    na-- committed Apr 26, 2024
    c044eb8
  2. fix: typo "boundry"->"boundary"

    na-- committed Apr 26, 2024
    41e45b7
  3. 4e886c6
  4. c62c6cf
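
The first commit message describes regex-free placeholder replacement and quote-aware tokenization. The sketch below illustrates the general idea only and is not the code from this PR. The function names `replaceNumbers` and `tokenize`, the `<NUM>` placeholder text, and the set of boundary characters are all assumptions made for illustration. It covers only standalone integers and double-quoted strings. Timestamps, IPs, UUIDs, hex values, byte sizes, durations, escaped quotes, and the JSON splitting are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// isDigit reports whether b is an ASCII digit.
func isDigit(b byte) bool { return b >= '0' && b <= '9' }

// isBoundary reports whether b delimits a value in this sketch.
// The character set is an assumption, not taken from the PR.
func isBoundary(b byte) bool {
	switch b {
	case ' ', '\t', '=', ':', ',', '[', ']', '(', ')', '{', '}', '"':
		return true
	}
	return false
}

// replaceNumbers replaces each standalone run of ASCII digits with the
// hypothetical placeholder "<NUM>". A run is standalone only when it has a
// boundary (or the end of the line) on both sides, so "abc123", "15ms",
// and "10.0.0.1" are left untouched. It scans bytes once and uses no regexp.
func replaceNumbers(line string) string {
	var sb strings.Builder
	sb.Grow(len(line))
	i := 0
	for i < len(line) {
		if isDigit(line[i]) && (i == 0 || isBoundary(line[i-1])) {
			j := i
			for j < len(line) && isDigit(line[j]) {
				j++
			}
			if j == len(line) || isBoundary(line[j]) {
				sb.WriteString("<NUM>")
				i = j
				continue
			}
		}
		sb.WriteByte(line[i])
		i++
	}
	return sb.String()
}

// tokenize splits line on spaces, but keeps a double-quoted section
// (including the spaces inside it) within the token it occurs in.
// Escaped quotes are not handled in this sketch.
func tokenize(line string) []string {
	var tokens []string
	inQuotes := false
	start := 0
	for i := 0; i < len(line); i++ {
		switch line[i] {
		case '"':
			inQuotes = !inQuotes
		case ' ':
			if inQuotes {
				continue // a space inside quotes does not end the token
			}
			if i > start {
				tokens = append(tokens, line[start:i])
			}
			start = i + 1
		}
	}
	if start < len(line) {
		tokens = append(tokens, line[start:])
	}
	return tokens
}

func main() {
	line := `level=info msg="request done" status=200 took 15ms`
	for _, tok := range tokenize(replaceNumbers(line)) {
		fmt.Printf("%q\n", tok)
	}
}
```

Both functions make a single pass over the bytes of the line and slice the input rather than copying it where possible, which is the kind of approach that keeps CPU usage and allocations low compared with regular expressions.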