Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the limitation of the size of yaml file that exceeds 3MB #97

Merged
merged 20 commits into from May 12, 2023

Conversation

kaisecheng
Copy link
Contributor

@kaisecheng kaisecheng commented May 9, 2023

Fixed: #96

added setting yaml_dictionary_code_point_limit to config the maximum code point limit of the yaml file in dictionary_path to overcome the 3MB size limit from SnakeYaml 1.33. This setting is only effective for yaml. Yaml file over the size limit throws an exception. JSON and CSV currently do not have such restriction. The default value of yaml_dictionary_code_point_limit is 128MB.

How to test

  • checkout the branch
  • build the gem
    • bundle install
    • gem build logstash-filter-translate.gemspec
  • install the gem in Logstash
    • bin/logstash-plugin install --local ~/path/to/your/gem/logstash-filter-translate-3.4.1.gem
  • download a 4MB yaml from here
  • run the following config
input {
  file {
    path => "/path/to/yaml/big_yaml_4mb.yml"
    start_position => "beginning"
  }
}
filter {
  dissect {
    mapping => {
      "message" => "%{key}: %{val}"
    }
  }
  translate {
    source => "key"
    target => "translated"
    fallback => "UNKNOWN"
    dictionary_path => "/path/to/yaml/big_yaml_4mb.yml"
    yaml_dictionary_code_point_limit => 5000000
  }
}
output {
    stdout { codec => dots }
}
  • Logstash should start without error

… size

of the yaml file in `dictionary_path` to overcome the 3MB size limit from
SnakeYaml 1.33

Fixed: logstash-plugins#96
@kaisecheng kaisecheng changed the title added setting dictionary_file_max_bytes to config the maximum bytes… Fix the limitation of the size of yaml file that exceeds 3MB May 9, 2023
@kaisecheng kaisecheng requested a review from mashhurs May 9, 2023 14:21
@kaisecheng kaisecheng self-assigned this May 9, 2023
===== `dictionary_file_max_bytes`

* Value type is <<number,number>>
* Default value is 3145728 (3MB)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that there didn't use to be a limit, and I don't think disabling limit checking it is an option (or is it?), I'm tempted to put a very high number here, like 128MB or even 1GB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no option to disable the limit check. I put it 128MB

@kaisecheng kaisecheng requested a review from jsvd May 11, 2023 10:14
docs/index.asciidoc Outdated Show resolved Hide resolved
docs/index.asciidoc Outdated Show resolved Hide resolved
@mashhurs
Copy link

mashhurs commented May 11, 2023

Changes overall LGTM by agreeing @jsvd 's comments.
One thing I would like to clarify here, Psych moved from snakeyaml to snakeyaml-engine this year. The reason behind as I understood to mitigate the CVE risks and performances (snakeyaml-engine is more suitable in JVM env).
This change also indicates that under the hood we are moving to snakeyaml-engine AND for the long term we should/may move our snakeyaml dependencies to snakeyaml-engine as well.

Reference thread: link

@kaisecheng kaisecheng requested a review from jsvd May 11, 2023 21:41
Copy link

@mashhurs mashhurs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@jsvd jsvd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few leftovers from the max_bytes setting.
Also, I tried locally with refresh_interval and increasing the dictionary over the code point limit, and the reloader thread crashed:

[2023-05-11T23:26:31,445][ERROR][logstash.filters.translate][main] Scheduler intercepted an error: {:exception=>NoMethodError, :message=>"private method `warn' called for nil:NilClass", :backtrace=>["/Users/joaoduarte/elastic/logstash-plugins/logstash-filter-translate/lib/logstash/filters/dictionary/file.rb:123:in `loading_exception'", "/Users/joaoduarte/elastic/logstash-plugins/logstash-filter-translate/lib/logstash/filters/dictionary/file.rb:60:in `load_dictionary'",

The two @logger.warn in file.rb need to be changed to just logger.warn to produce the right error message:

[2023-05-11T23:22:42,793][WARN ][logstash.filters.dictionary.yamlfile][main] Translate: The incoming YAML document exceeds the limit: 100 code points. when loading dictionary file at /tmp/dict.yml, continuing with old dictionary {:dictionary_path=>"/tmp/dict.yml"}

lib/logstash/filters/dictionary/file.rb Outdated Show resolved Hide resolved
lib/logstash/filters/dictionary/file.rb Outdated Show resolved Hide resolved
lib/logstash/filters/dictionary/file.rb Outdated Show resolved Hide resolved
lib/logstash/filters/dictionary/yaml_file.rb Outdated Show resolved Hide resolved
lib/logstash/filters/translate.rb Outdated Show resolved Hide resolved
docs/index.asciidoc Outdated Show resolved Hide resolved
kaisecheng and others added 7 commits May 12, 2023 01:26
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
Copy link
Contributor

@jsvd jsvd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a suggestion to avoid leaking the yaml code point limit setting to the base dictionary file initialize and initialize_for_file_type method signatures, by using an optional argument hash.

lib/logstash/filters/dictionary/file.rb Outdated Show resolved Hide resolved
lib/logstash/filters/translate.rb Outdated Show resolved Hide resolved
lib/logstash/filters/dictionary/yaml_file.rb Outdated Show resolved Hide resolved
lib/logstash/filters/dictionary/file.rb Outdated Show resolved Hide resolved
lib/logstash/filters/dictionary/file.rb Outdated Show resolved Hide resolved
kaisecheng and others added 3 commits May 12, 2023 11:15
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
kaisecheng and others added 3 commits May 12, 2023 11:16
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
@kaisecheng kaisecheng requested a review from jsvd May 12, 2023 14:01
Copy link
Contributor

@jsvd jsvd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :shipit:

@kaisecheng kaisecheng merged commit b3e4bd5 into logstash-plugins:main May 12, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The incoming YAML document exceeds the limit: 3145728 code points
4 participants