Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Guard reserved tags field against incorrect use #14822

Merged
merged 31 commits into from
Jan 25, 2023

Conversation

kaisecheng
Copy link
Contributor

@kaisecheng kaisecheng commented Dec 22, 2022

Fixed: #14711

Release notes

The reserved top-level tags field has required a certain shape since v7.9. It supposes to be a string or an array of strings. Assigning a map to tags field could crash unexpectedly. The pull request #14822 guards tags field against incorrect use. Assigning a map will result in having a _tagsparsefailure in tags field and the illegal value will be written to _tags field.

A new setting --event_api.tags.illegal is added for backward compatibility. Since 8.7, the default value is rename to move illegal value to _tags field. warn maintains the same behavior of allowing illegal value assignment to tags field.

What does this PR do?

This PR prevents the reserved tags field to be assigned with key/value map. Top-level tags should only accept string of list. When tags get a map value, LogStash::Event will rewrite the value to _tags and add _tagsparsefailure to tags.

After the change, tags does not hold map value, while it uses to be allowed with ruby { code => "event.set('[tags][k]' , 'v');" }. This means the unconsumed events in PQ and DLQ with a map value in tags will have a different event structure. _tags hold illegal value from tags.

To make this change backward compatible, this PR adds a flag --event_api.tags.illegal to allow fallback to old logic.
There are two options.
warn - the old flow that allows illegal value assignment to tags field.
rename - the new flow that assigns illegal value to _tags field and adds _tagsparsefailure to tags. This is the default value in 8.7

Why is it important/What is the impact to the user?

Prior to this change, when a json payload contains tags with map value, the value is set successfully but if there is any error from plugins that needs to add a failure tag to tags, such as _jsonparsefailure or _timestampparsefailure, the pipeline crashes as it cannot append a tag to a map value. This is bad because users lose visibility of what went wrong.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • [ ]

How to test this PR locally

Follow the reproducer of #14711

Extra test case

input {
  generator {
    message => '{"super": "ball"}'
    codec => json
    count => 1
  }
}
filter {
  mutate { "add_field" => {"[tags][k]" => "v" } }
  ruby {  code => 'fail "intentional"' }
}
output {
 stdout {}
}

Expected output:
tags=> [_tagsparsefailure, _rubyexception]
_tags => { k => v}

Related issues

Use cases

Screenshots

Logs

@kaisecheng kaisecheng changed the title Guard reserved tags field Guard reserved tags field against incorrect use Dec 22, 2022
@kaisecheng kaisecheng marked this pull request as ready for review December 22, 2022 23:32
…lue assignment to `tags` field

`rename` - rename key/value pair assignment to `_tags` field, which is the default action
`warn` - allow key/value pair assignment to `tags` field, which is the old logic
@github-actions
Copy link
Contributor

github-actions bot commented Jan 3, 2023

📃 DOCS PREVIEWhttps://logstash_14822.docs-preview.app.elstc.co/diff

@kaisecheng kaisecheng requested a review from yaauie January 4, 2023 12:53
@github-actions
Copy link
Contributor

github-actions bot commented Jan 4, 2023

📃 DOCS PREVIEWhttps://logstash_14822.docs-preview.app.elstc.co/diff

@kaisecheng kaisecheng removed the request for review from yaauie January 4, 2023 17:43
@github-actions
Copy link
Contributor

github-actions bot commented Jan 4, 2023

📃 DOCS PREVIEWhttps://logstash_14822.docs-preview.app.elstc.co/diff

@kaisecheng kaisecheng requested a review from yaauie January 4, 2023 23:11
@github-actions
Copy link
Contributor

github-actions bot commented Jan 4, 2023

📃 DOCS PREVIEWhttps://logstash_14822.docs-preview.app.elstc.co/diff

logstash-core/locales/en.yml Outdated Show resolved Hide resolved
logstash-core/locales/en.yml Outdated Show resolved Hide resolved
// guard tags field from key/value map, only string or list is allowed
if (ILLEGAL_TAGS_ACTION == IllegalTagsAction.RENAME) {
final Object tags = Accessors.get(data, TAGS_FIELD);
if (tags instanceof ConvertedMap) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of a block-list of one particular bad shape, can we have an allow-list of the supported shapes?

Comment on lines 228 to 229
final FieldReference renamedField = renameIllegalTags(field);
Accessors.set(data, renamedField, Valuefier.convert(value));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this does what we want, since it renames the field independently of the shape of its value.

It may be worth intercepting TAGS_FIELD to be handled by a helper method:

    public void setField(final FieldReference field, final Object value) {
        if (field.equals(TAGS_FIELD)) {
            setTagsField(value);
            return;
        }
        switch (field.type()) {
            // ...
        }
    }
    private void setTagsField(final Object value)
        if (isLegalTagValue(value)) {
            setField(TAGS_FIELD, value);
        } else if (ILLEGAL_TAGS_ACTION == IllegalTagsAction.RENAME)
            setField(FieldReference.from(TAGS_FAILURE_FIELD), value);
        } else {
            // log warning? avoid flooding though?
            setField(TAGS_FIELD, value);
        }
    }

    private boolean isLegalTagValue(final Object value) {
        if (value instanceof String) { return true; }
        if (value instanceof List) { return true; } // TODO: make sure it is a list of _strings_, which is difficult to do with generics and type erasure

        return false;
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied the suggestion. It now handles the illegal tags field first in Event.setField()

Comment on lines 241 to 242
if (ILLEGAL_TAGS_ACTION == IllegalTagsAction.RENAME &&
field.getPath() != null && field.getPath().length > 0 && field.getPath()[0].equals(TAGS)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read this as:

  • IF we are supposed to rename illegal-shaped tags
  • AND the field reference has a non-empty path
  • AND the first-level is tags

In my mind, we need to handle two cases when ILLEGAL_TAGS_ACTION == IllegalTagsAction.RENAME:

  • (path == null || path.length == 0) && key.equals(TAGS), with a value that is neither a string nor an array of strings (setting top-level tags to be an illegal shape)
  • (path != null && path.length > 0 && path[0].equals(TAGS)) regardless of value (setting field nested inside the tags, which cannot have nested fields in it)

Examples that should trigger a rename using the Event.setField(String, Object):

  • event.setField("[tags]", Map.of("poison", "true"))
  • event.setField("[tags][poison]", "true")

Similarly, using the Ruby APIs Event#set:

  • event.set("[tags]", { "poison" => "true"})
  • event.set("[tags][poison]", "true")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for listing the cases. The PR is changed to handle other illegal types, not limited to map. And _tags field holds a list of illegal values.

@@ -126,7 +126,7 @@ public static void setEscapeStyle(final String escapeStyleSpec) {
*/
private final int type;

private FieldReference(final String[] path, final String key, final int type) {
public FieldReference(final String[] path, final String key, final int type) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wary of making this public, since it would allow plugins built with the Java plugin API to route around our field reference caching.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I revert it to private and build the FieldReference from string instead

kaisecheng and others added 4 commits January 10, 2023 15:26
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
Co-authored-by: Ry Biesemeyer <yaauie@users.noreply.github.com>
@github-actions
Copy link
Contributor

📃 DOCS PREVIEWhttps://logstash_14822.docs-preview.app.elstc.co/diff

change `_tags` to a list to store all invalid value
build FieldReference from string instead of a public constructor
@github-actions
Copy link
Contributor

📃 DOCS PREVIEWhttps://logstash_14822.docs-preview.app.elstc.co/diff

@github-actions
Copy link
Contributor

📃 DOCS PREVIEWhttps://logstash_14822.docs-preview.app.elstc.co/diff

@github-actions
Copy link
Contributor

📃 DOCS PREVIEWhttps://logstash_14822.docs-preview.app.elstc.co/diff

@github-actions
Copy link
Contributor

📃 DOCS PREVIEWhttps://logstash_14822.docs-preview.app.elstc.co/diff

@jsvd
Copy link
Member

jsvd commented Jan 23, 2023

A few thoughts on the behaviour we have on @timestamp vs what we're doing in this PR:

# creating new event with incompatible type succeeds with a warning
irb(main):026:0> e = LogStash::Event.from_json('{"@timestamp": ""}').first
[2023-01-23T17:19:12,701][WARN ][org.logstash.Event       ] Error parsing @timestamp string value=
=> #<LogStash::Event:0x8d65037>
# the event has a new tag and wrong data is placed in _@timestamp
irb(main):027:0> e.to_hash
=> {"@timestamp"=>2023-01-23T17:19:12.701840Z, "@version"=>"1", "tags"=>["_timestampparsefailure"], "_@timestamp"=>""}
# after creation, attempting to set anything not compatible with LogStash::Timestamp throws exception
irb(main):028:0> e.set("@timestamp", "")
TypeError (wrong argument type String (expected LogStash::Timestamp))
irb(main):029:0> e.set("[@timestamp][1]", "")
Java::OrgLogstash::Accessors::InvalidFieldSetException (Could not set field '1' on object '2023-01-23T17:19:12.701840Z' to value ''.This is probably due to trying to set a field like [foo][bar] = someValuewhen [foo] is not either a map or a string)

On the Java side, setField is (too?) lenient, and allows us to break the event:

irb(main):031:0> je = e.to_java
=> #<Java::OrgLogstash::Event:0x822e8a1>
irb(main):032:0> je.setField("@timestamp", "")
=> nil
irb(main):034:0> je.to_string
Java::JavaLang::ClassCastException (class org.jruby.RubyString cannot be cast to class org.logstash.ext.JrubyTimestampExtLibrary$RubyTimestamp (org.jruby.RubyString and org.logstash.ext.JrubyTimestampExtLibrary$RubyTimestamp are in unnamed module of loader 'app'))

From this I think we can create a similar experience for the tags field:

  1. Creating a new event with a type that is not compatible with the reserved field should still create the event but store the value in an underscored prefixed field. This PR does that.
  2. Any set operation on an existing Ruby Event object should throw an error. Here this PR is smarter and allows the operation to occur and stores the invalid data alongside any previous one in the _tags field.

The main reason not to blindly crash on operations in 2. is that we may breaking some EXTREMELY edge cases out there in the world where a pipeline takes a fully created event and in the filter section sets a value to "tags" that isn't a string or list of strings. Considering this PR helps the "new event with broken 'tags'" use case already, I am comfortable throwing exceptions on setField. This should simplify the logic we need to support, and make the experience similar for the reserved fields @timestamp and tags.

@jsvd
Copy link
Member

jsvd commented Jan 23, 2023

For illustration, here is the tags behaviour in this PR when event_api.tags.illegal: rename:

irb(main):035:0> e = LogStash::Event.from_json('{"tags": {"hey": "you"} }').first
=> #<LogStash::Event:0x7edad7d2>
irb(main):036:0> e.to_hash
=> {"_tags"=>[{"hey"=>"you"}], "tags"=>["_tagsparsefailure"], "@version"=>"1", "@timestamp"=>2023-01-23T19:21:42.430284Z}
irb(main):037:0> e.set("tags", {"hi": "me"})
=> {:hi=>"me"}
irb(main):038:0> e.set("[tags][hello]", 0)
=> 0
irb(main):039:0> e.to_hash
=> {"_tags"=>[{"hey"=>"you"}, {"hi"=>"me"}, {"hello"=>0}], "tags"=>["_tagsparsefailure"], "@version"=>"1", "@timestamp"=>2023-01-23T19:21:42.430284Z}

…istory

- any set operation on event throws exception
@github-actions
Copy link
Contributor

📃 DOCS PREVIEWhttps://logstash_14822.docs-preview.app.elstc.co/diff

Copy link
Member

@jsvd jsvd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can now remove rebaseOnto and all its associated tests?

@github-actions
Copy link
Contributor

📃 DOCS PREVIEWhttps://logstash_14822.docs-preview.app.elstc.co/diff

@kaisecheng
Copy link
Contributor Author

@jsvd I have changed to rename illegal value to _tags instead of making a list of illegal history and throw exception for any illegal set operation. Thanks for your review and please have a look again.

@kaisecheng kaisecheng requested a review from jsvd January 24, 2023 14:44
Copy link
Member

@jsvd jsvd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with the behaviour, I just think we should find another name to describe the default behaviour, and then ensure the docs reflect that.
Some of the docs still describe the behaviour as if "rename" happens both on creation and set, and it's actually "rename on creation, reject on set"

docs/static/settings-file.asciidoc Outdated Show resolved Hide resolved
kaisecheng and others added 2 commits January 24, 2023 21:44
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
@github-actions
Copy link
Contributor

📃 DOCS PREVIEWhttps://logstash_14822.docs-preview.app.elstc.co/diff

@github-actions
Copy link
Contributor

📃 DOCS PREVIEWhttps://logstash_14822.docs-preview.app.elstc.co/diff

@kaisecheng kaisecheng requested a review from jsvd January 25, 2023 11:08
Copy link
Member

@jsvd jsvd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kaisecheng kaisecheng merged commit 46443e4 into elastic:main Jan 25, 2023
@kaisecheng
Copy link
Contributor Author

@logstashmachine backport 8.7

@roaksoax
Copy link
Contributor

For future reference:

This change was release-noted in https://www.elastic.co/guide/en/logstash/current/logstash-8-7-0.html .
The setting available is event_api.tags.illegal as documented in https://www.elastic.co/guide/en/logstash/current/logstash-settings-file.html
Users can event_api.tags.illegal to warn to rollback the behavior

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LogStash::Event should guard against incorrect use of reserved tags field.
4 participants