
Fix issue with Jackson 2.15: Can not write a field name, expecting a value #15564

Merged
merged 4 commits into elastic:main on Nov 21, 2023

Conversation


@edmocosta edmocosta commented Nov 10, 2023

Release notes

Fixed issue with log.format=json and Jackson 2.15: Can not write a field name, expecting a value.

What does this PR do?

More details about the issue/root cause can be found here.

RubyBasicObject is the superclass of the majority of org.jruby.* objects. The fields of its subclasses inside the org.jruby.* package are not desired in the JSON logs: they bring no value to users and increase the log size considerably.

This PR creates a new Jackson serializer for RubyBasicObject, registered with a lower priority, which is used as the default for all Ruby objects when no more specific/higher-priority serializer is registered.

This new serializer applies the following logic (a rough sketch follows the list):

  • Arrays, Collections, and Maps are delegated to Jackson's default serializer, which might end up invoking this serializer for their elements.
  • Values whose inspect method is implemented and owned by the Logstash module are serialized using the content returned by that method.
  • If none of the above conditions match, the value is serialized by invoking Ruby's to_s method, falling back to RubyBasicObject#to_s() and RubyBasicObject#anyToString() in case of errors.
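
For illustration, below is a minimal, hedged sketch of what such a fallback serializer could look like. The class name RubyBasicObjectSerializer is mentioned later in this conversation; the helper hasLogstashInspect, the wrapper details, and the exact delegation for collections are assumptions made for the example, not the code merged by this PR.

import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.ser.std.StdSerializer;

import org.jruby.RubyArray;
import org.jruby.RubyBasicObject;
import org.jruby.RubyHash;
import org.jruby.RubyMethod;
import org.jruby.RubyString;
import org.jruby.runtime.ThreadContext;

public final class RubyBasicObjectSerializer extends StdSerializer<RubyBasicObject> {

    public RubyBasicObjectSerializer() {
        super(RubyBasicObject.class);
    }

    @Override
    public void serialize(final RubyBasicObject value, final JsonGenerator gen,
                          final SerializerProvider provider) throws IOException {
        // 1) Arrays and Maps keep Jackson's default handling, which serializes each
        //    element individually (possibly re-entering this serializer for elements).
        if (value instanceof RubyArray) {
            provider.defaultSerializeValue(new ArrayList<Object>((RubyArray) value), gen);
            return;
        }
        if (value instanceof RubyHash) {
            provider.defaultSerializeValue(new LinkedHashMap<Object, Object>((RubyHash) value), gen);
            return;
        }

        final ThreadContext context = value.getRuntime().getCurrentContext();

        // 2) If the object's `inspect` is implemented and owned by a Logstash module,
        //    serialize the content returned by that method.
        if (hasLogstashInspect(value, context)) {
            gen.writeString(value.callMethod(context, "inspect").asJavaString());
            return;
        }

        // 3) Otherwise invoke Ruby's to_s, falling back to the Java-side
        //    representation (anyToString) if that raises.
        try {
            gen.writeString(value.callMethod(context, "to_s").asJavaString());
        } catch (final Exception e) {
            gen.writeString(value.anyToString().toString());
        }
    }

    private static boolean hasLogstashInspect(final RubyBasicObject value, final ThreadContext context) {
        // Mirrors the ownership check shown later in this conversation: resolve the
        // `inspect` Method object and verify that its owner is a Logstash module.
        final Object candidate = value.method(RubyString.newString(value.getRuntime(), "inspect"));
        return candidate instanceof RubyMethod
                && ((RubyMethod) candidate).owner(context).toString().toLowerCase().startsWith("logstash");
    }
}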

Logstash already has a few Jackson serializers implemented for JRuby objects, but they are not used by Log4j, which has its own Log4jJsonObjectMapper implementation.

This PR also creates a new org.logstash.ObjectMappers.LOG4J_JSON_MAPPER, specific to Log4j, which registers the existing Ruby serializers and is used to serialize the StructuredMessage parameters in CustomLogEventSerializer.java.
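
A rough sketch of how such a dedicated mapper could be wired is shown below. The wrapper class name, the module name, and RubyStringSerializer are placeholders standing in for the existing Ruby serializers; only ObjectMappers.LOG4J_JSON_MAPPER is named by this PR.

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.module.SimpleModule;

import org.jruby.RubyBasicObject;
import org.jruby.RubyString;

public final class Log4jMappers {
    // Log4j-only mapper: the existing, more specific Ruby serializers take precedence
    // over the generic RubyBasicObject fallback because Jackson prefers the closest class match.
    public static final ObjectMapper LOG4J_JSON_MAPPER = new ObjectMapper()
            .registerModule(new SimpleModule("LogstashRubySerializers")
                    .addSerializer(RubyString.class, new RubyStringSerializer())              // placeholder for an existing serializer
                    .addSerializer(RubyBasicObject.class, new RubyBasicObjectSerializer()));  // the new fallback
}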

The CustomLogEventSerializer was also changed to create a new Jackson generator for each message parameter entry; that way, the main generator is not compromised/invalidated if any key/value fails to write, fixing the "Can not write a field name, expecting a value" issue.

Why is it important/What is the impact to the user?

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

How to test this PR locally

Run Logstash 8.10.x with bin/logstash --log.format=json.

The latest LS versions using Jackson > 2.14 should log Can not write a field name, expecting a value errors and print incomplete log messages.

Related issues

@edmocosta edmocosta changed the title [WIP] Fix issue with Jackson 2.15 and json log format [WIP] Fix issue with Jackson 2.15: Can not write a field name, expecting a value Nov 10, 2023
@edmocosta edmocosta changed the title [WIP] Fix issue with Jackson 2.15: Can not write a field name, expecting a value Fix issue with Jackson 2.15: Can not write a field name, expecting a value Nov 10, 2023
@edmocosta edmocosta marked this pull request as ready for review November 10, 2023 17:17
@edmocosta edmocosta requested a review from jsvd November 10, 2023 17:19
Comment on lines +84 to +89
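// Build the parameter's JSON in a temporary writer/generator first, so a value that fails
// mid-serialization cannot leave the main generator in a broken state; the field name and
// the already-complete JSON are only written to the main output once serialization succeeds.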
try (final SegmentedStringWriter entryJsonWriter = new SegmentedStringWriter(new BufferRecycler());
final JsonGenerator entryGenerator = LOG4J_JSON_MAPPER.getFactory().createGenerator(entryJsonWriter)) {
entryGenerator.writeObject(paramValue);
generator.writeFieldName(paramName);
generator.writeRawValue(entryJsonWriter.getAndClear());
}

Member

Any thoughts on the performance impact of this change? If isValueSafeToWrite is false frequently, then creating a new generator, writer, and buffer recycler could get expensive.

Contributor Author

In my local tests (1k fields), it had a minor/identical performance impact compared to using only the main generator. I also tried reusing the BufferRecycler, but it didn't save many cycles, as getAndClear() needs to be invoked in every iteration. The SegmentedStringWriter also produced results similar to a JDK StringWriter.

Comment on lines +110 to +111
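// Resolve the value's `inspect` Method object and check whether its owner is a Logstash module.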
final Object candidate = value.method(RubyString.newString(RubyUtil.RUBY, METHOD_INSPECT));
return candidate instanceof RubyMethod && ((RubyMethod) candidate).owner(RubyUtil.RUBY.getCurrentContext()).toString().toLowerCase().startsWith("logstash");

Member

Same as above, do we know if this is costly performance wise when there's a lot of log entries? If necessary we could consider some memoization/caching strategy.

Contributor Author

Yes, invoking JRuby's method proved to be an expensive operation. Unfortunately, I couldn't find any better option here with the same result, as the objects passed to the logger are dynamically generated/parsed. Memoizing/caching them would also be complex and error-prone, IMO, as their values actually come from method invocations, which might return different values even when the type is the same.

In terms of overall performance, the CustomLogEventSerializer now uses the LOG4J_JSON_MAPPER to serialize the message parameters. That mapper registers a few JRuby serializers with a higher priority than this generic one, which should speed up the process by avoiding the generic serializer whenever a more specific one applies.

Considering that the RubyBasicObjectSerializer is a kind of fallback, and that the current behavior (without this serializer) is to walk the whole object hierarchy serializing all fields until it fails, I believe the cost of invoking those methods is considerably smaller.

WDYT? Any suggestions?

@elasticmachine (Collaborator)

💚 Build Succeeded


@edmocosta edmocosta requested a review from jsvd November 20, 2023 12:41

@jsvd jsvd left a comment


Having done some basic performance tests, there doesn't seem to be any degradation. It's great to finally have control over serialization.
LGTM

@edmocosta edmocosta merged commit 57dc14c into elastic:main Nov 21, 2023
4 checks passed
Development

Successfully merging this pull request may close these issues.

Logstash: 8.10.2: Issues with fasterxml: Can not write a field name, expecting a value