[BUG] Unrecognized character escape issue for stream data in DynamoDB source #3664

Closed
daixba opened this issue Nov 15, 2023 · 6 comments · Fixed by #3667

@daixba
Contributor

daixba commented Nov 15, 2023

Describe the bug
If the source item contains a string with a single quote ('), the final document in OpenSearch has an empty body.

After debugging, I found that an exception is thrown when converting the stream event to a JSON event using the Jackson library.

com.fasterxml.jackson.core.JsonParseException: Unrecognized character escape ''' (code 39)
 at [Source: (String)"{"Content":"I\'m sorry, but I don\'t have access to that.","UserId":"1234","StartTime":"2023-10-13T08:00:29.818246","SessionId":"123"}"; line: 1, column: 16]
	at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2477)
	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:750)
	at com.fasterxml.jackson.core.base.ParserBase._handleUnrecognizedCharacterEscape(ParserBase.java:1353)
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._decodeEscaped(ReaderBasedJsonParser.java:2713)
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString2(ReaderBasedJsonParser.java:2233)
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString(ReaderBasedJsonParser.java:2206)
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:323)
	at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializerNR.deserialize(UntypedObjectDeserializerNR.java:82)
	at com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringKeyMap(MapDeserializer.java:623)
	at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:449)
	at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:32)
	at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4825)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3772)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3755)
	at org.opensearch.dataprepper.plugins.source.dynamodb.converter.StreamRecordConverter.convertData(StreamRecordConverter.java:107)
	at org.opensearch.dataprepper.plugins.source.dynamodb.converter.StreamRecordConverter.writeToBuffer(StreamRecordConverter.java:74)
	at org.opensearch.dataprepper.plugins.source.dynamodb.stream.ShardConsumer.run(ShardConsumer.java:247)
	at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

To Reproduce
Steps to reproduce the behavior:

For example, the Content attribute contains a single quote (').

{
 "SessionId": "123",
 "UserId": "1234",
 "Content": "I'm sorry, but I don't have access to that.",
 "StartTime": "2023-10-13T08:00:29.818246"
}

The resulting document in OpenSearch is:

{
 "_index": "test-complex",
 "_id": "123|1234",
 "_score": 1,
 "_source": {}
}

Expected behavior
The document in OpenSearch should match the source item in DynamoDB.

Screenshots
N/A

Additional context
The issue does not occur with export data.

@daixba daixba added bug Something isn't working untriaged labels Nov 15, 2023
@daixba
Contributor Author

daixba commented Nov 15, 2023

This seems to be an issue with the Jackson ObjectMapper. Should we set something like mapper.configure(JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER, true)?
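
For context on what such a lenient setting means: a parser in that mode accepts a backslash in front of any character and reads the pair as the character itself. The sketch below mimics that idea for the single-quote case only, in plain Java so it runs without Jackson on the classpath. It is an illustration, not Jackson's implementation and not the fix that was eventually merged; LenientQuoteUnescape and unescapeSingleQuotes are hypothetical names.

```java
public class LenientQuoteUnescape {
    /**
     * Rewrites the non-standard escape \' to a plain ' and leaves every
     * other escape pair (for example \\ or \") untouched.
     */
    public static String unescapeSingleQuotes(String json) {
        StringBuilder out = new StringBuilder(json.length());
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c == '\\' && i + 1 < json.length()) {
                char next = json.charAt(i + 1);
                if (next != '\'') {
                    out.append(c); // keep the backslash of a standard escape
                }
                out.append(next);
                i++; // the pair has been consumed
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Payload taken from the stack trace above, shortened to one field.
        String raw = "{\"Content\":\"I\\'m sorry, but I don\\'t have access to that.\"}";
        System.out.println(unescapeSingleQuotes(raw));
    }
}
```

Consuming escapes as pairs matters here: an escaped backslash followed by a quote must stay as it is, which a plain string replace of the two-character sequence would get wrong.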

@dlvenable
Member

Standard JSON does not require escaping single quotes. But, it appears that something is escaping them in the example you've provided.

"Content":"I\'m sorry, but I don\'t have access to that."

Is there a list of characters that DynamoDB will escape? And can we configure Jackson to allow or expect those?
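
To make the point about standard JSON concrete: RFC 8259 permits only the escapes \" \\ \/ \b \f \n \r \t and \uXXXX inside a string, so a strict parser rejects \'. The following is a minimal sketch of that rule, not Jackson's code. JsonEscapeCheck and firstInvalidEscape are hypothetical names, and the check scans the whole input as if it were string content rather than tokenizing JSON.

```java
public class JsonEscapeCheck {
    // Characters that may follow a backslash in a JSON string (RFC 8259, section 7).
    private static final String VALID_ESCAPES = "\"\\/bfnrtu";

    /**
     * Returns the 0-based index of the first backslash that is not followed
     * by an allowed escape character, or -1 if there is none. The four hex
     * digits after a unicode escape are not validated in this sketch.
     */
    public static int firstInvalidEscape(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != '\\') {
                continue;
            }
            if (i + 1 >= text.length()
                    || VALID_ESCAPES.indexOf(text.charAt(i + 1)) < 0) {
                return i;
            }
            i++; // skip the escaped character so an escaped backslash is read as one pair
        }
        return -1;
    }

    public static void main(String[] args) {
        String fromIssue = "{\"Content\":\"I\\'m sorry\"}";
        String standard = "{\"Content\":\"I'm sorry\"}";
        System.out.println(firstInvalidEscape(fromIssue)); // index of the offending backslash
        System.out.println(firstInvalidEscape(standard));  // -1: nothing to reject
    }
}
```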

@dlvenable
Member

dlvenable commented Nov 15, 2023

Relatedly, what is the purpose of this exception handling?

This results in data being dropped. The DynamoDB source accepts the data and acts as if it had been handled, but then sends an empty document to OpenSearch.
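
One alternative to swallowing the exception is to skip the record and count the failure, so that no empty event reaches the sink. The sketch below shows that pattern under stated assumptions: SkipUnparseableRecords, Result, and convert are hypothetical names rather than Data Prepper classes, and the parser is a stand-in that throws an unchecked exception. Jackson 2.x parse exceptions are checked, so a real parser would need to wrap them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class SkipUnparseableRecords {
    /** Result of a batch conversion: parsed events plus a failure count. */
    public static final class Result {
        public final List<Map<String, Object>> events = new ArrayList<>();
        public int failures = 0;
    }

    /**
     * Converts each raw record with the supplied parser. A record whose
     * parse throws is counted and skipped; no empty event is emitted for it.
     */
    public static Result convert(List<String> rawRecords,
                                 Function<String, Map<String, Object>> parser) {
        Result result = new Result();
        for (String raw : rawRecords) {
            try {
                result.events.add(parser.apply(raw));
            } catch (RuntimeException e) {
                result.failures++; // a real pipeline would also log this and emit a metric
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in parser (an assumption for the demo): rejects any backslash-quote escape.
        Function<String, Map<String, Object>> parser = raw -> {
            if (raw.contains("\\'")) {
                throw new IllegalArgumentException("Unrecognized character escape");
            }
            return Map.of("raw", raw);
        };
        Result r = convert(List.of("{\"a\":\"ok\"}", "{\"a\":\"I\\'m\"}"), parser);
        System.out.println(r.events.size() + " converted, " + r.failures + " skipped");
    }
}
```

Keeping the failure count separate from the event list makes the dropped records visible to the caller instead of hiding them behind an empty document.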

@dlvenable
Member

@daixba, this is a bug in the AWS SDK, and it appears to have been fixed in main. aws/aws-sdk-java-v2#4156

I'm unsure if it has been deployed. It is currently labeled as "pending-release".

dlvenable added a commit to dlvenable/data-prepper that referenced this issue Nov 15, 2023
…r's ObjectMapper. The EnhancedDocument is escaping single quotes. Also, skip data that cannot be parsed entirely rather than silently send empty data. Resolves opensearch-project#3664.

Signed-off-by: David Venable <dlv@amazon.com>
dlvenable added a commit to dlvenable/data-prepper that referenced this issue Nov 15, 2023
Signed-off-by: David Venable <dlv@amazon.com>
@dlvenable
Member

Updating the AWS SDK to 2.21.23 does fix this bug.

@dlvenable dlvenable added this to the v2.6 milestone Nov 15, 2023
@dlvenable dlvenable self-assigned this Nov 15, 2023
dlvenable added a commit that referenced this issue Nov 15, 2023
Resolves a bug with escaped single quotes in the DynamoDB source by updating the AWS SDK to 2.21.23. Also, skip data that cannot be parsed entirely rather than silently send empty data. Resolves #3664.

Signed-off-by: David Venable <dlv@amazon.com>
@daixba
Contributor Author

daixba commented Nov 16, 2023

What is the purpose of this exception handling?

I remember putting a TODO here about what we should do if we fail to parse the JSON, but I'm not sure when it was removed.

Thanks for the help on this.
