-
Notifications
You must be signed in to change notification settings - Fork 31
feat(rt): replace GSON based JSON serde with KMP compatible impl #477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: Aaron Todd <todaaron@amazon.com> Co-authored-by: Kyle Thomson <kylthoms@amazon.com>
Co-authored-by: Aaron Todd <todaaron@amazon.com> Co-authored-by: Kyle Thomson <kylthoms@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heroic!
| assertEquals(1, byteCountUtf8("$".encodeToByteArray()[0])) | ||
| assertEquals(2, byteCountUtf8("¢".encodeToByteArray()[0])) | ||
| assertEquals(3, byteCountUtf8("€".encodeToByteArray()[0])) | ||
| assertEquals(4, byteCountUtf8("\uD834\uDD22".encodeToByteArray()[0])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Can this (and other instances) be replaced with the literal Unicode character 𝄢 in source?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe so because it's a surrogate pair (you can try in the editor to copy and paste and it will encode it for you).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to copy/paste the 𝄢 symbol directly into source code in the editor and it seemed to render fine as a single character. Maybe that behavior is platform/font dependent.
It's a minor question so I can live with it as-is, it merely seemed conspicuous to render three Unicode characters and then fallback to Unicode escape sequences for the last test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is likely the case. I can't paste it directly, it auto formats it to the code points
| } | ||
|
|
||
| subprojects { | ||
| allprojects { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Why was this change necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might not be now. It was because we added code to serde/common that needed it. I'll check if it's still required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was added for CharStream tests (since removed). I think I'll keep it since it allows tests to be added to serde common if needed.
| How to run JSONTestSuite against serde-json deserialize | ||
| ======================================================== | ||
|
|
||
| When making changes to the lexer it is a good idea to run the | ||
| changes against the [JSONTestSuite](https://github.com/nst/JSONTestSuite) and manually examine the test results. | ||
|
|
||
| ### How to setup the JSONTestSuite |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Why not just add an integration test that uses JSONTestSuite directly into the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JSONTestSuite is a bit interesting/onerous to setup and use. It does not lend itself to this kind of integration and I see it as an operational cost not worth paying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this line of thinking, can this code as-is serve as a general purpose JSON parser? If so perhaps it would make sense to break out as a separate dependency at some point in the future such that JSONTestSuite could run against it as the others they have in their repo. Obviously not something for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless we are planning to support it as a general purpose parser I would think it probably isn't in our interest (or anyone else) to do that. Perhaps we fork it and set it up so that it's easier to run...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "good idea to run the changes against JSONTestSuite" sounds like best intentions to me, is there a way we can mechanize it? When I was playing around with the testing, I just copy/pasted all of the parser scenarios from their input files into a (admittedly massive) test class.
|
|
||
|
|
||
| // NOTE: set to whatever locally published version you are working on | ||
| val smithyKotlinVersion: String = "0.4.1-kmp-json" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: The versions users have will almost always be something like 0.4.0-alpha or 0.4.0-snapshot. We should make the example something that looks familiar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This readme is targeted towards developers (i.e. us). Not end users.
| val smithyKotlinVersion: String = "0.4.1-kmp-json" | ||
| dependencies { | ||
| implementation(kotlin("stdlib")) | ||
| implementation("org.jetbrains.kotlinx:kotlinx-coroutines-core:1.5.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Do we need to include a note about matching the coroutines version used by the rest of the project?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that's important, again this is targeted at developers of smithy-kotlin though so I would expect us to make changes here as necessary.
runtime/serde/serde-json/common/src/aws/smithy/kotlin/runtime/serde/json/JsonLexer.kt
Outdated
Show resolved
Hide resolved
| 'r' -> append('\r') | ||
| 'n' -> append('\n') | ||
| 't' -> append('\t') | ||
| else -> throw DeserializationException("Invalid escape character: `$byte`") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Can we find a way for this to include position information?
runtime/serde/serde-json/common/src/aws/smithy/kotlin/runtime/serde/json/LexerState.kt
Outdated
Show resolved
Hide resolved
...me/serde/serde-json/common/test/aws/smithy/kotlin/runtime/serde/json/JsonStreamWriterTest.kt
Show resolved
Hide resolved
| /** | ||
| * Checks to see if a codepoint is in the supplementary plane or not (surrogate pair) | ||
| */ | ||
| @InternalApi | ||
| fun Char.Companion.isSupplementaryCodePoint(codePoint: Int): Boolean = codePoint in SUPPLEMENTARY_PLANE_LOW..MAX_CODEPOINT | ||
|
|
||
| /** | ||
| * Converts the [codePoint] to a char array. If the codepoint is in the supplementary plane then it will | ||
| * return an array with the high surrogate and low surrogate at indexes 0 and 1. Otherwise it will return a char | ||
| * array with a single character. | ||
| */ | ||
| @InternalApi | ||
| fun Char.Companion.codePointToChars(codePoint: Int): CharArray = when (codePoint) { | ||
| in 0 until SUPPLEMENTARY_PLANE_LOW -> charArrayOf(codePoint.toChar()) | ||
| in SUPPLEMENTARY_PLANE_LOW..MAX_CODEPOINT -> { | ||
| val low = MIN_LOW_SURROGATE.code + ((codePoint - 0x10000) and 0x3FF) | ||
| val high = MIN_HIGH_SURROGATE.code + (((codePoint - 0x10000) ushr 10) and 0x3FF) | ||
| charArrayOf(high.toChar(), low.toChar()) | ||
| } | ||
| else -> throw IllegalArgumentException("invalid codepoint $codePoint") | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: I don't think I've ever seen (effectively) static extension methods before. Given that there's no state to use/reuse in Char.Companion, I think these might be better as top-level non-extension methods or extensions on Int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that there's no state to use/reuse in Char.Companion
Not sure why this matters. It's more about scoping the methods to their intended use. I could see them as extensions off Int though if you feel strongly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent some time but only got maybe a quarter way through this. I'll continue but will share what I have so far. Very clean, nice work.
runtime/serde/serde-json/common/src/aws/smithy/kotlin/runtime/serde/json/JsonEncoder.kt
Outdated
Show resolved
Hide resolved
runtime/serde/serde-json/common/src/aws/smithy/kotlin/runtime/serde/json/JsonEncoder.kt
Outdated
Show resolved
Hide resolved
|
|
||
| private val state: ListStack<LexerState> = mutableListOf(LexerState.Initial) | ||
|
|
||
| private var depth: Int = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
consider using UInt here
| How to run JSONTestSuite against serde-json deserialize | ||
| ======================================================== | ||
|
|
||
| When making changes to the lexer it is a good idea to run the | ||
| changes against the [JSONTestSuite](https://github.com/nst/JSONTestSuite) and manually examine the test results. | ||
|
|
||
| ### How to setup the JSONTestSuite |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this line of thinking, can this code as-is serve as a general purpose JSON parser? If so perhaps it would make sense to break out as a separate dependency at some point in the future such that JSONTestSuite could run against it as the others they have in their repo. Obviously not something for this PR.
| override fun writeRawValue(value: String) = encodeValue(value) | ||
| } | ||
|
|
||
| internal fun String.escape(): String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion
This may be more efficient in time and concise if implemented with a map. Here we have two iterations on the input string, once to check to see if anything should be done, and then again to do a conditional transform. This could be collapsed to one iteration if there is a static Map<Char, String> which contains the escape values and then simply calling Map.getOrElse(chr) on each character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm ya I'll look into it
| /** | ||
| * The size of the state stack | ||
| */ | ||
| val size: Int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
consider UInt
runtime/serde/serde-json/common/src/aws/smithy/kotlin/runtime/serde/json/JsonLexer.kt
Show resolved
Hide resolved
runtime/serde/serde-json/common/src/aws/smithy/kotlin/runtime/serde/json/JsonLexer.kt
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impressive work! I don't have any ship-blocking concerns. Some nits and questions.
| * Advance the cursor until next non-whitespace character is encountered | ||
| * @param peek Flag indicating if the next non-whitespace character should be consumed or peeked | ||
| */ | ||
| private fun nextNonWhitespace(peek: Boolean = false): Char? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion
I don't find any place where this function is called where peek is false, maybe unneeded.
runtime/serde/serde-json/common/src/aws/smithy/kotlin/runtime/serde/json/JsonLexer.kt
Show resolved
Hide resolved
runtime/serde/serde-json/common/src/aws/smithy/kotlin/runtime/serde/json/JsonLexer.kt
Show resolved
Hide resolved
runtime/serde/serde-json/common/src/aws/smithy/kotlin/runtime/serde/json/JsonStreamReader.kt
Show resolved
Hide resolved
| How to run JSONTestSuite against serde-json deserialize | ||
| ======================================================== | ||
|
|
||
| When making changes to the lexer it is a good idea to run the | ||
| changes against the [JSONTestSuite](https://github.com/nst/JSONTestSuite) and manually examine the test results. | ||
|
|
||
| ### How to setup the JSONTestSuite |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "good idea to run the changes against JSONTestSuite" sounds like best intentions to me, is there a way we can mechanize it? When I was playing around with the testing, I just copy/pasted all of the parser scenarios from their input files into a (admittedly massive) test class.
Issue #
N/A
Description of changes
Removes GSON in favor of a hand rolled JSON encoder and lexer.
This was based on early work from @kiiadi in #42 but updated to handle Unicode, escape sequences, and improve overall correctness/strictness of the parser. The state machine was adapted from work in smithy-rs by @jdisanti which drastically simplified handling various conditions as well as improved the error messages generated.
Implementation notes:
peek()which requires state mutations to be delayed until the token that is peeked is actually consumed. This allowed removing theRawJsonTokentype.CharStreamabstraction was removed for performance reasons.suspendis incredibly slow when invoked on every character (not entirely surprising).decodeToString()on the appropriate slice of data. I'm not entirely sure why but suspect it's either a JVM intrinsic somewhere and/or has an optimized ASCII loop that is just faster when done in bulk.ByteArraypayload which requires us to keep generated deserializers as they are now where they read the entire payload contents into memory (val payload = httpResponse.body.readAll()). This trades memory for CPU. The team discussed and agreed that most service responses will be small enough that this is the right trade off.The tokenizer was ran through JSON test suite as described in TESTING.md.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.