Decouple text readers from iteration #92
Merged
Decoupling text readers that are containers (i.e., arrays and objects) from
iteration is a breaking change. Instead of writing:
One will now write:
The new version is more concise and allows users to iterate over
fields or values more than once without cloning. Cloning could be
expensive because `ObjectReader` internally relied on a lazily allocated
vector to keep track of grouped keys. This state has now been moved to
the iterator returned by `ObjectReader::field_groups()`, so readers can
now always be cloned cheaply.
All the `Iterator` trait methods are available on the iterators returned
by the new iterator functions.
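As a rough illustration of the decoupling described above (not the crate's actual code), the sketch below uses a hypothetical `PairReader` and `Fields` iterator. The assumption is only that the reader borrows its data and that all iteration state lives in the iterator, so the reader can be iterated repeatedly and copied for free.

```rust
/// Hypothetical stand-in for a container text reader; not the real `ObjectReader`.
/// It borrows its data and carries no iteration state, so it is `Copy`.
#[derive(Debug, Clone, Copy)]
pub struct PairReader<'a> {
    pairs: &'a [(&'a str, &'a str)],
}

/// All iteration state (here just a position) lives in the iterator.
pub struct Fields<'a> {
    pairs: &'a [(&'a str, &'a str)],
    pos: usize,
}

impl<'a> PairReader<'a> {
    pub fn new(pairs: &'a [(&'a str, &'a str)]) -> Self {
        PairReader { pairs }
    }

    /// Takes `&self`, so it can be called any number of times.
    pub fn fields(&self) -> Fields<'a> {
        Fields { pairs: self.pairs, pos: 0 }
    }
}

impl<'a> Iterator for Fields<'a> {
    type Item = (&'a str, &'a str);

    fn next(&mut self) -> Option<Self::Item> {
        // Return `None` once the position runs past the end.
        let item = self.pairs.get(self.pos).copied()?;
        self.pos += 1;
        Some(item)
    }
}
```

Because `Fields` implements `Iterator`, adapters such as `map`, `filter`, and `count` work on it directly, and calling `fields()` a second time starts a fresh pass without cloning the reader.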
The API should be more intuitive, as there is now zero chance of a user
accidentally interweaving calls to `ObjectReader::next_field()` with
`ObjectReader::next_fields()`.

Previously, grouping keys always allocated a vector to hold the
respective values, but the vast majority of the time each group will
contain only a single value, so the group has been optimized such that
each field group is an enum that holds either one (which doesn't require
heap allocation) or many values (which will).
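A minimal sketch of the one-or-many idea, assuming a hypothetical `Group` enum with `push` and `as_slice` helpers (the names and methods are illustrative and may not match the types in this PR):

```rust
/// Hypothetical one-or-many container; the real type in the crate may differ.
#[derive(Debug, PartialEq)]
pub enum Group<T> {
    /// The common case: a single value stored inline, no heap allocation.
    One(T),
    /// The rare case: several values for the same key, stored in a vector.
    Many(Vec<T>),
}

impl<T> Group<T> {
    /// Adds a value, promoting `One` to `Many` on the second insertion.
    pub fn push(&mut self, value: T) {
        // `Vec::new()` does not allocate, so this placeholder is cheap.
        let old = std::mem::replace(self, Group::Many(Vec::new()));
        *self = match old {
            Group::One(first) => Group::Many(vec![first, value]),
            Group::Many(mut values) => {
                values.push(value);
                Group::Many(values)
            }
        };
    }

    /// Views the group as a slice regardless of which variant it is.
    pub fn as_slice(&self) -> &[T] {
        match self {
            Group::One(value) => std::slice::from_ref(value),
            Group::Many(values) => values.as_slice(),
        }
    }
}
```

The heap allocation only happens when a key is actually repeated, which matches the observation that most groups hold a single value.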
The algorithm for grouping keys now uses a hashmap instead of a vector,
which changes the algorithm complexity from roughly (n^2 / 2) to (2n)
for n keys. It remains to be seen what kind of performance benefit this
brings.
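For illustration only, here is one way a hashmap-based grouping pass could look. The function `group_by_key` is hypothetical and, for brevity, stores every group in a `Vec` rather than a one-or-many enum; the actual implementation in this PR may be structured differently.

```rust
use std::collections::HashMap;

/// Hypothetical helper: groups values by key, keeping first-seen key order.
/// Each pair costs one hash lookup instead of a scan over earlier keys.
pub fn group_by_key<'a>(pairs: &[(&'a str, &'a str)]) -> Vec<(&'a str, Vec<&'a str>)> {
    // Maps each key to the index of its group in `groups`.
    let mut index: HashMap<&'a str, usize> = HashMap::new();
    let mut groups: Vec<(&'a str, Vec<&'a str>)> = Vec::new();

    for &(key, value) in pairs {
        // Copy the index out so the map is free to be mutated below.
        let existing = index.get(key).copied();
        match existing {
            Some(i) => groups[i].1.push(value),
            None => {
                index.insert(key, groups.len());
                groups.push((key, vec![value]));
            }
        }
    }

    groups
}
```

A vector-based approach would compare each key against the keys seen before it, which is where the quadratic cost comes from; the map replaces that inner scan with an expected constant-time lookup.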
This commit also moves several types out of being exported at the root
to avoid polluting the root namespace with highly specific structs (like
the iterator of values from grouped keys) that are better off residing in
their own module (in this case, `text`).

Benchmarks show that deserialization performance seemed to benefit by a
percent or two.