Decouple text readers from iteration #92

nickbabcock · 2022-06-04T17:16:12Z

Decoupling text readers that are containers (i.e., arrays and objects)
from iteration is a breaking change. Instead of writing:

```rust
let data = b"name=aaa name=bbb core=123 name=ccc name=ddd";
let tape = TextTape::from_slice(data)?;
let mut reader = tape.windows1252_reader();
while let Some((key, _op, value)) = reader.next_field() {
    println!("{:?}={:?}", key.read_str(), value.read_str()?);
}
```

One will now write:

```rust
let data = b"name=aaa name=bbb core=123 name=ccc name=ddd";
let tape = TextTape::from_slice(data)?;
let reader = tape.windows1252_reader();
for (key, _op, value) in reader.fields() {
    println!("{:?}={:?}", key.read_str(), value.read_str()?);
}
```

The new version is more concise and allows users to iterate over
fields or values more than once without cloning the reader. Cloning
could be expensive because ObjectReader relied on a lazily allocated
vector to keep track of grouped keys. This state has now been moved to
the iterator returned by ObjectReader::field_groups(), so cloning a
reader is always cheap.

All of the Iterator trait methods are available on the iterators
returned by the new functions.
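As a rough illustration of why moving iteration state into the iterator
helps, here is a minimal sketch. The Reader type, its fields() method
and the two helper functions are hypothetical stand-ins written for this
example; they are not the crate's actual ObjectReader. The reader only
holds a borrowed slice, so it is trivially copyable, and every call to
fields() starts a fresh pass that works with the standard Iterator
adaptors.

```rust
// Hypothetical stand-in for a text reader: it only borrows its data,
// so copying it is cheap and it carries no iteration state.
#[derive(Clone, Copy)]
struct Reader<'a> {
    fields: &'a [(&'a str, &'a str)],
}

impl<'a> Reader<'a> {
    /// Returns a fresh iterator on every call. All iteration state
    /// lives in the returned iterator, not in the reader.
    fn fields(&self) -> impl Iterator<Item = (&'a str, &'a str)> + 'a {
        self.fields.iter().copied()
    }
}

/// Collects the values of every field whose key is "name",
/// using ordinary Iterator adaptors.
fn name_values<'a>(reader: &Reader<'a>) -> Vec<&'a str> {
    reader
        .fields()
        .filter(|(key, _)| *key == "name")
        .map(|(_, value)| value)
        .collect()
}

/// Counts the fields. This can be called repeatedly on the same
/// reader because each call iterates from the start.
fn field_count(reader: &Reader<'_>) -> usize {
    reader.fields().count()
}
```

Because the reader is never mutated, name_values and field_count can be
called in any order and any number of times on the same reader.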

The API should be more intuitive, as there is now zero chance of a user
accidentally interleaving calls to ObjectReader::next_field() with
ObjectReader::next_fields().

Previously, grouping keys always allocated a vector to hold the
respective values. The vast majority of the time each group contains
only a single value, so each field group is now an enum that holds
either one value (which doesn't require a heap allocation) or many
values (which does).
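The one-or-many idea can be sketched as follows. The Group enum, its
variant names and its methods are assumptions made for illustration and
may not match the types the crate actually defines.

```rust
/// Hypothetical one-or-many container: a single value is stored inline,
/// and a Vec is only allocated once a second value arrives.
#[derive(Debug, PartialEq)]
enum Group<T> {
    Single(T),
    Multiple(Vec<T>),
}

impl<T> Group<T> {
    /// Adds a value, promoting Single to Multiple when needed.
    fn push(self, value: T) -> Self {
        match self {
            Group::Single(first) => Group::Multiple(vec![first, value]),
            Group::Multiple(mut values) => {
                values.push(value);
                Group::Multiple(values)
            }
        }
    }

    /// Views the group as a slice regardless of the variant.
    fn values(&self) -> &[T] {
        match self {
            Group::Single(value) => std::slice::from_ref(value),
            Group::Multiple(values) => values.as_slice(),
        }
    }
}
```

A group built from a single field never touches the heap; the
allocation only happens on the first call to push.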

The algorithm for grouping keys now uses a hashmap instead of a vector,
which changes the algorithmic complexity from about (n^2 / 2) to about
(2n) operations, i.e. from quadratic to linear in the number of fields.
It remains to be seen what kind of performance benefit this brings.
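One way to group keys with a hashmap in a single pass is sketched below.
The function name group_fields and the use of plain string pairs are
assumptions for this example; the crate's real implementation operates
on its own token types and may differ in detail. The map stores, for
each key, the position of its group, so each field costs one lookup
instead of a scan over all groups seen so far.

```rust
use std::collections::HashMap;

/// Groups values by key while preserving the order in which keys were
/// first seen. Hypothetical helper, not the crate's actual API.
fn group_fields<'a>(fields: &[(&'a str, &'a str)]) -> Vec<(&'a str, Vec<&'a str>)> {
    // Maps each key to the index of its group in `groups`.
    let mut index: HashMap<&'a str, usize> = HashMap::new();
    let mut groups: Vec<(&'a str, Vec<&'a str>)> = Vec::new();

    for &(key, value) in fields {
        // Copy the index out so the map is free to be mutated below.
        let existing = index.get(key).copied();
        match existing {
            // Key seen before: append to its existing group.
            Some(i) => groups[i].1.push(value),
            // First occurrence: record where the new group lives.
            None => {
                index.insert(key, groups.len());
                groups.push((key, vec![value]));
            }
        }
    }
    groups
}
```

With the fields from the example data above (name=aaa name=bbb core=123
name=ccc name=ddd), this yields a "name" group with four values
followed by a "core" group with one.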

This commit also stops exporting several types at the crate root to
avoid polluting the root namespace with highly specific structs (like
the iterator of values from grouped keys) that are better off residing
in their own module (in this case, text).

Benchmarks suggest that deserialization performance improved by a
percent or two.


nickbabcock force-pushed the reader branch from 89a553f to ab46589 Compare June 4, 2022 17:21

nickbabcock force-pushed the reader branch from ab46589 to b9e9229 Compare June 4, 2022 17:28

nickbabcock merged commit e74e123 into master Jun 4, 2022

nickbabcock deleted the reader branch June 4, 2022 17:39
