Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decouple text readers from iteration #92

Merged
merged 1 commit into from
Jun 4, 2022
Merged

Decouple text readers from iteration #92

merged 1 commit into from
Jun 4, 2022

Conversation

nickbabcock
Copy link
Contributor

Decoupling text readers that are containers (ie: array and objects) from
iteration is a breaking change. Instead of writing:

let data = b"name=aaa name=bbb core=123 name=ccc name=ddd";
let tape = TextTape::from_slice(data)?;
let mut reader = tape.windows1252_reader();
while let Some((key, _op, value)) = reader.next_field() {
    println!("{:?}={:?}", key.read_str(), value.read_str()?);
}

One will now write:

let data = b"name=aaa name=bbb core=123 name=ccc name=ddd";
let tape = TextTape::from_slice(data)?;
let reader = tape.windows1252_reader();
for (key, _op, value) in reader.fields() {
    println!("{:?}={:?}", key.read_str(), value.read_str()?);
}

The new version is more concise and allows users to iterate over a
fields or values more than once without cloning, which could have been
expensive due to the internal state of ObjectReader internally relying
on a lazily allocated vector to keep track of grouped keys. This state
has now been moved to the iterator returned by
ObjectReader::field_groups() so that cloning of readers can now always
be done cheaply.

All the Iterator trait functions are available to the new iterator
functions.

The API should be more intuitive as now there is zero chance of a user
accidentally interweaving calls to ObjectReader::next_field() with
ObjectReader::next_fields().

Previously grouping keys always allocated a vector to hold the
respective values, but the vast majority of the time each group will
contain only a single value, so the group has been optimized such that
each field group is an enum that holds either one (which doesn't require
heap allocation) or many values (which will).

The algorithm for grouping keys has been changed to use a hashmap
instead of a vector which changes the algorithm complexity from (n^2 /
2) to (2n). It remains to be seen what kind of performance benefit this
brings.

This commit also moves several types out of being exported at the root
to avoid polluting the root namespace with highly specific structs (like
the iterator of values from grouped keys) that are better residing in
their own module (in this case text).

Benchmarks show that deserialization performance seemed to benefit by a
percent or two

Decoupling text readers that are containers (ie: array and objects) from
iteration is a breaking change. Instead of writing:

```rust
let data = b"name=aaa name=bbb core=123 name=ccc name=ddd";
let tape = TextTape::from_slice(data)?;
let mut reader = tape.windows1252_reader();
while let Some((key, _op, value)) = reader.next_field() {
    println!("{:?}={:?}", key.read_str(), value.read_str()?);
}
```

One will now write:

```rust
let data = b"name=aaa name=bbb core=123 name=ccc name=ddd";
let tape = TextTape::from_slice(data)?;
let reader = tape.windows1252_reader();
for (key, _op, value) in reader.fields() {
    println!("{:?}={:?}", key.read_str(), value.read_str()?);
}
```

The new version is more concise and allows users to iterate over a
fields or values more than once without cloning, which could have been
expensive due to the internal state of `ObjectReader` internally relying
on a lazily allocated vector to keep track of grouped keys. This state
has now been moved to the iterator returned by
`ObjectReader::field_groups()` so that cloning of readers can now always
be done cheaply.

All the `Iterator` trait functions are available to the new iterator
functions.

The API should be more intuitive as now there is zero chance of a user
accidentally interweaving calls to `ObjectReader::next_field()` with
`ObjectReader::next_fields()`.

Previously grouping keys always allocated a vector to hold the
respective values, but the vast majority of the time each group will
contain only a single value, so the group has been optimized such that
each field group is an enum that holds either one (which doesn't require
heap allocation) or many values (which will).

The algorithm for grouping keys has been changed to use a hashmap
instead of a vector which changes the algorithm complexity from (n^2 /
2) to (2n). It remains to be seen what kind of performance benefit this
brings.

This commit also moves several types out of being exported at the root
to avoid polluting the root namespace with highly specific structs (like
the iterator of values from grouped keys) that are better residing in
their own module (in this case `text`).

Benchmarks show that deserialization performance seemed to benefit by a
percent or two
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant