Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rust] Fix "UnexpectedEof" error when bbox results includes first item #362

Merged

Conversation

michaelkirk
Copy link
Collaborator

@michaelkirk michaelkirk commented Apr 18, 2024

Note: The error doesn't occur until reading the next item.

tldr; we had some bad accounting for tracking the file position that only affected bbox calculations that covered the first feature in the buffer + at least one other feature.

What's going on here

We want to set count in the iter initializer.

FGB includes count in the header, but it's optional. Because flatbuffer doesn't have null types, we can't distinguish between count being unset and count truly being zero.

So the way we detect an "empty" FGB is to try to read from the feature buffer in the iter's initializer. If the feature buffer is empty, this read will fail and we can immediately mark the FGB reader as "finished", which allows the iter to be well behaved elsewhere.

The other factor is that, as we read features, we track the offset into the feature buffer (cur_pos) so that we know how far we need to Seek from where we currently are when jumping to the next feature in the BBOX
results.

So, every time we read a feature_size, we've read four bytes and then when we read the feature we've moved whatever the feature_len encoded in those four bytes.

BUT THE THING IS - we weren't updating the cur_pos immediately after
reading feature_len, we updated it after reading the feature, and
included an extra 4 bytes at that point.

That usually works.

However - in the case of the very first feature, we've already added the
"4" to the feature offset. That's a sentinel value we check for to infer
it's the first feature so that we don't read the feature_len again.

So in the case that the bbox included the first feature, we were adding
an extra 4 to the cur_pos.

The fix is

We now advance cur_pos immediately after we read the feature length. After reading the feature, we only add feature_len, not feature_len+4.

I've also replaced the finished bool and the cur_pos == 4 check with a more explicable state enum.

Fixes #361

Note: The error doesn't occur until reading the *next* item.

What's going on here:

We want to set `count` in the iter initializer.

FGB includes `count` in the header, but it's optional. Because
flatbuffer doesn't have null types, we can't distinguish between `count`
being unset and `count` truly being zero.

So the way we detect an "empty" FGB is to try to read from the feature
buffer in the iter's initializer. If the feature buffer is empty, this
read will fail and we can immediately mark the FGB reader as "finished",
which allows the iter to be well behaved elsewhere.

The other factor is that, as we read features, we track the offset into
the feature buffer (`cur_pos`) so that we know how far we need to Seek
from where we currently are when jumping to the next feature in the BBOX
results.

So, every time we read a feature_size, we've read four bytes and then
when we read the feature we've moved whatever the feature_len encoded in
those four bytes.

BUT THE THING IS - we weren't updating the cur_pos immediately after
reading feature_len, we updated it after reading the feature, and
included an extra 4 bytes at that point.

That usually works.

However - in the case of the very first feature, we've already added the
"4" to the feature offset. That's a sentinel value we check for to infer
it's the first feature so that we don't read the feature_len *again*.

So in the case that the bbox included the first feature, we were adding
an *extra* 4 to the cur_pos.

The fix is:

We now advance cur_pos immediately after we read the feature length.
After reading the feature, we only add feature_len, not feature_len+4.

I've also replaced the `finished` bool and the `cur_pos == 4` check
with a more explicable `state` enum.

Fixes flatgeobuf#361
item_filter,
count,
feat_no: 0,
cur_pos: 4,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used to initialize the cur_pos: 4 here to signify that we've already read the first feature length. This was necessary to correctly compute the offset if we were to seek to a subsequent feature.

return true;
}
}
false
}

/// Read feature size and return true if end of dataset reached
fn read_feature_size(fbs: &mut FgbFeature, reader: &mut R) -> bool {
fbs.feature_buf.resize(4, 0);
reader.read_exact(&mut fbs.feature_buf).is_err()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we didn't bump the cur_pos when actually reading feature size.

@@ -497,20 +535,21 @@ impl<R: Read, S> FeatureIter<R, S> {
let _feature = size_prefixed_root_as_feature(&self.fbs.feature_buf)?;
}
self.feat_no += 1;
self.cur_pos += self.fbs.feature_buf.len() as u64;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feature_buf is the size_prefixed_root_as_feature. So it includes the size_prefix which is 4 bytes + feature_size.

In the new code, we only add feature_size here since we've already accounted for the +4 bytes closer to the actual reading, in read_feature_size.

}

fn read_feature(&mut self) -> Result<()> {
// Read feature size if not already read in select_all or select_bbox
if self.cur_pos != 4 && Self::read_feature_size(&mut self.fbs, &mut self.reader) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.cur_pos != 4 was our sentinel for "This is the first feature and we've already called read_feature_size in the initializer so don't do it again" - now we track an explicit State variable, which I think is a little more explicable.

@@ -137,6 +137,23 @@ fn read_bbox() -> Result<()> {
Ok(())
}

#[test]
fn read_bbox_including_first_feature() -> Result<()> {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to try it out, this test will explode with the old code.

@bjornharrtell bjornharrtell self-requested a review April 18, 2024 07:59
Copy link
Member

@bjornharrtell bjornharrtell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @michaelkirk - let's see if @pka have time to review, but if not this seems like good to merge to me.

Copy link
Member

@pka pka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry for the embarassing implementation error.

The solution contains more changes than expected, but makes absolutely sense.

src/rust/src/file_reader.rs Outdated Show resolved Hide resolved
@bjornharrtell
Copy link
Member

@michaelkirk is this ready to merge?

@michaelkirk michaelkirk merged commit e4eb1ea into flatgeobuf:master Apr 30, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error: IO(Error { kind: UnexpectedEof, message: "failed to fill whole buffer" })
3 participants