-
-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[rust] Fix "UnexpectedEof" error when bbox results includes first item #362
[rust] Fix "UnexpectedEof" error when bbox results includes first item #362
Conversation
1a24f1c
to
664cba4
Compare
Note: The error doesn't occur until reading the *next* item. What's going on here: We want to set `count` in the iter initializer. FGB includes `count` in the header, but it's optional. Because flatbuffer doesn't have null types, we can't distinguish between `count` being unset and `count` truly being zero. So the way we detect an "empty" FGB is to try to read from the feature buffer in the iter's initializer. If the feature buffer is empty, this read will fail and we can immediately mark the FGB reader as "finished", which allows the iter to be well behaved elsewhere. The other factor is that, as we read features, we track the offset into the feature buffer (`cur_pos`) so that we know how far we need to Seek from where we currently are when jumping to the next feature in the BBOX results. So, every time we read a feature_size, we've read four bytes and then when we read the feature we've moved whatever the feature_len encoded in those four bytes. BUT THE THING IS - we weren't updating the cur_pos immediately after reading feature_len, we updated it after reading the feature, and included an extra 4 bytes at that point. That usually works. However - in the case of the very first feature, we've already added the "4" to the feature offset. That's a sentinel value we check for to infer it's the first feature so that we don't read the feature_len *again*. So in the case that the bbox included the first feature, we were adding an *extra* 4 to the cur_pos. The fix is: We now advance cur_pos immediately after we read the feature length. After reading the feature, we only add feature_len, not feature_len+4. I've also replaced the `finished` bool and the `cur_pos == 4` check with a more explicable `state` enum. Fixes flatgeobuf#361
664cba4
to
669030a
Compare
item_filter, | ||
count, | ||
feat_no: 0, | ||
cur_pos: 4, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used to initialize the cur_pos: 4
here to signify that we've already read the first feature length. This was necessary to correctly compute the offset if we were to seek to a subsequent feature.
return true; | ||
} | ||
} | ||
false | ||
} | ||
|
||
/// Read feature size and return true if end of dataset reached | ||
fn read_feature_size(fbs: &mut FgbFeature, reader: &mut R) -> bool { | ||
fbs.feature_buf.resize(4, 0); | ||
reader.read_exact(&mut fbs.feature_buf).is_err() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously we didn't bump the cur_pos when actually reading feature size.
@@ -497,20 +535,21 @@ impl<R: Read, S> FeatureIter<R, S> { | |||
let _feature = size_prefixed_root_as_feature(&self.fbs.feature_buf)?; | |||
} | |||
self.feat_no += 1; | |||
self.cur_pos += self.fbs.feature_buf.len() as u64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feature_buf is the size_prefixed_root_as_feature. So it includes the size_prefix which is 4 bytes + feature_size.
In the new code, we only add feature_size here since we've already accounted for the +4 bytes closer to the actual reading, in read_feature_size.
} | ||
|
||
fn read_feature(&mut self) -> Result<()> { | ||
// Read feature size if not already read in select_all or select_bbox | ||
if self.cur_pos != 4 && Self::read_feature_size(&mut self.fbs, &mut self.reader) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.cur_pos != 4
was our sentinel for "This is the first feature and we've already called read_feature_size in the initializer so don't do it again" - now we track an explicit State variable, which I think is a little more explicable.
@@ -137,6 +137,23 @@ fn read_bbox() -> Result<()> { | |||
Ok(()) | |||
} | |||
|
|||
#[test] | |||
fn read_bbox_including_first_feature() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to try it out, this test will explode with the old code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @michaelkirk - let's see if @pka have time to review, but if not this seems like good to merge to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry for the embarassing implementation error.
The solution contains more changes than expected, but makes absolutely sense.
@michaelkirk is this ready to merge? |
Note: The error doesn't occur until reading the next item.
tldr; we had some bad accounting for tracking the file position that only affected bbox calculations that covered the first feature in the buffer + at least one other feature.
What's going on here
We want to set
count
in the iter initializer.FGB includes
count
in the header, but it's optional. Because flatbuffer doesn't have null types, we can't distinguish betweencount
being unset andcount
truly being zero.So the way we detect an "empty" FGB is to try to read from the feature buffer in the iter's initializer. If the feature buffer is empty, this read will fail and we can immediately mark the FGB reader as "finished", which allows the iter to be well behaved elsewhere.
The other factor is that, as we read features, we track the offset into the feature buffer (
cur_pos
) so that we know how far we need to Seek from where we currently are when jumping to the next feature in the BBOXresults.
So, every time we read a feature_size, we've read four bytes and then when we read the feature we've moved whatever the feature_len encoded in those four bytes.
BUT THE THING IS - we weren't updating the cur_pos immediately after
reading feature_len, we updated it after reading the feature, and
included an extra 4 bytes at that point.
That usually works.
However - in the case of the very first feature, we've already added the
"4" to the feature offset. That's a sentinel value we check for to infer
it's the first feature so that we don't read the feature_len again.
So in the case that the bbox included the first feature, we were adding
an extra 4 to the cur_pos.
The fix is
We now advance cur_pos immediately after we read the feature length. After reading the feature, we only add feature_len, not feature_len+4.
I've also replaced the
finished
bool and thecur_pos == 4
check with a more explicablestate
enum.Fixes #361