Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(storage): cursor requests are [start, stop] instead of [start, stop) #21318

Merged
merged 2 commits into from
Apr 30, 2021

Conversation

jsternberg
Copy link
Contributor

The cursors were previously [start, stop) to be consistent with how flux
requests data, but the underlying storage file store was [start, stop]
because that's how influxql read data. This reverts back the cursor
behavior so that it is now [start, stop] everywhere and the conversion
from [start, stop) to [start, stop] is performed when doing the cursor
request to get the next cursor.

@jsternberg jsternberg requested a review from philjb April 28, 2021 14:27
@philjb philjb requested a review from dgnorton April 28, 2021 15:03
@lesam
Copy link
Contributor

lesam commented Apr 30, 2021

Is it possible to show this bug in a new flux end-to-end test? I see a lot of stuff testing mocks and lower level things, but for this sort of off by one error it would be nice to verify everything is really working end-to-end

@lesam
Copy link
Contributor

lesam commented Apr 30, 2021

I've tried a couple end-to-end tests and I can't make anything fail on master but pass on this branch - looks like range is setting the correct time bounds both on master and in this branch.

Reading the PR again, maybe this is expected? We seem to be changing both ends of the API simultaneously. Is this PR just to make the storage API agree with the docs, without user-facing change?

Copy link
Contributor

@lesam lesam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments - I'd like to see either an end-to-end test showing the broken case (so we can make sure it doesn't regress in the future, and confirm it is really fixed end-to-end), or get confirmation that this is a pure refactor without user-visible changes.

@jsternberg
Copy link
Contributor Author

I'm not sure what effects which storage engine so some of this may just be cleanup. I expect most queries are unaffected, but we did find one specific scenario.

When last() is used as a raw aggregate, a descending cursor is used. The descending cursor pathway opens a key cursor that reads the stop time, which is the start of a descending read, as an inclusive time. This point then gets filtered out by another portion of the code that determines it is not supposed to be present. In some rare circumstances, this one timestamp will be the only one in a TSM file entry. The file store returns that one point, the cursor filters it out, and then the cursor decides the query has no points because no points were returned.

In order to fix this, we had to make all of the APIs choose a consistent method of accessing the file store. Because of influxql setting a precedent, we decided storage APIs have to be [start, stop] or inclusive on both ends. This meant we had to change the flux cursor reading code to adjust its method, which is [start, stop), to [start, stop]. We did this in the place where the flux readers request new cursors to be created from the storage engine. This then meant we had to fix the storage engine code underneath that to consistently use [start, stop]. Previously, they were using a combination of the two. But, in most circumstances, the actual errant point would get filtered out by a higher level. For ascending cursors, the file store would return one too many points, but the cursors would then remove that errant point when it went to merge points.

The unit tests specifically test the underlying behavior for how files can get into bad situations. Generally, the current code works fine after a full compaction happens because the errors pop up when the files are more scattered. Also, these off-by-one errors don't appear to affect the cache.

@jsternberg jsternberg force-pushed the fix/inclusive-stop-cursor-request branch from 35cf560 to 72561bc Compare April 30, 2021 14:31
Copy link
Contributor

@lesam lesam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jsternberg jsternberg force-pushed the fix/inclusive-stop-cursor-request branch from 02857a2 to e2ac67d Compare April 30, 2021 15:08
The cursors were previously [start, stop) to be consistent with how flux
requests data, but the underlying storage file store was [start, stop]
because that's how influxql read data. This reverts back the cursor
behavior so that it is now [start, stop] everywhere and the conversion
from [start, stop) to [start, stop] is performed when doing the cursor
request to get the next cursor.
@jsternberg jsternberg force-pushed the fix/inclusive-stop-cursor-request branch from e2ac67d to 51b95cc Compare April 30, 2021 15:49
@lesam lesam merged commit 7766672 into master Apr 30, 2021
@jsternberg jsternberg deleted the fix/inclusive-stop-cursor-request branch April 30, 2021 16:15
lesam pushed a commit to lesam/influxdb that referenced this pull request Apr 30, 2021
The cursors were previously [start, stop) to be consistent with how flux
requests data, but the underlying storage file store was [start, stop]
because that's how influxql read data. This reverts back the cursor
behavior so that it is now [start, stop] everywhere and the conversion
from [start, stop) to [start, stop] is performed when doing the cursor
request to get the next cursor.

cherry-pick from influxdata#21318

Co-authored-by: Sam Arnold <sarnold@influxdata.com>
(cherry picked from commit 7766672)
lesam pushed a commit to lesam/influxdb that referenced this pull request Apr 30, 2021
The cursors were previously [start, stop) to be consistent with how flux
requests data, but the underlying storage file store was [start, stop]
because that's how influxql read data. This reverts back the cursor
behavior so that it is now [start, stop] everywhere and the conversion
from [start, stop) to [start, stop] is performed when doing the cursor
request to get the next cursor.

cherry-pick from influxdata#21318

Co-authored-by: Sam Arnold <sarnold@influxdata.com>
(cherry picked from commit 7766672)
lesam added a commit that referenced this pull request Apr 30, 2021
…op) (#21347)

* fix: backport tsdb fix for window pushdowns

From #19855

* fix(storage): cursor requests are [start, stop] instead of [start, stop)

The cursors were previously [start, stop) to be consistent with how flux
requests data, but the underlying storage file store was [start, stop]
because that's how influxql read data. This reverts back the cursor
behavior so that it is now [start, stop] everywhere and the conversion
from [start, stop) to [start, stop] is performed when doing the cursor
request to get the next cursor.

cherry-pick from #21318

Co-authored-by: Sam Arnold <sarnold@influxdata.com>
(cherry picked from commit 7766672)

* chore: fix formatting

Co-authored-by: Jonathan A. Sternberg <jonathan@influxdata.com>
lesam added a commit to lesam/influxdb that referenced this pull request Apr 30, 2021
…op) (influxdata#21347)

* fix: backport tsdb fix for window pushdowns

From influxdata#19855

* fix(storage): cursor requests are [start, stop] instead of [start, stop)

The cursors were previously [start, stop) to be consistent with how flux
requests data, but the underlying storage file store was [start, stop]
because that's how influxql read data. This reverts back the cursor
behavior so that it is now [start, stop] everywhere and the conversion
from [start, stop) to [start, stop] is performed when doing the cursor
request to get the next cursor.

cherry-pick from influxdata#21318

Co-authored-by: Sam Arnold <sarnold@influxdata.com>
(cherry picked from commit 7766672)

* chore: fix formatting

Co-authored-by: Jonathan A. Sternberg <jonathan@influxdata.com>
(cherry picked from commit 8edf7a4)
lesam added a commit to lesam/influxdb that referenced this pull request Apr 30, 2021
…op) (influxdata#21347)

* fix: backport tsdb fix for window pushdowns

From influxdata#19855

* fix(storage): cursor requests are [start, stop] instead of [start, stop)

The cursors were previously [start, stop) to be consistent with how flux
requests data, but the underlying storage file store was [start, stop]
because that's how influxql read data. This reverts back the cursor
behavior so that it is now [start, stop] everywhere and the conversion
from [start, stop) to [start, stop] is performed when doing the cursor
request to get the next cursor.

cherry-pick from influxdata#21318

Co-authored-by: Sam Arnold <sarnold@influxdata.com>
(cherry picked from commit 7766672)

* chore: fix formatting

Co-authored-by: Jonathan A. Sternberg <jonathan@influxdata.com>
(cherry picked from commit 8edf7a4)
lesam added a commit that referenced this pull request Apr 30, 2021
…op) (#21347) (#21348)

* fix: backport tsdb fix for window pushdowns

From #19855

* fix(storage): cursor requests are [start, stop] instead of [start, stop)

The cursors were previously [start, stop) to be consistent with how flux
requests data, but the underlying storage file store was [start, stop]
because that's how influxql read data. This reverts back the cursor
behavior so that it is now [start, stop] everywhere and the conversion
from [start, stop) to [start, stop] is performed when doing the cursor
request to get the next cursor.

cherry-pick from #21318

Co-authored-by: Sam Arnold <sarnold@influxdata.com>
(cherry picked from commit 7766672)

* chore: fix formatting

Co-authored-by: Jonathan A. Sternberg <jonathan@influxdata.com>
(cherry picked from commit 8edf7a4)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants