This repository has been archived by the owner on Feb 21, 2024. It is now read-only.

Add MinRow and MaxRow calls #1983

Merged: 16 commits, Jun 17, 2019

Contributor

yuce commented May 30, 2019

Overview

Fixes #1984

Pull request checklist

  • I have read the contributing guide.
  • I have agreed to the Contributor License Agreement.
  • I have updated the documentation.
  • I have resolved any merge conflicts.
  • I have included tests that cover my changes.
  • All new and existing tests pass.
  • Make sure PR title conforms to convention in CHANGELOG.md.
  • Add appropriate changelog label to PR (if applicable).

Code review checklist

This is the checklist that the reviewer will follow while reviewing your pull request. You do not need to do anything with this checklist, but be aware of what the reviewer will be looking for.

  • Ensure that any changes to external docs have been included in this pull request.
  • If the changes require that minor/major versions need to be updated, tag the PR appropriately.
  • Ensure the new code is properly commented and follows Idiomatic Go.
  • Check that tests have been written and that they cover the new functionality.
  • Run tests and ensure they pass.
  • Build and run the code, performing any applicable integration testing.
  • Make sure PR title conforms to convention in CHANGELOG.md.
  • Make sure PR is tagged with appropriate changelog label.

Member

jaffee commented May 30, 2019

@seebs this is WIP (it still needs to be hooked up to the executor, I guess), but I'd welcome any input you have.

@yuce this direction looks pretty reasonable to me

if btc.tree.Len() == 0 {
	return 0, nil
}
return btc.tree.First()
Contributor

I think this will be subtly broken in some contexts by the rowcache code, because there are circumstances where the rowcache code lets a nil container sneak into the btree for a while. I don't think that's necessarily a bug in this code; it probably means the rowcache code has to be stricter about keeping nil containers out of the tree. (It turns out, though, that because an in-place operation can produce a nil container, deleting nil containers immediately instead of tolerating them temporarily somewhat breaks iterators.)

Contributor Author

yuce commented May 31, 2019

As far as I can see, most bTreeContainers methods assume btc.tree is not nil. The First method has almost the same code as Last. Doesn't Last cause any problems with the rowcache code?

Contributor

I'm not sure. We should probably file a separate issue to check that. Actually, it might already have been fixed; it may have been SliceContainers that wasn't handling it well. My memory is fuzzy.

Contributor

seebs left a comment

I don't think this code is wrong, but it will clash with the rowcache code I just merged: in some cases, such as removing a bit from the first container, I think First() can return a nil container, when we almost certainly want it to return the first non-empty container.

I should probably fix the rowcache code, in this case.
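A minimal sketch of the behaviour described here (return the first non-empty container rather than whatever sits first in the tree), assuming a map plus a sorted key slice as a stand-in for the btree. The container type and firstNonEmpty are hypothetical names for illustration, not the roaring package's API; a nil map value models the transient nil containers the rowcache code can leave behind.

```go
package main

import (
	"fmt"
	"sort"
)

// container is a hypothetical stand-in for a roaring container; only its
// bit count matters here.
type container struct {
	n int // number of set bits
}

// firstNonEmpty returns the smallest key whose container is non-nil and has
// at least one bit set; ok is false when no such container exists.
func firstNonEmpty(tree map[uint64]*container) (key uint64, ok bool) {
	keys := make([]uint64, 0, len(tree))
	for k := range tree {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	for _, k := range keys {
		// Skip nil containers and containers emptied in place.
		if c := tree[k]; c != nil && c.n > 0 {
			return k, true
		}
	}
	return 0, false
}

func main() {
	tree := map[uint64]*container{
		0: nil,    // e.g. left behind after removing the only bit
		3: {n: 0}, // emptied by an in-place operation
		7: {n: 2},
	}
	fmt.Println(firstNonEmpty(tree)) // 7 true
}
```

A real fix would more likely live in the rowcache code (keeping nil containers out of the tree), as noted above; the sketch only shows the result callers want.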

Contributor Author

yuce commented May 31, 2019

I've added MinRow and MaxRow calls. They return their results as a Pair. It would be trivial to return Pairs from Min and Max instead, but I'm not comfortable with the same call returning results of different types. I can update the PR if necessary, though. Fields with keys=true work, but these calls don't accept a filter yet.

yuce changed the title from "[WIP] Min max rowid" to "Min max rowid" on Jun 3, 2019
yuce changed the title from "Min max rowid" to "Add MinRow and MaxRow calls" on Jun 3, 2019
Contributor Author

yuce commented Jun 3, 2019

I've implemented filter support and will push it if it's useful:

$ curl localhost:10101/index/i1 -d ''
$ curl localhost:10101/index/f1 -d ''
$ curl localhost:10101/index/i1/field/f1 -d ''
$ curl localhost:10101/index/i1/query -d 'Set(10, f1=1)'
$ curl localhost:10101/index/i1/query -d 'Set(20, f1=1)'
$ curl localhost:10101/index/i1/query -d 'Set(20, f1=2)'
$ curl localhost:10101/index/i1/query -d 'Row(f1=1)'
{"results":[{"attrs":{},"columns":[10,20]}]}
$ curl localhost:10101/index/i1/query -d 'Set(10, f1=2)'
$ curl localhost:10101/index/i1/query -d 'MaxRow(Row(f1=2), field=f1)'
{"results":[{"id":2,"count":2}]}
$ curl localhost:10101/index/i1/query -d 'MaxRow(Row(f1=1), field=f1)'
{"results":[{"id":2,"count":2}]}
$ curl localhost:10101/index/i1/query -d 'Set(10, f1=3)'
$ curl localhost:10101/index/i1/query -d 'MaxRow(Row(f1=1), field=f1)'
{"results":[{"id":3,"count":1}]}
$ curl localhost:10101/index/i1/query -d 'Row(f1=1)'
{"results":[{"attrs":{},"columns":[10,20]}]}
$ curl localhost:10101/index/i1/query -d 'Set(50, f1=5)'
$ curl localhost:10101/index/i1/query -d 'Row(f1=1)'
{"results":[{"attrs":{},"columns":[10,20]}]}
$ curl localhost:10101/index/i1/query -d 'Row(f1=3)'
{"results":[{"attrs":{},"columns":[10]}]}
$ curl localhost:10101/index/i1/query -d 'MaxRow(Row(f1=3), field=f1)'
{"results":[{"id":3,"count":1}]}
$ curl localhost:10101/index/i1/query -d 'MaxRow(Row(f1=5), field=f1)'
{"results":[{"id":5,"count":1}]}
$ curl localhost:10101/index/i1/query -d 'Set(100, f1=3)'
$ curl localhost:10101/index/i1/query -d 'MaxRow(Row(f1=3), field=f1)'
{"results":[{"id":3,"count":2}]}
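The transcript suggests that MaxRow with a filter returns the highest row ID whose row shares at least one column with the filter row, and that count is the size of that intersection (for example, {"id":3,"count":2} once columns 10 and 100 are both in row 3). The sketch below reproduces that reading with a hypothetical in-memory field type; it is an inference from the outputs above, not the executor's implementation.

```go
package main

import "fmt"

// field is a hypothetical in-memory set field: row ID -> set of column IDs.
type field map[uint64]map[uint64]bool

// set records column col in row row, like Set(col, f1=row).
func (f field) set(col, row uint64) {
	if f[row] == nil {
		f[row] = map[uint64]bool{}
	}
	f[row][col] = true
}

// maxRow returns the highest row ID sharing at least one column with filter,
// and the number of shared columns (the inferred meaning of "count").
func (f field) maxRow(filter map[uint64]bool) (id, count uint64, ok bool) {
	for row, cols := range f {
		var n uint64
		for c := range cols {
			if filter[c] {
				n++
			}
		}
		if n > 0 && (!ok || row > id) {
			id, count, ok = row, n, true
		}
	}
	return id, count, ok
}

func main() {
	f := field{}
	f.set(10, 1)
	f.set(20, 1)
	f.set(20, 2)
	f.set(10, 2)
	fmt.Println(f.maxRow(f[2])) // 2 2 true, like MaxRow(Row(f1=2), field=f1)
}
```

A full scan over every row is fine for illustrating the semantics, but it is exactly the cost the review below is concerned about for sparse, high-cardinality data.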

Contributor Author

yuce commented Jun 3, 2019

Pushed the filter change. We can disable/remove it if necessary.

Member

jaffee left a comment

I think we could avoid much of the new code in roaring by using the existing Bitmap.Iterator (e.g. Bitmap.Iterator().Next()).
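A sketch of that suggestion, assuming Pilosa's usual fragment layout in which the bit for (row, column) sits at row*ShardWidth + column%ShardWidth with a shard width of 2^20: the minimum row ID is then the first set position divided by the shard width, which is what the first value from Bitmap.Iterator().Next() would provide. A sorted slice stands in for the bitmap here, and pos and minMaxRow are hypothetical helpers; the maximum needs the last position, which is where reverse iteration comes in.

```go
package main

import (
	"fmt"
	"sort"
)

// shardWidth is assumed to be 2^20 columns per shard.
const shardWidth = 1 << 20

// pos is the assumed bit position for (row, col) within a fragment.
func pos(row, col uint64) uint64 { return row*shardWidth + col%shardWidth }

// minMaxRow derives the smallest and largest row IDs from sorted bit
// positions: the first position gives the min row, the last the max row.
func minMaxRow(sorted []uint64) (firstRow, lastRow uint64, ok bool) {
	if len(sorted) == 0 {
		return 0, 0, false
	}
	return sorted[0] / shardWidth, sorted[len(sorted)-1] / shardWidth, true
}

func main() {
	bits := []uint64{pos(7, 3), pos(2, 100), pos(9, 0)}
	sort.Slice(bits, func(i, j int) bool { return bits[i] < bits[j] })
	lo, hi, _ := minMaxRow(bits)
	fmt.Println(lo, hi) // 2 9
}
```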

I'd prefer that we not add new fields to fragment unless absolutely necessary (it seems to be more of an optimization in this case).

This does expose an unfortunate gap: to support Max with filters, we need a fast way to iterate over a bitmap's containers in reverse. Looping from max down to min will be extremely inefficient for high-cardinality, sparse data.
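To make the cost concrete, the sketch below counts how many rows each strategy tests for a hypothetical sparse fragment containing only rows 1 and 1,000,000: the naive descending loop visits every row ID in between, while walking only the rows that exist (what reverse container iteration would enable) tests two. intersects, naiveMax and reverseMax are illustrative names. The naive loop checks i == minRow before decrementing because, with uint64 IDs, a condition such as i >= minRow is always true when minRow is 0 and would never end the loop.

```go
package main

import (
	"fmt"
	"sort"
)

// intersects is a hypothetical predicate: does row share a column with the filter?
type intersects func(row uint64) bool

// naiveMax tests every row ID from maxRow down to minRow (assumes minRow <= maxRow).
// It returns the match and the number of predicate calls made.
func naiveMax(minRow, maxRow uint64, hit intersects) (row uint64, calls int, ok bool) {
	for i := maxRow; ; i-- {
		calls++
		if hit(i) {
			return i, calls, true
		}
		if i == minRow { // stop before an unsigned wrap-around
			return 0, calls, false
		}
	}
}

// reverseMax tests only row IDs that exist, highest first.
func reverseMax(rows []uint64, hit intersects) (row uint64, calls int, ok bool) {
	sorted := append([]uint64(nil), rows...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	for i := len(sorted) - 1; i >= 0; i-- {
		calls++
		if hit(sorted[i]) {
			return sorted[i], calls, true
		}
	}
	return 0, calls, false
}

func main() {
	rows := []uint64{1, 1_000_000} // sparse: two rows far apart
	hit := func(r uint64) bool { return r == 1 }
	_, n1, _ := naiveMax(1, 1_000_000, hit)
	_, n2, _ := reverseMax(rows, hit)
	fmt.Println(n1, n2) // 1000000 2
}
```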

executor.go (outdated)
}, nil
}

// executeMaxRowShard returns the minimum row ID for a shard.
Member

s/min/max/

Contributor

I think my Neighbors patch added Prev to bitmap iterators, which allows iterating downward through containers, but I didn't implement the corresponding downward iteration through bits.

Contributor Author

yuce commented Jun 11, 2019

updated.

if v != 0 {
	r := bits.LeadingZeros64(v)
	return uint16((i-1)*64 + 63 - r)
	return uint16(i*64 + 63 - r)
Member

are the changes to this function purely stylistic?

Contributor

i think that actually changes the computed value?

Member

I think it's exactly the same, except that it goes down to 1 instead of 0. I think the loop condition needs to be >= 0 now.

Contributor

ohhh. i missed the "-1" up above, and was just looking at the "i-1" here. nevermind.

Contributor Author

I've reversed these changes.
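The exact before and after of this diff isn't recoverable from this page, so the two functions below are hypothetical reconstructions of the two loop shapes being compared, assuming a bitmap container of 1024 uint64 words (so every bit index fits in a uint16). They show why the variants agree: counting i down to 1 and indexing words[i-1] visits the same words as counting down to 0 with i >= 0.

```go
package main

import (
	"fmt"
	"math/bits"
)

// maxBitDown0 scans words from the last index down to 0 inclusive and
// returns the highest set bit position.
func maxBitDown0(words []uint64) (uint16, bool) {
	for i := len(words) - 1; i >= 0; i-- {
		if v := words[i]; v != 0 {
			r := bits.LeadingZeros64(v)
			return uint16(i*64 + 63 - r), true
		}
	}
	return 0, false
}

// maxBitDown1 runs i from len down to 1 and reads words[i-1], so the
// condition i > 0 still visits word 0.
func maxBitDown1(words []uint64) (uint16, bool) {
	for i := len(words); i > 0; i-- {
		if v := words[i-1]; v != 0 {
			r := bits.LeadingZeros64(v)
			return uint16((i-1)*64 + 63 - r), true
		}
	}
	return 0, false
}

func main() {
	words := make([]uint64, 1024) // assumed bitmap-container size
	words[2] = 1 << 2             // bit 130
	a, _ := maxBitDown0(words)
	b, _ := maxBitDown1(words)
	fmt.Println(a, b) // 130 130
}
```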

fragment.go (outdated)
@@ -118,6 +118,8 @@ type fragment struct {

	// Stats reporting.
	maxRowID uint64
	minRowID uint64
	hasRowID bool
Member

I don't see much benefit to caching hasRowID and minRowID on the fragment. They shouldn't be very expensive to look up (without a filter), and caching them opens the door to cache-invalidation bugs.
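A small sketch of the cache-invalidation risk, using a hypothetical map of row ID to bit count in place of a fragment: a cached minRowID that one mutation path forgets to refresh goes stale, while a value computed on demand cannot. The map scan is only for illustration; in the fragment, the on-demand lookup would come from the bitmap iterator's first value rather than a full scan.

```go
package main

import "fmt"

// rows is a hypothetical stand-in for a fragment: row ID -> number of bits set.
type rows map[uint64]int

// cachedFrag caches minRowID; setBit maintains it, clearBit does not.
type cachedFrag struct {
	r        rows
	minRowID uint64
	hasRowID bool
}

func (f *cachedFrag) setBit(row uint64) {
	f.r[row]++
	if !f.hasRowID || row < f.minRowID {
		f.minRowID, f.hasRowID = row, true
	}
}

// clearBit forgets to invalidate the cache: the bug being warned about.
func (f *cachedFrag) clearBit(row uint64) {
	if f.r[row] > 0 {
		f.r[row]--
		if f.r[row] == 0 {
			delete(f.r, row)
		}
	}
}

// minRow computes the answer on demand, so it cannot go stale.
func minRow(r rows) (uint64, bool) {
	var m uint64
	found := false
	for row := range r {
		if !found || row < m {
			m, found = row, true
		}
	}
	return m, found
}

func main() {
	f := &cachedFrag{r: rows{}}
	f.setBit(3)
	f.setBit(8)
	f.clearBit(3)
	m, _ := minRow(f.r)
	fmt.Println(f.minRowID, m) // 3 8
}
```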

Contributor Author

Moved minRowID to a function.

fragment.go (outdated)
return f.maxRowID, 1
}
// iterate back from max row ID and return the first that intersects with filter.
for i := f.maxRowID; i >= f.minRowID; i-- {
Member

let's add a TODO to implement reverse container iteration to improve performance here for sparse data.

Contributor

i have that implemented already in one of my branches. it wasn't very hard but it also wasn't tested.

Member

jaffee left a comment

this looks great, thanks!

Contributor Author

yuce commented Jun 17, 2019

great!

yuce merged commit 284aac6 into FeatureBaseDB:master on Jun 17, 2019
yuce deleted the min-max-rowid branch on Jun 17, 2019 at 19:57
Labels: None yet
Projects: None yet
Development

Successfully merging this pull request may close these issues.

Extend Min and Max calls to work for other field types beyond BSI
3 participants