Skip to content
This repository has been archived by the owner on Feb 21, 2024. It is now read-only.

Topn fill #57

Closed
tgruben opened this issue Feb 29, 2016 · 4 comments
Closed

Topn fill #57

tgruben opened this issue Feb 29, 2016 · 4 comments
Assignees

Comments

@tgruben
Copy link
Member

tgruben commented Feb 29, 2016

The topn needs a bit of adjustment. The process should be:

scatter out topn to each slice
gather results
-->fetch missing counts from non reporting slices<-- the missing part
sort and return top n

I've added a test, TestExecutor_Execute_TopN_fill, to express the problem

you think you can fix that @benbjohnson ?

@benbjohnson
Copy link
Contributor

@tgruben Yep, I can take a look.

@benbjohnson benbjohnson self-assigned this Feb 29, 2016
@benbjohnson
Copy link
Contributor

@tgruben Did the previous implementation do a refetch? From what I see in 6822dd9 it does a TopN() to fetch N*2 results from the nodes trims that down to N results.

This is the only change required for N*2: master...benbjohnson:top-n-2

I could see doing a min N of 10 or so just to make sure that small Ns always fetch a large enough resultset.

Here are the lines from the previous implementation I was referencing:

https://github.com/umbel/pilosa/blob/6822dd99b5245195dbe45fbe9cb1718fdfb51a2f/core/topn.go#L174
https://github.com/umbel/pilosa/blob/6822dd99b5245195dbe45fbe9cb1718fdfb51a2f/core/topn.go#L190

@tgruben
Copy link
Member Author

tgruben commented Mar 1, 2016

Yes,but most of the code that does the fill in was located in
core/query.go if you look at TopNPackage and check_pair around line 252

it is basically the results of the first scatter
That TopN package gets the first batch and then the check pair sends out
the requests for the fill in, so it was a little more complicated that just
doubling the request

Does that code make sense or do I need to expand..that one function is
basically the "combine results from all queries part.
(CatQueryStepHandler)

-Todd

On Mon, Feb 29, 2016 at 5:30 PM, Ben Johnson notifications@github.com
wrote:

@tgruben https://github.com/tgruben Did the previous implementation do
a refetch? From what I see in 6822dd9
6822dd9
it does a TopN() to fetch N*2 results from the nodes trims that down to N
results.

This is the only change required for N*2: master...benbjohnson:top-n-2
master...benbjohnson:top-n-2

I could see doing a min N of 10 or so just to make sure that small Ns
always fetch a large enough resultset.

Here are the lines from the previous implementation I was referencing:

https://github.com/umbel/pilosa/blob/6822dd99b5245195dbe45fbe9cb1718fdfb51a2f/core/topn.go#L174

https://github.com/umbel/pilosa/blob/6822dd99b5245195dbe45fbe9cb1718fdfb51a2f/core/topn.go#L190


Reply to this email directly or view it on GitHub
#57 (comment).

@benbjohnson
Copy link
Contributor

@tgruben Yep, that make sense. I missed that refetching part the first time around. I'll get that added in.

tgruben pushed a commit to tgruben/pilosa that referenced this issue Sep 4, 2019
jaffee pushed a commit that referenced this issue Jan 12, 2020
return total match counts for either min or max
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants