fix(sort): Make sort consistent for indexed and without indexed predicates#7241
fix(sort): Make sort consistent for indexed and without indexed predicates#7241ahsanbarkati merged 20 commits intomasterfrom
Conversation
|
@danielmai : this change is expected to produce consistent results while retaining performance of using sorted index. |
vmrajas
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 5 files reviewed, 3 unresolved discussions (waiting on @ahsanbarkati, @manishrjain, @pawanrawal, @vmrajas, and @vvbalaji-dgraph)
worker/sort.go, line 185 at r5 (raw file):
span.Annotate(nil, "sortWithIndex") maxCount := 0
Can you add a comment on what maxCount is going to store.
worker/sort.go, line 287 at r5 (raw file):
token := k.Term if !order.Desc { maxCount = int(ts.Count)
Can you add a comment on why this is done. I know that this would become a rather large comment. But, it would help in understanding the code later.
worker/sort.go, line 606 at r5 (raw file):
// intersectBucket intersects every UID list in the UID matrix with the // indexed bucket.
Can you also add a comment on the significance of count parameter.
pawanrawal
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 5 files reviewed, 10 unresolved discussions (waiting on @ahsanbarkati, @manishrjain, and @vvbalaji-dgraph)
query/query1_test.go, line 1963 at r6 (raw file):
} func TestSortNull2(t *testing.T) {
All these could just have been one test with Table driven tests because you are testing the same thing that is null behavior with various values of first and offset
query/query1_test.go, line 2095 at r6 (raw file):
query := `{ me(func: uid(61, 62, 63, 64, 65, 66, 67, 68, 69, 70), orderdesc: pred, first: 2) {
How about a test with offset:5 and first:5 and another one with offset:9 and first:5
query/query2_test.go, line 977 at r6 (raw file):
"data": { "q": [{ "name_lang_index@de": "öffnen",
Are these the null value ones?
worker/sort.go, line 195 at r6 (raw file):
var emptySkippedList pb.List out[i].ulist = &emptyList out[i].skippedUids = &emptySkippedList
Just define it inline like
out[i].skippedUids = &pb.List{}
worker/sort.go, line 325 at r6 (raw file):
for _, uid := range ul.Uids { if _, ok := present[uid]; !ok { nullPreds = append(nullPreds, uid)
So this nullPreds is common across all the lists? Shouldn't it be reset for each uid list?
worker/sort.go, line 329 at r6 (raw file):
} requiredCount := int(ts.Count) - len(r.UidMatrix[i].Uids)
remainingCount would be a better name.
worker/sort.go, line 769 at r6 (raw file):
val.Value = nil nullsList = append(nullsList, uid) nullVals = append(nullVals, []types.Val{val})
What does val hold? Is it just a null? Would be the null be returned? I don't see null being returned in the query tests.
ahsanbarkati
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 5 files reviewed, 10 unresolved discussions (waiting on @ahsanbarkati, @manishrjain, @pawanrawal, and @vvbalaji-dgraph)
query/query1_test.go, line 1963 at r6 (raw file):
Previously, pawanrawal (Pawan Rawal) wrote…
All these could just have been one test with Table driven tests because you are testing the same thing that is null behavior with various values of first and offset
Done.
query/query1_test.go, line 2095 at r6 (raw file):
Previously, pawanrawal (Pawan Rawal) wrote…
How about a test with
offset:5 and first:5and another one withoffset:9 and first:5
Added these cases.
query/query2_test.go, line 977 at r6 (raw file):
Previously, pawanrawal (Pawan Rawal) wrote…
Are these the null value ones?
Yes.
worker/sort.go, line 195 at r6 (raw file):
Previously, pawanrawal (Pawan Rawal) wrote…
Just define it inline like
out[i].skippedUids = &pb.List{}
Done.
worker/sort.go, line 325 at r6 (raw file):
Previously, pawanrawal (Pawan Rawal) wrote…
So this
nullPredsis common across all the lists? Shouldn't it be reset for each uid list?
Thanks. Fixed.
worker/sort.go, line 329 at r6 (raw file):
Previously, pawanrawal (Pawan Rawal) wrote…
remainingCountwould be a better name.
Done.
worker/sort.go, line 769 at r6 (raw file):
Previously, pawanrawal (Pawan Rawal) wrote…
What does
valhold? Is it just a null? Would be thenullbe returned? I don't seenullbeing returned in the query tests.
Yes. It is just the null values. It is being used for multisort, I wanted to keep the behavior of multisort unchanged, so appended the null values corresponding to the uid list, for which the predicates contains null or unsortable values.
pawanrawal
left a comment
There was a problem hiding this comment.
Do a bit more testing around different offsets and first combinations for different kind of datasets as discussed.
Reviewable status: 0 of 5 files reviewed, 10 unresolved discussions (waiting on @ahsanbarkati, @manishrjain, @pawanrawal, and @vvbalaji-dgraph)
query/query1_test.go, line 2008 at r7 (raw file):
{"predname":"nameJ"}]}}`, }, {11, 9, 5, false, `{"data": {"me":[
How about 12 and 5, does that return an empty result?
worker/sort.go, line 307 at r7 (raw file):
for i, ul := range ts.UidMatrix { var nullPreds []uint64
Add a comment
These are uids for which there is value for the predicate.
worker/sort.go, line 328 at r7 (raw file):
// We didn't get anything in the intersected result, it is possible that complete offset // is yet not applied. We need to apply the remainng offset on the null values.
remaining
worker/sort.go, line 331 at r7 (raw file):
if len(r.UidMatrix[i].Uids) == 0 { start := int(ts.Offset) - len(out[i].skippedUids.Uids) x.AssertTrue(start >= 0)
return an error here instead of crashing the server and think of a case where this doesn't hold
worker/sort.go, line 335 at r7 (raw file):
nullPreds = nullPreds[start:] } else { nullPreds = []uint64{}
nullsPreds = nullPreds[:0]
worker/sort.go, line 338 at r7 (raw file):
} } remainingCount := int(ts.Count) - len(r.UidMatrix[i].Uids)
Try removing the index on pred and your tests should still pass.
pawanrawal
left a comment
There was a problem hiding this comment.
Reviewed 1 of 5 files at r3, 1 of 3 files at r6, 3 of 3 files at r8.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @ahsanbarkati, @manishrjain, @pawanrawal, and @vvbalaji-dgraph)
query/query1_test.go, line 2020 at r8 (raw file):
makeQuery := func(offset, first int32, desc, index bool) string { pred := "pred"
You can have two queries with the same alias
…cates (#7241) Make the result of sort consistent for predicate with and without index. After this change, the predicates with null values will appear at the last of the sort result for both ascending and descending sort, for both index/no-index case. Before this change the result for predicate with index didn't contain the null predicates and the one without index treated nulls as infinite value - they appeared at the beginning for descending while at the end for the ascending sort case. NOTE: This is a breaking change for the response of sort. Co-authored-by: Rajas Vanjape <rajas@dgraph.io> (cherry picked from commit 85278f8)
…cates (#7241) (#7301) Make the result of sort consistent for predicate with and without index. After this change, the predicates with null values will appear at the last of the sort result for both ascending and descending sort, for both index/no-index case. Before this change the result for predicate with index didn't contain the null predicates and the one without index treated nulls as infinite value - they appeared at the beginning for descending while at the end for the ascending sort case. NOTE: This is a breaking change for the response of sort. Co-authored-by: Rajas Vanjape <rajas@dgraph.io> (cherry picked from commit 85278f8)
…cates (#7241) Make the result of sort consistent for predicate with and without index. After this change, the predicates with null values will appear at the last of the sort result for both ascending and descending sort, for both index/no-index case. Before this change the result for predicate with index didn't contain the null predicates and the one without index treated nulls as infinite value - they appeared at the beginning for descending while at the end for the ascending sort case. NOTE: This is a breaking change for the response of sort. Co-authored-by: Rajas Vanjape <rajas@dgraph.io> (cherry picked from commit 85278f8)
…cates (#7241) (#7323) Make the result of sort consistent for predicate with and without index. After this change, the predicates with null values will appear at the last of the sort result for both ascending and descending sort, for both index/no-index case. Before this change the result for predicate with index didn't contain the null predicates and the one without index treated nulls as infinite value - they appeared at the beginning for descending while at the end for the ascending sort case. NOTE: This is a breaking change for the response of sort. Co-authored-by: Rajas Vanjape <rajas@dgraph.io> (cherry picked from commit 85278f8)
This PR makes the result of sort consistent for predicate with and without index. There was an issue that for predicate with index, we used to drop the nodes which doesn't contain the predicate used for sorting, while we retained them for that of without index.
The nulls were dropped in thee case of
sortWithIndexbecause in this case to calculate the result, we used to take intersection of the given list for sorting with the index of the predicate used for sorting, hence other predicates were dropped. In this change, we keep track of nodes which have null sort predicates and append them at the end of the result as required by pagination.Some benchmarks:
The dataset contain 1million UIDs, with half of them containing the predicate used for sorting. The time taken are in nano seconds.
The results of the same queries on Master:
This change is