results not always in same order with aggregates that have `group by *` #3968

dgnorton · 2015-09-03T02:23:31Z

> select * from rp0.cpu
name: cpu
---------
time                    count    host        region    value
2015-09-03T00:34:50Z    1        server01    uswest
2015-09-03T00:34:50Z    1        server01
2015-09-03T00:34:50Z    1        server01    useast

> select * from rp0.cpu
name: cpu
---------
time                    count    host        region    value
2015-09-03T00:34:50Z    1        server01
2015-09-03T00:34:50Z    1        server01    useast
2015-09-03T00:34:50Z    1        server01    uswest

Can be reproduced by running this script multiple times until it fails: https://gist.github.com/dgnorton/b944f413d159c6d18957
Or, run the script once and then use the CLI to run `select * from rp0.cpu` multiple times.


corylanou · 2015-09-03T13:17:44Z

Is this a `group by *` issue or a `select *` issue? The title is confusing me.

dgnorton · 2015-09-03T13:26:01Z

@corylanou the result of the `select *` above comes from a CQ that ran `select count(value) into rp0.cpu from cpu group by time(5s), *`. I think what's happening is that the `group by *` part caused the results to be split out by unique tag set. Selects only guarantee deterministic ordering by two fields: timestamp and then value (I think). In the case above, both the timestamps and the values are identical, and the sort doesn't fall back to tag sets or series ID, so the ordering becomes non-deterministic.

corylanou · 2015-09-03T13:28:07Z

Ah, that makes sense. I don't think any "sorting" is wired up besides the time/value as you stated. I have a lot of that "magic" worked out in some of my functions. It's a bit of a beast, though, because all values are interfaces. However, for a first pass, we could settle for sorting on time, value, seriesID. That would solve this bug for now at least.
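
For illustration, the first-pass ordering proposed here (time, then value, then seriesID) could be sketched as below. This is only a sketch under assumptions: `Point`, `less`, and `SortPoints` are hypothetical names, not InfluxDB types, and values are simplified to `float64` instead of interfaces.

```go
package main

import (
	"fmt"
	"sort"
)

// Point is a hypothetical, simplified stand-in for a query result point.
// The real engine stores values as interfaces; float64 keeps the sketch short.
type Point struct {
	Time     int64 // Unix seconds
	Value    float64
	SeriesID uint64
}

// less orders by time, then value, then series ID as the final tiebreaker.
func less(a, b Point) bool {
	if a.Time != b.Time {
		return a.Time < b.Time
	}
	if a.Value != b.Value {
		return a.Value < b.Value
	}
	return a.SeriesID < b.SeriesID
}

// SortPoints sorts pts in place using the three-key ordering.
func SortPoints(pts []Point) {
	sort.Slice(pts, func(i, j int) bool { return less(pts[i], pts[j]) })
}

func main() {
	// Three points with identical time and value, as in the report.
	pts := []Point{
		{Time: 1441240490, Value: 1, SeriesID: 3},
		{Time: 1441240490, Value: 1, SeriesID: 1},
		{Time: 1441240490, Value: 1, SeriesID: 2},
	}
	SortPoints(pts)
	for _, p := range pts {
		fmt.Println(p.Time, p.Value, p.SeriesID)
	}
}
```

Because series IDs are unique, no two distinct points compare as equal, so the output order no longer depends on input order.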

dgnorton · 2015-09-03T14:03:46Z

@otoolep and @DanielMorsing have done work in this area recently. Getting the ordering right without killing query performance has to be handled with care. E.g., adding a single integer comparison for seriesID in the point heap Less method could slow many queries by 20+%. @otoolep also brought up the concern that ordering by seriesID would not yield deterministic results between two separate clusters.
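
To show why an extra comparison in `Less` sits on the hot path, here is a minimal sketch using Go's `container/heap`, which calls `Less` repeatedly on every push and pop. The `pointHeap` type, the `lessCalls` counter, and `Drain` are hypothetical and are not InfluxDB's actual point heap; the sketch only counts calls and says nothing about the 20+% figure.

```go
package main

import (
	"container/heap"
	"fmt"
)

// item is a hypothetical, simplified point.
type item struct {
	time     int64
	value    float64
	seriesID uint64
}

// lessCalls counts how often Less runs (illustration only).
var lessCalls int

type pointHeap []item

func (h pointHeap) Len() int { return len(h) }

// Less compares time, then value, then seriesID as a tiebreaker.
func (h pointHeap) Less(i, j int) bool {
	lessCalls++
	if h[i].time != h[j].time {
		return h[i].time < h[j].time
	}
	if h[i].value != h[j].value {
		return h[i].value < h[j].value
	}
	return h[i].seriesID < h[j].seriesID
}

func (h pointHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *pointHeap) Push(x interface{}) { *h = append(*h, x.(item)) }
func (h *pointHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// Drain heapifies a copy of items and pops them all in heap order,
// returning the ordered items and the number of Less calls made.
func Drain(items []item) ([]item, int) {
	lessCalls = 0
	h := make(pointHeap, len(items))
	copy(h, items)
	heap.Init(&h)
	out := make([]item, 0, len(items))
	for h.Len() > 0 {
		out = append(out, heap.Pop(&h).(item))
	}
	return out, lessCalls
}

func main() {
	out, calls := Drain([]item{{1, 1, 4}, {1, 1, 2}, {1, 1, 3}, {1, 1, 1}})
	fmt.Println(out, "Less calls:", calls)
}
```

Every call that reaches the third comparison is a tie on both time and value, which is exactly the case from this report; whether that cost is acceptable would need benchmarking against real queries.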

corylanou · 2015-09-03T14:09:19Z

Yeah, I also realized that seriesID is meaningless here: you probably need to sort based on column order (which makes more sense to me) and not on seriesID, which is not the same order as what a user can ask for.
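
A tiebreak on tag values in column order, rather than on seriesID, might look like the sketch below. Everything here is assumed for illustration: `Row`, `compareTags`, and `SortRows` are hypothetical names, and a missing tag is treated as the empty string. Since it depends only on the data itself, this ordering would also come out the same on two separate clusters.

```go
package main

import (
	"fmt"
	"sort"
)

// Row is a hypothetical result row carrying its tag set.
type Row struct {
	Time  int64
	Value float64
	Tags  map[string]string
}

// compareTags compares tag values column by column.
// A tag missing from the map reads as "" and therefore sorts first.
func compareTags(a, b Row, columns []string) int {
	for _, c := range columns {
		av, bv := a.Tags[c], b.Tags[c]
		if av < bv {
			return -1
		}
		if av > bv {
			return 1
		}
	}
	return 0
}

// SortRows orders by time, then value, then tag values in column order.
func SortRows(rows []Row, columns []string) {
	sort.SliceStable(rows, func(i, j int) bool {
		a, b := rows[i], rows[j]
		if a.Time != b.Time {
			return a.Time < b.Time
		}
		if a.Value != b.Value {
			return a.Value < b.Value
		}
		return compareTags(a, b, columns) < 0
	})
}

func main() {
	// The three rows from the report: same time and count, different region.
	rows := []Row{
		{Time: 1441240490, Value: 1, Tags: map[string]string{"host": "server01", "region": "uswest"}},
		{Time: 1441240490, Value: 1, Tags: map[string]string{"host": "server01"}},
		{Time: 1441240490, Value: 1, Tags: map[string]string{"host": "server01", "region": "useast"}},
	}
	SortRows(rows, []string{"host", "region"})
	for _, r := range rows {
		fmt.Printf("%d %v host=%q region=%q\n", r.Time, r.Value, r.Tags["host"], r.Tags["region"])
	}
}
```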

dgnorton · 2016-05-17T16:57:35Z

Query engine has been rewritten. Closing.

dgnorton mentioned this issue Sep 3, 2015

fix #2555: add backreference in CQs #3876

Merged

beckettsean added the area/queries label Sep 8, 2015

dgnorton closed this as completed May 17, 2016
