
Wire up SELECT DISTINCT #2568

Merged
merged 28 commits into from
May 19, 2015


corylanou
Contributor

Distinct with strings

The raw data

> select * from names
name: names
-----------
time                    first   last
2015-05-01T00:00:00Z    suzie   smith
2015-05-01T08:00:00Z    frank   smith
2015-05-01T16:00:00Z    jonny   jones

Distinct with no WHERE or GROUP BY clause

> select distinct(first) from names
name: names
-----------
time                    distinct
1970-01-01T00:00:00Z    [frank jonny suzie]

> select distinct(last) from names
name: names
-----------
time                    distinct
1970-01-01T00:00:00Z    [jones smith]
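Each result above is the field's unique values, sorted, reported at the epoch timestamp because there is no GROUP BY time() bucket. A minimal Go sketch of that computation for a string field, assuming the values have already been read from the series. `distinctStrings` is a hypothetical helper, not InfluxDB's implementation:

```go
package main

import "sort"

// distinctStrings returns the unique values in vals, sorted ascending.
// Hypothetical helper illustrating what distinct() computes for a
// string field; not InfluxDB's actual implementation.
func distinctStrings(vals []string) []string {
	seen := make(map[string]struct{}, len(vals))
	out := make([]string, 0, len(vals))
	for _, v := range vals {
		if _, ok := seen[v]; ok {
			continue // value already recorded
		}
		seen[v] = struct{}{}
		out = append(out, v)
	}
	sort.Strings(out) // sorted, like the lists in the output above
	return out
}
```

Called with smith, smith, jones, it returns [jones smith], matching the distinct(last) output.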

Distinct with no field currently returns an error (not supported)

> select distinct() from names
ERR: error parsing query: distinct function requires at least one argument

Distinct with more than one field currently returns an error (not supported)

> select distinct(first, last) from names
ERR: error parsing query: distinct function can only have one argument
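The two error cases above amount to an arity check on distinct(). A hypothetical Go sketch that reproduces those messages; `validateDistinctArgs` and its `[]string` argument list are assumptions for illustration, not the parser's actual API:

```go
package main

import "errors"

// Error text copied from the ERR lines shown above.
var (
	errDistinctNoArgs   = errors.New("distinct function requires at least one argument")
	errDistinctManyArgs = errors.New("distinct function can only have one argument")
)

// validateDistinctArgs is a hypothetical arity check for distinct();
// args holds the argument expressions as plain strings for simplicity.
func validateDistinctArgs(args []string) error {
	switch {
	case len(args) == 0:
		return errDistinctNoArgs
	case len(args) > 1:
		return errDistinctManyArgs
	default:
		return nil
	}
}
```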

Distinct with a GROUP BY time() but no WHERE time clause returns an error, as all aggregate functions should

> select distinct(last) from names group by time(1h)
ERR: error parsing query: aggregate functions with GROUP BY time require a WHERE time clause

Distinct with both GROUP BY and WHERE buckets the values into the appropriate time intervals

> select distinct(last) from names where time >= '2015-05-01T00:00:00Z' and time <= '2015-05-01T16:00:00Z' group by time(12h)
name: names
-----------
time                    distinct
2015-05-01T00:00:00Z    [smith]
2015-05-01T12:00:00Z    [jones]

> select distinct(first) from names where time >= '2015-05-01T00:00:00Z' and time <= '2015-05-01T16:00:00Z' group by time(12h)
name: names
-----------
time                    distinct
2015-05-01T00:00:00Z    [frank suzie]
2015-05-01T12:00:00Z    [jonny]

Distinct with numeric data

The raw data

> select * from cpu
name: cpu
---------
time                    value
2015-05-01T00:00:00Z    1.1
2015-05-01T08:00:00Z    1.2
2015-05-01T16:00:00Z    1.3
2015-05-02T00:00:00Z    2.1
2015-05-02T08:00:00Z    2.2
2015-05-02T16:00:00Z    2.3
2015-05-03T00:00:00Z    3.1
2015-05-03T08:00:00Z    3.2
2015-05-03T16:00:00Z    3.3
2015-05-04T00:00:00Z    4.1
2015-05-04T08:00:00Z    4.2
2015-05-04T16:00:00Z    4.3

Distinct with no WHERE or GROUP BY clause

> select distinct(value) from cpu
name: cpu
---------
time                    distinct
1970-01-01T00:00:00Z    [1.1 1.2 1.3 2.1 2.2 2.3 3.1 3.2 3.3 4.1 4.2 4.3]

Distinct with different GROUP BY time intervals

> select distinct(value) from cpu where time >= '2015-05-01T00:00:00Z' and time <= '2015-05-04T00:00:00Z' group by time(12h)
name: cpu
---------
time                    distinct
2015-05-01T00:00:00Z    [1.1 1.2]
2015-05-01T12:00:00Z    [1.3]
2015-05-02T00:00:00Z    [2.1 2.2]
2015-05-02T12:00:00Z    [2.3]
2015-05-03T00:00:00Z    [3.1 3.2]
2015-05-03T12:00:00Z    [3.3]
2015-05-04T00:00:00Z    [4.1 4.2]

> select distinct(value) from cpu where time >= '2015-05-01T00:00:00Z' and time <= '2015-05-04T00:00:00Z' group by time(1d)
name: cpu
---------
time                    distinct
2015-05-01T00:00:00Z    [1.1 1.2 1.3]
2015-05-02T00:00:00Z    [2.1 2.2 2.3]
2015-05-03T00:00:00Z    [3.1 3.2 3.3]
2015-05-04T00:00:00Z    [4.1 4.2 4.3]

@pauldix
Member

pauldix commented May 13, 2015

Looks good so far, although your sort doesn't handle a bool field :).

Curious how you're going to handle it when they pass in a tag name and a WHERE time clause ;)
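One way to make the sort handle bool fields is to compare values of the same type directly and order different types by a fixed rank. A Go sketch, assuming field values arrive as `interface{}` holding `bool`, `float64`, or `string`; the cross-type order (bools, then numbers, then strings) and the names `typeRank`, `less`, and `sortValues` are arbitrary choices for illustration:

```go
package main

import "sort"

// typeRank gives each supported field type a fixed position so that
// mixed-type slices still sort deterministically (assumed order).
func typeRank(v interface{}) int {
	switch v.(type) {
	case bool:
		return 0
	case float64:
		return 1
	case string:
		return 2
	default:
		return 3
	}
}

// less orders a before b: by type rank first, then by value.
// For bools, false sorts before true.
func less(a, b interface{}) bool {
	ra, rb := typeRank(a), typeRank(b)
	if ra != rb {
		return ra < rb
	}
	switch x := a.(type) {
	case bool:
		return !x && b.(bool)
	case float64:
		return x < b.(float64)
	case string:
		return x < b.(string)
	}
	return false // unsupported types are treated as equal
}

// sortValues sorts field values in place using less.
func sortValues(vals []interface{}) {
	sort.SliceStable(vals, func(i, j int) bool { return less(vals[i], vals[j]) })
}
```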

corylanou changed the title from "WIP - Distinct aggregate 1815" to "Wire up SELECT DISTINCT" on May 14, 2015
@corylanou
Contributor Author

Fixes #1815

corylanou force-pushed the distinct-aggregate-1815 branch 4 times, most recently from 2ae20cf to fbc59ab on May 18, 2015 22:27

return other
}

Contributor


I would simplify this method to look like this... https://gist.github.com/dgnorton/3125e459045a5cedc948

@dgnorton
Contributor

I agree with @jwilder that the Validate* functions should be made private (unexported), even the pre-existing ones. That includes the root Validate function, which is only called at the end of parseSelectStatement.
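In Go, making these functions private means giving them lowercase names so they cannot be called from outside the package. A hypothetical sketch: the `SelectStatement` fields and both checks are simplified stand-ins for the real AST, reusing error messages from the examples in this PR, and the root `validate` is what parseSelectStatement would call at the end:

```go
package main

import "errors"

// SelectStatement is a hypothetical, heavily simplified stand-in for the
// parsed AST node; the real type has many more fields.
type SelectStatement struct {
	IsDistinct   bool // query calls distinct()
	DistinctArgs int  // number of arguments passed to distinct()
	GroupByTime  bool // query has GROUP BY time(...)
	HasTimeCond  bool // WHERE clause constrains time
}

// validate is the unexported root check. Lowercase names are only
// visible inside the package, so external callers cannot invoke it.
func (s *SelectStatement) validate() error {
	if err := s.validateDistinct(); err != nil {
		return err
	}
	return s.validateGroupByTime()
}

// validateDistinct enforces exactly one argument to distinct().
func (s *SelectStatement) validateDistinct() error {
	if !s.IsDistinct {
		return nil
	}
	switch {
	case s.DistinctArgs == 0:
		return errors.New("distinct function requires at least one argument")
	case s.DistinctArgs > 1:
		return errors.New("distinct function can only have one argument")
	}
	return nil
}

// validateGroupByTime requires a WHERE time clause for GROUP BY time().
func (s *SelectStatement) validateGroupByTime() error {
	if s.GroupByTime && !s.HasTimeCond {
		return errors.New("aggregate functions with GROUP BY time require a WHERE time clause")
	}
	return nil
}
```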

@dgnorton
Contributor

Need to add more tests for the validation code; e.g., select count(distinct(too, many, arguments)) currently passes.
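One way to catch that case is to validate every nested call, not just the outermost one. A hypothetical Go sketch of such a recursive check; the `Call` type and `validateCall` are simplified stand-ins, not the parser's actual AST or API:

```go
package main

import "errors"

// Call is a hypothetical, simplified function-call node: Args holds
// either field names (string) or nested *Call values.
type Call struct {
	Name string
	Args []interface{}
}

// validateCall checks distinct() arity at every nesting level, so a
// distinct() wrapped inside count() is checked too.
func validateCall(c *Call) error {
	if c.Name == "distinct" {
		switch n := len(c.Args); {
		case n == 0:
			return errors.New("distinct function requires at least one argument")
		case n > 1:
			return errors.New("distinct function can only have one argument")
		}
	}
	// Recurse into any nested calls.
	for _, a := range c.Args {
		if inner, ok := a.(*Call); ok {
			if err := validateCall(inner); err != nil {
				return err
			}
		}
	}
	return nil
}
```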

@corylanou
Copy link
Contributor Author

Addressed all items. I'll add tests for select count(distinct(too, many, arguments)) in #1891, which covers more edge cases, including this one.

@jwilder
Contributor

jwilder commented May 19, 2015

LGTM 👍

corylanou added a commit that referenced this pull request May 19, 2015
corylanou merged commit d8ddbef into master May 19, 2015
corylanou deleted the distinct-aggregate-1815 branch May 19, 2015 15:27
@pauldix
Member

pauldix commented May 19, 2015

Looks good. One thing left over from this is to handle when the user passes in a tag key to distinct. Logged #2612

5 participants