Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Adhoc filter #120

Closed
dsztykman opened this issue Apr 5, 2022 · 14 comments · Fixed by #135
Closed

Improve Adhoc filter #120

dsztykman opened this issue Apr 5, 2022 · 14 comments · Fixed by #135
Assignees
Milestone

Comments

@dsztykman
Copy link

dsztykman commented Apr 5, 2022

Is your feature request related to a problem? Please describe.
When you have a very large dataset spanning over multiple months/years, the adhoc query filter generates a single query:
SELECT DISTINCT $column FROM $table LIMIT 1000
This query can be very expensive as it can span over millions/billions of rows.
Also it'd be great to order the response by count or something else.

Describe the solution you'd like
We should add the timerange filter of the current dashboard into the adhoc filter.
So let's say your dashboard is doing now - 15 min the query should be something like:
SELECT DISTINCT $column FROM $table WHERE $__timeFilter($timecolumn) LIMIT 1000
Here the $__timeFilter would replace the query with the proper timerange.

Obviously you can't necessary know the name of the timefiltering column, so we should create a new constant variable like clickhouse_adhoc_query_timecolumn where user would specify the timecolumn to be used.

In general customising the query filter generated would be very interesting.
Another option would be to create a variable like called $adhoc_filtered_column which would contain the value of the column selected by the user.
And then let the user manage how the query filter is created with a constant variable clickhouse_adhoc_query_filter where user would use the $adhoc_filtered_column and craft their own filters like:
SELECT DISTINCT $adhoc_filtered_column FROM table WHERE $__timeFilter(timecolumn) ORDER BY COUNT() LIMIT 10

I prefer option 2 as the user can optimise and customise easily how they want to use adhocfilter

@gingerwizard
Copy link
Collaborator

Thanks @dsztykman
We discussed this and it definitely seems like a good enhancement. I think we will expose the ability to control the specific query used to get the values i.e. set SELECT DISTINCT $column FROM $table WHERE $__timeFilter($timecolumn) LIMIT 1000
$column will be available as a variable, as well as the normal global filters e.g. $__timeFilter(columnName)

The default will remain SELECT DISTINCT $column FROM $table LIMIT 1000

This will allow use of things like SAMPLE.

@bossinc
Copy link
Collaborator

bossinc commented Jun 24, 2022

@dsztykman I believe you are asking for a way to customize the adhoc filter. You can already customize the adhoc filter by creating a constant variable named clickhouse_adhoc_query with a value of your query. You can use the time range filter in your query.
https://grafana.com/grafana/plugins/grafana-clickhouse-datasource/#ad-hoc-filters

@bossinc bossinc reopened this Jun 24, 2022
@dsztykman
Copy link
Author

dsztykman commented Jun 24, 2022

You can customise with a static value to specify which table you are working on.
But you can't customise how to retrieve the value out of a column that you dynamically select from the adhoc filter.
That's what I'm looking for here.
@bossinc does that make sense?

@gingerwizard
Copy link
Collaborator

We aren't clear what you mean @dsztykman
You can now control the query used for ad-hoc filters - using both the $column and time variables. Are you looking for the query to be column dependent? i.e. different queries used for different columns?

@dsztykman
Copy link
Author

No I didn't realize we could use $column that's perfect !

@bossinc
Copy link
Collaborator

bossinc commented Jun 27, 2022

I implemented column to return all the columns that are being used for adhoc filters.
In my screenshot, I have an adhoc filter for machine_group and machine_name and $__adhocColumns is returning these two columns.
image
@dsztykman I am pretty sure this isn't what you are imagining this variable to do. What value is expected to be returned by column?

@dsztykman
Copy link
Author

No it's definitely not what I had in mind.
Basically I want to be able to specify a query filter in the adhoc filters.
Let's say you have a table called sample.cts and you want to enable adhoc filtering on this table in particular but in that table you want to limit adhoc query to the current timerange and order by count desc.

SELECT $column FROM sample.cts WHERE $__timeFilter(timestamp) ORDER BY COUNT() DESC GROUP BY $column LIMIT 100

This will populate the different value in the adhoc filter, based on the column selected by the users.
The clickhouse_adhoc_query allow you to control the column selection whereas I want to filter the value after selecting the column.
(Does that make sense?)

@dsztykman dsztykman reopened this Jun 28, 2022
@gingerwizard
Copy link
Collaborator

Maybe we should change the name clickhouse_adhoc_query to clickhouse_adhoc_column_query and introduce clickhouse_adhoc_value_query..would this fulfill the requirement @dsztykman ?

@dsztykman
Copy link
Author

dsztykman commented Jul 6, 2022

Yeah that would be good, but I would also need a variable for the selected column, so I can I create a constant variable called clickhouse_adhoc_value_query which would contain:
SELECT DISTINCT $column_adhoc FROM table WHERE $__timeFilter(timestamp) ORDER BY COUNT() LIMIT 100
For example

@gingerwizard
Copy link
Collaborator

Sorry im confused clickhouse_adhoc_query currently allows you to control both the column and values. Are you looking to be able to specify a different query for populating the values PER column?

@dsztykman
Copy link
Author

I think I got confused :)
Technically yes I think clickhouse_adhoc_column_query and clickhouse_adhoc_value_query would fulfill the requirement.
Let's say I have 30 different tables, with the clickhouse_adhoc_column_query I could specify the table I want to use to get the column list.
Then with clickhouse_adhoc_value_query I could specify how to retrieve the list of values.
Does that make sense?

@gingerwizard
Copy link
Collaborator

I discussed this with @dsztykman, and i think there are two requests here:

  1. Currently clickhouse_adhoc_query returns the column/s and values to render for selection. You can return multiple columns and values. This is limited in that one query will control the values for all columns, e.g. you can't have a different limit or ordering per column - per SQL limitations. Columns from multiple tables are possible but you'd need to use a JOIN/UNION and this is restrictive and likely slow.
  2. (1) means that the same date field is available for all columns. This is currently likely as all queries will be from the same table. However, for multiple table scenarios, this might not be the case.
  3. One option might be a different ad-hoc filter table. Unfortunately, this isn't possible. The clickhouse_adhoc_query applies to all ad-hoc filters. Currently, if you have multiple ad-hoc filters on a dashboard for different tables, the same clickhouse_adhoc_query value will be used.

Allowing multiple ad-hoc filters, each with their own clickhouse_adhoc_query, would address alot of these issues.

  1. For multiple tables, users could simply create separate ad-hoc filters associating each with their own filter.
  2. If users need separate queries for each column, they could in theory group these into different ad-hoc queries.

This is imperfect since (2) might not always be viable but having a query for each column isn't better IMO - it would still require creation of lots of constant variables per column.

@gingerwizard
Copy link
Collaborator

Discussed with @bossinc

The way we think about this is template variables give you full context of the column - allowing you to specify a query per column. With hundreds of columns, however, this results in hundreds of drop-downs. This isn't realistic.

The other extreme (which is the current implementation of ad-hoc filters) is no context of the column. This means a single query for all columns - column values cant have a different ORDER and we can't filter them by a different date field.

The compromise, especially for column-orientated dbs with many columns, is probably an ad-hoc filter which is table aware. Discussing with @bossinc this feels like a core Grafana request. Most columns in the same table would be ordered and filtered by the same time column for example.

@dsztykman

@bossinc
Copy link
Collaborator

bossinc commented Jan 27, 2023

Closing. Please reopen if there is more to discuss.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants