Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENG-46252 : Pinot time range filtering support #224

Merged
merged 12 commits into from
Jul 26, 2024

Conversation

anujgoyal1
Copy link
Contributor

No description provided.

@anujgoyal1 anujgoyal1 changed the title Pinot time range filtering support ENG-46252 : Pinot time range filtering support Jul 22, 2024
# additionalTenantFilters = [
# {
# "tenantId" : "testTenant",
# "timeRangeAndFilters" : [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does multiple timeRangeAndFilters mean here?
So, different filters for multiple time ranges, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

# "timeRangeAndFilters" : [
# {
# "start_time_millis" : 1,
# "endTimeMillis" : 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

endTimeMillis -> end_time_millis
This will be an epoch time, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I'm thinking of making start_time_millis as starTimeMillis to keep it consistent with the other keys

@@ -82,6 +82,18 @@ service.config = {
requestHandlerInfo = {
tenantColumnName = tenant_id
startTimeAttributeName = "EVENT.startTime"
# additionalTenantFilters = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

additionalTenantFilters -> scopedAdditionalFilters?
Just a thought: Do you think we might need to extend that in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since it's already under a table scope, I think we not need add another scope here.
I don't think we need to further extend it to query scope(Its not even available in query as we discussed)

try {
JsonNode jsonNode = objectMapper.readTree(filterJson);

Filter.Builder personBuilder = Filter.newBuilder();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personBuilder -> fitlerBuilder

@@ -3,9 +3,6 @@
import static org.hypertrace.core.query.service.QueryFunctionConstants.QUERY_FUNCTION_COUNT;
import static org.hypertrace.core.query.service.QueryRequestUtil.getAlias;
import static org.hypertrace.core.query.service.QueryRequestUtil.getLogicalColumnName;
import static org.hypertrace.core.query.service.api.Expression.ValueCase.ATTRIBUTE_EXPRESSION;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we move this some other place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted

@kotharironak
Copy link
Contributor

Can we add some tests?

This comment has been minimized.

@anujgoyal1 anujgoyal1 marked this pull request as ready for review July 24, 2024 06:41
@anujgoyal1 anujgoyal1 requested a review from a team as a code owner July 24, 2024 06:41

This comment has been minimized.

{
"startTimeMillis" = 1
"endTimeMillis" = 2
"filter" : "{\r\n \"lhs\": {\r\n \"columnIdentifier\": {\r\n \"columnName\": \"SERVICE.name\"\r\n }\r\n },\r\n \"operator\": \"EQ\",\r\n \"rhs\": {\r\n \"literal\": {\r\n \"value\": {\r\n \"string\": \"dummyService\"\r\n }\r\n }\r\n }\r\n}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Do we need \r\n? If we just put a valid single line json string, would it work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it would work. Was just added by the json string escaper I was using, got rid of it.

"tenantId" = "__default"
"timeRangeAndFilters" = [
{
"filter" : "{\r\n \"lhs\": {\r\n \"columnIdentifier\": {\r\n \"columnName\": \"SERVICE.name\"\r\n }\r\n },\r\n \"operator\": \"EQ\",\r\n \"rhs\": {\r\n \"literal\": {\r\n \"value\": {\r\n \"string\": \"dummyService\"\r\n }\r\n }\r\n }\r\n}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need \r\n? If we just put a single string, would it work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

}
}

private Optional<Filter> buildTimeRangeAndFilters(Optional<String> timeRangeAttributeOptional) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: timeRangeAttributeOptional -> timeRangeAttribute

config.getString("name"), config.getConfig("requestHandlerInfo"));

// domain-service-handler only has filter and no time range
if (config.getString("name").equals("domain-service-handler")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test using the correct handler - domain-service-handler? For this, the original request should be the same after applying filters, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, it should have been - trace-view-handler, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

domain-event-view only has filter.
span-event-view only has time range
trace-view has both time range and filter.

if (this.startTimeMillis.isPresent() && this.endTimeMillis.isPresent()) {
filterBuilder.addChildFilter(
QueryRequestUtil.createLongFilter(
timeRangeAttributeOptional.get(), Operator.GT, this.endTimeMillis.get()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Is this check correct? Should endTime be compared with the greater than operator (I don't remember the text you had in your sublime, so just re-checking)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is how it's supposed to be as it's an or filter
This is what we discussed.

t3....   t1  t2 ....t4


((s >= t2 OR s <= t1) OR (API NOT IN (api1, api2)) AND (t3, t4))

This comment has been minimized.

This comment has been minimized.

kotharironak
kotharironak previously approved these changes Jul 24, 2024

This comment has been minimized.

kotharironak
kotharironak previously approved these changes Jul 24, 2024
@@ -27,4 +27,9 @@ public interface RequestHandler {

/** Handle the request and add rows to the collector. */
Observable<Row> handleRequest(QueryRequest request, ExecutionContext executionContext);

default QueryRequest applyAdditionalFilters(
Copy link
Contributor

@satish-mittal satish-mittal Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already an existing step of query transformation. Look at QueryTransformation. This step of adding filters should go there.
https://github.com/hypertrace/query-service/blob/main/query-service-impl/src/main/java/org/hypertrace/core/query/service/QueryTransformation.java

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We started from this, then realised we don't get scope information in query request.
If we apply a filter on Transformation step, it'll be applied on all the scopes and would work as global filter. Which is wrong behaviour
Query Service decides the scope and it's handler based of the query. Hence pushed down these filters to handler

This comment has been minimized.


private Optional<Filter> buildTimeRangeAndFilters(Optional<String> timeRangeAttribute) {
Filter.Builder filterBuilder = Filter.newBuilder();
filterBuilder.setOperator(Operator.OR);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why OR? don't we want the filter to be: time > startTime and time < endTime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -39,6 +39,18 @@ service.config = {
startTimeAttributeName = "Trace.start_time_millis"
slowQueryThresholdMs = 0
percentileAggFunction = "PERCENTILE"
additionalTenantFilters = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple problems in the way this new config has been defined:

  • It is is very confusing. Only after going through the code one can understand.
  • There is a hardcoded assumption that startTimeAttributeName will act as the lhs key for the 2 values below. Makes it very specific and no room for generalising.
  • It is hardcoded with OR operator.

I would prefer it being defined as follows:

additionalHandlerConfigs = [
  operator = "OR"
  {
    key = "Trace.start_time_millis"
    operator = LT
    value = "max_cutoff"
  },
  {
    key = "Trace.start_time_millis"
    operator = GE
    value = "min_cutoff"
  },
  filter : {
    columnIdentifiier: ...
  }
]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would operator = "AND" possible here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would like to apply the time range to startTimeAttributeName. Any other attribute can go to the normal filter expression.

The given time range indicates that we would like to apply a filter within that specified range. is that not clear?

This comment has been minimized.

@hypertrace hypertrace deleted a comment from kotharironak Jul 25, 2024

This comment has been minimized.

Copy link
Contributor

@kotharironak kotharironak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

kotharironak
kotharironak previously approved these changes Jul 25, 2024

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

@kotharironak kotharironak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@anujgoyal1 anujgoyal1 merged commit d136696 into main Jul 26, 2024
6 checks passed
@anujgoyal1 anujgoyal1 deleted the additionalFilteringSupport branch July 26, 2024 07:33
Copy link

Unit Test Results

  42 files  +2    42 suites  +2   11s ⏱️ -1s
325 tests +5  325 ✔️ +5  0 💤 ±0  0 ❌ ±0 

Results for commit d136696. ± Comparison against base commit 13e6ccf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants