feat: support count per sec/min/hr aggregation functions #198

Merged Jan 8, 2024 (11 commits)
6 changes: 6 additions & 0 deletions .changeset/breezy-seahorses-swim.md
@@ -0,0 +1,6 @@
---
'@hyperdx/api': patch
'@hyperdx/app': patch
---

feat: support count per sec/min/hr aggregation functions
123 changes: 119 additions & 4 deletions packages/api/src/clickhouse/__tests__/clickhouse.test.ts
@@ -323,6 +323,80 @@ Array [
"ts_bucket": 1641341100,
},
]
`);

const multiGroupBysData2 = (
await clickhouse.getMultiSeriesChart({
series: [
{
type: 'time',
table: 'logs',
aggFn: clickhouse.AggFn.CountPerMin,
field: 'awesomeNumber',
where: `runId:${runId}`,
groupBy: ['testGroup', 'testOtherGroup'],
},
],
tableVersion: undefined,
teamId,
startTime: now,
endTime: now + ms('10m'),
granularity: '5 minute',
maxNumGroups: 20,
seriesReturnType: clickhouse.SeriesReturnType.Column,
})
).data.map(d => {
return _.pick(d, [
'group',
'series_0.data',
'series_1.data',
'ts_bucket',
]);
});
expect(multiGroupBysData2.length).toEqual(5);
expect(multiGroupBysData2).toMatchInlineSnapshot(`
Array [
Object {
"group": Array [
"group2",
"otherGroup1",
],
"series_0.data": 0.6,
"ts_bucket": 1641340800,
},
Object {
"group": Array [
"group1",
"otherGroup1",
],
"series_0.data": 0.4,
"ts_bucket": 1641340800,
},
Object {
"group": Array [
"group1",
"otherGroup2",
],
"series_0.data": 0.2,
"ts_bucket": 1641340800,
},
Object {
"group": Array [
"group1",
"otherGroup2",
],
"series_0.data": 0.4,
"ts_bucket": 1641341100,
},
Object {
"group": Array [
"group1",
"otherGroup3",
],
"series_0.data": 0.2,
"ts_bucket": 1641341100,
},
]
`);
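(For reference: with the '5 minute' granularity used here, CountPerMin divides each bucket's row count by 5, so the snapshot values 0.6, 0.4, and 0.2 correspond to 3, 2, and 1 matching rows per bucket.)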

const ratioData = (
@@ -382,6 +456,51 @@ Array [
"ts_bucket": 1641341100,
},
]
`);

const tableData = (
await clickhouse.getMultiSeriesChart({
series: [
{
type: 'table',
table: 'logs',
aggFn: clickhouse.AggFn.CountPerMin,
where: `runId:${runId}`,
groupBy: ['testGroup'],
},
],
tableVersion: undefined,
teamId,
startTime: now,
endTime: now + ms('10m'),
granularity: undefined,
maxNumGroups: 20,
seriesReturnType: clickhouse.SeriesReturnType.Column,
})
).data.map(d => {
return _.pick(d, ['group', 'series_0.data', 'ts_bucket', 'rank']);
});

expect(tableData.length).toEqual(2);
expect(tableData).toMatchInlineSnapshot(`
Array [
Object {
"group": Array [
"group1",
],
"rank": "1",
"series_0.data": 0.6,
"ts_bucket": "0",
},
Object {
"group": Array [
"group2",
],
"rank": "2",
"series_0.data": 0.3,
"ts_bucket": "0",
},
]
`);
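(Here granularity is undefined, so the divisor is the full 10-minute query range computed via age('mi', ...); the values 0.6 and 0.3 correspond to 6 and 3 matching rows in total.)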
});

@@ -479,8 +598,6 @@ Array [
}),
);

mockLogsPropertyTypeMappingsModel({});

mockSpyMetricPropertyTypeMappingsModel({
runId: 'string',
host: 'string',
@@ -846,8 +963,6 @@ Array [
}),
);

mockLogsPropertyTypeMappingsModel({});

mockSpyMetricPropertyTypeMappingsModel({
runId: 'string',
host: 'string',
51 changes: 43 additions & 8 deletions packages/api/src/clickhouse/index.ts
@@ -61,6 +61,9 @@ export enum AggFn {
AvgRate = 'avg_rate',
Count = 'count',
CountDistinct = 'count_distinct',
CountPerSec = 'count_per_sec',
CountPerMin = 'count_per_min',
CountPerHour = 'count_per_hour',
Max = 'max',
MaxRate = 'max_rate',
Min = 'min',
@@ -1078,6 +1081,18 @@ const buildEventSeriesQuery = async ({
throw new Error('Rate is not supported in logs chart');
}

const isCountFn =
aggFn === AggFn.Count ||
aggFn === AggFn.CountPerSec ||
aggFn === AggFn.CountPerMin ||
aggFn === AggFn.CountPerHour;

if (field == null && !isCountFn) {
throw new Error(
'Field is required for all aggregation functions except Count',
);
}

const tableName = getLogStreamTableName(tableVersion, teamId);
const whereClause = await buildSearchQueryWhereCondition({
endTime,
@@ -1086,18 +1101,11 @@
startTime,
});

if (field == null && aggFn !== AggFn.Count) {
throw new Error(
'Field is required for all aggregation functions except Count',
);
}

const selectField =
field != null
? buildSearchColumnName(propertyTypeMappingsModel.get(field), field)
: '';

const isCountFn = aggFn === AggFn.Count;
const groupByColumnNames = groupBy.map(g => {
const columnName = buildSearchColumnName(
propertyTypeMappingsModel.get(g),
@@ -1130,8 +1138,35 @@
const label = SqlString.escape(`${aggFn}(${field})`);

const selectClause = [
isCountFn
aggFn === AggFn.Count
? 'toFloat64(count()) as data'
: aggFn === AggFn.CountPerSec
? granularity
? SqlString.format('divide(count(), ?) as data', [
[Review comment, Contributor]: Fine for now, but I tend to think these kinds of duplicates are safer as CountPer('second') or equivalent. Maybe a TODO here.
[Review comment, Contributor Author]: yeah, this select clause would probably need some refactoring later.
ms(granularity) / ms('1 second'),
])
: SqlString.format(
"divide(count(), age('ss', toDateTime(?), toDateTime(?))) as data",
[startTime / 1000, endTime / 1000],
)
: aggFn === AggFn.CountPerMin
? granularity
? SqlString.format('divide(count(), ?) as data', [
ms(granularity) / ms('1 minute'),
])
: SqlString.format(
"divide(count(), age('mi', toDateTime(?), toDateTime(?))) as data",
[startTime / 1000, endTime / 1000],
)
: aggFn === AggFn.CountPerHour
? granularity
? SqlString.format('divide(count(), ?) as data', [
ms(granularity) / ms('1 hour'),
])
: SqlString.format(
"divide(count(), age('hh', toDateTime(?), toDateTime(?))) as data",
[startTime / 1000, endTime / 1000],
)
: aggFn === AggFn.Sum
? `toFloat64(sum(${selectField})) as data`
: aggFn === AggFn.Avg
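The duplicated divide()/age() branches above are what the review thread flags. A minimal sketch of the suggested CountPer-style consolidation, assuming a hypothetical countPerUnitClause helper (none of these names are in the PR):

import ms from 'ms';
import SqlString from 'sqlstring';

type RateUnit = '1 second' | '1 minute' | '1 hour';

// ClickHouse age() unit abbreviations for each rate unit.
const AGE_UNIT: Record<RateUnit, 'ss' | 'mi' | 'hh'> = {
  '1 second': 'ss',
  '1 minute': 'mi',
  '1 hour': 'hh',
};

const countPerUnitClause = (
  unit: RateUnit,
  granularity: string | undefined, // e.g. '5 minute'
  startTime: number, // unix epoch, ms
  endTime: number, // unix epoch, ms
): string =>
  granularity != null
    ? // Bucketed chart: divide each bucket's count by the bucket width
      // expressed in the target unit.
      SqlString.format('divide(count(), ?) as data', [
        ms(granularity) / ms(unit),
      ])
    : // No buckets (table view): divide the total count by the length of
      // the whole query range.
      SqlString.format(
        `divide(count(), age('${AGE_UNIT[unit]}', toDateTime(?), toDateTime(?))) as data`,
        [startTime / 1000, endTime / 1000],
      );

// The ternary chain above would then reduce to, e.g.:
//   aggFn === AggFn.CountPerMin
//     ? countPerUnitClause('1 minute', granularity, startTime, endTime)
//     : ...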
3 changes: 3 additions & 0 deletions packages/app/src/types.ts
@@ -165,6 +165,9 @@ export type AggFn =
| 'avg'
| 'count_distinct'
| 'count'
| 'count_per_sec'
| 'count_per_min'
| 'count_per_hour'
| 'max_rate'
| 'max'
| 'min_rate'
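As a side note, the new app-side string literals mirror the API enum values added above; a sketch of a compile-time check (the import path is an assumption):

import { AggFn } from '@hyperdx/api/src/clickhouse'; // path is an assumption

// If any literal drifted from its enum value, this assignment would fail
// to type-check, since string enum members are subtypes of their values.
type NewRateAggFn = 'count_per_sec' | 'count_per_min' | 'count_per_hour';
const newRateAggFns: NewRateAggFn[] = [
  AggFn.CountPerSec,
  AggFn.CountPerMin,
  AggFn.CountPerHour,
];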