[stats] Add support for daily and cumulative time series to integrations and gbstats #1334
Conversation
```ts
      .map((d) => `SELECT ${d} AS day`)
      .join("\nUNION ALL\n");
    return `
    SELECT ${this.dateTrunc(this.castToDate("t.day"))} AS day
```
This may appear a little redundant, but `dateTrunc` doesn't work on strings in many cases, and `DATE` sometimes returns time information; we just want to use the full date when comparing with the metric/assignment data.
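To illustrate why both helpers are composed, here is a minimal sketch. The class and method names mirror the snippet above, but the SQL they emit is an assumption (a generic dialect), not the actual integration implementation:

```typescript
// Hypothetical sketch of the helper composition discussed above
// (names follow the snippet; the emitted SQL is illustrative only).
class SqlBuilder {
  // Casts an expression to DATE, stripping any time component the
  // engine might otherwise retain.
  castToDate(col: string): string {
    return `CAST(${col} AS DATE)`;
  }
  // Truncates a date expression to day granularity.
  dateTrunc(col: string): string {
    return `date_trunc('day', ${col})`;
  }
}

const b = new SqlBuilder();
// Composing both guards against engines where date_trunc rejects
// strings or where DATE-like columns still carry time information:
const expr = b.dateTrunc(b.castToDate("t.day"));
// → "date_trunc('day', CAST(t.day AS DATE))"
```

The outer truncation looks redundant once the cast has run, but keeping both makes the expression safe across engines with different string/date coercion rules.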
Features and Changes

- Uses `__userMetric` as our main dataset with all user and metric data (LEFT JOIN metrics onto users rather than an inner join).

Overview
Adds the ability to create cumulative and daily time series.

Our current time series are just dimension results bucketed by each user's first exposure date. That means on day X we see experimental effects for all users bucketed on day X, regardless of whether those conversions happened on day X, day X+1, and so on. Furthermore, on day X we don't have any users who were bucketed on day X-1 or earlier influencing the metric, so it doesn't tell you what happened on day X but rather what happened to the users bucketed on day X.
There's ample demand for Cumulative Time Series, which show you the total effects for all users and all metrics through day X. In other words, what is the experiment result as if I had run my results at the end of day X?
We also might care about Daily Time Series, which show you, for all users bucketed through day X, the effect on metrics that were "accrued" on day X. This probably only matters for those using the Experiment Duration attribution model or long conversion windows, but it can tell you what is happening on given days in your experimental population as the experiment goes on. This is effectively the derivative of the Cumulative Time Series, but with appropriate statistics rather than simply diffing results in the front end.

This PR adds the necessary components to `gbstats` and our SQL integrations to compute these two series: Cumulative and Daily Time Series.

How?
- Adds `getDateTable` to build a time series of dates, then cross joins user metric results to that before aggregating. This is a generic method that should work across all SQL engines and is very cheap. Other methods, such as UNNESTing, may be more compact in their SQL but are slightly less performant and vary across engines.
- Renames `__metric` to `__distinctUsers` to rely on this as our main count of users and to left join other auxiliary metrics onto (e.g. denominator and/or CUPED metrics). This is also where we cross join with the above date range to build our time series.
- Splits the `__userMetric` CTE into `__userMetricJoin` and `__userMetricAgg` so that custom aggregations can operate over a column of metric values that are `COALESCE(value, 0)` for users who have 1+ conversions and NULL for users with 0 matching conversions.
- Uses `diff()` to create the daily time series, which is just the metric values added from day to day. In that way, it tells you "what changed in the cumulative time series on this day" rather than requiring some abstract redefinition of conversion windows, custom aggregations, etc. to fit neatly into one day, which is what we previously discussed.

Side effects:

- Custom aggregations now operate over a `value` column we create that IS NULL for users with no matching rows in their conversion window and is `COALESCE(value, 0)` otherwise. This could affect people who use `AVG`, `COUNT`, or other operators, but it allows us to draw a clear distinction in the aggregation step: users with no matching conversion stay NULL, while those with a matching conversion but no values are treated as 0s. This is explicit. I think this change is for the best, but it will alter behavior for some users if they have NULL values in their data.

TODO:
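The NULL-vs-0 distinction described above can be sketched as follows. This is an illustrative model of the semantics, not the actual SQL or `gbstats` code; the type and function names are assumptions:

```typescript
// Sketch (assumed names) of the per-user value column semantics:
// users with at least one row in their conversion window get
// COALESCE(value, 0) applied to each row; users with no matching
// rows stay NULL so aggregations can tell the two cases apart.
type UserRow = { user: string; values: (number | null)[] };

function userMetricJoinValue(row: UserRow): (number | null)[] {
  // No matching conversion rows at all → the user stays NULL.
  if (row.values.length === 0) return [null];
  // Matching rows whose value is NULL are treated as 0.
  return row.values.map((v) => v ?? 0);
}
```

Under these semantics, `COUNT(value)` counts only converting users, while `AVG` over the coalesced column treats valueless conversions as 0 rather than skipping them.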
Dependencies
n/a
Testing

- `cumulativeDate` (NOTE: this is in place but not exposed to the user)

Closes GB-251
Dependencies
n/a
Testing
All integration test queries execute.

- Cumulative time series on the latest day is the same as the full results without any time series.
- Cumulative time series on day X is the same as if we had run full results at midnight at the end of day X.
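The relationship being checked between the two series can be sketched as a first difference. This is an illustrative stand-in, not `gbstats`'s actual pandas `diff()` call:

```typescript
// Illustrative check: the daily series should be the first
// difference of the cumulative series (day 0 keeps its own value).
function diffSeries(cumulative: number[]): number[] {
  return cumulative.map((v, i) => (i === 0 ? v : v - cumulative[i - 1]));
}

// Cumulative conversions through days 0, 1, 2:
const daily = diffSeries([5, 9, 12]);
// → [5, 4, 3]: conversions accrued on each individual day
```

Summing the daily series back up recovers the cumulative series, which is the invariant the daily/cumulative pair should satisfy.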
TODO:

- Check that the daily time series (created via `diff()`) produces correct results.

Screenshots
Updated FE for cohort TS:
https://www.loom.com/share/4c014e60163b4c36840d883909f69ebc
Example SQL
Example queries run as part of the integration testing can be found in this branch here: https://github.com/growthbook/example-sql/tree/lsonnet/ts-left-twocteusermetric/queries
Some of particular interest:

- Custom aggregations using `COUNT(*)` rely on NULL values in the `__userMetricJoin` CTE: https://github.com/growthbook/example-sql/blob/lsonnet/ts-left-twocteusermetric/queries/bigquery/base/nonbinomcustom__purchased_items.sql
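The date-table pattern that appears throughout these example queries can be sketched as a small SQL-string builder. The function and CTE names here are assumptions for illustration, not the actual `getDateTable` implementation:

```typescript
// Sketch of the generic date-table construction: one SELECT per day,
// stitched together with UNION ALL, so it works on any SQL engine
// without engine-specific UNNEST/generate_series features.
// (Function and CTE names are hypothetical.)
function getDateTable(days: string[]): string {
  const rows = days
    .map((d) => `SELECT '${d}' AS day`)
    .join("\nUNION ALL\n");
  return `__dateRange AS (\n${rows}\n)`;
}

const sql = getDateTable(["2022-01-01", "2022-01-02"]);
```

Cross joining user metric rows against this CTE yields one copy of each user's data per day, which is what the cumulative aggregation then rolls up.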