Add support for auto generating metrics #1283

Merged
merged 50 commits into from Jun 9, 2023

mknowlton89
Collaborator

@mknowlton89 mknowlton89 commented May 12, 2023

Features and Changes

This PR introduces logic to automatically generate metrics for our various event trackers. In its current state, it only supports Segment and Rudderstack. Once I get some initial feedback on the overall structure of the PR, I'll expand it to include additional event trackers, specifically GA4, Firebase, and Amplitude, with more to follow in separate PRs.

When an organization adds a data source that uses one of the supported event trackers, they'll see an option on the NewDataSourceForm to look up which metrics we can generate for them automatically. I built a method within SqlIntegration that adds a supportsAutoGeneratedMetrics property to the integration's getDataSourceProperties, so we can easily determine on both the front end and the back end whether a particular data source supports auto-generating metrics.

isAutoGeneratingMetricsSupported(): boolean {
  const supportedEventTrackers: SchemaFormat[] = ["segment", "rudderstack"];

  if (
    this.settings.schemaFormat &&
    supportedEventTrackers.includes(this.settings.schemaFormat)
  ) {
    return true;
  }
  return false;
}
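
As a rough illustration of how the flag could be surfaced through getDataSourceProperties, here is a minimal, self-contained sketch. The SketchSqlIntegration class, the trimmed SchemaFormat union, and the DataSourceProperties shape are simplified stand-ins assumed for this example, not the actual definitions in the codebase.

```typescript
// Hypothetical, trimmed-down types. The real SchemaFormat union and
// DataSourceProperties interface contain more members than shown here.
type SchemaFormat = "segment" | "rudderstack" | "ga4" | "amplitude" | "custom";

interface DataSourceProperties {
  supportsAutoGeneratedMetrics: boolean;
}

interface SketchSettings {
  schemaFormat?: SchemaFormat;
}

// Stand-in for SqlIntegration, reduced to the parts relevant to this PR.
class SketchSqlIntegration {
  settings: SketchSettings;

  constructor(settings: SketchSettings) {
    this.settings = settings;
  }

  isAutoGeneratingMetricsSupported(): boolean {
    const supportedEventTrackers: SchemaFormat[] = ["segment", "rudderstack"];
    const format = this.settings.schemaFormat;
    return format !== undefined && supportedEventTrackers.includes(format);
  }

  // Both the front end and the back end can read this single flag
  // instead of re-deriving support from the schema format.
  getDataSourceProperties(): DataSourceProperties {
    return {
      supportsAutoGeneratedMetrics: this.isAutoGeneratingMetricsSupported(),
    };
  }
}
```

With this shape, a form component would only need to check `supportsAutoGeneratedMetrics` before showing the lookup option.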

Looking up the available metrics hits a new endpoint, /datasource/:datasourceId/auto-metrics, which finds the unique events tracked by the event tracker. This is possible because each supported event tracker has a table that lists all of the events it tracks. For Segment and Rudderstack, this is the tracks table.

The endpoint returns a list of events that we think we can generate metrics for, along with SQL queries for count and binomial metrics derived from the tracked events.
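
To make the idea concrete, here is a hedged sketch of how the queries might be derived from a tracks table. The function names and the column names (event, user_id, anonymous_id, timestamp) are assumptions for illustration only; the actual queries in this PR are generated per integration and may differ.

```typescript
// Escape single quotes so an event name can be embedded in a SQL string literal.
function sqlString(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Lists each distinct tracked event with its row count and most recent timestamp.
function trackedEventsQuery(tracksTable: string): string {
  return [
    "SELECT event, COUNT(*) AS count, MAX(timestamp) AS last_tracked_at",
    `FROM ${tracksTable}`,
    "GROUP BY event",
  ].join("\n");
}

// Binomial metric: one row per occurrence, no value column needed.
function binomialMetricQuery(tracksTable: string, event: string): string {
  return [
    "SELECT user_id, anonymous_id, timestamp",
    `FROM ${tracksTable}`,
    `WHERE event = ${sqlString(event)}`,
  ].join("\n");
}

// Count metric: same rows, with a constant value of 1 to be summed per user.
function countMetricQuery(tracksTable: string, event: string): string {
  return [
    "SELECT user_id, anonymous_id, timestamp, 1 AS value",
    `FROM ${tracksTable}`,
    `WHERE event = ${sqlString(event)}`,
  ].join("\n");
}
```

The only difference between the two metric queries in this sketch is the constant value column, which is why both can be offered for every tracked event.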

Here, the user can preview the underlying SQL that would eventually power these metrics, and opt to create binomial and count metrics for each tracked event. I've added some tooltips throughout the table to provide some insight.

Then, when the user clicks "Save", if they've toggled on any metrics for us to create, we'll kick off an async job (createAutoGeneratedMetrics) that creates those metrics.

I've also added some logic to pluralize the event names (e.g., for an Order Placed event via Segment, the associated count metric would be Orders Placed).

This is currently handled via a pretty simple map. Eventually, we could leverage ChatGPT for this. If the map is unable to map an event to a pluralized version, it will simply return a displayName of Count of {event}.

The seed for the pluralization map came from Segment's documentation here - https://segment.com/docs/connections/spec/ecommerce/v2/. This map will likely be a bit of a living document as we expand support.
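
A minimal sketch of the map-with-fallback behavior described above. The map entries other than Order Placed and the countDisplayName function name are illustrative assumptions, not the contents of the actual map in this PR.

```typescript
// Illustrative entries only. "Order Placed" comes from the description above;
// the other entries are assumed examples in the same style.
const pluralizationMap = new Map<string, string>([
  ["Order Placed", "Orders Placed"],
  ["Product Viewed", "Products Viewed"],
  ["Checkout Started", "Checkouts Started"],
]);

// Returns the pluralized display name for a count metric, falling back to
// "Count of {event}" when the event is not in the map.
function countDisplayName(event: string): string {
  const pluralized = pluralizationMap.get(event);
  return pluralized !== undefined ? pluralized : `Count of ${event}`;
}
```

For example, `countDisplayName("Order Placed")` yields `Orders Placed`, while an unmapped event such as `Signed Up` yields `Count of Signed Up`.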

Testing

  • Set up a Segment destination with BigQuery, add that data source to GrowthBook, and confirm all tracked events are displayed and created correctly, with the correct underlying SQL query.
  • Ensure the above works for revenue, count, and binomial metrics.

Screenshots

[5 screenshots attached, taken 2023-06-01 and 2023-06-07]


github-actions bot commented May 12, 2023

Your preview environment pr-1283-bttf has been deployed.

Preview environment endpoints are available at:

…he user has indicated they want us to create for them.
… how to go about duration types, atleast with Segment.
…mes for the tracked events, along with different event column names based on the schemaFormat.
…nput as that is required for us to know which schema to query when looking for the tracked events coming from Rudderstack. Also updating the Snowflake from clause for the query to use the schema.
…e for both sql queries relating to the auto metrics - the query to get a list of tracked events and the query for the actual metric.
@mknowlton89 mknowlton89 requested a review from jdorn May 22, 2023 10:40
@mknowlton89 mknowlton89 self-assigned this May 22, 2023
@mknowlton89 mknowlton89 requested a review from a team May 22, 2023 10:40
…but committing in, instead of stashing it while I go bash a bug.
… also adding hooks to generate additional metrics from single tracked event. Very narrow in scope for now, but getting the building blocks in place.
…ing all tracked events, and giving the user the option to create a binomial and/or count metric for each event.
…and adds polish to the front end. Still need to wire a few things on the front end up, mainly the SQL Preview.
…uilt out the pluralization map further, inspired from Segment's article on e-comm tracking suggestions.
…the previous implementation in, but commented out, and I will make a follow-up commit to remove it, but I just want a snapshot in case we want to revert back to it.
@mknowlton89 mknowlton89 changed the title POC for auto-generating metrics from Segment/BigQuery Add support for auto generating metrics Jun 2, 2023
@mknowlton89 mknowlton89 marked this pull request as ready for review June 5, 2023 10:39
packages/back-end/src/models/MetricModel.ts (outdated, resolved)
packages/back-end/src/integrations/SqlIntegration.ts (outdated, resolved)
packages/back-end/types/datasource.d.ts (resolved)
binomialSqlQuery: string;
countSqlQuery: string;
countDisplayName: string;
};
Member

This data structure is a little weird. It's mixing form state (e.g. createBinomialFromEvent) with data from the query (e.g. lastTrackedAt). Seems like that should be 2 separate data structures.

Collaborator Author

Updated this to be a bit more future proof and to separate the concerns.

Currently, this data takes the following shape:

export type TrackedEventData = {
  event: string;
  displayName: string;
  count: number;
  hasUserId: boolean;
  lastTrackedAt: Date;
  metricsToCreate: {
    name: string;
    sql: string;
    type: MetricType;
    shouldCreate?: boolean;
  }[];
};
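
To show how the per-metric shouldCreate flag might be consumed when the user saves, here is a small sketch. The MetricType union and the metricsSelectedForCreation helper are assumptions made for this example; the actual createAutoGeneratedMetrics job may work differently.

```typescript
// Assumed subset of metric types for this sketch.
type MetricType = "binomial" | "count" | "duration" | "revenue";

type MetricToCreate = {
  name: string;
  sql: string;
  type: MetricType;
  shouldCreate?: boolean;
};

type TrackedEventData = {
  event: string;
  displayName: string;
  count: number;
  hasUserId: boolean;
  lastTrackedAt: Date;
  metricsToCreate: MetricToCreate[];
};

// Hypothetical helper: collects only the metrics the user toggled on,
// dropping the form-state flag before handing them to a creation job.
function metricsSelectedForCreation(
  events: TrackedEventData[]
): { name: string; sql: string; type: MetricType }[] {
  const selected: { name: string; sql: string; type: MetricType }[] = [];
  for (const trackedEvent of events) {
    for (const metric of trackedEvent.metricsToCreate) {
      if (metric.shouldCreate === true) {
        selected.push({ name: metric.name, sql: metric.sql, type: metric.type });
      }
    }
  }
  return selected;
}
```

Keeping shouldCreate optional means data returned from the query never has to carry form state; the flag is only set once the user interacts with the toggles.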

- getInformationSchemaFromClause(): string {
+ getInformationSchemaTableFromClause(databaseName: string): string {
    return `${databaseName}.information_schema.columns`;
  }
Member

This method also feels duplicative with the new generateTableName method.

Collaborator Author

@jdorn Just pushed up a big refactor of generateTableName that handles the logic here as well, so we've now gotten rid of the getInformationSchemaTableFromClause method.

All of the changes were done in this commit.

packages/back-end/src/util/autoGeneratedMetrics.ts (outdated, resolved)
@mknowlton89 mknowlton89 requested a review from jdorn June 6, 2023 20:28
@mknowlton89 mknowlton89 merged commit 9a07423 into main Jun 9, 2023
3 checks passed
@mknowlton89 mknowlton89 deleted the mk/auto-metrics branch June 9, 2023 15:09
2 participants