feat: add ai tracker endpoints #3953
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
Your PR title doesn't contain a Jira issue key. Please add one to the PR title for better traceability.
Pull request overview
Adds Tinybird data assets to track AI-assisted commit activity over time, including a materialized aggregate datasource and endpoints for tool-level and total commit counts.
Changes:
- Introduces `ai_code_tracker_ds` as a monthly aggregate store of commit counts by AI tool (plus totals).
- Adds a scheduled COPY pipe to populate/replace `ai_code_tracker_ds` from `activities_deduplicated_ds`.
- Adds two endpoint pipes: AI-assisted commits by tool, and total commits, both supporting monthly/yearly aggregation.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| services/libs/tinybird/pipes/ai_code_tracker_copy.pipe | Daily COPY job to classify and aggregate commits into ai_code_tracker_ds. |
| services/libs/tinybird/datasources/ai_code_tracker_ds.datasource | New MergeTree datasource schema for monthly toolKey commit aggregates (+ totals). |
| services/libs/tinybird/pipes/ai_code_tracker.pipe | Endpoint returning AI-assisted commit counts per tool for monthly/yearly periods. |
| services/libs/tinybird/pipes/ai_code_tracker_total_commits.pipe | Endpoint returning total commit counts for monthly/yearly periods. |
| @@ -0,0 +1,62 @@ | |||
| DESCRIPTION > | |||
| - `ai_code_tracker.pipe` returns AI-assisted commit counts by tool and time period. | |||
| - Reads from pre-computed `ai_code_tracker_ds` datasource (materialized hourly by `ai_code_tracker_copy.pipe`). | |||
The description says `ai_code_tracker_ds` is materialized hourly by `ai_code_tracker_copy.pipe`, but `ai_code_tracker_copy.pipe` is scheduled as `COPY_SCHEDULE 0 3 * * *` (daily). Please align the description with the actual schedule (or update the schedule if hourly refresh is required).
| - Reads from pre-computed `ai_code_tracker_ds` datasource (materialized hourly by `ai_code_tracker_copy.pipe`). | |
| - Reads from pre-computed `ai_code_tracker_ds` datasource (materialized daily by `ai_code_tracker_copy.pipe`). |
| AND monthStart >= toDate({{ DateTime(startDate, description="Filter commits after this date", required=False) }}) | ||
| {% end %} | ||
| {% if defined(endDate) %} | ||
| AND monthStart < toDate({{ DateTime(endDate, description="Filter commits before this date", required=False) }}) |
`startDate`/`endDate` are defined as `DateTime` filters, but the query converts them to `Date` and compares them against `monthStart` (the period start). For example, a mid-month `startDate` drops that entire month. Either (a) change the params/docs to take month/year boundary inputs, or (b) translate the `DateTime` into the appropriate period boundary (e.g., `toStartOfMonth` / `toStartOfYear`, based on granularity) so filtering matches the documented semantics.
| AND monthStart >= toDate({{ DateTime(startDate, description="Filter commits after this date", required=False) }}) | |
| {% end %} | |
| {% if defined(endDate) %} | |
| AND monthStart < toDate({{ DateTime(endDate, description="Filter commits before this date", required=False) }}) | |
| AND monthStart >= | |
| CASE | |
| WHEN {{ String(granularity, description="Time aggregation: monthly or yearly", required=True) }} = 'monthly' | |
| THEN toStartOfMonth({{ DateTime(startDate, description="Filter commits after this date", required=False) }}) | |
| ELSE toStartOfYear({{ DateTime(startDate, description="Filter commits after this date", required=False) }}) | |
| END | |
| {% end %} | |
| {% if defined(endDate) %} | |
| AND monthStart < | |
| CASE | |
| WHEN {{ String(granularity, description="Time aggregation: monthly or yearly", required=True) }} = 'monthly' | |
| THEN toStartOfMonth({{ DateTime(endDate, description="Filter commits before this date", required=False) }}) + INTERVAL 1 MONTH | |
| ELSE toStartOfYear({{ DateTime(endDate, description="Filter commits before this date", required=False) }}) + INTERVAL 1 YEAR | |
| END |
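The filtering problem described in the comment above can be reproduced outside Tinybird. The following Python sketch is illustrative only: `naive_filter`, `floor_period`, `next_period`, and `boundary_filter` are hypothetical helpers that are not part of the pipe, and `MONTH_STARTS` is made-up sample data standing in for the `monthStart` column. It mirrors the half-open interval used in the suggestion (start floored to the period start, end floored and then advanced by one period).

```python
from datetime import date, datetime

# Sample period starts, standing in for the monthStart column (made-up data).
MONTH_STARTS = [date(2024, 1, 1), date(2024, 2, 1), date(2024, 3, 1)]


def naive_filter(months, start, end):
    """Mirror of the original predicate: monthStart >= toDate(start) AND monthStart < toDate(end)."""
    return [m for m in months if start.date() <= m < end.date()]


def floor_period(d, granularity):
    """Start of the month or year containing d (like toStartOfMonth / toStartOfYear)."""
    return date(d.year, d.month, 1) if granularity == "monthly" else date(d.year, 1, 1)


def next_period(d, granularity):
    """Start of the period that follows the one beginning at d."""
    if granularity == "monthly":
        return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)
    return date(d.year + 1, 1, 1)


def boundary_filter(months, start, end, granularity="monthly"):
    """Keep every period that overlaps the requested [start, end] range."""
    lower = floor_period(start.date(), granularity)
    upper = next_period(floor_period(end.date(), granularity), granularity)
    return [m for m in months if lower <= m < upper]


start = datetime(2024, 1, 15)
end = datetime(2024, 2, 20)
print(naive_filter(MONTH_STARTS, start, end))     # January is dropped
print(boundary_filter(MONTH_STARTS, start, end))  # January and February are kept
```

With a mid-month `startDate`, the naive predicate loses January because its period start (the 1st) precedes the 15th; flooring the bound to the period start keeps it.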
| @@ -0,0 +1,39 @@ | |||
| DESCRIPTION > | |||
| - `ai_code_tracker_total_commits.pipe` returns total commit counts per time period. | |||
| - Reads from pre-computed `ai_code_tracker_ds` datasource (materialized hourly by `ai_code_tracker_copy.pipe`). | |||
The description says `ai_code_tracker_ds` is materialized hourly by `ai_code_tracker_copy.pipe`, but `ai_code_tracker_copy.pipe` uses `COPY_SCHEDULE 0 3 * * *` (daily). Please align the description with the actual schedule (or adjust the schedule if hourly refresh is needed).
| - Reads from pre-computed `ai_code_tracker_ds` datasource (materialized hourly by `ai_code_tracker_copy.pipe`). | |
| - Reads from pre-computed `ai_code_tracker_ds` datasource (materialized daily by `ai_code_tracker_copy.pipe`). |
| AND monthStart >= toDate({{ DateTime(startDate, description="Filter commits after this date", required=False) }}) | ||
| {% end %} | ||
| {% if defined(endDate) %} | ||
| AND monthStart < toDate({{ DateTime(endDate, description="Filter commits before this date", required=False) }}) |
`startDate`/`endDate` are documented as `DateTime` filters, but they're cast to `Date` and compared to `monthStart` (the period start). This means a mid-period date excludes the whole month or year that contains it. Please either update the API docs/param types to reflect period-boundary filtering, or adjust the filtering logic to map the `DateTime` to the correct period boundary for the selected granularity.
| AND monthStart >= toDate({{ DateTime(startDate, description="Filter commits after this date", required=False) }}) | |
| {% end %} | |
| {% if defined(endDate) %} | |
| AND monthStart < toDate({{ DateTime(endDate, description="Filter commits before this date", required=False) }}) | |
| AND monthStart >= | |
| CASE | |
| WHEN {{ String(granularity, description="Time aggregation: monthly or yearly", required=True) }} = 'monthly' | |
| THEN toStartOfMonth({{ DateTime(startDate, description="Filter commits after this date", required=False) }}) | |
| ELSE toStartOfYear({{ DateTime(startDate, description="Filter commits after this date", required=False) }}) | |
| END | |
| {% end %} | |
| {% if defined(endDate) %} | |
| AND monthStart < | |
| CASE | |
| WHEN {{ String(granularity, description="Time aggregation: monthly or yearly", required=True) }} = 'monthly' | |
| THEN addMonths(toStartOfMonth({{ DateTime(endDate, description="Filter commits before this date", required=False) }}), 1) | |
| ELSE addYears(toStartOfYear({{ DateTime(endDate, description="Filter commits before this date", required=False) }}), 1) | |
| END |
| @@ -0,0 +1,19 @@ | |||
| DESCRIPTION > | |||
| - `ai_code_tracker_ds` contains pre-computed monthly aggregates of AI-assisted commits by tool. | |||
| - Populated hourly by `ai_code_tracker_copy.pipe` which scans activities for AI tool signatures. | |||
The datasource description says it's populated hourly by `ai_code_tracker_copy.pipe`, but `ai_code_tracker_copy.pipe` is scheduled as `COPY_SCHEDULE 0 3 * * *` (daily). Please update the description (or change the schedule) so operators/users aren't misled about data freshness.
| - Populated hourly by `ai_code_tracker_copy.pipe` which scans activities for AI tool signatures. | |
| - Populated daily at 03:00 (UTC) by `ai_code_tracker_copy.pipe` which scans activities for AI tool signatures. |
| SELECT | ||
| monthStart, | ||
| multiIf( | ||
| positionCaseInsensitive(title, 'github copilot') > 0 OR positionCaseInsensitive(body, 'co-authored-by') > 0 AND positionCaseInsensitive(body, 'copilot') > 0 OR positionCaseInsensitive(attributes, 'copilot') > 0, 'github-copilot', |
The first `multiIf` condition mixes `OR` and `AND` without parentheses. In ClickHouse, `AND` binds tighter than `OR`, so the current expression may not match the intended Copilot classification logic and is easy to misread or modify incorrectly. Please add explicit parentheses (or split into named boolean expressions) to make the precedence unambiguous.
| positionCaseInsensitive(title, 'github copilot') > 0 OR positionCaseInsensitive(body, 'co-authored-by') > 0 AND positionCaseInsensitive(body, 'copilot') > 0 OR positionCaseInsensitive(attributes, 'copilot') > 0, 'github-copilot', | |
| ( | |
| positionCaseInsensitive(title, 'github copilot') > 0 | |
| OR (positionCaseInsensitive(body, 'co-authored-by') > 0 AND positionCaseInsensitive(body, 'copilot') > 0) | |
| OR positionCaseInsensitive(attributes, 'copilot') > 0 | |
| ), 'github-copilot', |
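As a quick sanity check of the precedence point above, the sketch below enumerates all truth assignments in Python, whose `and` also binds tighter than `or`. The function names are hypothetical, and the four booleans stand in for the four `positionCaseInsensitive(...) > 0` tests. Under that assumption, the parenthesized suggestion evaluates identically to the expression as written, while a reader who groups the `OR`s first gets a different classifier.

```python
from itertools import product


def as_written(a, b, c, d):
    """Unparenthesized form: a OR b AND c OR d."""
    return a or b and c or d


def as_suggested(a, b, c, d):
    """Explicitly grouped form from the suggestion."""
    return a or (b and c) or d


def misread(a, b, c, d):
    """A plausible misreading that groups the ORs first."""
    return (a or b) and (c or d)


cases = list(product([False, True], repeat=4))
same = all(as_written(*v) == as_suggested(*v) for v in cases)
differs = [v for v in cases if as_written(*v) != misread(*v)]
print(same)          # whether the grouped form matches the original on every input
print(len(differs))  # number of inputs where the misreading disagrees
```

The parentheses therefore document intent rather than change results, which is the point of the suggestion.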
| positionCaseInsensitive(title, 'ai-generated') > 0 OR positionCaseInsensitive(title, 'ai generated') > 0 | ||
| OR positionCaseInsensitive(body, 'ai-generated') > 0 OR positionCaseInsensitive(body, 'ai generated') > 0 | ||
| OR positionCaseInsensitive(attributes, 'ai-generated') > 0 | ||
| OR (positionCaseInsensitive(body, 'co-authored-by') > 0 AND positionCaseInsensitive(body, 'bot') > 0), 'other', |
Non-AI bot commits misclassified as "Other AI"
Medium Severity
The 'other' classification condition `positionCaseInsensitive(body, 'co-authored-by') > 0 AND positionCaseInsensitive(body, 'bot') > 0` matches any commit whose body contains both `co-authored-by` and the substring `bot`. This incorrectly classifies commits from non-AI automation bots like `dependabot[bot]` and `renovate[bot]` as "Other AI." Since these bots are extremely common, the AI-assisted commit counts can be significantly inflated with false positives. Additionally, `positionCaseInsensitive` is a substring match, so unrelated words like "bottom" or "robot" anywhere in the body also trigger this condition.
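The false positives described above are easy to reproduce. The following Python sketch is an illustration, not the pipe's logic: `naive_other_ai` mirrors the substring condition, while `stricter_other_ai` is a hypothetical alternative that reads only the name in a `Co-authored-by:` trailer and skips the two non-AI bots named in the comment. The `some-ai[bot]` co-author and all email addresses are made up, and treating any other `[bot]` co-author as AI is itself an assumption.

```python
import re

# Non-AI automation bots named in the review comment; a real list would be longer.
NON_AI_BOTS = {"dependabot[bot]", "renovate[bot]"}

# Captures the name part of a "Co-authored-by: Name <email>" trailer line.
TRAILER = re.compile(r"^co-authored-by:\s*(.+?)\s*<[^>]*>\s*$", re.IGNORECASE | re.MULTILINE)


def naive_other_ai(body):
    """Mirror of the SQL condition: both substrings appear anywhere in the body."""
    lowered = body.lower()
    return "co-authored-by" in lowered and "bot" in lowered


def stricter_other_ai(body):
    """Flag only bot co-authors from the trailer that are not known non-AI bots."""
    names = [n.lower() for n in TRAILER.findall(body)]
    return any(n.endswith("[bot]") and n not in NON_AI_BOTS for n in names)


dependabot = "Bump lodash\n\nCo-authored-by: dependabot[bot] <bot@example.com>"
robot = "Fix robot arm offset\n\nCo-authored-by: Jane Doe <jane@example.com>"
unknown_bot = "Refactor parser\n\nCo-authored-by: some-ai[bot] <ai@example.com>"

for body in (dependabot, robot, unknown_bot):
    print(naive_other_ai(body), stricter_other_ai(body))
```

The naive check flags all three messages, including the Dependabot bump and the human-co-authored "robot" commit; the stricter variant flags only the last one.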
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
| positionCaseInsensitive(title, 'aider') > 0 OR positionCaseInsensitive(body, 'aider') > 0, | ||
| 'aider', | ||
| positionCaseInsensitive(title, 'devin') > 0 OR positionCaseInsensitive(body, 'devin') > 0, | ||
| 'devin', |
Generic keywords cause massive false positive AI classification
High Severity
The classifier uses bare keywords like 'cursor', 'claude', 'devin', 'aider', and 'gemini' to detect AI tool usage. These are extremely common words in software contexts — "cursor" appears in commits about database cursors, mouse cursors, CSS cursor properties, and text editing. "Claude" and "Devin" are common developer names. "Aider" is an ordinary English word. Any commit containing these words in its title or body will be incorrectly classified as AI-assisted, likely inflating counts by orders of magnitude and rendering the tracker unreliable.
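To make the keyword problem concrete, here is a small Python sketch contrasting bare substring matching with one possible tightening. It is a hedged illustration: `naive_tool` and `trailer_tool` are hypothetical names, the sample messages are invented, and the idea that an AI tool identifies itself in a `Co-authored-by:` trailer is an assumption rather than something the PR states.

```python
import re

# Keywords taken from the review comment above.
TOOL_KEYWORDS = ("cursor", "claude", "devin", "aider", "gemini")

# Name part of a "Co-authored-by: Name <email>" trailer line.
TRAILER = re.compile(r"^co-authored-by:\s*([^<\n]+)", re.IGNORECASE | re.MULTILINE)


def naive_tool(text):
    """Bare substring match, like positionCaseInsensitive(...) > 0."""
    lowered = text.lower()
    return next((k for k in TOOL_KEYWORDS if k in lowered), None)


def trailer_tool(text):
    """Match a keyword only as a whole word inside a co-author trailer name."""
    for name in TRAILER.findall(text):
        for keyword in TOOL_KEYWORDS:
            if re.search(rf"\b{re.escape(keyword)}\b", name, re.IGNORECASE):
                return keyword
    return None


db_fix = "Close database cursor after query"
thanks = "Apply review feedback from Devin"
assisted = "Add retry logic\n\nCo-authored-by: Claude <assistant@example.com>"

for message in (db_fix, thanks, assisted):
    print(naive_tool(message), trailer_tool(message))
```

Restricting the match to trailer names removes the database-cursor and reviewer-name hits, though it remains a heuristic: a human co-author named Claude would still match.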
Signed-off-by: Gašper Grom <gasper.grom@gmail.com> Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>


Note
Medium Risk
Introduces new scheduled Tinybird COPY jobs and string-matching classification logic over commit text, which could affect performance/cost and produce misclassification if patterns or schedules are wrong.
Overview
Adds a new Tinybird reporting pipeline to track AI-assisted commits over time.
Creates `ai_code_tracker_commits_ds` (daily extracted `authored-commit` subset) and `ai_code_tracker_ds` (monthly aggregates by `toolKey`, plus `__total__`), populated via new scheduled COPY pipes that prefilter and classify commits by AI-tool keywords (Copilot/ChatGPT/Claude/Cursor/etc.).
Exposes query pipes `ai_code_tracker.pipe` (AI commit counts by tool with monthly/yearly granularity and date filters) and `ai_code_tracker_total_commits.pipe` (total commits for the same periods) for percentage/trend calculations.
Written by Cursor Bugbot for commit b5fa023. This will update automatically on new commits.