feat: add ai tracker endpoints #3953

Merged
gaspergrom merged 4 commits into main from ai-code-tracker
Mar 25, 2026

@gaspergrom commented Mar 25, 2026

Note

Medium Risk
Introduces new scheduled Tinybird COPY jobs and string-matching classification logic over commit text, which could affect performance/cost and produce misclassification if patterns or schedules are wrong.

Overview
Adds a new Tinybird reporting pipeline to track AI-assisted commits over time.

Creates ai_code_tracker_commits_ds (daily extracted authored-commit subset) and ai_code_tracker_ds (monthly aggregates by toolKey, plus __total__), populated via new scheduled COPY pipes that prefilter and classify commits by AI-tool keywords (Copilot/ChatGPT/Claude/Cursor/etc.).

Exposes query pipes ai_code_tracker.pipe (AI commit counts by tool with monthly/yearly granularity and date filters) and ai_code_tracker_total_commits.pipe (total commits for the same periods) for percentage/trend calculations.

Written by Cursor Bugbot for commit b5fa023. This will update automatically on new commits.

Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
Copilot AI review requested due to automatic review settings March 25, 2026 09:11
@github-actions

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copilot AI left a comment

Pull request overview

Adds Tinybird data assets to track AI-assisted commit activity over time, including a materialized aggregate datasource and endpoints for tool-level and total commit counts.

Changes:

  • Introduces ai_code_tracker_ds as a monthly aggregate store of commit counts by AI tool (plus totals).
  • Adds a scheduled COPY pipe to populate/replace ai_code_tracker_ds from activities_deduplicated_ds.
  • Adds two endpoint pipes: AI-assisted commits by tool, and total commits, both supporting monthly/yearly aggregation.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| services/libs/tinybird/pipes/ai_code_tracker_copy.pipe | Daily COPY job to classify and aggregate commits into ai_code_tracker_ds. |
| services/libs/tinybird/datasources/ai_code_tracker_ds.datasource | New MergeTree datasource schema for monthly toolKey commit aggregates (+ totals). |
| services/libs/tinybird/pipes/ai_code_tracker.pipe | Endpoint returning AI-assisted commit counts per tool for monthly/yearly periods. |
| services/libs/tinybird/pipes/ai_code_tracker_total_commits.pipe | Endpoint returning total commit counts for monthly/yearly periods. |

@@ -0,0 +1,62 @@
DESCRIPTION >
- `ai_code_tracker.pipe` returns AI-assisted commit counts by tool and time period.
- Reads from pre-computed `ai_code_tracker_ds` datasource (materialized hourly by `ai_code_tracker_copy.pipe`).
Copilot AI Mar 25, 2026

The description says ai_code_tracker_ds is materialized hourly by ai_code_tracker_copy.pipe, but ai_code_tracker_copy.pipe is scheduled as COPY_SCHEDULE 0 3 * * * (daily). Please align the description with the actual schedule (or update the schedule if hourly refresh is required).

Suggested change
- - Reads from pre-computed `ai_code_tracker_ds` datasource (materialized hourly by `ai_code_tracker_copy.pipe`).
+ - Reads from pre-computed `ai_code_tracker_ds` datasource (materialized daily by `ai_code_tracker_copy.pipe`).

Copilot uses AI. Check for mistakes.
Comment on lines +54 to +57
AND monthStart >= toDate({{ DateTime(startDate, description="Filter commits after this date", required=False) }})
{% end %}
{% if defined(endDate) %}
AND monthStart < toDate({{ DateTime(endDate, description="Filter commits before this date", required=False) }})
Copilot AI Mar 25, 2026

startDate/endDate are defined as DateTime filters, but the query converts them to Date and compares against monthStart (period start). For example, a mid-month startDate will drop the entire month. Either (a) change params/docs to be month/year boundary inputs, or (b) translate the DateTime into the appropriate period boundary (e.g., toStartOfMonth / toStartOfYear based on granularity) so filtering matches the documented semantics.

Suggested change
- AND monthStart >= toDate({{ DateTime(startDate, description="Filter commits after this date", required=False) }})
- {% end %}
- {% if defined(endDate) %}
- AND monthStart < toDate({{ DateTime(endDate, description="Filter commits before this date", required=False) }})
+ AND monthStart >=
+   CASE
+     WHEN {{ String(granularity, description="Time aggregation: monthly or yearly", required=True) }} = 'monthly'
+       THEN toStartOfMonth({{ DateTime(startDate, description="Filter commits after this date", required=False) }})
+     ELSE toStartOfYear({{ DateTime(startDate, description="Filter commits after this date", required=False) }})
+   END
+ {% end %}
+ {% if defined(endDate) %}
+ AND monthStart <
+   CASE
+     WHEN {{ String(granularity, description="Time aggregation: monthly or yearly", required=True) }} = 'monthly'
+       THEN toStartOfMonth({{ DateTime(endDate, description="Filter commits before this date", required=False) }}) + INTERVAL 1 MONTH
+     ELSE toStartOfYear({{ DateTime(endDate, description="Filter commits before this date", required=False) }}) + INTERVAL 1 YEAR
+   END
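
The boundary problem described in this comment can be reproduced outside Tinybird. The following is a minimal Python sketch, not part of the PR: `period_start`, `naive_keeps`, and `aligned_keeps` are hypothetical helper names, and the dates are made-up examples. It compares the original start-date filter with the period-aligned variant from the suggestion.

```python
from datetime import date, datetime


def period_start(ts: datetime, granularity: str) -> date:
    """Stand-in for toStartOfMonth / toStartOfYear, chosen by granularity."""
    if granularity == "monthly":
        return date(ts.year, ts.month, 1)
    return date(ts.year, 1, 1)


def naive_keeps(month_start: date, start: datetime) -> bool:
    # Shape of the original filter: monthStart >= toDate(startDate)
    return month_start >= start.date()


def aligned_keeps(month_start: date, start: datetime, granularity: str) -> bool:
    # Shape of the suggested filter: monthStart >= start of the period
    # that contains startDate
    return month_start >= period_start(start, granularity)


march_row = date(2026, 3, 1)             # monthStart of the March aggregate row
mid_march = datetime(2026, 3, 15, 12, 0)  # a mid-month startDate

print("original filter keeps March:", naive_keeps(march_row, mid_march))
print("aligned filter keeps March:", aligned_keeps(march_row, mid_march, "monthly"))
```

With a mid-month `startDate`, the original comparison drops the March row because `2026-03-01` is earlier than `2026-03-15`, while the aligned comparison keeps it.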

@@ -0,0 +1,39 @@
DESCRIPTION >
- `ai_code_tracker_total_commits.pipe` returns total commit counts per time period.
- Reads from pre-computed `ai_code_tracker_ds` datasource (materialized hourly by `ai_code_tracker_copy.pipe`).
Copilot AI Mar 25, 2026

The description says ai_code_tracker_ds is materialized hourly by ai_code_tracker_copy.pipe, but ai_code_tracker_copy.pipe uses COPY_SCHEDULE 0 3 * * * (daily). Please align the description with the actual schedule (or adjust the schedule if hourly refresh is needed).

Suggested change
- - Reads from pre-computed `ai_code_tracker_ds` datasource (materialized hourly by `ai_code_tracker_copy.pipe`).
+ - Reads from pre-computed `ai_code_tracker_ds` datasource (materialized daily by `ai_code_tracker_copy.pipe`).

Comment on lines +31 to +34
AND monthStart >= toDate({{ DateTime(startDate, description="Filter commits after this date", required=False) }})
{% end %}
{% if defined(endDate) %}
AND monthStart < toDate({{ DateTime(endDate, description="Filter commits before this date", required=False) }})
Copilot AI Mar 25, 2026

startDate/endDate are documented as DateTime filters, but they’re cast to Date and compared to monthStart (period start). This means mid-period dates exclude whole months/years. Please either update the API docs/param types to reflect period-boundary filtering, or adjust the filtering logic to map the DateTime to the correct period boundary for the selected granularity.

Suggested change
- AND monthStart >= toDate({{ DateTime(startDate, description="Filter commits after this date", required=False) }})
- {% end %}
- {% if defined(endDate) %}
- AND monthStart < toDate({{ DateTime(endDate, description="Filter commits before this date", required=False) }})
+ AND monthStart >=
+   CASE
+     WHEN {{ String(granularity, description="Time aggregation: monthly or yearly", required=True) }} = 'monthly'
+       THEN toStartOfMonth({{ DateTime(startDate, description="Filter commits after this date", required=False) }})
+     ELSE toStartOfYear({{ DateTime(startDate, description="Filter commits after this date", required=False) }})
+   END
+ {% end %}
+ {% if defined(endDate) %}
+ AND monthStart <
+   CASE
+     WHEN {{ String(granularity, description="Time aggregation: monthly or yearly", required=True) }} = 'monthly'
+       THEN addMonths(toStartOfMonth({{ DateTime(endDate, description="Filter commits before this date", required=False) }}), 1)
+     ELSE addYears(toStartOfYear({{ DateTime(endDate, description="Filter commits before this date", required=False) }}), 1)
+   END

@@ -0,0 +1,19 @@
DESCRIPTION >
- `ai_code_tracker_ds` contains pre-computed monthly aggregates of AI-assisted commits by tool.
- Populated hourly by `ai_code_tracker_copy.pipe` which scans activities for AI tool signatures.
Copilot AI Mar 25, 2026

The datasource description says it’s populated hourly by ai_code_tracker_copy.pipe, but ai_code_tracker_copy.pipe is scheduled as COPY_SCHEDULE 0 3 * * * (daily). Please update the description (or change the schedule) so operators/users aren’t misled about data freshness.

Suggested change
- - Populated hourly by `ai_code_tracker_copy.pipe` which scans activities for AI tool signatures.
+ - Populated daily at 03:00 (UTC) by `ai_code_tracker_copy.pipe` which scans activities for AI tool signatures.

SELECT
monthStart,
multiIf(
positionCaseInsensitive(title, 'github copilot') > 0 OR positionCaseInsensitive(body, 'co-authored-by') > 0 AND positionCaseInsensitive(body, 'copilot') > 0 OR positionCaseInsensitive(attributes, 'copilot') > 0, 'github-copilot',
Copilot AI Mar 25, 2026

The first multiIf condition mixes OR and AND without parentheses. In ClickHouse, AND binds tighter than OR, so the current expression may not match the intended Copilot classification logic and is easy to misread/modify incorrectly. Please add explicit parentheses (or split into named boolean expressions) to make the precedence unambiguous.

Suggested change
- positionCaseInsensitive(title, 'github copilot') > 0 OR positionCaseInsensitive(body, 'co-authored-by') > 0 AND positionCaseInsensitive(body, 'copilot') > 0 OR positionCaseInsensitive(attributes, 'copilot') > 0, 'github-copilot',
+ (
+   positionCaseInsensitive(title, 'github copilot') > 0
+   OR (positionCaseInsensitive(body, 'co-authored-by') > 0 AND positionCaseInsensitive(body, 'copilot') > 0)
+   OR positionCaseInsensitive(attributes, 'copilot') > 0
+ ), 'github-copilot',
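
A small truth-table check helps show what the added parentheses change. The sketch below is in Python, not ClickHouse, and is not part of the PR; it relies on Python's `and` binding tighter than `or`, the same rule the comment describes for ClickHouse's `AND` and `OR`. The names `a` to `d` are placeholders for the four `positionCaseInsensitive(...) > 0` tests, in order.

```python
from itertools import product


def unparenthesized(a: bool, b: bool, c: bool, d: bool) -> bool:
    # Same shape as the original condition: a OR b AND c OR d
    return a or b and c or d


def parenthesized(a: bool, b: bool, c: bool, d: bool) -> bool:
    # Same shape as the suggested change: (a OR (b AND c) OR d)
    return a or (b and c) or d


# Exhaustively compare all 16 combinations of the four boolean tests.
mismatches = [
    combo
    for combo in product([False, True], repeat=4)
    if unparenthesized(*combo) != parenthesized(*combo)
]
print(mismatches)  # [] : the two forms agree on every input
```

Under that precedence rule the two forms return the same value on every input, so the suggestion reads as a readability and maintainability improvement, not a change in which commits are classified as `github-copilot`.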

positionCaseInsensitive(title, 'ai-generated') > 0 OR positionCaseInsensitive(title, 'ai generated') > 0
OR positionCaseInsensitive(body, 'ai-generated') > 0 OR positionCaseInsensitive(body, 'ai generated') > 0
OR positionCaseInsensitive(attributes, 'ai-generated') > 0
OR (positionCaseInsensitive(body, 'co-authored-by') > 0 AND positionCaseInsensitive(body, 'bot') > 0), 'other',

Non-AI bot commits misclassified as "Other AI"

Medium Severity

The 'other' classification condition positionCaseInsensitive(body, 'co-authored-by') > 0 AND positionCaseInsensitive(body, 'bot') > 0 matches any commit with a co-author trailer containing the substring bot. This incorrectly classifies commits from non-AI automation bots like dependabot[bot] and renovate[bot] as "Other AI." Since these bots are extremely common, the AI-assisted commit counts can be significantly inflated with false positives. Additionally, positionCaseInsensitive is a substring match, so unrelated words like bottom or robot in the body also trigger this condition.
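
The false positives described here are easy to reproduce. The following is a minimal Python sketch, not part of the PR: `is_other_ai` is a hypothetical stand-in for the co-author rule in the `'other'` branch, with a lowercase substring check playing the role of `positionCaseInsensitive(...) > 0`. The commit messages and email addresses are made-up examples.

```python
def is_other_ai(body: str) -> bool:
    """Stand-in for: body contains 'co-authored-by' AND body contains 'bot'."""
    text = body.lower()
    return "co-authored-by" in text and "bot" in text


samples = [
    # Dependency bot listed as co-author
    "Bump lodash\n\nCo-authored-by: dependabot[bot] <dependabot@example.com>",
    # Human co-author, but the word "bottom" appears in the body
    "Fix bottom margin\n\nCo-authored-by: Jane Doe <jane@example.com>",
    # Human co-author, no "bot" substring anywhere
    "Fix typo\n\nCo-authored-by: Jane Doe <jane@example.com>",
]

for message in samples:
    first_line = message.splitlines()[0]
    print(f"{first_line!r}: classified as other AI = {is_other_ai(message)}")
```

The first two messages are flagged even though neither involves an AI tool; only the third is left alone.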

Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

positionCaseInsensitive(title, 'aider') > 0 OR positionCaseInsensitive(body, 'aider') > 0,
'aider',
positionCaseInsensitive(title, 'devin') > 0 OR positionCaseInsensitive(body, 'devin') > 0,
'devin',

Generic keywords cause massive false positive AI classification

High Severity

The classifier uses bare keywords like 'cursor', 'claude', 'devin', 'aider', and 'gemini' to detect AI tool usage. These are extremely common words in software contexts — "cursor" appears in commits about database cursors, mouse cursors, CSS cursor properties, and text editing. "Claude" and "Devin" are common developer names. "Aider" is an ordinary English word. Any commit containing these words in its title or body will be incorrectly classified as AI-assisted, likely inflating counts by orders of magnitude and rendering the tracker unreliable.
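
To make the issue concrete, here is a minimal Python sketch, not part of the PR. `bare_match` mimics the bare-keyword substring checks described above. `trailer_match` is one hypothetical way to tighten the rule, which this review does not propose: it only accepts a keyword that appears as a whole word on a `Co-authored-by:` trailer line. The keyword list, function names, and commit messages are assumptions for illustration.

```python
import re

BARE_KEYWORDS = ("cursor", "claude", "devin", "aider", "gemini")


def bare_match(message: str) -> bool:
    """Substring check anywhere in the message, like the reviewed classifier."""
    text = message.lower()
    return any(keyword in text for keyword in BARE_KEYWORDS)


# Hypothetical stricter rule: whole-word keyword on a Co-authored-by trailer line.
# '.' does not cross newlines, so the keyword must be on the same line.
TRAILER = re.compile(
    r"^co-authored-by:.*\b(cursor|claude|devin|aider|gemini)\b",
    re.IGNORECASE | re.MULTILINE,
)


def trailer_match(message: str) -> bool:
    return TRAILER.search(message) is not None


samples = [
    "Fix database cursor leak in pager",
    "Add raider unit tests",
    "Add retry logic\n\nCo-authored-by: Claude <noreply@example.com>",
]

for message in samples:
    first_line = message.splitlines()[0]
    print(f"{first_line!r}: bare={bare_match(message)} trailer={trailer_match(message)}")
```

The first two messages match the bare keywords ("cursor", and "aider" inside "raider") but not the trailer rule. The stricter rule is still imperfect: a human co-author named Claude or Devin would match it, so it narrows the false positives without eliminating them.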

Additional Locations (1)

@gaspergrom merged commit 6f9e65a into main Mar 25, 2026
12 checks passed
@gaspergrom deleted the ai-code-tracker branch March 25, 2026 10:27
skwowet pushed a commit that referenced this pull request Mar 25, 2026
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>
3 participants