WatchEvent capture rate has degraded significantly since June 2025 #320

@plurch

WatchEvent capture rate has degraded significantly since June 2025, collapsed since Feb 2026

Summary

GHArchive's WatchEvent (star) capture rate appears to have degraded substantially starting around June 2025, with a second, much sharper drop starting February 2026. Comparing the GitHub stargazers API (treated as ground truth) against WatchEvents in GHArchive's BigQuery dataset for two independent repositories shows capture rates falling from a healthy ~95–100% baseline to roughly 20% or less in 2026.

This appears related to #310, but I think a separate analysis for WatchEvents is useful.

Methodology

For each repo:

  1. Pulled all current stargazers via the GitHub REST API (/repos/{owner}/{repo}/stargazers with Accept: application/vnd.github.star+json for timestamps).
  2. Pulled all WatchEvent rows from the GHArchive BigQuery dataset (githubarchive.month.*) for that repo's repo.id.
  3. Bucketed both by month of event timestamp.
  4. Computed capture % = gharchive_events / api_stargazers * 100.
    Note: capture % can legitimately exceed 100% in any given month because the stargazers API only returns current stargazers (users who later unstarred are absent), while WatchEvents capture the star action regardless of subsequent unstars. Values fluctuating in the 90–115% range are consistent with healthy capture; sustained values well below 90% indicate missing events.
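Steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration, not the script used for the numbers in this issue: the function names `month_key` and `monthly_capture` are hypothetical, and it assumes both sources have already been reduced to lists of ISO 8601 UTC timestamps (`starred_at` from the API, `created_at` from GHArchive).

```python
from collections import Counter
from datetime import datetime


def month_key(iso_timestamp):
    """Return the 'YYYY-MM' bucket for a timestamp like '2025-06-03T12:00:00Z'."""
    # Older Python versions reject a trailing 'Z' in fromisoformat, so normalise it.
    parsed = datetime.fromisoformat(iso_timestamp.replace("Z", "+00:00"))
    return parsed.strftime("%Y-%m")


def monthly_capture(api_starred_at, archive_created_at):
    """Map each month to (api_stars, archive_events, capture_pct).

    capture_pct is None when the API shows no stars for that month,
    since the ratio is undefined there.
    """
    api_counts = Counter(month_key(ts) for ts in api_starred_at)
    archive_counts = Counter(month_key(ts) for ts in archive_created_at)
    result = {}
    for month in sorted(set(api_counts) | set(archive_counts)):
        api_n = api_counts[month]
        archive_n = archive_counts[month]
        pct = round(archive_n / api_n * 100, 1) if api_n else None
        result[month] = (api_n, archive_n, pct)
    return result


if __name__ == "__main__":
    # Toy input only; these timestamps are made up for illustration.
    api = ["2025-06-01T10:00:00Z", "2025-06-15T08:30:00Z", "2025-07-02T00:00:00Z"]
    archive = ["2025-06-01T10:00:01Z", "2025-07-02T00:00:02Z"]
    for month, row in monthly_capture(api, archive).items():
        print(month, row)
```

Bucketing happens on the month of each timestamp as given, so both inputs should be in UTC for the months to line up.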

Repo 1: google/osv.dev (repo_id 327775269)

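| Month   | API Stars | GHArchive WatchEvents | Capture % |
|---------|-----------|-----------------------|-----------|
| 2024-12 | 27        | 25                    | 92.6%     |
| 2025-01 | 97        | 86                    | 88.7%     |
| 2025-02 | 78        | 77                    | 98.7%     |
| 2025-03 | 42        | 33                    | 78.6%     |
| 2025-04 | 47        | 42                    | 89.4%     |
| 2025-05 | 49        | 42                    | 85.7%     |
| 2025-06 | 31        | 20                    | 64.5%     |
| 2025-07 | 23        | 12                    | 52.2%     |
| 2025-08 | 36        | 24                    | 66.7%     |
| 2025-09 | 30        | 12                    | 40.0%     |
| 2025-10 | 355       | 183                   | 51.5%     |
| 2025-11 | 39        | 15                    | 38.5%     |
| 2025-12 | 44        | 16                    | 36.4%     |
| 2026-01 | 37        | 16                    | 43.2%     |
| 2026-02 | 38        | 6                     | 15.8%     |
| 2026-03 | 52        | 4                     | 7.7%      |
| 2026-04 | 88        | 12                    | 13.6%     |
| 2026-05 | 38        | 3                     | 7.9%      |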

Months prior to 2024-12 are healthy (95–115% range) and omitted for brevity.

Repo 2: facebook/stylex (very different repo profile — large launch spike)

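| Month   | API Stars | GHArchive WatchEvents | Capture % |
|---------|-----------|-----------------------|-----------|
| 2024-12 | 60        | 59                    | 98.3%     |
| 2025-01 | 44        | 48                    | 109.1%    |
| 2025-02 | 29        | 34                    | 117.2%    |
| 2025-03 | 73        | 68                    | 93.2%     |
| 2025-04 | 47        | 44                    | 93.6%     |
| 2025-05 | 40        | 31                    | 77.5%     |
| 2025-06 | 27        | 18                    | 66.7%     |
| 2025-07 | 31        | 22                    | 71.0%     |
| 2025-08 | 21        | 15                    | 71.4%     |
| 2025-09 | 21        | 9                     | 42.9%     |
| 2025-10 | 43        | 10                    | 23.3%     |
| 2025-11 | 198       | 68                    | 34.3%     |
| 2025-12 | 75        | 32                    | 42.7%     |
| 2026-01 | 124       | 51                    | 41.1%     |
| 2026-02 | 24        | 5                     | 20.8%     |
| 2026-03 | 33        | 7                     | 21.2%     |
| 2026-04 | 29        | 4                     | 13.8%     |
| 2026-05 | 9         | 0                     | 0.0%      |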

Earlier months (back to repo creation in 2023-11) also tracked at healthy 93–108% capture; omitted for brevity.

Cross-repo comparison at the two inflection points

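| Month   | osv.dev capture % | stylex capture % |
|---------|-------------------|------------------|
| 2025-05 | 85.7%             | 77.5%            |
| 2025-06 | 64.5%             | 66.7%            |
| 2025-07 | 52.2%             | 71.0%            |
| 2025-08 | 66.7%             | 71.4%            |
| 2025-09 | 40.0%             | 42.9%            |
| 2025-10 | 51.5%             | 23.3%            |
| 2025-11 | 38.5%             | 34.3%            |
| 2025-12 | 36.4%             | 42.7%            |
| 2026-01 | 43.2%             | 41.1%            |
| 2026-02 | 15.8%             | 20.8%            |
| 2026-03 | 7.7%              | 21.2%            |
| 2026-04 | 13.6%             | 13.8%            |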

Two independent repos — one a slow-burn security tool, the other a JS library with a 6K-star launch month — show the same two inflection points in the same months:

  • June 2025: sustained drop from a ~90–100% baseline to roughly the 50–70% range.
  • February 2026: collapse to roughly 20% or below, where it has remained.

This is not consistent with per-repo sampling artifacts, unstars, or transient outages. It looks systemic.
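As a rough cross-check of the two inflection points, the monthly rows from the per-repo tables can be pooled into three periods. The sketch below is only an illustration: the period boundaries are my own reading of the tables, and `pooled_capture` and `summarize` are hypothetical helper names. Pooling sums the counts before dividing, so high-volume months (such as osv.dev's 2025-10) weigh more than they would in a plain average of monthly percentages.

```python
# (month, api_stars, archive_events) rows copied from the two per-repo tables.
OSV_DEV = [
    ("2024-12", 27, 25), ("2025-01", 97, 86), ("2025-02", 78, 77),
    ("2025-03", 42, 33), ("2025-04", 47, 42), ("2025-05", 49, 42),
    ("2025-06", 31, 20), ("2025-07", 23, 12), ("2025-08", 36, 24),
    ("2025-09", 30, 12), ("2025-10", 355, 183), ("2025-11", 39, 15),
    ("2025-12", 44, 16), ("2026-01", 37, 16), ("2026-02", 38, 6),
    ("2026-03", 52, 4), ("2026-04", 88, 12), ("2026-05", 38, 3),
]
STYLEX = [
    ("2024-12", 60, 59), ("2025-01", 44, 48), ("2025-02", 29, 34),
    ("2025-03", 73, 68), ("2025-04", 47, 44), ("2025-05", 40, 31),
    ("2025-06", 27, 18), ("2025-07", 31, 22), ("2025-08", 21, 15),
    ("2025-09", 21, 9), ("2025-10", 43, 10), ("2025-11", 198, 68),
    ("2025-12", 75, 32), ("2026-01", 124, 51), ("2026-02", 24, 5),
    ("2026-03", 33, 7), ("2026-04", 29, 4), ("2026-05", 9, 0),
]

# Assumed period boundaries (inclusive); 'YYYY-MM' strings sort chronologically.
PERIODS = [
    ("before 2025-06", "0000-00", "2025-05"),
    ("2025-06 to 2026-01", "2025-06", "2026-01"),
    ("2026-02 onward", "2026-02", "9999-12"),
]


def pooled_capture(rows, first_month, last_month):
    """Capture % over a month range, computed from summed counts."""
    selected = [(api, arch) for month, api, arch in rows
                if first_month <= month <= last_month]
    api_total = sum(api for api, _ in selected)
    archive_total = sum(arch for _, arch in selected)
    return round(archive_total / api_total * 100, 1) if api_total else None


def summarize(rows):
    """Return {period label: pooled capture %} for one repo."""
    return {label: pooled_capture(rows, lo, hi) for label, lo, hi in PERIODS}


if __name__ == "__main__":
    for name, rows in (("osv.dev", OSV_DEV), ("stylex", STYLEX)):
        print(name, summarize(rows))
```

On the table data this gives roughly 90% / 50% / 12% for osv.dev and 97% / 42% / 17% for stylex, which matches the step pattern described above.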

Why this matters

Many downstream projects (recommendation systems, trending dashboards, ecosystem analytics, academic research on OSS) depend on GHArchive's WatchEvent stream as a proxy for the GitHub Events firehose. Silent degradation is worse than an outage because consumers don't know to switch sources or backfill.

Questions

  1. Are others observing the same degradation? (Easy to reproduce — pick any repo, run the comparison above.)
  2. Is this a known upstream issue with the GitHub Events API (e.g., sampling/rate-limit changes), or an ingestion-side issue with the archiver?
  3. Is there a documented status page or completeness metric I missed?
  4. For users needing complete data, what's the recommended path — direct stargazers API backfill, or is there a known mirror with better coverage during this window?

Happy to share the raw CSVs or query scripts if useful.

See the fetch_stargazers.py gist for a script that downloads all stargazers along with their timestamps.
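For readers who only want the shape of the approach, here is a minimal sketch of a paginated stargazer fetch. It is not the contents of the gist: the function names, the `max_pages` safety cap, and the `User-Agent` string are assumptions made for this example. It relies on the `application/vnd.github.star+json` media type, which makes each item carry a `starred_at` timestamp alongside the `user` object.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"
STAR_MEDIA_TYPE = "application/vnd.github.star+json"


def http_get_page(owner, repo, page, per_page=100, token=None):
    """Fetch one page of stargazers and return the decoded JSON list."""
    url = (f"{API_ROOT}/repos/{owner}/{repo}/stargazers"
           f"?per_page={per_page}&page={page}")
    headers = {"Accept": STAR_MEDIA_TYPE,
               "User-Agent": "stargazer-backfill-sketch"}  # arbitrary label
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_starred_at(owner, repo, get_page=http_get_page, max_pages=400):
    """Collect the starred_at timestamp of every current stargazer.

    get_page is injectable so the paging loop can be exercised offline;
    max_pages is a safety cap chosen for this sketch.
    """
    timestamps = []
    for page in range(1, max_pages + 1):
        items = get_page(owner, repo, page)
        if not items:  # an empty page means we are past the last stargazer
            break
        timestamps.extend(item["starred_at"] for item in items)
    return timestamps
```

A call such as `fetch_starred_at("google", "osv.dev")` would then return the timestamps to bucket by month. Unauthenticated requests hit rate limits quickly, so passing a token through a small wrapper around `http_get_page` is advisable for larger repos.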
