HTML Parsing
Once the contributions HTML is fetched (see Fetching Contributions), ContribKit extracts the data with a handful of regexes over the rendered page. The parser lives in `infrastructure/github/github-html-contributions-repository.ts`.
This is deliberately the only place that knows GitHub's HTML structure. If GitHub changes the markup, only these patterns need updating.
Why regex and not a DOM parser? The renderer runs inside a Cloudflare Worker, where there's no DOM and bundle size/cold-start matter. A handful of focused regexes over the response text is faster, dependency-free, and easy to pin to exactly the two element shapes GitHub emits.
GitHub renders each day as a `<td>` carrying data attributes, and exposes the exact count through a separate `<tool-tip>` element linked by id:

```html
<td class="ContributionCalendar-day" id="contribution-day-component-1-2"
data-date="2024-01-02" data-level="2"> … </td>
...
<tool-tip for="contribution-day-component-1-2">4 contributions on January 2nd.</tool-tip>
```

| Pattern | Captures |
|---|---|
| `TD_REGEX` | each contribution-day `<td>`'s attribute string |
| `DATE_REGEX` | `data-date="YYYY-MM-DD"` |
| `LEVEL_REGEX` | `data-level="0..4"` |
| `ID_REGEX` | the `<td>`'s `id` |
| `TIP_REGEX` | each `<tool-tip for="…">N`, mapping id → exact count |
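To make the table concrete, here is a minimal sketch of what such patterns could look like. The regex bodies and the `firstGroup` helper are assumptions written for illustration; the actual patterns in the repository file may differ.

```typescript
// Hypothetical patterns for illustration only; the real ones may differ.
const TD_REGEX = /<td\b([^>]*\bclass="[^"]*ContributionCalendar-day[^"]*"[^>]*)>/g;
const DATE_REGEX = /\bdata-date="(\d{4}-\d{2}-\d{2})"/;
const LEVEL_REGEX = /\bdata-level="(\d+)"/;
const ID_REGEX = /(?:^|\s)id="([^"]+)"/; // leading whitespace avoids matching data-id="…"
const TIP_REGEX = /<tool-tip\b[^>]*\bfor="([^"]+)"[^>]*>\s*(\d+)/g;

const sample = `<td class="ContributionCalendar-day" id="contribution-day-component-1-2"
data-date="2024-01-02" data-level="2"></td>
<tool-tip for="contribution-day-component-1-2">4 contributions on January 2nd.</tool-tip>`;

// Hypothetical helper: first capture group of the first match, or null on a miss.
function firstGroup(re: RegExp, text: string): string | null {
  re.lastIndex = 0; // global regexes keep position state between exec() calls
  return re.exec(text)?.[1] ?? null;
}

// TD_REGEX isolates the attribute string; the other patterns run over it.
const attrs = firstGroup(TD_REGEX, sample) ?? "";
const captured = {
  date: firstGroup(DATE_REGEX, attrs),
  level: firstGroup(LEVEL_REGEX, attrs),
  id: firstGroup(ID_REGEX, attrs),
};

// TIP_REGEX yields the (id, count) pair for one tooltip.
TIP_REGEX.lastIndex = 0;
const tip = TIP_REGEX.exec(sample);
const tipPair = tip ? { id: tip[1], count: Number(tip[2]) } : null;

console.log(captured, tipPair);
```

Splitting the work this way keeps `TD_REGEX` responsible only for finding cells, so the attribute patterns stay short and do not depend on attribute order.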
- Cells: iterate every contribution-day `<td>`, pulling `date`, `level`, and `id`. A cell is kept only when it has both a date and a level.
- Tooltips: iterate every `<tool-tip>` and build a `Map<id, count>`.
- Enrich: for each day, attach the exact `count` by looking up its `id` in the map; `level` is run through `clampLevel` to guarantee it's in `0–4`. Days whose id isn't in the map (or that have no id) get `count: null`.
- Total: if any counts were found, sum them; otherwise `total` is `null`.
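The four steps can be sketched end to end as follows. This is an illustration under stated assumptions, not the repository's code: the regex bodies, the `parseContributions` name, and the body of `clampLevel` are all hypothetical, and "any counts were found" is read here as "at least one day resolved a count".

```typescript
type Day = { date: string; level: number; count: number | null };
type Parsed = { days: Day[]; total: number | null };

// Hypothetical patterns; the real ones may differ.
const TD_REGEX = /<td\b([^>]*\bclass="[^"]*ContributionCalendar-day[^"]*"[^>]*)>/g;
const DATE_REGEX = /\bdata-date="(\d{4}-\d{2}-\d{2})"/;
const LEVEL_REGEX = /\bdata-level="(\d+)"/;
const ID_REGEX = /(?:^|\s)id="([^"]+)"/;
const TIP_REGEX = /<tool-tip\b[^>]*\bfor="([^"]+)"[^>]*>\s*(\d+)/g;

// Assumed behaviour: round, then clamp into 0..4.
function clampLevel(level: number): number {
  return Math.min(4, Math.max(0, Math.round(level)));
}

function parseContributions(html: string): Parsed {
  // 1. Cells: keep only cells that carry both a date and a level.
  const cells: { date: string; level: number; id: string | null }[] = [];
  const tdRe = new RegExp(TD_REGEX.source, "g"); // fresh copy, no shared lastIndex
  let m: RegExpExecArray | null;
  while ((m = tdRe.exec(html)) !== null) {
    const attrs = m[1] ?? "";
    const date = DATE_REGEX.exec(attrs)?.[1];
    const level = LEVEL_REGEX.exec(attrs)?.[1];
    if (date === undefined || level === undefined) continue;
    cells.push({ date, level: Number(level), id: ID_REGEX.exec(attrs)?.[1] ?? null });
  }

  // 2. Tooltips: build the id -> exact count map.
  const counts = new Map<string, number>();
  const tipRe = new RegExp(TIP_REGEX.source, "g");
  while ((m = tipRe.exec(html)) !== null) {
    counts.set(m[1] ?? "", Number(m[2]));
  }

  // 3. Enrich: join by id; a missing id or missing tooltip gives count: null.
  const days: Day[] = cells.map((c) => ({
    date: c.date,
    level: clampLevel(c.level),
    count: c.id !== null ? (counts.get(c.id) ?? null) : null,
  }));

  // 4. Total: sum resolved counts, or null when none resolved.
  const resolved = days.filter((d) => d.count !== null);
  const total =
    resolved.length > 0 ? resolved.reduce((sum, d) => sum + (d.count ?? 0), 0) : null;

  return { days, total };
}

// Usage: two cells, only the first has a matching tooltip.
const html = `
<td class="ContributionCalendar-day" id="contribution-day-component-1-2"
data-date="2024-01-02" data-level="2"></td>
<td class="ContributionCalendar-day" id="contribution-day-component-1-3"
data-date="2024-01-03" data-level="1"></td>
<tool-tip for="contribution-day-component-1-2">4 contributions on January 2nd.</tool-tip>`;
const result = parseContributions(html);
console.log(JSON.stringify(result));
```

Building the tooltip map in a separate pass means the cell pass never has to assume where in the page a `<tool-tip>` sits relative to its `<td>`.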
The result is `{ days, total }`, where each day is `{ date, level, count }`.
Given the markup above, the cell pass yields `{ date: "2024-01-02", level: 2, id: "…-1-2" }`, and the tooltip pass yields `{ "…-1-2" => 4 }`. Enrichment joins them by id into `{ date: "2024-01-02", level: 2, count: 4 }`. If GitHub omitted the tooltip, `count` would be `null` but `level` would still be `2`. `total` sums every resolved count.
If the pass produces zero days, the repository returns parse("Could not parse contributions") rather than an empty (and misleading) calendar. That typically means GitHub changed the page structure — see Troubleshooting.
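The empty-result guard might look roughly like the sketch below. Only the error message comes from the text above; the `Result` and `ParseError` shapes, the body of `parse`, and the `guardEmpty` name are assumptions made for the example.

```typescript
type Day = { date: string; level: number; count: number | null };
type Parsed = { days: Day[]; total: number | null };

// Hypothetical error and result shapes; the project's real types may differ.
type ParseError = { kind: "parse"; message: string };
type Result<T> = { ok: true; value: T } | { ok: false; error: ParseError };

// Assumed to be a small error constructor, mirroring the parse(...) call in the text.
const parse = (message: string): ParseError => ({ kind: "parse", message });

// Turn an empty parse into an explicit error instead of an empty calendar.
function guardEmpty(parsed: Parsed): Result<Parsed> {
  if (parsed.days.length === 0) {
    return { ok: false, error: parse("Could not parse contributions") };
  }
  return { ok: true, value: parsed };
}

const failed = guardEmpty({ days: [], total: null });
const succeeded = guardEmpty({
  days: [{ date: "2024-01-02", level: 2, count: 4 }],
  total: 4,
});
console.log(JSON.stringify(failed), succeeded.ok);
```

Returning an error value rather than an empty `days` array lets callers tell "this user has no contributions" apart from "the markup no longer matches".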
- Fetching Contributions — where the HTML comes from
- Calendar Grid — turning parsed days into a fixed grid