HTML Parsing
Once the contributions HTML is fetched (see Fetching Contributions), ContribKit extracts the data with a handful of regexes over the rendered page. The parser lives in infrastructure/github/github-html-contributions-repository.ts.
This is deliberately the only place that knows GitHub's HTML structure. If GitHub changes the markup, only these patterns need updating.
GitHub renders each day as a <td> carrying data attributes, and exposes the exact count through a separate <tool-tip> element linked by id:
<td class="ContributionCalendar-day" id="contribution-day-component-1-2"
data-date="2024-01-02" data-level="2"> … </td>
...
<tool-tip for="contribution-day-component-1-2">4 contributions on January 2nd.</tool-tip>

| Pattern | Captures |
|---|---|
| TD_REGEX | each contribution-day <td>'s attribute string |
| DATE_REGEX | data-date="YYYY-MM-DD" |
| LEVEL_REGEX | data-level="0..4" |
| ID_REGEX | the <td>'s id |
| TIP_REGEX | each <tool-tip for="…">N; maps id → exact count |
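
As a rough illustration of how patterns like these could be applied, the sketch below extracts cells and tooltip counts from a fragment of the page. The pattern names come from the table; the expressions themselves, the RawCell type, and the extractCells / extractCounts helpers are assumptions made for this example and may differ from what ContribKit actually ships.

```typescript
// Hypothetical expressions: only the names are taken from the table above.
const TD_REGEX = /<td\b([^>]*\bclass="[^"]*\bContributionCalendar-day\b[^"]*"[^>]*)>/g;
const DATE_REGEX = /\bdata-date="(\d{4}-\d{2}-\d{2})"/;
const LEVEL_REGEX = /\bdata-level="(\d+)"/;
const ID_REGEX = /(?:^|\s)id="([^"]+)"/; // leading whitespace avoids matching data-id="..."
const TIP_REGEX = /<tool-tip\b[^>]*\sfor="([^"]+)"[^>]*>\s*(\d+)/g;

// Illustrative shape for a cell before enrichment (not from the source).
interface RawCell {
  date: string;
  level: number;
  id: string | null;
}

// Collect every contribution-day <td> that carries both a date and a level.
function extractCells(html: string): RawCell[] {
  const cells: RawCell[] = [];
  TD_REGEX.lastIndex = 0; // global regexes keep their position between calls
  let match: RegExpExecArray | null;
  while ((match = TD_REGEX.exec(html)) !== null) {
    const attrs = match[1];
    const date = DATE_REGEX.exec(attrs);
    const level = LEVEL_REGEX.exec(attrs);
    if (!date || !level) continue; // skip cells missing either attribute
    const id = ID_REGEX.exec(attrs);
    cells.push({ date: date[1], level: Number(level[1]), id: id ? id[1] : null });
  }
  return cells;
}

// Build the id -> exact count map from every <tool-tip> that starts with a number.
function extractCounts(html: string): Map<string, number> {
  const counts = new Map<string, number>();
  TIP_REGEX.lastIndex = 0;
  let match: RegExpExecArray | null;
  while ((match = TIP_REGEX.exec(html)) !== null) {
    counts.set(match[1], Number(match[2]));
  }
  return counts;
}
```

Resetting lastIndex before each scan matters because a regex with the g flag remembers where its previous exec call stopped. In this sketch, tooltips that do not begin with a digit are simply left out of the map.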
- Cells: iterate every contribution-day <td>, pulling date, level, and id. A cell is kept only when it has both a date and a level.
- Tooltips: iterate every <tool-tip> and build a Map<id, count>.
- Enrich: for each day, attach the exact count by looking up its id in the map; level is run through clampLevel to guarantee it's in 0–4. Days whose id isn't in the map (or that have no id) get count: null.
- Total: if any counts were found, sum them; otherwise total is null.
The result is { days, total }, where each day is { date, level, count }.
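The enrich and total steps can be sketched on data that has already been extracted. This is a minimal illustration under assumptions: the RawCell type and the enrich function are hypothetical names, clampLevel is assumed to round and clamp into 0–4, and "any counts were found" is read here as "at least one day received a count".

```typescript
type Level = 0 | 1 | 2 | 3 | 4;

// Hypothetical shape of a cell before enrichment (not from the source).
interface RawCell {
  date: string;
  level: number;
  id: string | null;
}

interface Day {
  date: string;
  level: Level;
  count: number | null;
}

interface Contributions {
  days: Day[];
  total: number | null;
}

// Assumed behaviour: round to an integer, clamp into 0..4, treat NaN as 0.
function clampLevel(raw: number): Level {
  if (isNaN(raw)) return 0;
  return Math.min(4, Math.max(0, Math.round(raw))) as Level;
}

function enrich(cells: RawCell[], counts: Map<string, number>): Contributions {
  const days: Day[] = cells.map((cell) => {
    // Cells without an id, or whose id has no tooltip, end up with count: null.
    const found = cell.id !== null ? counts.get(cell.id) : undefined;
    return {
      date: cell.date,
      level: clampLevel(cell.level),
      count: found === undefined ? null : found,
    };
  });

  // total stays null unless at least one day has an exact count.
  let total: number | null = null;
  for (const day of days) {
    if (day.count !== null) {
      total = (total === null ? 0 : total) + day.count;
    }
  }
  return { days, total };
}
```

Keeping total as null rather than 0 when no counts are available lets callers tell "no exact data" apart from "zero contributions".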
If the pass produces zero days, the repository returns parse("Could not parse contributions") rather than an empty (and misleading) calendar. That typically means GitHub changed the page structure — see Troubleshooting.
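One way the zero-day guard could look is sketched below. The source only shows the call parse("Could not parse contributions"), so the ParseError shape, the ParseResult wrapper, and the guardEmpty helper are all assumptions made for illustration.

```typescript
interface Day {
  date: string;
  level: 0 | 1 | 2 | 3 | 4;
  count: number | null;
}

interface Contributions {
  days: Day[];
  total: number | null;
}

// Hypothetical error shape; the real parse(...) factory may return something else.
type ParseError = { kind: "parse"; message: string };

function parse(message: string): ParseError {
  return { kind: "parse", message };
}

// Hypothetical result wrapper: either a parsed calendar or a parse error.
type ParseResult =
  | { ok: true; value: Contributions }
  | { ok: false; error: ParseError };

// Turn an empty pass into an explicit error instead of an empty calendar.
function guardEmpty(parsed: Contributions): ParseResult {
  if (parsed.days.length === 0) {
    return { ok: false, error: parse("Could not parse contributions") };
  }
  return { ok: true, value: parsed };
}
```

Returning an error value here makes a markup change show up as a visible failure rather than as a calendar that looks like a year without activity.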
- Fetching Contributions — where the HTML comes from
- Calendar Grid — turning parsed days into a fixed grid