Fix broken docs links and add CI link checker #116
Merged
CallumMcMahon merged 5 commits into main on Feb 11, 2026
CSV links now point to GitHub instead of serving raw files from the docs site, and notebook links point to the converted HTML docs page. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Same pattern as fuzzy-join-without-keys: CSV links now point to GitHub, notebook links point to the converted HTML docs pages, and the .md extension is dropped from internal reference links. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
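A minimal sketch of the two mechanical rewrite rules these commits describe, assuming Python 3 and a docs page that sits directly under `docs/` (as `docs/add-column-web-lookup.md` does). The helper names `rewrite_target` and `rewrite_links` are hypothetical. The GitHub base URL is copied from the new link in the `add-column-web-lookup` diff. Notebook links are left untouched because the converted HTML paths are not shown here.

```python
import re

# Base URL copied from the new saas_products.csv link; assumes the page lives in docs/.
GITHUB_BLOB_BASE = "https://github.com/futuresearch/everyrow-sdk/blob/main/docs/"

# Markdown inline link: [text](target)
MD_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def rewrite_target(target: str) -> str:
    """Apply the CSV and .md rewrite rules to one link target (hypothetical helper)."""
    if "://" in target or target.startswith("#"):
        return target  # absolute URLs and in-page anchors are left alone
    path, sep, fragment = target.partition("#")
    if path.endswith(".csv"):
        # Data files link to GitHub instead of being served from the docs site.
        return GITHUB_BLOB_BASE + path + sep + fragment
    if path.endswith(".md"):
        # Internal reference links drop the .md extension.
        return path[: -len(".md")] + sep + fragment
    return target


def rewrite_links(markdown: str) -> str:
    """Rewrite every inline Markdown link target in a document."""
    return MD_LINK.sub(
        lambda m: f"[{m.group(1)}]({rewrite_target(m.group(2))})", markdown
    )
```

Running `rewrite_links` on the old `saas_products.csv` sentence produces the new link shown in the review diff, and `reference/SCREEN.md` becomes `reference/SCREEN`.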
Adds a Python script that parses all HTML files in the static build output and verifies that internal links resolve to existing pages. Runs after build in the `deploy-docs` workflow to catch broken links before deployment. Note: lychee was evaluated but has a parser bug where links after `<pre><code>` blocks are silently dropped, making it unreliable for docs sites with code examples. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
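A minimal sketch of the core check, not the actual `check-links.py`. It assumes the built site is a directory of `.html` files and uses the standard library's `html.parser.HTMLParser`. A link counts as resolved if it points to a file, to a directory containing `index.html`, or to an extensionless page with a matching `.html` file. The names `LinkCollector`, `resolve`, `exists`, and `check` are hypothetical. Because `HTMLParser` reports every start tag in document order, `<a>` tags that follow a `<pre><code>` block are collected like any other.

```python
from __future__ import annotations

from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urldefrag, urlparse


class LinkCollector(HTMLParser):
    """Collect every href/src value on a page, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def resolve(build_dir: Path, page: Path, link: str) -> Path | None:
    """Map an internal link to a path under build_dir; return None for links we skip."""
    url, _fragment = urldefrag(link)
    parsed = urlparse(url)
    if parsed.scheme or parsed.netloc or not parsed.path:
        return None  # external URL, mailto:, or a pure in-page anchor
    path = unquote(parsed.path)
    if path.startswith("/"):
        return build_dir / path.lstrip("/")  # site-root-relative link
    return page.parent / path  # page-relative link


def exists(target: Path) -> bool:
    """Accept a file, a directory with index.html, or an extensionless page."""
    if target.is_file() or (target / "index.html").is_file():
        return True
    return target.parent.joinpath(target.name + ".html").is_file()


def check(build_dir: Path) -> list[tuple[Path, str]]:
    """Return (page, link) pairs for internal links that do not resolve."""
    broken = []
    for page in sorted(build_dir.rglob("*.html")):
        collector = LinkCollector()
        collector.feed(page.read_text(encoding="utf-8"))
        for link in collector.links:
            target = resolve(build_dir, page, link)
            if target is not None and not exists(target):
                broken.append((page, link))
    return broken
```

A CI step could call `check` on the build directory and exit non-zero when the returned list is non-empty. The directory name and exit behaviour are assumptions here.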
- Extract links from `<meta>` and `<link>` tags (catches `og:image` 404s)
- Check GitHub `blob/main` links by verifying that the local file exists
- `SKIPPED_URLS` is now per-URL instead of per-domain, so new links to an already-skipped domain still get flagged
- `CHECKED_DOMAINS` remains domain-level for our own properties
- Deduplicate links per page

Known failure: `everyrow.io/everyrow-og.png` is 404 on every page.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
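The refinements in this commit can be sketched as a link collector plus a classifier. This illustrates the idea under stated assumptions and is not the script's code. The contents of `SKIPPED_URLS` and `CHECKED_DOMAINS` are placeholder values. The local checkout is passed in as `repo_root`. The hypothetical `classify` function and its labels are invented here. The commit does not say how links outside both lists are handled, so they are simply labelled `"other"`.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urldefrag, urlparse

# Placeholder values for illustration only; the real lists live in check-links.py.
SKIPPED_URLS = {"https://example.com/flaky-page"}  # exact URLs, not whole domains
CHECKED_DOMAINS = {"everyrow.io"}  # our own properties, matched by domain
GITHUB_BLOB_PREFIX = "https://github.com/futuresearch/everyrow-sdk/blob/main/"


class PageLinks(HTMLParser):
    """Collect href/src links plus absolute URLs in <meta content> (e.g. og:image)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            content = a.get("content") or ""
            if content.startswith(("http://", "https://")):
                self.links.append(content)
            return
        for name in ("href", "src"):  # covers <a>, <img>, <script>, and <link>
            if a.get(name):
                self.links.append(a[name])

    def unique_links(self) -> list[str]:
        """Deduplicate per page while keeping first-seen order."""
        return list(dict.fromkeys(self.links))


def classify(url: str, repo_root: Path) -> str:
    """Hypothetical triage of one absolute URL; the labels are illustrative."""
    url, _fragment = urldefrag(url)
    if url in SKIPPED_URLS:
        return "skipped"
    if url.startswith(GITHUB_BLOB_PREFIX):
        # A GitHub blob/main link is valid if the file exists in the local checkout.
        local = repo_root / unquote(url[len(GITHUB_BLOB_PREFIX):])
        return "ok" if local.is_file() else "broken"
    host = (urlparse(url).hostname or "").lower()
    # Assumption: subdomains count as the same property.
    if any(host == d or host.endswith("." + d) for d in CHECKED_DOMAINS):
        return "check-over-http"
    return "other"
```

With skips keyed on exact URLs, a new link to a domain that already has one skipped URL still goes through the normal checks, which is the behaviour the commit describes.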
rgambee approved these changes on Feb 10, 2026
docs/add-column-web-lookup.md
Outdated
Before: The dataset is a list of 246 SaaS and developer tools like Slack, Notion, Asana. Download [saas_products.csv](data/saas_products.csv) to follow along. We find the annual price of each product's lowest paid tier, which isn't available through any structured API; it requires visiting pricing pages that change frequently and present information in different formats.

After: The dataset is a list of 246 SaaS and developer tools like Slack, Notion, Asana. Download [saas_products.csv](https://github.com/futuresearch/everyrow-sdk/blob/main/docs/data/saas_products.csv) to follow along. We find the annual price of each product's lowest paid tier, which isn't available through any structured API; it requires visiting pricing pages that change frequently and present information in different formats.
Contributor
Since we use LFS to store CSVs, I don't see any data when I go to that URL. Should we link to the raw content instead, in this case https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/refs/heads/main/docs/data/saas_products.csv?
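The rewrite the reviewer suggests follows a visible pattern, at least for the one URL in this thread: `github.com/<owner>/<repo>/blob/<branch>/<path>` becomes `media.githubusercontent.com/media/<owner>/<repo>/refs/heads/<branch>/<path>`. This is a minimal sketch that assumes a single-segment branch name and a file actually stored in Git LFS. `lfs_media_url` is a hypothetical helper, and the pattern is inferred from this example rather than from GitHub documentation.

```python
import re

# Assumes a single-segment branch name such as "main".
BLOB_URL = re.compile(
    r"^https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)"
    r"/blob/(?P<branch>[^/]+)/(?P<path>.+)$"
)


def lfs_media_url(blob_url: str) -> str:
    """Rewrite a GitHub blob URL to the media.githubusercontent.com URL for LFS content."""
    m = BLOB_URL.match(blob_url)
    if m is None:
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    return (
        "https://media.githubusercontent.com/media/"
        f"{m['owner']}/{m['repo']}/refs/heads/{m['branch']}/{m['path']}"
    )
```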
Member
Author
Ahh nice, I hadn't thought of that! Let me change it over.
Summary
- `.md` extensions are dropped from internal reference links
- Adds a Python link checker (`docs-site/scripts/check-links.py`) that parses all HTML in the static build output and verifies that internal links resolve to existing pages
- Runs the checker in the `deploy-docs` CI workflow, after build and before deploy

Pages fixed
- `fuzzy-join-without-keys`: `company_info.csv`, `valuations.csv`, notebook link
- `add-column-web-lookup`: `saas_products.csv`
- `classify-dataframe-rows-llm`: `hn_jobs.csv`
- `filter-dataframe-with-llm`: `hn_jobs.csv`, `reference/SCREEN.md`
- `resolve-entities-python`: `case_01_crm_data.csv`, notebook link

Why a custom script instead of lychee?
lychee has a parser bug where links appearing after `<pre><code>` blocks are silently dropped. Since every docs page has code examples, this makes lychee unreliable for this site.

Test plan
- `python docs-site/scripts/check-links.py` passes with 0 broken links across 29 pages

🤖 Generated with Claude Code