Fix broken docs links and add CI link checker by CallumMcMahon · Pull Request #116 · futuresearch/futuresearch-python

CallumMcMahon · 2026-02-09T11:04:12Z

Summary

- Fix broken relative links across all guide pages: CSV download links now point to GitHub blob URLs, notebook links point to converted HTML docs pages, and .md extensions are dropped from internal reference links.
- Add a link checker script (docs-site/scripts/check-links.py) that parses all HTML in the static build output and verifies internal links resolve to existing pages.
- Integrate the check into the deploy-docs CI workflow, running after build and before deploy.
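The checker described above can be approximated with the standard library alone. The sketch below is a hypothetical illustration, not the contents of docs-site/scripts/check-links.py. The helper names (`LinkExtractor`, `resolve_internal`, `find_broken_links`, `main`), the default `out` build directory, and the rule that `/foo` is satisfied by `foo`, `foo.html`, or `foo/index.html` are all assumptions about a typical static export.

```python
"""Minimal internal-link checker (hypothetical sketch, not the real check-links.py)."""
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlsplit


class LinkExtractor(HTMLParser):
    """Collect href/src attribute values from every start tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def resolve_internal(build_dir: Path, page: Path, link: str) -> bool:
    """Return True if an internal link points at a file in the build output."""
    path = unquote(urlsplit(link).path)
    if not path:  # a bare fragment such as "#usage" refers to the current page
        return True
    # Root-relative links start at the build dir; relative ones at the page's folder.
    base = build_dir if path.startswith("/") else page.parent
    target = (base / path.lstrip("/")).resolve()
    # Assumed static-export conventions: /foo may be foo, foo/index.html or foo.html.
    candidates = [target, target / "index.html"]
    if target.name:
        candidates.append(target.with_name(target.name + ".html"))
    return any(c.is_file() for c in candidates)


def find_broken_links(build_dir: Path) -> list[tuple[str, str]]:
    """Scan every HTML page under build_dir and return failing (page, link) pairs."""
    build_dir = build_dir.resolve()
    broken = []
    for page in sorted(build_dir.rglob("*.html")):
        extractor = LinkExtractor()
        extractor.feed(page.read_text(encoding="utf-8"))
        for link in extractor.links:
            parts = urlsplit(link)
            if parts.scheme or parts.netloc:  # external URLs are not checked here
                continue
            if not resolve_internal(build_dir, page, link):
                broken.append((page.relative_to(build_dir).as_posix(), link))
    return broken


def main(argv: list[str]) -> int:
    """CLI entry point; "out" as the default build directory is an assumption."""
    root = Path(argv[0]) if argv else Path("out")
    problems = find_broken_links(root)
    for page, link in problems:
        print(f"{page}: broken link {link}")
    print(f"{len(problems)} broken link(s)")
    return 1 if problems else 0
```

Wired into CI, a nonzero return from `main` (for example via `sys.exit(main(sys.argv[1:]))` under an `if __name__ == "__main__":` guard) fails the job, which is what lets the check run after build and block the deploy step. A stale `/reference/SCREEN.md` link is reported because neither `SCREEN.md`, `SCREEN.md/index.html`, nor `SCREEN.md.html` exists in the output.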

Pages fixed

- fuzzy-join-without-keys: company_info.csv, valuations.csv, notebook link
- add-column-web-lookup: saas_products.csv
- classify-dataframe-rows-llm: hn_jobs.csv
- filter-dataframe-with-llm: hn_jobs.csv, reference/SCREEN.md
- resolve-entities-python: case_01_crm_data.csv, notebook link
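The CSV and `.md` rewrites applied to these pages are mechanical, so they can be expressed as a small function over a guide's Markdown source. This is a hedged sketch: the `docs/` layout and blob prefix are inferred from the saas_products.csv diff in this PR, the names `rewrite_target` and `rewrite_markdown` are hypothetical, and the notebook rule is omitted because the URL scheme of the converted HTML pages isn't described here.

```python
import posixpath
import re

# Assumed layout, inferred from the saas_products.csv diff in this PR: guides
# live under docs/ and their CSVs under docs/data/ in the everyrow-sdk repo.
GITHUB_DOCS_BLOB = "https://github.com/futuresearch/everyrow-sdk/blob/main/docs/"
MD_LINK_TARGET = re.compile(r"\]\(([^)\s]+)\)")  # the target of [text](target)


def rewrite_target(target: str) -> str:
    """Apply the CSV and .md rules to one link written in a docs/*.md guide."""
    if "://" in target or target.startswith(("#", "/")):
        return target  # absolute URLs, site-rooted paths and anchors are untouched
    path, sep, fragment = target.partition("#")
    if path.endswith(".csv"):
        # CSVs are not served by the docs site, so point at the file on GitHub.
        return GITHUB_DOCS_BLOB + posixpath.normpath(path) + sep + fragment
    if path.endswith(".md"):
        # Internal reference pages are linked without the .md extension.
        return path[: -len(".md")] + sep + fragment
    return target


def rewrite_markdown(text: str) -> str:
    """Rewrite every inline Markdown link target in a guide's source text."""
    return MD_LINK_TARGET.sub(lambda m: "](" + rewrite_target(m.group(1)) + ")", text)
```

Applied to the original add-column-web-lookup sentence, `[saas_products.csv](data/saas_products.csv)` becomes the blob URL shown in the diff, and `reference/SCREEN.md` becomes `reference/SCREEN`.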

Why a custom script instead of lychee?

lychee has a parser bug where links appearing after <pre><code> blocks are silently dropped. Since every docs page has code examples, this makes lychee unreliable for this site.
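By contrast with the lychee behavior described above, an event-driven parser such as Python's `html.parser` reports every start tag in document order, so an anchor placed after a `<pre><code>` block is seen like any other. This is an illustrative check of that property on invented markup (the page content, the `HrefCollector` class, and the `/guides/...` URL are made up), not a reproduction of the lychee bug.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Record every <a href> in document order (illustrative helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


PAGE = """
<p>Before the example: <a href="/reference/SCREEN">SCREEN</a></p>
<pre><code class="language-python">df = pd.read_csv("hn_jobs.csv")
print(df.head())</code></pre>
<p>After the example: <a href="/guides/filter-dataframe-with-llm">next guide</a></p>
"""

collector = HrefCollector()
collector.feed(PAGE)
collector.close()
print(collector.hrefs)
# ['/reference/SCREEN', '/guides/filter-dataframe-with-llm']
```

Only `<script>` and `<style>` get special raw-text handling in `html.parser`; `<pre>` and `<code>` are ordinary elements, which is why the second link survives.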

Test plan

- python docs-site/scripts/check-links.py passes with 0 broken links across 29 pages
- Verified the script correctly detects the original broken links when reverted
- CI workflow runs successfully on this PR

🤖 Generated with Claude Code

CSV links now point to GitHub instead of serving raw files from the docs site, and notebook links point to the converted HTML docs page. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Same pattern as fuzzy-join-without-keys: CSV links now point to GitHub, notebook links point to the converted HTML docs pages, and the .md extension is dropped from internal reference links. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Adds a Python script that parses all HTML files in the static build output and verifies internal links resolve to existing pages. Runs after build in the deploy-docs workflow to catch broken links before deployment. Note: lychee was evaluated but has a parser bug where links after <pre><code> blocks are silently dropped, making it unreliable for docs sites with code examples. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Extract links from <meta> and <link> tags (catches og:image 404s)
- Check GitHub blob/main links by verifying the local file exists
- SKIPPED_URLS is now per-URL instead of per-domain, so new links to an already-skipped domain still get flagged
- CHECKED_DOMAINS remains domain-level for our own properties
- Deduplicate links per page

Known failure: everyrow.io/everyrow-og.png is 404 on every page. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
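The refinements in this commit amount to a per-URL classification step. In the sketch below, the names `SKIPPED_URLS` and `CHECKED_DOMAINS` come from the commit message, but their contents, the `REPO_BLOB_PREFIX` constant, the `PageLinks` and `classify` helpers, and the return labels are hypothetical. The "flag" outcome for unlisted external URLs is an interpretation of "new links to an already-skipped domain still get flagged", not confirmed behavior of the script.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlsplit

# Hypothetical placeholder values; the real entries are not shown in this PR.
SKIPPED_URLS = {"https://example.com/flaky-page"}  # exact URLs, never whole domains
CHECKED_DOMAINS = {"everyrow.io"}  # assumed: our own properties, checked domain-wide
REPO_BLOB_PREFIX = "https://github.com/futuresearch/everyrow-sdk/blob/main/"


class PageLinks(HTMLParser):
    """Collect <a>/<link> hrefs, src attributes, and URL-valued <meta content>."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            # og:image and similar tags carry their URL in the content attribute.
            content = a.get("content") or ""
            if content.startswith(("http://", "https://", "/")):
                self.links.append(content)
        elif a.get("href"):
            self.links.append(a["href"])
        elif a.get("src"):
            self.links.append(a["src"])

    def unique_links(self) -> list[str]:
        """Deduplicate per page while keeping first-seen order."""
        return list(dict.fromkeys(self.links))


def classify(url: str, repo_root: Path) -> str:
    """Return 'internal', 'skip', 'ok', 'broken', 'fetch', or 'flag' for one URL."""
    parts = urlsplit(url)
    if not parts.scheme and not parts.netloc:
        return "internal"  # resolved against the build output by another step
    if parts.scheme not in ("http", "https") or url in SKIPPED_URLS:
        return "skip"
    if url.startswith(REPO_BLOB_PREFIX):
        # A blob/main link is valid if the file exists in the local checkout.
        rel = url[len(REPO_BLOB_PREFIX):].split("#", 1)[0].split("?", 1)[0]
        return "ok" if (repo_root / rel).is_file() else "broken"
    host = parts.hostname or ""
    if any(host == d or host.endswith("." + d) for d in CHECKED_DOMAINS):
        return "fetch"  # one of our own domains: request it over HTTP
    return "flag"  # unreviewed external URL: fix it or add it to SKIPPED_URLS
```

Keying `SKIPPED_URLS` on full URLs means a second, different link to a skipped domain falls through to "flag" instead of being silently ignored, while `CHECKED_DOMAINS` stays coarse because every page on those domains is expected to resolve.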

rgambee · 2026-02-10T19:26:57Z

docs/add-column-web-lookup.md

```diff
-The dataset is a list of 246 SaaS and developer tools like Slack, Notion, Asana. Download [saas_products.csv](data/saas_products.csv) to follow along. We find the annual price of each product's lowest paid tier, which isn't available through any structured API; it requires visiting pricing pages that change frequently and present information in different formats.
+The dataset is a list of 246 SaaS and developer tools like Slack, Notion, Asana. Download [saas_products.csv](https://github.com/futuresearch/everyrow-sdk/blob/main/docs/data/saas_products.csv) to follow along. We find the annual price of each product's lowest paid tier, which isn't available through any structured API; it requires visiting pricing pages that change frequently and present information in different formats.
```


Since we use LFS to store CSVs, I don't see any data when I go to that URL. Should we link to the raw content instead, in this case https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/refs/heads/main/docs/data/saas_products.csv?
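The suggested rewrite follows a fixed pattern, so it could be automated. This sketch derives the media.githubusercontent.com URL from a blob URL using the shape of the example in the comment above; the helper name `lfs_media_url` is hypothetical, and it assumes the ref in the blob URL is a branch (hence `refs/heads/`).

```python
from urllib.parse import urlsplit


def lfs_media_url(blob_url: str) -> str:
    """Map a github.com blob URL for an LFS-tracked file to its media URL.

    Assumes the form https://github.com/<owner>/<repo>/blob/<branch>/<path>,
    where <branch> contains no "/" and becomes refs/heads/<branch>.
    """
    parts = urlsplit(blob_url)
    if parts.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {blob_url}")
    # Too few segments also raises ValueError from the unpacking itself.
    owner, repo, kind, branch, *path = parts.path.strip("/").split("/")
    if kind != "blob" or not path:
        raise ValueError(f"not a blob URL: {blob_url}")
    return (
        "https://media.githubusercontent.com/media/"
        f"{owner}/{repo}/refs/heads/{branch}/{'/'.join(path)}"
    )


print(lfs_media_url(
    "https://github.com/futuresearch/everyrow-sdk/blob/main/docs/data/saas_products.csv"
))
# https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/refs/heads/main/docs/data/saas_products.csv
```

Branch names that contain `/` are ambiguous in blob URLs and would be split incorrectly by this helper.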

Ahh nice, I hadn't thought of that! Let me change it over.

hnykda

Good catch

CallumMcMahon and others added 3 commits February 9, 2026 11:04

Fix broken links in fuzzy join guide

91cdcc5

Fix broken links across all guide pages

3eb2348

CallumMcMahon changed the title ~~Fix broken links in fuzzy join guide~~ Fix broken docs links and add CI link checker Feb 9, 2026

CallumMcMahon requested review from hnykda and rgambee February 10, 2026 18:42

rgambee approved these changes Feb 10, 2026

LFS file links directly to file storage

2b67e6f

hnykda approved these changes Feb 11, 2026

CallumMcMahon merged commit fddae60 into main Feb 11, 2026
2 checks passed

CallumMcMahon deleted the fix/broken_links branch February 11, 2026 10:36

CallumMcMahon merged 5 commits into main from fix/broken_links
