Data-driven Clusters v4.1 page (11 clusters from Google Sheet) by LukasWallrich · Pull Request #720 · forrtproject/forrtproject.github.io

LukasWallrich · 2026-03-22T22:24:43Z

Summary

Replaces the 7 hardcoded cluster pages (v3) with a fully data-driven approach powered by the FORRT Clusters v4.1 Google Doc and a structured Google Sheet.

What changed

11 clusters (was 7), 93 sub-clusters, ~1300 publications with DOI-resolved APA references
New parsing script (scripts/parse_clusters_to_sheet.py) that:
- Fetches the Google Doc as plain text and parses the hierarchical structure
- Resolves ~1050 DOIs via doi.org content negotiation for clean APA references + BibTeX
- Writes structured data to a Google Sheet (3 tabs: Clusters, Sub-Clusters, Publications with data validation)
- Exports data/clusters_v4.json for Hugo to consume at build time
New Hugo shortcode (layouts/shortcodes/clusters_display.html) that renders all clusters from the JSON data with:
- Sidebar navigation with collapsible cluster tree and colored arrows
- Tabbed sub-clusters (matching the previous UI pattern) with wrapping support
- Sub-cluster headings, italic descriptions, and bulleted reference lists
- Full-text search across clusters, sub-clusters, and all references (with match highlighting and click-to-scroll)
- DOI links rendered as clickable URLs; HTML formatting (e.g. <i> for italics) preserved from doi.org
- Responsive layout (sidebar collapses on mobile with toggle button)
Updated intro text to reflect 11 clusters (was 9)
Deactivated old cluster1.md–cluster7.md (set active = false)

Data pipeline

Google Doc (v4.1)
    ↓  parse_clusters_to_sheet.py
Google Sheet (3 tabs with data validation)
    ↓  --export-json flag
data/clusters_v4.json (committed to repo)
    ↓  Hugo build
clusters_display.html shortcode renders the page

The script supports --dry-run, --skip-doi, --json-only, and --export-json flags. DOI lookups are cached in scripts/doi_cache.json (gitignored) for fast reruns.
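
The cached DOI lookup described above could look roughly like the following. This is a sketch, not the code in parse_clusters_to_sheet.py: the function names (`resolve_doi`, `http_fetch`, `load_cache`, `save_cache`) and the cache layout are assumptions made for illustration. The two `Accept` values are media types that doi.org content negotiation accepts for formatted citations and BibTeX.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path

DOI_BASE = "https://doi.org/"
# Media types requested through content negotiation.
ACCEPT = {
    "apa": "text/x-bibliography; style=apa",
    "bibtex": "application/x-bibtex",
}


def http_fetch(doi, accept):
    """Request one representation of a DOI from doi.org."""
    url = DOI_BASE + urllib.parse.quote(doi, safe="/")
    req = urllib.request.Request(url, headers={"Accept": accept})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8").strip()


def resolve_doi(doi, cache, fetch=http_fetch):
    """Return {'apa': ..., 'bibtex': ...}, hitting the network only on a cache miss."""
    key = doi.lower()  # DOIs are case-insensitive, so normalise the cache key
    if key not in cache:
        cache[key] = {name: fetch(doi, accept) for name, accept in ACCEPT.items()}
    return cache[key]


def load_cache(path):
    """Read the JSON cache file, or start empty if it does not exist yet."""
    path = Path(path)
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def save_cache(cache, path):
    """Persist the cache so later runs can skip already-resolved DOIs."""
    Path(path).write_text(
        json.dumps(cache, indent=2, ensure_ascii=False), encoding="utf-8"
    )
```

Passing `fetch` as a parameter keeps the caching logic testable without network access; a rerun with a warm cache makes no HTTP requests at all.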

Screenshots

The page preserves the established tab-based UI for sub-clusters while adding sidebar navigation and full-text search. Each cluster section has an alternating pastel background color.

Test plan

Run python3 scripts/parse_clusters_to_sheet.py --dry-run to verify parsing (expect 11 clusters, ~93 sub-clusters, ~1297 publications)
Run hugo server and verify /clusters/ renders correctly
Test tab switching within clusters
Test sidebar navigation (expand clusters, click sub-clusters)
Test full-text search (e.g. search for an author name, click result to scroll)
Test on mobile viewport (sidebar toggle, content layout)
Verify print view shows all tab content
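
The count check in the first step could also be scripted against the exported JSON. A minimal sketch, assuming a hypothetical schema in which the file holds a top-level `clusters` list, each cluster has a `sub_clusters` list, and each sub-cluster has a `references` list. The actual keys in data/clusters_v4.json are not shown in this PR and may differ.

```python
import json
from pathlib import Path


def count_entries(data):
    """Return (clusters, sub_clusters, references) for the assumed schema."""
    clusters = data.get("clusters", [])
    sub_clusters = [sc for c in clusters for sc in c.get("sub_clusters", [])]
    references = [r for sc in sub_clusters for r in sc.get("references", [])]
    return len(clusters), len(sub_clusters), len(references)


def check_export(path="data/clusters_v4.json"):
    """Load the exported file and print its counts for manual comparison."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    n_clusters, n_subs, n_refs = count_entries(data)
    print(f"{n_clusters} clusters, {n_subs} sub-clusters, {n_refs} publications")
    return n_clusters, n_subs, n_refs
```

Comparing the printed numbers with the figures quoted in the test plan gives a quick sanity check after each export.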

🤖 Generated with Claude Code

Replace the 7 hardcoded cluster markdown files with a data-driven approach that reads from a generated JSON file (clusters_v4.json). The data originates from the FORRT Clusters v4.1 Google Doc and is parsed into a Google Sheet, then exported as JSON for Hugo to consume at build time.

Key changes:
- New script (parse_clusters_to_sheet.py) that parses the GDoc, resolves DOIs via doi.org for clean APA references + BibTeX, writes to Google Sheet, and exports JSON for Hugo
- New Hugo shortcode (clusters_display.html) renders all clusters with sidebar navigation, tabbed sub-clusters, and full-text search
- Updated intro text to reflect 11 clusters (was 9)
- Deactivated old cluster1-7.md files (replaced by data-driven rendering)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-22T22:25:07Z

👍 All image files/references (if any) are in webp format, in line with our policy.

LukasWallrich · 2026-03-22T22:28:11Z

✅ Staging Deployment Status

This PR has been successfully deployed to staging as part of an aggregated deployment.

Deployed at: 2026-03-25 22:33:11 UTC
Staging URL: https://staging.forrt.org

The staging site shows the combined state of all compatible open PRs.

The clusters page now has its own full-text search that covers clusters, sub-clusters, and all references. The site-wide Academic search is redundant and has been disabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-22T22:36:29Z

📝 Spell Check Results

Found 6 potential spelling issue(s) when checking 30 changed file(s):

📄 `static/js/clusters-page.js`

Line	Issue
80	tabEl ==> table
81	tabEl ==> table
83	tabEl ==> table
85	tabEl ==> table
436	tabEl ==> table
437	tabEl ==> table

ℹ️ How to address these issues:

Fix the typo: If it's a genuine typo, please correct it.
Add to whitelist: If it's a valid word (e.g., a name, technical term), add it to .codespell-ignore.txt
False positive: If this is a false positive, please report it in the PR comments.

🤖 This check was performed by codespell

This reverts commit 9ad47b9.

richarddushime · 2026-03-23T18:00:52Z

We now have two search boxes.
I propose that we remove the custom search on the left and keep the search at the top of the clusters.

Meanwhile I will continue enhancing it; it would be good if you could check it ASAP.
@LukasWallrich @flavioazevedo

LukasWallrich · 2026-03-23T18:33:48Z

Thanks @richarddushime! I agree that we need to get rid of one of the searches.

There is also now too much going on in this area - too many boxes. Maybe the syllabus does not need to be in a box?

Can we also remove the outdated figure and really condense the text? I think the following is all we need above the clusters - unless @flavioazevedo disagrees (but Richard, please make the change so that he can look at a complete new draft)

Teaching Open and Reproducible Science shouldn't require educators to spend months sifting through a decade of literature. FORRT simplifies this process by providing a curated, expert-backed framework. Developed by over 50 scholars, our taxonomy organizes open scholarship into 11 distinct clusters, offering a clear pathway for integrating these tenets into your teaching and mentoring, regardless of your field or level of expertise.

richarddushime · 2026-03-23T20:58:14Z

I have just made some other adjustments:
removed the left search and enhanced the functionality of the remaining search (I limited the search so that it does not go through references, because it was returning a lot of results from references and making the user lose the relevant text of the clusters).

I would also like clarification about the text below:

> Teaching Open and Reproducible Science shouldn't require educators to spend months sifting through a decade of literature. FORRT simplifies this process by providing a curated, expert-backed framework. Developed by over 50 scholars, our taxonomy organizes open scholarship into 11 distinct clusters, offering a clear pathway for integrating these tenets into your teaching and mentoring, regardless of your field or level of expertise.

Do you mean that all the content before the FORRT syllabus, as well as the figure, should be removed and replaced by this paragraph?

About the figure, I think it is good to keep it while we wait for the updated one (maybe Flavio can push for its design quickly?).

richarddushime · 2026-03-23T23:53:36Z

Additionally, here is something I am proposing.

In the latest commit I introduce dedicated, indexable URLs for each FORRT cluster (/clusters/cluster-N/) alongside the existing taxonomy hub (/clusters/), so each cluster is a first-class page for search and sharing.

The reason I added this is that the clusters are covered by only one URL in the sitemap (the main clusters page), whereas we can make each cluster indexable.

This gives us:
Canonical URLs per topic — One clear URL per cluster (and its sub-clusters in-page), instead of relying on a single long hub page or hash-only navigation for discovery.
Unique metadata per URL — Each cluster page can carry its own <title>, meta description, and Open Graph / Twitter fields from front matter, improving relevance for queries and snippet quality.
Structured data — Per-page JSON-LD (cluster_seo_jsonld) ties each URL to explicit taxonomy/entity signals for that cluster.
Topic-cluster information architecture — The hub remains the overview and entry point; cluster pages act as satellites with internal links between hub and subpages, supporting crawl paths and topical grouping.
Stable deep links — Shareable URLs (including hash targets for sub-clusters where used) support accurate social previews, backlinks, and citations to the right slice of the taxonomy.

You can check the preview at https://staging.forrt.org/clusters/cluster-N/, where N is the cluster number, e.g. https://staging.forrt.org/clusters/cluster-2/
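
The per-page structured data mentioned above could be generated along these lines. This is only a sketch: the contents of the `cluster_seo_jsonld` partial are not shown in this thread, so the schema.org type (`CollectionPage`), the `SITE` base URL, and the helper names `cluster_jsonld` and `as_script_tag` are all assumptions.

```python
import json

SITE = "https://forrt.org"  # assumed production base URL


def cluster_jsonld(number, title, description):
    """Build a schema.org description for one cluster page."""
    return {
        "@context": "https://schema.org",
        "@type": "CollectionPage",
        "name": title,
        "description": description,
        "url": f"{SITE}/clusters/cluster-{number}/",
        # Link each satellite page back to the taxonomy hub.
        "isPartOf": {"@type": "WebPage", "url": f"{SITE}/clusters/"},
    }


def as_script_tag(doc):
    """Serialize the document for embedding in the page head."""
    payload = json.dumps(doc, ensure_ascii=False)
    # Escape "</" so text inside the JSON cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">' + payload + "</script>"
```

Each cluster page would then carry one such tag, with the title and description taken from the same front matter that feeds the `<title>` and meta description.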

LukasWallrich · 2026-03-25T09:46:47Z

Thanks Richard! The individual pages are great! Yes, please remove all text and the figure. Let's focus on having an accurate website. I don't see why we need this rather complex figure if we have the same information right below (in the sidebar) as readers generally want to get to the point ... so I would personally always hide it behind a details tag, or an about page if we want to talk more about the process - but that can be discussed once we have an updated figure. Showing inconsistent data is unnecessary, unprofessional and confusing.

LukasWallrich · 2026-03-25T09:48:26Z

And one issue with the separate pages: the search no longer works across pages. I am ok with that if we rename it to "search this cluster" - but ideally I think I'd prefer to have a search of all clusters. What do you think?

richarddushime · 2026-03-25T10:20:52Z

> And one issue with the separate pages: the search no longer works across pages. I am ok with that if we rename it to "search this cluster" - but ideally I think I'd prefer to have a search of all clusters. What do you think?

I saw that issue, but I left it pending because I was waiting for your validation of the new design first.
The search can work the same way as the current hub search (searching within all clusters).

LukasWallrich · 2026-03-25T10:45:15Z

> > And one issue with the separate pages: the search no longer works across pages. I am ok with that if we rename it to "search this cluster" - but ideally I think I'd prefer to have a search of all clusters. What do you think?
>
> I saw that issue, but I left it pending because I was waiting for your validation of the new design first. The search can work the same way as the current hub search (searching within all clusters).

That sounds a bit difficult to me - doesn't that then require anchors on each paragraph that you can link to from another page? But great if you can implement it!
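
One way to approach the cross-page search raised here is to give each sub-cluster heading (rather than each paragraph) an anchor and to build a single index whose entries carry a page URL plus fragment. The sketch below assumes a hypothetical data shape (`title`, `description`, `sub_clusters`) and a hypothetical `slugify` rule for heading ids; neither is taken from the PR. References are left out of the index, in line with the earlier decision to keep them out of search results.

```python
import re


def slugify(text):
    """Turn a heading into a URL-safe fragment identifier."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_index(clusters):
    """Flatten clusters into search entries that each carry a deep link."""
    index = []
    for number, cluster in enumerate(clusters, start=1):
        page = f"/clusters/cluster-{number}/"
        index.append({
            "title": cluster["title"],
            "text": cluster.get("description", ""),
            "url": page,
        })
        for sub in cluster.get("sub_clusters", []):
            index.append({
                "title": sub["title"],
                "text": sub.get("description", ""),
                # The fragment points at the sub-cluster heading on its page.
                "url": page + "#" + slugify(sub["title"]),
            })
    return index


def search(index, query):
    """Case-insensitive substring match over titles and descriptions only."""
    needle = query.strip().lower()
    if not needle:
        return []
    return [e for e in index if needle in (e["title"] + " " + e["text"]).lower()]
```

Because every result is a full URL, the same index can serve the hub and the individual cluster pages: a hit on another page is an ordinary link, and a hit on the current page scrolls to the anchor.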

Replace verbose multi-paragraph intro with a compact two-column layout (text left, clickable thumbnail right) and remove syllabus section. Adds lightbox overlay with magnifying glass hint for discoverability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Sub-cluster tabs in fixed 3-column grid with fixed height and centered text
- Remove reference counts from tabs
- Full-width layout for clusters display section
- Darker background on inactive tabs
- "Update pending" badge on clusters diagram

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…av behavior

- Remove ~145 lines of dead .clusters-cluster-subpage CSS and ~65 lines of dead JS (element never exists in DOM)
- Remove unused list.html template and .clusters-title-toolbar CSS
- Extract duplicated color arrays to shared partial (colors.html)
- Remove duplicate intro text from _index.md (was inconsistent with intro.html)
- Fix sub-cluster nav on individual pages to use in-page anchors for the active cluster, aligning scroll behavior with the hub page
- Fix redundant .toLowerCase() in search
- Add min-height on .cluster-tab-content to keep footer below fold when viewing short sub-clusters

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

LukasWallrich · 2026-03-25T22:08:53Z

@richarddushime thanks for the work on this today (and before). I did the final cleanups ... should be good to go.

richarddushime · 2026-03-25T22:36:55Z

I think what remains is documenting the process, from how the data is generated up to rendering the pages on the website, and also updating the data processing with a manual workflow dispatch in case the sheets have been updated. But for now this can be shipped 🥇

richarddushime

LGTM 🥇

richarddushime · 2026-03-25T22:47:23Z

> I think what remains is documenting the process, from how the data is generated up to rendering the pages on the website, and also updating the data processing with a manual workflow dispatch in case the sheets have been updated. But for now this can be shipped 🥇

TBF as Enhancement

LukasWallrich requested a review from a team as a code owner March 22, 2026 22:24

forrtproject deleted a comment from github-actions bot Mar 22, 2026

LukasWallrich and others added 3 commits March 22, 2026 22:42

Revert "Disable site-wide search (replaced by clusters page search)"

9b9bdc4

This reverts commit 9ad47b9.

Merge branch 'master' into data-driven-clusters-v4

30d65f9

enhancement:clusters page

671781c

rm: left custom search, enhancement of the UI

2f3a56b

richarddushime added 2 commits March 23, 2026 22:03

jump to active when tab clicked of sub cluster

476afad

SEO booster for clusters

d328ce1

LukasWallrich and others added 4 commits March 25, 2026 16:34

Update clusters intro: 100+ scholars, remove stale date

331b813

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge branch 'master' into data-driven-clusters-v4

eaefe57

richarddushime approved these changes Mar 25, 2026


richarddushime merged commit 05001bb into master Mar 25, 2026
5 checks passed

richarddushime deleted the data-driven-clusters-v4 branch March 25, 2026 22:47
