Skip to content

Conversation

dscho
Copy link
Member

@dscho dscho commented Sep 2, 2025

Changes

  • Update Pagefind (the library that implements the search on the site)
  • Enable the "Pagefind Playground"

Context

The recently-released new version of Pagefind sports a pretty neat Playground for experimenting with the search parameters (to optimize them for better search results).

I mentioned this here, and don't want to derail the highly desirable effort to refresh the website, therefore I'll not even mention it there that I opened this PR.

@jvns @To1ne FYI this is the PR to upgrade Pagefind and enable the Playground.

@dscho dscho requested a review from To1ne September 2, 2025 20:25
@dscho dscho self-assigned this Sep 2, 2025
@dscho
Copy link
Member Author

dscho commented Sep 2, 2025

  • Enable the "Pagefind Playground"

To see this in action, direct your web browser to https://dscho.github.io/git-scm.com/pagefind/playground/.

@dscho dscho force-pushed the upgrade-pagefind branch 2 times, most recently from 4097e80 to 43171c4 Compare September 4, 2025 15:25
@dscho
Copy link
Member Author

dscho commented Sep 4, 2025

Okay, I did have to work on this a bit more than I thought because of a "link checker report" caused by enabling the Pagefind Playground. Turns out that lychee (the link checker) posed some challenges that can only be resolved using options that are now available, but had not been available when I introduced those link checks.

So now this is ready to be merged!

@jvns @To1ne opinions?

This version most notably sports an optional Playground (see
https://pagefind.app/docs/playground/) that _should_ help optimize the
search parameters better.

Release notes: https://github.com/Pagefind/pagefind/releases/tag/v1.4.0

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When a HTML file in a subdirectory references a file in the same
subdirectory via a relative path (e.g. Pagefind Playground's
`<link rel="stylesheet" href="./pagefind-playground.css" />` in
`pagefind/playground/index.html`), lychee's `--base-url` option causes
unwanted transformations: It pretends that that file is actually at the
base URL. For example, running with `--base-url
https://dscho.github.io/git-scm.com` will pretend that that CSS is at
https://dscho.github.io/git-scm.com/pagefind-playground.css, skipping
the `pagefind/playground/` altogether.

This is a known issue, and lychee introduced a `--root-dir` option to
accommodate. However, there remain problems, see e.g.
lycheeverse/lychee#1718.

In our concrete case, we cannot easily use `--root-dir` because when we
deploy to forks, the absolute paths have a `/git-scm.com/` prefix yet
that is not the suffix of the absolute path of the `public/` folder!

Even though it is not easily used, it _is_ possible: by constructing a
separate path and adding a symbolic link whose name _is_ `git-scm.com`
and which points to the `public/` directory. Then the parent directory
of that symlink can be used as `--root-dir`.

There is still one caveat: lychee ignores symbolic links when populating
the initial set of files from a given base directory. So let's pass the
path to that symlink, with trailing slash, as the base directory
parameter. That seems to work.

This change is needed because we are about to enable Pagefind's shiny
new Playground, which has an `index.html` file in
`/pagefind/playground/` that references a `.css` in the same directory
via a relative link (which would be mishandled when using `--base-url`).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This makes experimenting with Pagefind's search parameters easier,
allowing to fiddle with the parameters in real-time by directing the
browser to the `/pagefind/playground` URL.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@To1ne
Copy link
Collaborator

To1ne commented Sep 6, 2025

To see this in action, direct your web browser to https://dscho.github.io/git-scm.com/pagefind/playground/.

@dscho Fancy, but I don't have any idea what to do with this.

About the lychee changes, they seem to work well when you deploy to your fork, but how can we ensure they also work on git-scm.com? I did not test this locally, and to be honest, I don't have any idea how to. But I trust you, so I'll approve already then you can merge if you feel it's safe.

Copy link
Collaborator

@To1ne To1ne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine.

@dscho
Copy link
Member Author

dscho commented Sep 6, 2025

To see this in action, direct your web browser to https://dscho.github.io/git-scm.com/pagefind/playground/.

@dscho Fancy, but I don't have any idea what to do with this.

You can see how the search results change with changed search parameters, such as: how much weight to give word frequency? How much weight on page length? Etc

Just type in a search term and change the sliders.

About the lychee changes, they seem to work well when you deploy to your fork, but how can we ensure they also work on git-scm.com?

I did test lightly, in particular what would happen if base_url=http://git-scm.com.

In any case, I'll see directly after merging if it breaks. And if it does, I'll revert, that's what the Revert button is for 😁

@dscho dscho merged commit 802cbb4 into git:gh-pages Sep 6, 2025
1 check passed
@dscho dscho deleted the upgrade-pagefind branch September 6, 2025 07:50
@dscho
Copy link
Member Author

dscho commented Sep 6, 2025

Seems to have worked all right:

Summary

Status Count
🔍 Total 1538563
✅ Successful 1424161
⏳ Timeouts 0
🔀 Redirected 0
👻 Excluded 114402
❓ Unknown 0
🚫 Errors 0
⛔ Unsupported 0

For comparison, the previous deployment reported these numbers:

Summary

Status Count
🔍 Total 1541452
✅ Successful 61428
⏳ Timeouts 0
🔀 Redirected 0
👻 Excluded 1480024
❓ Unknown 0
🚫 Errors 0
⛔ Unsupported 0

I have little idea why the new run's Total is so much lower, but it excites me that the Excluded is drastically lower! Which means that the previous way to run lychee excluded waaaay too much, and I fixed a bug (that thankfully did not uncover broken links).

@dscho
Copy link
Member Author

dscho commented Sep 6, 2025

FWIW these numbers are consistent with the numbers of the "broken link" issue at dscho#22 (comment) which was opened because of the false positive pagefind-playground.css that necessitated the fix in this here PR where I replaced the --base-url construct with a --root-dir one.

I merely missed the dramatic change of the Excluded number because they're both long and both start with a 1...

@To1ne
Copy link
Collaborator

To1ne commented Sep 6, 2025

In any case, I'll see directly after merging if it breaks. And if it does, I'll revert, that's what the Revert button is for 😁

True. But I didn't have time to babysit the deploy, so I didn't merge yet.


And about the link report, I don't know how to read it. But if you're happy, I'm too!

@dscho
Copy link
Member Author

dscho commented Sep 6, 2025

And about the link report, I don't know how to read it.

The numbers refer to unique URLs that were link-checked.

That is, the site has about 1.5 million distinct URLs, previously 1.4 million of them were not checked for one reason or another, and only about a bit more than one hundred thousand were checked.

All external URLs are excluded from checking. That is, the link to https://gitlab.com/gitlab-org/gitlab-ce/tree/master is not verified. But a link to ../videos is checked.

Now, there are a bit fewer distinct URLs, but only some sixty thousand of them are excluded, and the rest of them (which is now 96% of them) are checked.

I recreated the lychee invocations locally and ran them with --verbose, to see where the difference comes from. Turns out that the previous way, the one without --root-dir was broken in more than one way! For example, when the file public/about/trademark.html was checked, the link (../about/trademark) was remapped to https://git-scm.com/about/trademark, which is an "external" link, and it was therefore excluded. By the same token, the vast majority of totally internal links were mistaken for external ones.

So I inadvertently (and unintentionally!) snuck in a rather important bug fix in this here PR...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants