
@msimberg
Collaborator

This fixes more typos (following #57), and adds a spellchecking action (https://github.com/marketplace/actions/check-spelling) that warns about suspicious spelling.

I've been testing this on another PR. It's not perfect, but I think the real typos it catches are worth more than the false positives it reports.

Some notes:

  • There's a whitelist of words in .github/actions/spelling/allow.txt
  • There's a whitelist of files in .github/actions/spelling/only.txt (regex, only matching .md files right now)
  • I've added commonly used words, technical terms etc. to the whitelist, but I have not tried to be exhaustive. We can expand the list as needed, or simply ignore real words that it doesn't know about.
  • This uses SARIF reporting, which shows up as comments from github-advanced-security. That sounds a bit scary, but I found it to be the nicest reporting option. There are also ways to make the action add regular comments, but this requires setting up additional workflows. There are examples of doing this, so if someone wants it I can look into it, but I was too lazy to set it up for now.
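The two configuration files from the notes can be modeled roughly as follows. This is a minimal Python sketch, not the action's implementation: the file contents (`\.md$` and the example words), the example paths, and the case-insensitive word matching are all assumptions made for illustration.

```python
import re

# Hypothetical stand-in for .github/actions/spelling/only.txt:
# one regex per line; a file is checked only if its path matches one of them.
ONLY_PATTERNS = [r"\.md$"]

# Hypothetical stand-in for .github/actions/spelling/allow.txt:
# one word per line; these words are never reported.
ALLOWED_WORDS = {"uenv", "clariden", "bristen", "concretizer"}


def files_to_check(paths: list[str], only: list[str]) -> list[str]:
    """Return the paths that match at least one only.txt-style regex."""
    compiled = [re.compile(p) for p in only]
    return [path for path in paths if any(rx.search(path) for rx in compiled)]


def is_allowed(word: str, allowed: set[str]) -> bool:
    """Check a word against the whitelist (case-insensitive by assumption)."""
    return word.lower() in allowed


if __name__ == "__main__":
    # Made-up example paths, not the repository's actual layout.
    paths = ["docs/clusters/clariden.md", "mkdocs.yml", "docs/software/uenv.md"]
    print(files_to_check(paths, ONLY_PATTERNS))  # ['docs/clusters/clariden.md', 'docs/software/uenv.md']
    print(is_allowed("Clariden", ALLOWED_WORDS))  # True
    print(is_allowed("enviroments", ALLOWED_WORDS))  # False
```

Keeping the file filter and the word whitelist separate means new technical terms can be allowed without touching which files get scanned.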

Finally: if this turns out to be too noisy, we can disable it. Remember that the typos it reports are never blocking; they can always be ignored.

@msimberg msimberg requested review from RMeli and bcumming as code owners April 22, 2025 15:04
@github-actions

preview available: https://docs.tds.cscs.ch/98

Comment on lines +106 to +107
concretise
concretizer
Collaborator Author

Note that I haven't made any judgment about e.g. z vs s in words like these. Feel free to bikeshed as much as you want about what goes into this file, but let's keep it practical.

### FirecREST

Bristen can also be accessed using [FircREST][ref-firecrest] at the `https://api.cscs.ch/ml/firecrest/v1` API endpoint.
Bristen can also be accessed using [FirecREST][ref-firecrest] at the `https://api.cscs.ch/ml/firecrest/v1` API endpoint.

Check failure

Code scanning / check-spelling

Unrecognized Spelling

`Firec` is not a recognized word. (unrecognized-spelling)
Collaborator Author

The checker apparently won't accept FirecREST even when it's in allow.txt, because it treats it as two words, Firec and REST, and the first is not a recognized word.

patterns.txt accepts regexes as a whitelist, so I've added FirecREST to that file instead.
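Why an allow.txt entry doesn't help while a pattern does can be illustrated with a hypothetical Python approximation. The tokenizer regex below is an assumption chosen to reproduce the Firec/REST split; check-spelling's real splitting rules may differ. The key point is that a pattern removes the text before it is ever split into words, whereas allow.txt is only consulted per word after splitting.

```python
import re

# Assumed approximation of mixed-case splitting (NOT check-spelling's code):
# an optional capital followed by lowercase letters, or a run of capitals
# that is not followed by a lowercase letter.
TOKEN = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])")


def tokenize(text: str) -> list[str]:
    return TOKEN.findall(text)


def unrecognized(text: str, allowed: set[str], patterns: list[str]) -> list[str]:
    # Blank out every span matched by a patterns.txt-style regex first,
    # so those spans are never split into words at all.
    for pattern in patterns:
        text = re.sub(pattern, " ", text)
    # Whitelist lookup happens per token, after splitting.
    return [w for w in tokenize(text) if w.lower() not in allowed]


line = "accessed using FirecREST"
allowed = {"accessed", "using", "firecrest", "rest"}

print(tokenize("FirecREST"))                        # ['Firec', 'REST']
print(unrecognized(line, allowed, []))              # ['Firec']
print(unrecognized(line, allowed, [r"FirecREST"]))  # []
```

Even with `firecrest` in the whitelist, neither token equals it, so only the pattern-based exclusion silences the report.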

### FirecREST

Clariden can also be accessed using [FircREST][ref-firecrest] at the `https://api.cscs.ch/ml/firecrest/v1` API endpoint.
Clariden can also be accessed using [FirecREST][ref-firecrest] at the `https://api.cscs.ch/ml/firecrest/v1` API endpoint.

Check failure

Code scanning / check-spelling

Unrecognized Spelling

`Firec` is not a recognized word. (unrecognized-spelling)

Uenv are user environments that provide scientific applications, libraries and tools.
This page will explain how to find, dowload and use uenv on the command line, and how to enable them in SLURM jobs.
Uenv are user enviroments that provide scientific applications, libraries and tools.

Check failure

Code scanning / check-spelling

Unrecognized Spelling

`enviroments` is not a recognized word. (unrecognized-spelling)
@github-advanced-security

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

@bcumming
Member

I am a little bit confused about the output.
Does it show the spelling errors in a comment on the PR, like the FirecREST errors above? Or do we have to navigate to https://github.com/eth-cscs/cscs-docs/security/code-scanning?query=pr%3A98+is%3Aopen ?

And does it show only spelling mistakes in the modified/added text, or for the whole document?

@msimberg
Collaborator Author

msimberg commented Apr 23, 2025

> I am a little bit confused about the output. Does it show the spelling errors in a comment on the PR, like the FirecREST errors above? Or do we have to navigate to https://github.com/eth-cscs/cscs-docs/security/code-scanning?query=pr%3A98+is%3Aopen ?
>
> And does it show only spelling mistakes in the modified/added text, or for the whole document?

Sort of both... let me expand:

The action currently checks all files, and that's why the "Check Spelling" action fails: there are more words it doesn't like that I haven't whitelisted. This is a bit ugly, I admit.

On the other hand, the SARIF output only shows inline "review" comments on files that have changed, like clariden.md. I fixed other typos in clariden.md, which is why it showed the FirecREST "typo".

All the typos that the action found are visible on https://github.com/eth-cscs/cscs-docs/security/code-scanning?query=pr%3A98+is%3Aopen, but those include files that haven't been changed.

One peculiarity is that one can enable either SARIF output or checking only changed files, but unfortunately not both at the same time. If I enable the only-check-changed-files option, I'll need to look at other ways of reporting errors.

TL;DR: it detects typos in all files, including outside the changed areas of a changed file, but the inline comments only show up on changed lines.
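The reporting behavior summarized in the TL;DR can be modeled roughly like this. It is a simplified Python sketch, not how GitHub or the action actually processes SARIF results: the `Finding` type, the `changed` line map, and all example data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    path: str   # file the unrecognized word was found in
    line: int   # 1-based line number
    word: str   # the unrecognized word


def inline_comments(findings: list[Finding],
                    changed: dict[str, set[int]]) -> list[Finding]:
    """Keep only findings that fall on lines changed by the PR."""
    return [f for f in findings if f.line in changed.get(f.path, set())]


# Made-up findings and diff, mirroring the three cases described above.
findings = [
    Finding("clariden.md", 12, "Firec"),     # changed line in a changed file
    Finding("clariden.md", 40, "Firec"),     # unchanged line in a changed file
    Finding("other.md", 3, "enviroments"),   # file not touched by the PR
]
changed = {"clariden.md": {12, 13}}

check_passes = not findings                  # any finding fails the check
comments = inline_comments(findings, changed)

print(check_passes)                                       # False
print([f"{f.path}:{f.line} {f.word}" for f in comments])  # ['clariden.md:12 Firec']
```

All three findings count towards the failing check, but only the first one would show up as an inline comment.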

It's definitely not ideal, but decent? 🤷

Edit: with the current setup, I could also set the action to always succeed despite typos in other files, since they're not relevant to the changes in the PR and the typos are only suggestions anyway.

@bcumming
Member

One problem is that it will generate a lot of false-positive comments about typos for larger changes, for example it doesn't like the second cyberduck on the following line:

![cyberduck](../images/storage/cyberduck.png)

... it really shouldn't be flagging filenames, link names, and other markdown/html artifacts.

We can turn it on, to see how it works.
But I think that it will likely generate a lot of noise, and be turned off ultimately.

@msimberg
Collaborator Author

> One problem is that it will generate a lot of false-positive comments about typos for larger changes, for example it doesn't like the second cyberduck on the following line:
>
> ![cyberduck](../images/storage/cyberduck.png)
>
> ... it really shouldn't be flagging filenames, link names, and other markdown/html artifacts.
>
> We can turn it on, to see how it works. But I think that it will likely generate a lot of noise, and be turned off ultimately.

I do worry about noise as well. My suggestion would be to mostly ignore the full "security scanning" output, and only care about what's reported for the changed lines.

The tool can also ignore text matching regexes (FirecREST in patterns.txt is in theory a regex, even if it's the most trivial one), which means that things like the image syntax could be ignored that way (in fact, let me try that right away).
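As an illustration of ignoring the image syntax with a regex, here is a hypothetical Python sketch. The `LINK_TARGET` regex is an assumed example of what a patterns.txt entry could look like, not one taken from this PR, and the plain `[A-Za-z]+` tokenizer is a simplification of whatever the action really does.

```python
import re

# Hypothetical patterns.txt-style entry: the "(...)" target right after "]"
# in markdown images and links, e.g. "(../images/storage/cyberduck.png)".
LINK_TARGET = re.compile(r"\]\([^)\s]*\)")
# Simplified word splitter, used only for this illustration.
WORD = re.compile(r"[A-Za-z]+")


def words_to_check(line: str) -> list[str]:
    # Replace the link target with a bare "]" before splitting into words,
    # so path components never reach the spell checker.
    return WORD.findall(LINK_TARGET.sub("]", line))


line = "![cyberduck](../images/storage/cyberduck.png)"
print(WORD.findall(line))    # ['cyberduck', 'images', 'storage', 'cyberduck', 'png']
print(words_to_check(line))  # ['cyberduck']
```

Only the alt text is still checked here, which is arguably what you want, since it is prose that readers see; it can then be handled through allow.txt like any other word.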

@msimberg
Collaborator Author

Also, there are a bunch of additional lists/dictionaries that can be enabled to ignore things like URLs; I haven't added those yet.

@msimberg
Collaborator Author

I've added a few more patterns for testing, and they did get rid of more false positives. If you're happy with the approach, I'd stick with it and see how much noise it generates. Feel free to ping me whenever there's too much noise, and I'll add more exclusions as we find the worst offenders. As I said, if it's still too noisy, we can disable it eventually.

Member

@bcumming bcumming left a comment

Let's try this out for a while.

We can continue to improve the filters to reduce noise, and if it causes too much noise or confusion, we can remove it.

@bcumming bcumming merged commit e8700ee into eth-cscs:main Apr 24, 2025
2 of 3 checks passed

2 participants