-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable pandoc spellcheck filter #333
Conversation
AppVeyor build 1.0.107 for commit 2f30622 is now complete. Found 74 potential spelling error(s). Preview:adipiscingaliqua aliquip amet anim aute cillum commodo consectetur consequat culpa cupidatat deserunt DOI dolore Duis eiusmod elit enim eq esse et eu Excepteur exercitation fugiat incididunt ipsum irure isbn janeroe johndoe labore laboris laborum LaTeX Lorem magna manubot mollit nisi nostrud nulla occaecat officia ORCID pariatur permalink PMC pmcid pmid proident PubMed qui quis reprehenderit rootstock s sed sint Striketh... |
The first draft of the spellcheck pull request is working in AppVeyor and Travis CI. Travis CI doesn't store the spelling errors as an artifact or add them as a pull request comment. However, it can dump them in the build log. |
Really great progress! Will help catch spelling errors hopefully. Did you look into detecting the language set in It'd be nice to show spellcheck output as a github actions step. Then it'd be easy to expand that section of the output to check for spelling errors. Was the motivation for excluding Actions that it doesn't use
I think so. The only concern I'd have is non-english manuscripts, but it's not too hard to turn off or switch the language.
Doesn't matter. I view Travis CI support mostly for backwards compatibility at this point and less about new features.
From agitter/my-manuscript#15 (comment), it looks like lot's of manubot related terms cause problems: LaTeX, ORCID, permalink... not sure how many are because this is on delete-me which uses words like DOI outside of citekeys. |
I haven't, but I will. I believe this would involve downloading the appropriate aspell dictionary with apt-get and then specifying the language in the
I initially planned to only set it up for AppVeyor because it can preview spelling errors in the pull request comment. If we enable it for GitHub Actions, would you recommend a new conditional step between I should be able to come up with a reasonable custom dictionary based on the errors from delete-me. |
It might be better to save language detection for a follow up pull request. Aspell dictionary files use ISO 639 language codes. An exception is Portuguese, which has The first step would be reading the language from |
I was thinking a conditional step that writes the spellcheck errors file to standard output (if the file exists. But whatever you think is best (including nothing) is okay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agitter, this looks good to me. Up to you if you want to pursue anything else in this PR or leave it for later.
build/build.sh
Outdated
# Spellcheck | ||
if [ "${SPELLCHECK:-}" = "true" ]; then | ||
export ASPELL_CONF="add-extra-dicts $(pwd)/build/assets/custom-dictionary.txt" | ||
pandoc --lua-filter spellcheck.lua output/manuscript.md | sort | uniq > output/spelling-errors.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change
pandoc --lua-filter spellcheck.lua output/manuscript.md | sort | uniq > output/spelling-errors.txt
to the following line:
pandoc --lua-filter spellcheck.lua output/manuscript.md | uniq | while read word; do grep -on "\<$word\>" output/manuscript.md; done | sort -h > output/spelling-errors.txt
^ This adds line numbers to each misspelled word. Would be useful to identify the exact spot of the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. This fix appends file names at the beginning of each "misspelled" word:
pandoc --lua-filter spellcheck.lua output/manuscript.md | uniq | while read word; do grep -on "\<$word\>" content/*md; done | sort -h -t ":" -k 1b,1 -k2,2 > output/spelling-errors.txt
^ Assuming the desire is to only spell check md files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: David Nicholson <dnicholson329@gmail.com>
AppVeyor build 1.0.108 for commit 14215d3 is now complete. Found 118 potential spelling error(s). Preview:46:XXXX46:XXXX 46:XXXX 46:XXXX 48:johndoe 56:XXXX 56:XXXX 56:XXXX 56:XXXX 118:XXXX 118:XXXX 118:XXXX 118:XXXX 118:XXXX 118:XXXX 118:XXXX 118:XXXX 120:johndoe 120:johndoe 122:johndoe 122:johndoe 125:XXXXXXXX 130:XXXX 130:XXXX 130:XXXX 130:XXXX 130:XXXX 130:XXXX 130:XXXX 130:XXXX 132:janeroe 132:janeroe 187:adipiscing 187:aliqua 187:amet 187:consectetur 187:dolore 187:eiusmod 187:elit 187:incididunt 187:ipsum 187:labore 1... |
Co-authored-by: David Nicholson <dnicholson329@gmail.com>
Make dictionary lowercase now that case is ignored
AppVeyor build 1.0.110 for commit 24241ee failed. |
Thanks for re-reviewing @dhimmel. My plan is to get @danich1's line numbering idea to work and then merge this. I added some light documentation to I'll close manubot/manubot#31 after merging this and create a new rootstock issue with the remaining spellcheck improvements, including GitHub Actions. |
grep exits with 1 if there is no match found, which makes the pipeline fail due to the pipefail option Also make grep case insensitive
AppVeyor build 1.0.111 for commit 311269d failed. |
AppVeyor build 1.0.112 for commit a63cb12 is now complete. Found 58 potential spelling error(s). Preview:content/02.delete-me.md:44:adipiscingcontent/02.delete-me.md:44:aliqua content/02.delete-me.md:44:amet content/02.delete-me.md:44:consectetur content/02.delete-me.md:44:dolore content/02.delete-me.md:44:eiusmod content/02.delete-me.md:44:elit content/02.delete-me.md:44:incididunt content/02.delete-me.md:44:ipsum content/02.delete-me.md:44:labore content/02.delete-me.md:44:Lorem content/02.delete-me.md:44:magna content/02... |
The build passes now, but I get different output from the Travis CI and AppVeyor builds for a63cb12. Edit: I okay with that difference because it isn't due to our usage of the spellcheck filter. |
AppVeyor build 1.0.113 for commit 8dee14c is now complete. Found 58 potential spelling error(s). Preview:content/02.delete-me.md:44:adipiscingcontent/02.delete-me.md:44:aliqua content/02.delete-me.md:44:amet content/02.delete-me.md:44:consectetur content/02.delete-me.md:44:dolore content/02.delete-me.md:44:eiusmod content/02.delete-me.md:44:elit content/02.delete-me.md:44:incididunt content/02.delete-me.md:44:ipsum content/02.delete-me.md:44:labore content/02.delete-me.md:44:Lorem content/02.delete-me.md:44:magna content/02... |
Using this with a complex manuscript revealed some bugs: greenelab/covid19-review#261 I'll continue testing there. In the meantime, I don't recommend this for general use in other manuscripts. |
merges manubot/rootstock#333 Installs and runs aspell via the pandoc spellcheck filter when the SPELLCHECK environment variable is true. Spellcheck outputs candidate spelling errors and the filenames and line numbers in which they occur. Initially only supported in Travis CI and AppVeyor builds.
References manubot/manubot#31
Initial implementation from agitter/my-manuscript#15
There are a few open questions to discuss:
SETUP.md
?)I've noticed that many spelling errors come from
00.front-matter.md
. We could automatically add all of the manuscript metadata to the custom dictionary, but then we would ignore typos in affiliations or titles. I would prefer to save that for a separate pull request even if we do want it.