Documentation Improvement Suggestions #169

Closed
waylan opened this issue Mar 31, 2023 · 3 comments
Labels
S: triage Issue needs triage.

Comments

@waylan

waylan commented Mar 31, 2023

While converting Python-Markdown from a custom script to pyspelling for its spelling workflow, I noted a few deficiencies in the docs, outlined below. Unfortunately, I did not keep notes as I worked through things, so I can't necessarily point to each missing item in the docs. Instead, this is more of an overview of my impressions, which I hope can provide inspiration for improvements. Note that I have only used aspell, but any mention of aspell below could apply to other supported backends.

My first observation is that the docs dive right into the deep end from the start. By skipping over the basics, you may be missing an entire subset of potential users. Pyspelling has some amazingly powerful features in its pipelines, which are well documented. However, the basic functionality, which is a great help in getting started with automated spell checking, is almost completely ignored in the docs.

IMO, one of pyspelling's best features is that it provides a simple mechanism to list files, collects them, and passes them to the backend. It then gathers all of the reported failures into a single report with one pass/fail result, which is needed for CI purposes. Finally, it makes dictionary compilation mostly transparent. To use aspell directly, a user needs to write their own script to handle all of that. A quickstart guide/tutorial which highlighted these features alone would have been a great help in convincing me that aspell was useful to me. Instead, my eyes glazed over with all of the complex pipelines mentioned rather early on, and I was left with the impression that the pipelines were required. Only after looking at the source code did I realize that there was an Aspell.spell_check_no_pipeline method which didn't require the use of any pipelines. All I needed was a wordlist, a file list, and some basic aspell options, and I could replicate my existing custom script with little effort.

Given the above, there are a few things I would like to see in the documentation:

  1. Document all of the options one can set under the aspell option (currently only the language-specific options are documented). For example, without any pipeline options set, the user should set an aspell mode, but the docs don't mention that at all.
  2. Document what each pipeline passes to aspell. Before I realized what it was doing, I was very confused as to why aspell wasn't acting as I expected. For example, I assumed the HTML filter was passing the HTML source to aspell as HTML with aspell's HTML mode. Once I realized what was actually happening (extracted text passed as plain text with mode set to None), then my expectations changed and I better understood where to look for ways to alter behavior.
  3. A quickstart tutorial which demonstrates how to set up a test which does not use any pipeline options would be a great help. Maybe the user needs to run a spellcheck on her Markdown-based README, which does not use any extensions. The tutorial guides her through pointing her config at the README file and her custom wordlist, and setting aspell.mode to markdown. Part 2 of the tutorial could then cover a more complex setup using pipelines. While the part 2 material is already mostly covered in the existing docs, seeing the progression from simple to complex could help address point 2 above.
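As a rough sketch of the no-pipeline setup described in point 3, a config might look something like the following. The task name, README.md, wordlist.txt, and the output path are hypothetical placeholders. The sketch also assumes that setting pipeline to null is what tells pyspelling to hand the file straight to the spell checker, and that the installed aspell is recent enough to offer a markdown mode. Check the key names against the pyspelling version in use.

```yaml
# .pyspelling.yml -- illustrative only; names and paths are placeholders
matrix:
- name: readme                 # hypothetical task name
  sources:
  - README.md                  # the file(s) to check
  aspell:
    lang: en
    d: en_US
    mode: markdown             # let aspell's own Markdown filter parse the file
  dictionary:
    wordlists:
    - wordlist.txt             # hypothetical custom wordlist, one word per line
    output: build/dictionary/readme.dic   # where the compiled dictionary is written
  pipeline: null               # assumption: null bypasses pyspelling's own filters
```

With a file like this in place, running pyspelling from the project root would be expected to check the README and exit with a single pass/fail status suitable for CI.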
gir-bot added the "S: triage" label on Mar 31, 2023
@waylan
Author

waylan commented Apr 6, 2023

I have one additional remark to make. After experimenting some more, I have realized that you can also use aspell's markdown mode with a pipeline. For example, if your Python code comments use Markdown syntax, there is no need to define regexes to ignore code spans, etc. Instead, simply set aspell.mode to markdown and aspell will properly handle the comments. This is especially useful because the pipeline is rather limited and cannot fully support the complexities of Markdown's syntax. At the same time, code comments are unlikely to include non-standard Markdown syntax, which means aspell's basic Markdown support is sufficient.

I would think that would be the suggested approach, yet it is not even mentioned as a possibility in the documentation. For that matter, pyspelling's own configuration defines a custom pipeline to ignore code blocks and spans. I would suggest changing it to use markdown mode instead.
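A minimal sketch of that combination, assuming pyspelling's Python filter is used to extract the comments and that the aspell mode setting is still passed through when a pipeline is present (as observed above). The task name and source glob are hypothetical, and the filter options shown should be checked against the filter's documentation.

```yaml
# Illustrative only: the Python filter extracts comments, aspell parses them as Markdown
matrix:
- name: python-comments        # hypothetical task name
  sources:
  - src/**/*.py                # hypothetical location of the Python sources
  aspell:
    lang: en
    d: en_US
    mode: markdown             # aspell skips code spans and blocks itself
  pipeline:
  - pyspelling.filters.python:
      comments: true           # check comments
      docstrings: false        # assumption: docstrings are handled by a separate task
```

The point of the design is that no regex-based context filter is needed to skip code spans, because aspell's Markdown filter handles them once the comment text reaches it.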

@facelessuser
Owner

I do need to do a better job of pointing out that you can use Aspell options directly. I can certainly add a note to the Markdown filter that you can use Aspell's Markdown filter instead. I'm not sure whether Hunspell has that, though. Regardless, I can make such things clearer.

I kind of wrote the tool for myself and did the bare minimum to document it. It's become a bit more popular than I had thought it would, so it would probably help to improve the docs more.

@facelessuser
Owner

I've put in more explicit text to notify the reader that a pipeline is not explicitly needed. We better describe that spell checker filters can be used instead of pipeline filters and that pipeline filters are mainly provided for situations where the built-in spell checker filters are not sufficient. We've provided a basic example showing this.

We do not document every Aspell/Hunspell option that is supported as that is simply too much work. We pretty much allow any and all except for those that do not make sense: replacement features, interactive suggestion features, etc. In recent versions, we take an ignore list approach as opposed to an allow list approach.

We've tried to more explicitly note what each filter returns as well.

I've moved the quickstart guide to a separate issue as that will require a more thoughtful approach and time.
