Add non-enchant spellcheck support #505
Thank you @kakaroto for continuing your work to improve Manuskript. I'll be away for a long weekend, so I won't have a chance to look at this in earnest, or your other PR, until I return next week. As you might already know, you can work around the lack of a 64-bit PyEnchant wheel for Windows by using the 32-bit version of Python and PyEnchant. This is described in Appendix A: Install Required Software Packages on Windows 7 and Higher.
Thanks gedakc, yes, I know about the workaround, but for some reason 32-bit Python is slow for me, and instead of trying to investigate that slowdown (which would probably end with the conclusion "It's a Python problem"), I'm trying to improve the spellcheck support. I'm testing my changes on both 32-bit and 64-bit now, which lets me test the case where both spellcheckers are available to users. Enjoy your weekend!
Three suggestions I just thought of before I go:
This modifies the Spellchecker abstraction to add support for a new dictionary library, pyspellchecker. It also changes the main UI so that multiple libraries can be supported and their dictionaries offered to the user. The custom dictionary for pyspellchecker has to be handled manually, and this library's performance and word list aren't on par with PyEnchant, but at least it works with 64-bit Python. Fixes olivierkes#505
This is probably a pipe dream, but I'd love a mechanism where there can be multiple custom dictionaries as properties of a project.
Not really a pipe dream :) I am now adding support for symspellpy, which is much faster than pyspellchecker (near-instantaneous spelling suggestions), but it has one issue: it doesn't come with a dictionary, AND it's very slow to load the dictionary on startup (10 to 30 seconds). So I'm thinking of having an interface in the Settings window for configuring spellcheckers: things like the suggestion distance (how many characters to change/remove to find the right suggestion match), and a way to select/download/add dictionaries for symspellpy (it would take a minute to load the dictionary and save …). I'm not much of a UI person, though, so I don't think I will (or can) build that interface, but if someone wants to design it and finds a way to make it not horrible or complex for the user, I can write the internal code for handling all of that.
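To make the "distance" setting concrete: it is an edit distance between the typed word and a dictionary word, and only words within the configured maximum are offered as suggestions. Below is a minimal sketch of the optimal string alignment (restricted Damerau-Levenshtein) distance, which counts insertions, deletions, substitutions and adjacent swaps. As far as I know SymSpell-style checkers are built around this variant, but treat that as an assumption; `osa_distance` and `MAX_DISTANCE` are hypothetical names, not manuskript or symspellpy API.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Counts single-character insertions, deletions and substitutions,
    plus swaps of two adjacent characters; each edit costs 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i chars of a and the first j chars of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


# Hypothetical user setting: keep suggestions at most 2 edits away.
MAX_DISTANCE = 2
print(osa_distance("teh", "the"))         # 1, one transposition
print(osa_distance("kitten", "sitting"))  # 3, rejected when MAX_DISTANCE is 2
```

Raising the limit finds more distant matches, at the cost of more candidates to check (and, for symspellpy, a bigger precomputed index).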
@DonEdwards, it requires a little bit of work, but with the latest changes I just pushed to #507, you can now do what you wanted. See the bottom of this comment for details. I just added support for symspellpy, which is a much better spellchecker library than pyspellchecker. Most notably, it's a lot faster at giving suggestions, and the 1-second dictionary load time is barely perceptible.
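The speed comes from the symmetric delete approach that SymSpell (which symspellpy ports) is named after: every dictionary word is expanded at load time into all variants with up to N characters deleted, so a lookup only generates deletes of the typed word and does hash lookups. That precomputation is also why loading a dictionary can be slow. The toy sketch below shows the idea in plain Python; it is not symspellpy's code, and `deletes`, `build_index` and `suggest` are hypothetical names. Candidates are re-checked with a plain Levenshtein distance, since sharing a delete variant over-approximates closeness.

```python
from __future__ import annotations


def deletes(word: str, max_distance: int) -> set[str]:
    """Return word plus every string reachable by deleting up to max_distance chars."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(words, max_distance: int = 2) -> dict[str, set[str]]:
    """Slow part, done once at load time: map each delete variant to its source words."""
    index: dict[str, set[str]] = {}
    for word in words:
        for variant in deletes(word, max_distance):
            index.setdefault(variant, set()).add(word)
    return index


def suggest(index, term: str, max_distance: int = 2) -> list[tuple[int, str]]:
    """Fast part: only hash lookups on the deletes of the typed term."""
    candidates: set[str] = set()
    for variant in deletes(term, max_distance):
        candidates |= index.get(variant, set())
    # Shared delete variants can over-match, so verify the real distance.
    scored = ((levenshtein(term, c), c) for c in candidates)
    return sorted(pair for pair in scored if pair[0] <= max_distance)


index = build_index(["hello", "help", "world", "word"])
print(suggest(index, "helo"))  # [(1, 'hello'), (1, 'help')]
```

The index grows quickly with word length and maximum distance, which is the memory/startup trade-off behind the load times mentioned above.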
Then enter a Python shell, load that dictionary, and export it in symspellpy format:
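A rough sketch of such an export step, assuming the source dictionary has already been loaded into a `{word: frequency}` mapping and that the target is a plain-text frequency list with one `term count` pair per line. That layout is what symspellpy's `load_dictionary(path, term_index=0, count_index=1)` reads with its default space separator; the `.sym` file manuskript itself writes may use a different format. `write_frequency_list` and `read_frequency_list` are hypothetical helpers.

```python
from __future__ import annotations

from pathlib import Path


def write_frequency_list(frequencies: dict[str, int], path: Path) -> None:
    """Write one 'term count' pair per line, most frequent words first."""
    with path.open("w", encoding="utf-8") as f:
        for term, count in sorted(frequencies.items(), key=lambda kv: (-kv[1], kv[0])):
            if " " in term:
                continue  # a space inside a term would break the space-separated layout
            f.write(f"{term} {count}\n")


def read_frequency_list(path: Path) -> dict[str, int]:
    """Parse the same layout back into a dict, skipping malformed lines."""
    frequencies: dict[str, int] = {}
    with path.open(encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue
            try:
                frequencies[parts[0]] = int(parts[1])
            except ValueError:
                continue  # count column is not an integer
    return frequencies
```

For example, `{"the": 500, "manuscript": 3, "ice cream": 1}` is written as two lines, `the 500` and `manuscript 3`, with the multi-word entry skipped.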
Then copy that file into the appropriate directory; since I use manuskript on Windows, it was … Now for @DonEdwards: if I wanted a per-project custom dictionary, I would just make multiple copies of the en_US.sym file, one per project, something like "en_US_projectA.sym" and "en_US_projectB.sym", etc. Then in Tools->Dictionary I'd see the two entries, and each dictionary would have its own custom dictionary that comes along with it (Note the …
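The per-project trick described above is plain file copying. A minimal sketch with the standard library; the `dict_dir` argument stands in for manuskript's dictionary folder (its exact path is not reproduced here), and `make_project_dictionaries` is a hypothetical helper, not part of manuskript.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def make_project_dictionaries(dict_dir: Path, base: str, projects: list[str]) -> list[Path]:
    """Copy <base>.sym to <base>_<project>.sym for each project, never overwriting."""
    source = dict_dir / f"{base}.sym"
    targets = []
    for project in projects:
        target = dict_dir / f"{base}_{project}.sym"
        if not target.exists():  # keep words a project has already added
            shutil.copyfile(source, target)
        targets.append(target)
    return targets
```

Skipping existing files matters here: each copy accumulates its own custom words, so re-running the helper should not reset them.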
@kakaroto, a thought recently came to me. Did you investigate compiling PyEnchant for Windows 64-bit? I came across the following link, which indicates the task might not be easy but sounds possible: StackOverflow - Install pyenchant on a Windows 64-bit machine. If building a 64-bit Enchant and PyEnchant works, then we could create a separate win64 PyInstaller package for Manuskript to complement the current win32 version. Please note that, unfortunately, I haven't had time to look at this issue in earnest, as I have lots of other things to attend to, both within and outside of volunteer activities. EDIT: From reading the following link for the PyEnchant project, it sounds like the compiling task is very challenging and no one has solved it since it was raised in June 2014. :-(
Yeah, I looked into compiling PyEnchant for 64-bit Windows, but I dropped the idea pretty quickly, partly because I didn't want to try compiling stuff for Windows (I usually develop only under Linux, and while I use manuskript on my Windows machine, there is no compilation or complex dev environment setup involved, so I'm OK with that), and partly because it didn't seem to be a trivial task anyway. Also, like I said, PyEnchant is not maintained anymore: its last commit is from over a year ago, and a "this project is unmaintained" notice was added to its README. I liked having the option of using something other than enchant, and I'm happy with the symspellpy implementation I added recently.
I understand. I only develop on Linux now too. It's been well over a decade since I last developed on Windows.
I'm using manuskript on Windows with 64-bit Python, and unfortunately there is no pyenchant support for 64-bit. Also, it appears pyenchant is no longer maintained, and even the link that manuskript opens when enchant isn't installed doesn't work.
My solution is to add support for an alternative spellchecker, and a simple "python spellchecker" search suggests pyspellchecker. I've started to add support for it and it seems to work fine so far. My patches are not yet ready for review/push, but I'm creating this issue to discuss the feature and my proposed changes before I'm done with it.
First, I wanted to keep pyenchant support, so I'm not replacing it but just adding support for something else.
If nothing is installed, the menu will suggest installing either pyenchant or pyspellchecker, and clicking on it would open both links.
If both are available, Tools->Dictionary would show dictionaries from both libraries so the user can choose which library to use.
If only one is available, the menu mentions that another choice is available to the user.
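The three menu cases above could be driven by a small availability check. A minimal sketch, assuming the import names `enchant` (pyenchant) and `spellchecker` (pyspellchecker); the helper names and the hint strings are hypothetical, not manuskript's actual UI text. `importlib.util.find_spec` tells whether a module could be imported without actually importing it.

```python
from __future__ import annotations

import importlib.util

# Package name shown to the user -> module name used in `import ...`.
LIBRARIES = {"pyenchant": "enchant", "pyspellchecker": "spellchecker"}


def detect_libraries() -> dict[str, bool]:
    """Check which spellchecker modules are importable, without importing them."""
    return {name: importlib.util.find_spec(module) is not None
            for name, module in LIBRARIES.items()}


def dictionary_menu_hints(available: dict[str, bool]) -> list[str]:
    """Extra entries for the Tools->Dictionary menu, besides the dictionaries themselves."""
    installed = [name for name, ok in available.items() if ok]
    missing = [name for name, ok in available.items() if not ok]
    if not installed:
        # Nothing usable: one entry suggesting every library.
        return [f"Install {' or '.join(missing)} for spellchecking"]
    # At least one library works: just point out the alternatives.
    return [f"Install {name} for more dictionaries" for name in missing]


print(dictionary_menu_hints(detect_libraries()))
```

One caveat: `find_spec` only confirms the Python module exists. pyenchant may still fail at import time if the underlying Enchant C library is missing, so a real check would probably wrap an actual import in `try/except ImportError`.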
So first question: Does that UI look good, or would you prefer it done differently?
I think that's good, but I have one issue: it only works as is because the dictionary names differ between pyspellchecker and pyenchant. I don't know how it would work if pyspellchecker suddenly changed its dictionary names to look like pyenchant's.
So I'm wondering whether I should store the library name as part of the 'dict' setting.
I'm not entirely sure, but I think I can store the library name in the QAction without it appearing as text to the user; I'd have to check. If not, I don't want to have the library name on every entry...
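One way to make the setting unambiguous is to prefix it with the library name, for example `pyspellchecker:en`. Qt's `QAction` does provide `setData()`/`data()` for attaching a hidden value, so the same string could be stored on each menu action without showing the library name in the menu text. A minimal sketch of the encoding; the `library:dictionary` layout and the fallback to pyenchant for old, unprefixed settings are assumptions, and the function names are hypothetical.

```python
from __future__ import annotations

# Assumption: settings saved before this change have no prefix and came from enchant.
DEFAULT_LIBRARY = "pyenchant"


def encode_dict_setting(library: str, dictionary: str) -> str:
    """Store e.g. ("pyspellchecker", "en") as "pyspellchecker:en"."""
    if ":" in library:
        raise ValueError("library name must not contain ':'")
    return f"{library}:{dictionary}"


def decode_dict_setting(value: str) -> tuple[str, str]:
    """Split on the first ':' only, so dictionary names may contain ':'."""
    library, sep, dictionary = value.partition(":")
    if not sep:  # legacy value such as "en_US"
        return DEFAULT_LIBRARY, value
    return library, dictionary
```

Splitting on the first colon only keeps the scheme safe even if some library uses colons in its dictionary names, and the fallback means existing users keep their current dictionary after upgrading.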
Right now, the code handles both enchant and pyspellchecker, but I think it would make a lot more sense to create an abstraction layer, a Spellchecker class that does that for us, so the code remains simple within the text editor, highlighter, main window, context menu, etc...
Once I write the abstraction class, it would be easy to add support for other spellchecker libraries, so another question: Are you OK with adding support for pyspellchecker or do you have another library to suggest?
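The abstraction layer described above could look roughly like the sketch below: one interface used by the editor, highlighter and menus, with each backend registering itself so a new library only needs a new subclass. Everything here is hypothetical (class names, method set, the registry); the toy `WordListSpellchecker` stands in for real backends that would wrap enchant or pyspellchecker.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Spellchecker(ABC):
    """Hypothetical common interface; callers never touch a library directly."""

    registry: dict[str, type[Spellchecker]] = {}
    name = "base"

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Spellchecker.registry[cls.name] = cls  # each backend registers itself

    @classmethod
    def is_installed(cls) -> bool:
        # A real backend would check whether its library can be imported.
        return True

    @abstractmethod
    def check(self, word: str) -> bool: ...

    @abstractmethod
    def suggest(self, word: str) -> list[str]: ...

    @abstractmethod
    def add(self, word: str) -> None: ...


class WordListSpellchecker(Spellchecker):
    """Toy backend for illustration only."""

    name = "wordlist"

    def __init__(self, words):
        self.words = {w.lower() for w in words}

    def check(self, word):
        return word.lower() in self.words

    def suggest(self, word):
        # Naive: same-length dictionary words differing in exactly one letter.
        w = word.lower()
        return sorted(c for c in self.words
                      if len(c) == len(w) and sum(a != b for a, b in zip(c, w)) == 1)

    def add(self, word):
        self.words.add(word.lower())


def available_backends() -> list[str]:
    """Names the Tools->Dictionary menu could list."""
    return [name for name, cls in Spellchecker.registry.items() if cls.is_installed()]
```

With this shape, the text editor only ever calls `check`, `suggest` and `add`, and supporting another library means adding one subclass.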
From my initial tests, pyspellchecker seems pretty good. It's rather slow to load the dictionary, though, but that's only because each instance loads its own copy of the dictionary, while pyenchant shares the same dictionary among all instances for the same language (note: this shouldn't be a problem once I add the abstraction class, since it would create a single dictionary shared among all text editors). It also offers fewer suggestions for corrections, but that's because it only considers at most 2 letter changes/permutations, while pyenchant does a lot more.
Final issue I can foresee is the custom dictionary. Enchant automatically adds new words to its own files; pyspellchecker does not. While we can add/remove words from its dictionary, we'd need to keep a separate file listing the user's custom words and pass that file again when we load, which is fine, I think. But should the custom dictionary be shared between pyenchant and pyspellchecker? If it doesn't need to be, great; if it does, we have no way of getting the custom dictionary from pyenchant unless we poke into its data directory manually. I also don't think it would be good to share a custom dictionary between technically different languages (switching from English in enchant to French in pyspellchecker shouldn't carry over the same words...). Secondary question: should the custom dictionaries be stored in manuskript's data directory or in the project directory?
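A minimal sketch of the separate custom-word file, keeping one file per (library, language) pair so words never leak between languages or libraries. The `directory` argument leaves the open question (manuskript's data directory vs. the project folder) to the caller. `CustomWords` and the `<library>_<language>.txt` naming are hypothetical; after loading, a backend would feed `words` into its library, e.g. via pyspellchecker's `word_frequency.load_words()`.

```python
from __future__ import annotations

from pathlib import Path


class CustomWords:
    """User-added words stored one per line in their own file."""

    def __init__(self, directory: Path, library: str, language: str):
        self.path = directory / f"{library}_{language}.txt"
        self.words: set[str] = set()
        if self.path.exists():
            text = self.path.read_text(encoding="utf-8")
            self.words = {line.strip() for line in text.splitlines() if line.strip()}

    def add(self, word: str) -> None:
        self.words.add(word)
        self._save()

    def remove(self, word: str) -> None:
        self.words.discard(word)
        self._save()

    def _save(self) -> None:
        # Rewrite the whole file; custom word lists are small.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text("".join(f"{w}\n" for w in sorted(self.words)), encoding="utf-8")
```

Saving on every change keeps the file in sync even if manuskript exits unexpectedly; a sorted, one-word-per-line file also stays readable and diff-friendly if it lives inside a project folder.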
Final question: Anything else? Requests/suggestions/comments/questions of your own?
I should have the feature done within the next couple of days I expect, unless I'm distracted by something. If I don't get answers/suggestions quickly enough, we can always discuss things in the PR itself once I submit it.