Possible bug in algorithm #45
Comments
Hi, there are indeed many different pattern lists for German. This is mostly due to a very active community building German patterns (http://projekte.dante.de/Trennmuster) and updating them every once in a while. I rebuilt the patterns based on the most recent word list from the Trennmuster group and updated de.hpb. Now "zweihenklig" and "zytosol" are hyphenated correctly. Pattern files are language-specific, so it's OK that "indestructible" won't be hyphenated correctly by German patterns. I'm very interested in your word list to check the patterns. Best regards,
Hi again, thanks for the update! Works better now.
I simply used the dictionary file from LanguageTool and then diffed the results from Hyphenopoly and Pyphen. While testing again I noticed a strange thing with Node.js: the StringDecoder returns "95,97,98,99,100,101,102,103,104, ..." instead of "_abcdef....", which results in awkward word splitting. I'm not sure whether this is a bug in Node.js or intended. I'm on v8.11.
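A minimal sketch of one way such output can arise, assuming (this is a guess, not confirmed in this thread) that the bytes reach the decoder as a plain `Uint8Array` rather than a `Buffer`. Stringifying a typed array joins its numeric values with commas, while wrapping the same bytes in a `Buffer` lets `StringDecoder` decode them as text. The byte values are the ones quoted above.

```javascript
const { StringDecoder } = require("string_decoder");

// Bytes for "_abc": 95 is "_", 97..99 are "a".."c".
const bytes = new Uint8Array([95, 97, 98, 99]);

// Array-style stringification joins the numeric values with commas.
const joined = bytes.toString();

// Wrapping the same memory in a Buffer (no copy) and decoding it as UTF-8.
const decoder = new StringDecoder("utf8");
const asBuffer = Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength);
const decoded = decoder.write(asBuffer) + decoder.end();

console.log(joined);
console.log(decoded);
```

If this is indeed the cause, converting to a `Buffer` before calling `write` should avoid the comma-separated output regardless of the Node.js version.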
Sorry, I missed that part.
I've stumbled over some strange/incorrect hyphens in some words.
To validate, I tried http://pyphen.org/ and compared the results on a large list of words, which showed a ton of differences. In this list I found one wrong word (I didn't look any further):
"zweihenklig" should be "zwei-henk-lig" but is "zwei-hen-klig"
It seems there are multiple pattern lists for German available, so I created a custom de.hpb with the patterns found in the MiKTeX Portable Package (6/30/2018) to fix this.
BUT: then TeX and Hyphenopoly seem to disagree on other words (again, I did not look further):
"zytosol" => "zyto-s-ol"; in TeX: "zy-to-sol" (which is correct)
"indestructible" => "in-des-t-ruc-tible"; in TeX: "in-de-struc-tible" (while not German, this is almost correct)
Your de.hpb results in: "zy-to-sol" and "in-de-st-ruc-ti-ble"
Can you look into this?
I would like to avoid making Ajax requests to a Python backend just to get this done.
I can provide a TeX test file and the custom de.hpb if you need it.
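The word-list comparison described above can be sketched roughly as follows. This is only an illustration: `diffHyphenation`, `lookup`, and the two lookup tables are hypothetical stand-ins, not part of Hyphenopoly or Pyphen, and the tables contain just the hyphenations quoted in this issue. Real code would call the actual engines in their place.

```javascript
// Hypothetical stand-ins for two hyphenation engines, filled with the
// results quoted in this issue; real code would call the engines instead.
const engineA = new Map([
  ["zweihenklig", "zwei-hen-klig"],
  ["zytosol", "zy-to-sol"],
]);
const engineB = new Map([
  ["zweihenklig", "zwei-henk-lig"],
  ["zytosol", "zy-to-sol"],
]);

// Turn a lookup table into a hyphenate(word) function; unknown words
// are returned unchanged.
const lookup = (table) => (word) => (table.has(word) ? table.get(word) : word);

// Collect every word on which the two hyphenators disagree.
function diffHyphenation(words, hyphenateA, hyphenateB) {
  const differences = [];
  for (const word of words) {
    const a = hyphenateA(word);
    const b = hyphenateB(word);
    if (a !== b) {
      differences.push({ word, a, b });
    }
  }
  return differences;
}

const differences = diffHyphenation(
  ["zweihenklig", "zytosol"],
  lookup(engineA),
  lookup(engineB)
);
console.log(differences);
```

Keeping the hyphenators as plain functions means the same loop can compare any pair of sources, for example pre-computed TeX output read from a file against a live engine.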