Does not work: Dump + Encode with frequency + Dump again #34
Hi,
Taking the Polish dictionary from the hunspell folder, I can dump it, but I'm not sure if everything is OK.
Jaume, I tried to dump the dictionary from the current folder, and then the error appears. I simply wanted to see if it was encoded properly (because there is an encoding-related bug I discovered). I don't think hardcoding the folder helps, and -x should work for frequency dictionaries. Otherwise, we cannot say we supply the source, which violates Debian principles; this is why we have documented all decoding procedures, so that one can get the original sources. This means, however, that the decoding procedure has to produce readable frequency files, I'm afraid. See also morfologik/morfologik-stemming#15
Also see morfologik/morfologik-stemming#35
So I understand that the problem is that we add the
@milekpl Could you maybe help with this, i.e. reply to my question above from 2014-09-24?
@danielnaber: It won't help. The encoder will be set, but frequency dictionaries have more data, and these data are not dumped properly. I tried to persuade Jaume to add code to dump frequency data, but this is not a trivial thing to do, as the source format is XML.
Jaume, I dumped the Polish dictionary and used the frequency list to encode it. But then I cannot dump the dictionary again, as there is an error: