Error running ./WiktionarySplitter.sh #81
Comments
As best I can tell from the messages, the XML file is not valid UTF-8. Maybe a newer or different version of Xerces would be less picky, but I doubt it.
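To confirm that diagnosis before touching the parser, it can help to locate the first offending byte in the dump. The following is a minimal sketch in Python (not part of this project, which is Java); the function name `first_invalid_utf8_offset` and the chunk size are illustrative assumptions. It streams the file so that a multi-gigabyte dump does not have to fit in memory.

```python
import codecs


def first_invalid_utf8_offset(path, chunk_size=1 << 20):
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    Hypothetical helper for illustration; not part of the project's scripts.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    position = 0  # absolute offset of the next byte to be read from the file
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            # Bytes of an incomplete sequence carried over from the last chunk.
            pending = len(decoder.getstate()[0])
            try:
                # An empty chunk means end of file, so flush with final=True.
                decoder.decode(chunk, final=not chunk)
            except UnicodeDecodeError as error:
                # error.start is relative to (carried-over bytes + chunk).
                return position - pending + error.start
            if not chunk:
                return None
            position += len(chunk)
```

Calling it on the extracted XML dump returns `None` for a clean file, or an offset that can be inspected with a hex viewer to see what the bad bytes actually are.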
Do I have to run ./WiktionarySplitter.sh if I use my own DE-EN.txt file, or can I generate the dictionary directly? I found test files in DictionaryPC that I want to try...
You need WiktionarySplitter (and the scripts that download the Wiktionary data) only if you actually want to use the data from Wiktionary. So I guess the answer should be "no".
I'm getting a similar issue:
I guess the best you can do is to fix this encoding:
Really, really short answer: Wiktionary really ought to run XML validation on their data, in which case they would catch and fix this themselves instead of us having to deal with bad data...
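Until that happens upstream, a quick well-formedness check on a downloaded dump can at least show where the parser will choke before the splitter is run. This is a rough sketch using Python's bundled expat parser as a stand-in for Xerces. The function name `check_well_formed` is hypothetical, and well-formedness is a weaker check than full validation against a schema.

```python
import xml.parsers.expat


def check_well_formed(path):
    """Return None if the file parses, else (line, column, message).

    Hypothetical helper; expat stands in for the Xerces parser used by the
    Java tools, so the exact error wording will differ.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        with open(path, "rb") as handle:
            # ParseFile reads the file incrementally, so large dumps are fine.
            parser.ParseFile(handle)
    except xml.parsers.expat.ExpatError as error:
        message = xml.parsers.expat.ErrorString(error.code)
        return error.lineno, error.offset, message
    return None
```

A non-`None` result gives the line and column of the first problem, which is usually enough to find the offending page in the dump.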
This time I got the error only intermittently when running dictionary generation several times.
I think it might be fixed actually... I've run it quite a few times and not seen this anymore.
I always get this error when I try to run ./WiktionarySplitter.sh. What can I do to avoid this? I use a Debian 9 system.