Fix encoding to render Arabic text correctly. #29
It looks like you have just changed the default argument value to
and
instead? If so, could you please send me the corpus you used so I can debug the issue?
I tried a couple of options to enforce the encoding, but nothing worked. It only worked when I changed the encoding in the symspellpy.py file. Attached is a sample file to test. Thanks for your support. Please feel free to reject this PR if you wish. One last question:
I think this could instead be a problem with how the terminal displays Arabic text, due to the special features of the Arabic script mentioned here. The terminal doesn't display it properly, while the text file does.

This script demonstrates that:

    import os

    from symspellpy.symspellpy import SymSpell, Verbosity  # import the module

    def main():
        sym_spell = SymSpell(83000, 2, 7)
        # load dictionary
        corpus_path = os.path.join(os.path.dirname(__file__),
                                   "sample_text_ar.txt")
        dict_path = os.path.join(os.path.dirname(__file__), "dict.txt")
        corrected_path = os.path.join(os.path.dirname(__file__), "corrected.txt")
        with open(corpus_path, encoding="utf-8-sig") as infile:
            for line in infile:
                print(line)

Last portion of the

        if not sym_spell.create_dictionary(corpus_path, encoding="utf-8-sig"):
            print("Corpus file not found")
            return
        with open(dict_path, "w", encoding="utf-8") as outfile:
            for key, count in sym_spell.words.items():
                print("{} {}".format(key, count))
                outfile.write("{} {}\n".format(key, count))

The console output is

        results = sym_spell.lookup("كفية", Verbosity.TOP)
        with open(corrected_path, "w", encoding="utf-8") as outfile:
            for result in results:
                print(result)
                outfile.write(str(result))

Finally, we can correct a misspelled word from

    if __name__ == "__main__":
        main()

So I think you could look into an alternative terminal that supports displaying Arabic script, or print all your outputs to a text file for debugging.
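One way to tell a display problem apart from a decoding problem is to inspect the code points of a string instead of its rendered glyphs. The sketch below is only an illustration using the standard library; the helper name `describe` is hypothetical and not part of symspellpy. If the code points and names are the expected Arabic letters, the data was read correctly and any garbling is happening in the terminal or IDE output window.

```python
import sys
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text.

    Hypothetical helper for debugging; not part of symspellpy.
    """
    return [
        ("U+{:04X}".format(ord(ch)), unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]


# The word looked up in the script above. Only ASCII is printed here, so the
# output is readable even when the console cannot render Arabic script.
word = "كفية"
for code_point, name in describe(word):
    print(code_point, name)

# The encoding Python uses when writing to the console; a value other than
# a UTF-8 variant is one possible (assumed) reason for garbled output.
print("stdout encoding:", sys.stdout.encoding)
```

If the names printed are all `ARABIC LETTER ...`, the string itself is intact, which supports the suggestion to write results to a UTF-8 text file and inspect them there.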
@mammothb Thank you so much for taking the time. Glad that it looks OK. One more thing: I was not using a terminal. I modified the create dictionary script to print out to a file. Before doing that, I was using PyCharm, and the Arabic text was still garbled in the PyCharm output window. The code I used was this:
I will use your code above and test it. Thanks a million for your support and for providing such an amazing port of SymSpell. Thank you!
Thanks @mammothb! I tested the code and it works fine on my end now with Python 3.5.2. I will close this issue and the other GitHub issue. Thanks again for your support.
Referencing this issue https://github.com/mammothb/symspellpy/issues/28, the Arabic rendering issue was fixed by changing these 2 lines in the symspellpy.py file.
I hope this helps.
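For readers wondering why the `encoding` argument matters here, the following is a minimal sketch using only the standard library. It does not reproduce the two changed lines; the file contents and the helper `read_first_token` are made up for illustration. It shows two effects that can make Arabic dictionary terms look wrong: a UTF-8 byte order mark surviving at the start of the first term, and UTF-8 bytes being decoded with a single-byte codec (as can happen when a platform's default encoding is not UTF-8).

```python
import os
import tempfile


def read_first_token(path, encoding):
    """Read the first whitespace-separated token of a file (hypothetical helper)."""
    with open(path, encoding=encoding) as infile:
        return infile.readline().split()[0]


word = "كفية"

# Write a tiny "term count" file that starts with a UTF-8 byte order mark,
# as some editors on Windows do.
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".txt")
tmp.write(b"\xef\xbb\xbf" + (word + " 1\n").encode("utf-8"))
tmp.close()

with_bom = read_first_token(tmp.name, "utf-8")         # BOM kept as U+FEFF
without_bom = read_first_token(tmp.name, "utf-8-sig")  # BOM stripped
os.remove(tmp.name)

# Decoding UTF-8 bytes with a single-byte codec yields one garbage character
# per byte instead of one Arabic letter per two bytes.
garbled = word.encode("utf-8").decode("latin-1")

# ascii() escapes non-ASCII characters, so this prints safely on any console.
print(ascii(with_bom))
print(ascii(without_bom))
print(len(word), len(garbled))
```

With `utf-8` the first term carries an invisible leading `\ufeff`, so it would not match the same word typed by a user, while `utf-8-sig` removes it. Passing the encoding explicitly, as the script in the conversation does with `create_dictionary(corpus_path, encoding="utf-8-sig")`, avoids depending on the platform default.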