UnicodeEncodeError: 'charmap' codec can't encode characters in position 31758-31761: character maps to <undefined> - when fetching the DBPedia_IT dataset
#57
Closed
ERijck opened this issue
Mar 16, 2022
· 2 comments
@ERijck I faced a similar error while using the preprocessor.preprocess_dataset() function that comes with OCTIS. It turned out that my dataset contained some Unicode characters and emojis.
The error you are facing comes from line 77 of downloader.py, where OCTIS creates the dataset files. Link here:
You can fork the repository and change these lines until OCTIS provides Unicode support for datasets.
You can edit the files installed in your environment (not recommended).
NOTE:
If you end up making this modification yourself before OCTIS does, you also need to change the corresponding functions that read the "Dataset" files before you start training a model.
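For illustration, here is a minimal sketch of the kind of change meant above. It is not the actual OCTIS code: `write_lines` and `read_lines` are hypothetical helpers, and the file name is made up. The assumption is that the files are opened without an explicit encoding, so on Windows Python falls back to a code page such as cp1252 (reported as the 'charmap' codec), which cannot encode emojis. Passing `encoding="utf-8"` on both the write and the read side avoids that.

```python
import os
import tempfile


def write_lines(path, lines, encoding="utf-8"):
    """Hypothetical helper: write one document per line with an explicit encoding."""
    with open(path, "w", encoding=encoding) as f:
        for line in lines:
            f.write(line + "\n")


def read_lines(path, encoding="utf-8"):
    """Hypothetical helper: read the file back with the same explicit encoding."""
    with open(path, "r", encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


docs = ["caffè e città", "emoji \U0001F600"]
path = os.path.join(tempfile.mkdtemp(), "corpus.txt")

# Simulate the Windows default: cp1252 has no mapping for the emoji,
# so writing raises UnicodeEncodeError ('charmap' codec).
try:
    write_lines(path, docs, encoding="cp1252")
    failed = False
except UnicodeEncodeError:
    failed = True

# With UTF-8 on both sides the text survives a full round trip.
write_lines(path, docs)
round_trip = read_lines(path)
```

The reading side matters as much as the writing side: a file written as UTF-8 but read with the platform default can fail or come back garbled, which is why the functions that load the dataset files need the same change.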
Description
I am trying to fetch the DBPedia_IT dataset. I expected the call to finish without errors, but a UnicodeEncodeError was raised.
What I Did