Cloning https://huggingface.co/Bingsu/vitstr_small-korean into local empty directory.
WARNING:huggingface_hub.repository:Cloning https://huggingface.co/Bingsu/vitstr_small-korean into local empty directory.
Pulling changes ...
WARNING:huggingface_hub.repository:Pulling changes ...
Upload file pytorch_model.bin: 87%|███████████████████████████████████████▉ | 71.1M/81.9M [00:06<00:00, 14.8MB/s]
remote: Scanning LFS files for validity, may be slow...
remote: LFS file scan complete.
To https://huggingface.co/Bingsu/vitstr_small-korean
5aefb96..101a2bf main -> main
WARNING:huggingface_hub.repository:remote: Scanning LFS files for validity, may be slow...
remote: LFS file scan complete.
To https://huggingface.co/Bingsu/vitstr_small-korean
5aefb96..101a2bf main -> main
Upload file pytorch_model.bin: 100%|██████████████████████████████████████████████| 81.9M/81.9M [00:09<00:00, 9.45MB/s]
Traceback (most recent call last):
File "C:\Users\smartmind\Desktop\workspace\test\train_ocr\train_pytorch.py", line 468, in <module>
main(args)
File "C:\Users\smartmind\Desktop\workspace\test\train_ocr\train_pytorch.py", line 405, in main
push_to_hf_hub(model, exp_name, task="recognition", run_config=args)
File "C:\Users\smartmind\miniconda3\envs\ocr\lib\site-packages\doctr\models\factory\hub.py", line 179, in push_to_hf_hub
_save_model_and_config_for_hf_hub(model, repo.local_dir, arch=arch, task=task)
File "C:\Users\smartmind\miniconda3\envs\ocr\lib\site-packages\doctr\models\factory\hub.py", line 87, in _save_model_and_config_for_hf_hub
json.dump(model_config, f, indent=2, ensure_ascii=False)
File "C:\Users\smartmind\miniconda3\envs\ocr\lib\json\__init__.py", line 180, in dump
fp.write(chunk)
UnicodeEncodeError: 'cp949' codec can't encode character '\xa3' in position 98: illegal multibyte sequence
Environment
DocTR version: N/A
TensorFlow version: N/A
PyTorch version: 1.13.1 (torchvision 0.14.1)
OpenCV version: 4.7.0
OS: Microsoft Windows 11 Pro
Python version: 3.10.8
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): Yes
CUDA runtime version: 11.7.99
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
Deep Learning backend
is_tf_available: False
is_torch_available: True
Bug description
I tried to train a Korean-language recognition model with the
doctr/references/recognition/train_pytorch.py
script and got encoding errors. The first error is here:
doctr/doctr/datasets/recognition.py
Lines 39 to 40 in e66ce01
The second is here:
doctr/doctr/models/factory/hub.py
Lines 86 to 87 in e66ce01
I'm using Korean Windows 11, where the default encoding is 'cp949'. Both places open a file without specifying an encoding, so the file is read or written as 'cp949' instead of 'utf-8', which causes the errors.
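A minimal sketch of the failure and of one possible fix, assuming the problem is only the missing encoding argument to open(). The dump_config helper and the sample model_config are hypothetical stand-ins, not doctr code. 'cp949' is passed explicitly here so the behaviour is the same on any OS:

```python
import json
import os
import tempfile

# Hypothetical config: Korean characters plus '£' (U+00A3), the character
# reported as '\xa3' in the traceback.
model_config = {"vocab": "가나다£"}


def dump_config(path, config, encoding):
    """Write config as JSON, keeping non-ASCII characters unescaped."""
    with open(path, "w", encoding=encoding) as f:
        json.dump(config, f, indent=2, ensure_ascii=False)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")

    # Emulates open(path, "w") on a system whose default encoding is cp949.
    try:
        dump_config(path, model_config, encoding="cp949")
        cp949_failed = False
    except UnicodeEncodeError:
        cp949_failed = True

    # Passing encoding="utf-8" explicitly avoids the locale default.
    dump_config(path, model_config, encoding="utf-8")
    with open(path, encoding="utf-8") as f:
        restored = json.load(f)

print("cp949 write failed:", cp949_failed)
print("utf-8 round trip ok:", restored == model_config)
```

The same explicit encoding would presumably also be needed where the labels file is read, since a utf-8 JSON file decoded as cp949 fails or yields wrong characters.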
Code snippet to reproduce the bug
It probably won't raise an error on a Windows system where the default encoding is utf-8.
dataset:
https://drive.google.com/file/d/1RN6pQAELWGYmwt1y6xnF6Xj0dO5RKU-Q/view?usp=share_link
(344k images, 1GB)
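To check whether a given machine is likely to be affected, it may help to print the encoding that open() falls back to when no encoding is passed. This is a small sketch using only the standard library; the default_text_encoding helper is a hypothetical name:

```python
import locale
import sys


def default_text_encoding() -> str:
    """Return the encoding open() uses when no encoding argument is given."""
    return locale.getpreferredencoding(False)


enc = default_text_encoding()
print("default text encoding:", enc)
# Python's UTF-8 mode (python -X utf8 or PYTHONUTF8=1) makes this utf-8.
print("UTF-8 mode enabled:", bool(sys.flags.utf8_mode))
```

On Korean Windows without UTF-8 mode this is expected to report cp949, in which case the dataset above should trigger the errors.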