Issues: huggingface/tokenizers
#1567: Tokenizer.from_bytes() not available in python bindings [Feature Request] (opened Jul 11, 2024 by RamvigneshPasupathy)
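A minimal sketch of the workaround available today, assuming the serialized tokenizer bytes are UTF-8 JSON: decode them and go through `Tokenizer.from_str`, which the Python bindings do expose (the file path here is a placeholder).

```python
from tokenizers import Tokenizer

# The Rust crate has Tokenizer::from_bytes, but the Python bindings only
# expose from_str/from_file, so decode the bytes first.
with open("tokenizer.json", "rb") as f:  # placeholder path
    raw = f.read()

tokenizer = Tokenizer.from_str(raw.decode("utf-8"))
```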
#1565: Documentation of the pattern parameter in pre_tokenizers.Split is incorrect (opened Jul 10, 2024 by craigschmidt)
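For context, a small sketch of the behavior the issue is about: in the Python bindings a plain string pattern is matched literally, and a regex split requires wrapping the pattern in `tokenizers.Regex`.

```python
from tokenizers import Regex, pre_tokenizers

# A bare string is treated as a literal match...
literal = pre_tokenizers.Split(pattern=" ", behavior="removed")
# ...while regex matching requires an explicit tokenizers.Regex wrapper.
regex = pre_tokenizers.Split(pattern=Regex(r"\s+"), behavior="removed")

print(literal.pre_tokenize_str("a \t b"))
print(regex.pre_tokenize_str("a \t b"))
```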
#1563: Unable to set use_regex=False in BPE decoder & post_processor? (opened Jul 8, 2024 by jchwenger)
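As a point of reference, a sketch of where `use_regex` currently appears in the Python API, assuming a ByteLevel pipeline: the pre-tokenizer accepts it, while the matching decoder and post-processor constructors take no such flag, which is what the question is about.

```python
from tokenizers import decoders, pre_tokenizers, processors

# use_regex is a parameter of the ByteLevel pre-tokenizer only.
pre_tok = pre_tokenizers.ByteLevel(add_prefix_space=False, use_regex=False)

# The decoder and post-processor have no use_regex argument.
decoder = decoders.ByteLevel()
post_processor = processors.ByteLevel(trim_offsets=False)
```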
#1552: [Bug?] Modifying normalizer for pretrained tokenizers doesn't consistently work [Stale] (opened Jun 12, 2024 by alvations)
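A minimal sketch of the pattern in question, assuming a transformers fast tokenizer and an arbitrary example checkpoint: the backend normalizer is replaced by assignment, which the report says does not take effect consistently across tokenizer classes.

```python
from tokenizers import normalizers
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-cased")  # example checkpoint

# Swap in a new normalizer on the backend tokenizer; per the issue, the
# effect of this assignment is not consistent across tokenizer classes.
tok.backend_tokenizer.normalizer = normalizers.Sequence(
    [normalizers.NFKC(), normalizers.Lowercase()]
)
print(tok.backend_tokenizer.normalizer.normalize_str("Héllo World"))
```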
"Solution" to memory hogging in train_new_from_iterator with a hack
#1546
opened Jun 4, 2024 by
morphpiece
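The usual low-memory alternative (a sketch, with a placeholder corpus path) is to stream batches from disk through a generator rather than materializing the whole corpus before calling `train_new_from_iterator`.

```python
from transformers import AutoTokenizer

old_tokenizer = AutoTokenizer.from_pretrained("gpt2")  # example base tokenizer

def batch_iterator(path="corpus.txt", batch_size=1_000):
    """Yield lists of lines so the full corpus never sits in memory."""
    batch = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            batch.append(line)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch

new_tokenizer = old_tokenizer.train_new_from_iterator(
    batch_iterator(), vocab_size=32_000
)
```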
#1545: How can I get the mapping relationship between byte values and Unicode characters of the fast tokenizer? (opened Jun 4, 2024 by LuoKaiGSW)
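For GPT-2-style byte-level tokenizers, this mapping is the well-known bytes_to_unicode table; a sketch of recovering it from transformers, assuming a GPT-2-style fast tokenizer is what is being inspected:

```python
from transformers.models.gpt2.tokenization_gpt2 import bytes_to_unicode

# bytes_to_unicode maps every byte value 0-255 to a printable unicode
# character; inverting it recovers bytes from the characters seen in tokens.
byte_to_char = bytes_to_unicode()
char_to_byte = {c: b for b, c in byte_to_char.items()}

print(byte_to_char[32])   # the space byte renders as 'Ġ'
print(char_to_byte["Ġ"])  # 32
```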
#1534: How to allow the merging of consecutive newline tokens \n when training a byte-level BPE tokenizer? (opened May 18, 2024 by liuslnlp)
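One commonly suggested approach (a sketch, under the assumption that the default ByteLevel split regex is what keeps the newlines apart) is to train with use_regex=False so whitespace runs stay inside a single pre-token:

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE())
# Disabling the GPT-2 split regex keeps runs like "\n\n" in one pre-token,
# so the trainer has the chance to merge consecutive newlines.
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(
    add_prefix_space=False, use_regex=False
)

trainer = trainers.BpeTrainer(
    vocab_size=8_000,
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),
)
tokenizer.train_from_iterator(
    ["first paragraph\n\nsecond paragraph\n\n"], trainer=trainer
)
```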
#1526: Link to download the training text in docs/source/quicktour.rst is broken (opened May 9, 2024 by 14jdelap)
#1515: UnigramTrainer: byte_fallback is false [Feature Request, training] (opened Apr 25, 2024 by Moddus)
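A sketch of the current behavior the request targets: training a Unigram model through the trainer, where byte fallback cannot be requested from the trainer's parameters.

```python
from tokenizers import Tokenizer, models, trainers

tokenizer = Tokenizer(models.Unigram())
trainer = trainers.UnigramTrainer(
    vocab_size=8_000,
    unk_token="<unk>",
    special_tokens=["<unk>"],
)
tokenizer.train_from_iterator(["some training text"], trainer=trainer)
# The trained model comes back with byte_fallback disabled; the feature
# request asks for a way to enable it from the trainer.
```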
#1514: BPE Trainer doesn't respect the vocab_size parameter when dataset size is increased [Stale] (opened Apr 25, 2024 by Abhinay1997)
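A minimal reproduction sketch (toy corpus and hypothetical sizes): set vocab_size on BpeTrainer and compare it with the trained vocabulary size, which the report says grows past the cap on larger datasets.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

corpus = ["example sentence number %d" % i for i in range(10_000)]  # toy data

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(vocab_size=5_000, special_tokens=["[UNK]"])
tokenizer.train_from_iterator(corpus, trainer=trainer)

# Per the report, this can exceed the requested 5,000 on larger corpora.
print(tokenizer.get_vocab_size())
```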