Here, max_length=128, which leaves 125 tokens for the two texts (125 = max_length - 3 special tokens).
When len(text2)<125, no problem occurs, whatever the length of text1. The normal encoding should look like [2, (part of) token indexes of text1 ..., 0, (part of) token indexes of text2..., 0] (0 for [SEP], 2 for [CLS]).
When len(text2)>=125:
len(text1)<125: every case also works well.
len(text1)=125: the Rust error below happens.
thread '' panicked at 'assertion failed: stride < max_len', /__w/tokenizers/tokenizers/tokenizers/src/tokenizer/encoding.rs:109:9
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
Aborted (core dumped)
len(text1)>125: the length of the encoding exceeds 'max_length'; maybe an assertion is required here.
The last case may be meaningless, so I ignored it at first. But after finding the fatal runtime error, I think this should be fixed just in case.
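The length arithmetic in the cases above can be sketched in plain Python. This is only an illustration of the layout described in the report, not the library's code: the helper names `text_budget`, `build_pair`, and `fits` are hypothetical, and the only values taken from the report are max_length=128 and the ids 2 for [CLS] and 0 for [SEP].

```python
CLS_ID = 2        # [CLS] id, as stated in the report
SEP_ID = 0        # [SEP] id, as stated in the report
MAX_LENGTH = 128  # max_length used in the report
NUM_SPECIAL = 3   # one [CLS] plus two [SEP]


def text_budget(max_length=MAX_LENGTH):
    """Number of positions left for text1 and text2 together (hypothetical helper)."""
    return max_length - NUM_SPECIAL


def build_pair(ids1, ids2):
    """Lay out a pair as [CLS] text1 [SEP] text2 [SEP] (hypothetical helper)."""
    return [CLS_ID] + list(ids1) + [SEP_ID] + list(ids2) + [SEP_ID]


def fits(ids1, ids2, max_length=MAX_LENGTH):
    """True when the assembled pair does not exceed max_length."""
    return len(build_pair(ids1, ids2)) <= max_length
```

Under these assumptions, any pair whose combined token count is above 125 has to be truncated before the special tokens are added, which is where the cases above differ.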
Hi @wxupjack and thank you for your report. There is indeed a bug in the LongestFirst strategy (the default truncation strategy) that makes it truncate only the longest sequence, not both. So when it is expected to truncate one of them entirely, it crashes.
Fixed with this commit: f8f0702. Will be part of the next release!
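For readers unfamiliar with the strategy, here is a minimal pure-Python sketch of what a longest-first truncation is generally expected to do: drop one token at a time from whichever sequence is currently longer until the pair fits the budget, so both sequences can end up shortened. It is not the Rust implementation and not the change in f8f0702; the function name `longest_first_truncate` and the tie-breaking rule (trim the second sequence on equal lengths) are assumptions made for illustration.

```python
def longest_first_truncate(ids1, ids2, budget):
    """Trim two token lists so that len(a) + len(b) <= budget.

    Illustrative sketch only. Tokens are removed from the end of the
    currently longer list; on a tie the second list is trimmed
    (an assumed tie-break, not taken from the library).
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    a, b = list(ids1), list(ids2)
    while len(a) + len(b) > budget:
        if len(a) > len(b):
            a.pop()  # first sequence is strictly longer: trim it
        else:
            b.pop()  # second is longer, or lengths are equal: trim it
    return a, b


# The case from the report: text1 has exactly 125 tokens, text2 has at least 125.
a, b = longest_first_truncate(range(125), range(130), budget=125)
print(len(a), len(b), len(a) + len(b))
```

With this approach neither sequence is removed entirely in the reported case; both are cut to roughly half of the budget.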