Status: Closed
Labels: bug (Something isn't working)
Description
System Info
transformers.js version 2.14.2.
Environment/Platform
- Website/web-app
Description
When I try to use the tokenizer from https://huggingface.co/Qwen/Qwen1.5-14B-Chat with const t = new PreTrainedTokenizer(tok.tokenizer, tok.config).encode(text), I get the following error:
Uncaught (in promise) Error: SyntaxError: Invalid regular expression: /(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+/gu: Invalid group
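The error can probably be reproduced without transformers.js by compiling the pre-tokenizer pattern directly. The sketch below assumes the culprit is the leading inline modifier group (?i:...): modifier groups were only recently added to JavaScript regular expressions, so engines that predate them reject the group as "Invalid group". The helper canCompile is a name introduced here for illustration.

```javascript
// Pre-tokenizer pattern copied verbatim from the error message.
const QWEN_PATTERN = String.raw`(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+`;

// Hypothetical helper: true if this engine compiles `pattern` with the given
// flags ("gu", as in the error), false if it throws a SyntaxError.
function canCompile(pattern, flags = 'gu') {
  try {
    new RegExp(pattern, flags);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything other than a SyntaxError is unexpected, so rethrow it
  }
}

// Same pattern with the (?i:...) alternative removed.
const withoutGroup = QWEN_PATTERN.replace("(?i:'s|'t|'re|'ve|'m|'ll|'d)|", '');

// On an engine without RegExp modifier support, expect false, then true.
console.log('full pattern compiles:', canCompile(QWEN_PATTERN));
console.log('pattern without (?i:...) compiles:', canCompile(withoutGroup));
```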
Chrome's debugger doesn't tell me anything more specific than that the error is thrown by the PreTrainedTokenizer call.
All other tokenizers I've tried, such as Mistral and Llama, work perfectly, so I suspect this is some sort of compatibility bug with Qwen's tokenizer.
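One possible workaround, sketched under the assumption that (?i:...) is the only construct the engine rejects: the modifier only makes the contraction alternatives case-insensitive, so the group can be rewritten with explicit [xX] character classes that older engines accept. This is a hypothetical rewrite for illustration, not transformers.js's own fix; expandCaseInsensitiveGroup and portablePattern are names introduced here.

```javascript
// Pre-tokenizer pattern from the error message; only the leading (?i:...)
// group needs RegExp modifier support.
const QWEN_PATTERN = String.raw`(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+`;
const INLINE_GROUP = "(?i:'s|'t|'re|'ve|'m|'ll|'d)";

// Hypothetical helper: turns "(?i:body)" into "(?:body)" with every letter in
// body replaced by a [lowerUpper] class. Assumes each letter has single-character
// lower/upper forms, which holds for the ASCII contractions here.
function expandCaseInsensitiveGroup(group) {
  const body = group.slice('(?i:'.length, -1); // drop the leading "(?i:" and trailing ")"
  const expanded = body.replace(/\p{L}/gu, (ch) => {
    const lower = ch.toLowerCase();
    const upper = ch.toUpperCase();
    return lower === upper ? ch : `[${lower}${upper}]`;
  });
  return `(?:${expanded})`;
}

// Function replacement avoids any special "$" handling in the replacement string.
const portablePattern = QWEN_PATTERN.replace(INLINE_GROUP, () => expandCaseInsensitiveGroup(INLINE_GROUP));
const re = new RegExp(portablePattern, 'gu');

// Upper-case contractions are still split off, as with the original (?i:...) group.
const tokens = "I'LL say it's OK".match(re);
// expected array: ["I", "'LL", " say", " it", "'s", " OK"]
console.log(tokens);
```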
Reproduction
- Download https://huggingface.co/Qwen/Qwen1.5-14B-Chat/blob/main/tokenizer.json and https://huggingface.co/Qwen/Qwen1.5-14B-Chat/blob/main/tokenizer_config.json
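- Execute the following, where tokenizer and config are the parsed contents of tokenizer.json and tokenizer_config.json:
import { PreTrainedTokenizer } from '@xenova/transformers';
const t = new PreTrainedTokenizer(tokenizer, config).encode('foobar');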