Custom tokenizer layer #75
@ptamas88 Did you succeed in making it work? I have the same question.
Yes, usually the tokenizer is not part of the graph. For this you'll need a tokenizer that has a TF implementation, like sentencepiece when using ALBERT. For BERT you might try the tf.text BertTokenizer (https://github.com/tensorflow/text/blob/master/docs/api_docs/python/text/BertTokenizer.md) - I haven't used it myself, but it should work.
Hope that helps; then try something along those lines:

```python
import os

import tensorflow_text as text

# ckpt_dir is the checkpoint directory containing vocab.txt;
# max_seq_len is the fixed sequence length to pad to.
tokenizer = text.BertTokenizer(os.path.join(ckpt_dir, 'vocab.txt'))

# tokenize() returns a RaggedTensor of shape [batch, words, wordpieces];
# merge the last two dimensions and pad to a dense tensor.
tok_ids = tokenizer.tokenize(
    ["hello, cruel world!", "abcccccccd"]
).merge_dims(-2, -1).to_tensor(shape=(2, max_seq_len))
```
It didn't work; it still throws OperatorNotAllowedInGraphError.
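OperatorNotAllowedInGraphError typically means Python-level code is iterating over or branching on a symbolic tensor inside the graph. The usual escape hatch for a tokenizer with no TF implementation is `tf.py_function`, which runs the wrapped function eagerly so `.numpy()` is available again. A sketch, using a hypothetical placeholder tokenizer (`fake_tokenize` and `MAX_SEQ_LEN` are illustrative, not from the thread):

```python
import numpy as np
import tensorflow as tf

MAX_SEQ_LEN = 8


def fake_tokenize(s):
    # Placeholder for a real Python tokenizer that has no TF ops.
    return [hash(w) % 1000 + 1 for w in s.split()]


def py_tokenize(batch):
    # Inside tf.py_function the input arrives as an eager tensor,
    # so .numpy() works and plain Python string handling is allowed.
    rows = [fake_tokenize(s.decode("utf-8")) for s in batch.numpy()]
    padded = np.zeros((len(rows), MAX_SEQ_LEN), dtype=np.int32)
    for i, row in enumerate(rows):
        padded[i, :len(row[:MAX_SEQ_LEN])] = row[:MAX_SEQ_LEN]
    return padded


def tokenize_op(batch):
    out = tf.py_function(py_tokenize, [batch], tf.int32)
    # py_function loses static shape info; restore it for downstream layers.
    out.set_shape((None, MAX_SEQ_LEN))
    return out
```

The trade-off is that `tf.py_function` runs in the Python interpreter, so it is slower than in-graph ops and the resulting model cannot be serialized without the surrounding Python code.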
Hi,
I would like to incorporate the tokenization process into a model which is using a BERT layer.
Here is my custom layer:
And here is my code to test the custom layer within a dummy model:
I get the following traceback:
Can you please help how to solve this issue?
I think the problem is that the tokenizer receives tensors rather than strings, and that is why it can't tokenize them.
But if that is the case, how should I make this work?
Thanks