Description
Is your feature request related to a problem? Please describe.
`SentencePieceTokenizer`, which is a wrapper for `tf_text.SentencepieceTokenizer`, currently does not expose all of the parameters that can be passed to `tf_text.SentencepieceTokenizer.__init__()`.
The parameters I am particularly interested in are `add_bos` and `add_eos`. As things stand, users must explicitly add the token ids for `'<s>'` and `'</s>'` (which default to 1 and 2) to the result of `SentencePieceTokenizer.tokenize()`.
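For illustration, the manual workaround amounts to something like the following. This is a minimal sketch that assumes plain Python lists of ids stand in for the ragged output of `tokenize()`, and that `'<s>'` and `'</s>'` keep their default ids 1 and 2; `add_special_tokens` is a hypothetical helper, not part of any library.

```python
# Assumed default ids for '<s>' and '</s>'; check these against your model.
BOS_ID = 1
EOS_ID = 2


def add_special_tokens(token_ids, add_bos=True, add_eos=True):
    """Hypothetical helper: wrap each tokenized sequence with BOS/EOS ids.

    `token_ids` is a list of id lists, standing in for the ragged
    output of SentencePieceTokenizer.tokenize().
    """
    result = []
    for ids in token_ids:
        seq = list(ids)
        if add_bos:
            seq = [BOS_ID] + seq  # prepend '<s>'
        if add_eos:
            seq = seq + [EOS_ID]  # append '</s>'
        result.append(seq)
    return result


print(add_special_tokens([[45, 67], [8]]))  # [[1, 45, 67, 2], [1, 8, 2]]
```

Every caller has to repeat this step, which is what exposing `add_bos`/`add_eos` would avoid.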
Describe the solution you'd like
Add `add_bos=False` and `add_eos=False` to `SentencePieceTokenizer.__init__()`, save them as `self.add_bos` and `self.add_eos`, and pass them through in `set_proto()` when initializing `tf_text.SentencepieceTokenizer`.
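A rough sketch of what the proposed change could look like is below. It is a trimmed-down, hypothetical version of the class showing only the pieces relevant to this request, not the actual keras_nlp source; the `_sentencepiece_kwargs` helper is an illustrative name. It relies on `tf_text.SentencepieceTokenizer` accepting `add_bos` and `add_eos` keyword arguments, and imports TF Text lazily so the argument plumbing can be read and checked on its own.

```python
class SentencePieceTokenizer:
    """Hypothetical, trimmed-down sketch of the proposed change.

    The real keras_nlp class has more arguments and logic; only the
    new add_bos/add_eos plumbing is shown here.
    """

    def __init__(self, proto=None, add_bos=False, add_eos=False):
        # Store the new flags so set_proto() can forward them.
        self.add_bos = add_bos
        self.add_eos = add_eos
        self._sentence_piece = None
        if proto is not None:
            self.set_proto(proto)

    def _sentencepiece_kwargs(self):
        # Hypothetical helper: extra keyword arguments forwarded to
        # tf_text.SentencepieceTokenizer.__init__().
        return {"add_bos": self.add_bos, "add_eos": self.add_eos}

    def set_proto(self, proto):
        # Lazy import: only needed once a model proto is actually set.
        import tensorflow_text as tf_text

        self._sentence_piece = tf_text.SentencepieceTokenizer(
            model=proto, **self._sentencepiece_kwargs()
        )

    def tokenize(self, inputs):
        # BOS/EOS ids are now added by tf_text itself when requested.
        return self._sentence_piece.tokenize(inputs)
```

Keeping both defaults at `False` would preserve the current behavior for existing users.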
Describe alternatives you've considered
It's always possible to write a custom `Tokenizer` by subclassing, but this change seemed simple enough to add to `SentencePieceTokenizer` directly.