Support nlp inference and on-device tokenization #17
Conversation
Coverage report: diff coverage details for nengo_edge/network_runner.py, nengo_edge/version.py, nengo_edge/saved_model_runner.py, and nengo_edge/device_modules/network_tokenizer.py.
Looks good, just minor comments.
nengo_edge/saved_model_runner.py (outdated)
```python
# Process string inputs
assert self.model_params["type"] == "nlp"
assert self.tokenizer is not None
return self._run_model(self.tokenizer.tokenize(inputs))
```
Would this work if we did `inputs = self.tokenizer.tokenizer(inputs).numpy()`? It'd be nice to share the `_run_model` call if possible, just to have as much of the pipeline shared as possible.
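For concreteness, a rough sketch of what that sharing could look like (attribute names are taken from the diff above; the real `SavedModelRunner` internals may differ):

```python
def run(self, inputs):
    if self.model_params["type"] == "nlp":
        assert self.tokenizer is not None
        # Convert raw strings to token IDs up front so NLP inputs flow
        # through the same _run_model call as the other model types.
        inputs = self.tokenizer.tokenize(inputs).numpy()
    return self._run_model(inputs)
```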
We should do `inputs.lower()` here too, since that's what we're currently assuming in the tokenizer training.
I had the pipeline shared at one point, but I'll have to test it out again because I'm pretty sure something wasn't working.
Also, I've adjusted our model/tokenizer training pipeline to more closely match how we've done it in past projects. We can discuss that further, as I'm not sure which approach is more accurate for the vocabulary sizes we're considering.
```python
]

ragged_in0 = [inputs[0]]  # type: ignore
ragged_in1 = [inputs[1]]  # type: ignore
```
What was it complaining about with these type errors?
I think it was because `ragged_in0`/`ragged_in1` are already defined in the ASR block above, so it was a re-assignment problem, if I recall correctly.
```
Parameters
----------
input_text : str
    Input string to be tokenized.
```
`detokenize` takes a batch of tokens, but `tokenize` just takes a single string. We should probably make them both the same, for consistency. I think we could basically rename `decode_ids` to `detokenize` (supporting non-batched inputs), and then the batch loop could go in `.run` (same as `tokenize`). So in both cases `tokenize`/`detokenize` are for single inputs, and `run` handles the batching.
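A sketch of that proposed split (method names are from the discussion; everything else is assumed):

```python
import numpy as np

class RunnerSketch:
    """tokenize/detokenize handle single inputs; run owns the batch loop."""

    def __init__(self, tokenizer, run_model):
        self.tokenizer = tokenizer  # e.g. a SentencePiece tokenizer wrapper
        self._run_model = run_model

    def tokenize(self, text: str) -> np.ndarray:
        # Single string in, token IDs out.
        return self.tokenizer.tokenize(text.lower()).numpy()

    def detokenize(self, ids: np.ndarray) -> str:
        # Renamed from decode_ids; takes a single (non-batched) sequence.
        return self.tokenizer.detokenize(ids).numpy().decode("utf-8")

    def run(self, inputs: list[str]) -> list[str]:
        # Batching lives here, symmetric for both directions.
        token_ids = [self.tokenize(t) for t in inputs]
        outputs = self._run_model(token_ids)
        return [self.detokenize(ids) for ids in outputs]
```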
nengo_edge/saved_model_runner.py (outdated)
```diff
- # Detokenize asr outputs using greedy decoding
+ # Detokenize asr outputs by applying greedy decoding, removing
+ # blank tokens and merging repeats before feeding values into the
+ # sentencepiece tokenizer
```
Do you remember why we didn't use `keras.backend.ctc_decode` here? Seems like we could, and it'd take care of this for us.
Yeah, that's fair. I just forgot about that function when I was making this fixup. I think we also initially forgot to include it in this step, and our original test didn't include blanks/repeats (or at least they didn't impact the outcome of the test).
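For reference, a minimal sketch of the manual decoding the updated comment describes (argmax per frame, merge repeats, drop blanks); the PR ends up delegating this to `tf.keras.backend.ctc_decode` instead, and the blank index convention here is an assumption:

```python
import numpy as np

def greedy_ctc_decode(logits: np.ndarray, blank: int = 0) -> list[int]:
    # Greedy path: most likely token at each time step.
    best = logits.argmax(axis=-1)
    # CTC rule: merge consecutive repeats first, then strip blank tokens.
    merged = [t for i, t in enumerate(best) if i == 0 or t != best[i - 1]]
    return [int(t) for t in merged if t != blank]
```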
```python
)
output = subprocess.run(
    cmd.split(),
    input=input_text,
```
Should do an `input_text.lower()` here too.
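A sketch of the fixed call; the command and model path are hypothetical examples of how the tokenizer might invoke the SentencePiece CLI on the device:

```python
import subprocess

input_text = "Hello World"
# Hypothetical command; spm_encode is the SentencePiece CLI encoder.
cmd = "ssh device spm_encode --model=tokenizer.model --output_format=id"

output = subprocess.run(
    cmd.split(),
    input=input_text.lower(),  # match lowercasing assumed in tokenizer training
    capture_output=True,
    text=True,
    check=True,
)
token_ids = [int(t) for t in output.stdout.split()]
```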
I'm fine not to include these changes if they don't seem that important. Looks good overall though.
```python
outputs = np.array(_outputs)

# Apply CTC decoding
outputs = tf.keras.backend.ctc_decode(
    tf.nn.softmax(ragged.to_masked(outputs)),
```
I'm not sure we actually need the softmax here, despite the model outputting logits. The internal `argmax` during Keras' `ctc_decode` should work the same with or without it. It's probably fine to leave it though, not a big difference.
I thought that too, but for some reason it doesn't. If you just pass in the logits you get a very different and incorrect decoding (this is the bug that Lisa was running into a while back). I never really investigated why, though; my guess is that somewhere in there they're assuming these sum to 1 or something like that.
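A small repro sketch of that behaviour (shapes and values are dummy assumptions). One plausible explanation, consistent with the guess above: `ctc_decode` applies a `log()` to its input before decoding, so it expects normalized probabilities, and raw logits can be negative, which makes that log invalid:

```python
import tensorflow as tf

logits = tf.random.normal((1, 50, 30))  # (batch, time, vocab), dummy values
lengths = tf.constant([50])

(with_softmax,), _ = tf.keras.backend.ctc_decode(tf.nn.softmax(logits), lengths)
(without_softmax,), _ = tf.keras.backend.ctc_decode(logits, lengths)
# with_softmax is the correct greedy decoding; without_softmax generally differs.
```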
```diff
@@ -73,8 +73,8 @@ def test_runner_ragged(
     _, tokenizer_path = new_tokenizer(tmp_path)

     if model_type == "asr":
-        assert isinstance(pipeline.post[0], TokenizerDesc)
-        pipeline.post[0].tokenizer_file = tokenizer_path
+        assert isinstance(pipeline.post[1], TokenizerDesc)
+        pipeline.post[1].tokenizer_file = tokenizer_path
```
Can we reference these with `-1` indexing, since we know the tokenizer will always be at the end of a pipeline? Just so we avoid having to open a PR to change the indexing if the other layers change.
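For concreteness, the suggestion would look something like this in the test:

```python
# Reference the tokenizer via negative indexing, assuming it is always
# the last post-processing step, so the test doesn't depend on how many
# layers precede it.
assert isinstance(pipeline.post[-1], TokenizerDesc)
pipeline.post[-1].tokenizer_file = tokenizer_path
```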
This PR primarily adds support in the `SavedModelRunner` for text/string-based inputs during NLP model inference. A new class named `NetworkTokenizer` was also added to `device_modules`, which allows users to make network calls to a device with the SentencePiece CLI installed.