```python
import json

import germansentiment
import torch

# Initialize the model
model = germansentiment.SentimentModel()

# Dummy input that matches the input dimensions of the model
dummy_input = torch.randint(0, 30_000, (1, 512), dtype=torch.long)

# Export to ONNX
torch.onnx.export(model.model, dummy_input, "german_sentiment_model.onnx")

# Export the vocab
with open('vocab.json', 'w') as f:
    json.dump(model.tokenizer.vocab, f)
```
1. Is this correct?
2. Why do some keys in the vocab.json start with `##`?
3. Why are some keys named `["unused{x}"]`?
4. Why does the prediction not scale from 0 to 1, but instead return signed floats?
5. Why do some strings not work in my version? The string "Ein scheiß Film" works on Hugging Face but not in the export.
6. Why are some keys in capital letters, while the text is always converted to lower case?
Hi @sehHeiden
this is an interesting question. The problem is the tokenization; the process is a bit more complex than splitting the text into words. Longer and compound words get split up into individual tokens, which works a bit like a simple compression algorithm. The Hugging Face team has a library for all the different tokenizers. To make it work, you would need to implement the BertTokenizer in Elixir or build a wrapper for the compiled Rust tokenizers from this lib.
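The subword splitting (WordPiece) can be sketched in a few lines of Python. The vocabulary below is a made-up toy, not the real vocab.json, and the real BertTokenizer has extra rules (lower-casing, punctuation splitting, a max-chars limit), but the greedy longest-match idea is the same. It also explains question 2: a `##` prefix marks a piece that continues the previous one.

```python
# Toy vocabulary for illustration only (not the real vocab.json).
vocab = {"[UNK]", "film", "haus", "tür", "##tür", "schluss", "##el"}

def wordpiece(word, vocab):
    """Greedily split a word into the longest matching vocab pieces.

    Pieces that continue a word are looked up with the '##' prefix,
    mirroring how they are stored in BERT's vocabulary.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # continuation pieces carry the ## marker
            if sub in vocab:
                piece = sub
                break
            end -= 1  # shrink the candidate until something matches
        if piece is None:
            return ["[UNK]"]  # no piece matched: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces

print(wordpiece("haustür", vocab))  # → ['haus', '##tür']
```

So a compound like "haustür" never needs its own vocabulary entry; it is rebuilt from smaller pieces, which is the "compression" aspect mentioned above.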
Or you could use a tool to run the original Python code from Elixir, something like this.
Could I add an ONNX export version?
My current attempt is:
Then I used the model in Elixir:
But I still have some problems/questions:
about 4) I currently scale the prediction as follows:
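The raw outputs are logits, which is why they are signed floats; applying a softmax maps them to probabilities in the 0–1 range. A minimal sketch in plain Python (the logit values are made up for illustration, and this is not necessarily the exact code used in the Elixir port):

```python
import math

def softmax(logits):
    """Map raw logits (signed floats) to probabilities in [0, 1] that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Example logits for three sentiment classes; values are invented.
probs = softmax([2.1, -0.3, -1.8])
print(probs)  # each value lies in [0, 1] and they sum to 1
```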
about 5) In my version above, keys that are not matched return nil. I changed that to 0, but that changes the meaning of the sentence.
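For reference, BERT vocabularies reserve an `[UNK]` token for exactly this case, so falling back to its id is safer than 0 (id 0 is usually `[PAD]`). A sketch with a toy vocabulary; in the real export the ids should always be looked up from the exported vocab.json rather than hard-coded:

```python
# Toy vocabulary for illustration; real special-token ids vary per model,
# so look up "[UNK]" in the exported vocab.json instead of assuming a number.
vocab = {"[PAD]": 0, "[UNK]": 100, "ein": 7, "film": 8}

def token_ids(tokens, vocab):
    """Look up each token's id, falling back to the [UNK] id (not 0/nil)."""
    unk = vocab["[UNK]"]
    return [vocab.get(tok, unk) for tok in tokens]

print(token_ids(["ein", "scheiß", "film"], vocab))  # → [7, 100, 8]
```

Mapping unmatched tokens to `[PAD]` effectively deletes them, which would explain why a sentence like "Ein scheiß Film" changes meaning when its key word is missing from the lookup.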
I opened a question in the Elixir Forum about it here.