onnxruntime_tools optimize a transformer model #5113
Not a bug.

I have a model similar to BERT, built from the same modules (embedding / MultiHeadedAttention), but the forward process and the inputs/outputs differ slightly. How can I optimize my model with onnxruntime_tools?
Here is my attempt:

```
~/onnxruntime-master/onnxruntime/python/tools/transformers$ python3 optimizer.py --input model/encoder.onnx --output encoder_optim.onnx --model_type bert --num_heads 4 --hidden_size 320
apply: Fused LayerNormalization count: 12
apply: Fused SkipLayerNormalization count: 12
prune_graph: Graph pruned: 0 inputs, 0 outputs and 0 nodes are removed
apply: Fused SkipLayerNormalization(add bias) count: 12
optimize: opset verion: 11
save_model_to_file: Output model to encoder_optim.onnx
get_fused_operator_statistics: Optimized operators:{'EmbedLayerNormalization': 0, 'Attention': 0, 'Gelu': 0, 'FastGelu': 0, 'BiasGelu': 0, 'LayerNormalization': 0, 'SkipLayerNormalization': 12}
main: The model has been optimized.
```

As the statistics show, only SkipLayerNormalization was fused; the Attention and EmbedLayerNormalization counts are 0, so the attention subgraph of my model was not matched by the fusion patterns.
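The same optimization can also be driven from Python instead of the CLI, which makes it easier to inspect the fusion statistics programmatically. A minimal sketch, assuming the `onnxruntime_tools` package is installed and that `model/encoder.onnx` is the exported model (paths and parameters mirror the command above and are illustrative):

```python
# Sketch: Python API equivalent of the optimizer.py CLI invocation above.
# Assumes onnxruntime_tools is installed and model/encoder.onnx exists;
# both the path and the head/hidden values are taken from the command above.
from onnxruntime_tools import optimizer

opt_model = optimizer.optimize_model(
    "model/encoder.onnx",  # exported ONNX model to optimize
    model_type="bert",     # apply BERT-style fusion patterns
    num_heads=4,           # number of attention heads
    hidden_size=320,       # hidden dimension
)

# If 'Attention' is 0 here, the fusion patterns did not match the
# model's attention subgraph (e.g. because its structure differs
# from the standard BERT export).
print(opt_model.get_fused_operator_statistics())

opt_model.save_model_to_file("encoder_optim.onnx")
```

This only reproduces the CLI behavior; a model whose attention layout differs from the standard BERT export will still show `'Attention': 0` until the graph matches one of the supported patterns.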