
Speeding Up Transformer Inference

Transfer learning with large-scale pre-trained models based on the transformer architecture has become popular for downstream applications due to its promising gains in model performance. In real-world applications, however, we often face additional constraints around latency, throughput, and memory. This folder showcases several techniques commonly used to speed up transformer model inference while retaining the majority of the model's performance.

  • Response Knowledge Distillation for Training a Student Model. [nbviewer][html] (a loss-function sketch follows this list)
  • Finetuning a Pre-trained BERT Model on a Text Classification Task and Running Inference with ONNX Runtime. [nbviewer][html] (an ONNX Runtime inference sketch follows this list)
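
The distillation notebook trains a smaller student model on a blend of the teacher's soft predictions and the ground-truth labels. The snippet below is a minimal sketch of that response-based distillation loss, not the notebook's exact implementation; `temperature` and `alpha` are illustrative hyperparameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,  # assumed default
                      alpha: float = 0.5) -> torch.Tensor:  # assumed default
    # Soft targets: KL divergence between the temperature-scaled student
    # and teacher distributions. The T^2 factor keeps gradient magnitudes
    # comparable across different temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: ordinary cross entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

During training, the teacher's logits are computed under `torch.no_grad()` (or detached) so that gradients only flow through the student.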
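
The ONNX notebook exports the fine-tuned classifier and serves it with ONNX Runtime, which typically reduces CPU inference latency compared to eager PyTorch. Below is a minimal inference sketch; the file name `bert_classifier.onnx` and the graph input names `input_ids`/`attention_mask` are assumptions and must match whatever was passed to `torch.onnx.export`.

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

# Assumed artifact path and graph input names; adjust to match the export.
MODEL_PATH = "bert_classifier.onnx"

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
session = ort.InferenceSession(MODEL_PATH, providers=["CPUExecutionProvider"])

encoded = tokenizer("the movie was surprisingly good", return_tensors="np")
outputs = session.run(
    None,  # None fetches every graph output
    {
        "input_ids": encoded["input_ids"].astype(np.int64),
        "attention_mask": encoded["attention_mask"].astype(np.int64),
    },
)
logits = outputs[0]
print("predicted class:", logits.argmax(axis=-1))
```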
