
Optimized inference pipeline for Nano #4360


Description

@jason-dai
  1. Current status

                FP32   BF16   INT8
    PyTorch      Y      N      Y
    ONNX         Y      N      Y
    OpenVINO     Y      N      N
    • Trainer.compile(…, onnx=T/F, quantize=T/F, openvino=T/F) - binds the relevant methods/variables
    • Trainer.quantize(…) - generates a quantized model (PyTorch/ONNX)
    • Model.eval(quantize=T/F) - forward using the (quantized) PyTorch model
    • Model.eval_onnx(quantize=T/F)/eval_openvino()/exit_onnx()/exit_openvino() - forward using the (quantized) ONNX/OpenVINO model
  2. Desired status

    • Support all combinations in the table above
    • Compile: Trainer.compile() – just bind all methods/variables?
    • Quantize: Trainer.quantize(precision=…, accelerator=…)
    • Forward: model.eval(precision=…, accelerator=…)? – does quantize() need to be called first?
    • Export/save: Trainer.openvino.export(precision=…)? – what about ONNX/quantized models? The APIs need to be consistent
    • Load: model.load()/model.load_quantized_state_dict()??? – need consistent APIs
    • Status: model.eval_status()? – should every model maintain a current/default mode and report it here?
    • What are the interactions among these methods? Are any other methods needed? (See the usage sketch after this list.)
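
To make the open questions concrete, here is a minimal, purely hypothetical usage sketch of the proposed unified API. The precision=/accelerator= parameters, Trainer.compile(), Trainer.openvino.export(), eval_status() and load() come from the bullets above and do not exist in this form yet; the import path, the ResNet-18 model, the calibration DataLoader and the calib_dataloader argument name are illustrative assumptions only.

```python
# Hypothetical sketch of the proposed unified API (not an existing implementation).
# Signatures follow the "Desired status" bullets; the import path, model,
# dataloader and the calib_dataloader argument name are assumptions.
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

from bigdl.nano.pytorch import Trainer  # assumed import path

model = resnet18(num_classes=10)
calib_loader = DataLoader(
    TensorDataset(torch.rand(32, 3, 224, 224), torch.randint(0, 10, (32,))),
    batch_size=8,
)

# Compile: just bind all inference-related methods/variables
model = Trainer.compile(model)

# Quantize: one entry point covering every (precision, accelerator) cell in the
# table, e.g. INT8 on OpenVINO or BF16 on plain PyTorch
q_model = Trainer.quantize(
    model,
    precision="int8",               # "fp32" | "bf16" | "int8"
    accelerator="openvino",         # None (PyTorch) | "onnxruntime" | "openvino"
    calib_dataloader=calib_loader,  # argument name is an assumption
)

# Forward: pick backend/precision at eval time (does quantize() have to run first?)
q_model.eval(precision="int8", accelerator="openvino")
with torch.no_grad():
    out = q_model(torch.rand(1, 3, 224, 224))

# Status / export / load: the open questions above
print(q_model.eval_status())                        # report current/default mode?
Trainer.openvino.export(q_model, precision="int8")  # consistent with ONNX/quantized?
q_model.load("saved_int8_model")                    # or load_quantized_state_dict()?
```

Whatever shape the final API takes, keying quantize(), eval(), export() and load() on the same precision/accelerator pair would let the three backends in the table be swapped without changing user code.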

@TheaperDeng @zhentaocc @yangw1234 @shane-huang
