
Optimized inference pipeline for Nano #4360


Description

@jason-dai
  1. Current status

                FP32   BF16   INT8
    PyTorch      Y      N      Y
    ONNX         Y      N      Y
    OpenVINO     Y      N      N
    • Trainer.compile(…, onnx=T/F, quantize=T/F, openvino=T/F) - binds the relevant methods/variables
    • Trainer.quantize(…) - generates a quantized model (PyTorch/ONNX)
    • Model.eval(quantize=T/F) - forward using the (quantized) PyTorch model
    • Model.eval_onnx(quantize=T/F)/eval_openvino()/exit_onnx()/exit_openvino() - forward using the (quantized) ONNX/OpenVINO model
  2. Desired status

    • Support all combinations in the table above
    • Compile: Trainer.compile() – just bind all methods/variables?
    • Quantize: Trainer.quantize(precision=…, accelerator=…)
    • Forward: model.eval(precision=…, accelerator=…)? – does quantize() need to be called first?
    • Export/save: Trainer.openvino.export(precision=…)? – what about ONNX/quantized models? The APIs need to be consistent
    • Load: model.load()/model.load_quantized_state_dict()??? – need consistent APIs
    • Status: model.eval_status()? – should every model maintain a current/default mode and report it here?
    • What are the interactions among these methods? Are any other methods needed? (See the usage sketch after this list.)
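
To make the open questions concrete, here is a minimal, purely hypothetical usage sketch of the proposed unified API. The precision=/accelerator= parameters, Trainer.compile(), Trainer.openvino.export(), eval_status() and load() come from the bullets above and do not exist in this form yet; the import path, the ResNet-18 model, the calibration DataLoader and the calib_dataloader argument name are illustrative assumptions only.

```python
# Hypothetical sketch of the proposed unified API (not an existing implementation).
# Signatures follow the "Desired status" bullets; the import path, model,
# dataloader and the calib_dataloader argument name are assumptions.
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

from bigdl.nano.pytorch import Trainer  # assumed import path

model = resnet18(num_classes=10)
calib_loader = DataLoader(
    TensorDataset(torch.rand(32, 3, 224, 224), torch.randint(0, 10, (32,))),
    batch_size=8,
)

# Compile: just bind all inference-related methods/variables
model = Trainer.compile(model)

# Quantize: one entry point covering every (precision, accelerator) cell in the
# table, e.g. INT8 on OpenVINO or BF16 on plain PyTorch
q_model = Trainer.quantize(
    model,
    precision="int8",               # "fp32" | "bf16" | "int8"
    accelerator="openvino",         # None (PyTorch) | "onnxruntime" | "openvino"
    calib_dataloader=calib_loader,  # argument name is an assumption
)

# Forward: pick backend/precision at eval time (does quantize() have to run first?)
q_model.eval(precision="int8", accelerator="openvino")
with torch.no_grad():
    out = q_model(torch.rand(1, 3, 224, 224))

# Status / export / load: the open questions above
print(q_model.eval_status())                        # report current/default mode?
Trainer.openvino.export(q_model, precision="int8")  # consistent with ONNX/quantized?
q_model.load("saved_int8_model")                    # or load_quantized_state_dict()?
```

Whatever shape the final API takes, keying quantize(), eval(), export() and load() on the same precision/accelerator pair would let the three backends in the table be swapped without changing user code.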

@TheaperDeng @zhentaocc @yangw1234 @shane-huang
