After exporting a PyTorch model to ONNX, I ran the optimizer to get an optimized model (), then ran it on an RTX 8000, but I could not see any performance gain.
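The optimization code is as follows:
```python
from optimum.onnxruntime import AutoOptimizationConfig, ORTOptimizer

# `model` is the exported model (an ORTModel instance or its export directory)
optimizer = ORTOptimizer.from_pretrained(model)
# O4: the O3 graph optimizations plus fp16 mixed precision (GPU-only)
optimization_config = AutoOptimizationConfig.O4(disable_shape_inference=True)
optimizer.optimize(save_dir=onnx_optimized_output_path, optimization_config=optimization_config)
```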
The code to run inference:
```python
import onnxruntime as ort

# `providers` takes a list of provider names in priority order, not a bare string
ort_session = ort.InferenceSession(onnx_optimized_model_path, providers=["CUDAExecutionProvider"])
outputs = ort_session.run(["logits"], {"input_ids": input_ids, "attention_mask": attention_mask})
```
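One thing worth ruling out is that the session is not actually running on the GPU. onnxruntime documents `providers` as a sequence of provider names in priority order, and when a requested provider is unavailable (for example, with the CPU-only `onnxruntime` package installed) it can fall back to `CPUExecutionProvider` with only a warning, in which case an fp16 GPU optimization would not show up in timings. The sketch below assumes only a session object exposing `get_providers()`, as `onnxruntime.InferenceSession` does; `require_provider` is a hypothetical helper, not part of onnxruntime.

```python
from typing import List, Protocol, Sequence


class HasProviders(Protocol):
    """Anything exposing get_providers(), e.g. onnxruntime.InferenceSession."""

    def get_providers(self) -> Sequence[str]: ...


def require_provider(session: HasProviders, expected: str = "CUDAExecutionProvider") -> List[str]:
    """Raise unless `expected` is the highest-priority provider of the session.

    Hypothetical helper: returns the provider list so it can be logged.
    """
    providers = list(session.get_providers())
    # get_providers() lists registered providers in priority order,
    # so the first entry is the one tried first for each node.
    if not providers or providers[0] != expected:
        raise RuntimeError(
            f"Expected {expected!r} to be active, but session providers are {providers}; "
            "the model is probably running on CPU."
        )
    return providers
```

Calling `require_provider(ort_session)` right after creating the session should raise if CUDA is not the highest-priority provider; `ort.get_available_providers()` lists what the installed build supports.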
Can anyone share some insight into what is going wrong? Does this optimization require a specific GPU? Thanks.
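To tell whether the O4 model is faster at all, the baseline and optimized sessions can be timed the same way with warm-up runs discarded, since the first GPU calls typically include one-off costs such as CUDA context setup and memory allocation. A minimal standard-library sketch; `median_latency_ms` and its `warmup`/`iters` defaults are hypothetical choices, not part of onnxruntime or Optimum.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], warmup: int = 10, iters: int = 100) -> float:
    """Median wall-clock latency of fn() in milliseconds.

    The `warmup` calls run first and are not timed, so one-off startup
    costs are excluded from the measurement.
    """
    if iters < 1:
        raise ValueError("iters must be >= 1")
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # Median is less sensitive than the mean to occasional slow outliers.
    return statistics.median(samples)
```

For example, with `feeds = {"input_ids": input_ids, "attention_mask": attention_mask}`, compare `median_latency_ms(lambda: baseline_session.run(["logits"], feeds))` against `median_latency_ms(lambda: ort_session.run(["logits"], feeds))`, where `baseline_session` is a hypothetical session created from the unoptimized export with the same providers. Because `run()` returns outputs as host arrays, each timed call includes the device-to-host copy. Trying a few batch sizes and sequence lengths may also help, since very small inputs can be dominated by fixed per-call overhead.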