I tried storing the model in FP16 and in FP32 separately. When run with onnxruntime, the FP16 model consumed more memory (4 GB) and ran slower (3 tokens/s), while the FP32 model used 3 GB and ran faster (10 tokens/s). My initial guess is that the hardware does not support the FP16 type, so the FP16 values end up being converted to FP32. How can I determine whether my hardware supports types such as FP16, INT8, and INT4?
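On Linux, one rough way to check CPU support is to read the feature flags the kernel reports in `/proc/cpuinfo` (the same flags `lscpu` prints). The sketch below assumes a Linux host and uses a hand-picked, non-exhaustive list of flag names. Note that `f16c` only provides FP16<->FP32 conversion, not FP16 arithmetic. As far as I know, CPUs generally have no native INT4 math, so INT4 weights are usually unpacked to INT8 or float before computing. Whether onnxruntime actually uses any of these instructions also depends on its kernels for your execution provider, which this check does not cover.

```python
import platform

# Hand-picked subset of Linux /proc/cpuinfo flag names (assumption: not exhaustive).
CAPABILITY_FLAGS = {
    "f16c": "x86: FP16<->FP32 conversion only (no native FP16 math)",
    "avx512_fp16": "x86: native FP16 arithmetic",
    "avx512_bf16": "x86: BF16 dot products",
    "avx512_vnni": "x86: INT8 dot products (AVX-512 VNNI)",
    "avx_vnni": "x86: INT8 dot products (AVX-VNNI)",
    "amx_int8": "x86: AMX INT8 tile operations",
    "amx_bf16": "x86: AMX BF16 tile operations",
    "asimdhp": "Arm: half-precision SIMD arithmetic",
    "asimddp": "Arm: INT8 dot product instructions",
    "i8mm": "Arm: INT8 matrix multiply instructions",
}


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect tokens from 'flags' (x86) or 'Features' (Arm) lines of /proc/cpuinfo."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("flags", "features"):
            flags.update(value.split())
    return flags


def report_capabilities(flags: set[str]) -> dict[str, str]:
    """Return the subset of CAPABILITY_FLAGS present in the given flag set."""
    return {name: desc for name, desc in CAPABILITY_FLAGS.items() if name in flags}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            text = fh.read()
    except OSError:
        print(f"/proc/cpuinfo is not available on {platform.system()}")
    else:
        found = report_capabilities(parse_cpu_flags(text))
        if not found:
            print("None of the listed FP16/BF16/INT8 flags were found.")
        for name, desc in found.items():
            print(f"{name:12s} {desc}")
```

If only `f16c` shows up, the CPU can convert FP16 but not compute in it, which would be consistent with the FP16 model running slower than the FP32 one.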