Hi, I appreciate the code you provided, but could you tell me what method you used to show the model size after quantization? I couldn't find the corresponding part in your code. Thanks.
The quantization I used falls under fake quantization, meaning the 2/3/4-bit values are still stored inside 32-bit float torch tensors. Because of that, there is no real way to save the model at its quantized size. What I (and many others) do in this scenario is estimate the size: count the parameters, multiply by the target bit-width, and treat that as the size the model would have if sub-8-bit quantization were fully supported by ML frameworks. This estimate is more or less accurate.
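A minimal sketch of that estimate, assuming PyTorch and that only weight tensors are quantized while biases stay in full precision (the function name `estimate_quantized_size` and the bit-width choices are illustrative, not part of this repo's code):

```python
import torch.nn as nn

def estimate_quantized_size(model: nn.Module, weight_bits: int = 4, fp_bits: int = 32) -> float:
    """Estimate the size in MiB the model would occupy if weight tensors
    were actually stored at `weight_bits` bits instead of 32-bit floats."""
    total_bits = 0
    for name, param in model.named_parameters():
        if "weight" in name:
            # Assume weight tensors are quantized to `weight_bits`.
            total_bits += param.numel() * weight_bits
        else:
            # Assume biases and other parameters stay in full precision.
            total_bits += param.numel() * fp_bits
    return total_bits / 8 / 1024**2  # bits -> bytes -> MiB

# Example: compare the fp32 size with a 4-bit estimate for a toy model.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
print(f"fp32 size:      {estimate_quantized_size(model, weight_bits=32):.2f} MiB")
print(f"4-bit estimate: {estimate_quantized_size(model, weight_bits=4):.2f} MiB")
```

You can refine this by also accounting for the per-tensor or per-channel scale/zero-point values the quantizer would need to store, but for sub-8-bit weights those are usually a negligible fraction of the total.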