The default value of the variable self.norm is 2.4 (used here). Why is that the case? More generally, what is the purpose of grid search over the values described by maxshrink and grid? I could not find anything related to it in the paper.
Hi, this part of the code and its hyper-parameters are mostly adapted from BRECQ. The purpose of this is to find a narrower quantization grid (relative to the min-max which includes all outliers) by minimizing the p-norm. Setting norm = 2.4 gives slightly more weight to outliers in this search than for norm = 2.
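For readers following along, the search described above can be sketched roughly as follows. This is a pure-Python illustration, not the repository's code: the function name `search_grid`, the per-list (rather than per-row tensor) handling, and the default values for `grid` and `maxshrink` are assumptions made for the example. It shrinks the min-max range by a factor `p`, quantizes with the resulting grid, and keeps the candidate with the smallest summed p-norm error.

```python
def quantize(x, scale, zero, maxq):
    """Affine transform, round, clamp to [0, maxq], then map back to float."""
    q = min(max(round(x / scale) + zero, 0), maxq)
    return scale * (q - zero)


def search_grid(xs, bits=4, norm=2.4, grid=100, maxshrink=0.8):
    """Hypothetical helper: try shrunken versions of the min-max range.

    Returns (error, scale, zero) for the candidate with the lowest
    sum of |quantize(x) - x| ** norm over all values in xs.
    """
    maxq = 2 ** bits - 1
    # Make sure the range contains zero, and avoid a degenerate range.
    xmin, xmax = min(min(xs), 0.0), max(max(xs), 0.0)
    if xmin == xmax:
        xmin, xmax = -1.0, 1.0

    best = (float("inf"), None, None)
    for i in range(int(maxshrink * grid)):
        p = 1 - i / grid              # p = 1.0 is the plain min-max grid
        lo, hi = p * xmin, p * xmax   # shrunken range clips the outliers
        scale = (hi - lo) / maxq
        zero = round(-lo / scale)
        err = sum(abs(quantize(x, scale, zero, maxq) - x) ** norm for x in xs)
        if err < best[0]:
            best = (err, scale, zero)
    return best


# Example: one large outlier among otherwise small weights.
weights = [-1.0, -0.5, 0.1, 0.3, 0.4, 0.6, 10.0]
err, scale, zero = search_grid(weights)
```

Because the first candidate (`p = 1`) is the min-max grid itself, the selected grid can never have a larger error than min-max under the chosen norm. A larger `norm` penalizes the clipped outliers more heavily, so the search shrinks the range less aggressively.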
Ok thank you. Another question - where exactly is the quantization happening?
The quantize function is simply a chain of an affine transform, clamping, and the reverse affine transform. It does not reduce the number of bits used to store the weights.
The bits variable is simply used to calculate maxq which is used to calculate the scale.
I am unable to find where an fp32 value is actually stored in `bits` bits.
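To make the observation above concrete, here is a minimal pure-Python sketch of such a quantize chain. It is an illustration under assumptions, not the repository's implementation: the scalar signature and the example values are made up for this snippet. It shows that the output is still an ordinary float, but that it can take at most `maxq + 1` distinct values.

```python
def quantize(x, scale, zero, maxq):
    """Affine transform, round, clamp to [0, maxq], then reverse transform."""
    q = min(max(round(x / scale) + zero, 0), maxq)  # integer code in [0, maxq]
    return scale * (q - zero)                       # back to a float


bits = 3
maxq = 2 ** bits - 1                      # largest integer code for `bits` bits

# Illustrative weights spread evenly over [-1, 1].
weights = [i / 50 - 1 for i in range(101)]
xmin, xmax = min(weights), max(weights)

scale = (xmax - xmin) / maxq              # step size of the quantization grid
zero = round(-xmin / scale)               # integer code that maps to 0.0

dequantized = [quantize(w, scale, zero, maxq) for w in weights]
distinct = sorted(set(dequantized))
print(len(distinct), "distinct values, type:", type(dequantized[0]).__name__)
```

The values returned here are floats restricted to a small grid; only the intermediate code `q` would fit in `bits` bits, and this sketch never stores it.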
Thank you for this work!