Issues with error bounds #7
We do check later if the precision issue exists between original and reconstruction, and if so we write the original stream data directly to avoid breaking the contract. You can see that in both the C++ and the Python versions. With that in mind, I think it's fine from a correctness perspective even if the resolution function gives 0. Do you believe there is a specific case where this would lead to issues?
One problem here is that you arbitrarily limit the error bound to be greater than 1e-6. This is very large considering that the smallest normal float32 is 1.1754944e-38. If I'm not mistaken, ~40% of the representable float32 values have smaller magnitudes than 1e-6. There's also the fact that if you set an error bound very close to but greater than 1e-6, then the adjusted value will be very small, even subnormal or zero, and this would make compression fail with all values being written to the outlier stream. This could be solved by using np.spacing to make the adjustment and then ensuring that the adjusted value is still significantly large (e.g. still a normal value). Alternatively, you could simply not make any adjustment at all. As you point out, you're already checking the reconstructed values against the original error bound, so you could do away with the adjustment and it would not break the contract.
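The np.spacing-based adjustment with a normality guard could be sketched roughly like this (the helper name and the fallback policy are hypothetical, not code from LFZip):

```python
import numpy as np

def tighten_error_bound(error_bound):
    """Shrink an error bound by one float32 ULP so that reconstructed
    values strictly satisfy the bound. If the result would fall below
    the smallest normal float32, skip the adjustment entirely (the
    reconstruction check against the original bound still applies)."""
    eb = np.float32(error_bound)
    adjusted = eb - np.spacing(eb)  # exactly one ULP below eb
    if adjusted < np.finfo(np.float32).tiny:  # subnormal or zero: unsafe
        return eb
    return adjusted
```

Unlike subtracting a fixed 1e-6, this shrinks the bound by exactly one representable step at any magnitude, and the guard keeps the adjusted bound in the normal range so compression cannot degenerate into writing everything to the outlier stream.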
Got it, thanks for pointing this out and explaining the considerations. I will get back to fixing it in a month or so since I'll need to run some experiments to make sure this does give benefits in some scenarios.
https://github.com/shubhamchandak94/LFZip/blob/14b2e9f79b1d9bc3f60d5cb69e20eb1529c709b4/src/nlms_compress.py#L85C18-L85C18
The value np.finfo(np.float32).resolution is used to reduce the error bound, but it's not fit for this purpose. That value only represents the approximate spacing between floating-point values around 1, rounded to the nearest power of 10, and it is incorrect for magnitudes away from 1. If we take an error bound of 100 and subtract this value, the resulting new error bound when interpreted as float32 is the same as the original. Conversely, if we were to take a much smaller error bound, the new error bound would be many representable steps lower than the next smallest bound. The correct function to use here would be np.spacing:
https://numpy.org/doc/stable/reference/generated/numpy.spacing.html
This returns the precise spacing between a value and the next one away from zero (despite what the documentation says).
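A quick numerical check illustrates the difference (the magnitudes 100 and 1e-5 are illustrative values, not taken from the LFZip code):

```python
import numpy as np

res = np.float32(np.finfo(np.float32).resolution)  # 1e-06, only meaningful near 1.0

# Around 100 the true ULP is ~7.6e-06, so subtracting 1e-06 rounds back to 100:
big = np.float32(100.0)
assert big - res == big  # the "adjusted" bound is unchanged

# Around 1e-05 the true ULP is ~9.1e-13, so subtracting 1e-06 overshoots
# the next smaller float32 by roughly a million representable steps:
small = np.float32(1e-5)
assert small - res < small - np.spacing(small)

# np.spacing gives the exact one-ULP step at any magnitude:
one = np.float32(1.0)
assert one - np.spacing(one) < one
```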