
Issues with error bounds #7

Open
mrmbernardi opened this issue Nov 16, 2023 · 3 comments

@mrmbernardi

mrmbernardi commented Nov 16, 2023

https://github.com/shubhamchandak94/LFZip/blob/14b2e9f79b1d9bc3f60d5cb69e20eb1529c709b4/src/nlms_compress.py#L85C18-L85C18

The value np.finfo(np.float32).resolution is used to reduce the error bound, but it's not fit for this purpose. That value is a fixed constant (10**-precision, i.e. ~1e-6 for float32) approximating the spacing between adjacent float32 values near 1, rounded to the nearest power of 10. It does not reflect the actual spacing at magnitudes away from 1.

If we take a large error bound, say 100, and subtract this value, the resulting new error bound when interpreted as float32 rounds back to the original (the gap between adjacent float32 values near 100 is ~7.6e-6, far larger than 1e-6). Conversely, if we take a small error bound, the new error bound ends up many representable steps below the next smallest bound. The correct function to use here would be np.spacing:

https://numpy.org/doc/stable/reference/generated/numpy.spacing.html

This returns the precise spacing between a value and the next one away from zero (despite what the documentation says).
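
A quick sketch of the difference (the variable names here are just for illustration):

```python
import numpy as np

res = np.finfo(np.float32).resolution  # fixed constant, ~1e-6

# For a large bound, subtracting `resolution` is lost to rounding:
bound = np.float32(100.0)
assert np.float32(bound - res) == bound  # unchanged

# For a small bound, it overshoots by a huge number of representable steps
# (the float32 spacing near 1e-5 is only ~9e-13):
small = np.float32(1e-5)
adjusted = np.float32(small - res)
assert adjusted < small - 1000 * np.spacing(small)

# np.spacing gives the exact step to the adjacent representable value,
# so subtracting it always moves the bound by exactly one step:
assert np.float32(bound - np.spacing(bound)) < bound
```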

@shubhamchandak94
Owner

We do check later if the precision issue exists between original and reconstruction, and if so we write the original stream data directly to avoid breaking the contract. You can see that in both the C++ and the Python versions.

With that in mind, I think it's fine from a correctness perspective even if the resolution-based adjustment yields an effective bound of 0. Do you believe there is a specific case where this would lead to issues?

@mrmbernardi
Author

One problem here is that you arbitrarily limit the error bound to be greater than 1e-6. This is very large considering that the smallest normal float32 is 1.1754944e-38. If I'm not mistaken, ~40% of the representable float32 values have smaller magnitudes than 1e-6.
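
For reference, that fraction can be sanity-checked by reading positive float32 bit patterns as integers (a quick sketch, not LFZip code):

```python
import numpy as np

# For positive IEEE-754 floats, the bit pattern read as an integer
# equals the count of positive float32 values below that value
# (subnormals included), so the fraction is a ratio of bit patterns.
below = np.array(1e-6, np.float32).view(np.int32)
total = np.array(np.inf, np.float32).view(np.int32)  # all positive finite floats
print(below / total)  # roughly 0.42
```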

There's also the fact that if you set an error bound very close to but greater than 1e-6, then the adjusted value will be very small, even subnormal or zero, and this would make compression fail with all values being written to the outlier stream.

This could be solved by using np.spacing to make the adjustment and then ensuring that the adjusted value is still significantly large (e.g. still a normal value).
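
A sketch of what that could look like (tighten_bound is a hypothetical helper, not LFZip's actual code):

```python
import numpy as np

def tighten_bound(bound):
    """Step the bound down by exactly one representable float32 value,
    refusing bounds so small that the result would not be normal."""
    b = np.float32(bound)
    adjusted = np.float32(b - np.spacing(b))
    if adjusted < np.finfo(np.float32).tiny:  # smallest normal float32
        raise ValueError("error bound too small to adjust safely")
    return adjusted

# The current scheme subtracts the fixed ~1e-6 `resolution` instead.
# A bound just above the 1e-6 floor then collapses to ~1e-12, six
# orders of magnitude smaller, so nearly every value would be flagged
# as an outlier; the spacing-based version barely moves the bound:
res = np.finfo(np.float32).resolution
b = np.float32(1.000001e-6)
collapsed = np.float32(b - res)
assert collapsed < 1e-9
assert tighten_bound(b) > 9e-7
```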

Alternatively, you could simply not make any adjustment at all. As you point out, you're already checking the reconstructed values against the original error bound, so you could do away with the adjustment and it would not break the contract.

@shubhamchandak94
Owner

Got it, thanks for pointing this out and explaining the considerations. I will get back to fixing it in a month or so since I'll need to run some experiments to make sure this does give benefits in some scenarios.
