Fix bit packing and quantize on big endian. #293

wengxt · 2020-08-28T22:10:03Z

BitPackShift() doesn't return the right value on big endian for
{Read,Write}Int25. Consecutive25 would just fail on big endian.
The fix is to round it to 32 instead of 64.
Using one WriteInt57 and two ReadInt25 calls to read the combined integer
value doesn't work on big endian either. The fix simply replaces the
WriteInt57 with two WriteInt25 calls for MiddlePointer.
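
To illustrate why the word width matters, here is a minimal sketch that behaves the same on any host. It assumes the big-endian shift has the form `word_bits - length - bit`, which is an illustration and not necessarily kenlm's exact code. `LoadBE32`, `BigEndianShift`, and `ReadField25` are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: build the value a native 32-bit load would yield on a
// big-endian CPU, so the sketch gives the same result on any host.
inline uint32_t LoadBE32(const uint8_t *p) {
  return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
         (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

// Assumed form of the big-endian shift: the field starts `bit` bits below the
// most significant bit of a word that is `word_bits` wide.
inline unsigned BigEndianShift(unsigned word_bits, unsigned bit, unsigned length) {
  assert(bit + length <= word_bits);
  return word_bits - length - bit;
}

// Read a field of up to 25 bits through a 32-bit load, so the shift has to be
// computed against 32, not 64.
inline uint32_t ReadField25(const uint8_t *base, unsigned bit_off, unsigned length) {
  assert(length >= 1 && length <= 25);
  const uint32_t word = LoadBE32(base + (bit_off >> 3));
  const uint32_t mask = (uint32_t(1) << length) - 1;
  return (word >> BigEndianShift(32, bit_off & 7, length)) & mask;
}
```

For a 5-bit field starting at bit 1, a 64-bit base gives a shift of 58, which is past the end of a 32-bit word, while a 32-bit base gives 26.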

wengxt · 2020-08-28T22:35:53Z

Some more context:
I use kenlm in https://github.com/fcitx/libime, and a Fedora packager notified me about the issue on s390x CPUs, so I tried to figure out what was going wrong there.

My testing was done on an s390x platform provided by https://linuxone.cloud.marist.edu/.
The site offers a free 120-day trial of an s390x VPS, in case you want to test the change.

kpu · 2020-08-28T22:53:30Z

Wow, I have to admit testing on big endian hasn't actually happened.

kpu · 2020-08-29T12:56:51Z

With a tweak, we can avoid two memory accesses in the write.

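        void Write(float prob, float backoff) const {
          uint64_t prob_encoded = ProbBins().EncodeProb(prob);
          uint64_t backoff_encoded = BackoffBins().EncodeBackoff(backoff);
          // The little-endian branch keeps backoff in the low bits, i.e. at
          // address_.offset, with prob right after it. On big endian the first
          // field is the most significant one, so backoff moves up by the
          // width of prob instead.
#if BYTE_ORDER == LITTLE_ENDIAN
          prob_encoded <<= BackoffBins().Bits();
#elif BYTE_ORDER == BIG_ENDIAN
          backoff_encoded <<= ProbBins().Bits();
#endif
          // A single write covers both fields.
          util::WriteInt57(address_.base, address_.offset, ProbBins().Bits() + BackoffBins().Bits(),
                           prob_encoded | backoff_encoded);
        }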
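
The idea of one combined write followed by two separate reads can be checked at the value level. The sketch below assumes the backoff field sits at the base offset with the probability right after it, and that big-endian fields are laid out most significant first. `Order`, `Combine`, and `Extract` are hypothetical names for illustration and are not kenlm APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the target byte order, so both cases can be
// exercised on one machine.
enum class Order { kLittle, kBig };

// Merge two quantized fields into the value handed to a single wide write.
// Assumption: backoff is the first field in the bit stream, prob the second.
inline uint64_t Combine(Order order, uint64_t prob, unsigned prob_bits,
                        uint64_t backoff, unsigned backoff_bits) {
  if (order == Order::kLittle) {
    // First field in the low bits: prob moves up by the width of backoff.
    return (prob << backoff_bits) | backoff;
  }
  // First field in the high bits: backoff moves up by the width of prob.
  return (backoff << prob_bits) | prob;
}

// Pull out the field that starts `bit` bits into a combined value that is
// `total_bits` wide, mirroring what a separate narrow read would see.
inline uint64_t Extract(Order order, uint64_t combined, unsigned total_bits,
                        unsigned bit, unsigned length) {
  assert(bit + length <= total_bits);
  const unsigned shift =
      (order == Order::kLittle) ? bit : total_bits - length - bit;
  return (combined >> shift) & ((uint64_t(1) << length) - 1);
}
```

In both orders the shifted field is the one that ends up in the high bits of the combined value, and the shift amount is the width of the other field. The two only coincide when both fields have the same width.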

kpu · 2020-08-29T13:52:27Z

Thanks, I've merged this and also fixed a test failure in merge_test. There are still some endianness assumptions in https://github.com/kpu/kenlm/blob/master/lm/interpolate/bounded_sequence_encoding.hh which cause test failures, but almost nobody uses interpolation.

kpu · 2020-08-29T17:55:41Z

Interpolation should be fixed too now, thanks for stopping by!

kpu merged commit c63e9d9 into kpu:master Aug 29, 2020
