Fix bit packing and quantize on big endian. #293

Merged (1 commit) on Aug 29, 2020

wengxt (Contributor) commented Aug 28, 2020

  1. BitPackShift() doesn't return the right value on big endian for
    {Read,Write}Int25, so Consecutive25 simply fails on big endian.
    The fix is to base the shift on a 32-bit word instead of a 64-bit one.

  2. Writing the combined integer value with one WriteInt57 and reading it
    back with two ReadInt25 calls doesn't work on big endian. The fix
    simply replaces the WriteInt57 with two WriteInt25 calls for MiddlePointer.
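
To illustrate the first point, here is a minimal sketch of why the shift has to be computed against the width of the word that is actually loaded. It is not kenlm's code: `LoadBE32`, `StoreBE32`, `ShiftInWord32`, `WriteBits25` and `ReadBits25` are hypothetical names, and the big-endian word is assembled byte by byte so the sketch behaves the same on any host. It assumes a zero-initialised buffer with at least three bytes of padding after the last field.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; these are not kenlm's functions.

// Load/store 4 bytes as a big-endian 32-bit word, whatever the host byte order.
inline uint32_t LoadBE32(const uint8_t *p) {
  return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
         (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}
inline void StoreBE32(uint8_t *p, uint32_t v) {
  p[0] = uint8_t(v >> 24);
  p[1] = uint8_t(v >> 16);
  p[2] = uint8_t(v >> 8);
  p[3] = uint8_t(v);
}

// Shift that moves a field of `length` bits, starting `bit` bits below the
// most significant bit of a 32-bit word, down to bit 0.
// Computing 64 - length - bit here would give a shift of at least 32,
// which is past the end of the 32-bit word.
inline uint8_t ShiftInWord32(uint8_t bit, uint8_t length) {
  assert(length <= 25 && bit < 8);
  return uint8_t(32 - length - bit);
}

// OR a value of at most 25 bits into a zero-initialised buffer at bit_off.
inline void WriteBits25(uint8_t *base, uint64_t bit_off, uint8_t length, uint32_t value) {
  uint8_t *p = base + (bit_off >> 3);
  StoreBE32(p, LoadBE32(p) | (value << ShiftInWord32(uint8_t(bit_off & 7), length)));
}

// Read back a field of at most 25 bits starting at bit_off.
inline uint32_t ReadBits25(const uint8_t *base, uint64_t bit_off, uint8_t length) {
  const uint32_t mask = (uint32_t(1) << length) - 1;
  const uint8_t shift = ShiftInWord32(uint8_t(bit_off & 7), length);
  return (LoadBE32(base + (bit_off >> 3)) >> shift) & mask;
}
```

With this shift, consecutive 25-bit values written at bit offsets 0, 25 and 50 read back unchanged, which is the property a test like Consecutive25 is presumably checking.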

wengxt (Contributor, Author) commented Aug 28, 2020

Some more context:
I use kenlm in https://github.com/fcitx/libime, and a Fedora packager notified me about the issue on the s390x CPU, so I tried to figure out what was going wrong there.

My testing was done on an s390x platform provided by https://linuxone.cloud.marist.edu/.
This site provides a free 120-day trial of an s390x VPS, in case you want to test the change.

kpu (Owner) commented Aug 28, 2020

Wow, I have to admit that testing on big endian hasn't actually happened.

kpu (Owner) commented Aug 29, 2020

With a tweak, we can avoid two memory accesses in the write.

        void Write(float prob, float backoff) const {
          uint64_t prob_encoded = (ProbBins().EncodeProb(prob));
          uint64_t backoff_encoded = BackoffBins().EncodeBackoff(backoff);
#if BYTE_ORDER == LITTLE_ENDIAN
          prob_encoded
#elif BYTE_ORDER == BIG_ENDIAN
          backoff_encoded
#endif
                  <<= BackoffBins().Bits();
          util::WriteInt57(address_.base, address_.offset, ProbBins().Bits() + BackoffBins().Bits(),
                           prob_encoded | backoff_encoded);
        }
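
As a hedged illustration of the idea behind the single combined write: when two adjacent fields are packed into one integer, which field gets shifted depends on whether bit offsets count from the low end (the little endian convention) or from the high end (the big endian convention), and the shift amount is the width of the field that ends up in the low bits. The sketch below uses hypothetical names (`CombineLsbFirst`, `CombineMsbFirst`, `ExtractLsbFirst`, `ExtractMsbFirst`) and generic `first`/`second` fields. Which of prob and backoff sits first in MiddlePointer is not shown in this thread, so no mapping to those fields is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration: `first` lives at bit offset 0 (width first_bits)
// and `second` immediately after it (width second_bits).

// Offsets counted from the least significant bit: `first` stays in the low
// bits and `second` is shifted up by the width of `first`.
inline uint64_t CombineLsbFirst(uint64_t first, uint64_t second, uint8_t first_bits) {
  assert(first_bits < 64);
  return first | (second << first_bits);
}

// Offsets counted from the most significant bit: `first` is the high part,
// so it is shifted up by the width of `second`.
inline uint64_t CombineMsbFirst(uint64_t first, uint64_t second, uint8_t second_bits) {
  assert(second_bits < 64);
  return (first << second_bits) | second;
}

// Read `length` bits starting `offset` bits from the low end.
inline uint64_t ExtractLsbFirst(uint64_t combined, uint8_t offset, uint8_t length) {
  assert(length < 64);
  return (combined >> offset) & ((uint64_t(1) << length) - 1);
}

// Read `length` bits starting `offset` bits from the high end of a
// `total`-bit combined value.
inline uint64_t ExtractMsbFirst(uint64_t combined, uint8_t total, uint8_t offset,
                                uint8_t length) {
  assert(length < 64 && offset + length <= total);
  return (combined >> (total - offset - length)) & ((uint64_t(1) << length) - 1);
}
```

Under either convention, one combined write followed by two per-field reads round-trips only if the combine step matches the convention the reads use. Unequal field widths are the case worth testing, because equal widths hide a wrong shift amount.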

kpu merged commit c63e9d9 into kpu:master on Aug 29, 2020

kpu (Owner) commented Aug 29, 2020

Thanks, I've merged this and also fixed a test failure in merge_test. There are still some endianness assumptions in https://github.com/kpu/kenlm/blob/master/lm/interpolate/bounded_sequence_encoding.hh, which cause test failures, but almost nobody uses interpolation.

kpu (Owner) commented Aug 29, 2020

Interpolation should be fixed too now, thanks for stopping by!
