Memory error on program closing with C++ example #430

Closed
DanBmh opened this issue May 18, 2023 · 14 comments

DanBmh commented May 18, 2023

Hi, I'm trying to use KenLM in a C++ program and started with a minimal example (following the official example). Scoring seems to work, but the program does not exit cleanly. It always fails with:

/kenlm/util/mmap.cc:138 in void util::SyncOrThrow(void*, size_t) threw ErrnoException because `length && msync(start, length, 4)'.
Cannot allocate memory Failed to sync mmapAborted (core dumped)

My program looks as follows:

#include <iostream>
#include <string>
#include <vector>
#include "lm/model.hh"

int main()
{
    using namespace lm::ngram;

    Model model("/kenlm/lm/test.arpa");
    std::vector<std::string> words = {"language", "modeling", "is", "fun"};

    // Start from the begin-of-sentence state and carry the state forward word by word.
    State state(model.BeginSentenceState()), out_state;
    const Vocabulary &vocab = model.GetVocabulary();

    for (const std::string &word : words)
    {
        std::cout << word << " " << model.Score(state, vocab.Index(word), out_state) << '\n';
        state = out_state;
    }

    std::cout << "--finished--" << '\n';
    return 0;
}

I compiled it with:

g++ test_langmodel_minimal.cpp -Wall -DKENLM_MAX_ORDER=5 -I/kenlm/ -L/kenlm/build/lib/ -lkenlm -lkenlm_util -lz -llzma -lbz2 -o test_langmodel_minimal.exe
./test_langmodel_minimal.exe

Full output:

./test_langmodel_minimal.exe
Loading the LM will be faster if you build a binary file.
Reading /kenlm/lm/test.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
language -1.99564
modeling -1.99564
is -1.68787
fun -1.99564
--finished--
/kenlm/util/mmap.cc:138 in void util::SyncOrThrow(void*, size_t) threw ErrnoException because `length && msync(start, length, 4)'.
Cannot allocate memory Failed to sync mmapAborted (core dumped)

Any idea how to fix this?

Owner

kpu commented May 18, 2023

Platform? I see the .exe extension but also / in the paths. At the same time, this was compiled without defined(_WIN32) || defined(_WIN64); otherwise it would have called FlushViewOfFile.

Author

DanBmh commented May 18, 2023

Sorry for the confusion. The platform is Ubuntu 20.04 in Docker, using the latest master branch.

FROM docker.io/ubuntu:20.04

ARG DEBIAN_FRONTEND=noninteractive
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8

RUN apt-get update && apt-get upgrade -y
RUN apt-get update && apt-get install -y --no-install-recommends wget curl nano git

# Install python
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip python3-dev
RUN pip3 install --upgrade --no-cache-dir pip
RUN python3 -V && pip3 --version

# Install swig
RUN apt-get update && apt-get install -y --no-install-recommends build-essential
RUN apt-get update && apt-get install -y --no-install-recommends swig

# Install kenlm
RUN apt-get update && apt-get install -y --no-install-recommends \
 cmake libboost-system-dev libboost-thread-dev libboost-program-options-dev \
 libboost-test-dev libeigen3-dev zlib1g-dev libbz2-dev liblzma-dev
RUN git clone --depth 1 https://github.com/kpu/kenlm.git
RUN cd /kenlm/; mkdir -p build/
RUN cd /kenlm/build/; cmake ..
RUN cd /kenlm/build/; make -j 4
RUN pip3 install --no-cache https://github.com/kpu/kenlm/archive/master.zip

WORKDIR /
CMD ["/bin/bash"]

Owner

kpu commented May 31, 2023

I'm confused by this, but if you delete the msync line in question, does it work? It's unnecessary when just reading a language model, and maybe the kernel is angry about it.
And so we're clear: Ubuntu 20.04 on real Linux, not WSL?

Author

DanBmh commented May 31, 2023

Deleting the line gives me another error:

Loading the LM will be faster if you build a binary file.
Reading /kenlm/lm/test.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
language -1.99564
modeling -1.99564
is -1.68787
fun -1.99564
</s> -1.02949
--finished--
/kenlm/util/mmap.cc:146 in void util::UnmapOrThrow(void*, size_t) threw ErrnoException because `munmap(start, length)'.
Invalid argument munmap failed with 0x0 for length 18446744070488326144Aborted (core dumped)

There seems to be an issue with the length in the first error: it seems quite large.

And deleting this line as well gives me this error (I think this is an expected one):

[...]
--finished--
Could not close file 32573
Aborted (core dumped)

> And so we're clear Ubuntu 20.04 on real linux, not WSL?

Yes, host is Ubuntu 22.04

@hieuhoang
Collaborator

What is the underlying filesystem? KenLM needs to memory-map the file, which is not supported by some filesystems. You may need to move the file to the temp directory before opening it.

Author

DanBmh commented May 31, 2023

The host has ext4 and Docker is using its default one.

Collaborator

hieuhoang commented May 31, 2023 via email

Owner

kpu commented May 31, 2023

It's got corrupt values in scoped_mmap. This doesn't seem to be an OS issue, unless something weird in the OS is causing the corrupt value. We need to understand how those values came to be there. Stack trace?

Owner

kpu commented Jun 1, 2023

And there really shouldn't be a file number 32573. This all sounds like memory corruption.

Author

DanBmh commented Jun 1, 2023

@hieuhoang memory amount shouldn't be the issue, I'm using the example arpa file with a size of 3KB

@kpu can you replicate the issue on your own computer?


> Need to understand how those values came to be there. Stack trace?

How do I activate it?

Owner

kpu commented Jun 1, 2023

The default KENLM_MAX_ORDER is 6, and you built the library with cmake using that default. But your program is compiled with -DKENLM_MAX_ORDER=5.

Author

DanBmh commented Jun 5, 2023

Thanks for your help. With -DKENLM_MAX_ORDER=6 it's working.

DanBmh closed this as completed Jun 5, 2023
Author

DanBmh commented Jun 5, 2023

Now it's also matching the scores from the Python example (which would otherwise have been my next question). The new scores are:

language -2.41061
modeling -15
is -23.6879
fun -2.29666
</s> -21.0295
--finished--

I'm not an expert in this, but shouldn't the scores be the same for the same ARPA file? Or were the earlier scores just a side effect of the wrong KENLM_MAX_ORDER?

Owner

kpu commented Jun 5, 2023

You had random memory corruption due to the structs not having the same definition in the compiled library and executable. All bets are off.
