segmentation fault using side arpa or binary #159

Closed
olesyaksyon opened this issue Aug 18, 2020 · 4 comments

Comments

@olesyaksyon

Hi.

I am trying to test ctcdecode with my ARPA model.

The tests run fine, but when I use my own generated ARPA file, I get a segfault.

Here is the gdb output:

(gdb) run my_test.py
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/python3 my_test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffa5b06700 (LWP 26274)]
[New Thread 0x7fffa5305700 (LWP 26275)]
[New Thread 0x7fffa0b04700 (LWP 26276)]
[New Thread 0x7fff9e303700 (LWP 26277)]
[New Thread 0x7fff9db02700 (LWP 26278)]
[New Thread 0x7fff99301700 (LWP 26279)]
[New Thread 0x7fff98b00700 (LWP 26280)]
[New Thread 0x7fff942ff700 (LWP 26281)]
[New Thread 0x7fff91afe700 (LWP 26282)]
[New Thread 0x7fff8f2fd700 (LWP 26283)]
[New Thread 0x7fff8cafc700 (LWP 26284)]
[New Thread 0x7fff8a2fb700 (LWP 26285)]
[New Thread 0x7fff87afa700 (LWP 26286)]
[New Thread 0x7fff852f9700 (LWP 26287)]
[New Thread 0x7fff82af8700 (LWP 26288)]
[New Thread 0x7fff640d5700 (LWP 26289)]
[New Thread 0x7fff638d4700 (LWP 26290)]
[New Thread 0x7fff630d3700 (LWP 26291)]
[New Thread 0x7fff628d2700 (LWP 26292)]

Thread 18 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff638d4700 (LWP 26290)]
fst::SortedMatcher<fst::VectorFst<fst::ArcTpl<fst::TropicalWeightTpl >, fst::VectorState<fst::ArcTpl<fst::TropicalWeightTpl >, std::allocator<fst::ArcTpl<fst::TropicalWeightTpl > > > > >::BinarySearch (this=0x7fff54000cc0) at /tmp/pip-req-build-yz1tiwhq/third_party/openfst-1.6.7/src/include/fst/matcher.h:360
360 /tmp/pip-req-build-yz1tiwhq/third_party/openfst-1.6.7/src/include/fst/matcher.h: No such file or directory.

And here is the backtrace output:

(gdb) bt
#0 fst::SortedMatcher<fst::VectorFst<fst::ArcTpl<fst::TropicalWeightTpl >, fst::VectorState<fst::ArcTpl<fst::TropicalWeightTpl >, std::allocator<fst::ArcTpl<fst::TropicalWeightTpl > > > > >::BinarySearch (this=0x7fff54000cc0) at /tmp/pip-req-build-yz1tiwhq/third_party/openfst-1.6.7/src/include/fst/matcher.h:360
#1 fst::SortedMatcher<fst::VectorFst<fst::ArcTpl<fst::TropicalWeightTpl >, fst::VectorState<fst::ArcTpl<fst::TropicalWeightTpl >, std::allocator<fst::ArcTpl<fst::TropicalWeightTpl > > > > >::Search (this=0x7fff54000cc0) at /tmp/pip-req-build-yz1tiwhq/third_party/openfst-1.6.7/src/include/fst/matcher.h:384
#2 fst::SortedMatcher<fst::VectorFst<fst::ArcTpl<fst::TropicalWeightTpl >, fst::VectorState<fst::ArcTpl<fst::TropicalWeightTpl >, std::allocator<fst::ArcTpl<fst::TropicalWeightTpl > > > > >::Find (match_label=2, this=) at /tmp/pip-req-build-yz1tiwhq/third_party/openfst-1.6.7/src/include/fst/matcher.h:256
#3 PathTrie::get_path_trie (this=this@entry=0x7fff638d3be8, new_char=new_char@entry=1, new_timestep=0, cur_log_prob_c=cur_log_prob_c@entry=-16.6958027, reset=reset@entry=true)
at /tmp/pip-req-build-yz1tiwhq/ctcdecode/src/path_trie.cpp:61
#4 0x00007fff7b0ae2c4 in DecoderState::next (this=this@entry=0x7fff638d3b80, probs_seq=std::vector of length 353, capacity 353 = {...})
at /tmp/pip-req-build-yz1tiwhq/ctcdecode/src/ctc_beam_search_decoder.cpp:107
#5 0x00007fff7b0afc41 in ctc_beam_search_decoder (probs_seq=std::vector of length 353, capacity 353 = {...}, vocabulary=..., beam_size=, cutoff_prob=,
cutoff_top_n=, blank_id=, log_input=0, ext_scorer=0x24da860) at /tmp/pip-req-build-yz1tiwhq/ctcdecode/src/ctc_beam_search_decoder.cpp:224
#6 0x00007fff7b0b06da in std::__invoke_impl<std::vector<std::pair<double, Output>, std::allocator<std::pair<double, Output> > >, std::vector<std::pair<double, Output>, std::allocator<std::pair<double, Output> > > (&)(std::vector<std::vector<double, std::allocator >, std::allocator<std::vector<double, std::allocator > > > const&, std::vector<std::string, std::allocatorstd::string > const&, unsigned long, double, unsigned long, unsigned long, int, Scorer), std::vector<std::vector<double, std::allocator >, std::allocator<std::vector<double, std::allocator > > >&, std::vector<std::string, std::allocatorstd::string >&, unsigned long&, double&, unsigned long&, unsigned long&, int&, Scorer*&> (__f=) at /usr/include/c++/7/bits/invoke.h:60
#7 std::__invoke<std::vector<std::pair<double, Output>, std::allocator<std::pair<double, Output> > > (&)(std::vector<std::vector<double, std::allocator >, std::allocator<std::vector<double, std::allocator > > > const&, std::vector<std::string, std::allocatorstd::string > const&, unsigned long, double, unsigned long, unsigned long, int, Scorer), std::vector<std::vector<double, std::allocator >, std::allocator<std::vector<double, std::allocator > > >&, std::vector<std::string, std::allocatorstd::string >&, unsigned long&, double&, unsigned long&, unsigned long&, int&, Scorer*&> (__fn=) at /usr/include/c++/7/bits/invoke.h:96
#8 std::_Bind<std::vector<std::pair<double, Output>, std::allocator<std::pair<double, Output> > > ((std::vector<std::vector<double, std::allocator >, std::allocator<std::vector<double, std::allocator > > >, std::vector<std::string, std::allocatorstd::string >, unsigned long, double, unsigned long, unsigned long, int, Scorer))(std::vector<std::vector<double, std::allocator >, std::allocator<std::vector<double, std::allocator > > > const&, std::vector<std::string, std::allocatorstd::string > const&, unsigned long, double, unsigned long, unsigned long, int, Scorer*)>::__call<std::vector<std::pair<double, Output>, std::allocator<std::pair<double, Output> > >, , 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul, 7ul>) (__args=..., this=) at /usr/include/c++/7/functional:469
#9 std::_Bind<std::vector<std::pair<double, Output>, std::allocator<std::pair<double, Output> > > ((std::vector<std::vector<double, std::allocator >, std::allocator<std::vector<double, std::allocator > > >, std::vector<std::string, std::allocatorstd::string >, unsigned long, double, unsigned long, unsigned long, int, Scorer))(std::vector<std::vector<double, std::allocator >, std::allocator<std::vector<double, std::allocator > > > const&, std::vector<std::string, std::allocatorstd::string > const&, unsigned long, double, unsigned long, unsigned long, int, Scorer*)>::operator()<, std::vector<std::pair<double, Output>, std::allocator<std::pair<double, Output> > > >() (this=) at /usr/include/c++/7/functional:551
#10 std::__invoke_impl<std::vector<std::pair<double, Output>, std::allocator<std::pair<double, Output> > >, std::_Bind<std::vector<std::pair<double, Output>, std::allocator<std::pair<double, Output> > > ((std::vector<std::vector<double, std::allocator >, std::allocator<std::vector<double, std::allocator > > >, std::vector<std::string, std::allocatorstd::string >, unsigned long, double, unsigned long, unsigned long, int, Scorer))(std::vector<std::vector<double, std::allocator >, std::allocator<std::vector<double, std::allocator > > > const&, std::vector<std::string, std::allocatorstd::string > const&, unsigned long, double, unsigned long, unsigned long, int, Scorer*)>&>(std::__invoke_other, std::_Bind<std::vector<std::pair<double, Output>, std::allocator<std::pair<double, Output> > > ((std::vector<std::vector<double, std::allocator >, std::allocator<std::vector<double, std::allocator > > >, std::vector<std::string, std::allocatorstd::string >, unsigned long, double, unsigned long, unsigned long, int, Scorer))(std::vector<std::vector<double, std::allocator >, std::allocator<std::vector<double, std::allocator > > > const&, std::vector<std::string, std::allocatorstd::string > const&, unsigned long, double, unsigned long, unsigned long, int, Scorer*)>&) (__f=...) at /usr/include/c++/7/bits/invoke.h:60
#11 std::__invoke<std::_Bind<std::vector<std::pair<double, Output>, std::allocator<std::pair<double, Output> > > ((std::vector<std::vector<double, std::allocator >, std::allocator<std::vector<double, std::allocator > > >, std::vector<std::string, std::allocatorstd::string >, unsigned long, double, unsigned long, unsigned long, int, Scorer))(std::vector<std::vector<double, std::allocator >, std::allocator<std::vector<double, std::allocator > > > const&, std::vector<std::string, std::allocatorstd::string > const&, unsigned long, double, unsigned long, unsigned long, int, Scorer*)>&>(std::_Bind<std::vector<std::pair<double, Output>, std::allocator<std::pair<double, Output> > > ((std::vector<std::vector<double, std::allocator >, std::allocator<std::vector<double, std::allocator > > >, std::vector<std::string, std::allocatorstd::string >, unsigned long, double, unsigned long, unsigned long, int, Scorer))(std::vector<std::vector<double, std::allocator >, std::allocator<std::vector<double, std::allocator > > > const&, std::vector<std::string, std::allocatorstd::string > const&, unsigned long, double, unsigned long, unsigned long, int, Scorer*)>&) (__fn=...) at /usr/include/c++/7/bits/invoke.h:96
#12 std::__future_base::_Task_state<std::_Bind<std::vector<std::pair<double, Output>, std::allocator<std::pair<double, Output> > > ((std::vector<std::vector<double, std::allocator >, std::allocator<std::vector<double, std::allocator > > >, std::vector<std::string, std::allocatorstd::string >, unsigned long, double, unsigned long, unsigned long, int, Scorer))(std::vector<std::vector<double, std::allocator >, std::allocator<std::vector<double, std::allocator > > > const&, std::vector<std::string, std::allocatorstd::string > const&, unsigned long, double, unsigned long, unsigned long, int, Scorer*)>, std::allocator, std::vector<std::pair<double, Output>, std::allocator<std::pair<double, Output> > > ()>::_M_run()::{lambda()#1}::operator()() const (
__closure=) at /usr/include/c++/7/future:1421
#13 std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::vector<std::pair<double, Output>, std::allocator<std::pair<double, Output> > > >, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::_Bind<std::vector<std::pair<double, Output>, std::allocator<std::pair<double, Output> > > ((std::vector<std::vector<double, std::allocator >, std::allocator<std::vector<double, std::allocator > > >, std::vector<std::string, std::allocatorstd::string >, unsigned long, double, unsigned long, unsigned long, int, Scorer))(std::vector<std::vector<double, std::allocator >, std::allocator<std::vector<double, std::allocator > > > const&, std::vector<std::string, std::allocatorstd::string > const&, unsigned long, double, unsigned long, unsigned long, int, Scorer*)>, std::allocator, std::vector<std::pair<double, Output>, std::allocator<std::pair<double, Output> > > ()>::_M_run()::{lambda()#1}, std::vector<std::pair<double, Output>, std::allocator<std::pair<double, Output> > > >::operator()() const (this=0x7fff638d3df0) at /usr/include/c++/7/future:1339

@rbracco
Contributor

rbracco commented Aug 20, 2020

Can you give more detail? When you say "arpa model", do you mean a model trained with the 2-letter arpa symbols as your vocab? I have used IPA/Arpa for my decoding, and this library does not currently support multi-character symbols or Unicode, as discussed in #31. That caused errors for me, but they looked nothing like this. Maybe try running a non-threaded version until you get it working, so we can see whether threading is the issue?
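
A quick way to see whether a label list falls into the unsupported cases mentioned above is to scan it for entries that are not a single ASCII character. The following is only a minimal sketch: `find_unsupported_labels` is a hypothetical helper written for illustration, not part of ctcdecode, and the sample vocab is made up.

```python
def find_unsupported_labels(vocab):
    """Return (index, label, reason) for labels that are not one ASCII character.

    Hypothetical helper for illustration; it is not part of ctcdecode.
    """
    problems = []
    for index, label in enumerate(vocab):
        if len(label) != 1:
            # covers multi-character symbols such as "AH" (and empty strings)
            problems.append((index, label, "not exactly one character"))
        elif not label.isascii():
            # covers single Unicode characters such as Cyrillic letters
            problems.append((index, label, "non-ASCII"))
    return problems


# made-up vocab mixing supported and unsupported labels
vocab = ["_", "a", "b", "AH", "д", " "]
for index, label, reason in find_unsupported_labels(vocab):
    print(f"label {index} ({label!r}) is {reason}")
```

If the list comes back empty, the vocab itself is probably not the cause and something else (threading, the LM file) is worth checking next.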

@olesyaksyon
Author

It seems that the problem is here. I use Unicode symbols, since my model is for Russian. As I understand it, Unicode is not supported.

@rbracco
Contributor

rbracco commented Aug 25, 2020

Your understanding is correct: Unicode is not supported and will not work, but there is a way to hack around it. It doesn't matter what vocab/symbols/tokens you pass in, because the results you get back are encoded as ints. This means you can pass a bunch of ASCII symbols as your vocab, and then map the returned ints back to your real vocab. Here's an example:

import ctcdecode

# your_real_vocab is the list of your actual tokens, including unicode chars, in the order they were trained in
n_classes = len(your_real_vocab)
# get a list of ASCII symbols with exactly one symbol per class (this alphabet covers at most 52 classes)
single_char_vocab = list("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")[:n_classes]
decoder = ctcdecode.CTCBeamDecoder(single_char_vocab, alpha=0, beta=1.85, beam_width=100, blank_id=0)
# output is the model's (batch, time, n_classes) probability tensor
# beam_results holds int indices into the vocab; out_lens gives the valid length of each beam
beam_results, beam_scores, timesteps, out_lens = decoder.decode(output)
# take the top beam of the first batch item and map its indices back to the real vocab
best = beam_results[0][0][:out_lens[0][0]]
text = "".join(your_real_vocab[int(n)] for n in best)

Hope this helps and let me know if you try it and get stuck somewhere.
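
If your vocab has more than 52 classes, the letters-only list above runs out. Below is a minimal sketch of the same trick with a larger pool of placeholder symbols and a separate mapping step. Both `make_placeholder_vocab` and `indices_to_text` are hypothetical helpers, not part of ctcdecode, and it is an assumption that every punctuation character is safe to pass as a label; the decoded indices are a stand-in for one beam returned by the decoder.

```python
import string


def make_placeholder_vocab(n_classes):
    """Return n_classes distinct single-character ASCII labels (hypothetical helper)."""
    # letters, digits and punctuation: 94 printable, non-whitespace ASCII characters
    pool = string.ascii_letters + string.digits + string.punctuation
    if n_classes > len(pool):
        raise ValueError(
            f"only {len(pool)} placeholder symbols available, need {n_classes}"
        )
    return list(pool[:n_classes])


def indices_to_text(indices, real_vocab, blank_id=0):
    """Map decoded int indices back to the real tokens, skipping any blank index."""
    return "".join(real_vocab[int(i)] for i in indices if int(i) != blank_id)


# made-up Russian vocab; index 0 is assumed to be the CTC blank
real_vocab = ["_", "п", "р", "и", "в", "е", "т"]
placeholder_vocab = make_placeholder_vocab(len(real_vocab))
decoded = [1, 2, 3, 4, 5, 6]  # stand-in for one beam returned by the decoder
print(placeholder_vocab)
print(indices_to_text(decoded, real_vocab))
```

The placeholder list is what you would hand to the decoder as labels; the real vocab is only used afterwards, so it can contain any Unicode or multi-character tokens.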

@olesyaksyon
Author

Thanks for making it clear.
