This repository has been archived by the owner on Jan 26, 2021. It is now read-only.

[Inference] Infer: Program received signal SIGSEGV, Segmentation fault. But train model is OK #25

Closed
heavendai opened this issue Feb 19, 2016 · 6 comments

Comments

@heavendai

Hi, @feiga .
I cannot infer new docs with the lightlda infer tool. Can you give me a hand? My problem is:

  1. Run the lightlda tool to train a model using the following command:
    $bin/lightlda -num_vocabs 129505 -num_topics 10 -num_iterations 10 -alpha 0.5 -beta 0.01 -mh_steps 2 -num_local_workers 1 -num_blocks 1 -max_num_document 111000 -input_dir $dir -data_capacity 500
    I got server_0_table_0.model, server_0_table_1.model and doc_topic.0
  2. Run the infer tool to infer the new docs using the following command:
    mv doc_topic.0 doc_topic.0.tr
    $bin/infer -num_vocabs 129505 -num_topics 10 -num_iterations 10 -alpha 0.5 -beta 0.01 -mh_steps 2 -num_local_workers 1 -num_blocks 1 -max_num_document 110629 -input_dir $dir -data_capacity 500
    I ran this command in the same dir as lightlda (which contains block.0, vocab.0, vocab.0.txt), but got the following error:
    [INFO] [2016-02-19 14:44:38] Actual Alias capacity: 5 MB
    [INFO] [2016-02-19 14:44:38] loading model
    [INFO] [2016-02-19 14:44:38] loading word topic table[server_0_table_0.model]
    [INFO] [2016-02-19 14:44:38] loading summary table[server_0_table_1.model]
    [INFO] [2016-02-19 14:44:38] block=0, Alias Time used: 0.11 s
    [INFO] [2016-02-19 14:44:38] iter=0
    Segmentation fault (core dumped) $bin/infer -num_vocabs 129505 -num_topics 10 -num_iterations 10 -alpha 0.5 -beta 0.01 -mh_steps 2 -num_local_workers 1 -num_blocks 1 -max_num_document 110629 -input_dir $dir -data_capacity 500

Running the program under GDB gives the following:
(gdb) r -num_vocabs 129505 -num_topics 10 -num_iterations 10 -alpha 0.5 -beta 0.01 -mh_steps 2 -num_local_workers 1 -num_blocks 1 -max_num_document 110629 -input_dir /home/disk4/daimingyang/tools/DMTK/lightlda_feiga/example/data/20151001_65w_200k -data_capacity 500
Starting program: /home/disk4/daimingyang/tools/DMTK/lightlda_feiga/bin/infer -num_vocabs 129505 -num_topics 10 -num_iterations 10 -alpha 0.5 -beta 0.01 -mh_steps 2 -num_local_workers 1 -num_blocks 1 -max_num_document 110629 -input_dir /home/disk4/daimingyang/tools/DMTK/lightlda_feiga/example/data/20151001_65w_200k -data_capacity 500
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/tls/libthread_db.so.1".
[INFO] [2016-02-19 14:45:47] Actual Alias capacity: 5 MB
[INFO] [2016-02-19 14:45:47] loading model
[INFO] [2016-02-19 14:45:47] loading word topic table[server_0_table_0.model]
[INFO] [2016-02-19 14:45:47] loading summary table[server_0_table_1.model]
[INFO] [2016-02-19 14:45:47] block=0, Alias Time used: 0.11 s
[INFO] [2016-02-19 14:45:47] iter=0
[New Thread 0x40a00960 (LWP 10091)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x40a00960 (LWP 10091)]
0x000000000041bdde in multiverso::Row::At(int) ()

Can you help me? Thank you.

@heavendai heavendai changed the title [Inference] Infer::q [Inference] Infer: Program received signal SIGSEGV, Segmentation fault. But train model is OK Feb 19, 2016
@hiyijian
Contributor

That's odd. Your program crashed somewhere in a call into multiverso, yet it should have nothing to do with multiverso, since all data, including the model, is stored in local buffers during inference.
Could you please show us the exact line number where the core dump happened? I'll try to help.

@heavendai
Author

Hi @hiyijian,
Thanks for your time. The core backtrace information is as follows:
Run till exit from #0 multiverso::lightlda::AliasTable::Build (this=0xe7da190, word=6, model=) at /home/tools/DMTK/lightlda_inf/src/alias_table.cpp:82
multiverso::lightlda::Inferer::BeforeIteration (this=this@entry=0xe7da530, block=block@entry=0) at /home/tools/DMTK/lightlda_inf/inference/inferer.cpp:53
Value returned is $2 = 0

looking forward to your reply.

@hiyijian
Contributor

I'm sorry, but could you share the trained model and the new/unseen docs data with me? I'll try to reproduce it. It's difficult to figure out the problem from the core information alone.

@heavendai
Author

OK. How do I share the model with you?
Can you give me your email?

@hiyijian
Contributor

@heavendai hiyijian@qq.com

@hiyijian
Contributor

Apart from data validity, the problem was partly caused by a missing boundary-condition check. @feiga has already fixed it in commit 733a06c.
Thanks for your report.

@chivee chivee closed this as completed Jun 30, 2017