Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault during inference #35

Open
Thopic opened this issue Jan 23, 2019 · 6 comments
Open

Segmentation fault during inference #35

Thopic opened this issue Jan 23, 2019 · 6 comments
Labels

Comments

@Thopic
Copy link

Thopic commented Jan 23, 2019

Hi Quentin,

I have a recurrent "Segmentation fault", crashing IGoR, with no error message during inference. It happens both on my laptop and on the cluster, usually at the start of an iteration (rarely the first one). The iteration at which the crash happens seems to be random, even with the same sequences, sometimes it can even end without a crash (especially if I look at small number of sequences, say < 10000) . The alignment part always goes smoothly. I tested that it is not tied to a particular sequence. Additionally, when there is no crash final_marginals.txt is produced and looks reasonable.

I'm providing IGoR my own genomics files and model_parms.txt and running it with the following commands:

cmd="igor -set_wd $wdpath -batch foo"
$cmd -read_seqs $sequences
cmd="$cmd -set_genomic --V $mygenomicVs --J $mygenomicsJs --D $mygenomicsDs -set_CDR3_anchors --V $myanchorsV --J $myanchorsJ"  
$cmd -align --V
$cmd -align --D --thresh 30
$cmd -align --J
$cmd -infer -set_custom_model $mymodel_params

I'm using the current version of IGoR, compiled with

./configure --prefix=$HOME/.local/ && make && make install

I've tried to run it through valgrind to pinpoint where the error happens, so I've compiled IGoR with debug flags:

./configure --prefix=$HOME/.local/ CPPFLAGS=-DDEBUG CXXFLAGS="-g -O0" && make && make install

And ran the same command as before (without the alignment part)

cmd="valgrind --leak-check=yes $cmd"
$cmd -infer -set_custom_model $mymodel_params

At the start of the inference process (just after "Initialization of probability bounds over."), I start to get multiple valgrind errors, with the first one being:

==24496== Invalid read of size 4
==24496==    at 0x169635: Dinucl_markov::update_event_internal_probas(std::unique_ptr<long double [], std::default_delete<long double []> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > const&) (Dinuclmarkov.cpp:479)
==24496==    by 0x186731: GenModel::infer_model(std::vector<std::tuple<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<Gene_class, std::vector<Alignment_data, std::allocator<Alignment_data> >, std::hash<Gene_class>, std::equal_to<Gene_class>, std::allocator<std::pair<Gene_class const, std::vector<Alignment_data, std::allocator<Alignment_data> > > > > >, std::allocator<std::tuple<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<Gene_class, std::vector<Alignment_data, std::allocator<Alignment_data> >, std::hash<Gene_class>, std::equal_to<Gene_class>, std::allocator<std::pair<Gene_class const, std::vector<Alignment_data, std::allocator<Alignment_data> > > > > > > > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, double, bool, double, double) [clone ._omp_fn.0] (GenModel.cpp:315)
==24496==    by 0x599C97D: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==24496==    by 0x4E436DA: start_thread (pthread_create.c:463)
==24496==    by 0x5EEE88E: clone (clone.S:95)
==24496==  Address 0x13d50240 is 0 bytes after a block of size 209,760 alloc'd
==24496==    at 0x4C3089F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24496==    by 0x1D93B9: Model_marginals::Model_marginals(Model_marginals const&) (Model_marginals.cpp:48)
==24496==    by 0x18543E: GenModel::infer_model(std::vector<std::tuple<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<Gene_class, std::vector<Alignment_data, std::allocator<Alignment_data> >, std::hash<Gene_class>, std::equal_to<Gene_class>, std::allocator<std::pair<Gene_class const, std::vector<Alignment_data, std::allocator<Alignment_data> > > > > >, std::allocator<std::tuple<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<Gene_class, std::vector<Alignment_data, std::allocator<Alignment_data> >, std::hash<Gene_class>, std::equal_to<Gene_class>, std::allocator<std::pair<Gene_class const, std::vector<Alignment_data, std::allocator<Alignment_data> > > > > > > > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, double, bool, double, double) [clone ._omp_fn.0] (GenModel.cpp:193)
==24496==    by 0x599C97D: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==24496==    by 0x4E436DA: start_thread (pthread_create.c:463)
==24496==    by 0x5EEE88E: clone (clone.S:95)
==24496== 

That's it I think, if you need my genes/CDR3/model_parms files, or the sequences, just ask.

Additional question, is this the "right way" to provide IGoR with a non-supplied specie ?

Thanks,

Thomas

@qmarcou
Copy link
Owner

qmarcou commented Jan 24, 2019

Hi Thomas,
Thanks for taking the time to produce such a detailed bug report! I highly appreciate it
This indeed seems to be a real segfault issue, valgrind points to the following:

this->dinuc_proba_matrix(i,j) += marginal_array[event_index + i*event_realizations.size() + j];
//TODO This is risky in case the code evolves to have more than 4 realizations

At first I thought I had made a mistake upon making changes to handle ambiguous nucleotides within IGoR but it seems commit 4a1ac62 did not change anything there. I had checked all the code using valgrind at some point but maybe I missed something. Also it seems upon writing this line of code I was already suspicious of its potential pitfalls.

This is indeed the right way to supply a model for an undefined species, however did you make changes to the model_marginals or model_parms by hand or did you use the command line and or cpp interface?

I'll try and find what is wrong just from the error message but I might ask you all the models and sequences if I cannot find an obvious cause.

Best,
Quentin

@qmarcou qmarcou added the bug label Jan 24, 2019
@Thopic
Copy link
Author

Thopic commented Jan 24, 2019

Thanks for the fast answer,

I'm not providing the model_marginals (if I'm reading the documentation correctly, this should initialize it to the uniform distribution), but I write my model_params.txt manually, which could cause the issue.

Here is a summary of the file:

@Event_list
#GeneChoice;V_gene;Undefined_side;7;v_choice
%Omyk_Chr12_NC_035088.1_IGHV20_81397765_81398073;GTTTGCTGTGATGAGATGGATCAGTCTCCTTCTCAGGTGAAAAGACCTGGAGACAAGTTTAAATTGTCATGTAAAATCTCTGGGTTTGATATGACAGACTACTATATGACCTGGATAAGACAGAAACCAGGCAAAGCTCTGGAGTGGATTGGGGGTATTGACTCTGGTTCAACAAATTCTCCAGACTACTCAGACTCCCTGAAAGGCCAGTTCAGCCTGACTGAGGACGTCTCTACAAGCACACAGTTCTTAGAGGCCAAGAGTCTGAGTTCAGAGGACTCTGCTGTTTATTATTGTGCTCGACAGG;0
%Omyk_Chr12_NC_035088.1_IGHV37_81466761_81467069;GTTTGCTGTGATGAGATGGATCAGTCACCTTCTCAGGTGAAAAGACCTGGAGACACGTTTAAATTGTCATGTAAAATCTCTGGGTTTGATATGACAGACTATTTCATGCACTGGATAAGACAGAAACCAGGCAAAGCTCTGGAGTGGATTGGGAGAATTAACTCTGGTTCAAGAAATGCTCCAGTCTACTCAGACTCCCTGAAAGGCCAGTTCACCCTTACTGAGGACATCTCTACAAGCACACAGTTCTTAGAGGCCAAGAGTCTGAGTTCAGAGGACTCTGCTGTTTATTATTGTGCTCGAGAGA;1
%Omyk_Chr12_NC_035088.1_IGHV57_81618381_81618689;GTTTGCTGTGATGAGATGGATCAGTCACCTTCTCAGGTGAAAAGACCTGGAGACACGTTTAAATTGTCATGTAAAATCTCTGGGTTTGATATGACAGACTATTTCATGCACTGGATAAGACAGAAACCAGGCAAAGCTCTGGAGTGGATTGGGGGTATTTACTCTGGTTCAACAGATGCTCCAGTCTACTCAGACTCCCTGAAAGGCCAGTTCATCCTTACTGAGGACATCTCTACAAGCACACAGTTCTTAGAGGCCAAGAGTCTGAGTTCAGAGGACTCTGCTGTTTATTATTGTGCTCGAGAGA;2
#GeneChoice;D_gene;Undefined_side;6;d_gene
%Omyk_Chr12_NC_035088.1_RSS_IGHD1_RSS_81394296_81394994;ACTATATGGGGGC;0
...
%Omyk_Chr13_NC_035089.1_RSS_IGHD11_RSS_48342957_48343024;TATGGGGGCAGC;23
#GeneChoice;J_gene;Undefined_side;7;j_choice
%Omyk_Chr12_NC_035088.1_RSS_JH1_SS_81678720_81678808;ATGCTTTTGACTACTGGGGTAAAGGGACACAAGTCACCGTCTCAACAG;0
...
%Omyk_Chr13_NC_035089.1_RSS_IGHJ10_SS_48368507_48368599;ACAATGCTTTTGACTACTGGGGGAAAGGCACAATGGTTACCGTTTCATCAG;18
#Deletion;V_gene;Three_prime;5;v_3_del
%-4;0
...
%16;20
#Deletion;D_gene;Three_prime;5;d_3_del
%-4;0
...
%16;20
#Deletion;D_gene;Five_prime;5;d_5_del
%-4;0
...
%16;20
#Deletion;J_gene;Five_prime;5;j_5_del
%-4;0
...
%18;22
#Insertion;VD_genes;Undefined_side;4;vd_ins
%0;0
...
%30;30
#Insertion;DJ_gene;Undefined_side;2;dj_ins
%0;0
...
%30;30
#DinucMarkov;VD_genes;Undefined_side;3;vd_dinucl
%T;3
%C;1
%G;2
%A;0
#DinucMarkov;DJ_gene;Undefined_side;1;dj_dinucl
%T;3
%C;1
%G;2
%A;0
@Edges
%GeneChoice_V_gene_Undefined_side_prio7_size3;Deletion_V_gene_Three_prime_prio5_size21
%GeneChoice_V_gene_Undefined_side_prio7_size3;GeneChoice_D_gene_Undefined_side_prio6_size24
%GeneChoice_V_gene_Undefined_side_prio7_size3;GeneChoice_J_gene_Undefined_side_prio7_size19
%GeneChoice_D_gene_Undefined_side_prio6_size24;Deletion_D_gene_Three_prime_prio5_size21
%GeneChoice_D_gene_Undefined_side_prio6_size24;Deletion_D_gene_Five_prime_prio5_size21
%GeneChoice_J_gene_Undefined_side_prio7_size19;Deletion_J_gene_Five_prime_prio5_size23
%GeneChoice_J_gene_Undefined_side_prio7_size19;GeneChoice_D_gene_Undefined_side_prio6_size24
%Deletion_D_gene_Five_prime_prio5_size21;Deletion_D_gene_Three_prime_prio5_size21
@ErrorRate
#SingleErrorRate
0.005

(Only three V-genes, I'm focusing on one primer)

Best,
Thomas

@Thopic
Copy link
Author

Thopic commented Feb 5, 2019

Hi,

I think I solved the first Valgrind error, you were right it was a "typo" leftover from your move to ambiguous nucleotides. I sent you a pull request (which is completely unnecessary for this one-line fix, but I never sent one, and I wanted to try it out)

It didn't solve my issue though, and a few other memory errors are popping out, I'm gonna try to look into these in the next few days (weeks more probably). I'm joining my valgrind error log, if you think your workload is too light,

Best,

Thomas

valgrind_log.txt

@qmarcou
Copy link
Owner

qmarcou commented Feb 5, 2019

Hi Thomas,
Sorry about this I should have spotted it looking more than 2 seconds at the code! Thanks for the PR i'll accept it now!
Keep me updated of your Valgrind quest, I'm afraid because Valgrind have very long runtimes you might not get past the model initialization (maybe I'm wrong, but it will clearly take you some time). Looking at TRA like models could make the computation lighter (although it might hide the issues you are looking for)
Quentin

@Thopic
Copy link
Author

Thopic commented Apr 26, 2019

Hi,

I think I figured out how to avoid the segmentation fault, in my model the maximum number of allowed deletions on the D gene was too low , and when I increased it, IGoR ran fine.

So if someone happens to get the same error, increasing the range of parameters of the model seems to work.

While I'm here and because this probably doesn't need a new issue, I have a small question, do you know what happens in IGoR when delD3 + delD5 > len(D) ? (or even delD3 > len(D))

Best,

Thomas

@qmarcou
Copy link
Owner

qmarcou commented Apr 23, 2020

Hi Thomas,
Not sure my answer is still relevant but I find this weird that increasing D deletion range would change anything in this segfault thing... I or @alfaceor still need to get a look at this!

As for your last question: the code design assuming there is no deletion simply prevents this from happening. You can only delete up to current D length (meaning if you have added p nucleotides on 1 side, it is still allowed to delete them upon deleting on the other side).

Quentin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants