r10 branch nanopolish train Error: unknown model: #761

Closed
Psy-Fer opened this issue Apr 14, 2020 · 9 comments

Comments

@Psy-Fer

Psy-Fer commented Apr 14, 2020

Hello Jared,

I'm running this to try and train a positive control (then a negative):

nanopolish train --progress --rounds=10 --threads=4 --train-kmers=methylated --input-model-filename src/builtin_models/r9_4_450bps_cpg_6mer_template_model.inl --reads ~/data/meth/training/pos/positive_pass.fastq --bam ~/data/meth/training/pos/positive_pass.sorted.bam --genome ~/data/meth/training/hg38noAlt.fa --verbose -d ~/data/meth/training/pos/out/

I've tried changing the --input-model-filename to r9.4_450bps as well, but I get the same error.

Is there something I'm doing wrong?

Also, any help with the steps to build a methylation model to use with nanopolish call-methylation using nanopolish train would be appreciated.

Cheers,
James

@Psy-Fer
Author

Psy-Fer commented Apr 14, 2020

oooooooooooh,
I used ~/nanopolish/etc/r9-models/r9.4_450bps.nucleotide.6mer.template.model

and that seemed to work.
Is that what I "should" be using for this?

James

@jts
Owner

jts commented Apr 14, 2020 via email

@Psy-Fer
Author

Psy-Fer commented Apr 14, 2020

And that, after a while, resulted in this

Inferring read 08138292-7d7e-4b51-a7bc-b14c2a9f527f chr7:51438571-51441928 0
nanopolish: src/nanopolish_raw_loader.inl:495: int EventBandedMatrix<StorageType>::get_offset_for_kmer_in_band(size_t, int) const [with StorageType = SimpleHMMFBStorage; size_t = long unsigned int]: Assertion `band_idx < this->band_origins.size()' failed.
Aborted (core dumped)

@jts
Owner

jts commented Apr 14, 2020

Is that in the first round, or after a few iterations?

@Psy-Fer
Author

Psy-Fer commented Apr 14, 2020

Hmm, not sure.
Relaunching with rounds=1
The verbosity nuked my window history to scroll back, ha!
I'll let you know

@Psy-Fer
Author

Psy-Fer commented Apr 15, 2020

Yea, dumped again.

Inferring read 08138292-7d7e-4b51-a7bc-b14c2a9f527f chr7:51438571-51441928 0
nanopolish: src/nanopolish_raw_loader.inl:495: int EventBandedMatrix<StorageType>::get_offset_for_kmer_in_band(size_t, int) const [with StorageType = SimpleHMMFBStorage; size_t = long unsigned int]: Assertion `band_idx < this->band_origins.size()' failed.
Aborted (core dumped)

Looks like it's at the same point too. Maybe it's a bad read? Or maybe something else.
Should I remove that read and try again?

@jts
Owner

jts commented Apr 15, 2020 via email

@Psy-Fer
Author

Psy-Fer commented Apr 15, 2020

Yep, dies.
I took a few reads around that one: I was running with 4 threads, so I wasn't sure which one broke it, and I wasn't going to re-run the full dataset with 1 thread. So I grabbed a few.

Then I ran with 1 thread, but otherwise the same.

nanopolish train --progress --rounds=1 --threads=1 --train-kmers=methylated --input-model-filename ~/nanopolish/etc/r9-models/r9.4_450bps.nucleotide.6mer.template.model --reads ~/data/meth/training/pos/positive_pass.test.fastq --bam ~/data/meth/training/pos/positive_pass.test.sorted.bam --genome ~/data/meth/training/hg38noAlt.fa --verbose -d ~/data/meth/training/pos/test
[train] initialized r9.4_450bps for alphabet nucleotide for 6-mers
[train] round 0
Inferring read c461563b-27d7-4ca1-ba70-028dc95ade72 chr7:51427517-51429859 0
Inferring read c461563b-27d7-4ca1-ba70-028dc95ade72 chr7:51427517-51429859 0
Inferring read c461563b-27d7-4ca1-ba70-028dc95ade72 chr7:51428418-51429848 0
Inferring read c461563b-27d7-4ca1-ba70-028dc95ade72 chr7:51428418-51429848 0
Inferring read b439a565-78b4-4d52-941d-a207ffcdb60b chr7:51428474-51430022 0
Inferring read c461563b-27d7-4ca1-ba70-028dc95ade72 chr7:51429789-51435072 0
Inferring read c461563b-27d7-4ca1-ba70-028dc95ade72 chr7:51429789-51435072 0
Inferring read c461563b-27d7-4ca1-ba70-028dc95ade72 chr7:51433035-51434265 0
nanopolish: src/nanopolish_raw_loader.inl:495: int EventBandedMatrix<StorageType>::get_offset_for_kmer_in_band(size_t, int) const [with StorageType = SimpleHMMFBStorage; size_t = long unsigned int]: Assertion `band_idx < this->band_origins.size()' failed.
Aborted (core dumped)

Here are the fastq and bam of the reads I ran. Let me know if I need to extract the fast5 files for you.

positive_pass.test.zip

@hasindu2008
Contributor

Hi @jts

I was asked to have a look into this by @Psy-Fer and it seems that the assertion is caused by get_by_event_kmer(event_idx + 1, start_trim_kmer_state + 1) at

float score_diag = hmm_result.get_by_event_kmer(event_idx + 1, start_trim_kmer_state + 1) + lp_step + lp_emission_diag;

get_by_event_kmer(event_idx + 1, start_trim_kmer_state + 1) calls event_kmer_to_band(event_idx + 1, start_trim_kmer_state + 1), and the resulting band index ends up equal to band_origins.size(), which triggers the assertion. It looks like the band offset is off by 1 or something similar, but without knowing the context I can't be sure. Does it ring a bell to you?

Anyway, given that the reads that trigger this assertion are rare, I just hacked in a skip for such reads at https://github.com/hasindu2008/nanopolish-arm/blob/995dafe291582e8a13497e57acef750be47d8c43/src/nanopolish_raw_loader.inl#L837-L842 for now. Hopefully it does not introduce errors into the final training result.
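
To make the off-by-one concrete, here is a minimal sketch of the kind of bounds check described above. It is not the linked patch and not nanopolish's code: `ToyBandedMatrix`, `band_of`, and `can_read_diagonal` are hypothetical names, and the band mapping `(event_idx + 1) + (kmer_idx + 1)` is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a banded matrix; NOT nanopolish's EventBandedMatrix.
struct ToyBandedMatrix {
    std::vector<int> band_origins;  // one entry per band, so valid indices are 0..size()-1

    // Assumed mapping for illustration: cells on the same anti-diagonal share a band.
    static std::size_t band_of(int event_idx, int kmer_idx) {
        assert(event_idx >= -1 && kmer_idx >= -1);
        return static_cast<std::size_t>((event_idx + 1) + (kmer_idx + 1));
    }

    // True when the cell maps to a band that actually exists.
    bool has_cell_band(int event_idx, int kmer_idx) const {
        return band_of(event_idx, kmer_idx) < band_origins.size();
    }
};

// The diagonal lookup reads cell (event_idx + 1, kmer_state + 1).
// Returns false when that cell's band index would be >= band_origins.size(),
// which is the condition that trips the assertion in the report.
bool can_read_diagonal(const ToyBandedMatrix& m, int event_idx, int kmer_state) {
    return m.has_cell_band(event_idx + 1, kmer_state + 1);
}
```

A caller could test `can_read_diagonal(...)` before the lookup and skip the read when it returns false instead of aborting. Whether skipping such reads is safe for training is the open question raised above.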

Also, I grabbed the fast5s for James's extracted dataset. The zip file containing all the necessary files to reproduce the bug is attached.
small_test.zip

You may quickly replicate the problem by running the following inside the extracted folder.
../nanopolish/nanopolish train --progress --rounds=1 --threads=1 --train-kmers=methylated --input-model-filename ../nanopolish/etc/r9-models/r9.4_450bps.nucleotide.6mer.template.model --reads positive_pass.test.fastq --bam positive_pass.test.sorted.bam --genome /mnt/d/genome/hg38noAlt.fa --verbose -d pos_out/


3 participants