Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No methylation calls #826

Closed
JesseBNL opened this issue Aug 25, 2020 · 42 comments
Closed

No methylation calls #826

JesseBNL opened this issue Aug 25, 2020 · 42 comments

Comments

@JesseBNL
Copy link

Hi,

We have captured a a genomic region, and want to define the methylation of those reads. However, whatever I try, there are no calls made on the data. What could go wrong?
If I use the tutorial data set, it does work, so the pipeline is running..

Thanks,
Jesse

@jts
Copy link
Owner

jts commented Aug 25, 2020

Hi @JesseBNL,

I need a bit more information - what did call-methylation print at the end of the run? Did nanopolish index work correctly? How much coverage do you have of the target region?

Jared

@JesseBNL
Copy link
Author

Hi,

it did only print the first line, no hits for reads. The indexing worked fine and I have an average coverage of approximately 150X. The region is large (200+ kb) and is covered by multiple overlapping reads. However, not every part of the target region is covered by the reads, so there will be target regions without coverage.

Jesse

@JesseBNL JesseBNL reopened this Aug 25, 2020
@jts
Copy link
Owner

jts commented Aug 25, 2020

If you only get a header line, and no line that says [post-run summary] then the process is probably still running, or failed (possibly ran out of memory?)

@JesseBNL
Copy link
Author

Yes, it's only the header line. The linux system shows the run ended, and I am sure the computer is powerful enough to run this.. Is there something else that might disturb the methylation calling?

@jts
Copy link
Owner

jts commented Aug 26, 2020

Can you paste the exact command you ran, and all the messages printed to the terminal?

@JesseBNL
Copy link
Author

JesseBNL commented Aug 26, 2020

Yes, its:

algemeen@DESKTOP-FF5H9DC:/mnt/d/Analyse/Methylation_7609$ nanopolish/nanopolish index -d fast5_pass2/ totaal_beide.fastq
[readdb] indexing fast5_pass2/
[readdb] num reads: 1381018, num reads with path to fast5: 909966
algemeen@DESKTOP-FF5H9DC:/mnt/d/Analyse/Methylation_7609$ minimap2 -a -x map-ont HG38_chr19.fasta totaal_beide.fastq | samtools sort -T tmp -o output.sorted.bam
[M::mm_idx_gen::0.0560.83] collected minimizers
[M::mm_idx_gen::0.084
1.49] sorted minimizers
[M::main::0.0851.47] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.087
1.43] mid_occ = 48
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.0881.41] distinct minimizers: 31388 (83.09% are singletons); average occurrences: 1.431; average spacing: 5.306
[M::worker_pipeline::330.113
2.97] mapped 46738 sequences
[M::worker_pipeline::604.8133.00] mapped 45268 sequences
[M::worker_pipeline::886.502
3.01] mapped 47817 sequences
[M::worker_pipeline::1169.4063.01] mapped 49225 sequences
[M::worker_pipeline::1447.063
3.01] mapped 48212 sequences
[M::worker_pipeline::1746.1893.01] mapped 46838 sequences
[M::worker_pipeline::2004.205
3.01] mapped 47760 sequences
[M::worker_pipeline::2290.7813.02] mapped 47394 sequences
[M::worker_pipeline::2559.204
3.02] mapped 46435 sequences
[M::worker_pipeline::2829.5483.02] mapped 46369 sequences
[M::worker_pipeline::3108.926
3.02] mapped 96943 sequences
[M::worker_pipeline::3409.3473.02] mapped 95554 sequences
[M::worker_pipeline::3686.230
3.02] mapped 67331 sequences
[M::worker_pipeline::3976.3743.02] mapped 72874 sequences
[M::worker_pipeline::4269.109
3.02] mapped 77494 sequences
[M::worker_pipeline::4591.5593.02] mapped 82847 sequences
[M::worker_pipeline::4869.920
3.00] mapped 73684 sequences
[M::main] Version: 2.12-r827
[M::main] CMD: minimap2 -a -x map-ont HG38_chr19.fasta totaal_beide.fastq
[M::main] Real time: 4870.038 sec; CPU: 14606.906 sec
[bam_sort_core] merging from 19 files and 1 in-memory blocks...
algemeen@DESKTOP-FF5H9DC:/mnt/d/Analyse/Methylation_7609$ samtools index output.sorted.bam
algemeen@DESKTOP-FF5H9DC:/mnt/d/Analyse/Methylation_7609$ nanopolish/nanopolish call-methylation -t 16 -r totaal_beide.fastq -b output.sorted.bam -g HG38_chr19.fasta -w "chr19:35600-203300" > call_methylation.tsv
[bam process] iterating over region: chr19:35600-203300

And then it stops after the last line, which ends within seconds.

@jts
Copy link
Owner

jts commented Aug 26, 2020

If it terminates right after that line its probably running out of memory. Try to monitor the memory usage in another terminal window using top and see if the memory usage approaches how much RAM you have on your desktop.

@JesseBNL
Copy link
Author

Ok, CPU % jumps to 99%, so it might be the issue. I do actually have a powerful computer here, so how can I solve this?

@jts
Copy link
Owner

jts commented Aug 26, 2020

CPU usage of 99% is normal, I don't think that is the issue

@JesseBNL
Copy link
Author

Any other idea what might stop the methylation calling?

@jts
Copy link
Owner

jts commented Aug 27, 2020

Without some diagnostic information its pretty hard to debug. Can you try running this command, and paste all messages printed to the terminal result?

/usr/bin/time -v nanopolish/nanopolish call-methylation -t 16 -r totaal_beide.fastq -b output.sorted.bam -g HG38_chr19.fasta -w "chr19:35600-203300"

@jts
Copy link
Owner

jts commented Aug 27, 2020

I just noticed that the region (chr19:35600-203300) is very close to the end of the chromosome so may be repetitive. Do you have any reads with MAPQ >20 there?

@JesseBNL
Copy link
Author

Hi, the mapping quality is >20, but yes, the region we are studying is highly repetitive. Might this be a problem?

I will try the other command and post the terminal result!

@JesseBNL
Copy link
Author

This is the terminal output:

algemeen@DESKTOP-FF5H9DC:/mnt/d/Analyse/Methylation_7609$ /usr/bin/time -v nanopolish/nanopolish call-methylation -t 16 -r totaal_beide.fastq -b output.sorted.bam -g HG38_chr19.fasta -w "chr19:35600-203300"
chromosome strand start end read_name log_lik_ratio log_lik_methylated log_lik_unmethylated num_calling_strands num_motifs sequence
[bam process] iterating over region: chr19:35600-203300
Command being timed: "nanopolish/nanopolish call-methylation -t 16 -r totaal_beide.fastq -b output.sorted.bam -g HG38_chr19.fasta -w chr19:35600-203300"
User time (seconds): 2.98
System time (seconds): 2.25
Percent of CPU this job got: 105%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.95
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 413232
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 122489
Voluntary context switches: 0
Involuntary context switches: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

@jts
Copy link
Owner

jts commented Aug 27, 2020

Do you have any other regions to try? I believe the most likely explanation is that nanopolish couldn't use any of your data

@JesseBNL
Copy link
Author

JesseBNL commented Aug 27, 2020

Hmm, no this is all we captured. And the alignment looks fine, mapping the reference genome where it should.

It is also a CpG rich region, so we do expect a lot of methylation calls..

@jts
Copy link
Owner

jts commented Aug 27, 2020 via email

@JesseBNL
Copy link
Author

Yes, how can I send this file to you?

@jts
Copy link
Owner

jts commented Aug 27, 2020 via email

@JesseBNL
Copy link
Author

Hi,

you can download it here:
https://www.dropbox.com/s/lhy2netlky82fvb/output.sorted.bam.gz?dl=0

@jts
Copy link
Owner

jts commented Aug 28, 2020

I tried to look at the region of interest with samtools view output.sorted.bam chr19:35600-203300 but got this error:

[main_samview] region "chr19:35600-203300" specifies an unknown reference name. Continue anyway.

Can you let me know which region I should look at?

@JesseBNL
Copy link
Author

Hi, I used this reference, its HG38, but the target region extracted.
HG38_chr19.fasta.gz

@jts
Copy link
Owner

jts commented Sep 1, 2020

By extracting the region you've changed the coordinate system so the region specified in the nanopolish command (chr19:35600-203300) is no longer valid. I recommend using the full reference genome for both mapping the reads and running call-methylation. This will avoid the need to perform coordinate transformations between commands.

@JesseBNL
Copy link
Author

JesseBNL commented Sep 1, 2020

I changed the coordinates to the extracted reference. But I will give it another try with the full reference and update you. Thanks.

@JesseBNL
Copy link
Author

JesseBNL commented Oct 1, 2020

Sorry for my late reply.

I have managed to get the reads mapped in the region in which I expect them to match. But I get back that all mapped reads in the target region have bad fast5 files:
[post-run summary] total reads: 2572, unparseable: 0, qc fail: 0, could not calibrate: 0, no alignment: 0, bad fast5: 2569

Which leaves the methylation_calls.tsv empty. What might this be? The indexing seems to be successful and the fastq files were only modified by merging with cat command.

@jts
Copy link
Owner

jts commented Oct 1, 2020

Hi,

I have added a small program called nanopolish fast5-check to help debug these issues. Can you please grab the fast5check branch from github (https://github.com/jts/nanopolish/tree/fast5check), compile it and run:

nanopolish fast5-check -r reads.fastq

This will hopefully help us understand what the problem is.

Jared

@JesseBNL
Copy link
Author

JesseBNL commented Oct 5, 2020 via email

@jts
Copy link
Owner

jts commented Oct 5, 2020

Hi,

I've just merged it into the master branch, so if you do git pull origin master in your nanopolish directory then make you should have it.

Jared

@JesseBNL
Copy link
Author

JesseBNL commented Oct 5, 2020 via email

@jts
Copy link
Owner

jts commented Oct 5, 2020

Let it go for awhile and see if any "ERROR" messages come up.

Jared

@JesseBNL
Copy link
Author

JesseBNL commented Oct 6, 2020

It finished without errors

@jts
Copy link
Owner

jts commented Oct 6, 2020 via email

@jts
Copy link
Owner

jts commented Oct 6, 2020

Can you try the following, from your nanopolish directory:

git fetch origin
git checkout origin/issue_826
make

Then run the nanopolish call-methylation command, as before. It will print out a series of messages indicating why the fast5s can't be loaded.

@JesseBNL
Copy link
Author

JesseBNL commented Oct 7, 2020

Ok, this got something out. I used the command:

nanopolish/nanopolish call-methylation -t 16 -r Contig_geneious.fasta -b Contig_geneious.bam -g chr19.fasta -w "chr19:55200000-55400000" > methylation_calls4.tsv

And returned:

[bam process] iterating over region: chr19:55200000-55400000
[warning] fast5 skipped because path is empty (58cc1d43-b426-4369-b40e-a977243ecd76_runid=62bbd1c993a6abf63df49dc0d1838af537144d43_sampleid=no_sample_read=19111_ch=486_start_time=2020-09-07T14:56:49Z)
[warning] fast5 skipped because path is empty (4e4af99a-2fa3-42c2-ad77-4696b0aba05e_runid=62bbd1c993a6abf63df49dc0d1838af537144d43_sampleid=no_sample_read=89998_ch=152_start_time=2020-09-08T01:50:16Z)
[warning] fast5 skipped because path is empty (800ce81e-2f2d-4517-b2ce-cd352ed5b5b8_runid=62bbd1c993a6abf63df49dc0d1838af537144d43_sampleid=no_sample_read=29779_ch=194_start_time=2020-09-07T17:42:50Z)
[warning] fast5 skipped because path is empty (4e4af99a-2fa3-42c2-ad77-4696b0aba05e_runid=62bbd1c993a6abf63df49dc0d1838af537144d43_sampleid=no_sample_read=89998_ch=152_start_time=2020-09-08T01:50:16Z)
[warning] fast5 skipped because path is empty (58cc1d43-b426-4369-b40e-a977243ecd76_runid=62bbd1c993a6abf63df49dc0d1838af537144d43_sampleid=no_sample_read=19111_ch=486_start_time=2020-09-07T14:56:49Z)

And so on.. So it seems it cannot define the path? How might this be fixed?

@jts
Copy link
Owner

jts commented Oct 7, 2020

Hi,

It looks like you have joined the read_id (58cc1d43-b426-4369-b40e-a977243ecd76) with the metadata in the fastq header (runid=62bbd1c99...). Nanopolish just expects the read ID, and not the metadata so it cannot find your reads in the readdb index. If you fix your fastq file it should work.

Jared

@JesseBNL
Copy link
Author

Hi Jared,

Indeed this was a problem. But now I have the same problem as at the start.. The "iterating over region" ends suddenly and the tsv file ends up empty. I have too a little sample with only a few mapping reads, to speed up the testing. Might I send those files to you so you can give it a try?

@zoucheng123
Copy link

Hi,
The same issue came up in my analysis. Have you resolved it yet?
Thanks!

Commands:
nanopolish call-methylation -b minimap2_sorted.bam -r pass_fmlrc.fa -q cpg -t 2 -g Homo_sapiens.GRCh38.dna.primary_assembly.fa -w "2:5,000,000-8,000,000" >test.txt

Terminal output:
[bam process] iterating over region: 2:5,000,000-8,000,000
[post-run summary] total reads: 88, unparseable: 0, qc fail: 2, could not calibrate: 7, no alignment: 2, bad fast5: 0

@YingziZhang-github
Copy link

YingziZhang-github commented Feb 19, 2022

Hi,
I can only call a little methylation. The no alignment of mine is very high and I (only) got 3 callings. Reports here:

[post-run summary] total reads: 32531, unparseable: 0, qc fail: 0, could not calibrate: 37, no alignment: 32492, bad fast5: 389

chromosome      strand  start   end     read_name       log_lik_ratio   log_lik_methylated      log_lik_unmethylated    num_calling_strands     num_motifs      sequence
chr12   -       121364758       121364758       049943a7-fae5-459b-86b9-1a3cf0f93177    1.65    -139.72 -141.36 1       1       GAAGACGAAGA
chr12   -       121364794       121364794       049943a7-fae5-459b-86b9-1a3cf0f93177    0.81    -330.80 -331.62 1       1       GAAGACGAAGA
chrX    -       150222688       150222688       7993b266-4def-4938-8826-62a678911919    0.44    -168.95 -169.39 1       1       ACCTACGGTGT

I have been fixing this problem for several days trying debugging by indexing different paths or downloading reference genomes but they didn't help. Do you know how to fix the high no alignment problem? Thank you very much.

@jts
Copy link
Owner

jts commented Feb 19, 2022 via email

@YingziZhang-github
Copy link

YingziZhang-github commented Feb 19, 2022

Hi, Can you describe the data? Is it R9.4?

On Feb 19, 2022, at 9:16 AM, Yingzi Zhang 张樱子 @.***> wrote:  Hi, I cannot call the methylation as well. The no alignment of mine is very high. Reports here: [post-run summary] total reads: 229, unparseable: 0, qc fail: 0, could not calibrate: 0, no alignment: 229, bad fast5: 0 [post-run summary] total reads: 353, unparseable: 0, qc fail: 0, could not calibrate: 0, no alignment: 353, bad fast5: 3 I have been fixing this problem for several days trying debugging by indexing different paths or downloading reference genomes but they didn't help. Do you know how to fix this problem? Thank you very much. — Reply to this email directly, view it on GitHub, or unsubscribe. Triage notifications on the go with GitHub Mobile for iOS or Android. You are receiving this because you commented.

Hi,
Thank you for the reply. It is R10.3.
The DNA sample was from humans.
Here is my analysis process:

guppy_basecaller -c dna_r10.3_450bps_sup.cfg –cpu_threads_per_caller 20 –num_callers 2 -r -i <fastq5 files> -s basecalled/
cat ./basecalled/pass/*.fastq > albacore_output.fastq
nanopolish index -d <fastq5 files> albacore_output.fastq
minimap2 -t 20 -a -x map-ont -Y --MD <genome.fa> albacore_output.fastq | samtools sort -T tmp -o output.sorted.bam
samtools index output.sorted.bam
samtools flagstat output.sorted.bam    #42639 + 0 mapped (99.41% : N/A)
nanopolish call-methylation -t 8 -r albacore_output.fastq -b output.sorted.bam -g <genome.fa> > whole.methylation_calls.tsv

The report and the results were as shown in my last message.

[post-run summary] total reads: 32531, unparseable: 0, qc fail: 0, could not calibrate: 37, no alignment: 32492, bad fast5: 389

chromosome      strand  start   end     read_name       log_lik_ratio   log_lik_methylated      log_lik_unmethylated    num_calling_strands     num_motifs      sequence
chr12   -       121364758       121364758       049943a7-fae5-459b-86b9-1a3cf0f93177    1.65    -139.72 -141.36 1       1       GAAGACGAAGA
chr12   -       121364794       121364794       049943a7-fae5-459b-86b9-1a3cf0f93177    0.81    -330.80 -331.62 1       1       GAAGACGAAGA
chrX    -       150222688       150222688       7993b266-4def-4938-8826-62a678911919    0.44    -168.95 -169.39 1       1       ACCTACGGTGT

Many thanks.

@YingziZhang-github
Copy link

Hi, Can you describe the data? Is it R9.4?

On Feb 19, 2022, at 9:16 AM, Yingzi Zhang 张樱子 @.***> wrote:  Hi, I cannot call the methylation as well. The no alignment of mine is very high. Reports here: [post-run summary] total reads: 229, unparseable: 0, qc fail: 0, could not calibrate: 0, no alignment: 229, bad fast5: 0 [post-run summary] total reads: 353, unparseable: 0, qc fail: 0, could not calibrate: 0, no alignment: 353, bad fast5: 3 I have been fixing this problem for several days trying debugging by indexing different paths or downloading reference genomes but they didn't help. Do you know how to fix this problem? Thank you very much. — Reply to this email directly, view it on GitHub, or unsubscribe. Triage notifications on the go with GitHub Mobile for iOS or Android. You are receiving this because you commented.

Hi, Thank you for the reply. It is R10.3. The DNA sample was from humans. Here is my analysis process:

guppy_basecaller -c dna_r10.3_450bps_sup.cfg –cpu_threads_per_caller 20 –num_callers 2 -r -i <fastq5 files> -s basecalled/
cat ./basecalled/pass/*.fastq > albacore_output.fastq
nanopolish index -d <fastq5 files> albacore_output.fastq
minimap2 -t 20 -a -x map-ont -Y --MD <genome.fa> albacore_output.fastq | samtools sort -T tmp -o output.sorted.bam
samtools index output.sorted.bam
samtools flagstat output.sorted.bam    #42639 + 0 mapped (99.41% : N/A)
nanopolish call-methylation -t 8 -r albacore_output.fastq -b output.sorted.bam -g <genome.fa> > whole.methylation_calls.tsv

The report and the results were as shown in my last message.

[post-run summary] total reads: 32531, unparseable: 0, qc fail: 0, could not calibrate: 37, no alignment: 32492, bad fast5: 389

chromosome      strand  start   end     read_name       log_lik_ratio   log_lik_methylated      log_lik_unmethylated    num_calling_strands     num_motifs      sequence
chr12   -       121364758       121364758       049943a7-fae5-459b-86b9-1a3cf0f93177    1.65    -139.72 -141.36 1       1       GAAGACGAAGA
chr12   -       121364794       121364794       049943a7-fae5-459b-86b9-1a3cf0f93177    0.81    -330.80 -331.62 1       1       GAAGACGAAGA
chrX    -       150222688       150222688       7993b266-4def-4938-8826-62a678911919    0.44    -168.95 -169.39 1       1       ACCTACGGTGT

Many thanks.

Hi there, my problem still exists. Now I wonder if it is because my sequencing data were generated by R10.3. Could this be the case? Would appreciate it very much if you can reply to my problem. Thank you!

Best,

@jts
Copy link
Owner

jts commented Feb 25, 2022

Hi,

Sorry for the slow reply. Yes it is because it is R10.3. There is experimental support for R10 data in the r10 branch but I make no claims about its performance or the accuracy of the calls.

Jared

@jts jts closed this as completed Jun 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants