Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing translocations when running NAIBR with candidate lists #10

Closed
Budheimer opened this issue Nov 27, 2023 · 3 comments
Closed

Missing translocations when running NAIBR with candidate lists #10

Budheimer opened this issue Nov 27, 2023 · 3 comments

Comments

@Budheimer
Copy link

Budheimer commented Nov 27, 2023

Hi,

first of all, thank you for continuing the work on NAIBR and keeping it up to date! That's great and really appreciated!
I think NAIBR is a well though-out and promising approach, and one of the few published SV callers for 10x linked-read data, that produces reasonable results on my samples (without errors!) and is still updated. One thing I really like is that it works with candidate breakpoint lists. But I think there might be a bug when calling translocations from these candidate lists:

(1) For example, when calling the translocation in your test bam file "HCC1954T_10xGenomics_chr_12_20_TRA.bam" without candidate list, the translocation is called perfectly fine (default parameters):

lmax = 598, lmin = 137, reads_included = 93.0%
========= NAIBR =========
version: 0.5.1
-------- CONFIGS --------
FILES
  bam_file = HCC1954T_10xGenomics_chr_12_20_TRA.bam
  candidates = None
  blacklist = None
  outdir = example
  prefix = NAIBR_SVs
PARAMETERS
  d =            10,000
  min_mapq =         40
  k =                 3
  min_sv =        1,000
  sd_mult =           2
  min_len =       1,196
  max_len =     200,000
  min_reads =         2
  min_discs =         2
  threads =           1
  DEBUG =          True
-------------------------
2023-11-27 13:07:21,227 - INFO: Found 2 chromosomes with data in BAM
2023-11-27 13:07:21,227 - INFO: Getting candidates for chromosome chr12
Processing chr12: DONE! 19,238 pairs (total time = 0.903 s)!
2023-11-27 13:07:22,179 - INFO: Done reading chromosome chr12: coverage = 56.898X
2023-11-27 13:07:22,180 - INFO: chr12: total pairs: 18,401, discordant: 367 (1.99%), concordant: 17,198 (93.46%)
2023-11-27 13:07:22,187 - INFO: For chromsome chr12 - found 184 positions with discordant reads.
Parsing discs: DONE! 184 pos (total time = 0.000 s)!
2023-11-27 13:07:22,370 - INFO: Got candidate noval adjacencies from data: 0.1827 s
2023-11-27 13:07:22,370 - INFO: Ranking 2 candidates from chr12
2023-11-27 13:07:22,607 - INFO: Getting candidates for chromosome chr20
Processing chr20: DONE! 26,925 pairs (total time = 0.920 s)!
2023-11-27 13:07:23,681 - INFO: Done reading chromosome chr20: coverage = 80.098X
2023-11-27 13:07:23,681 - INFO: chr20: total pairs: 25,734, discordant: 212 (0.82%), concordant: 24,335 (94.56%)
2023-11-27 13:07:23,691 - INFO: For chromsome chr20 - found 292 positions with discordant reads.
Parsing discs: DONE! 292 pos (total time = 0.000 s)!
2023-11-27 13:07:23,902 - INFO: Got candidate noval adjacencies from data: 0.2113 s
2023-11-27 13:07:23,902 - INFO: Ranking 2 candidates from chr20
Parsing discs: DONE! 366 pos (total time = 0.001 s)!
2023-11-27 13:07:25,334 - INFO: ranking 2 interchromosomal candidates
2023-11-27 13:07:26,901 - INFO: Found 1 interchromosomal SVs of which 1 (100.0%) passed the threshold
2023-11-27 13:07:26,913 - INFO: Writing results to example/NAIBR_SVs.bedpe
2023-11-27 13:07:26,923 - INFO: Writing results to example/NAIBR_SVs.reformat.bedpe
2023-11-27 13:07:26,931 - INFO: Writing results to example/NAIBR_SVs.vcf
2023-11-27 13:07:27,002 - INFO: Finished in 0.0966313083966573 minutes

Output in NAIBR_SVs.reformat.bedpe:

#chrom1	start1	stop1	chrom2	start2	stop2	name	qual	strand1	strand2	filters	info
chr12	40393618	40393619	chr20	53974756	53974757	NAIBR_00001	9385.737	-	-	PASS	LENGTH=0;NUM_SPLIT_MOLECULES=427;NUM_DISCORDANT_READS=61;ORIENTATION=--;HAPLOTYPE=1,2;SVTYPE=TRA

But when I put the previously detected breakpoints into a candidate bedpe file like this:

chr12 40393618 40393619 chr20 53974756 53974757 --

it's not detected anymore:

lmax = 598, lmin = 137, reads_included = 93.0%
========= NAIBR =========
version: 0.5.1
-------- CONFIGS --------
FILES
  bam_file = HCC1954T_10xGenomics_chr_12_20_TRA.bam
  candidates = example_candidates.bedpe
  blacklist = None
  outdir = example
  prefix = NAIBR_SVs
PARAMETERS
  d =            10,000
  min_mapq =         40
  k =                 3
  min_sv =        1,000
  sd_mult =           2
  min_len =       1,196
  max_len =     200,000
  min_reads =         2
  min_discs =         2
  threads =           1
  DEBUG =          True
-------------------------
2023-11-27 13:09:59,782 - INFO: Using user defined candidates
2023-11-27 13:09:59,782 - INFO: Found 1 candidates in file of which 1 are long enough
Readning chr12:40203618-40583618: DONE! 19,238 pairs (total time = 0.684 s)!
Readning chr20:53784756-54164756: DONE! 26,925 pairs (total time = 0.857 s)!
2023-11-27 13:10:01,407 - DEBUG: Candidate ('chr12', 40393618, 'chr20', 53974756, '--'): coverage = 16.272780263157895
2023-11-27 13:10:01,418 - INFO: For TRA chr12:40393618-chr20:53974756 - found 476 discordant positions
Parsing discs: DONE! 476 pos (total time = 0.001 s)!
2023-11-27 13:10:01,855 - INFO: Evaluation yielded 0 viable novel adjacencies
2023-11-27 13:10:01,856 - INFO: Writing results to example/NAIBR_SVs.bedpe
2023-11-27 13:10:01,864 - INFO: Writing results to example/NAIBR_SVs.reformat.bedpe
2023-11-27 13:10:01,871 - INFO: Writing results to example/NAIBR_SVs.vcf
2023-11-27 13:10:01,944 - INFO: Finished in 0.036073052883148195 minutes

And that's the same with my 10x linked-read data. I have candidates from another caller, and deletions, inversions and tandem duplications are called, but never any translocation (and I know that there are some translocations in my samples). Strangely, that part worked with the old NAIBR from the Raphael-group git (but it produces other errors on some of my samples....).

And, if you don't mind, I have another question, where I'm not sure whether there's something wrong, or if I don't understand it correctly:

(2) Very often in the NAIBR output, I see SVs with a lot of split-molecules (high-score) but without any discordant read pairs. But I am sure from generating my candidates, that there are often plenty discordant reads around the breakpoint. This is not really a big problem for me, the SV call and orientation usually seems right, and I get that there are surely differences in calling the discordant read pairs. But I am asking myself: How does NAIBR infer the SVs orientation, if it didn't find any discordant reads? The orientation is often right, so maybe that's just a bug in the output?

I experienced both these issues with NAIBR 0.5 and 0.5.1, using default parameters in the config file and candidate lists from our in-house SV caller (which uses only discordant read pairs and split-reads).
I hope I could make myself clear enough. I'd really like to use your current version of NAIBR on my data and translocations are an important part in my analysis. Let me know, if you need any more info or small examples from my data for debugging.

Thanks for your work and help!
Best regards,
Christoph

@pontushojer
Copy link
Owner

Hi Christoph!

Thank you for the encouraging words! It is great to hear that someone else is benefiting from this :)

With (1) it seems it was indeed a bug, or more a misstake on my part as I had not considered calling translocation from candidates in my rewrite. I added a fix which should sort this out, see 778a702. Please check it out and see if it works for you too!

As for (2) this is also something that has been bugging me for a while. Discordant reads are definitely present as they are used to generate a candidate SV which is then scored. However it seems this discordant reads is not counted for the final call for some reason. I have to investigate this further...

@Budheimer
Copy link
Author

Budheimer commented Nov 28, 2023

Thank you very much for the quick reply and the immediate fix! I ran the updated version on my samples and it works! The translocations now appear in the output as expected. Great!

The missing translocations were my main worry. Concerning (2): That's good to know. If the discordant reads are used for generating and scoring the candidate, and it's pretty much only a problem with the final output file, it's fine by me for now. If I need the number of discordant pairs, I can just as well use the output from generating my candidate lists.

Thanks again, I really appreciate your help!

@pontushojer
Copy link
Owner

Great to hear that you got the expected translocation!

I will make a separate issue with (2) so it can be worked on in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants