Confusion regarding ntCDR3 / set_CDR3_anchors #7

Open
jeremycfd opened this issue Apr 28, 2018 · 4 comments
Labels
enhancement in progress Feature design or bug fix in progress, should be available on dev soon

Comments

@jeremycfd

Hi @qmarcou,

I'm a bit confused about how to use the --ntCDR3 option in a standard analysis, and I've had trouble inferring it from the code. For instance, if I have a set of CDR3 sequences that are also annotated with V and J information, does --ntCDR3 allow alignment and downstream analysis (in particular, Pgen calculation) while keeping the known V/J annotations? Any chance a brief tutorial on this option could be included in the demo?

Thanks for your help!

@qmarcou
Owner

qmarcou commented Apr 30, 2018

Hi @jeremycfd !
For now this is not possible: the --ntCDR3 option only accepts CDR3 nucleotide sequences, without knowledge of the associated V/J.
I plan to add support for this in the very near future, but I did not want to postpone the v1.2.0 release. I'll probably add a small patch for this in the next few days; I just need to figure out how to include it in the pipeline in a clean way.

@jeremycfd
Author

Ah, hrm... I'm still a bit confused about the current implementation of --ntCDR3. I did test putting in just CDR3 sequences, but I had to decrease "thresh" for it to work. Should I be flagging --ntCDR3 somewhere if I am only using CDR3 sequences?

Thanks!

@qmarcou
Owner

qmarcou commented Apr 30, 2018

You mean the alignment threshold? It makes perfect sense to lower the alignment score threshold, since far fewer genomic nucleotides are observable in a CDR3 than in a full read sequence. This is something I did not think about before posting the release, thanks for pointing it out!
In theory you should only have to flag --ntCDR3 at the alignment stage, as this option simply uses the genomic anchor indices as alignment offsets. The inference/evaluation should be blind to the type of sequences you use, as long as alignments have been provided.
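
To make the anchor-as-offset idea concrete, here is a minimal Python sketch. It is not IGoR's code: the function names, the toy gene lengths and the anchor indices are all hypothetical. It assumes that the CDR3 runs from the first nucleotide of the conserved Cys codon to the last nucleotide of the conserved Phe/Trp codon, that an anchor index is the 0-based position of the first nucleotide of that codon in the genomic template, and that an offset is the read position at which genomic index 0 falls (so it can be negative).

```python
def v_offset(v_anchor_index: int) -> int:
    """Offset of a V template on a CDR3 read (assumed convention).

    The read starts at the V anchor, so genomic index 0 lies
    v_anchor_index nucleotides before the start of the read.
    """
    return -v_anchor_index


def j_offset(cdr3_length: int, j_anchor_index: int) -> int:
    """Offset of a J template on a CDR3 read (assumed convention).

    The J anchor codon occupies the last 3 nucleotides of the read,
    i.e. it starts at read position cdr3_length - 3.
    """
    return cdr3_length - 3 - j_anchor_index


def overlap(offset: int, gene_length: int, read_length: int) -> int:
    """Number of read positions covered by the genomic template."""
    start = max(0, offset)
    end = min(read_length, offset + gene_length)
    return max(0, end - start)


# Toy numbers, made up for illustration only.
cdr3_length = 42
v_len, v_anchor = 290, 279
j_len, j_anchor = 50, 20

v_off = v_offset(v_anchor)               # -279
j_off = j_offset(cdr3_length, j_anchor)  # 19

print(overlap(v_off, v_len, cdr3_length))  # 11
print(overlap(j_off, j_len, cdr3_length))  # 23
```

With these toy numbers, at most 11 V nucleotides and 23 J nucleotides can be observed in the read, which illustrates why a score threshold tuned for full reads would reject every alignment of a CDR3-only sequence.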

@qmarcou qmarcou added the in progress Feature design or bug fix in progress, should be available on dev soon label Jun 1, 2018
@qmarcou
Owner

qmarcou commented Aug 4, 2018

Following up on this: I've automated the decrease of the V and J alignment thresholds when the --ntCDR3 option is used. They are now set to 0, since the alignment offsets are known and the coverage of V and J within a CDR3 sequence is short.
As for restricting the V/J usage, I have started looking at a clean solution, but it turned out to be complicated to implement. I'll probably start by implementing a dirtier solution (at the expense of memory usage, though...).

Repository owner deleted a comment from decenwang Apr 9, 2019