Disregard the wobble base #1
Comments
Dear Solaiman, Thanks for your feedback! There are two possible workarounds:
Remember that prior to any of those operations you should convert your file to VDJtools format using
P.S. Issues should be reported to https://github.com/mikessh/vdjtools/issues, this is just a doc repository.
Best,
Dear Mikhail, thank you for your quick response! The metadata file trick did it for me. I have another quick question regarding OverlapPair. Firstly, is there a specific reason you exclude the non-overlapping pairs? Secondly, is there an option I'm missing to include non-overlapping pairs? Thank you!
Dear Solaiman, it's not very clear to me what you mean by "including non-overlapping clonotypes". If it is for the R score computation, then including a set of, say, 1000 points that have zero frequency in one sample and non-zero frequency in the other can bias the estimate. People frequently compute correlations in bioinformatics on variables for which the coefficient is not robust (https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient#Robustness), but in this case it can be wise to judge missing clonotypes separately, based on the frequency and number of non-overlapping clonotypes (see `freq12`, etc. in http://vdjtools-doc.readthedocs.io/en/latest/overlap.html#id4).
Note that the density of non-overlapping clonotypes is still shown in the side bars (marginal distributions) of the first plot in http://vdjtools-doc.readthedocs.io/en/latest/overlap.html#graphical-output. When you track clonotypes from the same individual (e.g. in HSCT), it is of course better to use plots like the second plot in http://vdjtools-doc.readthedocs.io/en/latest/overlap.html#graphical-output.
From a statistical point of view, to tie together detected and non-detected clonotypes, as well as clonotype frequency, one should have a good idea of whether a change in clonotype count is due to sampling (e.g. you sampled 1000 cells the first time and 1010 cells the second time) or to clonal expansion (you observe 1000 cells the first time and 2000 the second). However, there is currently no good model to tell clonal expansions from sampling noise.
As for your last question - you can use
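The bias mentioned above can be seen in a toy example. The sketch below is only an illustration with made-up frequencies, not the way VDJtools computes its R score: it compares a plain Pearson coefficient on shared clonotypes with the same coefficient after appending clonotypes that have zero frequency in one of the two samples. The names `pearson`, `shared_a`, `shared_b` and `private` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up frequencies of clonotypes found in both samples (well correlated).
shared_a = [1.0, 2.0, 3.0, 4.0, 5.0]
shared_b = [1.1, 1.9, 3.2, 3.9, 5.1]

# Made-up frequencies of clonotypes seen in only one sample;
# the other sample contributes a zero for each of them.
private = [1.0, 2.0, 3.0, 4.0, 5.0]
zeros = [0.0] * len(private)

all_a = shared_a + private + zeros
all_b = shared_b + zeros + private

r_shared = pearson(shared_a, shared_b)
r_all = pearson(all_a, all_b)

print("R on shared clonotypes only:", round(r_shared, 3))
print("R with non-overlapping clonotypes added:", round(r_all, 3))
```

In this toy data the zero-frequency points pull the coefficient from close to 1 down to below 0, even though the shared clonotypes agree almost perfectly.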
Hey, thank you for the explanation! To my second question: The JoinSamples
Solai
Hey,
thank you for this amazing tool. I have included it in our pipeline and it works perfectly. I just have one issue I'd like to bring up.
Sometimes it is not the nucleotide (NT) sequence that we are interested in but the amino acid (AA) sequence. When I parse my IMGT files with VDJtools, my output file lists all my sequences in order of highest expansion. However, due to the wobble base, several NT sequences sometimes code for the same AA sequence. When we then look at the top 20 AA sequences, we 1. don't have the actual count, as there might be several other NT sequences coding for the same AA sequence as well, and 2. might be missing AA sequences that would have made it into the top 20 if the counts of all their NT sequences had been summed up.
I have written my own script to deal with this, but it takes a long time to process 500k lines. I wonder if there is an option to disregard the NT sequence, or if it is possible to include something like that in future versions?
Best wishes,
Solaiman
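For readers with the same problem, the summing described above can be sketched in a few lines of Python. This is a minimal standalone sketch, not a VDJtools feature: it assumes a tab-separated table whose header contains `count` and `cdr3aa` columns, and the function name `collapse_by_aa` and the three example rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def collapse_by_aa(rows, key_fields=("cdr3aa",)):
    """Sum the counts of all rows that share the same key (AA sequence by default)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        totals[key] += int(row["count"])  # assumes integer read counts

    grand_total = sum(totals.values())
    collapsed = []
    for key, count in totals.items():
        record = dict(zip(key_fields, key))
        record["count"] = count
        # Frequency is recomputed from the summed counts.
        record["freq"] = count / grand_total if grand_total else 0.0
        collapsed.append(record)

    # Most expanded AA sequence first.
    collapsed.sort(key=lambda r: r["count"], reverse=True)
    return collapsed


# Made-up example: two NT sequences code for CAS, one for CAW.
EXAMPLE = (
    "count\tcdr3nt\tcdr3aa\n"
    "6\tTGTGCCTGG\tCAW\n"
    "5\tTGTGCCAGC\tCAS\n"
    "4\tTGCGCTAGT\tCAS\n"
)

rows = csv.DictReader(io.StringIO(EXAMPLE), delimiter="\t")
for record in collapse_by_aa(rows):
    print(record["cdr3aa"], record["count"], round(record["freq"], 3))
```

In the example, CAW is the largest single NT clonotype, but CAS ranks first once its two NT variants are summed. The table is read once and the totals are kept in a dictionary, so the work grows linearly with the number of lines. Passing `key_fields=("cdr3aa", "v", "j")` would keep clonotypes with different V and J segments apart, assuming those columns exist in the input.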