CDR3 anchors information and other models/ref_genome information #29
Hi Zach,

About the gene anchor indices: they are obtained via the IMGT gapped alignments (the sequences are aligned with respect to a conserved cysteine/tryptophan/phenylalanine). If I remember correctly, the index is given in the FASTA header for each sequence (or an index that allows you to compute it). There is currently nothing in IGoR to extract it automatically, but it is a rather straightforward script to code.

About the BCR genomic templates: in general I have tried to include text files along with the models giving the reference each model was taken from. Such a text file should sit in the same folder as the model.
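As a rough illustration of the J side, the sketch below locates the conserved Phe/Trp codon in a J gene by scanning one reading frame for the F/W-G-x-G motif. This is a motif heuristic swapped in for illustration, not IGoR's or IMGT's own procedure. It assumes the anchor is the first nucleotide of the Phe/Trp codon and that the reading frame is known from elsewhere (for example the frame information IMGT provides). The function name `j_anchor_from_motif` is hypothetical.

```python
import re

# Heuristic (not IGoR's method): Phe/Trp-Gly-X-Gly at the nucleotide level.
# TTT/TTC (Phe) or TGG (Trp), then GGN (Gly), any codon, then GGN (Gly).
J_MOTIF = re.compile(r"(?:TT[TC]|TGG)GG[ACGT][ACGT]{3}GG[ACGT]")


def j_anchor_from_motif(seq: str, frame: int = 0) -> int:
    """Return the 0-based index of the first nucleotide of the conserved
    Phe/Trp codon, checking only codons that start in the given frame.

    Hypothetical helper: the frame must be supplied by the caller.
    """
    seq = seq.upper()
    # The motif is 12 nt long, so the last valid start is len(seq) - 12.
    for start in range(frame, len(seq) - 11, 3):
        if J_MOTIF.match(seq, start):
            return start
    raise ValueError("no in-frame F/W-G-x-G motif found")


# Toy J-like sequence: codons AAC ACT TTC GGA CAA GGC ACC -> N T F G Q G T
print(j_anchor_from_motif("AACACTTTCGGACAAGGCACC"))  # 6
```

Restricting the search to one frame avoids picking up out-of-frame matches, which is why the frame is a required piece of outside information rather than something guessed.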
I'm having the same problem with TCRs. Since the V gene list shipped with IGoR does not include some of the genes in my data, I was hoping to use the latest TRBV FASTA from IMGT; however, I did not find the anchor position in the header. Could you please give more detailed instructions on obtaining the anchors? Thank you very much.
Hi, you can get the anchors by taking the number 309 and subtracting the number of gaps from it. See here for a reference of the anchor indices. For instance, take TRBV1*01: the number of gaps is 42, so 309 - 42 = 267. Another example: Hope that helps.
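The 309-minus-gaps arithmetic above can be written as a small function. This is a minimal sketch assuming that IMGT-gapped V sequences use `.` as the gap character and that the conserved Cys codon starts at 0-based position 309 of the gapped sequence (consistent with the 309 figure in the reply). It counts only the gaps upstream of that position; in IMGT gapping these normally account for all gaps in the V region, which is why the total gap count works in the TRBV1*01 example. The names `IMGT_GAPPED_CYS_START` and `v_anchor_from_gapped` are hypothetical.

```python
# Assumption: 0-based start of the conserved Cys codon in IMGT-gapped V sequences.
IMGT_GAPPED_CYS_START = 309


def v_anchor_from_gapped(gapped_seq: str, gap_char: str = ".") -> int:
    """Convert the gapped Cys position into an index in the ungapped sequence."""
    upstream = gapped_seq[:IMGT_GAPPED_CYS_START]
    if len(upstream) < IMGT_GAPPED_CYS_START:
        raise ValueError("gapped sequence is shorter than 309 positions")
    # Each gap before the Cys shifts its ungapped position left by one.
    return IMGT_GAPPED_CYS_START - upstream.count(gap_char)


# Toy sequence: 42 gaps and 267 bases upstream, then a TGT (Cys) codon.
toy = "." * 42 + "A" * 267 + "TGT" + "GCC"
anchor = v_anchor_from_gapped(toy)
ungapped = toy.replace(".", "")
print(anchor, ungapped[anchor:anchor + 3])  # 267 TGT
```

Checking that the ungapped sequence has a Cys codon (TGT/TGC) at the computed index is a cheap sanity check when running this over a whole IMGT download.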
Thanks for the reply. One last question: I noticed that the default IGoR V/J anchor files have a different format from the SONIA ones; the latter have only 3 fields, while the IGoR ones contain additional information such as species. Can they be used interchangeably?
You can't use them interchangeably. I'm not sure what's going on in the T cell anchor CSVs, but in the B cell one there are two fields separated by a semicolon. It needs to be in this format; it will not work otherwise. I can't speak to using IGoR for T cells since I've only used it for B cells, but I assume the same format is required there as well.
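A minimal sketch of writing and reading an anchor file with the two semicolon-separated fields described above, using Python's standard `csv` module. The header line `gene;anchor_index` is an assumption based on IGoR's bundled anchor files; compare it against the files shipped with your model before relying on it. The helper names are hypothetical.

```python
import csv
import io


def write_anchor_csv(anchors: dict[str, int], handle) -> None:
    """Write gene -> anchor index pairs as two semicolon-separated fields."""
    writer = csv.writer(handle, delimiter=";", lineterminator="\n")
    writer.writerow(["gene", "anchor_index"])  # assumed header, check your model's files
    for gene, index in anchors.items():
        writer.writerow([gene, index])


def read_anchor_csv(handle) -> dict[str, int]:
    """Read a two-field, semicolon-separated anchor file back into a dict."""
    reader = csv.reader(handle, delimiter=";")
    next(reader)  # skip the header line
    return {row[0]: int(row[1]) for row in reader if row}


buf = io.StringIO()
write_anchor_csv({"TRBV1*01": 267}, buf)
buf.seek(0)
print(read_anchor_csv(buf))
```

When writing to a real file, open it with `newline=""` so the `csv` module controls line endings.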
Hi Quentin,
How did you obtain the anchor indices for the J and V genes in your models? Is there an implementation or feature in IGoR that does this? If not, what method did you use? IMGT doesn't appear to contain the anchor index information.
Additionally, how did you decide which genes to include and which to leave out? For example, consider the information in `models/human/bcr_heavy/ref_genome`. I am attempting to see where you got these values from using IMGT, specifically the hyperlinks in the table about two-thirds of the way down the page at http://www.imgt.org/vquest/refseqh.html. F+ORF+all P IGHV Human has 477 sequences (http://www.imgt.org/genedb/GENElect?query=7.2+IGHV&species=Homo+sapiens). Your genomicVs.fasta file in the `igor_1-3-0/models/human/bcr_heavy/ref_genome` directory has only 97. Why is it so much smaller than the IMGT files? Were those the only ones available at the time?

Thanks,
Zach
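To see concretely how a shipped reference differs from a fresh IMGT download, one can count records and diff gene names between the two FASTA files. This is a sketch assuming IMGT-style headers in which the allele name is the second `|`-separated field (e.g. `>ACC|IGHV1-2*02|Homo sapiens|F|...`), falling back to the first whitespace-delimited token for plain headers. The function names are hypothetical.

```python
from typing import Iterable


def fasta_gene_names(lines: Iterable[str]) -> set[str]:
    """Collect allele names from FASTA header lines."""
    names = set()
    for line in lines:
        if not line.startswith(">"):
            continue
        header = line[1:].strip()
        if not header:
            continue
        fields = header.split("|")
        # Assumed IMGT layout: accession|allele|species|functionality|...
        name = fields[1] if len(fields) > 1 else header.split()[0]
        names.add(name.strip())
    return names


def compare_references(shipped: set[str], imgt: set[str]) -> dict[str, set[str]]:
    """Split allele names into those unique to each reference and those shared."""
    return {
        "only_in_shipped": shipped - imgt,
        "only_in_imgt": imgt - shipped,
        "shared": shipped & imgt,
    }
```

For example, `fasta_gene_names(open("genomicVs.fasta"))` gives the shipped set, and `len()` of each set gives counts like the 97 vs 477 above; the IMGT file name depends on how the download was saved.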