Regarding RefAllele and Default Allele in gene-table.csv #51

Closed
NTNguyen13 opened this issue Mar 25, 2022 · 4 comments

Comments

NTNguyen13 commented Mar 25, 2022

Hi, this question is mostly about better understanding the pypgx data structure.

I tried to figure out the meaning of RefAllele, but my understanding doesn't seem quite right. I thought RefAllele was the allele represented in the human reference genome (the FASTA file), but GRCh37Default and GRCh38Default already represent that. I also saw cases where GRCh37Default and GRCh38Default are flipped (I think because of changes between GRCh37 and GRCh38), but I found 5 cases where GRCh37Default and GRCh38Default are the same yet differ from RefAllele:

Gene     RefAllele  GRCh37Default  GRCh38Default
ABCB1    *1         *2             *2
NAT2     *4         *12            *12
SLC22A2  *1         *3             *3
UGT2B7   *1         *2             *2
UGT2B15  *1         *2             *2

I found this logic check, which assigns an allele when no candidate is found, but I still don't fully understand the role of RefAllele:

if ref_allele != default_allele and ref_allele not in candidates and default_allele not in candidates:
    candidates.append(default_allele)
if not candidates:
    candidates.append(default_allele)
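
To see what this check does in isolation, here is a minimal sketch that wraps the same two conditions in a standalone function. The function name `add_fallback_allele` and the sample inputs are made up for illustration and are not part of pypgx; the reading of the logic is my own interpretation, not a confirmed description of the library's behavior.

```python
def add_fallback_allele(candidates, ref_allele, default_allele):
    """Return a copy of `candidates` with the assembly default added when needed.

    Hypothetical helper: it mirrors the two checks quoted above, but the name
    and signature are assumptions, not pypgx API.
    """
    result = list(candidates)
    # The assembly carries a non-reference allele and neither the reference
    # nor the default allele was detected, so add the default.
    if (ref_allele != default_allele
            and ref_allele not in result
            and default_allele not in result):
        result.append(default_allele)
    # Nothing was detected at all, so fall back to the default.
    if not result:
        result.append(default_allele)
    return result


print(add_fallback_allele([], "*1", "*2"))      # ['*2']
print(add_fallback_allele([], "*1", "*1"))      # ['*1']
print(add_fallback_allele(["*4"], "*1", "*2"))  # ['*4', '*2']
print(add_fallback_allele(["*4"], "*1", "*1"))  # ['*4']
```

So RefAllele only seems to matter here when it differs from the default: in that case the default is appended even though other candidates already exist.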

Could you please explain what RefAllele is, and how to assign it in the gene table? Thank you very much.

sbslee commented Mar 25, 2022

@NTNguyen13,

Good question! The RefAllele column in the gene-table.csv file gives you the reference star allele for the given gene (some people refer to it as the "wild-type" allele, but "reference allele" is the preferred term).

For example, the CYP2D6 gene has *1 as its reference allele, and therefore RefAllele is *1. Now, if you look at the CYP2D6 sequence in GRCh37, you will find that it actually matches that of *2; therefore, the GRCh37Default column is *2. Finally, when you do the same for GRCh38, its CYP2D6 sequence matches that of *1, and so GRCh38Default is *1.
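
As a rough illustration of how the three columns relate, the sketch below lists, for each gene, the assemblies whose sequence does not match the reference star allele. The rows are copied from this thread only; the real gene-table.csv has more genes and columns, and the function name `assemblies_differing_from_ref` is made up for this example.

```python
import csv
import io

# Rows taken from this thread; not a copy of the real gene-table.csv.
GENE_TABLE = """\
Gene,RefAllele,GRCh37Default,GRCh38Default
CYP2D6,*1,*2,*1
NAT2,*4,*12,*12
UGT2B7,*1,*2,*2
"""


def assemblies_differing_from_ref(table_text):
    """Map each gene to the assemblies whose default is not the reference allele."""
    result = {}
    for row in csv.DictReader(io.StringIO(table_text)):
        result[row["Gene"]] = [
            assembly
            for assembly in ("GRCh37", "GRCh38")
            if row[f"{assembly}Default"] != row["RefAllele"]
        ]
    return result


differing = assemblies_differing_from_ref(GENE_TABLE)
for gene, assemblies in differing.items():
    print(gene, assemblies)
```

For CYP2D6 this reports only GRCh37, while NAT2 and UGT2B7 differ from their reference allele in both assemblies.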

Let me know if you have more questions.

P.S. You will see that the NAT2 gene has *4 as its reference allele instead of *1. That's for historical reasons. See the official NAT2 alleles page for more details (http://nat.mbg.duth.gr/Human%20NAT2%20alleles_2013.htm).

NTNguyen13 commented Mar 25, 2022

Thank you for the quick response! So if I find a new gene to add to pypgx, I can assign RefAllele based on a literature review, and GRCh37Default and GRCh38Default based on the human genome sequence, depending on the assembly version. Am I right?

sbslee commented Mar 25, 2022

That's correct! That said, if you have a PGx gene you'd like to add to PyPGx, I'd strongly advise opening a new issue in the repository for discussion before making a PR 😄

@NTNguyen13

Yes, I'm definitely going to follow that!
