Regarding RefAllele and Default Allele in gene-table.csv #51

NTNguyen13 · 2022-03-25T08:03:05Z

Hi, this question is mostly about better understanding the pypgx data structure.

I tried to work out the meaning of RefAllele, but my understanding doesn't quite hold up. I thought RefAllele was the allele represented in the human reference genome (the FASTA file), but GRCh37Default and GRCh38Default already represent that. I also saw cases where GRCh37Default and GRCh38Default flip (I think because of changes between GRCh37 and GRCh38), but I found 5 cases where GRCh37Default and GRCh38Default are the same as each other yet differ from RefAllele:

Gene      RefAllele  GRCh37Default  GRCh38Default
ABCB1     *1         *2             *2
NAT2      *4         *12            *12
SLC22A2   *1         *3             *3
UGT2B7    *1         *2             *2
UGT2B15   *1         *2             *2

I found this logic check, which assigns an allele when no candidate is found, but I still don't fully understand the role of RefAllele:

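# When RefAllele and the assembly default differ and neither is already
# a candidate, add the assembly default.
if ref_allele != default_allele and ref_allele not in candidates and default_allele not in candidates:
    candidates.append(default_allele)
# If there are still no candidates at all, fall back to the assembly default.
if not candidates:
    candidates.append(default_allele)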
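
To see what the check above does for different inputs, here is a minimal, self-contained sketch. The function name `add_default_candidate` and the example alleles are hypothetical, chosen only for illustration; this wraps the snippet as quoted and is not the actual PyPGx function.

```python
def add_default_candidate(ref_allele, default_allele, candidates):
    """Return a copy of `candidates` after applying the quoted check.

    Hypothetical wrapper for illustration only; not part of PyPGx.
    """
    result = list(candidates)  # copy so the caller's list is not mutated
    # RefAllele and the assembly default differ, and neither is listed yet.
    if (ref_allele != default_allele
            and ref_allele not in result
            and default_allele not in result):
        result.append(default_allele)
    # Nothing was found at all: fall back to the assembly default.
    if not result:
        result.append(default_allele)
    return result


# Example inputs (allele names are made up for illustration).
cases = [
    ("*1", "*2", []),      # RefAllele differs from default, no candidates
    ("*1", "*1", []),      # RefAllele equals default, no candidates
    ("*1", "*2", ["*1"]),  # RefAllele is already a candidate
    ("*1", "*2", ["*4"]),  # an unrelated allele is the only candidate
]
for ref, default, found in cases:
    print(ref, default, found, "->", add_default_candidate(ref, default, found))
```

In this sketch RefAllele only decides whether the default gets appended; the allele that is actually added is always the assembly default.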

Could you please explain what RefAllele is, and how to assign it in the gene table? Thank you very much.

sbslee · 2022-03-25T08:18:34Z

@NTNguyen13,

Good question! The RefAllele column in the gene-table.csv file gives you the reference star allele for the given gene (some people refer to it as the "wild-type" allele, but reference allele is the preferred term).

For example, the CYP2D6 gene has *1 as its reference allele, and therefore RefAllele is *1. Now, if you look at the CYP2D6 sequence in GRCh37, you will find that it actually matches that of *2; therefore, the GRCh37Default column is *2. Finally, when you do the same for GRCh38, its CYP2D6 sequence matches that of *1, and so GRCh38Default is *1.
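
To make the three columns concrete, here is a minimal sketch that compares them for the genes mentioned in this thread. It uses only the standard library and an inline CSV holding the values quoted above; the real gene-table.csv has more genes and columns, and the names `GENE_TABLE_CSV` and `compare_alleles` are hypothetical.

```python
import csv
import io

# Values copied from this thread only; not the full gene-table.csv.
GENE_TABLE_CSV = """Gene,RefAllele,GRCh37Default,GRCh38Default
ABCB1,*1,*2,*2
CYP2D6,*1,*2,*1
NAT2,*4,*12,*12
SLC22A2,*1,*3,*3
UGT2B7,*1,*2,*2
UGT2B15,*1,*2,*2
"""


def compare_alleles(row):
    """Describe how the two assembly defaults relate to RefAllele."""
    ref = row["RefAllele"]
    g37 = row["GRCh37Default"]
    g38 = row["GRCh38Default"]
    if g37 != g38:
        return "assemblies disagree"
    if g37 != ref:
        return "both assemblies differ from RefAllele"
    return "all agree"


rows = list(csv.DictReader(io.StringIO(GENE_TABLE_CSV)))
summary = {row["Gene"]: compare_alleles(row) for row in rows}
for gene, status in summary.items():
    print(f"{gene}: {status}")
```

CYP2D6 falls in the first group because the two assemblies carry different alleles, while the five genes from the question fall in the second: both assemblies carry the same non-reference allele.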

Let me know if you have more questions.

P.S. You will see that the NAT2 gene has *4 as its reference allele instead of *1. That's for historical reasons. See the official NAT2 alleles page for more details (http://nat.mbg.duth.gr/Human%20NAT2%20alleles_2013.htm).

NTNguyen13 · 2022-03-25T08:26:27Z

Thank you for the quick response! So if I find a new gene to add to pypgx, I can assign RefAllele based on a literature review, and GRCh37Default and GRCh38Default based on the human genome sequence of the corresponding assembly version. Am I right?

sbslee · 2022-03-25T08:48:04Z

That's correct! Though I would strongly advise that if you have a PGx gene you'd like to add to PyPGx, you first open a new issue in the repository for discussion before making a PR 😄

NTNguyen13 · 2022-03-25T08:49:02Z

yes, I'm definitely gonna follow that!

NTNguyen13 closed this as completed Mar 25, 2022
