Update F5 gene to Pypgx #39

Merged
merged 18 commits into from Dec 14, 2021

NTNguyen13
Contributor

Hi, I found that PyPGx currently does not support some genes that have PharmGKB guidelines, so I want to contribute to this great open-source package. I'm starting with a 'simple' gene first: F5 has only one variant in PharmGKB, and it's Level 1A.

As I worked on this, I found that this variant is not so simple. The F5 gene strand was flipped between GRCh37 and GRCh38, which makes this a confusing case. The guideline states that the risk allele is T, which was the reference allele in GRCh37; in GRCh38, the reference allele was changed to C.

I'm afraid this may change how the diplotype is represented. For example, for an individual homozygous for the risk allele (TT), no variant will be called against GRCh37, giving Reference/Reference, whereas against GRCh38 the variant will be called and the diplotype will be represented as c.1601G>A/c.1601G>A.
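To make the concern above concrete, here is a minimal Python sketch. It is not PyPGx code: `call_site` and `coding_change` are hypothetical helpers. It assumes only the facts stated in this thread: the risk allele is T, the reference base is T in GRCh37 and C in GRCh38, and F5 lies on the minus strand, as the c.1601G>A notation implies.

```python
"""Sketch: the same sample genotype yields different VCF-style calls
depending on which base the reference assembly carries at the site."""

# Complement used to translate plus-strand bases to the minus (coding) strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def call_site(sample_alleles, ref_base):
    """Return (GT, ALT) for a biallelic site, VCF-style and unphased.

    sample_alleles: the two plus-strand bases observed in the sample.
    ref_base: the base the reference assembly carries at this position.
    """
    alts = sorted({b for b in sample_alleles if b != ref_base})
    if len(alts) > 1:
        raise ValueError("this sketch only handles biallelic sites")
    # 0 = matches the reference, 1 = alternate; sorted so we emit 0/1, not 1/0.
    gt = "/".join(sorted("0" if b == ref_base else "1" for b in sample_alleles))
    return gt, (alts[0] if alts else ".")


def coding_change(ref_base, alt_base):
    """Express a plus-strand REF>ALT change on the minus (coding) strand."""
    return f"{COMPLEMENT[ref_base]}>{COMPLEMENT[alt_base]}"


# Assumed from the thread: risk allele T; reference base T (GRCh37) vs C (GRCh38).
sample = ("T", "T")  # homozygous for the risk allele
for assembly, ref in [("GRCh37", "T"), ("GRCh38", "C")]:
    gt, alt = call_site(sample, ref)
    change = coding_change(ref, alt) if alt != "." else "none"
    print(assembly, gt, alt, change)
```

The loop prints `GRCh37 0/0 . none` and then `GRCh38 1/1 T G>A`: the same TT sample looks reference-homozygous against GRCh37 but homozygous for the coding G>A change against GRCh38.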

What do you think about this case? I'm opening this as a draft pull request so that we can discuss it here directly.

P.S. Do you have any plans to add the MT-RNR1 gene? It is also Level 1A and has a newly published CPIC guideline: https://cpicpgx.org/guidelines/cpic-guideline-for-aminoglycosides-and-mt-rnr1/

@sbslee
Owner

sbslee commented Dec 14, 2021

@NTNguyen13,

Thanks for this PR! Before I do a thorough review, I want to point out a few things:

  1. It was very wise of you to start with a simple gene :) By the end of this exercise, I hope we can establish some basic expectations when it comes to the addition of a new gene.
  2. "The gene F5 strand has been flipped when moving up from GRCh37 to GRCh38" -> I just checked, and please note that this is not the case. The gene is encoded on the minus strand in both GRCh37 and GRCh38. I think what you are referring to is a change in the reference allele at the SNP level between GRCh37 and GRCh38, which is actually quite common for PGx genes. For example, if you look at the variant-table.csv file, you will find many variants that have different values in the GRCh37Allele and GRCh38Allele columns (e.g. rs1809810 in CYP2A6, where GRCh37 has 19-41350664-A-T while GRCh38 has 19-40844759-T-A).
  3. Note that this does NOT affect diplotype representation, because the star allele definitions in PyPGx already account for this discrepancy. For example, if you look at the GRCh37Default and GRCh38Default columns in the gene-table.csv file, you will see that the default allele for CYP2D6 is *2 in GRCh37 but *1 in GRCh38. This is because the GRCh37 reference haplotype already contains the two variants that define the CYP2D6*2 allele; therefore, if an individual has no CYP2D6 variants and the sequence reads were aligned against GRCh37, we know the individual is CYP2D6*2/*2. You can also see this in action in the allele-table.csv file:
     ```python
     >>> import pypgx
     >>> pypgx.list_variants('CYP2D6', alleles=['*2'], assembly='GRCh37')
     []
     >>> pypgx.list_variants('CYP2D6', alleles=['*2'], assembly='GRCh38')
     ['22-42126611-C-G', '22-42127941-G-A']
     ```
  4. I don't have any immediate plans to add new genes, but I'd be more than happy to review any PRs that add new genes. However, there are many moving parts when it comes to adding a new gene, so to minimize potential errors, I will only review one gene at a time.
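Items 2 and 3 above can be illustrated with a toy Python sketch. Everything in it is hypothetical: `parse_variant`, `is_ref_alt_swap`, `call_haplotype`, `ALLELE_TABLE` and `DEFAULT_ALLELE` are not part of PyPGx's API. The allele table is cut down to CYP2D6 *1 and *2, using only the variant IDs from the REPL session above, and exact set matching stands in for PyPGx's real genotyping pipeline, which this sketch does not attempt to reproduce.

```python
"""Sketch: a REF/ALT swap between assemblies is absorbed by an
assembly-specific default allele (toy data, CYP2D6 *1/*2 only)."""


def parse_variant(vid):
    """Split a 'chrom-pos-ref-alt' ID into its four fields."""
    chrom, pos, ref, alt = vid.split("-")
    return chrom, int(pos), ref, alt


def is_ref_alt_swap(grch37_id, grch38_id):
    """True if REF and ALT are exchanged between the two assemblies."""
    _, _, ref37, alt37 = parse_variant(grch37_id)
    _, _, ref38, alt38 = parse_variant(grch38_id)
    return ref37 == alt38 and alt37 == ref38


# Hypothetical, heavily reduced allele tables: allele -> the variants that
# define it relative to that assembly's reference sequence.
ALLELE_TABLE = {
    "GRCh37": {"*2": frozenset()},  # GRCh37 reference already carries *2
    "GRCh38": {
        "*1": frozenset(),
        "*2": frozenset({"22-42126611-C-G", "22-42127941-G-A"}),
    },
}
DEFAULT_ALLELE = {"GRCh37": "*2", "GRCh38": "*1"}


def call_haplotype(variants, assembly):
    """Return the allele whose definition exactly matches the called
    variants; a haplotype with no variants gets the default allele."""
    called = frozenset(variants)
    if not called:
        return DEFAULT_ALLELE[assembly]
    for allele, definition in ALLELE_TABLE[assembly].items():
        if definition == called:
            return allele
    return "unknown"


print(is_ref_alt_swap("19-41350664-A-T", "19-40844759-T-A"))  # True
print(call_haplotype([], "GRCh37"))  # *2
print(call_haplotype([], "GRCh38"))  # *1
print(call_haplotype(["22-42126611-C-G", "22-42127941-G-A"], "GRCh38"))  # *2
```

The same physical haplotype is reported as *2 in both assemblies: against GRCh37 no variants are called and the default applies, while against GRCh38 the two defining variants are called explicitly.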

I will review this PR soon!

Owner

@sbslee sbslee left a comment

Thanks for the PR! I know how tedious things can get when adding a new gene to PyPGx, so great job getting this much done already in your first attempt :) Please look at my suggested changes.

Once this PR is merged, there are some additional steps to be taken. For example, we need to add a reference haplotype panel for F5 from the 1000 Genomes Project, for both GRCh37 and GRCh38. However, I need to take care of these myself because I have the original VCF files downloaded.

Review threads (all outdated and resolved):
- CHANGELOG.rst
- docs/genes.rst (3 threads)
- pypgx/api/data/diplotype-table.csv
- pypgx/api/data/gene-table.csv
- pypgx/api/data/phenotype-table.csv
- pypgx/api/data/variant-table.csv
- pypgx/api/data/allele-table.csv
NTNguyen13 and others added 11 commits December 14, 2021 13:41
Co-authored-by: Seung-been "Steven" Lee <sbstevenlee@gmail.com>
Owner

@sbslee sbslee left a comment

Everything looks good! Merging this PR.

@sbslee sbslee marked this pull request as ready for review December 14, 2021 07:10
@sbslee sbslee merged commit 2edaf92 into sbslee:0.10.0-dev Dec 14, 2021
@sbslee
Owner

sbslee commented Dec 15, 2021

@NTNguyen13, just letting you know that I added the reference haplotype panel for F5 (983abee)! Now you should be able to genotype F5 with PyPGx :)

@NTNguyen13 NTNguyen13 deleted the F5-gene branch December 15, 2021 06:58
@sbslee sbslee mentioned this pull request May 18, 2023