MTHFR.vcf 1kgp pypgx bundle #115

Closed
nbiesot opened this issue Dec 8, 2023 · 12 comments · Fixed by #118
Labels
enhancement New feature or request

Comments


nbiesot commented Dec 8, 2023

Hi,

First of all, thanks for this tool! I want to add the MTHFR gene and I was wondering where I could find the MTHFR.vcf or how to create it for the 1kgp folder in the pypgx bundle.
Adding the gene to all .csv files was already successful.

Thank you in advance.

Owner

sbslee commented Dec 11, 2023

@nbiesot,

Thanks for reaching out! It's exciting to hear that you're planning to add a new gene to PyPGx. As for obtaining the reference haplotype panel VCF file for MTHFR, it needs to be created from the following resources: GRCh37 and GRCh38. If you're unsure how to do this, please let me know. I'd be more than happy to help. If you'd like, I could also review what you have done with the CSV files because things can be tricky.

Best,
Steven

Author

nbiesot commented Dec 11, 2023

Hi Steven,

I would appreciate it if you could provide instructions on how to create the MTHFR VCF file.

Currently, I have only checked whether the tool can identify the variants mentioned on PharmGKB (https://www.pharmgkb.org/gene/PA245/variantAnnotation), not the phenotype.
allele-table.csv
gene-table.csv
phenotype-table.csv
variant-table.csv

Best,
Nienke

sbslee added a commit that referenced this issue Dec 16, 2023
Owner

sbslee commented Dec 16, 2023

@nbiesot,

Thanks for your patience! I've looked at your files and they all look fantastic. Thanks so much for your contribution. Here's my response:

  1. allele-table.csv: You nailed this one.
  2. gene-table.csv: This was mostly fine, but I had to make some changes with genomic coordinates. For PyPGx, I use Table Browser from UCSC Genome Browser to get the start/end positions of the transcript and exons for each gene (Group: Genes and Gene Predictions; track: NCBI RefSeq; table: RefSeq All (ncbiRefSeq)).
  3. phenotype-table.csv: This is not needed. Indeterminate is the default for genes that don't have any phenotypes.
  4. variant-table.csv: You also nailed this one, but I just sorted them by chromosome position.
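As a rough illustration of item 2: UCSC Table Browser gene tables such as ncbiRefSeq store `exonStarts` and `exonEnds` as comma-separated lists with a trailing comma, and the start coordinates are 0-based. The sketch below converts such a pair into 1-based inclusive intervals, the convention VCF positions use. The function name and the sample numbers are made up for illustration, and whether gene-table.csv expects exactly this convention should be checked against an existing gene row.

```python
def exon_intervals(exon_starts, exon_ends):
    """Convert UCSC-style exonStarts/exonEnds strings to 1-based inclusive pairs.

    UCSC stores 0-based, half-open intervals, so only the start needs +1.
    """
    # Drop the empty item produced by the trailing comma.
    starts = [int(x) for x in exon_starts.split(",") if x]
    ends = [int(x) for x in exon_ends.split(",") if x]
    if len(starts) != len(ends):
        raise ValueError("exonStarts and exonEnds have different lengths")
    return [(start + 1, end) for start, end in zip(starts, ends)]


# Made-up values, not real MTHFR coordinates.
print(exon_intervals("100,300,", "200,400,"))
```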

I uploaded these changes to the 0.23.0-dev branch of pypgx.
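For the sorting mentioned in item 4, a minimal sketch using only the standard library might look like the following. It assumes each row carries the variant as a `chrom-pos-ref-alt` string in a column called `GRCh37Name`; both the column name and the placeholder rows are assumptions for illustration, so adjust them to the actual variant-table.csv header.

```python
import csv
import io


def position_key(variant):
    """Return (chromosome, position) from a 'chrom-pos-ref-alt' string."""
    chrom, pos, _ref, _alt = variant.split("-")
    return (chrom, int(pos))  # int() so that 300 sorts before 1000


def sort_variant_rows(csv_text, column="GRCh37Name"):
    """Return the CSV text with data rows sorted by chromosome position."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = sorted(reader, key=lambda row: position_key(row[column]))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Placeholder rows, not real variants.
table = "rsID,GRCh37Name\nrsB,1-1000-C-T\nrsC,1-300-G-A\nrsA,1-100-T-G\n"
print(sort_variant_rows(table))
```

Chromosomes are compared as strings here, which is enough for a single gene but would need a proper ordering for a table that spans several chromosomes.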

As for adding reference haplotype panel in pypgx-bundle, I uploaded it for both GRCh37 and GRCh38 in the 0.23.0-dev branch.

I tested the dev version and confirmed that everything works as expected. I'd greatly appreciate it if you could test it from your end.

Best regards,
Steven

Author

nbiesot commented Dec 18, 2023

Hi Steven,

Thank you for reviewing the files and for uploading the MTHFR reference haplotype panel! I was wondering if you could still provide the steps for making the reference panel?

I have tested the dev version and it works for my samples!

Best,
Nienke

Owner

sbslee commented Dec 18, 2023

@nbiesot,

No problem. I used the fuc package's CLI to create the panels for GRCh37 and GRCh38, respectively (run each command in its own directory, since both write to MTHFR.vcf.gz):

$ fuc vcf-slice chr1.1kg.phase3.v5a.vcf.gz 1:11842779-11869021 | fuc fuc-bgzip > MTHFR.vcf.gz
$ fuc vcf-slice CCDG_14151_B01_GRM_WGS_2020-08-05_chr1.filtered.shapeit2-duohmm-phased.vcf.gz 1:11782722-11808964 | fuc fuc-bgzip > MTHFR.vcf.gz

Note that fuc is another Python program I wrote.
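To make clear what slicing by region does conceptually, here is a small standard-library sketch that keeps the header lines plus every record whose CHROM and POS fall inside a `chrom:start-end` region. It is only an illustration of the idea on uncompressed VCF text, not how fuc implements `vcf-slice`, and the helper names and sample records are hypothetical.

```python
def parse_region(region):
    """Split 'chrom:start-end' into (chrom, start, end) with integer bounds."""
    chrom, span = region.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def slice_vcf_lines(lines, region):
    """Keep header lines and records inside the region (bounds inclusive)."""
    chrom, start, end = parse_region(region)
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # meta-information and column header
            continue
        fields = line.split("\t")
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            kept.append(line)
    return kept


# Tiny made-up example: only the middle record lies inside the region.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
records = [
    "1\t50\t.\tA\tG\t.\t.\t.",
    "1\t150\t.\tC\tT\t.\t.\t.",
    "2\t150\t.\tG\tA\t.\t.\t.",
]
for line in slice_vcf_lines([header] + records, "1:100-200"):
    print(line)
```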

I will update the Read the Docs soon to reflect the new gene addition. Please let me know if you have further questions.

Owner

sbslee commented Dec 18, 2023

P.S. The fuc package is one of the dependencies for pypgx, so you don't need to install fuc separately.

Author

nbiesot commented Dec 18, 2023

Thank you for the help.

I also want to add the COMT gene. If this is successful and you are interested, I can share the files when they are finished.

Owner

sbslee commented Dec 18, 2023

@nbiesot,

That sounds awesome! I'd be more than happy to help.

Author

nbiesot commented Dec 19, 2023

Hi Steven,

Here are the files. I would appreciate it if you could take a look at them. Again, I used the variants described on PharmGKB:
allele-table.csv
gene-table.csv
variant-table.csv

sbslee added a commit that referenced this issue Dec 23, 2023
sbslee added a commit that referenced this issue Dec 23, 2023
Owner

sbslee commented Dec 23, 2023

@nbiesot,

Thanks for your contribution! I've only made slight changes to the files and uploaded them to the 0.23.0-dev branch. I'd greatly appreciate it if you could test the latest version.

Author

nbiesot commented Dec 23, 2023

Thank you for reviewing the files. I have tested the dev version and it works for my samples!

sbslee closed this as completed Dec 24, 2023
sbslee added the enhancement New feature or request label Dec 24, 2023
sbslee linked a pull request Dec 24, 2023 that will close this issue
Owner

sbslee commented Dec 24, 2023

@nbiesot, FYI, I just released the official 0.23.0 version to PyPI and Anaconda. Please use this version going forward. Thanks!
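One way to confirm that an environment has picked up the release is to compare the installed version against 0.23.0. The sketch below uses only `importlib.metadata` from the standard library; the helper names are hypothetical, and the comparison deliberately ignores pre-release suffixes, so a dev build of 0.23.0 would count as 0.23.0.

```python
from importlib import metadata


def version_tuple(version):
    """Turn '0.23.0' into (0, 23, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_at_least(package, minimum):
    """Return True if the package is installed at the minimum version or newer."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)


print(installed_at_least("pypgx", "0.23.0"))
```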
