Running Pypgx with Phased variants and Predicted CNV #32
Comments
@NTNguyen13, thank you for asking these important questions! What you are asking is precisely why I started PyPGx: I wanted to give people a unified platform for PGx research where, no matter what type of input data users have (e.g. phased vs. unphased VCF), they can use PyPGx to process their data and get PGx results.
That being said, let's delve into the specifics. It's true that with the current version of PyPGx, it's not easy to work with phased VCF directly. Updating PyPGx to accept phased VCF is at the top of my to-do list. I will be releasing the official version of [...] Regarding this update, one important thing to note is that the [...]
Creating an archive is easy! Create a directory that contains your VCF file (it must be named [...]
Let's say you named this directory [...]
The resulting file, [...]
Now, switching gears, let's talk about CNV data. You mentioned that you have your own CNV data, which is awesome! The current version of PyPGx implements a support vector machine-based classifier to detect PGx CNVs using read depth. If you can generate your own CNV data: 1) please consider contributing the algorithm to PyPGx so others can use it (your work will be properly credited, of course); 2) you can have CNV results in an archive called [...]
The most important thing here is that your CNV results must conform to the CNV names that I have defined, because only these names will be recognized by the [...]
I hope this helps. Please let me know if you have any questions or comments.
Thank you for the suggestion. Hi, I think I found one solution for the phased VCF. I made a new function [...] I have updated the code on my fork; if you open another release branch, I will make my PR ASAP.
I found that the current method for loading and processing VCF files is quite memory-demanding and time-consuming, especially for cohort WGS. I'm thinking about using cyvcf for the region slicing and then loading the data into a pandas DataFrame. Do you think that's appropriate for PyPGx? As for CNV, I currently cannot open the CNV definition files in [...]
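To illustrate the idea of slicing by region before building a table, here is a minimal sketch. It is not how PyPGx loads VCF data, and it does not use cyvcf: it is plain standard-library Python that streams lines and keeps only records inside one region. The function name `iter_region_records` and the coordinates in the usage note are hypothetical.

```python
def iter_region_records(vcf_lines, chrom, start, end):
    """Yield (chrom, pos, ref, alt) for records with start <= POS <= end.

    Assumes tab-separated VCF text lines. Lines are consumed one at a
    time, so only the records inside the region are kept in memory.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # blank or malformed line
        if fields[0] != chrom:
            continue
        pos = int(fields[1])
        if start <= pos <= end:
            yield fields[0], pos, fields[3], fields[4]
```

Passing an open file handle as `vcf_lines` (for example, `iter_region_records(handle, "chr22", 50, 200)`) scans the file once without loading it all, and the yielded tuples can then be collected into a DataFrame. An indexed reader would avoid even the full scan, which is the main advantage of the approach proposed above.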
@NTNguyen13, thanks for your enthusiasm for PyPGx :)
Thanks for your patience.
That would be great, I can't wait to check out the latest features.
* Update :meth:`api.plot.plot_vcf_allele_fraction` method to accept both VcfFrame[Imported] and VcfFrame[Consolidated].
* :issue:`32`: Update :command:`run-ngs-pipeline` method to accept phased VCF as input. In this case, the method will skip statistical haplotype phasing.
Just letting you know that I've made updates so that the [...] Here's an example using the GeT-RM tutorial: [...]
Unzip the [...]
Notice how the above does not create VcfFrame[Imported] and VcfFrame[Phased]. If you are planning to do your own testing, please make sure to also update your fuc package to use the [...]
Hi, I have read the manual and the code, and I found that PyPGx always needs to run Beagle to phase the variants before genotyping, and that CNV is detected by read depth. However, in some cases our variants will already be phased by another workflow, and so will our CNVs! How can I input phased VCF and predicted CNV (say, in some common format) into predict_allele and call_genotypes?
In the past, when I used Stargazer, I saw in the code that it detects the GT separator ('|' or '/') before doing phasing, but PyPGx always unphases all variants first, then phases them, and finally consolidates both files. I tried to tinker with the new PyPGx code, but it turned out to be harder than I thought, and I'm not sure that I understand everything.
Thank you very much.
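The separator check described in the question can be sketched as follows. This is an illustration only, not code from Stargazer or PyPGx: the function name `is_fully_phased` is hypothetical, and treating any '/' genotype (including a missing "./." call) as unphased is an assumption of this sketch.

```python
def is_fully_phased(vcf_lines):
    """Return True if at least one genotype uses '|' and none uses '/'.

    Assumes tab-separated VCF text lines with FORMAT in column 9 and
    samples from column 10 onward. Genotypes without a separator
    (e.g. haploid calls) are ignored.
    """
    saw_phased = False
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and headers
        fields = line.split("\t")
        if len(fields) < 10:
            continue  # record has no sample columns
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        gt_index = format_keys.index("GT")
        for sample in fields[9:]:
            genotype = sample.split(":")[gt_index]
            if "/" in genotype:
                return False  # one unphased call is enough to decide
            if "|" in genotype:
                saw_phased = True
    return saw_phased
```

A pipeline could call a check like this on the input VCF and skip statistical phasing only when it returns True.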