
Add Singularity recipe and Fix a typo #1

Open: wants to merge 1 commit into main


koido commented Dec 11, 2023

In my environment, building locally was difficult because of unexplained compile errors.
So I built a Singularity image from your Dockerfile to run your pipeline on an HPC cluster.

See Singularity/README.md for how to create the Singularity recipe.
Using the Singularity image, I successfully ran your pipeline in my environment (not on the UK Biobank DNAnexus RAP).
I hope the Singularity recipe will be helpful for many non-UKB users.

Also, I fixed a typo in the README file.

Many thanks for creating and sharing such a helpful toolkit!
I look forward to the accurate phasing results.

Best,
Masaru

rwk-unil (Owner)

Hello,
Thank you for sharing this, it might be helpful for others.
I will not merge this for the moment because I am not familiar with Singularity: whenever I update the pipeline, I would also need to update the Singularity recipe, otherwise it might stop working. So for now I will not add it.

Also note that because the pipeline was designed to phase-polish the UKB releases, it has some platform-specific behavior that needs to be addressed for it to work well in other environments.

The phase_caller looks for CRAM files in a specific way: CRAM filenames are built from the sample ID and the project ID. For example, for sample 1234567 from project abcdef, the CRAM file <cram-path>/12/1234567_abcdef_0_0.cram will be loaded.
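To make the convention concrete, here is a minimal Python sketch of the path phase_caller would look for. The function name ukb_cram_path is hypothetical, and the two rules it encodes (the subdirectory is the first two digits of the sample ID, and the suffix is always _0_0.cram) are inferred from the single example above, not taken from the phase_caller source.

```python
import posixpath


def ukb_cram_path(cram_root: str, sample_id: str, project_id: str) -> str:
    """Build the CRAM path that phase_caller is expected to open for a sample.

    Assumptions (inferred from one example, not from the phase_caller code):
      * the subdirectory is the first two characters of the sample ID
      * the filename suffix is always "_0_0.cram"
    """
    subdir = sample_id[:2]  # "1234567" -> "12" (assumed rule)
    filename = f"{sample_id}_{project_id}_0_0.cram"
    return posixpath.join(cram_root, subdir, filename)


print(ukb_cram_path("/mnt/crams", "1234567", "abcdef"))
# /mnt/crams/12/1234567_abcdef_0_0.cram
```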

I have a version, the phase_caller2 program in the branch https://github.com/rwk-unil/pp/tree/phase_caller_generic_no_path, that can load a sample list with the CRAM path written directly in the sample list file. It is not yet merged into the main branch.

Instead of only the sample ID, each line of this sample list takes three comma-separated fields:

<index in bin file>,<sample name>,<path to individual CRAM file>
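If you generate these files by script, a small validator can catch malformed lines before a long run. The sketch below is hypothetical (SampleEntry and parse_sample_list are not part of phase_caller2). It splits each line into at most three fields so that commas inside the CRAM path are kept, and it skips blank lines. Whether phase_caller2 itself accepts blank lines or commas in paths is not documented here, so treat both behaviors as assumptions.

```python
from __future__ import annotations

from typing import NamedTuple


class SampleEntry(NamedTuple):
    index: int      # position of the sample in the binary file (VCF/BCF order)
    name: str       # sample name
    cram_path: str  # path to this sample's individual CRAM file


def parse_sample_list(text: str) -> list[SampleEntry]:
    """Parse "<index>,<sample name>,<cram path>" lines into SampleEntry tuples."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are simply ignored
        # maxsplit=2 keeps any commas that appear inside the CRAM path
        parts = line.split(",", 2)
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(parts)}")
        index, name, path = parts
        entries.append(SampleEntry(int(index), name, path))  # int() rejects bad indices
    return entries


for entry in parse_sample_list("2,HG003,/home/user/crams/HG003.cram\n"):
    print(entry.index, entry.name, entry.cram_path)
# 2 HG003 /home/user/crams/HG003.cram
```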

For example, for a VCF/BCF with samples HG001, HG002, HG003, HG004 that were extracted to a binary file, if you want to phase call only HG003, whose CRAM does not follow the UKB naming convention, you can use the following sample list:

2,HG003,/home/user/crams/HG003.cram

(The index in the binary file follows the order of samples in the VCF/BCF, so HG001, HG002, HG003, HG004 would be 0, 1, 2, 3.)
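Because the index is just the sample's position in the VCF/BCF header, the sample list can be generated from that header order. A minimal sketch, assuming you already have the sample names in header order (for example from `bcftools query -l file.bcf`) and a mapping of the samples you want to phase call to their CRAM paths; build_sample_list is a hypothetical helper, not part of the pipeline.

```python
from __future__ import annotations


def build_sample_list(vcf_samples: list[str], crams: dict[str, str]) -> str:
    """Return sample-list text with one "<index>,<name>,<cram path>" line per sample.

    vcf_samples: sample names in VCF/BCF header order (same order as the binary file).
    crams: sample name -> CRAM path, for the samples to phase call.
    """
    position = {name: i for i, name in enumerate(vcf_samples)}
    missing = [name for name in crams if name not in position]
    if missing:
        raise ValueError(f"samples not found in the VCF/BCF: {missing}")
    # Emit lines in binary-file order so the list is easy to compare with the VCF.
    ordered = sorted(crams.items(), key=lambda item: position[item[0]])
    return "".join(f"{position[name]},{name},{path}\n" for name, path in ordered)


text = build_sample_list(
    ["HG001", "HG002", "HG003", "HG004"],
    {"HG003": "/home/user/crams/HG003.cram"},
)
print(text, end="")
# 2,HG003,/home/user/crams/HG003.cram
```

Sorting by index is only a convenience; whether phase_caller2 requires any particular line order is not stated. Note also that the name-to-index mapping assumes sample names in the VCF/BCF are unique.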

This will be added to the main branch and documented in the future.

I hope you will be able to use this tool in your own environment, let me know if you have issues.

Best regards,
Rick

koido (Author) commented Dec 11, 2023

Hi Rick,

Thank you for your quick reply.
Not merging this pull request is no problem for me.

Thanks also for explaining phase_caller's dependency on the UKB CRAM naming convention.
In fact, I had also noticed that phase_caller assumes this fixed naming scheme, so I created symbolic links to my CRAM files, named to follow it.
Based on the log files, it seems to have worked well.
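The symlink workaround can be sketched in Python as follows. Everything here is an assumption for illustration: link_crams_ukb_style is a hypothetical helper, the naming rule (first two characters of the sample ID as subdirectory, fixed _0_0.cram suffix) is inferred from the single example earlier in this thread, and the keys of `crams` presumably need to match the sample names in the VCF/BCF that was extracted to the binary file. It also assumes each CRAM index sits next to its CRAM as `<file>.cram.crai` and links it alongside so htslib can find it under the new name.

```python
from __future__ import annotations

import os
import tempfile


def link_crams_ukb_style(crams: dict[str, str], project_id: str, link_root: str) -> list[str]:
    """Create UKB-style symlinks <link_root>/<ID[:2]>/<ID>_<project>_0_0.cram.

    crams: sample ID -> path of the existing CRAM file.
    Returns the list of created CRAM links. os.symlink raises FileExistsError
    if a link already exists, so rerunning requires cleaning link_root first.
    """
    created = []
    for sample_id, target in crams.items():
        subdir = os.path.join(link_root, sample_id[:2])  # assumed subdirectory rule
        os.makedirs(subdir, exist_ok=True)
        link = os.path.join(subdir, f"{sample_id}_{project_id}_0_0.cram")
        os.symlink(os.path.abspath(target), link)
        created.append(link)
        # Assumed index location: "<cram>.crai" next to the original CRAM.
        index = target + ".crai"
        if os.path.exists(index):
            os.symlink(os.path.abspath(index), link + ".crai")
    return created


if __name__ == "__main__":
    # Demo with empty placeholder files standing in for a real CRAM and index.
    with tempfile.TemporaryDirectory() as tmp:
        cram = os.path.join(tmp, "sample_1234567.cram")
        open(cram, "w").close()
        open(cram + ".crai", "w").close()
        for link in link_crams_ukb_style({"1234567": cram}, "abcdef", os.path.join(tmp, "links")):
            print(link, "->", os.readlink(link))
```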

I also look forward to your next release.
Thank you again for your hard work!

Best,
Masaru
