Run post process variants #319

Closed
dhwani2410 opened this issue Jun 17, 2020 · 5 comments

Comments

@dhwani2410

dhwani2410 commented Jun 17, 2020

I used this command to run DeepVariant and generate a VCF file:

sudo docker run \
  -v "$(pwd)":"$(pwd)" -w "$(pwd)" \
  google/deepvariant:"${BIN_VERSION}" \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type=WGS \
  --ref="${base_path}/${ref_file_name}.fasta" \
  --reads="${base_path}/base_recalib/${i}_vqsr.bam" \
  --output_vcf="deep_variant_results/${i}.vcf.gz" \
  --output_gvcf="deep_variant_results/${i}.g.vcf.gz"

I want to run postprocess_variants, but I cannot see how to do that from the above command.

  1. Is there some way to add parameters to the above command?

  2. I found two links related to it:-
    a) https://github.com/google/deepvariant/blob/r0.10/docs/deepvariant-gvcf-support.md
    GVCF_TFRECORDS="${OUTPUT_DIR}/HG002.gvcf.tfrecord@${N_SHARDS}.gz"

    ( time seq 0 $((N_SHARDS-1)) | \
      parallel --halt 2 --joblog "${LOG_DIR}/log" --res "${LOG_DIR}" \
        python "${BIN_DIR}"/make_examples.zip \
          --mode calling \
          --ref "${REF}" \
          --reads "${BAM}" \
          --examples "${EXAMPLES}" \
          --gvcf "${GVCF_TFRECORDS}" \
          --task {}
    ) >"${LOG_DIR}/make_examples.log" 2>&1

There is no make_examples.zip in the bin directory. What should be supplied to the --examples parameter? Can you please give more details about the variables?

b) #103
sudo docker run -v ${HOME}:${HOME} gcr.io/deepvariant-docker/deepvariant:0.7.0 /opt/deepvariant/bin/postprocess_variants \
  --ref ${OUTDIR}/data/hg19.fa \
  --infile ${OUTDIR}/output/cvo.tfrecord.gz \
  --outfile ${OUTDIR}/output/output.vcf.gz \
  --nonvariant_site_tfrecord_path ${OUTDIR}/output/gvcf.tfrecord@8.gz \
  --gvcf_outfile ${OUTDIR}/output/output.gvcf.gz

This is another way, and the parameters are also different. How do I define the infile here?

@MariaNattestad
Collaborator

There are several questions here, and I can answer them all individually, but first can you help me understand overall what you are trying to do?

For context, when you run run_deepvariant, it already includes make_examples, call_variants, and postprocess_variants. So are you trying to re-run postprocess_variants?

@dhwani2410
Author

My main aim is to get a list of all variant as well as non-variant sites in VCF format, not in g.vcf format. I thought running postprocess_variants might help me with this.

@MariaNattestad
Collaborator

Okay, I see. The run_deepvariant script you ran already includes postprocess_variants as the last step, which is the stage that produced the VCF and optionally gVCF.

As I answered in your other issue (#318), there unfortunately aren't any parameters in postprocess_variants that will generate a VCF containing every base, including the ones without variants.

I'm including the below for reference in case you or others want to run individual steps or pass specific parameters into the make_examples, call_variants, or postprocess_variants stages.
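For anyone running the stages individually, the file hand-off between them can be sketched roughly as follows. This is a minimal dry run that only prints the three commands, it does not execute them. All paths are hypothetical placeholders, the checkpoint location is an assumption to verify against your image, and the call_variants flags should be confirmed with its own --helpshort output. The point is that --examples is the file make_examples writes, and --infile for postprocess_variants is the file call_variants writes.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print how each stage's output feeds the next stage.
# Every path below is a hypothetical placeholder, not taken from this issue.
OUT="output"
REF="ref.fasta"
BAM="reads.bam"
MODEL="/opt/models/wgs/model.ckpt"            # assumed checkpoint path; verify in your image
EXAMPLES="${OUT}/examples.tfrecord.gz"        # written by make_examples, read by call_variants
GVCF_TFRECORDS="${OUT}/gvcf.tfrecord.gz"      # written by make_examples, read by postprocess_variants
CALLS="${OUT}/call_variants_output.tfrecord.gz"  # written by call_variants, read by postprocess_variants

stage_commands() {
  # Stage 1: create candidate examples and the non-variant site records.
  echo "make_examples --mode calling --ref ${REF} --reads ${BAM} --examples ${EXAMPLES} --gvcf ${GVCF_TFRECORDS}"
  # Stage 2: run the model over the examples.
  echo "call_variants --outfile ${CALLS} --examples ${EXAMPLES} --checkpoint ${MODEL}"
  # Stage 3: turn the calls into a VCF and, optionally, a gVCF.
  echo "postprocess_variants --ref ${REF} --infile ${CALLS} --outfile ${OUT}/output.vcf.gz --nonvariant_site_tfrecord_path ${GVCF_TFRECORDS} --gvcf_outfile ${OUT}/output.g.vcf.gz"
}

stage_commands
```

Each printed line would be prefixed with /opt/deepvariant/bin/ inside the Docker image. Sharded runs use the @N file naming shown in the gVCF documentation instead of the single files assumed here.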

How to get usage information for run_deepvariant and other runner scripts

If you need to add parameters for postprocess_variants for another reason, you can add a --postprocess_variants_extra_args parameter to run_deepvariant.
See usage for run_deepvariant in the code for run_deepvariant or by running it with --helpshort:

sudo docker run google/deepvariant:"${BIN_VERSION}" /opt/deepvariant/bin/run_deepvariant --helpshort
# output includes:
  --make_examples_extra_args: A comma-separated list of flag_name=flag_value.
    "flag_name" has to be valid flags for make_examples.py. If the flag_value is
    boolean, it has to be flag_name=true or flag_name=false.
  --postprocess_variants_extra_args: A comma-separated list of
    flag_name=flag_value. "flag_name" has to be valid flags for
    postprocess_variants.py. If the flag_value is boolean, it has to be
    flag_name=true or flag_name=false.
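As an illustration of the comma-separated format described in that help text, here is a small sketch that builds the value for --postprocess_variants_extra_args. The helper name join_extra_args is made up for this example, and qual_filter=2.0 is only an illustrative flag and value. Check that any flag you pass is listed by postprocess_variants --helpshort for your version.

```shell
#!/usr/bin/env bash
# Join flag_name=flag_value pairs with commas, the format the help text describes.
# join_extra_args is a hypothetical helper, not part of DeepVariant.
join_extra_args() {
  local IFS=','
  echo "$*"
}

# Illustrative value only; confirm the flag exists in your version first.
PP_EXTRA_ARGS="$(join_extra_args "qual_filter=2.0")"

# Dry run: print the flag that would be appended to the run_deepvariant command line.
printf '%s\n' "--postprocess_variants_extra_args=${PP_EXTRA_ARGS}"
```

The printed flag goes at the end of the same run_deepvariant command used to produce the VCF. Boolean flags need an explicit flag_name=true or flag_name=false.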

And for the specific flags of postprocess_variants.py:

sudo docker run google/deepvariant:"${BIN_VERSION}" /opt/deepvariant/bin/postprocess_variants --helpshort

To reiterate, I'm including this for reference only. For your specific request, we already know that no parameters will give a VCF with all the non-variant bases in the genome.

@cmclean
Collaborator

cmclean commented Jun 24, 2020

@dhwani2410 To do this manually, you can take the gVCF output by DeepVariant and use the following command from bcftools

bcftools convert --gvcf2vcf

Documentation at http://samtools.github.io/bcftools/bcftools.html#convert
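A fuller invocation might look like the sketch below. The three file names are hypothetical placeholders, and the helper gvcf2vcf is made up for this example. As far as I know, --gvcf2vcf needs the reference via --fasta-ref, indexed with samtools faidx. By default the script only prints the command; set DRY_RUN=0 to actually run it. It is worth spot-checking the output on a small region before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: expand gVCF reference blocks into per-site records with bcftools.
# All three paths are hypothetical placeholders.
REF="reference.fasta"              # should be indexed with `samtools faidx`
GVCF="sample.g.vcf.gz"             # gVCF written by DeepVariant
OUT_VCF="sample.all_sites.vcf.gz"  # bgzip-compressed output

gvcf2vcf() {
  # $1 = reference FASTA, $2 = input gVCF, $3 = output VCF
  set -- bcftools convert --gvcf2vcf --fasta-ref "$1" --output-type z --output "$3" "$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # print the command only
  else
    "$@"        # run bcftools
  fi
}

gvcf2vcf "$REF" "$GVCF" "$OUT_VCF"
```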

@ESDeutekom

> @dhwani2410 To do this manually, you can take the gVCF output by DeepVariant and use the following command from bcftools
>
> bcftools convert --gvcf2vcf
>
> Documentation at http://samtools.github.io/bcftools/bcftools.html#convert

This is awesome, but are you sure it does what you think it does? The gVCF from DeepVariant is different from other gVCFs...
