Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BQ of generated reads do not match model specification #11

Open
cntrier opened this issue Mar 1, 2019 · 4 comments
Open

BQ of generated reads do not match model specification #11

cntrier opened this issue Mar 1, 2019 · 4 comments
Labels
Milestone

Comments

@cntrier
Copy link

cntrier commented Mar 1, 2019

Hi,

I've created a read model with the following script:

mitty create-read-model synth-illumina
100.pkl
--read-length 100
--mean-template-length 250
--std-template-length 20
--bq0 30
--k 200
--sigma 5

And when I check the read model in Mitty with the following:
mitty describe-read-model 100.pkl 100.png

It looks as expected:
100
But when I generate reads using the model with the following code:
k=HG00632
i=100

mitty -v4 generate-reads GRCh38.p12.fa
./final_vcfs/${k}all.vcf.gz
${k} all_merged_sorted.bed
${i}.pkl
40
7
${k}
${i}reads-test.1.fq
${k}
${i}-lq.txt
--fastq2 ${k}_${i}reads-test.2.fq
2> vcf-${i}
${k}.log

The generated reads have a flat BQ of 9 when I check them with FastQC:
image

And when I run the god-aligner to create a bam file, I can see in IGV that the reads are a mess. I've tried running different individuals, different read lengths but get the same pattern.

Have I misunderstood something with the read model generation?

Thank you very much for any help you can provide on the matter.

@ghost ghost self-assigned this Mar 1, 2019
@ghost ghost added the pending verification Dev needs to replicate and isolate label Mar 7, 2019
@ghost ghost added this to the 2019.05.31 milestone Mar 7, 2019
@ghost
Copy link

ghost commented Mar 7, 2019

@cntrier thank you for this comprehensive report! Will make it easy to replicate and hopefully isolate and fix.

@ghost
Copy link

ghost commented Mar 7, 2019

@cntrier if you use the parameters given in the examples in the Readme, can you replicate the results shown in the Readme? This information is useful to me as a means of understanding if the bug has always been there, or is a recently introduced bug. Thanks!

@cntrier
Copy link
Author

cntrier commented Mar 8, 2019

Thank you for getting back to me.

I have now downloaded the demo data and ran the same parameters as in the README and encountered the same problem.

I filtered variants and generated a synthetic read model with the code from the README and I have the same output as the README with mitty describe-read-model function:
image
When I generated reads using the synthetic read model with the parameters from the README, again the reads had a flat BQ of 9.

image

Here is the code:
FASTA=human_g1k_v37.fa.gz
SAMPLEVCF=1kg.20.22.vcf.gz
SAMPLENAME=HG00119
REGION_BED=region.bed
FILTVCF=HG00119-filt.vcf.gz
READMODEL=1kg-pcr-free.pkl
COVERAGE=30
READ_GEN_SEED=7
FASTQ_PREFIX=HG00119-reads
READ_CORRUPT_SEED=8

docker run --rm -v /Users/cassandra/Desktop/Ous_mitty:/VCF:rw ous_nbs/mitty /opt/conda/envs/mymitty/bin/mitty -v4 filter-variants \
  /VCF/${SAMPLEVCF} \
  ${SAMPLENAME} \
  /VCF/${REGION_BED} \
  -  \
  2> vcf-filter.log | bgzip -c > ${FILTVCF}
 
tabix -p vcf ${FILTVCF}

MODELNAME=synthetic-model.pkl

docker run --rm -v /Users/cassandra/Desktop/Ous_mitty:/VCF:rw ous_nbs/mitty /opt/conda/envs/mymitty/bin/mitty create-read-model synth-illumina \
  /VCF/${MODELNAME} \
  --read-length 121 \
  --mean-template-length 400 \
  --std-template-length 20 \
  --bq0 30 \
  --k 200 \
  --sigma 5 \
  --comment 'Model created for the demo'

docker run --rm -v /Users/cassandra/Desktop/Ous_mitty:/VCF:rw ous_nbs/mitty /opt/conda/envs/mymitty/bin/mitty describe-read-model /VCF/${MODELNAME} /VCF/${MODELNAME}.png
  
docker run --rm -v /Users/cassandra/Desktop/Ous_mitty:/VCF:rw ous_nbs/mitty /opt/conda/envs/mymitty/bin/mitty -v4 generate-reads \
  /VCF/${FASTA} \
  /VCF/${FILTVCF} \
  ${SAMPLENAME} \
  /VCF/${REGION_BED} \
  /VCF/${MODELNAME} \
  ${COVERAGE} \
  ${READ_GEN_SEED} \
  /VCF/${FASTQ_PREFIX}1.fq \
  /VCF/${FASTQ_PREFIX}-lq.txt \
  --fastq2 /VCF/${FASTQ_PREFIX}2.fq \
  --threads 2

Thank you for your time!

@ghost
Copy link

ghost commented Mar 8, 2019

@cntrier thank you so much for this report - this is super helpful! I will fix this bug.

@ghost ghost added bug and removed pending verification Dev needs to replicate and isolate labels Mar 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant