Estimated genome size using mash sketch #114

MostafaYA · 2019-05-02T15:27:55Z

Hello,

I understand that mash sketch can give a rough estimate of genome size based on the unique k-mers in the sample.

I am using the following command to roughly estimate the genome size of bacteria:
mash sketch -o tempFile -k 32 -m 3 -r read1.fastq
In most cases, I get estimates for bacterial genomes that are close to the real size (the size reported in the literature).
In some cases, however, there is a large discrepancy between the estimated genome size and the real size.
Does that mean that:

the original sample may not have been fully sequenced, if the estimated genome size is far lower than expected?
the original sample may be contaminated, if the estimated genome size is far higher than expected?

I would appreciate it if you could point out any misunderstanding on my part.

ondovb · 2019-05-03T02:47:26Z

Those both sound like reasonable explanations. A good check might be to run mash screen on your reads against RefSeq (see https://mash.readthedocs.io/en/latest/tutorials.html#screening-a-read-set-for-containment-of-refseq-genomes). This will tell you, roughly, how well your genome is covered (or at least its nearest RefSeq neighbor) and if there are significant contaminants.
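As a rough follow-up to the mash screen suggestion, here is a minimal Python sketch for ranking the hits in a screen output file. It assumes the tab-separated column order shown in the Mash screening tutorial (identity, shared-hashes, median-multiplicity, p-value, query-ID, query-comment); the 0.90 identity cutoff and the demo rows are hypothetical illustrations, not Mash defaults or real results. Roughly, a hit with identity near 1 and most hashes shared suggests good coverage of that reference, while several strong hits from unrelated organisms may hint at contamination.

```python
import io
from typing import NamedTuple, TextIO


class ScreenHit(NamedTuple):
    identity: float
    shared_hashes: int       # numerator of the "shared-hashes" column, e.g. 990 in "990/1000"
    sketch_size: int         # denominator of the "shared-hashes" column
    median_multiplicity: int
    p_value: float
    query_id: str
    query_comment: str


def parse_screen(handle: TextIO) -> list[ScreenHit]:
    """Parse tab-separated `mash screen` output, assuming the tutorial's column order."""
    hits = []
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        shared, total = fields[1].split("/")
        hits.append(ScreenHit(
            identity=float(fields[0]),
            shared_hashes=int(shared),
            sketch_size=int(total),
            median_multiplicity=int(fields[2]),
            p_value=float(fields[3]),
            query_id=fields[4],
            query_comment=fields[5] if len(fields) > 5 else "",
        ))
    return hits


def top_hits(hits: list[ScreenHit], min_identity: float = 0.90) -> list[ScreenHit]:
    """Hits at or above a hypothetical identity cutoff, best first."""
    kept = [h for h in hits if h.identity >= min_identity]
    return sorted(kept, key=lambda h: h.identity, reverse=True)


if __name__ == "__main__":
    # Made-up rows for illustration only; real input is the file written by `mash screen`.
    demo = io.StringIO(
        "0.9987\t990/1000\t48\t0\tgenome_A.fna\t[1 seqs] example A\n"
        "0.9120\t150/1000\t2\t1e-50\tgenome_B.fna\t[1 seqs] example B\n"
        "0.8500\t20/1000\t1\t1e-05\tgenome_C.fna\t[1 seqs] example C\n"
    )
    for hit in top_hits(parse_screen(demo)):
        print(hit.identity, f"{hit.shared_hashes}/{hit.sketch_size}",
              hit.median_multiplicity, hit.query_id)
```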

tseemann · 2019-07-31T03:08:49Z

@MostafaYA are these Illumina reads?

MostafaYA · 2019-07-31T07:20:59Z

@tseemann Yes, they are MiSeq Illumina reads

tseemann · 2019-07-31T22:59:18Z

To estimate genome size in Shovill I do the same as you have done: -k 32 -r -m 3. If you have very high coverage, you could try increasing to -m 10; if you have a high error rate, try reducing to -k 24. Also, only use R1, as it is higher quality, especially on MiSeq.
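To illustrate why -m (minimum copies) and -k matter, here is a minimal exact-counting sketch in Python. It is not Mash's implementation: Mash hashes k-mers into a MinHash sketch and estimates the number of distinct k-mers from that sketch instead of storing them all. This toy version counts every canonical k-mer exactly and reports how many distinct k-mers occur at least `min_copies` times, which is roughly the quantity a read-based genome size estimate tracks. The function names, the simulated 2 kb "genome" and the error model are all hypothetical. With `min_copies=1`, each sequencing error adds up to k spurious k-mers and inflates the count; requiring a few copies removes most of them, and a smaller k makes each k-mer less likely to contain an error.

```python
import random
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def count_kmers(reads, k: int) -> Counter:
    """Exactly count canonical k-mers, skipping any that contain non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if _VALID.issuperset(kmer):
                counts[canonical(kmer)] += 1
    return counts


def estimate_genome_size(reads, k: int = 32, min_copies: int = 3) -> int:
    """Distinct k-mers seen at least `min_copies` times (a toy analogue of -k / -m)."""
    counts = count_kmers(reads, k)
    return sum(1 for c in counts.values() if c >= min_copies)


def simulate_reads(genome: str, read_len: int, coverage: int,
                   error_rate: float, rng: random.Random) -> list[str]:
    """Toy single-end reads with uniform random substitution errors."""
    n_reads = coverage * len(genome) // read_len
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        bases = list(genome[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(bases))
    return reads


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(2000))  # hypothetical 2 kb genome
    reads = simulate_reads(genome, read_len=100, coverage=30, error_rate=0.01, rng=rng)
    for k, m in [(32, 1), (32, 3), (24, 3)]:
        print(f"k={k} m={m}: {estimate_genome_size(reads, k, m)} distinct k-mers")
```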

Secondly, just assemble the reads with Shovill, SKESA or SPAdes. If the total contig size is much higher than expected, you probably have contamination, or not a pure colony.
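A minimal Python sketch of that assembly-size check, assuming the assembler has written its contigs to a FASTA file and that you have an expected genome size for the species. The 500 bp minimum contig length and the 1.2 ratio threshold are arbitrary illustrative values, not defaults of Shovill, SKESA or SPAdes; tune them for your data.

```python
import io
from typing import TextIO


def contig_lengths(handle: TextIO) -> list[int]:
    """Length of each record in a FASTA file (sequence lines may be wrapped)."""
    lengths: list[int] = []
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)  # close the previous record
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)  # close the last record
    return lengths


def assembly_size_check(lengths: list[int], expected_size: int,
                        min_contig: int = 500, max_ratio: float = 1.2):
    """Return (total, ratio, suspicious) over contigs >= min_contig (hypothetical thresholds)."""
    total = sum(n for n in lengths if n >= min_contig)
    ratio = total / expected_size
    return total, ratio, ratio > max_ratio


if __name__ == "__main__":
    # Tiny made-up FASTA; in practice, open the assembler's contigs file instead.
    fasta = io.StringIO(">c1\nACGTACGT\nACGT\n>c2\nAAAA\n")
    lengths = contig_lengths(fasta)
    print(lengths)
    print(assembly_size_check(lengths, expected_size=10, min_contig=1))
```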

MostafaYA · 2019-08-01T16:48:12Z

@tseemann @ondovb Thanks a lot for your answers

MostafaYA closed this as completed Aug 1, 2019

rpetit3 mentioned this issue Dec 3, 2019

Improve genome size estimates bactopia/bactopia#45 (closed)

hoelzer mentioned this issue Jan 5, 2020

estimate gsize hoelzer/mgnify-lr#1 (closed)