
question about memory requirements and determining appropriate kmer size #13

Open
eocampbe opened this issue Mar 25, 2020 · 1 comment

eocampbe commented Mar 25, 2020

Hi there,

I have completed an initial analysis using DiscoverY in the female+male mode, and I am wondering how I might determine whether the kmer size I used is optimal for my data. For the analysis I've done so far I used the default size of 25, but I understand this may need to be adjusted based on the specific characteristics of the genome I'm working with.

I have plotted the results of my analysis (attached), and there seem to be a large number of kmers with very low similarity to the female genome (which is of quite good quality) but high depth. The organism we're working on has a neo sex chromosome system, so I suspect the Y regions are clustering in with the X regions in the bottom right of the graph (confirming this was actually my reason for using DiscoverY). However, I'm less sure why there are so many male contigs with very low similarity to the female but rather high coverage. I don't know if this is a result of my kmer parameter or something else, and I'm hoping you might be able to offer some advice.

In addition, this analysis required about 720 GB of RAM, which is about double the estimate in the paper and is nearly the maximum amount of RAM I'm allowed to request per node on the cluster I'm using. Can DiscoverY run in parallel so that I can spread this memory across multiple nodes? I don't see anything in the documentation or the paper that mentions this, but it would be very helpful for subsequent analyses.

Thanks,
Erin
graph.pdf

@sheinasim

Hello!

I am not part of the group that wrote this software, but I was able to calculate an appropriate kmer size by applying the random boundary formula (Eq. 2 in Fofanov et al. 2004):

kmer_size = log[ genome_size * (1 - error_rate_of_reads) / error_rate_of_reads ] / log(4)
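
For example, a minimal sketch of that calculation in Python; the genome size (1 Gbp) and per-base read error rate (1%) below are hypothetical placeholders, so substitute your own values:

```python
import math

def recommended_kmer_size(genome_size, error_rate):
    # Random boundary formula (Eq. 2 in Fofanov et al. 2004):
    # k = log4( genome_size * (1 - error_rate) / error_rate )
    return math.ceil(math.log(genome_size * (1 - error_rate) / error_rate, 4))

# Hypothetical inputs: 1 Gbp genome, 1% per-base read error rate.
print(recommended_kmer_size(1e9, 0.01))  # -> 19 for these inputs
```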

Hope this helps!

@RAWWiberg mentioned this issue Jan 31, 2022