More explicit handling of reads that exceed the maximum sampling #6
Note: At the moment setting the max_chunk_size to inf is effectively enrich - setting the chunk size to anything less than inf will give a deplete behaviour.
Can you expand on this behavior a bit and explain max_chunk_size a bit more? I have it set to inf, but would like to know if I should be toggling it in order to improve my targeting/output.
Hi, this is tricky. What are you trying to do? I would advise against using large chunks at this time - I would suggest taking no more than 2 kb worth of data before rejecting a read. You will get better performance, as this will reduce blocking.
Thanks Matt, just trying to optimize and figure out what parameters I should be looking at; max_chunk_size was one I was having trouble understanding. One specific challenge I'm working through is that I'm recovering a lot of long reads (>10 kb) that don't map anywhere in the genome I'm using. It sounds like changing max_chunk_size might reduce this. If I set it to 2, is that equivalent to 2 kb? Thanks.
Hi Danny, chunk number depends on chunk size - so if you have set it to 0.4 s per chunk, then 2 kb is approximately 12 chunks. If your chunk size is 1 s, then 2 kb is approximately 4 chunks. Hope that makes sense!
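The arithmetic above can be sketched in a few lines. The ~450 bases/s translocation speed and the helper name are assumptions for illustration (the real speed varies by pore and chemistry), not part of the tool itself:

```python
import math

def max_chunks_for(target_bases: int, chunk_seconds: float,
                   bases_per_second: int = 450) -> int:
    """Approximate number of signal chunks needed to cover target_bases.

    Assumes a nominal translocation speed of ~450 bases/s; adjust for
    your chemistry.
    """
    bases_per_chunk = chunk_seconds * bases_per_second
    return math.ceil(target_bases / bases_per_chunk)

# 2 kb at 0.4 s per chunk -> roughly 12 chunks, matching the thread
print(max_chunks_for(2000, 0.4))  # 12
```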
Hi Matt, I set up a new run and can confirm that setting max_chunk_size to 12 resolved my issue where I was recovering reads >10 kb that did not map anywhere. I now get reads up to 3 kb that do not map anywhere, but nothing larger than that. I'll see if this improves coverage of my target regions, then experiment with turning this down more. Any reason I shouldn't take only 1 kb or so of data before rejecting a read (setting max_chunk_size to 6, for example)? BTW, thanks to you and your team for the excellent work on this. I remember you discussing it at Porecamp in 2017 and thinking about how cool it would be if it worked.
Hi @danrdanny, what is the best way to check the length of the reads that didn't map anywhere? I don't see a max_chunk_size option, but there is a max_chunks. Is that the same?
Whoops, you're correct @rdwrt, the option is max_chunks. The easiest way to find reads that don't map is to align them to a genome and then pull out the reads that were left unmapped. Alternatively, you can BLAST randomly selected reads, but that's not very efficient.
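The align-then-filter approach above can be done by scanning the aligner's SAM output for records with the unmapped flag (0x4) set. A minimal stdlib-only sketch, with illustrative function names:

```python
def unmapped_read_lengths(sam_lines):
    """Yield (read_name, read_length) for unmapped records in a SAM stream."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, seq = int(fields[1]), fields[9]
        if flag & 4:  # 0x4 = segment unmapped
            yield fields[0], len(seq)

# Example with a tiny two-record SAM fragment
sam = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII",
]
print(list(unmapped_read_lengths(sam)))  # [('read2', 8)]
```

In practice you would stream a real file into this (or use samtools to filter on the same flag); the hard-coded records here are just for demonstration.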
Closed as inactive.
Currently, in the event of a given read exceeding the maximum threshold we unblock unless the last decision was "stop_receiving" (see here). However, this is not fit for purpose if the reference only contains targets that need to be removed, as anything that doesn't classify will be unblocked.
The action to take in the event of exceeding max_chunks needs to be either user-settable (adds more complexity) or we could provide pre-defined scenarios, e.g.:
Where deplete would be the use case for unblocking anything that classifies against the reference, whereas enrich would do the opposite and stop receiving anything that classifies. These options might not encompass mixed references.
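The proposed behaviour could be sketched as a small decision function, assuming a pre-defined scenario flag instead of the current hard-coded unblock. The names (on_max_chunks, the Decision values) are hypothetical, not the project's actual API:

```python
from enum import Enum

class Decision(Enum):
    UNBLOCK = "unblock"
    STOP_RECEIVING = "stop_receiving"
    PROCEED = "proceed"

def on_max_chunks(scenario: str, last_decision: Decision) -> Decision:
    """Decide what to do with a read once it exceeds max_chunks.

    'deplete': the reference only lists targets to remove, so an
    unclassified over-long read should be sequenced (stop receiving).
    'enrich': the reference lists targets to keep, so an unclassified
    over-long read is rejected (unblocked).
    """
    if last_decision is Decision.STOP_RECEIVING:
        # Already committed to sequencing this read; never unblock it now.
        return Decision.STOP_RECEIVING
    if scenario == "deplete":
        return Decision.STOP_RECEIVING
    return Decision.UNBLOCK  # 'enrich': matches the current default behaviour
```

With this shape, the current behaviour corresponds to the enrich branch, and a mixed-reference mode would need a third scenario rather than a boolean.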