More explicit handling of reads that exceed the maximum sampling #6
Note: At the moment setting the max_chunk_size to inf is effectively enrich - setting the chunk size to anything less than inf will give a deplete behaviour.
Can you expand on this behavior a bit and explain max_chunk_size a bit more? I have it set to inf, but would like to know if I should be toggling it in order to improve my targeting/output.
Hi, this is tricky. What are you trying to do? I would advise against using large chunks at this time - I would suggest taking no more than 2 kb worth of data before rejecting a read. You will get better performance, as this will reduce blocking.
Thanks Matt, just trying to optimize and figure out what parameters I should be looking at; max_chunk_size was one I was having trouble understanding. One specific challenge I'm working through is that I'm recovering a lot of long reads (>10 kb) that don't map anywhere in the genome I'm using. It sounds like changing max_chunk_size might reduce this. If I set it to 2, is that equivalent to 2 kb? Thanks.
Hi Danny, chunk number depends on chunk size - so if you have set it to 0.4 s per chunk, then 2 kb is approximately 12 chunks. If your chunk size is 1 s, then 2 kb is approximately 4 chunks. Hope that makes sense!
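The arithmetic above can be sketched in a few lines. The ~450 bases/s translocation speed and the helper name are assumptions for illustration (the real speed varies by pore and chemistry), not part of the tool itself:

```python
import math

def max_chunks_for(target_bases: int, chunk_seconds: float,
                   bases_per_second: int = 450) -> int:
    """Approximate number of signal chunks needed to cover target_bases.

    Assumes a nominal translocation speed of ~450 bases/s; adjust for
    your chemistry.
    """
    bases_per_chunk = chunk_seconds * bases_per_second
    return math.ceil(target_bases / bases_per_chunk)

# 2 kb at 0.4 s per chunk -> roughly 12 chunks, matching the thread
print(max_chunks_for(2000, 0.4))  # 12
```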
Hi Matt, I set up a new run and can confirm that setting max_chunk_size to 12 resolved my issue where I was recovering reads >10 kb that did not map anywhere. I now get reads up to 3 kb that do not map anywhere, but nothing larger than that. I'll see if this improves coverage of my target regions, then experiment with turning this down more. Any reason I shouldn't take only 1 kb or so of data before rejecting a read (setting max_chunk_size to 6, for example)? BTW, thanks to you and your team for the excellent work on this. I remember you discussing it at Porecamp in 2017 and thinking about how cool it would be if it worked.
Hi @danrdanny, what is the best way to check the length of the reads that didn't map anywhere? I don't see a max_chunk_size option, but there is a max_chunks. Is that the same?
Whoops, you're correct @rdwrt, the option is max_chunks. The easiest way to find reads that don't map is to align them to a genome and then pull out the reads that were left unmapped. Alternatively, you can BLAST randomly selected reads, but that's not very efficient.
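The align-then-filter approach above can be done by scanning the aligner's SAM output for records with the unmapped flag (0x4) set. A minimal stdlib-only sketch, with illustrative function names:

```python
def unmapped_read_lengths(sam_lines):
    """Yield (read_name, read_length) for unmapped records in a SAM stream."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, seq = int(fields[1]), fields[9]
        if flag & 4:  # 0x4 = segment unmapped
            yield fields[0], len(seq)

# Example with a tiny two-record SAM fragment
sam = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII",
]
print(list(unmapped_read_lengths(sam)))  # [('read2', 8)]
```

In practice you would stream a real file into this (or use samtools to filter on the same flag); the hard-coded records here are just for demonstration.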
Closed as inactive.
Currently, in the event of a given read exceeding the maximum threshold we unblock unless the last decision was "stop_receiving" (see here). However, this is not fit for purpose if the reference only contains targets that need to be removed, as anything that doesn't classify will be unblocked.
The action to take in the event of exceeding max_chunks needs to be either user-settable (adds more complexity) or we could provide pre-defined scenarios, e.g.:
Where deplete would be the use case for unblocking anything that classifies against the reference, whereas enrich would do the opposite and stop receiving anything that classifies. These options might not encompass mixed references.
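The proposed behaviour could be sketched as a small decision function, assuming a pre-defined scenario flag instead of the current hard-coded unblock. The names (on_max_chunks, the Decision values) are hypothetical, not the project's actual API:

```python
from enum import Enum

class Decision(Enum):
    UNBLOCK = "unblock"
    STOP_RECEIVING = "stop_receiving"
    PROCEED = "proceed"

def on_max_chunks(scenario: str, last_decision: Decision) -> Decision:
    """Decide what to do with a read once it exceeds max_chunks.

    'deplete': the reference only lists targets to remove, so an
    unclassified over-long read should be sequenced (stop receiving).
    'enrich': the reference lists targets to keep, so an unclassified
    over-long read is rejected (unblocked).
    """
    if last_decision is Decision.STOP_RECEIVING:
        # Already committed to sequencing this read; never unblock it now.
        return Decision.STOP_RECEIVING
    if scenario == "deplete":
        return Decision.STOP_RECEIVING
    return Decision.UNBLOCK  # 'enrich': matches the current default behaviour
```

With this shape, the current behaviour corresponds to the enrich branch, and a mixed-reference mode would need a third scenario rather than a boolean.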