More explicit handling of reads that exceed the maximum sampling #6

Closed
alexomics opened this issue Dec 16, 2019 · 9 comments

@alexomics
Contributor

Currently, when a given read exceeds the maximum chunk threshold we unblock it unless the last decision was "stop_receiving" (see here). However, this is not suitable when the reference only contains targets that need to be removed, as anything that doesn't classify will be unblocked.

The action to take when max chunks is exceeded needs to be either user-settable (which adds complexity), or we could provide pre-defined scenarios, e.g.:

ru deplete ...
ru enrich ...

Here deplete would be the use case for unblocking anything that classifies against the reference, whereas enrich would do the opposite and stop receiving anything that classifies (see the sketch below).

These options might not encompass mixed references.
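To illustrate, a minimal sketch of how the decision for a read that exceeds max chunks could look under the two proposed modes; the mode names and the function below are hypothetical, not the project's actual API:

```python
# Hypothetical sketch of the proposed handling for reads that exceed max_chunks
# without a confident classification. In "enrich" mode such a read is unblocked;
# in "deplete" mode it is kept, since only reference hits should be rejected.

def on_max_chunks_exceeded(mode, last_decision):
    # Preserve the current behaviour of honouring an earlier stop_receiving.
    if last_decision == "stop_receiving":
        return "stop_receiving"
    if mode == "enrich":
        return "unblock"          # unclassified/off-target reads are not wanted
    if mode == "deplete":
        return "stop_receiving"   # keep everything except reference hits
    raise ValueError(f"unknown mode: {mode!r}")
```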

@alexomics added the enhancement label on Dec 16, 2019
@mattloose
Contributor

Note: at the moment, setting max_chunk_size to inf is effectively enrich; setting it to anything less than inf will give deplete behaviour.

@danrdanny

Can you expand on this behavior and explain max_chunk_size a bit more? I have it set to inf but would like to know whether I should be adjusting it to improve my targeting/output.

@mattloose
Contributor

Hi,

This is tricky. What are you trying to do?

I would advise against using inf chunks at this time; I would suggest taking no more than 2 kb worth of data before rejecting a read. You will get better performance, as this will reduce blocking.

@danrdanny

Thanks Matt, I'm just trying to optimize and figure out which parameters I should be looking at; max_chunk_size was one I was having trouble understanding. One specific challenge I'm working through is that I'm recovering a lot of long reads (>10 kb) that don't map anywhere in the genome I'm using. It sounds like changing max_chunk_size might reduce this. If I set it to 2, is that equivalent to 2 kb? Thanks.

@mattloose
Contributor

Hi Danny,

The chunk number depends on the chunk size, so if you have set it to 0.4 s per chunk then 2 kb is approximately 12 chunks.

If your chunk size is 1 s then 2 kb is approximately 4 chunks.
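To make the arithmetic explicit, here is a small back-of-the-envelope calculation; the nominal translocation speed of roughly 450 bases per second is my assumption (not stated in this thread), chosen because it is consistent with the figures above:

```python
# Rough conversion from a target read length (bases) into an approximate number
# of signal chunks, given the chunk length in seconds. The ~450 bases/second
# translocation speed is an assumption, not a value from this thread.

def approx_chunks(target_bases, chunk_seconds, bases_per_second=450):
    """Roughly how many chunks cover `target_bases` of sequence."""
    bases_per_chunk = chunk_seconds * bases_per_second
    return target_bases / bases_per_chunk

if __name__ == "__main__":
    for chunk_seconds in (0.4, 1.0):
        chunks = approx_chunks(2_000, chunk_seconds)
        # 0.4 s chunks -> ~11 chunks (rounded to ~12 in the thread)
        # 1.0 s chunks -> ~4.4 chunks (rounded to ~4 in the thread)
        print(f"{chunk_seconds:.1f} s chunks: 2 kb is ~{chunks:.1f} chunks")
```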

Hope that makes sense!

@danrdanny

Hi Matt, I set up a new run and can confirm that setting max_chunk_size to 12 resolved my issue of recovering reads >10 kb that did not map anywhere. I now get reads up to 3 kb that do not map anywhere, but nothing larger than that. I'll see if this improves coverage of my target regions, then experiment with turning it down further. Is there any reason I shouldn't take only 1 kb or so of data before rejecting a read (setting max_chunk_size to 6, for example)?

BTW, thanks to you and your team for the excellent work on this. I remember you discussing it at porecamp in 2017 and thinking about how cool it would be if it worked.

@ps-account

Hi @danrdanny, what is the best way to check the length of the reads that didn't map anywhere?

I don't see a max_chunk_size option, but there is a max_chunks. Is that the same?

@danrdanny

Whoops, you're correct @rdwrt, the option is max_chunks. The easiest way to find reads that don't map is to align your reads to a reference genome and then pull out those flagged as unmapped. Alternatively, you can BLAST randomly selected reads, but that's not very efficient.
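For anyone wanting to do this, a minimal sketch (not from the thread) using pysam to report the lengths of unmapped reads; it assumes a BAM produced by a long-read aligner such as minimap2 and that pysam is installed:

```python
# Report the lengths of unmapped reads in a BAM file.
# Assumes pysam is available; query_length may be 0 if the record has no SEQ.
import pysam

def unmapped_read_lengths(bam_path):
    lengths = []
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam:              # iterate all records, no index required
            if read.is_unmapped:
                lengths.append(read.query_length)
    return lengths

if __name__ == "__main__":
    lengths = unmapped_read_lengths("aligned.bam")  # hypothetical file name
    if lengths:
        print(f"{len(lengths)} unmapped reads, longest {max(lengths)} bases")
```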

@mattloose
Contributor

Closed as inactive.
