Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expose minOverlap parameter #117

Closed
nbokulich opened this issue May 23, 2019 · 10 comments · Fixed by #130
Closed

expose minOverlap parameter #117

nbokulich opened this issue May 23, 2019 · 10 comments · Fixed by #130
Labels
diff:1|beginner Only limited knowledge of the languages and platform is required. good first issue Good for newcomers lang:r The R language scope:0|this-project No other repositories are impacted. src:forum From the QIIME 2 Forum. time:1|short Should not take very much time to complete (assuming familiarity). type:improvement Making something better.

Comments

@nbokulich
Copy link
Member

Improvement Description
exposing minOverlap would allow adjustment of this parameter, which has been a bit of a bugbear for some users.

Current Behavior
minOverlap is hardcoded at 20

Proposed Behavior
Expose minOverlap but leave the default=20

References
forum xref

@nbokulich nbokulich added diff:1|beginner Only limited knowledge of the languages and platform is required. good first issue Good for newcomers lang:r The R language scope:0|this-project No other repositories are impacted. src:forum From the QIIME 2 Forum. time:1|short Should not take very much time to complete (assuming familiarity). type:improvement Making something better. labels May 23, 2019
@benjjneb
Copy link
Collaborator

Current Behavior
minOverlap is hardcoded at 20

An important note, as part of the update to 1.10 or the R package, the minOverlap default value change from 20 to 12, so that is now the "hardcoded" value.

Maybe it is still worth exposing this, but the maximum benefit here is very low, as one should never go below minOverlap=4 as this opens up to a lot of FP mergers.

@nbokulich
Copy link
Member Author

An important note, as part of the update to 1.10 or the R package, the minOverlap default value change from 20 to 12, so that is now the "hardcoded" value.

thanks @benjjneb ! I was not aware of that change.

Maybe it is still worth exposing this, but the maximum benefit here is very low

I agree, probably not a priority but we have had a number of users ask about this so it may be worthwhile to give control.

one should never go below minOverlap=4 as this opens up to a lot of FP mergers.

we can set 4 as the minimum overlap; an error will be raised if users try to go lower. (an explanation can be given in the description)

@benjjneb
Copy link
Collaborator

If it's sufficiently requested then OK.

Would it be possible to create a milestone linked to the next Q2 release?
There are .couple other things I want to add to the plugin for the next release now that 1.10 is in (especially pooling/singletons) and would help a lot to have them organized by a milestone with reference to a (tentative) date for the next Q2 release.

@nbokulich
Copy link
Member Author

I have added to the 2019.7 release project page — we have been using projects to organize release goals (and release dates and details are available on that page). We have not been using the milestones feature, but you are welcome to use that feature if it helps you organize issues for q2-dada2. Thanks!

@colinbrislawn
Copy link

colinbrislawn commented Jun 11, 2019

I'm gamely interested in working on this issue.

EDIT: The maxMismatch should be included in this PR as well. These settings are related and powerful!

@Oddant1
Copy link
Member

Oddant1 commented Jul 31, 2019

Exposing the min_overlap parameter and defaulting it to 12 (which does appear to be the default value in dada2 after a cursory look) causes the tests to fail, and I was unable to find a value of min_overlap that did pass the tests. The table we get as a result of the command with min_overlap always has more nonzero elements than expected.

@emford
Copy link

emford commented Oct 21, 2019

Ill check into this and see whats going on with the tests.

@fconstancias
Copy link

Adding a few more options to the q2-dada2 plugin - R script which is running when calling the q2-dada2 plugin - can be easy and lead to a fine-tuned /state-of-the art dada2 pipeline whitin qiime2.

In addition to the minOverlap, I think that adding truncQ and minlen parameters can help on specific datasets.
As @benjjneb mentionned, adding a parameter to specify the pooling strategy : "pseudo", TRUE instead of default FALSE will allow detection of singletons.
Having a closer look at the R script running behind q2-dada2 plugin, I also find it more logical to add the option randomize = TRUE as default in the learnErrors function. My understanding is that doing so, samples are randomly added until enough bases/reads are loaded in order to learn the error rates of the dataset instead of adding samples according sample names alphabetic order which can correspond to specific samples of the entire dataset.
Is this correct @benjjneb ?

Last one, adding the possibility to export plots from the plotError function could also help people to confirm that everything went well and that error model fits the data. Also, exporting read/ASV length distribution after read merging (as shown here) can also provide a positive feeling that everything went well.

I can help to add those options if you agree they are helpfull.

@benjjneb
Copy link
Collaborator

'truncQ' is almost always a superfluous parameter in my experience.

randomize=TRUE has the downside of giving non-identical results when the pipeline is re-run, so for total reproducibility randomize=FALSE is the current default in the R package.

I hope to have a pull request to add pseudo-pooling up today, sadly probably too late for the imminent release though.

The read length stats and plotErrors output definitely can be useful diagnostics in some datasets.

@emford
Copy link

emford commented Oct 29, 2019

So I started on this but my MacAir wont run the tests due to not enough RAM ...:coffin:

@Oddant1 Oddant1 mentioned this issue Jun 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
diff:1|beginner Only limited knowledge of the languages and platform is required. good first issue Good for newcomers lang:r The R language scope:0|this-project No other repositories are impacted. src:forum From the QIIME 2 Forum. time:1|short Should not take very much time to complete (assuming familiarity). type:improvement Making something better.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants