Upgrade to DADA2 >= 1.7.3 #85

thermokarst · 2018-01-25T16:13:30Z

As noted here, when that new release lands, we should update the pinned version here, as well as revert the SSE changes made in January 2018 in the wake of SSE-gate.

thermokarst · 2018-01-25T19:23:18Z

@benjjneb, can you let us know which Bioconductor release cycle you expect this to be available in, so that we can track it in the appropriate QIIME 2 release cycle? Thanks!

benjjneb · 2018-01-25T21:07:58Z

This won't get to release until the next Bioconductor release (BioC 3.7) which will be around April. Given that there will also be a delay in the BioC release propagating to bioconda, this would make sense for a June/July Q2 release.

benjjneb · 2018-07-31T15:23:37Z

Still waiting on the May Bioconductor release to propagate to bioconda. I think progress will show up on this issue: bioconda/bioconda-recipes#8947

Once it migrates, I'm planning to update the R scripts, and to address several other outstanding issues at the same time including #87, #93, #97, #99

And note to self: Add an extra step catching and removing very low depth sequences before learning the error rates.
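As a rough illustration of the filtering step in the note above, here is a minimal Python sketch. It assumes "very low depth" refers to samples with few reads, and that per-sample read counts are already available as a dictionary. The function name `split_by_depth` and the default threshold of 100 reads are hypothetical and not taken from the plugin or from dada2.

```python
from typing import Dict, Tuple


def split_by_depth(
    read_counts: Dict[str, int], min_reads: int = 100
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split samples into (kept, dropped) by read count.

    Hypothetical helper: min_reads is an illustrative threshold only.
    """
    kept = {s: n for s, n in read_counts.items() if n >= min_reads}
    dropped = {s: n for s, n in read_counts.items() if n < min_reads}
    return kept, dropped


# Example: only samples in `kept` would be passed on to error-rate learning.
counts = {"sampleA": 25000, "sampleB": 12, "sampleC": 100}
kept, dropped = split_by_depth(counts)
```

Returning the dropped samples as well makes it possible to report which samples were excluded rather than removing them silently.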

ebolyen · 2018-08-24T19:20:40Z

@benjjneb, in the new granular pipeline scheme, would it be possible to provide a way to handle sample pooling? This came up recently in the context of breakaway and alpha diversity estimation.

breakaway in particular requires singletons to exist. If there was a way to conditionally return FeatureTable[Frequency % Properties('singletons')], we would be able to enforce that assumption (when sample pooling is true or pseudo at least).

We're going to be trying to address TypeMap and similar typing related things in the next release cycle. If I need to implement a way to dependently type the output, that should be possible (especially with a real use-case now).

benjjneb · 2018-08-24T23:49:02Z

> @benjjneb, in the new granular pipeline scheme, would it be possible to provide a way to handle sample pooling? This came up recently in the context of breakaway and alpha diversity estimation.

Yes. Although given the frustrating speed penalty that bioconda-dada2 has, pseudo-pooling might make more sense.

> if there was a way to conditionally return FeatureTable[Frequency % Properties('singletons')], we would be able to enforce that assumption (when sample pooling is true or pseudo at least).

> We're going to be trying to address TypeMap and similar typing related things in the next release cycle. If I need to implement a way to dependently type the output, that should be possible (especially with a real use-case now).

I have to admit, I don't really follow this. Is it basically an option to return a different type if pool=TRUE (or pool=pseudo)?

ebolyen · 2018-08-25T00:28:44Z

Yep! Basically we'd just figure out when that property can be added, and tools that need it can require it as an input. Everything else will continue not caring like normal.
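The idea in this exchange can be sketched in plain Python without the QIIME 2 type system. This is an illustration only: `output_properties` and `accepts` are hypothetical names, and the rule that only pooled or pseudo-pooled runs carry the 'singletons' property is the assumption stated in the comments above, not an implemented feature.

```python
from typing import FrozenSet, Union


def output_properties(pool: Union[bool, str]) -> FrozenSet[str]:
    """Properties a denoised table could be tagged with for a given pool setting.

    Assumption (from the discussion): singletons can only be relied on
    when samples are pooled or pseudo-pooled.
    """
    if pool is True or pool == "pseudo":
        return frozenset({"singletons"})
    return frozenset()


def accepts(required: FrozenSet[str], provided: FrozenSet[str]) -> bool:
    """A downstream tool accepts a table if every required property is present."""
    return required <= provided


# A tool that needs singletons requires the property; other tools require nothing.
needs_singletons = frozenset({"singletons"})
no_requirements = frozenset()
```

With this shape, a table produced with `pool=False` is rejected by a tool requiring 'singletons', while tools with no requirements accept every table.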

epruesse · 2019-02-21T17:52:57Z

Bioconda is now at 1.10: http://bioconda.github.io/recipes/bioconductor-dada2/README.html

@benjjneb Is the speed penalty still there?

This blocks qiime2/q2-alignment#64

benjjneb · 2019-02-21T23:27:00Z

> Bioconda is now at 1.10: http://bioconda.github.io/recipes/bioconductor-dada2/README.html

Interesting. It would actually be easier to just jump ahead to 1.10 as the edge case merging bugs were mostly worked out between 1.8 and 1.10.

> @benjjneb Is the speed penalty still there?

It will be a little better than now, but as far as I could tell the speed penalty is pretty hard to overcome completely, because it is a result of how bioconda compiles code (old version of gcc, low levels of optimization to handle generic hardware).

thermokarst · 2019-02-22T00:07:53Z

> old version of gcc

I wonder if the recent bump to gcc7 on conda-forge and bioconda will help us out here...

epruesse · 2019-02-22T01:50:37Z

That's what I meant. If it's not -march dependent, then it should now all be OK.

Having -march=native compiled stuff won't work for Bioconda. We've had the discussion in a few places, but just picture a user with miniconda in $HOME and doing qsub on a cluster with a few generations of compute nodes added over time. Even on Travis you'll sometimes run into different CPUs.

The reason why I can't easily build a SINA version that matches the pinned libs is that we followed Conda-Forge in the move to GCC7. Since that brought the ABI change required to support C++11, all C++ libs except for libstdc++ are incompatible with the old ones.

benjjneb · 2019-02-22T02:16:17Z

> I wonder if the recent bump to gcc7 on conda-forge and bioconda will help us out here...

Didn't realize this had happened! I will definitely re-profile the speed issue with the 1.10 bioconda builds. Fingers crossed; it would be awesome if the performance delta narrows.

epruesse · 2019-02-22T02:57:42Z

> Didn't realize this had happened!

It's still happening, I suppose. We are rebuilding more or less on an as-needed basis because of resource constraints. Conda-Forge had been rebuilding for a while, and this January did "the label switch". That means all the GCC7 packages previously in conda-forge/label/gcc7 became available through conda-forge. We tagged our pre-gcc7 state and started rebuilding. So far no major disasters have happened (fingers crossed).

benjjneb · 2019-02-24T17:04:46Z

@epruesse A brief update: we have started getting segfault error reports for the dada2 package that appear to come from conda-installed versions of the 1.10 package, i.e. the new package version that is being built with GCC7. See for example: benjjneb/dada2#684

benjjneb · 2019-03-13T20:41:01Z

I've just run some initial tests on the bioconda version of DADA2 1.10 built with GCC7, and it appears that the speed penalty versus natively installed DADA2 is now essentially gone.

I will do some more detailed testing, but this would be a major quality-of-life improvement for folks using DADA2 via QIIME2 (up to a 10x speedup) and strongly suggests skipping the 1.8 package version and going straight to 1.10. Is that in the realm of possibility for the next Q2 release?

apcamargo · 2019-03-13T20:58:23Z

@benjjneb Did you solve the segfault error, or did it just not happen with your samples?

ebolyen · 2019-03-13T21:07:37Z

> Is that in the realm of possibility for the next Q2 release?

Yep! There's still a full month or so until the next release. The dates are also probably something we can flex if it comes down to it; the speed improvement would certainly be worth it.

benjjneb · 2019-03-13T21:57:38Z

> @benjjneb Did you solve the segfault error, or did it just not happen with your samples?

Works fine in my initial testing, but I haven't tried to reproduce the reported bioconda-specific segfault yet. Will try soon.

> Yep! There's still a full month or so until the next release. The dates are also probably something we can flex if it comes down to it; the speed improvement would certainly be worth it.

OK, I am going to move forward with testing and updating the Q2 scripts based on package version 1.10 then. I'll need help from your side figuring out the Q2 conda recipes and any python updates to the plugin though.

ebolyen · 2019-03-13T23:42:24Z

> I'll need help from your side figuring out the Q2 conda recipes and any python updates to the plugin though.

Sounds like a plan! Feel free to structure the R scripts into as many or few as you need. I can adapt the Python side to work correctly. I think we had talked about breaking up the denoise-* methods into multiple actions before (and maybe even including taxonomic assignment?). We have pipelines now, so I can re-compose the original denoise-* methods in Python if you think that's worth attempting with this.

Re segfault: Is it possible that it's just a blanket runtime conflict? e.g. glibc on CentOS 5 vs anything compiled in the last decade?
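One quick way to test the runtime-conflict hypothesis on an affected machine is to check which glibc the Python interpreter reports. The sketch below uses the standard-library call `platform.libc_ver()`. The helper names and the minimum version "2.17" are hypothetical placeholders, not a documented requirement of the bioconda builds.

```python
import platform
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '2.17' into (2, 17); stop at non-numeric parts."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def glibc_is_older_than(minimum: str, detected: Optional[str] = None) -> Optional[bool]:
    """Return True/False, or None if the glibc version cannot be determined.

    `minimum` is an illustrative threshold chosen by the caller (assumption).
    """
    if detected is None:
        lib, version = platform.libc_ver()
        detected = version if lib == "glibc" else ""
    found = parse_version(detected)
    if not found:
        return None
    return found < parse_version(minimum)


if __name__ == "__main__":
    print("glibc older than 2.17:", glibc_is_older_than("2.17"))
```

Versions are compared as integer tuples, so that for example 2.5 sorts below 2.17, which a plain string comparison would get wrong.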

benjjneb · 2019-03-21T18:29:26Z

Just wanted to ping this thread with the DADA2 issue that is open on segfaults with bioconda-dada2-1.10: benjjneb/dada2#684

In testing on my local machine, the bioconda install of 1.10 seems fine, but clearly there is something going on here because there are multiple people with the same report.

epruesse · 2019-03-21T18:58:28Z

It might be not just 1.10.

(segfault in 1.8) bioconda/bioconda-recipes#13847
(just references) bioconda/bioconda-recipes#13776

In 1.8, it appears to have been a libc++ issue? I can't tell. We need someone proficient in R for this.

epruesse · 2019-03-24T20:23:13Z

OK, an update: my best guess is a concurrency bug somewhere in dada2. See the issue in the dada2 repo.

benjjneb mentioned this issue on Aug 9, 2018: denoise-*: expose --collapse-no-mismatch parameter #92 (Open)

benjjneb mentioned this issue on Mar 21, 2019: Update for dada2 package version 1.10 #113 (Merged)

thermokarst closed this as completed in #113 on Apr 25, 2019
