Implement SIMD code for faster statistics computation #69
Closed
althonos wants to merge 56 commits into
…given alignment and, optionally, keep those sequences' indexes with a ratio equal or higher than a given threshold. Threshold by default is 0.0. To facilitate their inline use with trimAl, it is possible to use --show_only_index in combination with the --threshold parameter to remove sequences exceeding a given ratio set by the user
Hi Nicolás, hi @scapella,
This PR is a draft implementation of SIMD support for computing some statistics (namely, similarity and identity). As we discussed briefly at ECCB, the goal is to keep all of the SIMD code optional, so that trimAl can still be built on any platform.
Build system
I updated the `CMakeLists.txt` to attempt to detect SSE2 support at compile time. SSE2-specific code will only be built if the compiler supports it. The SSE2 code is kept separate so that it can be compiled with different flags if needed, and is only linked into the executables at the end.
Forcing the build with or without SSE2 can be done with a single CMake flag:
Dynamic dispatch
At the moment, compiling trimAl with SSE2 support will make SSE2 required at runtime, which is not ideal for distributing the binary. Eventually, the goal would be to have dynamic dispatch, and select the best SIMD implementation at runtime by detecting CPU features. I've done that previously with the `cpu_features` library, which could be vendored and compiled statically.
To get a bit more encapsulation, I think it would be nice if the computation of `identities` and `overlaps` of alignments were moved to be handled by the `Manager` class, which would act as a proxy for every statistic. This would make it easier to implement the strategy design pattern for selecting the best SIMD implementation at runtime. If that sounds good to you, I'll also work on that before adding more code.
Threading
At the moment, I disabled the OpenMP thread loops in the SIMD version of the stats until I find a way to share the buffers efficiently between threads. Nevertheless, single-threaded runs with SIMD enabled are faster than multi-threaded (8 threads) runs with SIMD disabled in my benchmarks.
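
As a rough illustration of the dynamic dispatch idea described earlier, here is a minimal sketch of a strategy-style selection between a scalar and an SSE2 kernel. It is not the code in this PR: `IdentityKernel`, `ScalarKernel`, `Sse2Kernel`, `cpuHasSse2` and `makeIdentityKernel` are hypothetical names, and the kernel only counts matching bytes between two equal-length sequences rather than computing trimAl's actual identity statistic. To stay self-contained it uses the GCC/Clang builtin `__builtin_cpu_supports` instead of the `cpu_features` library, and it keeps everything in one file, whereas a real build would compile the SSE2 kernel in its own translation unit with its own flags.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical strategy interface (illustration only, not trimAl's API).
struct IdentityKernel {
    virtual ~IdentityKernel() = default;
    virtual const char* name() const = 0;
    // Counts positions where two equal-length sequences hold the same byte.
    virtual std::size_t countMatches(const std::string& a,
                                     const std::string& b) const = 0;
};

// Portable fallback: one byte comparison per iteration.
struct ScalarKernel : IdentityKernel {
    const char* name() const override { return "scalar"; }
    std::size_t countMatches(const std::string& a,
                             const std::string& b) const override {
        assert(a.size() == b.size());
        std::size_t n = 0;
        for (std::size_t i = 0; i < a.size(); ++i) n += (a[i] == b[i]);
        return n;
    }
};

#if defined(__SSE2__)
// SSE2 variant: compares 16 bytes per iteration.
struct Sse2Kernel : IdentityKernel {
    const char* name() const override { return "sse2"; }
    std::size_t countMatches(const std::string& a,
                             const std::string& b) const override {
        assert(a.size() == b.size());
        std::size_t n = 0, i = 0;
        for (; i + 16 <= a.size(); i += 16) {
            __m128i va = _mm_loadu_si128(
                reinterpret_cast<const __m128i*>(a.data() + i));
            __m128i vb = _mm_loadu_si128(
                reinterpret_cast<const __m128i*>(b.data() + i));
            // cmpeq sets equal bytes to 0xFF; movemask packs one bit per byte.
            unsigned mask = static_cast<unsigned>(
                _mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)));
            // Count set bits by clearing the lowest one each time.
            for (; mask != 0; mask &= mask - 1) ++n;
        }
        // Scalar tail for the last (size % 16) bytes.
        for (; i < a.size(); ++i) n += (a[i] == b[i]);
        return n;
    }
};
#endif

// Runtime CPU check; reports false when no check is available.
bool cpuHasSse2() {
#if defined(__SSE2__) && (defined(__GNUC__) || defined(__clang__))
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}

// Picks the best kernel that was compiled in and that the CPU supports.
std::unique_ptr<IdentityKernel> makeIdentityKernel() {
#if defined(__SSE2__)
    if (cpuHasSse2()) return std::unique_ptr<IdentityKernel>(new Sse2Kernel());
#endif
    return std::unique_ptr<IdentityKernel>(new ScalarKernel());
}
```

A caller would obtain a kernel once with `makeIdentityKernel()` and reuse it for every pair of sequences, so the CPU check happens a single time. With this shape, adding another instruction set would mean adding one more subclass and one more branch in the factory.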
Performance
I haven't written comprehensive benchmarks yet, but here is how the runtime improves with `-strict` and `-clusters`.