Implement SIMD code for faster statistics computation by althonos · Pull Request #69 · inab/trimal

althonos · 2022-09-27T19:32:47Z

Hi Nicolás, hi @scapella,

This PR is a draft implementation of SIMD support for computing some statistics (namely, similarity and identity). As we discussed briefly at ECCB, the goal is to keep every SIMD code path optional, so that trimAl can still be built on any platform.

Build system

I updated CMakeLists.txt to detect SSE2 support at compile time. SSE2-specific code is only built if the compiler supports it. The SSE2 code is kept in separate sources so that it can be compiled with different flags if needed, and is only linked into the executables at the end.

Forcing the build with or without SSE2 can be done with a single CMake flag:

$ cmake -DHAVE_SSE2=1
$ cmake -DHAVE_SSE2=0
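To illustrate the idea of an optional SSE2 code path, here is a minimal sketch of an identity-style kernel that only uses intrinsics when the compiler advertises SSE2 and otherwise falls back to a scalar loop. The names `count_identical` and `popcount16` are hypothetical and not taken from trimAl, and the guard uses the compiler-defined `__SSE2__` macro; how the `HAVE_SSE2` CMake option reaches the sources in this PR is not shown here. The sketch also ignores gaps and indeterminate residues, which a real identity computation would have to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Count the set bits of a 16-bit comparison mask (portable, no builtins).
inline unsigned popcount16(unsigned mask) {
    unsigned count = 0;
    while (mask != 0) {
        mask &= mask - 1;  // clear the lowest set bit
        ++count;
    }
    return count;
}

// Count the columns where two aligned sequences hold the same byte.
// Hypothetical helper for illustration only; gaps are NOT treated specially.
std::size_t count_identical(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::size_t matches = 0;
    std::size_t i = 0;
#if defined(__SSE2__)
    // Compare 16 residues per iteration using unaligned loads.
    for (; i + 16 <= n; i += 16) {
        const __m128i va = _mm_loadu_si128(
            reinterpret_cast<const __m128i*>(a.data() + i));
        const __m128i vb = _mm_loadu_si128(
            reinterpret_cast<const __m128i*>(b.data() + i));
        const __m128i eq = _mm_cmpeq_epi8(va, vb);  // 0xFF where bytes match
        matches += popcount16(static_cast<unsigned>(_mm_movemask_epi8(eq)));
    }
#endif
    // Scalar tail, and the whole loop when SSE2 is unavailable.
    for (; i < n; ++i) {
        matches += (a[i] == b[i]) ? 1 : 0;
    }
    return matches;
}
```

Because the scalar loop handles whatever the SIMD loop leaves over, the same source builds and gives the same result with or without SSE2.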

Dynamic dispatch

At the moment, compiling trimAl with SSE2 support will make SSE2 required at runtime, which is not ideal for distributing the binary. Eventually, the goal would be to have dynamic dispatch, and select the best SIMD implementation at runtime by detecting CPU features. I've done that previously with the cpu_features library, which could be vendored and compiled statically.

To get a bit more encapsulation, I think it would be nice if the computation of alignment identities and overlaps were moved into the Manager class, which would act as a proxy for every statistic. This would make it easier to implement the strategy design pattern for selecting the best SIMD implementation at runtime. If that sounds good to you, I'll also work on that before adding more code.
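As a rough sketch of what strategy-based runtime dispatch could look like, the example below picks an implementation once, based on a CPU feature check. All class and function names (`IdentityStrategy`, `GenericIdentity`, `Sse2Identity`, `cpuHasSse2`, `makeIdentityStrategy`) are hypothetical and are not trimAl's API. The feature check uses the GCC/Clang builtin `__builtin_cpu_supports` rather than the cpu_features library mentioned above, purely to keep the sketch dependency-free, and the SSE2 strategy is a scalar placeholder standing in for code that would be compiled separately with SSE2 flags.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Shared scalar kernel, used here by both strategies for simplicity.
inline std::size_t scalarCount(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());
    std::size_t matches = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        matches += (a[i] == b[i]) ? 1 : 0;
    }
    return matches;
}

// Strategy interface (hypothetical): one virtual entry point per statistic.
class IdentityStrategy {
public:
    virtual ~IdentityStrategy() = default;
    virtual const char* name() const = 0;
    virtual std::size_t countIdentical(const std::string& a,
                                       const std::string& b) const = 0;
};

class GenericIdentity : public IdentityStrategy {
public:
    const char* name() const override { return "generic"; }
    std::size_t countIdentical(const std::string& a,
                               const std::string& b) const override {
        return scalarCount(a, b);
    }
};

// Placeholder: a real version would call intrinsics from a translation
// unit built with SSE2 flags. Here it reuses the scalar kernel.
class Sse2Identity : public IdentityStrategy {
public:
    const char* name() const override { return "sse2"; }
    std::size_t countIdentical(const std::string& a,
                               const std::string& b) const override {
        return scalarCount(a, b);
    }
};

// Runtime CPU check; reports false on compilers/targets without the builtin.
bool cpuHasSse2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}

// Factory: choose the strategy once, then call through the interface.
std::unique_ptr<IdentityStrategy> makeIdentityStrategy(bool haveSse2) {
    if (haveSse2) {
        return std::unique_ptr<IdentityStrategy>(new Sse2Identity());
    }
    return std::unique_ptr<IdentityStrategy>(new GenericIdentity());
}
```

A caller would typically write `auto strategy = makeIdentityStrategy(cpuHasSse2());` once at startup and keep the pointer, so the virtual call is paid per statistic rather than per residue.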

Threading

At the moment, I have disabled the OpenMP thread loops in the SIMD version of the stats until I find a way to share the buffers efficiently between threads. Nevertheless, in my benchmarks, single-threaded runs with SIMD enabled are faster than multi-threaded (8 threads) runs with SIMD disabled.
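One common way around the buffer-sharing problem is to give each thread its own scratch buffer, allocated inside the parallel region, so that no mutable state is shared. The sketch below shows that pattern on a scalar pairwise identity matrix. The function name `pairwiseIdentity` and the use of the buffer (an uppercased copy of the current row's sequence) are assumptions made for illustration and do not describe how trimAl's Cleaner is actually organized. Without `-fopenmp` the pragmas are ignored and the code runs serially with the same result.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Fill a symmetric n x n matrix with the fraction of identical columns
// between every pair of equal-length sequences (hypothetical helper).
std::vector<float> pairwiseIdentity(const std::vector<std::string>& seqs) {
    const std::size_t count = seqs.size();
    std::vector<float> identity(count * count, 0.0f);
    if (count == 0) return identity;

    const std::size_t len = seqs[0].size();
    for (std::size_t k = 0; k < count; ++k) assert(seqs[k].size() == len);

    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(count);

    #pragma omp parallel
    {
        // Declared inside the parallel region: one private buffer per thread.
        std::vector<unsigned char> scratch(len);

        #pragma omp for schedule(dynamic)
        for (std::ptrdiff_t i = 0; i < n; ++i) {
            const std::size_t row = static_cast<std::size_t>(i);

            // Stage the normalized row once, then reuse it for every column j.
            for (std::size_t c = 0; c < len; ++c) {
                scratch[c] = static_cast<unsigned char>(
                    std::toupper(static_cast<unsigned char>(seqs[row][c])));
            }

            for (std::size_t col = row; col < count; ++col) {
                std::size_t matches = 0;
                for (std::size_t c = 0; c < len; ++c) {
                    const int other =
                        std::toupper(static_cast<unsigned char>(seqs[col][c]));
                    matches += (scratch[c] == other) ? 1 : 0;
                }
                const float value = (len == 0)
                    ? 0.0f
                    : static_cast<float>(matches) / static_cast<float>(len);
                // Each (row, col) pair is written by exactly one iteration.
                identity[row * count + col] = value;
                identity[col * count + row] = value;
            }
        }
    }
    return identity;
}
```

The cost of this design is one allocation per thread rather than one per run, which is usually negligible next to the O(n²) pairwise work.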

Performance

I haven't written comprehensive benchmarks yet, but here is how the runtime improves with -strict and -clusters (wall-clock time drops roughly 4x and 5x, respectively).

$ time ./trimal_generic -in ../dataset/example.014.AA.EggNOG.COG0591.fasta -strict
________________________________________________________
Executed in  118.20 secs    fish           external
   usr time  270.31 secs    0.00 millis  270.31 secs
   sys time    0.48 secs    2.44 millis    0.47 secs

$ time ./trimal_sse2 -in ../dataset/example.014.AA.EggNOG.COG0591.fasta -strict
________________________________________________________
Executed in   29.68 secs    fish           external
   usr time   29.22 secs  928.00 micros   29.22 secs
   sys time    0.21 secs  730.00 micros    0.21 secs

$ time ./trimal_generic -in ../dataset/example.014.AA.EggNOG.COG0591.fasta -clusters 5
________________________________________________________
Executed in   64.70 secs    fish           external
   usr time  254.46 secs    0.00 millis  254.46 secs
   sys time    0.67 secs    1.54 millis    0.67 secs

$ time ./trimal_sse2 -in ../dataset/example.014.AA.EggNOG.COG0591.fasta -clusters 5
________________________________________________________
Executed in   12.61 secs    fish           external
   usr time   31.89 secs    0.00 micros   31.89 secs
   sys time    0.13 secs  670.00 micros    0.12 secs

scapella and others added 30 commits September 17, 2020 13:18

Compute inner blocks

b281c33

Typo fix

383f95f

Add script to perform statistics analysis

bea46f5

Add script to measure parameters relevance

9dee10f

Count sequences and columns

1f7c6d4

Count number of ungapped columns

997f245

Read new computed parameters

255fcab

Add average identity and overlap and number of identical columns

062b8a6

Add openmp flag to makefile and parallelize identity calculation

546268d

Remove debugging print

cd487ea

Fix parallel loop

b73fdcb

Refactor parallel for loop

5e47e6c

Parallelize sequence overlap calculation

ec14b32

Merge branch 'parallel' into ML_algorithm

2da7db6

Use relative paths

16d9284

Remove overlap feature

3e57202

Remove identical columns feature

6b55d55

Save table as html and csv

2c42a6d

Add variables scope and minimum size for parallel loop

9f2d6c8

Change constant to lowercase

fbde690

Add parallel directives to loops

9b77a43

Merge remote-tracking branch 'origin/parallel' into ML_algorithm

013bcfc

Add average gaps, problem number and tools

d143cef

Parse every problem and taxon

e66c14c

Fix to parse all problems

7976867

Change block calculation to iterative

38dac16

Fix typo and limit concurrent jobs

4fc3b73

Add original alignment and average seq identity

a8dd03a

Limit threads and parallelize sequence identity

04efb96

Nicolás Díaz Roussel and others added 24 commits February 11, 2022 10:59

Add residue type and RF distance

7f0c9a7

Limit number of parallel threads

5f5fe8d

Remove repeated calculation

ea7a82f

Add max and min columns and process error problems

7bceb2c

Parallelize statistics calculations

074e22b

Add average gaps calculation

29602c4

Put omp directive after comment

4f2f5ac

Merge branch 'ML_algorithm' into 2.0_RC

986378f

Add error and fix value when comparing sequences without residues

906cb8b

Add coordinates of left and right boundaries of main block

f48c0ef

Add main block coordinates and write results by batches

fd5e6c9

Fix indentation

b731101

Add script results and training models

5c6d2ee

Add an SSE implementation of Cleaner for overlap trimming

8b96619

Setup conditional build of SSE2Cleaner.cpp based on SSE availability

2bfd383

Mark some Cleaner methods as virtual to override them in `SSECleaner`

4531c3d

Temporarily hardcode Alignment to use a SSE2Cleaner if compiled with SSE2

6e8d5df

Allocate memory locally in SSE2Cleaner::calculateSeqIdentity method

8e482bd

Used aligned_alloc for allocating SIMD buffers in SSE2Cleaner

3d1e7c9

Enable OpenMP threads in SSE2Cleaner::calculateSeqIdentity

47adef3

Add GPL disclaimer to SSE2Cleaner.h and SSE2Cleaner.cpp

705f71c

Add SSE2 implementation of Similarity and setup conditional compilation

d46d906

Reorganize some local variables in inner loop of SSE2Similarity

858bd8c

Move calculateVectors improvements from #66 into base Similarity class

ea923bc

althonos mentioned this pull request Sep 27, 2022

Improve performance of Similarity::calculateVectors #66

Closed

Build and link SSE2OBJLib only when SSE2 is available

2874b17

martin-g mentioned this pull request Nov 8, 2022

Use GitHub Actions as CI for Linux x86_64 and aarch64 #68

Merged

nicodr97 force-pushed the 2.0_RC branch from 5c6d2ee to 0698fc1 Compare March 15, 2023 12:16

althonos changed the base branch from 2.0_RC to SIMD_PR_merge April 20, 2023 17:43

althonos closed this by deleting the head repository Apr 20, 2023

althonos wants to merge 56 commits into inab:SIMD_PR_merge from althonos:impl-simd