Skip to content

Improve performance of Similarity::calculateVectors#66

Closed
althonos wants to merge 4 commits into
inab:2.0_RCfrom
althonos:2.0_RC
Closed

Improve performance of Similarity::calculateVectors#66
althonos wants to merge 4 commits into
inab:2.0_RCfrom
althonos:2.0_RC

Conversation

@althonos
Copy link
Copy Markdown

@althonos althonos commented Jun 7, 2022

Hi there!

While working on althonos/pytrimal I did some thorough profiling of the code, and I identified some critical sections that could be improved. In particular, I noticed that the code in Similarity::calculateVectors was sub-optimal, because it was repeatedly calling similarityMatrix::getDistance with the same sequence characters, and the check for invalid/incorrect symbols seems to have a high performance impact.

To fix this, I added two buffers to store column data; the first one for the sequence itself, storing uppercase column characters to reduce the number of utils::toUpper calls; the other one to store the indices of gapped/indeterminate characters. The sequence characters for a column are checked once when the column is copied; after that, the distance matrix is indexed directly, without checking character ranges.

I used valgrind to count cycles on a run of trimAl in strict mode on example.073.AA.strNOG.ENOG411BFCW.fasta, here are the results in number of cycles:

Object cycles (before PR) cycles (after PR)
Similarity::calculateVectors (self) 3,132,810,840 1,652,854574
Similarity::calculateVectors (incl) 8,281,641,990 3,240,879,580
trimal (total) 13,504,155,928 8,463,690,891

althonos added a commit to althonos/pytrimal that referenced this pull request Jun 26, 2022
althonos added a commit to althonos/pytrimal that referenced this pull request Jun 26, 2022
althonos added a commit to althonos/pytrimal that referenced this pull request Jun 26, 2022
althonos added a commit to althonos/pytrimal that referenced this pull request Jun 27, 2022
@althonos
Copy link
Copy Markdown
Author

Superseded by #69 .

@althonos althonos closed this Sep 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant