Add fastalignment metric #456

tobgan · 2023-10-18T18:51:16Z

Closes #304

CHANGELOG.md updated
Tests added (For bug fixes or new features)
Tutorial updated (if necessary)

Added variant of AlignmentDistanceCalculator which performs faster, but may miss some of the pairwise distances.

codecov · 2023-10-18T18:58:37Z

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (7dd3377) 80.22% compared to head (21f06c1) 80.49%.

Files	Patch %	Lines
src/scirpy/ir_dist/metrics.py	98.11%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #456      +/-   ##
==========================================
+ Coverage   80.22%   80.49%   +0.27%     
==========================================
  Files          49       49              
  Lines        3939     3994      +55     
==========================================
+ Hits         3160     3215      +55     
  Misses        779      779

grst

Hi @tobgan,

thank you so much for adding this!
The code you added looks good, but there are a few points that still need some work to make this more easily accessible to users. I'm happy to guide you, but I can also work on those tasks myself if you prefer.

documentation: the new class should be added to the API docs.
tutorial: the new method should be mentioned in the docs and probably even be suggested as the standard method.
unit tests: the code needs unit tests (mostly a matter of applying the existing tests to the new method as well and adjusting the expected values).

Before we tackle those, I have a more general question: do I remember correctly that, if only the length filtering is applied, there are no losses and the results are exactly the same as with the plain AlignmentDistanceCalculator, only faster (though obviously not as fast as the full method)? In that case, I would consider getting rid of the old AlignmentDistanceCalculator altogether and just using the new class with different parameters for alignment and fastalignment. What do you think?

Best,
Gregor

grst · 2023-10-21T16:26:54Z

src/scirpy/ir_dist/metrics.py

+            else penalty_dict[subst_mat]
+            if subst_mat in penalty_dict.keys()
+            else 0.0


can be simplified to

Suggested change

-            else penalty_dict[subst_mat]
-            if subst_mat in penalty_dict.keys()
-            else 0.0
+            else penalty_dict.get(subst_mat, 0.0)

I would even consider raising an error if the substitution matrix is unknown and no penalty is specified.

Agreed, I think raising an error would be better
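
The two options discussed in this thread can be sketched as follows. This is a minimal illustration, not the code in `metrics.py`: the function name `resolve_penalty`, the `estimated_penalty` argument, and the values stored in `penalty_dict` are assumptions made for the example.

```python
from typing import Optional

# Hypothetical defaults per substitution matrix; the values are placeholders.
penalty_dict = {"blosum62": 1.0, "pam30": 2.0}


def resolve_penalty(subst_mat: str, estimated_penalty: Optional[float] = None) -> float:
    """Return the explicitly given penalty, else the default for the matrix."""
    if estimated_penalty is not None:
        return estimated_penalty
    # Lenient variant (the suggested change): penalty_dict.get(subst_mat, 0.0)
    # Strict variant (shown here): fail loudly instead of falling back to 0.0.
    if subst_mat not in penalty_dict:
        raise ValueError(
            f"No default penalty known for substitution matrix {subst_mat!r}; "
            "please specify one explicitly."
        )
    return penalty_dict[subst_mat]
```

The strict variant avoids silently using a penalty of 0.0 for a matrix nobody has calibrated, at the cost of requiring an explicit value in that case.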

grst · 2023-11-16T15:58:34Z

Hi @tobgan -- would be great if I could get your feedback on the points raised above.

tobgan · 2023-11-16T21:11:03Z

Hi @grst,

I'm very sorry for the belated response; I have been dealing with some health issues lately.

Regarding the length filter: yes, the results are exactly the same, at least in practice. Theoretically, it could still result in some loss, e.g. with unusually high-scoring mismatches (as, for example, in PAM500, where the mismatch N-D scores higher than the match N-N). However, this has not happened in any of my test runs, and I would wager that it is unlikely to happen with real data, and even less likely to do so to a relevant degree. So I think replacing the AlignmentDistanceCalculator altogether would make sense.
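
To illustrate why a length-based prefilter can be lossless in practice, here is a minimal sketch. It is not the implementation in this PR: the function names, the affine gap model, and the example values are assumptions. The idea is that two sequences of different lengths can only be aligned with gaps, so if the cheapest possible gap cost already exceeds the distance cutoff, the pair can be skipped without running the alignment. This reasoning assumes that no mismatch scores higher than the corresponding matches, which is exactly where the PAM500 caveat comes from.

```python
def min_gap_penalty(len_a: int, len_b: int, gap_open: int, gap_extend: int) -> int:
    """Lower bound on the gap cost of aligning two sequences of these lengths.

    Assumes affine gaps where a gap of length k costs
    gap_open + (k - 1) * gap_extend, and gap_open >= gap_extend, so that one
    contiguous gap is the cheapest way to bridge the length difference.
    """
    diff = abs(len_a - len_b)
    if diff == 0:
        return 0
    return gap_open + (diff - 1) * gap_extend


def can_skip_alignment(
    seq_a: str, seq_b: str, cutoff: int, gap_open: int, gap_extend: int
) -> bool:
    """True if the length difference alone pushes the pair past the cutoff."""
    return min_gap_penalty(len(seq_a), len(seq_b), gap_open, gap_extend) > cutoff


# Illustrative values only; they are not taken from the PR.
pairs = [("CASSLGTDTQYF", "CASSLGTDTQYF"), ("CASSLGTDTQYF", "CASSLGQGAYEQYF")]
for a, b in pairs:
    print(a, b, can_skip_alignment(a, b, cutoff=10, gap_open=11, gap_extend=1))
```

Pairs that pass this filter still need a full alignment, which is why this variant is faster than aligning everything but slower than a filter that also discards pairs heuristically.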

If you agree, I would then just make the FastAlignmentDistanceCalculator the new AlignmentDistanceCalculator. I will also add the unit tests, but I could use some pointers on how to best add to the documentation and tutorial.

Best,
Tobias

grst · 2023-11-22T20:51:00Z

Thanks, that sounds good. Honestly, I doubt anyone has ever changed the substitution matrix to anything other than blosum62, so that should be fine.

I would then suggest that, in pp.ir_dist:

specifying metric="alignment" sets the parameters of the FastAlignmentDistanceCalculator such that the result is identical to that of the previous AlignmentDistanceCalculator, for backwards compatibility (ignoring the PAM500 caveat).
specifying metric="fastalignment" uses the same class, but sets the parameters such that the heuristic is used for additional speedup.
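
The proposed mapping of both metric names onto a single class with different parameters could look roughly like this. It is a standalone sketch under stated assumptions: `AlignmentSettings`, its two flags, and `settings_for_metric` are hypothetical names for illustration and are not part of the scirpy API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignmentSettings:
    """Hypothetical parameter set for a single alignment calculator class."""

    length_filter: bool  # prefilter discussed above, lossless in practice
    heuristic_filter: bool  # extra speedup, may miss some pairwise distances


# One class, two presets: "alignment" stays backwards compatible,
# "fastalignment" additionally enables the heuristic.
METRIC_PRESETS = {
    "alignment": AlignmentSettings(length_filter=True, heuristic_filter=False),
    "fastalignment": AlignmentSettings(length_filter=True, heuristic_filter=True),
}


def settings_for_metric(metric: str) -> AlignmentSettings:
    """Look up the preset for a metric name, rejecting unknown names."""
    try:
        return METRIC_PRESETS[metric]
    except KeyError:
        raise ValueError(f"Unknown metric: {metric!r}") from None
```

Keeping the presets in one table means that the old metric name keeps producing the old results while both names share one implementation.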

Regarding the API docs, the list of functions is here: https://github.com/scverse/scirpy/blob/main/docs/api.rst?plain=1#L298

For the tutorial, it's probably easiest if I take care of that myself as a final step.

tobgan · 2023-12-03T17:01:18Z

Hi @grst,
I added the changes to pp.ir_dist and the documentation. I also reran the tests on the Wu dataset: with the default parameters, I get a 5x speedup from the old to the new alignment metric on my laptop (with the expected zero loss), so this seems to work as expected. Let me know if I overlooked anything.

grst · 2023-12-06T07:32:48Z

Perfect, thanks! I'll go over the documentation one more time myself and update the tutorial and then merge this!

review-notebook-app · 2023-12-27T21:32:36Z

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

grst · 2023-12-27T21:35:05Z

Adding the tables with the expected penalty here for future reference:

Wu2020 3k dataset

Wu2020 full dataset
"using the highest expected penalty which kept loss <1% for the 3k dataset"

grst

Wrapped up the open things... Looks good to me now!

Some CI job is failing, but it doesn't look related to this PR.

Thanks again @tobgan!

tobgan and others added 5 commits October 3, 2023 23:57

Add FastAlignmentDistanceCalculator

503330a

Added variant of AlignmentDistanceCalculator which performs faster, but may miss some of the pairwise distances.

Place parasail import in methods

67105f4

Add fastalignment key

03871cf

[pre-commit.ci] auto fixes from pre-commit.com hooks

07920f3

for more information, see https://pre-commit.ci

Merge branch 'main' into master

c7427e5

grst requested changes Oct 21, 2023

Update src/scirpy/ir_dist/metrics.py

0f53688

Co-authored-by: Gregor Sturm <mail@gregor-sturm.de>

grst and others added 5 commits November 22, 2023 21:51

Merge branch 'main' into master

ed48585

Added unit tests for fastalignment

53749db

Added exception, updated docstrings

315a19e

Added fastalignment

147f1e8

[pre-commit.ci] auto fixes from pre-commit.com hooks

c1f85cc

for more information, see https://pre-commit.ci

grst and others added 7 commits December 27, 2023 21:47

Remove accidentically committed data objects

60fb080

Improve docstring

8d78fa4

[pre-commit.ci] auto fixes from pre-commit.com hooks

09c3d2a

for more information, see https://pre-commit.ci

Test FastAlignmentDistanceCalculator

af623c5

[pre-commit.ci] auto fixes from pre-commit.com hooks

042dbbd

for more information, see https://pre-commit.ci

Update changelog

fb9e6a5

Update tutorial

196d2c1

grst approved these changes Dec 27, 2023

Merge branch 'main' into master

21f06c1

grst merged commit e1c8028 into scverse:main Jan 4, 2024
11 checks passed
