Improve addDivergence #635

TuomasBorman · 2024-08-29T08:41:22Z

This PR improves and streamlines the addDivergence() function. Basically, now dissimilarity matrix is calculated first for all sample pairs. From that matrix, the values are assigned to correct sample-pairs. This makes the calculation a lot faster since the dissimilarities are calculated at once for all. The computational cost was an issue especially in miaTime's divergence functions. Moreover, the function now uses getDissimilarity() function which makes the maintenance easier.

This new approach also allows users to specify different reference samples. Previously it was only possible to specify a single reference sample for all samples.

To make the usage simpler, user is advised to store reference sample information to colData (if reference is not "mean" or "median"). However, the function still works with single sample name, vector of samples or vector of numeric values denoting a single sample.

The aim of this all is that now we can utilize this directly from divergence functions from miaTime. I have initialized there a branch, however, I think it is already outdated but might give some idea.

In miaTime functions, we only need to specify the reference samples from previous time points. After this is done, we can call mia::getDivergence().

@Daenarys8 Would you be able to first fix the documentation in this mia::addDivergence() (old description style), check that everything works and then incorporate this function to miaTime divergence functions. You could update miaTime functions before this branch is merged to devel (install mia from this branch). This is to ensure that everything works.

antagomir · 2024-08-29T09:06:08Z

Note: in many divergence functions it is not necessary to compute the entire dissimilarity matrix if only specific pairwise dissimilarities are needed (e.g. consecutive time points, or time points vs. baseline etc.). In those cases calculating the entire dissimilarity matrix can be too much to calculate. Good to check this one?

TuomasBorman · 2024-08-29T09:14:59Z

Yes, that is good to check. However, it still might be easier and faster to calculate the whole dissimilarity matrix instead of calculating one by one especially since vegdist is calculated with C.

One option is to calculate dissimilarities in batches, for each unique reference sample. However, in case of time divergence, this causes a bottle neck (there are lots of reference samples)

antagomir · 2024-08-29T09:20:45Z

In general it is faster to not calculate the entire dissimilarity matrix but it may depend a bit on feature and sample set sizes. This is relatively straightfwd to quantify experimentally. This problem is particularly remarkable with larger sample sizes as the number of comparisons grows N^2

TuomasBorman · 2024-08-29T09:29:31Z

Yes, well that is true. Can you check if that is causing problems @Daenarys8 ? In miaTime we have a dataset with over 1,000 samples.

If there is a single reference sample for all samples, we could calculate dissimilarity only between samples and this reference sample. If there are multiple reference samples, we could calculate the whole dissimilarity matrix.

Anyway, we do not have to change the function structure. The only difference would be in the lines in .calc_divergence() function.

antagomir · 2024-08-29T09:34:09Z

If there are multiple reference samples, I think it would still be faster to apply/sweep/loop over those references, and calculate reference-vs-othersamples for each reference. This will be R x N comparisons, whereas all-vs-all will always be N x N comparisons.

TuomasBorman · 2024-08-29T09:47:27Z

That is something to check. The initial idea for this came because I noticed that this line is not working https://github.com/microbiome/miaTime/blob/2a78096888432796c95fe488af09c0ba0888e993/R/getStepwiseDivergence.R#L230

It loops through all reference samples. As loops are slow in R, it took forever to complete. When I calculated the whole dissimilarity matrix, it took only couple of seconds since vegdist is coded in C.

Signed-off-by: Daena Rys <rysdaena8@gmail.com>

TuomasBorman · 2024-09-09T07:45:34Z

You are correct @antagomir. If we calculate all dissimilarities, the time is exponential and if we loop through sample-pairs the time grows linearly.

I think the code in miaTime has some other issues which is why it takes so long (for around 1,000 samples it does not complete in sensible amount of time). I thought that it was something to do with calling and initializing C from R.

I updated the code. @Daenarys8 can you still check that the documentation is clearly stating that this function utilizes getDissimilarity()? Also, can you then update miaTime's divergence functions based on this (check the branch above)? They should have only functionality to add reference sample information to colData and then call getDivergence(). The branch that I initialized, have some idea, however, you can remove functionality from there as only the reference sample mapping should be included.

Signed-off-by: Daena Rys <rysdaena8@gmail.com>

codecov · 2024-09-21T20:20:32Z

Codecov Report

Attention: Patch coverage is 95.60440% with 4 lines in your changes missing coverage. Please review.

Please upload report for BASE (devel@04d0e04). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
R/addDissimilarity.R	85.71%	3 Missing ⚠️
R/estimateDivergence.R	98.57%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff            @@
##             devel     #635   +/-   ##
========================================
  Coverage         ?   68.12%           
========================================
  Files            ?       44           
  Lines            ?     5411           
  Branches         ?        0           
========================================
  Hits             ?     3686           
  Misses           ?     1725           
  Partials         ?        0

Flag	Coverage Δ
	`68.12% <95.60%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

TuomasBorman added 2 commits August 29, 2024 11:22

up

6448517

up

bce91f6

Daenarys8 self-assigned this Aug 30, 2024

Daenarys8 and others added 6 commits August 30, 2024 11:30

v_: Update to docs and fxns

b908119

Signed-off-by: Daena Rys <rysdaena8@gmail.com>

v_: update fxn

8b55b01

Signed-off-by: Daena Rys <rysdaena8@gmail.com>

v_: Update

9d1a6a1

Signed-off-by: Daena Rys <rysdaena8@gmail.com>

Merge branch 'devel' into update_divergence

8a2f09e

up

9c659b3

up

0ce7500

antagomir and others added 6 commits September 9, 2024 20:51

Merge branch 'devel' into update_divergence

fa9e3e4

Merge branch 'devel' into update_divergence

e998bd3

update docs

8815240

Signed-off-by: Daena Rys <rysdaena8@gmail.com>

update fxn

c1b4453

Signed-off-by: Daena Rys <rysdaena8@gmail.com>

update fxn

2377ca1

Signed-off-by: Daena Rys <rysdaena8@gmail.com>

Merge branch 'devel' into update_divergence

59ff6b3

Daenarys8 mentioned this pull request Sep 19, 2024

refactor getBaselineDivergence/getStepwiseDivergence microbiome/miaTime#91

Merged

Merge branch 'devel' into update_divergence

d57e306

TuomasBorman added 2 commits September 22, 2024 10:29

up

7850dbd

up

ca7b19a

TuomasBorman merged commit bb33807 into devel Sep 22, 2024
3 checks passed

TuomasBorman deleted the update_divergence branch September 22, 2024 08:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve addDivergence #635

Improve addDivergence #635

TuomasBorman commented Aug 29, 2024

antagomir commented Aug 29, 2024

TuomasBorman commented Aug 29, 2024 •

edited

Loading

antagomir commented Aug 29, 2024

TuomasBorman commented Aug 29, 2024

antagomir commented Aug 29, 2024

TuomasBorman commented Aug 29, 2024

TuomasBorman commented Sep 9, 2024

codecov bot commented Sep 21, 2024 •

edited

Loading

Improve addDivergence #635

Improve addDivergence #635

Conversation

TuomasBorman commented Aug 29, 2024

antagomir commented Aug 29, 2024

TuomasBorman commented Aug 29, 2024 • edited Loading

antagomir commented Aug 29, 2024

TuomasBorman commented Aug 29, 2024

antagomir commented Aug 29, 2024

TuomasBorman commented Aug 29, 2024

TuomasBorman commented Sep 9, 2024

codecov bot commented Sep 21, 2024 • edited Loading

Codecov Report

TuomasBorman commented Aug 29, 2024 •

edited

Loading

codecov bot commented Sep 21, 2024 •

edited

Loading