
Excessive memory usage in v0.3.0 due to full SVD #574

Closed
jsnel opened this issue Feb 28, 2021 · 1 comment · Fixed by #576
Assignees
Labels
Priority: High Nasty bugs leading to incorrect results or crashes Status: In Progress Issues being worked on Type: Serious Bug Crashes, Broken code, Security Issues

Comments

@jsnel
Member

jsnel commented Feb 28, 2021

  • glotaran version: v0.3.0
  • Python version: any
  • Operating System: any

Description

Running a dataset with a large number of datapoints in any dimension, e.g. 20,000 timepoints, will result in excessive memory usage, not during optimization but just after it, at the result creation stage. This is because at this point the (full) singular value decomposition of the residual matrix is calculated (since the default for numpy.linalg.svd is full_matrices=True).

In the context of global analysis a full SVD is almost never needed; an economy-size SVD is what is needed. Further optimization (e.g. making the SVD calculation optional altogether, or using a more memory-efficient implementation) is possible, but is left as a future exercise.
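To illustrate the difference (a minimal sketch with illustrative matrix sizes, not the actual glotaran data): for an (m, n) matrix with m >> n, the full SVD materializes an (m, m) left-singular matrix, while the economy-size SVD only returns the n columns that are actually used.

```python
import numpy as np

# A tall residual-like matrix: many timepoints, few spectral channels
# (sizes are illustrative only).
rng = np.random.default_rng(0)
a = rng.random((2000, 50))

# Full SVD (the numpy default): U is (2000, 2000).
u_full, s_full, vt_full = np.linalg.svd(a, full_matrices=True)

# Economy-size SVD: U is only (2000, 50); singular values are identical.
u, s, vt = np.linalg.svd(a, full_matrices=False)

print(u_full.shape)  # (2000, 2000)
print(u.shape)       # (2000, 50)
```

The economy-size result still reconstructs the matrix exactly (u @ np.diag(s) @ vt == a up to floating-point error), so nothing is lost for global analysis.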

What I Did

Ran the _create_svd function, decorated with memory_profiler's @profile decorator, before and after changing the call to numpy.linalg.svd:

- l, v, r = np.linalg.svd(dataset[name])
+ l, v, r = np.linalg.svd(dataset[name], full_matrices=False)
# Before
-  1042    223.1 MiB    223.1 MiB           1       @profile
-  1043                                             def _create_svd(self, name: str, dataset: xr.Dataset):
-  1044   3276.3 MiB   3053.2 MiB           1           l, v, r = np.linalg.svd(dataset[name])
# After
+  1038    221.7 MiB    221.7 MiB           1       @profile
+  1039                                             def _create_svd(self, name: str, dataset: xr.Dataset):
+  1040    227.2 MiB      5.6 MiB           1           l, v, r = np.linalg.svd(dataset[name], full_matrices=False)
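A back-of-envelope check of the ~3 GiB increment reported by the profiler (assuming roughly 20,000 timepoints, as in the description): the full SVD allocates an (n, n) float64 left-singular matrix.

```python
# Hypothetical size matching the issue description: 20,000 timepoints.
n = 20_000

# Full SVD materializes an (n, n) float64 U matrix: 8 bytes per element.
full_u_bytes = n * n * 8
full_u_gib = full_u_bytes / 1024**3

print(f"{full_u_gib:.2f} GiB")  # ~2.98 GiB, in line with the ~3053 MiB jump above
```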

The same patch can be applied in the _prepare_dataset function:

     if "data_singular_values" not in dataset:
-        l, s, r = np.linalg.svd(dataset.data)
+        l, s, r = np.linalg.svd(dataset.data, full_matrices=False)
@jsnel jsnel added Type: Serious Bug Crashes, Broken code, Security Issues Status: In Progress Issues being worked on Priority: High Nasty bugs leading to incorrect results or crashes labels Feb 28, 2021
@jsnel jsnel self-assigned this Feb 28, 2021
@jsnel jsnel added this to the v0.3.1 - maintenance release milestone Feb 28, 2021
@jsnel
Member Author

jsnel commented Feb 28, 2021

Memory profiling results attached:
memory_profiler_results_before.txt
memory_profiler_results_after.txt
