Applying SCORE to large datasets #2

Open

fred887 opened this issue Jan 26, 2024 · 2 comments

fred887 commented Jan 26, 2024

Hello,

I would like to apply SCORE to some very large datasets (e.g. DAGs with d ≈ 200 nodes and n ≈ 20000 samples).
I am currently blocked by an out-of-memory error inside the Stein_hess() procedure, caused by the X_diff tensor, whose memory footprint becomes huge:

memory_X_diff = n * n * d  * sizeof(float)

For a tensor of float64 (the default setting), the memory needed by X_diff is ~800 GB; ~400 GB for float32 or ~200 GB for float16: this is just too big to fit in memory.
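(For reference, the footprint can be checked with a quick snippet like the one below; the exact figures depend on the precise values of n and d.)

```python
import numpy as np

n, d = 20_000, 200  # approximate sizes quoted above
for dtype in (np.float64, np.float32, np.float16):
    # full X_diff tensor: n x n x d elements of the given dtype
    gb = n * n * d * np.dtype(dtype).itemsize / 1e9
    print(f"X_diff as {np.dtype(dtype).name}: ~{gb:.0f} GB")
```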

I know the standard approach would be to subsample my dataset until everything fits in memory, but I need to keep all samples for my experiments.

I have seen in your article that it is possible to use kernel approximation methods (such as MEKA) to reduce the computational load.
Indeed, with MEKA we can compute the kernel matrix K without ever forming the overly large X_diff tensor.
However, the X_diff tensor is still needed to compute the nablaK and nabla2K variables.

Would you know of a way to completely avoid computing the very large X_diff tensor?

Or any other way to compute the diagonal elements of the Hessian matrix on large datasets?

Thank you very much for your help,

Best,

Frederic

paulrolland1307 (Owner) commented Jan 26, 2024 via email

fred887 (Author) commented Jan 30, 2024

Hello Paul,

Thank you very much for your suggestions!

Indeed, using a block decomposition of X_diff, we can compute the full K matrix block by block, replacing the computation of the full X_diff tensor with a series of smaller sub-blocks of X_diff.

For the K matrix, we can indeed use square blocks, computed from sub-blocks of X_diff with a shape such as n/10 x n/10 x d.

However, for the nablaK and nabla2K matrices, I think we will rather need a series of full row slices of X_diff (of shape n/100 x n x d, say), since we need to sum the components of X_diff over the full dataset of size n, as expressed in the Einstein summation; for nablaK, for instance, we have:
nablaK = -torch.einsum('kij,ik->kj', X_diff, K) / s**2, i.e. nablaK[k, j] = -(1/s^2) * sum_i X_diff[k, i, j] * K[i, k]
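A minimal sketch of what I have in mind (assuming the Gaussian kernel and the einsum contractions used in Stein_hess(); stein_hess_blocks and block_size are just illustrative names, not part of the SCORE code):

```python
import torch

def stein_hess_blocks(X, s, block_size=256):
    # Compute K, nablaK and nabla2K row-block by row-block, so that only a
    # block_size x n x d slice of X_diff is ever materialised (never n x n x d).
    n, d = X.shape
    K = torch.empty(n, n, dtype=X.dtype)
    nablaK = torch.empty(n, d, dtype=X.dtype)
    nabla2K = torch.empty(n, d, dtype=X.dtype)
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        X_diff = X[start:end].unsqueeze(1) - X            # (end-start) x n x d slice
        K_blk = torch.exp(-(X_diff ** 2).sum(dim=2) / (2 * s ** 2)) / s
        K[start:end] = K_blk
        # Same contractions as above, restricted to the row block
        # (K is symmetric, so K_blk[k, i] stands in for K[i, k]).
        nablaK[start:end] = -torch.einsum('kij,ki->kj', X_diff, K_blk) / s ** 2
        nabla2K[start:end] = torch.einsum(
            'kij,ki->kj', -1 / s ** 2 + X_diff ** 2 / s ** 4, K_blk)
    return K, nablaK, nabla2K
```

With n = 20000 and d = 200, a block_size of 256 keeps each X_diff slice around 8 GB in float64, and smaller blocks reduce this further at the cost of more iterations.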

I also agree with you regarding solving the linear system (K + eta I) X = nabla2K with the conjugate gradient method, as a way to avoid inverting the large K matrix: thank you very much for the suggestion!
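For completeness, a rough sketch of that step, as a plain conjugate-gradient loop in PyTorch (assuming K + eta*I is symmetric positive definite; solve_cg is just an illustrative helper, not part of the SCORE code):

```python
import torch

def solve_cg(K, eta, B, n_iter=500, tol=1e-8):
    # Solve (K + eta*I) X = B column by column with conjugate gradient,
    # using only matrix-vector products with K (no explicit inverse).
    X = torch.zeros_like(B)
    for j in range(B.shape[1]):
        b = B[:, j]
        x = torch.zeros_like(b)
        r = b.clone()                      # residual for the initial guess x = 0
        p = r.clone()
        rs_old = r @ r
        for _ in range(n_iter):
            Ap = K @ p + eta * p
            alpha = rs_old / (p @ Ap)
            x = x + alpha * p
            r = r - alpha * Ap
            rs_new = r @ r
            if torch.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs_old) * p
            rs_old = rs_new
        X[:, j] = x
    return X

# e.g. in place of torch.matmul(torch.inverse(K + eta * torch.eye(n)), nabla2K):
# X = solve_cg(K, eta, nabla2K)
```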

Best,

Frederic
