DelayedArray and SVD questions #9
Great to see someone diving deep into the intricacies of our matrix representations. It's unlikely that you're doing anything wrong. I'll focus on the wrapping of the dgCMatrix in a DelayedArray, as DelayedArray provides a single general interface over many different backends. The cost of this generality is that it is much less efficient than working with a naked sparse matrix. Having said that, there is infrastructure within DelayedArray to support dedicated operations on sparse backends. It just hasn't been joined up to the matrix multiplication, not because it's hard but because it just hasn't been a priority; the thinking has been that if you're wrapping a sparse matrix in a DelayedArray, you are presumably willing to pay some cost for the generality.

Now, BiocSingular uses DelayedArray's block processing for the matrix multiplications inside the SVD algorithms, so the costs above apply to every multiplication step. Obviously, if your matrix is file-backed, you have the extra penalty of reading stuff from disk. We are also forced to store data in dense form in HDF5 arrays, so the sparse-to-dense cost still applies here. The only point of note is to make sure that your chunks are square-ish, which ensures that the row and column accesses are reasonably efficient. (This should already be the default.)
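To make the generality cost concrete, here is a minimal sketch (toy dimensions and density, not the data from this thread) comparing a dedicated sparse operation on a naked dgCMatrix against the same operation routed through DelayedArray's generic block processing:

```r
library(Matrix)
library(DelayedArray)

## Toy matrix; dimensions and density are placeholders.
x <- rsparsematrix(1e5, 1e3, density = 0.01)  # a dgCMatrix
y <- DelayedArray(x)                          # same data behind the generic interface

## The naked dgCMatrix dispatches to dedicated sparse methods;
## the DelayedArray goes through generic block processing, where
## blocks may be realized as dense ordinary matrices.
system.time(colSums(x))   # fast: sparse-aware
system.time(colSums(y))   # slower: block-processed
```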
Hi, I am grateful for your clear background descriptions of DelayedArray and BiocSingular, and for the additional details of DelayedArray wrapping. I still have some questions; the most insistent ones involve the auto block size setting.

First, I ran the following tests.

Run 1

This ran to completion.

Run 2

This run failed with the message …

I ran the command …

I ran …

(I am puzzled about the auto block size getting reset internally to a value greater than the one I requested.)

Moving along, I seem to misunderstand how memory is used by DelayedArray/BiocSingular. I imagined that setting the auto block size to ~2 GB would limit the memory used in runs; however, when I ran:

Run 3

This run failed. (When I rerun it using 2 workers, it looks like each of the processes can use > 30 GB of RAM.)
Incidentally, I noticed a paper entitled "Out-of-Core Singular Value Decomposition" (arXiv:1907.06470v1), which uses a block-oriented strategy and may work with DelayedArrays. Responding to my inquiry last month, an author, Vadim Demchik, wrote that ExB SVD could be open-sourced in a few months.
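The exact commands for the runs above were not preserved in this copy of the thread. As a rough, assumed reconstruction of their shape (the block size, rank k, worker count, and matrix are all guesses), such a run might look like:

```r
library(Matrix)
library(DelayedArray)
library(BiocSingular)
library(BiocParallel)

## Stand-in for the wrapped expression matrix used in the runs.
m <- rsparsematrix(1e5, 1e3, density = 0.01)
y <- DelayedArray(m)

setAutoBlockSize(2e9)   # cap block size at ~2 GB
getAutoBlockSize()      # confirm the setting

## An SVD run in the style described above, with 2 workers.
res <- runIrlbaSVD(y, k = 50, BPPARAM = MulticoreParam(2))
```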
This will soon be the case once the sparse matrix capabilities of DelayedArray are optimized.
Woah, woah. That's a block that is 64 GB in size! Are you sure you want to do that? It only "works" for a small sparse matrix because you have enough memory to realize that sparse matrix fully as a dense ordinary matrix, which, of course, totally defeats the purpose. I don't know why the auto block size appears to get reset internally; perhaps @hpages may be able to shed some light on the technical details.
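For a sense of scale, the memory cost of realizing a block densely depends only on its dimensions, not on its sparsity. A back-of-the-envelope calculation with hypothetical dimensions:

```r
## Realizing a block densely costs nrow * ncol * 8 bytes for doubles,
## no matter how sparse the underlying data are.
nr <- 3e5                 # hypothetical row count
nc <- 3e4                 # hypothetical column count
nr * nc * 8 / 2^30        # ~67 GiB as a dense numeric matrix
```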
Hi, OK. I'll hold my horses until the sparse matrix capabilities of DelayedArray are optimized. Still, I want to get a feeling for how the block size setting actually behaves.

That said, yup, I wanted a 64 GB block size, but no longer, because my concept of the block size function is clearly incongruous with reality. When loaded into a dgCMatrix wrapped with DelayedArray, the relatively large matrix requires 6.02 GB as reported by object.size(). Again, I appreciate your consideration and patience.
Hi,

block size != block length

Block length = number of array elements in a block (i.e. the product of the block's dimensions). For example, for an integer array, block size (in bytes) is going to be 4 x block length. For a numeric array (type "double"), it's 8 x block length.

In its current form, block processing in DelayedArray must decide the geometry of the blocks before starting the walk on the blocks. It does this based on several criteria. Two of them are the auto block size setting and the type() of the array, which together determine the maximum block length.

Note that this simple relationship between block size and block length assumes that blocks are loaded in memory as ordinary (a.k.a. dense) matrices or arrays. With sparse blocks, all bets are off. But the max block length is always taken to be the auto block size divided by the size of an individual array element, whether or not the blocks end up being loaded as sparse objects.

It's important to keep in mind that the auto block size setting is a simple way for the user to put a cap on the memory footprint of the blocks. And that's all. In particular, it doesn't control the maximum amount of memory used by the block processing algorithm. Other variables can dramatically impact memory usage, like parallelization (where more than one block is loaded in memory at any given time) and what the algorithm is doing with the blocks (e.g. something like …).

Finally, w.r.t. the …

Hope this helps.

H.
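A quick way to see the size/length relationship is to use DelayedArray's own helpers (the values below assume the default dense representation of blocks):

```r
library(DelayedArray)

setAutoBlockSize(2e9)           # auto block size of ~2 GB
getAutoBlockLength("double")    # 2e9 / 8 = 250,000,000 elements
getAutoBlockLength("integer")   # 2e9 / 4 = 500,000,000 elements
```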
Hi @hpages, I am grateful for your valuable description of the definitions/usage of block size and block length (… with adequate care.)

I believe that I read somewhere that R 3.0.0 introduced 'long vectors' with > 2^31 - 1 elements, although my recollection is hazy now. (I seem to recall also that the maximum index value for a matrix is limited to 2^31 - 1.) Ahh, here it is: Long Vectors. Or perhaps I misunderstand you. Anyway, I think that I understand better what I see, and I certainly need to think more carefully about these details. And I appreciate better some of the complexities you deal with in the DelayedArray package!
Correct, long vectors were introduced in R 3.0.0. However, in the early days you couldn't do much with them because very few operations in base R had been modified to support them. I believe things have improved significantly since then, so maybe all the base R operations needed to support blocks of length >= 2^31 are now capable of operating on long vectors. We would also need to make sure that the matrix summarization functions from the matrixStats package can handle them.
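For reference, a small base R illustration of the 2^31 - 1 boundary (note that allocating the long vector below takes 2 GiB of RAM):

```r
.Machine$integer.max    # 2^31 - 1 = 2147483647, the max length of a "short" vector
x <- raw(2^31)          # a long vector of raw bytes; fine since R 3.0.0
length(x)               # 2147483648, returned as a double
```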
Hi @LTLA and @hpages, …
Hi,
I am considering using the BiocSingular, DelayedArray, and HDF5Array packages for initial processing of large single-cell data sets where I use on-disk storage of the expression matrices.
I wonder if you might be willing to answer some of my questions, and if so, where you prefer that I post them.
As some background, I have run timing tests in which I begin processing from a sci-RNA-seq counts matrix, estimate size factors, normalize counts, calculate column means and variances, and then run singular value decomposition. It became clear that the SVD is the bottleneck, so I saved an object with the elements required for the SVD and limited the timing tests to the SVD.
I ran the SVD using a dgCMatrix sparse matrix passed to irlba::irlba in order to get a reference time. I followed this with tests in which I wrapped the dgCMatrix in a DelayedArray and passed it to irlba::irlba, to BiocSingular::runIrlbaSVD, and to BiocSingular::runRandomSVD. I also tried using the HDF5Array::TENxMatrix as the DelayedArray seed. The run times for matrices wrapped in DelayedArray are substantially longer than the runs using sparse matrices. My biggest concern is that I may be running these tests incorrectly.
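A minimal sketch of this comparison, with placeholder dimensions, density, and rank rather than the actual data set:

```r
library(Matrix)
library(DelayedArray)
library(BiocSingular)

## Placeholder matrix standing in for the saved SVD input.
m <- rsparsematrix(2e5, 5e3, density = 0.01)

ref  <- irlba::irlba(m, nv = 50)   # reference: naked dgCMatrix
y    <- DelayedArray(m)
out1 <- irlba::irlba(y, nv = 50)   # DelayedArray passed straight to irlba
out2 <- runIrlbaSVD(y, k = 50)     # BiocSingular wrappers
out3 <- runRandomSVD(y, k = 50)
```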
As a warning, I have a poor understanding of DelayedArrays so at least some of my questions may be basic.
Thank you.
Brent