# Dot product of csr_matrix causes segmentation fault #3212

Closed
opened this Issue Jan 14, 2014 · 18 comments

Projects
None yet
5 participants

### omerlevy commented Jan 14, 2014

 I have two (scipy) CSR sparse matrices: `A (12414693, 235470)` `B (235470, 48063)` Performing: `A.dot(B)` causes a segmentation fault. Note that both matrices are extremely sparse; `A` contains "one-hot vectors", i.e. all-zero rows except for one element. I am using scipy version 0.12.0.
Owner

### pv commented Jan 14, 2014

 The following information would be useful: (i) what is the nnz of A and B, (ii) what is the nnz (expected) of the matrix product?

### omerlevy commented Jan 14, 2014

 As mentioned, A has 12414693 nnz values. B has 2678333 nnz values, an average of ~55.72 per row.
Member

### jakevdp commented Jan 14, 2014

 Just back-of-the-envelope, then: Each entry of dot(A,B) will have 235470 options for a "hit", and the probability of a hit is (2678333 / 235470 / 48063) = 2.37E-4. This means the probability of a non-hit is (1 - 2.37E-4), and the probability of 235470 non-hits is (1 - 2.37E-4) ^ 235470 ~ 1E-25. So it is likely that every entry in the final matrix will be nonzero. Your final matrix if of size 12414693 x 48063, so you're effectively trying to allocate ~4 x 10^13 entries, which is on the order of 10 TB double-precision (~15TB in sparse format because of the indices). I'd say that's probably where your problem lies 😄
Member

### jakevdp commented Jan 14, 2014

 Also, sparse matrices are currently limited to ~2^32 entries, because the indices are stored as int32. This is likely why you get a segfault rather than a memory error. I think I remember an issue/PR filed to try to upgrade this to int64, but I can't find it right now...
Contributor

### argriffing commented Jan 14, 2014

 maybe #442
Member

### jakevdp commented Jan 14, 2014

 Edit: I think I made a mistake here. The prob. of a hit is actually (2678333 / 235470 / 48063) / 235470 = 1E-9 So the probability of 235470 non-hits is (1 - 1E-9) ^ 235470 = (1 - 3E-4) So the number of nonzero entries should be 3E-4 * 12414693 x 48063 ~ 179 million. This could run into the int32 limit.
Member

### jaimefrio commented Jan 14, 2014

 @omerlevy If you run the code in my answer to your stackoveflow question, the number of non-zero entries of the resulting matrix will be `c[-1]`.
Owner

### pv commented Jan 14, 2014

 The seg fault may also be caused by a failing C++ memory allocation. Some of the routines allocate workspace via stuff like `std::vector sums(n_col, 0);`, and I don't think std::bad_alloc is caught in the SWIG wrappers. EDIT: gh-3215 addresses this. Needs: (i) reproducible simple example, and (ii) running in debugger to proceed further.

### omerlevy commented Jan 14, 2014

 @jaimefrio `c[-1] = 275764616`
Owner

### pv commented Jan 14, 2014

 @omerlevy: please try to get a C stack trace (see https://wiki.python.org/moin/DebuggingWithGdb for instructions) and paste it to gist.github.com and post a link to it here. Alternatively, write a small test program that reproduces the issue so that we can reproduce the issue.

### omerlevy commented Jan 15, 2014

 Here it is: https://gist.github.com/omerlevy/8433760
Owner

### pv commented Jan 16, 2014

 @omerlevy: thanks --- that clarifies that the issue is not in c++ exceptions, but something else. I did not manage to reproduce the issue in simple testing --- it would help here if you could upload the sparsity structure of your matrices somewhere on the net: `numpy.savez_compressed('dump.npz', A_indices=A.indices, A_indptr=A.indptr, B_indices=B.indices, B_indptr=B.indptr)`, assuming the data is in CSR format.

### omerlevy commented Jan 16, 2014

 Thanks, I've uploaded it here: http://www.filedropper.com/dump
Owner

### pv commented Jan 16, 2014

 @omerlevy: thanks, I'm able to reproduce the crash with this information.
Owner

### pv commented Jan 16, 2014

 The crash is due to 32-bit indices wrapping around. When tested with gh-442, it turns out that the matrix `C = A.dot(B)` would have `nnz = 4570731912` (which takes 30+GB of memory; with gh-442 this fails gracefully with a MemoryError rather than crashing on systems with less memory than this available). It might be possible to hack around this by using larger types for accumulators. OTOH, a better fix would be to finally get gh-442 merged.
Member

### jaimefrio commented Jan 16, 2014

 And of course `4570731912 - 2^32 = 275764616`, the value in `c[-1]`, which is positive because the indices manage to make it all the way to positive again after wrapping around...

### omerlevy commented Jan 16, 2014

 Thank you all for the quick and helpful answers. I will try to find a mathematical workaround for this issue in my application, that reduces the matrix's size.

Merged

Owner

### pv commented Jan 19, 2014

 gh-3224 converts the crash to a friendly error message

Closed