Make hecke operators not blow up the memory #21303
comment:2
The |
comment:3
You will also need to write a Cython function to convert an eclib sparse matrix to a Sage one. William and I did this for dense matrices in about 2008. The output function at line 729 of https://github.com/JohnCremona/eclib/blob/master/libsrc/smat.cc might help; also the associated header file. I suggest that you loop through all the rows and for each one fetch all the (column, value) pairs. One thing to watch out for: if a row contains n nonzero entries, then the 0'th entry in the associated col array stores n, so that array has size n+1 while the values array has size n. Row and column numbers start from 1, not 0. Good luck.
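The row layout described above can be sketched in plain Python. This is only an illustration under assumptions: `rows_to_dict`, `cols` and `vals` are hypothetical names, not part of eclib or Sage, and real code would read these arrays from the C++ object through Cython rather than from Python lists.

```python
def rows_to_dict(cols, vals):
    """Convert eclib-style row arrays to a {(i, j): value} dict with 0-based indices.

    Assumed layout (taken from the description above):
      cols[r][0]      = n, the number of nonzero entries in row r
      cols[r][1:n+1]  = their 1-based column numbers
      vals[r]         = the n corresponding values
    """
    entries = {}
    for r, (col, val) in enumerate(zip(cols, vals)):
        n = col[0]
        for k in range(n):
            j = col[k + 1] - 1  # shift the 1-based column number to 0-based
            entries[(r, j)] = val[k]
    return entries


# A 2 x 3 example: row 0 has entries 5 (column 1) and 7 (column 3); row 1 is zero.
cols = [[2, 1, 3], [0]]
vals = [[5, 7], []]
print(rows_to_dict(cols, vals))  # {(0, 0): 5, (0, 2): 7}
```

A dict of this shape is one of the inputs that Sage's `matrix(ZZ, nrows, ncols, entries, sparse=True)` constructor accepts, so only the nonzero entries ever need to be touched.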
comment:4
I got a rather crude version of this up and running, so here's a patch. The crude part is that I ask I tried this out on my usual compute server:
The regression (48.9s to 71.7s) is annoying; the bottleneck is the line
where
On the memory side (crudely measured by watching And in my intended use case I work modulo some prime, and this is a big win:
so I'm taken care of. New commits:
Commit: |
comment:5
I should add that I have been running this code since I posted it and am observing a memory leak somewhere. Specifically, I've been running this code snippet (for various values of
and observing (via |
comment:6
When eclib was first put into Sage back in about 2007 it was subjected to rigorous valgrinding by Michael Abshoff, which revealed several issues which were then fixed (with some effort!). But there has been considerable change since then, so I should do that again. I have made this an Issue for eclib: JohnCremona/eclib#18 |
comment:7
Update: I can reproduce the memory leak behavior by simply repeating the code from #20788 (as reported therein). That is not to say that running valgrind again on |
comment:8
Oh, now this is bad news.
This is only an issue when It looks like |
comment:9
Apologies for the inconsistencies. There must be quite a few methods in this homspace class which are no longer used in the main programs and whose implementations have therefore not been kept consistent or tested. Unlike Sage where every single method has its own test, I just have high-level test programs and there must be a lot of code which is no longer tested, thought of by me as obsolete but not actually deleted. The number of people who have looked at this code in any detail is very small! |
Branch pushed to git repo; I updated commit sha1. New commits:
comment:11
I went ahead and switched over to Timings are similar, but now on a completely unloaded machine, so more indicative than before:
Author: Kiran Kedlaya |
comment:13
Note to myself: check to see whether the resolution of #22164 has had any effect on the memory leak reported above. |
comment:15
This syntax is deprecated and shouldn't be used anymore:
Cython will generate efficient code for |
comment:16
You should not put Python code inside There are two (not mutually exclusive) cases: (A) If there is any particular non-Python statement that takes a long time or might generate signals, put (B) If you just want to interrupt the loop, put |
comment:17
In the
It should be declared just once. |
comment:18
(NOTE: I just checked the Cython programming issues, I didn't actually try to understand the code; I assume that John can do that) |
comment:19
I looked at the code. Congratulations for being the first person in history other than me and Luis Figueiredo (a student of Taylor in the 1990s) to read my sparse vector / matrix code! What you do is go through the rows, extract the i'th row as a sparse vector, convert it to a dense vector, and then go through its entries, storing the nonzero ones in the new Sage sparse matrix. This is inefficient, as you look at all m*n entries (if the matrix is m x n). It would be better to extract just the nonzero entries in the first place. I can try to write that, or help someone else, but before I do, I need to know whether Cython will have access to the "protected" (in C++ terms), i.e. semi-private, data components of the C++ sparse matrix. If so, I can write the necessary lines.
comment:21
That makes it harder. The thing is an array of "rows" (one for each actual row, even if the row is zero). We know how many from M.nrows(). For the i'th row we store the number of nonzero entries, the list of columns they are in and the list of the entries. The only difficulty is in getting the 0/1 rebasing correct, since C, like Python, counts from 0 but my user interface counts from 1... I can see that providing the interface needed will require adding to eclib. Sorry. Meanwhile it would still be simpler to extract just the nonzero entries from the sparse vector which we already extract for each row. But even that requires access to protected members of the svec class (which are essentially dicts containing (index, entry) pairs). So that leaves this:
I don't expect that to be any better than the current version though since for each i,j the elem() method will do a linear search along the i'th row to see if the j'th entry is present. Sorry not to have a low-level interface! |
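The cost concern can be made concrete with a toy model. Everything here is hypothetical: `ToySparseRow` is a stand-in written for this illustration, not eclib's `svec`, and its `elem()` merely imitates the linear search described above.

```python
class ToySparseRow:
    """Hypothetical sparse row: parallel lists of 1-based columns and values."""

    def __init__(self, cols, vals):
        self.cols = list(cols)
        self.vals = list(vals)
        self.comparisons = 0  # number of column comparisons made by elem()

    def elem(self, j):
        """Return the entry in column j (1-based) by linear search."""
        for c, v in zip(self.cols, self.vals):
            self.comparisons += 1
            if c == j:
                return v
        return 0


def nonzeros_via_elem(row, ncols):
    """Probe every column with elem(); work grows like ncols * (nonzeros per row)."""
    found = {}
    for j in range(1, ncols + 1):
        v = row.elem(j)
        if v != 0:
            found[j] = v
    return found


def nonzeros_direct(row):
    """Read the stored pairs directly; work grows like the number of nonzeros."""
    return dict(zip(row.cols, row.vals))


row = ToySparseRow([2, 9], [4, -1])
assert nonzeros_via_elem(row, 10) == nonzeros_direct(row)
print(row.comparisons)  # comparisons spent probing all 10 columns
```

Both functions return the same pairs, but only the direct version avoids touching columns that hold zeros, which is why access to the stored pairs matters here.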
comment:22
A possible low-level interface could work like a Python iterator delivering triple (i,j,M[i,j]) with some signal at the end such as (0,0,0). Without doing a lot of work that could easily be provided as a single 3xN matrix where N is one more than the number of nonzero entries. Then the user can loop over that. But such a change to eclib is for the future, and should not delay this. |
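A minimal sketch of how such a triple table could be produced and consumed, written in plain Python for illustration. The names `to_triples` and `iter_triples` are hypothetical and nothing like them exists in eclib; the sketch only assumes the 1-based indexing and the (0,0,0) end marker proposed above.

```python
def to_triples(entries):
    """Flatten {(i, j): value} with 1-based indices into a 3 x N table
    [rows, cols, values], where the last column is the sentinel (0, 0, 0)."""
    rows, cols, vals = [], [], []
    for (i, j), v in sorted(entries.items()):
        rows.append(i)
        cols.append(j)
        vals.append(v)
    # Append the end marker, so N is one more than the number of nonzero entries.
    rows.append(0)
    cols.append(0)
    vals.append(0)
    return [rows, cols, vals]


def iter_triples(table):
    """Yield (i, j, value) from the table until the (0, 0, 0) sentinel is reached."""
    for i, j, v in zip(*table):
        if (i, j, v) == (0, 0, 0):
            return
        yield i, j, v


table = to_triples({(1, 2): 5, (3, 1): -2})
for i, j, v in iter_triples(table):
    print(i, j, v)
```

Because row and column numbers start from 1, a genuine entry can never have index 0, so the sentinel cannot be confused with data.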
Branch pushed to git repo; I updated commit sha1. New commits:
comment:24
I fixed the Cython issues (I think). That said,
that should avoid creating a dense vector; but I need to learn a bit more about Cython to figure out how to exploit those. |
Changed upstream from Not yet reported upstream; Will do shortly. to None of the above - read trac for reasoning. |
comment:25
Replying to @kedlaya:
Cython does support that (I'm assuming that
Branch pushed to git repo; I updated commit sha1. New commits:
|
comment:27
I tried to use the iterator following a model in the Cython documentation, but...:
comment:28
Next time, please quote the complete error message:
You forgot |
comment:29
In fact, Cython does type inference, so you could also just remove the line
comment:31
OK, I think I now have a working interface that avoids instantiating any dense vectors. |
comment:33
I made some trivial fixes. Looks good to me. New commits:
Reviewer: David Roe |
Changed branch from u/roed/make_hecke_operators_not_blow_up_the_memory to |
Constructing Hecke operators acting on modular symbols can blow up memory usage, though it doesn't have to. The eclib code has sparse and dense options for generating Hecke operators, but the Sage caller only uses the dense ones. The simple solution is to modify the Sage caller to add a flag that selects the sparse options; a better solution might be to predict whether the resulting matrix is likely to be sparse and choose accordingly.
Upstream: None of the above - read trac for reasoning.
CC: @JohnCremona @kedlaya
Component: modular forms
Keywords: eclib, modular symbols, hecke operators
Author: Kiran Kedlaya
Branch/Commit:
6c3bac5
Reviewer: David Roe
Issue created by migration from https://trac.sagemath.org/ticket/21303