Runtime and RAM usage compared to FIt-SNE #101
Comments
Yes, it's possible to benchmark this, but it's a little involved. I fear there isn't all that much I can do to make it faster: Cython generates a large amount of C code, and there is a lot of overhead in switching between compiled Cython code and Python compared to pure C++.
This has been my suspicion for a long time, actually. I haven't really run into this myself because I typically work on large-RAM machines when doing any actual work, but I've been told this does happen. Unfortunately, for the time being, we still rely on pynndescent, which seems to be the source of almost all my problems with openTSNE. I would really like to replace it with something that doesn't use numba, but I haven't found any good alternative. Numba makes it so easy to use custom metrics, and it is so easy to install, that there's really nothing like it. I've considered switching to annoy several times since it also seems to be faster, but nothing has come of it yet, and from what I remember, it was a pain to package. If you do go into the code and find improvements, PRs are always welcome 😄
I'm trying to understand the RAM usage of pynndescent a bit better here: lmcinnes/pynndescent#90. See also https://twitter.com/hippopedoid/status/1213935495350276096. I think pynndescent is great in that it supports so many metrics and also sparse matrices. That said, it would be great if openTSNE also supported annoy, which seems much more memory-efficient, and allowed choosing between pynndescent and annoy. What's the problem with using annoy? It seems to be on conda-forge: https://anaconda.org/conda-forge/python-annoy
Oh, regarding building the index vs. querying it (the twitter conversation linked above), I just remembered that you mentioned that here: #28 (comment).
Hmm, I should really follow twitter more often; I just saw your new preprint on UMAP initialization. Very interesting work. To be quite honest, pynndescent is amazing. As you said: custom metrics, sparse matrices, and it's quite fast; IMO it's probably the best library for ANN search. However, numba is a huge dependency (it bloats our Orange installer by 40MB or so) and seems unstable. For example, just last week one of my colleagues got segfaults when running some data through pynndescent. Removing a couple of samples fixed it. Totally bizarre stuff, and very difficult to debug. On my new mac, I get some strange errors that I haven't had the time to debug yet. Specifying the number of jobs is a mystery, and I can't ever seem to get it to use the right number of threads. 0.4 broke backward compatibility, which makes me wary of depending on them, as now I have to worry that every new release will break something (now that UMAP depends on pynndescent instead of using its own internal implementation, I hope they'll be better about this).
As far as I remember, annoy has a very limited Python interface, and the only way to add samples to the index is to do it in a for-loop. The same goes for retrieving the distances and nearest neighbors. For-loops are slow in Python, so I worry this silly overhead would take a lot of time. But it might not be that big a deal; we'd really just need to try it. It should be very simple to write a wrapper and try it out.
If they've made substantial changes to how the index is built, it might make sense to build the index with the proper number of neighbors now. But again, this is something that needs to be benchmarked.
And, I've just noticed that the tests are failing due to some numba-related problem from
I can understand the frustration, but IMHO if you entirely removed pynndescent from openTSNE, it'd be a big step back, because it would lose sparse matrix support. Currently openTSNE is the only existing t-SNE implementation supporting sparse inputs.
Hopefully it will be fixed soon... lmcinnes/pynndescent#91 (comment)
Yes, don't worry, I have no intention of removing it until I find something that includes all the features that are so great in pynndescent. Sparse support is very much something that I fully intend to keep. I'm glad they're aware of the issue, and I hope it gets resolved as soon as possible.
Sorry about all the breakage in the 0.4 release, but there were some significant low-hanging performance gains available and I felt it was worth it. Right now not many projects that I know of explicitly depend on pynndescent, so I thought it was better to get the changes made earlier. Hopefully things will stabilise from here. The multi-threading is still a complicated story, since there are multiple multi-threading mechanisms in play. One, written by Tom White, is very powerful and scales up very well; however, it doesn't scale down that well (for 4 cores it is, at best, break-even in performance, and worse for fewer) and is certainly memory intensive. The 0.4 version introduces a numba-based parallel implementation that does not scale up as well, but works well in the 1-8 core range and uses a lot less memory. Ideally I would like to move to that long term, but the numba parallel options are not quite there yet (they are coming). As for breakage with numba -- yes, it seems like the 0.47 version has introduced some recent issues as well. It is frustrating, but on the other hand it is far easier to write and maintain Python using numba than what would be required to get similar performance any other way (Cython, or straight C++). Hopefully there will be fewer issues in the future. Sorry again for all the issues; hopefully they can get resolved.
Hi, thanks for the explanation. As for breaking changes in publicly available software, it's generally best practice to have a grace period where you'd raise deprecation warnings. I appreciate that this must be equally frustrating to you, since I imagine most of your problems are also caused by numba updates. As I said, I think pynndescent is a great piece of software, but it's quite unreliable because things seem to break quite often.
I'm not sure I agree. It seems to me that numba is still quite immature, and if I constantly have to return to openTSNE to fix bugs introduced by some downstream dependency, that does not scream "easy to maintain" to me. I would much rather have a piece of software written in C++, which might have taken a bit longer to develop, but would be faster and more stable. I haven't made changes to the core openTSNE code in probably about a year, yet I find myself constantly returning to it to fix some bug, implement some workaround for numba, or figure out which versions work and which don't. I apologize if I sound a little frustrated, but I've just returned from holiday to find failing tests with cryptic numba error messages which aren't even part of my code. So as not to completely derail this issue: it's my understanding that the
As far as I understood @lmcinnes -- no, this is not correct. He said "explicit" multithreading happens whenever I am not sure what
There was a caching trick used in NN-descent that could provide dramatic speedups; unfortunately, for some datasets this could cause a memory explosion (as too much got cached). The low-memory option turned off this caching. The current version uses a much less aggressive caching approach. The low-memory option still turns it off, but it is much less likely to have as dramatic an effect on memory usage compared to previous versions.
@pavlin-policar: Sorry for the problems I have apparently caused. To be honest, I am somewhat new to maintaining projects of any significant visibility and am very much learning as I go. I will endeavour to provide a smoother path in future. Finally, I agree that numba is immature; on the other hand, so is pynndescent (hence the significant changes it underwent; there are probably more to come, ideally including a distributed-computation version). This is much of the reason why UMAP continues to maintain its own internal implementation for now. I am very sorry that my haste caused so much trouble. I could recommend annoy or nmslib as more stable ANN libraries written in C++.
Ok, so that is very confusing as an end-user, and I would encourage this to be changed. So setting
@lmcinnes I have actually looked at and considered both of them, but I found problems with both (albeit this was quite a while ago). Like I said, I haven't really found anything that has the nice features of
I am not completely sure, but I think the amount of RAM is just directly proportional to

I am also not completely sure if the "implicit" numba multithreading ONLY kicks in with

I think as far as openTSNE is concerned, it might be a good idea to add a separate
I don't like adding yet another parameter specific to pynndescent at all. I understand why you'd like the flexibility, but from a maintenance standpoint, it's not great. Hopefully they change the behavior to something a little more reasonable in the near future; when that happens, we'll have to release a new version to slowly remove this parameter. As I was complaining above, changing the public API should be done with care, and any new parameter we introduce is going to have to be supported for the next couple of versions. Additionally, I feel there are already far too many parameters for the normal user (currently 24!) in the TSNE public API. I'd consider adding this to the API of the NNDescent KNNIndex, which can then be passed to the t-SNE constructor; that would be better IMO.
I am not sure what behaviour from
That part makes perfect sense. The part that doesn't make sense is that when we set
So, thinking this over again, I think the best course of action might be to add Annoy support and make it the default for dense inputs and all metrics that Annoy supports (assuming, of course, that it is not slower; this would need to be checked). For sparse inputs or any other metrics, the code should switch to pynndescent. And of course one should be able to explicitly choose which method is used. Annoy actually says right in its headline that it is "optimized for memory usage". What do you think? Then everything about
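The proposed fallback could be sketched as a small dispatch function. This is purely illustrative; the function name, parameters, and "auto" convention are hypothetical, not openTSNE's actual API, though the metric list matches what Annoy documents as supported:

```python
# Metrics Annoy supports, per its README: angular (~cosine), euclidean,
# manhattan, hamming, dot.
ANNOY_METRICS = {"angular", "euclidean", "manhattan", "hamming", "dot"}

def pick_knn_backend(metric, is_sparse, method="auto"):
    """Hypothetical backend dispatch: prefer Annoy for dense inputs with a
    supported metric (lower memory usage), fall back to pynndescent for
    sparse inputs or exotic metrics."""
    if method != "auto":
        return method  # the user explicitly chose a backend
    if is_sparse or metric not in ANNOY_METRICS:
        return "pynndescent"
    return "annoy"
```

So `pick_knn_backend("euclidean", is_sparse=False)` would return `"annoy"`, while sparse inputs or, say, a custom metric would route to `"pynndescent"`.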
I might be able/willing to try to implement a KNNIndex for Annoy and do some experiments regarding the runtime. Or have you already been looking into it?
@dkobak Sorry, I missed this; my inbox has been a bit of a mess. No, I haven't really had a chance to look at this any further. Have you managed to tinker with this yet?
No, I haven't yet. Should we try to test it together with the changes to the default parameters? It could make for a nice combined update.
I implemented Annoy, and it seems to work well! Should I do a PR? (Showing the same cell twice because I noticed that, for some weird reason, re-running the openTSNE call with pynndescent sometimes makes it execute a lot faster; not sure why, maybe it's some kind of numba quirk? It looks almost as if something needs to be "initialized" the first time pynndescent is run after a kernel restart.) UPDATE: Ran the same thing on my laptop:
This is very nice. The slowdown you noticed is one of the annoying things about numba: when the code is first run, it has to go through JIT compilation, so the first call takes quite a bit longer.
I guess this issue can be closed now after #111. |
I understand that openTSNE is expected to be slower than FIt-SNE, but I'd like to understand how much slower it is in typical situations. As I reported earlier, when I run it on 70000x50 PCA-reduced MNIST data with default parameters and `n_jobs=-1`, I get ~60 seconds with FIt-SNE and ~120 seconds with openTSNE. Every 50 iterations take around 2s vs. around 4s. I did not check for this specific case, but I suspect that the FFT takes only a small fraction of this time, and that the computational bottleneck is the attractive forces. Can one profile openTSNE and see how much time is taken by different steps, such as the repulsive/attractive computations?
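One generic way to get such a breakdown is the standard-library `cProfile` module. The sketch below profiles stand-in functions rather than the real openTSNE calls; note also that cProfile only attributes time to Python-level calls, so time spent inside compiled Cython gradient routines would show up as single opaque entries:

```python
import cProfile
import io
import pstats

# Stand-ins for the actual per-iteration work; in a real run you would
# profile the call to openTSNE's optimize() instead.
def attractive_forces(n):
    return sum(i * 0.5 for i in range(n))

def repulsive_forces(n):
    return sum(i * 0.25 for i in range(n))

def run_embedding(n_iter=50, n=20_000):
    for _ in range(n_iter):
        attractive_forces(n)
        repulsive_forces(n)

profiler = cProfile.Profile()
profiler.enable()
run_embedding()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)  # per-function call counts and cumulative times
```

For line-level detail inside Cython code one would instead need to compile with profiling support enabled, which openTSNE's published wheels presumably don't do.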
Apart from that, and possibly even more worryingly, I replicated the data 6x and added some noise, to get a 420000x50 data matrix. It takes FIt-SNE around 1Gb of RAM to allocate the space for the kNN matrix, so it works just fine on my laptop. However, openTSNE rapidly took >7Gb of RAM and crashed the kernel (I have 16 Gb but around half was taken by other processes). This happened in the first seconds, so I assume it happens during the kNN search. Does pynndescent eat up so much memory in this case?
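For scale, a back-of-the-envelope estimate (an assumption, not a measurement) of just the final k-NN matrix: with the default perplexity of 30, openTSNE looks up k = 3 × perplexity = 90 neighbors per point, so the stored indices and distances alone come to roughly half a gigabyte; anything beyond that is working memory of the search itself.

```python
n_samples = 420_000
perplexity = 30
k = 3 * perplexity  # 90 neighbors per point (openTSNE default heuristic)

indices_bytes = n_samples * k * 8    # int64 neighbor indices
distances_bytes = n_samples * k * 8  # float64 distances
total_gb = (indices_bytes + distances_bytes) / 1024**3
print(f"{total_gb:.2f} GB")  # prints "0.56 GB"
```

So the >7 GB observed would have to come from pynndescent's index construction (e.g. its caching behavior discussed later in this thread), not from the k-NN result itself.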