Early stopping criteria in TSNE set too high #24776
Comments
Thanks for your detailed explanation. I would be curious to know more about your benchmark results. For the convergence issue you are reporting, it seems that updating scikit-learn to version 1.2 would solve the problem, because it changes the default `learning_rate` parameter to `"auto"`.
Hi Pavlin! And thanks, Tom, for pinging me. Several important changes are scheduled to become the default in version 1.2, in particular PCA init (correctly scaled) and O(n) learning rate (see #18018). I have no opinion about the `min_grad_norm` default.
By the way, I checked now, and it seems the 1.2 release is expected in the "coming weeks", which is great!
Hey Dmitry, I'm glad to see these changes finally make their way into scikit-learn!
Sure thing, I'm putting these benchmarks together for openTSNE, so I can ping you once they're finalized. But the benchmarks are pretty much the same as they've always been. For instance, for 1 million data points using 8 cores, openTSNE (FFT) and FIt-SNE take about 15 minutes, openTSNE (BH) roughly 60 minutes, MulticoreTSNE roughly 95 minutes, and scikit-learn roughly 2 hours. From what I can remember, the scikit-learn implementation was particularly slow in the past, so this is a wonderful improvement. Regarding the actual issue at hand, I removed the `min_grad_norm` stopping criterion in my runs.
Would it make sense to separately profile openTSNE (BH) with exact nearest neighbors? Scikit-learn uses exact kNN, whereas openTSNE, FIt-SNE, etc. use approximate kNN, which for 1 million points is of course faster.
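To make the comparison above concrete, here is a small sketch that isolates the exact kNN query that dominates scikit-learn's t-SNE preprocessing (the dataset size and dimensionality here are placeholder stand-ins, not the actual 1.3-million-point benchmark):

```python
# Sketch: timing the exact kNN step that scikit-learn's TSNE performs
# internally, to compare against the approximate kNN used elsewhere.
from time import perf_counter

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(5_000, 50))  # stand-in for the real benchmark data

perplexity = 30
k = 3 * perplexity  # TSNE queries roughly 3 * perplexity neighbors

t0 = perf_counter()
nn = NearestNeighbors(n_neighbors=k).fit(X)
distances, indices = nn.kneighbors(X)
print(f"exact {k}-NN on {X.shape[0]} points: {perf_counter() - t0:.2f}s")
```

Scaling this up to the full dataset would show how much of the total runtime gap is attributable to exact versus approximate neighbor search rather than to the optimization scheme itself.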
I agree that it can be removed, but do not personally feel strongly about it; any API change to scikit-learn would need to go through a long deprecation cycle, which is quite a bit of hassle...
Yes, that would definitely make a lot of sense. But here, I was more interested in the overall runtime than in how fast particular optimization schemes are.
Yeah, I suppose that would be quite a hassle, and I guess it may not be worth it then. I don't have strong feelings on this either, though I would still recommend removing this parameter down the road. I suppose it's up to the core developers to decide what they want to do. Perhaps this issue itself can be useful to future users who run into this problem, and maybe that's enough.
FYI (not necessarily for your benchmark), here is an example of how to precompute approximate nearest neighbors and use them in scikit-learn. Recently, the example has not been showing a big difference over exact nearest neighbors anymore, due to large speedups in scikit-learn's exact nearest neighbors, but with more data points this example could still be useful.
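A minimal sketch of that pattern: a kNN transformer produces a sparse distance graph that `TSNE` consumes via `metric="precomputed"`. Here the exact `KNeighborsTransformer` stands in for an approximate-NN transformer (e.g. one wrapping Annoy or NMSlib), which would expose the same transformer API:

```python
# Sketch: precompute a sparse kNN distance graph and feed it to TSNE.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import KNeighborsTransformer
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))  # toy stand-in data

perplexity = 10
pipeline = make_pipeline(
    # TSNE needs at least 3 * perplexity neighbors; the +1 accounts for
    # each point being its own nearest neighbor in the graph.
    KNeighborsTransformer(mode="distance", n_neighbors=3 * perplexity + 1),
    # With a precomputed metric, init must be "random" (PCA init needs
    # the original feature space).
    TSNE(metric="precomputed", perplexity=perplexity, init="random",
         random_state=0),
)
embedding = pipeline.fit_transform(X)
print(embedding.shape)  # (200, 2)
```

Swapping the first pipeline step for an approximate-NN transformer is the only change needed to reproduce the approximate-kNN setup on large data.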
The issue should stop happening thanks to #19491, which will be part of v1.2. If you/someone finds a dataset where the new behaviour is still problematic, please open a new issue. Let's close this issue for the time being.
Great, glad to hear it!
Actually, the fact that …
As I was benchmarking the scikit-learn TSNE implementation, I ran across a strange problem. I run my benchmarks on a large data set of roughly 1.3 million data points, which can be downloaded from http://file.biolab.si/opentsne/benchmark/10x_mouse_zheng.pkl.gz and opened with `gzip` and `pickle`.
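A minimal loading sketch, assuming the `.pkl.gz` file from the URL above has already been downloaded to the working directory (the exact pickle contents, e.g. whether it is a dict or a bare array, are an assumption here):

```python
# Sketch: decompress and unpickle a .pkl.gz dataset file.
import gzip
import pickle


def load_dataset(path):
    """Load a gzip-compressed pickle file and return its contents."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical usage once the file is downloaded:
# data = load_dataset("10x_mouse_zheng.pkl.gz")
```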
However, when I ran TSNE on this data, I was surprised that the results weren't good at all. For instance, running TSNE with default parameters produces
![image](https://user-images.githubusercontent.com/5758119/198514663-e95f1e6e-9ad0-4a20-88cb-d2e4a5a63625.png)
which has clearly not converged.
The output indicates that the optimization ran for only about 150 iterations in total. I can fix this by setting the `min_grad_norm` parameter to zero, which produces
![image](https://user-images.githubusercontent.com/5758119/198515003-4ef7cdbb-43c6-4fb9-8e6f-4c1bd2cb5093.png)
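The workaround above can be sketched as follows, with synthetic data standing in for the benchmark dataset (the verbose logs shown in the original run are omitted):

```python
# Sketch: disable the gradient-norm stopping criterion so the optimization
# runs for the full iteration budget instead of stopping early.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))  # toy stand-in for the 1.3M-point dataset

# The default min_grad_norm is 1e-7; zero disables the early stop entirely.
tsne = TSNE(min_grad_norm=0.0, random_state=0)
embedding = tsne.fit_transform(X)
print(embedding.shape)  # (300, 2)
```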
This works correctly, but the end result still hasn't converged, which is to be expected with the standard `learning_rate=200`. I was also pleasantly surprised to find that there is now a `learning_rate="auto"` option, which also solves the early stopping issue, producing
![image](https://user-images.githubusercontent.com/5758119/198515452-af989496-b318-4ca3-a265-a25f3f3700f8.png)
This is very similar to what I get with openTSNE using default parameters:
![image](https://user-images.githubusercontent.com/5758119/198516006-acfbe4cd-5399-4bc3-a7ff-e74f64719f6f.png)
The differences likely stem from the initialization (which I'm also glad to see is going to default to "pca" in upcoming versions).
Notice that in this last example, I didn't have to set `min_grad_norm` to zero. However, the default behaviour I've shown in the first example is wrong and should probably be fixed. Perhaps setting `min_grad_norm` to a lower value might be a solution? Removing it altogether wouldn't hurt either. In my experience with t-SNE, I've never come across any meaningful example where the `min_grad_norm` criterion was actually met.

Scikit-learn versions