Integration into linfa #3
Thanks!
Sounds good to me.
Ok, easily done.
Sure.
Currently not much, to be honest. There's just a check on the perplexity value, and NaNs are free to spread.
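A minimal sketch of what stricter input validation could look like, assuming a flat row-major `f64` buffer like the one passed to `bhtsne::run`. The `InputError` type and `validate` function are hypothetical names, not part of the crate, and the `n_samples - 1 >= 3 * perplexity` bound is borrowed from the reference Barnes-Hut t-SNE implementation rather than from this crate.

```rust
/// Hypothetical error type; not part of the bhtsne crate.
#[derive(Debug, PartialEq)]
enum InputError {
    /// The buffer length does not equal n_samples * n_features.
    ShapeMismatch,
    /// A NaN or infinite value was found at this flat index.
    NonFinite { index: usize },
    /// Perplexity is not positive, or too large for the number of samples.
    InvalidPerplexity,
}

/// Rejects inputs that would otherwise let NaNs spread through the run.
fn validate(
    data: &[f64],
    n_samples: usize,
    n_features: usize,
    perplexity: f64,
) -> Result<(), InputError> {
    if data.len() != n_samples * n_features {
        return Err(InputError::ShapeMismatch);
    }
    // Report the first NaN or infinity instead of propagating it.
    if let Some(index) = data.iter().position(|v| !v.is_finite()) {
        return Err(InputError::NonFinite { index });
    }
    // `!(perplexity > 0.0)` also catches a NaN perplexity.
    // Assumed bound: each point needs roughly 3 * perplexity neighbours.
    if !(perplexity > 0.0) || (n_samples as f64 - 1.0) < 3.0 * perplexity {
        return Err(InputError::InvalidPerplexity);
    }
    Ok(())
}
```

Calling `validate(&data, nsamples, nfeatures, perplexity)` before the run would turn a silent NaN embedding into an explicit error.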
OK, I will ping back here once I have a minimal PR.
I opened a PR here: rust-ml/linfa#101. I'm running into two problems with the Iris flower dataset:
Could you please specify the full configurations of parameters that you used?
For the stack overflow:
bhtsne::run(
&mut data,
nsamples,
nfeatures,
&mut y,
2,
1.0,
0.5,
false,
2000,
250,
250,
);
For the NaN output:
bhtsne::run(
&mut data,
nsamples,
nfeatures,
&mut y,
2,
15.0,
0.0,
false,
2000,
250,
250,
);
The former error was caused by an overflow during the computation of the optimal entropy for the P distribution. This is done, roughly, with a binary search over the real numbers, and for very small perplexity values it can take a number of iterations. Although unusual (the paper recommends values between 5 and 50), a perplexity value of 1.0 is fine and the algorithm should be able to handle it. The latter error (the NaN output) was caused by the same issue combined with a bug in the squared Euclidean distance matrix function. Both should be fixed now. I'm currently working on switching to
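For readers unfamiliar with the entropy search mentioned above, here is a rough sketch of the usual t-SNE approach for a single point. It is not the crate's actual code: `search_beta` and its parameters are hypothetical, and it assumes `max_iter >= 1` and a non-empty slice of squared distances to the other points. The loop is iterative and bounded, and the distances are shifted by their minimum so the exponentials cannot all underflow to zero, which is one way to keep small perplexities such as 1.0 from producing NaNs.

```rust
/// Finds a Gaussian precision `beta` whose conditional distribution over
/// the neighbours has (approximately) the requested perplexity.
/// Returns the precision used and the normalised probabilities.
fn search_beta(
    sq_dists: &[f64],
    perplexity: f64,
    max_iter: usize,
    tol: f64,
) -> (f64, Vec<f64>) {
    let target = perplexity.ln(); // target Shannon entropy, in nats
    let (mut beta, mut lo, mut hi) = (1.0_f64, f64::NEG_INFINITY, f64::INFINITY);
    let mut p = vec![0.0; sq_dists.len()];
    // Shifting by the minimum leaves the probabilities unchanged but
    // guarantees at least one exponential equals 1.
    let d_min = sq_dists.iter().cloned().fold(f64::INFINITY, f64::min);

    for iter in 0..max_iter {
        let (mut sum, mut weighted) = (0.0, 0.0);
        for (pj, &d) in p.iter_mut().zip(sq_dists) {
            *pj = (-beta * (d - d_min)).exp();
            sum += *pj;
            weighted += (d - d_min) * *pj;
        }
        // H = ln(sum) + beta * E[d - d_min]
        let entropy = sum.ln() + beta * weighted / sum;
        for pj in p.iter_mut() {
            *pj /= sum;
        }
        let diff = entropy - target;
        if diff.abs() < tol || iter + 1 == max_iter {
            break; // converged, or out of iterations: keep beta matching p
        }
        if diff > 0.0 {
            // Too flat: raise beta (double until an upper bound exists).
            lo = beta;
            beta = if hi.is_infinite() { beta * 2.0 } else { (beta + hi) / 2.0 };
        } else {
            // Too peaked: lower beta (halve until a lower bound exists).
            hi = beta;
            beta = if lo.is_infinite() { beta / 2.0 } else { (beta + lo) / 2.0 };
        }
    }
    (beta, p)
}
```

With a perplexity of 1.0 the target entropy is zero, so the search keeps doubling `beta` until the closest neighbour takes essentially all of the probability mass. The iteration cap keeps that case bounded.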
Sounds good. I'm still a bit confused that you got a stack overflow, though, since you are not using recursion anywhere in your algorithm (at least that's where I ran into one last time). You still have to push your changes 😄
Line 486 in dce8862
Done, thanks for the tip.
Also done.
bhtsne::run(
&mut data,
nsamples,
nfeatures,
&mut y,
2,
1.0,
0.5, // theta
false,
2000,
250,
250,
);
If you find any other quirks, please let me know. I'd also like to ask the following question: why is the
I see!
Mainly for ergonomic reasons, so that people can choose the precision of their floating-point values without the need to cast.
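To illustrate the ergonomics argument, here is a small sketch of a function that is generic over the float type, assuming that is what the question was about. The `Real` trait is a hypothetical stand-in written only to keep the example self-contained. A real crate would more likely rely on something like the `Float` trait from `num-traits`.

```rust
use std::ops::{Add, Mul, Sub};

/// Hypothetical minimal float abstraction covering only what is needed here.
trait Real: Copy + Add<Output = Self> + Sub<Output = Self> + Mul<Output = Self> {
    const ZERO: Self;
}

impl Real for f32 {
    const ZERO: Self = 0.0;
}

impl Real for f64 {
    const ZERO: Self = 0.0;
}

/// Squared Euclidean distance between two points of equal dimension.
/// Works for f32 and f64 inputs alike, with no casting at the call site.
fn squared_euclidean<T: Real>(a: &[T], b: &[T]) -> T {
    a.iter().zip(b).fold(T::ZERO, |acc, (&x, &y)| {
        let d = x - y;
        acc + d * d
    })
}
```

A caller holding `f32` data can pass it directly and gets an `f32` back, and the same holds for `f64`.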
I just saw your post on Reddit, awesome work! I'm the maintainer of linfa and thought about implementing t-SNE as a transformative dimensionality reduction technique in the past, but never got around to it. This crate can save us a lot of work. We would implement a wrapper which adapts your algorithm by:
Sounds good? I just quickly glanced at the source code and three things stood out which could be improved:
csv dependency optional; sometimes it's not necessary to pull that in