[MRG+3] FEA Add PolynomialCountSketch to Kernel Approximation module #13003
Conversation
Added the Tensor Sketch algorithm to the kernel approximation module.
Removed or added whitespace where needed.
That is, polynomial kernels with coef0 != 0.
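For background, a polynomial kernel with coef0 != 0 can be reduced to the homogeneous (coef0 == 0) case by appending a constant feature. Below is a sketch of that standard trick — my illustration, not the PR's code; all names and values are made up:

```python
import numpy as np

# (gamma * x.T @ y + coef0) ** degree equals
# (gamma * x_aug.T @ y_aug) ** degree with x_aug = [x, sqrt(coef0 / gamma)].
gamma, coef0, degree = 0.5, 2.0, 3
rng = np.random.RandomState(0)
x, y = rng.randn(4), rng.randn(4)

x_aug = np.append(x, np.sqrt(coef0 / gamma))
y_aug = np.append(y, np.sqrt(coef0 / gamma))

lhs = (gamma * x @ y + coef0) ** degree
rhs = (gamma * x_aug @ y_aug) ** degree
assert np.isclose(lhs, rhs)
```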
You could use:

```python
try:
    from scipy import fft
except ImportError:  # scipy < 1.4
    from scipy import fftpack as fft
```

scipy.fft uses the newer pocketfft implementation (scipy/scipy#10238), which is more optimized in some cases (numpy/numpy#11888 (comment)); cf. https://github.com/scipy/scipy/wiki/Release-note-entries-for-SciPy-1.4.0#scipyfft-added for more details.
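For context, here is a minimal sketch of how the import shim above could be used, assuming count sketches stored as an (n_samples, degree, n_components) array — the variable names and shapes are illustrative, not the PR's code:

```python
import numpy as np

try:
    from scipy import fft
except ImportError:  # scipy < 1.4
    from scipy import fftpack as fft

# Tensor Sketch combines the per-degree count sketches by multiplying
# their DFTs componentwise and transforming back (circular convolution).
count_sketches = np.random.randn(100, 3, 16)   # illustrative data
sketches_fft = fft.fft(count_sketches, axis=2)
combined = sketches_fft.prod(axis=1)           # product over the degree axis
features = np.real(fft.ifft(combined, axis=1))
```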
Co-authored-by: Christian Lorentzen <lorentzen.ch@googlemail.com>
@rth I think I addressed all your comments. All checks passing :) Maybe have a look at how I phrased things.
```python
for d in range(self.degree):
    iHashIndex = self.indexHash_[d, j]
    iHashBit = self.bitHash_[d, j]
    count_sketches[:, d, iHashIndex] += \
        (iHashBit * X_gamma[:, j]).toarray().ravel()
```
It doesn't need to change now, but the following should be faster, assuming a relatively low-sparsity matrix. For typical matrices obtained by CountVectorizer, this makes the transform around 5x faster, e.g. with 10k samples and 8k input features:
Suggested change:

```python
Xg_col = X_gamma[:, j]
for d in range(self.degree):
    iHashIndex = self.indexHash_[d, j]
    iHashBit = self.bitHash_[d, j]
    # The following requires X_gamma to be in CSC sparse format
    count_sketches[Xg_col.indices, d, iHashIndex] += \
        (iHashBit * Xg_col.data)
```
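To see why this works: slicing a CSC matrix by column yields a sparse column whose `.indices` are the row positions of its non-zeros, so the fancy-indexed update touches only those rows. A toy illustration — not the PR's code, all names made up:

```python
import numpy as np
from scipy import sparse

X = sparse.random(6, 3, density=0.5, format="csc", random_state=0)
col = X[:, 1]    # sparse column, still in CSC format
out = np.zeros(6)
# Update only the rows where the column has non-zeros, skipping the
# dense materialization that .toarray().ravel() would require.
out[col.indices] += 2.0 * col.data
assert np.allclose(out, 2.0 * np.asarray(X[:, 1].todense()).ravel())
```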
Thanks! I have been experimenting with this approach for text classification on a 10k-sample subset of the AG News dataset. Granted, it's probably not a typical use case, but I still wanted to check that the results generally make sense. Below are classification accuracy results, obtained with the following notebook: tensorsketch-experiments-sparse.py. Test scores here are computed without cross-validation, as it already takes a long time (and requires the above optimization for sparse input), so they are not too reliable. There are also likely overfitting issues with LinearSVC and a large number of components. Main takeaways:
Overall +1 for merging, after a few more documentation clarifications on how to choose the parameters.
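As a rough illustration of the experimental setup described above, the pipeline might look like the following. This is a sketch, not the notebook's code; the vectorizer settings, degree, and n_components are assumptions:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.kernel_approximation import PolynomialCountSketch
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Sparse counts -> polynomial kernel feature map approximation -> linear
# classifier. Parameters are illustrative, not the notebook's settings.
clf = make_pipeline(
    CountVectorizer(),
    PolynomialCountSketch(degree=2, n_components=1000),
    LinearSVC(),
)
# clf.fit(train_texts, train_labels)
# print(clf.score(test_texts, test_labels))
```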
Another comment for reviewers is that I'm not convinced by the name `TensorSketch`, which is the algorithm's original name. WDYT?
@rth, as you requested:
Regarding the name of the main class, what about `PolynomialCountSketch`?
`PolynomialCountSketch` sounds better indeed.
Thanks, LGTM. Let's wait a few more days to see if there are any objections to the `PolynomialCountSketch` name (cf. #13003 (comment)), and if there are none, rename it and merge.
+1 for `PolynomialCountSketch`
(and hope nothing breaks)
Seems like everyone is happy with the new name (`PolynomialCountSketch`). Do I have to squash the ugly commit history in this branch into a single, clean commit? Anyway, let me know if there is anything left on my side. Super excited about having my first contribution to sklearn merged! 🥳
@lopeLH Thank you very much for your contribution and your patience! And feel free to continue, if you'd like, on one of the possible follow-ups that @rth listed. I'm also excited, as this is my first merge; hoping everything went fine.
Woohoo! Great! Exciting!
FEA Add PolynomialCountSketch to Kernel Approximation module (scikit-learn#13003)

* Add Tensor Sketch algorithm
* Add user guide entry
* Add example
* Add benchmark

Co-authored-by: Christian Lorentzen <lorentzen.ch@googlemail.com>
Co-authored-by: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
This PR adds the Tensor Sketch [1] algorithm for polynomial kernel feature map approximation to the Kernel Approximation module.
Tensor Sketch is a well-established method for kernel feature map approximation that has been broadly applied in the literature; for instance, it has recently gained popularity as a way to accelerate certain bilinear models [2]. While the kernel approximation module already contains various approximation methods, polynomial kernels are missing, so including Tensor Sketch completes the module's functionality with an efficient, data-independent polynomial kernel approximation technique.
The PR contains the implementation of the algorithm, the corresponding tests, an example script, and a description of the algorithm in the documentation page of the kernel approximation module. The implementation has been verified to produce the same results as the original MATLAB implementation provided by the author of the algorithm [1].
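As a quick illustration of the added functionality — a sketch with arbitrary shapes and parameters, not the PR's test code — the approximate kernel matrix should approach the exact polynomial kernel as n_components grows:

```python
import numpy as np
from sklearn.kernel_approximation import PolynomialCountSketch
from sklearn.metrics.pairwise import polynomial_kernel

rng = np.random.RandomState(0)
X = rng.randn(50, 10)

# Explicit feature map approximating a degree-2 polynomial kernel.
ps = PolynomialCountSketch(degree=2, gamma=1.0, coef0=0,
                           n_components=4096, random_state=0)
Z = ps.fit_transform(X)

K_exact = polynomial_kernel(X, degree=2, gamma=1.0, coef0=0)
K_approx = Z @ Z.T
print(np.abs(K_exact - K_approx).mean())  # shrinks as n_components grows
```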
[1] Pham, N., & Pagh, R. (2013, August). Fast and scalable polynomial kernels via explicit feature maps. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 239-247). ACM.
[2] Gao, Y., Beijbom, O., Zhang, N., & Darrell, T. (2016). Compact bilinear pooling. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 317-326).
Work for follow-up PR:
@rth: This can be a follow-up issue/PR and would need double-checking, but since the count_sketches input is real, you can likely use rfft and irfft, which would be faster (see the sketch after this list).
@rth: The issue is that calling fit twice would produce a different seed and therefore a different result, since a RandomState instance is mutable. In [MRG] Expose random seed in Hashingvectorizer #14605, for a similar use case, we added a seed variable in transform, but I'm not particularly happy with that outcome either. This is probably fine as is; we would just have to address this globally at some point in RFC design of random_state #14042.
@rth: It could be worth considering whether it would make sense to threshold (in the above example, 1e-15 would be OK as a threshold) and convert back to sparse. Though the intermediate step with the FFT would still be dense, with the associated memory requirements; maybe it could be worth chunking with respect to n_samples, not sure ([MRG+3] FEA Add PolynomialCountSketch to Kernel Approximation module #13003 (comment)).
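A minimal sketch of the rfft/irfft idea from the first follow-up above, assuming scipy >= 1.4 and count sketches of shape (n_samples, degree, n_components); shapes and data are illustrative:

```python
import numpy as np
from scipy import fft

# For real input, rfft/irfft yield the same circular-convolution
# combination as the full complex FFT, at roughly half the cost.
x = np.random.randn(8, 3, 16)  # (n_samples, degree, n_components)

full = np.real(fft.ifft(fft.fft(x, axis=2).prod(axis=1), axis=1))
real = fft.irfft(fft.rfft(x, axis=2).prod(axis=1), n=16, axis=1)

assert np.allclose(full, real)
```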