
[MRG+3] FEA Add PolynomialCountSketch to Kernel Approximation module #13003

Merged
119 commits merged into scikit-learn:master on Aug 18, 2020

Conversation

@lopeLH (Contributor) commented on Jan 17, 2019

This PR adds the Tensor Sketch [1] algorithm for polynomial kernel feature map approximation to the Kernel Approximation module.

Tensor Sketch is a well-established method for kernel feature map approximation that has been broadly applied in the literature. For instance, it has recently gained popularity as a way to accelerate certain bilinear models [2]. While the kernel approximation module already contains various approximation methods, polynomial kernels are missing, so including Tensor Sketch completes the functionality of this module by providing an efficient, data-independent polynomial kernel approximation technique.

The PR contains the implementation of the algorithm, the corresponding tests, an example script, and a description of the algorithm in the documentation page of the kernel approximation module. This implementation has been tested to produce the same results as the original MATLAB implementation provided by the author of the algorithm [1].

[1] Pham, N., & Pagh, R. (2013, August). Fast and scalable polynomial kernels via explicit feature maps. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 239-247). ACM.

[2] Gao, Y., Beijbom, O., Zhang, N., & Darrell, T. (2016). Compact bilinear pooling. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 317-326).
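A minimal usage sketch of the estimator as merged (under its final name PolynomialCountSketch, adopted later in this thread); the data and parameter values here are illustrative only, not part of the original PR description:

import numpy as np
from sklearn.kernel_approximation import PolynomialCountSketch
from sklearn.metrics.pairwise import polynomial_kernel

rng = np.random.RandomState(0)
X = rng.randn(200, 20)

# Exact degree-2 polynomial kernel: (gamma * <x, y> + coef0) ** degree
K_exact = polynomial_kernel(X, degree=2, gamma=1.0, coef0=0)

# Tensor Sketch approximation: the inner product of the mapped features
# approximates the kernel, improving as n_components grows.
ps = PolynomialCountSketch(degree=2, gamma=1.0, coef0=0,
                           n_components=1000, random_state=0)
X_mapped = ps.fit_transform(X)
K_approx = X_mapped @ X_mapped.T

print(np.max(np.abs(K_exact - K_approx)))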


Work for follow-up PR:

  • @rth: This can be a follow-up issue/PR, and would need double checking, but since the count_sketches input is real you can likely use rfft and irfft, which would be faster (see the sketch after this list).

  • @rth: The issue is that calling fit twice would produce a different seed and therefore a different result, since a RandomState instance is mutable. In [MRG] Expose random seed in Hashingvectorizer #14605, for a similar use case, we added a seed variable in transform, but I'm not particularly happy with that outcome either. This is probably fine as is; we would just have to address this globally at some point in RFC design of random_state #14042.

  • @rth: It could be worth considering whether it would make sense to threshold (in the above example 1e-15 would be OK as a threshold) and convert back to sparse. Though the intermediary step with the FFT would still be dense, with the associated memory requirements; maybe it could be worth chunking with respect to n_samples, not sure (#13003 (comment)).
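Regarding the first item, a rough sketch (not the merged implementation) of how rfft/irfft could be used, assuming count_sketches is a real-valued array of shape (n_samples, degree, n_components):

from scipy.fft import rfft, irfft

def tensorsketch_combine(count_sketches, n_components):
    # One-sided FFT of each degree-wise count sketch (input is real).
    sketches_fft = rfft(count_sketches, axis=2)
    # Element-wise product across the degree axis corresponds to the
    # circular convolution of the individual count sketches.
    prod = sketches_fft.prod(axis=1)
    # Back to the original domain; n recovers the full length, and the
    # result is real by construction, so no .real is needed.
    return irfft(prod, n=n_components, axis=1)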

@lopeLH changed the title [WIP] Add Tensor Sketch algorithm to Kernel Approximation module → [MRG] Add Tensor Sketch algorithm to Kernel Approximation module on Jan 18, 2019
@rth (Member) commented on Aug 12, 2020

Back to fftpack

You could use,

try:
    from scipy import fft
except ImportError:   # scipy < 1.4
    from scipy import fftpack as fft

scipy.fft uses the newer pocketfft implementation (scipy/scipy#10238), which is more optimized in some cases (numpy/numpy#11888 (comment)).

cf https://github.com/scipy/scipy/wiki/Release-note-entries-for-SciPy-1.4.0#scipyfft-added for more details.

@lopeLH (Contributor, Author) commented on Aug 12, 2020

@rth I think I addressed all your comments. All checks passing :)

Maybe have a look at how I phrased things at doc/modules/kernel_approximation.rst (line 175), regarding Count sketch and its role in TensorSketch.

@lopeLH requested a review from rth on August 12, 2020 13:31
Comment on lines +167 to +171
for d in range(self.degree):
    iHashIndex = self.indexHash_[d, j]
    iHashBit = self.bitHash_[d, j]
    count_sketches[:, d, iHashIndex] += \
        (iHashBit * X_gamma[:, j]).toarray().ravel()

It doesn't need to change now, but the following should be faster, assuming a relatively low-sparsity matrix. For typical matrices obtained by CountVectorizer this makes the transform around 5x faster, e.g. with 10k samples and 8k input features.

Suggested change
-    for d in range(self.degree):
-        iHashIndex = self.indexHash_[d, j]
-        iHashBit = self.bitHash_[d, j]
-        count_sketches[:, d, iHashIndex] += \
-            (iHashBit * X_gamma[:, j]).toarray().ravel()
+    Xg_col = X_gamma[:, j]
+    for d in range(self.degree):
+        iHashIndex = self.indexHash_[d, j]
+        iHashBit = self.bitHash_[d, j]
+        # The following requires X_gamma to be in CSC sparse
+        # format
+        count_sketches[Xg_col.indices, d, iHashIndex] += \
+            (iHashBit * Xg_col.data)

@rth (Member) commented on Aug 13, 2020

Thanks! I have been experimenting with this approach for text classification on a 10k-sample subset of the AG News dataset. Granted, it's probably not a typical use case, but I still wanted to check that the results generally make sense. Below are the classification accuracy results:

| label | fit_time | train accuracy | test accuracy |
| --- | --- | --- | --- |
| baseline: CountVectorizer + LinearSVC | 0.18 | 0.98 | 0.84 |
| baseline w/ PCA(n_components=100) | 0.64 | 0.84 | 0.83 |
| baseline w/ SparseRandomProjection(n_components=1000) | 0.29 | 0.51 | 0.49 |
| baseline w/ SVC(kernel='poly', degree=2) | 9.68 | 0.99 | 0.86 |
| baseline w/ PolynomialSampler(degree=2, n_components=100) | 1.12 | 0.36 | 0.29 |
| baseline w/ PolynomialSampler(degree=2, n_components=1000) | 3.94 | 0.65 | 0.43 |
| baseline w/ PolynomialSampler(degree=2, n_components=10000) | 25.34 | 0.99 | 0.62 |
| baseline w/ PolynomialSampler(degree=2, n_components=20000) | 53.13 | 0.99 | 0.72 |
| baseline w/ PolynomialSampler(degree=2, n_components=40000) | 106.51 | 0.99 | 0.78 |
| baseline w/ PolynomialSampler(degree=2, n_components=100000) | 269.17 | 0.99 | 0.84 |

obtained with the following notebook tensorsketch-experiments-sparse.py. Here the test scores are computed without cross-validation, as it already takes a long time (and requires the above optimization for sparse input), so they are not too reliable. There are also likely overfitting issues with LinearSVC and a large number of components.
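For context, a rough sketch of the kind of pipeline benchmarked above, with a toy dataset standing in for the AG News subset and using the merged class name PolynomialCountSketch (called PolynomialSampler at this point in the thread):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.kernel_approximation import PolynomialCountSketch
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in for the AG News subset used in the benchmark above.
texts = ["cheap flights to paris", "stocks rally on earnings",
         "team wins championship game", "new phone released today"]
labels = [2, 0, 1, 3]

clf = make_pipeline(
    CountVectorizer(),
    # n_components should typically be (much) larger than the number of
    # vectorized features; see the takeaways below.
    PolynomialCountSketch(degree=2, n_components=1000, random_state=0),
    LinearSVC(),
)
clf.fit(texts, labels)
print(clf.score(texts, labels))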

Main takeaways,

  1. We need to explicitly check in fit that degree >= 1, otherwise the FFT fails with a 0-d input matrix for degree=0. It would also be good to add a test that checks this exception with pytest.raises(ValueError, match='degree=0 should be >=1.') (see the sketch after this list).
  2. Results will be very bad for n_components < n_features, even when evaluating on the training subset. As far as I can tell, both the paper and the example only illustrate the case n_components > n_features, with an optimum of evaluation score / run time cost around n_components=10*n_features. We should add some of these suggestions to the docstring of n_components (and add a sentence to the user guide) to help choose n_components. Otherwise users may be very disappointed with the performance when using the default n_components=100 on a higher-dimensional case.
  3. For the sparse case, point 2 implies that it's not very usable, because one gets dense output matrices with 10k+ features. Empirically, with sparse input and n_components > n_features, the resulting dense matrix actually contains mostly zeros. So in a follow-up PR, it could be worth considering whether it would make sense to threshold (in the above example 1e-15 would be OK as a threshold) and convert back to sparse. Though the intermediary step with the FFT would still be dense, with the associated memory requirements; maybe it could be worth chunking with respect to n_samples, not sure.
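As a sketch of the test suggested in point 1 (the class name follows the final rename adopted later in this thread; the exact error message in the merged code may differ):

import pytest
from sklearn.kernel_approximation import PolynomialCountSketch

def test_polynomial_count_sketch_invalid_degree():
    # degree=0 has no meaningful feature map and breaks the FFT step,
    # so fit should raise a clear ValueError.
    ps = PolynomialCountSketch(degree=0)
    with pytest.raises(ValueError, match="degree=0 should be >=1."):
        ps.fit([[1.0, 2.0], [3.0, 4.0]])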

Overall +1 for merging after a few more documentation clarifications on how to choose n_components and a check/test for degree < 1.

@rth (Member) commented on Aug 13, 2020

Another comment for reviewers is that I'm not convinced by the name PolynomialSampler. This algorithm does no sampling as far as I can see, and there is not a single occurrence of the word "sampling" in either of the cited papers. Rather, roughly, it applies a convolution over hashed feature spaces (via a product in FFT space). Also, the actual implementation looks more like a random projection than hashing. So the current name is a bit misleading IMO.

The original name TensorSketch might be a bit opaque but how about,

  • PolynomialCountSketch or PolynomialTensorSketch: count sketch is the correct name for this hashing technique. It may be a bit exotic/new (invented in 2003), but there is a Wikipedia page https://en.wikipedia.org/wiki/Count_sketch and around 14k search results
  • PolynomialFeatureHasher could have worked except that FeatureHasher works with completely different input
  • PolynomialProjection by analogy with random projection as it's a bit related
  • or some name with a combination of "Polynomial" and "Hashing" or "Projection"

WDYT?

@lopeLH (Contributor, Author) commented on Aug 13, 2020

@rth, as you requested:

  • Added a check enforcing degree >= 1 in the fit method of PolynomialSampler.
  • Added a test checking that PolynomialSampler raises an error when given a degree lower than one.
  • Added the suggested hints regarding the selection of n_components to the docstring and user guide (please have a look, I don't trust my English).

Regarding the name of the main class, PolynomialCountSketch sounds good to me.

@lopeLH requested a review from rth on August 13, 2020 13:25
@TomDLT (Member) left a comment

PolynomialCountSketch sounds better indeed.

sklearn/kernel_approximation.py (outdated review thread, resolved)
@rth (Member) left a comment

Thanks, LGTM. Let's wait a few more days to see if there are any objections to the PolynomialCountSketch name (cf. #13003 (comment)). And if there are none, rename it and merge.

@lorentzenchr (Member)

+1 for PolynomialCountSketch from my side.

@TomDLT changed the title [MRG+1] Add Tensor Sketch algorithm to Kernel Approximation module → [MRG+3] Add Tensor Sketch algorithm to Kernel Approximation module on Aug 17, 2020
@lopeLH (Contributor, Author) commented on Aug 17, 2020

Seems like everyone is happy with the new name (PolynomialCountSketch), so I performed the change.

Do I have to squash the ugly commit history in this branch into a single, clean commit? Anyway, let me know if there is anything left on my side.

Super excited about having my first contribution to sklearn merged! 🥳

@lorentzenchr merged commit daebcac into scikit-learn:master on Aug 18, 2020
@lorentzenchr changed the title [MRG+3] Add Tensor Sketch algorithm to Kernel Approximation module → [MRG+3] FEA Add PolynomialCountSketch to Kernel Approximation module on Aug 18, 2020
@lorentzenchr (Member) commented on Aug 18, 2020

@lopeLH Thank you very much for your contribution and your patience! And feel free to continue, if you'd like, with one of the possible follow-ups that @rth listed.
You don't need to squash commits. That's done automatically when we merge.

I'm also excited, as this is my first merge; hoping everything went fine.

@GaelVaroquaux (Member) commented on Aug 18, 2020 via email

jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
…learn#13003)

* Add Tensor Sketch algorithm

* Add user guide entry

* Add example

* Add benchmark

Co-authored-by: Christian Lorentzen <lorentzen.ch@googlemail.com>
Co-authored-by: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>