Precompute norms to speed up Passive-Aggressive #1888

Open · wants to merge 9 commits

Contributor

kemaleren commented Apr 23, 2013

Fixes #1327.

Not sure why, but performance is worse for small datasets. Using the following benchmark code:

from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.datasets import make_classification

learner = PassiveAggressiveClassifier(n_iter=50, random_state=0)
x, y = make_classification(n_samples=100, n_features=20, random_state=0)
learner.fit(x, y)

The speedup was 0.85 (before: 1.75 ms, now: 2.06 ms).

However, with n_iter=500, n_samples=10000, n_features=2000, the speedup was 1.36 (before: 34.4 s, now: 25.3 s).
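
The exact timing harness isn't shown above; something along these lines, timing repeated fits with timeit, reproduces the comparison.

import timeit

from sklearn.datasets import make_classification
from sklearn.linear_model import PassiveAggressiveClassifier

x, y = make_classification(n_samples=100, n_features=20, random_state=0)
learner = PassiveAggressiveClassifier(n_iter=50, random_state=0)

# Best of three runs of ten fits each, reported per fit.
print(min(timeit.repeat(lambda: learner.fit(x, y), number=10, repeat=3)) / 10)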

@larsmans larsmans and 2 others commented on an outdated diff Apr 24, 2013

sklearn/utils/seq_dataset.pyx
@@ -176,3 +222,24 @@ cdef class CSRDataset(SequentialDataset):
cdef void shuffle(self, seed):
np.random.RandomState(seed).shuffle(self.index)
+
+
+cdef double onenorm(DOUBLE * x_data_ptr, INTEGER * x_ind_ptr, int xnnz):
+ cdef double x_norm = 0.0
+ cdef int j
@larsmans

larsmans Apr 24, 2013

Owner

This cdef shouldn't be necessary. (Same story in sqnorm.)

@kemaleren

kemaleren Apr 24, 2013

Contributor

Why not? The function has C arguments, so it needs to be cdefed. If I use def instead, Cython complains.

@mblondel

mblondel Apr 24, 2013

Owner

I think onenorm can be removed. We should add it if we actually need it in the future.

@larsmans

larsmans Apr 24, 2013

Owner

I'm talking about the cdef int j. Cython can derive that information from the range(xnnz).

@kemaleren

kemaleren Apr 24, 2013

Contributor

Oh, that makes sense. Just fixed it. Sorry for the misunderstanding.

@mblondel

mblondel Apr 24, 2013

Owner

Yeah, I meant using a pointer and doing the malloc / free manually. This is faster in my experience.
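
For readers skimming the Cython: onenorm and sqnorm act on the stored values of a single CSR row. A rough NumPy equivalent, with made-up *_py names purely for illustration:

import numpy as np

def onenorm_py(row_data):
    # L1 norm of one row: sum of absolute values of the stored entries.
    return np.abs(row_data).sum()

def sqnorm_py(row_data):
    # Squared L2 norm of one row: dot product of the stored entries with themselves.
    return np.dot(row_data, row_data)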

@larsmans larsmans and 1 other commented on an outdated diff Apr 24, 2013

sklearn/utils/seq_dataset.pyx
@@ -44,6 +46,50 @@ cdef class SequentialDataset:
"""Permutes the ordering of examples. """
raise NotImplementedError()
+ cdef void precompute_norms(self, bool square=True):
+ """Computes and stores the norm of each example.
+
+ The norm for the current example can then be retrieved via
+ get_norm().
+
+ If bool == True, computes the square norm, else computes the 1 norm.
@larsmans

larsmans Apr 24, 2013

Owner

I suppose you mean the L1 norm?

@kemaleren

kemaleren Apr 24, 2013

Contributor

Yes, and I forgot to take the absolute value. Fixed now.

@larsmans larsmans and 2 others commented on an outdated diff Apr 24, 2013

sklearn/linear_model/sgd_fast.pyx
@@ -434,6 +434,8 @@ def plain_sgd(np.ndarray[DOUBLE, ndim=1, mode='c'] weights,
eta = eta0
+ dataset.precompute_norms()
@larsmans

larsmans Apr 24, 2013

Owner

This is done unconditionally, while it's only relevant to the PA learner.

(I'm starting to have second thoughts about the merger of PA into SGD, with all this special-purpose code in conditional blocks...)

@kemaleren

kemaleren Apr 24, 2013

Contributor

Fixed. Thanks for noticing!

@mblondel

mblondel Apr 24, 2013

Owner

(I'm starting to have second thoughts about the merger of PA into SGD, with all this special-purpose code in conditional blocks...)

If I remember correctly, we inherit a lot of useful code on the pure Python side, not so much on the Cython side.

@mblondel mblondel commented on the diff Apr 24, 2013

sklearn/utils/seq_dataset.pxd
ctypedef np.float64_t DOUBLE
ctypedef np.int32_t INTEGER
cdef class SequentialDataset:
cdef Py_ssize_t n_samples
+ cdef np.ndarray norms
@mblondel

mblondel Apr 24, 2013

Owner

I wonder if it would be faster to use a pointer here...

@larsmans

larsmans Apr 24, 2013

Owner

Just a pointer would not keep the NumPy object alive, I think.

@kemaleren

kemaleren Apr 24, 2013

Contributor

I just tried it. Not with a NumPy array at all though, just malloc. It is slightly faster. I'll push it now.

@GaelVaroquaux

GaelVaroquaux Apr 24, 2013

Owner

Just a pointer would not keep the NumPy object alive, I think.

No. We would have to do reference counting in addition. I'd really like
to stay away from such things.

@GaelVaroquaux

GaelVaroquaux Apr 24, 2013

Owner

I just tried it. Not with a NumPy array at all though, just malloc. It is
slightly faster. I'll push it now.

I had dreamed of a scikit with no malloc. The maintenance cost of manual
memory management is large (for instance, are you testing that your
malloc succeeds, and raising a MemoryError?). What is 'slightly' faster?

@mblondel mblondel and 1 other commented on an outdated diff Apr 24, 2013

sklearn/utils/seq_dataset.pyx
+ # no assumptions about our internal data structures (except
+ # the presence of self.current_index)
+ cdef int current_index_backup = self.current_index
+ self.current_index = 0
+ cdef DOUBLE * x_data_ptr = NULL
+ cdef INTEGER * x_ind_ptr = NULL
+ cdef int xnnz
+ cdef DOUBLE y = 0.0
+ cdef DOUBLE sample_weight
+
+ cdef np.ndarray[DOUBLE, ndim=1,
+ mode='c'] norms= np.zeros((self.n_samples,),
+ dtype=np.float64)
+ self.norms = norms
+
+ if square:
@mblondel

mblondel Apr 24, 2013

Owner

In #1327, what I meant by the square argument is to just compute sqnorm(...) if square is True and compute sqrt(sqnorm(...)) otherwise. Nothing related to L1 norm vs. L2 norm. Sorry for the confusion.

@kemaleren

kemaleren Apr 24, 2013

Contributor

No problem. I'll fix it now.
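
With that clarified, the intended behaviour of the square flag is roughly the pure-NumPy sketch below; the function name and the dense-only handling are illustrative only, since the actual code works on the CSR buffers in seq_dataset.pyx.

import numpy as np

def precompute_norms_py(X, square=True):
    # Per-sample squared L2 norms for a dense X; take the square root when
    # square=False, matching the clarified meaning of the flag. Sketch only;
    # the Cython version stores the result for later get_norm() lookups.
    sqnorms = np.einsum('ij,ij->i', X, X)
    return sqnorms if square else np.sqrt(sqnorms)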

Owner

GaelVaroquaux commented Apr 24, 2013

If this is really only slightly faster, then -1; this makes the code much harder to reason about while it only benefits one (not very popular) estimator.

Contributor

kemaleren replied Apr 24, 2013

Okay, I'll change it back. It was really a very small improvement in speed anyways: 0.3 seconds on small datasets, and no improvement on large ones.

Owner

mblondel commented Apr 25, 2013

Pre-computing the square norms should only be slower if it computes norms which are not actually needed. This is not the case (the algorithm iterates over all training samples), so if it's slower, it is an artifact of the implementation.

Owner

mblondel commented Apr 25, 2013

If this is really only slightly faster, then -1; this makes the code much harder to reason about while it only benefits one (not very popular) estimator.

That's too bad. Passive Aggressive is easier to use than SGD since it doesn't need the tuning of a learning rate.

Owner

larsmans commented Apr 25, 2013

In the dense case, can't BLAS be used to get the norms?

Owner

GaelVaroquaux commented Apr 25, 2013

That's too bad. Passive Aggressive is easier to use than SGD since it doesn't
need the tuning of a learning rate.

I was talking about the malloc: I would tend to think that it makes only
a small difference.

Contributor

kemaleren commented Apr 25, 2013

if it's slower, it means that this is an artifact of the implementation.

Could it be the creation of the numpy array to hold the norms? It was just as fast as the original version when using malloc.
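
One quick way to test that hypothesis is to time the allocation of the norms array by itself and compare it against the small gap observed in the benchmark above.

import timeit
import numpy as np

# Cost of allocating the per-sample norms array for the small benchmark
# (100 samples, float64); compare against the slowdown observed above.
print(timeit.timeit(lambda: np.zeros(100, dtype=np.float64), number=100000))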

Contributor

kemaleren commented Apr 25, 2013

In the dense case, can't BLAS be used to get the norms?

Would this be for speed? I can try it and see.
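
In the dense case this would amount to something like the sketch below: dnrm2 from scipy.linalg.blas gives each row's L2 norm, and a vectorized einsum is the pure-NumPy alternative. This is only an illustration of the idea, not code from this PR.

import numpy as np
from scipy.linalg.blas import dnrm2

X = np.random.RandomState(0).randn(1000, 200)

# BLAS nrm2 per row, squared to recover the squared L2 norms.
row_sqnorms_blas = np.array([dnrm2(row) ** 2 for row in X])

# Fully vectorized pure-NumPy equivalent.
row_sqnorms = np.einsum('ij,ij->i', X, X)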

Owner

mblondel commented Apr 29, 2013

I was talking about the malloc: I would tend to think that it makes only a small difference.

I was responding to @larsmans :)
