[WIP] Make SGD support Cython fused types #6889

Open
wants to merge 1 commit into
from

Conversation

Projects
None yet
5 participants
@yenchenlin
Contributor

yenchenlin commented Jun 14, 2016

Current implementation of SGDClassifier and SGDRegressor doesn't allow the user to specify the dtype they want and may cause a MemoryError when the user wants to train the model since it will try to copy input data into np.float64. (See #5776).

To address the problem, this PR wants to make SGD related algorithms in scikit-learn supports Cython fused types.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jun 17, 2016

Member

What's interesting about this error is that it gets triggered for Classification but not Regression. I'll try to think about it or look closely at the code. What I'd like you to experiment with is creating a minimum example code snippet that fails to compile.

Member

jnothman commented Jun 17, 2016

What's interesting about this error is that it gets triggered for Classification but not Regression. I'll try to think about it or look closely at the code. What I'd like you to experiment with is creating a minimum example code snippet that fails to compile.

@agramfort

This comment has been minimized.

Show comment
Hide comment
@agramfort

agramfort Jun 17, 2016

Member

would be neat to have this

@yenchenlin for looking into it

Member

agramfort commented Jun 17, 2016

would be neat to have this

@yenchenlin for looking into it

@yenchenlin

This comment has been minimized.

Show comment
Hide comment
@yenchenlin

yenchenlin Jun 17, 2016

Contributor

Hello @agramfort, @jnothman and @MechCoder,
I doubt that there are some bugs in Cython to make Fused Types and Inheritance work together.

I've created a small example here,
you can see that Inheritance in Cython works well with primitive type such as double, but it fails to compile when I substitute primitive types with Fused Types (i,e., floating here).

Any idea?

Contributor

yenchenlin commented Jun 17, 2016

Hello @agramfort, @jnothman and @MechCoder,
I doubt that there are some bugs in Cython to make Fused Types and Inheritance work together.

I've created a small example here,
you can see that Inheritance in Cython works well with primitive type such as double, but it fails to compile when I substitute primitive types with Fused Types (i,e., floating here).

Any idea?

@MechCoder

This comment has been minimized.

Show comment
Hide comment
@MechCoder

MechCoder Jun 17, 2016

Member

I'll have a detailed look soon but it seems fused types and inheritance in Cython don't go well together. See https://groups.google.com/forum/#!topic/cython-users/HPgmlIMabz0

Member

MechCoder commented Jun 17, 2016

I'll have a detailed look soon but it seems fused types and inheritance in Cython don't go well together. See https://groups.google.com/forum/#!topic/cython-users/HPgmlIMabz0

@MechCoder

This comment has been minimized.

Show comment
Hide comment
@MechCoder

MechCoder Jun 17, 2016

Member

Can someone come up with a better idea other than duplication of loss and dloss methods in LossFunction to handle float and double separately?

Member

MechCoder commented Jun 17, 2016

Can someone come up with a better idea other than duplication of loss and dloss methods in LossFunction to handle float and double separately?

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jun 20, 2016

Member

I've created a small example here, you can see that Inheritance in Cython works well with primitive type such as double, but it fails to compile when I substitute primitive types with Fused Types (i,e., floating here).

It's interesting that it works with floating in the pyx; only when you add the pxd it breaks.

Member

jnothman commented Jun 20, 2016

I've created a small example here, you can see that Inheritance in Cython works well with primitive type such as double, but it fails to compile when I substitute primitive types with Fused Types (i,e., floating here).

It's interesting that it works with floating in the pyx; only when you add the pxd it breaks.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jun 20, 2016

Member

Ah, no. It fails at C compilation with just the pyx.

Member

jnothman commented Jun 20, 2016

Ah, no. It fails at C compilation with just the pyx.

@MechCoder

This comment has been minimized.

Show comment
Hide comment
@MechCoder

MechCoder Jun 29, 2016

Member

@yenchenlin @jnothman What do you think of this approach?

cdef class LossFunction:
    cdef float float_loss(self, float p, float y) nogil:
        return 0.
    cdef double double_loss(self, double p, double y) nogil:
        return 0.   

cdef class DerivedLossFunction(LossFunction):

    cdef floating _loss(floating p, floating y) nogil:
        if floating is float:
           ....
        else:
            ....       

    cdef float float_loss(self, float p, float y) nogil:
        return _loss(p, y)

    cdef double double_loss(self, double p, double y) nogil:
        return _loss(p, y)

And then in plain_sgd and elsewhere call the required methods according to the dtype.

(Of course, this means the method signatures are duplicated twice, but SGD is commonly used and hence it might be worth it)

Member

MechCoder commented Jun 29, 2016

@yenchenlin @jnothman What do you think of this approach?

cdef class LossFunction:
    cdef float float_loss(self, float p, float y) nogil:
        return 0.
    cdef double double_loss(self, double p, double y) nogil:
        return 0.   

cdef class DerivedLossFunction(LossFunction):

    cdef floating _loss(floating p, floating y) nogil:
        if floating is float:
           ....
        else:
            ....       

    cdef float float_loss(self, float p, float y) nogil:
        return _loss(p, y)

    cdef double double_loss(self, double p, double y) nogil:
        return _loss(p, y)

And then in plain_sgd and elsewhere call the required methods according to the dtype.

(Of course, this means the method signatures are duplicated twice, but SGD is commonly used and hence it might be worth it)

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jun 29, 2016

Member

As I pointed out in hangouts, I don't think we need fused type support for the loss functions. We need fused type support for X as represented by the SequentialDataset object. I'm sure that'll have its own problems, but I think this PR has led us up the garden path.

Member

jnothman commented Jun 29, 2016

As I pointed out in hangouts, I don't think we need fused type support for the loss functions. We need fused type support for X as represented by the SequentialDataset object. I'm sure that'll have its own problems, but I think this PR has led us up the garden path.

@MechCoder

This comment has been minimized.

Show comment
Hide comment
@MechCoder

MechCoder Jun 29, 2016

Member

Also, do we really care that p and y are not upcasted to double?

Member

MechCoder commented Jun 29, 2016

Also, do we really care that p and y are not upcasted to double?

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jun 29, 2016

Member

Our comments overlapped, but the implication is: no, we don't care.

Member

jnothman commented Jun 29, 2016

Our comments overlapped, but the implication is: no, we don't care.

@MechCoder

This comment has been minimized.

Show comment
Hide comment
@MechCoder

MechCoder Jun 29, 2016

Member

@yenchenlin Could you let us know how difficult / easy it is for the fused type support for the SequentialDataset class? No worries, if you are currently busy with coordinate descent. We can postpone this to after that.

Member

MechCoder commented Jun 29, 2016

@yenchenlin Could you let us know how difficult / easy it is for the fused type support for the SequentialDataset class? No worries, if you are currently busy with coordinate descent. We can postpone this to after that.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jun 29, 2016

Member

Could you let us know how difficult / easy it is for the fused type support for the SequentialDataset class?

I suspect we'll find we need to use a templated def cppclass SequentialDataset[T] and instantiate for the appropriate type. (If that doesn't work out it might be changed to using void *s and an element size argument...?)

Member

jnothman commented Jun 29, 2016

Could you let us know how difficult / easy it is for the fused type support for the SequentialDataset class?

I suspect we'll find we need to use a templated def cppclass SequentialDataset[T] and instantiate for the appropriate type. (If that doesn't work out it might be changed to using void *s and an element size argument...?)

@yenchenlin

This comment has been minimized.

Show comment
Hide comment
@yenchenlin

yenchenlin Jul 12, 2016

Contributor

@MechCoder sorry to see this now.

Will do!

Contributor

yenchenlin commented Jul 12, 2016

@MechCoder sorry to see this now.

Will do!

@rth

This comment has been minimized.

Show comment
Hide comment
@rth

rth Apr 20, 2018

Member

I'll have a detailed look soon but it seems fused types and inheritance in Cython don't go well together.

I wonder if the situation improved since: this was with Cython ~ v0.20 (also confirmed in v0.23dev) and we are now at v0.28..

Member

rth commented Apr 20, 2018

I'll have a detailed look soon but it seems fused types and inheritance in Cython don't go well together.

I wonder if the situation improved since: this was with Cython ~ v0.20 (also confirmed in v0.23dev) and we are now at v0.28..

@rth rth added the help wanted label Apr 20, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment