Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Make SGD support Cython fused types #6889

Open
wants to merge 1 commit into
base: master
from

Conversation

@yenchenlin
Copy link
Contributor

yenchenlin commented Jun 14, 2016

Current implementation of SGDClassifier and SGDRegressor doesn't allow the user to specify the dtype they want and may cause a MemoryError when the user wants to train the model since it will try to copy input data into np.float64. (See #5776).

To address the problem, this PR wants to make SGD related algorithms in scikit-learn supports Cython fused types.

@jnothman
Copy link
Member

jnothman commented Jun 17, 2016

What's interesting about this error is that it gets triggered for Classification but not Regression. I'll try to think about it or look closely at the code. What I'd like you to experiment with is creating a minimum example code snippet that fails to compile.

@agramfort
Copy link
Member

agramfort commented Jun 17, 2016

would be neat to have this

@yenchenlin for looking into it

@yenchenlin
Copy link
Contributor Author

yenchenlin commented Jun 17, 2016

Hello @agramfort, @jnothman and @MechCoder,
I doubt that there are some bugs in Cython to make Fused Types and Inheritance work together.

I've created a small example here,
you can see that Inheritance in Cython works well with primitive type such as double, but it fails to compile when I substitute primitive types with Fused Types (i,e., floating here).

Any idea?

@MechCoder
Copy link
Member

MechCoder commented Jun 17, 2016

I'll have a detailed look soon but it seems fused types and inheritance in Cython don't go well together. See https://groups.google.com/forum/#!topic/cython-users/HPgmlIMabz0

@MechCoder
Copy link
Member

MechCoder commented Jun 17, 2016

Can someone come up with a better idea other than duplication of loss and dloss methods in LossFunction to handle float and double separately?

@jnothman
Copy link
Member

jnothman commented Jun 20, 2016

I've created a small example here, you can see that Inheritance in Cython works well with primitive type such as double, but it fails to compile when I substitute primitive types with Fused Types (i,e., floating here).

It's interesting that it works with floating in the pyx; only when you add the pxd it breaks.

@jnothman
Copy link
Member

jnothman commented Jun 20, 2016

Ah, no. It fails at C compilation with just the pyx.

@MechCoder
Copy link
Member

MechCoder commented Jun 29, 2016

@yenchenlin @jnothman What do you think of this approach?

cdef class LossFunction:
    cdef float float_loss(self, float p, float y) nogil:
        return 0.
    cdef double double_loss(self, double p, double y) nogil:
        return 0.   

cdef class DerivedLossFunction(LossFunction):

    cdef floating _loss(floating p, floating y) nogil:
        if floating is float:
           ....
        else:
            ....       

    cdef float float_loss(self, float p, float y) nogil:
        return _loss(p, y)

    cdef double double_loss(self, double p, double y) nogil:
        return _loss(p, y)

And then in plain_sgd and elsewhere call the required methods according to the dtype.

(Of course, this means the method signatures are duplicated twice, but SGD is commonly used and hence it might be worth it)

@jnothman
Copy link
Member

jnothman commented Jun 29, 2016

As I pointed out in hangouts, I don't think we need fused type support for the loss functions. We need fused type support for X as represented by the SequentialDataset object. I'm sure that'll have its own problems, but I think this PR has led us up the garden path.

@MechCoder
Copy link
Member

MechCoder commented Jun 29, 2016

Also, do we really care that p and y are not upcasted to double?

@jnothman
Copy link
Member

jnothman commented Jun 29, 2016

Our comments overlapped, but the implication is: no, we don't care.

@MechCoder
Copy link
Member

MechCoder commented Jun 29, 2016

@yenchenlin Could you let us know how difficult / easy it is for the fused type support for the SequentialDataset class? No worries, if you are currently busy with coordinate descent. We can postpone this to after that.

@jnothman
Copy link
Member

jnothman commented Jun 29, 2016

Could you let us know how difficult / easy it is for the fused type support for the SequentialDataset class?

I suspect we'll find we need to use a templated def cppclass SequentialDataset[T] and instantiate for the appropriate type. (If that doesn't work out it might be changed to using void *s and an element size argument...?)

@yenchenlin
Copy link
Contributor Author

yenchenlin commented Jul 12, 2016

@MechCoder sorry to see this now.

Will do!

@rth
Copy link
Member

rth commented Apr 20, 2018

I'll have a detailed look soon but it seems fused types and inheritance in Cython don't go well together.

I wonder if the situation improved since: this was with Cython ~ v0.20 (also confirmed in v0.23dev) and we are now at v0.28..

Update (2019): It didn't.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants
You can’t perform that action at this time.