Multi-Layer Perceptron #1653

Closed
wants to merge 21 commits

7 participants

@temporaer

This is a continuation of the GSoC project by David Marek and @amueller. I tried accelerating the implementation using manual calls to CBLAS.
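For reference, the idea is to replace NumPy's generic dot products in the forward/backward passes with direct BLAS gemm/gemv calls. The branch does this from Cython via CBLAS; the scipy.linalg.blas snippet below (with made-up layer sizes) only illustrates the same substitution from Python:

```python
import numpy as np
from scipy.linalg.blas import dgemm

rng = np.random.RandomState(0)
X = np.asfortranarray(rng.rand(100, 64))   # minibatch of inputs
W = np.asfortranarray(rng.rand(64, 50))    # input-to-hidden weights

# Equivalent to np.dot(X, W), but calls BLAS dgemm directly; with
# Fortran-ordered operands no intermediate copies are made.
H = dgemm(alpha=1.0, a=X, b=W)
assert np.allclose(H, np.dot(X, W))
```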

The performance increase (5%/7% on the digits dataset for minibatch=100/online) comes at the price of some readability. Aside from the matrix multiplications, the softmax/multinomial logistic loss computation takes a lot of time; however, running it in C using code from @larsmans' mlperceptron branch does not improve efficiency significantly.
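For completeness, the numerically stable softmax (cf. the "stable softmax computation" commit below) shifts each row by its maximum before exponentiating; here is a NumPy sketch of that step and of the multinomial cross-entropy it feeds, with illustrative names rather than the ones used in the branch:

```python
import numpy as np

def stable_softmax(A):
    """Row-wise softmax of activations A, shifted by the row maximum
    so that np.exp never overflows."""
    A = A - A.max(axis=1, keepdims=True)
    E = np.exp(A)
    return E / E.sum(axis=1, keepdims=True)

def multinomial_log_loss(Y, P, eps=1e-12):
    """Mean multinomial cross-entropy between one-hot targets Y and
    predicted probabilities P."""
    return -np.mean(np.sum(Y * np.log(P + eps), axis=1))
```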

I'd be happy for suggestions on how to proceed further.

davidmarek and others added some commits
@davidmarek davidmarek Initial commit for multilayer perceptron.
* added mlp to sklearn/__init__.py
* created sklearn/mlp/setup.py and added to sklearn/setup.py
* created squared loss function in mlp_fast.pyx
d44835f
@davidmarek davidmarek Implemented basic sgd and created MLP classifier. fc96294
@davidmarek davidmarek Fixed source code with help from pylint 99be31e
@davidmarek davidmarek Added multinomial cross-entropy loss.
* Added cross-entropy loss function
* Added softmax output function
* cross-entropy and softmax are default for classifier
* softmax doesn't work for binomial
3fd9cae
@davidmarek davidmarek Added additional loss functions and tests.
* Added cross-entropy and multinomial cross-entropy loss function
* Added softmax and logsig output functions
* Modified tests to be more robust
* Compared performance with pure implementation
3f7bc4d
@davidmarek davidmarek Formatting tweaks 346489d
@davidmarek davidmarek FIX: wrong computation of delta_h 27add2d
@davidmarek davidmarek Comparing implementations with same initialization 19284a5
@davidmarek davidmarek Separated backward step, eliminated array creation in tanh f487b06
@davidmarek davidmarek forward and backward are now python functions 199d24d
@davidmarek davidmarek fixed computing gradient bf5a8f7
@davidmarek davidmarek Added shuffle_data attribute 14b6e97
@davidmarek davidmarek correct cross entropy loss be2d231
@davidmarek davidmarek Fixed dimensions for layer 473199b
@davidmarek davidmarek Added gradient checking and momentum term. 20f5367
@temporaer temporaer stable softmax computation 35b16b4
@temporaer temporaer convert labels to float64 as required by mlp sgd c4bd473
@temporaer temporaer remove numpy slicing for X, Y, use dgemm/dgemv to calculate products 446f9c7
@temporaer temporaer remove additional params in profile 1ced741
@amueller
Owner

Thanks Hannes :)
I think it would be cool if we had some numbers on MNIST. It really looks like we can't get much more out of this.

@larsmans
Owner

Much more interesting than MNIST, IMHO, is 20newsgroups; this version doesn't support sparse matrices.

I just got my version working again with minibatch optimization, see #1395. A single round on four newsgroups now takes 7s. That's without direct calls to BLAS and many other obvious optimizations.

@amueller
Owner

@larsmans What do you think is the best way forward? Are you working on it right now?

@larsmans
Owner

Yes, I've been hacking on my implementation. I suggest we merge efforts rather than develop two competing versions; but I also strongly prefer my own code because it supports sparse matrices (and I really need that for practically every problem I'd use these things for). I also think the code organization I have in my head is cleaner than the one suggested here, though I admit the implementation doesn't perfectly reflect that yet ;)

@amueller
Owner

So what code organization do you have in mind?
@temporaer has a fair amount of experience implementing neural nets and I told him to keep it simple for sklearn. Any ugliness in the code is caused by me trying to keep it simple ;)

@larsmans
Owner

I want the code to mirror SGDClassifier as much as possible: a generic linear output layer (decision_function) with several loss functions (log, hinge, multinomial LR, squared) to train under, and output non-linearities implemented on the Python side (predict, predict_proba).

I.e., I want to emphasise the similarities with our existing linear models, similar to how Bishop presents neural nets.
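A rough sketch of that organization (class and attribute names are purely illustrative, not actual scikit-learn code): the SGD core would fit the weights under one of the pluggable losses, and the estimator would expose a linear decision_function plus Python-side link functions:

```python
import numpy as np

class MLPSketch:
    """Illustrative outline only: weights are assumed to have been fit
    by an SGD loop under one of the pluggable losses."""

    def __init__(self, coef_hidden, intercept_hidden, coef_out, intercept_out, classes):
        self.coef_hidden_, self.intercept_hidden_ = coef_hidden, intercept_hidden
        self.coef_out_, self.intercept_out_ = coef_out, intercept_out
        self.classes_ = classes

    def decision_function(self, X):
        # generic linear output layer on top of the hidden activations,
        # analogous to SGDClassifier.decision_function
        H = np.tanh(np.dot(X, self.coef_hidden_) + self.intercept_hidden_)
        return np.dot(H, self.coef_out_) + self.intercept_out_

    def predict_proba(self, X):
        # output non-linearity (softmax) applied on the Python side
        S = self.decision_function(X)
        S = S - S.max(axis=1, keepdims=True)
        E = np.exp(S)
        return E / E.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]
```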

@ogrisel
Owner

@larsmans could you open a new PR against the current master with the #1653 marker in the description, so that we can compare and discuss the merits of the two implementations directly from the GitHub user interface?

@temporaer

@larsmans I'm not sure I understand your comment. Could you please clarify whether you're trying to mirror the user interface or the implementation details of SGDClassifier?

@larsmans
Owner

Both, ideally, though SGDClassifier would have to be refactored to enable sharing of loss function implementations and the SequentialDataset interface.
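To make the intended sharing concrete: the loss objects used by SGDClassifier live in Cython (sgd_fast), but the interface an MLP trainer could reuse looks roughly like the pure-Python sketch below (illustrative only):

```python
import numpy as np

class LossFunction:
    """Shared interface that both SGDClassifier and an MLP trainer could use."""
    def loss(self, p, y):
        raise NotImplementedError
    def dloss(self, p, y):
        raise NotImplementedError

class SquaredLoss(LossFunction):
    def loss(self, p, y):
        return 0.5 * (p - y) ** 2
    def dloss(self, p, y):
        return p - y

class Log(LossFunction):
    def loss(self, p, y):
        return np.log(1.0 + np.exp(-y * p))   # y in {-1, +1}
    def dloss(self, p, y):
        return -y / (1.0 + np.exp(y * p))
```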

@tobigue

+1 for code that supports sparse matrices

@temporaer

Yep, sparse matrices are definitely a plus. However, that should just be a matter of replacing a few multiplication calls with their sparse counterparts...
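Concretely, something along these lines, using the existing safe_sparse_dot helper from sklearn.utils.extmath (the shapes are just illustrative):

```python
import numpy as np
import scipy.sparse as sp
from sklearn.utils.extmath import safe_sparse_dot

rng = np.random.RandomState(0)
X_dense = rng.rand(10, 2000)
X_sparse = sp.csr_matrix(X_dense)   # e.g. tf-idf features from 20newsgroups
W = rng.rand(2000, 50)              # input-to-hidden weights

# safe_sparse_dot dispatches to sparse or dense matrix multiplication as
# needed, so the forward pass works unchanged for both input types.
H_dense = safe_sparse_dot(X_dense, W)
H_sparse = safe_sparse_dot(X_sparse, W)
assert np.allclose(H_dense, H_sparse)
```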

The main feature difference from #1395 is currently sparse matrix support, AFAICS. In terms of extensibility, e.g. the many features discussed in #1395, I believe neither implementation is very good. However, this is not supposed to be a new Theano, as @amueller keeps reminding me :-)

I'm willing to put more time into this, but I'm not sure how to proceed, since abandoning this branch is an option. MNIST? Sparse matrices? @larsmans, would you appreciate help on your branch?

@larsmans
Owner

I'd very much appreciate some help. Let's discuss on the other PR, shall we?

@IssamLaradji

Hi @temporaer, amazing work! However, did you consider partial_fit()? Online updating would be a good addition. Another issue is the weight initialization: the code always sets the weights to zero, which gives the initial state too much symmetry. It might be better to have a hyperparameter that accepts initial weights, since different initial weights lead to different optima. Also, the weights obtained from a Restricted Boltzmann Machine can be used as initial weights, which in turn leads to better results.
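For illustration, symmetry breaking with an optional user-supplied (e.g. RBM-derived) weight matrix could look like the sketch below; the function and parameter names are hypothetical, not from this branch:

```python
import numpy as np

def init_hidden_weights(n_features, n_hidden, weights_init=None, random_state=0):
    """Return initial input-to-hidden weights.

    If weights_init is given (e.g. weights taken from a pre-trained RBM),
    use it; otherwise draw small random values so that hidden units do not
    stay identical under gradient descent, as all-zero weights would.
    """
    if weights_init is not None:
        W = np.asarray(weights_init, dtype=np.float64)
        assert W.shape == (n_features, n_hidden)
        return W
    rng = np.random.RandomState(random_state)
    bound = np.sqrt(6.0 / (n_features + n_hidden))   # common heuristic scale
    return rng.uniform(-bound, bound, size=(n_features, n_hidden))
```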

@temporaer

This PR was abandoned in favour of #1395, which also includes some work by me that has not yet been merged: larsmans#5

@temporaer temporaer closed this
Commits on Oct 28, 2012: the fifteen commits listed above, authored by davidmarek and committed by amueller.
Commits on Dec 23, 2012 through Feb 6, 2013: six commits authored by temporaer, including "stable softmax computation" and "start profiling MNIST".