refactoring of mlperceptron #5
Conversation
Thanks heaps, I'll try to do a proper review this week (busy with research, but this might actually come in handy). Results might be bad because there's no regularization. I'd like to include at least L2 regularization.
Not sure what you mean by the L2 regularization; it is included. When you do the review, can you elaborate a bit on how you imagined reusing cost functions from SGD? I left them in the main loop for now.
I thought I'd disabled regularization in the Python wrapper. Let's get back to this when I really have some time to spare. Feel free to hack further in the meantime (in a separate branch). Final remark: I don't think we can actually reuse much code from SGD, because it does OvA training only and I want true multiclass training here. But we can implement the same functions and architecture so we get e.g. hinge loss networks.
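For reference, a minimal sketch of how an L2 penalty typically enters a minibatch weight update (illustrative numpy code only; the function name, alpha and lr are made up for this example, not the PR's actual API):

```python
import numpy as np

def l2_penalized_update(W, grad, lr, alpha):
    """One gradient step with an L2 penalty on the weights.

    The penalty contributes alpha * W to the data gradient, which is the
    usual way L2 regularization appears in an SGD/backprop weight update.
    """
    # gradient of 0.5 * alpha * ||W||^2 is alpha * W
    grad_total = grad + alpha * W
    return W - lr * grad_total

# tiny usage example with random numbers
rng = np.random.RandomState(0)
W = rng.randn(5, 3)
grad = rng.randn(5, 3)
W_new = l2_penalized_update(W, grad, lr=0.01, alpha=1e-4)
```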
@@ -289,38 +446,26 @@ def backprop_sgd(self, X, np.ndarray Y):
    end = n_samples
    start = n_samples - batchsize

    # densify outputs
What is the purpose of this? sparse is true iff X is sparse; Y is always dense.
In the sparse implementation, randomization is over instances in the training set; in the dense version, randomization is over batches. The loops collect the different rows of Y into a dense matrix.
I believe it should be enough to randomize batches, but I did not want to break your functionality.
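For context, a rough sketch of what such a "densify outputs" loop amounts to (illustrative only, not the PR's code): under per-instance randomization, the selected rows of Y are gathered into one contiguous dense minibatch array.

```python
import numpy as np

def make_minibatch(Y, indices):
    """Collect the selected rows of Y into a dense minibatch array.

    Y is assumed to be a dense 2-D output array; indices are the
    (shuffled) sample indices chosen for the current batch.
    """
    batch = np.empty((len(indices), Y.shape[1]), dtype=Y.dtype)
    for i, idx in enumerate(indices):
        batch[i] = Y[idx]          # copy one row per selected instance
    return batch

rng = np.random.RandomState(0)
Y = rng.randn(100, 4)
perm = rng.permutation(100)
Y_batch = make_minibatch(Y, perm[:16])  # first shuffled batch of 16
```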
I see. But that makes the code quite hard to read, so I suggest we pick one strategy. SGDClassifier randomizes samples, so I picked that strategy. I can't immediately see the consequences of shuffling the batches only; wouldn't that take longer to converge if the samples are not pre-shuffled?
It is a bit messy, agreed. I chose randomization over batches because of the speed improvement (only views are required, no looping). It is probably a good idea to pre-shuffle your data before applying this technique, but I do not expect a strong effect on convergence otherwise.
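A rough sketch of the batch-view strategy described here (names are illustrative, not the PR's code): the data is sliced into contiguous batches, and only the order in which the batches are visited is shuffled, so no rows are copied or fancy-indexed.

```python
import numpy as np

def iter_shuffled_batches(X, Y, batchsize, rng):
    """Yield contiguous (view) minibatches in a random order.

    Slicing with start:end returns views of X and Y, so no data is
    copied; only the visiting order of the batches is randomized.
    """
    n_samples = X.shape[0]
    n_batches = n_samples // batchsize
    for b in rng.permutation(n_batches):
        start = b * batchsize
        end = start + batchsize
        yield X[start:end], Y[start:end]

rng = np.random.RandomState(0)
X = rng.randn(1000, 20)
Y = rng.randn(1000, 3)
for X_batch, Y_batch in iter_shuffled_batches(X, Y, batchsize=32, rng=rng):
    pass  # one pass over the data, batches visited in random order
```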
Right, but we can't expect users to pre-shuffle their inputs -- not all users are experts (and honestly, I tend to forget that kind of stuff all the time myself).
Since this is a convenience/speed tradeoff... what about adding a shuffle=True parameter to the classifier, so that it shuffles the (dense) data before learning by default?
How much is the speed difference? I'm not too fond of messing with X in-place, nor of copying it.
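One possible middle ground, sketched under the assumption that a single copy at fit time is acceptable (the parameter names here are hypothetical, not the PR's API): shuffling by indexing with a permutation leaves the caller's X untouched, at the cost of copying it once.

```python
import numpy as np

def maybe_shuffle(X, Y, shuffle=True, random_state=0):
    """Return (X, Y) in a shuffled order without modifying the inputs.

    Fancy indexing with a permutation creates copies, so the caller's
    arrays are left untouched; the price is one extra copy of each.
    """
    if not shuffle:
        return X, Y
    rng = np.random.RandomState(random_state)
    perm = rng.permutation(X.shape[0])
    return X[perm], Y[perm]

rng = np.random.RandomState(0)
X = rng.randn(200, 10)
Y = rng.randn(200, 2)
X_s, Y_s = maybe_shuffle(X, Y)   # X and Y themselves are unchanged
```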
@temporaer Can you still run the document classification example? It seems to be eating up all the CPU without doing anything (…).
The problem seems to be the sparse derivative for W0. I didn't see this, since I tested with dense inputs... looking into this.
Ok. I like your way of structuring the code, btw. Sorry that it took me so long to do a review.
The dense case seems broken too: I get a …
I tried to aim for a few goals as discussed in the sklearn pull request:
I'm pretty sure I removed some bugs in the sequence of gradient calculation and weight updates in the process. At the same time, I 'simplified' a few features away while debugging (e.g. the special tanh), but I'll put them in again once we can agree on the structure.
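To illustrate the ordering issue mentioned above with a generic sketch (not the PR's code): in backprop, all layer gradients should be computed from the current weights before any weights are updated, otherwise the lower layer's gradient is taken with respect to already-modified upper-layer weights.

```python
import numpy as np

def backprop_step(X, Y, W0, W1, lr):
    """One correctly ordered backprop step for a tiny tanh/linear MLP.

    All gradients are computed first, from the current W0 and W1, and
    the weights are updated only afterwards.
    """
    # forward pass
    H = np.tanh(X @ W0)            # hidden activations
    out = H @ W1                   # linear output layer
    err = out - Y                  # squared-error gradient at the output

    # backward pass: compute both gradients before touching the weights
    grad_W1 = H.T @ err
    delta_hidden = (err @ W1.T) * (1.0 - H ** 2)   # tanh derivative
    grad_W0 = X.T @ delta_hidden

    # only now apply the updates
    W1 -= lr * grad_W1
    W0 -= lr * grad_W0
    return W0, W1

rng = np.random.RandomState(0)
X, Y = rng.randn(32, 10), rng.randn(32, 3)
W0, W1 = 0.1 * rng.randn(10, 8), 0.1 * rng.randn(8, 3)
W0, W1 = backprop_step(X, Y, W0, W1, lr=0.01)
```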