Keras Docs Examples silently assume categorical tasks #1454

Closed
pasky opened this Issue Jan 13, 2016 · 10 comments

4 participants

@pasky
Contributor
pasky commented Jan 13, 2016

My first few hours with Keras were a lot more painful than they needed to be, because I didn't realize that most of the examples at http://keras.io/examples/ and elsewhere are geared toward categorical, not binary, classification (which is what I personally assumed by default), and even once I noticed, I didn't realize how important the distinction was. My models weren't learning anything...

So, right at the top of http://keras.io/examples/, I'd propose having two MLP variants, one for categorical classification and another for binary classification: a sigmoid instead of a softmax activation for the final layer, and passing class_mode='binary' to .compile() (the former is tricky for machine learning newbies to realize, and the latter is almost undiscoverable).
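To make the proposal concrete, the data-side difference between the two variants can be sketched in plain numpy (a toy illustration; the one_hot helper is mine, not a Keras function): a categorical model with a softmax output layer expects one-hot rows, while a binary model with a single sigmoid unit expects a plain 0/1 column.

```python
import numpy as np

def one_hot(y, n_classes):
    """One-hot encode integer class labels (the target format a softmax output expects)."""
    out = np.zeros((len(y), n_classes))
    out[np.arange(len(y)), y] = 1.0
    return out

y = np.array([0, 2, 1, 0])                 # raw integer class labels

y_categorical = one_hot(y, 3)              # shape (4, 3): softmax layer with 3 units
y_binary = np.array([[0], [1], [1], [0]])  # shape (4, 1): single sigmoid unit

print(y_categorical)
print(y_binary.ravel())
```

Feeding one format of labels into a model set up for the other is exactly the kind of silent mismatch that leaves a model "not learning anything".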

(For Google users' benefit, I also documented my experience at http://log.or.cz/?p=386 .)

@farizrahman4u
Contributor

The examples cover all the basic loss functions: mse, binary, and categorical. Binary classification being the default is, as you said, your personal assumption, and the examples do not "silently" assume anything: the loss function is explicitly provided to the model's compile function. Moreover, the Sequence classification with LSTM example is an example of binary classification. So the statement from your blog, "All the examples silently assume that you want to classify to categories", is not true. Criticism is welcome, but get your facts straight. There are a lot of examples in the examples directory which work out of the box; they download the data and train the models automatically. Check them out to understand what your training data should look like.

@pasky
Contributor
pasky commented Jan 13, 2016

Sorry if the blog post came off as an attack on Keras; it wasn't
meant that way. I toned it down a bit more (and fixed that example mention).

You are right that it's possible to figure it out with a bit of
digging and looking at many examples - I totally agree with that.

I just wanted to point out that it's not obvious, imho, and I
suspect it's confusing some other newbies too. I guess it also
has to do with one's personal learning approach: some people will try out
various existing examples to learn the framework, while others (like me)
will approach the framework with a task they need to solve right away
and play with the framework on that task. I think it's worth
making Keras easily discoverable for both kinds of people, if it's not
too much effort.

Just showing the simple MLP model at the top on both a categorical and
a binary task would convey a lot of information and clarify this.

@gammaguy

I agree with pasky. Keras is a great project that creates an abstraction layer to make models more accessible, one which, according to the docs, "was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research."

But there is no simple binary classification example, say logical AND. You could have the data and the code together; people could run it and get excited.

While it is great that PhDs and ML specialists who are familiar with all the syntax, semantics, and quirks of all the models should have no problem figuring it out, that is of little relevance to people who are just learning and want to try some simple toy models. What ends up happening is they try for a couple of hours, can't get it to work, and then move on, damning Keras to obscurity as another expert system.

From what I can glean from the docs, Keras has great potential to help bring complex ML structures to the mainstream, but it needs a couple of better toy examples, with comments explaining, say, why softmax doesn't make sense for binary classification.
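As an aside, the "why softmax doesn't make sense for binary classification" point can be shown in a few lines of plain numpy (a toy illustration, not Keras code): softmax normalizes across the output units, so a single-unit softmax layer outputs 1.0 no matter what its input is, and the model can never predict class 0.

```python
import numpy as np

def softmax(z):
    # Softmax normalizes across output units: exp(z_i) / sum_j exp(z_j)
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A single-unit output layer, as in binary classification:
for logit in (-3.0, 0.0, 3.0):
    print(softmax(np.array([logit])), sigmoid(logit))
# softmax over one unit is always [1.], whatever the logit;
# sigmoid gives a usable probability in (0, 1).
```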

@farizrahman4u
Contributor

needs a couple better toy examples with some comments as to say why softmax doesn’t make sense for binary classification.

Not to be rude, but don't expect the Keras docs to teach you machine learning or basic arithmetic and algebra. If you do not understand softmax, sigmoid, or tanh, the Keras docs will definitely make less sense to you (and so will the docs of other libraries). Keras is not your Deep Learning 101; you should learn the math from other sources and then come here to get your hands dirty. You don't learn aeronautics in the cockpit. Read a couple of books, watch a couple of videos on deep learning on YouTube, and come back; you will be surprised to see that the Keras docs make absolute sense then. Also, we are past the era when the XOR problem was the "Hello World" of neural networks, so don't expect those. MNIST is the new XOR. The docs of all deep learning libraries will agree with this (see TensorFlow). You can also see it this way: the parameters of SGD in Keras are tuned for real-world problems, so adding an AND / XOR example would require passing extra parameters to SGD, and that would complicate the example.

Again, I mean no offense and I appreciate you guys sharing your issues. cheers!

@gammaguy

@farizrahman4u, you are obviously a really smart guy who is passionate about ML and Keras.

What I don't understand is that in the time and effort it took you to respond to me and lightly flame me, you could have done as I suggested.

I understand softmax; I used it as an example of something someone might use without first thinking. I've done Ng's course. I've programmed my own NN with backprop and dropout from scratch. I don't want to be argumentative, but 'XOR' will always be the "Hello World" of neural networks, just as "Hello World" is where every programming language starts. An XOR example is simple, easy to understand, and gives someone who is starting with Keras the satisfaction of getting something they understand to work, whether or not they are ML experts.

I myself have been struggling with the implementation of a binary AND. Please find below the code that I can't get to work. I think the not-so-expert community would benefit from your insight on how to debug it. You may even consider adding it to your examples.

One of the other things I have been struggling with is the bias nodes. While I see them sporadically mentioned, it is unclear to me whether they are auto-included behind the scenes or, if not, how I should specify them.
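On the bias question, as far as I know Keras Dense layers do create and train the bias vector automatically (one entry per output unit), so you never add bias nodes yourself. A plain-numpy sketch of what a Dense(5, activation='sigmoid') layer computes, with W and b standing in for the weights Keras manages behind the scenes:

```python
import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

rng = np.random.default_rng(0)
W = rng.uniform(-0.05, 0.05, size=(2, 5))  # kernel: (input_dim, units)
b = np.zeros(5)                            # bias: one per unit, auto-created by Keras

def dense_sigmoid(X, W, b):
    # The forward pass of a Dense sigmoid layer: sigmoid(X @ W + b)
    z = X @ W + b
    return 1.0 / (1.0 + np.exp(-z))

print(dense_sigmoid(X, W, b).shape)  # (4, 5)
```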

If you are ever in Barbados, give me a shout and I’ll thank you with some beers or whatever your poison.

Thank you
Paul

import theano
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD

data = np.array([
[0, 0, 0],
[0, 1, 0],
[1, 0, 0],
[1, 1, 1],
])

X_train = data[:, :-1]
y_train = data[:, -1:]

X_test = data[:, :-1]
y_test = data[:, -1:]

model = Sequential()
model.add(Dense(5, input_dim=2, init='uniform', activation='sigmoid'))
model.add(Dense(5, init='uniform', activation='sigmoid'))
model.add(Dense(1, init='uniform', activation='sigmoid'))

sgd = SGD(lr=0.1, nesterov=False)

model.compile(loss='mean_squared_error', optimizer=sgd, class_mode='binary')

model.fit(X_train, y_train, nb_epoch=200, batch_size=1, verbose=1, show_accuracy=True)
score = model.evaluate(X_test, y_test, batch_size=4)
print("score = {:10.4}".format(score))

classes = model.predict_classes(X_test, batch_size=1)
print(classes)

@fchollet
Owner

But there no simple binary classification example, say logical AND.

There are several binary classification examples in the examples folder (all imdb_* scripts). I just added one in the docs as well: http://keras.io/examples/

I don't know who gave you the idea that logical AND or XOR were good examples for neural networks, but they probably weren't active in the machine learning field past 1980.

@fchollet fchollet closed this Jan 31, 2016
@pasky
Contributor
pasky commented Jan 31, 2016

@gammaguy

Thank you @fchollet for adding the example, and thank you for Keras. I have another comment for you a little lower down, about an error with your example.

Sorry @pasky if I put words in your mouth; I was just trying to support your point. As for your comment about all my training examples having y=0, the last one has y=1.

I was modelling an AND; the inputs and outputs were:

0,0 -> 0
0,1 -> 0
1,0 -> 0
1,1 -> 1

I changed to model.compile(loss='binary_crossentropy', optimizer='rmsprop', class_mode='binary'), keeping the sigmoid in my code, and it converged: <0.01 loss after 2627 epochs.
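For anyone else landing here, the fix is not specific to Keras: the same single-sigmoid-unit, binary-cross-entropy setup converges on AND even with plain gradient descent in numpy (a toy sketch; the learning rate and epoch count are arbitrary choices of mine):

```python
import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 0., 0., 1.])  # logical AND

w = np.zeros(2)   # weights of a single sigmoid unit
b = 0.0           # its bias
lr = 1.0

for epoch in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid output
    grad = p - y                            # d(binary cross-entropy)/d(logit)
    w -= lr * (X.T @ grad) / len(y)
    b -= lr * grad.mean()

print((p > 0.5).astype(int))  # [0 0 0 1]
```

The learned solution has positive weights and a negative bias, so only the (1, 1) input pushes the logit above zero.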

@fchollet I tried your full new example code but got the following error:

File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 463, in relu
x = T.nnet.relu(x, alpha)
AttributeError: 'module' object has no attribute 'relu'

where line 50 was class_mode='binary') from the model.compile line from your example.

The full traceback was:

Using gpu device 0: GeForce GTX TITAN X
Using Theano backend.
Traceback (most recent call last):
File "/home/paul/.PyCharm50/config/scratches/scratch_2", line 50, in <module>
class_mode='binary')
File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/models.py", line 408, in compile
self.y_train = self.get_output(train=True)
File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/containers.py", line 128, in get_output
return self.layers[-1].get_output(train)
File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 949, in get_output
X = self.get_input(train)
File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 159, in get_input
previous_output = self.previous.get_output(train=train)
File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 624, in get_output
X = self.get_input(train)
File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 159, in get_input
previous_output = self.previous.get_output(train=train)
File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 658, in get_output
X = self.get_input(train)
File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 159, in get_input
previous_output = self.previous.get_output(train=train)
File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 949, in get_output
X = self.get_input(train)
File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 159, in get_input
previous_output = self.previous.get_output(train=train)
File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 624, in get_output
X = self.get_input(train)
File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 159, in get_input
previous_output = self.previous.get_output(train=train)
File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 658, in get_output
X = self.get_input(train)
File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 159, in get_input
previous_output = self.previous.get_output(train=train)
File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 950, in get_output
output = self.activation(K.dot(X, self.W) + self.b)
File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/activations.py", line 25, in relu
return K.relu(x, alpha=alpha, max_value=max_value)
File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 463, in relu
x = T.nnet.relu(x, alpha)
AttributeError: 'module' object has no attribute 'relu'

@pasky
Contributor
pasky commented Jan 31, 2016

@gammaguy

@pasky You were right, worked like a dream.

Thank you. If either you or @fchollet ever get to Barbados give a shout.
