Keras Docs Examples silently assume categorical tasks #1454
My first few hours with Keras were a lot more painful than needed, because I didn't realize that most of the examples at http://keras.io/examples/ and elsewhere were geared toward categorical rather than binary classification (binary is what I personally assumed by default), and once I did realize it, I didn't grasp how important the difference was. My models weren't learning anything...
So, right at the top of http://keras.io/examples/, I'd propose showing two MLP variants, one for categorical classification and another for binary classification. The binary variant uses a sigmoid instead of a softmax activation on the final layer and passes class_mode='binary' to .compile(); the former is tricky to realize for machine learning newbies, and the latter is almost undiscoverable.
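Why both variants are legitimate can be sketched in plain Python (this is not Keras code; `sigmoid` and `softmax` are hand-written helpers for illustration). A single sigmoid unit on logit z gives the same class-1 probability as a two-unit softmax over the logits [0, z], so a binary task can be set up either as one sigmoid output (binary mode) or as two softmax outputs with one-hot labels (categorical mode). What goes wrong is mixing the two.

```python
import math

def sigmoid(z):
    """Logistic function: probability of class 1 from a single logit."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Normalized exponentials; subtracting the max keeps exp() from overflowing."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Binary setup: one output unit with a sigmoid.
# Categorical setup: two output units with a softmax (class-0 logit fixed at 0 here).
for z in [-3.0, -0.5, 0.0, 1.2, 4.0]:
    p_binary = sigmoid(z)
    p_categorical = softmax([0.0, z])[1]
    print(f"z={z:+.1f}  sigmoid={p_binary:.4f}  softmax[1]={p_categorical:.4f}")
```

Which one to use is a design choice; the point is that the output activation, the loss and the label encoding have to agree with each other.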
(For the benefit of Google users, I also documented my experience at http://log.or.cz/?p=386 .)
The examples cover all the basic loss functions: mse, binary crossentropy and categorical crossentropy. Binary classification being the default is, as you said, your personal assumption. And the examples do not "silently" assume anything; the loss function is explicitly passed to the model's compile function. Moreover, the "Sequence classification with LSTM" example is a binary classification example. So the statement in your blog post, "All the examples silently assume that you want to classify to categories", is not true. Criticism is welcome, but get your facts straight. There are a lot of examples in the examples directory that work out of the box: they download the data and train the models automatically. Check them out to understand what your training data should look like.
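To make the three loss families concrete, here is a plain-Python sketch of what each computes for a single sample. These are illustrative re-implementations, not the Keras source; the clipping constant `eps` is an assumption added so that `log()` never receives 0.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average squared difference over the outputs of one sample."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_crossentropy(y_true, p):
    """Loss for one sigmoid output p = P(class 1); y_true is 0 or 1."""
    eps = 1e-7  # clip so log() never sees exactly 0
    p = min(max(p, eps), 1 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

def categorical_crossentropy(y_true_onehot, probs):
    """Loss for a softmax output vector; y_true_onehot contains a single 1."""
    eps = 1e-7
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true_onehot, probs))

# The same prediction, "class 1 with probability 0.8", scored by each loss.
print(mean_squared_error([1.0], [0.8]))              # squared error on one output
print(binary_crossentropy(1, 0.8))                   # -log(0.8)
print(categorical_crossentropy([0, 1], [0.2, 0.8]))  # also -log(0.8)
```

For a two-class problem the binary and categorical crossentropy agree; what matters is that the chosen loss matches the shape and activation of the output layer.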
Sorry if the blog post came off as an attack on Keras - I toned it down
You are right that it's possible to figure it out with a bit of
I just wanted to point out that it's not obvious imho and I don't
Just showing the simple MLP model at the top on both categorical and
I agree with Pasky. Keras is a great project that provides an abstraction layer to make models more accessible, and according to the docs, "It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research."
But there is no simple binary classification example, say logical AND. You could put the data and the code together, so people could run it and get excited.
It is great that PhDs and ML specialists who are familiar with the syntax, semantics and quirks of all the models have no problem figuring it out, but that is of little relevance to people who are just learning and want to try some simple toy models. What ends up happening is that they try for a couple of hours, can't get it to work and then move on, damning Keras to obscurity as another expert system.
From what I can glean from the docs, Keras has great potential to help bring complex ML structures to the mainstream, but it needs a couple of better toy examples, with comments explaining, for instance, why softmax doesn't make sense for binary classification.
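One concrete reason, sketched in plain Python rather than Keras (the helpers are hypothetical illustrations): if a binary model ends in a single output unit but uses softmax instead of sigmoid, softmax normalizes that one value against itself, so the output is exactly 1.0 for every input. The prediction then does not depend on the weights at all, which is one way to end up with a model that learns nothing.

```python
import math

def softmax(logits):
    """Softmax over a list of logits (max subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Logistic function for a single logit."""
    return 1.0 / (1.0 + math.exp(-z))

# One output unit: softmax returns [1.0] whatever the logit is,
# while sigmoid still maps different logits to different probabilities.
for z in [-10.0, -1.0, 0.0, 2.5, 10.0]:
    print(f"logit={z:+5.1f}  softmax={softmax([z])[0]:.3f}  sigmoid={sigmoid(z):.3f}")
```

Because the single-unit softmax output is constant, its gradient with respect to the weights is zero, so training cannot move it.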
Not to be rude, but don't expect the Keras docs to teach you machine learning or basic arithmetic and algebra. If you do not understand softmax, sigmoid or tanh, the Keras docs will definitely make less sense to you (and so will the docs of other libraries). Keras is not your Deep Learning 101; you should learn the math from other sources and then come here to get your hands dirty. You don't learn aeronautics in the cockpit. Read a couple of books, watch a couple of videos on deep learning on YouTube and come back; you will be surprised to see that the Keras docs make absolute sense then. Also, we are past the era when the XOR problem was the "Hello World" of neural networks, so don't expect those. MNIST is the new XOR. The docs of all deep learning libraries will agree with this (see TensorFlow). You can also see it this way: the default SGD parameters in Keras are tuned for real-world problems, so an AND / XOR example would require passing extra parameters to SGD, and that would complicate the example.
Again, I mean no offense, and I appreciate you guys sharing your issues. Cheers!
Farizrahman4u, you are obviously a really smart guy who is passionate about ML and Keras.
What I don't understand is that, in the time and effort it took you to respond to me and lightly flame me, you could have done what I suggested.
I understand softmax; I used it as an example of something someone might use without first thinking. I've done Ng's course. I've programmed my own NN with backprop and dropout from scratch. I don't want to be argumentative, but XOR will always be the "Hello World" of neural networks, just as "Hello World" is where every programming language starts. An XOR example is simple and easy to understand, and it gives someone who is starting with Keras the satisfaction of getting something they understand to work, whether or not they are an ML expert.
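Part of why XOR is a slightly harder "Hello World" than AND is that XOR is not linearly separable, so no single unit can represent it, while one hidden layer can. The plain-Python sketch below hand-picks the weights (an illustrative assumption, not trained values and not Keras code): one hidden unit approximates OR, the other approximates AND, and the output fires for "OR but not AND".

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def xor_net(x1, x2):
    """2-2-1 network with hand-chosen weights; each unit computes sigmoid(w . x + b)."""
    h_or = sigmoid(20 * x1 + 20 * x2 - 10)       # close to 1 if at least one input is 1
    h_and = sigmoid(20 * x1 + 20 * x2 - 30)      # close to 1 only if both inputs are 1
    return sigmoid(20 * h_or - 20 * h_and - 10)  # close to 1 for OR-but-not-AND

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xor_net(x1, x2), 3))
```

The constant offsets (-10, -30, -10) are the bias terms; without them none of the units could shift its threshold away from zero.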
I myself have been struggling with implementing binary AND. Please find below the code that I can't get to work. I think the less expert community would benefit from your insight on how to debug it. You may even consider adding it to your examples.
Another thing I have been struggling with is bias nodes. While I see them mentioned sporadically, it is unclear to me whether they are included automatically behind the scenes or, if not, how to specify them.
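On the bias question: as far as I know, Keras Dense layers include a bias vector by default (current Keras exposes this as the `use_bias=True` argument of `Dense`), so no separate bias node has to be added. The plain-Python sketch below (hypothetical helpers, not the Keras implementation) shows what such a layer computes, activation(x . W + b), and why a one-unit layer on two inputs has three trainable parameters.

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_forward(x, W, b, activation):
    """Fully connected layer: activation(x . W + b) for one input vector.

    x: n inputs; W: n x m nested list (W[i][j] connects input i to unit j);
    b: m biases, one per output unit.
    """
    return [
        activation(sum(x[i] * W[i][j] for i in range(len(x))) + b[j])
        for j in range(len(b))
    ]

def param_count(n_in, n_out, use_bias=True):
    """Trainable parameters of such a layer: the weight matrix plus one bias per unit."""
    return n_in * n_out + (n_out if use_bias else 0)

W = [[5.0], [5.0]]  # illustrative weights for a 2-input, 1-unit layer
# With bias 0, input (0, 0) gives sigmoid(0) = 0.5 no matter what W is;
# a negative bias lets the unit output close to 0 there, which AND needs.
print(dense_forward([0, 0], W, [0.0], sigmoid))
print(dense_forward([0, 0], W, [-7.5], sigmoid))
print(dense_forward([1, 1], W, [-7.5], sigmoid))
print(param_count(2, 1))  # 2 weights + 1 bias
```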
If you are ever in Barbados, give me a shout and I'll thank you with some beers or whatever your poison is.
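import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense
from keras.optimizers import SGD
data = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 1]])  # AND truth table (rows reconstructed from the thread): x1, x2, y
X_train = data[:, :-1]
y_train = data[:, -1]  # targets were never defined in the original snippet
X_test = data[:, :-1]
model = Sequential()
model.add(Dense(1, input_dim=2, activation='sigmoid'))  # layers not shown in the post; a single sigmoid unit is assumed
sgd = SGD(lr=0.1, nesterov=False)  # "01" is a syntax error in Python 3; 0.1 assumed
model.compile(loss='mean_squared_error', optimizer=sgd, class_mode='binary')  # pass the SGD object, not the string 'sgd'
model.fit(X_train, y_train, nb_epoch=200, batch_size=1, verbose=1, show_accuracy=True)
classes = model.predict_classes(X_test, batch_size=1)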
There are several binary classification examples in the examples folder (all
I don't know who gave you the idea that logical AND or XOR were good examples for neural networks, but they probably weren't active in the machine learning field past 1980.
Note that I personally think @gammaguy's suggestion of adding more very basic examples is not necessarily tied to fixing this bug (I guess I should send a PR soon...). My wish has only been a very minor tweak to the initial examples section, not necessarily full new examples with new tasks. I like the idea of a very simple example model (probably not binary AND) that would also include running prediction on unseen data, looking at internal layer outputs and checking the weights, but that's a somewhat different direction. (I also think Keras doesn't need to cater to complete ML beginners; sklearn is pretty great at that.)

@gammaguy, regarding your code, I think the main problem is that you do not have enough variety in the training set; actually, all your training examples have y=0! Try a more reasonable task than binary AND, which is actually very hard for this reason. For example, the Iris dataset is popular for introducing various ML models.
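The lack of variety can be quantified with a short plain-Python sketch (hypothetical helper names, not Keras code). Even with the (1, 1) -> 1 row included, three of the four AND targets are 0, so a "model" that ignores its inputs and always outputs a constant already looks respectable.

```python
# AND truth table as ((x1, x2), y) pairs
AND_TABLE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def accuracy(predict, table, threshold=0.5):
    """Fraction of rows where the thresholded prediction matches the target."""
    hits = sum(int(predict(x) >= threshold) == y for x, y in table)
    return hits / len(table)

def mse(predict, table):
    """Mean squared error of predict(x) against the targets."""
    return sum((predict(x) - y) ** 2 for x, y in table) / len(table)

def always_zero(x):
    """Ignores the input entirely."""
    return 0.0

MEAN_TARGET = sum(y for _, y in AND_TABLE) / len(AND_TABLE)  # fraction of positive rows

def always_mean(x):
    """Best constant under squared error: the mean target, for every input."""
    return MEAN_TARGET

for name, f in [("always 0", always_zero), ("always mean", always_mean)]:
    print(f"{name:12s} accuracy={accuracy(f, AND_TABLE):.2f}  mse={mse(f, AND_TABLE):.4f}")
```

Both constants reach 75% accuracy without looking at the inputs, so a loss that stalls near these values suggests the model has only learned the class balance, not the AND rule.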
Thank you @fchollet for adding the example, and thank you for Keras. I have another comment for you a little further down about an error in your example.
Sorry @pasky if I put words in your mouth; I was just trying to support your point. As for your comment that all my training examples have y=0: the last one has y=1.
I was modelling an AND; the inputs and outputs were:
0,0 -> 0
0,1 -> 0
1,0 -> 0
1,1 -> 1
I changed the compile call to model.compile(loss='binary_crossentropy', optimizer='rmsprop', class_mode='binary'), kept the sigmoid in my code, and it converged: loss < 0.01 after 2627 epochs.
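For comparison outside Keras, here is a minimal plain-Python sketch of the combination that worked: a single sigmoid unit (two weights plus a bias) trained on AND with binary cross-entropy. It uses full-batch gradient descent rather than RMSprop, and the learning rate and epoch count are arbitrary assumptions, so the numbers will not match the Keras run.

```python
import math

AND_TABLE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def train_and(epochs=10000, lr=1.0):
    """Full-batch gradient descent on mean binary cross-entropy for one sigmoid unit."""
    w1 = w2 = b = 0.0
    n = len(AND_TABLE)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in AND_TABLE:
            p = sigmoid(w1 * x1 + w2 * x2 + b)
            err = p - y  # derivative of BCE w.r.t. the logit for a sigmoid output
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b

def bce_loss(params):
    """Mean binary cross-entropy of the trained unit over the truth table."""
    w1, w2, b = params
    total = 0.0
    for (x1, x2), y in AND_TABLE:
        p = sigmoid(w1 * x1 + w2 * x2 + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(AND_TABLE)

params = train_and()
w1, w2, b = params
print("weights:", round(w1, 2), round(w2, 2), "bias:", round(b, 2))
print("loss:", round(bce_loss(params), 4))
for (x1, x2), y in AND_TABLE:
    print((x1, x2), "->", round(sigmoid(w1 * x1 + w2 * x2 + b), 3), "target", y)
```

Because AND is linearly separable, the loss keeps shrinking slowly as the weights grow; the learned bias ends up clearly negative, which is what keeps (0, 0) at 0.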
@fchollet I tried your full new example code but got the following error:
File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 463, in relu
where line 50 was the class_mode='binary') line of the model.compile call in your example.
The full traceback was:
Using gpu device 0: GeForce GTX TITAN X