
add svm in last layer #2588

Closed
mundher opened this issue May 3, 2016 · 13 comments
mundher commented May 3, 2016

I want to add an SVM as the last layer of my model:

model = Sequential()
model.add(Convolution2D(nb_filters, nb_conv, nb_conv,
                        border_mode='valid',
                        input_shape=(1, img_rows, img_cols)))
model.add(Flatten())
model.add(Dense(1024))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])

I tried to change the loss to hinge

model = Sequential()
model.add(Convolution2D(nb_filters, nb_conv, nb_conv,
                            border_mode='valid',
                            input_shape=(1, img_rows, img_cols)))
model.add(Flatten())
model.add(Dense(1024))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('linear'))

model.compile(loss='hinge',
              optimizer='adadelta',
              metrics=['accuracy'])

but the training accuracy doesn't change between iterations.


erlendd commented Jun 18, 2016

I'm not sure linear is the correct activation for your output: I think it should be tanh so that you get outputs in [-1, +1] to match the labels.

@mundher mundher closed this as completed Jun 18, 2016

alyato commented Aug 3, 2016

@mundher Did you solve your problem? When I want to use an SVM, I always call scikit-learn's svm. How did you implement it? Thanks.


mundher commented Aug 3, 2016

@alyato I didn't solve the problem. It looks like the hinge loss only works for binary class output.


fish128 commented Sep 29, 2016

I tried a few different activations at the last layer, and found that the softmax activation works best with hinge loss. Can anyone explain this?

@huanglianghua

You need to regularize the weights.

The hinge loss plus an L2 regularization term forms the complete SVM loss function. Try:

from keras.regularizers import l2

model = Sequential()
model.add(Convolution2D(nb_filters, nb_conv, nb_conv,
                            border_mode='valid',
                            input_shape=(1, img_rows, img_cols)))
model.add(Flatten())
model.add(Dense(1024))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes, W_regularizer=l2(0.01)))
model.add(Activation('linear'))

model.compile(loss='hinge',
              optimizer='adadelta',
              metrics=['accuracy'])

instead.
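For reference, for a single output with labels y_i in {-1, +1}, the hinge loss plus the L2 weight penalty is exactly the standard linear SVM objective:

L(w) = (1/N) * sum_i max(0, 1 - y_i * (w . x_i + b)) + lambda * ||w||^2

With several output units, Keras's 'hinge' applies this per class (one-vs-rest), which is why the targets need to be encoded in {-1, +1} rather than as 0/1 one-hot vectors.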

@McLawrence

In the code, the hinge loss is defined as:
K.mean(K.maximum(1. - y_true * y_pred, 0.), axis=-1)

However, as far as I know, the loss for a (multiclass) SVM should be this:

L_i = sum_{j != y_i} max(0, s_j - s_{y_i} + 1)

where s_i is the score of the i-th output unit. As I understand it, the hinge loss above ignores the additive s_j term and only uses the s_{y_i} term in the formula. Where is my error?


aniket03 commented Aug 11, 2017

@McLawrence the hinge loss implemented in Keras is for the specific case of binary classification [A vs ~A]. If it is used, the labels must be in the format {-1, 1}. You might refer to #2830 for reference. There is now also categorical_hinge in losses.py; you can refer to that as well.
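To make that concrete, here is a small sketch (my own addition; y is an assumed array of 0/1 class labels). Binary 'hinge' expects targets in {-1, +1}, while categorical_hinge works on one-hot targets and penalizes only the most violating wrong class:

import numpy as np
from keras import backend as K

y = np.array([0, 1, 1, 0])   # example 0/1 labels (assumption)
y_svm = 2 * y - 1            # convert to {-1, +1} for loss='hinge'

# categorical_hinge in losses.py is essentially:
def categorical_hinge(y_true, y_pred):
    pos = K.sum(y_true * y_pred, axis=-1)          # score of the true class
    neg = K.max((1. - y_true) * y_pred, axis=-1)   # highest score among the other classes
    return K.maximum(0., neg - pos + 1.)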

@bit-scientist

@huanglianghua, @aniket03 Is it really enough to change the loss in compile() to hinge and regularize the last Dense layer with an L2 penalty (like this: kernel_regularizer=regularizers.l2(0.01))? I can't quite see why that is sufficient. Could you point to some links or posts? Thanks.


statcom commented May 29, 2020

> @huanglianghua, @aniket03 Is it really enough to change the loss in compile() to hinge and regularize the last Dense layer with an L2 penalty (like this: kernel_regularizer=regularizers.l2(0.01))? I can't quite see why that is sufficient. Could you point to some links or posts? Thanks.

Here are a couple of good references:
https://cs231n.github.io/linear-classify/
https://github.com/nfmcclure/tensorflow_cookbook#ch-4-support-vector-machines

I tested the following code with real data without any problem. Of course, the results will differ from those of a real SVM implementation (e.g., sklearn's SVM). Interestingly, this Keras implementation sometimes produced much better results than the proper SVM on certain data sets. Another benefit is that you can use a GPU to train the SVM.

# SVM-style head appended to an existing feature-extractor model (fine_model_st)
fine_model_st.add(Dense(nb_classes, kernel_regularizer=regularizers.l2(0.0001)))
fine_model_st.add(Activation('linear'))
fine_model_st.compile(loss='squared_hinge',
                      optimizer='adadelta', metrics=['accuracy'])
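For anyone reading this with a current Keras, here is a self-contained sketch of the same idea in tf.keras (the input shape, layer sizes, and data names are hypothetical):

from tensorflow import keras
from tensorflow.keras import layers, regularizers

nb_classes = 10                                   # hypothetical
model = keras.Sequential([
    keras.Input(shape=(64,)),                     # hypothetical feature dimension
    layers.Dense(128, activation='relu'),
    # linear "SVM" layer: L2-regularized weights, no softmax
    layers.Dense(nb_classes, kernel_regularizer=regularizers.l2(1e-4)),
])
model.compile(loss='squared_hinge', optimizer='adadelta', metrics=['accuracy'])

# squared_hinge expects targets in {-1, +1}, so convert one-hot labels first:
# y_svm = 2 * keras.utils.to_categorical(y, nb_classes) - 1
# model.fit(X, y_svm, epochs=10, batch_size=32)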


momja commented Oct 26, 2020

How does this align with the use of RandomFourierFeatures for SVM approximation found here? Is RandomFourierFeatures a better, more modern approach?
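For context, a rough sketch of how RandomFourierFeatures is typically combined with a hinge loss to approximate a kernel SVM (output_dim, scale, and the input shape below are arbitrary example choices, not anything specific to this thread):

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import RandomFourierFeatures

model = keras.Sequential([
    keras.Input(shape=(784,)),                    # hypothetical input size
    # random Fourier features approximate an RBF kernel feature map...
    RandomFourierFeatures(output_dim=4096, scale=10.0, kernel_initializer='gaussian'),
    # ...so a linear layer trained with hinge loss on top approximates a kernel SVM
    layers.Dense(units=10),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss=keras.losses.hinge,
              metrics=[keras.metrics.CategoricalAccuracy(name='acc')])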


Apidcloud commented May 14, 2021

The suggested code from @huanglianghua and @statcom seems to work, but I wonder if there is a way of outputting probabilities instead. I tried to use softmax, but then the model doesn't improve at all (with 3 classes it gets stuck at 33%; with 2 classes it gets stuck at 50%). My goal is to replicate an SVM model 'exactly', so that I can convert it to .onnx and reuse it.

The only settings I got to work are the following:

  # create model
  model = Sequential()
  # not sure whether the number of units in the first layer needs to match the input shape or not
  model.add(Dense(30, input_shape=(30,), activation='relu', kernel_initializer='he_uniform'))
  model.add(Dense(count_classes, kernel_regularizer=regularizers.l2(0.1)))
  #model.add(Activation('softmax')) # linear by default; softmax doesn't seem to work. Any ideas?
  model.compile(loss=keras.losses.CategoricalHinge(), optimizer=keras.optimizers.Adam(lr=1e-3), metrics=['accuracy'])

Any ideas on how to get a probability at the end? Using argmax seems to work just fine, but how do I interpret the output? I would like to set some sort of threshold to decide whether a prediction is good or not. Say I have 3 classes but feed the model something else entirely: I will still get a maximum value from argmax, even though it's wrong. A probability would let me avoid this by setting a threshold. Any ideas on how to approach this?


erlendd commented May 17, 2021

Getting probabilities out of an SVM is usually done by adding a logistic regression after the linear output of the base SVM model. Use K.stop_gradient to prevent the logistic layer from affecting the weights in the base model.
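A minimal sketch of that two-head idea (my own interpretation, in tf.keras; count_classes and the layer sizes follow the earlier snippets): the hinge loss trains the linear SVM output, while a softmax head trained with cross-entropy sits behind stop_gradient so it cannot affect the SVM weights.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, regularizers

count_classes = 3                                    # as in the example above

inputs = keras.Input(shape=(30,))
x = layers.Dense(30, activation='relu', kernel_initializer='he_uniform')(inputs)
svm_out = layers.Dense(count_classes, kernel_regularizer=regularizers.l2(0.1),
                       name='svm')(x)                # linear SVM scores
frozen = layers.Lambda(lambda t: tf.stop_gradient(t))(svm_out)
prob_out = layers.Dense(count_classes, activation='softmax', name='prob')(frozen)

model = keras.Model(inputs, [svm_out, prob_out])
model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss={'svm': keras.losses.CategoricalHinge(),
                    'prob': keras.losses.CategoricalCrossentropy()},
              metrics={'prob': 'accuracy'})
# model.fit(X, {'svm': y_onehot, 'prob': y_onehot}, ...)  # y_onehot: one-hot targets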


Apidcloud commented May 17, 2021

Thanks, @erlendd

I tried the following approach:

  input = Input(shape=(30,))
  dense = Dense(30, activation='relu', kernel_initializer='he_uniform', name='mul')
  x = dense(input)
  x = Dense(count_classes, kernel_regularizer=regularizers.l2(0.1), name='regul')(x)
  stop_grad = Lambda(lambda x: K.stop_gradient(x))(x)
  x = Activation('linear')(stop_grad)
  # anything I add after linear activation, gets the model stuck at 33% (or 25% if 4 classes; 50% if 2; etc.)
  output = Dense(count_classes, activation="softmax", name='out')(x)

  model = Model(inputs=input, outputs=output)
  model.compile(loss=keras.losses.CategoricalHinge(), optimizer=keras.optimizers.Adam(lr=1e-3), metrics=['accuracy'])

But it still gets stuck at 33% for some reason. I guess the input data would have to be normalised in order to use softmax here, which it is not. The input data are raw vectors, and that works totally fine as long as I don't add anything after the linear output.

Edit:
As I suspected, I got softmax to work (as per the first example; no stop gradient or anything) by scaling the input features. I was using really large feature values which, despite training well, were preventing softmax (logistic regression) from working properly. The scaling of the features can be done with the following code:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_std = scaler.fit_transform(X)
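One small follow-up (an assumption about the setup rather than something stated above): the same fitted scaler has to be applied to any validation, test, or production inputs, otherwise the exported model sees unscaled features again:

X_test_std = scaler.transform(X_test)   # X_test is a hypothetical held-out set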
