Is there a way in Keras to apply different weights to a cost function? #2115

Closed
ayalalazaro opened this Issue Mar 29, 2016 · 60 comments


Hi there,
I am trying to implement a classification problem with three classes: 0, 1 and 2. I would like to fine-tune my cost function so that misclassification is weighted somehow. In particular, predicting 1 instead of 2 should incur twice the cost of predicting 0. Written as a table, it should be something like this:

Costs (rows = actual class, columns = predicted class):

           Predicted 0 | Predicted 1 | Predicted 2
Actual 0 |      0      |    0.25     |    0.25
Actual 1 |     0.25    |      0      |    0.5
Actual 2 |     0.25    |     0.5     |      0

I really like the Keras framework; it would be nice if this were possible without having to dig into TensorFlow or Theano code.

Thanks

ayalalazaro commented Mar 29, 2016

Sorry, the table has lost its format, I am sending an image:
[image]

carlthome (Contributor) commented Mar 29, 2016

Similar: #2121

tboquet (Contributor) commented Mar 29, 2016

You could use class_weight.
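For reference, class_weight is supplied at fit time and applies a single scalar weight per true class (a minimal sketch assuming a model already compiled on the three-class problem; the weight values here are hypothetical):

# Keras 1.x-era fit signature: each sample's loss is scaled by the weight
# of its true class, regardless of what was predicted.
model.fit(X_train, Y_train,
          batch_size=128, nb_epoch=10,
          class_weight={0: 1.0, 1: 2.0, 2: 2.0})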

ayalalazaro commented Mar 29, 2016

class_weight applies a weight to all data that belongs to a class; the weight should depend on the particular misclassification.

tboquet (Contributor) commented Mar 30, 2016

You are absolutely right, I'm sorry I misunderstood your question. I will try to come back with something tomorrow using partial to define the weights. What you want to achieve should be doable with the Keras abstract backend.

tboquet (Contributor) commented Mar 31, 2016

OK, so I had the time to quickly test it. This is a fully reproducible example on MNIST where we put a higher cost on misclassifying a 1 as a 7 and a 7 as a 1.

If you want to pass constants into the cost function, just build a new function with partial:

'''Train a simple deep NN on the MNIST dataset.
Get to 98.40% test accuracy after 20 epochs
(there is *a lot* of margin for parameter tuning).
2 seconds per epoch on a K520 GPU.
'''

from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD, Adam, RMSprop
from keras.utils import np_utils
import keras.backend as K
from itertools import product
from functools import partial  # used below to bind the weight matrix

# Custom loss function with costs

def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    return K.categorical_crossentropy(y_pred, y_true) * final_mask

w_array = np.ones((10,10))
w_array[1, 7] = 1.2
w_array[7, 1] = 1.2

# bind the weight matrix into the loss; Keras will call ncce(y_true, y_pred)
ncce = partial(w_categorical_crossentropy, weights=w_array)

batch_size = 128
nb_classes = 10
nb_epoch = 20

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))

rms = RMSprop()
model.compile(loss=ncce, optimizer=rms)

model.fit(X_train, Y_train,
          batch_size=batch_size, nb_epoch=nb_epoch,
          show_accuracy=True, verbose=1,
          validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test,
                       show_accuracy=True, verbose=1)
print('Test score:', score[0])
print('Test accuracy:', score[1])
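For intuition, here is what final_mask works out to per sample (a plain-NumPy sketch of my own, not part of the original example): it picks out weights[true_class, predicted_class], so each sample's crossentropy is scaled by the cost assigned to that particular confusion.

import numpy as np

def final_mask_numpy(y_true, y_pred, weights):
    # y_true: one-hot labels, y_pred: predicted probabilities
    true_cls = y_true.argmax(axis=1)   # index of the actual class
    pred_cls = y_pred.argmax(axis=1)   # index of the predicted class
    return weights[true_cls, pred_cls]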
ayalalazaro commented Apr 1, 2016

Wow, that's nice. Thanks for the detailed answer!

tboquet (Contributor) commented Apr 1, 2016

Try to test it on a toy example to verify that it actually works. If it's what you are looking for, feel free to close the issue!
Keras 1.0 will provide a more flexible way to introduce new objectives and metrics.

ayalalazaro commented Apr 2, 2016

Well, I am stuck; I can't make it run in my model, it says:

line 56, in w_categorical_crossentropy
y_pred_max = K.reshape(y_pred_max, (y_pred.shape[0], 1))

AttributeError: 'Tensor' object has no attribute 'shape'

This is the model I am using:

def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (y_pred.shape[0], 1))
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    return K.categorical_crossentropy(y_pred, y_true) * final_mask

w_array = np.ones((3,3))
w_array[2,1] = 1.2
w_array[1,2] = 1.2
ncce = partial(w_categorical_crossentropy, weights=w_array)

def build_model(X_data):
    data_dim = X_data.shape[2]
    timesteps = X_data.shape[1]
    model = Sequential()
    model.add(BatchNormalization(input_shape=(timesteps, data_dim)))
    model.add(GRU(output_dim=50, init='glorot_normal',
                  return_sequences=True, W_regularizer=l2(0.00),
                  U_regularizer=l1(0.01), dropout_W=0.2))
    model.add(GRU(output_dim=50, init='glorot_normal',
                  return_sequences=True, W_regularizer=l2(0.00),
                  U_regularizer=l1(0.01), dropout_W=0.2))
    model.add(GRU(50, init='glorot_normal', return_sequences=False,
                  dropout_W=0.01, W_regularizer=l2(0.00), U_regularizer=l1(0.01)))
    model.add(Dense(3, init='glorot_normal'))
    model.add(Activation('softmax'))

    model.compile(loss=ncce, optimizer='Adam')
    return model

tboquet (Contributor) commented Apr 4, 2016

Sure, sorry, I was using Theano functionalities. I replaced the following line in my previous example:

y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))

It should do the trick!

ayalalazaro commented Apr 5, 2016

Sounds like the way to go; I was using TensorFlow as the backend. I'll tell you if it works as soon as possible. Thanks!

ayalalazaro commented Apr 5, 2016

I still get an error:

line 57, in w_categorical_crossentropy
y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))

File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 271, in reshape
return tf.reshape(x, shape)

File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 682, in reshape
name=name)

File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 411, in apply_op
as_ref=input_arg.is_ref)

File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 529, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)

File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/constant_op.py", line 178, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)

File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/constant_op.py", line 161, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape))

File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 319, in make_tensor_proto
_AssertCompatible(values, dtype)

File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 259, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).name))

TypeError: Expected int32, got list containing Tensors of type '_Message' instead.

I've tried your first reply under the Theano backend and it works, though.


tboquet (Contributor) commented Apr 5, 2016

Ok, I was not sure about how K.shape would behave with TensorFlow. It seems you should use:

y_pred_max = K.reshape(y_pred_max, (K.int_shape(y_pred)[0], 1))
ayalalazaro commented Apr 6, 2016

I get more or less the same:

line 59, in w_categorical_crossentropy
y_pred_max = K.reshape(y_pred_max, (K.int_shape(y_pred)[0], 1))

File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 271, in reshape
return tf.reshape(x, shape)

File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 682, in reshape
name=name)

File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 411, in apply_op
as_ref=input_arg.is_ref)

File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 529, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)

File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/constant_op.py", line 178, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)

File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/constant_op.py", line 161, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape))

File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 319, in make_tensor_proto
_AssertCompatible(values, dtype)

File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 259, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).name))

TypeError: Expected int32, got None of type '_Message' instead.

It seems like it cannot get the shape of y_pred as an integer, right?


tboquet (Contributor) commented Apr 6, 2016

Mm, ok, I will take a look at it today and work directly with tensors to try to find a way to have it work properly for both backends.

pgallego25 commented Apr 8, 2016

Hi there, I tried something like this:

import tensorflow as tf  # needed for tf.float32

def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, K.shape(y_pred))
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (K.cast(weights[c_t, c_p], tf.float32) *
                       K.cast(y_pred_max_mat[:, c_p], tf.float32) *
                       K.cast(y_true[:, c_t], tf.float32))
    return K.categorical_crossentropy(y_pred, y_true) * final_mask

I think it will do it.


ayalalazaro commented Apr 8, 2016

The latter only works for non-recurrent networks, but this code works for RNNs following the same idea. It only works with TensorFlow, though; I couldn't find a way to reshape the tensor the way we want with the Keras backend:

import tensorflow as tf

def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = tf.expand_dims(y_pred_max, 1)
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (K.cast(weights[c_t, c_p], K.floatx()) *
                       K.cast(y_pred_max_mat[:, c_p], K.floatx()) *
                       K.cast(y_true[:, c_t], K.floatx()))
    return K.categorical_crossentropy(y_pred, y_true) * final_mask

ayalalazaro commented Apr 9, 2016

My bad, just replacing tf.expand_dims with K.expand_dims worked for me:

def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.expand_dims(y_pred_max, 1)
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (K.cast(weights[c_t, c_p], K.floatx()) *
                       K.cast(y_pred_max_mat[:, c_p], K.floatx()) *
                       K.cast(y_true[:, c_t], K.floatx()))
    return K.categorical_crossentropy(y_pred, y_true) * final_mask

w_array = np.ones((3, 3))
w_array[2, 1] = 1.2
w_array[1, 2] = 1.2
ncce = partial(w_categorical_crossentropy, weights=w_array)
ncce.__name__ = 'w_categorical_crossentropy'

The last line is necessary for the TensorBoard callback to work. Thanks!!
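Following tboquet's earlier suggestion to verify the loss on a toy example, a quick sanity check could look like this (my own sketch, assuming the Keras 1.x backend used throughout this thread and the ncce defined above):

import numpy as np
from keras import backend as K

y_true = K.variable(np.array([[0., 1., 0.]]))     # actual class: 1
y_pred = K.variable(np.array([[0.1, 0.2, 0.7]]))  # argmax: 2, a weighted pair
print(K.eval(ncce(y_true, y_pred)))  # should be 1.2x the plain crossentropy -log(0.2)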


kimardenmiller commented Dec 3, 2016

Is the Mar 31 solution for @ayalalazaro above still recommended as of v1.2? (Noticed @tboquet's comment: "Keras 1.0 will provide a more flexible way to introduce new objectives and metrics.")

My problem is binary classification where true positive accuracy is more important, and some false negatives are acceptable. Would I need the approach above to achieve that objective? I tried class_weights = {0: 1, 1: 10}, but saw no change. (Examples are 25% positive, 75% negative.)

curiale commented Jan 20, 2017

Just a small detail about the w_categorical_crossentropy implementation: there is no need to cast weights and y_true. The following code works in Theano and TensorFlow:

def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.cast(K.equal(y_pred, y_pred_max), K.floatx())
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    return K.categorical_crossentropy(y_pred, y_true) * final_mask

jerpint commented Feb 20, 2017

Hello, I am trying to implement this in TensorFlow.

I am confused as to what partial is in the line:

ncce = partial(w_categorical_crossentropy, weights=np.ones((10,10)))

I do not see it defined anywhere in this thread, and get

NameError: name 'partial' is not defined

as output.

Thanks


0x00b1 commented Feb 20, 2017

@jerpint It’s available from functools, i.e.

import functools

ncce = functools.partial(w_categorical_crossentropy, weights=np.ones((10,10)))
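One caveat worth noting here (it comes up again later in the thread, and @ayalalazaro's Apr 9 comment already works around it): a bare functools.partial object has no __name__ attribute, which some Keras callbacks expect, so it can help to set it manually:

import functools
import numpy as np

ncce = functools.partial(w_categorical_crossentropy, weights=np.ones((10, 10)))
ncce.__name__ = 'w_categorical_crossentropy'  # bare partials lack __name__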

mongoose54 commented Feb 23, 2017

I am trying to incorporate @curiale's w_categorical_crossentropy implementation for a binary classification where the output of my model has shape (?, 5120, 2), but I am running into a couple of issues:

1. Assuming my class weight distribution is e.g. class_weights = [0.85144055, 1.14855945], what should the w_array be like? Something like this?

w_array = np.ones((2,2))
w_array[1,0] = class_weights[0]
w_array[0,1] = class_weights[1]
ncce = functools.partial(w_categorical_crossentropy, weights=w_array)

2. When I run model.compile(optimizer=Adam(lr=1e-5), loss=ncce, metrics=[dice_coef]) I get the following error:

ValueError: Dimensions must be equal, but are 5120 and 2 for 'mul_339' (op: 'Mul') with input shapes: [?,5120], [?,2].

These are the variables' shapes inside w_categorical_crossentropy:

y_pred shape: (?, 5120, 2); y_true shape: (?, ?, ?); final_mask shape: (?, 2)

Frankly, I am lost in the w_categorical_crossentropy function (e.g. what should final_mask be? What is its shape?). Any help would be much appreciated.


recluze commented Feb 23, 2017

Hnn, I'm sorry, but I don't quite understand: what does this (?, 5120, 2) entail? If ? is the batch size and 2 is the number of classes, what is 5120?

mongoose54 commented Feb 23, 2017

@recluze Sorry for the confusion. Let me clarify: the model is an image segmentation network with output (?, 5120, 2), where ? is batch_size, 5120 is the total number of pixels per image, and 2 is the number of classes (foreground, background). So basically the network does classification per pixel.

recluze commented Feb 23, 2017

Hnn, the last 2 should be removed, I think: since you have two classes, a single output with binary crossentropy instead of a categorical one should work. I don't think a 3-dim output shape would work with w_categorical_crossentropy.

mongoose54 commented Feb 24, 2017

So I hacked Keras' backend binary_crossentropy function into the following, using weighted_cross_entropy_with_logits() to pass class weights:

def w_binary_crossentropy(output, target, weights):
    output = tf.clip_by_value(output, tf.cast(_EPSILON, dtype=_FLOATX),
                              tf.cast(1. - _EPSILON, dtype=_FLOATX))
    output = tf.log(output / (1 - output))
    return tf.nn.weighted_cross_entropy_with_logits(output, target, weights)

and in my code I call it like this:

def wrapped_partial(func, *args, **kwargs):
    partial_func = functools.partial(func, *args, **kwargs)
    functools.update_wrapper(partial_func, func)
    return partial_func

# weights is the ratio of positives to negatives
ncce = wrapped_partial(w_binary_crossentropy, weights=0.01)

model.compile(optimizer=Adam(lr=1e-5), loss=ncce, metrics=[dice_coef])

But I am not sure if these weights are the class weights I am after; it is not clear from the definition of weighted_cross_entropy_with_logits whether this is class balancing. I just wanted to share it here with everyone. Any comments are much appreciated.
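For what it's worth, the TensorFlow documentation defines this op so that the weight multiplies only the positive-class term, so it reweights positives relative to negatives rather than balancing arbitrary pairs of classes:

$$\ell = -\,t \cdot \log\sigma(x)\cdot w_{\text{pos}} \;-\; (1 - t)\cdot\log\bigl(1 - \sigma(x)\bigr)$$

where $x$ is the logit, $t$ the target, and $w_{\text{pos}}$ the weight passed in.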


dralves commented May 16, 2017

[edited]
@mongoose54 I'm currently playing around with this and will post the results back; it shouldn't be hard to get a version with fixed weights.

dralves commented May 17, 2017

@mongoose54 This is what I came up with for binary crossentropy, based on tensorpack's version. TF only, but no need to change Keras:

class WeightedBinaryCrossEntropy(object):

    def __init__(self, pos_ratio):
        neg_ratio = 1. - pos_ratio
        self.pos_ratio = tf.constant(pos_ratio, tf.float32)
        self.weights = tf.constant(neg_ratio / pos_ratio, tf.float32)
        self.__name__ = "weighted_binary_crossentropy({0})".format(pos_ratio)

    def __call__(self, y_true, y_pred):
        return self.weighted_binary_crossentropy(y_true, y_pred)

    def weighted_binary_crossentropy(self, y_true, y_pred):
        # Transform to logits
        epsilon = tf.convert_to_tensor(K.common._EPSILON, y_pred.dtype.base_dtype)
        y_pred = tf.clip_by_value(y_pred, epsilon, 1 - epsilon)
        y_pred = tf.log(y_pred / (1 - y_pred))

        cost = tf.nn.weighted_cross_entropy_with_logits(y_true, y_pred, self.weights)
        return K.mean(cost * self.pos_ratio, axis=-1)
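Usage would look something like this (my own sketch; the 0.25 is a hypothetical fraction of positive examples in the training set):

# pos_ratio = fraction of positive labels in the training data
model.compile(optimizer='adam',
              loss=WeightedBinaryCrossEntropy(0.25),
              metrics=['accuracy'])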

gjgd commented May 22, 2017

Thank you @dralves, that helps me a lot.

A quick question: when I compare the outputs of your class with a 0.5 positive weight against the binary_crossentropy loss function from Keras, the results differ by a factor of 2.

Do you know why, and which one is correct?

import tensorflow as tf
import keras.backend as K
import numpy as np
from keras.losses import binary_crossentropy

class WeightedBinaryCrossEntropy(object):

    def __init__(self, pos_ratio):
        neg_ratio = 1. - pos_ratio
        self.pos_ratio = tf.constant(pos_ratio, tf.float32)
        self.weights = tf.constant(neg_ratio / pos_ratio, tf.float32)
        self.__name__ = "weighted_binary_crossentropy({0})".format(pos_ratio)

    def __call__(self, y_true, y_pred):
        return self.weighted_binary_crossentropy(y_true, y_pred)

    def weighted_binary_crossentropy(self, y_true, y_pred):
        # Transform to logits
        epsilon = tf.convert_to_tensor(K.common._EPSILON, y_pred.dtype.base_dtype)
        y_pred = tf.clip_by_value(y_pred, epsilon, 1 - epsilon)
        y_pred = tf.log(y_pred / (1 - y_pred))

        cost = tf.nn.weighted_cross_entropy_with_logits(y_true, y_pred, self.weights)
        return K.mean(cost * self.pos_ratio, axis=-1)

y_true_arr = np.array([0, 1, 0, 1], dtype="float32")
y_pred_arr = np.array([0, 0, 1, 1], dtype="float32")
y_true = tf.constant(y_true_arr)
y_pred = tf.constant(y_pred_arr)

with tf.Session().as_default():
    print(WeightedBinaryCrossEntropy(0.5)(y_true, y_pred).eval())
    print(binary_crossentropy(y_true, y_pred).eval())

Outputs

4.00756
8.01512

dralves commented May 23, 2017

@dardelet Good point.

This comes directly from tensorpack's implementation, which returns the same results. If you remove the final cost * self.pos_ratio you get the same results as with normal sigmoid cross-entropy. I do see that in the original implementation of balanced-classes cross-entropy (from this paper) the authors multiply the loss from the positive labels by the positive ratio and the loss from the negative labels by the negative ratio.

I'll look into it a bit more.
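A possible explanation of the factor of 2 (my own reasoning, not confirmed in the thread): with pos_ratio = 0.5 the positive weight becomes

$$w_{\text{pos}} = \frac{1 - 0.5}{0.5} = 1,$$

so weighted_cross_entropy_with_logits reduces to plain sigmoid cross-entropy, and the trailing cost * self.pos_ratio then scales the result by 0.5, which is exactly the halving gjgd measured.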

sedghi commented May 31, 2017

Thank you @dralves. Any new findings on the mentioned difference?

bzhong2 commented Jul 1, 2017

In the example, what should the variable final_mask look like? I tried to use a weights matrix like:

weights = np.matrix([[   0,    0,    0,    0,    0,    0, 1000],
                     [   0,    0,    0,    0,    0,    0, 1000],
                     [   0,    0,    0,    0,    0,    0, 1000],
                     [   0,    0,    0,    0,    0,    0, 1000],
                     [   0,    0,    0,    0,    0,    0, 1000],
                     [   0,    0,    0,    0,    0,    0, 1000],
                     [1000, 1000, 1000, 1000, 1000, 1000, 1000]])

It seems everything except the last class should mess up, because the weights are always zero. However, the confusion matrix is:

[[ 144.    5.    0.    0.    9.    0.   20.]
 [   9.  150.    9.    0.    0.    0.   14.]
 [   7.    8.  109.    6.    2.    1.   17.]
 [   4.    0.    5.   93.   41.    4.    4.]
 [  11.    1.    0.   12.  123.    6.   21.]
 [   0.    0.    1.    5.   12.  126.    8.]
 [  39.   15.   16.    4.   39.   11.  326.]]

I am using Keras with TensorFlow as the backend. Any ideas why this happens?


asiron commented Jul 16, 2017

As @recluze mentioned above, w_categorical_crossentropy doesn't work with data that's rank 3+ (for example an LSTM with return_sequences=True, TimeDistributed(Dense), etc.).

I have changed the above example to support rank 3+ tensors and wrapped it in a class, just like the WeightedBinaryCrossEntropy above.

class WeightedCategoricalCrossEntropy(object):

  def __init__(self, weights):
    nb_cl = len(weights)
    self.weights = np.ones((nb_cl, nb_cl))
    for class_idx, class_weight in weights.items():
      self.weights[0][class_idx] = class_weight
      self.weights[class_idx][0] = class_weight
    self.__name__ = 'w_categorical_crossentropy'

  def __call__(self, y_true, y_pred):
    return self.w_categorical_crossentropy(y_true, y_pred)

  def w_categorical_crossentropy(self, y_true, y_pred):
    nb_cl = len(self.weights)
    final_mask = K.zeros_like(y_pred[..., 0])
    y_pred_max = K.max(y_pred, axis=-1)
    y_pred_max = K.expand_dims(y_pred_max, axis=-1)
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in itertools.product(range(nb_cl), range(nb_cl)):
        w = K.cast(self.weights[c_t, c_p], K.floatx())
        y_p = K.cast(y_pred_max_mat[..., c_p], K.floatx())
        y_t = K.cast(y_pred_max_mat[..., c_t], K.floatx())
        final_mask += w * y_p * y_t
    return K.categorical_crossentropy(y_pred, y_true) * final_mask

The constructor expects a dictionary with the same structure as the class_weight param of model.fit:

{0: 1.0, 1: 29.6, 2: 17.69, 3: 27.08, 4: 11.04, 5: 45.45, 6: 136.344, 7: 57.304}

sedghi commented Jul 16, 2017

@asiron Thanks for the code. Just out of curiosity, what rules do you follow for assigning different weights to different classes? Any specific formula? Should they sum to 1?

asiron commented Jul 16, 2017

@alirzsedghi I think this was answered well in #5116

stergioc commented Aug 11, 2017

Hey @asiron, thank you for sharing this code. I was wondering if you also figured out a way to save the weights with which the loss was initialized when saving the model. This would be really helpful, since the weights would then be loaded along with the model.

In this version of the custom loss function this is not supported, and I am not sure if this functionality is supported by Keras. Any ideas?

Here is a sample code that reproduces the problem.

import keras
import itertools
import numpy as np

from keras import backend as K
from keras.models import Model
from keras.layers import Input, Dense, Activation

class WeightedCategoricalCrossEntropy(object):

    def __init__(self, weights):
        nb_cl = len(weights)
        self.weights = np.ones((nb_cl, nb_cl))
        for class_idx, class_weight in weights.items():
            self.weights[0][class_idx] = class_weight
            self.weights[class_idx][0] = class_weight
        self.__name__ = 'w_categorical_crossentropy'

    def __call__(self, y_true, y_pred):
        return self.w_categorical_crossentropy(y_true, y_pred)

    def w_categorical_crossentropy(self, y_true, y_pred):
        nb_cl = len(self.weights)
        final_mask = K.zeros_like(y_pred[..., 0])
        y_pred_max = K.max(y_pred, axis=-1)
        y_pred_max = K.expand_dims(y_pred_max, axis=-1)
        y_pred_max_mat = K.equal(y_pred, y_pred_max)
        for c_p, c_t in itertools.product(range(nb_cl), range(nb_cl)):
            w = K.cast(self.weights[c_t, c_p], K.floatx())
            y_p = K.cast(y_pred_max_mat[..., c_p], K.floatx())
            y_t = K.cast(y_pred_max_mat[..., c_t], K.floatx())
            final_mask += w * y_p * y_t
        return K.categorical_crossentropy(y_pred, y_true) * final_mask

# create a toy model
i = Input(shape=(100,))
h = Dense(7)(i)
o = Activation('softmax')(h)

model = Model(inputs=i, outputs=o)

# compile the model with the custom loss
loss = WeightedCategoricalCrossEntropy({0: 1.0, 1: 29.6, 2: 17.69, 3: 27.08,
                                        4: 11.04, 5: 45.45, 6: 136.344})
model.compile(loss=loss, optimizer='sgd')
print("Compilation OK!")

# fit the model
model.fit(np.random.random((64, 100)), np.random.random((64, 7)), epochs=10)

# save and load the model
model.save('model.h5')
model = keras.models.load_model(
    'model.h5',
    custom_objects={'w_categorical_crossentropy': WeightedCategoricalCrossEntropy})
print("Load OK!")

sry002 commented Aug 23, 2017

Thanks to those who contributed code here; it helped me along a lot.

In the implementation by @asiron (which I tested because I needed to handle rank 3+ tensors), I believe a small error crept in relative to the upstream @tboquet implementation:

y_t = K.cast(y_pred_max_mat[..., c_t], K.floatx())

should be

y_t = K.cast(y_true[..., c_t], K.floatx())

otherwise the boolean logic compares y_pred with y_pred (instead of y_pred with y_true).

A slightly different point: the way the class_weights dictionary is transformed into the weights matrix within WeightedCategoricalCrossEntropy does not seem consistent with what the original poster was trying to achieve, which is to specify pairwise weights for all combinations of true and predicted values. As it stands, it populates only the 0th row and column, penalising misclassifications of the 0th class as another class, or another class as the 0th class. Maybe better to supply the complete matrix instead? Just a thought. Thanks again to the contributors.
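Concretely, the corrected loop inside @asiron's class would read (a sketch applying the fix above):

for c_p, c_t in itertools.product(range(nb_cl), range(nb_cl)):
    w = K.cast(self.weights[c_t, c_p], K.floatx())
    y_p = K.cast(y_pred_max_mat[..., c_p], K.floatx())
    y_t = K.cast(y_true[..., c_t], K.floatx())  # compare against y_true
    final_mask += w * y_p * y_t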


nd26 commented Sep 15, 2017

Question: do the classes need to be in one-hot representations for @asiron's code?

And @sry002, I think you are right.

ThePianoDentist commented Oct 17, 2017

While the @asiron code with the @sry002 alteration seems to 'work' for me, it is considerably slower than not weighting the loss, and it also forces my computer out of memory. I think, though, that this is just a case of me using too complex a model, with too many input examples, for my lowly desktop to handle :(

@nd26 Looking at the code above your comment,
WeightedCategoricalCrossEntropy({0: 1.0, 1: 29.6, 2: 17.69, 3: 27.08, 4: 11.04, 5: 45.45, 6: 136.344})
I think this suggests no, you don't pass the classes into WeightedCategoricalCrossEntropy as one-hot representations. (Unless you mean the output matrix passed into model.fit; I think that should still be one-hot.)


@dickreuter

dickreuter commented Nov 24, 2017

Would there be a way to pass in weights that are different for each sample, giving each individual sample its own weight depending on whether it is predicted accurately or not? I.e. different payoffs depending on the item?


@curiale

curiale commented Nov 24, 2017

Hi @dickreuter, I've managed to pass a weight for each sample just by adding an extra channel to the classification target (y_true). Then I modified the objective and metric functions to properly unravel the weights before computing the operations.


@dickreuter

dickreuter commented Nov 25, 2017

Do you have an example of what this looks like? How do I split the tensor in the loss function to extract y_true and the weights?


@mrgloom

mrgloom commented Dec 12, 2017

@tboquet Have you tested your code?
It seems you need a wrapper around partial to make things work, as described here: http://louistiao.me/posts/adding-__name__-and-__doc__-attributes-to-functoolspartial-objects/

In my case I have tried a weighted binary crossentropy:

from functools import partial, update_wrapper

import numpy as np
from keras import backend as K
from keras.optimizers import Adadelta

def wrapped_partial(func, *args, **kwargs):
    partial_func = partial(func, *args, **kwargs)
    update_wrapper(partial_func, func)  # copies __name__/__doc__ onto the partial object
    return partial_func

def binary_crossentropy_weighted(y_true, y_pred, class_weights):
    y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
    loss = K.mean(class_weights * (-y_true * K.log(y_pred) - (1.0 - y_true) * K.log(1.0 - y_pred)), axis=-1)
    return loss

custom_loss = wrapped_partial(binary_crossentropy_weighted, class_weights=np.array([1.0, 2.0]))

model.compile(optimizer=Adadelta(), loss=[custom_loss])
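A closure achieves the same thing without wrapped_partial, since a named inner function gets its __name__ for free (just a sketch; imports as in the snippet above):

def make_weighted_bce(class_weights):
    # returns a Keras-compatible loss closed over the weight vector
    def binary_crossentropy_weighted(y_true, y_pred):
        y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
        return K.mean(class_weights * (-y_true * K.log(y_pred) - (1.0 - y_true) * K.log(1.0 - y_pred)), axis=-1)
    return binary_crossentropy_weighted

model.compile(optimizer=Adadelta(), loss=make_weighted_bce(np.array([1.0, 2.0])))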

@curiale

curiale commented Dec 12, 2017

Sorry for my late response @dickreuter. If you want to weight the whole batch with a single spatial weight, I recommend using an option similar to the one proposed by @stergioc instead of just a wrapped function. However, if you want to weight each sample in the batch with its own weight, you need to pass the weight inside y_true. I didn't find another way to do that because it was impossible for me to identify the samples inside the batch. An example of what I did:

class WeightedLoss(object):

  def __init__(self, alpha):
    self.alpha = alpha
    if K.image_dim_ordering() == 'th':
        self.stack_axis = 1
    else:
        self.stack_axis = -1
    self.__name__ = 'w_loss'


  def __call__(self, y_true, y_pred):
    return self.w_loss(y_true, y_pred)

  def w_loss(self, y_true, y_pred):
    # y_true should have the weights concatenated in the last dimension
    slice_stack = [slice(None) for i in range(y_true.get_shape().ndims)]
    slice_stack[self.stack_axis] = slice(2, None)
    weights = y_true[slice_stack]

    slice_stack[self.stack_axis] = slice(0,2)
    y_true = y_true[slice_stack]
    ........
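In case it helps, a hedged sketch of how the targets for that class might be assembled (assumptions: 2 classes, channels-last ordering, a per-sample weight array w aligned with the labels; the variable names below are hypothetical). Keras does not enforce the last-dimension match between output and target for custom losses, which is what makes this trick possible:

import numpy as np

# y_onehot: (batch, ..., 2) one-hot labels; w: (batch, ..., 1) per-sample weights
y_with_weights = np.concatenate([y_onehot, w], axis=-1)  # (batch, ..., 3)

model.compile(optimizer='adam', loss=WeightedLoss(alpha=1.0))
model.fit(x_train, y_with_weights, batch_size=32, epochs=10)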


@JoaoLages

JoaoLages commented Dec 21, 2017

Is there any way to use the weights for binary_crossentropy only for misclassification? The examples above use class weights, but I only want to apply the weight when a misclassification occurs.
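One possible sketch (assumptions: a sigmoid output, and a Keras version where K.binary_crossentropy takes targets first, as @blakewest notes below for the categorical variant): scale the crossentropy only where the thresholded prediction disagrees with the label.

from keras import backend as K

def make_misclassification_weighted_bce(weight):
    def loss(y_true, y_pred):
        base = K.binary_crossentropy(y_true, y_pred)
        # 1.0 where round(y_pred) != y_true, else 0.0; K.round has zero gradient,
        # so `wrong` acts as a constant mask and gradients flow only through `base`
        wrong = K.cast(K.not_equal(y_true, K.round(y_pred)), K.floatx())
        return K.mean(base * (1.0 + (weight - 1.0) * wrong), axis=-1)
    return loss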


@blakewest

blakewest commented Jan 1, 2018

Hey all,
I'm using the weighted categorical cross entropy function described above by @ayalalazaro, but it doesn't seem to work as expected. My understanding is that if I pass a weight array of all 1's, it should replicate what normally happens with Keras' categorical cross entropy. But that's not what I'm seeing. Here's some example code:

def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.cast(K.equal(y_pred, y_pred_max), K.floatx())
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    cross_ent = K.categorical_crossentropy(y_pred, y_true, from_logits=False)
    return cross_ent * final_mask

w_array = np.ones((2,2))
custom_loss = partial(w_categorical_crossentropy, weights=w_array)
custom_loss.__name__ ='w_categorical_crossentropy'

default_model = Sequential([
    Dense(128, input_shape=(20,), activation="relu"),
    BatchNormalization(axis=1),
    Dropout(0.6),
    Dense(2, activation="sigmoid")
])
default_model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=["accuracy"])
default_model.optimizer.lr = 0.001
default_model.fit(x=trainable_data.values, y=train_target.values, validation_split=0.1, epochs=20, shuffle=True, batch_size=64)
## Epoch 20/20
## 2018/2018 [==============================] - 0s 73us/step - loss: 0.6188 - acc: 0.6571 - val_loss: 0.6402 - val_acc: 0.6222


# THEN USE CUSTOM LOSS, WHICH SHOULD BE THE SAME
custom_model = Sequential([
    Dense(128, input_shape=(20,), activation="relu"),
    BatchNormalization(axis=1),
    Dropout(0.6),
    Dense(2, activation="sigmoid")
])
custom_model.compile(optimizer='rmsprop', loss=custom_loss, metrics=["accuracy"])
custom_model.optimizer.lr = 0.001
custom_model.fit(x=trainable_data.values, y=train_target.values, validation_split=0.1, epochs=20, shuffle=True, batch_size=64)
## Epoch 20/20
## 2018/2018 [==============================] - 0s 90us/step - loss: 1.0241e-04 - acc: 0.6065 - val_loss: 3.9465e-06 - val_acc: 0.6089

Notice that the custom model pretty quickly gets to essentially zero loss. Which sounds cool, except it doesn't make any sense, and really it means my model stopped learning anything new after only a few epochs. It may be worth noting that I only actually have 2 classes here. I want to weight mis-classifications higher, and thought I could do so with the code above. But it doesn't seem to work. Anyone have any ideas for how I can weight mis-classifications higher on a binary problem?


@blakewest

blakewest commented Jan 1, 2018

@ayalalazaro OK, so I found the error. A silly, but big one. The function listed above returns K.categorical_crossentropy(y_pred, y_true). But I checked the source code here, and that flips the arguments. The real signature is K.categorical_crossentropy(y_true, y_pred, from_logits=False). Truth goes first, then predictions.

Once I made that switch, it started working!
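For anyone copy-pasting, here is the loss from the snippet above with just that argument order fixed (everything else unchanged):

def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.cast(K.equal(y_pred, y_pred_max), K.floatx())
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    cross_ent = K.categorical_crossentropy(y_true, y_pred, from_logits=False)  # truth first
    return cross_ent * final_mask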


@enikkari

enikkari commented Jan 5, 2018

Hi,
I use Keras 2.0.8 and Python 2.7.12.
I tried to run this and got the following output:

$ python testt.py
Using TensorFlow backend.
60000 train samples
10000 test samples
Traceback (most recent call last):
  File "testt.py", line 69, in <module>
    model.compile(loss=ncce, optimizer=rms)
  File "build/bdist.linux-x86_64/egg/keras/models.py", line 784, in compile
  File "build/bdist.linux-x86_64/egg/keras/engine/training.py", line 850, in compile
  File "build/bdist.linux-x86_64/egg/keras/engine/training.py", line 450, in weighted
  File "testt.py", line 29, in w_categorical_crossentropy
    final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
  File "/usr/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 881, in r_binary_op_wrapper
    return func(x, y, name=name)
  File "/usr/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1088, in _mul_dispatch
    return gen_math_ops._mul(x, y, name=name)
  File "/usr/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1449, in _mul
    result = _op_def_lib.apply_op("Mul", x=x, y=y, name=name)
  File "/usr/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 589, in apply_op
    param_name=input_name)
  File "/usr/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 60, in _SatisfiesTypeConstraint
    ", ".join(dtypes.as_dtype(x).name for x in allowed_list)))
TypeError: Value passed to parameter 'x' has DataType bool not in list of allowed values: float16, float32, float64, uint8, int8, uint16, int16, int32, int64, complex64, complex128
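(A guess from the TypeError alone, without seeing testt.py: one operand in the multiplication on line 29 is still boolean, which suggests the result of K.equal(...) was not cast before use. The cast line from the earlier snippets in this thread is likely what is missing:)

y_pred_max_mat = K.cast(K.equal(y_pred, y_pred_max), K.floatx())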


@davideboschetto

davideboschetto commented Jan 8, 2018

Is this still the best approach? @fchollet Just to recap the problem: given a classification problem with images of cats, dogs and snakes, I need to penalize the case in which a snake is classified as a dog twice as much as the other cases. Do we really need to go through partial to do this?


@dickreuter

dickreuter commented Feb 9, 2018

I want to build a binary classifier that does the following with one input neuron (giving x) and one output neuron:

If the output neuron is 0: the payoff is 0
If the output neuron is 1 and correct: the payoff is +1
If the output neuron is 1 and incorrect: the payoff is -x (x is different for each individual sample)
How can I maximize the payoff with a neural network?

How can I create a loss function that would do that? Can I use keras directly or do I need a custom loss function? Does the loss function have to be differentiable? Can I use binary cross entropy or even mse?


@blakewest

blakewest commented Feb 9, 2018

@dickreuter you can do this with keras, but you need a custom loss function. And loss functions always minimize a number, so if you want to "maximize" a payoff, you should just flip your payoffs and make them negative. Then the optimizer will make it the most negative it can, which is equivalent to maximizing.
Now, wanting a different payoff for each X sounds tricky. Probably possible by doing some sorcery where you set shuffle to False, and keep track of the batches or something, but I'm not sure exactly how. Could you use an average or median? If so, then you can use the code listed above in this issue to create the custom loss function, and then just minimize it. You might at least try the average/median approach, and see if it helps your problem. If it does, then you could investigate further optimization by trying to get a different loss for each X sample.


@dickreuter

dickreuter commented Feb 9, 2018

No, I can't take the average or median, as each sample has distinct features (I said it has just one input neuron, but in reality there are additional input neurons).


@blakewest

blakewest commented Feb 9, 2018

I know. I meant use the average/median for your loss function. I did not mean change your X's. Just pick some payoff for each X that is a reasonable default guess. I don't know your domain, so I can't comment further. But was just saying, if the custom loss function you're talking about will actually improve your model, then it would likely still improve it (over simply binary cross entropy) even if you use an average or a median. If you see improvement over binary cross entropy, then you can try to optimize further by figuring out how to have a custom payoff for each sample.


@dickreuter

dickreuter commented Feb 9, 2018

I don't think this would work in my case, as the model would need to punish large negative payoffs more than small positive ones, so that the payoff can be maximized.


@dickreuter

dickreuter commented Feb 10, 2018

Let me rephrase the problem again:
Is there a way in keras or tensorflow to give samples an extra weight only if they are incorrectly classified, i.e. a combination of class weight and sample weight, but with the sample weight applied only to one of the outcomes of a binary class (averaging is not an option)? How can this be achieved?
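A sketch of one way to wire that up, borrowing @curiale's trick of smuggling extras through y_true (everything below is a hypothetical illustration, not a tested recipe): pack [label, x] into y_true and minimize the negative expected payoff p * (label - (1 - label) * x), where p is the single sigmoid output. This stays differentiable, unlike thresholding the prediction.

from keras import backend as K

def payoff_loss(y_true, y_pred):
    label = y_true[:, 0:1]   # 1 where acting (outputting 1) is correct, else 0
    x = y_true[:, 1:2]       # per-sample penalty when acting incorrectly
    p = y_pred               # sigmoid output: probability of acting
    expected_payoff = p * (label - (1.0 - label) * x)
    return -K.mean(expected_payoff, axis=-1)

# targets for fit would then be: y = np.stack([labels, x_values], axis=1)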


@tharuniitk

tharuniitk commented Feb 13, 2018

@curiale I have an issue that seems to have no straightforward solution in Keras. My server runs Ubuntu 14.04 and Keras with the TensorFlow backend. It has 4 Nvidia GeForce GTX 1080 GPUs.

I am trying to test the best available implementation of weighted categorical cross entropy (#2115, commented on Jan 20, 2017).

The input array Xtrain is of shape (800, 40), where 800 indicates the number of samples and 40 represents the input feature dimension. Similarly, Xtest is of shape (400, 40). The problem is a multiclass scenario where the number of classes is three. The following code is used, but an error shows up indicating a GPU and batch-size mismatch, which is difficult to address; please provide some pointers.

import keras
from keras.models import Sequential, Model, load_model
from keras.layers.embeddings import Embedding
from keras.layers.core import Activation, Dense, Dropout, Reshape
from keras.optimizers import SGD, Adam, RMSprop
#from keras.layers import TimeDistributed,Merge, Conv1D, Conv2D, Flatten, MaxPooling2D, Conv2DTranspose, UpSampling2D, RepeatVector
#from keras.layers.recurrent import GRU, LSTM
#from keras.datasets.data_utils import get_file
#import tarfile
from functools import partial, update_wrapper
from keras.callbacks import TensorBoard
from time import time
from sklearn.model_selection import KFold
import numpy as np
from keras.callbacks import EarlyStopping
import tensorflow as tf
import scipy.io
from keras import backend as K
from keras.layers import Input, Lambda
import os
from keras import optimizers
from matplotlib import pyplot
from sklearn.preprocessing import MinMaxScaler
#os.export CUDA_VISIBLE_DEVICES="0,1"
import keras, sys
from matplotlib import pyplot
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
#from keras.utils import np_utils
from itertools import product
from keras.layers import Input

def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = weights.shape[1]#len(weights[0,:])
    print weights.shape
    print nb_cl
    print y_pred
    print y_true
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)#returns maximum value along an axis in a tensor
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.cast(K.equal(y_pred, y_pred_max), K.floatx())
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] *y_pred_max_mat[:, c_p]*y_true[:, c_t])
    #ypred_tensor=K.constant(y_pred,dtype=K.set_floatx('float32'))
    #ytrue_tensor=K.constant(y_true,dtype=K.set_floatx('float32'))
    return K.categorical_crossentropy(y_true,y_pred) * final_mask

def get_mat_data(add,in1,in2):
    # Assuming sample_matlab_file.mat has 2 matrices A and B
    matData = scipy.io.loadmat(add)
    matrixA = matData[in1]
    matrixA1 = matData[in2]
    matrixB = matData['Ytrain']
    matrixB1 = matData['Ytest']
    weights = matData['w']
    matrixC = matData['Ytrainclassify']
    matrixC1 = matData['Ytestclassify']
    nfold = matData['nfold']
    return matrixA, matrixA1, matrixB, matrixB1, weights, matrixC, matrixC1, nfold 
def wrapped_partial(func, *args, **kwargs):
    partial_func = partial(func, *args, **kwargs)
    update_wrapper(partial_func, func)
    return partial_func

def gen_model():
    input = Input(shape=(40,))  
    #m1=Sequential()
    # m1.add(conv_model)
    # #m1.add(Conv2D(15, (5,5), strides=(1, 1),activation='relu', input_shape=(1,30,125), kernel_initializer='glorot_uniform'))#temporal filters theano
    # m1.add(Dropout(0.2))
    # #m1.add(Conv2D(15, (5,1), strides=(1, 1),activation='relu',kernel_initializer='glorot_uniform'))#spatial filters
    # #m1.add(Dropout(0.2))
    # m1.add(Flatten())
    # m1.add(Dropout(0.2))
    x1 =(Dense(200,activation='relu',name='dense_1'))(input)
    x2 =(Dropout(0.2))(x1)
    x3 =(Dense(100,activation='relu',name='dense_2'))(x2)
    x4 =(Dropout(0.2))(x3)
    x5 =(Dense(3,activation='softmax',name='softmax_layer'))(x4)
    model = Model(input=input, output=[x5])
    return model

def main():
    in1 = 'Xtrain'
    in2 = 'Xtest'
    add = '/home/tharun/all_mat_files/test_keras.mat'
    Xtrain, Xtest, Ytrain, Ytest, weights, Ytrainclassify, Ytestclassify, nfold = get_mat_data(add,in1,in2)
    nb_classes = 3
    print Xtrain.shape, Xtest.shape, Ytrain.shape, Ytest.shape, weights.shape,Ytrainclassify.shape, Ytestclassify.shape
    wts = np.array([[1/weights[:,0], 1, 1],[1, 1/weights[:,1], 1],[1, 1, 1/weights[:,2]]])
    print 'wts:' 
    print wts.shape
    # convert class vectors to binary class matrices
    Y_train = keras.utils.to_categorical(Ytrainclassify[:,None], nb_classes)
    Y_test = keras.utils.to_categorical(Ytestclassify[:,None], nb_classes)
    Xtrain=Xtrain.astype('float32')
    Xtest=Xtest.astype('float32')

    print Xtrain.shape
    print Y_train.shape
    print Xtest.shape
    print Y_test.shape
    ncce = wrapped_partial(w_categorical_crossentropy, wts)
    batch_size = 10
    nb_classes = 3
    nb_epoch = 1
    model=gen_model()
    #model.compile(loss=ncce, optimizer="adam")
    model.summary()
    rms = SGD()
    model.compile(loss=ncce, optimizer=rms)

    model.fit(Xtrain, Y_train,batch_size=batch_size, nb_epoch=nb_epoch)
    score = model.evaluate(Xtest, Y_test)
    print('Test score:', score[0])
    print('Test accuracy:', score[1])

    #saving weights
    model.save('model_classify_weights.h5')

Error:

python /home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py 

/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
(800, 40) (400, 40) (800, 1) (400, 1) (1, 3) (800, 1) (400, 1)
wts:
(3, 3)
(800, 40)
(800, 3)
(400, 40)
(400, 3)
/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py:129: UserWarning: Update your `Model` call to the Keras 2 API: `Model(outputs=[<tf.Tenso..., inputs=Tensor("in...)`
  model = Model(input=input, output=[x5])
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 40)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 200)               8200      
_________________________________________________________________
dropout_1 (Dropout)          (None, 200)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 100)               20100     
_________________________________________________________________
dropout_2 (Dropout)          (None, 100)               0         
_________________________________________________________________
softmax_layer (Dense)        (None, 3)                 303       
=================================================================
Total params: 28,603
Trainable params: 28,603
Non-trainable params: 0
_________________________________________________________________
(?, 3)
3
Tensor("softmax_layer_target:0", shape=(?, ?), dtype=float32)
[[array([1.41292294]) 1 1]
 [1 array([7.328564]) 1]
 [1 1 array([2.38611435])]]
/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py:176: UserWarning: The `nb_epoch` argument in `fit` has been renamed `epochs`.
  model.fit(Xtrain, Y_train,batch_size=batch_size, nb_epoch=nb_epoch)
Epoch 1/1
2018-02-13 15:41:44.382214: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-02-13 15:41:44.758387: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:05:00.0
totalMemory: 7.92GiB freeMemory: 7.42GiB
2018-02-13 15:41:44.992640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 1 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:06:00.0
totalMemory: 7.92GiB freeMemory: 7.80GiB
2018-02-13 15:41:45.225696: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 2 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:09:00.0
totalMemory: 7.92GiB freeMemory: 7.80GiB
2018-02-13 15:41:45.458070: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 3 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:0a:00.0
totalMemory: 7.92GiB freeMemory: 7.80GiB
2018-02-13 15:41:45.461078: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2018-02-13 15:41:45.461151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1 2 3 
2018-02-13 15:41:45.461160: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0:   Y Y Y Y 
2018-02-13 15:41:45.461165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1:   Y Y Y Y 
2018-02-13 15:41:45.461170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 2:   Y Y Y Y 
2018-02-13 15:41:45.461175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 3:   Y Y Y Y 
2018-02-13 15:41:45.461191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:05:00.0, compute capability: 6.1)
2018-02-13 15:41:45.461198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: GeForce GTX 1080, pci bus id: 0000:06:00.0, compute capability: 6.1)
2018-02-13 15:41:45.461204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:2) -> (device: 2, name: GeForce GTX 1080, pci bus id: 0000:09:00.0, compute capability: 6.1)
2018-02-13 15:41:45.461209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:3) -> (device: 3, name: GeForce GTX 1080, pci bus id: 0000:0a:00.0, compute capability: 6.1)
Traceback (most recent call last):
  File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 239, in <module>
    main()
  File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 176, in main
    model.fit(Xtrain, Y_train,batch_size=batch_size, nb_epoch=nb_epoch)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 1598, in fit
    validation_steps=validation_steps)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 1183, in _fit_loop
    outs = f(ins_batch)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2273, in __call__
    **self.session_kwargs)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [3] vs. [10]
     [[Node: training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/softmax_layer_loss/mul_20"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/Shape, training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/Shape_1)]]
     [[Node: loss/mul/_19 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_806_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op u'training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/BroadcastGradientArgs', defined at:
  File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 239, in <module>
    main()
  File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 176, in main
    model.fit(Xtrain, Y_train,batch_size=batch_size, nb_epoch=nb_epoch)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 1575, in fit
    self._make_train_function()
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 960, in _make_train_function
    loss=self.total_loss)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/optimizers.py", line 156, in get_updates
    grads = self.get_gradients(loss, params)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/optimizers.py", line 73, in get_gradients
    grads = K.gradients(loss, params)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2310, in gradients
    return tf.gradients(loss, variables, colocate_gradients_with_ops=True)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 581, in gradients
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 353, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 581, in <lambda>
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/math_grad.py", line 742, in _MulGrad
    rx, ry = gen_array_ops._broadcast_gradient_args(sx, sy)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 532, in _broadcast_gradient_args
    "BroadcastGradientArgs", s0=s0, s1=s1, name=name)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

...which was originally created as op u'loss/softmax_layer_loss/mul_20', defined at:
  File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 239, in <module>
    main()
  File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 174, in main
    model.compile(loss=ncce, optimizer=rms)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 850, in compile
    sample_weight, mask)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 466, in weighted
    score_array *= weights
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 894, in binary_op_wrapper
    return func(x, y, name=name)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1117, in _mul_dispatch
    return gen_math_ops._mul(x, y, name=name)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2726, in _mul
    "Mul", x=x, y=y, name=name)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Incompatible shapes: [3] vs. [10]
     [[Node: training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/softmax_layer_loss/mul_20"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/Shape, training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/Shape_1)]]
     [[Node: loss/mul/_19 = _Recv[client_terminated=false, recv_device="/job:loc
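Two things in the snippet above look suspicious to me (untested pointers, not a verified fix): wrapped_partial(w_categorical_crossentropy, wts) binds wts positionally to y_true rather than to weights, and the printed wts matrix holds length-1 arrays (object dtype) instead of plain floats, which breaks broadcasting. Something along these lines may help:

# build a plain float32 matrix: weights[:, i] are length-1 arrays, hence float(...)
wts = np.array([[1.0 / float(weights[:, 0]), 1.0, 1.0],
                [1.0, 1.0 / float(weights[:, 1]), 1.0],
                [1.0, 1.0, 1.0 / float(weights[:, 2])]], dtype='float32')

# bind by keyword so y_true and y_pred still arrive positionally
ncce = wrapped_partial(w_categorical_crossentropy, weights=wts)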

@curiale I have an issue that seems to have no straight forward solution in Keras. My server runs on ubuntu 14.04, Keras with backend tensorflow. It has 4 Nvidia Geforce gtx1080 GPUs.

I am trying to test the best available implementation of weighted categorical cross entropy(#2115 commented on Jan20,2017).

The input array Xtrain is of shape (800,40) where 800 indicates the number of samples and 40 represents the input feature dimension. Similarly Xtest is of shape (400,40). The problem is of a multiclass scenario where the number of classes is three. Following code is used to implement but an error is showing up indicating a GPU and batchsize mismatch, which is difficult to address, please provide some pointers to address this.

import keras
from keras.models import Sequential, Model, load_model
from keras.layers.embeddings import Embedding
from keras.layers.core import Activation, Dense, Dropout, Reshape
from keras.optimizers import SGD, Adam, RMSprop
#from keras.layers import TimeDistributed,Merge, Conv1D, Conv2D, Flatten, MaxPooling2D, Conv2DTranspose, UpSampling2D, RepeatVector
#from 

keras.layers.recurrent import GRU, LSTM
#from keras.datasets.data_utils import get_file
#import tarfile
from functools import partial, update_wrapper
from keras.callbacks import TensorBoard
from time import time
from sklearn.model_selection import KFold
import numpy as np
from keras.callbacks import EarlyStopping
import tensorflow as tf
import scipy.io
from keras import backend as K
from keras.layers import Input, Lambda
import os
from keras import optimizers
from matplotlib import pyplot
from sklearn.preprocessing import MinMaxScaler
#os.export CUDA_VISIBLE_DEVICES="0,1"
import keras, sys
from matplotlib import pyplot
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
#from keras.utils import np_utils
from itertools import product
from keras.layers import Input

def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = weights.shape[1]#len(weights[0,:])
    print weights.shape
    print nb_cl
    print y_pred
    print y_true
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)#returns maximum value along an axis in a tensor
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.cast(K.equal(y_pred, y_pred_max), K.floatx())
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] *y_pred_max_mat[:, c_p]*y_true[:, c_t])
    #ypred_tensor=K.constant(y_pred,dtype=K.set_floatx('float32'))
    #ytrue_tensor=K.constant(y_true,dtype=K.set_floatx('float32'))
    return K.categorical_crossentropy(y_true,y_pred) * final_mask

def get_mat_data(add,in1,in2):
    # Assuming sample_matlab_file.mat has 2 matrices A and B
    matData = scipy.io.loadmat(add)
    matrixA = matData[in1]
    matrixA1 = matData[in2]
    matrixB = matData['Ytrain']
    matrixB1 = matData['Ytest']
    weights = matData['w']
    matrixC = matData['Ytrainclassify']
    matrixC1 = matData['Ytestclassify']
    nfold = matData['nfold']
    return matrixA, matrixA1, matrixB, matrixB1, weights, matrixC, matrixC1, nfold 
def wrapped_partial(func, *args, **kwargs):
    partial_func = partial(func, *args, **kwargs)
    update_wrapper(partial_func, func)
    return partial_func

def gen_model():
    input = Input(shape=(40,))  
    #m1=Sequential()
    # m1.add(conv_model)
    # #m1.add(Conv2D(15, (5,5), strides=(1, 1),activation='relu', input_shape=(1,30,125), kernel_initializer='glorot_uniform'))#temporal filters theano
    # m1.add(Dropout(0.2))
    # #m1.add(Conv2D(15, (5,1), strides=(1, 1),activation='relu',kernel_initializer='glorot_uniform'))#spatial filters
    # #m1.add(Dropout(0.2))
    # m1.add(Flatten())
    # m1.add(Dropout(0.2))
    x1 =(Dense(200,activation='relu',name='dense_1'))(input)
    x2 =(Dropout(0.2))(x1)
    x3 =(Dense(100,activation='relu',name='dense_2'))(x2)
    x4 =(Dropout(0.2))(x3)
    x5 =(Dense(3,activation='softmax',name='softmax_layer'))(x4)
    model = Model(input=input, output=[x5])
    return model

    in1 = 'Xtrain'
    in2 = 'Xtest'
    add = '/home/tharun/all_mat_files/test_keras.mat'
    Xtrain, Xtest, Ytrain, Ytest, weights, Ytrainclassify, Ytestclassify, nfold = get_mat_data(add,in1,in2)
    nb_classes = 3
    print Xtrain.shape, Xtest.shape, Ytrain.shape, Ytest.shape, weights.shape,Ytrainclassify.shape, Ytestclassify.shape
    wts = np.array([[1/weights[:,0], 1, 1],[1, 1/weights[:,1], 1],[1, 1, 1/weights[:,2]]])
    print 'wts:' 
    print wts.shape
    # convert class vectors to binary class matrices
    Y_train = keras.utils.to_categorical(Ytrainclassify[:,None], nb_classes)
    Y_test = keras.utils.to_categorical(Ytestclassify[:,None], nb_classes)
    Xtrain=Xtrain.astype('float32')
    Xtest=Xtest.astype('float32')

    print Xtrain.shape
    print Y_train.shape
    print Xtest.shape
    print Y_test.shape
    ncce = wrapped_partial(w_categorical_crossentropy, wts)
    batch_size = 10
    nb_classes = 3
    nb_epoch = 1
    model=gen_model()
    #model.compile(loss=ncce, optimizer="adam")
    model.summary()
    rms = SGD()
    model.compile(loss=ncce, optimizer=rms)

    model.fit(Xtrain, Y_train,batch_size=batch_size, nb_epoch=nb_epoch)
    score = model.evaluate(Xtest, Y_test)
    print('Test score:', score[0])
    print('Test accuracy:', score[1])

    #saving weights
    model.save('model_classify_weights.h5')

Error:

python /home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py 

/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
(800, 40) (400, 40) (800, 1) (400, 1) (1, 3) (800, 1) (400, 1)
wts:
(3, 3)
(800, 40)
(800, 3)
(400, 40)
(400, 3)
/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py:129: UserWarning: Update your `Model` call to the Keras 2 API: `Model(outputs=[<tf.Tenso..., inputs=Tensor("in...)`
  model = Model(input=input, output=[x5])
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 40)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 200)               8200      
_________________________________________________________________
dropout_1 (Dropout)          (None, 200)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 100)               20100     
_________________________________________________________________
dropout_2 (Dropout)          (None, 100)               0         
_________________________________________________________________
softmax_layer (Dense)        (None, 3)                 303       
=================================================================
Total params: 28,603
Trainable params: 28,603
Non-trainable params: 0
_________________________________________________________________
(?, 3)
3
Tensor("softmax_layer_target:0", shape=(?, ?), dtype=float32)
[[array([1.41292294]) 1 1]
 [1 array([7.328564]) 1]
 [1 1 array([2.38611435])]]
/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py:176: UserWarning: The `nb_epoch` argument in `fit` has been renamed `epochs`.
  model.fit(Xtrain, Y_train,batch_size=batch_size, nb_epoch=nb_epoch)
Epoch 1/1
2018-02-13 15:41:44.382214: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-02-13 15:41:44.758387: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:05:00.0
totalMemory: 7.92GiB freeMemory: 7.42GiB
2018-02-13 15:41:44.992640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 1 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:06:00.0
totalMemory: 7.92GiB freeMemory: 7.80GiB
2018-02-13 15:41:45.225696: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 2 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:09:00.0
totalMemory: 7.92GiB freeMemory: 7.80GiB
2018-02-13 15:41:45.458070: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 3 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:0a:00.0
totalMemory: 7.92GiB freeMemory: 7.80GiB
2018-02-13 15:41:45.461078: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2018-02-13 15:41:45.461151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1 2 3 
2018-02-13 15:41:45.461160: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0:   Y Y Y Y 
2018-02-13 15:41:45.461165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1:   Y Y Y Y 
2018-02-13 15:41:45.461170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 2:   Y Y Y Y 
2018-02-13 15:41:45.461175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 3:   Y Y Y Y 
2018-02-13 15:41:45.461191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:05:00.0, compute capability: 6.1)
2018-02-13 15:41:45.461198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: GeForce GTX 1080, pci bus id: 0000:06:00.0, compute capability: 6.1)
2018-02-13 15:41:45.461204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:2) -> (device: 2, name: GeForce GTX 1080, pci bus id: 0000:09:00.0, compute capability: 6.1)
2018-02-13 15:41:45.461209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:3) -> (device: 3, name: GeForce GTX 1080, pci bus id: 0000:0a:00.0, compute capability: 6.1)
Traceback (most recent call last):
  File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 239, in <module>
    main()
  File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 176, in main
    model.fit(Xtrain, Y_train,batch_size=batch_size, nb_epoch=nb_epoch)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 1598, in fit
    validation_steps=validation_steps)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 1183, in _fit_loop
    outs = f(ins_batch)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2273, in __call__
    **self.session_kwargs)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [3] vs. [10]
     [[Node: training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/softmax_layer_loss/mul_20"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/Shape, training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/Shape_1)]]
     [[Node: loss/mul/_19 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_806_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op u'training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/BroadcastGradientArgs', defined at:
  File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 239, in <module>
    main()
  File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 176, in main
    model.fit(Xtrain, Y_train,batch_size=batch_size, nb_epoch=nb_epoch)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 1575, in fit
    self._make_train_function()
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 960, in _make_train_function
    loss=self.total_loss)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/optimizers.py", line 156, in get_updates
    grads = self.get_gradients(loss, params)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/optimizers.py", line 73, in get_gradients
    grads = K.gradients(loss, params)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2310, in gradients
    return tf.gradients(loss, variables, colocate_gradients_with_ops=True)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 581, in gradients
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 353, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 581, in <lambda>
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/math_grad.py", line 742, in _MulGrad
    rx, ry = gen_array_ops._broadcast_gradient_args(sx, sy)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 532, in _broadcast_gradient_args
    "BroadcastGradientArgs", s0=s0, s1=s1, name=name)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

...which was originally created as op u'loss/softmax_layer_loss/mul_20', defined at:
  File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 239, in <module>
    main()
  File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 174, in main
    model.compile(loss=ncce, optimizer=rms)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 850, in compile
    sample_weight, mask)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 466, in weighted
    score_array *= weights
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 894, in binary_op_wrapper
    return func(x, y, name=name)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1117, in _mul_dispatch
    return gen_math_ops._mul(x, y, name=name)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2726, in _mul
    "Mul", x=x, y=y, name=name)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Incompatible shapes: [3] vs. [10]
     [[Node: training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/softmax_layer_loss/mul_20"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/Shape, training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/Shape_1)]]
     [[Node: loss/mul/_19 = _Recv[client_terminated=false, recv_device="/job:loc
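
An editor's aside on this traceback (not part of the original report): the failure point is score_array *= weights inside Keras's weighted wrapper (training.py line 466 above), where the tensor returned by the custom loss is multiplied element-wise by the per-sample weights. Incompatible shapes: [3] vs. [10] therefore indicates that the ncce loss compiled in the script is not returning one value per sample, which is what Keras expects. A quick shape sanity check, assuming a TensorFlow backend, a batch of 10 samples, and 3 classes:

import numpy as np
from keras import backend as K

# Hypothetical shapes: batch of 10 samples, 3 classes, one-hot targets
y_true = K.constant(np.eye(3)[np.random.randint(0, 3, size=10)])
y_pred = K.constant(np.random.rand(10, 3))

# ncce is the custom weighted loss under test (defined in the failing script)
out = ncce(y_true, y_pred)
print(K.eval(out).shape)  # should be (10,): one loss value per sample
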
@marleymwangi

marleymwangi commented May 22, 2018

Hey, I have an imbalanced data set. I was hoping to use the weighted cost to help with classification, since the model always ends up predicting only one outcome (in my case 0). I could use some help building the cost matrix. My 3 classes have counts 1: 1270, 0: 7145, -1: 1260, so from the examples above it would be a 3-by-3 matrix; picking the values to fill it is the problem.

It would also be great if I could penalize wrongly predicting 1 as -1, or vice versa.

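One possible way to seed such a matrix (an editor's sketch, not from the original comment): scale the off-diagonal penalties by inverse class frequency, so that mistakes on the rare classes (1 and -1) cost more, and add an extra bump for confusing 1 with -1. The specific numbers below are assumptions to tune, not a recommendation:

import numpy as np

# Assumed label-to-index mapping: 0 -> 0, 1 -> 1, -1 -> 2
counts = np.array([7145.0, 1270.0, 1260.0])
inv_freq = counts.sum() / counts        # rarer class -> larger weight

w_array = np.ones((3, 3))               # diagonal (correct predictions) stays 1
for c_t in range(3):
    for c_p in range(3):
        if c_t != c_p:
            # penalty for misclassifying a sample whose true class is c_t
            w_array[c_t, c_p] = inv_freq[c_t] / inv_freq.min()

# Extra penalty for confusing 1 (index 1) with -1 (index 2), in either direction
w_array[1, 2] *= 2.0
w_array[2, 1] *= 2.0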

@machisuke

machisuke commented May 27, 2018

In my case, a lambda function worked fine.

from itertools import product

import numpy as np
import keras
from keras import backend as K

def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.expand_dims(y_pred_max, 1)
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        # weight each sample by the penalty for (true class c_t, predicted class c_p)
        final_mask += (K.cast(weights[c_t, c_p], K.floatx())
                       * K.cast(y_pred_max_mat[:, c_p], K.floatx())
                       * K.cast(y_true[:, c_t], K.floatx()))
    # argument order fixed for Keras 2, whose signature is (target, output)
    return K.categorical_crossentropy(y_true, y_pred) * final_mask


w_array = np.ones((3, 3))
w_array[2, 1] = 1.2
w_array[1, 2] = 1.2

loss = lambda y_true, y_pred: w_categorical_crossentropy(y_true, y_pred, weights=w_array)

model.compile(loss=loss,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
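
A side note (an editor's addition, not part of the original comment): an anonymous lambda like this cannot be looked up by name when reloading a saved model. A minimal sketch of the same idea using functools.partial with an explicit name, assuming the w_categorical_crossentropy and w_array defined above:

from functools import partial

ncce = partial(w_categorical_crossentropy, weights=w_array)
ncce.__name__ = 'w_categorical_crossentropy'

model.compile(loss=ncce, optimizer='adadelta', metrics=['accuracy'])

# When reloading (assuming the model was saved with model.save('model.h5')):
# from keras.models import load_model
# model = load_model('model.h5',
#                    custom_objects={'w_categorical_crossentropy': ncce})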

@pooriaPoorsarvi

pooriaPoorsarvi commented Jul 19, 2018

This is my code; although it's a bit messy, it seems to work with RNNs as well :D

from itertools import product

import tensorflow as tf
from keras import backend as K

def getLoss(weights, rnn=True):
    def w_categorical_crossentropy(y_true, y_pred):
        nb_cl = len(weights)
        if not rnn:
            # 2D case: y_pred has shape (batch, classes)
            final_mask = K.zeros_like(y_pred[:, 0])
            y_pred_max = K.max(y_pred, axis=1)
            y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
            y_pred_max_mat = K.equal(y_pred, y_pred_max)
            for c_p, c_t in product(range(nb_cl), range(nb_cl)):
                final_mask += (weights[c_t, c_p]
                               * K.cast(y_pred_max_mat, tf.float32)[:, c_p]
                               * K.cast(y_true, tf.float32)[:, c_t])
        else:
            # 3D case: y_pred has shape (batch, timesteps, classes)
            final_mask = K.zeros_like(y_pred[:, :, 0])
            y_pred_max = K.max(y_pred, axis=2)
            y_pred_max = K.reshape(y_pred_max,
                                   (K.shape(y_pred)[0], K.shape(y_pred)[1], 1))
            y_pred_max_mat = K.equal(y_pred, y_pred_max)
            for c_p, c_t in product(range(nb_cl), range(nb_cl)):
                final_mask += (weights[c_t, c_p]
                               * K.cast(y_pred_max_mat, tf.float32)[:, :, c_p]
                               * K.cast(y_true, tf.float32)[:, :, c_t])
        # argument order fixed for Keras 2, whose signature is (target, output)
        return K.categorical_crossentropy(y_true, y_pred) * final_mask
    return w_categorical_crossentropy

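A minimal usage sketch for the rnn=True path (an editor's addition, not from the original comment), assuming 3 classes and one-hot targets of shape (batch, timesteps, 3); the toy model and penalty matrix below are hypothetical:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense

# Hypothetical penalty matrix: confusing classes 1 and 2 costs extra
w_array = np.ones((3, 3))
w_array[1, 2] = 1.5
w_array[2, 1] = 1.5

model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(None, 8)))
model.add(TimeDistributed(Dense(3, activation='softmax')))
model.compile(loss=getLoss(w_array, rnn=True), optimizer='adam')

# Dummy data: 16 sequences, 10 timesteps, 8 features, one-hot targets
X = np.random.rand(16, 10, 8).astype('float32')
Y = np.eye(3)[np.random.randint(0, 3, size=(16, 10))]
model.fit(X, Y, epochs=1, batch_size=4)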
