
Is this multi-class segmentation possible? #50

Closed
javadan opened this issue Aug 15, 2021 · 7 comments

Comments

@javadan

javadan commented Aug 15, 2021

Hi karolzak,

I've had some good progress training unet segmentation in the cloud with your library.
I can train binary segmentation fine now, at least.

Now I have RGB images, and png masks where np.unique(pixel_options) = [0 1 2 3 255]
I am interested in tracking class ids 1 and 3.

I've looked through your answers to others regarding multi-class segmentation, and it came down to using multiple binary segmentation networks, by running multiple Keras Sessions.

Otherwise, there was an option to +1 to num_classes and change the output shape to (n, w, h, num_classes). Then each class gets a full (w x h) binary mask of its own. (You are still using sigmoid and binary_crossentropy for this?)

From other blogs I've read, there are one-hot encoding techniques where each class gets its own binary mask as output (as above), followed by a softmax function and a per-pixel argmax to get the winning class for each pixel.

Then there's what I think I'm interested in,
where images are (n, 256, 256, 1) (i.e. gray-scale)
and masks are also (n, 256, 256, 1) because the pixel options are just integers from 0 to 255.
That's what I want, as output, too. I'll read the prediction mask pixel values to get the class numbers.
I see Keras recommends 'sparse_categorical_crossentropy' as the loss function for this use case, and then it apparently doesn't matter if you use sigmoid or softmax.
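As a sanity check on that idea, here is a minimal NumPy sketch (illustrative shapes, not the library's API) of how sparse_categorical_crossentropy relates to one-hot targets: the per-pixel network output is still a probability vector over classes; the sparse variant just indexes it with an integer label instead of a one-hot vector, and both give the same loss.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 5
h, w = 4, 4

# Simulated softmax output of shape (h, w, num_classes) -- the per-pixel
# output is a probability vector over classes even when the *target*
# mask is integer-encoded.
logits = rng.normal(size=(h, w, num_classes))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Integer-encoded target mask, shape (h, w)
target = rng.integers(0, num_classes, size=(h, w))

# sparse_categorical_crossentropy: pick the predicted probability of the
# true class for each pixel directly via the integer label
sparse_loss = -np.log(np.take_along_axis(probs, target[..., None], axis=-1)).mean()

# categorical_crossentropy on the equivalent one-hot target
one_hot = np.eye(num_classes)[target]            # (h, w, num_classes)
cat_loss = -(one_hot * np.log(probs)).sum(axis=-1).mean()

print(np.isclose(sparse_loss, cat_loss))         # same loss, different target encoding
```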

But I'm just a bit stuck on whether this will work.

I'm thinking then, to take the resulting PNG with [0 1 2 3 255] values, and map each class to a colour.
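That class-to-colour mapping step could be sketched like this (the class IDs are from the question above; the palette colours are made up for illustration):

```python
import numpy as np

# Hypothetical palette mapping each class ID in the predicted mask to RGB
palette = {
    0:   (0, 0, 0),        # background
    1:   (255, 0, 0),
    2:   (0, 255, 0),
    3:   (0, 0, 255),
    255: (255, 255, 0),
}

pred = np.array([[0, 1], [3, 255]], dtype=np.uint8)   # predicted class-ID mask

# Build a 256-entry lookup table, then index it with the mask in one shot
lut = np.zeros((256, 3), dtype=np.uint8)
for cid, rgb in palette.items():
    lut[cid] = rgb
colour = lut[pred]                                     # shape (h, w, 3)

print(colour[1, 1])   # pixel with class ID 255 -> [255 255   0]
```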

Any advice on the 'integer' classes use case, as opposed to the 'one hot vector' style answers I've read in the other issues?

Thanks

@karolzak
Owner

Hi @javadan ,

As you already noticed, this topic was discussed a few times now and to my knowledge there are 2 options:

  • separate model for each class with separate output mask for each class (256x256x1 with 0..1 values range)
  • separate binary mask for each class (256x256x1 with 0..1 values range), stacked together (256x256xNUM_CLASS with 0..1 values range) and used as a target mask for a single model with modified num_classes param
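The second option can be sketched in NumPy like this, using the mask values from the question (shapes here are tiny and illustrative):

```python
import numpy as np

# Integer-encoded mask with class IDs [0 1 2 3 255], as in the question
mask = np.array([[0, 1, 2],
                 [3, 255, 0]], dtype=np.uint8)

class_ids = [0, 1, 2, 3, 255]

# Stack one binary (0/1) mask per class along the last axis:
# result shape is (h, w, NUM_CLASS), used as the target for a single model
stacked = np.stack([(mask == c).astype(np.float32) for c in class_ids], axis=-1)

print(stacked.shape)   # (2, 3, 5)
print(stacked[1, 1])   # pixel with value 255 -> [0. 0. 0. 0. 1.]
```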

> Otherwise, there was an option to +1 to num_classes and change the output shape to (n, w, h, num_classes). Then each class gets a full (w x h) binary mask of its own. (You are still using sigmoid and binary_crossentropy for this?)

You can use either sigmoid (if one pixel can belong to more than one class) or softmax (if each pixel can belong to only a single class), but for the loss function I'm still using binary_crossentropy, yes.

> Then there's what I think I'm interested in,
> where images are (n, 256, 256, 1) (i.e. gray-scale)
> and masks are also (n, 256, 256, 1) because the pixel options are just integers from 0 to 255.
> That's what I want, as output, too. I'll read the prediction mask pixel values to get the class numbers.
> I see Keras recommends 'sparse_categorical_crossentropy' as the loss function for this use case, and then it apparently doesn't matter if you use sigmoid or softmax.

I'm sorry, but I don't understand how this would work. Both sigmoid and softmax activation functions output values between 0..1, so I don't see how they would transform that output into what you're looking for (0..255).
The closest thing to what you're trying to achieve would be regression with a linear output, but I don't see how that could work either.

Again, from what I know, in terms of the multi-class image segmentation problem it all comes down to the two methods I pointed out at the top of this answer. Can you read these again and let me know what holds you back from using them?

Happy to discuss this further if you need

@javadan
Author

javadan commented Aug 16, 2021

Hi @karolzak

Ok, I will let you know if I work out how to do it the way I'm describing.
Otherwise, I'll use one of the multiple binary-segmentation methods.

The multiple binary-segmentation methods should work fine for me, once I've turned my 5-class mask into 5 x 1-class masks. I'm just thinking ahead, in case I decide to add more classes later.

If num_classes increases, then with an integer-encoded single-channel output no changes need to be made to the architecture or code, and the size of the network doesn't increase.

With the layer-per-class methods, the architecture, the code, and the size of the network grow with every new class.

I imagine it's possible, as the occasional answer here and there seems to suggest that softmax and sparse_categorical_crossentropy could allow for integer encoding. (Perhaps the class IDs are divided by 255 to get them between 0 and 1 for training, and then multiplied by 255 to get back to 0-255 for the final PNG output.)

But anyway, I was just finding out whether you were familiar with integer-encoded multi-class segmentation. I'll give it a try, and will probably end up using one of your suggested methods in the end if it doesn't work.

Thanks for your time

@javadan javadan closed this as completed Aug 16, 2021
@karolzak
Owner

> If num_classes increases,
> Then with an integer-encoded single layer output, no changes would need to be made to the architecture or code, and the size of the network doesn't increase.
> With the layer per class methods, the architecture and code and size of the network increases with every new class.

Well, I partially agree with this statement, although the change is tiny: it's only the output tensor size that changes, so it would never become a concern in terms of network size. On top of that, if you write your training logic well, there's no need for code changes when retraining - num_classes can easily be derived automatically from your masks.
In fact, in terms of trainable params it would barely change at all:

  • with num_classes=10: [screenshot: model summary with trainable params]
  • with num_classes=1: [screenshot: model summary with trainable params]

As you can see above, in comparison to the overall network size the difference is negligible.
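Deriving num_classes automatically from the masks, as suggested above, could be sketched like this (the mask data below is made up; only np.unique is doing the work):

```python
import numpy as np

# Made-up batch of integer-encoded masks, shape (n, h, w)
masks = np.array([[[0, 1], [2, 3]],
                  [[0, 255], [1, 0]]], dtype=np.uint8)

class_ids = np.unique(masks)     # e.g. [  0   1   2   3 255]
num_classes = len(class_ids)     # 5 -> one output channel per class

# Map raw IDs to contiguous channel indices 0..num_classes-1
id_to_channel = {int(c): i for i, c in enumerate(class_ids)}
print(num_classes, id_to_channel)
```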


> I imagine it's possible, as the occasional answer here and there seem to suggest that softmax and sparse_categorical_crossentropy could allow for integer encoding. (Perhaps the class ids are dividing by 255, to get them between 0 to 1, for training, and then multiplied by 255, to get back to 0 to 255 for the final PNG output).

I read through these suggestions and ran some experiments with sparse_categorical_crossentropy, but it doesn't change much, to be honest. Yes, you can pass in a 256x256x1 tensor of integers as Y to calculate the loss function, but it does not change the fact that the output tensor from the network still needs to be of shape 256x256xNUM_CLASS (same as with binary_crossentropy), where NUM_CLASS == max_class_ID + 1. If you use a mask with values like [0 1 2 3 255], your NUM_CLASS needs to be 256, so it would be best to encode 255 as 4 to avoid artificially blowing up the size of the output tensor.
So in fact the network size using sparse_categorical_crossentropy is the same as when using binary_crossentropy, because for both of these the output tensor would be of the same size/shape - the only difference is that for sparse you need a target of shape 256x256x1, whereas for binary you need 256x256xNUM_CLASS.
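The "encode the highest ID down to 4" point can be illustrated with the question's mask values [0 1 2 3 255] (the remapping dict below is an assumption about how one might do it):

```python
import numpy as np

mask = np.array([[0, 1, 2], [3, 255, 0]], dtype=np.uint8)

# Keeping the raw IDs forces one output channel per possible ID value
raw_channels = int(mask.max()) + 1            # 256 channels for IDs 0..255

# Remapping to contiguous IDs keeps the output tensor small
remap = {0: 0, 1: 1, 2: 2, 3: 3, 255: 4}
encoded = np.vectorize(remap.get)(mask)
enc_channels = int(encoded.max()) + 1         # only 5 channels after remapping

print(raw_channels, enc_channels)             # 256 5
```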

Good luck and do let me know how it went!

@soans1994

hello,

Can you please help me understand how I can address overlapping masks. I have binary masks for 15 classes. I could use categorical cross-entropy loss by one-hot encoding the targets, but I manually removed the overlapping pixels from some of the classes, and the output is not very accurate. Can I make use of the overlapping binary masks for 15 classes with binary cross-entropy loss?

thank you

@karolzak
Owner

Hi @soans1994
How big of an overlap are we talking about here?
I suspect the problem of your output not being accurate might be caused by something other than just the overlapping pixels.
When it comes to image segmentation for multiple classes, what I've found works best is training a separate binary segmentation model for each class.
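To illustrate why overlapping masks are fine with sigmoid + binary_crossentropy, here is a small NumPy sketch (made-up data, 3 classes instead of 15 for brevity): each channel is an independent binary mask, so a pixel may legitimately be 1 in several channels, and the loss stays well-defined without removing overlaps the way one-hot/softmax targets require.

```python
import numpy as np

h, w = 2, 2
target = np.zeros((h, w, 3), dtype=np.float32)
target[0, 0, 0] = 1.0
target[0, 0, 1] = 1.0      # same pixel belongs to classes 0 AND 1

pred = np.full((h, w, 3), 0.5)   # dummy sigmoid outputs

# Per-pixel, per-channel binary cross-entropy (what binary_crossentropy
# computes when each channel is treated as an independent binary mask)
eps = 1e-7
bce = -(target * np.log(pred + eps)
        + (1 - target) * np.log(1 - pred + eps)).mean()
print(round(float(bce), 4))      # well-defined despite the overlap
```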

@akashsindhu96

Hi @karolzak
Can you help me understand this line: "If you use a mask with values like [0 1 2 3 255], your NUM_CLASS needs to be 256, so it would be best to encode 255 as 4 to avoid artificially blowing up the size of the output tensor." Why does NUM_CLASS need to be 256 instead of 5?

@javadan
Author

javadan commented Mar 1, 2023

@akashsindhu96 he meant I should just use 4 to represent the value 255. (The output layer would need to be 256x256x256 if it has to cover IDs 0-255, but only 256x256x5 if I remap the IDs to 0-4.)
