Is this multi-class segmentation possible? #50
Comments
Hi @javadan , As you already noticed, this topic was discussed a few times now and to my knowledge there are 2 options:
You can use either
I'm sorry but I don't understand how this would work. Both sigmoid and softmax activation functions output values between 0 and 1, so I don't see how they would transform that output into what you're looking for (0 to 255). Again, from what I know, in terms of multi-class image segmentation it all comes down to the 2 different methods I pointed out at the top of this answer. Can you read those again and let me know what holds you back from using them? Happy to discuss this further if you need.
Hi @karolzak Ok, I will let you know if I work out how to do it the way I'm describing. The multiple binary-segmentation method should work fine for me, once I've turned my 5-class mask into 5 single-class masks. I'm just thinking ahead, in case I decide to add more classes later. If num_classes increases, then with an integer-encoded single-layer output no changes would need to be made to the architecture or code, and the size of the network doesn't increase. With the layer-per-class methods, the architecture, code, and size of the network grow with every new class. I imagine it's possible, as the occasional answer here and there seems to suggest that softmax and sparse_categorical_crossentropy could allow for integer encoding. (Perhaps the class ids are divided by 255 to get them between 0 and 1 for training, and then multiplied by 255 to get back to 0 to 255 for the final PNG output.) But anyway, I was just finding out if you were familiar with integer-encoded multi-class segmentation. I'll give it a try, and will probably end up using one of your suggested methods in the end if it doesn't work. Thanks for your time.
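As a side note, turning an integer-encoded mask into per-class binary masks is a one-liner in numpy. A minimal sketch, assuming the class ids are [0, 1, 2, 3, 255] as in this thread (the tiny example mask is made up):

```python
import numpy as np

# Class ids as they appear in the PNG mask (assumption from this thread).
CLASS_IDS = [0, 1, 2, 3, 255]

def split_mask(mask, class_ids=CLASS_IDS):
    """Turn an integer-encoded (H, W) mask into an (H, W, n_classes)
    stack of binary masks, one channel per class id."""
    return np.stack([(mask == cid).astype(np.uint8) for cid in class_ids], axis=-1)

# Tiny illustrative mask
mask = np.array([[0, 1],
                 [3, 255]], dtype=np.uint8)
binary = split_mask(mask)
print(binary.shape)    # (2, 2, 5)
print(binary[..., 1])  # binary channel for class id 1
```

Each channel can then be fed to its own binary-segmentation network, or stacked as a multi-channel target.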
Well, I partially agree with this statement, although the change is tiny: it's only the output tensor size that changes, so it would never become a concern in terms of network size. On top of that, if you write your training logic well, there's no need for code changes when retraining. As you can see above, in comparison to the overall network size the difference is negligible.
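The "negligible" claim can be made concrete. If the final layer is a 1x1 convolution, each extra class channel adds only (in_channels + 1) parameters. A quick sketch, where the 64 input feature channels are an assumed, illustrative number:

```python
def conv_params(kernel, in_ch, out_ch):
    # weights + biases for a 2D convolution layer
    return (kernel * kernel * in_ch + 1) * out_ch

in_ch = 64  # assumed feature channels before the output layer
one_channel = conv_params(1, in_ch, 1)    # single-channel output (binary / integer-encoded)
five_channels = conv_params(1, in_ch, 5)  # one channel per class
print(one_channel, five_channels)  # 65 325
```

So going from 1 to 5 output channels costs 260 parameters here, a rounding error next to the millions of parameters in a typical U-Net.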
I read through these suggestions and ran some experiments. Good luck and do let me know how it went!
Hello, can you please help me understand how I can address the overlapping masks? I have 15 class masks as binary masks. I could use categorical cross-entropy loss by one-hot encoding the targets, but I manually removed the overlapping pixels from some of the classes, and the output is not so accurate. Can I make use of the overlapping binary masks for 15 classes with binary cross-entropy loss? Thank you.
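For what it's worth, with sigmoid activations and binary cross-entropy each output channel is an independent binary problem, so a pixel is allowed to be "on" in several channels at once and overlapping masks don't need to be trimmed. A minimal numpy sketch of such a multi-label target (shapes and regions are made-up illustrative values, using 3 of the 15 classes):

```python
import numpy as np

H, W, n_classes = 4, 4, 3  # illustrative: 3 of the 15 classes
target = np.zeros((H, W, n_classes), dtype=np.float32)
target[1:3, 1:3, 0] = 1.0  # class 0 region
target[2:4, 2:4, 1] = 1.0  # class 1 region, overlapping class 0 at (2, 2)

print(target[2, 2])          # [1. 1. 0.] -> one pixel, two labels
print(target.sum(-1).max())  # 2.0 -> more than one label per pixel is fine
```

This is exactly what softmax + categorical cross-entropy forbids (channels must sum to 1 per pixel), which is why removing overlaps was needed in the one-hot setup.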
Hi @soans1994
Hi @karolzak |
@akashsindhu96 he meant I should just use 4 to represent the value 256. (Output layer would need to be
Hi karolzak,
I've had some good progress training unet segmentation in the cloud with your library.
I can train binary segmentation fine now, at least.
Now I have RGB images, and png masks where np.unique(pixel_options) = [0 1 2 3 255]
I am interested in tracking class ids 1 and 3.
I've looked through your answers to others regarding multi-class segmentation, and it came down to using multiple binary segmentation networks, by running multiple Keras Sessions.
Otherwise, there was an option to +1 to num_classes and change the output shape to (n, w, h, num_classes). Then each class gets a full (w x h) binary mask of its own. (You are still using sigmoid and binary_crossentropy for this?)
From other blogs I've read, there are one-hot encoding techniques, where each class gets its own binary mask as output (as above), followed by a softmax, and then each pixel is argmax'd to get its winning class.
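That one-hot route can be sketched in a few lines of numpy: softmax over the class axis, then per-pixel argmax (the logit values below are made-up numbers for illustration):

```python
import numpy as np

# Fake logits for a 1x2 image with 3 classes: shape (H, W, n_classes)
logits = np.array([[[2.0, 0.1, -1.0],
                    [0.0, 3.0,  0.5]]])

# Softmax over the class axis (stabilised by subtracting the max)
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)

# Winning class id per pixel
pred = probs.argmax(axis=-1)
print(pred)  # [[0 1]]
```

The result is already an integer-encoded mask, which is the same thing the sparse-label route below produces directly.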
Then there's what I think I'm interested in,
where images are (n, 256, 256, 1) (i.e. gray-scale)
and masks are also (n, 256, 256, 1), because the pixel options are just integers from 0 to 255.
That's what I want, as output, too. I'll read the prediction mask pixel values to get the class numbers.
I see Keras recommends 'sparse_categorical_crossentropy' as the loss function for this use case, and then it apparently doesn't matter if you use sigmoid or softmax.
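For intuition, sparse categorical cross-entropy consumes integer labels directly: for each pixel it just picks out -log(p[true_class]) from the softmax output, so no one-hot targets and no dividing class ids by 255 are needed. A numpy sketch with made-up probabilities for 2 pixels and 3 classes:

```python
import numpy as np

# Softmax output for 2 pixels over 3 classes (illustrative values)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])  # integer-encoded ground truth, one id per pixel

# Sparse categorical cross-entropy: -log of the probability at the true class
loss = -np.log(probs[np.arange(len(labels)), labels])
print(loss.round(3))  # [0.357 0.223]
```

One caveat: the integer labels must be contiguous ids in 0..num_classes-1, so a raw mask value like 255 would first have to be remapped (e.g. to 4) rather than rescaled.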
But I'm just a bit stuck on whether this will work.
I'm thinking then, to take the resulting PNG with [0 1 2 3 255] values, and map each class to a colour.
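That last step is just a lookup table. A minimal sketch, where the palette colours are an arbitrary illustrative choice:

```python
import numpy as np

# Hypothetical class-id -> RGB palette (colours chosen arbitrarily)
PALETTE = {0: (0, 0, 0), 1: (255, 0, 0), 2: (0, 255, 0),
           3: (0, 0, 255), 255: (255, 255, 255)}

def colorize(mask):
    """Map an integer-encoded (H, W) mask to an (H, W, 3) RGB image."""
    rgb = np.zeros(mask.shape + (3,), dtype=np.uint8)
    for cid, color in PALETTE.items():
        rgb[mask == cid] = color
    return rgb

mask = np.array([[0, 1],
                 [3, 255]], dtype=np.uint8)
print(colorize(mask)[0, 1])  # [255   0   0]
```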
Any advice on the 'integer' classes use case, as opposed to the 'one hot vector' style answers I've read in the other issues?
Thanks