Clean up KMeans #2558
Comments
@mazumdarparijat this is yours! :) |
I would like to take up this task. |
Mini-batch k-means is actually mentioned briefly in the documentation. Anyway, I agree this class should be designed in a better way. @mallikarjun26, the description above by @karlnapf gives a pretty good idea of what needs to be done. Basically, the implementations of the two k-means variations available in Shogun are all right (each one has its own implementation file, KMeansLloydImpl and KMeansMiniBatchImpl). However, the interface takes on too many responsibilities (CKMeans is responsible for both Lloyd and mini-batch training). It would be better to have two different classes. One way to go might be to use inheritance. For instance, a class CKMeans could implement the standard algorithm while a subclass CKMeansMiniBatch would override the training method. There may be better designs, though. This was just an example. It is up to you to come up with a good solution and implement it :-) |
In my view, the best way to approach this problem would be to use the factory method pattern, i.e. to encapsulate the derived CKMeansMiniBatch and CKMeansLloydImpl in the base KMeans class. |
I think all shared methods should live in the base class, and the subclasses should differ only in the couple of lines that compute the updates. |
Sorry, but I didn't get what you are trying to say. Would you mind explaining it again? |
I mean that I do not like the idea of having this factory pattern; a single class should have a fixed behaviour. Different behaviours/algorithms should be implemented in subclasses. Put all functionality that is used by both KMeans algorithms into the base class. |
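To make the suggestion above concrete, here is a minimal sketch of a base class that owns the shared pieces (training loop, nearest-center lookup) while each subclass overrides only the update step. All names here (KMeansBase, LloydKMeans, MiniBatchKMeans, update_centers) are hypothetical and are not Shogun's actual API. The mini-batch variant is simplified: it walks the data cyclically instead of sampling at random, and it assigns and updates one point at a time.

```cpp
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical sketch, not Shogun code.
using Point = std::vector<double>;

class KMeansBase {
public:
    KMeansBase(std::vector<Point> init_centers, int max_iter)
        : centers_(std::move(init_centers)), max_iter_(max_iter) {}
    virtual ~KMeansBase() = default;

    // Shared training loop: identical for every variant.
    void train(const std::vector<Point>& data) {
        if (data.empty() || centers_.empty()) return;
        for (int it = 0; it < max_iter_; ++it) update_centers(data);
    }

    // Shared helper: index of the center closest to x (squared Euclidean).
    std::size_t nearest_center(const Point& x) const {
        std::size_t best = 0;
        double best_dist = std::numeric_limits<double>::max();
        for (std::size_t k = 0; k < centers_.size(); ++k) {
            double dist = 0.0;
            for (std::size_t d = 0; d < x.size(); ++d) {
                const double diff = x[d] - centers_[k][d];
                dist += diff * diff;
            }
            if (dist < best_dist) { best_dist = dist; best = k; }
        }
        return best;
    }

    const std::vector<Point>& centers() const { return centers_; }

protected:
    // The only part that differs between the algorithms.
    virtual void update_centers(const std::vector<Point>& data) = 0;
    std::vector<Point> centers_;

private:
    int max_iter_;
};

class LloydKMeans : public KMeansBase {
public:
    using KMeansBase::KMeansBase;

protected:
    // Full pass: each center becomes the mean of the points assigned to it.
    void update_centers(const std::vector<Point>& data) override {
        const std::size_t dim = centers_[0].size();
        std::vector<Point> sums(centers_.size(), Point(dim, 0.0));
        std::vector<std::size_t> counts(centers_.size(), 0);
        for (const Point& x : data) {
            const std::size_t k = nearest_center(x);
            for (std::size_t d = 0; d < dim; ++d) sums[k][d] += x[d];
            ++counts[k];
        }
        for (std::size_t k = 0; k < centers_.size(); ++k)
            if (counts[k] > 0)
                for (std::size_t d = 0; d < dim; ++d)
                    centers_[k][d] = sums[k][d] / counts[k];
    }
};

class MiniBatchKMeans : public KMeansBase {
public:
    MiniBatchKMeans(std::vector<Point> init_centers, int max_iter,
                    std::size_t batch_size)
        : KMeansBase(std::move(init_centers), max_iter),
          batch_size_(batch_size), counts_(centers().size(), 0) {}

protected:
    // Partial pass: move a center toward each batch point with a
    // per-center step size of 1 / (points seen by that center).
    void update_centers(const std::vector<Point>& data) override {
        for (std::size_t i = 0; i < batch_size_; ++i) {
            const Point& x = data[next_ % data.size()];
            ++next_;
            const std::size_t k = nearest_center(x);
            ++counts_[k];
            const double eta = 1.0 / counts_[k];
            for (std::size_t d = 0; d < x.size(); ++d)
                centers_[k][d] = (1.0 - eta) * centers_[k][d] + eta * x[d];
        }
    }

private:
    std::size_t batch_size_;
    std::vector<std::size_t> counts_;
    std::size_t next_ = 0;  // cyclic cursor standing in for random sampling
};
```

With this layout, each class has one fixed behaviour, and adding another variant means adding another subclass rather than another branch inside a single class.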
I agree with @karlnapf because I do not quite see the advantage of using the factory method pattern for our use case. With the pattern we would do stuff like:
whereas using just inheritance,
Above, CKMeans stands for a base class that contains common functionality, and CMBKMeans and CLloydKMeans stand, respectively, for child classes with the particularities of mini-batch and Lloyd k-means. Again, I am not sure what the advantages of using this design pattern here would be. Do you see any, @abhinavagarwalla?
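The snippets from the comment above are not preserved in this copy of the thread. As a rough illustration only, the contrast might look like the following sketch. It reuses the class names mentioned in the comment, but the `make_kmeans` function, the `KMeansMethod` enum and the `name()` method are assumptions made up for this example, not Shogun's API and not the commenter's original code.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the classes named in the discussion.
class CKMeans {
public:
    virtual ~CKMeans() = default;
    virtual std::string name() const = 0;
};

class CLloydKMeans : public CKMeans {
public:
    std::string name() const override { return "lloyd"; }
};

class CMBKMeans : public CKMeans {
public:
    std::string name() const override { return "mini-batch"; }
};

// Factory-method style: the caller names the variant through an enum
// and receives a pointer to the base class.
enum class KMeansMethod { Lloyd, MiniBatch };

std::unique_ptr<CKMeans> make_kmeans(KMeansMethod method) {
    switch (method) {
    case KMeansMethod::Lloyd:
        return std::unique_ptr<CKMeans>(new CLloydKMeans());
    case KMeansMethod::MiniBatch:
        return std::unique_ptr<CKMeans>(new CMBKMeans());
    }
    throw std::invalid_argument("unknown k-means method");
}

// Plain inheritance style: the caller constructs the concrete class.
std::string inheritance_style_example() {
    CMBKMeans kmeans;
    return kmeans.name();
}
```

In both styles the caller still has to say which algorithm is wanted, either as an enum value or as a class name, which is why the factory adds an extra layer without an obvious benefit for a fixed set of two variants.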
The class is very messy. Since the mini-batch extension, things have gotten worse.