Clustering does not run on standalone mode #654

Closed
beam2d opened this Issue Feb 10, 2014 · 4 comments

Owner

beam2d commented Feb 10, 2014

Because clustering runs k-means or GMM only on MIX, no clustering is ever performed in standalone mode. This also means that the driver tests (in progress at #650) cannot cover most of the driver's functionality. One possible fix is to implement an API that manually invokes clustering on the current internal state.

kmaehashi added this to the 0.5.3 milestone Feb 25, 2014

hido self-assigned this Mar 3, 2014

Contributor

hido commented Mar 3, 2014

From a discussion with @beam2d on 2014-03-03:
Currently, batch_update of the clustering model is executed only when MIX is called. If it is also executed after each compression of the coreset, clustering can work in standalone mode.

In the future, users may be able to control when do_clustering runs through config options, to balance the trade-off between clustering accuracy/latency and throughput. jubaclustering may also expose some kind of do_clustering call in its API in a future version.

Contributor

hido commented Mar 23, 2014

I found that clustering already works in standalone mode. When the current bucket size exceeds the config parameter bucket_size, increment_revision() is called, and it is the only function that invokes update_clusters().

https://github.com/jubatus/jubatus/blob/0e9968dc7d307704a9897a1e5c85ddd38b9f517c/jubatus/core/clustering/simple_storage.cpp#L35-L37
https://github.com/jubatus/jubatus/blob/0e9968dc7d307704a9897a1e5c85ddd38b9f517c/jubatus/core/clustering/compressive_storage.cpp#L44-L53
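To illustrate the call chain described above, here is a minimal Python sketch. It is a toy model, not the Jubatus C++ code: the class `ToyStorage`, the `on_update` callback (standing in for update_clusters()), and the exact trigger condition (fire once the bucket reaches bucket_size points) are all assumptions made for illustration.

```python
class ToyStorage:
    """Hypothetical stand-in for a clustering storage (not Jubatus code)."""

    def __init__(self, bucket_size, on_update):
        self.bucket_size = bucket_size
        self.on_update = on_update  # plays the role of update_clusters()
        self.bucket = []
        self.revision = 0

    def increment_revision(self):
        # Bump the revision and notify the listener with a copy of the bucket.
        self.revision += 1
        self.on_update(list(self.bucket))

    def add(self, point):
        self.bucket.append(point)
        # Assumed trigger: fire once the bucket holds bucket_size points.
        if len(self.bucket) >= self.bucket_size:
            self.increment_revision()
            self.bucket.clear()


# No MIX is involved: adding points alone is enough to trigger updates.
updates = []
storage = ToyStorage(bucket_size=3, on_update=updates.append)
for x in range(7):
    storage.add(x)
```

With seven points and a bucket size of 3, the listener is invoked twice and one point is left waiting in the bucket, which mirrors why no separate MIX call is needed for clusters to be updated.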

I have added a test to make sure of this in f077b67 (fix_standalone_clustering branch).

In my opinion, nothing needs to be done about this now.
For compressive_storage, it is natural that the clustering result is updated every time the size exceeds the parameter and compression happens.
For simple_storage, the number of stored points is actually NOT limited by bucket_size, so this parameter serves only as a threshold that controls the frequency of cluster updates.
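The difference between the two storages can be sketched roughly as follows. Both classes are hypothetical toys in Python, not the Jubatus implementations. In particular, replacing a full bucket by its mean and weight is only a placeholder for real coreset compression, and the trigger conditions are assumptions.

```python
class SimpleToy:
    """Keeps every point; bucket_size only sets how often updates fire."""

    def __init__(self, bucket_size):
        self.bucket_size = bucket_size
        self.points = []
        self.updates = 0

    def add(self, x):
        self.points.append(x)
        # Assumed trigger: every bucket_size-th point requests an update.
        if len(self.points) % self.bucket_size == 0:
            self.updates += 1  # stands in for increment_revision()


class CompressiveToy:
    """Compresses each full bucket into a (mean, weight) summary."""

    def __init__(self, bucket_size):
        self.bucket_size = bucket_size
        self.bucket = []
        self.summaries = []
        self.updates = 0

    def add(self, x):
        self.bucket.append(x)
        if len(self.bucket) >= self.bucket_size:
            # Placeholder for coreset compression: keep mean and count only.
            mean = sum(self.bucket) / len(self.bucket)
            self.summaries.append((mean, len(self.bucket)))
            self.bucket.clear()
            self.updates += 1  # update follows each compression


simple = SimpleToy(bucket_size=4)
compressive = CompressiveToy(bucket_size=4)
for x in range(10):
    simple.add(x)
    compressive.add(x)
```

Both toys request the same number of updates, but the simple one keeps all ten points while the compressive one holds two summaries plus the two points still in its bucket.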

Owner

kmaehashi commented Mar 24, 2014

Discussion from the meeting on 2014-03-24:

  • Merge the added test for this case (@hido will open a pull request) and close this issue.

hido modified the milestone: Near Future, 0.5.3 Mar 31, 2014

hido referenced this issue in jubatus/jubatus_core Jul 14, 2014

Closed

Add driver test for standalone clustering #32

Contributor

hido commented Jul 14, 2014

This has been moved to jubatus/jubatus_core#32

hido closed this Jul 14, 2014
