Since clustering runs k-means or GMM only on MIX, no clustering is actually performed in standalone mode. This also means the driver tests (ongoing at #650) cannot cover most of its functionality. It could be fixed by implementing an API that manually invokes clustering on the current internal state.
From the discussion on 2014-03-03 with @beam2d:
Currently, batch_update of the clustering model is executed only when MIX is called. By also executing it after each compression of the coreset, clustering can work in standalone mode.
In the future, users may be able to control when do_clustering runs through config options, to balance the trade-off between clustering accuracy/latency and throughput. jubaclustering may also expose some kind of do_clustering in its API in a future version.
I found that clustering already works in standalone mode: when the current bucket size exceeds the config parameter bucket_size, increment_revision() is called, and it is the only function that invokes update_clusters().
I have added a test to confirm this in f077b67 (fix_standalone_clustering branch).
In my opinion, nothing more needs to be done about this for now.
For compressive_storage, it is natural that the clustering result is updated every time the size exceeds the parameter and compression happens.
For simple_storage, the number of stored points is actually NOT limited by bucket_size. So this parameter can simply serve as a threshold that controls how often clusters are updated.
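To make the behavior described above concrete, here is a toy Python model, not the Jubatus C++ implementation. The class `ToyStorage`, its fields, the `>` comparison, and the "keep every other point" compression are all assumptions made for illustration; only the names bucket_size, increment_revision, and update_clusters come from the discussion.

```python
class ToyStorage:
    """Toy stand-in for a clustering storage (hypothetical, for illustration only)."""

    def __init__(self, bucket_size, compress=False):
        self.bucket_size = bucket_size
        self.compress = compress      # True mimics compressive_storage, False simple_storage
        self.points = []              # all points currently retained
        self.pending = 0              # points added since the last revision
        self.revision = 0
        self.cluster_updates = 0

    def add(self, point):
        self.points.append(point)
        self.pending += 1
        # Assumption: a revision is triggered once the bucket exceeds bucket_size.
        if self.pending > self.bucket_size:
            self.increment_revision()

    def increment_revision(self):
        if self.compress:
            # Placeholder for coreset compression: keep every other point.
            self.points = self.points[::2]
        self.pending = 0
        self.revision += 1
        self.update_clusters()

    def update_clusters(self):
        # A real implementation would run k-means or GMM here.
        self.cluster_updates += 1


simple = ToyStorage(bucket_size=4)
compressive = ToyStorage(bucket_size=4, compress=True)
for i in range(10):
    simple.add(i)
    compressive.add(i)

print(simple.cluster_updates, len(simple.points))
print(compressive.cluster_updates, len(compressive.points))
```

In this sketch both storages update their clusters at the same frequency without any MIX, but only the compressive one shrinks its stored points; the simple one keeps growing, so bucket_size affects only how often update_clusters runs.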
Discussion from the meeting on 2014-03-24:
This has been moved to jubatus/jubatus_core#32