The reason is that writing cmatrix.trace()/cmatrix.sum() is so tempting.
This is about twice as fast as doing it in Python for equal arrays, and possibly 1000x faster in the case of an early exit.
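The idea can be illustrated with a pure-Python sketch (the actual speedup comes from running this loop in C; `np.all(a == b)` always compares every element, whereas a loop can stop at the first mismatch):

```python
import numpy as np

def arrays_equal(a, b):
    """Early-exit element comparison (illustrative sketch only).

    Unlike np.all(a == b), which builds a full boolean array,
    this loop returns as soon as a mismatch is found.
    """
    a = np.ascontiguousarray(a).ravel()
    b = np.ascontiguousarray(b).ravel()
    if a.shape != b.shape:
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # early exit: skip the remaining elements
    return True
```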
Removed unnecessary allocations and casts
A lot of memory was being allocated in cases where old segments could have been reused; I now reuse them wherever possible. I also optimized the distance normalization.
Added initial centroid parameter for kmeans
There are many applications where specifying the initial centroids for a kmeans run is useful, for instance when evaluating new initialization methods or when an iterative kmeans algorithm is desired (e.g., x-means). This pull request adds the base functionality for this in milk.unsupervised.kmeans, which should extend to all other related kmeans functions.
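A minimal Lloyd's-iteration sketch (not milk's implementation; the function name and signature here are hypothetical) shows why accepting initial centroids matters: an outer loop such as x-means can seed each run with centroids it has already computed.

```python
import numpy as np

def kmeans_from(features, centroids, max_iter=100):
    """Run Lloyd's iterations starting from caller-supplied centroids.

    Illustrative sketch only: milk's actual kmeans has a different
    signature and a C-accelerated inner loop.
    """
    centroids = np.asarray(centroids, dtype=float).copy()
    for _ in range(max_iter):
        # assign each point to its nearest centroid
        d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # recompute centroids; keep the old one if a cluster empties
        new = np.array([features[labels == k].mean(0) if (labels == k).any()
                        else centroids[k] for k in range(len(centroids))])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```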
Full ChangeLog:

* Fix MDS for non-array inputs
* Fix MDS bug
* Add return_* arguments to kmeans
* Extend zscore() to work on non-ndarrays
* Add frac_precluster_learner
* Work with older C++ compilers
This is especially relevant for use with jug. This way, if one is only interested in one of the outputs, the return value is a single numpy array, which jug handles particularly well.
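The pattern can be sketched as follows (parameter names are assumed for illustration, not milk's exact API): when only one output is requested, the function returns it bare rather than wrapped in a 1-tuple.

```python
import numpy as np

def kmeans_like(features, k, return_assignments=True, return_centroids=True):
    """Sketch of the return_* pattern: return a bare ndarray when only
    one output is requested, a tuple when several are."""
    # placeholder results standing in for a real clustering
    assignments = np.zeros(len(features), dtype=int)
    centroids = np.asarray(features[:k], dtype=float).copy()

    results = []
    if return_assignments:
        results.append(assignments)
    if return_centroids:
        results.append(centroids)
    if len(results) == 1:
        return results[0]  # single ndarray, not a 1-tuple
    return tuple(results)
```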
Previously, the distribution was not actually buildable. I have now verified that it builds on Mac OS X.
Most important new "feature" is the bundling of Eigen with the source.

Full ChangeLog:

- Add subspace projection kNN
- Export ``pdist`` in milk namespace
- Add Eigen to source distribution
- Add measures.curves.roc
- Add ``mds_dists`` function
- Add ``verbose`` argument to milk.tests.run
For more informative output
This is actually a primitive underlying mds(), which was implemented earlier.
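The relationship can be sketched with a generic classical-MDS implementation (this is not milk's code; it only illustrates how mds() reduces to computing pairwise distances and then calling the mds_dists primitive):

```python
import numpy as np

def mds_dists(dists, ndims=2):
    """Classical MDS from a pairwise-distance matrix (generic sketch):
    double-center the squared distances, then embed with the top
    eigenvectors of the resulting Gram matrix."""
    n = len(dists)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (dists ** 2) @ J       # double centering
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:ndims]  # largest eigenvalues first
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

def mds(features, ndims=2):
    """mds() is just Euclidean pairwise distances + the mds_dists primitive."""
    diff = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diff ** 2).sum(-1))
    return mds_dists(dists, ndims)
```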