Speed issues #4

JamesOwers · 2015-08-11T17:03:04Z

Apologies for the vague issue; I don't have much time. I noticed a speedup when I converted my cluster centroids to integers. Granted, my categories are integers rather than strings, but you may be able to use np.unique() to swap strings out for integers.
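
A minimal sketch of the np.unique() idea, assuming X is a 2-D array of categorical values (e.g. strings). Each column is encoded separately with `np.unique(..., return_inverse=True)`, and the per-column unique values are kept so integer centroids can be mapped back to the original categories. `encode_categories` and `decode_centroids` are hypothetical helper names, not part of kmodes.

```python
import numpy as np


def encode_categories(X):
    """Hypothetical helper: replace each column's categories with integer codes.

    Returns the integer-coded matrix and, per column, the sorted array of
    unique original values, so that codes can be mapped back later.
    """
    X = np.asarray(X)
    codes = np.empty(X.shape, dtype=np.intp)
    uniques = []
    for j in range(X.shape[1]):
        # return_inverse gives, for each entry, its index into the sorted uniques
        col_uniques, col_codes = np.unique(X[:, j], return_inverse=True)
        codes[:, j] = col_codes
        uniques.append(col_uniques)
    # If every column has fewer than 256 categories, codes.astype(np.uint8)
    # would give the same compact dtype used in the comparison below.
    return codes, uniques


def decode_centroids(centroids, uniques):
    """Hypothetical helper: map integer-coded centroids back to original values."""
    centroids = np.asarray(centroids)
    return np.array(
        [[uniques[j][c] for j, c in enumerate(row)] for row in centroids],
        dtype=object,
    )


X = np.array([["red", "small"], ["blue", "large"], ["red", "large"]])
codes, uniques = encode_categories(X)
print(codes.tolist())  # [[1, 1], [0, 0], [1, 0]]
print(decode_centroids(codes[:1], uniques).tolist())
```

Clustering would then run on `codes`, and only the final centroids need decoding, so the string comparisons happen once instead of on every iteration.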

To test this, try generating a large matrix of integer classes (for me this was 70,000 rows by 20 columns, with each column having 4 classes, i.e. the numbers 1 to 4) and running the following:

_labels_cost(X, np.uint8(km.cluster_centroids_))

vs.

_labels_cost(X, km.cluster_centroids_)
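
A self-contained way to reproduce this comparison without relying on kmodes' private `_labels_cost` (its signature may differ between versions). `labels_cost` is a hypothetical stand-in that assigns each row to the centroid with the fewest mismatching attributes (matching dissimilarity). The choice of 8 centroids sampled from the data, and storing the "slow" centroids with an object dtype, are assumptions for illustration, not necessarily how kmodes stores `cluster_centroids_`.

```python
import time

import numpy as np


def labels_cost(X, centroids):
    """Hypothetical stand-in for kmodes' _labels_cost.

    Assigns each row of X to the centroid with the fewest mismatching
    attributes and returns (labels, total cost).
    """
    # dissim[i, k] = number of attributes where row i differs from centroid k
    dissim = np.stack([(X != c).sum(axis=1) for c in centroids], axis=1)
    labels = dissim.argmin(axis=1)
    cost = dissim[np.arange(X.shape[0]), labels].sum()
    return labels, cost


rng = np.random.default_rng(0)
# 70,000 rows x 20 columns, each column holding classes 1 to 4
X = rng.integers(1, 5, size=(70_000, 20), dtype=np.uint8)

# Assumed: 8 centroids picked from the data, stored as a generic object array
centroids_obj = X[rng.choice(len(X), size=8, replace=False)].astype(object)
# Same centroids converted to a compact integer dtype
centroids_int = centroids_obj.astype(np.uint8)

for name, cents in [("object", centroids_obj), ("uint8", centroids_int)]:
    start = time.perf_counter()
    labels, cost = labels_cost(X, cents)
    print(f"{name:>6} centroids: {time.perf_counter() - start:.3f} s, cost={cost}")
```

Both runs should produce identical labels and cost; only the time differs, because comparing against an object array falls back to per-element Python comparisons.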

nicodv · 2016-02-27T01:27:51Z

Thanks, that's a great idea!

Implemented in 5457e56; this could give a huge speed boost to people using long strings in X.

nicodv closed this as completed Feb 27, 2016
