A k-prototypes model fit can't be pickle-saved when a user-defined dissimilarity metric is used (see my post on Stack Overflow).
It seems to me that the issue would be solved if that user-defined dissimilarity metric were implemented in the module itself, next to jaccard_dissim, euclidean_dissim, etc. So it would be great to have some more commonly used distance functions implemented in the package. In my case, I'd like to be able to use L1 (Manhattan distance).
There is such a large variety of potential distance functions to use for numerical clustering that I prefer to leave it to the users to provide them. kmodes specializes more in categorical distance functions.
But of course, feel free to submit a PR to add the function. :)
As for the pickling error, I'm not able to reproduce it:
>>> import numpy as np
>>> from kmodes.kprototypes import KPrototypes
>>> def L1(a, b):
...     return np.sum(np.abs(a-b), axis=1)
>>> model = KPrototypes(n_clusters=20, gamma=1, num_dissim=L1, init='Cao')
>>> model
KPrototypes(gamma=1, n_clusters=20, num_dissim=<function L1 at 0x7fa526505090>)
>>> import pickle
>>> pickle.dumps(model)
b'\x80\x04\x95\xe8\x00\x00\x00\x00\x00\x00\x00\x8c\x12kmodes.kprototypes\x94\x8c\x0bKPrototypes\x94\x93\x94)\x81\x94}\x94(\x8c\nn_clusters\x94K\x14\x8c\x08max_iter\x94Kd\x8c\ncat_dissim\x94\x8c\x12kmodes.util.dissim\x94\x8c\x0fmatching_dissim\x94\x93\x94\x8c\x04init\x94\x8c\x03Cao\x94\x8c\x06n_init\x94K\n\x8c\x07verbose\x94K\x00\x8c\x0crandom_state\x94N\x8c\x06n_jobs\x94K\x01\x8c\nnum_dissim\x94\x8c\x08__main__\x94\x8c\x02L1\x94\x93\x94\x8c\x05gamma\x94K\x01ub.'
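Note that the dump above stores `num_dissim` as a reference (`__main__`, `L1`) rather than as code, which is how pickle handles functions in general. Here is a minimal sketch of what that implies, using only the standard library. `Model` and `l1_dissim` are hypothetical stand-ins for illustration, not part of kmodes, and `l1_dissim` works on plain sequences rather than NumPy arrays.

```python
import pickle


def l1_dissim(a, b):
    """Manhattan (L1) distance between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b))


class Model:
    """Hypothetical stand-in for an estimator that stores a dissimilarity callable."""

    def __init__(self, num_dissim):
        self.num_dissim = num_dissim


# A function defined with `def` at module level is pickled by reference:
# only its module and name are written, and they are looked up again on load.
payload = pickle.dumps(Model(num_dissim=l1_dissim))
restored = pickle.loads(payload)
same_function = restored.num_dissim is l1_dissim

# A lambda has no importable name, so the same call fails.
try:
    pickle.dumps(Model(num_dissim=lambda a, b: sum(abs(x - y) for x, y in zip(a, b))))
    lambda_pickles = True
except (pickle.PicklingError, AttributeError):
    lambda_pickles = False

print(same_function, lambda_pickles)
```

If the failing case used a lambda or a function that isn't importable where the model is loaded (an assumption, since the original traceback isn't shown here), defining the metric with `def` in an importable module should be enough.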