Add L1 as a dissimilarity function option for continuous variables #169

Closed
imamilion opened this issue Feb 24, 2022 · 1 comment

@imamilion

A fitted k-prototypes model can't be saved with pickle when a user-defined dissimilarity metric is used (see my post on Stack Overflow).

It seems to me that the issue would be solved if that user-defined dissimilarity metric were implemented in the module itself, alongside jaccard_dissim, euclidean_dissim, etc. It would therefore be great to have some more commonly used distance functions implemented in the package. In my case, I'd like to be able to use L1 (Manhattan distance).
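A minimal sketch of such an L1 function, written in the same shape as the `L1` used in the maintainer's reply. The name `manhattan_dissim`, the `**_` catch-all for extra keyword arguments, and the "2-D array of centroids vs. one 1-D sample" argument layout are assumptions modeled on kmodes' built-in numerical dissimilarity, not a confirmed API. Defining it at module level (ideally in an importable module) is what lets pickle find it again later.

```python
import numpy as np


def manhattan_dissim(a, b, **_):
    """L1 (Manhattan) dissimilarity between each row of `a` and the point `b`.

    Assumption: `a` holds one centroid per row and `b` is a single sample,
    so the result has one distance per centroid. Any extra keyword
    arguments are accepted and ignored.
    """
    a = np.atleast_2d(a)  # allow a single centroid passed as a 1-D array
    return np.sum(np.abs(a - b), axis=1)


centroids = np.array([[0.0, 0.0], [1.0, 2.0], [-1.0, 4.0]])
point = np.array([1.0, 1.0])
print(manhattan_dissim(centroids, point))  # [2. 1. 5.]
```

It would be passed to the model the same way as `L1` in the reply, i.e. via `num_dissim=`.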

@nicodv
Owner

nicodv commented Feb 26, 2022

There is such a large variety of potential distance functions to use for numerical clustering that I prefer to leave it to the users to provide them. kmodes specializes more in categorical distance functions.

But of course, feel free to submit a PR to add the function. :)

As for the pickling error, I'm not able to reproduce it:

```python
>>> import numpy as np
>>> from kmodes.kprototypes import KPrototypes
>>> def L1(a, b):
...     return np.sum(np.abs(a - b), axis=1)
...
>>> model = KPrototypes(n_clusters=20, gamma=1, num_dissim=L1, init='Cao')
>>> model
KPrototypes(gamma=1, n_clusters=20, num_dissim=<function L1 at 0x7fa526505090>)
>>> import pickle
>>> pickle.dumps(model)
b'\x80\x04\x95\xe8\x00\x00\x00\x00\x00\x00\x00\x8c\x12kmodes.kprototypes\x94\x8c\x0bKPrototypes\x94\x93\x94)\x81\x94}\x94(\x8c\nn_clusters\x94K\x14\x8c\x08max_iter\x94Kd\x8c\ncat_dissim\x94\x8c\x12kmodes.util.dissim\x94\x8c\x0fmatching_dissim\x94\x93\x94\x8c\x04init\x94\x8c\x03Cao\x94\x8c\x06n_init\x94K\n\x8c\x07verbose\x94K\x00\x8c\x0crandom_state\x94N\x8c\x06n_jobs\x94K\x01\x8c\nnum_dissim\x94\x8c\x08__main__\x94\x8c\x02L1\x94\x93\x94\x8c\x05gamma\x94K\x01ub.'
```
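The pickled bytes above contain the strings `__main__` and `L1`: pickle stores a module-level function as a reference (module name plus qualified name), not as code. A minimal stdlib-only sketch of that behaviour, using a hypothetical `l1` helper on plain sequences instead of `KPrototypes`, and assuming the file is run as a script:

```python
import pickle


def l1(a, b):
    """Module-level L1 distance between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def make_local_l1():
    """Return a function defined inside another function (no importable name)."""
    def local_l1(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))
    return local_l1


# A module-level function round-trips: only its module and name are stored.
payload = pickle.dumps(l1)
print(b"l1" in payload, pickle.loads(payload) is l1)

# Lambdas and local functions cannot be looked up by name, so dumping fails.
# (The exact exception type differs between Python versions.)
failures = []
for fn in (lambda a, b: 0, make_local_l1()):
    try:
        pickle.dumps(fn)
    except (pickle.PicklingError, AttributeError) as exc:
        failures.append((fn.__qualname__, type(exc).__name__))
print(failures)
```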

I suspect the problem lies with where you define your function. Have a look at this: https://www.pythonanywhere.com/forums/topic/27818/
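One common way to hit this (an assumption about the original report, not confirmed in this thread) is to pickle the model in one script, where the function lives in `__main__`, and load it in another interpreter that never defined it. The sketch below reproduces that with a hypothetical dict standing in for the fitted model:

```python
import pickle
import subprocess
import sys
import tempfile
from pathlib import Path

# Code run in a fresh interpreter; its __main__ module does not define l1.
LOADER = """
import pickle, sys
try:
    with open(sys.argv[1], "rb") as f:
        pickle.load(f)
    print("loaded")
except (AttributeError, pickle.UnpicklingError) as exc:
    print("failed:", type(exc).__name__, exc)
"""


def l1(a, b):
    """L1 distance, defined in the script that does the pickling (__main__)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical stand-in for a fitted model that stores its dissimilarity function.
model_state = {"n_clusters": 20, "num_dissim": l1}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"
    path.write_bytes(pickle.dumps(model_state))  # stores a reference to __main__.l1
    result = subprocess.run(
        [sys.executable, "-c", LOADER, str(path)],
        capture_output=True, text=True, check=True,
    )

print(result.stdout.strip())
```

Moving the function into an importable module (for example a hypothetical `my_dissims.py`) and importing it from there in both the saving and the loading code lets pickle resolve the reference again.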

A step-by-step reproducible example would help here.

nicodv closed this as completed on Mar 30, 2022