KPrototypes fit_predict error: "could not convert string to float" #47

Closed
kroscek opened this issue Jul 11, 2017 · 4 comments
kroscek commented Jul 11, 2017

I want to apply KPrototypes to my dataset, but it seems the code can't convert the data into a NumPy array:
km = kprototypes.KPrototypes(n_clusters=10, init='Cao', verbose=2)
train=pd.read_csv('/home/lemma/train.csv')
train['clusters_KModes'] = km.fit_predict(train1,categorical=[1])
ValueError: could not convert string to float: MJ

Converting the values to object dtype, to match the example given, did not work either:
km = kprototypes.KPrototypes(n_clusters=10, init='Cao', verbose=2)
train=pd.read_csv('/home/lemma/train.csv')
train1=train1.values.astype(object)
train['clusters_KModes'] = km.fit_predict(train1,categorical=[1])
ValueError: could not convert string to float: MJ

Owner

nicodv commented Jul 11, 2017

Could you post the full traceback, so I can see where the error occurs exactly?

Also, what does your data look like?

nicodv added the bug label Jul 11, 2017
nicodv changed the title on Jul 11, 2017, from `[HELP] Kprototype fit_predict error: could not convert string to float?` to `KPrototypes fit_predict error: "could not convert string to float"`
Author

kroscek commented Jul 15, 2017

My data contains both numerical values (0.xxx) and categorical values of one or two letters (A, B, XZ). Here is the snippet:

km = kprototypes.KPrototypes(n_clusters=10, init='Cao', verbose=2)
train1=pd.read_csv('/home/lemma/train.csv')
train1=train1.drop(['id','loss'],1)
train1=train1.values.astype(object)
train1['clusters_KModes'] = km.fit_predict(train1,categorical=[1,2,3])
[Screenshot of the dataset]

Owner

nicodv commented Jul 18, 2017

With categorical=[1,2,3], you're telling the algorithm that the second, third and fourth columns are categorical. But from your screenshot it looks like there are many more categorical variables. That's why kmodes is trying to interpret the 'MJ' string as a float.
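
The point above can be illustrated with a minimal sketch in plain Python. It assumes the data is available as a list of rows; the helper name `infer_categorical_indices` and the sample rows are hypothetical and not part of kmodes. The helper flags every column holding a value that `float()` rejects, which is the set of columns that `categorical=` would need to cover.

```python
def infer_categorical_indices(rows):
    """Return the indices of columns that contain at least one non-numeric value."""
    categorical = set()
    for row in rows:
        for idx, value in enumerate(row):
            try:
                float(value)
            except (TypeError, ValueError):
                # float() rejects strings such as 'MJ', the same kind of
                # failure reported in this issue.
                categorical.add(idx)
    return sorted(categorical)


# Hypothetical sample: column 0 is numeric, columns 1-3 hold letter codes.
rows = [
    [0.12, "A", "B", "MJ"],
    [0.57, "B", "A", "XZ"],
]
cat_idx = infer_categorical_indices(rows)
print(cat_idx)
```

The resulting list could then be passed along the lines of `km.fit_predict(X, categorical=cat_idx)`. Categories coded as digits (for example '1' or '2') pass `float()` and would be missed by this check, so they would have to be listed by hand.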

nicodv closed this as completed Jul 18, 2017
Author

kroscek commented Jul 19, 2017

Hi, categorical=[1,2,3] was just for the snippet. The actual categorical indices in my Python code range from 1 to 116. So do I need to preprocess the categorical data, because two-letter categorical values cannot be converted to float?
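
One way to check a declared index list such as 1 to 116 is to look for string values in the columns it leaves out. The sketch below is plain Python under assumed inputs; `first_unlisted_string` and the sample rows are hypothetical names for illustration, not kmodes functions.

```python
def first_unlisted_string(rows, categorical):
    """Return (row, column, value) for the first non-numeric value found in a
    column that is not listed as categorical, or None if there is none."""
    listed = set(categorical)
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            if c in listed:
                continue  # declared categorical, strings are expected here
            try:
                float(value)
            except (TypeError, ValueError):
                return (r, c, value)
    return None


# Hypothetical sample: column 3 holds a string in the second row.
rows = [
    [0.31, "A", "B", 0.77],
    [0.46, "B", "MJ", "XZ"],
]
print(first_unlisted_string(rows, categorical=[1, 2]))
print(first_unlisted_string(rows, categorical=[1, 2, 3]))
```

The indices are positions in the array actually passed to `fit_predict`, so dropping columns such as `id` and `loss` beforehand shifts them. A check like this may help confirm whether the declared range still lines up with the string columns.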
