SVC: Do not enforce that input data is of type np.float64 #10713

Closed
jamesqo opened this issue Feb 27, 2018 · 9 comments · Fixed by #11296
Comments

@jamesqo

jamesqo commented Feb 27, 2018

I am writing a text classification algorithm that uses an SVM based on tree kernels. I have the following code:

    for tree_kernel in 'ptk', 'sptk', 'csptk':
        kernel = TweetKernel(tree_kernel=tree_kernel)
        svc = SVC(kernel=kernel)
        svc.fit(X_train, y_train)

where X_train contains some text columns. Unfortunately, this fails at the following line:

    X, y = check_X_y(X, y, dtype=np.float64, order='C', accept_sparse='csr')

Please fix this so that, when a callable is passed as the kernel argument, this restriction is not enforced.
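The requested behaviour can be sketched roughly as follows. This is not scikit-learn's code: `validate_fit_input` and `shared_words` are hypothetical names, written in plain Python so the example runs without dependencies. The sketch assumes that a callable kernel takes responsibility for interpreting the samples, so only the lengths of `X` and `y` are checked in that case.

```python
def validate_fit_input(X, y, kernel):
    """Hypothetical stand-in for the validation done at the start of fit."""
    if len(X) != len(y):
        raise ValueError("X and y have inconsistent numbers of samples")
    if callable(kernel):
        # A user-supplied kernel decides how to interpret the samples,
        # so X is passed through unchanged (text, trees, ...).
        return list(X), list(y)
    # Built-in kernels need numeric data: coerce every feature to float.
    return [[float(v) for v in row] for row in X], list(y)


def shared_words(a, b):
    # Toy text kernel (assumption): number of distinct words two strings share.
    return len(set(a.split()) & set(b.split()))


texts = ["good movie", "bad movie", "good plot"]
labels = [1, 0, 1]

# With a callable kernel, the text samples survive validation untouched.
X_ok, _ = validate_fit_input(texts, labels, kernel=shared_words)

# With a named kernel, the float coercion rejects text input.
try:
    validate_fit_input(texts, labels, kernel="rbf")
    numeric_path_rejected_text = False
except ValueError:
    numeric_path_rejected_text = True
```

The design choice here is that validation branches on `callable(kernel)` rather than on the type of `X`, which keeps the numeric path unchanged for the built-in kernels.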

@jamesqo
Author

jamesqo commented Feb 27, 2018

This also happens when I try to call pairwise_kernels:

    X, Y = check_pairwise_arrays(X, Y)

@jnothman
Member

jnothman commented Feb 27, 2018 via email

jnothman added the Bug, good first issue, and help wanted labels and removed the good first issue label on Feb 27, 2018
@jnothman
Member

We could just fix this specific case, or somehow create a common test checking that if an estimator can fit with metric/affinity/kernel=some_function on numeric input, it also accepts text input with some basic text kernel (e.g. number of common characters) or metric (perhaps edit distance).
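A basic text kernel of the kind suggested above might look like the following minimal sketch. The names `common_chars_kernel` and `gram_matrix` are hypothetical, and the choice to count shared characters with multiplicity is an assumption; the comment only says "#common characters".

```python
from collections import Counter


def common_chars_kernel(a, b):
    """Count the characters two strings share, with multiplicity."""
    return sum((Counter(a) & Counter(b)).values())


def gram_matrix(kernel, X, Y=None):
    """Evaluate the kernel on every pair of samples from X and Y."""
    Y = X if Y is None else Y
    return [[float(kernel(x, y)) for y in Y] for x in X]


docs = ["spam", "maps", "ham"]
K = gram_matrix(common_chars_kernel, docs)
```

A common test could pass a function like `common_chars_kernel` together with a small list of strings to each estimator that accepts a callable kernel, metric, or affinity, and check that fitting does not fail during input validation.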

@FarahSaeed
Contributor

Can I take this if no one else is working on it?

@qmick
Contributor

qmick commented Feb 27, 2018

Since check_X_y performs conversion for some inputs, if we skip check_X_y for text input with a callable kernel, I am a little worried that some existing code may use callable kernels that rely on this conversion.
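To make the concern concrete, here is a small sketch of a callable kernel that only works because its input was converted first. It is plain Python rather than scikit-learn code: `coerce_to_float` is a hypothetical stand-in for the float conversion, and numeric strings are used as an assumed example of input that the conversion would change.

```python
def linear_kernel(x, y):
    # A typical user-written callable kernel that assumes numeric features.
    return sum(a * b for a, b in zip(x, y))


def coerce_to_float(X):
    # Hypothetical stand-in for the float conversion done during validation.
    return [[float(v) for v in row] for row in X]


raw = [["1.5", "2"], ["0.5", "4"]]  # numeric values stored as strings

# With the conversion in place, the kernel sees floats and works.
converted = coerce_to_float(raw)
k_converted = linear_kernel(converted[0], converted[1])  # 1.5*0.5 + 2*4

# Without it, the same kernel receives strings and fails.
try:
    linear_kernel(raw[0], raw[1])
    raw_input_worked = True
except TypeError:
    raw_input_worked = False
```

If validation were skipped for every callable kernel, code like this would start failing, which is why the change needs to distinguish inputs that should be converted from inputs that should be left alone.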

@jnothman
Member

jnothman commented Feb 27, 2018 via email

@jnothman
Member

jnothman commented Feb 27, 2018 via email

@qmick
Contributor

qmick commented Mar 1, 2018

Yes, that is it. But the decimal strings case is unlikely to be used by many users and is not documented. I am not sure whether we should ignore this case (just leave all text input unchanged) or do something else.

@jnothman
Member

jnothman commented Mar 1, 2018 via email
