ValueError: Incompatible dimension for X and Y matrices: X.shape[1] == 11 while Y.shape[1] == 1 #11367
Comments
Don't use fit_transform a second time if you want the feature representation to be the same for test data. Use transform. But this is not a forum for usage questions; it is a tracker for software development issues.
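To illustrate the point above, here is a minimal sketch of why refitting produces mismatched shapes. The training sentences are the ones quoted later in this thread; `query_docs` and the variable names are assumptions made up for the illustration, not the reporter's actual code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

train_docs = ["president of India", "machine learning is awesome",
              "python is awesome", "thanks for reading"]
query_docs = ["python is great"]  # hypothetical query, not from the issue

# Fit once: the vocabulary (and so the number of columns) comes from train_docs.
vectorizer = TfidfVectorizer()
train_matrix = vectorizer.fit_transform(train_docs)

# Fitting a fresh vocabulary on the query gives a different number of columns,
# so the two matrices cannot be compared.
refit_matrix = TfidfVectorizer().fit_transform(query_docs)
try:
    cosine_similarity(refit_matrix, train_matrix)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc  # incompatible dimensions for X and Y

# transform reuses the fitted vocabulary, so the column counts agree.
query_matrix = vectorizer.transform(query_docs)
similarities = cosine_similarity(query_matrix, train_matrix)
print(similarities.shape)
```

Words in the query that were never seen during fitting are simply ignored by transform, which is what allows new data that differs from the training data to be scored.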
Sometimes my new data will not look like the training data.
This is my exact problem; please help me work out whether this vectorizer can solve my
issue.
https://datascience.stackexchange.com/q/32578/42377
On Wed, 27 Jun 2018, 1:28 p.m., Joel Nothman wrote:
> Don't use fit_transform a second time if you want the feature
> representation to be the same for test data. Use transform.
> But this is not a forum for usage questions, it is a tracker for software
> development issues.
Use transform after the first fit_transform, or just use a Pipeline.
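A Pipeline version might look roughly like the sketch below. It assumes a CountVectorizer followed by a TfidfTransformer; `new_docs` and the variable names are hypothetical and chosen only for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import make_pipeline

train_docs = ["president of India", "machine learning is awesome",
              "python is awesome", "thanks for reading"]
new_docs = ["machine learning with python"]  # hypothetical unseen document

# The pipeline fits every step once and replays the same steps on new data.
tfidf_pipeline = make_pipeline(CountVectorizer(), TfidfTransformer())
train_features = tfidf_pipeline.fit_transform(train_docs)

# transform only: the vocabulary and IDF weights learned above are reused.
new_features = tfidf_pipeline.transform(new_docs)

# One row per new document, one column per training document.
scores = cosine_similarity(new_features, train_features)
print(scores.shape)
```

Keeping the steps in one object makes it harder to call fit_transform on test data by accident, since only the pipeline's transform is needed after the initial fit.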
Could you please give an example or a reference link?
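I have made a few changes to your code so that the test data is passed through transform rather than a second fit_transform:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    train_set = ["president of India", "machine learning is awesome", "python is awesome", "thanks for reading"]
    tfidf_vectorizer = TfidfVectorizer()
    # Learn the vocabulary from the training documents only
    tfidf_matrix_train = tfidf_vectorizer.fit_transform(train_set)
    # test_set is assumed to be the list of query documents from your original code (not shown here)
    tfidf_matrix_test = tfidf_vectorizer.transform(test_set)
    print(cosine_similarity(tfidf_matrix_train, tfidf_matrix_test))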
Description
Hi, I am new to vectorizing and feature extraction. I am trying to find the similarity of a single document against multiple documents.
Steps/Code to Reproduce
Here is my code:
Expected Results
similarity list=[matching values with each document]
Actual Results