ValueError: Incompatible dimension for X and Y matrices: X.shape[1] == 11 while Y.shape[1] == 1 #11367

Pechi77 · 2018-06-27T06:56:26Z

Description

Hi, I am new to vectorizing and feature extraction. I am trying to find the similarity of a single document against multiple documents.

Steps/Code to Reproduce

Here is my code:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer


train_set = ["president of India","machine learning is awesome", "python is awesome", "thanks for reading"]

tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix_train = tfidf_vectorizer.fit_transform(train_set)
tfidf_matrix_test = tfidf_vectorizer.fit_transform(["president"])
cosine_similarity(tfidf_matrix_train,tfidf_matrix_test)

Expected Results

similarity list=[matching values with each document]

Actual Results

ValueError                                Traceback (most recent call last)
<ipython-input-19-e0da281ca84b> in <module>()
     15 tfidf_matrix_train = tfidf_vectorizer.fit_transform(train_set)
     16 tfidf_matrix_test = tfidf_vectorizer.fit_transform(["president"])
---> 17 cosine_similarity(tfidf_matrix_train,tfidf_matrix_test)
     18 #finds the tfidf score with normalization

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\metrics\pairwise.py in cosine_similarity(X, Y, dense_output)
    908     # to avoid recursive import
    909 
--> 910     X, Y = check_pairwise_arrays(X, Y)
    911 
    912     X_normalized = normalize(X, copy=True)

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\metrics\pairwise.py in check_pairwise_arrays(X, Y, precomputed, dtype)
    120         raise ValueError("Incompatible dimension for X and Y matrices: "
    121                          "X.shape[1] == %d while Y.shape[1] == %d" % (
--> 122                              X.shape[1], Y.shape[1]))
    123 
    124     return X, Y

ValueError: Incompatible dimension for X and Y matrices: X.shape[1] == 11 while Y.shape[1] == 1


jnothman · 2018-06-27T07:57:39Z

Don't use fit_transform a second time if you want the feature representation to be the same for test data. Use transform.
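
To illustrate why the shapes disagree, here is a minimal sketch using the documents from the report. Each call to fit_transform learns a fresh vocabulary, so fitting on ["president"] alone produces a one-column matrix, while transform keeps the 11 columns learned from the training set. The variable names `reused` and `refit` are illustrative only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_set = ["president of India", "machine learning is awesome",
             "python is awesome", "thanks for reading"]

vectorizer = TfidfVectorizer()
train = vectorizer.fit_transform(train_set)    # learns an 11-term vocabulary
reused = vectorizer.transform(["president"])   # keeps those 11 columns

# Fitting again (on a separate vectorizer here, to leave the first one intact)
# learns a new vocabulary containing only the word "president".
refit = TfidfVectorizer().fit_transform(["president"])

print(train.shape, reused.shape, refit.shape)
```

Only `train` and `reused` have matching column counts, which is what cosine_similarity requires.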

But this is not a forum for usage questions, it is a tracker for software development issues.

Pechi77 · 2018-06-27T08:10:49Z

Sometimes I'll have new data that is not part of the training data. This is my exact problem; please tell me whether this vectorizer can solve it: https://datascience.stackexchange.com/q/32578/42377


jnothman · 2018-06-27T09:38:04Z

Use transform after the first fit_transform, or just use a Pipeline.

Pechi77 · 2018-06-27T09:40:06Z

Could you please give an example or a reference link?

jnothman · 2018-06-27T10:31:10Z

http://scikit-learn.org/stable/data_transforms.html
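
As a rough illustration of the Pipeline suggestion, the sketch below chains CountVectorizer and TfidfTransformer, which together should behave like TfidfVectorizer with default settings. The step labels "counts" and "tfidf" and the variable names are assumptions chosen for this example, not something taken from the thread.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import Pipeline

train_set = ["president of India", "machine learning is awesome",
             "python is awesome", "thanks for reading"]

# Token counting followed by tf-idf weighting; step names are arbitrary labels.
text_features = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
])

train_matrix = text_features.fit_transform(train_set)   # fit once on training text
query_matrix = text_features.transform(["president"])   # reuse the fitted steps

similarities = cosine_similarity(train_matrix, query_matrix)
print(similarities.shape)
```

The pipeline is fitted once; every later document goes through transform only, so the feature columns stay aligned.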

rakeshskc · 2018-10-22T08:25:56Z

I have made a few changes in your code; the key change is calling transform instead of fit_transform on the test document:

from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer

train_set = ["president of India","machine learning is awesome", "python is awesome", "thanks for reading"]

# Fit the vocabulary and idf weights on the training documents only
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix_train = tfidf_vectorizer.fit_transform(train_set)
# Changed: transform (not fit_transform) reuses the fitted vocabulary
tfidf_matrix_test = tfidf_vectorizer.transform(["president"])

# One row per training document, one column for the query
print(cosine_similarity(tfidf_matrix_train,tfidf_matrix_test))
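
To get the flat "similarity list" described under Expected Results, one possible sketch is shown below. The helper `similarity_list` is hypothetical (it is not part of scikit-learn); it flattens the similarity matrix into one score per training document. The last call uses a query made only of words absent from the training set, which transform ignores.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

train_set = ["president of India", "machine learning is awesome",
             "python is awesome", "thanks for reading"]

vectorizer = TfidfVectorizer()
train_matrix = vectorizer.fit_transform(train_set)


def similarity_list(query):
    """Return one cosine similarity per training document (hypothetical helper)."""
    query_matrix = vectorizer.transform([query])  # reuse the fitted vocabulary
    return cosine_similarity(query_matrix, train_matrix).ravel().tolist()


scores = similarity_list("president")
best = max(range(len(scores)), key=scores.__getitem__)  # index of the top score
print(train_set[best], scores)

# Words never seen during fitting are dropped, so every score here is zero.
print(similarity_list("quantum chromodynamics"))
```

A query with no known words yields an all-zero list rather than an error, which may be relevant when new data differs from the training text.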

jnothman closed this as completed Jun 27, 2018

akgw mentioned this issue Nov 27, 2018

Feature/bugfix akgw/news#2

Merged
