IndexError in nmf module #11650
My NMF code doesn't work with small datasets.
When I run my code, I get this error:
  File ".../myapp/utils/utils_mdl.py", line 72, in __init__
    self.model.fit(data)
  File ".../.venv/lib/python3.6/site-packages/sklearn/pipeline.py", line 255, in fit
    self._final_estimator.fit(Xt, y, **fit_params)
  File ".../.venv/lib/python3.6/site-packages/sklearn/decomposition/nmf.py", line 1279, in fit
    self.fit_transform(X, **params)
  File ".../.venv/lib/python3.6/site-packages/sklearn/decomposition/nmf.py", line 1254, in fit_transform
    shuffle=self.shuffle)
  File ".../.venv/lib/python3.6/site-packages/sklearn/decomposition/nmf.py", line 1030, in non_negative_factorization
    random_state=random_state)
  File ".../.venv/lib/python3.6/site-packages/sklearn/decomposition/nmf.py", line 341, in _initialize_nmf
    x, y = U[:, j], V[j, :]
IndexError: index 3 is out of bounds for axis 1 with size 3
This is the relevant part of my code:
self.model = Pipeline((
    ('vec', TfidfVectorizer(
        input='content',
        encoding='utf-8',
        decode_error='strict',
        strip_accents=None,
        analyzer='word',
        preprocessor=None,
        tokenizer=None,
        ngram_range=(1, 1),
        stop_words=STOP_WORDS,
        lowercase=True,
        max_df=0.7,
        min_df=2,
        max_features=None,
        vocabulary=None,
        binary=False,
        norm='l2',
        use_idf=True,
        smooth_idf=True,
        sublinear_tf=True,
    )),
    ('dec', NMF(
        n_components=n_components,
        init='nndsvda',
        solver='mu',
        beta_loss='frobenius',
        tol=2**-16,
        max_iter=2**10,
        random_state=None,
        alpha=0.1,
        l1_ratio=1/2,
        verbose=False,
        shuffle=True,
    ))
))
I get this output just before the error:
U.shape: (3, 3)
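To make the mismatch concrete, here is a minimal sketch in plain Python. It is not scikit-learn's actual code: the toy matrix `U` and the helper `first_bad_column` are hypothetical, and the loop over component indices is only assumed from the `U[:, j]` line in the traceback. It shows which column index is requested first without existing when a matrix has 3 columns.

```python
# Toy 3x3 matrix standing in for U (same shape as the printed U.shape).
U = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]


def first_bad_column(matrix, n_components):
    """Return the first column index below n_components that the matrix
    does not have, or None if every requested column exists."""
    n_cols = len(matrix[0])
    for j in range(n_components):
        if j >= n_cols:
            return j
    return None


print(first_bad_column(U, 10))  # 3
print(first_bad_column(U, 3))   # None
```

With 10 requested components the first missing column is index 3, which matches the index reported in the `IndexError` message.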
I tested my code with version "0.19.1" and with "github/master".
Running the following code produces the same error:
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# random texts from random book
data = [
    'Human reason, in one sphere of its cognition, is called upon to consider questions, which it cannot decline, as they are presented by its own nature, but which it cannot answer, as they transcend every faculty of the mind.',
    'Time was, when she was the queen of all the sciences; and, if we take the will for the deed, she certainly deserves, so far as regards the high importance of her object-matter, this title of honour. Now, it is the fashion of the time to heap contempt and scorn upon her; and the matron mourns, forlorn and forsaken, like Hecuba:',
    'Modo maxima rerum, Tot generis, natisque potens... Nunc trahor exul, inops. —Ovid, Metamorphoses. xiii',
]

model = Pipeline((
    ('vec', TfidfVectorizer()),
    ('dec', NMF(10)),
))
model.fit(data)
I think this happens because the number of rows must always be at least the number of components.
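If that guess is right, one possible workaround is to clamp the requested number of components to what the data can support before building the model. The sketch below is only an assumption about how that could look: `cap_n_components` is a hypothetical helper, not part of scikit-learn, and the feature count of 50 is a made-up number for illustration.

```python
def cap_n_components(requested, n_samples, n_features):
    """Clamp the requested number of components so it never exceeds
    the number of rows or columns of the input matrix (minimum 1)."""
    return max(1, min(requested, n_samples, n_features))


# Three documents, as in the reproduction above; 50 features is hypothetical.
print(cap_n_components(10, n_samples=3, n_features=50))  # 3
print(cap_n_components(2, n_samples=3, n_features=50))   # 2
```

The clamped value could then be passed as `n_components` to `NMF`, with the caveat that the model would return fewer components than originally requested.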