IndexError in nmf module #11650

Closed
hossein-pourbozorg opened this Issue Jul 21, 2018 · 7 comments

Contributor

hossein-pourbozorg commented Jul 21, 2018

My NMF code doesn't work with small datasets!

When I run my code, I get this error:

  File ".../myapp/utils/utils_mdl.py", line 72, in __init__
    self.model.fit(data)
  File ".../.venv/lib/python3.6/site-packages/sklearn/pipeline.py", line 255, in fit
    self._final_estimator.fit(Xt, y, **fit_params)
  File ".../.venv/lib/python3.6/site-packages/sklearn/decomposition/nmf.py", line 1279, in fit
    self.fit_transform(X, **params)
  File ".../.venv/lib/python3.6/site-packages/sklearn/decomposition/nmf.py", line 1254, in fit_transform
    shuffle=self.shuffle)
  File ".../.venv/lib/python3.6/site-packages/sklearn/decomposition/nmf.py", line 1030, in non_negative_factorization
    random_state=random_state)
  File ".../.venv/lib/python3.6/site-packages/sklearn/decomposition/nmf.py", line 341, in _initialize_nmf
    x, y = U[:, j], V[j, :]
IndexError: index 3 is out of bounds for axis 1 with size 3

The relevant part of my code:

self.model = Pipeline((
            ('vec', TfidfVectorizer(
                input='content',
                encoding='utf-8',
                decode_error='strict',
                strip_accents=None,
                analyzer='word',
                preprocessor=None,
                tokenizer=None,
                ngram_range=(1, 1),
                stop_words=STOP_WORDS,
                lowercase=True,
                max_df=0.7,
                min_df=2,
                max_features=None,
                vocabulary=None,
                binary=False,
                norm='l2',
                use_idf=True,
                smooth_idf=True,
                sublinear_tf=True,
            )),
            ('dec', NMF(
                n_components=n_components,
                init='nndsvda',
                solver='mu',
                beta_loss='frobenius',
                tol=2**-16,
                max_iter=2**10,
                random_state=None,
                alpha=0.1,
                l1_ratio=1/2,
                verbose=False,
                shuffle=True,
            ))
        ))

When I modify scikit-learn's NMF module (sklearn/decomposition/nmf.py) to print some variables:
[screenshot from 2018-07-21 14-00-40]

I get this output just before the error:

U.shape: (3, 3)
V.shape: (3, 131)
j: 1
U.shape: (3, 3)
V.shape: (3, 131)
j: 2
U.shape: (3, 3)
V.shape: (3, 131)
j: 3

I tested my code with version "0.19.1" and with "github/master".
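
The debug output above can be imitated with plain NumPy. This is only a sketch under stated assumptions, not scikit-learn's actual initialization code: the random matrix stands in for the TF-IDF output, `n_components = 10` is an assumed value (the report does not state it), and `np.linalg.svd` replaces whatever SVD routine the library uses. It shows that a 3 x 131 matrix yields only 3 singular vectors, so a loop that runs up to `n_components` steps past the last column of `U`.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(3, 131)   # stand-in with the same shape as in the debug output
n_components = 10      # assumed value, larger than the number of rows

# A thin SVD returns at most min(n_samples, n_features) singular vectors,
# so slicing to n_components cannot create the missing columns.
U, S, V = np.linalg.svd(X, full_matrices=False)
U, V = U[:, :n_components], V[:n_components, :]
print("U.shape:", U.shape)
print("V.shape:", V.shape)

error = None
try:
    # Same indexing pattern as the line shown in the traceback.
    for j in range(1, n_components):
        x, y = U[:, j], V[j, :]
except IndexError as exc:
    error = str(exc)

print("failed at j =", j)
print("error:", error)
```

With these shapes the loop succeeds for `j = 1` and `j = 2` and fails at `j = 3`, which matches the printed values in the report.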

Contributor Author

hossein-pourbozorg commented Jul 21, 2018

I just want to say:
I expected to get a clear exception telling me that I need more data, instead of an IndexError.
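
A clearer failure of the kind requested here could look roughly like the sketch below. The helper name `check_n_components`, its signature, and the message wording are all hypothetical; nothing here is scikit-learn API, and the bound `min(n_samples, n_features)` is an assumption based on how many singular vectors an SVD can return.

```python
def check_n_components(n_components, n_samples, n_features):
    """Hypothetical pre-check: fail early with a readable message."""
    limit = min(n_samples, n_features)
    if n_components > limit:
        raise ValueError(
            "n_components=%d is larger than min(n_samples, n_features)=%d; "
            "provide more data or request fewer components."
            % (n_components, limit)
        )
    return n_components


# Usage with the shapes from the report: 3 rows, 131 features.
print(check_n_components(3, 3, 131))   # accepted
try:
    check_n_components(10, 3, 131)     # rejected with a clear message
except ValueError as exc:
    print("ValueError:", exc)
```

Raising a `ValueError` before any factorization starts would tell the user directly that the dataset is too small for the requested number of components.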

jnothman added the Bug label Jul 22, 2018

Member

jnothman commented Jul 22, 2018

Please try to provide an MCVE (minimal, complete, verifiable example) so that we can help solve this.

Contributor Author

hossein-pourbozorg commented Jul 22, 2018

Running this code produces the same error:

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# random texts from random book
data = [
    'Human reason, in one sphere of its cognition, is called upon to consider questions, which it cannot decline, as they are presented by its own nature, but which it cannot answer, as they transcend every faculty of the mind.',
    'Time was, when she was the queen of all the sciences; and, if we take the will for the deed, she certainly deserves, so far as regards the high importance of her object-matter, this title of honour. Now, it is the fashion of the time to heap contempt and scorn upon her; and the matron mourns, forlorn and forsaken, like Hecuba:',
    'Modo maxima rerum, Tot generis, natisque potens... Nunc trahor exul, inops. —Ovid, Metamorphoses. xiii',
]

model = Pipeline((
    ('vec', TfidfVectorizer()),
    ('dec', NMF(10)),
))

model.fit(data)

I think this happens because the number of rows must always be greater than the number of components;
when I don't respect this assumption, I get this error.
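
The assumption above can be checked with a small NumPy experiment. This is an illustration only, using `np.linalg.svd` on all-ones matrices rather than scikit-learn's own routine, and the helper `n_singular_vectors` is a made-up name. It suggests the limit depends on both dimensions: a thin SVD returns `min(n_samples, n_features)` vectors, so too few columns would presumably cause the same problem as too few rows.

```python
import numpy as np


def n_singular_vectors(shape):
    """Return how many left singular vectors a thin SVD gives for `shape`."""
    X = np.ones(shape)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U.shape[1]


for shape in [(1, 3), (3, 131), (131, 3), (10, 10)]:
    print(shape, "->", n_singular_vectors(shape))
```

For the three-document example above, the matrix has 3 rows, so at most 3 vectors are available while `NMF(10)` asks for 10.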

Member

jnothman commented Jul 22, 2018

I can replicate with

NMF(2).fit_transform([np.arange(3)])

Contributor

zjpoh commented Jul 22, 2018

@hossein-pourbozorg if you are not working on this or if nobody is working on this, I would like to give it a try.

Member

amueller commented Jul 23, 2018

@zjpoh go for it. The fix also needs a test.
