LatentDirichletAllocation.fit() gives joblib error when evaluate_every > 0. #6258

Closed
groceryheist opened this Issue Feb 1, 2016 · 5 comments

groceryheist commented Feb 1, 2016

How to reproduce:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# 1000 copies of the same short document
docs = ["help i have a bug yikes" for _ in range(1000)]

# `input` expects 'content' (the default), 'filename' or 'file', not the documents
vectorizer = CountVectorizer(analyzer='word')
lda_features = vectorizer.fit_transform(docs)

lda_model = LatentDirichletAllocation(
    n_topics=10,  # named n_components in later scikit-learn releases
    learning_method='online',
    evaluate_every=10,  # evaluate perplexity every 10 iterations
    n_jobs=4,
)
model = lda_model.fit(lda_features)

The error only occurs when 10 >= evaluate_every > 0.

The error is:

Traceback (most recent call last):
  File "topic_model.py", line 59, in <module>
    model = lda_model.fit(lda_features)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/decomposition/online_lda.py", line 520, in fit
    random_init=False)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/decomposition/online_lda.py", line 358, in _e_step
    for idx_slice in gen_even_slices(X.shape[0], n_jobs))
  File "/usr/local/lib/python3.4/dist-packages/sklearn/externals/joblib/parallel.py", line 771, in __call__
    n_jobs = self._initialize_pool()
  File "/usr/local/lib/python3.4/dist-packages/sklearn/externals/joblib/parallel.py", line 518, in _initialize_pool
    raise ImportError('[joblib] Attempting to do parallel computing '
ImportError: [joblib] Attempting to do parallel computing without protecting your import on a system that does not support forking. 
To use parallel-computing in a script, you must protect your main loop using "if __name__ == '__main__'". 
Please see the joblib documentation on Parallel for more information

This is the error that Windows users get when they don't run their code under "if __name__ == '__main__':". However, I am on Linux.
The error actually indicates that a worker pool is being reinitialized. I suspect the pool is reinitialized after perplexity is evaluated.
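
For readers unfamiliar with the guard that the error message refers to, here is a minimal sketch using only the standard library's multiprocessing module rather than joblib. The names count_tokens and main are hypothetical and exist only for this illustration; the sketch shows the guard itself, not the cause of this issue.

```python
# Stdlib-only sketch of the guard named in the error message.
# count_tokens and main are hypothetical names for this illustration.
from multiprocessing import Pool


def count_tokens(doc):
    """Work done in each worker process: count whitespace-separated tokens."""
    return len(doc.split())


def main():
    docs = ["help i have a bug yikes"] * 4
    # Worker processes are started only from inside main().
    with Pool(processes=2) as pool:
        return pool.map(count_tokens, docs)


if __name__ == "__main__":
    # On platforms that start workers by re-importing this module, the guard
    # keeps the import itself from launching another round of workers.
    print(main())
```

Keeping the pool creation inside a function that is only called under the guard means importing the module never starts workers as a side effect.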

@groceryheist groceryheist changed the title from LatentDirichletAllocation.fit() gives joblib error when evaluate_every is not None. to LatentDirichletAllocation.fit() gives joblib error when evaluate_every > 0. Feb 1, 2016

Contributor

yenchenlin commented Feb 2, 2016

"The error only occurs when evaluate_every = 0."

I think you mean "The error only occurs when evaluate_every = 10." ?
@groceryheist

groceryheist commented Feb 2, 2016

It occurs for every value of evaluate_every where 10 >= evaluate_every > 0.

Contributor

chyikwei commented Feb 9, 2016

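I can reproduce the error. It is caused by the parallel parameter not being passed to _e_step in the call that fit() makes when evaluating perplexity (the call ending in random_init=False in the traceback above).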

I will send a pull request to fix this.
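
The pattern described in this comment can be sketched with a toy model. This is not scikit-learn's code: FakePool, ToyLDA, pass_pool and pools_created are hypothetical names, the pool only counts how often it is created, and the toy's max_iter of 10 and its evaluation schedule are assumptions chosen to mirror the range reported in this thread. The point is only that an _e_step call without the shared pool sets up a second one, which in the thread is where joblib raised the ImportError.

```python
# Toy model of the bug pattern; NOT scikit-learn's actual code.
# FakePool, ToyLDA, pass_pool and pools_created are hypothetical names.

class FakePool:
    """Stands in for a worker pool and counts how many get created."""
    created = 0

    def __init__(self):
        FakePool.created += 1

    def map(self, func, items):
        return [func(item) for item in items]


class ToyLDA:
    def __init__(self, max_iter=10, evaluate_every=0, pass_pool=True):
        self.max_iter = max_iter
        self.evaluate_every = evaluate_every
        self.pass_pool = pass_pool  # False reproduces the bug pattern

    def _e_step(self, X, parallel=None):
        # Without a pool from the caller, a fresh one is set up here.
        pool = parallel if parallel is not None else FakePool()
        return pool.map(sum, X)

    def fit(self, X):
        pool = FakePool()  # created once for the whole fit
        for i in range(self.max_iter):
            self._e_step(X, parallel=pool)  # training step reuses the pool
            if self.evaluate_every > 0 and (i + 1) % self.evaluate_every == 0:
                # Perplexity check: the buggy variant forgets to pass the pool.
                self._e_step(X, parallel=pool if self.pass_pool else None)
        return self


def pools_created(**kwargs):
    """Run one toy fit and report how many pools were created."""
    FakePool.created = 0
    ToyLDA(**kwargs).fit([[1, 2], [3, 4]])
    return FakePool.created
```

In this toy, pools_created(evaluate_every=10, pass_pool=False) returns 2, while pools_created(evaluate_every=10, pass_pool=True) returns 1. With evaluate_every=0 or evaluate_every=11 the evaluation branch never runs in 10 iterations, so only one pool is created, which matches the reported range 10 >= evaluate_every > 0.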

ogrisel added a commit that referenced this issue Feb 10, 2016

Merge pull request #6324 from chyikwei/fix-lda-joblib-error
[MRG + 1] Fix joblib error in LatentDirichletAllocation (#6258)

ogrisel added a commit that referenced this issue Feb 10, 2016

ogrisel added a commit that referenced this issue Feb 10, 2016

glemaitre added a commit to glemaitre/scikit-learn that referenced this issue Feb 13, 2016

groceryheist commented Feb 16, 2016

Thank You!

yarikoptic added a commit to yarikoptic/scikit-learn that referenced this issue Feb 19, 2016

Merge tag '0.17.1' into releases
* tag '0.17.1': (29 commits)
  Release 0.17.1
  MAINT remove non-existing cache folder in 0.17.X branch
  FIX cythonize TSNE
  MAINT simplify freeing logic for Barnes-Hut SNE memory leak fix
  Fix memory leak in Barnes-Hut SNE
  FIX check_build_doc.py false positive detections
  MAINT more informative output to circle/check_build_doc.py
  FIX fetch_california_housing
  FIX in randomized_svd flip sign
  Updated examples and tests that use scipy's lena
  DOC whats_new entry for scikit-learn#6258
  fix joblib error in LatentDirichletAllocation
  MAINT fix / speedup travis on 0.17.X
  MAINT Upgrade pip in appveyor and display version
  DOC missing changelog entry for scikit-learn#5857
  DOC add fix for scikit-learn#6147 to the changelog
  FIX 6147: ensure that AUC is always a float
  TST non-regression test for scikit-learn#6147, roc_auc on memmap data
  Added changelog entry about scikit-learn#6196
  Fix reading of bunch pickles
  ...

yarikoptic added a commit to yarikoptic/scikit-learn that referenced this issue Feb 19, 2016

Merge branch 'releases' into dfsg
* releases: (29 commits)

yarikoptic added a commit to yarikoptic/scikit-learn that referenced this issue Feb 19, 2016

Merge branch 'dfsg' into debian
* dfsg: (29 commits)

mannby pushed a commit to mannby/scikit-learn that referenced this issue Apr 22, 2016

Member

ogrisel commented Jul 12, 2016

Fixed by #6324. The joblib error message should still be improved, but the scikit-learn-specific problem is fixed.

@ogrisel ogrisel closed this Jul 12, 2016

TomDLT added a commit to TomDLT/scikit-learn that referenced this issue Oct 3, 2016
