test_20news fails in master branch #4711

Closed
TomDLT opened this issue May 12, 2015 · 12 comments

@TomDLT
Member

TomDLT commented May 12, 2015

I work on Debian GNU/Linux 7 (wheezy).
The test test_20news fails on my three conda environments:

Python 2.7.9 Scipy 0.15.1 Numpy 1.9.2
Python 3.4.3 Scipy 0.15.1 Numpy 1.9.2
Python 2.6.9 Scipy 0.11.0 Numpy 1.6.2

FAIL: sklearn.datasets.tests.test_20news.test_20news
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/cal/homes/tdupre/.conda/envs/py27/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/cal/homes/tdupre/work/src/scikit-learn/sklearn/datasets/tests/test_20news.py", line 25, in test_20news
    data.target_names[-2:])
AssertionError: Lists differ: ['alt.atheism', 'comp.graphics... != ['talk.politics.misc', 'talk.r...

First differing element 0:
alt.atheism
talk.politics.misc

First list contains 18 additional elements.
First extra element 2:
comp.os.ms-windows.misc

+ ['talk.politics.misc', 'talk.religion.misc']
- ['alt.atheism',
-  'comp.graphics',
-  'comp.os.ms-windows.misc',
-  'comp.sys.ibm.pc.hardware',
-  'comp.sys.mac.hardware',
-  'comp.windows.x',
-  'misc.forsale',
-  'rec.autos',
-  'rec.motorcycles',
-  'rec.sport.baseball',
-  'rec.sport.hockey',
-  'sci.crypt',
-  'sci.electronics',
-  'sci.med',
-  'sci.space',
-  'soc.religion.christian',
-  'talk.politics.guns',
-  'talk.politics.mideast',
-  'talk.politics.misc',
-  'talk.religion.misc']
    """Fail immediately, with the given message."""
>>  raise self.failureException("Lists differ: ['alt.atheism', 'comp.graphics... != ['talk.politics.misc', 'talk.r...\n\nFirst differing element 0:\nalt.atheism\ntalk.politics.misc\n\nFirst list contains 18 additional elements.\nFirst extra element 2:\ncomp.os.ms-windows.misc\n\n+ ['talk.politics.misc', 'talk.religion.misc']\n- ['alt.atheism',\n-  'comp.graphics',\n-  'comp.os.ms-windows.misc',\n-  'comp.sys.ibm.pc.hardware',\n-  'comp.sys.mac.hardware',\n-  'comp.windows.x',\n-  'misc.forsale',\n-  'rec.autos',\n-  'rec.motorcycles',\n-  'rec.sport.baseball',\n-  'rec.sport.hockey',\n-  'sci.crypt',\n-  'sci.electronics',\n-  'sci.med',\n-  'sci.space',\n-  'soc.religion.christian',\n-  'talk.politics.guns',\n-  'talk.politics.mideast',\n-  'talk.politics.misc',\n-  'talk.religion.misc']")
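
The comparison that fails can be reproduced with plain lists. This is a minimal sketch that only uses the category names from the traceback above; `all_names`, `expected` and `returned` are illustrative variables and are not taken from test_20news.py.

```python
# The 20 category names exactly as they appear in the traceback.
all_names = [
    'alt.atheism', 'comp.graphics', 'comp.os.ms-windows.misc',
    'comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware', 'comp.windows.x',
    'misc.forsale', 'rec.autos', 'rec.motorcycles', 'rec.sport.baseball',
    'rec.sport.hockey', 'sci.crypt', 'sci.electronics', 'sci.med',
    'sci.space', 'soc.religion.christian', 'talk.politics.guns',
    'talk.politics.mideast', 'talk.politics.misc', 'talk.religion.misc',
]

# One side of the assertion: the last two names of the full list.
expected = all_names[-2:]

# The other side, according to the traceback: the complete list
# rather than a two-category subset.
returned = list(all_names)

# Number of surplus names, matching "18 additional elements" in the report.
extra = len(returned) - len(expected)

print(expected)  # ['talk.politics.misc', 'talk.religion.misc']
print(extra)     # 18
```
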
@trevorstephens
Contributor

I came across this the other day as well and fixed it as follows: delete scikit_learn_data/20news-bydate.pkz from your home directory and re-run the tests. You might need to run them twice to get the above test to actually execute. I think the test skips the download if the file is not present, and a doctest later downloads it, if I'm not mistaken.
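
The deletion step can also be scripted. The helper below is a rough sketch: `remove_cached_20news` is a hypothetical name, and the default location assumes the cache lives in `scikit_learn_data` under the home directory, as described above.

```python
import os


def remove_cached_20news(data_home=None):
    """Delete the cached 20 newsgroups archive if it exists.

    Returns True if a file was removed, False if there was nothing to remove.
    """
    if data_home is None:
        # Assumption: the cache directory sits directly in the home directory.
        data_home = os.path.join(os.path.expanduser("~"), "scikit_learn_data")
    cache_path = os.path.join(data_home, "20news-bydate.pkz")
    if os.path.exists(cache_path):
        os.remove(cache_path)
        return True
    return False
```

Calling `remove_cached_20news()` with no argument targets the assumed default directory; pass another path if your data directory is elsewhere.
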

@amueller
Member

Wait, a doctest downloads? That is no good. Hm, that might also happen with mnist. We should check that.

@amueller amueller added the Bug label May 12, 2015
@trevorstephens
Contributor

Yep, at least locally for me...

Doctest: working_with_text_data.rst ... downloads 20news-bydate.pkz

A directory mldata is also created in my scikit_learn_data/ folder, though it is empty after tests. Can't pinpoint which test is creating it.

@trevorstephens
Contributor

FWIW, the doctests ran in 78.453s when the download was triggered on the first run, and in 38.172s on the second run.

@amueller
Member

Can you open a separate issue for that please?

@amueller
Member

I also got that at some point. Did we change what we download? Or what happened?

@mattgiguere

I can take a look at this now.

@raghavrv
Member

@mattgiguere Please do :)

mattgiguere added a commit to mattgiguere/scikit-learn that referenced this issue Nov 16, 2015
@ekkus93

ekkus93 commented Nov 16, 2015

Deleting scikit_learn_data/20news-bydate.pkz just skips the tests. All of the test functions in test_20news.py which call fetch_20newsgroups() override the default of download_if_missing with False.
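
The skip behaviour can be sketched without scikit-learn. `load_cached_dataset` below is a hypothetical stand-in for a fetcher such as fetch_20newsgroups(), and the sketch assumes the fetcher raises IOError when the cache is missing and downloading is disabled; `check_dataset` plays the role of a test function.

```python
import os
import unittest


def load_cached_dataset(cache_path, download_if_missing=True):
    """Hypothetical stand-in for a dataset fetcher with an on-disk cache."""
    if os.path.exists(cache_path):
        return {"source": cache_path}
    if not download_if_missing:
        # Assumed behaviour: refuse to download and report the missing cache.
        raise IOError("dataset not found: %s" % cache_path)
    # A real fetcher would download here; this sketch never uses the network.
    raise NotImplementedError("download step omitted in this sketch")


def check_dataset(cache_path):
    """Test-style function: skip instead of downloading when the cache is absent."""
    try:
        data = load_cached_dataset(cache_path, download_if_missing=False)
    except IOError:
        raise unittest.SkipTest("Download the dataset to run this test")
    return data
```

With this pattern, removing the cached file cannot make the test pass; it only turns a failure into a skip until something else restores the file.
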

@mattgiguere

@racheltho and I looked into this, and this error seems to have been related to the Bunch class. I could recreate the error in 0.17, but it appears to have been fixed in 0.18. @amueller, we suggest closing this issue.

@ekkus93

ekkus93 commented Nov 17, 2015

Added pull request #5864. 20news-bydate.pkz was getting downloaded while processing "working_with_text_data.rst". It probably shouldn't be downloaded there but in test_20news.py instead. If the file is actually there, the test gets run and fails because of issues with the Bunch class.

@amueller
Member

Hm, we should also fix the problem in the Bunch class :-/
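
The thread does not spell out what goes wrong in the Bunch class. Purely as an illustration, and assuming a container that is a dict subclass mapping attribute access onto keys, the sketch below shows one way such a class can hand back stale data: an entry left in the instance `__dict__` (for example by an older pickle) shadows the key that attribute assignment writes. `AttrDict` is a hypothetical stand-in, not the scikit-learn implementation, and the three-name list is shortened for readability.

```python
class AttrDict(dict):
    """Hypothetical dict subclass exposing keys as attributes."""

    def __setattr__(self, key, value):
        # Attribute assignment writes to the dict key.
        self[key] = value

    def __getattr__(self, key):
        # Only called when normal attribute lookup finds nothing.
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)


full = ['alt.atheism', 'talk.politics.misc', 'talk.religion.misc']
data = AttrDict(target_names=list(full))

# Assumption for illustration: stale state ends up in the instance __dict__.
data.__dict__['target_names'] = list(full)

# Attribute assignment updates the key...
data.target_names = full[-2:]

# ...but attribute reads find the stale __dict__ entry first.
print(data['target_names'])  # ['talk.politics.misc', 'talk.religion.misc']
print(data.target_names)     # ['alt.atheism', 'talk.politics.misc', 'talk.religion.misc']
```
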

@amueller amueller modified the milestone: 0.19 Sep 29, 2016
6 participants