New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG + 1] fix kdd_kddcup99 #9731

Merged
merged 4 commits into from Sep 13, 2017

Conversation

Projects
None yet
5 participants
@ngoix
Contributor

ngoix commented Sep 11, 2017

fix #9730

@ngoix ngoix changed the title from fix kdd_kddcup99 to [MRG] fix kdd_kddcup99 Sep 11, 2017

@agramfort

This comment has been minimized.

Show comment
Hide comment
@agramfort

agramfort Sep 11, 2017

Member

@ngoix you'll need an entry in what's new bug section

Member

agramfort commented Sep 11, 2017

@ngoix you'll need an entry in what's new bug section

@amueller amueller changed the title from [MRG] fix kdd_kddcup99 to [MRG + 1] fix kdd_kddcup99 Sep 11, 2017

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Sep 11, 2017

Member

lgtm

Member

amueller commented Sep 11, 2017

lgtm

@jnothman

Shouldn't we also remove the shuffle param from _fetch_brute_kddcup99?

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Sep 11, 2017

Member

random_state=0 should pass the test, by the way

Member

jnothman commented Sep 11, 2017

random_state=0 should pass the test, by the way

@ngoix

This comment has been minimized.

Show comment
Hide comment
@ngoix

ngoix Sep 12, 2017

Contributor

also added a SkipTest in case kdd data are not downloaded, as it's not small data.

Contributor

ngoix commented Sep 12, 2017

also added a SkipTest in case kdd data are not downloaded, as it's not small data.

@ngoix

This comment has been minimized.

Show comment
Hide comment
@ngoix

ngoix Sep 12, 2017

Contributor

@jnothman I removed the shuffle param from _fetch_brute_kddcup99

Contributor

ngoix commented Sep 12, 2017

@jnothman I removed the shuffle param from _fetch_brute_kddcup99

@jnothman

I suppose that's reasonable although ideally we'd then have a CI that fetches all the datasets and runs the datasets tests...

LGTM

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Sep 12, 2017

Member

CI failing?

Member

jnothman commented Sep 12, 2017

CI failing?

dataset = fetch_kddcup99(random_state=0, subset='SA', shuffle=True,
percent10=True, download_if_missing=False)
except IOError:
raise SkipTest("kddcup99 dataset can not be loaded.")

This comment has been minimized.

@agramfort

agramfort Sep 12, 2017

Member

so this is not tested in CIs?

@agramfort

agramfort Sep 12, 2017

Member

so this is not tested in CIs?

@ngoix

This comment has been minimized.

Show comment
Hide comment
@ngoix

ngoix Sep 13, 2017

Contributor

I removed the SkipTest, and now the coverage tests are passing.

Contributor

ngoix commented Sep 13, 2017

I removed the SkipTest, and now the coverage tests are passing.

@lesteve

This comment has been minimized.

Show comment
Hide comment
@lesteve

lesteve Sep 13, 2017

Member

I removed the SkipTest, and now the coverage tests are passing.

The CIs were failing because of coverage issues. My understanding is that we do not want to download datasets on Travis during the tests.

Edit: just to be clear I was suggesting reverting your last commit.

Member

lesteve commented Sep 13, 2017

I removed the SkipTest, and now the coverage tests are passing.

The CIs were failing because of coverage issues. My understanding is that we do not want to download datasets on Travis during the tests.

Edit: just to be clear I was suggesting reverting your last commit.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Sep 13, 2017

Member
Member

jnothman commented Sep 13, 2017

@lesteve

This comment has been minimized.

Show comment
Hide comment
@lesteve

lesteve Sep 13, 2017

Member

I'd agree that we don't usually want to download them in Travis, and poor
coverage here is acceptable, but it would also be nice if we had a Travis
instance that did test this sort of thing.

I am not sure how this would work on Travis. When I was working on testing datasets on figshare, it was taking quite a while to download all the datasets from scratch (maybe 30-40 minutes off the top of my head from a good university network) and maybe we do not have the time to do it within a Travis build.

A possible work-around is to run the datasets tests on CircleCI where some of the datasets are downloaded/cached already.

We chatted about something a bit related with @ogrisel. For some tests, it would be nice to run them once in a while but not on each PR. The idea was to set-up a separate repo in the scikit-learn organization and use daily cron jobs in Travis. Amongst the things we thought of:

  • tests on OSX (takes too much time to spin up a OSX VM on Travis apparently)
  • tests on numpy-dev and scipy-dev (failures if any are very likely linked to a change in numpy than to the PR changes)
  • maybe datasets download if we have enough time to do it in a Travis build.

Note that neither the CircleCI nor the Travis cron job play nice with the coverage ...

Member

lesteve commented Sep 13, 2017

I'd agree that we don't usually want to download them in Travis, and poor
coverage here is acceptable, but it would also be nice if we had a Travis
instance that did test this sort of thing.

I am not sure how this would work on Travis. When I was working on testing datasets on figshare, it was taking quite a while to download all the datasets from scratch (maybe 30-40 minutes off the top of my head from a good university network) and maybe we do not have the time to do it within a Travis build.

A possible work-around is to run the datasets tests on CircleCI where some of the datasets are downloaded/cached already.

We chatted about something a bit related with @ogrisel. For some tests, it would be nice to run them once in a while but not on each PR. The idea was to set-up a separate repo in the scikit-learn organization and use daily cron jobs in Travis. Amongst the things we thought of:

  • tests on OSX (takes too much time to spin up a OSX VM on Travis apparently)
  • tests on numpy-dev and scipy-dev (failures if any are very likely linked to a change in numpy than to the PR changes)
  • maybe datasets download if we have enough time to do it in a Travis build.

Note that neither the CircleCI nor the Travis cron job play nice with the coverage ...

Revert "try to rm skiptest for CIs"
This reverts commit ecd1e9e.
@ngoix

This comment has been minimized.

Show comment
Hide comment
@ngoix

ngoix Sep 13, 2017

Contributor

commit reverted!

Contributor

ngoix commented Sep 13, 2017

commit reverted!

@lesteve

This comment has been minimized.

Show comment
Hide comment
@lesteve

lesteve Sep 13, 2017

Member

Merging, thanks a lot @ngoix.

Member

lesteve commented Sep 13, 2017

Merging, thanks a lot @ngoix.

@lesteve lesteve merged commit 7e3ad6d into scikit-learn:master Sep 13, 2017

3 of 4 checks passed

continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
lgtm analysis: Python No alert changes
Details

massich added a commit to massich/scikit-learn that referenced this pull request Sep 15, 2017

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment