Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Deprecate fetch_mldata #11466

Merged
merged 7 commits into from
Aug 18, 2018
Merged

[MRG+1] Deprecate fetch_mldata #11466

merged 7 commits into from
Aug 18, 2018

Conversation

jnothman
Copy link
Member

@jnothman jnothman commented Jul 10, 2018

Fixes #11317

I have changed the use of fetch_mldata in examples, but not yet in benchmarks under the assumption that those will be replaced with fetch_openml soon enough (even if after 0.20 release).

Waiting on the OpenML fetcher (#11419) to be completed.

@jnothman jnothman changed the title Deprecate fetch_mldata [MRG] Deprecate fetch_mldata Jul 10, 2018
@jnothman jnothman changed the title [MRG] Deprecate fetch_mldata [WIP] Deprecate fetch_mldata Jul 10, 2018
@jnothman jnothman changed the title [WIP] Deprecate fetch_mldata [MRG] Deprecate fetch_mldata Jul 10, 2018
@jnothman
Copy link
Member Author

It seems there are bugs in scipy.io.arff.loadarff in our earliest supported dependencies. Maybe this will need to wait for the OpenML fetcher (#9908, #11419)

@jnothman jnothman changed the title [MRG] Deprecate fetch_mldata [WIP] Deprecate fetch_mldata Jul 11, 2018
@jnothman jnothman added this to the 0.20 milestone Jul 11, 2018
@qinhanmin2014
Copy link
Member

I'm fine with the deprecation, but do we have ways to make the examples more friendly? (e.g., make use of scikit-learn/examples-data) I'm a bit uncomfortable about things like from sklearn.externals.joblib import Memory and the long functions which download&preprocess the data.

@amueller
Copy link
Member

all the fetchers can be removed when we add the openml loader, right? If not, we should upload those datasets to openml.

@FollowKenny
Copy link
Contributor

@jnothman Hi, do you need help on this right now ?

@jnothman
Copy link
Member Author

No, thanks @FollowKenny. The plan is to get fetch_openml merged before release, I hope, and fix the issues here through that.

@jorisvandenbossche jorisvandenbossche added this to PRs tagged in scikit-learn 0.20 Jul 16, 2018
@amueller amueller moved this from PRs tagged to Blockers in scikit-learn 0.20 Jul 16, 2018
@jnothman jnothman changed the title [WIP] Deprecate fetch_mldata [MRG] Deprecate fetch_mldata Aug 15, 2018
@jnothman jnothman changed the title [MRG] Deprecate fetch_mldata [WIP] Deprecate fetch_mldata Aug 15, 2018
@jnothman
Copy link
Member Author

This isn't perfect, but it's mergeable after #11419

@jnothman
Copy link
Member Author

We should also look into uploading Mauna Loa to openml

@jnothman
Copy link
Member Author

I think if this goes green, I'll be bold and merge #11419 and create a long series of follow-up issues.

@jnothman
Copy link
Member Author

This is now ready for review.

@jnothman jnothman changed the title [WIP] Deprecate fetch_mldata [MRG] Deprecate fetch_mldata Aug 15, 2018

Downloading datasets from the mldata.org repository
---------------------------------------------------

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's better to remove this section altogether rather than say put a deprecation note at the begining?

@@ -257,7 +257,6 @@ Loaders
datasets.fetch_kddcup99
datasets.fetch_lfw_pairs
datasets.fetch_lfw_people
datasets.fetch_mldata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, maybe still keep the doc even if it's deprecated? Who knows maybe the website will work again in september, and there is still code using this. Those users will no longer be able to find any documentation about this function in the stable docs... A doc with a large deprecation warning would avoid us some opened issues from confused users.

data = fetch_mldata('mauna-loa-atmospheric-co2').data
X = data[:, [1]]
y = data[:, 0]
def load_mauna_loa_atmospheric_c02():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should open an issue that this should be uploaded to openml, as you suggested?

fetch_mldata, 'not_existing_name')
finally:
datasets.mldata.urlopen = _urlopen_ref


def test_fetch_one_column(tmpdata):
def test_fetch_one_column(tmpdata, recwarn):
warnings.simplefilter('ignore', DeprecationWarning)
Copy link
Member

@rth rth Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rather use,

@pytest.mark.filterwarnings('ignore::DeprecationWarning'):
def test_fetch_one_column(tmpdata):
   ...

if I got the syntax right. I'm never sure about the side effects explicitly setting warning filters could have in a pytest session..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought recwarn grabbed a filter context... But you're right that syntax is a little better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, whether this isolates the filter context (and will continue doing so in the future) is another story. That's why using standard pytest mechanisms is probably safer (besides syntax considerations).

@@ -82,7 +87,8 @@ def test_fetch_one_column(tmpdata):
datasets.mldata.urlopen = _urlopen_ref


def test_fetch_multiple_column(tmpdata):
def test_fetch_multiple_column(tmpdata, recwarn):
warnings.simplefilter('ignore', DeprecationWarning)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above

@@ -1513,6 +1512,7 @@ To be removed in 0.22
:template: deprecated_function.rst

covariance.graph_lasso
datasets.fetch_mldata
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will generate the docs with the large deprecation warning.

fetch_mldata, 'not_existing_name')
finally:
datasets.mldata.urlopen = _urlopen_ref


def test_fetch_one_column(tmpdata):
def test_fetch_one_column(tmpdata, recwarn):
warnings.simplefilter('ignore', DeprecationWarning)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought recwarn grabbed a filter context... But you're right that syntax is a little better.

@amueller
Copy link
Member

are these test failures from master?

@@ -3,6 +3,7 @@
import os
import scipy as sp
import shutil
import warnings
Copy link
Member

@amueller amueller Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not used (that's the flake8 error)

@amueller
Copy link
Member

is that the failure that's fixed in #11831?

@rth
Copy link
Member

rth commented Aug 17, 2018

mldata doctest is failing as far as I can see. We might want to skip it in pytest_collection_modifyitems under conftest.py ?

Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, +1 to skip the test (maybe # doctest: +SKIP?)
Maybe we should include it in the rc

@@ -209,6 +209,10 @@ Support for Python 3.3 has been officially dropped.
data points could be generated. :issue:`10045` by :user:`Christian Braune
<christianbraune79>`.

- |API| Deprecated :func:`sklearn.datasets.fetch_mldata` to be removed in
version 0.22. MLData.org is no longer operational. Until removal it will
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mldata.org?

"""Load MNIST, select two classes, shuffle and return only n_samples."""
mnist = fetch_mldata('MNIST original')
mnist = fetch_openml('mnist_784', version=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(alternative) Maybe we can add a link to openml in the examples (e.g., https://www.openml.org/d/554) so that users can get more information about the dataset.

@rth
Copy link
Member

rth commented Aug 17, 2018

(maybe # doctest: +SKIP?)

Definitely, I'm spending too much time writing pytest fixtures ..

@jnothman
Copy link
Member Author

I just deleted the example. If someone wants to remind me the right way to disable it...

Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I'm fine with removing the doctests. We should encourage users to turn to openml.
Appveyor failure irrelevant.

@qinhanmin2014 qinhanmin2014 changed the title [MRG] Deprecate fetch_mldata [MRG+1] Deprecate fetch_mldata Aug 18, 2018
@rth
Copy link
Member

rth commented Aug 18, 2018

Merging, thanks @jnothman !

@rth rth merged commit fc56da5 into scikit-learn:master Aug 18, 2018
scikit-learn 0.20 automation moved this from Blockers to Done Aug 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

5 participants