Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add as_frame='auto' option in datasets.fetch_openml #17396

Merged

Conversation

fujiaxiang
Copy link
Contributor

@fujiaxiang fujiaxiang commented May 31, 2020

Reference Issues/PRs

Closes #14888

What does this implement/fix? Explain your changes.

This PR adds an 'auto' option for argument as_frame in function datasets.fetch_openml.
When as_frame is set to be 'auto', returned data will be converted to pandas DataFrame or Series unless data is sparse.

Any other comments?

This PR doesn't fully close #14888. The team still need to discuss and decide whether to make 'auto' the default option.

@fujiaxiang fujiaxiang changed the title [WIP] Added 'auto' option for argument as_frame in datasets.fetch_openml [MRG] Added 'auto' option for argument as_frame in datasets.fetch_openml May 31, 2020
@fujiaxiang
Copy link
Contributor Author

ping

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

sklearn/datasets/_openml.py Outdated Show resolved Hide resolved
Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me aside from a few comments below. I would also be +1 to make "auto" the default since fetch_openml is experimental.

If True, the data is a pandas DataFrame including columns with
appropriate dtypes (numeric, string or categorical). The target is
a pandas DataFrame or Series depending on the number of target_columns.
The Bunch will contain a ``frame`` attribute with the target and the
data. If ``return_X_y`` is True, then ``(data, target)`` will be pandas
DataFrames or Series as describe above.
If as_frame is 'auto', the data and target will be converted to
DataFrame or Series as if as_frame is set to True, unless the dataset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's always a DataFrame I think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data.target may be a Series. For example:

data_id = 61  # iris dataset version 1
data = fetch_openml(data_id=data_id, as_frame=True)

>>> type(data.target)
<class 'pandas.core.series.Series'>

sklearn/datasets/tests/test_openml.py Show resolved Hide resolved
sklearn/datasets/tests/test_openml.py Show resolved Hide resolved
Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I would also be +1 to make "auto" the default since fetch_openml is experimental.

Let's see what other reviewers think about doing that.

@jnothman
Copy link
Member

jnothman commented Jun 9, 2020 via email

@jnothman
Copy link
Member

jnothman commented Jun 9, 2020

Could you propose that change to default in another pull request?

@fujiaxiang
Copy link
Contributor Author

Sure. Do you want to merge this first? Or do I branch out from here?

@rth rth changed the title [MRG] Added 'auto' option for argument as_frame in datasets.fetch_openml Add as_frame='auto' option in datasets.fetch_openml Jun 10, 2020
@rth rth merged commit 77a3429 into scikit-learn:master Jun 10, 2020
@rth
Copy link
Member

rth commented Jun 10, 2020

Thanks! Please open a new PR for changing the default.

@fujiaxiang fujiaxiang deleted the add_as_frame_auto_option_in_fetch_openml branch June 16, 2020 11:15
viclafargue pushed a commit to viclafargue/scikit-learn that referenced this pull request Jun 26, 2020
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add fetch_openml(..., as_frame='auto') option
3 participants