Add as_frame='auto' option in datasets.fetch_openml #17396
ping
Thanks!
Sounds good to me aside from a few comments below. I would also be +1 to make "auto" the default since fetch_openml
is experimental.
If True, the data is a pandas DataFrame including columns with
appropriate dtypes (numeric, string or categorical). The target is
a pandas DataFrame or Series depending on the number of target_columns.
The Bunch will contain a ``frame`` attribute with the target and the
data. If ``return_X_y`` is True, then ``(data, target)`` will be pandas
DataFrames or Series as describe above.
If as_frame is 'auto', the data and target will be converted to
DataFrame or Series as if as_frame is set to True, unless the dataset
It's always a DataFrame I think?
`data.target` may be a Series. For example:
>>> from sklearn.datasets import fetch_openml
>>> data_id = 61  # iris dataset version 1
>>> data = fetch_openml(data_id=data_id, as_frame=True)
>>> type(data.target)
<class 'pandas.core.series.Series'>
Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
Thanks!
> I would also be +1 to make "auto" the default since fetch_openml is experimental.
Let's see what other reviewers think about doing that.
I'm okay with making it default. If the occasional user needs to change
their code to add .values, so be it.
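As a rough illustration of the change mentioned above, code that expects a NumPy array can convert a returned DataFrame with `.values`. The small frame below is made up for the example and only stands in for what `fetch_openml` might return; it is not real OpenML data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `data` attribute returned with as_frame=True.
frame = pd.DataFrame(
    {"sepal_length": [5.1, 4.9], "sepal_width": [3.5, 3.0]}
)

# Downstream code that needs a plain array adds `.values`
# (`frame.to_numpy()` is the equivalent method call).
X = frame.values
```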
Could you propose that change to default in another pull request?
Sure. Do you want to merge this first? Or do I branch out from here?
Thanks! Please open a new PR for changing the default.
Reference Issues/PRs
Closes #14888
What does this implement/fix? Explain your changes.
This PR adds an `'auto'` option for the `as_frame` argument of `datasets.fetch_openml`. When `as_frame` is set to `'auto'`, the returned data is converted to a pandas DataFrame or Series unless the data is sparse.
Any other comments?
This PR doesn't fully close #14888. The team still needs to discuss and decide whether to make `'auto'` the default option.
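
The rule described in this PR can be sketched as a small helper. This is only an illustration of the decision, not the code in scikit-learn: `resolve_as_frame` and its `is_sparse` parameter are hypothetical names chosen for the example.

```python
def resolve_as_frame(as_frame, is_sparse):
    """Hypothetical helper: decide whether to return pandas objects.

    as_frame: True, False, or the string 'auto'.
    is_sparse: whether the dataset is stored in a sparse format.
    """
    if as_frame == "auto":
        # 'auto' behaves like True unless the data is sparse.
        return not is_sparse
    # Explicit True/False is passed through unchanged.
    return bool(as_frame)
```

With this sketch, `resolve_as_frame("auto", is_sparse=False)` yields `True` and `resolve_as_frame("auto", is_sparse=True)` yields `False`, matching the behaviour the description gives for dense and sparse datasets.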