[MRG + 1] load_iris dataset: added return_X_y option #7049

Merged
amueller merged 12 commits into scikit-learn:master from manu-chroma:load_iris Jul 29, 2016

5 participants
@manu-chroma
Contributor

manu-chroma commented Jul 19, 2016

Reference Issue

Add return_tuple option to data loaders that return a Bunch #6670

What does this implement/fix? Explain your changes.

  1. Added the return_X_y option to load_iris
  2. Wrote tests for it

Any other comments?

  1. The majority of the code is taken from #6704
  2. It includes some improvements based on the discussion in that PR
  3. This is my first contribution to scikit-learn
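The behaviour being added fits in a few lines. The sketch below is not the actual load_iris code: Bunch here is a simplified stand-in for scikit-learn's class of the same name, and load_toy with its made-up data is hypothetical. It only illustrates the switch itself: with return_X_y=True the loader hands back a plain (data, target) tuple, otherwise the default dictionary-like Bunch.

```python
class Bunch(dict):
    """Simplified stand-in for scikit-learn's Bunch: a dict whose keys
    can also be read as attributes."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)


def load_toy(return_X_y=False):
    """Hypothetical loader illustrating the return_X_y switch."""
    data = [[5.1, 3.5], [6.2, 2.9], [7.7, 3.0]]  # made-up feature rows
    target = [0, 1, 2]                            # made-up class labels

    # Return the bare (data, target) tuple when asked; the default stays
    # the dictionary-like Bunch, so existing callers are unaffected.
    if return_X_y:
        return data, target

    return Bunch(data=data, target=target,
                 target_names=['class_0', 'class_1', 'class_2'])


X, y = load_toy(return_X_y=True)  # tuple form unpacks directly
bunch = load_toy()                # default form, attribute access
```

Keeping False as the default means code that already relies on the Bunch keeps working, while the tuple form presumably saves callers the `.data` / `.target` unpacking step.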
return_X_y : boolean, default=False.
    If True, returns (data, target) instead of a Bunch object.
    See below for more information about the `data` and `target` object
Returns
-------
data : Bunch

@nelson-liu

nelson-liu Jul 19, 2016

Contributor

I think you should change this line to reflect how the function can return a Bunch or a tuple now.

@manu-chroma

manu-chroma Jul 19, 2016

Contributor

That's precisely what this line defines. Don't you think?

  If True, returns (data, target) instead of a Bunch object.

@nelson-liu

nelson-liu Jul 19, 2016

Contributor

Well, on line 266 in the diff it still says that it returns data : Bunch. Would it be apt to include information about it possibly returning (data, target) as well?

@manu-chroma

manu-chroma Jul 19, 2016

Contributor

Got it, thanks.

----------
return_X_y : boolean, default=False.
    If True, returns (data, target) instead of a Bunch object.
    See below for more information about the `data` and `target` object

@nelson-liu

nelson-liu Jul 19, 2016

Contributor

add a period at the end of this line?

@manu-chroma

manu-chroma Jul 19, 2016

Contributor

Done. Also updated the return type description.

Returns
-------
if return_X_y == false

@nelson-liu

nelson-liu Jul 19, 2016

Contributor

I think you can actually omit this... I'm thinking of the notion that the "default" case is returning the Bunch, and you return the tuple if return_X_y is True.

data : Bunch
    Dictionary-like object, the interesting attributes are:
    'data', the data to learn, 'target', the classification labels,
    'target_names', the meaning of the labels, 'feature_names', the
    meaning of the features, and 'DESCR', the
    full description of the dataset.
if return_X_y == true
(data,target) : tuple

@nelson-liu

nelson-liu Jul 19, 2016

Contributor

I think this would look better in the docs as (data, target) : tuple if return_X_y is True. I can't think of any precedent for this in the code base; could you generate the docs by running make html-noplot in the doc/ directory and post a screenshot of what it looks like?
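A quick way to eyeball how the Returns section will lay out, without a full Sphinx build, is to pull the cleaned docstring with inspect.getdoc. The sketch below uses a hypothetical load_example whose Returns section follows the shape this thread eventually converged on: the default Bunch entry first, then a blank line, then (data, target) : tuple if ``return_X_y`` is True. It checks only the text layout; it does not run numpydoc, so it cannot prove that numpydoc will parse the entry.

```python
import inspect


def load_example(return_X_y=False):
    """Load a dataset (hypothetical example used only for its docstring).

    Parameters
    ----------
    return_X_y : boolean, default=False.
        If True, returns ``(data, target)`` instead of a Bunch object.

    Returns
    -------
    data : Bunch
        Dictionary-like object, the interesting attributes are:
        'data', the data to learn, and 'target', the classification labels.

    (data, target) : tuple if ``return_X_y`` is True
    """


# inspect.getdoc strips the common leading indentation, roughly as
# docstring tooling sees the text.
doc = inspect.getdoc(load_example)
returns_block = doc.split("Returns\n-------\n", 1)[1]
print(returns_block)
```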

@manu-chroma

manu-chroma Jul 19, 2016

Contributor

Ok. I'll do it in a few hours.

@manu-chroma

manu-chroma Jul 20, 2016

Contributor

Hey, I'm a little confused here. The method and its description don't seem to have been updated after I ran the command.

Path of html page: file:///home/manu/github/scikit-learn/doc/_build/html/stable/modules/generated/sklearn.datasets.load_iris.html#sklearn.datasets.load_iris

[screenshot]

data : Bunch
    Dictionary-like object, the interesting attributes are:
    'data', the data to learn, 'target', the classification labels,
    'target_names', the meaning of the labels, 'feature_names', the
    meaning of the features, and 'DESCR', the
    full description of the dataset.
if return_X_y == true

@jnothman

jnothman Jul 20, 2016

Member

numpydoc can't parse this line

@manu-chroma

manu-chroma Jul 20, 2016

Contributor

I updated to this in the latest commit.

(data,target) : tuple if return_X_y == true

Still not parsing.
Also, the parameter (return_X_y=False) is not visible in the method definition.

@maniteja123

maniteja123 Jul 20, 2016

Contributor

Hi, if even the method signature hasn't changed in the HTML docs, the docs aren't being generated correctly. Can you please check that?

@nelson-liu

nelson-liu Jul 20, 2016

Contributor

The way I usually do this is to hit the "view this page source" link at the bottom of the page and verify that the source matches what is in my file.

@manu-chroma

manu-chroma Jul 20, 2016

Contributor

After making the changes, I generated the docs by running make html-noplot in the doc/ folder.
Path of html page: file:///home/manu/github/scikit-learn/doc/_build/html/stable/modules/generated/sklearn.datasets.load_iris.html#sklearn.datasets.load_iris

What might I be doing wrong?

@manu-chroma

manu-chroma Jul 20, 2016

Contributor

@nelson-liu Sure, I'll do that.

@manu-chroma

manu-chroma Jul 20, 2016

Contributor

I'm afraid the page source isn't helping.
[screenshot]

Clicking on the source link next to the method signature points to https://github.com/scikit-learn/scikit-learn/blob/3f9494c/sklearn/datasets/base.py#L242, which I suppose is the most recent commit I made.

[screenshot]

@nelson-liu

nelson-liu Jul 20, 2016

Contributor

I downloaded the PR myself and built the docs, and to me it looks like this (which seems correct, bar a few nitpicks):
[screenshot, 2016-07-20 at 11:17:10 AM]

@@ -263,6 +269,9 @@ def load_iris():
'target_names', the meaning of the labels, 'feature_names', the
meaning of the features, and 'DESCR', the
full description of the dataset.

@nelson-liu

nelson-liu Jul 20, 2016

Contributor

I checked this PR out locally, and this line has whitespace; can you remove that?

@@ -263,6 +269,9 @@ def load_iris():
'target_names', the meaning of the labels, 'feature_names', the
meaning of the features, and 'DESCR', the
full description of the dataset.
(data,target) : tuple if return_X_y == true

@nelson-liu

nelson-liu Jul 20, 2016

Contributor

extraneous newline

@manu-chroma

manu-chroma Jul 20, 2016

Contributor

Thanks a lot. @jnothman did point this out at the beginning, but I failed to understand it.
Fixed.

@@ -263,6 +269,9 @@ def load_iris():
'target_names', the meaning of the labels, 'feature_names', the
meaning of the features, and 'DESCR', the
full description of the dataset.
(data,target) : tuple if return_X_y == true

@nelson-liu

nelson-liu Jul 20, 2016

Contributor

"true" is generally capitalized

@manu-chroma

manu-chroma Jul 20, 2016

Contributor

Hey, I regenerated the docs. It's still not working after the latest commit. Any clue?

@nelson-liu

nelson-liu Jul 20, 2016

Contributor

Hmm, not sure; I'm not super familiar with the intricacies of Sphinx and the doc-building process. What are your versions (run the snippet below), and what is your Sphinx version?

import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import sklearn; print("Scikit-Learn", sklearn.__version__)

@manu-chroma

manu-chroma Jul 21, 2016

Contributor
Linux-4.4.0-31-generic-x86_64-with-Ubuntu-16.04-xenial
('Python', '2.7.12 (default, Jul  1 2016, 15:12:24) \n[GCC 5.4.0 20160609]')
('NumPy', '1.11.1')
('SciPy', '0.17.0')
('Scikit-Learn', '0.18.dev0')

Sphinx (sphinx-build) 1.3.6

@nelson-liu

nelson-liu Jul 21, 2016

Contributor

Try upgrading your Sphinx (I'm on 1.4.5). Otherwise, our systems look pretty similar:

Darwin-15.5.0-x86_64-i386-64bit
('Python', '2.7.12 (default, Jun 29 2016, 14:05:02) \n[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)]')
('NumPy', '1.11.1')
('SciPy', '0.17.1')
('Scikit-Learn', '0.18.dev0')

@manu-chroma

manu-chroma Jul 21, 2016

Contributor

Updated and added an extra line.

@@ -263,6 +269,7 @@ def load_iris():
'target_names', the meaning of the labels, 'feature_names', the
meaning of the features, and 'DESCR', the
full description of the dataset.
(data,target) : tuple if return_X_y == True

@nelson-liu

nelson-liu Jul 21, 2016

Contributor

Sorry, I meant that there should be a newline above the new return type but not one below it.

Returns
-------
data : Bunch
Dictionary-like object, the interesting attributes are:
'data', the data to learn, 'target', the classification labels,
'target_names', the meaning of the labels, 'feature_names', the
meaning of the features, and 'DESCR', the
full description of the dataset.
full description of the dataset. This is test text.

@manu-chroma

manu-chroma Jul 21, 2016

Contributor

@nelson-liu Hey, I added this test sentence, and it didn't show up either when I rebuilt the docs.

I've also tried updating Sphinx to 1.4.5.
Really confused here.

@nelson-liu

nelson-liu Jul 21, 2016

Contributor

How are you building? You should navigate to the doc/ subdirectory and run make html-noplot there. The build output should show up at doc/_build/html/stable/modules/generated/sklearn.datasets.load_iris.html
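One quick way to confirm that a build actually picked up the edited docstring is to search the generated page for the new parameter name. A minimal sketch, assuming the output path quoted above (relative to the repository root); page_mentions is a hypothetical helper, not part of scikit-learn's doc tooling.

```python
from pathlib import Path

# Output path quoted in this thread, relative to the repository root;
# adjust for your own checkout.
PAGE = Path("doc/_build/html/stable/modules/generated/"
            "sklearn.datasets.load_iris.html")


def page_mentions(path, needle):
    """Return True if the built HTML file at ``path`` contains ``needle``.

    A missing file returns False, which usually means the build wrote its
    output somewhere else (or did not run at all).
    """
    path = Path(path)
    if not path.is_file():
        return False
    return needle in path.read_text(encoding="utf-8", errors="replace")


if __name__ == "__main__":
    print(page_mentions(PAGE, "return_X_y"))
```

If this prints False even though the page exists, the build is most likely documenting a different copy of the package than the one being edited.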

@manu-chroma

manu-chroma Jul 21, 2016

Contributor

Yep. That's what I'm doing.

[screenshot]

[screenshot]

@amueller

amueller Jul 27, 2016

Member

Try running "make clean" beforehand.

@nelson-liu

nelson-liu Jul 27, 2016

Contributor

Whoops, this totally fell off my radar; thanks for the suggestion, Andy.

@manu-chroma

manu-chroma Jul 28, 2016

Contributor

@nelson-liu It doesn't!
I can't understand why it isn't pointing to the path you mentioned.

[screenshot]

@nelson-liu

nelson-liu Jul 28, 2016

Contributor

You probably pip-installed the GitHub dev repo at some point; remove that (or all scikit-learn installs; I'm assuming you only need one) and then reinstall following the development instructions above.
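Sphinx's autodoc documents whatever sklearn package Python actually imports, so a stale copy elsewhere on the path would explain why edits never showed up. A small sketch for checking which copy wins, using only importlib; module_origin is a hypothetical helper name.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None.

    For an in-place development install this is expected to point inside
    the working tree (e.g. .../scikit-learn/sklearn/__init__.py); a path
    under site-packages suggests a separately installed copy shadows it.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable at all
    return spec.origin


print(module_origin("sklearn"))
```

Running this with the same interpreter that runs sphinx-build is what matters, since a different interpreter can see a different sys.path.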

@nelson-liu

nelson-liu Jul 28, 2016

Contributor

This is what pip list shows on my computer:
[screenshot, 2016-07-28 at 9:43:40 AM]

@manu-chroma

manu-chroma Jul 28, 2016

Contributor

Reinstalled! :)
Building the docs now.
[screenshot]

@manu-chroma

manu-chroma Jul 28, 2016

Contributor

This makes me really happy.
Thanks a lot, @nelson-liu, for your constant feedback and @amueller for your insights. 😀

[screenshot]

@@ -264,6 +270,8 @@ def load_iris():
meaning of the features, and 'DESCR', the
full description of the dataset.
(data,target) : tuple if return_X_y == True

@nelson-liu

nelson-liu Jul 28, 2016

Contributor

I prefer "...tuple if return_X_y is True"; the == looks a bit odd to me. Additionally, I think it'd be nice to add double backticks to the start and end of return_X_y.

@amueller

amueller Jul 28, 2016

Member

+1. Also, space after , ;)

Parameters
----------
return_X_y : boolean, default=False.
    If True, returns (data, target) instead of a Bunch object.

@amueller

amueller Jul 28, 2016

Member

double backticks around (data, target)

----------
return_X_y : boolean, default=False.
    If True, returns (data, target) instead of a Bunch object.
    See below for more information about the `data` and `target` object.

@amueller

amueller Jul 28, 2016

Member

Double backticks

#test return_X_y option
X_y_tuple = load_iris(return_X_y=True)
bunch = load_iris()
assert_true(isinstance(X_y_tuple,tuple))

@amueller

amueller Jul 28, 2016

Member

pep8: space after comma
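With the requested style fixes applied (a space after the comma, a single space after #), the test reads roughly as below. This is a sketch under stated assumptions: it swaps in a hypothetical load_toy stub so it runs on its own, uses a bare assert in place of the assert_true helper from the original test module, and the element-by-element comparison against the default return is an extra check, not necessarily what the merged test does.

```python
def load_toy(return_X_y=False):
    """Hypothetical stand-in loader; a plain dict plays the Bunch's role."""
    data = [[5.1, 3.5], [4.9, 3.0]]  # made-up feature rows
    target = [0, 0]                  # made-up labels
    if return_X_y:
        return data, target
    return {'data': data, 'target': target}


def test_load_toy_return_X_y():
    # test return_X_y option
    X_y_tuple = load_toy(return_X_y=True)
    bunch = load_toy()
    assert isinstance(X_y_tuple, tuple)
    # The tuple should carry exactly the arrays exposed by the default return.
    assert X_y_tuple[0] == bunch['data']
    assert X_y_tuple[1] == bunch['target']


test_load_toy_return_X_y()
```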

@amueller

Member

amueller commented Jul 28, 2016

lgtm apart from comments

@amueller

Member

amueller commented Jul 28, 2016

I restarted CI

@manu-chroma

Contributor

manu-chroma commented Jul 28, 2016

@amueller I did the fixes. Do I need to squash commits or do anything else here?

@@ -264,6 +270,8 @@ def load_iris():
meaning of the features, and 'DESCR', the
full description of the dataset.
(data,target) : tuple if ``return_X_y`` is True

@amueller
@@ -180,6 +181,13 @@ def test_load_iris():
assert_equal(res.target_names.size, 3)
assert_true(res.DESCR)
#test return_X_y option

@amueller

amueller Jul 28, 2016

Member

pep8: single space after #

@amueller

Member

amueller commented Jul 28, 2016

lgtm apart from style comments. No, nothing else to do (except fix these)

@amueller amueller changed the title from [MRG] load_iris dataset: added return_X_y option to [MRG + 1] load_iris dataset: added return_X_y option Jul 28, 2016

@amueller

Member

amueller commented Jul 28, 2016

@nelson-liu

Contributor

nelson-liu commented Jul 28, 2016

still looking, give me a second

@nelson-liu

Contributor

nelson-liu commented Jul 28, 2016

yup LGTM thanks @manu-chroma

@nelson-liu

Contributor

nelson-liu commented Jul 28, 2016

Does this warrant a what's new entry? Or should there be one after this is implemented in all the loaders?

@nelson-liu

Contributor

nelson-liu commented Jul 28, 2016

Also perhaps a versionadded?

@amueller

Member

amueller commented Jul 28, 2016

Good catch. Definitely a versionadded for the keyword. We can also add a what's new entry and edit it in the future when we add this functionality to the other functions.

@nelson-liu

Contributor

nelson-liu commented Jul 28, 2016

@manu-chroma does that sound doable? You would have to add an entry in whats_new.rst, and the appropriate "versionadded" tags below your new parameter (see this as an example)

@manu-chroma

Contributor

manu-chroma commented Jul 28, 2016

@nelson-liu Sure, I can do that.

I was thinking of updating whats_new.rst after I have implemented this for all loaders, which I suppose won't take more than a few days. Thoughts?

@nelson-liu

Contributor

nelson-liu commented Jul 28, 2016

Yeah, that's one possibility, but I think it would be a bit better to create an entry now and then edit it as the functionality is added to more loaders in their associated PRs.

@manu-chroma

Contributor

manu-chroma commented Jul 28, 2016

Okay, I'll update the PR in a few hours. I suppose this change should come under Enhancements in whats_new.rst?

@nelson-liu

Contributor

nelson-liu commented Jul 28, 2016

yup, that's a good assumption.

@manu-chroma

Contributor

manu-chroma commented Jul 29, 2016

Hey, I updated my branch before editing whats_new.rst.

After that, I added the version tag (f4d6168) and tried running sudo make html-noplot.
It failed. Here's the error log; the error comes from a different file.

manu@hp:~/github/scikit-learn/doc$ sudo make html-noplot 
sphinx-build -D plot_gallery=0 -b html -d _build/doctrees   . _build/html/stable
Running Sphinx v1.4.5
WARNING: sphinx.ext.pngmath has been deprecated. Please use sphinx.ext.imgmath instead.
loading pickled environment... not yet created
/home/manu/github/scikit-learn/sklearn/cross_validation.py:43: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/home/manu/github/scikit-learn/sklearn/learning_curve.py:23: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
  DeprecationWarning)

Exception occurred:
  File "/home/manu/github/scikit-learn/sklearn/tree/tree.py", line 61, in <module>
    "mae": _criterion.MAE}
AttributeError: 'module' object has no attribute 'MAE'
The full traceback has been saved in /tmp/sphinx-err-0pnBuo.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at <https://github.com/sphinx-doc/sphinx/issues>. Thanks!
Makefile:47: recipe for target 'html-noplot' failed
make: *** [html-noplot] Error 1
@nelson-liu

Contributor

nelson-liu commented Jul 29, 2016

Did you rerun python setup.py build_ext --inplace?

@manu-chroma

Contributor

manu-chroma commented Jul 29, 2016

Yes, that was the case. Really sorry for forgetting the standard procedure.

I've updated whats_new and added the version tag. Thoughts?

@@ -258,6 +258,15 @@ def load_iris():
Read more in the :ref:`User Guide <datasets>`.
Parameters
----------

@amueller

amueller Jul 29, 2016

Member

There is an extra space here. Sphinx will complain.

@manu-chroma

manu-chroma Jul 29, 2016

Contributor

Fixed. 👍

@amueller amueller merged commit b8be019 into scikit-learn:master Jul 29, 2016

0 of 3 checks passed

ci/circleci CircleCI is running your tests
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
continuous-integration/travis-ci/pr The Travis CI build is in progress
@amueller

Member

amueller commented Jul 29, 2016

thanks :)

Parameters
----------
.. versionadded:: 0.18

@nelson-liu

nelson-liu Jul 29, 2016

Contributor

Sorry I'm late to the party; shouldn't the versionadded be under return_X_y? And do we need a versionadded under the new return type (say, around line 280)?
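For reference, a sketch of the placement being asked about: the directive indented under the return_X_y entry in Parameters (rather than directly under the Parameters heading, as in the snippet above), and a second one under the tuple return type. The function load_example is hypothetical and the descriptive wording is illustrative; the assertions only check the docstring's text layout, not how Sphinx renders it.

```python
import inspect


def load_example(return_X_y=False):
    """Load a dataset (hypothetical function, docstring layout only).

    Parameters
    ----------
    return_X_y : boolean, default=False.
        If True, returns ``(data, target)`` instead of a Bunch object.
        See below for more information about the `data` and `target` object.

        .. versionadded:: 0.18

    Returns
    -------
    data : Bunch
        Dictionary-like object with 'data' and 'target' attributes.

    (data, target) : tuple if ``return_X_y`` is True
        The data array and the target array.

        .. versionadded:: 0.18
    """


doc = inspect.getdoc(load_example)
# Slice out each section so the directive's position can be checked.
params = doc.split("Parameters\n----------\n", 1)[1].split("Returns\n", 1)[0]
returns = doc.split("Returns\n-------\n", 1)[1]
```

Indenting the directive to the same level as the entry's description keeps it attached to that entry instead of to the section as a whole.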

@amueller

amueller Aug 3, 2016

Member

yes. Do you want to do a PR to fix it? (sorry I'm swamped)

@nelson-liu

nelson-liu Aug 4, 2016

Contributor

Sure, I'll do it and fix the other comments I had on this PR @amueller @manu-chroma

@manu-chroma

manu-chroma Aug 4, 2016

Contributor

Sorry for the late reply.

I intend to send the next PR (adding the return_X_y option to the load_breast_cancer dataset) this weekend.
Should I take care of this?

@nelson-liu

nelson-liu Aug 4, 2016

Contributor

That's fine, I made a fix in #7138. Just make sure to do it next time :)

@manu-chroma

manu-chroma Aug 4, 2016

Contributor

Sorry, and I'll keep that in mind.

@@ -230,6 +230,9 @@ Enhancements
(`#6846 <https://github.com/scikit-learn/scikit-learn/pull/6846>`_)
By `Sebastian Säger`_ and `YenChen Lin`_.
- Added new return type ``(data, target)`` : tuple option to :func:`load_iris` dataset. (`#7049 <https://github.com/scikit-learn/scikit-learn/pull/7049>`_)

@nelson-liu

nelson-liu Jul 29, 2016

Contributor

Maybe split this into two lines; it's a lot longer than the other ones.

@manu-chroma

manu-chroma Jul 30, 2016

Contributor

I can fix this in the next PR (adding the return_X_y option to the load_breast_cancer dataset), along with the position of the versionadded tag. What do you think?

@nelson-liu

nelson-liu Jul 30, 2016

Contributor

sure sounds good. thanks!

@@ -230,6 +230,9 @@ Enhancements
(`#6846 <https://github.com/scikit-learn/scikit-learn/pull/6846>`_)
By `Sebastian Säger`_ and `YenChen Lin`_.
- Added new return type ``(data, target)`` : tuple option to :func:`load_iris` dataset. (`#7049 <https://github.com/scikit-learn/scikit-learn/pull/7049>`_)
By `Manvendra Singh`_ and `Nelson Liu`_.

@nelson-liu

nelson-liu Jul 29, 2016

Contributor

I shouldn't have any attribution on this, I didn't write any code / this is your contribution. thanks, though 😉

@manu-chroma

manu-chroma Jul 30, 2016

Contributor

Your code review resulted in major changes to the initial PR. I think you had a decent part to play in this contribution. 😄

@nelson-liu

nelson-liu Jul 30, 2016

Contributor

nah, that's the job of a reviewer. thanks for the consideration though.

olologin added a commit to olologin/scikit-learn that referenced this pull request Aug 24, 2016

[MRG + 1] load_iris dataset: added return_X_y option (scikit-learn#7049)
* load_iris dataset:added return_X_y option

* Updated return type description

* improved return type description

* Removed extra line

* Added extra line

* Added test sentence.

* Remove sample text

* Fixes

* pep8

* Added version tag

* added entry in whats_new

* fixed extra space

TomDLT added a commit to TomDLT/scikit-learn that referenced this pull request Oct 3, 2016

@manu-chroma manu-chroma deleted the manu-chroma:load_iris branch Jun 20, 2017
