New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Take over PR #7647 - Add a "filename" attribute to datasets that have a CSV file #9101

Merged
merged 23 commits into from Dec 4, 2017

Conversation

Projects
None yet
8 participants
@maskani-moh
Contributor

maskani-moh commented Jun 10, 2017

Reference Issue: #7627

What does this implement/fix? Explain your changes.

(original PR description)
-As per issue description, this change provide filename for load_iris, load_boston, load_breast_cancer
-data_filename and target_filenames are added to load_diabetes, load_linnerud
-fix PEP8 issues in dataset/base.py and relevant test file

Any other comments?

  • Taking over old PR (one modification)
@vene

This comment has been minimized.

Member

vene commented Jun 10, 2017

please use a meaningful title for possible reviewers who might not know all prs by their number :)

@maskani-moh maskani-moh changed the title from [MRG] Take over PR #7647 to [MRG] Take over PR #7647 - Add a "filename" attribute to datasets that have a CSV file Jun 10, 2017

@maskani-moh maskani-moh changed the title from [MRG] Take over PR #7647 - Add a "filename" attribute to datasets that have a CSV file to [WIP] Take over PR #7647 - Add a "filename" attribute to datasets that have a CSV file Jun 10, 2017

maskani-moh added some commits Jun 10, 2017

@maskani-moh maskani-moh changed the title from [WIP] Take over PR #7647 - Add a "filename" attribute to datasets that have a CSV file to [MRG] Take over PR #7647 - Add a "filename" attribute to datasets that have a CSV file Jun 10, 2017

@glemaitre

This comment has been minimized.

Contributor

glemaitre commented Sep 1, 2017

@maskani-moh could you merge on master and re run the CI

meaning of the features, and 'DESCR', the
full description of the dataset.
meaning of the features, 'DESCR', the
full description of the dataset, 'filename' (added in version 0.19),

This comment has been minimized.

@amueller

amueller Sep 21, 2017

Member

I would put the 0.20 (because 0.19 has been released) below in a separate ..versionchanged: section. That's easier to keep track of if someone is looking for changes, and usually it's irrelevant.

@amueller

This comment has been minimized.

Member

amueller commented Sep 21, 2017

Looks good. I think this is an improvement, though it kind of feels odd that you're loading the data twice: once in load_X to create the datasets, and then again using the filename. I feel it would be more natural to not load the data just to get the file name.
Though adding it to the bunch was @GaelVaroquaux's suggestion. The other option would be module-level constants, but that's a bit ugly.

So lgtm apart from the minor docstring nitpick.

@jnothman

This is otherwise looking good, thanks.

.. topic:: Loading from external datasets
To load from an external dataset, please refer to :ref:`loading external datasets <external_datasets>`.
.. topic:: Loading from the data files

This comment has been minimized.

@jnothman

jnothman Sep 26, 2017

Member

I don't get why this belongs in the tutorial, unless it's framed as "You can also load your own data. For example, load_boston(...) just pulls in data using numpy.loadtxt::". This currently appears to be too much detail on the internals of scikit-learn.

This comment has been minimized.

@maskani-moh

maskani-moh Sep 26, 2017

Contributor

@jnothman, I agree with you, no need to mention the filename attribute in the tutorial. Too much detail for a tutorial.
Should I remove this section then?

@jnothman

This comment has been minimized.

Member

jnothman commented Sep 26, 2017

@maskani-moh

This comment has been minimized.

Contributor

maskani-moh commented Nov 15, 2017

@jnothman @amueller
I think nothing is left to do in this PR.
Am I wrong?

meaning of the features, 'DESCR', the full description of
the dataset, 'filename', the physical location of iris csv dataset.
.. versionadded:: 0.20

This comment has been minimized.

@amueller

amueller Nov 15, 2017

Member

That seems a bit unclear. data was not added. Maybe specify that 'filename' was added. Same for other datasets.

This comment has been minimized.

@maskani-moh

maskani-moh Nov 15, 2017

Contributor

@amueller
Will something like "the full description of the dataset, 'filename', the physical location of iris csv dataset (added in version 0.20)" do the job ?

I don't get how to use ..versionadded:: 0.20 to specify the version for the attribute only :/

@jnothman

This comment has been minimized.

Member

jnothman commented Nov 15, 2017

@jnothman

This comment has been minimized.

Member

jnothman commented Nov 29, 2017

Please make the version added comment local to the changed features. Otherwise this LGTM.

@maskani-moh

This comment has been minimized.

Contributor

maskani-moh commented Dec 4, 2017

@jnothman
Done ! 👍
I apologize for the delay

@jnothman jnothman changed the title from [MRG] Take over PR #7647 - Add a "filename" attribute to datasets that have a CSV file to [MRG+1] Take over PR #7647 - Add a "filename" attribute to datasets that have a CSV file Dec 4, 2017

@agramfort agramfort merged commit 7182a43 into scikit-learn:master Dec 4, 2017

6 checks passed

ci/circleci Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 96.16%)
Details
codecov/project Absolute coverage decreased by -0.02% but relative coverage increased by +3.83% compared to 94fa697
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
lgtm analysis: Python No alert changes
Details
@agramfort

This comment has been minimized.

Member

agramfort commented Dec 4, 2017

thx @maskani-moh

@jnothman

This comment has been minimized.

Member

jnothman commented Dec 4, 2017

Congrats, @maskani-moh! But argh! With all those versionaddeds, we forgot to mention this in the change logs. Would you like to submit another small PR? Please add an entry to the change log at doc/whats_new/v0.20.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

@lesteve

This comment has been minimized.

Member

lesteve commented Dec 5, 2017

Friendly advice for next time @maskani-moh: please use "Fix #issueNumber" in your PR description, this way the associated issue gets closed automatically when the PR is merged. For more details, look at this.

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

[MRG+1] Take over PR scikit-learn#7647 - Add a "filename" attribute t…
…o datasets that have a CSV file (scikit-learn#9101)

* add filename attribute for load_iris

* add filename attribute for load_boston

* add filename attribute for load_linnerud
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment