[MRG+1] Take over PR #7647 - Add a "filename" attribute to datasets that have a CSV file #9101

merged 23 commits into from Dec 4, 2017
add examples of numpy.loadtxt usage

alex-33 authored and maskani-moh committed Oct 18, 2016
commit 6532a9499e710aa8ca93be0113c8a08fe0dd5a11
@@ -144,8 +144,8 @@ learn::
.. topic:: Loading from the data files

jnothman Sep 26, 2017


I don't get why this belongs in the tutorial, unless it's framed as "You can also load your own data. For example, load_boston(...) just pulls in data using numpy.loadtxt::". This currently appears to be too much detail on the internals of scikit-learn.

maskani-moh Sep 26, 2017


@jnothman, I agree with you, no need to mention the filename attribute in the tutorial. Too much detail for a tutorial.
Should I remove this section then?

 All standard datasets which you can import with ``load_`` have underlying source files that
-you can read manually (consider numpy.loadtxt and pandas for analysis).
-The data and target can be stored in one file (e.g. iris, boston, breast_cancer) or
+you can read manually (consider :func:`numpy.loadtxt` and `pandas <>`_
+for analysis). The data and target can be stored in one file (e.g. iris, boston, breast_cancer) or
 in several (e.g. diabetes, linnerud).
>>> from sklearn.datasets import load_boston
@@ -160,6 +160,19 @@ learn::
>>> print(diabetes.target_filename) # doctest: +SKIP
+Example of reading data file with numpy. Boston dataset contains
+2 header lines, that is why we are going to skip them:
+>>> import numpy as np
+>>> boston_data = np.loadtxt(boston.filename, delimiter=",", skiprows=2)
+>>> boston.data.shape # sklearn dataset
+(506, 13)
+>>> boston_data.shape # also contains target columns
+(506, 14)
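
The added lines rely on ``skiprows`` to step over header rows before parsing. The same pattern can be tried without any bundled dataset file. This is a minimal sketch, assuming a hypothetical in-memory CSV with two header lines and the target in the last column; the column names and values are made up for illustration and do not come from the PR.

```python
import io

import numpy as np

# Hypothetical stand-in for a dataset CSV (not a real scikit-learn file):
# two header lines, then numeric rows whose last column is the target.
csv_text = (
    "3,2\n"
    "feat_a,feat_b,target\n"
    "1.0,2.0,0\n"
    "3.0,4.0,1\n"
    "5.0,6.0,0\n"
)

# np.loadtxt accepts a file-like object as well as a path;
# skiprows=2 drops the two header lines before parsing.
table = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=2)

# Split the combined table into features and target, since a raw file
# read this way still carries the target column.
features, target = table[:, :-1], table[:, -1]

print(table.shape, features.shape, target.shape)
```

The raw table has one more column than the feature matrix, which mirrors the one-column difference between the two shapes in the doctest above.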
See also:
Learning and predicting