[MRG+1] Take over PR #7647 - Add a "filename" attribute to datasets that have a CSV file #9101

Merged
merged 23 commits into from Dec 4, 2017
+0 −32

Remove detailed loading section from tutorial

maskani-moh committed Oct 9, 2017
commit fd1bcbb81b328d6ac9849e0a8190b55ed11518d2
@@ -141,38 +141,6 @@ learn::
To load from an external dataset, please refer to :ref:`loading external datasets <external_datasets>`.
.. topic:: Loading from the data files
All standard datasets which you can import with ``load_`` have underlying source files that
you can read manually (consider :func:`numpy.loadtxt` and `pandas <http://pandas.pydata.org/>`_
for analysis). The data and target can be stored in one file (e.g. iris, boston, breast_cancer) or
in several (e.g. diabetes, linnerud).
>>> from sklearn.datasets import load_boston
>>> boston = load_boston()
>>> print(boston.filename) # doctest: +SKIP
(some-path)/sklearn/datasets/data/boston_house_prices.csv
>>> from sklearn.datasets import load_diabetes
>>> diabetes = load_diabetes()
>>> print(diabetes.data_filename) # doctest: +SKIP
(some-path)/sklearn/datasets/data/diabetes_data.csv.gz
>>> print(diabetes.target_filename) # doctest: +SKIP
(some-path)/sklearn/datasets/data/diabetes_target.csv.gz
You can also read the data file directly with numpy. The Boston dataset file
contains 2 header lines, so the following example skips them::
>>> import numpy as np
>>> boston_data = np.loadtxt(boston.filename, delimiter=",", skiprows=2)
>>> boston.data.shape # sklearn dataset
(506, 13)
>>> boston_data.shape # also contains target columns
(506, 14)
.. seealso::
`pandas.read_csv <http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html>`_
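The header-skipping idea in the passage above can be tried without any bundled dataset. The following is a minimal sketch, assuming a small synthetic CSV (the file name ``toy.csv``, the column names, and the values are all invented for illustration and are not the Boston data). It writes two header lines followed by numeric rows, reads the file back with :func:`numpy.loadtxt` using ``skiprows=2``, and then splits off the last column as the target, mirroring the "also contains target columns" remark.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for a dataset file: two header lines, then numeric rows.
# Column names and values are hypothetical, chosen only for illustration.
csv_text = (
    "3,2\n"
    "feat_a,feat_b,target\n"
    "1.0,2.0,0.5\n"
    "3.0,4.0,1.5\n"
    "5.0,6.0,2.5\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "toy.csv")
    with open(csv_path, "w") as f:
        f.write(csv_text)
    # skiprows=2 drops both header lines before the numeric rows are parsed
    raw = np.loadtxt(csv_path, delimiter=",", skiprows=2)

# The last column holds the target; the remaining columns are the features
data, target = raw[:, :-1], raw[:, -1]
print(raw.shape, data.shape, target.shape)
```

Keeping the raw array and slicing it afterwards makes it easy to compare the file's column count with the feature matrix, which is one column narrower.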
Learning and predicting
------------------------