DOC Updates sklearn naming convention for consistency (#7268)
* Updates sklearn naming convention for consistency

* minor grammar fix

* clarifies source of string vs. function cosine
ClimbsRocks authored and jnothman committed Aug 28, 2016
1 parent 7f4d279 commit 58b35d8
Showing 14 changed files with 63 additions and 60 deletions.
2 changes: 1 addition & 1 deletion doc/datasets/rcv1_fixture.py
@@ -1,7 +1,7 @@
"""Fixture module to skip the datasets loading when offline
The RCV1 data is rather large and some CI workers such as travis are
-stateless hence will not cache the dataset as regular sklearn users would do.
+stateless hence will not cache the dataset as regular scikit-learn users would do.
The following will skip the execution of the rcv1.rst doctests
if the proper environment variable is configured (see the source code of
8 changes: 4 additions & 4 deletions doc/developers/contributing.rst
@@ -871,9 +871,9 @@ an integer called ``n_iter``.
Rolling your own estimator
==========================
If you want to implement a new estimator that is scikit-learn-compatible,
-whether it is just for you or for contributing it to sklearn, there are several
-internals of scikit-learn that you should be aware of in addition to the
-sklearn API outlined above. You can check whether your estimator
+whether it is just for you or for contributing it to scikit-learn, there are
+several internals of scikit-learn that you should be aware of in addition to
+the scikit-learn API outlined above. You can check whether your estimator
adheres to the scikit-learn interface and standards by running
:func:`utils.estimator_checks.check_estimator` on the class::

@@ -929,7 +929,7 @@ E.g., below is a custom classifier. For more information on this example, see

get_params and set_params
-------------------------
-All sklearn estimator have ``get_params`` and ``set_params`` functions.
+All scikit-learn estimators have ``get_params`` and ``set_params`` functions.
The ``get_params`` function takes no arguments and returns a dict of the
``__init__`` parameters of the estimator, together with their values.
It must take one keyword argument, ``deep``,
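Editorial aside (not part of this commit): a minimal sketch of the pattern this hunk documents, assuming a toy classifier named ``MeanClassifier``. ``get_params``/``set_params`` come for free from ``BaseEstimator`` as long as ``__init__`` only stores its keyword arguments unchanged.

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.estimator_checks import check_estimator

class MeanClassifier(BaseEstimator, ClassifierMixin):
    """Toy estimator: predicts a class by thresholding the row mean."""
    def __init__(self, threshold=0.5):
        self.threshold = threshold              # store parameters verbatim

    def fit(self, X, y):
        self.classes_ = np.unique(y)            # learned attributes end in "_"
        return self

    def predict(self, X):
        X = np.asarray(X)
        return np.where(X.mean(axis=1) > self.threshold,
                        self.classes_[-1], self.classes_[0])

print(MeanClassifier(threshold=0.2).get_params())   # {'threshold': 0.2}
# check_estimator(MeanClassifier)   # runs the common scikit-learn API checks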
20 changes: 10 additions & 10 deletions doc/modules/gaussian_process.rst
@@ -66,7 +66,7 @@ WhiteKernel component into the kernel, which can estimate the global noise
level from the data (see example below).

The implementation is based on Algorithm 2.1 of [RW2006]_. In addition to
-the API of standard sklearn estimators, GaussianProcessRegressor:
+the API of standard scikit-learn estimators, GaussianProcessRegressor:

* allows prediction without prior fitting (based on the GP prior)

@@ -164,7 +164,7 @@ than just predicting the mean.
GPR on Mauna Loa CO2 data
-------------------------

This example is based on Section 5.4.3 of [RW2006]_.
It illustrates an example of complex kernel engineering and
hyperparameter optimization using gradient ascent on the
log-marginal-likelihood. The data consists of the monthly average atmospheric
@@ -602,11 +602,11 @@ References
----------

* `[RW2006]
<http://www.gaussianprocess.org/gpml/chapters/>`_
**Gaussian Processes for Machine Learning**,
Carl Eduard Rasmussen and Christopher K.I. Williams, MIT Press 2006.
Link to an official complete PDF version of the book
`here <http://www.gaussianprocess.org/gpml/chapters/RW.pdf>`_ .

.. currentmodule:: sklearn.gaussian_process

@@ -616,9 +616,9 @@ References
Legacy Gaussian Processes
=========================

-In this section, the implementation of Gaussian processes used in sklearn until
-release 0.16.1 is described. Note that this implementation is deprecated and
-will be removed in version 0.18.
+In this section, the implementation of Gaussian processes used in scikit-learn
+until release 0.16.1 is described. Note that this implementation is deprecated
+and will be removed in version 0.18.

An introductory regression example
----------------------------------
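Editorial sketch (not from the commit) of the GaussianProcessRegressor behaviour described in this file: prediction from the GP prior before fitting, and posterior sampling with ``sample_y``. Kernel choice and data are illustrative only.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X = np.linspace(0, 10, 20).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * np.random.RandomState(0).randn(20)

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel)

# Prediction is allowed before fit: it is drawn from the GP prior.
prior_mean, prior_std = gpr.predict(X, return_std=True)

gpr.fit(X, y)
post_mean, post_std = gpr.predict(X, return_std=True)
samples = gpr.sample_y(X, n_samples=3)      # draws from the posterior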
50 changes: 25 additions & 25 deletions doc/modules/manifold.rst
@@ -59,10 +59,10 @@ interesting structure within the data will be lost.

To address this concern, a number of supervised and unsupervised linear
dimensionality reduction frameworks have been designed, such as Principal
Component Analysis (PCA), Independent Component Analysis, Linear
Discriminant Analysis, and others. These algorithms define specific
rubrics to choose an "interesting" linear projection of the data.
These methods can be powerful, but often miss important non-linear
structure in the data.


@@ -91,7 +91,7 @@ from the data itself, without the use of predetermined classifications.
* See :ref:`sphx_glr_auto_examples_manifold_plot_compare_methods.py` for an example of
dimensionality reduction on a toy "S-curve" dataset.

-The manifold learning implementations available in sklearn are
+The manifold learning implementations available in scikit-learn are
summarized below

.. _isomap:
@@ -121,13 +121,13 @@ The Isomap algorithm comprises three stages:
nearest neighbors of :math:`N` points in :math:`D` dimensions.

2. **Shortest-path graph search.** The most efficient known algorithms
for this are *Dijkstra's Algorithm*, which is approximately
:math:`O[N^2(k + \log(N))]`, or the *Floyd-Warshall algorithm*, which
is :math:`O[N^3]`. The algorithm can be selected by the user with
the ``path_method`` keyword of ``Isomap``. If unspecified, the code
attempts to choose the best algorithm for the input data.

3. **Partial eigenvalue decomposition.** The embedding is encoded in the
eigenvectors corresponding to the :math:`d` largest eigenvalues of the
:math:`N \times N` isomap kernel. For a dense solver, the cost is
approximately :math:`O[d N^2]`. This cost can often be improved using
@@ -191,7 +191,7 @@ The overall complexity of standard LLE is
* :math:`d` : output dimension

.. topic:: References:

* `"Nonlinear dimensionality reduction by locally linear embedding"
<http://www.sciencemag.org/content/290/5500/2323.full>`_
Roweis, S. & Saul, L. Science 290:2323 (2000)
@@ -221,7 +221,7 @@ It requires ``n_neighbors > n_components``.
:target: ../auto_examples/manifold/plot_lle_digits.html
:align: center
:scale: 50

Complexity
----------

@@ -232,7 +232,7 @@ The MLLE algorithm comprises three stages:
2. **Weight Matrix Construction**. Approximately
:math:`O[D N k^3] + O[N (k-D) k^2]`. The first term is exactly equivalent
to that of standard LLE. The second term has to do with constructing the
weight matrix from multiple weights. In practice, the added cost of
constructing the MLLE weight matrix is relatively small compared to the
cost of steps 1 and 3.

@@ -247,7 +247,7 @@ The overall complexity of MLLE is
* :math:`d` : output dimension

.. topic:: References:

* `"MLLE: Modified Locally Linear Embedding Using Multiple Weights"
<http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.70.382>`_
Zhang, Z. & Wang, J.
@@ -271,7 +271,7 @@ It requires ``n_neighbors > n_components * (n_components + 3) / 2``.
:target: ../auto_examples/manifold/plot_lle_digits.html
:align: center
:scale: 50

Complexity
----------

@@ -308,10 +308,10 @@ Spectral Embedding
Spectral Embedding (also known as Laplacian Eigenmaps) is one method
to calculate non-linear embedding. It finds a low dimensional representation
of the data using a spectral decomposition of the graph Laplacian.
The graph generated can be considered as a discrete approximation of the
low dimensional manifold in the high dimensional space. Minimization of a
cost function based on the graph ensures that points close to each other on
the manifold are mapped close to each other in the low dimensional space,
preserving local distances. Spectral embedding can be performed with the
function :func:`spectral_embedding` or its object-oriented counterpart
:class:`SpectralEmbedding`.
@@ -326,9 +326,9 @@ The Spectral Embedding algorithm comprises three stages:

2. **Graph Laplacian Construction**. unnormalized Graph Laplacian
is constructed as :math:`L = D - A` for and normalized one as
:math:`L = D^{-\frac{1}{2}} (D - A) D^{-\frac{1}{2}}`.

3. **Partial Eigenvalue Decomposition**. Eigenvalue decomposition is
done on graph Laplacian

The overall complexity of spectral embedding is
@@ -342,7 +342,7 @@ The overall complexity of spectral embedding is
.. topic:: References:

* `"Laplacian Eigenmaps for Dimensionality Reduction
and Data Representation"
<http://web.cse.ohio-state.edu/~mbelkin/papers/LEM_NC_03.pdf>`_
M. Belkin, P. Niyogi, Neural Computation, June 2003; 15 (6):1373-1396

@@ -354,7 +354,7 @@ Though not technically a variant of LLE, Local tangent space alignment (LTSA)
is algorithmically similar enough to LLE that it can be put in this category.
Rather than focusing on preserving neighborhood distances as in LLE, LTSA
seeks to characterize the local geometry at each neighborhood via its
tangent space, and performs a global optimization to align these local
tangent spaces to learn the embedding. LTSA can be performed with function
:func:`locally_linear_embedding` or its object-oriented counterpart
:class:`LocallyLinearEmbedding`, with the keyword ``method = 'ltsa'``.
@@ -421,7 +421,7 @@ space and the similarities/dissimilarities.
:target: ../auto_examples/manifold/plot_lle_digits.html
:align: center
:scale: 50


Let :math:`S` be the similarity matrix, and :math:`X` the coordinates of the
:math:`n` input points. Disparities :math:`\hat{d}_{ij}` are transformation of
@@ -456,7 +456,7 @@ order to avoid that, the disparities :math:`\hat{d}_{ij}` are normalized.
:target: ../auto_examples/manifold/plot_mds.html
:align: center
:scale: 60


.. topic:: References:

@@ -499,7 +499,7 @@ probabilities in the original space and the embedded space will be minimized
by gradient descent. Note that the KL divergence is not convex, i.e.
multiple restarts with different initializations will end up in local minima
of the KL divergence. Hence, it is sometimes useful to try different seeds
and select the embedding with the lowest KL divergence.

The disadvantages to using t-SNE are roughly:

@@ -552,16 +552,16 @@ divergence will increase during optimization. More tips can be found in
Laurens van der Maaten's FAQ (see references). The last parameter, angle,
is a tradeoff between performance and accuracy. Larger angles imply that we
can approximate larger regions by a single point,leading to better speed
but less accurate results.

Barnes-Hut t-SNE
----------------

The Barnes-Hut t-SNE that has been implemented here is usually much slower than
other manifold learning algorithms. The optimization is quite difficult
and the computation of the gradient is :math:`O[d N log(N)]`, where :math:`d`
is the number of output dimensions and :math:`N` is the number of samples. The
Barnes-Hut method improves on the exact method where t-SNE complexity is
:math:`O[d N^2]`, but has several other notable differences:

* The Barnes-Hut implementation only works when the target dimensionality is 3
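Editorial sketch (not part of the commit): the manifold learners discussed in this file, run side by side on a toy S-curve; parameter values are illustrative only.

from sklearn import manifold, datasets

X, _ = datasets.make_s_curve(n_samples=500, random_state=0)

embeddings = {
    "Isomap": manifold.Isomap(n_neighbors=10, n_components=2),
    "LLE": manifold.LocallyLinearEmbedding(n_neighbors=10, n_components=2),
    "MLLE": manifold.LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                                            method='modified'),
    "LTSA": manifold.LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                                            method='ltsa'),
    "Spectral": manifold.SpectralEmbedding(n_components=2, n_neighbors=10),
    "t-SNE": manifold.TSNE(n_components=2, random_state=0),
}
for name, estimator in embeddings.items():
    Y = estimator.fit_transform(X)
    print(name, Y.shape)        # each method returns an (n_samples, 2) embedding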
8 changes: 4 additions & 4 deletions doc/tutorial/statistical_inference/model_selection.rst
@@ -207,7 +207,7 @@ Grid-search

.. currentmodule:: sklearn.model_selection

-The sklearn provides an object that, given data, computes the score
+scikit-learn provides an object that, given data, computes the score
during the fit of an estimator on a parameter grid and chooses the
parameters to maximize the cross-validation score. This object takes an
estimator during the construction and exposes an estimator API::
@@ -257,9 +257,9 @@ Cross-validated estimators
----------------------------

Cross-validation to set a parameter can be done more efficiently on an
-algorithm-by-algorithm basis. This is why for certain estimators the
-sklearn exposes :ref:`cross_validation` estimators that set their parameter
-automatically by cross-validation::
+algorithm-by-algorithm basis. This is why, for certain estimators,
+scikit-learn exposes :ref:`cross_validation` estimators that set their
+parameter automatically by cross-validation::

>>> from sklearn import linear_model, datasets
>>> lasso = linear_model.LassoCV()
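Editorial sketch (not from the commit) of the two approaches this file contrasts: an explicit grid search versus an estimator with built-in cross-validation (``LassoCV``). Data and parameter grid are illustrative.

import numpy as np
from sklearn import datasets, linear_model, svm
from sklearn.model_selection import GridSearchCV

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

# Grid-search object: takes an estimator, exposes the estimator API itself.
Cs = np.logspace(-6, -1, 10)
clf = GridSearchCV(estimator=svm.SVR(kernel='linear'),
                   param_grid=dict(C=Cs))
clf.fit(X, y)
print(clf.best_params_, clf.best_score_)

# Cross-validated estimator: picks its regularization parameter on its own.
lasso = linear_model.LassoCV()
lasso.fit(X, y)
print(lasso.alpha_)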
@@ -1,7 +1,7 @@
"""Fixture module to skip the datasets loading when offline
The 20 newsgroups data is rather large and some CI workers such as travis are
-stateless hence will not cache the dataset as regular sklearn users would do.
+stateless hence will not cache the dataset as regular scikit-learn users would.
The following will skip the execution of the working_with_text_data.rst doctests
if the proper environment variable is configured (see the source code of
2 changes: 1 addition & 1 deletion examples/hetero_feature_union.py
@@ -51,7 +51,7 @@ class ItemSelector(BaseEstimator, TransformerMixin):
>> len(data[key]) == n_samples
-Please note that this is the opposite convention to sklearn feature
+Please note that this is the opposite convention to scikit-learn feature
matrixes (where the first index corresponds to sample).
ItemSelector only requires that the collection implement getitem
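Editorial sketch (not part of the commit) of the data convention contrasted above: a dict-like container indexed key-first, versus a scikit-learn feature matrix indexed sample-first.

import numpy as np

n_samples = 3
data = {                                            # data[key][i]: key first
    'subject': np.array(['spam', 'hello', 'meeting']),
    'body_length': np.array([120, 45, 300]),
}
X = np.column_stack([data['body_length']])          # X[i, j]: sample first

assert len(data['subject']) == n_samples            # the ItemSelector convention
assert X.shape[0] == n_samples                      # the feature-matrix convention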
4 changes: 2 additions & 2 deletions sklearn/covariance/tests/test_graph_lasso.py
@@ -61,8 +61,8 @@ def test_graph_lasso(random_state=0):

def test_graph_lasso_iris():
# Hard-coded solution from R glasso package for alpha=1.0
-# The iris datasets in R and sklearn do not match in a few places, these
-# values are for the sklearn version
+# The iris datasets in R and scikit-learn do not match in a few places,
+# these values are for the scikit-learn version.
cov_R = np.array([
[0.68112222, 0.0, 0.2651911, 0.02467558],
[0.00, 0.1867507, 0.0, 0.00],
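Editorial sketch of what the test above exercises, assuming the estimator name of that era (``GraphLasso``; later releases renamed it ``GraphicalLasso``):

import numpy as np
from sklearn.covariance import GraphLasso
from sklearn.datasets import load_iris

X = load_iris().data
model = GraphLasso(alpha=1.0).fit(X)        # same alpha as the R comparison
print(np.round(model.covariance_, 4))       # compare against the R glasso values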
4 changes: 2 additions & 2 deletions sklearn/datasets/mldata.py
@@ -103,7 +103,7 @@ def fetch_mldata(dataname, target_name='label', data_name='data',
(150, 4)
Load the 'leukemia' dataset from mldata.org, which needs to be transposed
-to respects the sklearn axes convention:
+to respects the scikit-learn axes convention:
>>> leuk = fetch_mldata('leukemia', transpose_data=True,
... data_home=test_data_home)
@@ -205,7 +205,7 @@ def fetch_mldata(dataname, target_name='label', data_name='data',
del dataset[col_names[1]]
dataset['data'] = matlab_dict[col_names[1]]

-# set axes to sklearn conventions
+# set axes to scikit-learn conventions
if transpose_data:
dataset['data'] = dataset['data'].T
if 'target' in dataset:
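Editorial sketch of the axes convention noted above, using the ``fetch_mldata`` API as it existed at the time of this commit (mldata.org and this loader have since been retired, so this is historical rather than runnable today):

from sklearn.datasets import fetch_mldata

iris = fetch_mldata('iris')
print(iris.data.shape)                      # (150, 4): samples on the first axis

# mldata.org stored 'leukemia' transposed, so transpose_data=True restores
# the (n_samples, n_features) orientation scikit-learn expects.
leuk = fetch_mldata('leukemia', transpose_data=True)
print(leuk.data.shape)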
8 changes: 4 additions & 4 deletions sklearn/feature_extraction/image.py
@@ -152,8 +152,8 @@ def img_to_graph(img, mask=None, return_as=sparse.coo_matrix, dtype=None):
Notes
-----
-For sklearn versions 0.14.1 and prior, return_as=np.ndarray was handled
-by returning a dense np.matrix instance. Going forward, np.ndarray
+For scikit-learn versions 0.14.1 and prior, return_as=np.ndarray was
+handled by returning a dense np.matrix instance. Going forward, np.ndarray
returns an np.ndarray, as expected.
For compatibility, user code relying on this method should wrap its
@@ -188,8 +188,8 @@ def grid_to_graph(n_x, n_y, n_z=1, mask=None, return_as=sparse.coo_matrix,
Notes
-----
-For sklearn versions 0.14.1 and prior, return_as=np.ndarray was handled
-by returning a dense np.matrix instance. Going forward, np.ndarray
+For scikit-learn versions 0.14.1 and prior, return_as=np.ndarray was
+handled by returning a dense np.matrix instance. Going forward, np.ndarray
returns an np.ndarray, as expected.
For compatibility, user code relying on this method should wrap its
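Editorial sketch (not from the commit) of the ``return_as`` behaviour documented above: ``grid_to_graph`` with ``return_as=np.ndarray`` now yields a plain ndarray rather than an ``np.matrix``.

import numpy as np
from sklearn.feature_extraction.image import grid_to_graph

A_sparse = grid_to_graph(n_x=3, n_y=3)                    # sparse matrix by default
A_dense = grid_to_graph(n_x=3, n_y=3, return_as=np.ndarray)

print(type(A_sparse))   # a scipy.sparse matrix
print(type(A_dense))    # numpy.ndarray; wrap in np.asmatrix if a matrix is needed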
3 changes: 2 additions & 1 deletion sklearn/gaussian_process/gpr.py
@@ -23,7 +23,8 @@ class GaussianProcessRegressor(BaseEstimator, RegressorMixin):
The implementation is based on Algorithm 2.1 of Gaussian Processes
for Machine Learning (GPML) by Rasmussen and Williams.
-In addition to standard sklearn estimator API, GaussianProcessRegressor:
+In addition to standard scikit-learn estimator API,
+GaussianProcessRegressor:
* allows prediction without prior fitting (based on the GP prior)
* provides an additional method sample_y(X), which evaluates samples
8 changes: 5 additions & 3 deletions sklearn/metrics/tests/test_pairwise.py
@@ -61,7 +61,8 @@ def test_pairwise_distances():
Y_tuples = tuple([tuple([v for v in row]) for row in Y])
S2 = pairwise_distances(X_tuples, Y_tuples, metric="euclidean")
assert_array_almost_equal(S, S2)
# "cityblock" uses sklearn metric, cityblock (function) is scipy.spatial.
# "cityblock" uses scikit-learn metric, cityblock (function) is
# scipy.spatial.
S = pairwise_distances(X, metric="cityblock")
S2 = pairwise_distances(X, metric=cityblock)
assert_equal(S.shape[0], S.shape[1])
@@ -78,7 +79,8 @@ def test_pairwise_distances():
S3 = manhattan_distances(X, Y, size_threshold=10)
assert_array_almost_equal(S, S3)
# Test cosine as a string metric versus cosine callable
# "cosine" uses sklearn metric, cosine (function) is scipy.spatial
# The string "cosine" uses sklearn.metric,
# while the function cosine is scipy.spatial
S = pairwise_distances(X, Y, metric="cosine")
S2 = pairwise_distances(X, Y, metric=cosine)
assert_equal(S.shape[0], X.shape[0])
@@ -330,7 +332,7 @@ def test_pairwise_distances_argmin_min():
assert_equal(type(Dsp), np.ndarray)
assert_equal(type(Esp), np.ndarray)

-# Non-euclidean sklearn metric
+# Non-euclidean scikit-learn metric
D, E = pairwise_distances_argmin_min(X, Y, metric="manhattan")
D2 = pairwise_distances_argmin(X, Y, metric="manhattan")
assert_array_almost_equal(D, [0, 1])
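Editorial sketch (not part of the commit) of the distinction these comments draw: a metric given as a string is dispatched to scikit-learn's implementation, while passing the scipy.spatial.distance function as a callable goes through scipy.

import numpy as np
from scipy.spatial.distance import cityblock, cosine
from sklearn.metrics import pairwise_distances

rng = np.random.RandomState(0)
X, Y = rng.rand(5, 4), rng.rand(3, 4)

D_str = pairwise_distances(X, Y, metric="cosine")   # scikit-learn's cosine
D_fun = pairwise_distances(X, Y, metric=cosine)     # scipy.spatial callable
np.testing.assert_array_almost_equal(D_str, D_fun)

M_str = pairwise_distances(X, metric="cityblock")   # scikit-learn's cityblock
M_fun = pairwise_distances(X, metric=cityblock)     # scipy callable, row by row
np.testing.assert_array_almost_equal(M_str, M_fun)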
2 changes: 1 addition & 1 deletion sklearn/tests/test_base.py
@@ -73,7 +73,7 @@ def predict(self, X=None):


class VargEstimator(BaseEstimator):
"""Sklearn estimators shouldn't have vargs."""
"""scikit-learn estimators shouldn't have vargs."""
def __init__(self, *vargs):
pass

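Editorial sketch of why the test above forbids varargs: ``BaseEstimator`` introspects the ``__init__`` signature to build ``get_params``, and a ``*vargs`` signature makes the parameter names unrecoverable, so the call fails (a behaviour sketch, assuming the usual RuntimeError).

from sklearn.base import BaseEstimator

class VargEstimator(BaseEstimator):
    """scikit-learn estimators shouldn't have vargs."""
    def __init__(self, *vargs):
        pass

try:
    VargEstimator().get_params()
except RuntimeError as exc:
    print(exc)   # estimators must name their parameters explicitly in __init__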
