
Commit

Merge branch 'master' into multiple_grid_search
Conflicts:
	sklearn/grid_search.py
	sklearn/learning_curve.py
mblondel committed Feb 7, 2014
2 parents eaa3aeb + 5319994 commit 5d8570b
Showing 115 changed files with 14,454 additions and 11,264 deletions.
18 changes: 14 additions & 4 deletions .travis.yml
@@ -3,12 +3,22 @@ env:
- COVERAGE=--with-coverage
python:
- "2.7"
- "2.6"
- "3.3"
virtualenv:
system_site_packages: true
before_install:
- sudo apt-get update -qq
- sudo apt-get install -qq python-scipy python-nose
- sudo apt-get install python-pip
- if [[ $TRAVIS_PYTHON_VERSION != '2.7' ]]; then wget http://repo.continuum.io/miniconda/Miniconda-2.2.2-Linux-x86_64.sh -O miniconda.sh ; fi
- if [[ $TRAVIS_PYTHON_VERSION != '2.7' ]]; then chmod +x miniconda.sh ; fi
- if [[ $TRAVIS_PYTHON_VERSION != '2.7' ]]; then ./miniconda.sh -b ; fi
- if [[ $TRAVIS_PYTHON_VERSION != '2.7' ]]; then export PATH=/home/travis/anaconda/bin:$PATH ; fi
- if [[ $TRAVIS_PYTHON_VERSION != '2.7' ]]; then conda update --yes conda ; fi
- if [[ $TRAVIS_PYTHON_VERSION != '2.7' ]]; then conda create -n testenv --yes pip python=$TRAVIS_PYTHON_VERSION ; fi
- if [[ $TRAVIS_PYTHON_VERSION != '2.7' ]]; then source activate testenv ; fi
- if [[ $TRAVIS_PYTHON_VERSION != '2.7' ]]; then conda install --yes numpy scipy nose ; fi
- if [[ $TRAVIS_PYTHON_VERSION == '2.7' ]]; then sudo apt-get update -qq ; fi
- if [[ $TRAVIS_PYTHON_VERSION == '2.7' ]]; then sudo apt-get install -qq python-scipy python-nose python-pip ; fi
install:
- python setup.py build_ext --inplace
- if [ "${COVERAGE}" == "--with-coverage" ]; then sudo pip install coverage; fi
4 changes: 4 additions & 0 deletions doc/Makefile
@@ -37,6 +37,10 @@ clean:
-rm -rf modules/generated/*

html:
# These two lines make the build a bit more lengthy, and the
# embedding of images more robust
rm -rf $(BUILDDIR)/html/_images
#rm -rf _build/doctrees/
$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html/stable
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/html/stable"
1 change: 1 addition & 0 deletions doc/model_selection.rst
@@ -11,3 +11,4 @@ Model selection and evaluation
modules/grid_search
modules/pipeline
modules/model_evaluation
modules/learning_curve
4 changes: 4 additions & 0 deletions doc/modules/classes.rst
@@ -624,6 +624,7 @@ From text
:template: function.rst

learning_curve.learning_curve
learning_curve.validation_curve

.. _linear_model_ref:

@@ -657,6 +658,8 @@ From text
linear_model.LogisticRegression
linear_model.MultiTaskLasso
linear_model.MultiTaskElasticNet
linear_model.MultiTaskLassoCV
linear_model.MultiTaskElasticNetCV
linear_model.OrthogonalMatchingPursuit
linear_model.OrthogonalMatchingPursuitCV
linear_model.PassiveAggressiveClassifier
@@ -1057,6 +1060,7 @@ Pairwise metrics
preprocessing.Normalizer
preprocessing.OneHotEncoder
preprocessing.StandardScaler
preprocessing.PolynomialFeatures

.. autosummary::
:toctree: generated/
49 changes: 44 additions & 5 deletions doc/modules/clustering.rst
@@ -309,12 +309,44 @@ of each iterates until convergence.

Mean Shift
==========
:class:`MeanShift` clustering aims to discover *blobs* in a smooth density of
samples. It is a centroid-based algorithm, which works by updating candidates
for centroids to be the mean of the points within a given region. These
candidates are then filtered in a post-processing stage to eliminate
near-duplicates, forming the final set of centroids.

Given a candidate centroid :math:`x_i` for iteration :math:`t`, the candidate
is updated according to the following equation:

.. math::

    x_i^{t+1} = x_i^t + m(x_i^t)

where :math:`N(x_i)` is the neighborhood of samples within a given distance
around :math:`x_i` and :math:`m` is the *mean shift* vector, computed for each
centroid, that points towards a region of the maximum increase in the density
of points. It is computed using the following equation, effectively updating
a centroid to be the mean of the samples within its neighborhood:

.. math::

    m(x_i) = \frac{\sum_{x_j \in N(x_i)}K(x_j - x_i)x_j}{\sum_{x_j \in N(x_i)}K(x_j - x_i)}

Instead of taking the number of clusters as a parameter, the algorithm sets it
automatically; it relies instead on a parameter ``bandwidth``, which dictates
the size of the region to search through. This parameter can be set manually,
but it can also be estimated using the provided ``estimate_bandwidth`` function,
which is called if the bandwidth is not set.
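
With a flat kernel (:math:`K` equal to 1 inside the neighborhood and 0
outside), the update simply averages the neighbors. A small NumPy sketch of
one such step, on made-up one-dimensional data::

    import numpy as np

    X = np.array([[1.0], [1.2], [1.4], [5.0]])  # made-up samples
    x = np.array([1.1])                         # current candidate centroid
    bandwidth = 0.5

    # Flat kernel: the neighbors are the samples within `bandwidth` of x.
    neighbors = X[np.linalg.norm(X - x, axis=1) < bandwidth]
    x_new = neighbors.mean(axis=0)              # x + m(x): the neighborhood mean
    print(x_new)                                # [1.2]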

The algorithm is not highly scalable, as it requires multiple nearest-neighbor
searches during execution. It is guaranteed to converge, but it stops
iterating once the change in centroids is small.

Labelling a new sample is performed by finding the nearest centroid for that
sample.

:class:`MeanShift` clusters data by estimating *blobs* in a smooth
density of points matrix. This algorithm automatically sets its numbers
of cluster. It will have difficulties scaling to thousands of samples.
The utility function :func:`estimate_bandwidth` can be used to guess
the optimal bandwidth for :class:`MeanShift` from the data.
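
A minimal usage sketch (the blob data and parameter values here are made up
for illustration)::

    import numpy as np
    from sklearn.cluster import MeanShift, estimate_bandwidth
    from sklearn.datasets import make_blobs

    # Hypothetical toy data: three Gaussian blobs.
    X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

    # Estimate the bandwidth from the data, then cluster.
    bandwidth = estimate_bandwidth(X, quantile=0.2)
    ms = MeanShift(bandwidth=bandwidth)
    ms.fit(X)

    print(ms.cluster_centers_)    # one row per discovered centroid
    print(np.unique(ms.labels_))  # cluster ids; their number is set automatically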

.. figure:: ../auto_examples/cluster/images/plot_mean_shift_1.png
:target: ../auto_examples/cluster/plot_mean_shift.html
@@ -327,6 +359,13 @@ the optimal bandwidth for :class:`MeanShift` from the data.
* :ref:`example_cluster_plot_mean_shift.py`: Mean Shift clustering
on a synthetic 2D dataset with 3 classes.

.. topic:: References:

* `"Mean shift: A robust approach toward feature space analysis."
<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.76.8968&rep=rep1&type=pdf>`_
D. Comaniciu, & P. Meer *IEEE Transactions on Pattern Analysis and Machine Intelligence* (2002)


.. _spectral_clustering:

Spectral clustering
6 changes: 3 additions & 3 deletions doc/modules/ensemble.rst
@@ -313,7 +313,7 @@ AdaBoost
========

The module :mod:`sklearn.ensemble` includes the popular boosting algorithm
AdaBoost, introduced in 1995 by Freud and Schapire [FS1995]_.
AdaBoost, introduced in 1995 by Freund and Schapire [FS1995]_.

The core principle of AdaBoost is to fit a sequence of weak learners (i.e.,
models that are only slightly better than random guessing, such as small
@@ -388,8 +388,8 @@ decision trees).
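
A minimal sketch of the API on synthetic data (the dataset and parameter
values here are illustrative only)::

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier

    # Made-up toy problem.
    X, y = make_classification(n_samples=200, random_state=0)

    # Each boosting round fits a weak learner (a shallow decision tree by
    # default) on a reweighted version of the data.
    clf = AdaBoostClassifier(n_estimators=50)
    clf.fit(X, y)
    print(clf.score(X, y))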

.. topic:: References

.. [FS1995] Y. Freud, and R. Schapire, "A decision theoretic generalization of
online learning and an application to boosting", 1997.
.. [FS1995] Y. Freund, and R. Schapire, "A Decision-Theoretic Generalization of
On-Line Learning and an Application to Boosting", 1997.
.. [ZZRH2009] J. Zhu, H. Zou, S. Rosset, T. Hastie. "Multi-class AdaBoost",
2009.
18 changes: 9 additions & 9 deletions doc/modules/feature_extraction.rst
@@ -87,7 +87,7 @@ suitable for feeding into a classifier (maybe after being piped into a
>>> pos_vectorized = vec.fit_transform(pos_window)
>>> pos_vectorized # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
<1x6 sparse matrix of type '<... 'numpy.float64'>'
with 6 stored elements in Compressed Sparse Row format>
with 6 stored elements in Compressed Sparse ... format>
>>> pos_vectorized.toarray()
array([[ 1., 1., 1., 1., 1., 1.]])
>>> vec.get_feature_names()
@@ -176,7 +176,7 @@ can be constructed using::

and fed to a hasher with::

hasher = FeatureHasher(input_type=string)
hasher = FeatureHasher(input_type='string')
X = hasher.transform(raw_X)

to get a ``scipy.sparse`` matrix ``X``.
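
Put together, a self-contained sketch (the ``tokens`` helper here is a
made-up stand-in for a real feature-extraction step)::

    from sklearn.feature_extraction import FeatureHasher

    def tokens(doc):
        # Hypothetical helper: emit one string per token occurrence;
        # FeatureHasher counts repeated strings itself.
        return doc.lower().split()

    corpus = ['the quick brown fox', 'jumped over the lazy dog']
    hasher = FeatureHasher(n_features=2 ** 10, input_type='string')
    X = hasher.transform(tokens(d) for d in corpus)
    print(X.shape)  # (2, 1024), as a scipy.sparse matrix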
@@ -310,7 +310,7 @@ corpus of text documents::
>>> X = vectorizer.fit_transform(corpus)
>>> X # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
<4x9 sparse matrix of type '<... 'numpy.int64'>'
with 19 stored elements in Compressed Sparse Column format>
with 19 stored elements in Compressed Sparse ... format>

The default configuration tokenizes the string by extracting words of
at least 2 letters. The specific function that does this step can be
@@ -430,7 +430,7 @@ content of the documents::
>>> tfidf = transformer.fit_transform(counts)
>>> tfidf # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
<6x3 sparse matrix of type '<... 'numpy.float64'>'
with 9 stored elements in Compressed Sparse Row format>
with 9 stored elements in Compressed Sparse ... format>

>>> tfidf.toarray() # doctest: +ELLIPSIS
array([[ 0.85..., 0. ..., 0.52...],
@@ -457,7 +457,7 @@ class called :class:`TfidfVectorizer` that combines all the options of
>>> vectorizer.fit_transform(corpus)
... # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
<4x9 sparse matrix of type '<... 'numpy.float64'>'
with 19 stored elements in Compressed Sparse Row format>
with 19 stored elements in Compressed Sparse ... format>

While the tf–idf normalization is often very useful, there might
be cases where the binary occurrence markers might offer better
@@ -621,7 +621,7 @@ span across words::
>>> ngram_vectorizer.fit_transform(['jumpy fox'])
... # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
<1x4 sparse matrix of type '<... 'numpy.int64'>'
with 4 stored elements in Compressed Sparse Column format>
with 4 stored elements in Compressed Sparse ... format>
>>> ngram_vectorizer.get_feature_names() == (
... [' fox ', ' jump', 'jumpy', 'umpy '])
True
@@ -630,7 +630,7 @@
>>> ngram_vectorizer.fit_transform(['jumpy fox'])
... # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
<1x5 sparse matrix of type '<... 'numpy.int64'>'
with 5 stored elements in Compressed Sparse Column format>
with 5 stored elements in Compressed Sparse ... format>
>>> ngram_vectorizer.get_feature_names() == (
... ['jumpy', 'mpy f', 'py fo', 'umpy ', 'y fox'])
True
@@ -699,7 +699,7 @@ meaning that you don't have to call ``fit`` on it::
>>> hv.transform(corpus)
... # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
<4x10 sparse matrix of type '<... 'numpy.float64'>'
with 16 stored elements in Compressed Sparse Row format>
with 16 stored elements in Compressed Sparse ... format>

You can see that 16 non-zero feature tokens were extracted in the vector
output: this is less than the 19 non-zeros extracted previously by the
@@ -724,7 +724,7 @@ Let's try again with the default setting::
>>> hv.transform(corpus)
... # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
<4x1048576 sparse matrix of type '<... 'numpy.float64'>'
with 19 stored elements in Compressed Sparse Row format>
with 19 stored elements in Compressed Sparse ... format>

We no longer get the collisions, but this comes at the expense of a much larger
dimensionality of the output space.
12 changes: 6 additions & 6 deletions doc/modules/label_propagation.rst
@@ -9,8 +9,8 @@ Semi-Supervised
`Semi-supervised learning
<http://en.wikipedia.org/wiki/Semi-supervised_learning>`_ is a situation
in which some of the samples in your training data are not labeled. The
semi-supervised estimators, in :mod:`sklean.semi_supervised` are able to
make use of this addition unlabeled data to capture better the shape of
semi-supervised estimators in :mod:`sklearn.semi_supervised` are able to
make use of this additional unlabeled data to better capture the shape of
the underlying data distribution and generalize better to new samples.
These algorithms can perform well when we have a very small amount of
labeled points and a large amount of unlabeled points.
@@ -19,14 +19,14 @@ labeled points and a large amount of unlabeled points.

It is important to assign an identifier to unlabeled points along with the
labeled data when training the model with the `fit` method. The identifier
that this implementation uses the integer value :math:`-1`.
that this implementation uses is the integer value :math:`-1`.
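
For example, a minimal sketch of this convention, on made-up one-dimensional
data::

    import numpy as np
    from sklearn.semi_supervised import LabelPropagation

    X = np.array([[0.0], [0.1], [0.2], [3.0], [3.1], [3.2]])
    # -1 marks the unlabeled points; only the first and last samples are labeled.
    y = np.array([0, -1, -1, -1, -1, 1])

    model = LabelPropagation().fit(X, y)
    print(model.transduction_)  # inferred labels for all six samples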

.. _label_propagation:

Label Propagation
=================

Label propagation denote a few variations of semi-supervised graph
Label propagation denotes a few variations of semi-supervised graph
inference algorithms.

A few features available in this model:
@@ -75,11 +75,11 @@ available:
* knn (:math:`1[x' \in kNN(x)]`). :math:`k` is specified by keyword
  ``n_neighbors``.

RBF kernel will produce a fully connected graph which is represented in memory
The RBF kernel will produce a fully connected graph which is represented in memory
by a dense matrix. This matrix may be very large and combined with the cost of
performing a full matrix multiplication calculation for each iteration of the
algorithm can lead to prohibitively long running times. On the other hand,
the KNN kernel will produce a much more memory friendly sparse matrix
the KNN kernel will produce a much more memory-friendly sparse matrix
which can drastically reduce running times.
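
For instance, switching from the dense to the sparse graph is a single
keyword change (parameter values here are illustrative)::

    from sklearn.semi_supervised import LabelSpreading

    # Dense, fully connected graph: fine for small data, memory-hungry otherwise.
    rbf_model = LabelSpreading(kernel='rbf', gamma=20)

    # Sparse k-nearest-neighbors graph: much more memory-friendly.
    knn_model = LabelSpreading(kernel='knn', n_neighbors=7)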

.. topic:: Examples