
Commit

Changes in documentation. Rephrasing, fixed examples, standardized notation, etc. (#274)

* Multiple changes to the documentation. Rephrasing, fixed examples and standardized notation, and others.

* Forgot to change one A to L

* Replaced broken modindex link for module list

* fixed compliance with flake8

* Fixed typos, misplaced example, etc

* No new bullet and rectification

* remove modules index link

* add "respectively"

* fix rca examples

* fix rca examples again
grudloff authored and bellet committed Jan 20, 2020
1 parent f48a55d commit 1b40c3b
Showing 10 changed files with 114 additions and 66 deletions.
1 change: 1 addition & 0 deletions README.rst
@@ -26,6 +26,7 @@ metric-learn contains efficient Python implementations of several popular superv

- For SDML, using skggm will allow the algorithm to solve problematic cases
(install from commit `a0ed406 <https://github.com/skggm/skggm/commit/a0ed406586c4364ea3297a658f415e13b5cbdaf8>`_).
``pip install 'git+https://github.com/skggm/skggm.git@a0ed406586c4364ea3297a658f415e13b5cbdaf8'`` to install the required version of skggm from GitHub.
- For running the examples only: matplotlib

**Installation/Setup**
3 changes: 2 additions & 1 deletion doc/getting_started.rst
@@ -10,7 +10,7 @@ Run ``pip install metric-learn`` to download and install from PyPI.
Alternately, download the source repository and run:

- ``python setup.py install`` for default installation.
- ``python setup.py test`` to run all tests.
- ``pytest test`` to run all tests.

**Dependencies**

@@ -21,6 +21,7 @@ Alternately, download the source repository and run:

- For SDML, using skggm will allow the algorithm to solve problematic cases
(install from commit `a0ed406 <https://github.com/skggm/skggm/commit/a0ed406586c4364ea3297a658f415e13b5cbdaf8>`_).
``pip install 'git+https://github.com/skggm/skggm.git@a0ed406586c4364ea3297a658f415e13b5cbdaf8'`` to install the required version of skggm from GitHub.
- For running the examples only: matplotlib

Quick start
2 changes: 1 addition & 1 deletion doc/index.rst
@@ -52,7 +52,7 @@ Documentation outline

auto_examples/index

:ref:`genindex` | :ref:`modindex` | :ref:`search`
:ref:`genindex` | :ref:`search`

.. |Travis-CI Build Status| image:: https://api.travis-ci.org/scikit-learn-contrib/metric-learn.svg?branch=master
:target: https://travis-ci.org/scikit-learn-contrib/metric-learn
30 changes: 15 additions & 15 deletions doc/supervised.rst
@@ -131,13 +131,13 @@ The distance is learned by solving the following optimization problem:
c\sum_{i, j, l}\eta_{ij}(1-y_{ij})[1+||\mathbf{L(x_i-x_j)}||^2-||
\mathbf{L(x_i-x_l)}||^2]_+)
where :math:`\mathbf{x}_i` is an data point, :math:`\mathbf{x}_j` is one
of its k nearest neighbors sharing the same label, and :math:`\mathbf{x}_l`
where :math:`\mathbf{x}_i` is a data point, :math:`\mathbf{x}_j` is one
of its k-nearest neighbors sharing the same label, and :math:`\mathbf{x}_l`
are all the other instances within that region with different labels,
:math:`\eta_{ij}, y_{ij} \in \{0, 1\}` are both the indicators,
:math:`\eta_{ij}` represents :math:`\mathbf{x}_{j}` is the k nearest
neighbors(with same labels) of :math:`\mathbf{x}_{i}`, :math:`y_{ij}=0`
indicates :math:`\mathbf{x}_{i}, \mathbf{x}_{j}` belong to different class,
:math:`\eta_{ij}` represents :math:`\mathbf{x}_{j}` is the k-nearest
neighbors (with same labels) of :math:`\mathbf{x}_{i}`, :math:`y_{ij}=0`
indicates :math:`\mathbf{x}_{i}, \mathbf{x}_{j}` belong to different classes,
:math:`[\cdot]_+=\max(0, \cdot)` is the Hinge loss.
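As an aside from this patch, here is a minimal NumPy sketch (toy points, an identity :math:`\mathbf{L}` and :math:`c=1` are assumptions made only for illustration) of how the pull term and the hinged push term of the objective above are evaluated for a single triplet:

    import numpy as np

    # x_j is a same-class target neighbor of x_i; x_l is a differently labeled point
    x_i, x_j, x_l = np.array([0., 0.]), np.array([1., 0.]), np.array([1.2, 0.1])
    L = np.eye(2)   # placeholder for the transformation LMNN would learn
    c = 1.0         # pull/push trade-off constant

    pull = np.sum(L.dot(x_i - x_j) ** 2)                        # ||L(x_i - x_j)||^2
    hinge = max(0., 1 + pull - np.sum(L.dot(x_i - x_l) ** 2))   # [1 + ...]_+
    contribution = pull + c * hinge                             # this triplet's share of the loss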

.. topic:: Example Code:
@@ -235,7 +235,7 @@ the sum of probability of being correctly classified:

Local Fisher Discriminant Analysis (:py:class:`LFDA <metric_learn.LFDA>`)

`LFDA` is a linear supervised dimensionality reduction method. It is
`LFDA` is a linear supervised dimensionality reduction method which effectively combines the ideas of `Linear Discriminant Analysis <https://en.wikipedia.org/wiki/Linear_discriminant_analysis>` and Locality-Preserving Projection . It is
particularly useful when dealing with multi-modality, where one ore more classes
consist of separate clusters in input space. The core optimization problem of
LFDA is solved as a generalized eigenvalue problem.
@@ -261,18 +261,18 @@ where
\,\,\mathbf{A}_{i,j}(1/n-1/n_l) \qquad y_i = y_j\end{aligned}\right.\\
here :math:`\mathbf{A}_{i,j}` is the :math:`(i,j)`-th entry of the affinity
matrix :math:`\mathbf{A}`:, which can be calculated with local scaling methods.
matrix :math:`\mathbf{A}`:, which can be calculated with local scaling methods, `n` and `n_l` are the total number of points and the number of points per cluster `l` respectively.

Then the learning problem becomes derive the LFDA transformation matrix
:math:`\mathbf{T}_{LFDA}`:
:math:`\mathbf{L}_{LFDA}`:

.. math::
\mathbf{T}_{LFDA} = \arg\max_\mathbf{T}
[\text{tr}((\mathbf{T}^T\mathbf{S}^{(w)}
\mathbf{T})^{-1}\mathbf{T}^T\mathbf{S}^{(b)}\mathbf{T})]
\mathbf{L}_{LFDA} = \arg\max_\mathbf{L}
[\text{tr}((\mathbf{L}^T\mathbf{S}^{(w)}
\mathbf{L})^{-1}\mathbf{L}^T\mathbf{S}^{(b)}\mathbf{L})]
That is, it is looking for a transformation matrix :math:`\mathbf{T}` such that
That is, it is looking for a transformation matrix :math:`\mathbf{L}` such that
nearby data pairs in the same class are made close and the data pairs in
different classes are separated from each other; far apart data pairs in the
same class are not imposed to be close.
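For readers of this diff, a minimal usage sketch of LFDA (the parameters `k` and `n_components` and the iris data are assumptions for illustration, not taken from the patch):

    from metric_learn import LFDA
    from sklearn.datasets import load_iris

    X, Y = load_iris(return_X_y=True)
    lfda = LFDA(k=2, n_components=2)   # k nearest neighbors used for the local-scaling affinity
    lfda.fit(X, Y)
    X_embedded = lfda.transform(X)     # data projected with the learned transformation
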
@@ -326,9 +326,9 @@ empirical development. The Gaussian kernel is denoted as:
where :math:`d(\cdot, \cdot)` is the squared distance under some metrics,
here in the fashion of Mahalanobis, it should be :math:`d(\mathbf{x}_i,
\mathbf{x}_j) = ||\mathbf{A}(\mathbf{x}_i - \mathbf{x}_j)||`, the transition
matrix :math:`\mathbf{A}` is derived from the decomposition of Mahalanobis
matrix :math:`\mathbf{M=A^TA}`.
\mathbf{x}_j) = ||\mathbf{L}(\mathbf{x}_i - \mathbf{x}_j)||`, the transition
matrix :math:`\mathbf{L}` is derived from the decomposition of Mahalanobis
matrix :math:`\mathbf{M=L^TL}`.
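A quick NumPy check of this relationship (illustrative only; the random data is an assumption): the Mahalanobis distance under :math:`\mathbf{M}=\mathbf{L}^T\mathbf{L}` equals the Euclidean distance after mapping the points through :math:`\mathbf{L}`.

    import numpy as np

    rng = np.random.RandomState(0)
    L = rng.rand(2, 2)                    # any transformation matrix
    M = L.T.dot(L)                        # corresponding Mahalanobis matrix
    x_i, x_j = rng.rand(2), rng.rand(2)
    d_mahalanobis = np.sqrt((x_i - x_j).dot(M).dot(x_i - x_j))
    d_transformed = np.linalg.norm(L.dot(x_i - x_j))
    assert np.isclose(d_mahalanobis, d_transformed)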

Since :math:`\sigma^2` can be integrated into :math:`d(\cdot)`, we can set
:math:`\sigma^2=1` for the sake of simplicity. Here we use the cumulative
37 changes: 17 additions & 20 deletions doc/weakly_supervised.rst
@@ -367,36 +367,36 @@ other methods, `ITML` does not rely on an eigenvalue computation or
semi-definite programming.


Given a Mahalanobis distance parameterized by :math:`A`, its corresponding
Given a Mahalanobis distance parameterized by :math:`M`, its corresponding
multivariate Gaussian is denoted as:

.. math::
p(\mathbf{x}; \mathbf{A}) = \frac{1}{Z}\exp(-\frac{1}{2}d_\mathbf{A}
p(\mathbf{x}; \mathbf{M}) = \frac{1}{Z}\exp(-\frac{1}{2}d_\mathbf{M}
(\mathbf{x}, \mu))
= \frac{1}{Z}\exp(-\frac{1}{2}((\mathbf{x} - \mu)^T\mathbf{A}
= \frac{1}{Z}\exp(-\frac{1}{2}((\mathbf{x} - \mu)^T\mathbf{M}
(\mathbf{x} - \mu))
where :math:`Z` is the normalization constant, the inverse of Mahalanobis
matrix :math:`\mathbf{A}^{-1}` is the covariance of the Gaussian.
matrix :math:`\mathbf{M}^{-1}` is the covariance of the Gaussian.

Given pairs of similar points :math:`S` and pairs of dissimilar points
:math:`D`, the distance metric learning problem is to minimize the LogDet
divergence, which is equivalent as minimizing :math:`\textbf{KL}(p(\mathbf{x};
\mathbf{A}_0) || p(\mathbf{x}; \mathbf{A}))`:
\mathbf{M}_0) || p(\mathbf{x}; \mathbf{M}))`:

.. math::
\min_\mathbf{A} D_{\ell \mathrm{d}}\left(A, A_{0}\right) =
\operatorname{tr}\left(A A_{0}^{-1}\right)-\log \operatorname{det}
\left(A A_{0}^{-1}\right)-n\\
\text{subject to } \quad d_\mathbf{A}(\mathbf{x}_i, \mathbf{x}_j)
\min_\mathbf{A} D_{\ell \mathrm{d}}\left(M, M_{0}\right) =
\operatorname{tr}\left(M M_{0}^{-1}\right)-\log \operatorname{det}
\left(M M_{0}^{-1}\right)-n\\
\text{subject to } \quad d_\mathbf{M}(\mathbf{x}_i, \mathbf{x}_j)
\leq u \qquad (\mathbf{x}_i, \mathbf{x}_j)\in S \\
d_\mathbf{A}(\mathbf{x}_i, \mathbf{x}_j) \geq l \qquad (\mathbf{x}_i,
d_\mathbf{M}(\mathbf{x}_i, \mathbf{x}_j) \geq l \qquad (\mathbf{x}_i,
\mathbf{x}_j)\in D
where :math:`u` and :math:`l` is the upper and the lower bound of distance
for similar and dissimilar pairs respectively, and :math:`\mathbf{A}_0`
for similar and dissimilar pairs respectively, and :math:`\mathbf{M}_0`
is the prior distance metric, set to identity matrix by default,
:math:`D_{\ell \mathrm{d}}(\cdot)` is the log determinant.
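As a side note to this hunk, the LogDet divergence in the objective can be written down directly; the following NumPy sketch (toy matrices are assumptions) mirrors the formula above and evaluates to zero exactly when :math:`M = M_0`:

    import numpy as np

    def logdet_divergence(M, M0):
        # D_ld(M, M0) = tr(M M0^{-1}) - log det(M M0^{-1}) - n
        n = M.shape[0]
        P = M.dot(np.linalg.inv(M0))
        return np.trace(P) - np.log(np.linalg.det(P)) - n

    M0 = np.eye(2)                           # identity prior, as in the text
    M = np.array([[2.0, 0.3], [0.3, 1.0]])   # some candidate metric
    divergence = logdet_divergence(M, M0)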

@@ -518,17 +518,14 @@ as the Mahalanobis matrix.

from metric_learn import RCA

pairs = [[[1.2, 7.5], [1.3, 1.5]],
[[6.4, 2.6], [6.2, 9.7]],
[[1.3, 4.5], [3.2, 4.6]],
[[6.2, 5.5], [5.4, 5.4]]]
y = [1, 1, -1, -1]

# in this task we want points where the first feature is close to be closer
# to each other, no matter how close the second feature is
X = [[-0.05, 3.0],[0.05, -3.0],
[0.1, -3.55],[-0.1, 3.55],
[-0.95, -0.05],[0.95, 0.05],
[0.4, 0.05],[-0.4, -0.05]]
chunks = [0, 0, 1, 1, 2, 2, 3, 3]

rca = RCA()
rca.fit(pairs, y)
rca.fit(X, chunks)
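
A possible follow-up to the corrected example (a sketch, not part of the documentation change; it assumes metric-learn's usual transformer API):

    X_transformed = rca.transform(X)    # points whose first feature is close end up closer
    M = rca.get_mahalanobis_matrix()    # learned Mahalanobis matrix M = L.T L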

.. topic:: References:

2 changes: 1 addition & 1 deletion examples/plot_metric_learning_examples.py
@@ -175,7 +175,7 @@ def plot_tsne(X, y, colormap=plt.cm.Paired):
#
# ITML uses a regularizer that automatically enforces a Semi-Definite
# Positive Matrix condition - the LogDet divergence. It uses soft
# must-link or cannot like constraints, and a simple algorithm based on
# must-link or cannot-link constraints, and a simple algorithm based on
# Bregman projections. Unlike LMNN, ITML will implicitly enforce points from
# the same class to belong to the same cluster, as you can see below.
#
27 changes: 20 additions & 7 deletions metric_learn/itml.py
@@ -198,13 +198,16 @@ class ITML(_BaseITML, _PairsClassifierMixin):
Examples
--------
>>> from metric_learn import ITML_Supervised
>>> from sklearn.datasets import load_iris
>>> iris_data = load_iris()
>>> X = iris_data['data']
>>> Y = iris_data['target']
>>> itml = ITML_Supervised(num_constraints=200)
>>> itml.fit(X, Y)
>>> from metric_learn import ITML
>>> pairs = [[[1.2, 7.5], [1.3, 1.5]],
>>> [[6.4, 2.6], [6.2, 9.7]],
>>> [[1.3, 4.5], [3.2, 4.6]],
>>> [[6.2, 5.5], [5.4, 5.4]]]
>>> y = [1, 1, -1, -1]
>>> # in this task we want points where the first feature is close to be
>>> # closer to each other, no matter how close the second feature is
>>> itml = ITML()
>>> itml.fit(pairs, y)
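A hedged follow-up sketch, not part of the new docstring (the pair-labeling call assumes metric-learn's pairs-classifier API):

    new_pairs = [[[1.0, 6.9], [1.1, 2.2]],   # hypothetical unseen pairs
                 [[1.0, 6.9], [5.5, 6.8]]]
    itml.predict(new_pairs)                  # +1 for pairs judged similar, -1 otherwise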
References
----------
@@ -335,6 +338,16 @@ class ITML_Supervised(_BaseITML, TransformerMixin):
The linear transformation ``L`` deduced from the learned Mahalanobis
metric (See function `components_from_metric`.)
Examples
--------
>>> from metric_learn import ITML_Supervised
>>> from sklearn.datasets import load_iris
>>> iris_data = load_iris()
>>> X = iris_data['data']
>>> Y = iris_data['target']
>>> itml = ITML_Supervised(num_constraints=200)
>>> itml.fit(X, Y)
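For context, a brief sketch of what can follow the fit above (not part of the docstring; it assumes the scikit-learn-style API of the supervised estimators):

    X_transformed = itml.transform(X)    # embed the iris data with the learned metric
    M = itml.get_mahalanobis_matrix()    # learned Mahalanobis matrix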
See Also
--------
metric_learn.ITML : The original weakly-supervised algorithm
26 changes: 19 additions & 7 deletions metric_learn/lsml.py
@@ -186,13 +186,15 @@ class LSML(_BaseLSML, _QuadrupletsClassifierMixin):
Examples
--------
>>> from metric_learn import LSML_Supervised
>>> from sklearn.datasets import load_iris
>>> iris_data = load_iris()
>>> X = iris_data['data']
>>> Y = iris_data['target']
>>> lsml = LSML_Supervised(num_constraints=200)
>>> lsml.fit(X, Y)
>>> from metric_learn import LSML
>>> quadruplets = [[[1.2, 7.5], [1.3, 1.5], [6.4, 2.6], [6.2, 9.7]],
>>> [[1.3, 4.5], [3.2, 4.6], [6.2, 5.5], [5.4, 5.4]],
>>> [[3.2, 7.5], [3.3, 1.5], [8.4, 2.6], [8.2, 9.7]],
>>> [[3.3, 4.5], [5.2, 4.6], [8.2, 5.5], [7.4, 5.4]]]
>>> # we want to make closer points where the first feature is close, and
>>> # further if the second feature is close
>>> lsml = LSML()
>>> lsml.fit(quadruplets)
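A possible follow-up sketch (not in the patch; the quadruplet-prediction call is an assumption about the API): after fitting, `predict` indicates for each new quadruplet whether its first pair is closer than its second under the learned metric.

    test_quadruplets = [[[1.0, 7.2], [1.1, 1.6], [6.1, 2.4], [6.0, 9.2]]]  # hypothetical quadruplet
    lsml.predict(test_quadruplets)   # e.g. array([1]) if the first pair is the closer one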
References
----------
@@ -290,6 +292,16 @@ class LSML_Supervised(_BaseLSML, TransformerMixin):
prior. In any case, `random_state` is also used to randomly sample
constraints from labels.
Examples
--------
>>> from metric_learn import LSML_Supervised
>>> from sklearn.datasets import load_iris
>>> iris_data = load_iris()
>>> X = iris_data['data']
>>> Y = iris_data['target']
>>> lsml = LSML_Supervised(num_constraints=200)
>>> lsml.fit(X, Y)
Attributes
----------
n_iter_ : `int`
27 changes: 20 additions & 7 deletions metric_learn/mmc.py
@@ -426,13 +426,16 @@ class MMC(_BaseMMC, _PairsClassifierMixin):
Examples
--------
>>> from metric_learn import MMC_Supervised
>>> from sklearn.datasets import load_iris
>>> iris_data = load_iris()
>>> X = iris_data['data']
>>> Y = iris_data['target']
>>> mmc = MMC_Supervised(num_constraints=200)
>>> mmc.fit(X, Y)
>>> from metric_learn import MMC
>>> pairs = [[[1.2, 7.5], [1.3, 1.5]],
>>> [[6.4, 2.6], [6.2, 9.7]],
>>> [[1.3, 4.5], [3.2, 4.6]],
>>> [[6.2, 5.5], [5.4, 5.4]]]
>>> y = [1, 1, -1, -1]
>>> # in this task we want points where the first feature is close to be
>>> # closer to each other, no matter how close the second feature is
>>> mmc = MMC()
>>> mmc.fit(pairs, y)
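Sketch of possible next steps (not part of the added example; the calls assume metric-learn's common estimator API):

    metric_fun = mmc.get_metric()              # callable computing the learned distance
    d = metric_fun([1.2, 7.5], [1.3, 1.5])     # distance between two individual points
    scores = mmc.decision_function(pairs)      # higher score suggests a more similar pair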
References
----------
@@ -552,6 +555,16 @@ class MMC_Supervised(_BaseMMC, TransformerMixin):
samples, and pairs of dissimilar samples by taking different class
samples. It then passes these pairs to `MMC` for training.
Examples
--------
>>> from metric_learn import MMC_Supervised
>>> from sklearn.datasets import load_iris
>>> iris_data = load_iris()
>>> X = iris_data['data']
>>> Y = iris_data['target']
>>> mmc = MMC_Supervised(num_constraints=200)
>>> mmc.fit(X, Y)
Attributes
----------
n_iter_ : `int`
25 changes: 18 additions & 7 deletions metric_learn/rca.py
@@ -62,13 +62,14 @@ class RCA(MahalanobisMixin, TransformerMixin):
Examples
--------
>>> from metric_learn import RCA_Supervised
>>> from sklearn.datasets import load_iris
>>> iris_data = load_iris()
>>> X = iris_data['data']
>>> Y = iris_data['target']
>>> rca = RCA_Supervised(num_chunks=30, chunk_size=2)
>>> rca.fit(X, Y)
>>> from metric_learn import RCA
>>> X = [[-0.05, 3.0],[0.05, -3.0],
>>> [0.1, -3.55],[-0.1, 3.55],
>>> [-0.95, -0.05],[0.95, 0.05],
>>> [0.4, 0.05],[-0.4, -0.05]]
>>> chunks = [0, 0, 1, 1, 2, 2, 3, 3]
>>> rca = RCA()
>>> rca.fit(X, chunks)
References
------------------
@@ -196,6 +197,16 @@ class RCA_Supervised(RCA):
A pseudo random number generator object or a seed for it if int.
It is used to randomly sample constraints from labels.
Examples
--------
>>> from metric_learn import RCA_Supervised
>>> from sklearn.datasets import load_iris
>>> iris_data = load_iris()
>>> X = iris_data['data']
>>> Y = iris_data['target']
>>> rca = RCA_Supervised(num_chunks=30, chunk_size=2)
>>> rca.fit(X, Y)
Attributes
----------
components_ : `numpy.ndarray`, shape=(n_components, n_features)
