[DOC] [WIP] Developers documentation page #340

Open
wants to merge 10 commits into base: master
1 change: 0 additions & 1 deletion doc/conf.py
@@ -64,7 +64,6 @@
# generate autosummary even if no references
autosummary_generate = True


# Temporary work-around for spacing problem between parameter and parameter
# type in the doc, see https://github.com/numpy/numpydoc/issues/215. The bug
# has been fixed in sphinx (https://github.com/sphinx-doc/sphinx/pull/5976) but
43 changes: 43 additions & 0 deletions doc/contrib/algorithm.rst
@@ -0,0 +1,43 @@
.. _implement-new:

=========================
Implement a new algorithm
=========================

Criteria
^^^^^^^^

If you want to implement an algorithm and include it in the library, you need
to be aware of the criteria it must meet to be accepted. In general, any new
algorithm must have:

- A publication with a reasonable number of citations.
- A reference implementation or published inputs/outputs that we can validate
our version against.
- An implementation that doesn't require thousands of lines of new code, or
adding new mandatory dependencies.

Of course, any of these three guidelines could be ignored in special cases. On
the other hand, we should prioritize the algorithms that have:

- Larger number of citations
- Common parts that can be reused by other/existing algorithms
- Better proven performance over other similar/existing algorithms


Algorithm wish list
^^^^^^^^^^^^^^^^^^^

Some desired algorithms that are not yet implemented in the package can be found
`here <https://github.com/scikit-learn-contrib/metric-learn/issues/13>`_ and
`here <https://github.com/scikit-learn-contrib/metric-learn/issues/205>`_.

How to
^^^^^^

1. First, get familiar with the metric-learn API by checking out
   :ref:`api-structure`.
2. Propose the algorithm you want to incorporate in `Github Issues
   <https://github.com/scikit-learn-contrib/metric-learn/issues>`_ to get
   feedback from the core developers.
3. If you get a green light, follow the guidelines in :ref:`contrib-code`.
89 changes: 89 additions & 0 deletions doc/contrib/api.rst
@@ -0,0 +1,89 @@
.. _api-structure:

=============
API Structure
=============

The API structure of metric-learn is inspired by the main classes of scikit-learn:
``Estimator``, ``Predictor``, ``Transformer`` (check them
`here <https://scikit-learn.org/stable/developers/develop.html>`_).


BaseMetricLearner
^^^^^^^^^^^^^^^^^

All learners are ``BaseMetricLearner`` objects, which inherit from scikit-learn's
``BaseEstimator`` class, so all of them have a ``fit`` method to learn from data, either:

.. code-block:: python

estimator = estimator.fit(data, targets)

or

.. code-block:: python

estimator = estimator.fit(data)

This class has three main abstract methods that all learners need to implement:

+---------------------+------------------------------------------------------------+
| **Abstract method** | **Description**                                            |
+---------------------+------------------------------------------------------------+
| pair_score          | Returns the similarity score between pairs of points      |
|                     | (the larger the score, the more similar the pair). For    |
|                     | metric learners that learn a distance, the score is       |
|                     | simply the opposite of the distance between pairs.        |
+---------------------+------------------------------------------------------------+
| pair_distance       | Returns the (pseudo) distance between pairs, when         |
|                     | available. For metric learners that do not learn a        |
|                     | (pseudo) distance, an error is thrown instead.            |
+---------------------+------------------------------------------------------------+
| get_metric          | Returns a function that takes as input two 1D arrays      |
|                     | and outputs the value of the learned metric on these      |
|                     | two points. Depending on the algorithm, it can return     |
|                     | a distance or a similarity function between pairs.        |
+---------------------+------------------------------------------------------------+

As you may have noticed, algorithms can learn a (pseudo) distance or a similarity.
Most algorithms in the package learn a Mahalanobis metric and have all three methods
available, but for similarity learners ``pair_distance`` must throw an error. If you
want to implement an algorithm of this kind, take this into account.
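
To illustrate how the three methods relate, here is a minimal numpy-only sketch
(a toy class, not metric-learn's actual implementation) for a learner holding a
fixed Mahalanobis matrix ``M``:

```python
import numpy as np

class ToyMahalanobisLearner:
    """Toy stand-in for a fitted metric learner with a fixed matrix M."""

    def __init__(self, M):
        self.M = M  # positive semi-definite matrix defining the metric

    def pair_distance(self, pairs):
        # pairs has shape (n_pairs, 2, n_features)
        diff = pairs[:, 0, :] - pairs[:, 1, :]
        # Quadratic form diff^T M diff for each pair, then square root
        return np.sqrt(np.einsum('ij,jk,ik->i', diff, self.M, diff))

    def pair_score(self, pairs):
        # For distance learners, the score is the opposite of the distance.
        return -self.pair_distance(pairs)

    def get_metric(self):
        # Returns a standalone function operating on two 1D arrays.
        def metric(u, v):
            d = np.asarray(u) - np.asarray(v)
            return float(np.sqrt(d @ self.M @ d))
        return metric

learner = ToyMahalanobisLearner(M=np.eye(2))
pairs = np.array([[[0., 0.], [3., 4.]]])
learner.pair_distance(pairs)                     # array([5.])
learner.pair_score(pairs)                        # array([-5.])
learner.get_metric()(pairs[0, 0], pairs[0, 1])   # 5.0
```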

MetricTransformer
^^^^^^^^^^^^^^^^^

Following the guidelines of scikit-learn's ``Transformer`` class, Mahalanobis learners
inherit from a custom class named ``MetricTransformer``, which only has the ``transform``
method. With it, these learners can apply a linear transformation to the input:

.. code-block:: python

new_data = transformer.transform(data)
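
For Mahalanobis learners this transformation is linear: with ``L`` the learned
transformation matrix (exposed as ``components_`` in recent metric-learn versions),
``transform`` amounts to a matrix product. A numpy-only sketch of the idea, with a
made-up ``L`` in place of a fitted one:

```python
import numpy as np

# Hypothetical learned transformation matrix L (here a simple axis scaling).
L = np.diag([2.0, 0.5])

def transform(data):
    # Equivalent in spirit to X @ components_.T for Mahalanobis learners.
    return data @ L.T

data = np.array([[1.0, 4.0],
                 [3.0, 2.0]])
new_data = transform(data)  # [[2., 2.], [6., 1.]]
```

Euclidean distances in this transformed space are exactly the learned Mahalanobis
distances in the original space, which is what makes the ``transform`` view useful.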

Mixins
^^^^^^

Mixins represent the `metric` that algorithms need to learn. As of now, two main
mixins are available: ``MahalanobisMixin`` and ``BilinearMixin``. They inherit from
``BaseMetricLearner`` and/or ``MetricTransformer`` and **implement the abstract
methods** needed. Concrete algorithms then inherit from a mixin to access these
methods when computing distances or similarity scores.

As many algorithms learn the same kind of metric, such as Mahalanobis, it's useful
to have the mixins to avoid duplicated code and to make sure that these metrics are
computed correctly.
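
Schematically (class names below are illustrative, not metric-learn's actual code),
the mixin factors out the shared metric computation so each algorithm only has to
produce the learned parameters in its ``fit``:

```python
import numpy as np

class MahalanobisMixinSketch:
    """Provides pair_distance for any learner that sets self.components_."""

    def pair_distance(self, pairs):
        # Embed both points with the learned linear map, then take the
        # Euclidean distance in the embedded space.
        a = pairs[:, 0, :] @ self.components_.T
        b = pairs[:, 1, :] @ self.components_.T
        return np.linalg.norm(a - b, axis=1)

class IdentityLearner(MahalanobisMixinSketch):
    """A toy 'algorithm' whose fit just picks the identity transformation."""

    def fit(self, X):
        self.components_ = np.eye(X.shape[1])
        return self

learner = IdentityLearner().fit(np.zeros((1, 2)))
pairs = np.array([[[0., 0.], [3., 4.]]])
learner.pair_distance(pairs)  # array([5.])
```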

Classifiers
^^^^^^^^^^^

Weakly-supervised algorithms that learn from tuples such as pairs, triplets, or
quadruplets can also classify unseen points using the learned metric.

Metric-learn has three specific plug-and-play classes for this: ``_PairsClassifierMixin``,
``_TripletsClassifierMixin`` and ``_QuadrupletsClassifierMixin``. All inherit from
``BaseMetricLearner`` to access the methods described earlier.

All these classifiers implement the following methods:

+---------------------+-------------------------------------------------------------------------------------+
| **Method**          | **Description**                                                                     |
+---------------------+-------------------------------------------------------------------------------------+
| predict | Predicts the ordering between sample distances in input pairs/triplets/quadruplets. |
+---------------------+-------------------------------------------------------------------------------------+
| decision_function | Returns the decision function used to classify the pairs. |
+---------------------+-------------------------------------------------------------------------------------+
| score | Computes score of pairs/triplets/quadruplets similarity prediction. |
+---------------------+-------------------------------------------------------------------------------------+
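
As an illustration of this contract for pairs, here is a toy numpy-only sketch (not
the real classes, which also calibrate the threshold during ``fit``): a pair is
predicted similar (+1) when its distance falls below a threshold, and dissimilar
(-1) otherwise.

```python
import numpy as np

class ToyPairsClassifier:
    """Toy pairs classifier: Euclidean distance against a fixed threshold."""

    def __init__(self, threshold):
        self.threshold_ = threshold

    def decision_function(self, pairs):
        # Larger values mean "more likely similar".
        dist = np.linalg.norm(pairs[:, 0, :] - pairs[:, 1, :], axis=1)
        return self.threshold_ - dist

    def predict(self, pairs):
        # +1 for similar pairs, -1 for dissimilar ones.
        return np.where(self.decision_function(pairs) > 0, 1, -1)

    def score(self, pairs, y):
        # Accuracy of the similar/dissimilar prediction.
        return float(np.mean(self.predict(pairs) == y))

clf = ToyPairsClassifier(threshold=2.0)
pairs = np.array([[[0., 0.], [1., 0.]],    # distance 1 -> similar
                  [[0., 0.], [3., 4.]]])   # distance 5 -> dissimilar
clf.predict(pairs)         # array([ 1, -1])
clf.score(pairs, [1, -1])  # 1.0
```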