
[MRG] FIX: make proposal for sdml formulation #162

Merged
merged 78 commits into from Mar 22, 2019

Conversation

@wdevazelhes (Member) commented Jan 28, 2019

Digging into SDML's code and paper, I don't understand some parts of the implementation. I think it should be as proposed in this PR. Tell me if I'm wrong

Looking at this paper about SDML https://icml.cc/Conferences/2009/papers/46.pdf and this paper about Graphical Lasso http://statweb.stanford.edu/~tibs/ftp/graph.pdf: it seems that in SDML we want to optimize equation 8, which can indeed be done with Graphical Lasso according to the Graphical Lasso paper. In fact, equation 8 in SDML's paper is the same as equation 1 in the Graphical Lasso paper (up to a minus sign). The following variables are equivalent:

SDML's paper    Graphical Lasso's paper
M               theta
P               S

where, in SDML's paper, P = M_0^-1 + eta * X.L.X^T.

But note that in SDML's paper, M_0^-1 is the inverse of the a priori Mahalanobis matrix M_0, which can indeed be initialized to the inverse of the covariance matrix. M_0^-1 is then the inverse of the inverse of the covariance matrix, hence the covariance matrix itself.

So we should just compute P = emp. covariance matrix + self.balance_param * loss_matrix and run graphical lasso on this (and not, as is done currently: invert the covariance matrix, compute P = this_inverse + self.balance_param * loss_matrix, invert the result, and run Graphical Lasso on that).

And in both cases we want to estimate the sparse inverse of M/theta, so things are consistent there.
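A minimal sketch of the proposed computation (illustrative names, not the actual implementation; assuming pairs is an (n_pairs, 2, d) array of point pairs and y the ±1 labels):

    import numpy as np
    from sklearn.covariance import graphical_lasso

    def sdml_proposed(X, pairs, y, balance_param, sparsity_param):
        y = np.asarray(y)
        # loss_matrix is X.L.X^T from the paper: a signed sum of outer
        # products of the pair differences
        diff = pairs[:, 0] - pairs[:, 1]
        loss_matrix = (diff.T * y).dot(diff)
        # proposed formulation: run graphical lasso directly on
        # P = empirical covariance + balance_param * loss_matrix
        P = np.cov(X, rowvar=False) + balance_param * loss_matrix
        # note: graphical_lasso may fail if P is not SPD (see discussion below)
        _, M = graphical_lasso(P, alpha=sparsity_param)
        return M  # the learned sparse Mahalanobis matrix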

Also, I didn't get the hack to ensure the matrix is positive semidefinite; doesn't it change the result?

This PR's modification does not fix the issues we had (e.g. plot_sandwich.py does not give better results with it), so maybe let's not merge it until we have a complete fix for SDML.

There are other things that could explain why SDML doesn't work, like the choice of the optimization parameter alpha, and the fact that graph_lasso itself sometimes behaves badly (see scikit-learn/scikit-learn#6887 and scikit-learn/scikit-learn#11417).

So first, tell me whether you agree with this modification (not on merging it or not, just whether it's the right formula), so that we can look elsewhere for how to fix SDML's error.

TODO:

  • Put a message in the release draft, just before merging, to announce that skggm is needed for SDML
  • Replace sklearn's pinvh (deprecated) with scipy's pinvh
  • Deal with the 1D input case for SDML
  • Add a small test on the non-SPD case that skggm can solve
  • Make Travis install skggm if the Python version allows it

@bellet (Member) commented Jan 28, 2019

I think this is correct except for the initialization: M_0 should be either identity or the empirical covariance matrix (see SDML paper, beginning of page 7, where these two possible initializations are mentioned).

Hence in this case we should compute P = pinvh(emp. covariance matrix) + self.balance_param * loss_matrix and feed this to the graph lasso (not inverting P as in the current code on master).

Or more generally (this is probably how things should be written): P = pinvh(prior) + self.balance_param * loss_matrix
where prior is either the identity or the covariance matrix
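In code, this would look something like the following (a sketch, with prior, balance_param and loss_matrix as above, and scipy's pinvh rather than sklearn's deprecated one):

    import numpy as np
    from scipy.linalg import pinvh

    # prior is either np.eye(d) (identity) or the empirical covariance matrix
    P = pinvh(prior) + balance_param * loss_matrix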

Once this has been updated, if it does not fix the problem, here are some next things to try:

  • switch to identity as the prior
  • add a tiny bit of identity to the empirical covariance to make sure it is PSD (it might not due to some small numerical errors)
  • try varying the parameters balance_param and sparsity_param (which corresponds to the parameter alpha of graph_lasso) to see if this yields satisfying results on the buggy cases of sandwich and iris
  • try also increasing a bit the parameter eps of graph_lasso when getting warnings/errors about an ill-conditioned system

@wdevazelhes (Member Author):

I think this is correct except for the initialization: M_0 should be either identity or the empirical covariance matrix (see SDML paper, beginning of page 7, where these two possible initializations are mentioned).

Ah, that's right, I didn't see that. But isn't it weird to use the covariance matrix as a prior? I thought we would usually take the inverse covariance matrix (the Mahalanobis matrix)? For instance, if we take the balance parameter to zero, the learned metric would be a sparse covariance matrix; wouldn't we want a sparse precision matrix (Mahalanobis matrix) instead?

Or more generally (this is probably how things should be written): P = pinvh(prior) + self.balance_param * loss_matrix
where prior is either the identity or the covariance matrix

Yes, I agree, I should have written it like that.

@wdevazelhes (Member Author):

Once this has been updated, if it does not fix the problem, here are some next things to try:

* switch to identity as the prior

* add a tiny bit of identity to the empirical covariance to make sure it is PSD (it might not due to some small numerical errors)

* try varying the parameters `balance_param` and `sparsity_param` (which corresponds to the parameter `alpha` of graph_lasso) to see if this yields satisfying results on the buggy cases of sandwich and iris

* try also increasing a bit the parameter `eps` of graph_lasso when getting warnings/errors about an ill-conditioned system

I agree, I'll test this and report the result here

@bellet (Member) commented Jan 28, 2019

I think this is correct except for the initialization: M_0 should be either identity or the empirical covariance matrix (see SDML paper, beginning of page 7, where these two possible initializations are mentioned).

Ah, that's right, I didn't see that. But isn't it weird to use the covariance matrix as a prior? I thought we would usually take the inverse covariance matrix (the Mahalanobis matrix)? For instance, if we take the balance parameter to zero, the learned metric would be a sparse covariance matrix; wouldn't we want a sparse precision matrix (Mahalanobis matrix) instead?

Yes, in fact you are right; it must be a typo. M_0 should be the inverse covariance matrix, so that both terms of P are some sort of covariance matrices.

@wdevazelhes (Member Author):

An update on this topic: using the package http://github.com/skggm/skggm and replacing graph_lasso with their function quic, it works, as long as we specify an SPD initialization.
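For reference, a sketch of that replacement (quic returns a 6-tuple whose first element is the estimated sparse precision matrix; emp_cov, sigma0 and sparsity_param are assumed from the surrounding discussion):

    from scipy.linalg import pinvh
    from inverse_covariance import quic  # skggm's graphical lasso solver

    theta0 = pinvh(sigma0)  # SPD initialization for the precision matrix
    M, _, _, _, _, _ = quic(emp_cov, lam=sparsity_param,
                            Theta0=theta0, Sigma0=sigma0)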

@bellet (Member) commented Jan 31, 2019

Until the sklearn version is made more robust (as intended in scikit-learn/scikit-learn#12228), I would vote for temporarily switching to skggm so that we have a working implementation

@mnarayan:
One of the skggm developers here. Do let us know how using skggm works out for you.

@@ -141,7 +141,7 @@ def test_iris(self):
     sdml = SDML_Supervised(num_constraints=1500)
     sdml.fit(self.iris_points, self.iris_labels, random_state=rs)
     csep = class_separation(sdml.transform(self.iris_points), self.iris_labels)
-    self.assertLess(csep, 0.25)
+    self.assertLess(csep, 0.20)
Member Author:

The score I had when running the example was 0.1933487414312185, so I thought let's be a bit more conservative than 0.25, so that if it fails we will be warned.

@wdevazelhes (Member Author):

Note that Cython is also needed as a dependency for skggm.

@perimosocordiae (Contributor):

We shouldn't need to specify Cython as a dependency unless we have Cython code in metric-learn itself.

@wdevazelhes (Member Author):

We shouldn't need to specify Cython as a dependency unless we have Cython code in metric-learn itself.

Ah yes, that's right: I could indeed install it without explicitly installing Cython, even though skggm installs Cython itself (among other packages).

@wdevazelhes (Member Author):

Oh, but it worked locally with Python 3.7. With Python 3.4 (like Travis), it fails indeed. Maybe that's because skggm says it is compatible with Python 3.6.x? Maybe that's the occasion to move to Python 3.6.x for metric-learn? According to the pip message I get when installing on Python 3.4: "Please upgrade your Python as Python 3.4 won't be maintained after March 2019 (cf PEP 429)".

@wdevazelhes changed the title from "[WIP] FIX: make proposal for sdml formulation" to "[MRG] FIX: make proposal for sdml formulation" on Mar 8, 2019
@@ -96,6 +96,6 @@ def wrap_pairs(X, constraints):
   c = np.array(constraints[2])
   d = np.array(constraints[3])
   constraints = np.vstack((np.column_stack((a, b)), np.column_stack((c, d))))
-  y = np.vstack([np.ones((len(a), 1)), - np.ones((len(c), 1))])
+  y = np.hstack([np.ones((len(a),)), - np.ones((len(c),))])
Contributor:

Maybe simpler to do:

y = np.concatenate([np.ones_like(a), -np.ones_like(c)])

@@ -15,6 +15,14 @@ def vector_norm(X):
     return np.linalg.norm(X, axis=1)


+def has_installed_skggm():
Contributor:

I don't think this needs to be a function, as the answer isn't going to change during execution.

Member Author:

That's right, thanks.


 from .base_metric import MahalanobisMixin, _PairsClassifierMixin
 from .constraints import Constraints, wrap_pairs
-from ._util import transformer_from_metric
+from ._util import transformer_from_metric, has_installed_skggm
+if has_installed_skggm():
Contributor:

I'd prefer a simpler conditional import here:

try:
  from inverse_covariance import quic
except ImportError:
  HAS_SKGGM = False
else:
  HAS_SKGGM = True

Contributor:

We'd need to duplicate the logic in the tests, but I'm fine with that.

Member Author:

Agreed
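For illustration, the duplicated logic in the tests could look something like this (a sketch; the pytest marker and test name are hypothetical):

    import pytest

    try:
        from inverse_covariance import quic  # noqa: F401
        HAS_SKGGM = True
    except ImportError:
        HAS_SKGGM = False

    @pytest.mark.skipif(not HAS_SKGGM, reason="skggm is not installed")
    def test_sdml_works_on_non_spd_case():  # hypothetical test name
        ...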

"version.)")
warnings.warn(msg)
else:
print("SDML will use skggm's solver.")
Contributor:

We should only print this if self.verbose is true.

Member:

And maybe clarify: skggm's graphical lasso solver

Member:

and maybe print something similar when sklearn solver is used

Member Author:

Agreed

"scikit-learn's graphical_lasso method. It can fail to converge"
"on some non SPD matrices where skggm would converge. If so, "
"try to install skggm. (see the README.md for the right "
"version.)")
Contributor:

Perhaps we can catch the case where scikit-learn's version fails and emit the warning then?

Member Author:

I agree, here is the new version I'll commit. When using scikit-learn's graphical lasso, we try it, and if an error is raised or the result is not finite, we issue a warning that will be printed before the error (if there is an error) or before returning M (if there is no error but there are NaNs). Tell me what you think:

    if HAS_SKGGM:
      theta0 = pinvh(sigma0)
      M, _, _, _, _, _ = quic(emp_cov, lam=self.sparsity_param,
                              msg=self.verbose,
                              Theta0=theta0, Sigma0=sigma0)
    else:
      try:
        _, M = graphical_lasso(emp_cov, alpha=self.sparsity_param,
                               verbose=self.verbose,
                               cov_init=sigma0)
        error = None
      except FloatingPointError as e:
        error = e
      # check the error first, so that M is not referenced when unbound
      if error is not None or not np.isfinite(M).all():
        msg = ("Scikit-learn's graphical lasso has failed to converge. "
               "Package skggm's graphical lasso can sometimes converge on "
               "non SPD cases where scikit-learn's graphical lasso fails to "
               "converge. Try to install skggm and rerun the algorithm. (see "
               "the README.md for the right version.)")
        warnings.warn(msg)
        if error is not None:
          raise error

EDIT:

    if HAS_SKGGM:
      theta0 = pinvh(sigma0)
      M, _, _, _, _, _ = quic(emp_cov, lam=self.sparsity_param,
                              msg=self.verbose,
                              Theta0=theta0, Sigma0=sigma0)
    else:
      try:
        _, M = graphical_lasso(emp_cov, alpha=self.sparsity_param,
                               verbose=self.verbose,
                               cov_init=sigma0)
      except FloatingPointError as e:
        msg = ("Scikit-learn's graphical lasso has failed to converge. "
               "Package skggm's graphical lasso can sometimes converge on "
               "non SPD cases where scikit-learn's graphical lasso fails to "
               "converge. Try to install skggm and rerun the algorithm. (see "
               "the README.md for the right version of skggm.)")
        warnings.warn(msg)
        raise e

(In fact, it's skggm's graphical lasso that can return NaNs; scikit-learn's graphical lasso raises a FloatingPointError in case of error (I didn't find cases where it would give NaNs), so it's better to stick to the case we know.)

Member Author:

Actually, while trying to come up with the right tests, I realized the following:
in pathological diagonal cases (see above), scikit-learn's graphical lasso can raise no error and yet return a result that is not SPD (in the example below, a diagonal matrix with a negative coefficient on the diagonal). A NaN then appears when computing transformer_from_metric. So maybe, when using scikit-learn, I should additionally check that the result is indeed SPD and, if not, raise the error "SDML has failed to converge to an SPD matrix using scikit-learn's graphical lasso".
I don't know if skggm guarantees to return an SPD matrix, so maybe I should do the check there too? (And later, if we dig deeper into this, we could remove the check if it is no longer needed.)

Example (go into debug mode or put a print statement in SDML to see the result of the graphical lasso; emp_cov below reproduces the input matrix that SDML passes to the graphical lasso, since use_cov=False means the prior is the identity):

    from metric_learn import SDML
    import numpy as np

    pairs = np.array([[[-10., 0.], [10., 0.]], [[0., 50.], [0., -60]]])
    y_pairs = [1, -1]

    sdml = SDML(use_cov=False, balance_param=100, verbose=True)

    diff = pairs[:, 0] - pairs[:, 1]
    emp_cov = np.identity(pairs.shape[2]) + 100 * (diff.T * y_pairs).dot(diff)

    print(emp_cov)

    sdml.fit(pairs, y_pairs)
    print(sdml.get_mahalanobis_matrix())

Returns:

[[   40001.        0.]
 [       0. -1209999.]]
SDML will use scikit-learn's graphical lasso solver.
/home/will/Code/metric-learn/metric_learn/sdml.py:92: ConvergenceWarning: Warning, the input matrix of graphical lasso is not positive semi-definite (PSD). The algorithm may diverge, and lead to degenerate solutions. To prevent that, try to decrease the balance parameter `balance_param` and/or to set use_covariance=False.
  ConvergenceWarning)
[graphical_lasso] Iteration   0, cost  inf, dual gap 0.000e+00
/home/will/Code/metric-learn/metric_learn/_util.py:346: RuntimeWarning: invalid value encountered in sqrt
  return np.sqrt(metric)
[[2.4999375e-05           nan]
 [          nan           nan]]

And if we print the result of graphical lasso (note that it's the inverse of the initial matrix):

[[ 2.49993750e-05  0.00000000e+00]
 [ 0.00000000e+00 -8.26446964e-07]]
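(One can check that this is indeed the inverse of the input matrix:)

    import numpy as np
    emp_cov = np.array([[40001., 0.], [0., -1209999.]])
    print(np.linalg.inv(emp_cov))
    # [[ 2.49993750e-05  0.00000000e+00]
    #  [ 0.00000000e+00 -8.26446964e-07]]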

Member:

How about this:
1- first check if the matrix given as input is PSD; if not, show the warning that the algorithm may diverge, etc.
2- regardless of the solver: if an error is thrown by the solver, or the solution is not a PSD matrix, show an error and abort (as there is either no result, or the result does not correspond to a valid distance)
3- if we fall in case 2 and we use the sklearn solver, say that the user could try to install skggm, as its graphical lasso solver is more stable

Member Author:

I agree, I'll go for something like this:

    try:
      if HAS_SKGGM:
        theta0 = pinvh(sigma0)
        M, _, _, _, _, _ = quic(emp_cov, lam=self.sparsity_param,
                                msg=self.verbose,
                                Theta0=theta0, Sigma0=sigma0)
      else:
        _, M = graphical_lasso(emp_cov, alpha=self.sparsity_param,
                               verbose=self.verbose,
                               cov_init=sigma0)
      raised_error = None
      w_mahalanobis, _ = np.linalg.eigh(M)
      not_spd = any(w_mahalanobis < 0.)
    except Exception as e:
      raised_error = e
      not_spd = False  # not_spd is not applicable here, so we set it to False
    if raised_error is not None or not_spd:
      msg = ("There was a problem in SDML when using {}'s graphical "
             "lasso. ").format("skggm" if HAS_SKGGM else "scikit-learn")
      if not HAS_SKGGM:
        skggm_advice = ("skggm's graphical lasso can sometimes converge "
                        "on non SPD cases where scikit-learn's graphical "
                        "lasso fails to converge. Try to install skggm and "
                        "rerun the algorithm. (See the README.md for the "
                        "right version of skggm.) ")
        msg += skggm_advice
      if raised_error is not None:
        msg += "The following error message was thrown: {}.".format(
            raised_error)
      raise RuntimeError(msg)

Member Author:

I'll also treat infinite values that can be returned by skggm as a failure case.

Member Author:

Done, in commit eb95719

    _, M = graph_lasso(emp_cov, self.sparsity_param, verbose=self.verbose)

    self.transformer_ = transformer_from_metric(M)
    emp_cov = pinvh(prior) + self.balance_param * loss_matrix
Contributor:

Seems odd to round-trip the covariance matrix through two pinvh calls in the self.use_cov case.

Member:

I agree

Member Author:

Agreed, done

"and lead to degenerate solutions. "
"To prevent that, try to decrease the balance parameter "
"`balance_param` and/or to set use_covariance=False.",
ConvergenceWarning)
Contributor:

I'm fine with it.

"To prevent that, try to decrease the balance parameter "
"`balance_param` and/or to set use_covariance=False.",
ConvergenceWarning)
sigma0 = (V * (w - min(0, np.min(w)) + 1e-10)).dot(V.T)
Contributor:

Maybe simpler:

min_eigval = w.min()
if min_eigval < 0:
  warnings.warn(...)
  min_eigval = 0

w += 1e-10 - min_eigval
sigma0 = (V * w).dot(V.T)

Member Author:

In the heuristic, if min_eigval > 0 we want to keep the matrix untouched (with just an epsilon added), so I don't think your solution does that, since it would subtract min_eigval even when it is positive. But I agree something simpler would be better, so something like:

min_eigval = w.min()
if min_eigval < 0:
  warnings.warn(...)
  w -= min_eigval   # we translate the eigenvalues to make them all positive

w += 1e-10  # we add a small offset to avoid definiteness problems
sigma0 = (V * w).dot(V.T)
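For reference, a self-contained version of this heuristic (a sketch; the function name make_spd is illustrative):

    import warnings
    import numpy as np

    def make_spd(A, offset=1e-10):
        # eigendecomposition of the symmetric input matrix
        w, V = np.linalg.eigh(A)
        if w.min() < 0.:
            warnings.warn("Input matrix is not PSD; shifting its spectrum.")
            w -= w.min()  # translate the eigenvalues to make them all positive
        w += offset  # small offset to avoid definiteness problems
        return (V * w).dot(V.T)  # rebuild sigma0 = V diag(w) V^T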

Member Author:

done

README.rst (outdated)

@@ -20,7 +20,7 @@ Metric Learning algorithms in Python.
 **Dependencies**

 - Python 2.7+, 3.4+
-- numpy, scipy, scikit-learn
+- numpy, scipy, scikit-learn, and skggm (commit `a0ed406 <https://github.com/skggm/skggm/commit/a0ed406586c4364ea3297a658f415e13b5cbdaf8>`_) for `SDML`
Contributor:

I'd call out skggm separately as an optional dependency.

Member Author:

Thanks, I agree; I forgot to change that after making skggm optional.

setup.py (outdated)

@@ -38,6 +38,7 @@
   extras_require=dict(
       docs=['sphinx', 'shinx_rtd_theme', 'numpydoc'],
       demo=['matplotlib'],
+      sdml=['skggm']
Contributor:

Can we specify a commit hash here? Or maybe since their latest release is 0.2.8, we could specify skggm>=0.2.9.

Member Author:

I tried indeed, but didn't manage to make it work. But good idea, skggm>=0.2.9 is better than nothing here.

@bellet (Member) left a comment:

There are a few minor changes to make in addition to those requested by @perimosocordiae, but otherwise it looks good! Nice work.

Have you verified that it fixes #138 and #154? If so, please indicate this in the PR text.


"version.)")
warnings.warn(msg)
else:
print("SDML will use skggm's solver.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe clarify: skggm's graphical lasso solver

    for slice_idx in slices[trunc_data.shape[1]]:
      pairs = trunc_data[:, slice_idx, :]
      diffs = pairs[:, 1, :] - pairs[:, 0, :]
      to_keep = np.nonzero(diffs.ravel())
Member:

This is a bit difficult to parse. Why do we need these slices? I am a bit lazy to check, but maybe it should be made clearer even if it is less efficient (this is a small 1D dataset anyway, so we don't care).

Also, maybe removing things that are very close to being the same (as opposed to exactly the same) would be more robust, e.g. with a filter like the sketch below.
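A hypothetical near-duplicate filter, using the names from the test snippet above:

    import numpy as np

    # keep only pairs whose two points differ by more than a small
    # tolerance in at least one coordinate
    diffs = pairs[:, 1, :] - pairs[:, 0, :]
    to_keep = ~np.all(np.isclose(diffs, 0., atol=1e-8), axis=1)
    pairs = pairs[to_keep]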

Member Author:

I agree it's difficult to parse; I'll change the test, copying/pasting it for the quadruplets/pairs cases.

Member Author:

Done, in commit 31072d3. What do you think?

@wdevazelhes mentioned this pull request on Mar 18, 2019
@wdevazelhes (Member Author) commented Mar 18, 2019

Have you verified that it fixes #138 and #154? If so, please indicate this in the PR text.

@bellet I just checked, and it does not directly fix them, since both issues in fact occur in cases where the matrix is not SPD and there are convergence problems. So I think to solve them we should change the balance parameter. I'll open a different PR for that.

@wdevazelhes (Member Author):

Travis is failing because of a weird bug, I think related to pytest: when I run pytest -k test_sdml_raises_warning_non_psd, it passes, but it doesn't pass when I run pytest -k metric_learn_test (i.e. the whole test file). This is fixed in pytest 4, but pytest 4 raises errors on fixtures, so for now I would like to stay on pytest 3... I'll try to find something to make it work...

@wdevazelhes (Member Author) commented Mar 20, 2019

This can be fixed as soon as #186 is merged (i.e. if we move to the latest pytest).

@wdevazelhes (Member Author):

I think it should be ready to merge now! :)

@bellet bellet merged commit e8c74d0 into scikit-learn-contrib:master Mar 22, 2019