Differentiable 1-D, 2-D covariance (numpy.cov) clone #19037

@LemonPi

🚀 Feature

A differentiable way to calculate covariance for a tensor of random variables similar to numpy.cov.

Motivation

Sometimes the covariance of the data can be used to define a norm: instead of the standard x'y, which implicitly uses the identity matrix, we have x'Sy, where S is a covariance matrix (for example, between x and y). It would be good to have a differentiable way of calculating S so that gradients can be backpropagated through it.
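
As a rough illustration of the x'Sy form described above, here is a minimal sketch that builds an unweighted sample covariance from ordinary tensor operations and backpropagates through it. The data, the vectors x and y, and all variable names are made up for the example; it does not use the cov function proposed in this issue.

```python
import torch

# Hypothetical data: 5 observations (rows) of 3 variables (columns).
torch.manual_seed(0)
data = torch.randn(5, 3, requires_grad=True)

# Unweighted sample covariance (ddof = 1), built only from differentiable ops.
centered = data - data.mean(dim=0, keepdim=True)
S = centered.t() @ centered / (data.shape[0] - 1)

# Two arbitrary vectors in the variable space (illustrative only).
x = torch.randn(3)
y = torch.randn(3)

plain = x @ y          # standard x'y, i.e. S replaced by the identity
weighted = x @ S @ y   # x'Sy with the covariance matrix

# Because S depends on data through differentiable ops,
# gradients of x'Sy flow back to the data.
weighted.backward()
print(data.grad.shape)
```

Since S is a plain function of the data tensor, autograd handles the backward pass without any custom gradient code.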

Pitch

Here is my partial clone of numpy.cov, tested with pytest against the output of numpy.cov.

import torch


def cov(x, rowvar=False, bias=False, ddof=None, aweights=None):
    """Estimates covariance matrix like numpy.cov"""
    # Internally, each row is an observation and each column is a variable.
    if x.dim() == 1:
        # a 1-D input is a single variable with len(x) observations
        x = x.view(-1, 1)
    elif rowvar or x.shape[0] == 1:
        # rowvar=True: variables are in rows, so transpose.
        # numpy.cov also treats a single-row 2-D input as one variable.
        x = x.t()

    # default normalization is N - 1 (unbiased) unless bias is set
    if ddof is None:
        ddof = 0 if bias else 1

    w = aweights
    if w is not None:
        if not torch.is_tensor(w):
            # match the data's dtype and device
            w = torch.tensor(w, dtype=x.dtype, device=x.device)
        w_sum = torch.sum(w)
        avg = torch.sum(x * (w / w_sum)[:, None], 0)
    else:
        avg = torch.mean(x, 0)

    # Determine the normalization
    if w is None:
        fact = x.shape[0] - ddof
    elif ddof == 0:
        fact = w_sum
    else:
        fact = w_sum - ddof * torch.sum(w * w) / w_sum

    xm = x - avg  # avg broadcasts across the rows of x

    if w is None:
        X_T = xm.t()
    else:
        # scale each observation (row) by its weight
        X_T = (xm * w[:, None]).t()

    c = torch.mm(X_T, xm)
    c = c / fact

    return c.squeeze()
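
For reference, the normalization that the function above computes can be written out as follows. This is my reading of the code, not a separate derivation; the symbols x_i (the i-th observation), w_i (its aweights entry), V_1 and V_2 are introduced here only for illustration.

```latex
\mu = \frac{\sum_i w_i \, x_i}{V_1},
\qquad
C = \frac{\sum_i w_i \,(x_i - \mu)(x_i - \mu)^{\top}}
         {V_1 - \mathrm{ddof}\cdot V_2 / V_1},
\qquad
V_1 = \sum_i w_i,
\quad
V_2 = \sum_i w_i^{2}.
```

With all weights equal to 1 and N observations, V_1 = V_2 = N, so the denominator reduces to N - ddof, which is the unweighted case in the code.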

Testing

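import numpy as np
import torch


def assert_same_cov(A, w=None):
    # compare against numpy.cov with observations in rows (rowvar=False)
    c1 = np.cov(A, rowvar=False, aweights=w)
    c2 = cov(torch.tensor(A, dtype=torch.float), aweights=w)
    assert np.linalg.norm(c2.numpy() - c1) < 1e-6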


def test_cov():
    a = [1, 2, 3, 4]
    assert_same_cov(a)
    A = [[1, 2], [3, 4]]
    assert_same_cov(A)

    assert_same_cov(a, w=[1, 1, 1, 1])
    assert_same_cov(a, w=[2, 0.5, 3, 1])

    assert_same_cov(A, w=[1, 1])
    assert_same_cov(A, w=[2, 0.5])
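
The tests above only compare values against numpy.cov. Since the point of the request is differentiability, a gradient check could sit alongside them. The sketch below is one possible way to do that with torch.autograd.gradcheck; cov_sketch is a hypothetical, compact stand-in for the cov function (rows are observations, weights always given as a tensor) so that the example is self-contained.

```python
import torch


def cov_sketch(x, aweights=None, ddof=1):
    """Hypothetical compact stand-in for cov: rows are observations."""
    if aweights is None:
        # unit weights reproduce the unweighted estimate
        aweights = torch.ones(x.shape[0], dtype=x.dtype)
    w_sum = aweights.sum()
    avg = (x * (aweights / w_sum)[:, None]).sum(dim=0)
    xm = x - avg
    fact = w_sum - ddof * (aweights * aweights).sum() / w_sum
    return (xm * aweights[:, None]).t() @ xm / fact


def test_cov_is_differentiable():
    torch.manual_seed(0)
    # gradcheck compares analytic and numerical gradients; it needs double precision
    x = torch.randn(6, 3, dtype=torch.double, requires_grad=True)
    w = (torch.rand(6, dtype=torch.double) + 0.1).requires_grad_(True)

    # gradient with respect to the data
    assert torch.autograd.gradcheck(cov_sketch, (x,))
    # gradient with respect to the observation weights
    assert torch.autograd.gradcheck(lambda w_: cov_sketch(x.detach(), w_), (w,))


test_cov_is_differentiable()
```

gradcheck raises or returns False if the analytic gradient disagrees with finite differences, so a passing run suggests that the composition of tensor operations backpropagates correctly for both the data and the weights.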

Alternatives

I don't think there's an existing method to compute the covariance in PyTorch. Are there alternatives? And is it OK to implement this as a composition of existing tensor operations (instead of with explicit forward and backward methods)?

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @mruberry @rgommers @heitorschueroff

Metadata

Labels

enhancement - Not as big of a feature, but technically not a bug. Should be easy to fix
high priority
module: numpy - Related to numpy support, and also numpy compatibility of our operators
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
