
Kullback Leibler Divergence broadcasting no longer works #13707

Closed · swhalemwo opened this issue Mar 20, 2021 · 7 comments · Fixed by #13711

Labels: defect (a clear bug or issue that prevents SciPy from being installed or used as expected), scipy.stats
Milestone: 1.7.0

Comments
@swhalemwo commented Mar 20, 2021

I revisited some older code that calculates pairwise KLDs. Some years back I had implemented the calculation using NumPy broadcasting. I've now noticed that this approach no longer works: since commit 473dd08, scipy.stats.entropy requires the arrays to have identical shapes (previously only identical lengths were required).

@tupui (Member) commented Mar 20, 2021

Hi. Looking at the solution you linked, it seems it was more of a trick than something that was officially supported. Could you provide a minimal example of what you think should work?

@mdhaber (Contributor) commented Mar 20, 2021

I don't think it was really a trick. It's just that the code supported nd-array input, performing the operation along the last axis, without advertising it. Many SciPy functions support this sort of thing, but they will have an axis argument. @tupui, another good example is ttest_ind, which takes two nd arrays and does the t-test between corresponding rows (when axis is -1, or whatever the last axis is). Of course, those arrays shouldn't need to be the same shape explicitly; they can also rely on NumPy broadcasting rules. gh-13312 would add this behavior to many more hypothesis tests, and that could probably also be used here, but it would be better if we could make the code natively support vectorized operations like it used to.
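As a minimal sketch of that axis-based vectorization (the data and shapes here are made up purely for illustration; ttest_ind does accept nd input and an axis argument):

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, size=(3, 100))  # three samples, one per row
b = rng.normal(loc=0.5, size=(3, 100))

# axis=-1 runs one t-test per pair of corresponding rows,
# so statistic and pvalue each have shape (3,)
statistic, pvalue = ttest_ind(a, b, axis=-1)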

update: oops, this wasn't accurate. It's easier than that. I didn't realize the axis argument was already added. In that case, it should have performed the broadcasting for you rather than expecting the shapes to match.

@mdhaber (Contributor) commented Mar 20, 2021

This looks like an easy fix. It's still vectorized; we just need to broadcast the two arrays instead of requiring them to be the same shape. @swhalemwo take a look at np.broadcast_arrays in the meantime. Use that before passing the arrays in, and it should do the trick.
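As a minimal illustration of what np.broadcast_arrays does with shapes like those used for the pairwise KLD calculation (the arrays here are just placeholders):

import numpy as np

pk = np.ones((5, 3, 1))  # placeholder with the shape of distributions.T[:, :, None]
qk = np.ones((5, 1, 3))  # placeholder with the shape of distributions.T[:, None, :]

a, b = np.broadcast_arrays(pk, qk)
print(a.shape, b.shape)  # both (5, 3, 3), so the identical-shape check passes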

@swhalemwo (Author) commented

Thanks for the ideas! I have to admit I don't know how broadcasting works in detail, and my code has largely been copied together from Stack Overflow. Just to clarify my issue: previously the following code worked for calculating pairwise KLDs, but it now fails because of the shape requirement.

distributions = np.random.rand(3, 5)
distributions /= distributions.sum(axis=1, keepdims=True)  # normalize rows to sum to 1

pairwise_klds = entropy(distributions.T[:,:,None], distributions.T[:,None,:])

However, when I remove the shape requirement, it works just as before:

from scipy.special import rel_entr, entr

def entropy_custom(pk, qk=None, base=None, axis=0):
    """custom version of entropy without shape requirements"""
    pk = np.asarray(pk)
    pk = 1.0*pk / np.sum(pk, axis=axis, keepdims=True)
    if qk is None:
        vec = entr(pk)
    else:
        qk = np.asarray(qk)
        # if qk.shape != pk.shape:
        #     raise ValueError("qk and pk must have same shape.")
        qk = 1.0*qk / np.sum(qk, axis=axis, keepdims=True)
        vec = rel_entr(pk, qk)
    S = np.sum(vec, axis=axis)
    if base is not None:
        S /= np.log(base)
    return S

pairwise_klds = entropy_custom(distributions.T[:,:,None], distributions.T[:,None,:])

array([[0.        , 0.2012053 , 0.09129983],
       [0.14336347, 0.        , 0.30077942],
       [0.09786879, 0.54668651, 0.        ]])
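(For reference, entry [i, j] of this matrix is D_KL(p_i || p_j) = sum over k of p_i[k] * log(p_i[k] / p_j[k]), where k runs over the 5 bins along axis 0; that is why the diagonal is zero.)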

@mdhaber (Contributor) commented Mar 20, 2021

With the required imports and using explicit broadcasting, that's:

import numpy as np
from scipy.special import rel_entr, entr
from scipy.stats import entropy

distributions = np.random.rand(3, 5)
distributions /= distributions.sum(axis=1, keepdims=True)

# broadcasting before passing the arrays in works
a, b = np.broadcast_arrays(distributions.T[:,:,None], distributions.T[:,None,:])
pairwise_klds1 = entropy(a, b)

def entropy_custom(pk, qk=None, base=None, axis=0):
    """custom version of entropy without shape requirements"""
    pk = np.asarray(pk)
    pk = 1.0*pk / np.sum(pk, axis=axis, keepdims=True)
    if qk is None:
        vec = entr(pk)
    else:
        qk = np.asarray(qk)
        # if qk.shape != pk.shape:
        #     raise ValueError("qk and pk must have same shape.")
        qk = 1.0*qk / np.sum(qk, axis=axis, keepdims=True)
        vec = rel_entr(pk, qk)
    S = np.sum(vec, axis=axis)
    if base is not None:
        S /= np.log(base)
    return S

pairwise_klds2 = entropy_custom(distributions.T[:,:,None], distributions.T[:,None,:])
np.testing.assert_equal(pairwise_klds1, pairwise_klds2)

@mdhaber added the defect label on Mar 20, 2021
@mdhaber (Contributor) commented Mar 20, 2021

Addressed in gh-13711.

@swhalemwo (Author) commented

@mdhaber thanks for the explanation!

@tylerjereddy added this to the 1.7.0 milestone on Mar 30, 2021