Simplify KL divergence to (reduced) cross-entropy #369
This PR simplifies the actual Kullback-Leibler (KL) divergence calculated by the former `family$kl()` functions to the corresponding cross-entropy (now calculated by `family$ce()` functions). That is, the reference model's negative entropy is dropped (when regarding the KL divergence as the sum of the reference model's negative entropy and the cross-entropy of the submodel with respect to the reference model) or, equivalently, the reference model's entropy is dropped (when regarding the KL divergence as the cross-entropy of the submodel with respect to the reference model minus the reference model's entropy). Furthermore, for some families, the actual cross-entropy is reduced further to only those terms which would not cancel out when calculating the KL divergence. In the case of the Gaussian family, that reduced cross-entropy is modified further, yielding merely a proxy.
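For reference, the standard decomposition this relies on (textbook identity, not projpred-specific code) is

$$
\mathrm{KL}(p_{\mathrm{ref}} \,\|\, q_{\mathrm{sub}})
= \underbrace{-\,\mathbb{E}_{p_{\mathrm{ref}}}\!\left[\log q_{\mathrm{sub}}\right]}_{\mathrm{CE}(p_{\mathrm{ref}},\, q_{\mathrm{sub}})}
\;-\; \underbrace{\left(-\,\mathbb{E}_{p_{\mathrm{ref}}}\!\left[\log p_{\mathrm{ref}}\right]\right)}_{\mathrm{H}(p_{\mathrm{ref}})} ,
$$

so dropping the reference model's entropy $\mathrm{H}(p_{\mathrm{ref}})$ (equivalently, its negative entropy, depending on which way the decomposition is written) leaves only the cross-entropy $\mathrm{CE}(p_{\mathrm{ref}}, q_{\mathrm{sub}})$.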
The reason for all this is consistency in the case of custom reference models: previously, for the actual KL divergence, projpred assumed that the reference model was of the same family as the submodel. Typically, this is the case, but in general, custom reference models do not need to satisfy this assumption. Omitting the reference model's (negative) entropy from the actual KL divergence is not a problem because the actual KL divergence (output element `kl` of `.init_submodel()`, now called element `ce`) was only used in `search_forward()`, where it was minimized over all submodels of a given model size. Since the (negative) entropy of the reference model is a constant there, this PR is able to drop it without affecting the minimization (see the sketch below). In fact, the actual KL divergence was also passed forward to `varsel()`'s and `cv_varsel()`'s output, but there it didn't seem to be used apart from unit tests (which are adapted by this PR as necessary).
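A minimal, self-contained R sketch (toy discrete distributions, not projpred's internal code) illustrating why dropping the constant reference entropy does not change which candidate minimizes the criterion:

```r
# Toy illustration: for a fixed reference distribution p_ref, minimizing
# KL(p_ref || q) over candidate submodels q is equivalent to minimizing
# the cross-entropy CE(p_ref, q), because H(p_ref) is a constant.
p_ref <- c(0.2, 0.5, 0.3)

# Three hypothetical candidate submodel distributions:
q_candidates <- list(
  q1 = c(0.10, 0.60, 0.30),
  q2 = c(0.25, 0.45, 0.30),
  q3 = c(1, 1, 1) / 3
)

cross_entropy <- function(p, q) -sum(p * log(q))
kl_divergence <- function(p, q) sum(p * log(p / q))

ce_vals <- sapply(q_candidates, cross_entropy, p = p_ref)
kl_vals <- sapply(q_candidates, kl_divergence, p = p_ref)

# Both criteria select the same candidate; they differ only by the
# constant entropy of p_ref:
which.min(ce_vals) == which.min(kl_vals)                    # TRUE
all.equal(kl_vals, ce_vals - (-sum(p_ref * log(p_ref))))    # TRUE
```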