Simplify KL divergence to (reduced) cross-entropy #369

Merged
merged 3 commits into stan-dev:master from kl2ce on Nov 24, 2022

Conversation

fweber144
Collaborator

This PR simplifies the actual Kullback-Leibler (KL) divergence calculated by the former family$kl() functions to the corresponding cross-entropy (now calculated by family$ce() functions). That is, the reference model's negative entropy is dropped, viewing the KL divergence as the sum of the reference model's negative entropy and the cross-entropy of the submodel with respect to the reference model, or equivalently, the reference model's entropy is dropped, viewing the KL divergence as that cross-entropy minus the reference model's entropy.
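In symbols (with p_ref denoting the reference model's predictive distribution and q_sub a submodel's; this notation is only for illustration, not used in the code itself):

$$
\mathrm{KL}(p_{\mathrm{ref}} \,\|\, q_{\mathrm{sub}})
  = \mathbb{E}_{p_{\mathrm{ref}}}\!\left[\log p_{\mathrm{ref}}(y)\right]
    - \mathbb{E}_{p_{\mathrm{ref}}}\!\left[\log q_{\mathrm{sub}}(y)\right]
  = -H(p_{\mathrm{ref}}) + \mathrm{CE}(p_{\mathrm{ref}}, q_{\mathrm{sub}}).
$$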

Furthermore, for some families, the actual cross-entropy is reduced to only those terms that would not cancel out when calculating the KL divergence. In the case of the Gaussian family, this reduced cross-entropy is modified further, so it is merely a proxy.
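As a generic illustration of such a cancellation (not necessarily the exact expressions implemented in projpred): for two univariate Gaussians, the cross-entropy and the reference model's entropy are

$$
\mathrm{CE}\!\left(\mathcal{N}(\mu_r,\sigma_r^2),\, \mathcal{N}(\mu_s,\sigma_s^2)\right)
  = \tfrac{1}{2}\log\!\left(2\pi\sigma_s^2\right)
    + \frac{\sigma_r^2 + (\mu_r - \mu_s)^2}{2\sigma_s^2},
\qquad
H\!\left(\mathcal{N}(\mu_r,\sigma_r^2)\right) = \tfrac{1}{2}\log\!\left(2\pi e\,\sigma_r^2\right),
$$

so the constant $\tfrac{1}{2}\log(2\pi)$ cancels in $\mathrm{KL} = \mathrm{CE} - H$ and need not be carried along.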

The reason for all of this is consistency in custom reference models: previously, for the actual KL divergence, projpred assumed that the reference model was of the same family as the submodel. Typically this is the case, but in general, custom reference models do not need to fulfill this assumption. Omitting the reference model's (negative) entropy from the actual KL divergence is not a problem because the actual KL divergence (output element kl of .init_submodel(), now called element ce) was only used in search_forward(), where it was minimized over all submodels of a given model size. Since the reference model's (negative) entropy is a constant there, this PR can drop it without affecting the minimization (see the sketch below). In fact, the actual KL divergence was also passed forward to the output of varsel() and cv_varsel(), but there it did not seem to be used apart from unit tests (which this PR adapts as necessary).
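A minimal sketch (toy R code, not projpred internals) of why dropping the constant entropy term does not affect the search: for a fixed reference distribution, the submodel that minimizes the KL divergence also minimizes the cross-entropy.

```r
## Toy example: p_ref is a hypothetical reference distribution over three
## categories, q_sub holds two hypothetical submodel candidates.
p_ref <- c(0.5, 0.3, 0.2)
q_sub <- list(q1 = c(0.6, 0.3, 0.1),
              q2 = c(0.4, 0.4, 0.2))

## Cross-entropy CE(p_ref, q) and KL divergence KL(p_ref || q):
ce <- sapply(q_sub, function(q) -sum(p_ref * log(q)))
kl <- sapply(q_sub, function(q) sum(p_ref * log(p_ref / q)))

## KL differs from CE only by the constant (negative) entropy of p_ref ...
all.equal(kl, ce + sum(p_ref * log(p_ref)))
## ... so both criteria select the same submodel:
stopifnot(identical(which.min(ce), which.min(kl)))
```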

@fweber144 fweber144 changed the title from "Actual KL divergence to cross-entropy" to "Reduce KL divergence to cross-entropy" on Nov 24, 2022
@fweber144 fweber144 changed the title from "Reduce KL divergence to cross-entropy" to "Reduce KL divergence to (reduced) cross-entropy" on Nov 24, 2022
@fweber144 fweber144 changed the title from "Reduce KL divergence to (reduced) cross-entropy" to "KL divergence to (reduced) cross-entropy" on Nov 24, 2022
@fweber144 fweber144 changed the title from "KL divergence to (reduced) cross-entropy" to "Simplify KL divergence to (reduced) cross-entropy" on Nov 24, 2022
@fweber144 fweber144 merged commit cb966c5 into stan-dev:master Nov 24, 2022
@fweber144 fweber144 deleted the kl2ce branch November 24, 2022 08:53