Why Normalizing Flows Fail to Detect Out-of-Distribution Data #74

howardyclo opened this issue Jan 16, 2021 · 0 comments

Metadata

  • Authors: Polina Kirichenko, Pavel Izmailov, Andrew Gordon Wilson
  • Conference: NeurIPS 2020

Background: Training and sampling in Flows

  • Training and computing p(x): f: x -> z --> Feed x into the flow f to get z, then maximize log_prob(x) = log_prob(z) + log_det_jacobian(f(x)), where log_prob(z) is usually the log-likelihood of a standard Normal distribution N(z; mean=0, std=1).
  • Sampling: f⁻¹: z -> x --> Sample z from the prior distribution (i.e., the Normal distribution), then feed it through the inverse function f⁻¹ to get x. (A code sketch of both operations follows this list.)
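
Below is a minimal, hypothetical sketch of both operations in PyTorch. The single elementwise affine transform stands in for a real coupling-based flow (e.g., RealNVP/Glow); `AffineFlow` and `log_prob` are illustrative names, not from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical minimal flow: one elementwise affine transform.
# Real flows stack many coupling layers instead.
class AffineFlow(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        # f: x -> z, plus log|det Jacobian| of f at x
        z = (x - self.shift) * torch.exp(-self.log_scale)
        log_det = -self.log_scale.sum().expand(x.shape[0])
        return z, log_det

    def inverse(self, z):
        # f⁻¹: z -> x (used for sampling)
        return z * torch.exp(self.log_scale) + self.shift

def log_prob(flow, x):
    # Change of variables: log p(x) = log p(z) + log|det df/dx|
    z, log_det = flow(x)
    prior = torch.distributions.Normal(0.0, 1.0)  # N(z; mean=0, std=1)
    return prior.log_prob(z).sum(dim=1) + log_det

# Training: maximize log p(x) on training data (MLE).
flow = AffineFlow(dim=2)
opt = torch.optim.Adam(flow.parameters(), lr=1e-2)
x_train = torch.randn(256, 2) * 2.0 + 1.0  # toy in-distribution data
for _ in range(200):
    loss = -log_prob(flow, x_train).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Sampling: z ~ N(0, I), then x = f⁻¹(z).
with torch.no_grad():
    samples = flow.inverse(torch.randn(16, 2))
```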

TL;DR

  • Normalizing flows can compute the exact p(x), so we can train them by MLE (i.e., assign high density to training data).
  • However, most of the time p(x) fails to distinguish out-of-distribution (OOD) data from in-distribution data.
  • MLE training has limited influence on OOD detection: models are only trained to assign high density to training data, not to assign low density to OOD data.
  • That is, flows are trained to generate data, and this objective does not require learning semantics. Learning pixel correlations (e.g., nearby pixels have similar colors) is enough to generate high-quality images.
  • Whether data is in- or out-of-distribution is mainly determined by its semantics (i.e., label y), not by its pixel correlations.
  • The inductive bias of normalizing flows (the paper mainly studies coupling-layer-based NNs): they learn pixel correlations instead of semantics, which is why flows fail to detect OOD data.
  • Given image embeddings pretrained on images and labels, flows can successfully detect OOD data from those embeddings.
  • They study the intermediate outputs of the affine coupling layers by injecting different masks (e.g., the checkerboard mask, the horizontal mask, and their proposed cycle mask). With the first two masks, even when applied to intermediate layers, flows can still learn to predict pixels from their neighbors. With the proposed cycle mask, flows cannot easily predict pixels from their neighbors, and thus achieve successful OOD detection. However, since neighbors are no longer available to predict from, generation quality suffers. (A tradeoff between OOD detection and high-quality image generation?) See the mask sketch below.
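
A rough sketch of the masking idea in NumPy. The checkerboard and horizontal masks below follow the standard RealNVP-style construction; the paper's cycle mask is not reproduced here, and `scale_net`/`shift_net` are hypothetical placeholders for the coupling layer's networks.

```python
import numpy as np

def checkerboard_mask(h, w):
    # Alternating 0/1 pattern: every transformed pixel (mask == 0) has its
    # 4-neighbors in the conditioning set (mask == 1), so the coupling
    # networks can predict a pixel from its immediate neighbors.
    return (np.indices((h, w)).sum(axis=0) % 2).astype(np.float64)

def horizontal_mask(h, w):
    # Top half conditions, bottom half is transformed: pixels near the
    # boundary still have nearby conditioning pixels to copy from.
    m = np.zeros((h, w))
    m[: h // 2] = 1.0
    return m

def affine_coupling_forward(x, mask, scale_net, shift_net):
    # Affine coupling layer: the masked part passes through unchanged and
    # parameterizes an elementwise affine transform of the rest.
    x_id = x * mask
    s, t = scale_net(x_id), shift_net(x_id)
    z = x_id + (1.0 - mask) * (x * np.exp(s) + t)
    log_det = np.sum((1.0 - mask) * s)  # log|det Jacobian| of the layer
    return z, log_det
```

With the checkerboard or horizontal mask, each transformed pixel has conditioning pixels right next to it, which is exactly the local-correlation shortcut described above; the cycle mask removes those nearby neighbors, trading generation quality for OOD detection.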