how to normalize? #4

Closed

brando90 opened this issue Oct 26, 2021 · 5 comments

Comments

@brando90

brando90 commented Oct 26, 2021

Can you clarify what this means in an equation?

Specifically, for a raw representation A we first subtract the mean value from each column, then divide by the Frobenius norm, to produce the normalized representation A∗, used in all our dissimilarity computations. In this work we study dissimilarity measures d(A∗, B∗) that allow for quantitative comparisons of representations both within and across different networks.

Which of these two is it:


import torch
from torch import Tensor
from torch.linalg import norm


def _matrix_normalize(input: Tensor,
                      dim: int
                      ) -> Tensor:
    """
    Center, then divide by the Frobenius norm of the *raw* (uncentered) matrix
    (not by the standard deviation).

    Warning: this does not create standardized random variables from a random vector.

    Note: be careful with this, it makes CCA behave in unexpected ways.
    :param input:
    :param dim:
    :return:
    """
    return (input - input.mean(dim=dim, keepdim=True)) / norm(input, 'fro')

def _matrix_normalize_using_centered_data(X: Tensor, dim: int = 1) -> Tensor:
    """
    Center, then divide by the Frobenius norm of the *centered* matrix, following the
    similarity-preprocessing standard. Assumption is that X is of size [n, d];
    otherwise, specify which dimension to normalize along with dim.

    ref: https://stats.stackexchange.com/questions/544812/how-should-one-normalize-activations-of-batches-before-passing-them-through-a-si
    """
    X_centered: Tensor = X - X.mean(dim=dim, keepdim=True)
    X_star: Tensor = X_centered / norm(X_centered, "fro")
    return X_star

ref: https://stats.stackexchange.com/questions/544812/how-should-one-normalize-activations-of-batches-before-passing-them-through-a-si
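
For concreteness, here is a small toy check (my own sketch, assuming X is [n, d] and the per-column mean is subtracted) showing where the two candidates disagree:

import torch
from torch.linalg import norm

torch.manual_seed(0)
X = torch.randn(100, 32) + 5.0          # shift so the column means are clearly nonzero
Xc = X - X.mean(dim=0, keepdim=True)    # center each column

a = Xc / norm(X, 'fro')    # candidate 1: divide by the Frobenius norm of the raw matrix
b = Xc / norm(Xc, 'fro')   # candidate 2: divide by the Frobenius norm of the centered matrix

print(torch.allclose(a, b))   # False: the denominators differ once the mean is nonzero
print(norm(b, 'fro').item())  # 1.0: only candidate 2 lands exactly on the unit Frobenius sphere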

@js-d
Owner

js-d commented Oct 27, 2021

Hi @brando90, thanks for leaving a comment!

When we normalize, we divide by the Frobenius norm of the centered matrix.
This happens here in the code.

This corresponds to _matrix_normalize_using_centered_data, not to _matrix_normalize.
(with one minor difference: it seems like in _matrix_normalize_using_centered_data, you can change the dimension from dim=1 to something else, while we always set it to 1)
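
In equation form (my paraphrase of the above, assuming A is n × d with rows indexing examples and the per-column mean subtracted, as in the paper's wording):

A^{*} = \frac{A - \bar{A}}{\lVert A - \bar{A} \rVert_F}, \qquad \bar{A}_{ij} = \frac{1}{n} \sum_{k=1}^{n} A_{kj}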

@brando90
Author

Hi @brando90, thanks for leaving a comment!

When we normalize, we divide by the Frobenius norm of the centered matrix. This happens here in the code.

This corresponds to _matrix_normalize_using_centered_data, not to _matrix_normalize. (with one minor difference: it seems like in _matrix_normalize_using_centered_data, you can change the dimension from dim=1 to something else, while we always set it to 1)

Thanks!

Just to double-confirm, this is done for all metrics, not just orthogonal Procrustes distance (OPD), right?

Thanks again and really interesting paper + work.

@brando90
Author

brando90 commented Oct 27, 2021

Btw @js-d, if you are curious, I think only OPD should divide by the Frobenius norm of the centered data. For CCA, the data is already implicitly normalized by the variance, so it's not needed to normalize again. Furthermore, normalizing by the Frobenius norm maps onto the unit sphere, which might increase the similarity artificially. However, dividing by the std usually does not have that effect; instead it restricts the data to the unit sphere on average (which doesn't lose information the way division by the Frobenius norm does).

I am not familiar enough with CKA right now, so I am unsure what to do there, but my gut feeling is that it already divides by the Frobenius norm in the equation itself, so I'd expect dividing there to make no difference.

For all distances, centering should always be done and everything computed relative to that! In general I think I prefer dividing by the sqrt of the variance / std (or something like that) rather than the Frobenius norm. But I admit all my sanity checks pass with division by the Frobenius norm, so it might not matter too much. My 2 cents.
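
To illustrate the CKA point, here is a rough sketch of linear CKA written from its usual definition (my own sketch, not this repo's implementation); rescaling either input cancels in the ratio, so an extra Frobenius-norm division shouldn't change it:

import torch
from torch.linalg import norm

def linear_cka(X: torch.Tensor, Y: torch.Tensor) -> torch.Tensor:
    """Linear CKA on column-centered X [n, d1] and Y [n, d2]:
    ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)."""
    X = X - X.mean(dim=0, keepdim=True)
    Y = Y - Y.mean(dim=0, keepdim=True)
    return norm(Y.T @ X, 'fro') ** 2 / (norm(X.T @ X, 'fro') * norm(Y.T @ Y, 'fro'))

torch.manual_seed(0)
X, Y = torch.randn(100, 32), torch.randn(100, 16)
print(torch.allclose(linear_cka(X, Y), linear_cka(3.7 * X, Y)))  # True: scale-invariant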

@js-d
Owner

js-d commented Oct 27, 2021

Just to double-confirm, this is done for all metrics, not just orthogonal Procrustes distance (OPD), right?

Yes, we do this for all metrics: in the code, we normalize rep1 and rep2 just before we apply the different metrics (CCA-based, CKA-based, OPD).

@brando90
Author

Just to double-confirm, this is done for all metrics, not just orthogonal Procrustes distance (OPD), right?

Yes, we do this for all metrics: in the code, we normalize rep1 and rep2 just before we apply the different metrics (CCA-based, CKA-based, OPD).

Thanks for the reply!

js-d closed this as completed on Jan 21, 2022