how to normalize? #4

Closed

brando90 opened this issue Oct 26, 2021 · 5 comments

Comments

@brando90

brando90 commented Oct 26, 2021

Can you clarify what this means in an equation?

Specifically, for a raw representation A we first subtract the mean value from each column, then divide by the Frobenius norm, to produce the normalized representation A∗, used in all our dissimilarity computations. In this work we study dissimilarity measures d(A∗, B∗) that allow for quantitative comparisons of representations both within and across different networks.

Which of these two is it:


import torch
from torch import Tensor
from torch.linalg import norm


def _matrix_normalize(input: Tensor,
                      dim: int
                      ) -> Tensor:
    """
    Center, then divide by the Frobenius norm of the *raw* (uncentered) matrix
    (not by the standard deviation).

    Warning: this does not create standardized random variables from a random vector.

    Note: be careful with this, it makes CCA behave in unexpected ways.
    :param input:
    :param dim:
    :return:
    """
    return (input - input.mean(dim=dim, keepdim=True)) / norm(input, 'fro')

def _matrix_normalize_using_centered_data(X: Tensor, dim: int = 1) -> Tensor:
    """
    Center, then divide by the Frobenius norm of the *centered* matrix, following the
    similarity-preprocessing standard. Assumption is that X is of size [n, d];
    otherwise, specify which dimension to normalize along with dim.

    ref: https://stats.stackexchange.com/questions/544812/how-should-one-normalize-activations-of-batches-before-passing-them-through-a-si
    """
    X_centered: Tensor = X - X.mean(dim=dim, keepdim=True)
    X_star: Tensor = X_centered / norm(X_centered, "fro")
    return X_star

ref: https://stats.stackexchange.com/questions/544812/how-should-one-normalize-activations-of-batches-before-passing-them-through-a-si
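
For concreteness, here is a small toy check (my own sketch, assuming X is [n, d] and the per-column mean is subtracted) showing where the two candidates disagree:

import torch
from torch.linalg import norm

torch.manual_seed(0)
X = torch.randn(100, 32) + 5.0          # shift so the column means are clearly nonzero
Xc = X - X.mean(dim=0, keepdim=True)    # center each column

a = Xc / norm(X, 'fro')    # candidate 1: divide by the Frobenius norm of the raw matrix
b = Xc / norm(Xc, 'fro')   # candidate 2: divide by the Frobenius norm of the centered matrix

print(torch.allclose(a, b))   # False: the denominators differ once the mean is nonzero
print(norm(b, 'fro').item())  # 1.0: only candidate 2 lands exactly on the unit Frobenius sphere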

@js-d
Owner

js-d commented Oct 27, 2021

Hi @brando90, thanks for leaving a comment!

When we normalize, we divide by the Frobenius norm of the centered matrix.
This happens here in the code.

This corresponds to _matrix_normalize_using_centered_data, not to _matrix_normalize.
(with one minor difference: it seems like in _matrix_normalize_using_centered_data, you can change the dimension from dim=1 to something else, while we always set it to 1)
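
In equation form (my paraphrase of the above, assuming A is n × d with rows indexing examples and the per-column mean subtracted, as in the paper's wording):

A^{*} = \frac{A - \bar{A}}{\lVert A - \bar{A} \rVert_F}, \qquad \bar{A}_{ij} = \frac{1}{n} \sum_{k=1}^{n} A_{kj}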

@brando90
Author

Hi @brando90, thanks for leaving a comment!

When we normalize, we divide by the Frobenius norm of the centered matrix. This happens here in the code.

This corresponds to _matrix_normalize_using_centered_data, not to _matrix_normalize. (with one minor difference: it seems like in _matrix_normalize_using_centered_data, you can change the dimension from dim=1 to something else, while we always set it to 1)

Thanks!

Just to double-confirm, this is done for all metrics, not just orthogonal Procrustes distance (OPD), right?

Thanks again and really interesting paper + work.

@brando90
Author

brando90 commented Oct 27, 2021

Btw @js-d, if you are curious, I think only OPD should divide by the Frobenius norm of the centered data. For CCA, the data is already implicitly normalized by the variance, so it's not needed to normalize again. Furthermore, normalizing by the Frobenius norm maps onto the unit sphere, which might increase the similarity artificially. However, dividing by the std usually does not have that effect; instead it restricts the data to the unit sphere on average (which doesn't lose information the way division by the Frobenius norm does).

I am not familiar enough with CKA right now, so I am unsure what to do there, but my gut feeling is that it already divides by the Frobenius norm in the equation itself, so I'd expect dividing there to make no difference.

For all distances, centering should always be done and everything computed relative to that! In general I think I prefer dividing by the sqrt of the variance / std (or something like that) rather than the Frobenius norm. But I admit all my sanity checks pass with division by the Frobenius norm, so it might not matter too much. My 2 cents.
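
To illustrate the CKA point, here is a rough sketch of linear CKA written from its usual definition (my own sketch, not this repo's implementation); rescaling either input cancels in the ratio, so an extra Frobenius-norm division shouldn't change it:

import torch
from torch.linalg import norm

def linear_cka(X: torch.Tensor, Y: torch.Tensor) -> torch.Tensor:
    """Linear CKA on column-centered X [n, d1] and Y [n, d2]:
    ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)."""
    X = X - X.mean(dim=0, keepdim=True)
    Y = Y - Y.mean(dim=0, keepdim=True)
    return norm(Y.T @ X, 'fro') ** 2 / (norm(X.T @ X, 'fro') * norm(Y.T @ Y, 'fro'))

torch.manual_seed(0)
X, Y = torch.randn(100, 32), torch.randn(100, 16)
print(torch.allclose(linear_cka(X, Y), linear_cka(3.7 * X, Y)))  # True: scale-invariant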

@js-d
Owner

js-d commented Oct 27, 2021

Just to double-confirm, this is done for all metrics, not just orthogonal Procrustes distance (OPD), right?

Yes, we do this for all metrics: in the code, we normalize rep1 and rep2 just before we apply the different metrics (CCA-based, CKA-based, OPD).

@brando90
Author

Just to double-confirm, this is done for all metrics, not just orthogonal Procrustes distance (OPD), right?

Yes, we do this for all metrics: in the code, we normalize rep1 and rep2 just before we apply the different metrics (CCA-based, CKA-based, OPD).

Thanks for the reply!

js-d closed this as completed on Jan 21, 2022