how to normalize? #4
Comments
Hi @brando90, thanks for leaving a comment! When we normalize, we divide by the Frobenius norm of the centered matrix. This corresponds to X ↦ (X − X̄) / ‖X − X̄‖_F.
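For concreteness, a minimal sketch of that normalization in NumPy (the function name is illustrative, not the repo's actual API):

```python
import numpy as np

def center_and_normalize(X: np.ndarray) -> np.ndarray:
    """Center each column, then divide by the Frobenius norm of the centered matrix."""
    X_centered = X - X.mean(axis=0, keepdims=True)
    return X_centered / np.linalg.norm(X_centered, ord="fro")

# After this step the representation has zero column means and unit Frobenius norm.
X = np.random.default_rng(0).normal(size=(100, 10))
Y = center_and_normalize(X)
assert np.isclose(np.linalg.norm(Y, ord="fro"), 1.0)
```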
Thanks! Just to confirm: this is done for all metrics, not just the orthogonal Procrustes distance (OPD), right? Thanks again, and really interesting paper and work.
btw @js-d, if you are curious: I think only OPD should divide by the Frobenius norm of the centered data. For CCA the data is already implicitly normalized by the variance, so there is no need to normalize again. Furthermore, normalizing by the Frobenius norm maps everything onto the unit circle, which might increase the similarity artificially. Dividing by the std usually does not have that effect; instead it restricts the data on average to lie in the unit circle (which doesn't lose information the way Frobenius norm division does). I am not familiar enough with CKA right now to be sure, but my gut feeling is that it already divides by the Frobenius norm in the equation itself, so dividing there I'd expect to make no difference. For all distances, centering should always be done and everything computed relative to that! In general I prefer dividing by the sqrt of the variance (i.e. the std, or something like that) rather than the Frobenius norm. But I admit all my sanity checks pass with division by the Frobenius norm, so it might not matter too much. My 2 cents.
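To illustrate the alternative suggested above (illustrative name, not the repo's API): dividing by the per-feature standard deviation keeps each feature at unit variance rather than forcing the whole matrix to unit Frobenius norm.

```python
import numpy as np

def center_and_standardize(X: np.ndarray) -> np.ndarray:
    """Center each column, then divide it by its standard deviation (z-scoring)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    return Xc / Xc.std(axis=0, keepdims=True)

# Features with very different scales all end up with unit variance,
# but the overall matrix norm is not forced to 1.
X = np.random.default_rng(0).normal(size=(200, 4)) * np.array([1.0, 5.0, 10.0, 0.1])
Z = center_and_standardize(X)
print(Z.std(axis=0))  # approximately [1, 1, 1, 1]
```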
Can you clarify what this means as an equation? Which of these two is it?
ref: https://stats.stackexchange.com/questions/544812/how-should-one-normalize-activations-of-batches-before-passing-them-through-a-si