add option to normalize embeddings #177
@blakechi, do you have an idea how best to add this to the differentiable head? Somewhere around here: line 232 at commit 5313939.
Can we just change outputs = self.model_body(features) to outputs = self.model_body.encode(features, normalize_embeddings=self.normalize_embeddings)?
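One caveat on the proposed swap: in the sentence-transformers versions I am aware of, SentenceTransformer.encode() runs the model under torch.no_grad(), so calling it inside the differentiable head's training forward pass would stop gradients from reaching the body. A minimal sketch of an alternative, assuming the body's forward pass yields a pooled embedding tensor (such as outputs["sentence_embedding"]), is to L2-normalize that tensor with torch.nn.functional.normalize inside the head. NormalizedHead, in_features and num_classes are hypothetical names, and a random tensor stands in for the real body output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NormalizedHead(nn.Module):
    """Hypothetical linear classification head with optional L2 normalization."""

    def __init__(self, in_features: int, num_classes: int, normalize_embeddings: bool = False):
        super().__init__()
        self.normalize_embeddings = normalize_embeddings
        self.linear = nn.Linear(in_features, num_classes)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        if self.normalize_embeddings:
            # Scale each row to unit L2 norm. This stays inside the autograd graph,
            # so gradients still flow back to whatever produced `embeddings`.
            embeddings = F.normalize(embeddings, p=2, dim=1)
        return self.linear(embeddings)


# Stand-in for the pooled embeddings from a forward pass of the model body.
torch.manual_seed(0)
body_output = torch.randn(4, 8, requires_grad=True)

head = NormalizedHead(in_features=8, num_classes=2, normalize_embeddings=True)
logits = head(body_output)
logits.sum().backward()

print(logits.shape)                  # torch.Size([4, 2])
print(body_output.grad is not None)  # True
```

Where the flag lives (head or model) is a design choice; the sketch only shows that normalization can be applied without leaving the autograd graph.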
Hi @PhilipMay, sorry for the late reply. It's not quite the same case, but I tried adding a …
(Commits updated from eb35a00 to ac0c2aa.)
It looks good to me :)
Just curious: have you tested the performance, or is this for another purpose?
I tested performance with the "normal" logistic regression from scikit-learn. @blakechi, do you have an idea why? PS: Testing the differentiable head is still on my to-do list.
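For anyone wanting to repeat this kind of check, a minimal sketch of comparing scikit-learn's LogisticRegression on raw versus L2-normalized embeddings follows. compare_normalization is a hypothetical helper, and the random arrays are placeholders only; substitute real sentence embeddings and labels. No results are implied.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import normalize


def compare_normalization(embeddings, labels, cv=5):
    """Return mean cross-validated accuracy on raw vs. L2-normalized embeddings."""
    clf = LogisticRegression(max_iter=1000)
    raw = cross_val_score(clf, embeddings, labels, cv=cv).mean()
    # normalize() rescales each row (each embedding) to unit L2 norm.
    normed = cross_val_score(clf, normalize(embeddings, norm="l2"), labels, cv=cv).mean()
    return raw, normed


# Placeholder data only; replace with real embeddings and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))
y = rng.integers(0, 2, size=100)

raw_acc, norm_acc = compare_normalization(X, y)
print(f"raw: {raw_acc:.3f}  normalized: {norm_acc:.3f}")
```

Rescaling the inputs also changes how strongly LogisticRegression's default L2 penalty acts, so such a comparison reflects both the lost magnitude and the shifted regularization.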
From my point of view, this can be merged.
I don't have a clear answer, but my guess is that the magnitude of the embeddings might contain information that helps classification, something like how often each word occurs? By the way, would it be better to expose embedding normalization to users only once we find it beneficial?
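To make the magnitude hypothesis concrete: L2 normalization keeps only an embedding's direction, so two embeddings that differ only in length become identical and any signal carried by the norm is gone. A minimal NumPy sketch, where l2_normalize is a hypothetical helper:

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row of x to unit L2 norm (eps guards against division by zero)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


a = np.array([[1.0, 2.0, 2.0]])  # norm 3
b = 2.0 * a                      # same direction, norm 6

print(np.linalg.norm(a), np.linalg.norm(b))           # 3.0 6.0
print(np.allclose(l2_normalize(a), l2_normalize(b)))  # True
```

A classifier trained on the normalized vectors therefore cannot tell a and b apart, whereas one trained on the raw vectors can.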
I would say that ML is a field that requires experimentation. You never know in advance exactly what helps and what doesn't.
Thanks for adding this option, @PhilipMay! Since it doesn't interfere with the existing training, I'm happy to include it :)
Good point. I hadn't noticed that, and it makes sense to me!
Add option to normalize embeddings.