Conversation
|
Pixi is extremely annoying. It has negative returns on productivity as far as I am concerned. |
|
I have an intriguing segfault due to PyTorch that I can't reproduce locally with pixi or mamba. I'd be happy to get feedback on this. I suspect pixi doesn't download torch along with sentence-transformers from PyPI. |
|
Ping @jovan-stojanovic :) |
|
@GaelVaroquaux can we merge this one? |
|
I'm having a look right now. I played with the example on my computer. It's really cool. I've created a small PR to your branch here: Vincent-Maladiere#1. Also, there is a conflict with main, which means that we cannot merge. |
Focus on the string encoding, rather than the sentiment analysis
GaelVaroquaux
left a comment
LGTM, but please merge my doc PR Vincent-Maladiere#1 :)
|
Nice!!! thanks @Vincent-Maladiere ! very exciting -- let's try to release it soon :) 🚀 |
|
Very exciting indeed! Bravo! |
|
Darn, we have a build failure on CircleCI on main:
Yeah, I restarted it once already :/ It doesn't seem related to a PR, but rather to
example 7 getting killed later (at least in the first failure I saw)
|
|
Yes indeed. It fails reliably. I fear that it might be related to the PR: some nasty interaction between examples. We'll need to bisect to know whether it's related to the dependencies that are being installed, or the code that is being run. I've created an issue to track this: #1143 |
|
Memory issues? |
|
I can try to bisect on this tomorrow! |

closes #1047
Reference
#1047
What does this PR propose?
This PR wraps the SentenceTransformers library, in the same fashion as ragger-duck and embetter do.
Following the good empirical baselines of arxiv.org/abs/2312.09634, this PR also proposes to:
n_components parameter. This also allows the class to stay consistent with the other skrub encoders and perform HP tuning more easily. Following the paper, we set n_components=30 by default.
What the SentenceEncoder does not do:
See the implementation of the paper for more details.
Todo in another PR
Enriching an existing example from skrub with the SentenceEncoder.
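For readers who want a concrete picture of the n_components idea described in this PR, here is a minimal sketch, not the code of this PR. It assumes the dimensionality reduction is a PCA applied to the sentence embeddings, which this description does not spell out. The class name TinySentenceEncoder, the embed_fn hook, and the default model name "all-MiniLM-L6-v2" are all assumptions made for illustration. The embed_fn hook only exists so that the sketch can be tried without downloading a model.

```python
import numpy as np


class TinySentenceEncoder:
    """Hypothetical sketch: embed strings, then keep n_components PCA axes.

    This is NOT the skrub SentenceEncoder; names and defaults are assumptions.
    """

    def __init__(self, model_name="all-MiniLM-L6-v2", n_components=30, embed_fn=None):
        self.model_name = model_name      # assumed default, any model name works
        self.n_components = n_components  # 30 by default, as in the PR text
        # Hypothetical hook: a callable mapping list[str] -> 2D array-like.
        self.embed_fn = embed_fn

    def _embed(self, texts):
        texts = list(texts)
        if self.embed_fn is not None:
            return np.asarray(self.embed_fn(texts), dtype=float)
        # Lazy import: the heavy dependency is only needed on this path.
        from sentence_transformers import SentenceTransformer

        if not hasattr(self, "model_"):
            self.model_ = SentenceTransformer(self.model_name)
        return np.asarray(self.model_.encode(texts), dtype=float)

    def fit(self, texts, y=None):
        X = self._embed(texts)
        self.mean_ = X.mean(axis=0)
        # PCA through an SVD of the centered embeddings:
        # the rows of Vt are the principal axes, sorted by explained variance.
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        k = min(self.n_components, Vt.shape[0])
        self.components_ = Vt[:k]
        return self

    def transform(self, texts):
        # Project the centered embeddings onto the axes learned in fit.
        return (self._embed(texts) - self.mean_) @ self.components_.T

    def fit_transform(self, texts, y=None):
        return self.fit(texts).transform(texts)
```

With a real model, `TinySentenceEncoder(n_components=30).fit_transform(column_of_strings)` would return at most 30 columns per row. Exposing n_components as a constructor argument is what makes it easy to tune alongside other hyperparameters.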