
embedding_model bug #35

Closed
suneelmatham opened this issue Jan 9, 2021 · 2 comments
Comments

suneelmatham commented Jan 9, 2021

Thanks for providing an easy-to-use library. When I set the embedding_model parameter in BERTopic initialization, it doesn't load the model I want but defaults to 'distilbert-base-nli-stsb-mean-tokens'. I think this happens because of the elif clause in the _select_embedding_model function in _bertopic.py:

def _select_embedding_model(self) -> SentenceTransformer:

self.language is checked before self.embedding_model, and since the default language value is 'english', the function returns the transformer model from the self.language clause regardless of which embedding model I choose.
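The ordering problem described above can be sketched as follows (a hypothetical reconstruction for illustration, not the actual BERTopic source; the function and parameter names here are simplified):

```python
def select_embedding_model_buggy(language, embedding_model):
    """Sketch of the reported check order: language is tested first,
    so the default language 'english' masks any explicit model."""
    if language == "english":
        # Default branch always taken when language keeps its default value.
        return "distilbert-base-nli-stsb-mean-tokens"
    elif embedding_model is not None:
        # Never reached while language == "english".
        return embedding_model
    return "distilbert-base-nli-stsb-mean-tokens"
```

With the default language, an explicitly chosen model is ignored; only setting language to None lets control fall through to the embedding_model branch, which is why the workaround in the reply works.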

@MaartenGr (Owner)

Thank you for finding the bug. For now, you can set language to None and embedding_model to whatever model you want and it should work.

Having said that, it obviously should not work like that. I'll fix it by simply swapping the elif statements for self.language and self.embedding_model. This gives the appropriate behavior: selecting an embedding model will work regardless of whether a language is set.
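The swap described above can be sketched like this (again a simplified, hypothetical reconstruction rather than the actual fix in _bertopic.py):

```python
def select_embedding_model_fixed(language, embedding_model):
    """Sketch of the corrected check order: an explicitly chosen
    embedding model is tested first, so it always wins."""
    if embedding_model is not None:
        # Explicit user choice takes priority.
        return embedding_model
    elif language == "english":
        # Only fall back to the language default when no model was given.
        return "distilbert-base-nli-stsb-mean-tokens"
    return "distilbert-base-nli-stsb-mean-tokens"
```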

@MaartenGr (Owner)

This issue was fixed in v0.4.2. Please update BERTopic with pip install --upgrade bertopic and the issue should be resolved. Let me know if it isn't or if you find any other issues!
