
embedding_model bug #35

Closed
suneelmatham opened this issue Jan 9, 2021 · 2 comments
Comments

suneelmatham commented Jan 9, 2021

Thanks for providing an easy-to-use library. When I set the embedding_model parameter in BERTopic initialization, it doesn't load the model I want but defaults to 'distilbert-base-nli-stsb-mean-tokens'. I think this happens because of the elif clause in the _select_embedding_model function in _bertopic.py:

def _select_embedding_model(self) -> SentenceTransformer:

self.language is checked before self.embedding_model, and since the default language value is 'english', the function returns the transformer model from the self.language clause regardless of which embedding model I choose.
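The ordering problem described above can be sketched as follows (a hypothetical reconstruction for illustration, not the actual BERTopic source; the function and parameter names here are simplified):

```python
def select_embedding_model_buggy(language, embedding_model):
    """Sketch of the reported check order: language is tested first,
    so the default language 'english' masks any explicit model."""
    if language == "english":
        # Default branch always taken when language keeps its default value.
        return "distilbert-base-nli-stsb-mean-tokens"
    elif embedding_model is not None:
        # Never reached while language == "english".
        return embedding_model
    return "distilbert-base-nli-stsb-mean-tokens"
```

With the default language, an explicitly chosen model is ignored; only setting language to None lets control fall through to the embedding_model branch, which is why the workaround in the reply works.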

@MaartenGr (Owner)

Thank you for finding the bug. For now, you can set language to None and embedding_model to whatever model you want and it should work.

Having said that, it obviously should not work like that. I'll fix it by simply swapping the elif statements for self.language and self.embedding_model. This gives the appropriate behavior: selecting an embedding model will work regardless of whether a language is set.
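The swap described above can be sketched like this (again a simplified, hypothetical reconstruction rather than the actual fix in _bertopic.py):

```python
def select_embedding_model_fixed(language, embedding_model):
    """Sketch of the corrected check order: an explicitly chosen
    embedding model is tested first, so it always wins."""
    if embedding_model is not None:
        # Explicit user choice takes priority.
        return embedding_model
    elif language == "english":
        # Only fall back to the language default when no model was given.
        return "distilbert-base-nli-stsb-mean-tokens"
    return "distilbert-base-nli-stsb-mean-tokens"
```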

@MaartenGr (Owner)

This issue was fixed in v0.4.2. Please update BERTopic with pip install --upgrade bertopic and the issue should be resolved. Let me know if it isn't or if you find any other issues!
