When I adjust `diversity` for the MMR similarity metric to compare keyphrase lists, the keyphrases never change, regardless of the `diversity` value used. The output is correct once I take the example document out of the list and pass it as a single document. Please see the code and results below (using the text snippet from the README tutorial):
Version with a single piece of text in a single-element list:
Version with a single piece of text fed directly as just a single document:
Looking at the code for `KeyBERT.extract_keywords()`, this appears to be intentional, as there are no MMR/MaxSum options at all for `_extract_keywords_multiple_docs()`. Is there a reason why these diversity-enhancing metrics can't be used in the multi-document scenario?
This was indeed on purpose, since MMR and MaxSum work at the individual-document level, whereas the multiple_docs implementation compares matrices of embeddings. MMR and MaxSum cannot be executed during that process, as doing so would significantly slow down the application. If I were to implement that option, it would essentially be the same as iterating over extract_keywords, which defeats the purpose of the multi-document path.
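To illustrate why MMR is a per-document step, here is a minimal pure-Python sketch of the standard Maximal Marginal Relevance selection. It is not KeyBERT's actual code: the function names `cosine` and `mmr`, the toy 2-D vectors, and the candidate phrases are all assumptions made up for illustration. Each pick depends on the phrases already selected for that one document, so the loop has to be run separately for every document.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mmr(doc_vec, cand_vecs, candidates, top_n=2, diversity=0.5):
    """Greedy MMR: trade relevance to the document against redundancy
    with the phrases already selected (hypothetical helper)."""
    relevance = [cosine(doc_vec, c) for c in cand_vecs]
    # Start with the single most relevant candidate.
    selected = [max(range(len(candidates)), key=relevance.__getitem__)]
    remaining = [i for i in range(len(candidates)) if i != selected[0]]

    while remaining and len(selected) < top_n:
        def score(i):
            # Redundancy: highest similarity to anything already picked.
            redundancy = max(cosine(cand_vecs[i], cand_vecs[j]) for j in selected)
            return (1 - diversity) * relevance[i] - diversity * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)

    return [candidates[i] for i in selected]


# Toy data (assumed, not real embeddings): two near-duplicate phrases
# and one phrase that is less relevant but different.
doc = [1.0, 0.0]
phrases = ["supervised learning", "supervised learning algorithm", "training examples"]
vectors = [[1.0, 0.1], [1.0, 0.2], [0.6, 0.8]]

low = mmr(doc, vectors, phrases, top_n=2, diversity=0.1)
high = mmr(doc, vectors, phrases, top_n=2, diversity=0.9)
print(low)
print(high)
```

With low diversity the second pick is the near-duplicate phrase; with high diversity it switches to the dissimilar one. In practice, the equivalent workaround is the one described above: loop over the documents and call `extract_keywords` on each one individually with `use_mmr=True` and the desired `diversity`.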
Since this issue has been inactive for a while, I'll be closing it for now. However, if you are still experiencing the issue or want to discuss it further, let me know!