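With the current implementation of nltk.ngrams, the performance decreases slightly when the size of n-grams increases. Though this is the expected behavior, perhaps it could be improved further. There is a function sliding_window from more-itertools which seems to implement the same functionality as nltk.ngrams.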
This implementation is faster, and it seems that the performance does not decrease when the size of n-grams increases. I suppose that this would be a better alternative to the current implementation, but I'm not sure about the compatibility with other optional parameters (e.g. pad_left, pad_right) and the license issue (Apache vs. MIT).
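For readers without more-itertools at hand, here is a minimal sketch of the deque-based sliding-window technique, modelled on the recipe in the Python itertools documentation. It is an illustration of the general approach, not the verbatim more-itertools source, and the sample tokens are made up.

```python
from collections import deque
from itertools import islice


def sliding_window(iterable, n):
    """Yield successive overlapping n-tuples from iterable."""
    it = iter(iterable)
    # Pre-fill the window with the first n - 1 items.
    window = deque(islice(it, n - 1), maxlen=n)
    for item in it:
        # Appending to a full deque drops the oldest item automatically.
        window.append(item)
        yield tuple(window)


# Illustrative input only.
tokens = "a b c d e".split()
bigrams = list(sliding_window(tokens, 2))
trigrams = list(sliding_window(tokens, 3))
```

Only one item is consumed from the input per step, and the bounded deque handles eviction, so the function needs a single pass over the tokens. Inputs shorter than n yield nothing.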
If the maintainers do not have time, I could work on this.
Note: more-itertools also has a function windowed, which might likewise be a better replacement for nltk.skipgrams.
Although the performance increase is interesting, I'm hesitant to increase the number of dependencies of NLTK. Unless you're suggesting to rework the implementation of NLTK's ngram to mirror that of sliding_window, because that is worth considering.
@tomaarsen Yes, I'm suggesting that this function, which is only a few lines of code, be incorporated (perhaps after some modifications) into NLTK. The issue that needs to be considered is license compatibility.
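As a rough idea of what "some modifications" could look like, the sketch below adds padding on top of a deque-based window. The name padded_ngrams is hypothetical, and the keyword arguments are assumed to mirror the pad_left, pad_right, left_pad_symbol and right_pad_symbol options of nltk.ngrams. This is not NLTK's actual code, only an independent sketch of the technique.

```python
from collections import deque
from itertools import chain, islice, repeat


def padded_ngrams(sequence, n, pad_left=False, pad_right=False,
                  left_pad_symbol=None, right_pad_symbol=None):
    """Yield n-grams as tuples, optionally padding either end (hypothetical helper)."""
    it = iter(sequence)
    # Assumption: padding adds n - 1 symbols on the requested side.
    if pad_left:
        it = chain(repeat(left_pad_symbol, n - 1), it)
    if pad_right:
        it = chain(it, repeat(right_pad_symbol, n - 1))
    # Same deque-based window as the itertools sliding_window recipe.
    window = deque(islice(it, n - 1), maxlen=n)
    for item in it:
        window.append(item)
        yield tuple(window)


# Illustrative call with made-up pad symbols.
padded = list(padded_ngrams([1, 2, 3], 2, pad_left=True, pad_right=True,
                            left_pad_symbol="<s>", right_pad_symbol="</s>"))
```

Because the padding is chained lazily in front of and behind the input, the function still works on generators and never materializes the whole sequence.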