I used the fantastic quanteda package to preprocess a huge amount of news texts and happened upon a curious bug when converting my dfm to the stm format via convert().
More precisely, I managed to locate the bug in the supporting function stm:::dfm2stm, in lines 12-14:
if (sum(empty_feats) > 0) warning("zero-count features: ", paste0(featnames(x)[empty_feats], collapse = ", "))
The problem in my case was that I had subset the dfm without re-trimming the features afterwards. As a result, convert() tried to paste over 500,000 empty feature names into the warning message, which caused R to crash.
I guess this is only relevant for very large data sets and feature counts, as in my case, but capping the number of empty features printed in the warning could solve this easily.
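To illustrate what I mean by a cap, here is a minimal sketch in base R. The helper name warn_empty_features and the max_print argument are my own inventions for illustration, not anything that exists in quanteda, and the real fix may well look different.

```r
# Hypothetical helper (not part of quanteda): warn about zero-count features,
# but list at most `max_print` names so the message stays small.
warn_empty_features <- function(feat_names, empty_feats, max_print = 10L) {
  n_empty <- sum(empty_feats)
  if (n_empty > 0) {
    # Only the first `max_print` empty feature names are pasted.
    shown <- head(feat_names[empty_feats], max_print)
    suffix <- if (n_empty > max_print) {
      paste0(", ... (", n_empty - max_print, " more)")
    } else {
      ""
    }
    warning("zero-count features: ", paste0(shown, collapse = ", "), suffix,
            call. = FALSE)
  }
  invisible(n_empty)
}

# Toy input with made-up feature names, all flagged as empty.
feats <- sprintf("feat%d", seq_len(500000))
empty <- rep(TRUE, length(feats))

# Capture the warning text instead of printing it.
msg <- tryCatch(
  warn_empty_features(feats, empty, max_print = 3L),
  warning = function(w) conditionMessage(w)
)
nchar(msg)  # short, regardless of how many features are empty
```

With something like this, the message length depends on max_print rather than on the number of empty features.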
Lastly, I don't know if this bug is repeated in the supporting functions for the other conversion formats, but it might be worth checking.
Thank you very much for the great package!