Curious bug in convert() to the stm format #2189

@M-Gartiser

I used the fantastic quanteda package to preprocess a large corpus of news texts and came across a curious bug when converting my dfm to the stm format via convert().

More precisely, I traced the bug to lines 12-14 of the internal helper quanteda:::dfm2stm:

if (sum(empty_feats) > 0) warning("zero-count features: ", paste0(featnames(x)[empty_feats], collapse = ", "))

In my case, I had subset the dfm beforehand without re-trimming its features. This left over 500,000 zero-count features, and convert() tried to paste all of their names into a single warning message, which crashed R.
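As a possible workaround until this is addressed, the zero-count features can be dropped before calling convert(). The sketch below uses a small toy dfm; the object names and the `keep` docvar are made up for illustration, and it assumes that dfm_trim() with min_termfreq = 1 is an acceptable way to remove features that no longer occur after subsetting.

```r
library(quanteda)

# Toy corpus: "durian" only occurs in the document that gets dropped below.
toks <- tokens(c(d1 = "apple banana", d2 = "banana cherry", d3 = "durian"))
x <- dfm(toks)
docvars(x, "keep") <- c(TRUE, TRUE, FALSE)  # hypothetical docvar for the subset

# Subsetting documents keeps all features, so "durian" now has a zero count.
x_sub <- dfm_subset(x, keep)
n_empty <- sum(colSums(x_sub) == 0)

# Re-trim so that only features occurring at least once remain.
x_trimmed <- dfm_trim(x_sub, min_termfreq = 1)

# With no zero-count features left, the long warning is not triggered.
stm_input <- convert(x_trimmed, to = "stm")
```

On a real dfm, checking `sum(colSums(x) == 0)` before converting shows how many feature names would otherwise end up in the warning.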

I guess this only matters for very large datasets with high feature counts, as in my case, but capping the number of empty features printed in the warning would solve it easily.
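A minimal sketch of what such a cap could look like, in base R only. The helper name `format_empty_feature_warning` and the `max_show` parameter are hypothetical and not part of quanteda; the idea is simply to list the first few names and summarise the rest as a count.

```r
# Hypothetical helper (not a quanteda function): builds a warning message
# that lists at most `max_show` feature names and counts the remainder.
format_empty_feature_warning <- function(feat_names, max_show = 10L) {
  n <- length(feat_names)
  shown <- head(feat_names, max_show)
  msg <- paste0("zero-count features: ", paste0(shown, collapse = ", "))
  if (n > max_show) {
    # format() avoids scientific notation for large remainders
    msg <- paste0(msg, " ... and ",
                  format(n - max_show, scientific = FALSE), " more")
  }
  msg
}

# Simulate 500,000 empty feature names, as in the case described above.
feats <- paste0("feat", seq_len(500000))
msg <- format_empty_feature_warning(feats, max_show = 3L)
print(msg)
```

The resulting string stays short regardless of how many features are empty, so it could be passed to warning() without building a message hundreds of thousands of names long.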

Lastly, I don't know if this bug is repeated in the supporting functions for the other conversion formats, but it might be worth checking.

Thank you very much for the great package!
