Applying thesaurus / dictionary to n-grams (with n > 1) #116

@LucFrachon

This follows up on our e-mail conversation.

When building a DFM from n-grams (n > 1) rather than unigrams, applying a thesaurus or dictionary fails because the n-gram features never match the dictionary keys (which are usually unigrams).
We would need a way to apply a dictionary to the tokens before building the DFM, as per your suggestion:

I get what you are saying on the ngrams and the dictionary. Can’t be done using the existing tools, for the reasons you specify, but could be easily solved by adding an applyDictionary() method for tokenizedTexts. Steps would be:

1. unigram tokenise
2. apply dictionary, exclusive = FALSE to unigram tokens
3. dfm the thesaurus-ized tokens with ngrams = 2.
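
To make the three steps concrete, here is a minimal, language-agnostic sketch in plain Python. It is not the package's API: the helper names (tokenize, apply_thesaurus, ngrams, bigram_counts), the whitespace tokeniser, and the "_" separator are all assumptions made for illustration. Tokens that match no dictionary entry are kept unchanged, which is how I read exclusive = FALSE.

```python
from collections import Counter


def tokenize(text):
    """Step 1: naive unigram tokenisation (lowercase, split on whitespace)."""
    return text.lower().split()


def apply_thesaurus(tokens, thesaurus):
    """Step 2: replace each token that appears under a thesaurus key by that key.

    Tokens with no match are kept as-is (the assumed meaning of exclusive = FALSE).
    `thesaurus` maps a key to the set of unigrams it covers.
    """
    lookup = {word: key for key, words in thesaurus.items() for word in words}
    return [lookup.get(tok, tok) for tok in tokens]


def ngrams(tokens, n=2, sep="_"):
    """Step 3a: join each run of n consecutive tokens into one feature."""
    return [sep.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_counts(text, thesaurus, n=2):
    """Step 3b: count n-gram features; one Counter stands in for one DFM row."""
    return Counter(ngrams(apply_thesaurus(tokenize(text), thesaurus), n))


# Hypothetical example data, not taken from the issue.
thesaurus = {"COUNTRY": {"france", "germany"}}
row = bigram_counts("France borders Germany", thesaurus)
```

Because the substitution happens on unigrams, the resulting bigrams already contain the thesaurus keys, so no n-gram-to-dictionary matching is needed afterwards.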

Thanks.
