This follows up on our e-mail conversation.
When building a DFM with n-grams (rather than unigrams), the option to apply a thesaurus or dictionary fails because there is no match between an n-gram and dictionary keys (which are usually unigrams).
We would need a way to apply a dictionary before building the DFM, as per your suggestion:
I get what you are saying on the n-grams and the dictionary. It can’t be done using the existing tools, for the reasons you specify, but it could be easily solved by adding an applyDictionary() method for tokenizedTexts. The steps would be:
1. Tokenise the texts into unigrams.
2. Apply the dictionary to the unigram tokens, with exclusive = FALSE.
3. Build the DFM from the thesaurus-ized tokens, with ngrams = 2.
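
To make the order of operations concrete, here is a minimal, language-agnostic sketch of the three steps in Python. It is not the quanteda API: `tokenize`, `apply_dictionary`, `ngrams`, and `dfm` are hypothetical helpers written only for this illustration, the whitespace tokeniser and the uppercase key convention are assumptions, and the example thesaurus and texts are made up.

```python
from collections import Counter


def tokenize(text):
    """Step 1 (assumed tokeniser): lowercase and split on whitespace."""
    return text.lower().split()


def apply_dictionary(tokens, dictionary, exclusive=False):
    """Step 2: replace tokens that match a dictionary value with their key.

    With exclusive=False, unmatched tokens are kept (thesaurus behaviour);
    with exclusive=True, they are dropped.
    """
    # Invert {key: [values]} into {value: KEY} for direct unigram lookup.
    lookup = {
        value.lower(): key.upper()
        for key, values in dictionary.items()
        for value in values
    }
    result = []
    for token in tokens:
        if token in lookup:
            result.append(lookup[token])
        elif not exclusive:
            result.append(token)
    return result


def ngrams(tokens, n=2):
    """Join each run of n adjacent tokens with an underscore."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def dfm(token_lists, n=2):
    """Step 3: count n-gram features per document (one Counter per document)."""
    return [Counter(ngrams(tokens, n)) for tokens in token_lists]


# Hypothetical thesaurus and texts, for illustration only.
thesaurus = {"country": ["france", "germany"]}
texts = ["France borders Germany", "Germany exports cars"]

unigrams = [tokenize(text) for text in texts]
replaced = [apply_dictionary(tokens, thesaurus, exclusive=False) for tokens in unigrams]
counts = dfm(replaced, n=2)
```

Because the dictionary is matched against unigrams before the n-grams are formed, the keys end up inside the bigram features, which is exactly what fails when the dictionary is applied to an already-built n-gram DFM.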
Thanks.