
Multi-word entries in dictionary appear to be ignored #188

@pablobarbera


The following code is a simplified version of the example in the documentation for quanteda::dictionary:

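library(quanteda)

# Keep only the inaugural speeches given after 1900
mycorpus <- subset(inaugCorpus, Year > 1900)
# Dictionary with a single multi-word entry
mydict <- dictionary(list(country = "united states"))
# Total count for the "country" key across all documents
sum(dfm(mycorpus, dictionary = mydict)[, "country"])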

Another example:

mycorpus <- corpus("this should work")
mydict <- dictionary(list(example = "should work"))
sum(dfm(mycorpus, dictionary = mydict)[,"example"])

Trying with "_" as concatenator:

mycorpus <- corpus("this should work")
mydict <- dictionary(list(example = "should_work"), concatenator = "_")
sum(dfm(mycorpus, dictionary = mydict)[, "example"])

Am I missing something? This came up while I was trying to use quanteda with the Lexicoder dictionary, which contains multi-word entries. I'm running quanteda 0.9.6-9.
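
To make the expected behavior concrete, here is a minimal base-R sketch (not quanteda code) of the count I would expect for the "should work" example. The helper count_phrase is hypothetical and written only for illustration; it assumes plain whitespace tokenization, which is probably cruder than what quanteda does internally.

```r
# Hypothetical helper, base R only: counts occurrences of a multi-word
# phrase by comparing sliding windows of whitespace-separated tokens.
count_phrase <- function(text, phrase) {
  tokens <- strsplit(tolower(text), "\\s+")[[1]]
  target <- strsplit(tolower(phrase), "\\s+")[[1]]
  n <- length(target)
  # A phrase longer than the text cannot match
  if (length(tokens) < n) return(0L)
  starts <- seq_len(length(tokens) - n + 1L)
  sum(vapply(starts, function(i) {
    identical(tokens[i:(i + n - 1L)], target)
  }, logical(1)))
}

# The phrase occurs once in the example text
count_phrase("this should work", "should work")
```

This is only meant to show the result I was expecting from a multi-word dictionary entry, not to suggest how the lookup should be implemented.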
