Non-exclusive tokens lookup is wrong when there are paddings #1743

@koheiw

> library(quanteda)
> toks <- tokens("a x c y d") %>%
+   tokens_remove(c("x", "y"), padding = TRUE)
> toks
tokens from 1 document.
text1 :
[1] "a" ""  "c" ""  "d"

> tokens_lookup(toks, dictionary(list("aa" = "a", "bb" = "b")))
tokens from 1 document.
text1 :
[1] "aa"

> # With exclusive = FALSE, the pads ("") at positions 2 and 4 come back as "AA"
> tokens_lookup(toks, dictionary(list("aa" = "a", "bb" = "b")), exclusive = FALSE)
tokens from 1 document.
text1 :
[1] "AA" "AA" "c"  "AA" "d" 
