Add exclusive but non-overlapping mode to tokens_lookup #502

koheiw · 2017-01-23T10:54:21Z

When I simply want to count occurrences of dictionary entries, matching overlapping patterns can inflate the counts. For example, tokens_lookup counts 'Czech Republic' in a text twice, once for 'Czech Republic' and once for 'Czech*', with this dictionary:

- CZ: Czech Republic, Czech*, Prague
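
To make the request concrete, here is a minimal Python sketch of the difference between overlapping and non-overlapping matching. It is not quanteda's implementation: the function names, the glob handling via `fnmatch`, and the leftmost-longest rule are all assumptions chosen for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical dictionary mirroring the entry in this issue.
DICTIONARY = {"CZ": ["Czech Republic", "Czech*", "Prague"]}


def _patterns(dictionary):
    """Flatten the dictionary into (key, tuple_of_token_globs) pairs."""
    return [(key, tuple(entry.lower().split()))
            for key, entries in dictionary.items()
            for entry in entries]


def _matches(tokens, start, pattern):
    """True if `pattern` matches the tokens beginning at index `start`."""
    window = tokens[start:start + len(pattern)]
    return len(window) == len(pattern) and all(
        fnmatchcase(tok.lower(), glob) for tok, glob in zip(window, pattern))


def lookup_overlapping(tokens, dictionary):
    """Count every match, even when several matches share a token."""
    counts = {}
    for start in range(len(tokens)):
        for key, pattern in _patterns(dictionary):
            if _matches(tokens, start, pattern):
                counts[key] = counts.get(key, 0) + 1
    return counts


def lookup_non_overlapping(tokens, dictionary):
    """Count leftmost-longest matches; a matched token is never reused."""
    longest_first = sorted(_patterns(dictionary), key=lambda p: -len(p[1]))
    counts, start = {}, 0
    while start < len(tokens):
        for key, pattern in longest_first:
            if _matches(tokens, start, pattern):
                counts[key] = counts.get(key, 0) + 1
                start += len(pattern)  # skip past the consumed tokens
                break
        else:
            start += 1  # no pattern matched at this position
    return counts


tokens = "The Czech Republic has its capital in Prague".split()
overlapping = lookup_overlapping(tokens, DICTIONARY)
non_overlapping = lookup_non_overlapping(tokens, DICTIONARY)
```

In the overlapping variant, 'Czech Republic' contributes two hits to CZ (the two-word entry plus 'Czech*'); in the non-overlapping variant the longer entry consumes both tokens, so it contributes one.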

koheiw added the enhancement label Jan 23, 2017

koheiw self-assigned this Jan 23, 2017

kbenoit modified the milestone: CRAN refresh Jan 24, 2017

kbenoit added a commit that referenced this issue Jan 27, 2017

Add more tests for #502

cb0cb93

kbenoit closed this as completed Feb 7, 2017
