Issue 2124 #2129
Codecov Report
@@ Coverage Diff @@
## master #2129 +/- ##
==========================================
- Coverage 96.17% 96.14% -0.04%
==========================================
Files 87 87
Lines 4938 4949 +11
==========================================
+ Hits 4749 4758 +9
- Misses 189 191 +2
This is a good addition, and a good move toward making the coercion methods more consistent as well. However, I see two things.

- Better to deprecate, or just make `noRd`, the current `phrase()` methods that will be moved to `as.phrase()`, so as not to break existing code in a minor update. (I'm happy to do this if you want.)
- I'm happy with the use of `separator` here. But I did take the opportunity to review our various uses of `concatenator` versus `separator` to make sure our usage makes sense and is consistent. I think it is.

Function | concatenator | separator | (none) |
---|---|---|---|
kwic() | ✓ | ||
dictionary() | ✓ | ||
as.dictionary() | ✓ | ||
tokens_split() | ✓ | ||
tokens_ngrams() | ✓ | ||
tokens_skipgrams() | ✓ | ||
corpus_group() | ✓ | ||
spacyr::entity_consolidate() | ✓ | ||
spacyr::entity_extract() | ✓ | ||
quanteda.textstats::textstat_collocations() | ✓ |

I think this is consistent because all of our functions that use `concatenator` are designed to take parts and put them together, using the value of `concatenator` as the joining character. The functions that use `separator`, on the other hand, take things that are already concatenated and split them, using the value of `separator` to determine where to split.
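
The distinction can be illustrated with a minimal base R sketch. The helper names `join_parts()` and `split_compound()` are hypothetical and exist only for this illustration; they are not quanteda functions and do not reflect how quanteda implements either argument.

```r
# Hypothetical helpers (not part of quanteda), shown only to contrast the
# two argument names discussed above.

# A "concatenator" joins separate parts into one string.
join_parts <- function(parts, concatenator = "_") {
  paste(parts, collapse = concatenator)
}

# A "separator" tells the function where to split an already-joined string.
# fixed = TRUE treats the separator literally rather than as a regex.
split_compound <- function(x, separator = "_") {
  strsplit(x, separator, fixed = TRUE)[[1]]
}

joined <- join_parts(c("New", "York", "City"), concatenator = "_")
pieces <- split_compound(joined, separator = "_")

print(joined)
print(pieces)
```

With the same character used in both roles, splitting undoes joining, which is why the two names describe direction rather than different kinds of value.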
So by that logic, for `as.phrase()` or `phrase()` the use of `separator` is correct.
I doubt that more than one or two people are using |
For #2124, add `separator` to `phrase()` and make `as.phrase()` for objects for which `separator` has no effect.
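
As a rough sketch of the split described here, the base R code below separates a constructor for character input, where a separator is meaningful, from a coercion generic for input that is already split, where it is not. All names (`make_phrase()`, `as_phrase_sketch()`, the `phrase_sketch` class) are hypothetical, and the behaviour is an assumption for illustration; this is not the code from the pull request.

```r
# Hypothetical sketch in base R; names and behaviour are assumptions,
# not the implementation merged in this pull request.

# Character input: the separator decides where each string is split.
make_phrase <- function(x, separator = " ") {
  stopifnot(is.character(x), is.character(separator), length(separator) == 1)
  structure(strsplit(x, separator, fixed = TRUE), class = "phrase_sketch")
}

# Already-split input: no separator argument, because nothing needs splitting.
as_phrase_sketch <- function(x) UseMethod("as_phrase_sketch")

as_phrase_sketch.list <- function(x) {
  structure(lapply(x, as.character), class = "phrase_sketch")
}

from_chr <- make_phrase(c("natural_language", "text_analysis"), separator = "_")
from_list <- as_phrase_sketch(list(c("natural", "language"), c("text", "analysis")))

print(unclass(from_chr))
print(unclass(from_list))
```

Keeping the separator out of the coercion generic makes it impossible to pass an argument that would be silently ignored.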