Dev contexts #603
…n_new class to make our life easier.
Merge branch 'master' into dev-collocaton # Conflicts: # R/tokens_compound.R # R/utils.R
It does not cause errors on my system, but please try it again. I fixed something.
On CHECK, I am still getting:
On the tests above, it's 50% better:
> toks <- tokens(c("This is a test of a sentence.",
+ "A second sentence makes this two documents long."))
> kwic(toks, "not")
NULL
>
> kwic(data_char_inaugural, "secur*")
Error in qatd_cpp_kwic(x, types, keywords_id, window) : vector
but be aware of #242
I believe my last commit will make it another 50% better.
…to dev-contexts
Merge branch 'master' into dev-contexts # Conflicts: # R/dfm_select.R # tests/testthat/test-dfm_select.R # vignettes/plotting.html
These generated warnings because of the .Deprecated messages
Almost there... the segfault error is gone, but something is amiss with the appveyor/pr. Click on Details above and see.
Codecov Report
@@            Coverage Diff             @@
##           master     #603      +/-   ##
==========================================
+ Coverage   80.99%   81.61%   +0.62%
==========================================
  Files          91       91
  Lines        6545     6409     -136
==========================================
- Hits         5301     5231      -70
+ Misses       1244     1178      -66
Continue to review full report at Codecov.
OK, so basically this re-implements kwic in a faster way, and adds an indexed version of the kwic as attributes, to make compounding faster downstream?
I assume there are also performance improvements for kwic itself?
It looks to me like the codecov is decreased because you disabled the tests for the older kwic. I'd be happy if you want to kill off the older kwic code entirely, to have just one kwic, and that would not hurt us on the codecov. Would you like to do that in this branch before merging, or first merge and then kill off the old functions later?
Yes, it makes kwic 20 times faster and adds an as.tokens() method, which returns the tokens object stored in the kwic object's attributes. This is basically the same structure as the new collocations (data.frame + tokens).
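For readers following the thread, here is a minimal sketch of what a keyword-in-context scan over integer token IDs can look like. It is not the `qatd_cpp_kwic` code from this branch: `Tokens`, `Context`, and `kwic_sketch` are hypothetical names chosen for illustration, and the sketch matches a single keyword ID rather than glob patterns or multi-word sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not quanteda's internal structures.
using Tokens = std::vector<int>;  // one document as integer token IDs

struct Context {
    std::size_t position;  // 0-based index of the keyword in the document
    Tokens pre;            // up to `window` tokens before the keyword
    Tokens post;           // up to `window` tokens after the keyword
};

// Scan one document and collect the context around every occurrence of
// keyword_id. Windows are clipped at the document boundaries.
std::vector<Context> kwic_sketch(const Tokens& doc, int keyword_id,
                                 std::size_t window) {
    std::vector<Context> hits;
    for (std::size_t i = 0; i < doc.size(); ++i) {
        if (doc[i] != keyword_id) continue;
        // Clip the window without unsigned underflow or overflow.
        const std::size_t from = (i >= window) ? i - window : 0;
        const std::size_t remaining = doc.size() - i - 1;
        const std::size_t to = (window < remaining) ? i + 1 + window : doc.size();
        assert(from <= i && to <= doc.size());
        Context c;
        c.position = i;
        c.pre.assign(doc.begin() + from, doc.begin() + i);
        c.post.assign(doc.begin() + i + 1, doc.begin() + to);
        hits.push_back(c);
    }
    return hits;
}
```

A keyword that never occurs yields an empty result, which is the case behind the `NULL` returned by `kwic(toks, "not")` above. Keeping the match position next to each context is one way a later step could work from match locations instead of rescanning the text; whether the branch stores its index this way is an assumption.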
I added remove_keywords and overlap. If you have an idea of what a new kwic.print() should look like, it would be great to see that first, so that I can make changes in the C++ code.