Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dev contexts #603

Merged
merged 87 commits into from
Mar 25, 2017
Merged

Dev contexts #603

merged 87 commits into from
Mar 25, 2017

Conversation

koheiw
Copy link
Collaborator

@koheiw koheiw commented Mar 17, 2017

I added remove_keywords and overlap. If you have an idea how a new kwic.print() will look like, it is great to see that first, so that I can make changes in the C++ code.

@koheiw
Copy link
Collaborator Author

koheiw commented Mar 22, 2017

It does not cause errors on my system, but please try it again. I fixed something.

@kbenoit
Copy link
Collaborator

kbenoit commented Mar 23, 2017

On CHECK, I am still getting:

* creating vignettes ... ERROR

 *** caught segfault ***
address 0x10c934ffc, cause 'memory not mapped'

Traceback:
 1: .Call("quanteda_qatd_cpp_kwic", PACKAGE = "quanteda", texts_,     types_, words_, window)
 2: qatd_cpp_kwic(x, types, keywords_id, window)
 3: kwic.tokens(tokens(x, ...), keywords, window, valuetype, case_insensitive,     new = TRUE)
 4: kwic(tokens(x, ...), keywords, window, valuetype, case_insensitive,     new = TRUE)
 5: kwic.character(novel.v, "chapter")
 6: kwic(novel.v, "chapter")
 7: head(kwic(novel.v, "chapter"))
 8: eval(expr, envir, enclos)
 9: eval(expr, envir, enclos)
10: withVisible(eval(expr, envir, enclos))
11: withCallingHandlers(withVisible(eval(expr, envir, enclos)), warning = wHandler,     error = eHandler, message = mHandler)
12: handle(ev <- withCallingHandlers(withVisible(eval(expr, envir,     enclos)), warning = wHandler, error = eHandler, message = mHandler))
13: timing_fn(handle(ev <- withCallingHandlers(withVisible(eval(expr,     envir, enclos)), warning = wHandler, error = eHandler, message = mHandler)))
14: evaluate_call(expr, parsed$src[[i]], envir = envir, enclos = enclos,     debug = debug, last = i == length(out), use_try = stop_on_error !=         2L, keep_warning = keep_warning, keep_message = keep_message,     output_handler = output_handler, include_timing = include_timing)
15: evaluate(code, envir = env, new_device = FALSE, keep_warning = !isFALSE(options$warning),     keep_message = !isFALSE(options$message), stop_on_error = if (options$error &&         options$include) 0L else 2L, output_handler = knit_handlers(options$render,         options))
16: in_dir(input_dir(), evaluate(code, envir = env, new_device = FALSE,     keep_warning = !isFALSE(options$warning), keep_message = !isFALSE(options$message),     stop_on_error = if (options$error && options$include) 0L else 2L,     output_handler = knit_handlers(options$render, options)))
17: block_exec(params)
18: call_block(x)
19: process_group.block(group)
20: process_group(group)
21: withCallingHandlers(if (tangle) process_tangle(group) else process_group(group),     error = function(e) {        setwd(wd)        cat(res, sep = "\n", file = output %n% "")        message("Quitting from lines ", paste(current_lines(i),             collapse = "-"), " (", knit_concord$get("infile"),             ") ")    })
22: process_file(text, output)
23: knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet,     encoding = encoding)
24: rmarkdown::render(file, encoding = encoding, quiet = quiet, envir = globalenv())
25: vweave_rmarkdown(...)
26: engine$weave(file, quiet = quiet, encoding = enc)
27: doTryCatch(return(expr), name, parentenv, handler)
28: tryCatchOne(expr, names, parentenv, handlers[[1L]])
29: tryCatchList(expr, classes, parentenv, handlers)
30: tryCatch({    engine$weave(file, quiet = quiet, encoding = enc)    setwd(startdir)    find_vignette_product(name, by = "weave", engine = engine)}, error = function(e) {    stop(gettextf("processing vignette '%s' failed with diagnostics:\n%s",         file, conditionMessage(e)), domain = NA, call. = FALSE)})
31: tools::buildVignettes(dir = ".", tangle = TRUE)
An irrecoverable exception occurred. R is aborting now ...
Error: Command failed:
 '/Library/Frameworks/R.framework/Resources/bin/R' CMD build '/Users/kbenoit/Dropbox (Personal)/GitHub/quanteda' --no-resave-data --no-manual 
 sh: line 1: 92573 Segmentation fault: 11  '/Library/Frameworks/R.framework/Resources/bin/Rscript' --vanilla --default-packages= -e "tools::buildVignettes(dir = '.', tangle = TRUE)" > '/var/folders/46/zfn6gwj15d3_n6dhyy1cvwc00000gp/T//Rtmpu9qqaG/xshell1691813a7152e' 2>&1
Execution halted

@kbenoit
Copy link
Collaborator

kbenoit commented Mar 23, 2017

On the tests above, it's 50% better:

> toks <- tokens(c("This is a test of a sentence.", 
+                  "A second sentence makes this two documents long."))
> kwic(toks, "not")
NULL
> 
> kwic(data_char_inaugural, "secur*")

 Error in qatd_cpp_kwic(x, types, keywords_id, window) : vector 

but be aware of #242

@koheiw
Copy link
Collaborator Author

koheiw commented Mar 23, 2017

I believe my last commit will make it another 50% better.

koheiw and others added 7 commits March 23, 2017 17:35
Merge branch 'master' into dev-contexts

# Conflicts:
#	R/dfm_select.R
#	tests/testthat/test-dfm_select.R
#	vignettes/plotting.html
These generated warnings because of the .Deprecated messages
@kbenoit
Copy link
Collaborator

kbenoit commented Mar 25, 2017

Almost there... the segfault error is gone, but something is amiss with the appveyor/pr. Click on Details above and see.

@codecov-io
Copy link

codecov-io commented Mar 25, 2017

Codecov Report

Merging #603 into master will increase coverage by 0.62%.
The diff coverage is 86.56%.

@@            Coverage Diff             @@
##           master     #603      +/-   ##
==========================================
+ Coverage   80.99%   81.61%   +0.62%     
==========================================
  Files          91       91              
  Lines        6545     6409     -136     
==========================================
- Hits         5301     5231      -70     
+ Misses       1244     1178      -66

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 43a20cd...9008847. Read the comment docs.

Copy link
Collaborator

@kbenoit kbenoit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so basically this re-implements kwic in a faster way, and adds an indexed version of the kwic as attributes, to make compounding faster downstream?

I assume there are also performance improvements for kwic itself?

It looks to me like the codecov is decreased because you disabled the tests for the older kwic. I'd be happy if you want to kill off the older kwic code entirely, to have just one kwic, and that would not hurt us on the codecov. Would you like to do that in this branch before merging, or first merge and then kill off the old functions later?

@koheiw
Copy link
Collaborator Author

koheiw commented Mar 25, 2017

Yes, it makes kwic 20 times faster and adds as.tokens() method, which returns the tokens object in its attributes. This is basically the same structure as the new collocations (data.frame + tokens).

@kbenoit kbenoit merged commit 941e1b1 into master Mar 25, 2017
@kbenoit kbenoit deleted the dev-contexts branch March 25, 2017 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants