Errors on Solaris #94

Closed
trinker opened this issue Aug 14, 2014 · 13 comments

@trinker

trinker commented Aug 14, 2014

Love the work in this package. I think the work many people are doing to make R faster is awesome, and this package brings that to string manipulation.

I am working on a package that contains canned regexes and wraps them so that the user can either substitute or extract them. It is meant more for convenience than speed: stringi has the speed, and users who want it can take the regexes from the dictionaries and supply them directly to stringi functions such as stri_replace_all_regex and stri_extract_all_regex.

Anyway, I was fiddling with a snippet of code along the lines of:

regmatches(text.var, gregexpr(pattern, text.var, perl=TRUE))

and saw that replacing it with:

stri_extract_all_regex(text.var, pattern)

yields a substantial speedup: about 8x faster on 1 million simple strings (I have only tested this with a few regexes but assume it generalizes).
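A minimal sketch of the swap described above, assuming the stringi package is installed; the vector `x` and the pattern are made-up examples. Two differences are worth checking before switching: for a string with no match, base R returns `character(0)` while stringi returns `NA` by default (current stringi releases accept `omit_no_match = TRUE` to get `character(0)` instead), and `perl = TRUE` uses the PCRE engine whereas stringi uses ICU regexes, so some patterns may not behave identically.

```r
library(stringi)

# Hypothetical sample input: one string with two matches, one with none
x <- c("a1b22", "none")
pattern <- "[0-9]+"

# Base R (PCRE engine): a string with no match yields character(0)
base_res <- regmatches(x, gregexpr(pattern, x, perl = TRUE))

# stringi (ICU engine): a string with no match yields NA by default
stri_res <- stri_extract_all_regex(x, pattern)

# Ask stringi for character(0) on no match, mirroring the base R result
stri_res_omit <- stri_extract_all_regex(x, pattern, omit_no_match = TRUE)

print(base_res)
print(stri_res)
print(stri_res_omit)
```

Code downstream that detects "no match" via `lengths(res) == 0` would need either `omit_no_match = TRUE` or an `is.na()` check after the switch.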

I want to make this switch but am a little hesitant because the CRAN checks (http://cran.r-project.org/web/checks/check_results_stringi.html) show errors for stringi on Solaris. Is there a planned fix for this? I know this isn't a CRAN requirement (and I know of no one who uses Solaris, but I am sure there are users).

@gagolews
Owner

Dear Tyler, thanks for noting that error. In short, don't worry: everything works perfectly on Solaris machines too. The error is on the CRAN side. After ICU is compiled, the installer tries to download a data file (icudt) from one of our mirror servers. Unfortunately, a download error occurs there from time to time (firewall issues? I don't know), and the CRAN test results then complain about missing icudt files. I am thinking of removing the fragile R CMD check tests completely to avoid that. I have access to a very old Solaris/SPARC server and everything runs smoothly there (unless I forbid the installer to fetch icudt, of course). Normally a call to stri_install_check() helps, but I cannot issue that command on CRAN.

@gagolews gagolews added the to do label Aug 14, 2014
@gagolews gagolews added this to the stringi-0.3 milestone Aug 14, 2014
@gagolews gagolews self-assigned this Aug 14, 2014
@gagolews
Owner

TODO: get rid of icudt-dependent R CMD check tests; I will start working on that after my vacation.

@trinker
Author

trinker commented Aug 14, 2014

Thanks for the response. Sounds reasonable.

gagolews added a commit that referenced this issue Oct 18, 2014
gagolews added a commit that referenced this issue Oct 18, 2014
@gagolews
Owner

All right, the ICU-dependent tests have been wrapped within \dontrun blocks. R CMD check should no longer fail on an icudt download failure. We'll see if everything's OK on the next CRAN submission.

Thanks again for the report, closing.

gagolews added a commit that referenced this issue Oct 18, 2014
@gagolews
Owner

gagolews commented Nov 6, 2014

Dear @trinker, http://cran.r-project.org/web/checks/check_results_stringi.html --- everything's OK now :)

@gagolews gagolews added bug and removed to do labels Nov 7, 2014
@gagolews gagolews modified the milestones: stringi-0.4, stringi-0.3 Nov 7, 2014
@gagolews
Owner

gagolews commented Nov 7, 2014

Unfortunately, these changes didn't solve the problem: http://cran.r-project.org/web/checks/check_results_qdapRegex.html

I have some ideas, but first I must replicate this error:

> rm_default(x, pattern = S("@around_", 1, "is not|is|are|am", 1), extract=TRUE)
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported

*** caught segfault ***
address 67f33810, cause 'memory not mapped'

Traceback:
1: makeRestartList(...)
2: withRestarts({ .Internal(.signalCondition(simpleWarning(msg, call), msg, call)) .Internal(.dfltWarn(msg, call))}, muffleWarning = function() NULL)
3: .signalSimpleWarning("empty search patterns are not supported", quote(stringi::stri_extract_all_regex(text.var, pattern)))
4: .Call("stri_extract_all_regex", str, pattern, simplify, opts_regex, PACKAGE = "stringi")
5: stringi::stri_extract_all_regex(text.var, pattern)
6: rm_default(x, pattern = S("@around_", 1, "is not|is|are|am", 1), extract = TRUE)
aborting ...

@gagolews gagolews reopened this Nov 7, 2014
@trinker
Author

trinker commented Nov 7, 2014

Setting up a Solaris machine with R for testing purposes is a major pain. I've set up Solaris on VirtualBox but haven't been able to get R running.

@gagolews
Owner

gagolews commented Nov 7, 2014

Well, I do have access to an old Solaris/SPARC (big-endian) server and everything works fine there. The problem is on the CRAN side: they block downloading data files (even though they told me I should do it this way). Thus, I have to think about how to make a clean "landing" when there's no icudt available (normally, it always is).

@gagolews
Owner

Resolving #115 may do the trick.

UPDATE 2014-11-30: I don't think so.
It seems that icudt is present on the CRAN Solaris machines:

http://www.r-project.org/nosvn/R.check/r-patched-solaris-x86/stringi-00check.html
** x86 Solaris 10 8x Opteron 8218 (dual core) @ 2.6 GHz Solaris Studio 12.3 **

checking installed package size ... NOTE
installed size is 29.1Mb   #### this wouldn't be so heavy if there was no icudt installed :/
sub-directories of 1Mb or more:
libs 28.4Mb

www.r-project.org/nosvn/R.check/r-patched-solaris-sparc/stringi-00check.html
** sparc Solaris 10 8-core UltraSPARC T2 CPU @ 1.2 GHz Solaris Studio 12.3 **

checking installed package size ... NOTE
installed size is 28.8Mb  #### !!!
sub-directories of 1Mb or more:
libs 28.2Mb

BTW, http://www.r-project.org/nosvn/R.check/r-patched-solaris-x86/qdapRegex-00check.html:

> rm_citation(x)

*** caught segfault ***
address 95b2c9eb, cause 'memory not mapped'

Traceback:
1: .Call("stri_replace_all_regex", str, pattern, replacement, vectorize_all, opts_regex, PACKAGE = "stringi")
2: stringi::stri_replace_all_regex(text.var, pattern, replacement)
3: rm_citation(x)
aborting ...

BTW, http://www.r-project.org/nosvn/R.check/r-patched-solaris-sparc/qdapRegex-00check.html:

checking examples ... ERROR
Running examples in ‘qdapRegex-Ex.R’ failed
The error most likely occurred in:

> ### Name: pastex
> ### Title: Paste Regular Expressions
> ### Aliases: %|% pastex
> ### Keywords: paste regex
>
> ### ** Examples
>
> x <- c("There is $5.50 for me.", "that's 45.6% of the pizza",
+ "14% is $26 or $25.99", "It's 12:30 pm to 4:00 am")
>
> pastex("@rm_percent", "@rm_dollar")
[1] "\\(?[0-9.]+\\)?%|\\$\\(?[0-9.]+\\)?"
> pastex("@rm_percent", "@time_12_hours")
[1] "\\(?[0-9.]+\\)?%|(1[012]|[1-9]):[0-5][0-9](\\s?)(am|pm)"
>
> rm_dollar(x, extract=TRUE, pattern=pastex("@rm_percent", "@rm_dollar"))
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
[[1]]
[1] NA

[[2]]
[1] NA

[[3]]
[1] NA

[[4]]
[1] NA

> rm_dollar(x, extract=TRUE, pattern=pastex("@rm_dollar", "@rm_percent", "@time_12_hours"))
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
[[1]]
[1] NA

[[2]]
[1] NA

[[3]]
[1] NA

[[4]]
[1] NA

>
> ## retrieve regexes from dictionary
> pastex("@rm_email")
[1] "([_+a-z0-9-]+(\\.[_+a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,14}))"
> pastex("@rm_url3")
[1] "(https?|ftps?)://(-\\.)?([^\\s/?\\.#-]+\\.?)+(/[^\\s]*)?"
> pastex("@version")
[1] "(?<=\\b(v|version)\\s?)([0-9]+)\\.([0-9]+)\\.([0-9]+)(?:\\.([0-9]+))?\\b"
>
> ## pipe operator (%|%)
> "x" %|% "y"
[1] "x|y"
> "@rm_url" %|% "@rm_twitter_url"
[1] "(http[^ ]*)|(ftp[^ ]*)|(www\\.[^ ]*)|(https?://t\\.co[^ ]*)|(t\\.co[^ ]*)"
>
> ## Remove Twitter Short URL
> x <- c("download file from http://example.com",

*** caught segfault ***
address 0, cause 'memory not mapped'
aborting ...

(!!!) THIS IS WEIRD

Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported

@gagolews
Owner

gagolews commented Dec 3, 2014

Hooray, hooray, hooray! I spent 2 days installing Solaris 10 + Solaris Studio on a VM, and the bug IS reproducible, thank goodness! :)

It has nothing to do with an icudt download failure.

gagolews added a commit that referenced this issue Dec 3, 2014
gagolews added a commit that referenced this issue Dec 10, 2014
@gagolews
Owner

It seems that the bug is caused by the -DU_DISABLE_RENAMING=1 compiler flag/define, which makes ICU services (among others, UnicodeString) fail.

@trinker
Author

trinker commented Dec 14, 2014

👍 Yay! Thanks for the work :-)
