Errors on Solaris #94

Closed
trinker opened this Issue Aug 14, 2014 · 13 comments

trinker commented Aug 14, 2014

Love the work in this package. I think that the work many are doing to make R faster is awesome and this package is moving to do this for string manipulation.

I am working on a package that contains canned regexes and wraps them so that the user can either substitute or extract matches. It is more for convenience than speed. stringi has the speed, and users who want it can take patterns from the regex dictionaries and supply them directly to stringi functions such as stri_replace_all_regex and stri_extract_all_regex.

Anyway, I was fiddling with a snippet of code along the lines of:

regmatches(text.var, gregexpr(pattern, text.var, perl=TRUE))

and see that replacing that with:

stri_extract_all_regex(text.var, pattern)

yields a substantial speedup: about 8x faster on 1 million simple strings (I have only tested this with a few regexes but assume it generalizes).
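The swap described above can be sketched as follows, assuming stringi is installed. The sample strings and the pattern are made up for illustration and are not from the original report. One difference worth knowing: base R returns character(0) for a string with no match, while stri_extract_all_regex returns NA by default. The omit_no_match argument used here to line the two up comes from later stringi releases than the one discussed in this thread. Timings depend on the machine and the pattern, so the 8x figure is not asserted here.

```r
library(stringi)

# Illustrative inputs (assumptions, not from the original report)
text.var <- c("pay $5.50 now", "no money here", "either $26 or $25.99")
pattern  <- "\\$[0-9.]+"

# Base R: locate every match, then extract the matched substrings
base_res <- regmatches(text.var, gregexpr(pattern, text.var, perl = TRUE))

# stringi: a single call; omit_no_match = TRUE yields character(0)
# instead of NA for strings without a match
stri_res <- stri_extract_all_regex(text.var, pattern, omit_no_match = TRUE)

# Rough timing on a larger vector; results vary by machine
big <- rep(text.var, 1e4)
print(system.time(regmatches(big, gregexpr(pattern, big, perl = TRUE))))
print(system.time(stri_extract_all_regex(big, pattern)))
```

Comparing base_res and stri_res element by element is a cheap way to confirm the two calls agree on a given dictionary pattern before switching.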

I want to make this switch but am a little hesitant because the CRAN checks (http://cran.r-project.org/web/checks/check_results_stringi.html) show stringi with errors on Solaris. Is there a planned fix for this? I know this isn't a CRAN requirement (and I know of no one who uses Solaris, but I am sure there are users).

Owner

gagolews commented Aug 14, 2014

Dear Tyler, thanks for noting that error. In short, do not worry, everything works perfectly on Solaris machines too. Basically, the error is on the CRAN side. After ICU is compiled, the installer tries to download a data file from one of our mirror servers. Unfortunately, from time to time a download error occurs there (firewall issues? I don't know). Basically, the CRAN test results complain about missing icudt files; I am thinking of removing the fragile R CMD check tests completely to avoid that... I have access to a very old Solaris/SPARC server and everything runs smoothly there (unless I forbid the installer to fetch icudt, of course). Normally a call to stri_install_check() helps, but I cannot issue the command on CRAN...

@gagolews gagolews added the to do label Aug 14, 2014

@gagolews gagolews added this to the stringi-0.3 milestone Aug 14, 2014

@gagolews gagolews self-assigned this Aug 14, 2014

Owner

gagolews commented Aug 14, 2014

@ TODO: get rid of icudt-dependent R CMD check tests; will start working on that after my vacation period

trinker commented Aug 14, 2014

Thanks for the response. Sounds reasonable.

gagolews added a commit that referenced this issue Oct 18, 2014

gagolews added a commit that referenced this issue Oct 18, 2014

Owner

gagolews commented Oct 18, 2014

All right, the icudt-dependent tests have been wrapped in \dontrun{}. R CMD check should no longer fail when the icudt download fails. We'll see if everything's OK on the next CRAN submission.

Thanks again for the report, closing.
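For readers unfamiliar with the mechanism mentioned above, here is a minimal roxygen2 sketch of an example block split into a part that always runs and a part that R CMD check skips by default. The function name locale_upper and the choice of example call are hypothetical; this is not the actual change made in stringi.

```r
#' Upper-case text using locale-specific rules (hypothetical helper)
#'
#' @param x character vector
#' @param locale locale identifier passed on to stringi
#' @examples
#' # Runs during R CMD check: uses base R only
#' toupper("abc")
#'
#' \dontrun{
#' # Skipped by R CMD check by default: depends on ICU locale data
#' locale_upper("abc", locale = "tr_TR")
#' }
locale_upper <- function(x, locale) {
  stringi::stri_trans_toupper(x, locale = locale)
}
```

The trade-off is that anything inside \dontrun{} is no longer exercised by the check farm, so a real failure in that code would also go unnoticed there.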

@gagolews gagolews closed this Oct 18, 2014

gagolews added a commit that referenced this issue Oct 18, 2014

@gagolews gagolews added bug and removed to do labels Nov 7, 2014

@gagolews gagolews modified the milestones: stringi-0.4, stringi-0.3 Nov 7, 2014

Owner

gagolews commented Nov 7, 2014

Unfortunately, the changes made didn't solve the problem; see http://cran.r-project.org/web/checks/check_results_qdapRegex.html

I have some ideas, but first I must replicate this error:

> rm_default(x, pattern = S("@around_", 1, "is not|is|are|am", 1), extract=TRUE)
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported

*** caught segfault ***
address 67f33810, cause 'memory not mapped'

Traceback:
1: makeRestartList(...)
2: withRestarts({ .Internal(.signalCondition(simpleWarning(msg, call), msg, call)) .Internal(.dfltWarn(msg, call))}, muffleWarning = function() NULL)
3: .signalSimpleWarning("empty search patterns are not supported", quote(stringi::stri_extract_all_regex(text.var, pattern)))
4: .Call("stri_extract_all_regex", str, pattern, simplify, opts_regex, PACKAGE = "stringi")
5: stringi::stri_extract_all_regex(text.var, pattern)
6: rm_default(x, pattern = S("@around_", 1, "is not|is|are|am", 1), extract = TRUE)
aborting ...
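The warning in the log above is the one stringi emits when it is handed an empty pattern. A calling package could reject such a pattern before it reaches the compiled code, as in the sketch below. The wrapper name safe_extract is hypothetical and not part of stringi or qdapRegex. The guard only checks what R itself sees, so it would not help if a non-empty R string arrives empty on the C++ side.

```r
# Hypothetical wrapper: fail early instead of passing a bad pattern down
safe_extract <- function(text.var, pattern) {
  stopifnot(is.character(pattern), length(pattern) == 1L)
  # nzchar(NA) is TRUE by default, so NA has to be tested separately
  if (is.na(pattern) || !nzchar(pattern)) {
    stop("pattern is NA or empty; refusing to call stringi")
  }
  stringi::stri_extract_all_regex(text.var, pattern)
}

# An empty pattern is turned into an R error before stringi is called
res <- tryCatch(safe_extract("some text", ""),
                error = function(e) conditionMessage(e))
print(res)
```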

@gagolews gagolews reopened this Nov 7, 2014

trinker commented Nov 7, 2014

Setting up a Solaris machine with R for testing purposes is a major pain. I've set up Solaris on VB but have not been able to run an instance of R.

Owner

gagolews commented Nov 7, 2014

Well, I do have access to an old Solaris/SPARC (big-endian) server and everything works fine there. The problem is on the CRAN side -- they block downloading data files (even though they told me I should do it this way). So I need to think about how to do a clean "landing" if there's no icudt available (normally, it always is...)

Owner

gagolews commented Nov 27, 2014

Resolving #115 may do the trick.

UPDATE 2014-11-30: I don't think so.
It seems that icudt is present on CRAN/Solaris:

http://www.r-project.org/nosvn/R.check/r-patched-solaris-x86/stringi-00check.html
** x86 Solaris 10 8x Opteron 8218 (dual core) @ 2.6 GHz Solaris Studio 12.3 **

checking installed package size ... NOTE
installed size is 29.1Mb   #### this wouldn't be so heavy if there was no icudt installed :/
sub-directories of 1Mb or more:
libs 28.4Mb

http://www.r-project.org/nosvn/R.check/r-patched-solaris-sparc/stringi-00check.html
** sparc Solaris 10 8-core UltraSPARC T2 CPU @ 1.2 GHz Solaris Studio 12.3 **

checking installed package size ... NOTE
installed size is 28.8Mb  #### !!!
sub-directories of 1Mb or more:
libs 28.2Mb

BTW, http://www.r-project.org/nosvn/R.check/r-patched-solaris-x86/qdapRegex-00check.html:

> rm_citation(x)

*** caught segfault ***
address 95b2c9eb, cause 'memory not mapped'

Traceback:
1: .Call("stri_replace_all_regex", str, pattern, replacement, vectorize_all, opts_regex, PACKAGE = "stringi")
2: stringi::stri_replace_all_regex(text.var, pattern, replacement)
3: rm_citation(x)
aborting ...

BTW, http://www.r-project.org/nosvn/R.check/r-patched-solaris-sparc/qdapRegex-00check.html:

checking examples ... ERROR
Running examples in ‘qdapRegex-Ex.R’ failed
The error most likely occurred in:

> ### Name: pastex
> ### Title: Paste Regular Expressions
> ### Aliases: %|% pastex
> ### Keywords: paste regex
>
> ### ** Examples
>
> x <- c("There is $5.50 for me.", "that's 45.6% of the pizza",
+ "14% is $26 or $25.99", "It's 12:30 pm to 4:00 am")
>
> pastex("@rm_percent", "@rm_dollar")
[1] "\\(?[0-9.]+\\)?%|\\$\\(?[0-9.]+\\)?"
> pastex("@rm_percent", "@time_12_hours")
[1] "\\(?[0-9.]+\\)?%|(1[012]|[1-9]):[0-5][0-9](\\s?)(am|pm)"
>
> rm_dollar(x, extract=TRUE, pattern=pastex("@rm_percent", "@rm_dollar"))
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
[[1]]
[1] NA

[[2]]
[1] NA

[[3]]
[1] NA

[[4]]
[1] NA

> rm_dollar(x, extract=TRUE, pattern=pastex("@rm_dollar", "@rm_percent", "@time_12_hours"))
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported
[[1]]
[1] NA

[[2]]
[1] NA

[[3]]
[1] NA

[[4]]
[1] NA

>
> ## retrieve regexes from dictionary
> pastex("@rm_email")
[1] "([_+a-z0-9-]+(\\.[_+a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,14}))"
> pastex("@rm_url3")
[1] "(https?|ftps?)://(-\\.)?([^\\s/?\\.#-]+\\.?)+(/[^\\s]*)?"
> pastex("@version")
[1] "(?<=\\b(v|version)\\s?)([0-9]+)\\.([0-9]+)\\.([0-9]+)(?:\\.([0-9]+))?\\b"
>
> ## pipe operator (%|%)
> "x" %|% "y"
[1] "x|y"
> "@rm_url" %|% "@rm_twitter_url"
[1] "(http[^ ]*)|(ftp[^ ]*)|(www\\.[^ ]*)|(https?://t\\.co[^ ]*)|(t\\.co[^ ]*)"
>
> ## Remove Twitter Short URL
> x <- c("download file from http://example.com",

*** caught segfault ***
address 0, cause 'memory not mapped'
aborting ...

(!!!) THIS IS WEIRD

Warning in stringi::stri_extract_all_regex(text.var, pattern) :
empty search patterns are not supported

Owner

gagolews commented Dec 3, 2014

Hooray, hooray, hooray! I spent 2 days installing Solaris 10 + Solaris Studio on a VM -- the bug IS reproducible, thank goodness!!!! :))))))

It has nothing to do with an icudt download failure.

Owner

gagolews commented Dec 10, 2014

It seems that the bug is caused by the -DU_DISABLE_RENAMING=1 compiler flag/define, which causes ICU services (among others, UnicodeString) to fail.

@gagolews gagolews closed this in 9e35cbd Dec 10, 2014

trinker commented Dec 14, 2014

👍 Yay! Thanks for the work :-)
