stri_startswith & stri_endswith #41

Closed
bartektartanus opened this Issue Oct 25, 2013 · 5 comments

2 participants
@bartektartanus
Contributor

bartektartanus commented Oct 25, 2013

stri_startswith & stri_endswith
Logical functions, like Java's startsWith and endsWith.
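For a fixed (literal) pattern, the requested semantics amount to comparing a prefix or suffix of each string with the pattern. Below is a minimal base-R sketch that assumes plain code-point comparison with no collation. `starts_with_fixed` and `ends_with_fixed` are hypothetical helper names, not stringi functions, and this is not how stringi implements them. Base R 3.3.0 and later also provides `startsWith()` and `endsWith()`.

```r
# Hypothetical helpers (not stringi API): vectorised prefix/suffix tests
# for a single literal pattern, comparing code points exactly.
starts_with_fixed <- function(str, pattern) {
  m <- nchar(pattern)
  # substr() silently truncates, so a string shorter than the pattern
  # yields a shorter prefix that can never equal the pattern
  substr(str, 1L, m) == pattern
}

ends_with_fixed <- function(str, pattern) {
  n <- nchar(str)
  m <- nchar(pattern)
  # start positions below 1 are treated as 1 by substr()
  substr(str, n - m + 1L, n) == pattern
}

x <- c("apple", "grape", "a", "pineapple")
starts_with_fixed(x, "ap")  # TRUE FALSE FALSE FALSE
ends_with_fixed(x, "ple")   # TRUE FALSE FALSE TRUE
```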

gagolews changed the title from stri_startwith & stri_endswith to stri_startswith & stri_endswith Mar 18, 2014

@gagolews
Owner

gagolews commented Apr 3, 2014

TODO: stri_startswith_fixed, stri_startswith_charclass, stri_endswith_fixed, stri_endswith_charclass
(For regex this is unnecessary, I suppose, or at least it may be done by prepending "^" or appending "$" to the pattern passed to stri_detect_regex.)
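The regex remark can be illustrated with base R's `grepl()` standing in for `stri_detect_regex()`, an assumption made only so the snippet runs without stringi. Anchoring with `^` or `$` does give a prefix or suffix test, but the literal part is still interpreted as a regex, so metacharacters such as `.` must be escaped. This is one plausible argument for dedicated `_fixed` variants.

```r
x <- c("a.txt", "abtxt", "b.a")

# Anchored regex: "." is a metacharacter, so "^a." also accepts "abtxt"
grepl("^a.", x)       # TRUE TRUE FALSE
# Literal prefix test (base R 3.3.0+)
startsWith(x, "a.")   # TRUE FALSE FALSE
# Escaping the metacharacter restores the literal meaning
grepl("^a\\.", x)     # TRUE FALSE FALSE
# The same idea for suffixes, anchoring with "$"
grepl("\\.a$", x)     # FALSE FALSE TRUE
endsWith(x, ".a")     # FALSE FALSE TRUE
```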

@gagolews
Owner

gagolews commented Oct 31, 2014

Some benchmarks:

> x <- stri_paste(sample(letters, 10000, replace=TRUE), sample(0:9, 10000, replace=TRUE))
> microbenchmark::microbenchmark(stri_startswith_fixed(x, "a"), stri_detect_regex(x, "^a"))
Unit: microseconds
                          expr      min       lq      mean    median       uq      max neval
 stri_startswith_fixed(x, "a")  422.731  424.326  437.3262  429.3845  433.179  622.517   100
    stri_detect_regex(x, "^a") 2734.971 2759.820 2860.8376 2786.6480 2848.744 5441.878   100
> x <- stri_paste(sample(letters, 10000, replace=TRUE), sample(c(" ", 0:9), 10000, replace=TRUE))
> microbenchmark::microbenchmark(stri_endswith_charclass(x, "\\p{Wspace}"), stri_detect_regex(x, "\\p{Wspace}$"))
Unit: microseconds
                                        expr      min        lq      mean   median        uq      max neval
 stri_endswith_charclass(x, "\\\\p{Wspace}")  598.753  604.3275  620.1813  607.586  615.2145  928.240   100
      stri_detect_regex(x, "\\\\p{Wspace}$") 2471.734 2492.3410 2574.7461 2518.105 2603.1665 5181.951   100
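The benchmarks above require stringi and microbenchmark. Below is a rough base-R analogue of the first comparison, a literal prefix test versus an anchored regex. It swaps in `startsWith()` and `grepl()` for the stringi functions and `system.time()` for microbenchmark, so its timings are not comparable to the figures above, and no particular numbers are claimed.

```r
set.seed(123)  # reproducible test data
# Same shape of data as above: one letter followed by one digit
x <- paste0(sample(letters, 10000, replace = TRUE),
            sample(0:9, 10000, replace = TRUE))

# Both approaches must agree before their speed is compared
stopifnot(identical(startsWith(x, "a"), grepl("^a", x)))

reps <- 100
t_fixed <- system.time(for (i in seq_len(reps)) startsWith(x, "a"))
t_regex <- system.time(for (i in seq_len(reps)) grepl("^a", x))

# Elapsed seconds for `reps` repetitions of each call
print(c(fixed = t_fixed[["elapsed"]], regex = t_regex[["elapsed"]]))
```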

@gagolews
Owner

gagolews commented Oct 31, 2014

A small performance improvement now:

> x <- stri_paste(sample(letters, 10000, replace=TRUE), sample(0:9, 10000, replace=TRUE))
> microbenchmark::microbenchmark(stri_startswith_fixed(x, "a"), stri_detect_regex(x, "^a"))
Unit: microseconds
                          expr      min       lq      mean    median       uq      max neval
 stri_startswith_fixed(x, "a")  349.881  353.397  394.3754  357.9515  364.401 3289.742   100
    stri_detect_regex(x, "^a") 2746.516 2794.945 2881.3577 2835.2630 2887.126 6248.495   100
> x <- stri_paste(sample(letters, 10000, replace=TRUE), sample(c(" ", 0:9), 10000, replace=TRUE))
> microbenchmark::microbenchmark(stri_endswith_charclass(x, "\\p{Wspace}"), stri_detect_regex(x, "\\p{Wspace}$"))
Unit: microseconds
                                        expr      min        lq      mean    median       uq      max neval
 stri_endswith_charclass(x, "\\\\p{Wspace}")  460.047  465.6625  496.2252  471.5175  476.504 2393.366   100
      stri_detect_regex(x, "\\\\p{Wspace}$") 2456.307 2471.2835 2527.8533 2497.8940 2559.771 3020.136   100

@gagolews
Owner

gagolews commented Oct 31, 2014

stri_endswith_coll: this one is tricky!

> (pat <- "\u0635\u0644\u0649 \u0627\u0644\u0644\u0647 \u0639\u0644\u064a\u0647 \u0648\u0633\u0644\u0645XYZ")
[1] "صلى الله عليه وسلمXYZ"
> "\ufdfaXYZ"
[1] "ﷺXYZ"
> stri_locate_last_coll("\ufdfa\ufdfa\ufdfaXYZ", pat, stri_opts_collator(strength = 1))
     start end
[1,]     3   6
> stri_locate_last_coll("\ufdfaXYZ\ufdfaXYZ\ufdfaXYZ", pat, stri_opts_collator(strength = 1))
     start end
[1,]     9  12
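Why this is tricky: as the output above shows, at collation strength 1 the single code point U+FDFA (a ligature) matches the whole 18-character Arabic phrase that forms the pattern without "XYZ". A match can therefore be far shorter than the pattern. In the first example the reported match spans positions 3 to 6, which is 4 code points for a 21-code-point pattern, so an endswith test cannot simply compare the last `nchar(pattern)` characters. The base-R snippet below only illustrates this length mismatch and performs no collation. `naive_endswith` is a hypothetical helper.

```r
pat <- "\u0635\u0644\u0649 \u0627\u0644\u0644\u0647 \u0639\u0644\u064a\u0647 \u0648\u0633\u0644\u0645XYZ"
s <- "\ufdfa\ufdfa\ufdfaXYZ"

nchar(pat)  # 21 code points in the pattern
nchar(s)    # 6 code points in the whole string

# Hypothetical naive suffix check: compare the last nchar(pattern)
# code points. The string is shorter than the pattern, so this can
# never succeed here, even though under strength-1 collation s does
# end with a match for pat.
naive_endswith <- function(str, pattern) {
  n <- nchar(str)
  m <- nchar(pattern)
  m <= n && substr(str, n - m + 1L, n) == pattern
}
naive_endswith(s, pat)  # FALSE
```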

gagolews added a commit that referenced this issue Nov 1, 2014

gagolews closed this Nov 1, 2014
