Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accidentally quadratic #68

Open
mnieper opened this issue Jul 8, 2023 · 5 comments
Open

Accidentally quadratic #68

mnieper opened this issue Jul 8, 2023 · 5 comments

Comments

@mnieper
Copy link

mnieper commented Jul 8, 2023

The proposed solution for string-split uses string-ref and substring. Both are allowed to be O(n) where n is the length of the string. Thus the solution can be accidentally quadratic on some Schemes. It should not be promoted but rewritten.

(if (not (char-delimiter? (string-ref string b)))

@jcubic
Copy link
Contributor

jcubic commented Jul 8, 2023

You can add a warning to the file about the performance on some schemes.

@lassik
Copy link
Member

lassik commented Jul 8, 2023

Note that the cookbook permits (and maybe even encourages) multiple solutions with different tradeoffs for the same problem.

For many (perhaps most) practical uses cases, the best approach is the one with the simplest code.

For string split, the only portable alternatives I can think of are:

  • Use string ports
  • First convert to bytevector, then split the bytevector, then convert back to strings. (Noting that splitting on bytes is not the same as splitting on chars.)

Both involve messier code, and the performance on small strings on most implementations may be worse.

@mnieper
Copy link
Author

mnieper commented Jul 8, 2023

I guess that the Cookbook is aimed chiefly at beginners or casual users of Scheme. People will copy-and-paste code and not read the fine print (especially when it is a so-called AI copying the code). This is why I opened this issue and why I think it is very important to think twice about the code offered. Scheme has only a few places like this Cookbook, so I think it is important for the Cookbook to get it right.

The best approach is, in fact, string ports, both for scanning and for constructing the strings. The code would not be messier in any way but very straightforward.

If you are worrying about poor performance for small strings then you can either ignore it because the algorithm is fast for small strings in any case or you can add a small dispatch at the top of the procedure so that really small strings are handled differently.

"Simple" code that would not be accidentally quadratic would convert the string to a list and then use list utilities.

@lassik
Copy link
Member

lassik commented Jul 8, 2023

Well, can you write down either or both of those solutions? If the present solution has no advantages compared to them, we can remove it.

@mnieper
Copy link
Author

mnieper commented Jul 8, 2023

Something like this:

(library (string-split)
  (export string-split)
  (import (rnrs))

  (define string-split
    (lambda (delim? s)
      (define who 'string-split)
      (unless (procedure? delim?)
        (assertion-violation who "invalid delim? argument" delim?))
      (unless (string? s)
        (assertion-violation who "invalid string argument" s))
      (let ([p (open-string-input-port s)])
        (define peek-char (lambda () (lookahead-char p)))
        (define read-char (lambda () (get-char p)))
        (define skip-delimiter!
          (lambda ()
            (let ([c (peek-char)])
              (when (and (not (eof-object? c))
                         (delim? c))
                (read-char)
                (skip-delimiter!)))))
        (let f ()
          (skip-delimiter!)
          (let ([c (read-char)])
            (if (eof-object? c)
                '()
                (let-values ([(q extract) (open-string-output-port)])
                  (put-char q c)
                  (let g ()
                    (let ([c (read-char)])
                      (cond
                       [(eof-object? c)
                        (list (extract))]
                       [(delim? c)
                        (cons (extract) (f))]
                       [else
                        (put-char q c)
                        (g)])))))))))))

This is R6RS code, but it can be easily translated to R7RS. Note that the original solution using string-ref is less a problem for R6RS because R6RS should implement string-ref in O(1).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants