new string reversal procedure in collects/racket/string.rkt #3552

koratkar · 2020-12-10T22:48:47Z

No description provided.

Added string reversal function!

rfindler · 2020-12-11T00:53:56Z

Thank you for offering a contribution to Racket! Here are a few thoughts on your code:

1. This is an O(n^2) algorithm, and it should be linear.
2. There are no test cases.
3. We probably cannot afford an export with a generic name like that from this library, as it might easily break existing code that also uses that name.

I think all but the third are fixable, but I don't see how to fix the third.

koratkar · 2020-12-11T03:12:12Z

I'm somewhat familiar with orders of growth, but I'm not sure how to tell if a procedure is actually tail recursive. I'm using https://mitpress.mit.edu/sites/default/files/sicp/full-text/sicp/book/node15.html as my reference.
I suppose we could rename it to (reverse-string!), (string-reverse!), or (string-flip).
However, if there is a better place to put this, I'd love to put it there instead.
I added the test case:

#lang racket
(define (string-reverse str)
  (define (f x final count)
    (define length (string-length x))
    (if (= count 0)
        final
        (f (substring x 1 length)
           (string-append (substring x 0 1) final)
           (- count 1))))
  (if (nor (string? str) #f)
      "Error: input to (reverse-string) was not a string"
      (f str "" (string-length str))))

Thank you for providing insightful feedback!

samth · 2020-12-11T03:19:40Z

My larger concern with this API is that string reversal is not necessarily a well-defined operation in general, particularly character by character as done here. This article has a discussion of some of the problems you can run into: https://mathiasbynens.be/notes/javascript-unicode as well as a link to code that handles at least most of them properly.

97jaz · 2020-12-11T03:27:47Z

The situation is a bit simpler for Racket, though, right? Since it doesn't allow code points to be specified as surrogate pairs?

Still have the combining character issue, though.

samth · 2020-12-11T03:29:40Z

Yes. The algorithm described here for Rust is probably sufficient: https://github.com/mbrubeck/unicode-reverse#algorithm

97jaz · 2020-12-11T03:31:35Z

> Since it doesn't allow code points to be specified as surrogate pairs?

~~Erm, but I fear this is no longer true in CS.~~

[Edit] Sorry, my mistake: the reader accepts surrogate pairs in its escape syntax but the pair is represented by a single code point nonetheless.

sorawee · 2020-12-11T03:52:08Z

@koratkar FWIW, your algorithm is tail-recursive, but it's also O(n^2). The (substring x 1 length) operation takes linear time, and you do it n times, so that's O(n^2).
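
To make the arithmetic concrete, here is a rough sketch that counts the characters copied by the loop above for an input of length n. It assumes that substring and string-append each allocate a fresh string and copy every character they return; chars-copied is a hypothetical helper written for this illustration, not part of the PR.

```racket
#lang racket
;; Hypothetical helper: estimates how many characters the recursive
;; string-reverse above copies for an input of length n, assuming
;; substring and string-append copy every character they return.
(define (chars-copied n)
  (for/sum ([k (in-range n)])
    ;; At step k the remaining input has n-k characters, so
    ;; (substring x 1 length) copies n-k-1 of them, and string-append
    ;; builds an accumulator of k+1 characters.
    (+ (- n k 1) (+ k 1))))
```

Each of the n steps copies about n characters in total, so the sum grows as n^2 even though the recursion itself is a tail call.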

sorawee · 2020-12-11T09:03:08Z

@samth @97jaz Not sure if I missed anything, but @koratkar's algorithm seems to work correctly already? Here are some examples from https://mathiasbynens.be/notes/javascript-unicode:

> (string-reverse "mañana")
"anañam"
> (string-reverse "💩")
"💩"

Reversing character by character is fine if the primitive operations (substring, etc.) work correctly. The issue with JavaScript is that they don't work correctly. But for Racket they apparently do?

rmculpepper · 2020-12-11T10:29:35Z

There is already a string-reverse function in srfi/13. It reverses the codepoints of the string (and the docs say so).

I think Sam's point is that reversing the codepoints can give nonsensical results if the string contains grapheme clusters that take multiple codepoints. Decomposed accented characters have that property:

(string-normalize-nfc (string-reverse (string-normalize-nfd "áe"))) = "éa"

And some grapheme clusters do not have single codepoint representations. (I learned about grapheme clusters from this post: https://hsivonen.fi/string-length/.) For example, if you reverse the Australian (AU) flag, you get the Ukrainian (UA) flag, because flags are represented as two adjacent regional indicator symbol codepoints.
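
Both effects can be reproduced with a small sketch. The name reverse-code-points is hypothetical, and the function is only a stand-in for any reversal that works codepoint by codepoint; the characters are built from their codepoint numbers so the example does not depend on how the source file is encoded.

```racket
#lang racket
;; Hypothetical stand-in for a codepoint-by-codepoint reversal.
(define (reverse-code-points s)
  (list->string (reverse (string->list s))))

;; Regional indicator symbols A (U+1F1E6) and U (U+1F1FA).
(define ri-a (integer->char #x1F1E6))
(define ri-u (integer->char #x1F1FA))
(define au-flag (string ri-a ri-u)) ; renders as the Australian flag
(define ua-flag (string ri-u ri-a)) ; renders as the Ukrainian flag

;; Reversing the codepoints of one flag yields the other.
(equal? (reverse-code-points au-flag) ua-flag)

;; Decomposed "áe": a, U+0301 (combining acute accent), e.
(define decomposed (string #\a (integer->char #x301) #\e))

;; After reversal the accent follows the e, so NFC composes é instead of á.
(string-normalize-nfc (reverse-code-points decomposed))
```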

koratkar · 2020-12-11T17:48:31Z

I guess there isn't a good reason to push a string reversal function (it might break others' code, and no one really wants this, save for algorithm problems), but I'm still really interested in the code part.
How could I make this a linear procedure?

> @koratkar FWIW, your algorithm is tail-recursive, but it's also O(n^2). The (substring x 1 length) operation takes linear time, and you do it n times, so that's O(n^2).

97jaz · 2020-12-11T19:08:09Z

> How could I make this a linear procedure?

@koratkar One way would be to transform the string into a list of characters via string->list, which is linear. Then reverse the list, which is also linear, then transform the list back into a string with list->string. This could be defined as a simple composition of the above functions, like:

(define string-reverse (compose list->string reverse string->list))

The above method is linear, but it might not be the most efficient approach, particularly for smaller strings where constant overhead dominates the running time. (It makes three passes over the contents of the string and creates two intermediate lists.)

Possibly the fastest approach is to create a string of equal length to the original, then copy the characters from the original string into the new string in reverse order. The standard library has a function, build-string, which facilitates this approach. (It takes care of creating the new string for you; you just have to supply a function that produces a character for each index in the new string.)

(define (string-reverse original-string)
  (define length (string-length original-string))
  (build-string length
                (λ (index)
                  (string-ref original-string (- length index 1)))))

Without using build-string, it might look like:

(define (string-reverse original-string)
  (define length (string-length original-string))
  (define new-string (make-string length))

  (for ([c (in-string original-string)]
        [i (in-range (sub1 length) -1 -1)])
    (string-set! new-string i c))

  new-string)
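
As a quick sanity check, the three approaches above can be placed side by side and compared on a few inputs. This is only a sketch: the names reverse-via-list, reverse-via-build, reverse-via-set! and all-agree? are hypothetical, chosen so the definitions can coexist in one module.

```racket
#lang racket
;; Composition of three linear passes (hypothetical name).
(define reverse-via-list (compose list->string reverse string->list))

;; build-string allocates the result and fills index i from the mirror index.
(define (reverse-via-build s)
  (define n (string-length s))
  (build-string n (λ (i) (string-ref s (- n i 1)))))

;; Explicit allocation plus string-set!, walking the target index downward.
(define (reverse-via-set! s)
  (define n (string-length s))
  (define out (make-string n))
  (for ([c (in-string s)]
        [i (in-range (sub1 n) -1 -1)])
    (string-set! out i c))
  out)

;; True when all three definitions produce the same result for s.
(define (all-agree? s)
  (define expected (reverse-via-list s))
  (and (equal? expected (reverse-via-build s))
       (equal? expected (reverse-via-set! s))))
```

All three reverse codepoints, so they share the grapheme-cluster caveats raised earlier in the thread.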

koratkar · 2020-12-12T04:50:41Z

Wow, that's really interesting!
Thanks for the help!

sorawee · 2020-12-12T05:32:47Z

@koratkar do you still want to continue refining the PR? If not, feel free to close it.

koratkar · 2020-12-12T16:37:31Z

Yes, I think it's time to close this. Thanks everybody for all the help and insights!

koratkar added 2 commits December 10, 2020 16:44

Added string reversal function!

27233d2

Merge pull request #1 from koratkar/string-reversal

a168b48

Added string reversal function!

koratkar closed this Dec 12, 2020
