Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new string reversal procedure in collects/racket/string.rkt #3552

Closed
wants to merge 2 commits into from

Conversation

koratkar
Copy link

No description provided.

@rfindler
Copy link
Member

Thank you for offering a contribution to Racket! Here are a few thoughts on your code:

  • this is an O(n^2) algorithm and it should be linear.

  • there are no test cases

  • probably we cannot afford an export with a generic name like that from this library as it might easily break existing code that also uses that name.

I think all but the third is fixable, but I don't see how to fix the third.

@koratkar
Copy link
Author

I'm somewhat familiar with orders of growth, but I'm not sure how to tell if a procedure is actually tail recursive. I'm using https://mitpress.mit.edu/sites/default/files/sicp/full-text/sicp/book/node15.html as my reference.
I suppose we could rename it to: (reverse-string!), (string-reverse!) or (string-flip).
However, if there is a better place to put this, I love to put it there instead.
I added the test case:

# lang racket
(define (string-reverse str)
  (define (f x final count)
        (define length (string-length x))
        (if (= count 0) final
            (f (substring x 1 length) (string-append (substring x 0 1) final) (- count 1))))
  (if (nor (string? str) #f)
      "Error: input to (reverse-string) was not a string"
      (f str "" (string-length str))))

Thank you for providing insightful feedback!

@samth
Copy link
Sponsor Member

samth commented Dec 11, 2020

My larger concern with this API is that string reversal is not necessarily a well-defined operation in general, particularly character by character as done here. This article has a discussion of some of the problems you can run into: https://mathiasbynens.be/notes/javascript-unicode as well as a link to code that handles at least most of them properly.

@97jaz
Copy link
Contributor

97jaz commented Dec 11, 2020

The situation is a bit simpler for Racket, though, right? Since it doesn't allow code points to be specified as surrogate pairs?

Still have the combining character issue, though.

@samth
Copy link
Sponsor Member

samth commented Dec 11, 2020

Yes. The algorithm described here for Rust is probably sufficient: https://github.com/mbrubeck/unicode-reverse#algorithm

@97jaz
Copy link
Contributor

97jaz commented Dec 11, 2020

Since it doesn't allow code points to be specified as surrogate pairs?

Erm, but I fear this is no longer true in CS.

[Edit] Sorry, my mistake: the reader accepts surrogate pairs in its escape syntax but the pair is represented by a single code point nonetheless.

@sorawee
Copy link
Collaborator

sorawee commented Dec 11, 2020

@koratkar FWIW, your algorithm is tail-recursive, but it's also O(n^2). The (substring x 1 length) operation take linear time, and you do it n times, so that's O(n^2).

@sorawee
Copy link
Collaborator

sorawee commented Dec 11, 2020

@samth @97jaz Not sure if I missed anything, but @koratkar's algorithm seems to work correctly already? Here are some examples from https://mathiasbynens.be/notes/javascript-unicode:

> (string-reverse "mañana")
"anañam"
> (string-reverse "💩")
"💩"

Reversing character by character is fine if the primitive operations (substring, etc.) work correctly. The issue with JavaScript is that they don't work correctly. But for Racket they apparently do?

@rmculpepper
Copy link
Collaborator

There is already a string-reverse function in srfi/13. It reverses the codepoints of the string (and the docs say so).

I think Sam's point is that reversing the codepoints can give nonsensical results if the string contains grapheme clusters that take multiple codepoints. Decomposed accented characters have that property:

(string-normalize-nfc (string-reverse (string-normalize-nfd "áe"))) = "éa"

And some grapheme clusters do not have single codepoint representations. (I learned about grapheme clusters from this post: https://hsivonen.fi/string-length/.) For example, if you reverse the Austalian (AU) flag, you get the Ukraine (UA) flag, because flags are represented as two adjacent regional indicator symbol codepoints.

@koratkar
Copy link
Author

I guess there isn't a good reason to push a string reversal function (it might break other's code, and no one really wants this, save for algorithms problems), but I'm still really interested in the code part.
How could I make this a linear procedure?

@koratkar FWIW, your algorithm is tail-recursive, but it's also O(n^2). The (substring x 1 length) operation take linear time, and you do it n times, so that's O(n^2).

@97jaz
Copy link
Contributor

97jaz commented Dec 11, 2020

How could I make this a linear procedure?

@koratkar One way would be to transform the string into a list of characters via string->list, which is linear. Then reverse the list, which is also linear, then transform the list back into a string with list->string. This could be defined as a simple composition of the above functions, like:

(define string-reverse (compose list->string reverse string->list))

The above method is linear, but it might not be the most efficient approach, particularly for smaller strings where constant overhead dominates the running time. (It makes three passes over the contents of the string and creates two intermediate lists.)

Possibly the fastest approach is to create a string of equal length to the original, then copy the characters from the original string into the new string in reverse order. The standard library has a function, build-string, which facilitates this approach. (It takes care of creating the new string for you; you just have to supply a function that produces a character for each index in the new string.)

(define (string-reverse original-string)
  (define length (string-length original-string))
  (build-string length
                (λ (index)
                  (string-ref original-string (- length index 1)))))

Without using build-string, it might look like:

(define (string-reverse original-string)
  (define length (string-length original-string))
  (define new-string (make-string length))

  (for ([c (in-string original-string)]
        [i (in-range (sub1 length) -1 -1)])
    (string-set! new-string i c))

  new-string)

@koratkar
Copy link
Author

Wow, that's really interesting!
Thanks for the help!

@sorawee
Copy link
Collaborator

sorawee commented Dec 12, 2020

@koratkar do you still want to continue refining the PR? If not, feel free to close it.

@koratkar
Copy link
Author

Yes, I think it's time to close this. Thanks everybody for all the help and insights!

@koratkar koratkar closed this Dec 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants