new string reversal procedure in collects/racket/string.rkt #3552
Added string reversal function!
Thank you for offering a contribution to Racket! Here are a few thoughts on your code:
I think all but the third are fixable, but I don't see how to fix the third.
I'm somewhat familiar with orders of growth, but I'm not sure how to tell if a procedure is actually tail recursive. I'm using https://mitpress.mit.edu/sites/default/files/sicp/full-text/sicp/book/node15.html as my reference.
Thank you for providing insightful feedback!
My larger concern with this API is that string reversal is not necessarily a well-defined operation in general, particularly character by character as done here. This article has a discussion of some of the problems you can run into: https://mathiasbynens.be/notes/javascript-unicode as well as a link to code that handles at least most of them properly.
The situation is a bit simpler for Racket, though, right? Since it doesn't allow code points to be specified as surrogate pairs? Still have the combining character issue, though.
Yes. The algorithm described here for Rust is probably sufficient: https://github.com/mbrubeck/unicode-reverse#algorithm
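As a rough illustration of cluster-aware reversal, here is a much-simplified sketch. It is not a port of the linked crate's algorithm: it only keeps each base character together with the combining marks (general categories Mn, Mc, Me) that follow it. The names `combining-mark?` and `string-reverse/marks` are hypothetical, and the sketch does not handle flags, emoji ZWJ sequences, or other multi-code-point grapheme clusters.

```racket
#lang racket/base

;; Hypothetical helper: true for combining marks (categories Mn, Mc, Me).
(define (combining-mark? c)
  (and (memq (char-general-category c) '(mn mc me)) #t))

;; Reverse a string while keeping each base character together with the
;; combining marks that follow it. Simplified: this is NOT full grapheme
;; cluster segmentation.
(define (string-reverse/marks s)
  ;; Walk left to right. `cluster` holds the current cluster's characters
  ;; in reverse order; `done` holds finished clusters, most recent first.
  (define-values (cluster done)
    (for/fold ([cluster '()] [done '()])
              ([c (in-string s)])
      (if (or (null? cluster) (combining-mark? c))
          (values (cons c cluster) done)
          (values (list c) (cons (reverse cluster) done)))))
  (define clusters
    (if (null? cluster) done (cons (reverse cluster) done)))
  ;; `clusters` is already in reversed cluster order, each cluster intact.
  (list->string (apply append clusters)))
```

With the decomposed input `n`, U+0303, `o`, this yields `o`, `n`, U+0303, so the tilde stays attached to the `n`.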
[Edit] Sorry, my mistake: the reader accepts surrogate pairs in its escape syntax but the pair is represented by a single code point nonetheless.
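A small sketch of why the surrogate-pair problem from the JavaScript article does not arise here: a Racket character is a single Unicode scalar value, so a code point outside the Basic Multilingual Plane occupies one position in a string. The names `astral` and `reverse-code-points` are made up for this illustration.

```racket
#lang racket/base

;; U+1F4A9 lies outside the Basic Multilingual Plane. UTF-16 would need
;; two code units (a surrogate pair) for it; a Racket string needs one char.
(define astral (string (integer->char #x1F4A9)))

;; Illustrative code-point reversal (hypothetical name).
(define (reverse-code-points s)
  (list->string (reverse (string->list s))))

(string-length astral)                          ; one character
(bytes-length (string->bytes/utf-8 astral))     ; four bytes when encoded as UTF-8

;; Reversal moves the character as a unit; there is no pair to split.
(equal? (reverse-code-points (string-append "a" astral))
        (string-append astral "a"))
```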
@koratkar FWIW, your algorithm is tail-recursive, but it's also O(n^2). The
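The PR's code is not reproduced in this thread, so the following is only a hypothetical illustration of how a reversal can be tail-recursive and still quadratic. In both loops the recursive call is the last thing the function does, which is what makes them tail-recursive. The difference is the work done per step: `string-append` copies the whole accumulator each time, while `cons` is constant time. Both function names are invented for the example.

```racket
#lang racket/base

;; Tail-recursive but O(n^2): every step copies the accumulated string.
(define (string-reverse/quadratic s)
  (let loop ([i 0] [acc ""])
    (if (= i (string-length s))
        acc
        (loop (add1 i)
              (string-append (string (string-ref s i)) acc)))))

;; Tail-recursive and O(n): consing onto a list is constant time,
;; and the list is converted to a string once at the end.
(define (string-reverse/linear s)
  (let loop ([i 0] [acc '()])
    (if (= i (string-length s))
        (list->string acc)
        (loop (add1 i) (cons (string-ref s i) acc)))))
```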
@samth @97jaz Not sure if I missed anything, but @koratkar's algorithm seems to work correctly already? Here are some examples from https://mathiasbynens.be/notes/javascript-unicode:
Reversing character by character is fine if the primitive operations (
There is already a I think Sam's point is that reversing the codepoints can give nonsensical results if the string contains grapheme clusters that take multiple codepoints. Decomposed accented characters have that property:
And some grapheme clusters do not have single codepoint representations. (I learned about grapheme clusters from this post: https://hsivonen.fi/string-length/.) For example, if you reverse the Australian (AU) flag, you get the Ukrainian (UA) flag, because flags are represented as two adjacent regional indicator symbol codepoints.
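Both failure modes can be reproduced with a plain code-point reversal. This is a sketch for illustration only: `reverse-code-points` and the other names are made up here, and the characters are built with `integer->char` to avoid relying on any particular escape syntax.

```racket
#lang racket/base

;; Illustrative code-point reversal (hypothetical name).
(define (reverse-code-points s)
  (list->string (reverse (string->list s))))

;; Decomposed "ñ" followed by "o": n, U+0303 COMBINING TILDE, o.
(define tilde (integer->char #x0303))
(define decomposed (string #\n tilde #\o))

;; After reversal the sequence is o, U+0303, n, so the tilde now
;; follows (and combines with) the "o" instead of the "n".
(define reversed (reverse-code-points decomposed))

;; Regional indicator symbols: A is U+1F1E6, U is U+1F1FA.
(define ri-a (integer->char #x1F1E6))
(define ri-u (integer->char #x1F1FA))
(define flag-au (string ri-a ri-u))

;; Reversing the two code points of the AU flag gives the UA flag.
(define flag-ua (reverse-code-points flag-au))
```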
I guess there isn't a good reason to push a string reversal function (it might break others' code, and no one really wants this, save for algorithm problems), but I'm still really interested in the code part.
@koratkar One way would be to transform the string into a list of characters via

    (define string-reverse (compose list->string reverse string->list))

The above method is linear, but it might not be the most efficient approach, particularly for smaller strings where constant overhead dominates the running time. (It makes three passes over the contents of the string and creates two intermediate lists.)

Possibly the fastest approach is to create a string of equal length to the original, then copy the characters from the original string into the new string in reverse order. The standard library has a function,

    (define (string-reverse original-string)
      (define length (string-length original-string))
      (build-string length
                    (λ (index)
                      (string-ref original-string (- length index 1)))))

Without using

    (define (string-reverse original-string)
      (define length (string-length original-string))
      (define new-string (make-string length))
      (for ([c (in-string original-string)]
            [i (in-range (sub1 length) -1 -1)])
        (string-set! new-string i c))
      new-string)
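A quick way to convince yourself that the three variants above agree is to run them side by side. The sketch below renames them (`reverse/list`, `reverse/build`, `reverse/mutate` are names chosen here, not from the thread) so they can live in one module, and uses `#lang racket` so that `compose` and `λ` are available without extra requires.

```racket
#lang racket

;; Variant 1: go through an intermediate list of characters.
(define reverse/list (compose list->string reverse string->list))

;; Variant 2: build the result with build-string.
(define (reverse/build s)
  (define n (string-length s))
  (build-string n (λ (i) (string-ref s (- n i 1)))))

;; Variant 3: allocate the result and fill it from the back.
(define (reverse/mutate s)
  (define n (string-length s))
  (define out (make-string n))
  (for ([c (in-string s)]
        [i (in-range (sub1 n) -1 -1)])
    (string-set! out i c))
  out)

;; Raise an error if any variant disagrees with the list-based one.
(for ([s (in-list '("" "a" "ab" "racket"))])
  (define expected (reverse/list s))
  (unless (and (equal? (reverse/build s) expected)
               (equal? (reverse/mutate s) expected))
    (error 'string-reverse "variants disagree on ~s" s)))
```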
Wow, that's really interesting!
@koratkar do you still want to continue refining the PR? If not, feel free to close it.
Yes, I think it's time to close this. Thanks everybody for all the help and insights!