Why are String#downcase/upcase etc considered making a string unsafe? #11936
Can you guarantee that a given string, after calling upcase/downcase, won't have its bytes changed in a way that produces unsafe characters for a given encoding? Unless we can guarantee that, I don't think we should just go ahead and change it.
Also not clear how
Only theory is that non-ASCII strings may behave differently than we expect, for some values.
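One way to put that theory to the test is to brute-force it. The sketch below is only an empirical check, not a proof: it walks the non-ASCII code points of the Basic Multilingual Plane (skipping surrogates), runs each through the case-changing methods, and records any result that contains an HTML-special character. The method name case_mapping_offenders and the choice of "special" characters are my own assumptions for illustration, and it only covers UTF-8 and single characters on whatever Ruby you run it with.

```ruby
# Hypothetical helper (not part of Rails or Ruby): returns every
# [codepoint, method, result] triple where case mapping a single
# non-ASCII character yields an HTML-special character.
def case_mapping_offenders
  html_special = ["<", ">", "&", "\"", "'"]
  case_methods = [:upcase, :downcase, :capitalize, :swapcase]
  # Non-ASCII BMP code points, skipping the surrogate range,
  # which cannot be encoded as UTF-8 characters.
  ranges = [(0x80..0xD7FF), (0xE000..0xFFFF)]

  offenders = []
  ranges.each do |range|
    range.each do |codepoint|
      char = codepoint.chr(Encoding::UTF_8)
      case_methods.each do |meth|
        mapped = char.public_send(meth)
        if html_special.any? { |special| mapped.include?(special) }
          offenders << [codepoint, meth, mapped]
        end
      end
    end
  end
  offenders
end

offenders = case_mapping_offenders
puts "offending mappings found: #{offenders.size}"
```

An empty result would support the idea that case mapping cannot introduce markup characters, at least for UTF-8 input. It says nothing about other encodings or about strings that are already invalid.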
Ok, suddenly I get it... by default they would return string classes that you could then call OTHER unsafe methods on... so it's not the call itself that's unsafe, it's the return... but why wouldn't we just call super and wrap that in a new SafeBuffer?
That's what my original question is asking... if the calls themselves can't really "break" the string then why aren't they returning a nicely modified SafeBuffer?
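To make the "call super and wrap it" idea concrete, here is a minimal sketch in plain Ruby. SketchSafeBuffer is a made-up stand-in, not the real ActiveSupport::SafeBuffer, and its html_safe? is simply hard-coded to true. It only shows the shape of the override being asked about.

```ruby
# Hypothetical stand-in for a safe buffer; NOT ActiveSupport::SafeBuffer.
class SketchSafeBuffer < String
  # In this sketch every instance counts as safe.
  def html_safe?
    true
  end

  # Delegate to String's implementation, then re-wrap the result so the
  # caller gets a buffer back instead of a plain String.
  def upcase(*options)
    self.class.new(super)
  end

  def downcase(*options)
    self.class.new(super)
  end

  def capitalize(*options)
    self.class.new(super)
  end
end

buffer = SketchSafeBuffer.new("Hello")
shouted = buffer.upcase
puts shouted            # HELLO
puts shouted.class      # SketchSafeBuffer
puts shouted.html_safe? # true
```

Whether this is actually safe depends on the guarantee asked for earlier in the thread: the wrapper trusts that the case-mapped result contains nothing the original did not.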
That was my only thought - but it seems like a stretch. And would you say the same for strip/lstrip/rstrip?
I can understand why removing whitespace from a string could be a reason to mark it unsafe, depending on what the string is used for, though I can't really think of an example.
My thought process behind this is that with upcase, downcase, capitalize, etc., you're not changing the actual content of the string. The letters change case, but that doesn't change the fact that it's still the same string (for all intents and purposes).
But I would consider " " to be different from "". Again I don't have an example of why it would really matter.
Yes there's a technical difference in that a and A are different characters, but..
I may be missing something large here so feel free to educate me.