-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
strings,bytes: ToLower(), ToUpper() by using bitwise operator #64361
Comments
@ksw2000 I don't see in what way this is change the api or behavior, if this is purely an internal change you don't need a proposal and can just submit a patch. 🙂 |
Sorry, this is my first time creating an issue for Golang, and I'm not sure what type of issue to use. Can you please tell me what type of issue I should use for an issue like this or could I just pull request directly? I am so sorry for my mistake. I will make sure to learn how to use it correctly next time. |
Taking this out of the proposal milestone as it doesn't involve API changes. |
@ksw2000 See golang proposal for details. |
@panjf2000 |
I think you should, when you submit your CL don't forget to post benchmark results using benchstat. 🙂 |
Leave this issue and add a line |
I'm not sure I buy the premise that bitwise operators are any faster than adding or subtracting. |
After testing, I found that the improvement is negligible. I will close this issue. Thank you for the discussion. |
This makes the code branchless which helps a auto vectorizers, so this does not apply to gc. |
@joroppo Branchless would be great if you can do it. How would you do that here, exactly? |
你好!我是JYQin!您的邮件已经收到!!!
|
I'm fairly confident branch-less would be slower here since it would need to generate masks. But SIMD with masks would be faster than SISD. |
After trying it to be sure, I'm not able to measure anything. |
In ToUpperby add/sub operation (original method) func ToUpper(s string) string {
isASCII, hasLower := true, false
for i := 0; i < len(s); i++ {
c := s[i]
if c >= utf8.RuneSelf {
isASCII = false
break
}
hasLower = hasLower || ('a' <= c && c <= 'z')
}
if isASCII { // optimize for ASCII-only strings.
if !hasLower {
return s
}
var (
b Builder
pos int
)
b.Grow(len(s))
for i := 0; i < len(s); i++ {
c := s[i]
if 'a' <= c && c <= 'z' {
c -= 'a' - 'A'
if pos < i {
b.WriteString(s[pos:i])
}
b.WriteByte(c)
pos = i + 1
}
}
if pos < len(s) {
b.WriteString(s[pos:])
}
return b.String()
}
return Map(unicode.ToUpper, s)
} by binary operation // ToUpper returns s with all Unicode letters mapped to their upper case.
func ToUpper(s string) string {
isASCII, hasLower := true, false
for i := 0; i < len(s); i++ {
c := s[i]
if c >= utf8.RuneSelf {
isASCII = false
break
}
hasLower = hasLower || ('a' <= c && c <= 'z')
}
if isASCII { // optimize for ASCII-only strings.
if !hasLower {
return s
}
var (
b Builder
pos int
)
b.Grow(len(s))
for i := 0; i < len(s); i++ {
c := s[i]
if 'a' <= c && c <= 'z' {
c ^= 0b100000
if pos < i {
b.WriteString(s[pos:i])
}
b.WriteByte(c)
pos = i + 1
}
}
if pos < len(s) {
b.WriteString(s[pos:])
}
return b.String()
}
return Map(unicode.ToUpper, s)
}
ToLowerby add/sub operation (original method) // ToLower returns s with all Unicode letters mapped to their lower case.
func ToLower(s string) string {
isASCII, hasUpper := true, false
for i := 0; i < len(s); i++ {
c := s[i]
if c >= utf8.RuneSelf {
isASCII = false
break
}
hasUpper = hasUpper || ('A' <= c && c <= 'Z')
}
if isASCII { // optimize for ASCII-only strings.
if !hasUpper {
return s
}
var (
b Builder
pos int
)
b.Grow(len(s))
for i := 0; i < len(s); i++ {
c := s[i]
if 'A' <= c && c <= 'Z' {
c += 'a' - 'A'
if pos < i {
b.WriteString(s[pos:i])
}
b.WriteByte(c)
pos = i + 1
}
}
if pos < len(s) {
b.WriteString(s[pos:])
}
return b.String()
}
return Map(unicode.ToLower, s)
} by binary operation // ToLower returns s with all Unicode letters mapped to their lower case.
func ToLower(s string) string {
isASCII, hasUpper := true, false
for i := 0; i < len(s); i++ {
c := s[i]
if c >= utf8.RuneSelf {
isASCII = false
break
}
hasUpper = hasUpper || ('A' <= c && c <= 'Z')
}
if isASCII { // optimize for ASCII-only strings.
if !hasUpper {
return s
}
var (
b Builder
pos int
)
b.Grow(len(s))
for i := 0; i < len(s); i++ {
c := s[i]
if 'A' <= c && c <= 'Z' {
c |= 0b100000
if pos < i {
b.WriteString(s[pos:i])
}
b.WriteByte(c)
pos = i + 1
}
}
if pos < len(s) {
b.WriteString(s[pos:])
}
return b.String()
}
return Map(unicode.ToLower, s)
}
|
When the string consists of only ASCII characters, we can convert lower case to upper case using bitwise operators. Bitwise operators are more efficient than using addition or subtraction.
go/src/strings/strings.go
Lines 616 to 625 in 3e67f46
For example, in Rust, they used bitwise operator to implement the converter.
https://github.com/rust-lang/rust/blob/c387f012b14a3d64e0d580b7ebe65e5325bcf822/library/core/src/num/mod.rs#L576-L579
I propose that we replace the addition and subtraction operators with bitwise operators in the following four places.
go/src/strings/strings.go
Lines 616 to 625 in 3e67f46
go/src/strings/strings.go
Lines 656 to 666 in 3e67f46
go/src/bytes/bytes.go
Lines 638 to 652 in 3e67f46
go/src/bytes/bytes.go
Lines 669 to 682 in 3e67f46
The text was updated successfully, but these errors were encountered: