Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strings: refactor logic ToLower #52371

Open
ghost opened this issue Apr 15, 2022 · 17 comments
Open

strings: refactor logic ToLower #52371

ghost opened this issue Apr 15, 2022 · 17 comments
Labels
help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@ghost
Copy link

ghost commented Apr 15, 2022

hi everyone, I was thinking about the method ToLower in standard library strings then I saw as the pkg Redis implements this idea. Then I identified that is very better than the standard library. Maybe we can improve this package strings.

Let's go to talk about this?

More details below:

e.g

func isLower(s string) bool {
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c >= 'A' && c <= 'Z' {
			return false
		}
	}
	return true
}

func ToLower(s string) string {
	if isLower(s) {
		return s
	}

	b := make([]byte, len(s))
	for i := range b {
		c := s[i]
		if c >= 'A' && c <= 'Z' {
			c += 'a' - 'A'
		}
		b[i] = c
	}
	return BytesToString(b)
}

func BytesToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

reference: https://github.com/go-redis/redis/blob/6e4eb2e3acad9575fb1e2c8690a3c1decf2e59e5/internal/util.go#L22

One little detail I believe that we cannot use unsafe, I would use just string(bytes)

//BenchmarkTestToLowerStdLib           4497225               265.1 ns/op            64 B/op          1 allocs/op
func BenchmarkTestToLowerStdLib(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = strings.ToLower(`RENAN BASTOS 93 AOSDAJDJAIDJAIDAJIaidsjjaidijadsjiadjiOOKKO`)
	}
}

//BenchmarkTestToLowerInternalRedis    15342235                77.65 ns/op           64 B/op          1 allocs/op
func BenchmarkTestToLowerInternalRedis(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = ToLower(`RENAN BASTOS 93 AOSDAJDJAIDJAIDAJIaidsjjaidijadsjiadjiOOKKO`)
	}
}

I am sorry for anything but I want to help our loved Go.

@ghost ghost added the Proposal label Apr 15, 2022
@gopherbot gopherbot added this to the Proposal milestone Apr 15, 2022
@ghost
Copy link
Author

ghost commented Apr 15, 2022

I opened this issue using my work account I will follow discuss using @renanbastos93

@ianlancetaylor ianlancetaylor changed the title proposal: affected/package: strings refactor logic ToLower strings: refactor logic ToLower Apr 15, 2022
@ianlancetaylor
Copy link
Contributor

There is no API change here so taking it out of the proposal process.

We don't want to to use unsafe here. We also don't want to use a string conversion, as that will add an allocation. The current code already uses strings.Builder, which effectively handles the unsafe code in a safe way.

Note that your replacement code does not correctly handle UTF-8 encodings that are not ASCII.

It's worth investigating why the current code is slower than your version, and see if we can fix that without introducing unsafe. Perhaps it is the fact that WriteByte calls copyCheck? But I don't really know.

@ianlancetaylor ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed Proposal labels Apr 15, 2022
@ianlancetaylor ianlancetaylor modified the milestones: Proposal, Backlog Apr 15, 2022
@renanbastos93
Copy link

@ianlancetaylor I got it, also I want not to use it unsafely. Thanks for your fast answer.

About WriteByte calls copyCheck: I don't know too. How can I help you?

@renanbastos93
Copy link

I saw code from copyCheck it using unsafe o.O

func (b *Builder) copyCheck() {
	if b.addr == nil {
		// This hack works around a failing of Go's escape analysis
		// that was causing b to escape and be heap allocated.
		// See issue 23382.
		// TODO: once issue 7921 is fixed, this should be reverted to
		// just "b.addr = b".
		b.addr = (*Builder)(noescape(unsafe.Pointer(b)))
	} else if b.addr != b {
		panic("strings: illegal use of non-zero Builder copied by value")
	}
}

The WriteByte uses it

func (b *Builder) WriteByte(c byte) error {
	b.copyCheck()
	b.buf = append(b.buf, c)
	return nil
}

Maybe we should review this approach.

@ianlancetaylor
Copy link
Contributor

We decided to permit strings.Builder to use unsafe to minimize allocations, but we don't want it to leak out to the rest of the strings package.

@renanbastos93
Copy link

Alright, I understood.
I was trying running unit tests local but it raises not allow that we use the package internal. =/

I am analyzing code ToLower, it uses Grow too. This method Grow uses a method named copy.

func (b *Builder) grow(n int) {
	buf := make([]byte, len(b.buf), 2*cap(b.buf)+n)
	copy(buf, b.buf)
	b.buf = buf
}

I am trying to understand all logic to help us.

@elagergren-spideroak
Copy link

I was trying running unit tests local but it raises not allow that we use the package internal. =/

@renanbastos93 clone the Go repo, build the compiler with make.bash, then use that compiler to run the strings unit tests.

@renanbastos93
Copy link

renanbastos93 commented Apr 18, 2022

I was trying running unit tests local but it raises not allow that we use the package internal. =/

@renanbastos93 clone the Go repo, build the compiler with make.bash, then use that compiler to run the strings unit tests.

I tried many times but keep to the problem.

 strings/strings.go:14:2: use of internal package internal/bytealg not allowed

@elagergren-spideroak
Can you help me?

@renanbastos93
Copy link

I was trying running unit tests local but it raises not allow that we use the package internal. =/

@renanbastos93 clone the Go repo, build the compiler with make.bash, then use that compiler to run the strings unit tests.

I tried many times but keep to the problem.

 strings/strings.go:14:2: use of internal package internal/bytealg not allowed

@elagergren-spideroak Can you help me?

I got run unit tests now I am try run bench tests.

@renanbastos93
Copy link

Did someone get to discover something?

@gopherbot
Copy link

Change https://go.dev/cl/423874 mentions this issue: strings: roll back string.Builder to get better performances from ToUpper()/ToLower() with long string

@renanbastos93
Copy link

@gopherbot Awesome, but I don't see an open PR. I saw this link, it is an excellent approach. Maybe we can close to this issue, right?

@renanbastos93
Copy link

@panjf2000 Do you need any help?

@panjf2000
Copy link
Member

@gopherbot Awesome, but I don't see an open PR. I saw this link, it is an excellent approach. Maybe we can close to this issue, right?

In fact, Github repo of Go is just a mirror repo, the actual Git repo of Go is located at https://go.googlesource.com/go, described in https://github.com/golang/go/blob/master/README.md.

Thus, Go team and contributors work in https://go-review.googlesource.com, managing the source code of Go and reviewing PR/CL over there.

@renanbastos93

@panjf2000
Copy link
Member

panjf2000 commented Aug 15, 2022

@panjf2000 Do you need any help?

I've tried several ways to optimize strings.ToLower()/ToUpper(), but it seemed to be unlikely to achieve the approximate improvement as BenchmarkTestToLowerInternalRedis you provided without introducing unsafe.

@renanbastos93
Copy link

@panjf2000 Do you need any help?

I've tried several ways to optimize strings.ToLower()/ToUpper(), but it seemed to be unlikely to achieve the approximate improvement as BenchmarkTestToLowerInternalRedis you provided without introducing unsafe.

@panjf2000 Yeah, I do agree with you dude we need to find a way simple to resolve it safely. Even so, tried many ways too but did not succeed. Maybe, we can think together. We go to create a list of references that help we want to resolve it.

  1. Greater performance
  2. Safely
  3. Using Builder or not
  4. Some algorithm to this (exists?)

@gopherbot Awesome, but I don't see an open PR. I saw this link, it is an excellent approach. Maybe we can close to this issue, right?

In fact, Github repo of Go is just a mirror repo, the actual Git repo of Go is located at https://go.googlesource.com/go, described in https://github.com/golang/go/blob/master/README.md.

Thus, Go team and contributors work in https://go-review.googlesource.com, managing the source code of Go and reviewing PR/CL over there.

@renanbastos93

Thanks for explaining to me

@gopherbot
Copy link

Change https://go.dev/cl/424139 mentions this issue: strings: speed up ToUpper()/ToLower() by batch writing data with Builder

gopherbot pushed a commit that referenced this issue Aug 19, 2022
Updates #52371
Updates CL 423874

name                                                                    old time/op    new time/op    delta
ToUpper/#00-10                                                            2.85ns ± 0%    2.81ns ± 0%   -1.31%  (p=0.000 n=10+10)
ToUpper/ONLYUPPER-10                                                      12.7ns ± 0%    12.5ns ± 0%   -1.35%  (p=0.000 n=10+10)
ToUpper/abc-10                                                            20.9ns ± 1%    20.1ns ± 1%   -3.92%  (p=0.000 n=8+10)
ToUpper/AbC123-10                                                         26.9ns ± 1%    28.5ns ± 0%   +5.78%  (p=0.000 n=9+9)
ToUpper/azAZ09_-10                                                        27.4ns ± 1%    24.5ns ± 0%  -10.82%  (p=0.000 n=9+9)
ToUpper/longStrinGwitHmixofsmaLLandcAps-10                                95.9ns ± 1%   100.3ns ± 0%   +4.52%  (p=0.000 n=9+10)
ToUpper/RENAN_BASTOS_93_AOSDAJDJAIDJAIDAJIaidsjjaidijadsjiadjiOOKKO-10     188ns ± 0%     121ns ± 0%  -35.52%  (p=0.000 n=9+10)
ToUpper/longɐstringɐwithɐnonasciiⱯchars-10                                 168ns ± 0%     164ns ± 0%   -2.02%  (p=0.000 n=8+10)
ToUpper/ɐɐɐɐɐ-10                                                           134ns ± 0%     132ns ± 0%   -1.59%  (p=0.000 n=9+10)
ToUpper/a\u0080\U0010ffff-10                                              67.6ns ± 0%    66.4ns ± 0%   -1.73%  (p=0.000 n=10+10)
ToLower/#00-10                                                            2.87ns ± 4%    2.83ns ± 0%   -1.46%  (p=0.004 n=9+9)
ToLower/abc-10                                                            6.35ns ± 0%    6.29ns ± 0%   -0.98%  (p=0.000 n=9+9)
ToLower/AbC123-10                                                         25.6ns ± 1%    28.1ns ± 1%   +9.81%  (p=0.000 n=10+10)
ToLower/azAZ09_-10                                                        29.9ns ± 1%    30.1ns ± 1%   +0.64%  (p=0.023 n=9+10)
ToLower/longStrinGwitHmixofsmaLLandcAps-10                                96.7ns ± 1%    73.0ns ± 0%  -24.50%  (p=0.000 n=10+10)
ToLower/renan_bastos_93_AOSDAJDJAIDJAIDAJIaidsjjaidijadsjiadjiOOKKO-10     177ns ± 0%     118ns ± 0%  -33.61%  (p=0.000 n=7+8)
ToLower/LONGⱯSTRINGⱯWITHⱯNONASCIIⱯCHARS-10                                 159ns ± 1%     158ns ± 0%   -0.97%  (p=0.000 n=8+10)
ToLower/ⱭⱭⱭⱭⱭ-10                                                           125ns ± 1%     123ns ± 1%   -1.67%  (p=0.000 n=9+9)
ToLower/A\u0080\U0010ffff-10                                              68.4ns ± 1%    67.1ns ± 0%   -1.95%  (p=0.000 n=9+9)

name                                                                    old alloc/op   new alloc/op   delta
ToUpper/#00-10                                                             0.00B          0.00B          ~     (all equal)
ToUpper/ONLYUPPER-10                                                       0.00B          0.00B          ~     (all equal)
ToUpper/abc-10                                                             3.00B ± 0%     3.00B ± 0%     ~     (all equal)
ToUpper/AbC123-10                                                          8.00B ± 0%     8.00B ± 0%     ~     (all equal)
ToUpper/azAZ09_-10                                                         8.00B ± 0%     8.00B ± 0%     ~     (all equal)
ToUpper/longStrinGwitHmixofsmaLLandcAps-10                                 32.0B ± 0%     32.0B ± 0%     ~     (all equal)
ToUpper/RENAN_BASTOS_93_AOSDAJDJAIDJAIDAJIaidsjjaidijadsjiadjiOOKKO-10     64.0B ± 0%     64.0B ± 0%     ~     (all equal)
ToUpper/longɐstringɐwithɐnonasciiⱯchars-10                                 48.0B ± 0%     48.0B ± 0%     ~     (all equal)
ToUpper/ɐɐɐɐɐ-10                                                           48.0B ± 0%     48.0B ± 0%     ~     (all equal)
ToUpper/a\u0080\U0010ffff-10                                               16.0B ± 0%     16.0B ± 0%     ~     (all equal)
ToLower/#00-10                                                             0.00B          0.00B          ~     (all equal)
ToLower/abc-10                                                             0.00B          0.00B          ~     (all equal)
ToLower/AbC123-10                                                          8.00B ± 0%     8.00B ± 0%     ~     (all equal)
ToLower/azAZ09_-10                                                         8.00B ± 0%     8.00B ± 0%     ~     (all equal)
ToLower/longStrinGwitHmixofsmaLLandcAps-10                                 32.0B ± 0%     32.0B ± 0%     ~     (all equal)
ToLower/renan_bastos_93_AOSDAJDJAIDJAIDAJIaidsjjaidijadsjiadjiOOKKO-10     64.0B ± 0%     64.0B ± 0%     ~     (all equal)
ToLower/LONGⱯSTRINGⱯWITHⱯNONASCIIⱯCHARS-10                                 48.0B ± 0%     48.0B ± 0%     ~     (all equal)
ToLower/ⱭⱭⱭⱭⱭ-10                                                           24.0B ± 0%     24.0B ± 0%     ~     (all equal)
ToLower/A\u0080\U0010ffff-10                                               16.0B ± 0%     16.0B ± 0%     ~     (all equal)

name                                                                    old allocs/op  new allocs/op  delta
ToUpper/#00-10                                                              0.00           0.00          ~     (all equal)
ToUpper/ONLYUPPER-10                                                        0.00           0.00          ~     (all equal)
ToUpper/abc-10                                                              1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ToUpper/AbC123-10                                                           1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ToUpper/azAZ09_-10                                                          1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ToUpper/longStrinGwitHmixofsmaLLandcAps-10                                  1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ToUpper/RENAN_BASTOS_93_AOSDAJDJAIDJAIDAJIaidsjjaidijadsjiadjiOOKKO-10      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ToUpper/longɐstringɐwithɐnonasciiⱯchars-10                                  1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ToUpper/ɐɐɐɐɐ-10                                                            2.00 ± 0%      2.00 ± 0%     ~     (all equal)
ToUpper/a\u0080\U0010ffff-10                                                1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ToLower/#00-10                                                              0.00           0.00          ~     (all equal)
ToLower/abc-10                                                              0.00           0.00          ~     (all equal)
ToLower/AbC123-10                                                           1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ToLower/azAZ09_-10                                                          1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ToLower/longStrinGwitHmixofsmaLLandcAps-10                                  1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ToLower/renan_bastos_93_AOSDAJDJAIDJAIDAJIaidsjjaidijadsjiadjiOOKKO-10      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ToLower/LONGⱯSTRINGⱯWITHⱯNONASCIIⱯCHARS-10                                  1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ToLower/ⱭⱭⱭⱭⱭ-10                                                            1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ToLower/A\u0080\U0010ffff-10                                                1.00 ± 0%      1.00 ± 0%     ~     (all equal)

Change-Id: Id3998ac4bae054ba3e6cf30545a257d5992b48be
Reviewed-on: https://go-review.googlesource.com/c/go/+/424139
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

6 participants