
Optimized Tokenize function #6

Merged · 2 commits merged into emetriq:main on May 11, 2023
Conversation

dnnspaul
Contributor

It's more like a V2.5, but Go doesn't allow dots in function names ... so here we are with a V3. Feel free to merge these small changes into the existing V2 ... but to better show the performance increase, I've left it as a separate version for now.

~ go test ./... -bench=. -test.benchmem
goos: linux
goarch: amd64
pkg: github.com/emetriq/gourltokenizer/tokenizer
cpu: AMD Ryzen 7 3700X 8-Core Processor             
BenchmarkEscapedURLTokenizerV3-16        1271954               935.8 ns/op           464 B/op          3 allocs/op
BenchmarkURLTokenizerV3-16               2607987               457.4 ns/op           256 B/op          1 allocs/op
BenchmarkURLTokenizerV3Fast-16           4605972               248.3 ns/op           256 B/op          1 allocs/op
BenchmarkEscapedURLTokenizerV2-16        1234116               959.0 ns/op           464 B/op          3 allocs/op
BenchmarkURLTokenizerV2-16               2165749               552.2 ns/op           256 B/op          1 allocs/op
BenchmarkURLTokenizerV2Fast-16           4374403               272.5 ns/op           256 B/op          1 allocs/op
BenchmarkEscapedURLTokenizerV1-16        1246105               963.0 ns/op           464 B/op          3 allocs/op
BenchmarkURLTokenizerV1-16               1668098               718.9 ns/op           272 B/op          2 allocs/op
BenchmarkTokenizerV1-16                  6767202               169.8 ns/op           256 B/op          1 allocs/op
BenchmarkTokenizerV2-16                  5556195               214.9 ns/op           256 B/op          1 allocs/op
BenchmarkTokenizerV3-16                  6243402               186.7 ns/op           256 B/op          1 allocs/op
PASS
ok      github.com/emetriq/gourltokenizer/tokenizer     18.833s

~ go test ./...
ok      github.com/emetriq/gourltokenizer/tokenizer     0.002s
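
For context, the benchmark numbers above come from Go's standard `testing.B` pattern; the following is only a hedged sketch with a placeholder `tokenizeStub` (not the repo's actual function), run via `go test -bench=. -test.benchmem`:

```go
package tokenizer_test

import (
	"strings"
	"testing"
)

// tokenizeStub stands in for the repo's real Tokenize function so this
// sketch compiles on its own; it just splits on anything outside a-z.
func tokenizeStub(s string) []string {
	return strings.FieldsFunc(s, func(r rune) bool { return r < 'a' || r > 'z' })
}

// BenchmarkTokenizerSketch shows the testing.B loop behind the numbers
// above; -test.benchmem adds the B/op and allocs/op columns to the output.
func BenchmarkTokenizerSketch(b *testing.B) {
	url := "https://example.com/some/path?query=value"
	for i := 0; i < b.N; i++ {
		_ = tokenizeStub(url)
	}
}
```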

Added an [..]Escaped[..] benchmark, because most of the performance gains show up when the given URL doesn't contain any escaped parts. Apart from that, using a byte instead of a rune gives some additional speedup, but means only ASCII characters are accepted ... since the tokenizer already filters out everything other than a-z, this shouldn't have an impact.
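
A minimal sketch of the byte-based idea, using a hypothetical tokenizeASCII helper rather than the actual V3 code: indexing the string byte by byte skips the UTF-8 decoding a rune loop would do, and since only a-z bytes are kept, everything else (including multi-byte UTF-8 sequences) simply acts as a separator.

```go
package tokenizer

// tokenizeASCII is a hypothetical illustration of the byte-based approach:
// indexing the string byte by byte avoids the UTF-8 decoding that a
// range-over-string (rune) loop performs. Any byte outside 'a'..'z' ends
// the current token, so non-ASCII input falls out as separators, which
// matches the existing a-z filtering.
func tokenizeASCII(s string) []string {
	tokens := make([]string, 0, 8)
	start := -1 // start of the current token; -1 means "not inside a token"
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c >= 'a' && c <= 'z' {
			if start < 0 {
				start = i
			}
			continue
		}
		if start >= 0 {
			tokens = append(tokens, s[start:i])
			start = -1
		}
	}
	if start >= 0 {
		tokens = append(tokens, s[start:])
	}
	return tokens
}
```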

@coveralls commented on May 10, 2023

Pull Request Test Coverage Report for Build 4939090760

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 90 of 99 (90.91%) changed or added relevant lines in 1 file are covered.
  • 67 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.7%) to 3.595%

Changes missing coverage:
  File                    Covered Lines  Changed/Added Lines  %
  tokenizer/tokenizer.go  90             99                   90.91%

Files with coverage reduction:
  File                    New Missed Lines  %
  tokenizer/tokenizer.go  67                44.3%

Totals:
  Change from base Build 2656003651: +0.7%
  Covered Lines: 111
  Relevant Lines: 3088

💛 - Coveralls

@dnnspaul dnnspaul merged commit c31f339 into emetriq:main May 11, 2023
10 checks passed