New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use lookup table and loop unrolling to optimize integer to ascii convertion #9

Merged
merged 5 commits into from Dec 1, 2014

Conversation

Projects
None yet
2 participants
@imasahiro
Contributor

imasahiro commented Nov 28, 2014

This code is about 2 times faster than original on Haswell for ipv4addr.c.

Original idea of this code was taken from http://www.slideshare.net/andreialexandrescu1/three-optimization-tips-for-c-15708507

$ cat /proc/cpuinfo | grep model | tail -1
model name      : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

Original:

$ git co master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
$ ./bin/qrintf-gcc -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.22s user 0.00s system 99% cpu 0.219 total

Optimized version:

$ git co optimize_itoa
Switched to branch 'optimize_itoa'
Your branch is up-to-date with 'imasahiro/optimize_itoa'.
$ ./bin/qrintf-gcc -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.12s user 0.00s system 99% cpu 0.120 total
use lookup table, loop unrolle to optimize itoa (integer to ascii) co…
…nvertion.

original code was taken from http://www.slideshare.net/andreialexandrescu1/three-optimization-tips-for-c-15708507

benchmark code:
  unsigned i, addr=123456790;
  for (i = 0; i != 100000000; ++i) {
      sprintf(buf, "%u.%u.%u.%u",
              (addr >> 24) & 0xff,
              (addr >> 16) & 0xff,
              (addr >> 8) & 0xff,
              addr & 0xff);
  }
benchmark result:
Execution time (sec)
    original : 0.23
    built : 0.11

Speedup ratio: compare with the result of
`original' (greater is better)
built : 2.09
@kazuho

This comment has been minimized.

Show comment
Hide comment
@kazuho

kazuho Dec 1, 2014

Member

Thank you for the PR! It looks great.

Before merging it to master, I would like to request some modifications.

  • ilog10ull
    • please rename the function; all functions should be prefixed with _qrintf_
    • please add asserts that check the size of long long is 64 bits
    • use unsigned long long instead of uint64_t
  • _qrintf_int_core
    • please accept a pointer next to the last char instead of a pointer to the last char, and use *--p instead of *p--
  • the PR does not perform well on 32-bit architecture, please create long versions of ilog10ull, _qrintf_int_core and use them when stringifying the smaller types
Member

kazuho commented Dec 1, 2014

Thank you for the PR! It looks great.

Before merging it to master, I would like to request some modifications.

  • ilog10ull
    • please rename the function; all functions should be prefixed with _qrintf_
    • please add asserts that check the size of long long is 64 bits
    • use unsigned long long instead of uint64_t
  • _qrintf_int_core
    • please accept a pointer next to the last char instead of a pointer to the last char, and use *--p instead of *p--
  • the PR does not perform well on 32-bit architecture, please create long versions of ilog10ull, _qrintf_int_core and use them when stringifying the smaller types
@imasahiro

This comment has been minimized.

Show comment
Hide comment
@imasahiro

imasahiro Dec 1, 2014

Contributor

@kazuho
Thanks for the feedback. I updated the branch with the suggested changes:

  • replace ilog10ull(uint64_t) to _qrintf_ilog10ull(unsigned long long) (6d84553)
  • fix _qrintf_int_core. This api now accept a pointer next to the last char and use *--p (7148ea6)

And I fixed a bug which cause tests failure on 32bit architecture. (b7cb770)
If we still need to add long versions of _qrintf_ilog10ull, _qrintf_int_core, I will continue to work to implement these apis.

Contributor

imasahiro commented Dec 1, 2014

@kazuho
Thanks for the feedback. I updated the branch with the suggested changes:

  • replace ilog10ull(uint64_t) to _qrintf_ilog10ull(unsigned long long) (6d84553)
  • fix _qrintf_int_core. This api now accept a pointer next to the last char and use *--p (7148ea6)

And I fixed a bug which cause tests failure on 32bit architecture. (b7cb770)
If we still need to add long versions of _qrintf_ilog10ull, _qrintf_int_core, I will continue to work to implement these apis.

@kazuho

This comment has been minimized.

Show comment
Hide comment
@kazuho

kazuho Dec 1, 2014

Member

Thank you for the changes.

If we still need to add long versions of _qrintf_ilog10ull, _qrintf_int_core, I will continue to work to implement these apis.

It would be great if you could implement them. The reason I ask for them is because I do see performance degradation compared to master on 32-bit architecture. Your proposed optimization should work fine for both 32-bit and 64-bit platforms, and I am hoping that it would be implemented as such.

$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
$ bin/qrintf-gcc -m32 -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real    0m0.352s
user    0m0.348s
sys 0m0.001s
$ git checkout -
Switched to branch 'pr/9'
Your branch is up-to-date with 'origin/pr/9'.
$ bin/qrintf-gcc -m32 -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real    0m0.509s
user    0m0.504s
sys 0m0.003s
Member

kazuho commented Dec 1, 2014

Thank you for the changes.

If we still need to add long versions of _qrintf_ilog10ull, _qrintf_int_core, I will continue to work to implement these apis.

It would be great if you could implement them. The reason I ask for them is because I do see performance degradation compared to master on 32-bit architecture. Your proposed optimization should work fine for both 32-bit and 64-bit platforms, and I am hoping that it would be implemented as such.

$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
$ bin/qrintf-gcc -m32 -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real    0m0.352s
user    0m0.348s
sys 0m0.001s
$ git checkout -
Switched to branch 'pr/9'
Your branch is up-to-date with 'origin/pr/9'.
$ bin/qrintf-gcc -m32 -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real    0m0.509s
user    0m0.504s
sys 0m0.003s
@imasahiro

This comment has been minimized.

Show comment
Hide comment
@imasahiro

imasahiro Dec 1, 2014

Contributor

@kazuho
Thank you for your comment.
Performance degradations caused by this PR is not acceptable for me. So I'll try to fix it.

Contributor

imasahiro commented Dec 1, 2014

@kazuho
Thank you for your comment.
Performance degradations caused by this PR is not acceptable for me. So I'll try to fix it.

fix perfomance degradation on 32-bit architecture
Implement _qrintf_(chk|nck)_long_core, _qrintf_long_core, _qrintf_ilog10u32, _qrintf_ilog10ul.
Rename _qrintf_(chk|nck)_int_core to _qrintf_(chk|nck)_long_long_core.
@imasahiro

This comment has been minimized.

Show comment
Hide comment
@imasahiro

imasahiro Dec 1, 2014

Contributor

I've revisited this and I think the performance degradation was removed.
@kazuho, Could you check it?

$ git co master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.

$./bin/qrintf-gcc  -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.22s user 0.00s system 99% cpu 0.219 total

$ ./bin/qrintf-gcc -m32 -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.27s user 0.00s system 99% cpu 0.268 total
$ git co optimize_itoa
Switched to branch 'optimize_itoa'
Your branch is up-to-date with 'imasahiro/optimize_itoa'.

$./bin/qrintf-gcc  -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.12s user 0.00s system 99% cpu 0.122 total

$ ./bin/qrintf-gcc -m32 -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.15s user 0.00s system 99% cpu 0.154 total
Contributor

imasahiro commented Dec 1, 2014

I've revisited this and I think the performance degradation was removed.
@kazuho, Could you check it?

$ git co master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.

$./bin/qrintf-gcc  -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.22s user 0.00s system 99% cpu 0.219 total

$ ./bin/qrintf-gcc -m32 -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.27s user 0.00s system 99% cpu 0.268 total
$ git co optimize_itoa
Switched to branch 'optimize_itoa'
Your branch is up-to-date with 'imasahiro/optimize_itoa'.

$./bin/qrintf-gcc  -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.12s user 0.00s system 99% cpu 0.122 total

$ ./bin/qrintf-gcc -m32 -O3 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210
./a.out 1234567890  0.15s user 0.00s system 99% cpu 0.154 total

kazuho added a commit that referenced this pull request Dec 1, 2014

Merge pull request #9 from imasahiro/optimize_itoa
Use lookup table and loop unrolling to optimize integer to ascii convertion

@kazuho kazuho merged commit 3ba87ca into h2o:master Dec 1, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details
@kazuho

This comment has been minimized.

Show comment
Hide comment
@kazuho

kazuho Dec 1, 2014

Member

Thank you for the hard work! Merged to master.

Member

kazuho commented Dec 1, 2014

Thank you for the hard work! Merged to master.

@imasahiro imasahiro deleted the imasahiro:optimize_itoa branch Dec 1, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment