runtime: memmove executes unaligned accesses on riscv64 #48248

Open
mundaym opened this issue Sep 8, 2021 · 0 comments

mundaym commented Sep 8, 2021

The performance of memmove is very poor on the HiFive Unmatched when copying more than ~16 bytes of unaligned data. Looking at the code, it only attempts to align the source operand before using word-sized load and store operations. This means that stores to the destination operand will be unaligned whenever the source and destination do not share the same alignment. On the HiFive Unmatched, unaligned accesses result in a trap that is handled by the kernel, so performance is extremely poor (~10x slower than performing a byte-by-byte copy).
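To make the alignment condition concrete, here is a minimal Go sketch of a forward copy that only takes the word-sized path when the source and destination share the same misalignment, and falls back to bytes otherwise. It is an illustration, not the runtime's assembly implementation or a proposed fix: `copyForward`, `misalignment` and `alignedBuf` are hypothetical names, overlap handling is left out, and `encoding/binary` stands in for the 8-byte loads and stores.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

const wordSize = 8 // bytes per word on riscv64

// misalignment returns how far the first byte of b sits past an 8-byte boundary.
func misalignment(b []byte) uintptr {
	if len(b) == 0 {
		return 0
	}
	return uintptr(unsafe.Pointer(&b[0])) & (wordSize - 1)
}

// alignedBuf returns an n-byte slice whose first byte is 8-byte aligned.
// (Hypothetical helper, used only to make the demo deterministic.)
func alignedBuf(n int) []byte {
	raw := make([]byte, n+wordSize)
	off := int((wordSize - misalignment(raw)) & (wordSize - 1))
	return raw[off : off+n]
}

// copyForward copies min(len(dst), len(src)) bytes from src to dst, assuming
// the slices do not overlap. It returns the number of bytes that were moved
// in word-sized chunks.
func copyForward(dst, src []byte) (wordBytes int) {
	n := len(src)
	if len(dst) < n {
		n = len(dst)
	}
	i := 0
	// Both sides can only be word aligned at the same index if they start
	// with the same misalignment.
	if n >= wordSize && misalignment(dst) == misalignment(src) {
		// Copy leading bytes until both pointers reach an 8-byte boundary.
		head := int((wordSize - misalignment(src)) & (wordSize - 1))
		for ; i < head; i++ {
			dst[i] = src[i]
		}
		// Models the aligned 8-byte load/store loop.
		for ; n-i >= wordSize; i += wordSize {
			binary.LittleEndian.PutUint64(dst[i:], binary.LittleEndian.Uint64(src[i:]))
			wordBytes += wordSize
		}
	}
	// Remaining bytes, or the whole copy when the alignments differ.
	for ; i < n; i++ {
		dst[i] = src[i]
	}
	return wordBytes
}

func main() {
	src := alignedBuf(64)
	for i := range src {
		src[i] = byte(i)
	}
	dst := alignedBuf(64)
	fmt.Println("both aligned:", copyForward(dst, src))
	fmt.Println("dst off by one:", copyForward(dst[1:], src[:63]))
	fmt.Println("same misalignment:", copyForward(dst[3:], src[3:]))
}
```

A byte-only fallback is the simplest way to avoid the trap; it gives up word-sized throughput for mismatched alignments rather than trying to combine shifted words.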

Benchmarks:

name                               speed
Memmove/0-4
Memmove/1-4                        37.0MB/s ± 5%
Memmove/2-4                        55.6MB/s ± 8%
Memmove/3-4                        67.2MB/s ± 5%
Memmove/4-4                        85.5MB/s ± 6%
Memmove/5-4                        98.1MB/s ± 6%
Memmove/6-4                         112MB/s ± 3%
Memmove/7-4                         127MB/s ± 4%
Memmove/8-4                         241MB/s ± 3%
Memmove/9-4                         229MB/s ± 7%
Memmove/10-4                        230MB/s ± 9%
Memmove/11-4                        198MB/s ± 5%
Memmove/12-4                        202MB/s ± 3%
Memmove/13-4                        206MB/s ± 4%
Memmove/14-4                        212MB/s ± 3%
Memmove/15-4                        213MB/s ± 6%
Memmove/16-4                        407MB/s ± 4%
Memmove/32-4                        577MB/s ± 3%
Memmove/64-4                        890MB/s ± 4%
Memmove/128-4                      1.28GB/s ± 6%
Memmove/256-4                      1.52GB/s ± 5%
Memmove/512-4                      1.67GB/s ± 2%
Memmove/1024-4                     1.81GB/s ± 2%
Memmove/2048-4                     1.91GB/s ± 1%
Memmove/4096-4                     1.94GB/s ± 1%
MemmoveOverlap/32-4                 485MB/s ± 5%
MemmoveOverlap/64-4                 694MB/s ± 6%
MemmoveOverlap/128-4                899MB/s ± 3%
MemmoveOverlap/256-4               1.06GB/s ± 3%
MemmoveOverlap/512-4               1.18GB/s ± 2%
MemmoveOverlap/1024-4              1.24GB/s ± 1%
MemmoveOverlap/2048-4              1.28GB/s ± 1%
MemmoveOverlap/4096-4              1.30GB/s ± 1%
MemmoveUnalignedDst/0-4
MemmoveUnalignedDst/1-4            31.6MB/s ± 5%
MemmoveUnalignedDst/2-4            54.1MB/s ±12%
MemmoveUnalignedDst/3-4            66.2MB/s ±10%
MemmoveUnalignedDst/4-4            79.0MB/s ± 7%
MemmoveUnalignedDst/5-4            95.3MB/s ± 5%
MemmoveUnalignedDst/6-4             104MB/s ± 7%
MemmoveUnalignedDst/7-4             115MB/s ± 5%
MemmoveUnalignedDst/8-4            11.9MB/s ± 1%
MemmoveUnalignedDst/9-4            13.2MB/s ± 1%
MemmoveUnalignedDst/10-4           14.5MB/s ± 2%
MemmoveUnalignedDst/11-4           16.0MB/s ± 0%
MemmoveUnalignedDst/12-4           17.3MB/s ± 1%
MemmoveUnalignedDst/13-4           18.7MB/s ± 1%
MemmoveUnalignedDst/14-4           20.0MB/s ± 0%
MemmoveUnalignedDst/15-4           21.3MB/s ± 0%
MemmoveUnalignedDst/16-4           12.2MB/s ± 2%
MemmoveUnalignedDst/32-4           12.5MB/s ± 1%
MemmoveUnalignedDst/64-4           12.6MB/s ± 1%
MemmoveUnalignedDst/128-4          12.7MB/s ± 0%
MemmoveUnalignedDst/256-4          12.8MB/s ± 0%
MemmoveUnalignedDst/512-4          12.8MB/s ± 1%
MemmoveUnalignedDst/1024-4         12.8MB/s ± 0%
MemmoveUnalignedDst/2048-4         12.8MB/s ± 1%
MemmoveUnalignedDst/4096-4         12.8MB/s ± 1%
MemmoveUnalignedDstOverlap/32-4    16.2MB/s ± 1%
MemmoveUnalignedDstOverlap/64-4    14.3MB/s ± 0%
MemmoveUnalignedDstOverlap/128-4   13.5MB/s ± 1%
MemmoveUnalignedDstOverlap/256-4   13.2MB/s ± 0%
MemmoveUnalignedDstOverlap/512-4   13.0MB/s ± 0%
MemmoveUnalignedDstOverlap/1024-4  12.9MB/s ± 1%
MemmoveUnalignedDstOverlap/2048-4  12.9MB/s ± 0%
MemmoveUnalignedDstOverlap/4096-4  12.9MB/s ± 0%
MemmoveUnalignedSrc/0-4
MemmoveUnalignedSrc/1-4            30.2MB/s ±10%
MemmoveUnalignedSrc/2-4            54.8MB/s ±15%
MemmoveUnalignedSrc/3-4            66.5MB/s ± 5%
MemmoveUnalignedSrc/4-4            75.5MB/s ± 7%
MemmoveUnalignedSrc/5-4            92.0MB/s ± 6%
MemmoveUnalignedSrc/6-4             100MB/s ± 4%
MemmoveUnalignedSrc/7-4             115MB/s ± 3%
MemmoveUnalignedSrc/8-4             110MB/s ± 4%
MemmoveUnalignedSrc/9-4             114MB/s ± 5%
MemmoveUnalignedSrc/10-4            116MB/s ± 5%
MemmoveUnalignedSrc/11-4            124MB/s ± 4%
MemmoveUnalignedSrc/12-4            127MB/s ± 3%
MemmoveUnalignedSrc/13-4            133MB/s ± 5%
MemmoveUnalignedSrc/14-4            144MB/s ± 4%
MemmoveUnalignedSrc/15-4           21.5MB/s ± 0%
MemmoveUnalignedSrc/16-4           22.4MB/s ± 2%
MemmoveUnalignedSrc/32-4           16.2MB/s ± 1%
MemmoveUnalignedSrc/64-4           14.3MB/s ± 1%
MemmoveUnalignedSrc/128-4          13.6MB/s ± 1%
MemmoveUnalignedSrc/256-4          13.1MB/s ± 1%
MemmoveUnalignedSrc/512-4          13.0MB/s ± 1%
MemmoveUnalignedSrc/1024-4         12.9MB/s ± 1%
MemmoveUnalignedSrc/2048-4         12.8MB/s ± 1%
MemmoveUnalignedSrc/4096-4         12.8MB/s ± 0%
MemmoveUnalignedSrcOverlap/32-4    12.5MB/s ± 0%
MemmoveUnalignedSrcOverlap/64-4    12.7MB/s ± 1%
MemmoveUnalignedSrcOverlap/128-4   12.8MB/s ± 0%
MemmoveUnalignedSrcOverlap/256-4   12.7MB/s ± 1%
MemmoveUnalignedSrcOverlap/512-4   12.8MB/s ± 1%
MemmoveUnalignedSrcOverlap/1024-4  12.8MB/s ± 1%
MemmoveUnalignedSrcOverlap/2048-4  12.8MB/s ± 0%
MemmoveUnalignedSrcOverlap/4096-4  12.8MB/s ± 1%
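
As a rough cross-check of the slowdown, the ratios can be computed directly from the rows above. This sketch assumes GB/s is decimal (1.94GB/s = 1940MB/s) and uses the 7-byte unaligned-destination row, the last size before the throughput drop, as a stand-in for the small-copy path; `slowdown` is a hypothetical helper.

```go
package main

import "fmt"

// slowdown reports how many times faster the first throughput is than the second.
func slowdown(fastMBps, slowMBps float64) float64 {
	return fastMBps / slowMBps
}

func main() {
	const (
		aligned4096      = 1940.0 // Memmove/4096-4 (1.94GB/s, assumed decimal)
		unalignedDst4096 = 12.8   // MemmoveUnalignedDst/4096-4
		unalignedDst7    = 115.0  // MemmoveUnalignedDst/7-4
	)
	// Prints "aligned vs unaligned dst at 4096 bytes: 152x".
	fmt.Printf("aligned vs unaligned dst at 4096 bytes: %.0fx\n",
		slowdown(aligned4096, unalignedDst4096))
	// Prints "7-byte copy vs unaligned dst at 4096 bytes: 9.0x".
	fmt.Printf("7-byte copy vs unaligned dst at 4096 bytes: %.1fx\n",
		slowdown(unalignedDst7, unalignedDst4096))
}
```

The 7-byte figure includes per-call overhead, so it likely understates what a byte loop would sustain on larger copies.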
mundaym added this to the Go1.18 milestone Sep 8, 2021
mundaym self-assigned this Sep 8, 2021