time_bench_memset: playing with alternative clearings
On my Skylake CPU (i7-6700K) I cannot get below 32 cycles per 256 bytes,
which is 8 bytes per cycle. That is good, but I was expecting 16 bytes per cycle.

I found documentation saying that Sandy Bridge supports 16-byte stores (per port) to the L1 data cache:
 http://www.7-cpu.com/cpu/SandyBridge.html

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
netoptimizer committed Nov 17, 2016
1 parent 1eca52d commit 981a7d1
Showing 1 changed file with 49 additions and 0 deletions.
49 changes: 49 additions & 0 deletions kernel/lib/time_bench_memset.c
@@ -640,6 +640,53 @@ static int time_memset_movq_256(struct time_bench_record *rec, void *data)
	return loops_cnt;
}

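/*
 * Clear 256 bytes as two 128-byte blocks.  Within each block the stores
 * alternate between two 64-byte halves: lines marked //A write offsets
 * 0..56, the unmarked lines write offsets 64..120.  If 'page' is 64-byte
 * aligned, these halves are two adjacent cache lines.
 */
inline static void alternative_clear_movq_256(void *page)
{
	int i;

	for (i = 0; i < 256/128; i++) {
		__asm__ __volatile__(
			" movq $0, (%0)\n" //A
			" movq $0, 64(%0)\n"
			" movq $0, 8(%0)\n" //A
			" movq $0, 72(%0)\n"
			" movq $0, 16(%0)\n" //A
			" movq $0, 80(%0)\n"
			" movq $0, 24(%0)\n" //A
			" movq $0, 88(%0)\n"
			" movq $0, 32(%0)\n" //A
			" movq $0, 96(%0)\n"
			" movq $0, 40(%0)\n" //A
			" movq $0, 104(%0)\n"
			" movq $0, 48(%0)\n" //A
			" movq $0, 112(%0)\n"
			" movq $0, 56(%0)\n" //A
			" movq $0, 120(%0)\n"
			: : "r" (page) : "memory");
		page += 128; /* void * arithmetic: GNU C extension */
	}
}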

static int time_alternative_movq_256(struct time_bench_record *rec, void *data)
{
	int i;
	uint64_t loops_cnt = 0;

	time_bench_start(rec);

	for (i = 0; i < rec->loops; i++) {
		loops_cnt++;
		barrier();
		alternative_clear_movq_256(global_buf);
		barrier();
	}

	time_bench_stop(rec, loops_cnt);
	return loops_cnt;
}


int run_timing_tests(void)
{
	uint32_t loops = 10000000;
@@ -713,6 +760,8 @@ int run_timing_tests(void)
			NULL, time_memset_mmx_256);
	time_bench_loop(loops, 0, "memset_MOVQ_256",
			NULL, time_memset_movq_256);
	time_bench_loop(loops, 0, "alternative_MOVQ_256",
			NULL, time_alternative_movq_256);

	time_bench_loop(loops, 512, "mem_zero_hacks",
			NULL, time_mem_zero_hacks);