[SLP] GCC generates better code than Clang for stride-based offsets

* Test case (see https://gcc.godbolt.org/z/9E5beqnWW):
```c
/* real, UNROLLI, DIM, F_STRIDE and FI_STRIDE are defined in the godbolt test case. */
void foo(float *restrict fi, real *restrict f, int ci) {
    /* Add accumulated i-forces to the force array */
    for (int i = 0; i < UNROLLI; i++) {
        for (int d = 0; d < DIM; d++) {
            f[(ci * UNROLLI + i) * F_STRIDE + d] += fi[i * FI_STRIDE + d];
        }
    }
}
```
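To reproduce locally without the godbolt page, the test case can be made into a self-contained translation unit. This is a sketch: the macro values below (`UNROLLI=4`, `DIM=3`, `F_STRIDE=3`, `FI_STRIDE=3`) and `typedef float real` are hypothetical placeholders, not taken from the report, so override them with `-D` to match the linked example. Compile it with `gcc -O3 -S` and `clang -O3 -S` and compare the assembly.

```c
#include <assert.h> /* static_assert (C11) */

/* HYPOTHETICAL values; the real ones are in the godbolt test case.
   Override with e.g. -DF_STRIDE=4 to match it. */
#ifndef UNROLLI
#define UNROLLI 4
#endif
#ifndef DIM
#define DIM 3
#endif
#ifndef F_STRIDE
#define F_STRIDE 3
#endif
#ifndef FI_STRIDE
#define FI_STRIDE 3
#endif

/* Each row holds DIM components, so a stride smaller than DIM would
   make rows overlap. */
static_assert(DIM <= F_STRIDE, "F_STRIDE must be at least DIM");
static_assert(DIM <= FI_STRIDE, "FI_STRIDE must be at least DIM");

typedef float real; /* assumption: single-precision build */

/* Add accumulated i-forces to the force array (loop nest from the report). */
void foo(float *restrict fi, real *restrict f, int ci) {
    for (int i = 0; i < UNROLLI; i++) {
        for (int d = 0; d < DIM; d++) {
            f[(ci * UNROLLI + i) * F_STRIDE + d] += fi[i * FI_STRIDE + d];
        }
    }
}
```

When `F_STRIDE == FI_STRIDE == DIM`, the fully unrolled body touches one contiguous run of `UNROLLI*DIM` elements in each array. A larger stride leaves gaps between rows, which is the "stride offset" the title refers to.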
* GCC: vectorizes the loop nest with SLP.
* LLVM (Clang): only unrolls the loops and does not SLP-vectorize them.
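For comparison, here is a hand-written sketch of roughly what an SLP-vectorized `foo` amounts to. It is not GCC's actual output. It assumes the hypothetical case `F_STRIDE == FI_STRIDE == DIM` (with placeholder values `UNROLLI=4`, `DIM=3`), where the `UNROLLI*DIM` updated elements are contiguous and can be packed into 4-wide vector adds. It uses the GNU C `vector_size` extension, which both GCC and Clang accept. `foo_packed` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* HYPOTHETICAL values, and the sketch assumes F_STRIDE == FI_STRIDE == DIM. */
#ifndef UNROLLI
#define UNROLLI 4
#endif
#ifndef DIM
#define DIM 3
#endif

typedef float real;                                  /* assumption: single precision */
typedef float v4sf __attribute__((vector_size(16))); /* GNU C vector extension */

/* Equivalent to the report's loop nest when the strides equal DIM:
   f[ci*N + k] += fi[k] for k in [0, N), with N = UNROLLI*DIM. */
void foo_packed(const float *restrict fi, real *restrict f, int ci) {
    enum { N = UNROLLI * DIM };
    assert(ci >= 0);
    real *dst = f + (long)ci * N;
    int k = 0;
    for (; k + 4 <= N; k += 4) {
        v4sf a, b;
        memcpy(&a, dst + k, sizeof a); /* unaligned-safe 4-lane load */
        memcpy(&b, fi + k, sizeof b);
        a += b;                        /* one packed add covers 4 elements */
        memcpy(dst + k, &a, sizeof a); /* store the 4 lanes back */
    }
    for (; k < N; k++)                 /* scalar tail when N % 4 != 0 */
        dst[k] += fi[k];
}
```

With `UNROLLI=4` and `DIM=3`, `N` is 12, so the body reduces to three 4-lane load/add/store groups and no scalar tail. Fully unrolling the original loops instead yields twelve scalar adds.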