# Nekbone
Nekbone solves a standard Poisson equation using a conjugate gradient iteration with a simple or spectral element multigrid preconditioner on a block or linear geometry. It exposes the principal computational kernel to reveal the essential elements of the algorithmic-architectural coupling that is pertinent to Nek5000.
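
The conjugate gradient iteration described above explains why an operator application, inner products, and vector updates dominate the profile in the sections that follow. The block below is a minimal sketch of that loop in free-form Fortran, not Nekbone source: it assumes a 1D finite-difference Poisson operator in place of the spectral element operator, uses no preconditioner, and the names `cg_sketch`, `apply_a`, `cg_solve`, and `dp` are hypothetical. The comments mark which steps correspond roughly to `ax_e`, `glsc3`, and `add2s2`.

```fortran
! Illustrative sketch only: not Nekbone source. All names are hypothetical.
module cg_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! w = A*u for the 1D Poisson operator with zero Dirichlet ends,
  ! discretized as the tridiagonal stencil [-1, 2, -1].
  ! This stands in for the element-local operator (ax_e).
  subroutine apply_a(w, u, n)
    integer, intent(in) :: n
    real(dp), intent(in) :: u(n)
    real(dp), intent(out) :: w(n)
    integer :: i
    do i = 1, n
      w(i) = 2.0_dp*u(i)
      if (i > 1) w(i) = w(i) - u(i-1)
      if (i < n) w(i) = w(i) - u(i+1)
    end do
  end subroutine apply_a

  ! Unpreconditioned conjugate gradient for A*x = f, starting from x = 0.
  subroutine cg_solve(x, f, n, tol, maxit, iters)
    integer, intent(in) :: n, maxit
    real(dp), intent(in) :: f(n), tol
    real(dp), intent(out) :: x(n)
    integer, intent(out) :: iters
    real(dp) :: r(n), p(n), w(n)
    real(dp) :: rtr, rtr_old, alpha, beta, pap

    x = 0.0_dp
    r = f                        ! residual for the zero initial guess
    p = r
    rtr = dot_product(r, r)      ! inner product (glsc3-like)
    iters = 0
    do while (iters < maxit .and. sqrt(rtr) > tol)
      call apply_a(w, p, n)      ! operator application (ax_e-like)
      pap = dot_product(p, w)    ! inner product (glsc3-like)
      alpha = rtr / pap
      x = x + alpha*p            ! scaled vector update (add2s2-like)
      r = r - alpha*w            ! scaled vector update (add2s2-like)
      rtr_old = rtr
      rtr = dot_product(r, r)
      beta = rtr / rtr_old
      p = r + beta*p             ! new search direction
      iters = iters + 1
    end do
  end subroutine cg_solve
end module cg_sketch
```

A caller would build a right-hand side `f`, then invoke `call cg_solve(x, f, n, tol, maxit, iters)`; each iteration performs one operator application, two inner products, and three vector updates.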

---
## Parameters
```
Compiler = icc (ICC) 18.0.1 20171018
Build_Flags = -g -O3 -march=native
Run_Parameters =  
```

---
## Scaling
1 Thread 1 Node

---
## Intel Software Development Emulator
| SDE Metrics | Nekbone |
|:-----------|:---:|
| Arithmetic Intensity | 0.03 |
| Bytes per Load Inst | 5.96 |
| Bytes per Store Inst | 6.44 |

---
## Roofline  -  Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
### 1 Thread - 1 Core - 2300.0 MHz
|     GB/sec     |  L1 B/W |  L2 B/W |  L3 B/W | DRAM B/W |
|:---------------|:-------:|:-------:|:-------:|:--------:|
|**1 Thread**  | 143.1 |  44.87 | 33.12 |   16.04  |
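
The bandwidths in this table can be combined with the SDE arithmetic intensity into a roofline-style bound. As a rough sketch, assuming the reported arithmetic intensity $I = 0.03$ is in FLOPs per byte and that bandwidth $B$ (in GB/sec), not peak compute, is the limiting term, the attainable rate at each memory level is at most $P = I \times B$:

$$
\begin{aligned}
P_{\mathrm{L1}}   &\le 0.03 \times 143.1 \approx 4.29\ \mathrm{GFLOP/s} \\
P_{\mathrm{L2}}   &\le 0.03 \times 44.87 \approx 1.35\ \mathrm{GFLOP/s} \\
P_{\mathrm{L3}}   &\le 0.03 \times 33.12 \approx 0.99\ \mathrm{GFLOP/s} \\
P_{\mathrm{DRAM}} &\le 0.03 \times 16.04 \approx 0.48\ \mathrm{GFLOP/s}
\end{aligned}
$$

These values are model bounds derived from the two tables under the stated assumption; they are not measured FLOP rates.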

---
## Experiment Aggregate Metrics

| CPUTIME % | IPC per Core | Loads per Cycle | L1 Hits per Cycle | |
|:---:|:---:|:---:|:---:|:---:|
| 100% | 3.34 | 1.13 | 1.20 |  |
|**L1 Miss Ratio** | **L2 Miss Ratio** | **L3 Miss Ratio** | **L2 B/W Utilized** | **L3 B/W Utilized** |
| 0.70% | 24.12% | 0.28% | 3.48% | 1.99% |

---
#### `mxf10_`

| CPUTIME % | IPC per Core | Loads per Cycle | L1 Hits per Cycle | |
|:---:|:---:|:---:|:---:|:---:|
| 72.3% | 3.54 | 1.12 | 1.14 |  |
|**L1 Miss Ratio** | **L2 Miss Ratio** | **L3 Miss Ratio** | **L2 B/W Utilized** | **L3 B/W Utilized** |
| 0.24% | 16.51% | 0.43% | 1.14% | 0.34% |

```fortran
c-----------------------------------------------------------------------
      subroutine mxf10(a,n1,b,n2,c,n3)
c
c     Matrix product c = a*b with the inner dimension fixed at 10
c     and unrolled by hand.
c
      real a(n1,10),b(10,n3),c(n1,n3)
c
      do j=1,n3
         do i=1,n1
            c(i,j) = a(i,1)*b(1,j)
     $             + a(i,2)*b(2,j)
     $             + a(i,3)*b(3,j)
     $             + a(i,4)*b(4,j)
     $             + a(i,5)*b(5,j)
     $             + a(i,6)*b(6,j)
     $             + a(i,7)*b(7,j)
     $             + a(i,8)*b(8,j)
     $             + a(i,9)*b(9,j)
     $             + a(i,10)*b(10,j)
         enddo
      enddo
      return
      end
c-----------------------------------------------------------------------
```
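
`mxf10` is a dense matrix product with the inner dimension fixed at 10, so each entry of `c` takes 10 multiplications and 9 additions. As a hedged illustration, the free-form Fortran sketch below writes the same product as a plain triple loop over a general inner dimension `n2`. The names `mxm_sketch` and `mxm_ref` are hypothetical and do not appear in Nekbone; the result should agree with the intrinsic `matmul` up to rounding.

```fortran
! Illustrative sketch only: mxm_ref is a hypothetical name, not Nekbone source.
module mxm_sketch
  implicit none
contains
  ! c = a*b as a plain triple loop over a general inner dimension n2.
  subroutine mxm_ref(a, n1, b, n2, c, n3)
    integer, intent(in) :: n1, n2, n3
    real, intent(in) :: a(n1,n2), b(n2,n3)
    real, intent(out) :: c(n1,n3)
    integer :: i, j, k
    do j = 1, n3
      do i = 1, n1
        c(i,j) = 0.0
        do k = 1, n2
          c(i,j) = c(i,j) + a(i,k)*b(k,j)
        end do
      end do
    end do
  end subroutine mxm_ref
end module mxm_sketch
```

Calling `mxm_ref(a, n1, b, 10, c, n3)` computes the same product as the hand-unrolled kernel; fixing `n2 = 10` at compile time is what lets the inner loop be written out explicitly.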

---
#### `ax_e_`

| CPUTIME % | IPC per Core | Loads per Cycle | L1 Hits per Cycle | |
|:---:|:---:|:---:|:---:|:---:|
| 83.9% | 3.43 | 1.12 | 1.17 |  |
|**L1 Miss Ratio** | **L2 Miss Ratio** | **L3 Miss Ratio** | **L2 B/W Utilized** | **L3 B/W Utilized** |
| 0.40% | 20.24% | 0.36% | 1.99% | 0.76% |
```fortran
c-------------------------------------------------------------------------
      subroutine ax_e(w,u,g,ur,us,ut,wk) ! Local matrix-vector product
      include 'SIZE'
      include 'TOTAL'

      parameter (lxyz=lx1*ly1*lz1)
      real ur(lxyz),us(lxyz),ut(lxyz),wk(lxyz)
      real w(nx1*ny1*nz1),u(nx1*ny1*nz1),g(2*ldim,nx1*ny1*nz1)


      nxyz = nx1*ny1*nz1
      n    = nx1-1

      call local_grad3(ur,us,ut,u,n,dxm1,dxtm1)

      do i=1,nxyz
         wr = g(1,i)*ur(i) + g(2,i)*us(i) + g(3,i)*ut(i)
         ws = g(2,i)*ur(i) + g(4,i)*us(i) + g(5,i)*ut(i)
         wt = g(3,i)*ur(i) + g(5,i)*us(i) + g(6,i)*ut(i)
         ur(i) = wr
         us(i) = ws
         ut(i) = wt
      enddo

      call local_grad3_t(w,ur,us,ut,n,dxm1,dxtm1,wk)

      return
      end
c-------------------------------------------------------------------------
```

#### `add2s2_`

| CPUTIME % | IPC per Core | Loads per Cycle | L1 Hits per Cycle | |
|:---:|:---:|:---:|:---:|:---:|
| 5.3% | 2.77 | 1.17 | 1.40 |  |
|**L1 Miss Ratio** | **L2 Miss Ratio** | **L3 Miss Ratio** | **L2 B/W Utilized** | **L3 B/W Utilized** |
| 1.62% | 27.90% | 0.16% | 10.43% | 6.95% |
```fortran
c-----------------------------------------------------------------------
      subroutine add2s2(a,b,c1,n)
c     Scaled vector update: a = a + c1*b
      real a(1),b(1)

      DO 100 I=1,N
        A(I)=A(I)+C1*B(I)
  100 CONTINUE
      return
      END

c-----------------------------------------------------------------------
```

#### `glsc3_`

| CPUTIME % | IPC per Core | Loads per Cycle | L1 Hits per Cycle | |
|:---:|:---:|:---:|:---:|:---:|
| 4.9% | 3.32 | 1.29 | 1.55 |  |
|**L1 Miss Ratio** | **L2 Miss Ratio** | **L3 Miss Ratio** | **L2 B/W Utilized** | **L3 B/W Utilized** |
| 2.18% | 29.36% | 0.13% | 11.01% | 11.09% |
```fortran
C----------------------------------------------------------------------------
      function glsc3(a,b,mult,n)
C
C     Perform inner-product in double precision
C
      real a(1),b(1),mult(1)
      real tmp,work(1)

      tmp = 0.0
      do 10 i=1,n
         tmp = tmp + a(i)*b(i)*mult(i)
 10   continue
      call gop(tmp,work,'+  ',1)
      glsc3 = tmp
      return
      end
c-----------------------------------------------------------------------
```

#### `add2_`

| CPUTIME % | IPC per Core | Loads per Cycle | L1 Hits per Cycle | |
|:---:|:---:|:---:|:---:|:---:|
| 3.3% | 2.29 | 1.03 | 1.21 |  |
|**L1 Miss Ratio** | **L2 Miss Ratio** | **L3 Miss Ratio** | **L2 B/W Utilized** | **L3 B/W Utilized** |
| 0.39% | 4.53% | 0.21% | 1.86% | 0.21% |
```fortran
c-----------------------------------------------------------------------
      subroutine add2(a,b,n)
c     Vector update: a = a + b
      real a(1),b(1)

!xbm* unroll (10)
      do i=1,n
         a(i)=a(i)+b(i)
      enddo
      return
      end
c-----------------------------------------------------------------------
```