Optimize matrix multiplication cache-friendliness

In the matrix multiplication code, I noticed that the innermost loop variable, `k`, does not match the last index of the memory accesses, `j`, so `m.d[k][j]` is read down a column instead of along a row:

https://github.com/kth-competitive-programming/kactl/blob/ba85dcdfb37f0425e6102f21bfad239dcf83e5a6/content/data-structures/Matrix.h#L18-L23


I believe this can be made more cache-friendly simply by swapping the order of summation so that memory accesses are linear.
For example, here is a one-line change that speeds up matrix multiplication by 50% locally:

```cpp
	// The innermost index k is now the last index of both a.d and m.d, so both are walked row-wise.
	rep(i,0,N) rep(j,0,N)
		rep(k,0,N) a.d[i][k] += d[i][j] * m.d[j][k];
```
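To sanity-check that the reordered loops compute the same product, here is a minimal self-contained sketch. `Mat`, `mulOriginal`, `mulReordered` and `ordersAgree` are hypothetical names for illustration, and `Mat` is only a simplified stand-in for the KACTL `Matrix` struct, assuming zero-initialized `std::array` storage:

```cpp
#include <array>
#include <cassert>

// Simplified stand-in for the KACTL Matrix struct (an assumption, not the real header).
template <class T, int N> struct Mat {
	std::array<std::array<T, N>, N> d{}; // value-initialized, so every entry starts at 0

	// Current loop order: the innermost loop reads m.d[k][j] with a stride of N elements.
	Mat mulOriginal(const Mat& m) const {
		Mat a;
		for (int i = 0; i < N; i++) for (int j = 0; j < N; j++)
			for (int k = 0; k < N; k++) a.d[i][j] += d[i][k] * m.d[k][j];
		return a;
	}

	// Proposed loop order: the innermost loop walks a.d[i] and m.d[j] contiguously.
	Mat mulReordered(const Mat& m) const {
		Mat a;
		for (int i = 0; i < N; i++) for (int j = 0; j < N; j++)
			for (int k = 0; k < N; k++) a.d[i][k] += d[i][j] * m.d[j][k];
		return a;
	}
};

// Fills two 8x8 integer matrices with arbitrary values and compares both products.
bool ordersAgree() {
	constexpr int N = 8;
	Mat<long long, N> x, y;
	for (int i = 0; i < N; i++) for (int j = 0; j < N; j++) {
		x.d[i][j] = i * N + j + 1;
		y.d[i][j] = (i + 2) * (j - 3);
	}
	return x.mulOriginal(y).d == x.mulReordered(y).d;
}
```

Calling `ordersAgree()` from a test or from `main` should return `true`. Each output entry is accumulated over the same terms in both versions, so only the memory access pattern differs.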


For reference, the current code at the link above:

	M operator*(const M& m) const {
		M a;
		rep(i,0,N) rep(j,0,N)
			rep(k,0,N) a.d[i][j] += d[i][k]*m.d[k][j];
		return a;
	}
