# Performance optimization exercise 1

Optimize the function `work!` in the following code. You may change the function name, the function signature, and the function body.

However, the types and sizes of the inputs `N`, `A`, `b`, `c` are fixed and **may not be changed.**

In [4]:
# function you should optimize
function work!(A, N)
    D = zeros(N,N)
    for i in 1:N
        D = b[i]*c*A
        b[i] = sum(D)/N^2
    end
    return b
end

# fixed input (do not change!)
N = 10
A = [float(i+j) for i in 1:N, j in 1:N] # matrix of size NxN
b = collect(Float64, 1:N) # vector of length N
c = 1.23;

# desired result (do not change!)
const RESULT = [13.53, 27.06, 40.59, 54.12, 67.65, 81.18, 94.71, 108.24, 121.77, 135.3];

In [5]:
# baseline benchmark
using BenchmarkTools
@btime work!($A, $N);

  2.001 μs (41 allocations: 10.09 KiB)


## ↓ Go optimize! ↓

Your optimized `work_optimized!` function (**change it!**):

In [12]:
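function work_optimized!(A, N, b, c)
    # b and c are passed as arguments instead of being read from non-constant globals
    D = zeros(N,N) # scratch matrix, allocated once and reused for every i
    for i in 1:N
        # overwrite D in place instead of allocating a new matrix with b[i]*c*A;
        # eachindex(D, A) checks that both arrays share the same indices, which keeps @inbounds safe
        @inbounds for j in eachindex(D, A)
            D[j] = b[i] * c * A[j]
        end
        b[i] = sum(D)/N^2
    end
    return b
end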

work_optimized! (generic function with 1 method)

Testing that it actually gives the correct result:

In [13]:
N = 10
A = [float(i+j) for i in 1:N, j in 1:N] # matrix of size NxN
b = collect(Float64, 1:N) # vector of length N
c = 1.23

work_optimized!(A, N, b, c) ≈ RESULT

true
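
The test cell rebuilds `b` before calling the function because `work_optimized!` overwrites `b` in place, so a second call on the same vector starts from already-scaled values. The sketch below illustrates this. The function body is repeated only so the block runs on its own, and `fresh_b` and `expected` are hypothetical names introduced for this illustration.

```julia
# Same logic as work_optimized! above, repeated so this block is self-contained.
function work_optimized!(A, N, b, c)
    D = zeros(N, N)
    for i in 1:N
        @inbounds for j in eachindex(D, A)
            D[j] = b[i] * c * A[j]
        end
        b[i] = sum(D) / N^2
    end
    return b
end

# Hypothetical helper: builds a fresh copy of the fixed input vector.
fresh_b(N) = collect(Float64, 1:N)

N = 10
A = [float(i + j) for i in 1:N, j in 1:N]
c = 1.23
expected = [13.53, 27.06, 40.59, 54.12, 67.65, 81.18, 94.71, 108.24, 121.77, 135.3]

b = fresh_b(N)
first_call  = copy(work_optimized!(A, N, b, c))  # starts from 1.0, 2.0, ..., 10.0
second_call = copy(work_optimized!(A, N, b, c))  # starts from the already-updated b

(first_call ≈ expected, second_call ≈ expected)  # (true, false)
```

The same effect applies to benchmarking, since `@btime` evaluates the call many times on the same `b`. One option, assuming the `setup` and `evals` keywords of BenchmarkTools, is `@btime work_optimized!($A, $N, b, $c) setup=(b = collect(Float64, 1:$N)) evals=1`, which resets `b` before every evaluation. Timings of a few hundred nanoseconds are noisier with `evals=1`, so treat this as a cross-check rather than a replacement for the benchmark below.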

Benchmarking it:

In [14]:
using BenchmarkTools
@btime work_optimized!($A, $N, $b, $c);

  342.892 ns (1 allocation: 896 bytes)
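
The one remaining allocation is the scratch matrix `D`. It can be avoided by noting that `sum(b[i]*c*A)/N^2` equals `b[i] * (c*sum(A)/N^2)`, so the factor multiplying `b[i]` is the same for every `i`. The following is a minimal sketch of that idea, not the exercise's reference solution. `work_scalar!` and `expected` are hypothetical names, and because the floating-point operations happen in a different order, the result is compared with `≈` rather than `==`.

```julia
# Hypothetical variant: same update as work_optimized!, without the scratch matrix.
function work_scalar!(A, N, b, c)
    # sum(b[i] * c * A) / N^2 == b[i] * (c * sum(A) / N^2),
    # so the scaling factor is computed once, outside any loop.
    factor = c * sum(A) / N^2
    b .*= factor  # in-place broadcast, no temporary array
    return b
end

N = 10
A = [float(i + j) for i in 1:N, j in 1:N]
b = collect(Float64, 1:N)
c = 1.23
expected = [13.53, 27.06, 40.59, 54.12, 67.65, 81.18, 94.71, 108.24, 121.77, 135.3]

work_scalar!(A, N, b, c) ≈ expected  # true
```

This version reads `A` once instead of `N` times and should not need any heap allocation, but it is worth confirming with `@btime` on your own machine before relying on that.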
