# FIFA Learner's League

2018 Men's World Cup round of 16 predictions based on group stage performance.

Team rankings are taken from the [FIFA ranking table](https://www.fifa.com/fifa-world-ranking/ranking-table/men/index.html). Scraping is easy because the data is already structured as a table:
```
// run in the browser console on the ranking page
const points = document.querySelectorAll('td.tbl-points')
points.forEach(p => console.log(p.textContent))
```

## Forward Propagation

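The forward pass sends the input activations through a series of layers to give an estimate of the outcome, ŷ. Each hidden layer applies a weight matrix and a bias, followed by a nonlinearity (ReLU); the final layer is left linear.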

In [163]:
# rectified linear unit for introducing non-linearity into model
relu(values) = max.(values, 0.0)

# Given a vector or matrix of inputs, computes predicted outcome ŷ
#
# activations - initial activations
# weights - array of weight matrices
# biases - array of bias vectors
function forwardPass(activations, weights, biases)
    a′ = activations
    for i in 1:length(weights) - 1
        a′ = relu(weights[i] * a′ .+ biases[i])
    end
    # output layer is linear (no relu)
    weights[end] * a′ .+ biases[end]
end
; # silence output

## Load our training data

We want to train our model to predict how many goals each team will score in a given match. To make this simpler, we will train a single model to guess the score of one team against another. Later, we will ask that model for the score of A in A vs B as well as the score of B in B vs A. This doubles our training data, since each match will be represented by two output labels and two sets of inputs.

### Inputs
- Team Identifier A
- Team Identifier B
- FIFA World Ranking A
- FIFA World Ranking B

### Outputs
- Team A Score
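
To make the doubling concrete, here is a rough sketch of how one match could be turned into two training examples. The helper name `matchExamples`, the use of plain numbers as team identifiers, and the sample match are all assumptions for illustration, not the data actually used here.

```julia
# Hypothetical helper: turn one match into two training examples.
# idA, idB       - numeric team identifiers (the encoding is an assumption)
# rankA, rankB   - FIFA world rankings
# goalsA, goalsB - goals scored by each team
# Returns a 4×2 input matrix (one example per column) and a 1×2 label matrix.
function matchExamples(idA, idB, rankA, rankB, goalsA, goalsB)
    inputs = Float64[idA   idB
                     idB   idA
                     rankA rankB
                     rankB rankA]
    labels = Float64[goalsA goalsB]
    inputs, labels
end

# Made-up match: team 1 (rank 5) beats team 2 (rank 12) by 3 goals to 1.
inputs, labels = matchExamples(1, 2, 5, 12, 3, 1)
```

The first column asks for the score of A in A vs B, the second for the score of B in B vs A, so the same model sees every match from both sides.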


In [188]:
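srand(11235) # fix the seed so the random weights are reproducible
# three hidden layers of 4 units each, with small random weights
W = [rand(4, 4) .* 0.01 for i in 1:3]
B = [zeros(4, 1) for i in 1:3]
# output layer: a single unit for Team A's score
push!(W, rand(1, 4))
push!(B, zeros(1, 1))
# inputs: one column per example, one row per input feature
A = [0.1 0.2
     0.2 0.5
     0.6 0.33
     0.2 0.4]
# targets: one score per example
Y = [0.000001 0.5]

Ŷ = forwardPass(A, W, B)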

1×2 Array{Float64,2}:
 5.61747e-6  7.99972e-6

## Compute Loss

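Given our estimates Ŷ, compute the loss relative to the actual game outcomes Y. We use the squared error, (y − ŷ)², computed element-wise.

We could instead train two models, one for each team's score, but a single model queried twice is simpler.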

In [195]:
function squareLoss(y, ŷ)
    (y .- ŷ).^2
end
squareLoss(Y, Ŷ)

1×2 Array{Float64,2}:
 2.13211e-11  0.249992

In [207]:
# simple unit tests
using Base.Test
@testset "Square Loss" begin
    @test squareLoss(2, 1) == 1
    @test squareLoss(3, 1) == 4
    @test squareLoss([3; 5], [1; 2]) == [4; 9]
    @test squareLoss(1.5, 1) ≈ 0.25
end

Test Summary: | Pass  Total
Square Loss   |    4      4


Base.Test.DefaultTestSet("Square Loss", Any[], 4, false)

## Back Propagation

Minimize the loss by moving each weight and bias a small step against its gradient.

To do that, compute the gradient of the loss with respect to the parameters of every layer, working backward from the output with the chain rule.
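
The notebook stops before implementing this step, so what follows is only a minimal sketch of how the backward pass might look for the relu network above. It assumes the cost is the squared error summed over all examples and that inputs are matrices with one example per column. The names `cost`, `forwardCache` and `backwardPass`, and the tiny hand-picked network, are made up for illustration.

```julia
relu(values) = max.(values, 0.0)

# Scalar cost: squared error summed over all examples (an assumed choice).
cost(y, ŷ) = sum((y .- ŷ).^2)

# Forward pass that records what the backward pass needs.
# Returns (acts, pre): acts[i] is the input to layer i, pre[i] is that
# layer's output before relu, so pre[end] is the prediction ŷ.
function forwardCache(activations, weights, biases)
    acts = Any[activations]
    pre = Any[]
    for i in 1:length(weights)
        push!(pre, weights[i] * acts[end] .+ biases[i])
        # hidden layers apply relu; the output layer stays linear
        i < length(weights) && push!(acts, relu(pre[end]))
    end
    acts, pre
end

# Gradients of the cost with respect to every weight matrix and bias vector.
function backwardPass(y, weights, biases, acts, pre)
    ∇W = [zero(w) for w in weights]
    ∇b = [zero(b) for b in biases]
    δ = 2 .* (pre[end] .- y)                # ∂cost/∂ŷ
    for i in length(weights):-1:1
        ∇W[i] = δ * acts[i]'
        ∇b[i] = δ * ones(size(δ, 2), 1)     # sum δ over examples (columns)
        if i > 1
            # chain rule through relu: gradient flows only where pre > 0
            δ = (weights[i]' * δ) .* (pre[i - 1] .> 0)
        end
    end
    ∇W, ∇b
end

# Tiny hand-picked network: 2 inputs, 2 hidden units, 1 output; 2 examples.
W = [[0.1 0.2; 0.3 0.4], [0.5 0.6]]
B = [zeros(2, 1), zeros(1, 1)]
A = [1.0 0.5; 2.0 -1.0]
Y = [1.0 0.5]

acts, pre = forwardCache(A, W, B)
∇W, ∇b = backwardPass(Y, W, B, acts, pre)

# Sanity check: compare one entry against a central finite difference.
ε = 1e-6
Wplus = deepcopy(W);  Wplus[1][1, 2] += ε
Wminus = deepcopy(W); Wminus[1][1, 2] -= ε
numeric = (cost(Y, forwardCache(A, Wplus, B)[2][end]) -
           cost(Y, forwardCache(A, Wminus, B)[2][end])) / (2 * ε)
```

If the analytic gradient is right, `numeric` and `∇W[1][1, 2]` should agree to several decimal places. A finite-difference check like this is a cheap way to catch sign and transpose mistakes before training.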

## Train Parameters

Using the gradient of the cost with respect to each parameter, adjust the parameters in small steps to reduce the cost.
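
As a sketch of what that update could look like, assuming plain gradient descent with a fixed learning rate: the helper `descend!`, the learning rate `η = 0.05`, and the one-layer toy problem below are assumptions chosen to keep the example small, not the training setup for the match model.

```julia
# One gradient-descent step: nudge every parameter against its gradient.
# η is the learning rate; 0.05 is an arbitrary value picked for this toy problem.
function descend!(weights, biases, ∇W, ∇b; η = 0.05)
    for i in 1:length(weights)
        weights[i] .-= η .* ∇W[i]
        biases[i] .-= η .* ∇b[i]
    end
    weights, biases
end

# Squared error summed over all examples (an assumed choice of cost).
cost(y, ŷ) = sum((y .- ŷ).^2)

# Made-up data following y = 2x + 1, fitted by a single linear layer.
inputs = [0.0 1.0 2.0]
targets = [1.0 3.0 5.0]
weights = [zeros(1, 1)]
biases = [zeros(1, 1)]

before = cost(targets, weights[1] * inputs .+ biases[1])
for step in 1:200
    δ = 2 .* (weights[1] * inputs .+ biases[1] .- targets)  # ∂cost/∂ŷ
    descend!(weights, biases, [δ * inputs'], [δ * ones(size(δ, 2), 1)])
end
after = cost(targets, weights[1] * inputs .+ biases[1])
```

After the loop, `after` should be far smaller than `before`, with the weight close to 2 and the bias close to 1. For the full network the same step applies, with the gradients coming from back propagation instead of the one-layer formula used here.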