# Appendix B: Graph Analytics

## A Problem with Prestige Scores

It was noted in the [introduction](1_introduction.ipynb#Graph-features "Introduction: Graph features") that
vertex scoring methods like PageRank and the normalised eigenvector produce prestige scores that measure the effect of in-edges but not out-edges. It was also noted that we should treat such scores with caution.

To see why, consider the example graph below composed of only the historical matches between two teams, say A and B. Here we see that, either in a single match or aggregated over multiple matches, team A has scored a total of 60 points against team B, and team B has scored a total of 120 points against team A.
Which team do you think is stronger?

<img src="graph_A_vs_B.png" title="Graph of team A versus team B" width="50%">

Given the vertex ordering $(A,B)$, the weighted adjacency matrix of this example graph is
\begin{eqnarray}
\mathbf{A} & = & \left[\begin{array}
\\
0 & 120\\
60 & 0
\end{array}\right]\,.
\end{eqnarray}
Note that each edge is directed from the loser to the winner. Thus, team A loses 0 points to itself and 120 points to team B, whereas team B loses 60 points to team A and 0 points to itself.

In general, we amalgamate the edge scores across all matches between each given pair of teams, such that a single edge from team A to B represents a loss of prestige of A to B, with edge weight $w_{A\rightarrow B}$.
Likewise, the single edge from team B to A represents a loss of 
prestige of B to A, with edge weight $w_{B\rightarrow A}$.
The two-team weighted adjacency matrix is now
\begin{eqnarray}
\mathbf{A} & = & \left[\begin{array}
\\
0 & w_{A\rightarrow B}\\
w_{B\rightarrow A} & 0
\end{array}\right]\,.
\end{eqnarray}
More generally, we may consider the prestige graph of an arbitrary number of teams.
For example, we could consider all the matches between all teams in a league, either in a single season or amalgamated over multiple seasons.

### Normalised eigenvector scores

To compute the eigenvector scores, we first let the probability of leaving a vertex along a given edge be proportional to the edge weight.
Thus, we normalise each row of $\mathbf{A}$ to unity, giving rise to the row-stochastic matrix $\tilde{\bf A}$ that obeys $\tilde{\bf A}\mathbf{1}=\mathbf{1}$.
The prestige score vector $\mathbf{x}$ is then the principal left-eigenvector of 
$\tilde{\bf A}$, obeying $\mathbf{x}^{T}\tilde{\bf A}=\mathbf{x}^{T}$.

For the two-team graph above, we therefore obtain
\begin{eqnarray}
\tilde{\bf A} & = & \left[
\begin{array}
\\
0 & 1\\
1 & 0
\end{array}\right]\,,
\end{eqnarray}
with scores $\mathbf{x}=(x_A,x_B)=(0.5,0.5)$.
In other words, the prestige scores are equal, regardless of the values of the edge
weights! Consequently, according to the normalised eigenvector scores, teams A and B are always equal in strength,
which is certainly not what we expected.
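
As a quick numerical check of this claim, the following minimal sketch uses NumPy to row-normalise a two-team adjacency matrix and extract the unit-sum left-eigenvector with eigenvalue 1. The function name `normalised_eigenvector_scores` and the second set of edge weights are illustrative assumptions only.

```python
import numpy as np

def normalised_eigenvector_scores(A):
    """Unit-sum left-eigenvector (eigenvalue 1) of the row-normalised matrix.

    Assumes every vertex has at least one out-edge, so no row sums to zero.
    """
    A = np.asarray(A, dtype=float)
    A_tilde = A / A.sum(axis=1, keepdims=True)  # each row now sums to one
    # Left-eigenvectors of A_tilde are right-eigenvectors of its transpose.
    eigvals, eigvecs = np.linalg.eig(A_tilde.T)
    k = np.argmin(np.abs(eigvals - 1.0))        # select the eigenvalue 1
    x = np.real(eigvecs[:, k])
    return x / x.sum()                          # normalise to unit sum

# The example graph, with vertex ordering (A, B).
scores_example = normalised_eigenvector_scores([[0, 120], [60, 0]])
# Hypothetical, much more lopsided weights.
scores_other = normalised_eigenvector_scores([[0, 7], [1000, 0]])
print(scores_example, scores_other)
```

Both calls should return scores of approximately $(0.5, 0.5)$, because row normalisation discards the edge weights entirely when each team has only one out-edge.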

What is going wrong here? The problem is that the normalised eigenvector and related methods measure only the effect of in-edges. In essence, they measure the gain in prestige of each vertex due to in-edges, but fail to measure the corresponding loss of prestige 
due to out-edges.

This is why we caution that such scores require interpretation. In social network analysis (SNA), the eigenvector prestige scores are often useful, for example in the context of a friendship graph where an edge $A\rightarrow B$ indicates that person A likes person B. In this situation, person B obtains prestige for being liked, but person A does not lose prestige for the act of liking. However, in the adversarial context of opposing sporting teams, prestige is both won and lost.

### Unnormalised eigenvector scores

In the [previous](#Normalised-eigenvector-scores 
"Section: Normalised eigenvector scores") section, the main issue was that normalising the out-edge weights of each vertex to sum to unity led to teams A and B appearing equal in strength. The question then arises: what if we didn't normalise the edge weights? Computing the unnormalised eigenvector scores is also a common technique in SNA.

Here we now have the system $\mathbf{A}^T\mathbf{x}=\lambda\mathbf{x}$ of equations.
For our [two-team](#A-Problem-with-Prestige-Scores 
"Section: A Problem with Prestige Scores") 
 weighted adjacency matrix $\mathbf{A}$, the system of equations becomes
\begin{eqnarray}
\left[\begin{array}\\
0 & w_{B\rightarrow A}\\
w_{A\rightarrow B} & 0
\end{array}\right]
\,\left[\begin{array}\\
x_A\\x_B
\end{array}\right] & = &
\lambda\,\left[\begin{array}\\
x_A\\x_B
\end{array}\right]
\,.
\end{eqnarray}
The solution gives the unit-sum eigenvector
\begin{eqnarray}
\left[\begin{array}\\
x_A\\x_B
\end{array}\right] & = &
\frac{1}{\sqrt{w_{B\rightarrow A}}+\sqrt{w_{A\rightarrow B}}}
\left[\begin{array}\\
\sqrt{w_{B\rightarrow A}}\\\sqrt{w_{A\rightarrow B}}
\end{array}\right]\,,
\end{eqnarray}
with eigenvalue $\lambda=\sqrt{w_{B\rightarrow A}\,w_{A\rightarrow B}}$.
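
As a brief sketch of the intermediate steps (assuming both edge weights are positive, so that $x_A x_B\ne 0$), the two component equations can be multiplied together to eliminate the scores:
\begin{eqnarray}
w_{B\rightarrow A}\,x_B~=~\lambda\,x_A\,,~~
w_{A\rightarrow B}\,x_A~=~\lambda\,x_B
& ~\Rightarrow~ &
\lambda^2~=~w_{B\rightarrow A}\,w_{A\rightarrow B}\,.
\end{eqnarray}
Taking the positive root for the principal eigenvalue, the first equation then gives the ratio
\begin{eqnarray}
\frac{x_A}{x_B} & = & \frac{w_{B\rightarrow A}}{\lambda}
~=~\sqrt{\frac{w_{B\rightarrow A}}{w_{A\rightarrow B}}}\,,
\end{eqnarray}
which, after normalising to unit sum, yields the eigenvector stated above.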

For our example graph, this gives $x_A\approx 0.414$ and $x_B\approx 0.586$.
Although this is a mathematically acceptable solution, it does not accord with our intuition. Moreover, it is unclear what physical meaning a dimensional unit of square-root points could have.
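
The example values can be reproduced numerically. The following is a minimal sketch using NumPy, under the assumption that the principal eigenvalue is the one with the largest real part. The variable names are illustrative.

```python
import numpy as np

# Weighted adjacency matrix of the example graph, vertex ordering (A, B).
A = np.array([[0.0, 120.0],
              [60.0, 0.0]])

# Solve A^T x = lambda x.
eigvals, eigvecs = np.linalg.eig(A.T)
k = np.argmax(np.real(eigvals))   # principal (largest) eigenvalue
lam = np.real(eigvals[k])
x = np.real(eigvecs[:, k])
x = x / x.sum()                   # normalise to unit sum (also fixes the sign)

print(lam)  # should equal sqrt(120 * 60)
print(x)    # should be roughly (0.414, 0.586)
```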

### Steady-state flow scores

Let us now reconsider prestige as being akin to an incompressible fluid that flows through the graph. Like the mass of a fluid, we shall see that the total prestige is a conserved quantity.

In order to obtain the flow equations, we now suppose that the edge weights,
$w_{A\rightarrow B}$ and $w_{B\rightarrow A}$, represent flow rates. For example,
we might let $w_{A\rightarrow B}$ be the average points per match that team A concedes to B,
over all games played between teams A and B. This represents the contribution from team A to team B's
'for' score. Conversely, we let $w_{B\rightarrow A}$ be the average points per match that team A
scores against B, which represents team A's contribution to team B's 'against' score.

We now see that team A gains prestige from B at the rate $w_{B\rightarrow A}$, but
simultaneously loses prestige to team B at the rate $w_{A\rightarrow B}$. Since each team can lose at most the prestige it currently has, the amount of flow out of a vertex must
be proportional to the vertex's prestige score. Similarly,
the amount of flow into a vertex along a given edge must be proportional to the prestige of the vertex at the source of that edge.

Consequently, for the two-team graph 
[above](#A-Problem-with-Prestige-Scores "Section: A Problem with Prestige Scores"), 
we obtain the fluid flow system

\begin{eqnarray}
\left[\begin{array}\\
\dot{x}_A\\
\dot{x}_B
\end{array}\right]
& = &
\left[\begin{array}
\\
-w_{A\rightarrow B} & +w_{B\rightarrow A}\\
+w_{A\rightarrow B} & -w_{B\rightarrow A}
\end{array}\right]
\,
\left[\begin{array}\\
x_A\\
x_B
\end{array}\right]
\,.
\end{eqnarray}

This can more generally be denoted as $\dot{\bf x}=\mathbf{R}\mathbf{x}$ or 
$\dot{\bf x}(t)=\mathbf{R}\mathbf{x}(t)$.
Observe that the columns of the flow matrix $\mathbf{R}$ sum to zero, i.e.
$\mathbf{1}^{T}\mathbf{R}=\mathbf{0}^{T}$. This simply reflects the fact that,
for the example graph,
the flow out of A into B must equal the flow into B out of A. 
More generally, the sum of flows out of some vertex X into all other vertices must equal the sum of flows into all vertices from vertex X.
Further note that, in comparison to the weighted adjacency matrix $\mathbf{A}$ above,
we have 
\begin{eqnarray}
\mathbf{R} & \doteq & 
\mathbf{A}^{T}-\mathtt{diag}\left[\mathbf{1}^{T}\mathbf{A}^{T}
\right]
\,,
\end{eqnarray}
which therefore trivially satisfies $\mathbf{1}^{T}\mathbf{R}=\mathbf{0}^T$.
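
The construction of $\mathbf{R}$ from $\mathbf{A}$ can be sketched in a few lines of NumPy. The helper name `flow_matrix` and the three-team weights are hypothetical, chosen only to illustrate that the column sums vanish for more than two teams as well.

```python
import numpy as np

def flow_matrix(A):
    """Build R = A^T - diag(1^T A^T) from a weighted adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    out_flow = A.sum(axis=1)            # row sums of A: total rate leaving each vertex
    return A.T - np.diag(out_flow)

# Example graph with vertex ordering (A, B).
R = flow_matrix([[0, 120], [60, 0]])
print(R)                # [[-120, 60], [120, -60]]
print(R.sum(axis=0))    # column sums, i.e. 1^T R

# Hypothetical three-team graph.
R3 = flow_matrix([[0, 30, 10],
                  [20, 0, 40],
                  [50, 10, 0]])
print(R3.sum(axis=0))
```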

As a consequence, we deduce
that
\begin{eqnarray}
\mathbf{1}^{T}\dot{\bf x}(t)=
\mathbf{1}^{T}\mathbf{R}\mathbf{x}(t)=0
& ~\Rightarrow~ & \mathbf{1}^{T}\mathbf{x}(t)=\mbox{constant}
\,.
\end{eqnarray}
In other words, prestige is a conserved quantity (by construction).
For convenience, we assume that initially $\mathbf{1}^{T}\mathbf{x}(0)=1$.

The (or a) steady-state solution of the system of flow equations now occurs when
\begin{eqnarray}
\dot{\mathbf{x}}=\mathbf{0}
& \Rightarrow & \mathbf{R}\mathbf{x}=\mathbf{0}
%& ~\Rightarrow~ (\mathbf{I}+\mathbf{R})\mathbf{x}=\mathbf{x}\,.
\end{eqnarray}
Hence, the prestige score vector
$\mathbf{x}$ is a right-eigenvector of matrix $\mathbf{R}$ with zero eigenvalue.
Such a (non-trivial) solution must exist since $\mathbf{1}^T\mathbf{R}=\mathbf{0}^T$,
i.e. $\mathbf{1}$ is a left-eigenvector of $\mathbf{R}$
with eigenvalue 0. However, the uniqueness of the solution depends upon the connectedness of the graph,
and its stability depends upon the (real parts of the) other eigenvalues.
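
To illustrate conservation and convergence for the example graph, the flow equations can be integrated numerically. The sketch below uses a simple forward-Euler scheme, which is an assumption made for brevity rather than a recommended integrator. The initial condition and step size are also illustrative choices.

```python
import numpy as np

w_ab, w_ba = 120.0, 60.0            # edge weights of the example graph
R = np.array([[-w_ab, w_ba],
              [w_ab, -w_ba]])

x = np.array([1.0, 0.0])            # assumed initial scores with unit total prestige
dt = 1e-3                           # assumed step; forward Euler needs dt < 2/(w_ab + w_ba) here
totals = []
for _ in range(500):
    x = x + dt * (R @ x)            # one Euler step of dx/dt = R x
    totals.append(x.sum())          # track total prestige over time

print(x)                            # approaches the steady state
print(min(totals), max(totals))     # total prestige stays at 1 up to rounding
```

For this two-team system the non-zero eigenvalue of $\mathbf{R}$ is $-(w_{A\rightarrow B}+w_{B\rightarrow A})$, which is negative, so the scores decay towards the steady state from any starting point with unit total prestige.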

For our example graph, we see that $x_B=1-x_A$ (due to conservation of prestige), and hence the steady-state flow for team A obeys
\begin{eqnarray}
\dot{x}_A & = & w_{B\rightarrow A}\,(1-x_A) - w_{A\rightarrow B}\,x_A~=~0
\\
\Rightarrow x_A & = & 
\frac{w_{B\rightarrow A}}
{w_{A\rightarrow B}+w_{B\rightarrow A}}\,.
\end{eqnarray}
Thus, the flow prestige of team A against team B is just the proportion of
team A's 'for' score out of its total 'for' and 'against' scores.
Indeed, this is just what one might intuitively expect from the two-team graph
[above](#A-Problem-with-Prestige-Scores "Section: A Problem with Prestige Scores"),
for which
\begin{eqnarray}
x_A~=~\frac{60}{120+60}~=~\frac{1}{3}\,,
&&
x_B~=~\frac{120}{120+60}~=~\frac{2}{3}
\,.
\end{eqnarray}
In other words, out of the total of 120+60=180 points scored, one-third of those points were scored by team A,
and two-thirds were scored by team B. This suggests a 2:1 ratio of the relative strength of team B to team A.
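
For graphs with more teams, the steady state can be found numerically. One possible approach, sketched below, stacks the conservation constraint $\mathbf{1}^{T}\mathbf{x}=1$ beneath $\mathbf{R}\mathbf{x}=\mathbf{0}$ and solves the combined system by least squares. The function name `steady_state_scores` and the three-team weights are hypothetical, and the sketch assumes the graph is connected so that the solution is unique.

```python
import numpy as np

def steady_state_scores(A):
    """Solve R x = 0 subject to sum(x) = 1, where R = A^T - diag(row sums of A)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    R = A.T - np.diag(A.sum(axis=1))
    M = np.vstack([R, np.ones((1, n))])   # append the constraint row 1^T
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0                         # total prestige equals one
    x, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return x

# Example graph, vertex ordering (A, B).
x_two = steady_state_scores([[0, 120], [60, 0]])
print(x_two)     # approximately (1/3, 2/3)

# Hypothetical three-team graph.
A_three = [[0, 30, 10],
           [20, 0, 40],
           [50, 10, 0]]
x_three = steady_state_scores(A_three)
print(x_three, x_three.sum())
```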

### Probabilistic modelling

For the [two-team graph](#A-Problem-with-Prestige-Scores "Section: A Problem with Prestige Scores"), the 
[flow prestige](#Steady-state-flow-scores "Section: Steady-state flow scores")
score $x_A$ may be interpreted as the probability of team A winning an arbitrary match against team B,
since $x_A+x_B=1$. Consequently, we may use the model
\begin{eqnarray}
P(\texttt{win}_A\mid A,B) & \doteq & \frac{x_A}{x_A+x_B}
\end{eqnarray}
to estimate team A's chances of defeating team B.

However, for a more general graph with $N>2$ teams, the situation becomes more complicated. In effect, $x_A$ is
the proportion of total prestige obtained by team A against all $N-1$ other teams simultaneously. Thus, we might expect any (arbitrary) one of these other teams to have approximate prestige
\begin{eqnarray}
\bar{x}_A & \doteq & \frac{1-x_A}{N-1}\,.
\end{eqnarray}
Thus, from team A's perspective against a single opponent, the renormalised
proportion of prestige for A is
\begin{eqnarray}
P(\texttt{win}_A\mid A,*) & = & p_A ~\doteq~ \frac{x_A}{x_A+\bar{x}_A}~=~\frac{(N-1)\,x_A}{1+(N-2)\,x_A}\,.
\end{eqnarray}

Similarly, team B will have a renormalised proportion $p_B=P(\texttt{win}_B\mid B,*)$ of prestige against an arbitrary opponent.
Team B's estimate of the opponent winning is therefore 
$P(\texttt{lose}_B\mid *,B)=q_B\doteq 1-p_B$. Consequently, from the perspective of team A's chances of winning against team B, A's estimate is $p_A$ and B's estimate is $q_B$.
If there is no prior reason to believe that one estimate is any better or worse than the other, then the
combined estimate is just the average
\begin{eqnarray}
\bar{P}(\texttt{win}_A\mid A,B) & \doteq & \frac{1}{2}p_A + \frac{1}{2}\,(1-p_B)\,.
\end{eqnarray}
Observe that for $N=2$, the renormalised variables reduce to $p_A=x_A$ and $p_B=x_B=1-x_A$, such that
$\bar{P}(\texttt{win}_A\mid A,B)=x_A$, as expected.
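
The renormalisation and averaging steps above can be sketched as two small Python functions. The function names and the four-team scores used in the second call are hypothetical, chosen only for illustration.

```python
def renormalised_proportion(x, n_teams):
    """p = (N-1) x / (1 + (N-2) x): prestige share against one average opponent."""
    return (n_teams - 1) * x / (1 + (n_teams - 2) * x)

def combined_win_probability(x_a, x_b, n_teams):
    """Average of team A's estimate p_A and team B's estimate q_B = 1 - p_B."""
    p_a = renormalised_proportion(x_a, n_teams)
    p_b = renormalised_proportion(x_b, n_teams)
    return 0.5 * p_a + 0.5 * (1.0 - p_b)

# Two-team example: reduces to x_A = 1/3.
p_two = combined_win_probability(1 / 3, 2 / 3, n_teams=2)

# Hypothetical four-team league with x_A = 0.4 and x_B = 0.1.
p_ab = combined_win_probability(0.4, 0.1, n_teams=4)
p_ba = combined_win_probability(0.1, 0.4, n_teams=4)
print(p_two, p_ab, p_ba)
```

By construction, the combined estimates for A defeating B and for B defeating A sum to one, so the model is consistent from either team's perspective.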