## Homework Problem - The Trillion Dollar Eigenvector
by Curtis Johnson

Let the diagram below represent the entire internet, where each circle represents a webpage. An outgoing arrow represents a link from that page to another page, and an incoming arrow represents a link from another page to it.

![](4SiteInternet.png)

Each website's relative importance can be written in terms of the websites that link to it (e.g., if website 1 is linked to by website 2 and website 3, the relative importance of website 1, $x_1$, can be written as $x_1 = x_2 + x_3$).

You want to rank all of these websites by their relative importance in order to place ads where they are most likely to be seen (i.e., on the most important websites).

a) Find the probability matrix $P$ (also known as a [left stochastic matrix](https://en.wikipedia.org/wiki/Stochastic_matrix)) that describes the probability of moving from each webpage to any other webpage. NOTE: Because this matrix contains probabilities, make sure that each column sum is equal to 1. Row sums do not need to be 1.

First, we write out the relative importance of each website. Remember that the relative importance of a website is the sum of the relative importances of the pages that link to it.

$$
\begin{align}
x_1 &= x_3 + x_4\\
x_2 &= x_1\\
x_3 &= x_1 + x_2 + x_4\\
x_4 &= x_1 + x_2
\end{align}
$$

Putting this into matrix form, we get the following:

$$\begin{pmatrix}
0 & 0 & 1 & 1\\ 
1 & 0 & 0 & 0\\ 
1 & 1 & 0 & 1\\ 
1 & 1 & 0 & 0
\end{pmatrix}
\begin{pmatrix}
x_1\\
x_2\\ 
x_3\\ 
x_4
\end{pmatrix}
= 
\begin{pmatrix}
x_1\\ 
x_2\\ 
x_3\\
x_4
\end{pmatrix}$$

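Each column of this matrix lists the outgoing links of one page. Dividing each column by its sum (the number of outgoing links from that page) turns the entries into probabilities, so $P$ is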
$$
\begin{pmatrix}
0 & 0 & 1 & 1/2\\ 
1/3 & 0 & 0 & 0\\ 
1/3 & 1/2 & 0 & 1/2\\ 
1/3 & 1/2 & 0 & 0
\end{pmatrix}
$$

where each column sum is 1. 
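The column normalization can be checked numerically. The snippet below is a minimal sketch: the name `links` is just an illustrative label for the 0/1 matrix written out above, and it assumes every page has at least one outgoing link so that no column sum is zero.

```python
import numpy as np

# 0/1 link matrix from the equations above (hypothetical name `links`):
# entry (i, j) is 1 when page j+1 links to page i+1.
links = np.array([
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
], dtype=float)

# Divide each column by its sum, i.e. by the number of outgoing links
# of that page. Assumes no column sum is zero.
P = links / links.sum(axis=0)

print(P)
print("column sums:", P.sum(axis=0))
```

Broadcasting divides column $j$ by the $j$-th column sum, which is why the sum is taken over `axis=0`.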


b) Show that $\lambda=1$ is an eigenvalue of any 4x4 left stochastic matrix $A$ of the same form as $P$ (i.e. column sum is 1). (Turns out this is the case for any left stochastic matrix, but the algebra to prove it gets really hairy.)

Suppose $\lambda$ is an eigenvalue of $A$. Then the following must hold for some corresponding nonzero eigenvector $\boldsymbol{v}$.

$$ A \boldsymbol{v} = \lambda \boldsymbol{v}$$

$$
\implies \lambda \boldsymbol{v} - A \boldsymbol{v} = \left(\lambda I - A\right)\boldsymbol{v} = 0
$$

which implies that $\boldsymbol{v} \in \mathcal{N}(\lambda I - A)$. Because an eigenvector is nonzero by definition, $\lambda I - A$ has a nontrivial null space and must be singular:

$$
\det(\lambda I - A) = 0.
$$

Carrying out the subtraction we get

$$ 
\begin{pmatrix}
\lambda & 0 & 0 & 0\\ 
0 & \lambda & 0 & 0\\ 
0 & 0 & \lambda & 0\\ 
0 & 0 & 0 & \lambda
\end{pmatrix}
- 
\begin{pmatrix}
0 & a_{12} & a_{13} & a_{14}\\ 
a_{21} & 0 & a_{23} & a_{24}\\ 
a_{31} & a_{32} & 0 & a_{34}\\ 
a_{41} & a_{42} & a_{43} & 0\\ 
\end{pmatrix}
= 
\begin{pmatrix}
\lambda & -a_{12} & -a_{13} & -a_{14}\\ 
-a_{21} & \lambda & -a_{23} & -a_{24}\\ 
-a_{31} & -a_{32} & \lambda & -a_{34}\\ 
-a_{41} & -a_{42} & -a_{43} & \lambda\\ 
\end{pmatrix}
$$

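Now, if $\lambda = 1$, each column of $I - A$ sums to 0, because the off-diagonal entries in each column of $A$ sum to 1 (for example, the first column gives $1 - a_{21} - a_{31} - a_{41} = 0$). Adding all four rows of $I - A$ therefore gives the zero row, so the rows are linearly dependent and $I - A$ is singular:

$$ \det\left(I - A \right) = 0. $$

Therefore, $\lambda = 1$ is an eigenvalue of $A$.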
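As a numerical sanity check of this argument (not a proof), the sketch below builds a random matrix of the same form as $A$ and confirms that the columns of $I - A$ sum to zero, that $\det(I - A)$ vanishes up to rounding, and that one eigenvalue equals 1. The seed and the variable names are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Random nonnegative 4x4 matrix with a zero diagonal, like A above.
A = rng.random((4, 4))
np.fill_diagonal(A, 0.0)
A = A / A.sum(axis=0)  # normalize so that every column sums to 1

M = np.eye(4) - A
col_sums = M.sum(axis=0)   # each entry should be ~0
det_val = np.linalg.det(M)  # should be ~0

# Eigenvalue of A that lies closest to 1.
eigvals = np.linalg.eigvals(A)
closest = eigvals[np.argmin(np.abs(eigvals - 1))]

print("column sums of I - A:", col_sums)
print("det(I - A):", det_val)
print("eigenvalue closest to 1:", closest)
```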


c) Find the probability vector $\boldsymbol{x} = [x_1, x_2, x_3, x_4]^T$ that ranks each of the webpages in the internet by their relative importance. HINT: This is an eigenvector problem.

In [12]:
import numpy as np

P = np.array([[0,0,1,0.5],[1/3, 0, 0, 0],[1/3,1/2,0,1/2],[1/3,1/2,0,0]])

eigvals, eigvecs = np.linalg.eig(P)

print("eigenvalues: \n", eigvals)
print('eigenvectors: \n', eigvecs)


eigenvalues: 
 [ 1.        +0.j         -0.36062333+0.41097555j -0.36062333-0.41097555j
 -0.27875333+0.j        ]
eigenvectors: 
 [[ 0.72101012+0.j         -0.75521571+0.j         -0.75521571-0.j
   0.50648562+0.j        ]
 [ 0.24033671+0.j          0.3036721 +0.34607247j  0.3036721 -0.34607247j
  -0.60565568+0.j        ]
 [ 0.54075759+0.j          0.09315321-0.2746779j   0.09315321+0.2746779j
  -0.38153917+0.j        ]
 [ 0.36050506+0.j          0.3583904 -0.07139457j  0.3583904 +0.07139457j
   0.48070923+0.j        ]]


In [13]:
# get the eigenvector (column of eigvecs) corresponding to the eigenvalue 1, which is at index 0
prob_vec = eigvecs[:, 0]
print(np.real(prob_vec))

# normalize the vector to ensure total probability = 1
answer = np.real(prob_vec/np.sum(prob_vec))

print("PageRank: \n", answer)

[0.72101012 0.24033671 0.54075759 0.36050506]
PageRank: 
 [0.38709677 0.12903226 0.29032258 0.19354839]


Therefore, ads should be placed according to the probabilities above: website 1 is the most important (38.7%), followed by website 3 (29.0%), website 4 (19.4%), and website 2 (12.9%).
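The same ranking can be cross-checked without calling an eigensolver. The sketch below uses power iteration, a different technique from the `np.linalg.eig` call in the solution: start from a uniform probability vector and repeatedly multiply by $P$. The function name `power_iteration` and the iteration count are illustrative assumptions. Convergence here relies on the other eigenvalues of this particular $P$ having magnitude below 1, as the eigenvalue output above shows.

```python
import numpy as np

P = np.array([
    [0,   0,   1, 1/2],
    [1/3, 0,   0, 0],
    [1/3, 1/2, 0, 1/2],
    [1/3, 1/2, 0, 0],
])

def power_iteration(P, num_iters=100):
    """Repeatedly apply P to a uniform start vector (hypothetical helper)."""
    n = P.shape[0]
    x = np.full(n, 1.0 / n)  # uniform starting distribution
    for _ in range(num_iters):
        x = P @ x
        x = x / x.sum()  # keep the entries summing to 1
    return x

ranks = power_iteration(P)
order = np.argsort(-ranks) + 1  # website numbers, most important first

print("PageRank:", ranks)
print("ranking:", order)
```

Each multiplication by $P$ can be read as one step of a random surfer following links, so the vector settles toward the long-run share of time spent on each page.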