# Markov Chains, Random Walks and PageRank

This notebook explores the PageRank algorithm, developed by Google's founders, through toy examples that touch on Markov matrices, probability, and website ranking.

## Importing Libraries

Here we import all necessary libraries.

In [1]:
using LinearAlgebra

The following cell is experimental: it builds a small four-website link matrix by hand and sketches the rest of the plan in comments.

In [2]:
# plan: eventually ask the user for the number of websites (w) and the links of each website
# for now, hard-code four websites: each vector marks the sites that one website links to
a = [0, 1, 1, 1]
b = [1, 0, 0, 1]
c = [0, 0, 0, 1]
d = [0, 1, 1, 0]
# stack the vectors as rows, then transpose so that each website's links form a column
m = [transpose(a); transpose(b); transpose(c); transpose(d)]
m = transpose(m)
m
# remaining steps:
# - normalize each column by dot(x, x), which for a 0/1 vector equals its number of links,
#   to create the link matrix
# - let r be a w x 1 vector of ones, divided by w
# - find the steady-state vector with a pagerank function that runs a set number of iterations
#   (or a while loop that breaks once last vector == current vector, i.e. steady state is reached)
# - rank the websites by their share of the steady-state vector

4×4 transpose(::Matrix{Int64}) with eltype Int64:
 0  1  0  0
 1  0  0  1
 1  0  0  1
 1  1  1  0
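
The normalization step mentioned in the comments is not carried out in the cell above. A minimal sketch of it, assuming every website has at least one outgoing link so that no column sum is zero, might look as follows. The names `link_counts` and `M` are illustrative and not part of the original notebook, and the column sum is used in place of the dot product, which gives the same number for a 0/1 vector.

```julia
# the 0/1 matrix printed above, written out so this block runs on its own
m = [0 1 0 0;
     1 0 0 1;
     1 0 0 1;
     1 1 1 0]

# number of outgoing links per website (one entry per column)
link_counts = sum(m, dims = 1)

# divide each column by its link count so that every column sums to 1
M = m ./ link_counts
```

After this step each column of `M` can be read as a probability distribution over the pages that a random surfer could visit next.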

## Processing

We define functions that read the link structure from user input and run a small PageRank computation.

In [3]:
# userInput: read each website's outgoing links from stdin and build the matrix L,
# where row i holds the normalized link weights of website i
function userInput()
    # the number of sites is fixed at five here; uncomment to ask the user instead
    #println("How many websites do you have?")
    #websites = parse(Int, readline())
    println("You have five websites.")
    websites = 5
    println("Format: 2 3 4 (website 1 points to 2, 3, 4)")

    # row i of L is filled in as the links of website i are read
    L = zeros(websites, websites)

    # iterating through sites
    for i in 1:websites
        # getting links each site points to
        println("For website number ", i, " input the links which it points to.")
        # split on whitespace, ignoring leading and trailing blanks
        arr = split(readline())
        # looping through links
        for j in arr
            number = parse(Int, j)
            # normalizing: each outgoing link gets an equal share
            L[i, number] = 1/length(arr)
        end
    end

    return L
end

userInput (generic function with 1 method)

### Using Functions

Here we call our functions and store their results in variables.

In [4]:
# this is our L matrix
L = userInput()

You have five websites.
Format: 2 3 4 (website 1 points to 2, 3, 4)
For website number 1 input the links which it points to.


stdin>  1 2 3


For website number 2 input the links which it points to.


stdin>  2 4


For website number 3 input the links which it points to.


stdin>  3 1


For website number 4 input the links which it points to.


stdin>  5


For website number 5 input the links which it points to.


stdin>  3 4


5×5 Matrix{Float64}:
 0.333333  0.333333  0.333333  0.0  0.0
 0.0       0.5       0.0       0.5  0.0
 0.5       0.0       0.5       0.0  0.0
 0.0       0.0       0.0       0.0  1.0
 0.0       0.0       0.5       0.5  0.0

In [5]:
# transpose so that column j holds the outgoing links of website j (each column sums to 1)
L = transpose(L)

5×5 transpose(::Matrix{Float64}) with eltype Float64:
 0.333333  0.0  0.5  0.0  0.0
 0.333333  0.5  0.0  0.0  0.0
 0.333333  0.0  0.5  0.0  0.5
 0.0       0.5  0.0  0.0  0.5
 0.0       0.0  0.0  1.0  0.0
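
After the transpose, `L` should be a Markov (column-stochastic) matrix: every entry is non-negative and every column sums to 1. The sketch below checks this for the matrix printed above. The helper `is_markov` and the name `Lcol` are hypothetical and introduced only for illustration.

```julia
# hypothetical helper: true when all entries are non-negative and every column sums to 1
is_markov(A; atol = 1e-12) =
    all(A .>= 0) && all(isapprox.(sum(A, dims = 1), 1; atol = atol))

# the transposed matrix from the run above, written out so this block runs on its own
Lcol = [1/3  0    1/2  0  0;
        1/3  1/2  0    0  0;
        1/3  0    1/2  0  1/2;
        0    1/2  0    0  1/2;
        0    0    0    1  0]

println(is_markov(Lcol))

# multiplying a probability vector by a Markov matrix keeps its total at 1
p = fill(1/5, 5, 1)
println(sum(Lcol * p) ≈ 1)
```

This property is what lets repeated multiplication by `L` be read as a random walk: the total probability is neither created nor lost at any step.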

In [6]:
# this is our r vector: every website starts with an equal share of 1/5
r = ones(5, 1) / 5;

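Now we define our PageRank function, which repeatedly applies L to the starting vector r and then ranks the websites by their final scores.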

In [65]:
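function PageRank(L, r, iterations)
    # power iteration: apply L to the starting vector a fixed number of times
    v = r
    for i in 1:iterations
        v = L * v
    end
    # scores from highest to lowest
    sorted = sort(v, dims = 1, rev = true)
    # for each score in turn, record the websites holding it (ties keep index order)
    ranks = []
    for i in sorted
        for j in 1:length(v)
            if v[j] == i && j ∉ ranks
                push!(ranks, j)
            end
        end
    end
    return v, ranks
end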

PageRank (generic function with 1 method)
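
The experimental cell earlier mentions stopping once the vector no longer changes, instead of running a fixed number of iterations. A minimal sketch of that idea is given below. The function name `pagerank_until_steady`, the tolerance `tol`, and the cap `maxiter` are assumptions made for illustration and are not part of the original notebook.

```julia
using LinearAlgebra

# hypothetical variant: iterate until successive vectors agree to within tol
function pagerank_until_steady(L, r; tol = 1e-12, maxiter = 1000)
    v = r
    for k in 1:maxiter
        next = L * v
        # 1-norm of the change between successive vectors
        if norm(next - v, 1) < tol
            return next, k
        end
        v = next
    end
    # tolerance not reached: return the last iterate and the iteration cap
    return v, maxiter
end

# the 5-website link matrix used in this notebook, written out so the block runs on its own
Lcol = [1/3  0    1/2  0  0;
        1/3  1/2  0    0  0;
        1/3  0    1/2  0  1/2;
        0    1/2  0    0  1/2;
        0    0    0    1  0]
start = fill(1/5, 5, 1)

steady, steps = pagerank_until_steady(Lcol, start)
println(steady)
println(steps)
```

Comparing with a tolerance rather than with `==` avoids depending on exact floating-point equality, at the cost of choosing `tol` by hand.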

In [67]:
# using PageRank with 100 iterations
v, ranks = PageRank(L, r, 100)
# page ranks for each website, top being rank
println(v)
println(ranks)

[0.2307692307692305; 0.1538461538461537; 0.3076923076923074; 0.1538461538461537; 0.1538461538461537]
Any[3, 1, 2, 4, 5]


In [68]:
# using PageRank function with 200 iterations
v2, ranks2 = PageRank(L, r, 200)
# v2 holds the PageRank score of each website; ranks2 lists the websites from highest to lowest score
println(v2)
println(ranks2)
# the steady-state vector v2 is the same as v, so the extra iterations change nothing

[0.2307692307692305; 0.1538461538461537; 0.3076923076923074; 0.1538461538461537; 0.1538461538461537]
Any[3, 1, 2, 4, 5]
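
A steady-state vector satisfies L * v = v, so it is an eigenvector of L with eigenvalue 1. As a cross-check on the iterative result, the sketch below computes that eigenvector directly with `eigen` from `LinearAlgebra` and rescales it to sum to 1. The names `Lcol`, `F`, `k`, and `steady` are illustrative and not part of the original notebook.

```julia
using LinearAlgebra

# the 5-website link matrix used in this notebook, written out so the block runs on its own
Lcol = [1/3  0    1/2  0  0;
        1/3  1/2  0    0  0;
        1/3  0    1/2  0  1/2;
        0    1/2  0    0  1/2;
        0    0    0    1  0]

F = eigen(Lcol)

# pick the eigenvalue closest to 1 (the eigenvalues may be returned as complex numbers)
k = argmin(abs.(F.values .- 1))

# take the matching eigenvector and rescale it so its entries sum to 1
steady = real.(F.vectors[:, k])
steady = steady / sum(steady)

println(steady)
```

The entries should agree with the iterated vector up to rounding, which for this matrix is (3, 2, 4, 2, 2) / 13.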


## References

* https://en.wikipedia.org/wiki/PageRank#Simplified_algorithm
* http://blog.kleinproject.org/?p=280 
* https://www.youtube.com/watch?v=F5fcEtqysGs
