PageRank essentially works by counting the number of links to a page to obtain a rough estimate of website's importance. The base assumption in PageRank is that more important websites are likely to receive more links from other websites.
We have 4 pages A.html, B.html, C.html, D.html
PageRank is initialized to the same value for all pages, with a probability distribution between 0 and 1. Hence the initial value for each page in this example is 0.25.
const pages = ['A', 'B', 'C', 'D'];
const initialRanks = pages.map(() => 1 / pages.length); // [0.25, 0.25, 0.25, 0.25]
The PageRank of a given page gets equally distributed through its outbound links on each iteration of the algorithm.
If the only links we had were A
, B
, and C
to D
, each link would transfer 0.25 PageRank to D
upon the next iteration, for a total of 0.75.
PR('D') = PR('A') + PR('B') + PR('C')
Suppose instead that page B
had a link to pages C
and A
, page C
had a link to page A
, and page D
had links to all three pages.
<!-- B.html -->
<a href="A.html">A</a>
<a href="C.html">C</a>
<!-- C.html -->
<a href="A.html">A</a>
<!-- D.html -->
<a href="A.html">A</a>
<a href="B.html">B</a>
<a href="C.html">C</a>
In this example, upon the first iteration, page B
would transfer half of its existing rank, or 0.125, to page A
and the other half, to page C
. Page C
would transfer all of its existing rank, 0.25, to the only page it links to, A
. Since D
had three outbound links, it would transfer one third of its rank, or approximately 0.083, to A
.
At the end of this iteration, page A
will have a PageRank of approximately 0.458.
pageRankA = PR('B') / 2 + PR('C') + PR('D') / 3` // (0.25 / 2) + (0.25 / 1) + (0.25 / 3) ~= 0.458
In other words, the PageRank conferred by an outbound link is equal to the document's own PageRank score divided by the number of outbound links L
.
pageRankA = PR('B') / L('B') +
PR('C') / L('C') +
PR('D') / L('D')`
In the general case, the PageRank value for any page X
can be expressed as:
pageRankX = sum(
pagesLinkingTo('X').
map(page =>
getPageRank(page) / countOutgoingLinks(page)
)
)
It is important to note that the original algorithm ignores repeated links as well as self-links.
- Original Paper
The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85.
- Wikipedia
The PageRank theory holds that an imaginary random surfer who is randomly clicking on links will eventually stop clicking. The probability, at any step, that the person will continue is a damping factor
d
. Various studies have tested different damping factors, but it is generally assumed that the damping factor will be set around 0.85.
pageRankX = (1 - d) + d * sum(
pagesLinkingTo('X').
map(page =>
PR(page) / L(page)
)
);
Here's the full algorithm:
const getRanksOverNumberOfBackLinks = (url) => {
return getBackLinkedPages(url).map(backLinkPage => {
return backLinkPage.rank / countOutLinks(backLinkPage.url)
});
};
const normalizedSum = (ranksOverNumberOfBackLinks) => {
return (1 - d) + d * sum(ranksOverNumberOfBackLinks);
};
const pageRank = normalizedSum(
getRanksOverNumberOfBackLinks(
url
)
);
MIT