Switch branches/tags
Nothing to show
Find file
Fetching contributors…
Cannot retrieve contributors at this time
31 lines (27 sloc) 1.3 KB

Notes for Link Analysis


graph.txt has X lines of text where each line is in the form " " Number of unique...

  • IDs: 242047
  • senders: 19802
  • receivers: 77660
  • senders+receivers: 86293

For nodes on the graph it is likely that we use s+r. Some emails are malformed, though the task description indicates we should completely ignore them and consider them normal. We are to ignore any links of the for A->A. If the node A only emails itself it becomes a 'freestanding' node and is excluded from the N count.

Hubs and Authorities

The slides indicate to calculate the hubs first, then authorities. The hubs and authorities algorithm gets the sanity check numbers on the 9th iteration. Though the numbers are slightly different on the 10th iteration. This may be due to float point arithmetic or other flaw in the code. However the code is fairly straight forward and I cannot see what's wrong. I suggest using both 9 and 10 iterations for

Sanity checks

  • 0.00174604 -- pagerank L1-normalised
  • 0.00017473 -- pagerank unnormalised
  • 0.00401388 -- hub score
  • 0.06632742 -- authority score

  • 0.00100596 -- hub score
  • 0.00021004 -- authority score
  • 0.00060603 -- pagerank unnormalised
  • 0.00373347 -- pagerank L1-normalised