TFIDF vs PageRank demo for a trivial 3-page web

commit 66915a047ac94806f35bb350cbea203007f23987 1 parent 8325e0d
Ilya Grigorik authored
36 web-ferret.rb
@@ -0,0 +1,36 @@
+# Ilya Grigorik
+#
+# TFIDF vs PageRank demo for a trivial 3-page web:
+# page 1 -> page 2, page 3 (PageRank: 0.05)
+# page 2 -> page 3 (PageRank: 0.07)
+# page 3 -> page 3, page 3 (PageRank: 0.87)
+#
+
+require 'ferret'
+include Ferret
+
+index = Index::Index.new
+
+index << {:title => "Page 1", :content => File.read("web/page-1.html"), :pagerank => 0.05 }
+index << {:title => "Page 2", :content => File.read("web/page-2.html"), :pagerank => 0.07 }
+index << {:title => "Page 3", :content => File.read("web/page-3.html"), :pagerank => 0.87 }
+
+index.search_each('content:"world"') do |id, score|
+  puts "Score: #{score}, #{index[id][:title]} (PageRank: #{index[id][:pagerank]})"
+end
+
+puts "*" * 50
+
+sf_pagerank = Search::SortField.new(:pagerank, :type => :float, :reverse => true)
+
+index.search_each('content:"world"', :sort => sf_pagerank) do |id, score|
+  puts "Score: #{score}, #{index[id][:title]}, #{index[id][:pagerank]}"
+end
+
+# Score: 0.267119228839874, Page 3 (PageRank: 0.87)
+# Score: 0.17807948589325, Page 1 (PageRank: 0.05)
+# Score: 0.17807948589325, Page 2 (PageRank: 0.07)
+# **************************************************
+# Score: 0.267119228839874, Page 3, 0.87
+# Score: 0.17807948589325, Page 2, 0.07
+# Score: 0.17807948589325, Page 1, 0.05
8 web/page-1.html
@@ -0,0 +1,8 @@
+<html>
+ <body>
+ Hello world, I'm page #1!
+
+ <a href="page-2.html">Page 2</a>
+ <a href="page-3.html">Page 3</a>
+ </body>
+</html>
7 web/page-2.html
@@ -0,0 +1,7 @@
+<html>
+ <body>
+ Cruel world, I'm page #2!
+
+ <a href="page-3.html">Page 3</a>
+ </body>
+</html>
5 web/page-3.html
@@ -0,0 +1,5 @@
+<html>
+ <body>
+ Everyone in the world links to me!
+ </body>
+</html>
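
The PageRank values hard-coded in web-ferret.rb (0.05, 0.07, 0.87) can be approximately reproduced with a small power-iteration sketch over the link graph listed in the script's header comment. This is not part of the commit. It assumes a damping factor of 0.85 and treats page 3 as linking to itself twice, as the header comment describes (the page-3.html fixture itself contains no links). The `LINKS` constant and the `pagerank` method are hypothetical names introduced for illustration.

```ruby
# Power-iteration PageRank for the three-page web in this commit.
# Assumptions (not from the original script): damping factor 0.85,
# and page 3 modelled as linking to itself, per the header comment.
LINKS = {
  1 => [2, 3],
  2 => [3],
  3 => [3, 3]
}.freeze

def pagerank(links, damping: 0.85, iterations: 100)
  n = links.size
  # Start from a uniform distribution over all pages.
  ranks = links.keys.to_h { |page| [page, 1.0 / n] }

  iterations.times do
    # Every page begins each round with the "random jump" share.
    incoming = links.keys.to_h { |page| [page, (1.0 - damping) / n] }

    # Each page splits its damped rank evenly across its outgoing links.
    links.each do |page, targets|
      share = damping * ranks[page] / targets.size
      targets.each { |target| incoming[target] += share }
    end

    ranks = incoming
  end

  ranks
end

pagerank(LINKS).each do |page, rank|
  puts format("page %d: %.4f", page, rank)
end
```

Under these assumptions the ranks settle near 0.05, 0.07 and 0.88, which is in line with the figures in the script if those were truncated to two decimals. Page 3 collects almost all of the rank because every link in the graph, including its own, eventually leads to it.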