
Github import.

Joseph Pearson committed Jan 20, 2009 (commit c4c81ebd54cf6c2e35d7071c287a9d1814a8a360, 0 parents)
Showing 95 changed files with 27,808 additions and 0 deletions.
  1. +1 −0 .gitignore
  2. +26 −0 README.markdown
  3. +10 −0 config.ru
  4. +56 −0 knapsack.rb
  5. +79 −0 models/page.rb
  6. +103 −0 models/resource.rb
  7. +12 −0 models/stylesheet.rb
  8. +62 −0 vendor/hpricot-0.6/CHANGELOG
  9. +18 −0 vendor/hpricot-0.6/COPYING
  10. +284 −0 vendor/hpricot-0.6/README
  11. +211 −0 vendor/hpricot-0.6/Rakefile
  12. +1,340 −0 vendor/hpricot-0.6/ext/hpricot_scan/HpricotScanService.java
  13. +6 −0 vendor/hpricot-0.6/ext/hpricot_scan/extconf.rb
  14. +76 −0 vendor/hpricot-0.6/ext/hpricot_scan/hpricot_common.rl
  15. +5,976 −0 vendor/hpricot-0.6/ext/hpricot_scan/hpricot_scan.c
  16. +79 −0 vendor/hpricot-0.6/ext/hpricot_scan/hpricot_scan.h
  17. +363 −0 vendor/hpricot-0.6/ext/hpricot_scan/hpricot_scan.java.rl
  18. +273 −0 vendor/hpricot-0.6/ext/hpricot_scan/hpricot_scan.rl
  19. +176 −0 vendor/hpricot-0.6/extras/mingw-rbconfig.rb
  20. +26 −0 vendor/hpricot-0.6/lib/hpricot.rb
  21. +63 −0 vendor/hpricot-0.6/lib/hpricot/blankslate.rb
  22. +200 −0 vendor/hpricot-0.6/lib/hpricot/builder.rb
  23. +510 −0 vendor/hpricot-0.6/lib/hpricot/elements.rb
  24. +672 −0 vendor/hpricot-0.6/lib/hpricot/htmlinfo.rb
  25. +107 −0 vendor/hpricot-0.6/lib/hpricot/inspect.rb
  26. +37 −0 vendor/hpricot-0.6/lib/hpricot/modules.rb
  27. +297 −0 vendor/hpricot-0.6/lib/hpricot/parse.rb
  28. +228 −0 vendor/hpricot-0.6/lib/hpricot/tag.rb
  29. +164 −0 vendor/hpricot-0.6/lib/hpricot/tags.rb
  30. +821 −0 vendor/hpricot-0.6/lib/hpricot/traverse.rb
  31. +94 −0 vendor/hpricot-0.6/lib/hpricot/xchar.rb
  32. +17 −0 vendor/hpricot-0.6/test/files/basic.xhtml
  33. +2,266 −0 vendor/hpricot-0.6/test/files/boingboing.html
  34. +3,653 −0 vendor/hpricot-0.6/test/files/cy0.html
  35. +400 −0 vendor/hpricot-0.6/test/files/immob.html
  36. +1,320 −0 vendor/hpricot-0.6/test/files/pace_application.html
  37. +16 −0 vendor/hpricot-0.6/test/files/tenderlove.html
  38. +220 −0 vendor/hpricot-0.6/test/files/uswebgen.html
  39. +1,054 −0 vendor/hpricot-0.6/test/files/utf8.html
  40. +1,723 −0 vendor/hpricot-0.6/test/files/week9.html
  41. +19 −0 vendor/hpricot-0.6/test/files/why.xml
  42. +7 −0 vendor/hpricot-0.6/test/load_files.rb
  43. +65 −0 vendor/hpricot-0.6/test/test_alter.rb
  44. +24 −0 vendor/hpricot-0.6/test/test_builder.rb
  45. +379 −0 vendor/hpricot-0.6/test/test_parser.rb
  46. +16 −0 vendor/hpricot-0.6/test/test_paths.rb
  47. +66 −0 vendor/hpricot-0.6/test/test_preserved.rb
  48. +28 −0 vendor/hpricot-0.6/test/test_xml.rb
  49. +68 −0 vendor/sinatra/ChangeLog
  50. +22 −0 vendor/sinatra/LICENSE
  51. +523 −0 vendor/sinatra/README.rdoc
  52. +162 −0 vendor/sinatra/Rakefile
  53. BIN vendor/sinatra/images/404.png
  54. BIN vendor/sinatra/images/500.png
  55. +1,466 −0 vendor/sinatra/lib/sinatra.rb
  56. +76 −0 vendor/sinatra/lib/sinatra/test/methods.rb
  57. +10 −0 vendor/sinatra/lib/sinatra/test/rspec.rb
  58. +10 −0 vendor/sinatra/lib/sinatra/test/spec.rb
  59. +13 −0 vendor/sinatra/lib/sinatra/test/unit.rb
  60. +77 −0 vendor/sinatra/sinatra.gemspec
  61. +299 −0 vendor/sinatra/test/app_test.rb
  62. +318 −0 vendor/sinatra/test/application_test.rb
  63. +101 −0 vendor/sinatra/test/builder_test.rb
  64. +62 −0 vendor/sinatra/test/custom_error_test.rb
  65. +136 −0 vendor/sinatra/test/erb_test.rb
  66. +15 −0 vendor/sinatra/test/event_context_test.rb
  67. +65 −0 vendor/sinatra/test/events_test.rb
  68. +30 −0 vendor/sinatra/test/filter_test.rb
  69. +233 −0 vendor/sinatra/test/haml_test.rb
  70. +7 −0 vendor/sinatra/test/helper.rb
  71. +72 −0 vendor/sinatra/test/mapped_error_test.rb
  72. +66 −0 vendor/sinatra/test/pipeline_test.rb
  73. +1 −0 vendor/sinatra/test/public/foo.xml
  74. +57 −0 vendor/sinatra/test/sass_test.rb
  75. +39 −0 vendor/sinatra/test/sessions_test.rb
  76. +118 −0 vendor/sinatra/test/streaming_test.rb
  77. +19 −0 vendor/sinatra/test/sym_params_test.rb
  78. +30 −0 vendor/sinatra/test/template_test.rb
  79. +47 −0 vendor/sinatra/test/use_in_file_templates_test.rb
  80. +1 −0 vendor/sinatra/test/views/foo.builder
  81. +1 −0 vendor/sinatra/test/views/foo.erb
  82. +1 −0 vendor/sinatra/test/views/foo.haml
  83. +2 −0 vendor/sinatra/test/views/foo.sass
  84. +2 −0 vendor/sinatra/test/views/foo_layout.erb
  85. +2 −0 vendor/sinatra/test/views/foo_layout.haml
  86. +1 −0 vendor/sinatra/test/views/layout_test/foo.builder
  87. +1 −0 vendor/sinatra/test/views/layout_test/foo.erb
  88. +1 −0 vendor/sinatra/test/views/layout_test/foo.haml
  89. +2 −0 vendor/sinatra/test/views/layout_test/foo.sass
  90. +3 −0 vendor/sinatra/test/views/layout_test/layout.builder
  91. +1 −0 vendor/sinatra/test/views/layout_test/layout.erb
  92. +1 −0 vendor/sinatra/test/views/layout_test/layout.haml
  93. +2 −0 vendor/sinatra/test/views/layout_test/layout.sass
  94. +1 −0 vendor/sinatra/test/views/no_layout/no_layout.builder
  95. +1 −0 vendor/sinatra/test/views/no_layout/no_layout.haml
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+Capfile
--- /dev/null
+++ b/README.markdown
@@ -0,0 +1,26 @@
+# Knapsack: pack pages into data URIs.
+
+Initial development: Joseph Pearson (Inventive Labs)
+Home: http://github.com/inventive/knapsack/tree/master
+
+
+## What is Knapsack?
+
+Knapsack is a simple web service that takes a URL, pulls down the resource and
+all the resources it references, and 'compiles' them into a data URI. What's
+interesting about a data URI is that you can bookmark it and access it offline.
+This can be really useful if you have a device that has an intermittent net
+connection (an iPod Touch, for instance).
+
+It works a lot like Hixie's [Data URI
+Kitchen](http://software.hixie.ch/utilities/cgi/data/data), but it preserves
+images, stylesheets and external JavaScript libraries.
+
+You can install it on any web server that supports Rack.
+
+
+## License
+
+Copyright (C) 2009 Inventive Labs.
+
+Released under the WTFPL: http://sam.zoy.org/wtfpl.
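The README describes packing a fetched resource into a data URI. That final step, which `Resource#encode` and `Resource#format_for_output` perform in `models/resource.rb`, can be sketched in plain Ruby as below. This is an illustrative sketch, not Knapsack's code: the helper name `to_data_uri` is hypothetical, and it uses `Array#pack('m0')` (strict Base64, no line breaks) in place of the original `Base64.encode64(...).gsub(/\n/, '')`, which yields the same string.

```ruby
# Hypothetical helper (not part of Knapsack): wrap raw bytes in an
# RFC 2397 data URI of the form "data:<mime>[;charset=<cs>];base64,<data>",
# the same layout Resource#format_for_output produces.
def to_data_uri(bytes, mime_type, charset = nil)
  # Like the original, a missing or empty body yields an empty string.
  return '' if bytes.nil? || bytes.empty?

  encoded = [bytes].pack('m0') # strict Base64: no embedded newlines
  params  = charset ? ";charset=#{charset}" : ''
  "data:#{mime_type}#{params};base64,#{encoded}"
end

page = '<p>Hello, offline world</p>'
puts to_data_uri(page, 'text/html', 'utf-8')
```

Base64 output is about 4/3 the size of its input, so a packed page with many images grows accordingly.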
--- /dev/null
+++ b/config.ru
@@ -0,0 +1,10 @@
+require 'rubygems'
+require 'vendor/sinatra/lib/sinatra.rb'
+
+Sinatra::Application.default_options.merge!(
+ :run => false,
+ :env => :production
+)
+
+require 'knapsack.rb'
+run Sinatra.application
--- /dev/null
+++ b/knapsack.rb
@@ -0,0 +1,56 @@
+require 'rubygems'
+require 'open-uri'
+require 'base64'
+require 'vendor/hpricot-0.6/lib/hpricot.rb'
+require 'vendor/sinatra/lib/sinatra.rb' unless defined?(Sinatra)
+
+require 'models/resource'
+require 'models/page'
+require 'models/stylesheet'
+
+get '/' do
+ if params[:url]
+ url = params[:url]
+ url = "http://#{url}" unless url.match("://")
+ @data = Resource.fetch_and_convert(url)
+ erb :result
+ else
+ erb :index
+ end
+end
+
+error do
+ request.env['sinatra.error'].to_s
+end
+
+use_in_file_templates!
+
+__END__
+
+@@ index
+<html>
+ <head>
+ <title>Knapsack</title>
+ </head>
+ <body>
+ <h1>Knapsack</h1>
+ <p>
+ Enter the URL you want to store offline:
+ <form action="/" method="GET">
+ <input type="text" name="url" />
+ <input type="submit" value="Pack it" />
+ </form>
+ </p>
+ </body>
+</html>
+
+
+@@ result
+<html>
+ <head>
+ <title>Loading...</title>
+ </head>
+ <body>
+ <script>location.href = "<%= @data %>";</script>
+ </body>
+</html>
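The `get '/'` route above expects the target address in a `url` query parameter and prepends `http://` when the value contains no `://`. A client building links to a Knapsack instance might do something like the sketch below. Both helpers are hypothetical, and the base address `http://localhost:9292` (rackup's default port) is an assumption about how the app is deployed.

```ruby
require 'uri'

# Hypothetical: mirror the route's normalisation, which prepends "http://"
# to any URL that does not already contain "://".
def normalise_target(url)
  url.include?('://') ? url : "http://#{url}"
end

# Hypothetical: build the GET request that asks the route to pack +target+.
# URI.encode_www_form escapes "/", "?", "&" and "=" so the whole target
# survives as a single query parameter.
def knapsack_request_url(base, target)
  uri = URI.join(base, '/')
  uri.query = URI.encode_www_form(url: target)
  uri.to_s
end

puts knapsack_request_url('http://localhost:9292', 'example.com/news?page=2')
```

Rack decodes the query string before the route reads `params[:url]`, so the handler sees `example.com/news?page=2` and normalises it to `http://example.com/news?page=2`.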
--- /dev/null
+++ b/models/page.rb
@@ -0,0 +1,79 @@
+class Page < Resource
+
+ def rewrite
+ doc = Hpricot(@data)
+
+ # Replace <style>@import "xxx";</style> lines with inline css.
+ if true
+ doc.search('style') do |style|
+ spinner do
+ style.inner_html = style.inner_html.gsub(/@import ["'](.*?)["']/) do
+ ss = Resource.fetch($1, @uri)
+ ss.rewrite
+ end
+ end
+ end
+ end
+
+ # Replace stylesheet link hrefs with data uris
+ if true
+ doc.search('link[@rel$="tylesheet"]') do |sslink|
+ spinner do
+ d = Resource.fetch_and_convert(sslink.attributes['href'], @uri)
+ sslink.set_attribute('href', d)
+ end
+ end
+ end
+
+ # Handle inline style definitions.
+ if true
+ doc.search('*[@style]') do |styled_elem|
+ if styled_elem.attributes['style'].match(/url\(.*?\)/)
+ spinner do
+ d = styled_elem.attributes['style'].gsub(/url\((.*?)\)/) do
+ "url(" + Resource.fetch_and_convert($1, @uri) + ")"
+ end
+ styled_elem.set_attribute('style', d)
+ end
+ end
+ end
+ end
+
+ # Replace <img> src attributes with data uris.
+ if true
+ doc.search("img[@src]") do |img|
+ spinner do
+ d = Resource.fetch_and_convert(img.attributes['src'], @uri)
+ img.set_attribute('src', d)
+ end
+ end
+ end
+
+ # Likewise for <script> elements.
+ if true
+ doc.search("script[@src]") do |script|
+ spinner do
+ d = Resource.fetch_and_convert(script.attributes['src'], @uri)
+ script.set_attribute('src', d)
+ end
+ end
+ end
+
+ knotter
+
+ doc.to_html
+ end
+
+ private
+ def spinner
+ @threads ||= []
+ @threads << Thread.new { yield }
+ end
+
+ def knotter
+ if @threads && @threads.any?
+ @threads.each {|thd| thd.join}
+ end
+ end
+
+end
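`Page#spinner` starts one thread per fetch and `Page#knotter` joins them all before `doc.to_html` serialises the document. The sketch below isolates that fan-out/join pattern in plain Ruby; the `FanOut` class is hypothetical and the `sleep` merely stands in for a network fetch. In the code above, these threads also all reach the shared `@@fetched_resources` hash in `Resource.fetch` without any locking.

```ruby
# Hypothetical class isolating the spinner/knotter pattern from Page.
class FanOut
  def initialize
    @threads = []
  end

  # Like Page#spinner: run the block on a new thread and remember it.
  def spin(&job)
    @threads << Thread.new(&job)
  end

  # Like Page#knotter, but also collects results: Thread#value joins each
  # thread (re-raising any exception it raised) and returns the block's
  # value, in the order the threads were started.
  def knot
    @threads.map(&:value)
  end
end

fan = FanOut.new
%w[logo.png site.css app.js].each do |name|
  fan.spin do
    sleep(rand * 0.01) # stand-in for a network fetch
    name.upcase
  end
end
p fan.knot # prints ["LOGO.PNG", "SITE.CSS", "APP.JS"]
```

Because each job in `Page#rewrite` mutates a different Hpricot element, the original can discard the results; joining is enough to guarantee every attribute is rewritten before serialisation.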
--- /dev/null
+++ b/models/resource.rb
@@ -0,0 +1,103 @@
+class Resource
+
+ def self.fetch_and_convert(url, active_uri = nil)
+ resource = fetch(url, active_uri)
+ resource ? resource.convert : ''
+ end
+
+ def self.fetch(url, active_uri = nil)
+ if url.match(/^data:/)
+ url.instance_eval("def convert; self; end")
+ return url
+ end
+
+ uri = URI.parse(url)
+ unless uri.methods.include?('read')
+ if active_uri
+ uri = active_uri.merge(uri)
+ else
+ raise "Cannot form open-able URI from URL: #{url}"
+ end
+ end
+ log "Fetching: #{url}"
+
+ @@fetched_resources ||= {}
+ key = uri.to_s
+ response = @@fetched_resources[key]
+ unless response
+ begin
+ response = uri.read(
+ "User-Agent" => "Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) " +
+ "AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1C28 " +
+ "Safari/419.3"
+ )
+ rescue OpenURI::HTTPError => e
+ log e.inspect
+ return nil
+ end
+
+ if response.content_encoding.any?
+ if response.content_encoding.first.downcase == "gzip"
+ rdr = Zlib::GzipReader.new(StringIO.new(response))
+ response.replace(rdr.read)
+ else
+ raise "Unknown encoding: #{response.content_encoding.join(',')}"
+ end
+
+ if response.content_encoding.size > 1
+ raise "Multiple encodings: #{response.content_encoding.join(',')}"
+ end
+ end
+ @@fetched_resources[key] = response
+ end
+
+ recognise(response)
+ end
+
+ def self.recognise(response)
+ raise "Unrecognised response" unless mime_type = response.content_type
+ if mime_type == "text/html"
+ Page.new(response)
+ elsif mime_type == "text/css"
+ Stylesheet.new(response)
+ else
+ new(response)
+ end
+ end
+
+ def initialize(response)
+ @data = response
+ @mime_type = response.content_type
+ @uri = response.base_uri
+ @charset = response.charset
+ end
+
+ def convert
+ @data = rewrite
+ @data = encode
+ @data = format_for_output
+ end
+
+ # Override this in subclasses to modify self (including other resources, etc).
+ def rewrite
+ (@data && !@data.empty?) ? @data : ''
+ end
+
+ def encode
+ (@data && !@data.empty?) ? Base64.encode64(@data).gsub(/\n/,'') : ''
+ end
+
+ def format_for_output
+ cs = @charset ? "charset=#{@charset};" : ''
+ (@data && !@data.empty?) ? "data:#{@mime_type};#{cs}base64,#{@data}" : ''
+ end
+
+ def log(msg)
+ puts msg
+ end
+
+ def self.log(msg)
+ puts msg
+ end
+
+end
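Besides downloading, `Resource.fetch` resolves relative references against the URI of the page being packed (`active_uri.merge(uri)`) and inflates gzip-encoded bodies. The sketch below reproduces both steps with the standard library only (`uri`, `zlib`, `stringio`), independently of open-uri; the helper names `resolve` and `decode_body` are hypothetical. One portability point: the original's `uri.methods.include?('read')` relies on Ruby 1.8 returning method names as strings; from Ruby 1.9 `Object#methods` returns symbols, so `uri.respond_to?(:read)` is the equivalent check there.

```ruby
require 'uri'
require 'zlib'
require 'stringio'

# Hypothetical: resolve +ref+ (absolute or relative) against the URI of the
# document that referenced it, as Resource.fetch does with URI#merge.
def resolve(ref, base)
  URI.parse(base).merge(ref).to_s
end

# Hypothetical: undo a Content-Encoding, given the list of encodings the
# server declared (OpenURI::Meta#content_encoding returns such a list).
# Only a single gzip layer is supported, mirroring the original.
def decode_body(body, encodings)
  return body if encodings.empty?
  raise "Multiple encodings: #{encodings.join(',')}" if encodings.size > 1
  unless encodings.first.downcase == 'gzip'
    raise "Unknown encoding: #{encodings.first}"
  end

  Zlib::GzipReader.new(StringIO.new(body)).read
end

puts resolve('img/logo.png', 'http://example.com/blog/post.html')
# prints http://example.com/blog/img/logo.png
puts decode_body(Zlib.gzip('hello'), ['gzip'])
# prints hello
```

Unlike the original, which raises on stacked encodings only after decoding the first, this sketch rejects them before doing any work.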
--- /dev/null
+++ b/models/stylesheet.rb
@@ -0,0 +1,12 @@
+class Stylesheet < Resource
+
+ def rewrite
+ # Replace urls with data uris
+ if true
+ @data.gsub!(/url\((.*?)\)/) do
+ "url(" + Resource.fetch_and_convert($1, @uri) + ")"
+ end
+ end
+ end
+
+end
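`Stylesheet#rewrite` (and the inline-style branch of `Page#rewrite`) pass whatever sits between `url(` and `)` straight to `Resource.fetch_and_convert`, so a quoted reference such as `url("bg.png")` arrives with its quotes attached. The sketch below is a hypothetical alternative, `rewrite_css_urls`, that strips optional single or double quotes before handing the bare path to a block. It uses `String#gsub` rather than `gsub!`: `gsub!` returns `nil` when nothing matches, so when its result is what a method returns, as in `Stylesheet#rewrite` above, a stylesheet with no `url()` yields `nil`, which `Resource#convert` then encodes as an empty string.

```ruby
# Hypothetical pattern: url( <optional quote> path <same quote> ), allowing
# whitespace inside the parentheses; \1 makes the closing quote match the
# opening one.
CSS_URL = /url\(\s*(['"]?)(.*?)\1\s*\)/

# Hypothetical helper: replace each url() reference with whatever the block
# returns for its bare path (e.g. a data URI). String#gsub always returns a
# new string, even when nothing matched.
def rewrite_css_urls(css)
  css.gsub(CSS_URL) do
    path = Regexp.last_match(2) # the reference without its quotes
    "url(#{yield(path)})"
  end
end

css = %q{h1 { background: url("bg.png") } li { list-style: url('dot.gif') }}
puts rewrite_css_urls(css) { |path| "/packed/#{path}" }
# prints h1 { background: url(/packed/bg.png) } li { list-style: url(/packed/dot.gif) }
```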
--- /dev/null
+++ b/vendor/hpricot-0.6/CHANGELOG
@@ -0,0 +1,62 @@
+= 0.6
+=== 15th June, 2007
+* Hpricot for JRuby -- nice work Ola Bini!
+* Inline Markaby for Hpricot documents.
+* XML tags and attributes are no longer downcased like HTML is.
+* new syntax for grabbing everything between two elements using a Range in the search method: (doc/("font".."font/br")) or in nodes_at like so: (doc/"font").nodes_at("*".."br"). Only works with either a pair of siblings or a set of a parent and a sibling.
+* Ignore self-closing endings on tags (such as form) which are containers. Treat them like open parent tags. Reported by Jonathan Nichols on the hpricot list.
+* Escaping of attributes, yanked from Jim Weirich and Sam Ruby's work in Builder.
+* Element#raw_attributes gives unescaped data. Element#attributes gives escaped.
+* Added: Elements#attr, Elements#remove_attr, Elements#remove_class.
+* Added: Traverse#preceding, Traverse#following, Traverse#previous, Traverse#next.
+
+= 0.5
+=== 31st January, 2007
+
+* support for a[text()="Click Me!"] and h3[text()*="space"] and the like.
+* Hpricot.buffer_size accessor for increasing Hpricot's buffer if you're encountering huge ASP.NET viewstate attribs.
+* some support for colons in tag names (not full namespace support yet.)
+* Element.to_original_html will attempt to preserve the original HTML while merging your changes.
+* Element.to_plain_text converts an element's contents to a simple text format.
+* Element.inner_text removes all tags and returns text nodes concatenated into a single string.
+* no @raw_string variable kept for comments, text, and cdata -- as it's redundant.
+* xpath-style indices (//p/a[1]) but keep in mind that they aren't zero-based.
+* node_position is the index among all sibling nodes, while position is the position among children of identical type.
+* comment() and text() search criteria, like: //p/text(), which selects all text inside paragraph tags.
+* every element has css_path and xpath methods which return respective absolute paths.
+* more flexibility all around: in parsing attributes, tags, comments and cdata.
+
+= 0.4
+=== 11th August, 2006
+
+* The :fixup_tags option will try to sort out the hierarchy so elements end up with the right parents.
+* Elements such as *script* and *style* (identified as having CDATA contents) receive a single text node as their children now. Previously, Hpricot was parsing out tags found in scripts.
+* Better scanning of partially quoted attributes (found by Brent Beardsly on http://uswebgen.com/)
+* Better scanning of unquoted attributes -- thanks to Aaron Patterson for the test cases!
+* Some tags were being output in the empty tag style, although browsers hated that. FIXED!
+* Added Elements#at for finding single elements.
+* Added Elem::Trav#[] and Elem::Trav#[]= for reading and writing attributes.
+
+= 0.3
+=== 7th July, 2006
+
+* Fixed negative string size error on empty tokens. (news.bbc.co.uk)
+* Allow the parser to accept just text nodes. (such as: <tt>Hpricot.parse('TEXT')</tt>)
+* from JQuery to Hpricot::Elements: remove, empty, append, prepend, before, after, wrap, set,
+ html(...), to_html, to_s.
+* on containers: to_html, replace_child, insert_before, insert_after, innerHTML=.
+* Hpricot(...) is an alias for parse.
+* open up all properties to setters, let people do as they may.
+* use to_html for the full html of a node or set of elements.
+* doctypes were messed up.
+
+= 0.2
+=== 4th July, 2006
+
+* Rewrote the HTree parser to be simpler, more adequate for the common man. Will add encoding back in later.
+
+= 0.1
+=== 3rd July, 2006
+
+* For whatever reason, wrote this HTML parser in C.
+ I guess Ragel is addictive and I want to improve HTree.
--- /dev/null
+++ b/vendor/hpricot-0.6/COPYING
@@ -0,0 +1,18 @@
+Copyright (c) 2006 why the lucky stiff
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to
+deal in the Software without restriction, including without limitation the
+rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+sell copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.