
Cache referenced image files

Find any img tags, pull down a copy of their src and keep it locally,
and then rewrite the url to reference the local copy.

This has the benefit of truly snapshotting how my blog looks at a point
in time, at the expense of a much larger _deploy repo.

Idea for the future, where internet connections may be too shitty to
continue doing this: add a flag in the YAML front matter and do this
conditionally based on that, so that already cached stuff continues to
use the cache and new stuff can use live URLs until I find better
internet again.
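
The front-matter idea above could look roughly like the sketch below. This is not part of the commit: the `cache_images` key and both helper names are hypothetical, chosen only for illustration, and the front matter is parsed by hand here so the example runs outside Jekyll.

```ruby
require 'yaml'

# Hypothetical front-matter key; the commit message does not name one.
CACHE_FLAG = 'cache_images'

# Parse the YAML front matter at the top of a post's source into a Hash.
# Returns an empty Hash when the post has no front matter.
def front_matter(source)
  match = source.match(/\A---\s*\n(.*?)\n---\s*\n/m)
  return {} unless match
  YAML.load(match[1]) || {}
end

# Cache only the posts that opt in; everything else keeps its live URLs.
def cache_images?(source)
  front_matter(source)[CACHE_FLAG] == true
end

cached = "---\ntitle: Old post\ncache_images: true\n---\n<img src=\"http://example.com/a.png\">\n"
live   = "---\ntitle: New post\n---\n<img src=\"http://example.com/b.png\">\n"

puts cache_images?(cached)
puts cache_images?(live)
```

Inside the plugin itself, Jekyll already exposes the parsed front matter as `post.data`, so the check in `post_render` could presumably read the flag from there instead of re-parsing the source.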
lparry committed Jan 10, 2014
1 parent 624d787 commit 6b01d40e1373d581e31cfa3ac055c4c8664da8db
Showing with 83 additions and 1 deletion.
  1. +1 −1 .gitignore
  2. +1 −0 Gemfile
  3. +5 −0 Gemfile.lock
  4. +76 −0 plugins/image_cacher.rb
.gitignore
@@ -6,4 +6,4 @@ public
 vendor/bundle
 bin
 .bundle
-
+source/images/cache
Gemfile
@@ -18,6 +18,7 @@ group :development do
   gem 'pry'
   gem 'pry-stack_explorer'
   gem 'pry-plus'
+  gem 'httparty'
   gem 'nokogiri'
 end
Gemfile.lock
@@ -29,6 +29,9 @@ GEM
     ffi (1.0.11)
     fssm (0.2.10)
     haml (3.1.8)
+    httparty (0.12.0)
+      json (~> 1.8)
+      multi_xml (>= 0.5.2)
     interception (0.3)
     jekyll (0.11.2)
       albino (~> 1.3)
@@ -45,6 +48,7 @@ GEM
     maruku (0.7.0)
     method_source (0.8.2)
     mini_portile (0.5.2)
+    multi_xml (0.5.5)
     nokogiri (1.6.1)
       mini_portile (~> 0.5.0)
     posix-spawn (0.3.6)
@@ -108,6 +112,7 @@ DEPENDENCIES
   compass (~> 0.12.2)
   directory_watcher (~> 1.4.1)
   haml (~> 3.1.6)
+  httparty
   jekyll (~> 0.11.2)
   liquid (~> 2.3.0)
   nokogiri
plugins/image_cacher.rb
@@ -0,0 +1,76 @@
+#custom filters for Octopress
+require './plugins/post_filters'
+require 'nokogiri'
+require 'httparty'
+
+module Jekyll
+  class ImageCacher < PostFilter
+    def post_render(post)
+      if post.ext.match('html|textile|markdown|md|haml|slim|xml') && !ENV["ONLINE_IMAGES"].nil?
+        post.content = ImageFetcher.cache_images(post.content)
+      end
+    end
+
+    class ImageFetcher
+
+      def self.cache_images(content)
+        new.cache_images(content)
+      end
+
+      def line_contains_img_tag?(line)
+        line.include?(%(<img ))
+      end
+
+      def cache_images(content)
+        lines = content.lines
+        lines.map! do |line|
+          if line_contains_img_tag?(line)
+            fetch_image_from(line)
+          else
+            line
+          end
+        end
+        lines.join
+      end
+
+      def fetch_image_from(line)
+        doc = Nokogiri.parse(line)
+        img_url = (doc / "img")[0].attributes["src"].value
+        uri = URI.parse(img_url)
+        if uri.host.nil? || uri.host == "www.lucasthenomad.com"
+          line
+        else
+          local_path = "/images/cache#{uri.path}"
+          cache(img_url, local_path)
+          line.sub(img_url, local_path)
+        end
+      end
+
+      def cache(url, local_path)
+        destination = "source#{local_path}"
+        if File.exist?(destination)
+          puts %(already downloaded "#{destination}")
+        else
+          download_file(url, destination)
+        end
+      end
+
+      def download_file(uri_str, destination)
+        resp = fetch(uri_str)
+        FileUtils.mkdir_p(File.expand_path("..", destination))
+        File.open(destination, "wb") do |file|
+          file.write(resp.body)
+        end
+        puts %(downloaded "#{destination}")
+      end
+
+      def fetch(uri_str, limit = 10)
+        response = HTTParty.get(uri_str)
+        raise "response: #{response.code}" unless response.code == 200
+        response
+      end
+
+    end
+
+  end
+end
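
To make the URL-to-cache-path mapping in `fetch_image_from` easier to follow, here is a minimal standalone sketch of just that step, with no network access or Nokogiri involved. The helper name `local_path_for` and the `OWN_HOST` constant are hypothetical and do not appear in the plugin; the logic mirrors the host check and the `"/images/cache#{uri.path}"` interpolation from the diff.

```ruby
require 'uri'

# The blog's own host, as hard-coded in the plugin.
OWN_HOST = 'www.lucasthenomad.com'

# Hypothetical helper: returns the site-relative cache path for a remote
# image URL, or nil when the image is already local (relative URL or own host).
def local_path_for(img_url)
  uri = URI.parse(img_url)
  return nil if uri.host.nil? || uri.host == OWN_HOST
  "/images/cache#{uri.path}"
end

puts local_path_for('http://example.com/photos/cat.jpg?size=large')
puts local_path_for('/images/header.png').inspect
puts local_path_for('http://www.lucasthenomad.com/images/header.png').inspect
```

Because only `uri.path` is used, the host and query string are dropped from the cache path, so two remote images that share a path on different hosts would map to the same local file.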
