Comparing changes

  • 2 commits
  • 5 files changed
  • 0 commit comments
  • 1 contributor
Showing with 128 additions and 3 deletions.
  1. +48 −0 .rvmrc
  2. +51 −0 processors/episode_summary.rb
  3. +19 −0 processors/to_file.rb
  4. +8 −1 runner.rb
  5. +2 −2 scrapers/pluzz_francetv_fr.rb
48 .rvmrc
@@ -0,0 +1,48 @@
+#!/usr/bin/env bash
+
+# This is an RVM Project .rvmrc file, used to automatically load the ruby
+# development environment upon cd'ing into the directory
+
+# First we specify our desired <ruby>[@<gemset>], the @gemset name is optional,
+# Only full ruby name is supported here, for short names use:
+# echo "rvm use 1.9.3" > .rvmrc
+environment_id="ruby-1.9.3-p327"
+
+# Uncomment the following lines if you want to verify rvm version per project
+# rvmrc_rvm_version="1.16.20 (stable)" # 1.10.1 seems like a safe start
+# eval "$(echo ${rvm_version}.${rvmrc_rvm_version} | awk -F. '{print "[[ "$1*65536+$2*256+$3" -ge "$4*65536+$5*256+$6" ]]"}' )" || {
+# echo "This .rvmrc file requires at least RVM ${rvmrc_rvm_version}, aborting loading."
+# return 1
+# }
+
+# First we attempt to load the desired environment directly from the environment
+# file. This is very fast and efficient compared to running through the entire
+# CLI and selector. If you want feedback on which environment was used then
+# insert the word 'use' after --create as this triggers verbose mode.
+if [[ -d "${rvm_path:-$HOME/.rvm}/environments"
+ && -s "${rvm_path:-$HOME/.rvm}/environments/$environment_id" ]]
+then
+ \. "${rvm_path:-$HOME/.rvm}/environments/$environment_id"
+ [[ -s "${rvm_path:-$HOME/.rvm}/hooks/after_use" ]] &&
+ \. "${rvm_path:-$HOME/.rvm}/hooks/after_use" || true
+else
+ # If the environment file has not yet been created, use the RVM CLI to select.
+ rvm --create "$environment_id" || {
+ echo "Failed to create RVM environment '${environment_id}'."
+ return 1
+ }
+fi
+
+# If you use bundler, this might be useful to you:
+# if [[ -s Gemfile ]] && {
+# ! builtin command -v bundle >/dev/null ||
+# builtin command -v bundle | GREP_OPTIONS= \grep $rvm_path/bin/bundle >/dev/null
+# }
+# then
+# printf "%b" "The rubygem 'bundler' is not installed. Installing it now.\n"
+# gem install bundler
+# fi
+# if [[ -s Gemfile ]] && builtin command -v bundle >/dev/null
+# then
+# bundle install | GREP_OPTIONS= \grep -vE '^Using|Your bundle is complete'
+# fi
51 processors/episode_summary.rb
@@ -0,0 +1,51 @@
+# Creates a formatted summary of a collection of episodes.
+#
+class EpisodeSummary
+
+ attr_accessor :items
+
+ # Converts the passed episode items into a summary
+ # that is formatted according to the passed format.
+ # @param [Array<#title>] items The episodes to summarize
+ # @param [Symbol] format The summary format (:html and :json
+ # supported)
+ # @return [String]
+ def process(items, format=:html)
+ self.items = items
+ if format == :html
+ html_header + "\n" + \
+ items.map{|i| html_episode_summary(i)}.join("\n") + \
+ html_footer
+ elsif format == :json
+ items.map(&:to_json).join("\n")
+ else
+ "Format #{format} not supported"
+ end
+ end
+
+ def html_header
+ <<-EOS
+ <!DOCTYPE html>
+ <html>
+ <head><meta charset="utf-8"></head>
+ <body>
+ <div>
+ <h1>List of episodes</h1>
+ <ul>
+ EOS
+ end
+
+ def html_footer
+ "</ul></div></body></html>"
+ end
+
+ def html_episode_summary(item)
+ <<-EOS
+ <li>
+ <h2>#{item.show_name} - #{item.title}</h2>
+ <a href="#{item.url}">Link (#{item.notes})</a>
+ </li>
+ EOS
+ end
+
+end
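
Note on `html_episode_summary`: the heredoc interpolates `show_name`, `title`, `url` and `notes` directly, so a title containing `&` or `<` would end up as raw markup in the page. A minimal sketch of escaping the fields first, using `ERB::Util.html_escape` from the standard library. The `Episode` struct and the `escaped_episode_summary` name are hypothetical stand-ins for illustration, not part of this diff.

```ruby
require 'erb'

# Hypothetical stand-in for the scraper's episode objects; only the four
# readers used by html_episode_summary are modeled here.
Episode = Struct.new(:show_name, :title, :url, :notes)

# Builds one <li> entry, escaping every field before interpolation.
# (Hypothetical helper name; the diff's method is html_episode_summary.)
def escaped_episode_summary(item)
  show  = ERB::Util.html_escape(item.show_name.to_s)
  title = ERB::Util.html_escape(item.title.to_s)
  url   = ERB::Util.html_escape(item.url.to_s)
  notes = ERB::Util.html_escape(item.notes.to_s)
  <<-EOS
<li>
  <h2>#{show} - #{title}</h2>
  <a href="#{url}">Link (#{notes})</a>
</li>
  EOS
end

episode = Episode.new("Tom & Jerry", "Episode <1>",
                      "http://example.com/?a=1&b=2", "5 min")
puts escaped_episode_summary(episode)
```

Whether escaping belongs in the processor or in the scraper is a design choice this sketch does not settle.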
19 processors/to_file.rb
@@ -0,0 +1,19 @@
+require 'tempfile'
+
+class ToFile
+
+ # Saves the passed content to a file.
+ # @param [#to_s] content The content to save to file.
+ # @param [String, NilClass] destination The path to save the content,
+ # if none is passed, a tmpfile is used.
+ # @return [String] The path of the file the content was saved to.
+ def process(content, destination=nil)
+ if destination
+ file = File.open(destination, 'w'){|f| f << content}
+ else
+ file = Tempfile.new('scrapbook').tap{|f| f << content; f.close}
+ end
+ file.path
+ end
+
+end
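
Note on the temp-file branch of `ToFile#process`: `Tempfile.new` does not yield to a block, and a `Tempfile` is removed once its object is garbage-collected, so a path returned from it may not stay valid for long. A possible alternative, sketched under the assumption of Ruby 2.1 or later (newer than the `ruby-1.9.3-p327` pinned in `.rvmrc`), is `Tempfile.create`, which returns a plain `File` that is not deleted automatically. The `save_content` name is hypothetical.

```ruby
require 'tempfile'

# Hypothetical helper, not part of the diff: saves content and returns the
# path it was written to. Assumes Ruby >= 2.1 for Tempfile.create.
def save_content(content, destination = nil)
  if destination
    File.open(destination, 'w') { |f| f << content.to_s }
    destination
  else
    # Tempfile.create (without a block) returns a plain File that is not
    # removed automatically, so the path stays valid until the caller
    # deletes it.
    file = Tempfile.create('scrapbook')
    begin
      file << content.to_s
    ensure
      file.close
    end
    file.path
  end
end

path = save_content("<ul></ul>")
puts File.read(path)
File.delete(path) # the caller is responsible for cleanup
```

The trade-off is that cleanup becomes the caller's job, which may or may not suit a scraper that runs on a schedule.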
9 runner.rb
@@ -1,9 +1,16 @@
require 'bundler'
Bundler.require
+require 'fileutils'
+
+STDOUT.sync = true
# Require all the scrapers
Dir.glob("./scrapers/*.rb"){|file| require file }
+Dir.glob("./processors/*.rb"){|file| require file }
+FileUtils.mkdir_p('output')
# TODO: use a scheduler and send to processors
episodes = FranceTVJeunesse.run
-puts episodes.map(&:to_json)
+summary = EpisodeSummary.new.process(episodes)
+destination = File.join(File.expand_path(File.dirname(__FILE__)), "output", "summary_#{Time.now.strftime("%Y-%m-%d")}.html")
+puts ToFile.new.process(summary, destination)
4 scrapers/pluzz_francetv_fr.rb
@@ -8,8 +8,8 @@ def self.run
url = "http://pluzz.francetv.fr/ajax/launchsearch/rubrique/jeunesse/datedebut/#{Time.now.strftime("%Y-%m-%dT00:00")}/datefin/#{Time.now.strftime("%Y-%m-%dT23:59")}/type/lesplusrecents/nb/100/"
page = agent.get(url)
episodes = fetch_episodes(page)
- puts "success" unless episodes.find{|e| e.failed?}
- episides
+ STDERR.puts "Error scraping #{url}" if episodes.find{|e| e.failed?}
+ episodes
end
def self.fetch_episodes(page)
