RailsCasts Offline basic code with setup and configuration
(Lost Git History during recovery from Timemachine)
sairam committed Dec 23, 2012
0 parents commit 857b144
Showing 11 changed files with 4,119 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -0,0 +1,3 @@
railscasts
subscription_code.rb
railscasts.txt
8 changes: 8 additions & 0 deletions LINCENSE
@@ -0,0 +1,8 @@
The MIT License (MIT)
Copyright (c) 2012 Sairam Kunala

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
51 changes: 51 additions & 0 deletions README.md
@@ -0,0 +1,51 @@
# Scraper for [Railscasts](http://railscasts.com/)
* Pro and Revised episodes require subscription at [railscasts.com/pro](http://railscasts.com/pro)

# Disclaimer
* This code was written in mid-2011, when I got frustrated with my internet connection and was eager to go through Railscasts.
* No standards were followed during the making of these files.

# Written for Ruby 1.9.3
```bash
# requires rubygems
gem install nokogiri
```

# Setup

```bash
mkdir -p railscasts/{free,pro,revised}/{raw,asciicasts} railscasts/{free,pro,revised}/asciicasts/images
cd railscasts
wget http://railscasts.com/assets/railscasts_logo.png
wget http://www.feedicons.com/download/feedicons-standard.zip
unzip feedicons-standard.zip && rm feedicons-standard.zip
mv feed-icon-14x14.png feed-icon-small.png
mv feed-icon-28x28.png feed-icon.png
cd ..
```
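
The `mkdir` line above relies on brace expansion, which not every shell supports. As a rough alternative, the same tree can be built from Ruby. This is only a sketch: `create_layout` is a hypothetical helper, not part of this repository.

```ruby
require 'fileutils'

# Hypothetical helper (not in the repository): creates the directory tree
# that the mkdir command in the Setup section produces, under the given root.
def create_layout(root = "railscasts")
  %w[free pro revised].each do |type|
    # raw/ holds the scraped asciicast fragments.
    FileUtils.mkdir_p(File.join(root, type, "raw"))
    # mkdir_p also creates the intermediate asciicasts/ directory.
    FileUtils.mkdir_p(File.join(root, type, "asciicasts", "images"))
  end
end
```

Calling `create_layout` with no argument from the repository root should give the same layout as the shell command.
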
# Configuring
Fill in your subscription code, cookie token, and the target_url where you plan to host the files.
```bash
cp subscription_code.rb.example subscription_code.rb
# Open http://railscasts.com
# copy cookie and subscription code
vim subscription_code.rb
```
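
For orientation, here is a guess at what `subscription_code.rb` could look like. The scripts read `SubscriptionCode::CODE`, `SubscriptionCode::COOKIE` and `SubscriptionCode::TARGET_URL`. Everything else in this sketch is an assumption, including the use of a module and all the placeholder values, so treat `subscription_code.rb.example` as the authority.

```ruby
# Hypothetical subscription_code.rb; every value below is a placeholder.
module SubscriptionCode
  # Subscription code that appears in the pro/revised media URLs.
  CODE       = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  # Value of the "token" cookie copied from a logged-in browser session.
  COOKIE     = "your-token-cookie-value"
  # Assumed base URL of the host that will serve the generated files.
  TARGET_URL = "http://example.com/railscasts/"
end

# download_episodes.rb builds its Cookie header from the token like this:
cookie_header = "token=" + SubscriptionCode::COOKIE
```

`subscription_code.rb` is listed in `.gitignore`, so the real values stay out of the repository.
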

# Getting the latest episode versions
```bash
ruby download_episodes.rb # populates the wget URLs for the free, pro and revised directories
# Run `ruby download_episodes.rb 0` the first time to download all the versions. Otherwise only the last page in the RailsCasts list is considered for download.
sh wget # script to download all the files
ruby generate_index.html.rb # generates index.html and index.xml for viewing in a web browser
```
```
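
To make the first step more concrete: for each missing video, `download_episodes.rb` zero-pads the episode number to three digits and appends a `wget -c` line to `get.url`. The sketch below reproduces that idea for a free episode. `padded_slug` and `wget_line` are hypothetical helpers, the episode path is made up, and `String#rjust` stands in for the script's own padding arithmetic.

```ruby
# Base URL for free videos, copied from download_episodes.rb.
FREE_MEDIA_URL = "http://media.railscasts.com/assets/episodes/videos/"

# Hypothetical helper: takes an href such as "/episodes/7-example-episode"
# and returns the slug with the episode number padded to three digits.
def padded_slug(href)
  slug = href.split("/")[2]            # "7-example-episode"
  number, rest = slug.split("-", 2)    # "7", "example-episode"
  [number.rjust(3, "0"), rest].compact.join("-")
end

# Hypothetical helper: builds the resumable wget command for a free episode.
def wget_line(href)
  "wget -c " + FREE_MEDIA_URL + padded_slug(href) + ".mp4"
end

puts wget_line("/episodes/7-example-episode")
# prints "wget -c http://media.railscasts.com/assets/episodes/videos/007-example-episode.mp4"
```

Padding keeps the file names sortable as plain strings, which is what `generate_index.html.rb` relies on when it orders the episode list.
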
# TODO
1. Add a script that, given a username and password, fetches the details and populates subscription\_code.rb
1. Fix publication dates in RSS Feeds
1. Refactor the code into classes.
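
One possible starting point for the RSS date item, offered as a sketch rather than a fix: the feed currently hard-codes its `pubDate` values, and Ruby's standard `time` library can format any `Time` in the form RSS 2.0 expects. `rss_pub_date` is a hypothetical helper. Feeding it a file's modification time is an assumption that gives the download time, not the real publication date.

```ruby
require 'time'

# Hypothetical helper: formats a Time the way RSS 2.0 <pubDate> expects,
# for example "Tue, 31 Jan 2012 09:00:00" followed by a zone offset.
def rss_pub_date(time)
  time.rfc2822
end

# Fixed timestamp used purely as a demonstration value.
puts rss_pub_date(Time.utc(2012, 1, 31, 9, 0, 0))
```

In the generator this could be called as `rss_pub_date(File.mtime(path))` for each video file. Getting the true publication date would still require scraping it from the episode page.
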

# LICENSE
See the LICENSE file for distribution terms.

# NOTE - [Content Distribution of RailsCasts](http://railscasts.com/about)
All free RailsCasts episodes are under the Creative Commons license. You are free to distribute unedited versions of those episodes for non-commercial purposes. You are also free to translate them into any language. If you would like to edit the video please contact [Ryan Bates](http://github.com/ryanb). All pro and revised episodes are not licensed for redistribution.
140 changes: 140 additions & 0 deletions download_episodes.rb
@@ -0,0 +1,140 @@
require 'open-uri'
require 'net/https'
require 'rubygems'
require 'nokogiri'

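begin
  require_relative 'subscription_code' # defines SubscriptionCode::CODE, COOKIE and TARGET_URL
rescue LoadError
  # A bare rescue only catches StandardError, which does not include LoadError.
  puts "Please add your subscription code from subscription_code.rb.example"
  exit(1)
end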

subscription_code = SubscriptionCode::CODE
CookieString = "token="+SubscriptionCode::COOKIE

LimitPages = 0 # 0 will ignore this config
# Get all episode names

# traverse through each
# check if video exists with me

# else download it

# Used from https://github.com/defunkt/gist/blob/master/lib/gist.rb
if ENV['https_proxy'] && !ENV['https_proxy'].empty?
PROXY = URI(ENV['https_proxy'])
elsif ENV['http_proxy'] && !ENV['http_proxy'].empty?
PROXY = URI(ENV['http_proxy'])
else
PROXY = nil
end

media_url = {:free => "http://media.railscasts.com/assets/episodes/videos/", :subscription => "http://media.railscasts.com/assets/subscriptions/#{subscription_code}/videos/"}
format = ".mp4"

def subscription(type)
if type == :revised || type == :pro
:subscription
elsif type == :free
:free
else
type
end
end

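# Fetches url through the configured proxy and returns the open-uri IO object,
# or nil if the server answers with an HTTP error.
def pageopen(url, use_cookies=false)
  puts "Opening page #{url}"
  # Only subscriber pages need the token cookie.
  ck = use_cookies ? CookieString : ""

  begin
    page = open(url, :proxy => PROXY, "Cookie" => ck)
  rescue OpenURI::HTTPError => e
    page = nil
    puts "The request for a page at #{url} returned an error. #{e.message}"
  end
  page
end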

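# Downloads full_url to the path to_here. Returns true on success, false otherwise.
def download(full_url, to_here)
  data = pageopen(full_url)
  # Fetch first, so a failed request does not leave an empty file behind.
  return false if data.nil?
  # Binary mode; the block form closes the file even if the write raises.
  File.open(to_here, "wb") { |f| f.write(data.read) }
  true
end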


episode_urls = { :free => [ "http://railscasts.com/?type=free"], :revised => %w{http://railscasts.com/?type=revised}, :pro => %w{http://railscasts.com/?type=pro } }
base_url = "http://railscasts.com"
eplist = open('./railscasts.txt','w+')
asciiview = "?view=asciicast"
asciiformat = ".html"
asciicasts = "asciicasts"
asciiimages = "#{asciicasts}/images"

episode_urls.each do |type,urls|
urls.each do |url|
limiter = ARGV[0].to_i # page counter; a missing argument gives nil.to_i, which is 0
until url.nil? == true
p = pageopen(url)
break if p.nil? # pageopen returns nil on an HTTP error, so stop paging this list

page = Nokogiri::HTML(p.read)

page.css("h2 > a").each do |link|

ep = link["href"].split("/")[2]

epa = (ep.split("-")[0]).size
ep = ("0"*(3-epa))+ep if epa - 3 < 0

video_file = "railscasts/#{type}/#{ep}#{format}"
# Anything under about 1 MB is treated as a missing or partial download.
unless File.exist?(video_file) && File.size(video_file) > 1000000
# Block form closes get.url after each appended line.
File.open("railscasts/#{type}/get.url",'a+') { |f| f.write( "wget -c "+media_url[subscription(type)]+ep+format + "\n") }
else
eplist.write(link["href"]+"\n")
end

# now process asciicasts
ascii_file = "railscasts/#{type}/raw/#{ep}#{asciiformat}"
unless File.exist?(ascii_file) && (File.size(ascii_file) > 10) && (File.read(ascii_file) =~ /Currently no transcriptions/).nil?
puts "Processing Asciicast for #{type}/#{ep}"
ascii = pageopen(base_url+link["href"]+asciiview, subscription(type)==:subscription )
next if ascii.nil? # skip this episode if the asciicast page could not be fetched
Nokogiri::HTML(ascii.read).css(".asciicast").each do |epread|
epread.css('img').each do |x|
next unless x['src'][0..3] == base_url[0..3]
ai = "railscasts/#{type}/#{asciiimages}/#{x['src'].split('/')[-1].split('?')[0]}"
download x['src'],ai
x['src'] = "/"+ai
end

# we dont need this additional clippy
epread.css('.clippy').each {|d| d.inner_html = ""}
epread.css('.languages').each {|d| d.inner_html = ""}

File.open("railscasts/#{type}/raw/#{ep}#{asciiformat}","w+") do |f|
f.write(epread.inner_html)
end
end
end

end
url = nil
next if (LimitPages - limiter == 0)
limiter +=1
p = page.css(".pagination > a[@rel='next']")
p.each do |link|
url = base_url+link["href"]
end

end
end
# now go to next page and perform the same
end

eplist.close

138 changes: 138 additions & 0 deletions generate_index.html.rb
@@ -0,0 +1,138 @@
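begin
  require_relative 'subscription_code' # defines SubscriptionCode::CODE, COOKIE and TARGET_URL
rescue LoadError
  # A bare rescue only catches StandardError, which does not include LoadError.
  puts "Please add your subscription code from subscription_code.rb.example"
  exit(1)
end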

def filelists(dir)
Dir.entries("./railscasts/#{dir}").sort{|x,y| y.split("-")[0] <=> x.split("-")[0] }
end
EpisodeTypes = %w{pro revised free}
MainUrl = SubscriptionCode::TARGET_URL

def head(style_attr=:default,title="", title_link="")

stylesheets = {}
stylesheets[:default] = %{
<link href="/stylesheets/bootstrap.css" media="screen" rel="stylesheet" type="text/css" />
}.to_s
stylesheets[:ascii] =%{
<link href="/stylesheets/coderay.css" media="screen" rel="stylesheet" type="text/css" />
<link href="/stylesheets/application.css" media="screen" rel="stylesheet" type="text/css" />
}.to_s

%{<!DOCTYPE HTML>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<title>Rails Casts Episodes (cached) #{title}- DO NOT DISTRIBUTE</title>
<link rel="alternate" type="application/rss+xml" title="RSS" href="index.xml" />
#{stylesheets[style_attr]}
<style>
body {
margin: 0px;
padding-left: 40px;
}
</style>
</head>
<body>
<img src="/railscasts/railscasts_logo.png" >
<h1>#{title}<a href="index.xml"><img src='/railscasts/feed-icon.png'></a></h1>#{title_link}<br />
}.to_s

end


foot= %{
</body>
</html>
}.to_s

def rss_item(title,rel_link)
%{
<item>
<title>#{title}</title>
<description>Video of #{title}</description>
<link>#{MainUrl}#{rel_link}</link>
<pubDate>Tue, 31 Jan 2012 09:00:00 -0400</pubDate>
<enclosure url="#{MainUrl}#{rel_link}" length="1010698" type="video/mp4"/>
<guid isPermaLink="false">http://tuts.local.crypsis.net/railscasts/#{rel_link}</guid>
</item>
}
end
def make_rss_main(items)
%{<rss version="2.0">
<channel>
<title>Rails Casts (Crypsis)</title>
<description> Cached videos of Rails Casts </description>
<link>#{MainUrl}</link>
<lastBuildDate>Mon, 30 Jan 2012 11:12:55 -0400</lastBuildDate>
<pubDate>Tue, 31 Jan 2012 09:00:00 -0400</pubDate>
#{items}
</channel>
</rss>
}
end


def print_nav(types = EpisodeTypes)
out = ""
out << "<h2>"
types.each do |type|
out << %{<a href="##{type}">#{type.to_s.capitalize}</a> &nbsp;&nbsp;&nbsp;}
out << %{<a href="#{type}/index.xml"><img src='/railscasts/feed-icon-small.png'></a>&nbsp;&nbsp;&nbsp;}
end
out << "</h2>"
out
end

def print_episodes(types = EpisodeTypes,relative_path="")
out = ""
rss = ""
types.each do |type|
filelist = filelists(type)
out << %{<h1><a name="#{type}">#{type.to_s.capitalize}</a> </h1>}
filelist.each do |link|
if File.extname(link) == ".mp4" # only list video files
out << "<div class='row'>"
episode_name = link.to_s.split(".")[0].gsub("-"," ").capitalize
out << %{<a class="btn span4" href="#{relative_path}#{type}/#{link}">#{episode_name}</a>}
htmlfile = link.to_s.split(".")[0]+".html"
ascii = "asciicasts/"+ htmlfile
out << %{<a class="btn offset1 span2" href="#{relative_path}#{type}/#{ascii}">Read Episode</a>} if File.exists?("./railscasts/#{type}/raw/#{htmlfile}")
out << "</div>" + "<br>"*2
rss << rss_item("#{type} - #{episode_name}","#{type}/#{link}")
end

end
out << "<br>"*3
end
[out,rss]
end
eps = print_episodes
# Block form flushes and closes each file as soon as it is written.
File.open("railscasts/index.html","w+") { |f| f.write(head+print_nav+eps[0]+foot) }
File.open("railscasts/index.xml","w+") { |f| f.write(make_rss_main(eps[1])) }

EpisodeTypes.each do |type|
rel_path = "../"
eps = print_episodes([type],rel_path)
File.open("railscasts/#{type}/index.html","w+") { |f| f.write(head+eps[0]+foot) }
File.open("railscasts/#{type}/index.xml","w+") { |f| f.write(make_rss_main(eps[1])) }
filelists(type).each do |ep|
next unless File.extname(ep) == ".mp4" # only video files get an asciicast page
htmlfile = ep.to_s.split(".")[0]+".html"
ascii = "asciicasts/"+ htmlfile
raw = "raw/"+ htmlfile
watch_ep = %{<a class="btn" href="#{rel_path}#{ep}">Watch Episode</a>}
if File.exist?("railscasts/#{type}/#{raw}")
episode_title = ep.split(".")[0].gsub("-"," ").capitalize
# Wrap the raw asciicast fragment in the page header and footer.
File.open("railscasts/#{type}/#{ascii}","w+") do |asci|
asci.write( head(:ascii, episode_title, watch_ep)+File.read("railscasts/#{type}/#{raw}")+foot )
end
end
end

end

19 changes: 19 additions & 0 deletions index.html
@@ -0,0 +1,19 @@
<!DOCTYPE HTML>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<title>Rails Casts Episodes (cached)</title>
<link href="/stylesheets/bootstrap.css" media="screen" rel="stylesheet" type="text/css" />
<style>
body {
margin: 0px;
padding-left: 40px;
}
</style>
</head>
<body>

<a href= "/railscasts/index.html"><img src="/railscasts/railscasts_logo.png"></a>

</body>
</html>