# 437a6c1, authored by alecbenzer: added flsd and dcd scrapers (dcd currently has https issues)
# Scraper for the opinions listing published by the dcd court site.
class Dcd
  include Expect # always include this

  # 6589bdc, authored by Alec Benzer: flsd and dcd use OPINIONS_PAGE and match instead of =~
  OPINIONS_PAGE = "https://ecf.dcd.uscourts.gov/cgi-bin/Opinions.pl"

  def accept_host
    "www.dcd.uscourts.gov"
  end

  def accept?(download)
    download.request_uri == OPINIONS_PAGE
  end

  def request
    DownloadRequest.new(OPINIONS_PAGE)
  end

  def parse(download, receiver)
    doc = Hpricot(download.response_body_as('US-ASCII'))
    table = doc.at('table#ts')
    raise Exception.new("opinions table not found") unless table

    # Check the column headers so a layout change is caught early.
    header_anchors = table.search('th a')
    match(header_anchors[0].inner_html, "Date Filed")
    match(header_anchors[1].inner_html, "Case")
    match(header_anchors[2].inner_html, "Opinion")

    # Each table row describes one opinion.
    table.search('/tbody/tr').each do |row|
      document = Document.new
      document.court = "http://id.altlaw.org/courts/us/fed/dist/dcd"

      # Column 1: filing date as MM/DD/YYYY, followed by an HTML comment.
      date_text = row.at('td[1]').inner_html
      # date_text =~ /(\d{2})\/(\d{2})\/(\d{4})<!--(.*)-->/
      md = match(date_text, /(\d{2})\/(\d{2})\/(\d{4})<!--(.*)-->/)
      document.date = Date.new(md[3].to_i, md[1].to_i, md[2].to_i)

      # Column 2: docket number, then the case name after the <br>.
      number_text = row.at('td[2]').children.first.to_s
      # number_text =~ /(\d{4}-\d{4})/
      md = match(number_text, /(\d{4}-\d{4})/)
      document.dockets << md[1]
      document.name = row.at('td[2] br').next.to_s

      # Column 3: links to the opinion PDFs, then "by <judge>" after the <br>.
      row.search('td[3] a').each do |anchor|
        document.add_link('application/pdf', 'https://ecf.dcd.uscourts.gov/cgi-bin/' + anchor['href'])
      end
      judge_text = row.at('td[3] br').next.to_s
      judge_text =~ /by (.*)/
      document.opinion_by = $1

      receiver << document
    end
  end
end
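
The date and docket extraction in `parse` can be tried without Hpricot or the surrounding framework. The sketch below is illustrative only: `expect_match`, `parse_filed_date` and `parse_docket` are hypothetical names that do not appear in the original file, and `expect_match` is an assumed stand-in for the `match` helper from `Expect`, whose real behaviour is not shown here. It assumes that helper returns the `MatchData` on success and raises on a mismatch.

```ruby
require 'date'

# Hypothetical stand-in for Expect#match (assumed behaviour: return the
# MatchData on success, raise when the text does not match the pattern).
def expect_match(text, pattern)
  md = pattern.match(text)
  raise ArgumentError, "expected #{text.inspect} to match #{pattern.inspect}" unless md
  md
end

# Turn a "MM/DD/YYYY<!--...-->" cell into a Date, using the same regex as parse.
def parse_filed_date(date_text)
  md = expect_match(date_text, /(\d{2})\/(\d{2})\/(\d{4})<!--(.*)-->/)
  Date.new(md[3].to_i, md[1].to_i, md[2].to_i)
end

# Pull the first "NNNN-NNNN" docket number out of a cell, using the same regex as parse.
def parse_docket(number_text)
  expect_match(number_text, /(\d{4}-\d{4})/)[1]
end
```

Raising on a mismatch, rather than silently leaving `$1` as `nil` the way a bare `=~` would, means a change in the page layout surfaces as an error instead of producing documents with missing fields.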