Commit
Merge pull request #3 from everypolitician-scrapers/scrape-multiple-t…
…erms-tb Scrape multiple terms
Showing 11 changed files with 163 additions and 99 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
data.sqlite |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
AllCops: | ||
TargetRubyVersion: 2.1 | ||
|
||
inherit_from: | ||
- https://raw.githubusercontent.com/everypolitician/everypolitician-data/master/.rubocop_base.yml | ||
- .rubocop_todo.yml |
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
class KhuralMember < NokogiriDocument | ||
field :name do | ||
tds[-4].xpath('.//a').text.strip | ||
end | ||
|
||
field :name_mn do | ||
tds[-3].text.strip | ||
end | ||
|
||
field :party do | ||
tds[-1].text.strip | ||
end | ||
|
||
field :wikiname do | ||
tds[-4].xpath('.//a[not(@class="new")]/@title').text.strip | ||
end | ||
|
||
field :constituency do | ||
tds[0].text.strip.gsub("\n", ' — ') | ||
end | ||
|
||
private | ||
|
||
def tds | ||
@tds ||= noko.css('td') | ||
end | ||
end |
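The negative indices in `KhuralMember` count cells from the end of the row, so the name, Mongolian name, and party are located relative to the last cell rather than the first. A minimal sketch with plain strings, assuming a hypothetical five-cell row whose values are invented for illustration (the real class reads Nokogiri `td` nodes instead):

```ruby
# Hypothetical cell texts for one table row; all values are invented.
cells = ["District A\n1", 'Example Name', 'Example Name (mn)', 'n/a', 'Example Party']

row = {
  name:         cells[-4].strip,  # fourth cell from the end
  name_mn:      cells[-3].strip,  # third cell from the end
  party:        cells[-1].strip,  # last cell
  # Same replacement as the source; the separator is written as an escape.
  constituency: cells[0].strip.gsub("\n", " \u2014 ")
}

puts row
```

One effect of indexing from the end is that the last four columns resolve to the same fields even if a row carries extra leading cells; whether that was the author's motivation is not stated in the commit.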
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
require_relative 'nokogiri_document' | ||
require_relative 'unspanned_table' | ||
require_relative 'khural_member' | ||
|
||
class MemberTable < NokogiriDocument | ||
field :members do | ||
table.xpath('.//tr[td]').map do |tr| | ||
KhuralMember.new(tr).to_h | ||
end | ||
end | ||
|
||
private | ||
|
||
def table | ||
UnspannedTable.new(noko).transformed | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
require 'field_serializer' | ||
require 'nokogiri' | ||
|
||
class NokogiriDocument | ||
include FieldSerializer | ||
|
||
def initialize(noko) | ||
@noko = noko | ||
end | ||
|
||
private | ||
|
||
attr_reader :noko | ||
end |
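`NokogiriDocument` gets its `field` class method and `to_h` from the `field_serializer` gem. The sketch below is a hypothetical stand-in (`MiniFieldSerializer` and `Greeting` are invented names), not the gem's real implementation. It only illustrates one way such a DSL could work. It also assumes each field is readable as a method, which is an inference from the `MemberTable.new(table).members` call elsewhere in this commit.

```ruby
# Hypothetical stand-in for the field_serializer gem; not its real implementation.
module MiniFieldSerializer
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Names of the fields declared on this class, in declaration order.
    def fields
      @fields ||= []
    end

    # Record the field name and expose the block as an instance method.
    def field(name, &block)
      fields << name
      define_method(name, &block)
    end
  end

  # Build a hash by calling every declared field on this instance.
  def to_h
    self.class.fields.each_with_object({}) do |name, hash|
      hash[name] = public_send(name)
    end
  end
end

# Invented example class, shaped like the document classes in this commit.
class Greeting
  include MiniFieldSerializer

  def initialize(text)
    @text = text
  end

  field :shout do
    text.upcase
  end

  field :length do
    text.length
  end

  private

  attr_reader :text
end

result = Greeting.new('hello').to_h
p result
```

Because the field blocks become instance methods, they can call private helpers such as `text` here, in the same way the classes in this commit call `noko` and `tds`.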
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
require_relative 'member_table' | ||
require 'nokogiri' | ||
|
||
class TermPage < NokogiriDocument | ||
field :members do | ||
MemberTable.new(table).members | ||
end | ||
|
||
private | ||
|
||
def table | ||
noko.xpath('.//h2/span[text()[contains(.,"Constituency")]]/following::table[1]') | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
class UnspannedTable | ||
def initialize(noko_table) | ||
@original = noko_table | ||
end | ||
|
||
def transformed | ||
@transformed ||= Nokogiri.HTML( | ||
'<table>' + | ||
reparsed.map { |c| '<tr>' + c.map(&:to_html).join + '</tr>' }.join + | ||
'</table>' | ||
) | ||
end | ||
|
||
private | ||
|
||
attr_reader :original | ||
|
||
def reparsed | ||
grid = [] | ||
|
||
original.css('tr').each_with_index do |row, curr_x| | ||
row.css('td, th').each_with_index do |cell, curr_y| | ||
rowspan = cell.remove_attribute('rowspan').value.to_i rescue 1 | ||
colspan = cell.remove_attribute('colspan').value.to_i rescue 1 | ||
|
||
0.upto(rowspan - 1).each do |x| | ||
0.upto(colspan - 1).each do |y| | ||
curr_y += 1 while (grid[curr_x + x] ||= [])[curr_y + y] | ||
grid[curr_x + x][curr_y + y] = cell | ||
end | ||
end | ||
end | ||
end | ||
|
||
grid | ||
end | ||
end |
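`UnspannedTable` copies each cell into every grid slot that its `rowspan` and `colspan` cover, so later code can treat the table as a plain rectangle. The following is a simplified sketch of that idea using plain Ruby objects rather than Nokogiri nodes. The `Cell` struct and `unspan` method are hypothetical names, and the loop keeps an explicit column cursor instead of reproducing the source's loop exactly.

```ruby
# Hypothetical cell type for illustration; the source works on Nokogiri nodes instead.
Cell = Struct.new(:text, :rowspan, :colspan)

# Expand spanned cells so that every covered slot in the grid holds the cell.
def unspan(rows)
  grid = []
  rows.each_with_index do |row, r|
    grid[r] ||= []
    col = 0
    row.each do |cell|
      # Skip slots already filled by a rowspan from an earlier row.
      col += 1 while grid[r][col]
      cell.rowspan.times do |dr|
        cell.colspan.times do |dc|
          (grid[r + dr] ||= [])[col + dc] = cell
        end
      end
      col += cell.colspan
    end
  end
  grid
end

# 'A' spans two rows; 'D' spans two columns in the second row.
rows = [
  [Cell.new('A', 2, 1), Cell.new('B', 1, 1), Cell.new('C', 1, 1)],
  [Cell.new('D', 1, 2)]
]

grid = unspan(rows).map { |row| row.map(&:text) }
grid.each { |row| puts row.join(' | ') }
```

After expansion both rows have three entries, with `A` repeated in the first column of the second row and `D` repeated across its last two columns. That uniform shape is what makes fixed cell indexing per row workable.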