Define errata in table format (CSV) and then apply them to an arbitrary source. Inspired by RFC Errata, it lets you keep your own errata in a transparent way.

errata

Correct strings based on remote errata files.

Example

Every errata file has a table structure based on the IETF RFC Editor's "How to Report Errata".

| date | name | email | type | section | action | x | y | condition | notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | meta | Intended use | | http://example.com/original-data-with-errors.xls | | | A hypothetical document that uses non-ISO country names |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | /ANTIGUA & BARBUDA/ | ANTIGUA AND BARBUDA | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | /BOLIVIA/ | BOLIVIA, PLURINATIONAL STATE OF | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | /BOSNIA & HERZEGOVINA/ | BOSNIA AND HERZEGOVINA | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | /BRITISH VIRGIN ISLANDS/ | VIRGIN ISLANDS, BRITISH | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | /COTE D'IVOIRE/ | CÔTE D'IVOIRE | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | /DEM\. PEOPLE'S REP\. OF KOREA/ | KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | /DEM\. REP\. OF THE CONGO/ | CONGO, THE DEMOCRATIC REPUBLIC OF THE | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | /HONG KONG SAR/ | HONG KONG | | |
| 2011-03-22 | Ian Hough | ian@brighterplanet.com | technical | Country Name | replace | /IRAN \(ISLAMIC REPUBLIC OF\)/ | IRAN, ISLAMIC REPUBLIC OF | | |

Which would be saved as a CSV:

date,name,email,type,section,action,x,y,condition,notes
2011-03-22,Ian Hough,ian@brighterplanet.com,meta,Intended use,,http://example.com/original-data-with-errors.xls,,,A hypothetical document that uses non-ISO country names
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/ANTIGUA & BARBUDA/,ANTIGUA AND BARBUDA,,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/BOLIVIA/,"BOLIVIA, PLURINATIONAL STATE OF",,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/BOSNIA & HERZEGOVINA/,BOSNIA AND HERZEGOVINA,,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/BRITISH VIRGIN ISLANDS/,"VIRGIN ISLANDS, BRITISH",,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/COTE D'IVOIRE/,CÔTE D'IVOIRE,,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/DEM\.  PEOPLE'S REP\. OF KOREA/,"KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF",,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/DEM\. REP\. OF THE CONGO/,"CONGO, THE DEMOCRATIC REPUBLIC OF THE",,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/HONG KONG SAR/,HONG KONG,,
2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/IRAN \(ISLAMIC REPUBLIC OF\)/,"IRAN, ISLAMIC REPUBLIC OF",,

And then used:

require 'errata'
require 'remote_table'

errata = Errata.new(:url => 'http://example.com/errata.csv')
original = RemoteTable.new(:url => 'http://example.com/original-data-with-errors.xls')
original.each do |row|
  errata.correct! row # destructively correct each row
end
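
To make the column semantics concrete, here is a rough sketch of what a correction pass over one row could look like. It is not the gem's implementation: it uses only Ruby's standard `csv` library, the helper name `apply_replacements` is hypothetical, and it assumes that `section` names the column to correct, `x` holds a slash-delimited regular expression, and `y` holds the text substituted for the first match.

```ruby
require 'csv'

# A few rows from the example errata above, kept inline for illustration.
errata_rows = CSV.parse(<<~'ROWS', headers: true)
  date,name,email,type,section,action,x,y,condition,notes
  2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/ANTIGUA & BARBUDA/,ANTIGUA AND BARBUDA,,
  2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/BOLIVIA/,"BOLIVIA, PLURINATIONAL STATE OF",,
  2011-03-22,Ian Hough,ian@brighterplanet.com,technical,Country Name,replace,/IRAN \(ISLAMIC REPUBLIC OF\)/,"IRAN, ISLAMIC REPUBLIC OF",,
ROWS

# Hypothetical helper: applies every "replace" erratum to a row (a Hash),
# modifying the row in place and returning it.
def apply_replacements(row, errata_rows)
  errata_rows.each do |erratum|
    next unless erratum['type'] == 'technical' && erratum['action'] == 'replace'

    column = erratum['section']
    next unless row[column]

    # Strip the surrounding slashes and compile the pattern (assumed format).
    pattern = Regexp.new(erratum['x'].delete_prefix('/').delete_suffix('/'))
    # Block form keeps the replacement text literal.
    row[column] = row[column].sub(pattern) { erratum['y'] }
  end
  row
end

row = { 'Country Name' => 'IRAN (ISLAMIC REPUBLIC OF)' }
apply_replacements(row, errata_rows)
puts row['Country Name'] # prints "IRAN, ISLAMIC REPUBLIC OF"
```

Note that a pattern such as `/BOLIVIA/` also matches its own replacement, so running this sketch twice over the same row would expand the name a second time. Anchoring the pattern (for example `/\ABOLIVIA\z/`) is one way to keep such rules idempotent.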

UTF-8

Assumes all input strings are UTF-8. Otherwise there can be problems with Ruby 1.9 and Regexp::FIXEDENCODING. Specifically, ASCII-8BIT regexps might be applied to UTF-8 strings (or vice-versa), resulting in Encoding::CompatibilityError.
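
The mismatch described above can be reproduced in plain Ruby, independent of this library. The following is a minimal sketch: the variable names are illustrative, and the binary-tagged pattern stands in for any regexp source that reached Ruby without a UTF-8 encoding tag.

```ruby
# A UTF-8 string containing a non-ASCII character (Ô, written as an escape).
utf8_name = "C\u00D4TE D'IVOIRE"

# The same bytes for "CÔTE", but tagged as binary (ASCII-8BIT).
binary_source = "C\xC3\x94TE".b
binary_regexp = Regexp.new(binary_source)

# Because the pattern contains non-ASCII bytes, its encoding is fixed,
# and matching it against a non-ASCII UTF-8 string raises.
mismatch_error = begin
  utf8_name =~ binary_regexp
  nil
rescue Encoding::CompatibilityError => e
  e
end

# Re-tagging the pattern source as UTF-8 before compiling avoids the error.
utf8_regexp = Regexp.new(binary_source.dup.force_encoding(Encoding::UTF_8))
matched = !(utf8_name =~ utf8_regexp).nil?

puts mismatch_error.class
puts matched
```

Pure-ASCII strings do not trigger the error, which is why the problem tends to surface only on rows with accented characters.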

Real-life usage

Used by data_miner

Authors

Copyright

Copyright (c) 2011 Brighter Planet. See LICENSE for details.
