If your project uses a lot of scrapers – for example, [OpenCorporates](http:/

You can validate a scraper's metadata (how often it runs, what fields it stores, etc.) like so:

```ruby
require 'scraperwiki-api'
api = ScraperWiki::API.new

info = api.scraper_getinfo('example-scraper').first

describe 'example-scraper' do
  include ScraperWiki::API::Matchers
  subject {info}

  it {should be_protected}
  it {should be_editable_by('frabcus')}
  it {should run(:daily)}
  it {should_not be_broken}

  # Validate the properties of a SQLite table by chaining on an +on+.
  it {should have_a_row_count_of(42).on('swdata')}

  # Ensure that the scraper sets required fields.
  it {should have_at_least_the_keys(['name', 'email']).on('swdata')}

  # Ensure that the scraper doesn't set too many fields.
  it {should have_at_most_the_keys(['name', 'email', 'tel', 'fax']).on('swdata')}
end
```
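
Because these are ordinary RSpec expectations, the same checks can be generated for many scrapers at once, which is handy when a project runs a lot of them. A minimal sketch (the scraper names and the particular expectations chosen are illustrative, not part of the gem's documentation):

```ruby
# A sketch of validating several scrapers in one spec file. The scraper
# names below are placeholders; the matchers are the same ones shown above.
require 'scraperwiki-api'

api = ScraperWiki::API.new
scrapers = ['example-scraper', 'another-scraper']

scrapers.each do |name|
  describe name do
    include ScraperWiki::API::Matchers
    # Fetch the scraper's metadata lazily, once per example group.
    subject { api.scraper_getinfo(name).first }

    it { should_not be_broken }
    it { should run(:daily) }
  end
end
```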

And you can validate the scraped data like so:

```ruby
require 'scraperwiki-api'
api = ScraperWiki::API.new

data = api.datastore_sqlite('example-scraper', 'SELECT * from `swdata`')

describe 'example-scraper' do
  include ScraperWiki::API::Matchers
  subject {data}

  # If you need at least one of a set of fields to be set:
  it {should set_any_of(['name', 'first_name', 'last_name'])}

  # Validate the values of individual fields by chaining on an +in+.
  it {should_not have_blank_values.in('name')}
  it {should have_unique_values.in('email')}
  it {should have_values_of(['M', 'F']).in('gender')}
  it {should have_values_matching(/\A[^@\s]+@[^@\s]+\z/).in('email')}
  it {should have_values_starting_with('http://').in('url')}
  it {should have_values_ending_with('Inc.').in('company_name')}
  it {should have_integer_values.in('year')}

  # If you store a hash or an array of hashes in a field as a JSON string,
  # you can validate the values of these subfields by chaining on an +at+.
  it {should have_values_of(['M', 'F']).in('extra').at('gender')}

  # Check for missing keys within subfields.
  it {should have_values_with_at_least_the_keys(['subfield1', 'subfield2']).in('fieldA')}

  # Check for extra keys within subfields.
  it {should have_values_with_at_most_the_keys(['subfield1', 'subfield2', 'subfield3', 'subfield4']).in('fieldA')}
end
```
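
These examples are plain RSpec, so they can be run with the `rspec` command or wired into a Rake task. A minimal sketch, assuming the specs above are saved under `spec/` (the file layout and task name are assumptions, not part of this gem):

```ruby
# Rakefile – a sketch for running the validation specs with `rake spec`.
require 'rspec/core/rake_task'

RSpec::Core::RakeTask.new(:spec) do |t|
  t.pattern = 'spec/**/*_spec.rb' # assumed location of the spec files above
end

task default: :spec
```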

More documentation is available at [RubyDoc.info](http://rdoc.info/gems/scraperwiki-api/ScraperWiki/API/Matchers).
