Ruby library and tools for working with datapackages

DataPackage.rb

A Ruby library for working with Data Packages.

The library is intended to support:

  • Parsing and using data package metadata and data
  • Validating data packages to ensure they conform with the Data Package specification

Installation

Add the gem into your Gemfile:

gem 'datapackage'

Or install it directly:

gem install datapackage

Reading a Data Package

Require the gem, if you need to:

require 'datapackage'

Parsing a Data Package from a remote location:

package = DataPackage::Package.new( "http://example.org/datasets/a" )

This assumes that http://example.org/datasets/a/datapackage.json exists. Alternatively, load a specific JSON file:

package = DataPackage::Package.new( "http://example.org/datasets/a/datapackage.json" )

Similarly, you can load a package from a local directory or from a local JSON file:

package = DataPackage::Package.new( "/my/data/package" )
package = DataPackage::Package.new( "/my/data/package/datapackage.json" )
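
The location rule described above can be pictured with a small sketch. The helper name `descriptor_location` is hypothetical and not part of the gem. It only illustrates the idea that a location not ending in `.json` is treated as a directory (or base URL) containing `datapackage.json`. The gem's actual resolution logic may differ.

```ruby
# Hypothetical helper, not part of the datapackage gem: illustrates how a
# package location could be mapped to its descriptor file.
def descriptor_location(location)
  # A location that already names a JSON file is used as-is.
  return location if location.end_with?('.json')

  # Otherwise treat it as a directory or base URL and append the
  # conventional descriptor file name, avoiding a doubled slash.
  "#{location.chomp('/')}/datapackage.json"
end

puts descriptor_location('http://example.org/datasets/a')
puts descriptor_location('/my/data/package/datapackage.json')
```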

There is a set of helper methods for accessing data from the package, e.g.:

package = DataPackage::Package.new( "/my/data/package" )
package.name
package.title
package.description
package.homepage
package.license
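
To get a feel for what these helpers return, the sketch below reads the same fields straight from a descriptor using only Ruby's standard `json` library. It assumes the helpers mirror the top-level keys of `datapackage.json`, which this README does not state explicitly, and the sample descriptor is invented for illustration.

```ruby
require 'json'

# Sample descriptor invented for illustration only.
descriptor = JSON.parse(<<~JSON)
  {
    "name": "a",
    "title": "Example dataset",
    "description": "An illustrative data package",
    "homepage": "http://example.org/datasets/a",
    "license": "PDDL-1.0"
  }
JSON

# Assumption: each helper corresponds to the descriptor key of the same name.
keys = %w[name title description homepage license]
summary = keys.map { |key| "#{key}: #{descriptor[key]}" }

puts summary
```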

Reading a Data Package and its resources

require 'csv'
require 'datapackage'

dp = DataPackage::Package.new('http://data.okfn.org/data/core/gdp/datapackage.json')

# Parse the first resource as CSV and keep only the rows for Brazil
data = CSV.parse(dp.resources[0].data, headers: true)
brazil_gdp = data.select { |r| r["Country Code"] == "BRA" }.
                  map { |row| { year: Integer(row["Year"]), value: Float(row['Value']) } }

max_gdp = brazil_gdp.max_by { |r| r[:value] }
min_gdp = brazil_gdp.min_by { |r| r[:value] }

# Ratio of the highest GDP to the lowest GDP
gdp_ratio = (max_gdp[:value] / min_gdp[:value]).round(2)
# Round to a whole number, then insert thousands separators
max_gdp_val = max_gdp[:value].round.to_s.reverse.gsub(/(\d{3})(?=\d)/, '\\1,').reverse

msg = "The highest Brazilian GDP occurred in #{max_gdp[:year]}, when it peaked at US$ " +
      "#{max_gdp_val}. This was #{gdp_ratio} times its minimum GDP " +
      "in #{min_gdp[:year]}."

puts msg

# The highest Brazilian GDP occurred in 2011, when it peaked at US$ 2,615,189,973,181. This was 172.44 times its minimum GDP in 1960.

Creating a Data Package

package = DataPackage::Package.new

package.name = 'my_sleep_duration'
package.resources =  [
  {'name': 'data'}
]

resource = package.resources[0]
resource.descriptor['data'] = [
  7, 8, 5, 6, 9, 7, 8
]

File.open('datapackage.json', 'w') do |f|
  f.write(package.to_json)
end

# {"name": "my_sleep_duration", "resources": [{"name": "data", "data": [7, 8, 5, 6, 9, 7, 8]}]}
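
Because the result is plain JSON, the written file can be read back with the standard library alone. The sketch below builds the descriptor shown above by hand (rather than through the gem, so it runs without it), writes it to a temporary directory, and reads the inline data back. The averaging step is only there to show the data being used.

```ruby
require 'json'
require 'tmpdir'

# Descriptor matching the output shown above, built by hand so the sketch
# depends on the standard library only.
descriptor = {
  'name' => 'my_sleep_duration',
  'resources' => [{ 'name' => 'data', 'data' => [7, 8, 5, 6, 9, 7, 8] }]
}

# Write the descriptor to a temporary directory and parse it back.
loaded = Dir.mktmpdir do |dir|
  path = File.join(dir, 'datapackage.json')
  File.write(path, JSON.generate(descriptor))
  JSON.parse(File.read(path))
end

hours = loaded['resources'][0]['data']
average = hours.sum.to_f / hours.size
puts "#{loaded['name']}: #{hours.size} nights, #{average.round(2)} hours on average"
```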

Validating a Data Package

package = DataPackage::Package.new('http://data.okfn.org/data/core/gdp/datapackage.json')

package.valid?
#=> true
package.errors
#=> [] # An array of errors
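
The pairing of `valid?` with an `errors` array can be illustrated with a much simplified stand-in. The function `descriptor_errors` and its list of required keys are hypothetical, chosen for illustration only. The gem validates against a full JSON schema, and this sketch is not its implementation.

```ruby
# Hypothetical, simplified stand-in for schema validation: returns one
# message per missing key. The required keys are an assumption.
def descriptor_errors(descriptor)
  required = %w[name resources]
  required.reject { |key| descriptor.key?(key) }
          .map { |key| "missing required property '#{key}'" }
end

errors = descriptor_errors({ 'name' => 'my_sleep_duration' })

if errors.empty?
  puts 'descriptor looks valid'
else
  errors.each { |error| puts "invalid: #{error}" }
end
```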

Using a different schema

By default, the gem uses the standard Data Package Schema, but alternative schemas are available.

Schemas in the local cache

The gem ships with the standard Data Package Schema, the Tabular Data Package Schema, and the Fiscal Data Package Schema. Each can be referred to by an identifier, expressed as a symbol.

package = DataPackage::Package.new(nil, :tabular) # Or :fiscal

Schemas from elsewhere

If you have a schema stored in an alternative registry, you can pass a registry_url option to the initializer.

package = DataPackage::Package.new(nil, :identifier, {registry_url: 'http://example.org/my-registry.csv'} )
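
As a rough picture of what a registry lookup involves, the sketch below resolves an identifier to a schema URL from CSV text. The registry contents, the column names (`id`, `title`, `schema`), and the `schema_url` helper are all assumptions made for illustration. Check your registry's actual layout before relying on them.

```ruby
require 'csv'

# Invented registry contents; the column names are an assumption.
registry_csv = <<~CSV
  id,title,schema
  base,Data Package,http://example.org/schemas/data-package.json
  my-profile,My Profile,http://example.org/schemas/my-profile.json
CSV

# Hypothetical helper: find the row whose id matches the identifier and
# return its schema URL, or nil when the identifier is unknown.
def schema_url(registry_csv, identifier)
  row = CSV.parse(registry_csv, headers: true).find { |r| r['id'] == identifier.to_s }
  row && row['schema']
end

puts schema_url(registry_csv, :'my-profile')
```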

Developer notes

These notes are intended to help people who want to contribute to the package itself. If you just want to use it, you can safely ignore them.

Updating the local schemas cache

We cache the schemas from https://github.com/dataprotocols/schemas using git-subtree. To update it, use:

  git subtree pull --prefix datapackage/schemas https://github.com/dataprotocols/schemas.git master --squash