Purée

Metadata extraction from the Pure Research Information System.

Installation

Add this line to your application's Gemfile:

gem 'puree'

And then execute:

$ bundle

Or install it yourself as:

$ gem install puree

Configuration

# For Extractor and REST modules.
config = {
  url:      'https://YOUR_HOST/ws/api/59',
  username: 'YOUR_USERNAME',
  password: 'YOUR_PASSWORD',
  api_key:  'YOUR_API_KEY'
}
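
To keep credentials out of source control, the same hash could be built from environment variables. This is a minimal sketch, not part of Puree: the helper name pure_config and the variable names PURE_URL, PURE_USERNAME, PURE_PASSWORD and PURE_API_KEY are assumptions chosen for illustration.

```ruby
# Hypothetical helper (not provided by Puree): builds the config hash
# shown above from environment variables. The variable names are
# assumptions; rename them to suit your deployment.
def pure_config(env = ENV)
  {
    url:      env.fetch('PURE_URL'),      # e.g. https://YOUR_HOST/ws/api/59
    username: env.fetch('PURE_USERNAME'),
    password: env.fetch('PURE_PASSWORD'),
    api_key:  env.fetch('PURE_API_KEY')
  }
end

# Usage, once the variables are set in the environment:
#   config = pure_config
```

Using fetch rather than [] means a missing variable raises a KeyError immediately instead of passing nil on to a later request.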

Extractor module

Find a resource by identifier and get Ruby objects.

# Configure an extractor
extractor = Puree::Extractor::Dataset.new config
# Fetch the metadata for a resource with a particular identifier
dataset = extractor.find 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
#=> #<Puree::Model::Dataset:0x00c0ffee>
# Access specific metadata e.g. an internal person's name
dataset.persons_internal[0].name
#=> #<Puree::Model::PersonName:0x00c0ffee @first="Foo", @last="Bar">
# Select a formatting style for a person's name
dataset.persons_internal[0].name.last_initial
#=> "Bar, F."

XMLExtractor module

Get Ruby objects from Pure XML.

Single resource

xml = '<project> ... </project>'
# Configure an XML extractor
xml_extractor = Puree::XMLExtractor::Project.new xml
# Get a single piece of metadata
xml_extractor.title
#=> "An interesting project title"
# Get all the metadata together
xml_extractor.model
#=> #<Puree::Model::Project:0x00c0ffee>

Homogeneous resource collection

xml = '<result>
        <dataSet> ... </dataSet>
        <dataSet> ... </dataSet>
        ...
      </result>'
# Get an array of datasets
Puree::XMLExtractor::Collection.datasets xml
#=> [#<Puree::Model::Dataset:0x00c0ffee>, ...]

Heterogeneous resource collection

xml = '<result>
        <contributionToJournal> ... </contributionToJournal>
        <contributionToConference> ... </contributionToConference>
        ...
      </result>'
# Get a hash of research outputs
Puree::XMLExtractor::Collection.research_outputs xml
#=> {
#     journal_articles: [#<Puree::Model::JournalArticle:0x00c0ffee>, ...],
#     conference_papers: [#<Puree::Model::ConferencePaper:0x00c0ffee>, ...],
#     theses: [#<Puree::Model::Thesis:0x00c0ffee>, ...],
#     other: [#<Puree::Model::ResearchOutput:0x00c0ffee>, ...]
#   }
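
Because the result is an ordinary hash of arrays keyed by category, it can be summarised with plain Ruby. The sketch below uses a stand-in hash with placeholder symbols in place of real model objects, so it runs without a Pure server; count_by_category is a hypothetical helper, not part of Puree.

```ruby
# Hypothetical helper (not provided by Puree): counts the entries in
# each category of a hash shaped like the one shown above.
def count_by_category(outputs)
  outputs.transform_values(&:size)
end

# Stand-in data: symbols take the place of Puree::Model objects.
sample = {
  journal_articles:  [:a, :b],
  conference_papers: [:c],
  theses:            [],
  other:             [:d]
}

count_by_category(sample).each do |category, count|
  puts "#{category}: #{count}"
end
```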

REST module

Query the Pure REST API.

Client

# Configure a client
client = Puree::REST::Client.new config
# Find a person
client.persons.find id: 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
#=> #<HTTP::Response:0x00c0ffee>
# Find a person, limit the metadata to ORCID and employee start date
client.persons.find id: 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx',
                    params: {fields: ['orcid', 'employeeStartDate']}
#=> #<HTTP::Response:0x00c0ffee>
# Get five people, with the response body as JSON
client.persons.all params: {size: 5}, accept: :json
#=> #<HTTP::Response:0x00c0ffee>
# Find research outputs for a person
client.persons.research_outputs id: 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
#=> #<HTTP::Response:0x00c0ffee>
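
Each call above returns a response object rather than parsed data, so it may be worth checking the status before using the body. The following is a sketch under assumptions: body_or_raise is a hypothetical helper, not part of Puree, and it relies only on the response answering code and to_s, as HTTP::Response from the http gem does. FakeResponse is a stand-in so the example runs without a Pure server.

```ruby
require 'json'

# Hypothetical helper (not provided by Puree): returns the response body
# as a string, or raises if the HTTP status is outside the 2xx range.
def body_or_raise(response)
  code = response.code
  unless (200..299).cover?(code)
    raise "Pure request failed with HTTP status #{code}"
  end
  response.to_s
end

# Stand-in for an HTTP::Response, for illustration only.
FakeResponse = Struct.new(:code, :body) do
  def to_s
    body
  end
end

# With accept: :json the body can be parsed with the standard library.
# The JSON content here is made up for the example.
response = FakeResponse.new(200, '{"count": 5}')
data = JSON.parse(body_or_raise(response))
puts data['count']
```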

Resource

# Configure a resource
persons = Puree::REST::Person.new config
# Find a person
persons.find id: 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
#=> #<HTTP::Response:0x00c0ffee>

REST module with XMLExtractor module

Query the Pure REST API and get Ruby objects from Pure XML.

# Configure a client
client = Puree::REST::Client.new config
# Find projects for a person
response = client.persons.projects id: 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
# Extract metadata from XML
Puree::XMLExtractor::Collection.projects response.to_s
#=> [#<Puree::Model::Project:0x00c0ffee>, ...]