Extraction from the Pure Research Information System and transformation for loading by Archivematica.
Includes transfer preparation, reporting and disk space management.
Add this line to your application's Gemfile:
gem 'preservation'
And then execute:
$ bundle
Or install it yourself as:
$ gem install preservation
Configure Preservation. If log_path
is omitted, logging (standard library)
writes to STDOUT.
Preservation.configure do |config|
config.db_path = ENV['ARCHIVEMATICA_DB_PATH']
config.ingest_path = ENV['ARCHIVEMATICA_INGEST_PATH']
config.log_path = ENV['PRESERVATION_LOG_PATH']
end
Create a hash for passing to a transfer.
# Pure host with authentication.
config = {
url: ENV['PURE_URL'],
username: ENV['PURE_USERNAME'],
password: ENV['PURE_PASSWORD']
}
# Pure host without authentication.
config = {
url: ENV['PURE_URL']
}
Configure a transfer to retrieve data from a Pure host.
transfer = Preservation::Transfer::Dataset.new config
If necessary, fetch the metadata, prepare a directory in the ingest path and populate it with the files and JSON description file.
transfer.prepare uuid: 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
For multiple Pure datasets, if necessary, fetch the metadata, prepare a directory in the ingest path and populate it with the files and JSON description file.
A maximum of 10 will be prepared using the doi_short directory naming scheme. Each dataset will only be prepared if 20 days have elapsed since the metadata record was last modified.
transfer.prepare_batch max: 10,
dir_scheme: :doi_short,
delay: 20
The following are permitted values for the dir_scheme parameter:
:uuid_title
:title_uuid
:date_uuid_title
:date_title_uuid
:date_time_uuid
:date_time_title
:date_time_uuid_title
:date_time_title_uuid
:uuid
:doi
:doi_short
A transfer-ready directory, with a name built according to the directory scheme specified, in this case doi_short. This particular example has only one file Ebola_data_Jun15.zip in the dataset.
.
├── 10.17635-lancaster-researchdata-6
│ ├── Ebola_data_Jun15.zip
│ └── metadata
│ └── metadata.json
metadata.json:
[
{
"filename": "objects/Ebola_data_Jun15.zip",
"dc.title": "Ebolavirus evolution 2013-2015",
"dc.description": "Data used for analysis of selection and evolutionary rate in Zaire Ebolavirus variant Makona",
"dcterms.created": "2015-06-04",
"dcterms.available": "2015-06-04",
"dc.publisher": "Lancaster University",
"dc.identifier": "http://dx.doi.org/10.17635/lancaster/researchdata/6",
"dcterms.spatial": [
"Guinea, Sierra Leone, Liberia"
],
"dc.creator": [
"Gatherer, Derek"
],
"dc.contributor": [
"Robertson, David",
"Lovell, Simon"
],
"dc.subject": [
"Ebolavirus",
"evolution",
"phylogenetics",
"virulence",
"Filoviridae",
"positive selection"
],
"dcterms.license": "CC BY",
"dc.relation": [
"http://dx.doi.org/10.1136/ebmed-2014-110127",
"http://dx.doi.org/10.1099/vir.0.067199-0"
]
}
]
Free up disk space for completed transfers. Can be done at any time.
Preservation::Storage.cleanup
Can be used for scheduled monitoring of transfers.
Preservation::Report::Transfer.exception
Formatted as JSON:
{
"pending": {
"count": 3,
"data": [
{
"path": "10.17635-lancaster-researchdata-72",
"path_timestamp": "2016-09-29 12:08:58 +0100"
},
{
"path": "10.17635-lancaster-researchdata-74",
"path_timestamp": "2016-09-29 12:08:59 +0100"
},
{
"path": "10.17635-lancaster-researchdata-75",
"path_timestamp": "2016-09-29 12:09:00 +0100"
}
]
},
"current": {
"path": "10.17635-lancaster-researchdata-90",
"unit_type": "ingest",
"status": "PROCESSING",
"current": 1,
"id": 91,
"uuid": "ebf048c3-0ca8-409c-94cf-ab3e5d97e901",
"path_timestamp": "2016-09-28 17:09:33 +0100"
},
"failed": {
"count": 0
},
"incomplete": {
"count": 1,
"data": [
{
"path": "10.17635-lancaster-researchdata-90",
"unit_type": "ingest",
"status": "PROCESSING",
"current": 1,
"id": 91,
"uuid": "ebf048c3-0ca8-409c-94cf-ab3e5d97e901",
"path_timestamp": "2016-09-28 17:09:33 +0100"
}
]
},
"complete": {
"count": 78
}
}