A PREMIS-compliant fixity checking microservice.

Riprap

A fixity-auditing microservice that addresses https://github.com/Islandora-CLAW/CLAW/issues/847. Developed as a successor to Islandora 7.x's Checksum Checker module, it is intended primarily for use with repositories compliant with the Fedora API Specification, but it can also provide fixity validation for other repositories (e.g., an OCFL repository). In fact, Riprap ships with sample plugins that allow it to monitor the fixity of files on a standard attached filesystem and call sha1sum to get their current digests.
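To make the idea of a file digest concrete, here is a minimal Python sketch that computes the same value sha1sum prints for a file. It is an illustration only and is not taken from Riprap's sample plugins; the function name sha1_of_file and its chunk_size parameter are assumptions made for this example.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of the file at path.

    The file is read in chunks so large files do not need to fit in memory.
    (Hypothetical helper, not part of Riprap.)
    """
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the value returned for a file today with the value recorded earlier is the essence of a fixity check.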

Riprap periodically requests fixity digests for resources from a repository and compares each digest with the previously requested digest. It then persists the outcome of that comparison so the process can be repeated. Riprap also provides a REST interface so that external applications (in Islandora's case, Drupal) can retrieve fixity checking event data for use in reports, etc.

Overview

Riprap generates and records fixity check events as described in the "Fixity, integrity, authenticity" section of the PREMIS Data Dictionary for Preservation Metadata, Version 3.0. It can also record fixity information available during "ingestion" events and available at the time of "deletion" events. "fixity check" events are generated by Riprap, typically in a job scheduled via cron, but "ingestion" and "deletion" events are generated by external systems, which may persist information about those events at any time in Riprap's database via either its REST API or via an ActiveMQ message. Initial fixity checks (that is, after a resource is ingested into the repository) and final fixity checks (just before a resource is deleted from the repository) can be identified by adding a brief note to the event.

All events must have an outcome of suc or fail, using values from the PREMIS Event Outcome vocabulary (not yet published but will be soon).

Current status

Riprap is still in early development, but all the major functional components are working using test/sample data and the mock Fedora repository endpoint (see below). Riprap will be ready for production by December 2018.

Requirements

  • PHP 7.1.3 or higher
  • composer
  • SQLite (other RDBMSs will be supported soon).
    • To install the PHP driver for SQLite on Ubuntu, run sudo apt-get install php7.2-sqlite3, replacing 7.2 with your version of PHP.

While not a requirement, a module for Islandora is available that provides node-level reports on binary resources using data from Riprap.

Installation

  1. Clone this git repository
  2. cd riprap
  3. php composer.phar install (or equivalent on your system, e.g., ./composer install)
  4. Create the database as described in the next section.

We will eventually support deployment via Ansible.

Trying it out

If you want to play with Riprap, and you're on a Linux or OSX machine, you should not need to configure anything. Assuming you have sqlite installed, you should be able to run the check_fixity command against the sample data and local web server, and perform basic API requests as documented below. A couple of things you will want to know:

  • the database created by Symfony will be located at [riprap directory]/var/data.db
  • Riprap will write its log to /tmp/riprap.log
  • the test webserver runs on port 8000

Creating the database

As stated above, for now we use SQLite as our database. To create the database that Riprap persists fixity event data into, follow these instructions from within the riprap directory:

  1. rm var/data.db (might not exist)
  2. rm src/Migrations/* (might be empty)
  3. php bin/console -n make:migration
  4. php bin/console -n doctrine:migrations:migrate
  5. Optional: When you run the check_fixity command as described below, it will create events based on the fixity checks. If you want to populate the database with some sample fixity events prior to running check_fixity (you don't need to), run php bin/console -n doctrine:fixtures:load

Running the check_fixity command

From within the riprap directory, start the web server by running the server:start command. Then, run the app:riprap:check_fixity command, e.g.:

  • php bin/console server:start
  • php bin/console app:riprap:check_fixity

You should see output similar to:

Riprap checked 5 resources (5 successful events, 0 failed events) in 0.223 seconds.

Here is what is going on when you run the check_fixity command:

  1. Riprap calls whatever fetchresourcelist plugins are enabled (there can be more than one), and from them gets a list of all resources to check. In the default sample configuration, this list of resources is a plain text file at resources/iprap_resource_ids.txt.
  2. For each of the resources identified by the fetchresourcelist plugins, Riprap calls the fetchdigest plugin that is enabled, and gets the resource's digest value from the repository. In the default sample configuration, Riprap is calling its mock repository endpoint.
  3. Riprap then gets the digest value in the most recent fixity check event stored in its database (in the default sample configuration, this is fixity events stored in the SQLite database), and compares the newly retrieved digest value with the most recent one on record.
  4. Riprap then persists information about the fixity check event it just performed (in the default sample configuration, back into the SQLite database). If you run the SQL query shown below, you will see five events in your database, one corresponding to each URL listed in resources/iprap_resource_ids.txt.
  5. Riprap then executes all postcheck plugins that are enabled.
  6. After Riprap has checked all resources in the current list, it reports out how many resources it checked, including how many checks were successful and how many failed.
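
The steps above can be sketched in Python as follows. This is an illustration of the compare-and-record logic only, not Riprap's PHP implementation. The function names and the fetch_digest, last_digest and persist callables are hypothetical stand-ins for the fetchdigest plugin, the database lookup and the persist plugin. Treating a resource with no earlier event as a successful "Initial fixity check." is an assumption based on the sample data.

```python
import uuid
from datetime import datetime, timezone


def build_event(resource_id, current_digest, previous_digest, algorithm="SHA-1"):
    """Compare two digests and describe the result as a fixity check event."""
    if previous_digest is None:
        # Assumption: with nothing on record, the check counts as a success
        # and is flagged as the initial check, as in the sample data.
        outcome, detail = "suc", "Initial fixity check."
    elif current_digest == previous_digest:
        outcome, detail = "suc", ""
    else:
        outcome, detail = "fail", ""
    return {
        "event_uuid": str(uuid.uuid4()),
        "event_type": "fix",
        "resource_id": resource_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "digest_algorithm": algorithm,
        "digest_value": current_digest,
        "event_detail": detail,
        "event_outcome": outcome,
    }


def check_all(resource_ids, fetch_digest, last_digest, persist):
    """Check every resource once and return (successes, failures).

    fetch_digest(resource_id) returns the current digest, last_digest(resource_id)
    returns the most recent recorded digest or None, and persist(event) stores
    the new event. All three are hypothetical callables.
    """
    successes = failures = 0
    for resource_id in resource_ids:
        event = build_event(
            resource_id, fetch_digest(resource_id), last_digest(resource_id)
        )
        persist(event)
        if event["event_outcome"] == "suc":
            successes += 1
        else:
            failures += 1
    return successes, failures
```

Passing the plugins in as plain callables mirrors the way Riprap keeps fetching, comparing and persisting separate, so each part can be swapped without touching the others.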

If you query the table you will see the following output:

SQLite version 3.22.0 2018-01-22 18:45:57
Enter ".help" for usage hints.
sqlite> .headers on
sqlite> select * from fixity_check_event;
id|event_uuid|event_type|resource_id|timestamp|digest_algorithm|digest_value|event_detail|event_outcome|event_outcome_detail_note
1|92224d93-563a-4c6e-8a2e-251084fb9cdc|fix|http://localhost:8000/mockrepository/rest/10|2018-10-01 07:49:13|SHA-1|c28097ad29ab61bfec58d9b4de53bcdec687872e|Initial fixity check.|suc|
2|4e0efd4e-f6c5-4e7d-af4c-015b696f6047|fix|http://localhost:8000/mockrepository/rest/11|2018-10-01 07:49:13|SHA-1|339e2ebc99d2a81e7786a466b5cbb9f8b3b81377|Initial fixity check.|suc|
3|70c91ae4-6a3e-4160-8985-00b4ffc626f7|fix|http://localhost:8000/mockrepository/rest/12|2018-10-01 07:49:13|SHA-1|0bad865a02d82f4970687ffe1b80822b76cc0626|Initial fixity check.|suc|
4|10af3a5f-309a-4962-9b80-a6f8c17d8a0c|fix|http://localhost:8000/mockrepository/rest/13|2018-10-01 07:49:13|SHA-1|667be543b02294b7624119adc3a725473df39885|Initial fixity check.|suc|
5|e64db74c-471e-4347-b256-5597470157c4|fix|http://localhost:8000/mockrepository/rest/14|2018-10-01 07:49:13|SHA-1|86cf294a07a8aa25f6a2d82a8938f707a2d80ac3|Initial fixity check.|suc|
sqlite>
sqlite> .quit

If you populated the database prior to running check_fixity, you will see 20 additional events in the database.
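
The same table can also be queried from a script. The following Python sketch uses the standard sqlite3 module and the column names shown in the output above. The helper names latest_digest and count_by_outcome are assumptions made for this example and are not part of Riprap.

```python
import sqlite3


def latest_digest(conn, resource_id):
    """Return the digest from the most recent event for resource_id, or None."""
    row = conn.execute(
        "SELECT digest_value FROM fixity_check_event "
        "WHERE resource_id = ? ORDER BY timestamp DESC, id DESC LIMIT 1",
        (resource_id,),
    ).fetchone()
    return row[0] if row else None


def count_by_outcome(conn):
    """Return a dict mapping each event_outcome (suc, fail) to its row count."""
    rows = conn.execute(
        "SELECT event_outcome, COUNT(*) FROM fixity_check_event "
        "GROUP BY event_outcome"
    )
    return dict(rows.fetchall())
```

To try it against the sample database, open a connection with sqlite3.connect("var/data.db") from within the riprap directory and pass it to either function.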

REST API

Preliminary scaffolding is in place for a simple HTTP REST API, which will allow external applications like Drupal to retrieve fixity check data on specific Fedora resources and to add new and updated fixity check data. For example, a GET request to:

curl -v -H "Resource-ID:http://example.com/repository/resource/12345" http://localhost:8000/api/fixity

would return a list of all fixity events for the Fedora resource http://example.com/repository/resource/12345.

To see the API in action,

  1. run php bin/console server:start
  2. run curl -v -H 'Resource-ID:http://localhost:8000/mockrepository/rest/10' http://localhost:8000/api/fixity

You should get a response like this:

*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET /api/fixity HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.58.0
> Accept: */*
> Resource-ID:http://localhost:8000/mockrepository/rest/10
> 
< HTTP/1.1 200 OK
< Host: localhost:8000
< Date: Sun, 30 Sep 2018 10:13:49 -0700
< Connection: close
< X-Powered-By: PHP/7.2.10-0ubuntu0.18.04.1
< Cache-Control: no-cache, private
< Date: Sun, 30 Sep 2018 17:13:49 GMT
< Content-Type: application/json
< 

The returned JSON looks like this:

[
   {
      "event_uuid":"4cd2edc9-f292-49a1-9b05-d025684de559",
      "resource_id":"http:\/\/localhost:8000\/mockrepository\/rest\/10",
      "event_type":"fix",
      "timestamp":"2018-10-03T07:23:40-07:00",
      "hash_algorithm":"SHA-1",
      "hash_value":"c28097ad29ab61bfec58d9b4de53bcdec687872e",
      "event_detail":"Initial fixity check.",
      "event_outcome":"suc",
      "event_outcome_detail_note":""
   },
   {
      "event_uuid":"fb73a36a-df64-4ba8-a437-ea277b65ebb7",
      "resource_id":"http:\/\/localhost:8000\/mockrepository\/rest\/10",
      "event_type":"fix",
      "timestamp":"2018-12-03T07:26:39-07:00",
      "hash_algorithm":"SHA-1",
      "hash_value":"c28097ad29ab61bfec58d9b4de53bcdec687872e",
      "event_detail":"",
      "event_outcome":"suc",
      "event_outcome_detail_note":""
   }
   [...]
]

Note that if the resource identified by Resource-ID does not have any events in Riprap, the REST API will return a 200 response whose body is an empty JSON list, e.g.,

[]

This means that consumers of this API need to check not only the HTTP response code, but also the number of members in the returned list.
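
A consumer might handle both conditions along these lines. This is a minimal Python sketch, not code shipped with Riprap; the function name events_from_response is hypothetical, and it expects the status code and body of a response that has already been fetched.

```python
import json


def events_from_response(status_code, body):
    """Return the list of fixity events contained in an API response.

    Raises RuntimeError on a non-200 status. An empty list is returned as-is,
    because it means Riprap has no events for the resource, which is not an
    error.
    """
    if status_code != 200:
        raise RuntimeError(f"Unexpected HTTP status {status_code}")
    events = json.loads(body)
    if not isinstance(events, list):
        raise ValueError("Expected a JSON list of events")
    return events
```

The caller then checks the length of the result, for example treating len(events) == 0 as "no fixity data yet" rather than as a failed check.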

HTTP POST and PATCH will also be supported, e.g.:

curl -v -X POST -H "Resource-ID:http://localhost:8080/mockrepository/rest/17" http://localhost:8000/api/fixity
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> POST /api/fixity HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.58.0
> Accept: */*
> Resource-ID:http://localhost:8080/mockrepository/rest/17
> 
< HTTP/1.1 200 OK
< Host: localhost:8000
< Date: Thu, 27 Sep 2018 11:56:02 -0700
< Connection: close
< X-Powered-By: PHP/7.2.10-0ubuntu0.18.04.1
< Cache-Control: no-cache, private
< Date: Thu, 27 Sep 2018 18:56:02 GMT
< Content-Type: application/json
< 
* Closing connection 0
["new fixity event for resource http:\/\/localhost:8080\/mockrepository\/rest\/17"]

GET requests can optionally take the following URL parameters:

  • timestamp_start: ISO8601 (full or partial) date indicating start of date range in queries.
  • timestamp_end: ISO8601 (full or partial) date indicating end of date range in queries.
  • outcome: Coded outcome of the event, either suc or fail.
  • offset: The number of items to skip from the beginning of the result set (i.e., same as standard SQL use of 'offset'). Default is 0.
  • limit: Number of items in the result set to return, starting at the value of offset.
  • sort: Sort events on timestamp. Specify "desc" or "asc" (if not present, will sort "asc").

For example, curl -v -H 'Resource-ID:http://localhost:8000/mockrepository/rest/10' http://localhost:8000/api/fixity?timestamp_start=2018-12-03 would return only the events for http://localhost:8000/mockrepository/rest/10 that have a timestamp equal to or later than 2018-12-03.
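
The same kind of request can be assembled programmatically. The Python sketch below builds the URL and the Resource-ID header with the standard library. The function name build_fixity_request is an assumption made for this example, and the client-side validation of outcome and sort simply mirrors the parameter list above; it is not a description of what the API itself enforces.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "http://localhost:8000/api/fixity"  # test server address used in this README

ALLOWED = {"timestamp_start", "timestamp_end", "outcome", "offset", "limit", "sort"}


def build_fixity_request(resource_id, api_url=API_URL, **params):
    """Build a GET request for a resource's fixity events.

    Keyword arguments set to None are left out of the query string.
    """
    unknown = set(params) - ALLOWED
    if unknown:
        raise ValueError(f"Unsupported parameters: {sorted(unknown)}")
    if params.get("outcome") not in (None, "suc", "fail"):
        raise ValueError("outcome must be 'suc' or 'fail'")
    if params.get("sort") not in (None, "asc", "desc"):
        raise ValueError("sort must be 'asc' or 'desc'")
    # urlencode percent-escapes values where needed and joins them with "&".
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{api_url}?{query}" if query else api_url
    return Request(url, headers={"Resource-ID": resource_id})
```

The returned object can be passed to urllib.request.urlopen while the test server is running.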

Mock Fedora repository endpoint

To assist in development and testing, Riprap includes an endpoint that simulates the behaviour described in section 7.2 of the Fedora API Specification. If you start Symfony's test server as described above, this endpoint is available via GET or HEAD requests at http://localhost:8000/mockrepository/rest/{id}, where {id} is a number from 1 to 20 (these are mock "resource IDs" included in the sample data). Calls to it should include a Want-Digest header with the value SHA-1, e.g.:

curl -v -X HEAD -H 'Want-Digest: SHA-1' http://localhost:8000/mockrepository/rest/2

If the {id} is valid, the response will include a Digest header containing the resource's SHA-1 hash:

*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> HEAD /mockrepository/rest/2 HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.58.0
> Accept: */*
> Want-Digest: SHA-1
> 
< HTTP/1.1 200 OK
< Host: localhost:8000
< Date: Thu, 20 Sep 2018 05:28:57 -0700
< Connection: close
< X-Powered-By: PHP/7.2.7-0ubuntu0.18.04.2
< Cache-Control: no-cache, private
< Date: Thu, 20 Sep 2018 12:28:57 GMT
< Digest: b1d5781111d84f7b3fe45a0852e59758cd7a87e5
< Content-Type: text/html; charset=UTF-8
< 
* Closing connection 0

If the resource is not found, the HTTP response will be 404. If the {id} is not valid for some other reason, the HTTP response will be 400.
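
A client of this endpoint could build the request and interpret the three outcomes roughly as follows. This Python sketch is illustrative only; the function names are hypothetical. It also assumes digests are hexadecimal, so that an optional RFC 3230 style "algorithm=" prefix can be stripped safely, even though the mock endpoint returns a bare value.

```python
from urllib.request import Request


def build_digest_request(url, algorithm="SHA-1"):
    """Build a HEAD request that asks the repository for a resource's digest."""
    return Request(url, method="HEAD", headers={"Want-Digest": algorithm})


def digest_from_response(status_code, headers):
    """Return the digest from a response, or None if the resource was not found.

    headers is any mapping with a get() method. Raises RuntimeError for other
    non-200 statuses (including 400) and ValueError if a 200 response lacks a
    Digest header.
    """
    if status_code == 404:
        return None
    if status_code != 200:
        raise RuntimeError(f"Unexpected HTTP status {status_code}")
    value = headers.get("Digest")
    if not value:
        raise ValueError("Response has no Digest header")
    # Assumption: the digest is hexadecimal, so any "=" belongs to an
    # "algorithm=value" prefix. A bare value passes through unchanged.
    return value.split("=", 1)[-1].strip()
```

Note that urllib.request.urlopen raises urllib.error.HTTPError for 4xx responses, so a caller using it would read the status from the exception's code attribute before calling digest_from_response.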

More about Riprap

Plugins

One of Riprap's principal design requirements is flexibility. To meet this goal, it uses plugins to process most of its input and output. It supports four types of plugins:

  • "fetchresourcelist" plugins fetch a set of resource URIs/URLs to fixity check (e.g., from a Fedora repository's triplestore, from Drupal, from a CSV file). A sample plugin that reads resource URLs from a text file, app:riprap:plugin:fetch:from:file, already exists and is configured in config/services.yaml. Multiple fetchresourcelist plugins can be configured at once.
  • "fetchdigest" plugins query an external utility or service to get the digest of the current resource. A plugin that queries a Fedora API Specification-compliant repository, app:riprap:plugin:fetchdigest:from:fedoraapi, already exists and is configured in config/services.yaml. Only one fetchdigest plugin can be configured.
  • "persist" plugins persist data after performing a fixity check on each resource (e.g. to a RDBMS, back into the Fedora repository that manages the resources, etc.). A plugin to persist fixity events to a relational database, app:riprap:plugin:persist:to:database, already exists and is configured in config/services.yaml. Multiple persist plugins can be configured at once.
  • "postcheck" plugins execute after performing a fixity check on each resource. Two plugins of this type currently exist (neither one is complete yet): a plugin that sends an email on failure, app:riprap:plugin:postcheck:mailfailures, and a plugin that migrates fixity events from Fedora 3.x AUDIT data. Both plugins are configured in config/services.yaml. Multiple postcheck plugins can be configured at once.

A second set of simple example plugins is included in the resources/filesystemexample/src/Command directory. See their README.md file for more information.

Message queue listener

Riprap will also be able to listen to an ActiveMQ queue and generate corresponding fixity events for newly added or updated resources. Not implemented yet.

Security

  • Riprap retrieves fixity digests from other applications via HTTP or some other mechanism. If Riprap is used with a Fedora-based repository, it needs access to the repository's REST interface in order to request resources' digests.
  • Riprap also provides a REST interface so other applications can retrieve fixity check event data from it and add/modify fixity check event data. Using Symfony's firewall to provide IP-based access to the API should provide sufficient security.

Miscellaneous

Contributing

See CONTRIBUTING.md.

Running tests

From within the riprap directory, run:

./bin/phpunit

Coding standards

Riprap follows the PSR2 coding standard. To check your code, from within the riprap directory, run:

./vendor/bin/phpcs

Maintainer

Mark Jordan (https://github.com/mjordan)

License

MIT