The baglidate library validates BagIt bags using the "in development" BagIt 1.0 spec. For the most up-to-date spec, see the bagit1.0 branch of the BagIt GitHub page.
All this repo really does is check the non-normative ABNF grammars in section 7 of the BagIt 1.0 spec against the prose descriptions and examples provided elsewhere in that text. It uses the Clojure Instaparse library to create parsers from the supplied BagIt ABNF grammars and runs them on example inputs (BagIt files) that are expected to be valid. If a grammar rejects such an input, a new grammar (suffixed with _fixed) is written that accepts the example inputs.
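As a rough illustration of that workflow, the sketch below builds an Instaparse parser from an ABNF string and checks whether two sample lines parse. The one-rule grammar, the namespace, and the accepts? helper are hypothetical stand-ins written for this example. They are not the grammars from the spec or this repo's actual code.

```clojure
(ns example.abnf-sketch
  (:require [instaparse.core :as insta]))

;; Hypothetical one-rule grammar, written for this example only
;; (it is NOT copied from section 7 of the BagIt spec).
;; DIGIT is one of the ABNF core rules that Instaparse provides.
(def version-line-grammar
  "version-line = \"BagIt-Version: \" 1*DIGIT \".\" 1*DIGIT")

;; :input-format :abnf tells Instaparse to read the grammar as ABNF
;; rather than its default EBNF notation.
(def version-line-parser
  (insta/parser version-line-grammar :input-format :abnf))

(defn accepts?
  "True when `parser` parses all of `text` without failure."
  [parser text]
  (not (insta/failure? (insta/parse parser text))))

(accepts? version-line-parser "BagIt-Version: 1.0") ; well-formed line
(accepts? version-line-parser "BagIt-Version 1.0")  ; colon missing
```

When a grammar rejects an input that the prose says is valid, insta/parse returns a failure object whose printed form points at the position where parsing stopped, which can help in deciding how a _fixed grammar should differ.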
Assuming you have Clojure >= 1.8 and Leiningen installed, rerun the tests automatically as you develop:
$ lein test-refresh
Or, just run the tests once:
$ lein test
You should see something like:
Ran 5 tests containing 10 assertions. 0 failures, 0 errors.
The ABNF grammars are in files with the .abnf extension under resources/. Files with the _fixed suffix are attempts to improve on the otherwise identically named files (which are copied directly from the BagIt 1.0 spec):
├── resources
│   ├── bag_declaration.abnf
│   ├── bag_metadata.abnf
│   ├── bag_metadata_fixed.abnf
│   ├── fetch_file.abnf
│   ├── fetch_file_fixed.abnf
│   ├── payload_manifest.abnf
│   ├── payload_manifest_fixed.abnf
│   └── uri.abnf
The resources/ directory also contains example input files that are assumed to be valid. These are used in the tests:
├── resources
│   ├── sample-bag-info.txt
│   ├── sample-fetch-file.txt
│   ├── sample-manifest-sha256.txt
The logic for reading the grammars and input files, using Instaparse to create parsers from the grammars, and testing the inputs against the parsers lives in the only file under src/ and the only file under test/:
├── src
│   └── bagit_instaparse
│       └── core.clj
└── test
    └── bagit_instaparse
        └── core_test.clj
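The read, parse, and test steps described above might be sketched roughly as follows. This is not the contents of core.clj or core_test.clj. The namespace, the helper names resource-text and grammar-accepts?, and the pairing of sample-bag-info.txt with bag_metadata_fixed.abnf are assumptions made for illustration.

```clojure
(ns example.resource-sketch
  (:require [clojure.java.io :as io]
            [clojure.test :refer [deftest is]]
            [instaparse.core :as insta]))

(defn resource-text
  "Reads a file from the classpath (e.g. under resources/) as a string."
  [file-name]
  (slurp (io/resource file-name)))

(defn grammar-accepts?
  "Builds an ABNF parser from `grammar-text` and reports whether it
  parses `input-text` without failure."
  [grammar-text input-text]
  (let [parser (insta/parser grammar-text :input-format :abnf)]
    (not (insta/failure? (insta/parse parser input-text)))))

;; Assumption: sample-bag-info.txt is the input meant for the
;; bag_metadata grammars; the real pairing is defined in core_test.clj.
(deftest fixed-metadata-grammar-accepts-sample
  (is (grammar-accepts? (resource-text "bag_metadata_fixed.abnf")
                        (resource-text "sample-bag-info.txt"))))
```

Keeping grammar loading and input loading behind one small helper means the same assertion shape can be repeated for each grammar and sample pair.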
- The instaparse docs for ABNF
- The Wikipedia article on ABNF
- The URI ABNF
- Unicode in ABNF (experimental)
Copyright © 2018 Joel Dunham
Distributed under the Eclipse Public License version 1.0.