
A JHOVE Use Case – JPEG-HUL at Archives NZ

2016-06-24: By Ross Spencer

Archives New Zealand has, at present, successfully completed four born-digital deposits. Because we use Ex Libris’ Rosetta digital preservation system, part of our workflow sends digital objects through JHOVE as a validation step, where we look for one simple result: ‘Well-formed and valid’.
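The check itself can be pictured with a short sketch. It assumes JHOVE's plain-text report, in which the result for a file appears on a line beginning 'Status:'. The function names and the abbreviated sample report are illustrative only and are not part of JHOVE or Rosetta.

```python
# Abbreviated, illustrative report text (an assumption, not real output
# captured from our system).
SAMPLE_REPORT = """\
 RepresentationInformation: example.jpg
  ReportingModule: JPEG-hul
  Format: JPEG
  Status: Well-Formed and valid
"""


def jhove_status(report_text):
    """Return the value of the first 'Status:' line, or None if absent."""
    for line in report_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "Status":
            return value.strip()
    return None


def is_well_formed_and_valid(report_text):
    """True only when the reported status is 'Well-Formed and valid'."""
    status = jhove_status(report_text)
    # Compare case-insensitively so capitalisation differences do not matter.
    return status is not None and status.lower() == "well-formed and valid"


if __name__ == "__main__":
    print(jhove_status(SAMPLE_REPORT))
    print(is_well_formed_and_valid(SAMPLE_REPORT))
```

Anything other than that exact status, including a missing Status line, is treated as a failure that needs manual inspection.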

When files fail this test, we have to inspect them manually and work out whether we can fix them under our strict pre-conditioning rules (changes must be reversible and documented), or whether we have to accept the file as not valid and ingest it as-is until we have the capacity to create a second preservation master that corrects the issue.

An example appeared early in our second deposit, when a JPEG failed in JHOVE with the following errors:

[Image: jhove-messages]

At first, seeing three messages like that for a single file was quite overwhelming. Any issue that arises during a transfer that we haven’t seen before takes an undetermined amount of time to fix, and we hadn’t yet come up against these.

Opting to tackle the messages one error at a time, we managed to fix all three with a single bytestream change, all driven by reading a JPEG specification we were able to find via the Just Solve It wiki (1). The second error message was described plainly in that documentation:

[Image: jpeg-specification]

With the error reading ‘APP0 marker not at beginning of file’, we noted that other offsets JHOVE might be looking for could also be displaced. And so we set about fixing this problem, all the time checking that each fix was something we could do under our pre-conditioning rules.

On searching around for documents that would help us to understand what an APP0 marker looks like (2), we were able to see the displaced segment inside the bitstream (fortunately the APP0 marker was still there!)
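Finding a displaced segment like this can also be done programmatically. The sketch below walks the marker segments in a JPEG header and reports where the APP0 marker (0xFFE0) sits; in a JFIF file it is expected directly after the start of image marker, at offset 2. The function names are hypothetical, and the walker is deliberately simplified (it assumes every header segment carries a two-byte length and ignores fill bytes).

```python
import struct


def list_header_segments(data):
    """Return (offset, marker, length) for each segment from SOI up to SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker (0xFFD8)")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        # The big-endian length counts its own two bytes but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segments.append((pos, 0xFF00 | marker, length))
        if marker == 0xDA:  # SOS: compressed image data follows, stop here
            break
        pos += 2 + length
    return segments


def app0_offset(data):
    """Return the byte offset of the APP0 marker, or None if not found."""
    for offset, marker, _length in list_header_segments(data):
        if marker == 0xFFE0:
            return offset
    return None
```

An offset of 2 means the APP0 segment is where JFIF expects it; any other offset means another segment sits between it and the start of image marker.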

Taking an extract of the file header we can see side-by-side how we modified the file to make it valid:

[Image: hex-comparison]
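The same change can be expressed as a short sketch: cut the APP0 segment out of wherever it sits in the header and put it back directly after the start of image marker, leaving every other byte in its original order. This is an illustration of the byte swap rather than the tool we used, and the function name is hypothetical.

```python
import struct

SOI = b"\xff\xd8"  # start of image marker
APP0 = 0xE0        # second byte of the JFIF application marker (0xFFE0)
SOS = 0xDA         # second byte of the start of scan marker (0xFFDA)


def move_app0_after_soi(data):
    """Return a copy of data with the APP0 segment placed right after SOI."""
    if data[:2] != SOI:
        raise ValueError("missing SOI marker (0xFFD8)")
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length  # two marker bytes plus the counted length
        if marker == APP0:
            segment = data[pos:end]
            # SOI, then APP0, then whatever used to precede it, then the rest.
            return SOI + segment + data[2:pos] + data[end:]
        if marker == SOS:
            break
        pos = end
    raise ValueError("no APP0 segment found before start of scan")
```

Because bytes are only reordered, never added or removed, the file length is unchanged and the move can be undone exactly.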

Testing this as a fix meant first checking it against JHOVE – it was ‘Well-formed and valid’ – our APP0 change had the desired cascade effect on the other two errors (3).

For our pre-conditioning provenance note we could accurately describe a reversible fix with the following paragraph:

PRECONDITIONING BYTESTREAM 001: Bytes Swapped. JFIF application segment (18 bytes, beginning 0xFFE0) moved directly in front of Photoshop IRB segment (beginning 0xFFDB) and directly after the start of image marker (0xFFD8). MD5 (Original): 02a1b0254fe08cf07e03dde6819a63a9 becomes MD5 (pre-conditioned): 7f0b14e62afb8a02fe101faee39a324c. Modified Date (original) 10 March 2004 becomes modified date (pre-conditioned)
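The checksums and the reversibility claim in a note like this can be checked mechanically. The sketch below is one way that might be done, not a record of the procedure followed at the time: given the original and pre-conditioned bytes plus the recorded move, it confirms the move produces the pre-conditioned file and that undoing it restores the original. The function names and the offsets passed in are hypothetical.

```python
import hashlib


def move_span(data, start, length, dest):
    """Cut length bytes at start and reinsert them so they begin at dest."""
    span = data[start:start + length]
    rest = data[:start] + data[start + length:]
    return rest[:dest] + span + rest[dest:]


def verify_move(original, preconditioned, start, length, dest):
    """Check a recorded byte move in both directions and report checksums."""
    forward_ok = move_span(original, start, length, dest) == preconditioned
    # Undo: take the span from where it now sits and put it back at start.
    backward_ok = move_span(preconditioned, dest, length, start) == original
    return {
        "md5_original": hashlib.md5(original).hexdigest(),
        "md5_preconditioned": hashlib.md5(preconditioned).hexdigest(),
        "reversible": forward_ok and backward_ok,
    }
```

For the fix described here, the call would take the offset at which the 18-byte JFIF segment was found, a length of 18, and a destination of 2 (directly after the start of image marker).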

Finally, before deposit, working closely with our digital archivist, we could demonstrate clearly that there was no change to the display or intellectual content of the record.

The file was ingested into Rosetta without issue and made available in December 2014; see record number R24684315 in Archives New Zealand’s Archway catalogue.

------

Footnotes

[1] http://fileformats.archiveteam.org/wiki/JPG - Just Solve It entry for JPG, accessed 24 June 2016
[2] http://www.ozhiker.com/electronics/pjmt/jpeg_info/app_segments.html - for example, via http://www.ozhiker.com/, accessed 24 June 2016
[3] Something I think we might spot with other format errors as we continue to understand the output of JHOVE in the community.

------

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.