Mark Jordan edited this page Jun 5, 2013 · 4 revisions

Introduction

LOCKSSdm is an open source tool that enables the preservation of CONTENTdm collections in a Private LOCKSS Network (PLN), and makes the preserved collections available to end users. It has two parts: a CONTENTdm plugin and a proxy script. LOCKSSdm works on CONTENTdm 6.1 or higher, and can be used by local or hosted CONTENTdm sites. The CONTENTdm plugin exposes items to the LOCKSS harvester for preservation in a PLN, and the proxy allows end users to access the items. If a user accesses a CONTENTdm item through the proxy, she is redirected seamlessly to the item in the CONTENTdm instance if the CONTENTdm server is up and responding (which will normally be the case). If CONTENTdm is not responding, the user is presented with the version of the requested item that LOCKSS harvested.

The LOCKSSdm plugin is installed the same way as any other CONTENTdm plugin: administrators define the plugin's configuration options in a PHP file, and then upload this file and the rest of the plugin's files to their CONTENTdm server using the website configuration tools in the administrative interface. Administrators also need to perform one additional step before asking LOCKSS staff to harvest their collections (or before harvesting the content themselves if they run a self-managed network): generate the collection manifests. This is done by running a command-line script (documented in the plugin's README.txt file) and then uploading the resulting manifest files into the plugin's directory on the server. Pregenerating LOCKSS manifests is not strictly required; if the plugin doesn't find any manifests, it will generate them on demand when the LOCKSS harvester asks for them. However, pregenerating manifests is highly recommended, since having the plugin generate them on demand is only suitable for very small numbers of very small CONTENTdm collections.

Just-in-case copies

Even though LOCKSSdm allows collections to be harvested, preserved in a PLN, and accessed by end users, it doesn't make a complete copy of the CONTENTdm collection. Each item is preserved as a stripped-down, simplified version of the original. LOCKSSdm doesn't reproduce CONTENTdm's search functionality either; instead, it provides only a rudimentary way to browse all the preserved collections and to browse a single collection.

Keep in mind that a user may never need to see the LOCKSS version of an item or collection. LOCKSSdm provides a "just-in-case" copy of items in CONTENTdm, not a full-featured replacement for CONTENTdm's native interface. End users only see the LOCKSSdm version of your collections if the CONTENTdm server is down. The proxy script ensures that when the CONTENTdm server is responding normally (which should be 99.999% of the time), the user is redirected to CONTENTdm's version of the requested item transparently.

When the LOCKSS harvester crawls the manifests for a site's collections, the CONTENTdm plugin exposes a simplified version of each item in the collection for harvesting. This version contains the metadata for the item (including any full text), its thumbnail (or for a compound item, the thumbnails for all the child items), and a note to the user that she is "viewing a simplified, backup-friendly version of our CONTENTdm collections" (this text can be configured by individual libraries). When the user clicks on the thumbnail, the full-sized version of the file displays. If this file is an image or a PDF, it displays immediately. If it's a movie or an audio file, the user's browser plays the file with whatever application it is configured to use for the file's type/format. LOCKSSdm does not provide an embedded media player the way CONTENTdm or a content management system does.

The following two screenshots illustrate the differences between the version of an item in CONTENTdm and its simplified LOCKSSdm version. The first shows CONTENTdm's native interface for an item containing an image:

[Screenshot: cdm_native_interface 2]

The second one shows the same item as it was harvested by LOCKSS. This is the version that the end user sees when CONTENTdm is down:

[Screenshot: image_lockss 2]

When the user clicks on the thumbnail, she sees the full sized, raw image, with no HTML wrapped around it:

[Screenshot: image_lockss clicked 2]

That actually is the image at its full size (it's not a very big image). The version in the native CONTENTdm interface is larger because CONTENTdm's embedded image viewer is configured to display images so they fill the width of the web page, which, in this case, is 200% the width of the actual image.

LOCKSS needs to harvest a simplified version of CONTENTdm items because CONTENTdm's native interface relies heavily on JavaScript, and LOCKSS has difficulty with rich interfaces that use a lot of JavaScript. If LOCKSS harvested the native versions of CONTENTdm items, the copies of these items served to the end user by LOCKSS would not work properly. As a compromise, LOCKSSdm presents a no-frills yet functional version of the item that will work reliably with the LOCKSS harvester and with the proxy script.

The LOCKSSdm proxy

The proxy is an important component of LOCKSSdm (although it is not required, as described at the end of this section). First, it redirects the user to the native CONTENTdm interface if the CONTENTdm server is up and running. If the CONTENTdm server is down, the proxy detects this and brings the library's PLN LOCKSS box into the loop, showing the user the simplified version it has harvested.
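The redirect-or-fall-back decision described above can be sketched as follows. This is only an illustration, not the code of the actual proxy script: the function names `cdm_is_responding` and `route_request`, the use of a HEAD request, and the 5-second timeout are all assumptions made for the example.

```python
from typing import Callable, Tuple
from urllib.error import URLError
from urllib.request import Request, urlopen


def cdm_is_responding(url: str, timeout: float = 5.0) -> bool:
    """Return True if the CONTENTdm URL answers without an error status.

    Hypothetical health check: the real proxy may test availability differently.
    """
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status < 400
    except (URLError, OSError, ValueError):
        # Covers HTTP errors, refused connections, timeouts and malformed URLs.
        return False


def route_request(
    item_url: str, is_up: Callable[[str], bool] = cdm_is_responding
) -> Tuple[str, str]:
    """Decide how to answer a request for a proxied CONTENTdm item.

    Returns ("redirect", item_url) when CONTENTdm is up, otherwise
    ("serve_preserved", item_url), meaning the harvested copy should be
    fetched from the LOCKSS box and shown to the user.
    """
    if is_up(item_url):
        return ("redirect", item_url)
    return ("serve_preserved", item_url)
```

Passing the availability check in as a parameter keeps the decision logic separate from the network call, so the two branches can be exercised without a live CONTENTdm server.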

Secondly, content preserved in PLNs is by design inaccessible to end users. That's what makes PLNs "private." Each library's LOCKSS box needs to be configured to make an exception for the proxy script (this configuration is documented in the proxy's README.txt file), which in turn provides access to preserved content on behalf of the end user. In effect, the user doesn't request content directly from the LOCKSS box; she requests it through the proxy. The proxy sends the request on behalf of the user to the LOCKSS box, which then sends its preserved content back to the proxy, which then presents it to the end user.

This is exactly how EZproxy works when it is used to provide off-campus access to electronic journals and databases which are restricted to a university's IP addresses. Unlike EZproxy, however, the LOCKSSdm proxy doesn't require users to provide credentials -- in fact, it doesn't support user authentication of any type. If you want to use LOCKSSdm with a CONTENTdm collection that is restricted, you will need to do some custom configuration.

The LOCKSS website provides extensive information on how proxies like EZproxy are used in the Global LOCKSS Network to provide access to publishers' ejournals. The LOCKSSdm proxy works the same way, except that it is tuned specifically to provide access to CONTENTdm collections harvested through the LOCKSSdm CONTENTdm plugin.

Another important function of the LOCKSSdm proxy is that it modifies the content it gets from LOCKSS in several ways before presenting it to the end user. It makes sure all URLs in the CONTENTdm items are rewritten so they will work with the proxy, and it also cleans up the HTML that LOCKSS presents when it encounters an error. The proxy also allows local administrators to add custom HTML markup and CSS files to the proxied content. This feature enables libraries to put back some of the local branding, such as their library's standard header or navigation links, that is stripped out of the simplified copies of items preserved in the PLN, and to customize the message that explains why the user is seeing a simplified version of the CONTENTdm item.
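As a rough illustration of the URL rewriting described above, the sketch below rewrites `href` and `src` attributes that point at the CONTENTdm server so that they pass through the proxy instead. It is not the proxy's actual implementation: the function name `rewrite_links` and the regular-expression approach are assumptions, and the two base URLs are taken from the example URLs on this page.

```python
import re

# Base URLs borrowed from the examples on this page; adjust for a real site.
PROXY_URL = "http://lib-general.lib.sfu.ca/lockssdm/proxy/lockssdm.php"
CDM_BASE = "http://content.lib.sfu.ca"


def rewrite_links(html: str, proxy_url: str = PROXY_URL, cdm_base: str = CDM_BASE) -> str:
    """Route href/src attributes that point at CONTENTdm through the proxy.

    Only double-quoted attribute values that start with cdm_base are changed;
    links to other hosts, and links that were already rewritten, are left alone.
    """
    pattern = re.compile(r'(href|src)="(' + re.escape(cdm_base) + r'[^"]*)"')
    return pattern.sub(
        lambda m: f'{m.group(1)}="{proxy_url}?url={m.group(2)}"', html
    )
```

A production rewriter would more likely use an HTML parser than a regular expression; the regex keeps the sketch short and makes the transformation easy to see.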

Proxied URLs to specific CONTENTdm items simply append the item's 'reference URL' to the proxy script's URL so that they look like this (notice how similar they are to EZproxy's):

http://lib-general.lib.sfu.ca/lockssdm/proxy/lockssdm.php?url=http://content.lib.sfu.ca/cdm/ref/collection/vpl/id/2714

Proxied URLs to CONTENTdm collections look like this:

http://lib-general.lib.sfu.ca/lockssdm/proxy/lockssdm.php?url=http://content.lib.sfu.ca/cdm/landingpage/collection/vpl

Here is what the user sees after following the proxied collection URL:

[Screenshot: single_collection 2]

A proxied link to the entire set of preserved CONTENTdm collections looks like this:

http://lib-general.lib.sfu.ca/lockssdm/proxy/lockssdm.php?url=http://content.lib.sfu.ca

This URL displays a list of all the collections, which looks like this:

[Screenshot: browse_collections 2]

This list is also what the user sees when the LOCKSS box can't find something that the user asks for.
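The three proxied URL forms shown above (item, collection, and whole site) all follow the same pattern: the target CONTENTdm URL is appended to the proxy script's URL as the `url` parameter. A minimal sketch of building such URLs and telling the three kinds of target apart is given below. The helper names `proxied_url` and `classify_target` and the path patterns are assumptions inferred from the example URLs, not taken from the proxy's source.

```python
import re
from urllib.parse import urlparse

# Proxy script location borrowed from the examples on this page.
PROXY_SCRIPT = "http://lib-general.lib.sfu.ca/lockssdm/proxy/lockssdm.php"

# Path shapes inferred from the example URLs (assumed, not authoritative).
ITEM_PATH = re.compile(r"^/cdm/ref/collection/([^/]+)/id/(\d+)/?$")
COLLECTION_PATH = re.compile(r"^/cdm/landingpage/collection/([^/]+)/?$")


def proxied_url(target: str, proxy_script: str = PROXY_SCRIPT) -> str:
    """Append a CONTENTdm URL to the proxy script's URL, as in the examples."""
    return f"{proxy_script}?url={target}"


def classify_target(target: str) -> str:
    """Return "item", "collection" or "site" for a CONTENTdm URL."""
    path = urlparse(target).path
    if ITEM_PATH.match(path):
        return "item"
    if COLLECTION_PATH.match(path):
        return "collection"
    # Anything else is treated as a request for the list of all collections.
    return "site"
```

Treating every unrecognized target as a request for the full collection list mirrors the behaviour described above, where the list is also shown when the LOCKSS box can't find the requested content.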

Providing access via the proxy is not actually required -- all that is required to harvest content into a PLN for preservation is the CONTENTdm LOCKSSdm plugin. The proxy's purpose is to provide access to the preserved items. If a library is only interested in getting their CONTENTdm collections into a PLN, and they don't intend to provide access to those collections to end users, they don't need the proxy.

Why go to all this trouble?

Libraries that use CONTENTdm should really be preserving the high-quality master versions of their CONTENTdm collections, not the versions of those collections they make accessible on the web. The versions libraries make accessible to end users on the web (whether through CONTENTdm or any other web publishing platform) are generally lower-quality derivatives of these master versions. This is particularly true for digitized image collections, probably the most common type of collection libraries make available through CONTENTdm. So it would make sense to preserve the master versions, and not the versions that end users access through CONTENTdm.

However, getting stuff into a LOCKSS PLN requires that libraries package their content so that LOCKSS can harvest it. This means 1) they will need to choose and implement a packaging format and 2) someone will need to write a LOCKSS plugin to harvest the packages. Both of these activities take considerable resources and time. No standardized, off-the-shelf package format exists, although some institutions are starting to use METS in combination with BagIt to create long-term digital preservation packages. In addition, packaging content into preservation-quality formats is only one of many tasks and processes necessary for proactive, systematic, and robust digital preservation.

The benefit of preserving CONTENTdm collections using LOCKSSdm is that it is relatively easy. Libraries don't need to package their content up in any special way, and a LOCKSS plugin already exists for harvesting the material. All they need to do is install a CONTENTdm plugin. An additional benefit of LOCKSSdm is that users can access versions of CONTENTdm items in real time should the CONTENTdm server go down, provided of course that they accessed the item via a proxied URL.

The obvious deficiency of LOCKSSdm is that it doesn't solve the problem of what libraries should do with the high-quality, preservation-friendly master versions of their content. That problem is, conveniently, outside the scope of LOCKSSdm. In some cases, libraries may decide that having the web-facing versions of their content in a PLN is not sufficient or worthwhile and preserve the master versions in their PLN (or through other means) instead, acknowledging the work required to do that. Others may decide that LOCKSSdm offers an easy, sustainable method for populating a PLN with their content. The most appropriate option for a given collection should be determined by a library's digital preservation policies.
