
Storage Expansion Proposal


The Problem

Our current infrastructure limits storage to 50 TB per filesystem mount, and we have applications that will surpass that threshold. Depending on our storage growth needs, we will need to implement a solution, ideally one common to all our applications.

Web Archiving (WARC files)

Currently we have 200k web archive (WARC) files using ~25 TB on a single NFS mount (namely, /web-archiving-stacks/data/collections). Our WARC content is expected to grow by at least ~5 TB a year.

Our current implementation of the local wayback software expects the /web-archiving-stacks location to be a single directory, as do the common-accessioning robots, which put WARC files into that location.

DOR stacks

DOR stacks has a similar problem and does not yet have a solution. It has used roughly half of its storage (25 TB out of 50 TB); its growth rate is not known.

SDR preservation files

SDR also has this problem. They developed custom (i.e., not reusable) application-level code to implement a scheme similar to Proposal A below.

This solution has significant operational management implications, especially with remediation. They currently have several mounts and more than 200TB.

Digital preservation files

The digital preservation group also has several mounts and manually manages their storage needs.

Other content

There may be other applications whose storage is growing at rates that will exceed the 50 TB maximum.

Proposed Solutions

In December 2016, we developed several proposals for increasing the storage available for the WARC files. Here we try to weigh the costs of developing a solution at the application level vs. the system level. Note that this is not necessarily a complete list, but it covers the options we brainstormed and investigated during our sprint.

Proposal A: Sequential multi-volume storage at application-level

Implement a storage plan at the application-layer that uses relatively small (~10TB) NFS mounts that, as a collection, will hold all the WARC files. That is:

  • /web-archiving-stacks/data/disk01
  • /web-archiving-stacks/data/disk02
  • /web-archiving-stacks/data/disk03
  • ...

Implications

The implication is that the wayback software's path-index.txt file, which maps WARC files to filesystem locations, would need to accommodate the multiple partitions.
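
For illustration, assuming the usual tab-separated filename-to-path format of path-index.txt, entries would span partitions something like this (the filenames and druid paths are hypothetical):

    ARCHIVEIT-1234-2015-00001.warc.gz    /web-archiving-stacks/data/disk01/kh149kf8484/ARCHIVEIT-1234-2015-00001.warc.gz
    ARCHIVEIT-5678-2017-00002.warc.gz    /web-archiving-stacks/data/disk02/xy123ab4567/ARCHIVEIT-5678-2017-00002.warc.gz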

The common-accessioning shelving mechanism would also need to accommodate the multiple partitions when processing web crawls, as would any other robot code that reads from the web archiving stacks. Note that common-accessioning actually looks at the contentMetadata for the location of the stacks folder.

Storage logic

The application logic that does the CRUD operations for this scheme would be implemented, ideally as an open-source Ruby gem, so that it can be shared with other projects. We could follow the logic in the SDR preservation core code (specifically, Moab's StorageRepository), that is, roughly the following (a Ruby sketch follows the list below):

  • config: list all partitions in the repository
  • create: use the last partition (assumed to have free space)
  • find: use a key and probe each partition to locate the key (as a top-level folder)
  • update: find the folder and update in place (assumes the partition still has free space)
  • delete: find the folder and delete in place
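
Here is a minimal sketch of that CRUD logic in Ruby; the class and method names are hypothetical, not the actual Moab::StorageRepository API:

    require 'fileutils'

    class PartitionedRepository
      # config: list all partitions in the repository, oldest first
      def initialize(partitions)
        @partitions = partitions  # e.g. ["/web-archiving-stacks/data/disk01", ...]
      end

      # create: always write new objects to the last (newest) partition,
      # which is assumed to have free space
      def create(key)
        dir = File.join(@partitions.last, key)
        FileUtils.mkdir_p(dir)
        dir
      end

      # find: probe each partition for the key as a top-level folder
      def find(key)
        @partitions.map { |p| File.join(p, key) }.find { |dir| Dir.exist?(dir) }
      end

      # update: update in place on whichever partition holds the object
      # (assumes that partition still has free space)
      def update(key)
        find(key) or raise "#{key} not found in any partition"
      end

      # delete: remove the object's folder in place
      def delete(key)
        dir = find(key)
        FileUtils.rm_rf(dir) if dir
      end
    end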

Over time, when space is needed, we would:

  1. create the new partition
  2. update the configuration to list that storage partition in the repository
  3. restart applications

The guideline from SDR is to add another partition when the latest partition reaches 75% capacity. This leaves headroom for updates in place, because once an object is assigned to a partition, it cannot be moved except with manual intervention.
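
A hedged sketch of the capacity check behind that guideline (the path glob and threshold handling are assumptions, not existing code), using GNU df:

    # Warn when the newest partition passes 75% use, so a new one can be provisioned.
    def partition_usage(path)
      # `df --output=pcent` prints the use percentage of the filesystem containing `path`
      `df --output=pcent #{path}`.lines.last.strip.delete('%').to_i
    end

    latest = Dir.glob('/web-archiving-stacks/data/disk*').sort.last
    usage  = partition_usage(latest)
    warn "provision a new partition: #{latest} is at #{usage}%" if usage >= 75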

Shelving logic

Note that the shelve step in DOR's common-accessioning uses contentMetadata XML to determine where to shelve the item (by collection druid, then item druid). This is hardcoded in two XSLT files -- one for public content and another for dark.

An example of the content metadata:

<contentMetadata type="file" stacks="/web-archiving-stacks/data/collections/kh149kf8484" id="druid:pf139tj8228">
 ...
</contentMetadata>

and a snippet from the XSLT:

     <xsl:attribute name="stacks">
        <xsl:value-of select="concat('/web-archiving-stacks/data/collections/',collectionId)"/>
     </xsl:attribute>

If we were to follow the SDR scheme, we would need to hardcode the partitions within the contentMetadata.
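
For example (hypothetical; disk02 stands in for whichever partition the object happened to be assigned to), the stacks attribute would then carry the partition:

<contentMetadata type="file" stacks="/web-archiving-stacks/data/disk02/kh149kf8484" id="druid:pf139tj8228">
 ...
</contentMetadata>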

Implications

  • The operational management costs are significant, especially for remediation; it may require tools for migrating objects from an older partition to a newer one, plus manual intervention and monitoring.
  • As mounts proliferate, we have operational costs to manage those mounts. For example, we currently have 50 NFS mounts on our infrastructure workflow (sul-robots1-prod) VM.
  • The implementation may not be reusable for other and/or future applications. Designing and developing a more generic application-level solution would be an extra cost for us, since other applications, like stacks and SDR, have differing requirements.
  • There are several limitations to this approach, such as hardcoding data locations in contentMetadata, which may make moving to a different implementation later more difficult.
  • This is a custom application-level solution that requires a non-trivial amount of developer effort.

Costs

The implementation would need a major application development effort involving a dedicated team for multiple sprints.

  • developer effort to create new storage logic (ideally a gem)
  • developer effort to update common accessioning robots
    • How can we change the contentMetadata generator to use the appropriate stacks volume?
    • What if we need to reorganize partitions later?
  • developer effort to update web archiving robots
  • developer effort to update stacks application
    • Do we ever need to update/delete files in the stacks?
  • Are there ANY other applications that need read or read/write access to the web archiving stacks?
  • developer effort to update SDR preservation code to use common gem
  • other?

Open questions

  • Would we want to rename /web-archiving-stacks/data/ to /web-archiving-stacks/data/disk01/? That would involve updating the path-index.txt file, and remediating the DOR objects if we want the contentMetadata/@stacks metadata to stay correct.
  • Can we reuse or version the Moab::StorageRepository class in the moab-versioning gem?
  • Why don't we use /stacks for all the WARCs in the first place?

Proposal B: Distributed file system at system-level (and application-level)

Implement a clustered distributed file system at the system level. Depending on the solution, it may change the API through which applications read/write data (i.e., custom API calls rather than file I/O). Most distributed file systems implement sharding at the block level and manage free space at the node level.

There are a variety of open-source solutions, such as HDFS (Hadoop) or iRODS, and vendor products, like Amazon S3.

Many HDFS deployments, for example, are petabyte-scale on inexpensive disks and hardware, and nodes typically use local disk rather than NFS mounts. HDFS, in particular, provides an NFS gateway that makes HDFS data storage accessible via the Linux filesystem, but the gateway does not support random write workloads.
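
To make the API change concrete, here is a hedged sketch of the same write done via file I/O today vs. through an object-store API, using the aws-sdk-s3 gem as a stand-in; the bucket, key layout, and source path are hypothetical:

    require 'aws-sdk-s3'

    warc_bytes = File.binread('/tmp/example.warc.gz')  # placeholder source file

    # Today: plain file I/O onto an NFS-mounted filesystem.
    File.binwrite('/web-archiving-stacks/data/collections/kh149kf8484/example.warc.gz', warc_bytes)

    # With an object store such as S3, the same write goes through the storage API instead.
    s3 = Aws::S3::Client.new(region: 'us-west-2')
    s3.put_object(
      bucket: 'web-archiving-stacks',        # hypothetical bucket
      key:    'kh149kf8484/example.warc.gz', # hypothetical key layout
      body:   warc_bytes
    )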

Implications

This is a non-trivial amount of effort for both systeam and the application developers.

  • Applications would need to use the storage API, not the native Linux filesystem. We'd need to investigate whether we could implement an NFS or FUSE gateway.
  • For an open source solution, we would manage our own storage clusters and software on an ongoing basis.
  • For a vendor solution, we would need to manage that relationship and monetary costs.

Costs

The implementation would need a major application development effort involving a dedicated team for multiple sprints.

  • vendor software, if chosen: purchase vs. maintain on our own
  • open source software: manage our own storage clusters and the software
  • systeam effort to implement NFS or FUSE gateway to this storage
  • developer effort for updating all access to use new storage API
    • WARC files
    • stacks files
    • SDR preservation files
    • other?

Open questions

  • We could implement a middleware layer, a Storage Web Service, that generically provides a repo-based read/write file service: a shim in front of the actual storage API, designed to let us replace the underlying storage solution in the future (see the sketch below).
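
A minimal sketch of what such a shim's interface might look like; all names here are hypothetical:

    # Applications call this interface; the adapter behind it (NFS partitions,
    # HDFS, S3, ...) can be swapped without touching callers.
    class StorageService
      def initialize(adapter)
        @adapter = adapter  # e.g. a partition-aware repository or an S3-backed adapter
      end

      def read(druid, filename)
        @adapter.read(druid, filename)
      end

      def write(druid, filename, bytes)
        @adapter.write(druid, filename, bytes)
      end

      def delete(druid, filename)
        @adapter.delete(druid, filename)
      end
    end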

Proposal C: Virtual mount at system-level

Implement a hardware or system-level solution to provide very large "virtual" mounts. Vendors have solutions that provide multi-PB volumes; for example, NetApp has an "infinite volume" offering, and AWS has Amazon Elastic File System (EFS).

Applications would simply use the filesystem as-is via NFS.

Implications

  • The implementation would be transparent to all our applications.
  • Depending on the vendor, our deployment may be different (e.g., AWS requires AWS deployments).
  • systeam would need to implement and manage the solution at the systems-level, including interacting with UIT.

Costs

Most likely this would be a vendor solution that would cost money and require systeam to manage. The costs would be significant, both for updates to our deployment strategies (if needed) and for infrastructure.