Motivation and Requirements

rolfl edited this page Mar 17, 2012 · 1 revision

Why create another Resolver? It's simple, really: there is a functionality gap between the existing 'static' web catalog concept proposed by OASIS (with its various implementations) and the fully dynamic default resolver available to the parser.

Background

Web Catalog repositories solve the issues of:

  1. not having the required network connections available
  2. the performance cost of generating network traffic for every resolution

The issues with the static catalogs are that:

  1. catalogs may need to be custom-built for your application, because no catalog containing the resource can be found on the web.
  2. once built into a catalog, there is no way to indicate that a resource is out of date.
  3. you may not know ahead of time what resources you need.

The existing default Java EntityResolver solves the problems of:

  1. making any resource available
  2. keeping resources up to date
  3. removing the need to know ahead of time which resources will be needed.

The issues with fully dynamic resources are that:

  1. network connectivity may be unavailable or prohibitively slow.
  2. excessive and wasteful traffic to resource sites (like www.w3c.org)

An intermediate solution is a caching proxy server through which all your applications communicate, reducing traffic to the remote sites. The drawbacks of such servers are that they are not easy to configure, that each application may still need to be configured to use them, and that they are easily bypassed.

A further consideration is that only one EntityResolver can be assigned to a particular parsing process. There is therefore also a functionality gap in building a 'cascading' system that tries multiple resolution sources in turn before falling back to the parser's default resolver.
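A minimal sketch of such a cascade, assuming only the standard SAX `org.xml.sax.EntityResolver` contract, in which returning `null` means "let the parser use its default resolution". The class name `CascadingEntityResolver` is hypothetical and not part of this project's API: it asks each delegate in order and returns the first non-null `InputSource`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/**
 * Hypothetical sketch: presents several resolvers to the parser as one.
 * Delegates are consulted in order; the first non-null answer wins.
 */
final class CascadingEntityResolver implements EntityResolver {
    private final List<EntityResolver> delegates;

    CascadingEntityResolver(List<EntityResolver> delegates) {
        // Defensive copy: later changes to the caller's list do not alter the order.
        this.delegates = new ArrayList<>(delegates);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        for (EntityResolver delegate : delegates) {
            InputSource source = delegate.resolveEntity(publicId, systemId);
            if (source != null) {
                return source; // this delegate knows the entity
            }
        }
        // No delegate matched: null tells the parser to open systemId itself.
        return null;
    }
}
```

The cascade is registered like any single resolver, for example with `XMLReader.setEntityResolver(...)`. Because it is itself an `EntityResolver`, cascades can also be nested.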

What's Needed

The primary goals of this Resolver project will be to:

  1. reduce network usage to a minimum
  2. make (previously stored) resources available even when no network is available.
  3. be easy to configure

In essence, to be successful, this Resolver will need to be so easy and so beneficial to use that using it becomes the 'best' thing to do.

Core assumptions

In order to be useful, and to be 'better' than other solutions, this Resolver will need to:

  1. have no 'installation' procedure: the only steps needed are to add a Jar to the classpath and to reference its classes from your code.
  2. have no 'central' network service - we do not want to have to configure a new 'proxy server'. We want it all to be on the 'client' side.
  3. be able to take advantage of existing solutions (web catalogs, current network availability, etc.)
  4. be extensible, allowing other resolvers to be 'plugged in' where needed.
  5. allow resources to be added and updated as needed.

A typical application that would find this Resolver most useful can be assumed to have these characteristics:

  • there is some disk storage space available (with write permissions)
  • there could be multiple applications running on the same machine.
  • each application could be multi-threaded, with several threads accessing network resources.

To store the 'cached' resources, there must be disk storage available for the cache. To accommodate the cache, and multiple JVMs all accessing the same cache, it must be possible to lock the cache entries individually. The only mechanism Java offers for file locking (Java 5+) is the low-level file locking provided through a FileChannel instance.
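A minimal sketch of per-entry locking with `FileChannel`, assuming each cached resource is stored as a single file: writers take an exclusive lock and readers a shared one. `LockedEntry` is a hypothetical helper name. Note that the JDK documents these locks as held on behalf of the whole JVM rather than individual threads, so they coordinate separate processes but not threads inside one JVM (an overlapping lock request from the same JVM throws `OverlappingFileLockException`).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: reads and writes one cache entry under a FileChannel lock. */
final class LockedEntry {
    private LockedEntry() {}

    /** Replaces the entry's content while holding an exclusive lock on the file. */
    static void write(Path entry, String content) throws IOException {
        try (FileChannel ch = FileChannel.open(entry,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = ch.lock()) { // blocks until no other process holds a lock
            // Truncate only after the lock is granted, never while a locked reader is active.
            ch.truncate(0);
            ByteBuffer buf = ByteBuffer.wrap(content.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // flush to disk before releasing the lock
        }
    }

    /** Reads the entry under a shared lock: other readers may proceed, writers wait. */
    static String read(Path entry) throws IOException {
        try (FileChannel ch = FileChannel.open(entry, StandardOpenOption.READ);
             FileLock lock = ch.lock(0L, Long.MAX_VALUE, true)) {
            // Assumes entries are small (DTDs, schemas), so an int-sized buffer suffices.
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
            while (buf.hasRemaining()) {
                if (ch.read(buf) < 0) {
                    break; // unexpected end of file
                }
            }
            return new String(buf.array(), 0, buf.position(), StandardCharsets.UTF_8);
        }
    }
}
```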

It follows that if we define a file-storage system for the web resources, it becomes a case of saying:

  1. here is a folder of cached web resources
  2. here are some applications that can read and write to this cache.
  3. each application has its own Resolver, but the Resolvers co-ordinate with each other in order to meet the requirements

Thus the cache directory itself becomes the 'core' of the system: it both contains and controls the resources. It also follows that if you do not have access to a cache folder, you essentially cannot use this Resolver.
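One possible way to let the directory itself act as the shared index is a deterministic mapping from a system ID to a file path, so that every JVM pointed at the same folder finds the same entry without a separate registry. The sketch below assumes a hypothetical git-style layout (the SHA-256 digest of the system ID, with its first two hex characters as a sub-folder); both `CacheLayout` and this layout are illustrations, not this project's actual format.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical cache layout: a system ID maps to
 * root/<first 2 hex chars>/<remaining 62 hex chars> of its SHA-256 digest.
 */
final class CacheLayout {
    private final Path root;

    CacheLayout(Path root) {
        this.root = root;
    }

    /** The same system ID yields the same path in every JVM sharing this root. */
    Path entryFor(String systemId) {
        String hex = sha256Hex(systemId);
        // Fan out into up to 256 sub-folders so no single directory grows too large.
        return root.resolve(hex.substring(0, 2)).resolve(hex.substring(2));
    }

    private static String sha256Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

Hashing sidesteps characters that are illegal in file names as well as path-length limits. The cost is that the original system ID cannot be read back from the name, so a real cache would likely need to record it alongside each entry.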

Further technical requirements

To co-ordinate between multiple Java applications, some form of control protocol is mandatory so that only one application updates a resource at any one time. The 'obvious' solution is file-based locking. Java natively supports file locking, but not all file systems support it reliably. Using the Resolver on a shared network file system (Samba, NFS) will probably not work, so such file systems are excluded as a 'supported' environment.
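Because a `FileLock` belongs to the whole JVM, an update protocol has to serialise the threads inside a JVM before taking the file lock that excludes other JVMs. The sketch below assumes a hypothetical `EntryUpdater` class, a caller-supplied `Fetcher` standing in for the network download, and a sidecar `.lock` file per entry; the lock lives on the sidecar because the entry itself is replaced by a rename. It only fills missing entries: refreshing an out-of-date entry would follow the same lock, re-check, write, rename sequence. On a file system that cannot rename atomically, `ATOMIC_MOVE` throws `AtomicMoveNotSupportedException`.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch: at most one writer per cache entry, across threads
 * (ReentrantLock) and across JVMs (FileLock on a sidecar ".lock" file).
 */
final class EntryUpdater {
    /** Produces the resource bytes, e.g. by downloading them (hypothetical). */
    interface Fetcher {
        byte[] fetch() throws IOException;
    }

    // One in-process lock per entry; never evicted in this sketch.
    private final ConcurrentMap<Path, ReentrantLock> localLocks = new ConcurrentHashMap<>();

    /** Returns true if this call fetched and stored the entry, false if it already existed. */
    boolean fetchIfMissing(Path entry, Fetcher fetcher) throws IOException {
        Path target = entry.toAbsolutePath().normalize();
        ReentrantLock local = localLocks.computeIfAbsent(target, k -> new ReentrantLock());
        local.lock(); // a FileLock is held by the whole JVM, so threads queue here first
        try {
            Path dir = target.getParent();
            Files.createDirectories(dir);
            Path lockFile = dir.resolve(target.getFileName() + ".lock");
            try (FileChannel channel = FileChannel.open(lockFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                 FileLock processLock = channel.lock()) { // other JVMs block here
                if (Files.exists(target)) {
                    return false; // another thread or JVM stored it while we waited
                }
                Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
                try {
                    Files.write(tmp, fetcher.fetch());
                    // The entry appears in a single rename, never half-written.
                    Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
                } finally {
                    Files.deleteIfExists(tmp); // no-op after a successful move
                }
                return true;
            }
        } finally {
            local.unlock();
        }
    }
}
```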

It would have been convenient to use a single archive (zip file, jar, tar or equivalent) as the container, but this is not possible because the entries inside an archive cannot be locked individually.

Additional requirements

The final Resolver tool should 'make life easier' for Java users, who are regularly plagued by issues in this area. In particular:

  • Web catalogs are established and fast, so using them makes sense. The Resolver should make it easy (even easier) to use catalogs before using the disk cache. The assumption is that users build catalogs of static information, and only if those catalogs fail to provide a resource should the web be queried (and only after checking the local cache).
  • The cache can be 'seeded' if necessary by simply copying in a 'seed' cache (or unzipping one, for example).
  • The cache itself can be used to create web catalogs.
  • The Resolver can be supplied with an ordered list of additional resolvers to consult before the cache, making it easier to 'chain' resolvers.
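As an illustration of creating web catalogs from the cache, a cache could emit an OASIS XML Catalog with one `system` entry per cached resource. This sketch assumes a hypothetical index (a `Map` from system ID to cached file) is available, since the cache file names alone may not record the original URL; `CatalogExporter` is likewise a hypothetical name. It uses the JDK's StAX `XMLStreamWriter`, which takes care of escaping attribute values.

```java
import java.io.StringWriter;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical exporter: turns a cache index into an OASIS XML Catalog document. */
final class CatalogExporter {
    static final String CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog";

    private CatalogExporter() {}

    /** index maps each cached system ID to the file holding its content. */
    static String toCatalog(Map<String, Path> index) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument("1.0");
        xml.writeStartElement("catalog");
        xml.writeDefaultNamespace(CATALOG_NS); // children inherit the catalog namespace
        // TreeMap sorts by system ID so the output is stable between runs.
        for (Map.Entry<String, Path> e : new TreeMap<>(index).entrySet()) {
            xml.writeEmptyElement("system");
            xml.writeAttribute("systemId", e.getKey());
            xml.writeAttribute("uri", e.getValue().toUri().toString());
        }
        xml.writeEndElement();
        xml.writeEndDocument();
        xml.close();
        return out.toString();
    }
}
```

Sorting the entries keeps the generated catalog identical between runs for the same cache contents, which makes it easier to diff or to keep under version control.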