
## The problem
Application developers often need to pre-process relatively static input files.  This processing can be time-consuming because of the size of the input file, the complexity of the processing task, or both.  It therefore makes sense to preserve the output of the process and repeat it only when:
* The contents of the input file change
* The input parameters to the process change
* The process algorithm itself changes

## One solution
`cachejar` is designed to provide a generic solution to this problem. Given:
* An application identifier
* A URI or file name
* An object identifier

`cachejar` can determine whether processing is necessary and, if not, can return an image of an object that corresponds to the supplied parameters.

It is up to the software developer to determine what constitutes an "application identifier" -- typically one might use a combination of a package name and a version identifier.

## Overview
`cachejar` includes two global objects:
1. `factory` - a default instance of `CacheFactory` that keeps track of the caches for individual application versions.  By default, `cachejar` caches are located in the OS equivalent of the path `~/.cachejar`.  The cache itself is organized as:
```text
.cachejar
    |
    + <application id 1>
    |       |
    |       + index -- A map that associates a URI or file name, its size and modification time
    |       |           with a list of object-identifier-to-file-name entries
    |       |
    |       + Axxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - pickled object 1
    |       |
    |       + Axxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - pickled object 2
    |       |
    |       + ...
    |
    + <application id 2>
            |
            + ...
```
2. `jar` - a function that returns a `CacheJar` object for a given application id. It is not necessary to hold on to the returned object, as it can be re-fetched whenever needed.

In the example below, we have a class (`MyClass`) that does some sort of processing on a named file or URI, based on the `forward` parameter.  For this example, we just tweak the file name, but in real life the processing would most likely be done on the contents of the file or URI itself.

We have also defined an input processor that returns an instance of `MyClass` for the supplied parameters. It first checks the cache and, if no cached object is found, constructs a new instance and adds it to the cache.


```python
import cachejar

class MyClass:
    def __init__(self, fname: str, forward: bool):
        # Stand-in for real processing: keep the name or reverse it
        self.fname = fname if forward else fname[::-1]

__version__ = "1.3.0"
appid = __name__ + __version__
cachejar.factory.clear(appid)       # Start the example with an empty cache

def process_input(fname: str, forward: bool) -> MyClass:
    # Look for a cached MyClass image built from fname with this 'forward' value
    my_obj = cachejar.jar(appid).object_for(fname, MyClass, forward)
    if not my_obj:
        print(f"Processing {fname}")
        my_obj = MyClass(fname, forward)
        # Cache the object that was just built instead of constructing a second one
        cachejar.jar(appid).update(fname, my_obj, MyClass, forward)
    else:
        print(f"Using cached image for {fname}")
    return my_obj

print(process_input("https://github.com/hsolbrig/cachejar", forward=True).fname)
print(process_input("https://github.com/hsolbrig/cachejar", forward=True).fname)
print(process_input("https://github.com/hsolbrig/cachejar", forward=False).fname)
print(process_input("https://github.com/hsolbrig/cachejar", forward=False).fname)
```

```text
Processing https://github.com/hsolbrig/cachejar
https://github.com/hsolbrig/cachejar
Using cached image for https://github.com/hsolbrig/cachejar
https://github.com/hsolbrig/cachejar
Processing https://github.com/hsolbrig/cachejar
rajehcac/girblosh/moc.buhtig//:sptth
Using cached image for https://github.com/hsolbrig/cachejar
rajehcac/girblosh/moc.buhtig//:sptth
```


## Details
### `CacheFactory`

The `cachejar` package supplies a default instance of `CacheFactory`, which uses the OS-specific equivalent of the path `~/.cachejar` as its root.  The `cachejar.jar` function uses this instance when returning instances of the `CacheJar` class.

You can change the location of the factory root by passing a different directory to the `CacheFactory` constructor.  If you wish to change it globally for the entire application, assign the new factory to the global `cachejar.factory`.

`CacheFactory.cache_root` returns the current cache root directory, and `CacheFactory.cache_directory` returns the directory being used to cache a given application id (if any).

```python
import cachejar
from cachejar.jar import CacheFactory

print(cachejar.factory.cache_root)
jar = cachejar.jar('someapp')       # Create a local cache
print(cachejar.factory.cache_directory('someapp'))

local_factory = CacheFactory('/tmp/caches')
print(local_factory.cache_root)
jar = local_factory.cachejar('someapp')
print(local_factory.cache_directory('someapp'))
# You can do this, but once done it holds for the life of the application (and Jupyter has
# a life that extends well beyond this cell)
# cachejar.factory = local_factory
```

```text
/Users/mrf7578/.cachejar
/Users/mrf7578/.cachejar/someapp
/tmp/caches
/tmp/caches/someapp
```


You can clear an application's cache with the `CacheFactory.clear` method:

```python
print(cachejar.factory.cache_directory('someapp'))
cachejar.factory.clear('someapp')                            # Remove all contents
print(cachejar.factory.cache_directory('someapp'))
cachejar.factory.clear('someapp', remove_completely=True)    # Remove all knowledge
print(cachejar.factory.cache_directory('someapp'))
```

```text
/Users/mrf7578/.cachejar/someapp
/Users/mrf7578/.cachejar/someapp
None
```


Caching can be disabled globally (on the factory) or locally (on an individual jar).  When caching is disabled, no objects are written to or read from the cache.

```python
jar = cachejar.jar('someapp')
print(f"Factory: {cachejar.factory.disabled}")
print(f"Jar: {jar.disabled}\n")

cachejar.factory.disabled = True
print(f"Factory: {cachejar.factory.disabled}")
print(f"Jar: {jar.disabled}\n")

jar.disabled = True
print(f"Factory: {cachejar.factory.disabled}")
print(f"Jar: {jar.disabled}\n")

cachejar.factory.disabled = False
print(f"Factory: {cachejar.factory.disabled}")
print(f"Jar: {jar.disabled}")
jar.disabled = False
```

### `CacheJar`

A `CacheJar` instance manages the cached objects for a given application.  Its constructor
automatically creates a directory with the name of the application id and a JSON index file within it.

#### `CacheJar.update`
The `CacheJar.update` method takes:
1. the name of a file, directory or URL
2. an object that has been loaded from that file
3. an object identifier (typically the object itself)
4. any additional positional or keyword parameters that determined the final contents

It computes a signature for the file or URL.  For a URL, the signature consists of the `Last-Modified`, `Content-Length` and `ETag` headers returned by an HTTP `HEAD` request.  For a file, it consists of the mode, size and modification time.  If cache entries already exist for the file or URL and the signature has changed, all of those entries are removed.  The supplied object is then pickled, saved into an internal file and added to the cache index.
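
As a rough illustration of the signature idea, the sketch below builds a file signature from `os.stat` and a URL signature from the headers of a `HEAD` request. The function names `file_signature` and `url_signature` are hypothetical, and this is not `cachejar`'s actual implementation; it only shows why rewriting a file yields a different signature.

```python
import os
import tempfile
import urllib.request
from typing import Optional, Tuple

def file_signature(path: str) -> Tuple[int, int, float]:
    """Return (mode, size, modification time) for a local file (hypothetical helper)."""
    st = os.stat(path)
    return st.st_mode, st.st_size, st.st_mtime

def url_signature(url: str) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (Last-Modified, Content-Length, ETag) from a HEAD request (hypothetical helper)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        headers = resp.headers
        return headers.get("Last-Modified"), headers.get("Content-Length"), headers.get("ETag")

# Demonstrate that rewriting a file with different content changes its signature
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.txt")
    with open(path, "w") as f:
        f.write("first version")
    before = file_signature(path)
    with open(path, "w") as f:
        f.write("a second, longer version")
    after = file_signature(path)

print(before != after)   # True: at minimum the size differs
```

Only the local-file path is exercised here; `url_signature` needs network access and a server that supports `HEAD`.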

#### `CacheJar.object_for`
The `CacheJar.object_for` method takes:
1. the name of a file, directory or URL
2. an object identifier (typically the object itself)
3. any additional positional or keyword parameters that determined the final contents

As with the `update` method, a signature is computed and the cache entries for the file or URL are flushed if the signature has changed.  Otherwise, if a pickled file exists for that signature, object identifier and set of parameters, the pickled file is reconstituted into the corresponding object.
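
The update and lookup rules described above can be modelled with a small in-memory stand-in. `ToyJar` is a hypothetical class written for this illustration, not part of `cachejar`: it takes the signature as an explicit argument instead of computing it, and it keeps pickled bytes in a dictionary rather than in files.

```python
import pickle
from typing import Any, Dict, Optional, Tuple

class ToyJar:
    """In-memory model of a cache jar (hypothetical; not cachejar's implementation)."""

    def __init__(self) -> None:
        # name -> (signature, {entry key: pickled object})
        self._index: Dict[str, Tuple[Any, Dict[str, bytes]]] = {}

    @staticmethod
    def _key(obj_id: Any, parms: tuple, kwparms: dict) -> str:
        # One entry per (object identifier, positional parms, keyword parms)
        ident = getattr(obj_id, "__qualname__", obj_id)
        return repr((ident, parms, sorted(kwparms.items())))

    def update(self, name: str, signature: Any, obj: Any, obj_id: Any,
               *parms: Any, **kwparms: Any) -> None:
        old_sig, entries = self._index.get(name, (signature, {}))
        if old_sig != signature:
            entries = {}          # Source changed: drop every entry for this name
        entries[self._key(obj_id, parms, kwparms)] = pickle.dumps(obj)
        self._index[name] = (signature, entries)

    def object_for(self, name: str, signature: Any, obj_id: Any,
                   *parms: Any, **kwparms: Any) -> Optional[Any]:
        if name not in self._index:
            return None
        old_sig, entries = self._index[name]
        if old_sig != signature:
            del self._index[name]  # Flush stale entries
            return None
        blob = entries.get(self._key(obj_id, parms, kwparms))
        return pickle.loads(blob) if blob is not None else None

toy = ToyJar()
toy.update("data.csv", ("sig", 1), [1, 2, 3], list, True)
print(toy.object_for("data.csv", ("sig", 1), list, True))    # [1, 2, 3]
print(toy.object_for("data.csv", ("sig", 1), list, False))   # None: different parameters
print(toy.object_for("data.csv", ("sig", 2), list, True))    # None: signature changed
print(toy.object_for("data.csv", ("sig", 1), list, True))    # None: entries were flushed
```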

#### Other methods
**`CacheJar.disabled`** - if `True`, the jar neither retrieves information from the cache nor stores information in it.  `disabled` can be set to `True` directly, or as a result of setting the `disabled` property on the factory level.  If set on the factory level, the cache can only be re-enabled on the factory level as well.

**`CacheJar.clean`** - force the removal of all cache entries for a given `name_or_url`, of all cache entries for a given `object_id`/`parms`/`kwparms` identifier, or of the intersection of the two if both are supplied.

**`CacheJar.clear`** - remove all the cache entries for the associated application directory.  
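
One way to picture the interaction between the factory-level and jar-level `disabled` flags described above is the sketch below. `SketchFactory` and `SketchJar` are hypothetical classes, and the property logic is an assumption about the behavior rather than the library's code: a jar reports itself disabled if either its own flag or its factory's flag is set.

```python
class SketchFactory:
    """Hypothetical stand-in for a factory holding a global 'disabled' flag."""

    def __init__(self) -> None:
        self.disabled = False

class SketchJar:
    """Hypothetical stand-in for a jar with its own local 'disabled' flag."""

    def __init__(self, factory: SketchFactory) -> None:
        self._factory = factory
        self._disabled = False

    @property
    def disabled(self) -> bool:
        # Disabled if either the factory or the jar itself is disabled
        return self._factory.disabled or self._disabled

    @disabled.setter
    def disabled(self, value: bool) -> None:
        # Setting the jar flag never touches the factory flag
        self._disabled = value

sketch_factory = SketchFactory()
sketch_jar = SketchJar(sketch_factory)

sketch_factory.disabled = True
sketch_jar.disabled = False          # Cannot override the factory-level setting
print(sketch_jar.disabled)           # True

sketch_factory.disabled = False      # Re-enable on the factory level
print(sketch_jar.disabled)           # False
```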
