Caching gateway proposal

Krishna Srinivas edited this page Mar 30, 2017 · 14 revisions

cachedir would look like this:

cachedir/
├── data
│   ├── 17b2ffec941a63916ef621efdc4820672637ed9b22738586fe836bf7510e64da
│   ├── 17b2ffec941a63916ef621efdc4820672637ed9b22738586fe836bf7510e64da.json
│   ├── b88f5b4c71825db7decc7da7d6da2966d62191f701e7f09a745fcb0a3233e07c
│   └── b88f5b4c71825db7decc7da7d6da2966d62191f701e7f09a745fcb0a3233e07c.json
├── format.json
└── tmp
    ├── 3d859699-bf16-4f86-9886-63270cc529db
    ├── 5c7b4062-8776-4c6c-9db3-70908f8dc021
    └── e20f233d-fc39-4a3c-ae1d-d3de7ed24c73

format.json would be:

{
  "version" : 1,
  "format" : "cachefs",
  "time" : creation-time
}
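As a rough illustration of how format.json could be produced and checked, here is a minimal Go sketch. The names cacheFormat, newCacheFormat and isCurrent are hypothetical, and encoding "time" as an RFC 3339 timestamp is an assumption, since the proposal only says creation-time.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// cacheFormat mirrors the format.json layout shown above.
// Assumption: "time" is stored as an RFC 3339 timestamp (time.Time's JSON form).
type cacheFormat struct {
	Version int       `json:"version"`
	Format  string    `json:"format"`
	Time    time.Time `json:"time"`
}

// newCacheFormat returns the descriptor for a freshly created cache directory.
func newCacheFormat(created time.Time) cacheFormat {
	return cacheFormat{Version: 1, Format: "cachefs", Time: created.UTC()}
}

// isCurrent reports whether data parses as a version 1 "cachefs" descriptor.
func isCurrent(data []byte) bool {
	var f cacheFormat
	if err := json.Unmarshal(data, &f); err != nil {
		return false
	}
	return f.Version == 1 && f.Format == "cachefs"
}

func main() {
	out, err := json.MarshalIndent(newCacheFormat(time.Now()), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	fmt.Println("current:", isCurrent(out))
}
```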

Command line will look like:

minio --cache-dir /mnt/cache --cache-max 80 --cache-verify 1d gateway azure

--cache-dir : the cache directory

--cache-max : maximum percentage of the free disk space that the cache dir may use (this can be computed with the help of a statvfs call). Defaults to 80.

--cache-verify : interval used for revalidation of cached objects. For example, with 1d we compare the current time against stat.atime, and if the atime is older than 1 day we revalidate the object using its ETag. The default value is 0, which means never verify.
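The --cache-verify check could be sketched as follows. This is only an illustration: parseVerify and needsRevalidation are hypothetical names, and the "d" suffix is handled by hand on the assumption that values like 1d are accepted, because Go's time.ParseDuration has no day unit. Reading the atime itself is platform specific and is left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseVerify converts a --cache-verify value into a duration.
// "0" means never verify. A "d" suffix means whole days (assumed syntax).
// Any other value is handed to time.ParseDuration (e.g. "12h").
func parseVerify(s string) (time.Duration, error) {
	if s == "0" {
		return 0, nil
	}
	if strings.HasSuffix(s, "d") {
		days, err := strconv.Atoi(strings.TrimSuffix(s, "d"))
		if err != nil || days < 0 {
			return 0, fmt.Errorf("invalid cache-verify value %q", s)
		}
		return time.Duration(days) * 24 * time.Hour, nil
	}
	return time.ParseDuration(s)
}

// needsRevalidation reports whether an object last accessed at atime is older
// than the verify interval and should be revalidated against the backend.
func needsRevalidation(atime, now time.Time, verify time.Duration) bool {
	if verify <= 0 {
		return false // 0 disables verification
	}
	return now.Sub(atime) > verify
}

func main() {
	verify, err := parseVerify("1d")
	if err != nil {
		panic(err)
	}
	now := time.Now()
	fmt.Println(needsRevalidation(now.Add(-36*time.Hour), now, verify)) // accessed 36h ago
	fmt.Println(needsRevalidation(now.Add(-2*time.Hour), now, verify))  // accessed 2h ago
}
```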

The caching software first writes to cachedir/tmp and commits the file to cachedir/data only once we are sure that it holds the complete object. The filenames in tmp are randomly generated UUIDs. When a file is moved from tmp to data, its name becomes the sha256sum of the objectName.
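The write-then-commit flow described above might look like the following Go sketch. The helper names (newUUID, dataName, storeObject) are hypothetical, and the sketch assumes tmp and data live on the same filesystem so that the rename does not turn into a copy.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// newUUID returns a random (version 4) UUID string to use as the tmp file name.
func newUUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// dataName maps an object name to its file name in cachedir/data.
func dataName(objectName string) string {
	sum := sha256.Sum256([]byte(objectName))
	return hex.EncodeToString(sum[:])
}

// storeObject writes content to cachedir/tmp under a UUID name and renames it
// into cachedir/data only after the write has fully succeeded.
func storeObject(cacheDir, objectName string, content []byte) (string, error) {
	id, err := newUUID()
	if err != nil {
		return "", err
	}
	tmpPath := filepath.Join(cacheDir, "tmp", id)
	if err := os.WriteFile(tmpPath, content, 0o644); err != nil {
		os.Remove(tmpPath) // best effort cleanup of a partial file
		return "", err
	}
	dataPath := filepath.Join(cacheDir, "data", dataName(objectName))
	if err := os.Rename(tmpPath, dataPath); err != nil {
		os.Remove(tmpPath)
		return "", err
	}
	return dataPath, nil
}

func main() {
	cacheDir, err := os.MkdirTemp("", "cachedir")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(cacheDir)
	for _, sub := range []string{"tmp", "data"} {
		if err := os.Mkdir(filepath.Join(cacheDir, sub), 0o755); err != nil {
			panic(err)
		}
	}
	p, err := storeObject(cacheDir, "bucket/golden-gate.jpg", []byte("image bytes"))
	if err != nil {
		panic(err)
	}
	fmt.Println("committed to", p)
}
```

Because a file only appears in data through the rename, a reader never sees a half-written object; anything left behind in tmp after a crash can simply be deleted.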

Each cached object will also have a JSON metadata file:

{
  "version" : 1,
  "name" : "bucket/golden-gate.jpg",
  "anonymous" : false,
  "httpMeta" : map[string]string
}

version: version number of this JSON object. It will change if the format of this JSON changes.

name: name of the object

anonymous: indicates whether the object was put in the cache because of an anonymous request. If anonymous is true, the object can be served to anonymous requests even when the backend cloud storage is down.

httpMeta: cached response headers.
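For illustration, reading one of these metadata files could be sketched like this. The decodeMeta helper and the header values in the example are hypothetical, and rejecting unknown versions is an assumption about how a version change would be handled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cacheObjectMeta mirrors the per-object JSON metadata file described above.
type cacheObjectMeta struct {
	Version   int               `json:"version"`
	Name      string            `json:"name"`
	Anonymous bool              `json:"anonymous"`
	HTTPMeta  map[string]string `json:"httpMeta"`
}

// decodeMeta parses a metadata file and rejects versions this sketch does not know.
func decodeMeta(data []byte) (cacheObjectMeta, error) {
	var m cacheObjectMeta
	if err := json.Unmarshal(data, &m); err != nil {
		return m, err
	}
	if m.Version != 1 {
		return m, fmt.Errorf("unsupported metadata version %d", m.Version)
	}
	return m, nil
}

func main() {
	// The header values below are made up for the example.
	raw := []byte(`{
  "version": 1,
  "name": "bucket/golden-gate.jpg",
  "anonymous": false,
  "httpMeta": {"Content-Type": "image/jpeg", "ETag": "\"abc123\""}
}`)
	m, err := decodeMeta(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(m.Name, m.Anonymous, m.HTTPMeta["Content-Type"])
}
```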

Rough code:

type CacheObjectMeta struct {
        Version   int               `json:"version"` // integer, matching "version" : 1 in the JSON above
        Name      string            `json:"name"`
        Anonymous bool              `json:"anonymous"`
        HTTPMeta  map[string]string `json:"httpMeta"`
}

// CacheResource implements io.Reader, io.Writer and io.Closer through the embedded file.
type CacheResource struct {
        *os.File
        tmpName    string // UUID in tmp dir
        objectName string // Should be converted to sha256sum when moving file to data dir.
}

type Cache struct {
        path            string
        cacheMaxPercent int
        expiryDays      int
}

func (c Cache) Put(bucket, object string, anonymous bool) (*CacheResource, error) {
        // Create a uuid file in tmp
        // Return CacheResource
}

func (c Cache) Commit(resource *CacheResource, httpMeta map[string]string) error {
        // Move from tmp to data and create its json file.
}

func (c Cache) Get(bucket, object string) (r io.ReadCloser, httpMeta map[string]string, anonymous bool, err error) {
        // Open file from data directory and return it.
}

func (c Cache) Delete(bucket, object string) error {
        // Delete from the data dir
}

func NewCache(cacheDir string, cacheMaxPercent int, expiryDays int) (*Cache, error) {
        // If the previously created format.json is of an older version then clean up the cache directory.
        // Create format.json if it does not exist.
        // Create data and tmp directories.
}

Cache eviction algo:

cacheEvict() {
        cacheMax = cacheMaxPercent
        expiry = time in format.json

        for {
            break loop if disk-used < (80% of (cacheMax% of total-disk-size))
            remove all objects not accessed for "expiry" days
            expiry = expiry / 2
        }
}
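The loop above can be simulated in Go as follows. This is a sketch under stated assumptions, not the proposed implementation: it works on an in-memory list instead of the data directory, treats the cache as the only consumer of the disk, takes the starting expiry as a parameter, and stops once expiry has been halved down to zero so the loop always terminates. The names entry, days, usedBytes and evict are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// entry describes one cached object for the purpose of this simulation.
type entry struct {
	name string
	size int64         // bytes on disk
	idle time.Duration // time since last access (now minus atime)
}

// days converts a whole number of days to a duration.
func days(n int) time.Duration { return time.Duration(n) * 24 * time.Hour }

// usedBytes sums the sizes of all entries.
func usedBytes(entries []entry) int64 {
	var total int64
	for _, e := range entries {
		total += e.size
	}
	return total
}

// evict drops entries idle for longer than expiry, halving expiry after each
// pass, until usage falls below 80% of the cache budget. The budget is
// cacheMaxPercent of totalDisk.
func evict(entries []entry, totalDisk int64, cacheMaxPercent int, expiry time.Duration) []entry {
	limit := totalDisk * int64(cacheMaxPercent) / 100 * 80 / 100
	for usedBytes(entries) >= limit && len(entries) > 0 {
		var kept []entry
		for _, e := range entries {
			if e.idle <= expiry {
				kept = append(kept, e)
			}
		}
		entries = kept
		if expiry == 0 {
			break // cannot tighten the threshold any further
		}
		expiry /= 2
	}
	return entries
}

func main() {
	entries := []entry{
		{"a", 40, days(10)},
		{"b", 30, days(3)},
		{"c", 20, days(1)},
	}
	for _, e := range evict(entries, 100, 80, days(8)) {
		fmt.Println("kept", e.name)
	}
}
```

Halving the threshold means the least recently used objects go first, and fresher objects are only touched if the earlier passes did not free enough space.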

NOTE:

The caching feature would be consumed at the object-handlers layer (the S3 layer), so that caching works for both minio server and minio gateway.