Caching gateway proposal
cachedir would look like this:

```
cachedir/
├── data
│   ├── 17b2ffec941a63916ef621efdc4820672637ed9b22738586fe836bf7510e64da
│   ├── 17b2ffec941a63916ef621efdc4820672637ed9b22738586fe836bf7510e64da.json
│   ├── b88f5b4c71825db7decc7da7d6da2966d62191f701e7f09a745fcb0a3233e07c
│   └── b88f5b4c71825db7decc7da7d6da2966d62191f701e7f09a745fcb0a3233e07c.json
├── format.json
└── tmp
    ├── 3d859699-bf16-4f86-9886-63270cc529db
    ├── 5c7b4062-8776-4c6c-9db3-70908f8dc021
    └── e20f233d-fc39-4a3c-ae1d-d3de7ed24c73
```
format.json would be:

```
{
  "version" : 1,
  "format" : "cachefs",
  "time" : creation-time
}
```
The command line will look like:

```
minio --cache-dir /mnt/cache --cache-max 80 --cache-verify 1d gateway azure
```
- `--cache-dir`: the cache directory.
- `--cache-max`: the maximum percentage of the free disk space that the cache directory may use (this can be calculated with the help of the statvfs call). Defaults to 80%.
- `--cache-verify`: the interval used to revalidate cached objects. For example, with `1d` we compare the current time with the object's `stat.atime`, and if the atime is more than one day old we revalidate the object using its ETag. The default value is 0, which means never verify.
The caching software would first write to `cachedir/tmp`, and only when we are sure that it has the complete object will we commit it to `cachedir/data`. The file names in `tmp` will be randomly generated UUIDs. When a file is moved from `tmp` to `data`, it is renamed to the sha256sum of the object name.
Each cached object will also have a JSON metadata file (the `.json` file next to it in `data`):

```
{
  "version" : 1,
  "name" : "bucket/golden-gate.jpg",
  "anonymous" : false,
  "httpMeta" : map[string]string
}
```
- `version`: version number of the JSON object. It will change if the format of this JSON changes.
- `name`: name of the object.
- `anonymous`: indicates whether the object was put in the cache because of an anonymous request. If `anonymous` is true, the object can be served to anonymous requests when the backend cloud storage is down.
- `httpMeta`: the cached response headers.
Rough code:
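```go
package cache

import (
	"errors"
	"io"
	"os"
)

// errNotImplemented is returned by the stubs below until they are filled in.
var errNotImplemented = errors.New("cache: not implemented")

// CacheObjectMeta is the content of the JSON file kept next to each cached object.
type CacheObjectMeta struct {
	Version   int               `json:"version"`
	Name      string            `json:"name"`
	Anonymous bool              `json:"anonymous"`
	HTTPMeta  map[string]string `json:"httpMeta"`
}

// CacheResource is an object being written into the cache. Embedding
// *os.File makes it an io.Reader, io.Writer and io.Closer.
type CacheResource struct {
	*os.File
	tmpName    string // random UUID file name in the tmp dir
	objectName string // converted to its sha256sum when the file is moved to the data dir
	anonymous  bool   // recorded in the JSON metadata file on Commit
}

// Cache describes a cache directory and its limits.
type Cache struct {
	path            string
	cacheMaxPercent int
	expiryDays      int
}

// Put creates a UUID-named file in tmp and returns it as a CacheResource.
func (c *Cache) Put(bucket, object string, anonymous bool) (*CacheResource, error) {
	return nil, errNotImplemented
}

// Commit moves the file from tmp to data and creates its JSON metadata file.
func (c *Cache) Commit(resource *CacheResource, httpMeta map[string]string) error {
	return errNotImplemented
}

// Get opens the object from the data directory and returns it with its metadata.
func (c *Cache) Get(bucket, object string) (r io.ReadCloser, httpMeta map[string]string, anonymous bool, err error) {
	return nil, nil, false, errNotImplemented
}

// Delete removes the object and its JSON file from the data directory.
func (c *Cache) Delete(bucket, object string) error {
	return errNotImplemented
}

// NewCache prepares cacheDir:
//   - if a previously created format.json is of an older version, clean up the cache directory;
//   - create format.json if it does not exist;
//   - create the data and tmp directories.
func NewCache(cacheDir string, cacheMaxPercent int, expiryDays int) (*Cache, error) {
	return nil, errNotImplemented
}
```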
Cache eviction algorithm:

```
cacheEvict() {
    cacheMax = cacheMaxPercent            // --cache-max, 80 by default
    expiry = now - time in format.json    // age of the cache, in days
    for {
        break loop if disk-used < 80% of (cacheMax% of total-disk-size)
        remove all objects not accessed for "expiry" days
        expiry = expiry / 2
    }
}
```
NOTE: The caching feature would be consumed at the object-handlers layer (the S3 layer), so that caching works for both minio server and minio gateway.
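To illustrate why the handler layer works for both modes, here is a toy Go sketch. `ObjectLayer`, `diskBackend`, `azureBackend` and `objectHandlers` are hypothetical stand-ins, not minio's actual types; a map replaces the on-disk cache, and ETag revalidation and the `anonymous` rule are left out. The point is only that the caching code sits above whatever backend is plugged in, so it does not care whether that backend is local disks or a cloud gateway.

```go
package main

import "fmt"

// ObjectLayer is a hypothetical, minimal stand-in for the interface the
// S3 handlers call; a server backend and a gateway backend both implement it.
type ObjectLayer interface {
	GetObject(bucket, object string) ([]byte, error)
}

// diskBackend and azureBackend are toy backends standing in for
// "minio server" and "minio gateway azure"; calls counts backend hits.
type diskBackend struct{ calls int }
type azureBackend struct{ calls int }

func (d *diskBackend) GetObject(bucket, object string) ([]byte, error) {
	d.calls++
	return []byte("disk:" + bucket + "/" + object), nil
}

func (a *azureBackend) GetObject(bucket, object string) ([]byte, error) {
	a.calls++
	return []byte("azure:" + bucket + "/" + object), nil
}

// objectHandlers holds the cache at the handler layer, so the same caching
// code serves whichever ObjectLayer it is given.
type objectHandlers struct {
	objects ObjectLayer
	cache   map[string][]byte // stand-in for cachedir/data
}

func (h *objectHandlers) getObject(bucket, object string) ([]byte, error) {
	key := bucket + "/" + object
	if data, ok := h.cache[key]; ok {
		return data, nil // cache hit: the backend is not contacted
	}
	data, err := h.objects.GetObject(bucket, object)
	if err != nil {
		return nil, err
	}
	h.cache[key] = data
	return data, nil
}

func main() {
	for _, backend := range []ObjectLayer{&diskBackend{}, &azureBackend{}} {
		h := &objectHandlers{objects: backend, cache: map[string][]byte{}}
		first, _ := h.getObject("bucket", "golden-gate.jpg")
		second, _ := h.getObject("bucket", "golden-gate.jpg")
		fmt.Println(string(first), string(second))
	}
}
```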