From 30464f128b1d31c16c1b798d7b193b90971e1342 Mon Sep 17 00:00:00 2001
From: Paul Khuong
Date: Tue, 7 Sep 2021 21:51:43 -0400
Subject: [PATCH] more documentation

More linkage, and add some sample usage to `lib.rs`.

TESTED=it's all comments.
---
 src/lib.rs           | 246 ++++++++++++++++++++++++++++++++++++++++++-
 src/plain.rs         |  15 +--
 src/raw_cache.rs     |   6 +-
 src/readonly.rs      |  51 ++++++---
 src/second_chance.rs |  11 +-
 src/sharded.rs       |  21 ++--
 src/stack.rs         |  91 +++++++++++-----
 7 files changed, 374 insertions(+), 67 deletions(-)

diff --git a/src/lib.rs b/src/lib.rs
index c9ede62..786fa06 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -1,3 +1,235 @@
+//! Kismet implements multiprocess lock-free[^lock-free-fs]
+//! application-crash-safe (roughly) bounded persistent caches stored
+//! in filesystem directories, with a
+//! [Second Chance (Clock)](https://en.wikipedia.org/wiki/Page_replacement_algorithm#Second-chance)
+//! eviction strategy. The maintenance logic is batched and invoked
+//! at periodic jittered intervals to make sure accesses amortise to a
+//! constant number of filesystem system calls and logarithmic (in the
+//! number of cached files) time complexity. That's good for performance,
+//! and enables lock-freedom,[^unlike-ccache] but does mean that
+//! caches are expected to temporarily grow past their capacity
+//! limits, although rarely by more than a factor of 2 or 3.
+//!
+//! [^lock-free-fs]: Inasmuch as anything that makes a lot of syscalls
+//! can be "lock-free." The cache access algorithms implement a
+//! protocol that makes a bounded number of file open, rename, or link
+//! syscalls; in other words, reads and writes are as wait-free as
+//! these syscalls. The batched second-chance maintenance algorithm,
+//! on the other hand, is merely lock-free: it could theoretically get
+//! stuck if writers kept adding new files to a cache (sub)directory.
+//! Again, this guarantee is in terms of file access and directory
+//! enumeration syscalls, and maintenance is only as lock-free as the
+//! underlying syscalls. However, we can assume the kernel is
+//! reasonably well designed, and doesn't let any sequence of syscalls
+//! hold on to kernel locks forever.
+//!
+//! [^unlike-ccache]: This design choice is different from, e.g.,
+//! [ccache](https://ccache.dev/)'s, which attempts to maintain
+//! statistics per shard with locked files. Under high load, the lock
+//! to update ccache statistics becomes a bottleneck. Yet, despite
+//! taking this hit, ccache batches evictions like Kismet, because
+//! cleaning up a directory is slow; up-to-date access statistics
+//! aren't enough to enforce tight cache capacity limits.
+//!
+//! In addition to constant per-cache space overhead, each Kismet
+//! cache maintains a variable-length [`std::path::PathBuf`] for the
+//! directory, one byte of lock-free metadata per shard, and no other
+//! non-heap resource (i.e., Kismet caches do not hold on to file
+//! objects). This holds for individual cache directories; when
+//! stacking multiple caches in a [`Cache`], the read-write cache and
+//! all constituent read-only caches will each have their own
+//! `PathBuf` and per-shard metadata.
+//!
+//! When a Kismet cache triggers second chance evictions, it will
+//! allocate temporary data. That data's size is proportional to the
+//! number of files in the cache shard subdirectory undergoing
+//! eviction (or the whole directory for a plain unsharded cache), and
+//! includes a copy of the name (without the path prefix) for each
+//! cached file in the subdirectory (or plain cache directory). This
+//! eviction process is linearithmic-time in the number of files in
+//! the cache subdirectory (or directory), and is invoked periodically,
+//! so as to amortise the maintenance time overhead to logarithmic
+//! per write to a cache subdirectory.
+//!
+//! Kismet does not pre-allocate any long-lived file object, so it may
+//! need to temporarily open file objects. However, each call into
+//! Kismet will always bound the number of concurrently allocated file
+//! objects; the current logic never allocates more than two
+//! concurrent file objects.
+//!
+//! The load (number of files) in each cache may exceed the cache's
+//! capacity because there is no centralised accounting, except for
+//! what filesystems provide natively. This design choice forces
+//! Kismet to amortise maintenance calls with randomisation, but also
+//! means that any number of threads or processes may safely access
+//! the same cache directories without any explicit synchronisation.
+//!
+//! Filesystems can't be trusted to provide much; Kismet only relies
+//! on file modification times (`mtime`), and on file access times
+//! (`atime`) that are either less than or equal to the `mtime`, or
+//! greater than the `mtime` (i.e., `relatime` is acceptable). This
+//! implies that cached files should not be linked in multiple Kismet
+//! cache directories. It is however safe to hardlink cached files in
+//! multiple places, as long as the files are not modified, or their
+//! `mtime` otherwise updated, through these non-Kismet links.
+//!
+//! Kismet cache directories are plain (unsharded) or sharded.
+//!
+//! Plain Kismet caches are simply directories where the cache entry for
+//! "key" is the file named "key." These are most effective for
+//! read-only access to cache directories managed by some other
+//! process, or for small caches of up to ~100 cached files.
+//!
+//! Sharded caches scale to higher capacities, by indexing into one of
+//! a constant number of shard subdirectories with a hash, and letting
+//! each shard manage fewer files (ideally 10-100 files). They are
+//! also much less likely to grow to multiples of the target capacity
+//! than plain (unsharded) cache directories.
+//!
+//! Simple usage should be covered by the [`ReadOnlyCache`] or
+//! [`Cache`] structs, which wrap [`plain::Cache`] and
+//! [`sharded::Cache`] in a convenient type-erased interface. The
+//! caches *do not* invoke [`std::fs::File::sync_all`] or [`std::fs::File::sync_data`]:
+//! the caller should sync files before letting Kismet persist them in
+//! a cache if necessary. File synchronisation is not automatic
+//! because it makes sense to implement persistent filesystem caches
+//! that are erased after each boot, e.g., via
+//! [tmpfiles.d](https://www.freedesktop.org/software/systemd/man/tmpfiles.d.html),
+//! or by tagging cache directories with a
+//! [boot id](https://man7.org/linux/man-pages/man3/sd_id128_get_machine.3.html).
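+//!
+//! For example, one way to tie a cache directory to the current boot
+//! on Linux could look like the following sketch (the base path and
+//! helper name are illustrative, not part of Kismet):
+//!
+//! ```no_run
+//! // Derive a per-boot cache directory by appending the Linux boot id
+//! // to an application-chosen base path.
+//! fn per_boot_cache_dir() -> std::io::Result<std::path::PathBuf> {
+//!     let boot_id = std::fs::read_to_string("/proc/sys/kernel/random/boot_id")?;
+//!     Ok(std::path::Path::new("/tmp/my_app_cache").join(boot_id.trim()))
+//! }
+//! ```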
+//!
+//! The cache code also does not sync the parent cache directories: we
+//! assume that it's safe, if unfortunate, for caches to lose data or
+//! revert to an older state after kernel or hardware crashes. In
+//! general, the code attempts to be robust against direct manipulation
+//! of the cache directories. It's always safe to delete cache files
+//! from kismet directories (ideally not recently created files in
+//! `.kismet_temp` directories), and even *adding* files should mostly
+//! do what one expects: they will be picked up if they're in the
+//! correct place (in a plain unsharded cache directory or in the
+//! correct shard subdirectory), and eventually evicted if useless or
+//! in the wrong shard.
+//!
+//! It is however essential to only publish files atomically to the
+//! cache directories, and it probably never makes sense to modify
+//! cached file objects in place. In fact, Kismet always sets files
+//! read-only before publishing them to the cache and always returns
+//! read-only [`std::fs::File`] objects for cached data.
+//!
+//! # Sample usage
+//!
+//! One could access a list of read-only caches with a [`ReadOnlyCache`].
+//!
+//! ```no_run
+//! const NUM_SHARDS: usize = 10;
+//!
+//! let read_only = kismet_cache::ReadOnlyCacheBuilder::new()
+//!     .plain("/tmp/plain_cache")                 // Read first here
+//!     .sharded("/tmp/sharded_cache", NUM_SHARDS) // Then try there.
+//!     .build();
+//!
+//! // Attempt to read the file for key "foo", with primary hash 1
+//! // and secondary hash 2, first from `/tmp/plain_cache`, and then
+//! // from `/tmp/sharded_cache`. In practice, the hashes should
+//! // probably be populated by implementing the `From<&'a T>`
+//! // trait, and passing a `&T` to the cache methods.
+//! read_only.get(kismet_cache::Key::new("foo", 1, 2));
+//! ```
+//!
+//! Read-write accesses should use a [`Cache`]:
+//!
+//! ```no_run
+//! struct CacheKey {
+//!     // ...
+//! }
+//!
+//! fn get_contents(key: &CacheKey) -> Vec<u8> {
+//!     // ...
+//! # unreachable!()
+//! }
+//!
+//! impl<'a> From<&'a CacheKey> for kismet_cache::Key<'a> {
+//!     fn from(key: &CacheKey) -> kismet_cache::Key {
+//!         // ...
+//! # unreachable!()
+//!     }
+//! }
+//!
+//!
+//! // It's easier to increase the capacity than the number of shards,
+//! // so, when in doubt, prefer a few too many shards with a lower
+//! // capacity. It's not incorrect to increase the number of shards,
+//! // but will result in lost cached data (eventually deleted), since
+//! // Kismet does not assign shards with a consistent hash.
+//! const NUM_SHARDS: usize = 100;
+//! const CAPACITY: usize = 1000;
+//!
+//! # fn main() -> std::io::Result<()> {
+//! use std::io::Read;
+//! use std::io::Write;
+//!
+//! let cache = kismet_cache::CacheBuilder::new()
+//!     .sharded_writer("/tmp/root_cache", NUM_SHARDS, CAPACITY)
+//!     .plain_reader("/tmp/extra_cache") // Try to fill cache misses here
+//!     .build();
+//!
+//! let key: CacheKey = // ...
+//! # CacheKey {}
+//!     ;
+//!
+//! // Fetches the current cached value for `key`, or populates it with
+//! // the closure argument if missing.
+//! let mut cached_file = cache
+//!     .ensure(&key, |file| {
+//!         file.write_all(&get_contents(&key))?;
+//!         file.sync_all()
+//!     })?;
+//! let mut contents = Vec::new();
+//! cached_file.read_to_end(&mut contents)?;
+//! # Ok(())
+//! # }
+//! ```
+//!
+//! # Cache directory structure
+//!
+//! Plain (unsharded) cache directories simply store the value for
+//! each `key` under a file named `key`. They also have a single
+//! `.kismet_temp` subdirectory, for temporary files.
+//!
+//! The second chance algorithm relies on mtime / atime (`relatime`
+//! updates suffice), so merely opening a file automatically updates
+//! the relevant read tracking metadata.
+//!
+//! Sharded cache directories store the values for each `key` under
+//! one of two shard subdirectories. The primary and secondary potential
+//! shards are respectively determined by multiplying `Key::hash` and
+//! `Key::secondary_hash` by different odd integers before mapping the
+//! result to the shard universe with fixed-point scaling.
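+//!
+//! As an illustration of that mapping (the constant below is an
+//! arbitrary odd multiplier, not the one Kismet actually uses):
+//!
+//! ```no_run
+//! // Fixed-point scaling: treat `hash * ODD` as a fraction of 2^64,
+//! // then scale by the shard count to get an index in `0..num_shards`.
+//! fn shard_index(hash: u64, num_shards: usize) -> usize {
+//!     const ODD: u64 = 0x9e37_79b9_7f4a_7c15; // illustrative odd constant
+//!     let mixed = hash.wrapping_mul(ODD);
+//!     ((mixed as u128 * num_shards as u128) >> 64) as usize
+//! }
+//! ```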
+//!
+//! Each subdirectory is named `.kismet_$HEX_SHARD_ID`, and contains
+//! cached files with name equal to the cache key, and a
+//! `.kismet_temp` subsubdirectory, just like plain unsharded caches.
+//! In fact, each such shard is managed exactly like a plain cache.
+//!
+//! Sharded caches attempt to balance load between two potential
+//! shards for each cache key in an attempt to make all shards grow at
+//! roughly the same rate. Once all the shards have reached their
+//! capacity, the sharded cache will slowly revert to storing cache
+//! keys in their primary shards.
+//!
+//! This scheme lets plain cache directories easily interoperate with
+//! other programs that are not aware of Kismet, and also lets an
+//! application use the same directory to back both a plain and a
+//! sharded cache (concurrently or not) without any possibility of
+//! collision between cached files and Kismet's internal directories.
+//!
+//! Kismet will always store its internal data in files or directories
+//! that start with a `.kismet` prefix, and cached data lives in
+//! files with names equal to their keys. Since Kismet sanitises
+//! cache keys to forbid them from starting with `.`, `/`, or `\\`, it
+//! is always safe for an application to store additional data in
+//! files or directories that start with a `.`, as long as they do not
+//! collide with the `.kismet` prefix.
 mod cache_dir;
 pub mod plain;
 pub mod raw_cache;
@@ -18,10 +250,16 @@ pub use stack::CacheHitAction;
 /// subdirectory.
 pub const KISMET_TEMPORARY_SUBDIRECTORY: &str = ".kismet_temp";
 
-/// Sharded cache keys consist of a filename and two hash values. The
-/// two hashes should be computed by distinct functions of the key's
-/// name, and each hash function must be identical for all processes
-/// that access the same sharded cache directory.
+/// Cache keys consist of a filename and two hash values. The two
+/// hashes should ideally be computed by distinct functions of the
+/// key's name, but Kismet will function correctly if the `hash` and
+/// `secondary_hash` are the same. Each hash function **must** be
+/// identical for all processes that access the same sharded cache
+/// directory.
+///
+/// The `name` should not be empty nor start with a dot, a forward
+/// slash, or a backslash: caches will reject any operation on such
+/// names with an `ErrorKind::InvalidInput` error.
 #[derive(Clone, Copy, Debug)]
 pub struct Key<'a> {
     pub name: &'a str,
diff --git a/src/plain.rs b/src/plain.rs
index 1895e3c..59c40d2 100644
--- a/src/plain.rs
+++ b/src/plain.rs
@@ -1,11 +1,14 @@
-//! A `plain::Cache` stores all cached file in a single directory, and
-//! periodically scans for evictions with a second chance strategy.
-//! This implementation does not scale up to more than a few hundred
-//! files per cache directory (a `sharded::Cache` can go higher),
-//! but interoperates seamlessly with other file-based programs.
+//! A [`crate::plain::Cache`] stores all cached files in a single
+//! directory (there may also be a `.kismet_temp` subdirectory for
+//! temporary files), and periodically scans for evictions with a
+//! second chance strategy. This implementation does not scale up to
+//! more than a few hundred files per cache directory (a
+//! [`crate::sharded::Cache`] can go higher), but interoperates
+//! seamlessly with other file-based programs that store cache files
+//! in flat directories.
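+//!
+//! For instance, because entries are just files named after their
+//! keys, other programs can read a cached entry directly (the cache
+//! directory below is illustrative):
+//!
+//! ```no_run
+//! # fn main() -> std::io::Result<()> {
+//! // The cached value for key "foo" is the regular file "foo" in the
+//! // cache directory.
+//! let bytes = std::fs::read("/tmp/plain_cache/foo")?;
+//! # let _ = bytes;
+//! # Ok(())
+//! # }
+//! ```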
 //!
 //! This module is useful for lower level usage; in most cases, the
-//! `Cache` is more convenient and just as efficient.
+//! [`crate::Cache`] is more convenient and just as efficient.
 //!
 //! The cache's contents will grow past its stated capacity, but
 //! should rarely reach more than twice that capacity.
diff --git a/src/raw_cache.rs b/src/raw_cache.rs
index 2466eb9..e2c5873 100644
--- a/src/raw_cache.rs
+++ b/src/raw_cache.rs
@@ -1,12 +1,12 @@
 //! The raw cache module manages directories of read-only files
 //! subject to a (batched) Second Chance eviction policy. Calling
-//! `prune` deletes files to make sure a cache directory does not
+//! [`prune`] deletes files to make sure a cache directory does not
 //! exceed its capacity, in file count. The deletions will obey a
 //! Second Chance policy as long as insertions and updates go through
-//! `insert_or_update` or `insert_or_touch`, in order to update the
+//! [`insert_or_update`] or [`insert_or_touch`], in order to update the
 //! cached files' modification times correctly. Opening the cached
 //! file will automatically update its metadata to take that access
-//! into account, but a path can also be `touch`ed explicitly.
+//! into account, but a path can also be [`touch`]ed explicitly.
 //!
 //! This module implements mechanisms, but does not hardcode any
 //! policy... except the use of a second chance strategy.
diff --git a/src/readonly.rs b/src/readonly.rs
index de70d06..8b51443 100644
--- a/src/readonly.rs
+++ b/src/readonly.rs
@@ -4,6 +4,8 @@
 //! and easy-to-use interface that erases the difference between plain
 //! and sharded caches.
 use std::fs::File;
+#[allow(unused_imports)] // We refer to this enum in comments.
+use std::io::ErrorKind;
 use std::io::Result;
 use std::path::Path;
 use std::sync::Arc;
@@ -49,7 +51,7 @@ impl ReadSide for ShardedCache {
     }
 }
 
-/// Construct a `ReadOnlyCache` with this builder. The resulting
+/// Construct a [`ReadOnlyCache`] with this builder. The resulting
 /// cache will access each constituent cache directory in the order
 /// they were added.
 ///
@@ -59,13 +61,20 @@ pub struct ReadOnlyCacheBuilder {
     stack: Vec<Box<dyn ReadSide>>,
 }
 
-/// A `ReadOnlyCache` wraps an arbitrary number of caches, and
-/// attempts to satisfy `get` and `touch` requests by hitting each
-/// constituent cache in order. This interface hides the difference
-/// between plain and sharded cache directories, and should be the
-/// first resort for read-only uses.
+/// A [`ReadOnlyCache`] wraps an arbitrary number of
+/// [`crate::plain::Cache`] and [`crate::sharded::Cache`], and attempts
+/// to satisfy [`ReadOnlyCache::get`] and [`ReadOnlyCache::touch`]
+/// requests by hitting each constituent cache in order. This
+/// interface hides the difference between plain and sharded cache
+/// directories, and should be the first resort for read-only uses.
 ///
 /// The default cache wraps an empty set of constituent caches.
+///
+/// [`ReadOnlyCache`] objects are stateless and cheap to clone; don't
+/// put an [`Arc`] on them. Avoid creating multiple
+/// [`ReadOnlyCache`]s for the same stack of directories: there is no
+/// internal state to maintain, so multiple instances simply waste
+/// memory without any benefit.
 #[derive(Clone, Debug)]
 pub struct ReadOnlyCache {
     stack: Arc<[Box<dyn ReadSide>]>,
 }
@@ -114,7 +123,7 @@ impl ReadOnlyCacheBuilder {
         self
     }
 
-    /// Returns a fresh `ReadOnlyCache` for the builder's search list
+    /// Returns a fresh [`ReadOnlyCache`] for the builder's search list
     /// of constituent cache directories.
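+    ///
+    /// For example, to search a plain cache directory before a
+    /// sharded one (the paths are illustrative):
+    ///
+    /// ```no_run
+    /// // Search the plain directory first, then the sharded one.
+    /// let cache = kismet_cache::ReadOnlyCacheBuilder::new()
+    ///     .plain("/tmp/plain_cache")
+    ///     .sharded("/tmp/sharded_cache", 10)
+    ///     .build();
+    /// # let _ = cache;
+    /// ```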
     pub fn build(self) -> ReadOnlyCache {
         ReadOnlyCache::new(self.stack)
@@ -135,15 +144,19 @@ impl ReadOnlyCache {
     }
 
     /// Attempts to open a read-only file for `key`. The
-    /// `ReadOnlyCache` will query each constituent cache in order of
-    /// registration, and return a read-only file for the first hit.
+    /// [`ReadOnlyCache`] will query each constituent cache in order
+    /// of registration, and return a read-only file for the first
+    /// hit.
     ///
-    /// Fails with `ErrorKind::InvalidInput` if `key.name` is invalid
-    /// (empty, or starts with a dot or a forward or back slash).
+    /// Fails with [`ErrorKind::InvalidInput`] if `key.name` is
+    /// invalid (empty, or starts with a dot or a forward or back slash).
     ///
-    /// Returns `None` if no file for `key` can be found in any of the
-    /// constituent caches, and bubbles up the first I/O error
+    /// Returns [`None`] if no file for `key` can be found in any of
+    /// the constituent caches, and bubbles up the first I/O error
     /// encountered, if any.
+    ///
+    /// In the worst case, each call to `get` attempts to open two
+    /// files for each cache directory in the `ReadOnlyCache` stack.
     pub fn get<'a>(&self, key: impl Into<Key<'a>>) -> Result<Option<File>> {
         fn doit(stack: &[Box<dyn ReadSide>], key: Key) -> Result<Option<File>> {
             for cache in stack.iter() {
@@ -163,14 +176,18 @@ impl ReadOnlyCache {
     }
 
     /// Marks a cache entry for `key` as accessed (read). The
-    /// `ReadOnlyCache` will touch the same file that would be returned
-    /// by `get`.
+    /// [`ReadOnlyCache`] will touch the same file that would be
+    /// returned by `get`.
    ///
-    /// Fails with `ErrorKind::InvalidInput` if `key.name` is invalid
-    /// (empty, or starts with a dot or a forward or back slash).
+    /// Fails with [`ErrorKind::InvalidInput`] if `key.name` is
+    /// invalid (empty, or starts with a dot or a forward or back slash).
     ///
     /// Returns whether a file for `key` could be found, and bubbles
     /// up the first I/O error encountered, if any.
+    ///
+    /// In the worst case, each call to `touch` attempts to update the
+    /// access time on two files for each cache directory in the
+    /// `ReadOnlyCache` stack.
     pub fn touch<'a>(&self, key: impl Into<Key<'a>>) -> Result<bool> {
         fn doit(stack: &[Box<dyn ReadSide>], key: Key) -> Result<bool> {
             for cache in stack.iter() {
diff --git a/src/second_chance.rs b/src/second_chance.rs
index 2610039..06a42b5 100644
--- a/src/second_chance.rs
+++ b/src/second_chance.rs
@@ -1,8 +1,9 @@
-//! The Second Chance or Clock page replacement policy is a simple
-//! approximation of the Least Recently Used policy. Kismet uses the
-//! second chance policy because it can be easily implemented on top
-//! of the usual file modification and access times that we can trust
-//! operating systems to update for us.
+//! The [Second Chance or Clock](https://en.wikipedia.org/wiki/Page_replacement_algorithm#Second-chance)
+//! page replacement policy is a simple approximation of the Least
+//! Recently Used policy. Kismet uses the second chance policy
+//! because it can be easily implemented on top of the usual file
+//! modification and access times that we can trust operating systems
+//! to update for us.
 //!
 //! This second chance implementation is optimised for *batch*
 //! maintenance: the caller is expected to perform a number of
diff --git a/src/sharded.rs b/src/sharded.rs
index bb38167..cd9ca42 100644
--- a/src/sharded.rs
+++ b/src/sharded.rs
@@ -1,13 +1,18 @@
-//! A `sharded::Cache` uses the same basic file-based second chance
-//! strategy as a `plain::Cache`. However, while the simple plain
-//! cache is well suited to small caches (down to 2-3 files, and up
-//! maybe one hundred), this sharded version can scale nearly
-//! arbitrarily high: each shard should have fewer than one hundred or
-//! so files, but there may be arbitrarily many shards (up to
-//! filesystem limits, since each shard is a subdirectory).
+//! A [`crate::sharded::Cache`] uses the same basic file-based second
+//! chance strategy as a [`crate::plain::Cache`]. However, while the
+//! simple plain cache is well suited to small caches (down to 2-3
+//! files, and up to maybe one hundred), this sharded version can scale
+//! nearly arbitrarily high: each shard should have fewer than one
+//! hundred or so files, but there may be arbitrarily many shards (up
+//! to filesystem limits, since each shard is a subdirectory).
+//!
+//! A sharded cache directory consists of shard subdirectories (with
+//! name equal to the shard index printed as `%04x`), each of which
+//! contains the cached files for that shard, under their `key` name,
+//! and an optional `.kismet_temp` subdirectory for temporary files.
 //!
 //! This module is useful for lower level usage; in most cases, the
-//! `Cache` is more convenient and just as efficient.
+//! [`crate::Cache`] is more convenient and just as efficient.
 //!
 //! The cache's contents will grow past its stated capacity, but
 //! should rarely reach more than twice that capacity, especially
diff --git a/src/stack.rs b/src/stack.rs
index dcef42d..fc850e6 100644
--- a/src/stack.rs
+++ b/src/stack.rs
@@ -1,8 +1,9 @@
-//! We expect most callers to interact with Kismet via the `Cache`
-//! struct defined here. A `Cache` hides the difference in behaviour
-//! between plain and sharded caches via late binding, and lets
-//! callers transparently handle misses by looking in a series of
-//! secondary cache directories.
+//! We expect most callers to interact with Kismet via the [`Cache`]
+//! struct defined here. A [`Cache`] hides the difference in
+//! behaviour between [`crate::plain::Cache`] and
+//! [`crate::sharded::Cache`] via late binding, and lets callers
+//! transparently handle misses by looking in a series of secondary
+//! cache directories.
 use std::borrow::Cow;
 use std::fs::File;
 use std::io::Error;
@@ -98,20 +99,29 @@ impl FullCache for ShardedCache {
     }
 }
 
-/// Construct a `Cache` with this builder. The resulting cache will
+/// Construct a [`Cache`] with this builder. The resulting cache will
 /// always first access its write-side cache (if defined), and, on
-/// misses, will attempt to service `get` and `touch` calls by
-/// iterating over the read-only caches.
+/// misses, will attempt to service [`Cache::get`] and
+/// [`Cache::touch`] calls by iterating over the read-only caches.
 #[derive(Debug, Default)]
 pub struct CacheBuilder {
     write_side: Option<Box<dyn FullCache>>,
     read_side: ReadOnlyCacheBuilder,
 }
 
-/// A `Cache` wraps either up to one plain or sharded read-write cache
-/// in a convenient interface, and may optionally fulfill read
+/// A [`Cache`] wraps either up to one plain or sharded read-write
+/// cache in a convenient interface, and may optionally fulfill read
 /// operations by deferring to a list of read-only cache when the
 /// read-write cache misses.
+///
+/// The default cache has no write-side and an empty stack of backup
+/// read-only caches.
+///
+/// [`Cache`] objects are cheap to clone and lock-free; don't put an
+/// [`Arc`] on them.
+/// Avoid opening multiple caches for the same set
+/// of directories: using the same [`Cache`] object improves the
+/// accuracy of the write cache's lock-free in-memory statistics, when
+/// it's a sharded cache.
 #[derive(Clone, Debug, Default)]
 pub struct Cache {
     write_side: Option<Arc<dyn FullCache>>,
@@ -129,7 +139,7 @@ pub enum CacheHit<'a> {
     Secondary(&'a mut File),
 }
 
-/// What to do with a cache hit in a `get_or_update` call?
+/// What to do with a cache hit in a [`Cache::get_or_update`] call?
 pub enum CacheHitAction {
     /// Return the cache hit as is.
     Accept,
@@ -209,7 +219,7 @@ impl CacheBuilder {
         self
     }
 
-    /// Returns a fresh `Cache` for the builder's write cache and
+    /// Returns a fresh [`Cache`] for the builder's write cache and
     /// additional search list of read-only cache directories.
     pub fn build(self) -> Cache {
         Cache {
@@ -225,12 +235,16 @@ impl Cache {
     /// additional read-only cache, in definition order, and return a
     /// read-only file for the first hit.
     ///
-    /// Fails with `ErrorKind::InvalidInput` if `key.name` is invalid
+    /// Fails with [`ErrorKind::InvalidInput`] if `key.name` is invalid
     /// (empty, or starts with a dot or a forward or back slash).
     ///
-    /// Returns `None` if no file for `key` can be found in any of the
+    /// Returns [`None`] if no file for `key` can be found in any of the
     /// constituent caches, and bubbles up the first I/O error
     /// encountered, if any.
+    ///
+    /// In the worst case, each call to `get` attempts to open two
+    /// files for the [`Cache`]'s read-write directory and for each
+    /// read-only backup directory.
     pub fn get<'a>(&self, key: impl Into<Key<'a>>) -> Result<Option<File>> {
         fn doit(
             write_side: Option<&dyn FullCache>,
@@ -257,8 +271,10 @@ impl Cache {
     /// populates the cache with a file filled by `populate`. Returns
     /// a file in all cases (unless the call fails with an error).
     ///
-    /// Fails with `ErrorKind::InvalidInput` if `key.name` is invalid
-    /// (empty, or starts with a dot or a forward or back slash).
+    /// Fails with [`ErrorKind::InvalidInput`] if `key.name` is
+    /// invalid (empty, or starts with a dot or a forward or back slash).
+    ///
+    /// See [`Cache::get_or_update`] for more control over the operation.
     pub fn ensure<'a>(
         &self,
         key: impl Into<Key<'a>>,
@@ -276,12 +292,24 @@ impl Cache {
     /// filled by `populate`; otherwise obeys the value returned by
     /// `judge` to determine what to do with the hit.
     ///
-    /// Fails with `ErrorKind::InvalidInput` if `key.name` is invalid
-    /// (empty, or starts with a dot or a forward or back slash).
+    /// Fails with [`ErrorKind::InvalidInput`] if `key.name` is
+    /// invalid (empty, or starts with a dot or a forward or back slash).
     ///
     /// When we need to populate a new file, `populate` is called with
     /// a mutable reference to the destination file, and the old
-    /// cached file, if available.
+    /// cached file (in whatever state `judge` left it), if available.
+    ///
+    /// See [`Cache::ensure`] for a simpler interface.
+    ///
+    /// In the worst case, each call to `get_or_update` attempts to
+    /// open two files for the [`Cache`]'s read-write directory and
+    /// for each read-only backup directory, and fails to find
+    /// anything. `get_or_update` then publishes a new cached file
+    /// (in a constant number of file operations), but not before
+    /// triggering a second chance maintenance (time linearithmic in
+    /// the number of files in the directory chosen for maintenance,
+    /// but amortised to logarithmic).
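+    ///
+    /// For example, this sketch accepts any cache hit as-is and fills
+    /// misses with fixed contents (the directory, key, and hash
+    /// values are illustrative):
+    ///
+    /// ```no_run
+    /// # fn main() -> std::io::Result<()> {
+    /// use std::io::Write;
+    ///
+    /// let cache = kismet_cache::CacheBuilder::new()
+    ///     .sharded_writer("/tmp/sharded_cache", 10, 1_000)
+    ///     .build();
+    ///
+    /// let file = cache.get_or_update(
+    ///     kismet_cache::Key::new("foo", 1, 2),
+    ///     // Keep whatever we find in the cache.
+    ///     |_hit| kismet_cache::CacheHitAction::Accept,
+    ///     // On a miss, write the new cached file's contents.
+    ///     |file, _old| file.write_all(b"fresh contents"),
+    /// )?;
+    /// # let _ = file;
+    /// # Ok(())
+    /// # }
+    /// ```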
     pub fn get_or_update<'a>(
         &self,
         key: impl Into<Key<'a>>,
@@ -357,13 +385,18 @@ impl Cache {
     /// Inserts or overwrites the file at `value` as `key` in the
     /// write cache directory. This will always fail with
-    /// `Unsupported` if no write cache was defined.
+    /// [`ErrorKind::Unsupported`] if no write cache was defined.
     ///
-    /// Fails with `ErrorKind::InvalidInput` if `key.name` is invalid
+    /// Fails with [`ErrorKind::InvalidInput`] if `key.name` is invalid
     /// (empty, or starts with a dot or a forward or back slash).
     ///
     /// Always consumes the file at `value` on success; may consume it
     /// on error.
+    ///
+    /// Executes in a bounded number of file operations, except for
+    /// the lock-free maintenance, which needs time linearithmic in
+    /// the number of files in the directory chosen for maintenance,
+    /// amortised to logarithmic.
     pub fn set<'a>(&self, key: impl Into<Key<'a>>, value: impl AsRef<Path>) -> Result<()> {
         match self.write_side.as_ref() {
             Some(write) => write.set(key.into(), value.as_ref()),
@@ -376,13 +409,19 @@ impl Cache {
     /// Inserts the file at `value` as `key` in the cache directory if
     /// there is no such cached entry already, or touches the cached
-    /// file if it already exists.
+    /// file if it already exists. This will always fail with
+    /// [`ErrorKind::Unsupported`] if no write cache was defined.
     ///
-    /// Fails with `ErrorKind::InvalidInput` if `key.name` is invalid
+    /// Fails with [`ErrorKind::InvalidInput`] if `key.name` is invalid
     /// (empty, or starts with a dot or a forward or back slash).
     ///
     /// Always consumes the file at `value` on success; may consume it
     /// on error.
+    ///
+    /// Executes in a bounded number of file operations, except for
+    /// the lock-free maintenance, which needs time linearithmic in
+    /// the number of files in the directory chosen for maintenance,
+    /// amortised to logarithmic.
     pub fn put<'a>(&self, key: impl Into<Key<'a>>, value: impl AsRef<Path>) -> Result<()> {
         match self.write_side.as_ref() {
             Some(write) => write.put(key.into(), value.as_ref()),
@@ -393,14 +432,18 @@ impl Cache {
         }
     }
 
-    /// Marks a cache entry for `key` as accessed (read). The `Cache`
+    /// Marks a cache entry for `key` as accessed (read). The [`Cache`]
     /// will touch the same file that would be returned by `get`.
     ///
-    /// Fails with `ErrorKind::InvalidInput` if `key.name` is invalid
+    /// Fails with [`ErrorKind::InvalidInput`] if `key.name` is invalid
     /// (empty, or starts with a dot or a forward or back slash).
     ///
     /// Returns whether a file for `key` could be found, and bubbles
     /// up the first I/O error encountered, if any.
+    ///
+    /// In the worst case, each call to `touch` attempts to update the
+    /// access time on two files for each cache directory in the
+    /// `ReadOnlyCache` stack.
     pub fn touch<'a>(&self, key: impl Into<Key<'a>>) -> Result<bool> {
         fn doit(
             write_side: Option<&dyn FullCache>,