# Distributed File Systems

## What _is_ a file system?
* Contains files and directories
* Abstraction to prevent users from dealing with disk and memory blocks

![](img/typical_file.png)

## What about directories?
* Treated just like files
* But instead of containing file contents in blocks, they contain pointers to file objects
* Really just a special case of files

### Unix File System
* Notion of __file descriptors__: a handle a process uses to access a file
* A process must _open_ a file to obtain a file descriptor before reading or writing
    * The OS is charged with creating an internal data structure for handling these per-process file operations
* `fd=open(name, mode)`
* `fd=creat(name, mode)` (the Unix call is spelled `creat`)
* `status=close(fd)`
* The file descriptor (`fd`) maintains a read-write pointer that points to an offset within the file
* `count=read(fd, buffer, num_bytes)` reads up to `num_bytes` starting at the pointer into the buffer, and auto-advances the pointer by the number of bytes read.
* `count=write(fd, buffer, num_bytes)` likewise writes starting at the pointer and advances the pointer
* `pos=lseek(fd, offset, whence)`
    * Move rw-ptr to that offset
    * `whence` allows this to be a relative move (otherwise absolute)
* `status = link(old_link, new_link)`
    * Creates a new link at second arg to the file at first arg
    * Known as a "hard link"
    * Increments ref count
        * "sym linking" _doesn't_ increment the ref count; instead, creates a file that contains a pointer to the first arg.
* `status = unlink(old_link)` removes the hard link, decrementing the ref count.
* `status = stat(file_name, buffer)` / `status = fstat(fd, buffer)` reads file header attributes into `buffer`; `fstat` takes an open file descriptor instead of a name.
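
The calls above map closely onto Python's `os` module, so the read-write pointer and the hard-link ref count can be observed directly. This is a minimal sketch, assuming a file system that supports hard links; the file names and the helper `demo_unix_calls` are made up for illustration.

```python
import os
import tempfile

def demo_unix_calls(workdir):
    """Exercise open/write/lseek/read/link/stat/unlink and return what was observed."""
    path = os.path.join(workdir, "notes.txt")    # hypothetical file name
    alias = os.path.join(workdir, "alias.txt")   # hypothetical hard-link name

    # open() returns a file descriptor; O_CREAT creates the file if it is missing.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    os.write(fd, b"hello world")             # pointer advances by 11 bytes
    end_pos = os.lseek(fd, 0, os.SEEK_CUR)   # relative move of 0: query the pointer
    os.lseek(fd, 6, os.SEEK_SET)             # absolute move to offset 6
    data = os.read(fd, 5)                    # read 5 bytes starting at the pointer
    os.close(fd)

    os.link(path, alias)                     # hard link: ref count goes up
    links_after_link = os.stat(path).st_nlink
    os.unlink(alias)                         # removing the link: ref count goes down
    links_after_unlink = os.stat(path).st_nlink
    os.unlink(path)
    return end_pos, data, links_after_link, links_after_unlink

with tempfile.TemporaryDirectory() as d:
    result = demo_unix_calls(d)
print(result)
```

Note that `whence` is what distinguishes the two `lseek` calls: `SEEK_CUR` is relative to the current pointer, `SEEK_SET` is absolute.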

## Distributed File Systems
* Files are stored on a server
* Clients must use RPCs to perform operations on files

### Desirable Properties of DFS
* __Transparency__ Client accesses DFS files as if they were local
    * Client API for remote file ops should be the same as local ops
    * Client shouldn't need to care about behind-the-scenes replication, actual file location
* Support concurrent clients
    * Multiple clients may read and write the same file at the same time
* Replication for fault-tolerance

### Concurrent Access in DFS
* __One-copy update__ semantics
    * When a file is replicated, clients may still operate on the file as if it had only one copy.
* At-most-once vs. at-least-once operation semantics
    * Choose carefully
    * At most once: needed for operations that must not be repeated, such as append
    * _Idempotent_ operations have the same effect whether executed once or many times, so they can use at least once
        * Reading from an absolute offset gives the same result, regardless of repeated operations.
* Authentication
    * Verify clients are the users they act on behalf of
* Authorization
    * After a user is authenticated, verify they can operate upon the desired file
    * __Access Control Lists__: _per-file_ list of which users can access that file, and the type of access allowed
        * ? Any different from file permissions?
    * __Capability Lists__ 
        * _per user_ list of files they're allowed to access and type of access allowed
            * Can be split by capability; each contains lists of `(user, file)`
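
The difference between the two authorization structures is only in how the same `(user, file, access)` facts are indexed. A minimal sketch, with made-up users, files, and helper names, that stores an ACL and derives the equivalent capability lists from it:

```python
from collections import defaultdict

# Access control lists: keyed by file, each entry maps users to their rights.
# Users, files, and rights here are invented for illustration.
acl = {
    "report.txt": {"alice": {"read", "write"}, "bob": {"read"}},
    "notes.txt": {"alice": {"read"}},
}

def acl_allows(user, file_name, access):
    """Check a request against the per-file list."""
    return access in acl.get(file_name, {}).get(user, set())

def to_capability_lists(acl):
    """Invert per-file ACLs into per-user capability lists."""
    caps = defaultdict(dict)
    for file_name, users in acl.items():
        for user, rights in users.items():
            caps[user][file_name] = set(rights)
    return dict(caps)

capabilities = to_capability_lists(acl)

def capability_allows(user, file_name, access):
    """Check the same request against the per-user list."""
    return access in capabilities.get(user, {}).get(file_name, set())
```

Both checks give the same answer for any request; the choice affects which lookup is cheap (all users of a file vs. all files of a user).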

## Building a DFS
Codename: "Vanilla DFS"

* Runs on a server and on multiple clients
* Has three processes
    * Flat file service @ server
        * Just one huge list of file IDs
    * Directory service @ server
        * Talks to flat file service
        * "Client of" flat file service
        * Manages and imposes the hierarchical nature of files onto the flat file service
    * Client service @ client
        * Talks to directory service and flat file service
        
### Flat File Service API
* `read(file_id, buffer, position, num_bytes)`
    * Reads `num_bytes` from an absolute position in the file `file_id` into `buffer`
        * Absolute position allows this to be idempotent
    * `file_id` is not a file descriptor; it is a unique ID for that file
    * No file descriptors
        * Server may crash
        * File descriptors imply _state_
* `write(file_id, buffer, position, num_bytes)`
* `create`/`delete(file_id)`
* `get_attributes`/`set_attributes(file_id, buffer)`
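
To see why absolute positions matter, here is a minimal in-memory sketch of a stateless flat file service. The class `FlatFileService`, the ID allocation inside `create()`, and the `append` method are assumptions made for illustration; `append` is deliberately not part of the API above and exists only to contrast with `write`.

```python
class FlatFileService:
    """In-memory stand-in for the server-side flat file service (illustrative only)."""

    def __init__(self):
        self.files = {}      # file_id -> bytearray of contents
        self.next_id = 1

    def create(self):
        # Assumption: the service allocates the unique file ID itself.
        file_id = self.next_id
        self.next_id += 1
        self.files[file_id] = bytearray()
        return file_id

    def read(self, file_id, position, num_bytes):
        # Absolute position: the server keeps no per-client read-write pointer.
        return bytes(self.files[file_id][position:position + num_bytes])

    def write(self, file_id, data, position):
        contents = self.files[file_id]
        if len(contents) < position:
            contents.extend(b"\0" * (position - len(contents)))  # pad any gap
        contents[position:position + len(data)] = data

    def append(self, file_id, data):
        # Not in the API above: shown only because it is NOT idempotent.
        self.files[file_id].extend(data)

service = FlatFileService()
fid = service.create()
for _ in range(3):                 # a client retrying after lost replies
    service.write(fid, b"abc", 0)
after_writes = service.read(fid, 0, 10)
for _ in range(3):                 # the same retries with append
    service.append(fid, b"x")
after_appends = service.read(fid, 0, 10)
```

Repeating the positional `write` leaves the file unchanged after the first call, so at-least-once delivery is safe; repeating `append` grows the file each time, which is why it would need at-most-once semantics.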

### Directory Service API
* `file_id = lookup(dir, file_name)`
    * `file_id` can then be used on the flat file service
* `add_name(dir, file_name, buffer)`
    * Increments ref count
* `un_name(dir, file_name)`
    * Decrements ref count
* `list = get_name(dir, pattern)`
    * Like `ls -al` followed by a `grep`
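
A minimal sketch of the directory service as a mapping from names to file IDs with a per-file ref count. It assumes the third argument of `add_name` carries the `file_id` being named, and that `pattern` is a shell-style glob; both are illustrative guesses, as is the class name.

```python
import fnmatch

class DirectoryService:
    """Maps (directory, file_name) to file_id and tracks how many names refer to each ID."""

    def __init__(self):
        self.dirs = {}        # directory -> {file_name: file_id}
        self.ref_counts = {}  # file_id -> number of names referring to it

    def add_name(self, directory, file_name, file_id):
        entries = self.dirs.setdefault(directory, {})
        if file_name in entries:
            raise FileExistsError(file_name)
        entries[file_name] = file_id
        self.ref_counts[file_id] = self.ref_counts.get(file_id, 0) + 1

    def lookup(self, directory, file_name):
        return self.dirs[directory][file_name]

    def un_name(self, directory, file_name):
        file_id = self.dirs[directory].pop(file_name)
        self.ref_counts[file_id] -= 1
        return self.ref_counts[file_id]   # 0 means no names are left

    def get_name(self, directory, pattern):
        # Glob-style matching over the names in one directory.
        return sorted(fnmatch.filter(self.dirs.get(directory, {}), pattern))

ds = DirectoryService()
ds.add_name("/home", "a.txt", 7)
ds.add_name("/home", "b.log", 8)
ds.add_name("/home", "a_link.txt", 7)    # second name for file 7
txt_names = ds.get_name("/home", "*.txt")
remaining = ds.un_name("/home", "a_link.txt")
```

When `un_name` returns 0, a client of this service could then ask the flat file service to delete the file.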

## NFS (Network File System)

Came out of Sun Microsystems

### Architecture

![NFS Architecture](img/nfs_architecture.png)

#### NFS Client System
* Similar to Vanilla DFS "client"
* Performs RPCs to NFS Server system for DFS ops
* Integrates with the OS kernel
* Optimizations
    * Caching - recently-accessed blocks
        * $T_c$: time of last validation (access)
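        * $T_m$: time of last modification at server; $T_{m_{client}}$ is the value recorded with the cached copy, $T_{m_{server}}$ is the value currently held by the server
        * $t$: Freshness interval. How old you'll tolerate a cache entry to be.
        * Cache entry is valid at current time $T$ if: $(T - T_c < t)$ or $(T_{m_{client}} = T_{m_{server}})$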
        * Delayed-write to server on any write. Update old blocks anyway
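
The validity condition can be written as a short function. This is a sketch under the assumption that asking the server for $T_{m_{server}}$ costs an RPC, so it is only done when the freshness check fails; the function and parameter names are made up.

```python
def cache_entry_valid(now, t_c, freshness, tm_client, fetch_tm_server):
    """Return True if a cached block may be used without refetching it.

    now             -- current time T
    t_c             -- time the entry was last validated (T_c)
    freshness       -- freshness interval t
    tm_client       -- modification time recorded with the cached copy
    fetch_tm_server -- callable returning the server's modification time (an RPC in practice)
    """
    if now - t_c < freshness:
        return True                          # fresh enough: no server round trip
    return tm_client == fetch_tm_server()    # otherwise compare modification times

server_calls = []

def fetch_tm_server():
    server_calls.append(1)                   # count simulated RPCs
    return 100.0

fresh = cache_entry_valid(12.0, 10.0, 3.0, 100.0, fetch_tm_server)
calls_after_fresh = len(server_calls)
stale_unchanged = cache_entry_valid(20.0, 10.0, 3.0, 100.0, fetch_tm_server)
stale_changed = cache_entry_valid(20.0, 10.0, 3.0, 90.0, fetch_tm_server)
```

A smaller `freshness` value gives stronger consistency at the cost of more server round trips.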

#### NFS Server System
* Plays role of both Flat File Service + Directory service from Vanilla DFS
* Allows __mounting__
    * Creates a pointer to another directory, which may be on a remote file system
* Optimizations
    * Server caching - store recently-accessed files & directories in memory
        * Makes the most of locality 
    * Writes 
        * Delayed-write: Write to memory, flush periodically to disk
            * Fast, but a crash loses any writes not yet flushed, so consistency is at risk
        * Write-through: Write to disk before ACK
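
The trade-off between the two write policies can be sketched with a toy server cache in which "disk" is just a second dictionary. The class `ServerWriteCache` and its `crash()` method are illustrative assumptions, not part of NFS.

```python
class ServerWriteCache:
    """Toy model of server-side write handling with a simulated disk."""

    def __init__(self, write_through):
        self.write_through = write_through
        self.memory = {}    # block_id -> data held in RAM
        self.disk = {}      # block_id -> data that survives a crash
        self.dirty = set()  # blocks written to memory but not yet to disk

    def write(self, block_id, data):
        self.memory[block_id] = data
        if self.write_through:
            self.disk[block_id] = data   # on disk before the ACK is sent
        else:
            self.dirty.add(block_id)     # delayed write: flushed later
        return "ACK"

    def flush(self):
        # Periodic flush of delayed writes.
        for block_id in self.dirty:
            self.disk[block_id] = self.memory[block_id]
        self.dirty.clear()

    def crash(self):
        # Everything held only in memory is lost.
        self.memory.clear()
        self.dirty.clear()

delayed = ServerWriteCache(write_through=False)
delayed.write(1, b"data")
delayed.crash()                      # crash before the periodic flush

through = ServerWriteCache(write_through=True)
through.write(1, b"data")
through.crash()
```

After the crash, `delayed.disk` is empty even though the client received an ACK, while `through.disk` still holds the block.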
            
    
#### Virtual File System
* Allows access via file descriptors
    * Local and remote files are indistinguishable
    * For each file op, decides between local file system or NFS client system
* Names all files (local & remote) uniquely using "NFS file handles"
* Keeps a data structure for each mounted file system
* Keeps a data structure called __v-node__ for all open files
    * If local file, `v-node` points to local disk inode
    * If remote, `v-node` contains pointer to remote file system
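
The per-operation dispatch can be sketched as follows. All classes here (`VNode`, `LocalFS`, `NFSClient`, `VirtualFileSystem`) are simplified stand-ins invented for illustration; a real kernel keeps far more in each structure, and the stub NFS client would issue an RPC instead of a dictionary lookup.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class VNode:
    is_remote: bool
    inode: Optional[int] = None         # local file: number of the on-disk inode
    file_handle: Optional[str] = None   # remote file: NFS file handle

class LocalFS:
    """Stand-in local file system: inode number -> contents."""
    def __init__(self, inodes):
        self.inodes = inodes
    def read(self, inode, position, num_bytes):
        return self.inodes[inode][position:position + num_bytes]

class NFSClient:
    """Stand-in NFS client: file handle -> contents (an RPC in a real system)."""
    def __init__(self, remote_files):
        self.remote_files = remote_files
    def read(self, file_handle, position, num_bytes):
        return self.remote_files[file_handle][position:position + num_bytes]

class VirtualFileSystem:
    def __init__(self, local_fs, nfs_client):
        self.local_fs = local_fs
        self.nfs_client = nfs_client
        self.open_files = {}   # file descriptor -> VNode
        self.next_fd = 3       # 0-2 are conventionally stdin/stdout/stderr

    def register(self, vnode):
        """Record an open file and hand back a file descriptor for it."""
        fd = self.next_fd
        self.next_fd += 1
        self.open_files[fd] = vnode
        return fd

    def read(self, fd, position, num_bytes):
        # The caller only sees a descriptor; the v-node decides where to go.
        vnode = self.open_files[fd]
        if vnode.is_remote:
            return self.nfs_client.read(vnode.file_handle, position, num_bytes)
        return self.local_fs.read(vnode.inode, position, num_bytes)

vfs = VirtualFileSystem(LocalFS({12: b"local data"}), NFSClient({"fh-1": b"remote data"}))
local_fd = vfs.register(VNode(is_remote=False, inode=12))
remote_fd = vfs.register(VNode(is_remote=True, file_handle="fh-1"))
local_bytes = vfs.read(local_fd, 0, 5)
remote_bytes = vfs.read(remote_fd, 0, 6)
```

The calling code is identical for both descriptors, which is the transparency property described earlier.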

## AFS (Andrew File System)
Developed at CMU - "Andrew" from Andrew Carnegie

* Two principles
    * Whole file serving
        * Not in blocks; file is the smallest unit
    * Whole file caching
        * "Permanent" cache - on disk
        * Survives reboots
    * Why?
        * Most file accesses are by a single user
        * Most files are small
        * A client cache of <= 100 MB is sufficient
        * Reads much more frequent than writes
* Client is the Venus system
* Server is the Vice system
* R/Ws are *optimistic*
    * Done on local copy of file at client (Venus)
    * On file close, writes are propagated to server (Vice)
* When a client opens a file, the server
    1. Sends the entire file
    1. Gives client a callback promise
        * Promise the server will notify the client if any other clients modify the file
        * Callback promise is binary; it is in one of two states:
            1. valid - file hasn't been modified on server
            1. cancelled - file has been modified; client should retrieve updated file
