This repository has been archived by the owner on Dec 1, 2023. It is now read-only.

add more to architecture document
mjpitz committed Sep 29, 2021
1 parent 2cf267e commit 777efc0
Showing 1 changed file with 112 additions and 36 deletions.
148 changes: 112 additions & 36 deletions ARCHITECTURE.md
@@ -1,14 +1,28 @@
# Architecture
# AetherFS Architecture

* [Background](#background)
* [Motivation](#motivation)
* [Concepts](#concepts)
* [Overview](#overview)
* [Goals](#goals)
* [Features](#features)
* [Implementation](#implementation)
* [Requirements](#requirements)
* [Components](#components)
* [aetherfs-agent](#aetherfs-agent)
* [aetherfs-server](#aetherfs-server)
* [AetherFS Agent](#aetherfs-agent)
* [AetherFS Server](#aetherfs-server)
* [Implementation](#implementation)
* [Interfaces](#interfaces)
* [HTTP File Server](#http-file-server)
* [Agent API](#agent-api)
* [Block API](#block-api)
* [Dataset API](#dataset-api)
* [Configuration](#configuration)
* [Clustering](#clustering)
* [Persistence](#persistence)
* [Caching](#caching)
* [Security & Privacy](#security--privacy)
* [Authentication](#authentication)
* [Authorization](#authorization)
* [Encryption at Rest](#encryption-at-rest)
* [Encryption in Transit](#encryption-in-transit)

## Background

@@ -19,62 +33,124 @@ resilient artifact distribution).

Sometime after Indeed developed RAD internally, we saw a similar system open sourced from [Netflix][] called [Hollow][].
Hollow is a Java library used to distribute in-memory datasets. Unlike RAD's file-system based approach, Hollow stored
everything in S3. However, both of these approaches had their own benefits and trade-offs.

Since leaving, I've often thought about what a modern take on this technology might look like. After spending some time
digging through internals of `git` and `docker`, I made a first pass at this. In the end, I was not satisfied with how
it came out so back to the drawing board I went.
everything in S3. While I have not used Hollow myself, I can see the utility it provides to the Java ecosystem.

[Indeed]: https://www.indeed.com
[RAD]: https://www.youtube.com/watch?v=lDXdf5q8Yw8
[Netflix]: https://netflix.com
[Hollow]: https://github.com/Netflix/hollow

### Motivation

Since leaving Indeed, I've often thought about what a modern take on this technology might look like. In addition to
this curiosity, I've found myself wanting a similar solution that can be used on edge or IoT devices where storage is
limited or non-existent.

### Concepts

**Dataset**

At Indeed, we referred to these as "artifacts," but I often found the term to be too generic in conversation. In
AetherFS, we refer to collections of information as a _dataset_. Datasets can be tagged, similar to containers. This
allows publishers to manage their own history and channels for consumers.

For example, you might maintain a `stable` tag that contains the latest stable version of the dataset. To help insulate
consumers, you might also manage a `next` tag that contains the next version of the dataset. This allows consumers to
follow the `stable` tag in production, and the `next` tag in development.

You can also follow your standard [semantic][] or [calendar][] versioning tags to maintain a history of all versions of the
dataset. This is particularly useful should you need to roll back a change to a dataset.

[semantic]: https://semver.org/
[calendar]: https://calver.org/
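
To make the tagging model concrete, here is a minimal sketch in which a tag is a mutable pointer to a dataset version. The `TagIndex` type, the `Promote` method, and the `manifest-a` / `manifest-b` identifiers are hypothetical names chosen for illustration; they are not part of AetherFS.

```go
package main

import "fmt"

// TagIndex maps a tag name (for example "stable") to the identifier of a
// dataset version. The type and the identifier format are assumptions.
type TagIndex map[string]string

// Promote points the dst tag at whatever the src tag currently references.
func (t TagIndex) Promote(src, dst string) error {
	id, ok := t[src]
	if !ok {
		return fmt.Errorf("unknown tag %q", src)
	}
	t[dst] = id
	return nil
}

func main() {
	tags := TagIndex{
		"v1.0.0": "manifest-a",
		"v1.1.0": "manifest-b",
		"stable": "manifest-a",
		"next":   "manifest-b",
	}

	// Once the candidate has been validated in development, promote it.
	if err := tags.Promote("next", "stable"); err != nil {
		panic(err)
	}
	fmt.Println("stable ->", tags["stable"])

	// Roll back by pointing stable at an earlier version tag.
	if err := tags.Promote("v1.0.0", "stable"); err != nil {
		panic(err)
	}
	fmt.Println("stable ->", tags["stable"])
}
```

In this sketch, version tags such as `v1.0.0` are never reassigned, so a rollback only needs to move the `stable` pointer.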

**BitTorrent**

Indeed's RAD ecosystem used the [BitTorrent][] protocol to replicate information around the world. This was done to
reduce the data load on the producer machine. However, to leverage BitTorrent, Indeed needed to modify the torrent
manifest to propagate the last modified times for a file. This adds a maintenance burden, since it requires maintaining
a [fork][]. The academic community has also adopted BitTorrent: [Florida State University][] uses it to share large
datasets between researchers.

While AetherFS does not use BitTorrent, we do lift some concepts from the protocol. For example, our dataset manifest is
structured much like a BitTorrent manifest because it describes similar data. Similar to BitTorrent, AetherFS chunks
the data into blocks, optimized for storage in [AWS S3][] (or equivalent). When read from S3, we break blocks down into
smaller, cache-optimized blocks. For better performance, we can tier the sizes of our caching layers. This will be
explained in more depth later on.

[Florida State University]: https://web.archive.org/web/20130402200554/https://www.hpc.fsu.edu/index.php?option=com_wrapper&view=wrapper&Itemid=80
[BitTorrent]: https://en.wikipedia.org/wiki/BitTorrent
[fork]: https://github.com/indeedeng/ttorrent
[AWS S3]: https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html
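
The two-level chunking described above can be sketched as follows. This is an illustration only: the `chunk` helper and both block sizes are assumptions (the sizes are deliberately tiny so the output is easy to read), not values taken from AetherFS.

```go
package main

import "fmt"

// chunk splits data into consecutive slices of at most size bytes.
// The returned slices share memory with data; nothing is copied.
func chunk(data []byte, size int) [][]byte {
	if size <= 0 {
		return nil
	}
	var blocks [][]byte
	for len(data) > 0 {
		n := size
		if len(data) < n {
			n = len(data)
		}
		blocks = append(blocks, data[:n])
		data = data[n:]
	}
	return blocks
}

func main() {
	// Hypothetical sizes, kept small for demonstration purposes.
	const storageBlockSize = 8 // size of a block as stored in S3
	const cacheBlockSize = 4   // size of a block as held in a cache tier

	data := []byte("hello, aetherfs blocks")
	for i, block := range chunk(data, storageBlockSize) {
		parts := chunk(block, cacheBlockSize)
		fmt.Printf("storage block %d: %d bytes, %d cache blocks\n", i, len(block), len(parts))
	}
}
```

Only the last block at each level can be shorter than the configured size, which keeps offsets into the dataset easy to compute.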

**Signature**

Each block stored in S3 is given a unique, cryptographic signature that represents the contents of the block (i.e. a
cryptographic hash). Signatures allow clients to check if a block already exists, to download a block, and to upload a
block.
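
As a rough illustration of how signatures support the exists, upload, and download operations, the sketch below derives a signature from a block's contents and uses it as the storage key. SHA-256 is an assumption here (the text only says "a cryptographic hash"), and `blockStore` is a hypothetical in-memory stand-in for S3.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// signature returns the hex-encoded SHA-256 digest of a block.
// The choice of SHA-256 is an assumption made for this sketch.
func signature(block []byte) string {
	sum := sha256.Sum256(block)
	return hex.EncodeToString(sum[:])
}

// blockStore is a hypothetical in-memory stand-in for S3, keyed by signature.
type blockStore map[string][]byte

// Exists reports whether a block with the given signature is already stored.
func (s blockStore) Exists(sig string) bool {
	_, ok := s[sig]
	return ok
}

// Upload stores a copy of the block under its signature. It reports false
// when an identical block was already present, so nothing is stored twice.
func (s blockStore) Upload(block []byte) (sig string, stored bool) {
	sig = signature(block)
	if s.Exists(sig) {
		return sig, false
	}
	s[sig] = append([]byte(nil), block...)
	return sig, true
}

// Download returns the block stored under the given signature, if any.
func (s blockStore) Download(sig string) ([]byte, bool) {
	block, ok := s[sig]
	return block, ok
}

func main() {
	store := blockStore{}

	sig, stored := store.Upload([]byte("block contents"))
	fmt.Println("signature:", sig, "stored:", stored)

	// Uploading identical contents yields the same signature and is skipped.
	_, storedAgain := store.Upload([]byte("block contents"))
	fmt.Println("stored again:", storedAgain)
}
```

Because the key is derived from the contents, identical blocks shared by several datasets or versions would only need to be stored once.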

## Overview

For the most part, this document focuses on the design of an AP system similar to Indeed's RAD system. Instead of using
BitTorrent to replicate data, we opt for a simpler, replicated architecture.
This document focuses on the design of a highly available, partition-tolerant virtual file system for small to medium
datasets.

### Requirements

### Features
- [ ] Efficiently use [AWS S3][] (or equivalent) to store dataset information.
- [ ] Information should be encrypted in transit and at rest.
- [ ] Authenticate clients (users and services) using common schemes (OIDC, Basic).
- [ ] Enforce access controls around datasets.
- [ ] Provide numerous interfaces to manage and access information in the system.
- [ ] Build in developer tools to help artifact producers understand the performance of their datasets.

### Components

- [ ] Ideal for small to medium datasets (KB, MB, and GB; not TB or PB)
- [ ] Data encrypted in transit
- [ ] Authentication and authorization controls over datasets
- [ ] Support for any file type in any language
- [ ] Backed by Amazon [S3 API][] (AWS S3, [MinIO][], etc)
- [ ] Caches hot data amongst cluster peers using [groupcache][]
- [ ] Single agent process with minimal API
- [ ] [Prometheus][] / [Grafana][] for usage tracking
#### AetherFS Agent

[S3 API]: https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html
[MinIO]: https://min.io/
[groupcache]: https://github.com/golang/groupcache
[Prometheus]: https://prometheus.io/
[Grafana]: https://grafana.com/
#### AetherFS Server

## Implementation

<!--
[![](https://mermaid.ink/img/eyJjb2RlIjoiZ3JhcGggVERcbiAgICBwcm9kdWNlclxuICAgIHByb2R1Y2VyLWFnZW50W2FldGhlcmZzLWFnZW50XVxuXG4gICAgY29uc3VtZXJcbiAgICBjb25zdW1lci1hZ2VudFthZXRoZXJmcy1hZ2VudF1cblxuICAgIHNlcnZlci0xW2FldGhlcmZzLXNlcnZlcl1cbiAgICBzZXJ2ZXItMlthZXRoZXJmcy1zZXJ2ZXJdXG4gICAgc2VydmVyLTNbYWV0aGVyZnMtc2VydmVyXVxuXG4gICAgYXdzLXMzW0FXIFMzXVxuXG4gICAgc3ViZ3JhcGggcHJvZHVjZXItcG9kXG4gICAgICAgIHByb2R1Y2VyIC0tIGFldGhlcmZzLmFnZW50LnYxLkFnZW50QVBJL1B1Ymxpc2ggLS0-IHByb2R1Y2VyLWFnZW50XG4gICAgZW5kXG5cbiAgICBzdWJncmFwaCBjb25zdW1lci1wb2RcbiAgICAgICAgY29uc3VtZXIgLS0gYWV0aGVyZnMuYWdlbnQudjEuQWdlbnRBUEkvU3Vic2NyaWJlIC0tPiBjb25zdW1lci1hZ2VudFxuICAgIGVuZFxuXG4gICAgcHJvZHVjZXItYWdlbnQgLS0gYWV0aGVyZnMuZGF0YXNldC52MS5EYXRhc2V0QVBJL1B1Ymxpc2ggLS0-IHNlcnZlci0xXG4gICAgcHJvZHVjZXItYWdlbnQgLS0gYWV0aGVyZnMuYmxvY2sudjEuQmxvY2tBUEkvVXBsb2FkIC0tPiBzZXJ2ZXItMlxuICAgIHByb2R1Y2VyLWFnZW50IC0tPiBzZXJ2ZXItM1xuXG4gICAgY29uc3VtZXItYWdlbnQgLS0-IHNlcnZlci0xXG4gICAgY29uc3VtZXItYWdlbnQgLS0gYWV0aGVyZnMuYmxvY2sudjEuQmxvY2tBUEkvRG93bmxvYWQgLS0-IHNlcnZlci0yXG4gICAgY29uc3VtZXItYWdlbnQgLS0gYWV0aGVyZnMuZGF0YXNldC52MS5EYXRhc2V0QVBJL1N1YnNjcmliZSAtLT4gc2VydmVyLTNcblxuICAgIHNlcnZlci0xIC0tPiBhd3MtczNcbiAgICBzZXJ2ZXItMiAtLT4gYXdzLXMzXG4gICAgc2VydmVyLTMgLS0-IGF3cy1zM1xuIiwibWVybWFpZCI6eyJ0aGVtZSI6ImRlZmF1bHQifSwidXBkYXRlRWRpdG9yIjpmYWxzZSwiYXV0b1N5bmMiOnRydWUsInVwZGF0ZURpYWdyYW0iOmZhbHNlfQ)](https://mermaid-js.github.io/mermaid-live-editor/edit/#eyJjb2RlIjoiZ3JhcGggVERcbiAgICBwcm9kdWNlclxuICAgIHByb2R1Y2VyLWFnZW50W2FldGhlcmZzLWFnZW50XVxuXG4gICAgY29uc3VtZXJcbiAgICBjb25zdW1lci1hZ2VudFthZXRoZXJmcy1hZ2VudF1cblxuICAgIHNlcnZlci0xW2FldGhlcmZzLXNlcnZlcl1cbiAgICBzZXJ2ZXItMlthZXRoZXJmcy1zZXJ2ZXJdXG4gICAgc2VydmVyLTNbYWV0aGVyZnMtc2VydmVyXVxuXG4gICAgYXdzLXMzW0FXIFMzXVxuXG4gICAgc3ViZ3JhcGggcHJvZHVjZXItcG9kXG4gICAgICAgIHByb2R1Y2VyIC0tIGFldGhlcmZzLmFnZW50LnYxLkFnZW50QVBJL1B1Ymxpc2ggLS0-IHByb2R1Y2VyLWFnZW50XG4gICAgZW5kXG5cbiAgICBzdWJncmFwaCBjb25zdW1lci1wb2RcbiAgICAgICAgY29uc3VtZXIgLS0gYWV0aGVyZnMuYWdlbnQudjEuQWdlbnRBUEkvU3Vic2NyaWJlIC0
tPiBjb25zdW1lci1hZ2VudFxuICAgIGVuZFxuXG4gICAgcHJvZHVjZXItYWdlbnQgLS0gYWV0aGVyZnMuZGF0YXNldC52MS5EYXRhc2V0QVBJL1B1Ymxpc2ggLS0-IHNlcnZlci0xXG4gICAgcHJvZHVjZXItYWdlbnQgLS0gYWV0aGVyZnMuYmxvY2sudjEuQmxvY2tBUEkvVXBsb2FkIC0tPiBzZXJ2ZXItMlxuICAgIHByb2R1Y2VyLWFnZW50IC0tPiBzZXJ2ZXItM1xuXG4gICAgY29uc3VtZXItYWdlbnQgLS0-IHNlcnZlci0xXG4gICAgY29uc3VtZXItYWdlbnQgLS0gYWV0aGVyZnMuYmxvY2sudjEuQmxvY2tBUEkvRG93bmxvYWQgLS0-IHNlcnZlci0yXG4gICAgY29uc3VtZXItYWdlbnQgLS0gYWV0aGVyZnMuZGF0YXNldC52MS5EYXRhc2V0QVBJL1N1YnNjcmliZSAtLT4gc2VydmVyLTNcblxuICAgIHNlcnZlci0xIC0tPiBhd3MtczNcbiAgICBzZXJ2ZXItMiAtLT4gYXdzLXMzdFxuICAgIHNlcnZlci0zIC0tPiBhd3MtczNcbiIsIm1lcm1haWQiOiJ7XG4gIFwidGhlbWVcIjogXCJkZWZhdWx0XCJcbn0iLCJ1cGRhdGVFZGl0b3IiOmZhbHNlLCJhdXRvU3luYyI6dHJ1ZSwidXBkYXRlRGlhZ3JhbSI6ZmFsc2V9)
-->

### Components
### Interfaces

#### HTTP File Server

#### Agent API

#### Block API

#### aetherfs-agent
#### Dataset API

The `aetherfs-agent` process is responsible for managing the local file-system. In Kubernetes, this should be run as a
sidecar to the main process. Operations can be performed programmatically. More often, consumers can simply watch the
file system for when a new version of a dataset becomes available.
### Configuration

#### aetherfs-server
#### Clustering

The `aetherfs-server` process translates data stored in S3 to the client. It provides a `DatasetAPI` that allows callers
to resolve information about datasets the user has access to.
<!-- how are clusters of nodes formed -->

#### Persistence

<!-- how and where is information stored -->

#### Caching

<!-- how and where is information cached -->

### Security & Privacy

#### Authentication

<!-- how are users and systems authenticated -->

#### Authorization

#### Encryption at Rest

For the most part, AetherFS expects your small blob storage solution to provide this functionality. After an initial
search, it seemed like most provide some form of encryption at rest.
search, it seemed like most solutions provide some form of encryption at rest.

#### Encryption in Transit
