This repository has been archived by the owner on Dec 1, 2023. It is now read-only.

Commit

minor edits
mjpitz committed Oct 4, 2021
1 parent 784e3ee commit b39efae
Showing 12 changed files with 160 additions and 92 deletions.
8 changes: 4 additions & 4 deletions ARCHITECTURE.md
@@ -6,8 +6,8 @@
* [Overview](#overview)
* [Requirements](#requirements)
* [Components](#components)
* [AetherFS Hub](#aetherfs-hub)
* [AetherFS Agent](#aetherfs-agent)
* [AetherFS Server](#aetherfs-server)
* [Implementation](#implementation)
* [Interfaces](#interfaces)
* [HTTP File Server](#http-file-server)
@@ -112,10 +112,10 @@ AetherFS is distributed as a single binary. Each component provides both a REST
streaming APIs, not all gRPC calls are available on the REST interface. Additionally, the REST interface provides an
[HTTP file server](#http-file-server) where files can be read directly.

#### AetherFS Server
#### AetherFS Hub

The AetherFS server is the primary component in AetherFS. It provides the core interfaces that are leveraged by all other
components in AetherFS. The server is responsible for managing the underlying storage tier and verifying that
The AetherFS Hub is the primary component in AetherFS. It provides the core interfaces that are leveraged by all other
components in AetherFS. The hub is responsible for managing the underlying storage tier and verifying that
authenticated clients have access to the desired dataset.

#### AetherFS Agent
1 change: 1 addition & 0 deletions docs/CNAME
@@ -0,0 +1 @@
aetherfs.tech
46 changes: 46 additions & 0 deletions docs/README.md
@@ -0,0 +1,46 @@
AetherFS assists in the production, distribution, and replication of embedded databases and in-memory datasets. It
provides engineers with a platform to manage collections of files called datasets. AetherFS optimizes its use of the
underlying blob store (AWS S3 or equivalent) to reduce cost to operators and improve performance for end users.

_Why not use S3 directly or a file server?_

While this is an option, there are several problems that arise with this solution. For example, to produce two
references to the same dataset, you must upload the same set of files twice. If you want to produce three references,
then three times (and so on). This adds time to your pipeline and increases your storage costs.

Instead, producers tag datasets in AetherFS. A tag can refer to a specific version ([semantic][] or [calendar][]) or a
channel that consumers can subscribe to (`latest`, `stable`, etc.). Instead of storing entire snapshots of datasets
in each version, AetherFS removes duplicated blocks between them. This allows clients to re-use blocks of data and only
download new or updated portions.

[semantic]: https://semver.org
[calendar]: https://calver.org
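
To illustrate the idea of removing duplicated blocks between versions, here is a minimal sketch. It assumes fixed-size blocks addressed by their SHA-256 hash. The `blockStore` type and its `put` method are hypothetical names used only for this example, and nothing here is meant to describe how AetherFS actually stores or addresses blocks.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blockStore is a hypothetical in-memory stand-in for a blob store.
// Blocks are keyed by the hex-encoded SHA-256 hash of their contents.
type blockStore map[string][]byte

// put splits data into fixed-size blocks and stores only the blocks that are
// not already present. It returns the ordered block hashes that describe the
// data, along with the number of blocks that were newly stored.
func (s blockStore) put(data []byte, blockSize int) (refs []string, added int) {
	for start := 0; start < len(data); start += blockSize {
		end := start + blockSize
		if end > len(data) {
			end = len(data)
		}

		block := data[start:end]
		sum := sha256.Sum256(block)
		key := hex.EncodeToString(sum[:])

		if _, ok := s[key]; !ok {
			// copy the block so the store does not alias the caller's slice
			s[key] = append([]byte(nil), block...)
			added++
		}

		refs = append(refs, key)
	}

	return refs, added
}

func main() {
	store := blockStore{}

	// two "versions" of a dataset that share their first two blocks
	_, first := store.put([]byte("aaaabbbbcccc"), 4)
	_, second := store.put([]byte("aaaabbbbdddd"), 4)

	fmt.Println(first, second) // prints: 3 1
}
```

In this sketch the second version only adds one new block, because the first two blocks are already present under the same hashes. A client holding the first version would likewise only need to fetch the block it does not have.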

## Status

This project is under active development.

- Documentation
- Architecture Document
- Features
- HTTP file server for ease of interaction
- REST and gRPC APIs for programmatic interaction
- Optional agent that can manage a shared file system
- Efficiently persist and query information stored in [AWS S3][] (or compatible)
- Authenticate using common schemes (such as OIDC)
- Enforce access control around datasets
- Encrypt data in transit and at rest
- Built-in developer tools to help understand dataset performance and usage

[AWS S3]: https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html


### Expectations

This is a project I'm mostly iterating on in my free time. It's closed source, and I have no intent to open source it. If
you're interested in learning more or getting updates, please sign up using the link below.


[![Project Interest Form][]](https://forms.gle/uCMy38ZLEchfNuka9)

[Project Interest Form]: https://img.shields.io/badge/-Project%20Interest%20Form-blue?style=for-the-badge
6 changes: 6 additions & 0 deletions docs/_config.yml
@@ -0,0 +1,6 @@
theme: jekyll-theme-minimal
exclude:
- AUTHORS
- LICENSE
title: AetherFS
description: A virtual file system for small to medium sized datasets (MB or GB, not TB or PB).
15 changes: 11 additions & 4 deletions internal/blocks/sizes.go
@@ -1,8 +1,15 @@
package blocks

var (
KiB int32 = 1 << 10
MiB int32 = 1 << 20
type Size int32

const (
Byte Size = 1 << (10 * iota)
Kibibyte
Mebibyte
)

PartSize = 64 * KiB
var (
// PartSize is a cache-optimized length that is used to send and share parts of a block amongst a group of nodes.
// It is also used during uploads and downloads as the segment sizes to avoid buffering gigabytes of data in memory.
PartSize = 64 * Kibibyte
)
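
As a quick check of what the new constants evaluate to, the following sketch repeats the `Size` declarations from this file in a standalone program. The `blockSizeBytes` helper is hypothetical; it only mirrors the `cfg.BlockSize * int32(blocks.Mebibyte)` conversion made in `push.go`, under the assumption that the configured block size is expressed in MiB.

```go
package main

import "fmt"

// Size mirrors the type introduced in internal/blocks/sizes.go.
type Size int32

const (
	// iota is 0, 1, 2 on successive lines, so the shifts are 0, 10, and 20 bits.
	Byte Size = 1 << (10 * iota)
	Kibibyte
	Mebibyte
)

// blockSizeBytes is a hypothetical helper that converts a block size given in
// MiB to bytes, the same way push.go multiplies by int32(blocks.Mebibyte).
func blockSizeBytes(mib int32) int32 {
	return mib * int32(Mebibyte)
}

func main() {
	fmt.Println(Byte, Kibibyte, Mebibyte) // prints: 1 1024 1048576
	fmt.Println(64 * Kibibyte)            // prints: 65536
	fmt.Println(blockSizeBytes(256))      // prints: 268435456
}
```

Because the result is an `int32`, a configured value of 2048 MiB or more would overflow in this conversion. Whether that matters depends on the range of block sizes the CLI is expected to accept, which this diff does not show.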
2 changes: 1 addition & 1 deletion internal/commands/push.go
@@ -90,7 +90,7 @@ func Push() *cli.Command {
// cache some metadata for later on to make things easier
publishRequest := &datasetv1.PublishRequest{
Dataset: &datasetv1.Dataset{
BlockSize: cfg.BlockSize * blocks.MiB,
BlockSize: cfg.BlockSize * int32(blocks.Mebibyte),
},
}

6 changes: 3 additions & 3 deletions internal/commands/run.go
@@ -9,7 +9,7 @@ package commands
import (
"github.com/urfave/cli/v2"

"github.com/mjpitz/aetherfs/internal/commands/daemons"
"github.com/mjpitz/aetherfs/internal/commands/run"
)

// Run returns a command that can execute a given part of the ecosystem.
@@ -19,8 +19,8 @@ func Run() *cli.Command {
Usage: "Run the various AetherFS processes",
UsageText: "aetherfs run <process>",
Subcommands: []*cli.Command{
daemons.Agent(),
daemons.Server(),
//daemons.Agent(),
run.Hub(),
},
HideHelpCommand: true,
}
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@
// Unauthorized copying of this file, via any medium is strictly prohibited.
// Written by Mya Pitzeruse, September 2021

package daemons
package run

import (
"fmt"
@@ -16,7 +16,6 @@ import (
grpc_prometheus "github.com/grpc-ecosystem/go-grpc-prometheus"
"github.com/grpc-ecosystem/grpc-gateway/v2/runtime"
"github.com/urfave/cli/v2"
"google.golang.org/grpc/metadata"

agentv1 "github.com/mjpitz/aetherfs/api/aetherfs/agent/v1"
blockv1 "github.com/mjpitz/aetherfs/api/aetherfs/block/v1"
@@ -40,7 +39,7 @@ func Agent() *cli.Command {
Port: 8080,
},
ServerClientConfig: components.GRPCClientConfig{
Target: "aetherfs-server:8080",
Target: "aetherfs-hub:8080",
},
}

@@ -88,38 +87,33 @@ func Agent() *cli.Command {

// use gin for all other routes (easier to reason about)
ginServer := components.GinServer(ctx.Context)
ginServer.Use(func(ginctx *gin.Context) {
// preprocess headers into grpc metadata
md := metadata.New(nil)
for k, vv := range ginctx.Request.Header {
md.Set(k, vv...)
}

ctx := metadata.NewIncomingContext(ginctx.Request.Context(), md)
ginctx.Request = ginctx.Request.WithContext(ctx)

writer := ginctx.Writer
request := ginctx.Request

switch {
case strings.HasPrefix(request.URL.Path, "/v1/fs/"):
// handle FileServer requests (need to trim prefix)
request.URL.Path = strings.TrimPrefix(request.URL.Path, "/v1/fs/")

fileSystem := &fs.FileSystem{
Context: ctx,
BlockAPI: blockAPI,
DatasetAPI: datasetAPI,
}

http.FileServer(fileSystem).ServeHTTP(writer, request)

case strings.HasPrefix(request.URL.Path, "/v1/"):
// handle grpc-gateway requests
apiServer.ServeHTTP(writer, request)
ginServer.Use(
components.TranslateHeadersToMetadata(),
func(ginctx *gin.Context) {
writer := ginctx.Writer
request := ginctx.Request

switch {
case strings.HasPrefix(request.URL.Path, "/v1/fs/"):
// handle FileServer requests (need to trim prefix)
fileSystem := &fs.FileSystem{
Context: ginctx.Request.Context(),
BlockAPI: blockAPI,
DatasetAPI: datasetAPI,
}

handler := http.FileServer(fileSystem)
handler = http.StripPrefix("/v1/fs/", handler)

handler.ServeHTTP(writer, request)

case strings.HasPrefix(request.URL.Path, "/v1/"):
// handle grpc-gateway requests
apiServer.ServeHTTP(writer, request)

}
})
}
},
)

err = components.ListenAndServeHTTP(
ctx.Context,
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@
// Unauthorized copying of this file, via any medium is strictly prohibited.
// Written by Mya Pitzeruse, September 2021

package daemons
package run

import (
"context"
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@
// Unauthorized copying of this file, via any medium is strictly prohibited.
// Written by Mya Pitzeruse, September 2021

package daemons
package run

import (
"fmt"
@@ -16,7 +16,6 @@ import (
grpc_prometheus "github.com/grpc-ecosystem/go-grpc-prometheus"
"github.com/grpc-ecosystem/grpc-gateway/v2/runtime"
"github.com/urfave/cli/v2"
"google.golang.org/grpc/metadata"

blockv1 "github.com/mjpitz/aetherfs/api/aetherfs/block/v1"
datasetv1 "github.com/mjpitz/aetherfs/api/aetherfs/dataset/v1"
@@ -27,16 +26,16 @@ import (
"github.com/mjpitz/aetherfs/internal/storage/s3"
)

// ServerConfig encapsulates the requirements for configuring and starting up the Server process.
type ServerConfig struct {
// HubConfig encapsulates the requirements for configuring and starting up the Hub process.
type HubConfig struct {
HTTPServerConfig components.HTTPServerConfig `json:""`
GRPCServerConfig components.GRPCServerConfig `json:""`
StorageConfig storage.Config `json:"storage"`
}

// Server returns a command that will run the server process.
func Server() *cli.Command {
cfg := &ServerConfig{
// Hub returns a command that will run the hub process.
func Hub() *cli.Command {
cfg := &HubConfig{
HTTPServerConfig: components.HTTPServerConfig{
Port: 8080,
},
@@ -52,10 +51,10 @@ func Server() *cli.Command {
}

return &cli.Command{
Name: "server",
Usage: "Runs the AetherFS Server process",
UsageText: "aetherfs run server [options]",
Description: "The aetherfs-server process is responsible for the datasets in our small blob store.",
Name: "hub",
Usage: "Runs the AetherFS Hub process",
UsageText: "aetherfs run hub [options]",
Description: "The aetherfs-hub process is responsible for collecting and hosting datasets.",
Flags: flagset.Extract(cfg),
Action: func(ctx *cli.Context) error {
serverConn, err := components.GRPCClient(ctx.Context, components.GRPCClientConfig{
@@ -89,38 +88,33 @@ func Server() *cli.Command {

// use gin for all other routes (easier to reason about)
ginServer := components.GinServer(ctx.Context)
ginServer.Use(func(ginctx *gin.Context) {
// preprocess headers into grpc metadata
md := metadata.New(nil)
for k, vv := range ginctx.Request.Header {
md.Set(k, vv...)
}

ctx := metadata.NewIncomingContext(ginctx.Request.Context(), md)
ginctx.Request = ginctx.Request.WithContext(ctx)

writer := ginctx.Writer
request := ginctx.Request

switch {
case strings.HasPrefix(request.URL.Path, "/v1/fs/"):
// handle FileServer requests (need to trim prefix)
request.URL.Path = strings.TrimPrefix(request.URL.Path, "/v1/fs/")

fileSystem := &fs.FileSystem{
Context: ctx,
BlockAPI: blockAPI,
DatasetAPI: datasetAPI,
}

http.FileServer(fileSystem).ServeHTTP(writer, request)
ginServer.Use(
components.TranslateHeadersToMetadata(),
func(ginctx *gin.Context) {
writer := ginctx.Writer
request := ginctx.Request

switch {
case strings.HasPrefix(request.URL.Path, "/v1/fs/"):
// handle FileServer requests (need to trim prefix)
fileSystem := &fs.FileSystem{
Context: ginctx.Request.Context(),
BlockAPI: blockAPI,
DatasetAPI: datasetAPI,
}

handler := http.FileServer(fileSystem)
handler = http.StripPrefix("/v1/fs/", handler)

handler.ServeHTTP(writer, request)

case strings.HasPrefix(request.URL.Path, "/v1/"):
// handle grpc-gateway requests
apiServer.ServeHTTP(writer, request)

case strings.HasPrefix(request.URL.Path, "/v1/"):
// handle grpc-gateway requests
apiServer.ServeHTTP(writer, request)

}
})
}
},
)

err = components.ListenAndServeHTTP(
ctx.Context,
@@ -140,7 +134,7 @@ func Server() *cli.Command {
return err
}

ctxzap.Extract(ctx.Context).Info("running server")
ctxzap.Extract(ctx.Context).Info("running hub")
<-ctx.Done()
return nil
},
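
The handlers above switch from rewriting `request.URL.Path` in place to wrapping the file server in `http.StripPrefix`. The sketch below shows that pattern in isolation using only the standard library. The in-memory file system, the `newFileHandler` and `get` helpers, and the file name are assumptions made for the example; they stand in for the `fs.FileSystem` type in the diff, whose behavior is not shown here.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"testing/fstest"
)

// newFileHandler builds a file server over a hypothetical in-memory file
// system and mounts it under the /v1/fs/ prefix used in the diff above.
func newFileHandler() http.Handler {
	files := fstest.MapFS{
		"data/hello.txt": &fstest.MapFile{Data: []byte("hello")},
	}

	handler := http.FileServer(http.FS(files))
	return http.StripPrefix("/v1/fs/", handler)
}

// get sends an in-process GET request and returns the status code, the
// response body, and the URL path of the caller's request after the handler
// has run.
func get(h http.Handler, target string) (int, string, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, target, nil)

	h.ServeHTTP(rec, req)

	body, _ := io.ReadAll(rec.Body)
	return rec.Code, string(body), req.URL.Path
}

func main() {
	h := newFileHandler()

	status, body, path := get(h, "/v1/fs/data/hello.txt")
	fmt.Println(status, body, path) // prints: 200 hello /v1/fs/data/hello.txt

	status, _, _ = get(h, "/other/hello.txt")
	fmt.Println(status) // prints: 404
}
```

One practical difference from trimming the path by hand is that `http.StripPrefix` hands the wrapped handler a copy of the request, so the caller's `request.URL.Path` keeps its original value for any later middleware or logging.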
20 changes: 20 additions & 0 deletions internal/components/translate_metadata.go
@@ -0,0 +1,20 @@
package components

import (
"github.com/gin-gonic/gin"
"google.golang.org/grpc/metadata"
)

// TranslateHeadersToMetadata provides a gin handler that copies the headers of an HTTP request into incoming gRPC metadata.
func TranslateHeadersToMetadata() gin.HandlerFunc {
return func(ginctx *gin.Context) {
// preprocess headers into grpc metadata
md := metadata.New(nil)
for k, vv := range ginctx.Request.Header {
md.Set(k, vv...)
}

ctx := metadata.NewIncomingContext(ginctx.Request.Context(), md)
ginctx.Request = ginctx.Request.WithContext(ctx)
}
}
4 changes: 2 additions & 2 deletions main.go
@@ -55,11 +55,11 @@ func main() {

app := &cli.App{
Name: "aetherfs",
Usage: "A publish once, consume many file system for small to medium datasets",
Usage: "A virtual file system for small to medium sized datasets (MB or GB, not TB or PB).",
UsageText: "aetherfs [options] <command>",
Version: fmt.Sprintf("%s (%s)", version, commit),
Commands: []*cli.Command{
commands.Auth(),
//commands.Auth(),
commands.Pull(),
commands.Push(),
commands.Run(),