EdgeFS documentation needs more love
Fixes #2351

Signed-off-by: Dmitry Yusupov <dmitry@nexenta.com>
Dmitry Yusupov committed Dec 10, 2018
1 parent 54caa20 commit 66bbe71
Showing 6 changed files with 36 additions and 19 deletions.
2 changes: 1 addition & 1 deletion Documentation/README.md
@@ -20,6 +20,6 @@ High-level Storage Provider design documents:
| Storage Provider | Status | Description |
|---|---|---|
| [Ceph](ceph-storage.md) | Beta | Ceph is a highly scalable distributed storage solution for block storage, object storage, and shared file systems with years of production deployments. |
| [EdgeFS](edgefs-storage.md) | Alpha | EdgeFS is high-performance and low-latency object storage system with Geo-Transparent data access via standard protocols (S3, NFS, iSCSI) from on-prem, private/public clouds or small footprint edge (IoT) devices. |
| [EdgeFS](edgefs-storage.md) | Alpha | EdgeFS is a high-performance and fault-tolerant object storage system with Geo-Transparent data access to file, block, or object. |

Low-level design documentation for the supported storage systems is collected in the [design docs](https://github.com/rook/rook/tree/master/design) section.
4 changes: 2 additions & 2 deletions Documentation/edgefs-cluster-crd.md
@@ -47,7 +47,7 @@ Settings can be specified at the global level to apply to the cluster as a whole
### Cluster metadata
- `name`: The name that will be used internally for the EdgeFS cluster. Most commonly the name is the same as the namespace since multiple clusters are not supported in the same namespace.
- `namespace`: The Kubernetes namespace that will be created for the Rook cluster. The services, pods, and other resources created by the operator will be added to this namespace. The common scenario is to create a single Rook cluster. If multiple clusters are created, they must not have conflicting devices or host paths.
- `edgefsImageName`: EdgeFS image to use. If not specified then edgefs/edgefs:latest is used. We recommend to specify particular image version for production use, i.e. edgefs/edgefs:1.0.0.
- `edgefsImageName`: EdgeFS image to use. If not specified, `edgefs/edgefs:latest` is used. We recommend specifying a particular image version for production use, for example `edgefs/edgefs:1.0.0`. A combined example manifest appears after the cluster settings below.

### Cluster Settings
- `dataDirHostPath`: The path on the host ([hostPath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath)) where config and data should be stored for each of the services. If the directory does not exist, it will be created. Because this directory persists on the host, it will remain after pods are deleted. If `storage` settings are not provided, the provisioned hostPath will also be used as a storage device for Target pods (automatic provisioning via `rtlfs`).
@@ -61,7 +61,7 @@ If this value is empty, each pod will get an ephemeral directory to store their
- `devicesResurrectMode`: When enabled, this mode attempts to recreate the cluster based on the previous CRD definition. If this flag is set to one of the parameters below, the operator will only adjust networking. Often used when clean-up of old devices is needed. Only applicable when used with `dataDirHostPath`.
- `restore`: Attempt to restart and restore the previously enabled cluster CRD.
- `restoreZap`: Attempt to re-initialize previously selected `devices` prior to restore. By default, the cluster assumes that selected devices have no logical partitions and are considered empty.
- `restoreZapWait`: Attempt to cleanup preveiously selected `devices` and wait for cluster delete. This is useful when clean up of old devices is needed.
- `restoreZapWait`: Attempt to clean up previously selected `devices` and wait for cluster delete. This is useful when clean-up of old devices is needed.
- `serviceAccount`: The service account under which the EdgeFS pods will run, giving them access to ConfigMaps in the cluster's namespace. If not set, the default of `rook-edgefs-cluster` will be used.
- `placement`: [placement configuration settings](#placement-configuration-settings)
- `resources`: [resources configuration settings](#cluster-wide-resources-configuration-settings)
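
Putting the metadata and cluster settings above together, here is a minimal sketch of a cluster CRD. This is a hedged illustration, not the documented schema: the `kind: Cluster` value, the `rook-edgefs` names, and the `/var/lib/edgefs` path are assumptions, while the `edgefs.rook.io/v1alpha1` API group matches the version listed in the project README.

```yaml
# Hypothetical minimal EdgeFS cluster manifest; names, kind, and path are
# illustrative assumptions -- only the field names come from the docs above.
apiVersion: edgefs.rook.io/v1alpha1
kind: Cluster                            # assumed kind name
metadata:
  name: rook-edgefs                      # commonly the same as the namespace
  namespace: rook-edgefs
spec:
  edgefsImageName: edgefs/edgefs:1.0.0   # pin a version for production
  serviceAccount: rook-edgefs-cluster    # the documented default
  dataDirHostPath: /var/lib/edgefs       # assumed host path for config/data
```

Applying such a manifest with `kubectl create -f cluster.yaml` (as in the quickstart) would then let the operator provision Target pods under these settings.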
4 changes: 2 additions & 2 deletions Documentation/edgefs-csi.md
@@ -10,7 +10,7 @@ indent: true

## Overview

EdgeFS CSI plugins implement an interface between CSI enabled Container Orchestrator (CO) and EdgeFS local cluster site. It allows dynamic and static provisioning of EdgeFS NFS exports, and attaching them to application workloads. With EdgeFS NFS implementation, I/O load can be spread-out across multiple PODs, thus eliminating I/O bottlenecks of classing single-node NFS. Current implementation of EdgeFS CSI plugins was tested in Kubernetes environment (requires Kubernetes 1.11+), but the code is Kubernetes version agnostic and should be able to run with any CSI enabled CO.
The EdgeFS CSI plugin implements the interface between a CSI-enabled Container Orchestrator (CO) and an EdgeFS local cluster site. It allows dynamic and static provisioning of EdgeFS NFS exports or iSCSI LUNs, and attaching them to stateful application workloads. With the EdgeFS NFS implementation, I/O load can be spread out across multiple pods, eliminating the networking I/O bottlenecks of classic single-node NFS. The current implementation of the EdgeFS CSI plugin was tested in a Kubernetes environment (requires Kubernetes 1.11+); however, the code is Kubernetes version agnostic and should be able to run with any CSI-enabled CO.
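
As a rough illustration of dynamic provisioning through the plugin, a workload requests a volume with an ordinary PersistentVolumeClaim bound to a StorageClass that names the EdgeFS CSI driver. This is a hedged sketch only: the `io.edgefs.csi.nfs` provisioner string and the `edgefs-nfs-csi` class name are hypothetical placeholders, not confirmed identifiers from the plugin.

```yaml
# Hypothetical dynamic-provisioning sketch; provisioner and class names
# below are placeholders, not confirmed EdgeFS CSI identifiers.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: edgefs-nfs-csi              # assumed class name
provisioner: io.edgefs.csi.nfs      # assumed CSI driver name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: edgefs-nfs-pvc
spec:
  accessModes:
    - ReadWriteMany                 # NFS exports can be shared across pods
  storageClassName: edgefs-nfs-csi
  resources:
    requests:
      storage: 10Gi
```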

## Deployment

@@ -133,7 +133,7 @@ kubectl apply -f ./dynamic-nginx.yaml

## Troubleshooting and log collection

For details about other configuration and deployment of NFS and EdgeFS CSI plugin, see Wiki pages:
For details about other configuration and deployment options for the EdgeFS CSI plugin, see the Wiki pages:

* [Quick Start Guide](https://github.com/Nexenta/edgefs-csi/wiki/EdgeFS-CSI-Quick-Start-Guide)

24 changes: 20 additions & 4 deletions Documentation/edgefs-quickstart.md
@@ -17,8 +17,24 @@ EdgeFS operator, CSI plugin and CRDs were tested with Kubernetes **v1.11** or hi

To make sure you have a Kubernetes cluster that is ready for `Rook`, you can [follow these instructions](k8s-pre-reqs.md).

To operate efficiently, EdgeFS requires 1 CPU core and 1GB of memory per storage device. The minimal memory requirement for an EdgeFS target pod is 4GB. To get the maximum out of an SSD/NVMe device, we recommend doubling the requirements to 2 CPU cores and 2GB of memory per device.

If you are using `dataDirHostPath` to persist rook data on kubernetes hosts, make sure your host has at least 5GB of space available on the specified path.

We recommend configuring EdgeFS to use raw devices and to distribute available storage capacity equally, as sketched below.
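
Following that guidance, the `resources` and `storage` sections of the cluster CRD might look like the fragment below for a node with two SSD/NVMe devices. This is a sketch under assumptions: the `useAllNodes`/`useAllDevices` and `requests`/`limits` sub-fields follow common Rook and Kubernetes conventions and are not spelled out on this page.

```yaml
# Hypothetical fragment for the cluster CRD `spec`; sub-field names are
# assumed to follow the usual Rook/Kubernetes conventions.
storage:
  useAllNodes: true
  useAllDevices: true     # consume raw devices on every node
resources:
  requests:
    cpu: "4"              # 2 cores x 2 SSD/NVMe devices
    memory: 4Gi           # 2GB x 2 devices, also the 4GB target pod minimum
  limits:
    cpu: "4"
    memory: 4Gi
```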

**IMPORTANT** If you are planning to use data chunk sizes larger than 128KB, make sure to adjust the host configuration of the selected nodes with the following addition to `/etc/sysctl.conf`:

```
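# Raise socket receive/send buffer limits so large data chunks transfer efficiently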
net.core.rmem_default = 80331648
net.core.rmem_max = 80331648
net.core.wmem_default = 33554432
net.core.wmem_max = 50331648
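# Tune page-cache writeback and swapping to smooth out heavy write bursts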
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
vm.swappiness = 15
```

## TL;DR

If you're feeling lucky, a simple EdgeFS Rook cluster can be created with the following kubectl commands. For the more detailed install, skip to the next section to [deploy the Rook operator](#deploy-the-rook-operator).
@@ -67,7 +83,7 @@ kubectl create -f cluster.yaml
```

Use `kubectl` to list pods in the `rook-edgefs` namespace. You should be able to see the following pods once they are all running.
The number of osd pods will depend on the number of nodes in the cluster and the number of devices and directories configured.
The number of target pods will depend on the number of nodes in the cluster and the number of devices and directories configured.

```bash
$ kubectl -n rook-edgefs get pod
@@ -82,10 +98,10 @@ Notice that EdgeFS Targets are running as StatefulSet.
# Storage

For a walkthrough of the types of Storage CRDs exposed by EdgeFS Rook, see the guides for:
- **[NFS Server](edgefs-nfs-crd.md)**: Create Scale-Out NFS storage to be consumed by multiple pods
- **[NFS Server](edgefs-nfs-crd.md)**: Create Scale-Out NFS storage to be consumed by multiple pods simultaneously (see the sketch after this list)
- **[S3X](edgefs-s3x-crd.md)**: Create an Extended S3 HTTP/2 compatible object and key-value store that is accessible inside or outside the Kubernetes cluster
- **[AWS S3](edgefs-s3-crd.md)**: Create an AWS S3 compatible object store that is accessible inside or outside the Kubernetes cluster
- **[iSCSI Target](edgefs-iscsi-crd.md)**: Create low-latency and high-performance iSCSI block to be consumed by a pod
- **[iSCSI Target](edgefs-iscsi-crd.md)**: Create low-latency and high-throughput iSCSI block storage to be consumed by a pod
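
To give a feel for these CRDs, below is a hedged sketch of what a Scale-Out NFS service manifest might look like. The `kind: NFS` value and the `instances` field are illustrative assumptions based on the API group from the project README; see [NFS Server](edgefs-nfs-crd.md) for the authoritative schema.

```yaml
# Hypothetical Scale-Out NFS service sketch; kind and field names are
# assumptions -- consult edgefs-nfs-crd.md for the real schema.
apiVersion: edgefs.rook.io/v1alpha1
kind: NFS                   # assumed kind name
metadata:
  name: nfs01
  namespace: rook-edgefs
spec:
  instances: 3              # assumed: NFS pods to spread I/O load across
```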

# CSI Integration

@@ -99,4 +115,4 @@ To learn how to set up monitoring for your Rook cluster, you can follow the step

# Teardown

When you are done with the test cluster, see [these instructions](edgefs-teardown.md) to clean up the cluster.
When you are done with the cluster, simply delete the CRDs in reverse order. You may want to re-format your raw disks with the `wipefs -a` command. Or, if you are using raw devices and want to keep the same storage configuration but change some resource or networking parameters, consider using `devicesResurrectMode`, as sketched below.
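
For example, re-applying the cluster CRD with one of the resurrect options documented in the [cluster CRD settings](edgefs-cluster-crd.md) would recover the cluster on the same devices. As before, the `kind: Cluster` value and the concrete names and path are assumptions.

```yaml
# Hypothetical resurrect sketch; kind, names, and path are assumptions.
apiVersion: edgefs.rook.io/v1alpha1
kind: Cluster
metadata:
  name: rook-edgefs
  namespace: rook-edgefs
spec:
  edgefsImageName: edgefs/edgefs:1.0.0
  dataDirHostPath: /var/lib/edgefs
  devicesResurrectMode: restore   # or restoreZap / restoreZapWait
```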
19 changes: 10 additions & 9 deletions Documentation/edgefs-storage.md
@@ -5,32 +5,33 @@ weight: 40

# EdgeFS Storage

EdgeFS is high-performance and low-latency object storage system with Geo-Transparent data access via standard protocols (S3, NFS, iSCSI) from on-prem, private/public clouds or small footprint edge (IoT) devices.
EdgeFS is a high-performance and fault-tolerant object storage system with Geo-Transparent data access to file, block, or object.

EdgeFS spans unlimited number of Geo-sites, connected with each other as one global name space data fabric running on top of Kubernetes platform, providing persistent, fault-talerant and high-performant volumes for Kubernetes Applications.
EdgeFS is capable of spanning an unlimited number of geographically distributed sites (Geo-sites), connected with each other as one global namespace data fabric running on top of the Kubernetes platform, providing persistent, fault-tolerant, and high-performance volumes for stateful Kubernetes applications.

At each Geo-site, EdgeFS nodes deployed as containers (StatefulSet) on physical or virtual Kubernetes nodes, pooling all their storage capacity and presenting it as fully compatible S3/NFS/iSCSI object access for cloud-native applications running on the same or dedicated servers.
At each Geo-site, EdgeFS nodes are deployed as containers (StatefulSet) on physical or virtual Kubernetes nodes, pooling available storage capacity and presenting it via compatible emulated storage protocols (S3, NFS, iSCSI, etc.) for cloud-native applications running on the same or dedicated servers.

### How it works, in a nutshell

If you familiar with git, where all modifications are fully versioned and globally immutable, it is highly likely you already know how it works. Think of it as a trully world-scale copy-on-write technique people use every day. Now, if we can make a parallel for you to understand it better - what EdgeFS does, it expands this paradigm to object storage and making Kubernetes Persistent Volumes accessible via standard protocols e.g. S3, NFS and even block devices such as iSCSI, in a high-performance and low-latency ways. Now, with fully versioned modifications, fully immutable metadata and data, things can be transparently replicated, distributed and dynamically pre-fetched across many Geo-sites.
If you are familiar with "git", where all modifications are fully versioned and globally immutable, it is highly likely you already understand how EdgeFS works at its core. Think of it as a truly world-scale copy-on-write technique. To draw the parallel further: EdgeFS expands the "git" paradigm to object storage, making Kubernetes Persistent Volumes accessible via emulated standard storage protocols, e.g. S3, NFS, and even block devices such as iSCSI, in a high-performance and low-latency way. With fully versioned modifications and fully immutable metadata and data, user data can be transparently replicated, distributed, and dynamically pre-fetched across many Geo-sites.

## Design

Rook enables EdgeFS storage systems to run on Kubernetes using Kubernetes primitives.
Rook enables easy deployment of EdgeFS Geo-sites on Kubernetes using Kubernetes primitives.

![EdgeFS Rook Architecture on Kubernetes](media/edgefs-rook.png)
With Rook running in the Kubernetes cluster, Kubernetes pods or external applications can
mount block devices and filesystems managed by Rook, or can use the S3/Swift API for object storage. The Rook operator
mount block devices and filesystems managed by Rook, or can use the S3/S3X API for object storage. The Rook operator
automates configuration of storage components and monitors the cluster to ensure the storage remains available
and healthy.

The Rook operator is a simple container that has all that is needed to bootstrap and monitor the storage cluster. The operator will start and monitor StatefulSet storage Targets, gRPC manager and Prometheus Multi-Tenant Dashboard. All the attached devices (or directores) will provide pooled storage site. Storage sites then can be easily connected with each other as one global name space data fabric. The operator manages CRDs for Targets, Scale-out NFS, Object stores (S3/Swift), and iSCSI volumes by initializing the pods and other artifacts necessary to
The Rook operator is a simple container that has all that is needed to bootstrap and monitor the storage cluster. The operator will start and monitor StatefulSet storage Targets, the gRPC manager, and the Prometheus Multi-Tenant Dashboard. All the attached devices (or directories) will provide a pooled storage site. Storage sites can then be easily connected with each other as one global namespace data fabric. The operator manages CRDs for Targets, Scale-out NFS, Object stores (S3/S3X), and iSCSI volumes by initializing the pods and other artifacts necessary to
run the services.

The operator will monitor the storage Targets to ensure the cluster is healthy. EdgeFS will dynamically handle service failover and other adjustments that may be made as the cluster grows or shrinks.

The EdgeFS Rook operator also comes with tighitly integrated CSI plugin. CSI pods deployed on every Kubernetes node. All storage operations required on the node are handled such as attaching network storage devices, mounting NFS exports, and dynamic provisioning.
The EdgeFS Rook operator also comes with a tightly integrated CSI plugin. CSI pods are deployed on every Kubernetes node. All storage operations required on the node are handled, such as attaching network storage devices, mounting NFS exports, and dynamic provisioning.

Rook is implemented in golang. EdgeFS is implemented in Go/C, where the data path is highly optimized.
While fully immutable, EdgeFS will impress you with the additional capabilities it provides besides ultra high-performance storage, i.e. built-in Data Reduction with Global De-duplication, on-the-fly compression, at-rest encryption, and per-Tenant QoS controls.

Learn more at [edgefs.io](http://edgefs.io).
2 changes: 1 addition & 1 deletion README.md
@@ -71,7 +71,7 @@ Rook Framework|The framework for common storage specs and logic used to support
Ceph|[Ceph](https://ceph.com/) is a distributed storage system that provides file, block and object storage and is deployed in large scale production clusters.|ceph.rook.io/v1|Stable
CockroachDB|[CockroachDB](https://www.cockroachlabs.com/product/cockroachdb/) is a cloud-native SQL database for building global, scalable cloud services that survive disasters.|cockroachdb.rook.io/v1alpha1|Alpha
Cassandra| [Cassandra](http://cassandra.apache.org/) is a highly available NoSQL database featuring lightning fast performance, tunable consistency and massive scalability. [Scylla](https://www.scylladb.com) is a close-to-the-hardware rewrite of Cassandra in C++, which enables much lower latencies and higher throughput.|cassandra.rook.io/v1alpha1|Alpha
EdgeFS|[EdgeFS](http://edgefs.io) is high-performance and low-latency object storage system with Geo-Transparent data access via standard protocols (S3, NFS, iSCSI) from on-prem, private/public clouds or small footprint edge (IoT) devices.|edgefs.rook.io/v1alpha1|Alpha
EdgeFS|[EdgeFS](http://edgefs.io) is a high-performance and fault-tolerant object storage system with Geo-Transparent data access to file, block, or object.|edgefs.rook.io/v1alpha1|Alpha
Minio|[Minio](https://www.minio.io/) is a high performance distributed object storage server, designed for large-scale private cloud infrastructure.|minio.rook.io/v1alpha1|Alpha
NFS|[Network File System (NFS)](https://github.com/nfs-ganesha/nfs-ganesha/wiki) allows remote hosts to mount file systems over a network and interact with those file systems as though they are mounted locally.|nfs.rook.io/v1alpha1|Alpha

