LEP: Object Support for Longhorn #5832

enhancements/20230430-object-endpoint.md

# Object Support for Longhorn

## Summary

By integrating s3gw with Longhorn, we are able to provide an S3-compatible
Object API to clients consuming Longhorn volumes. This is achieved by creating
an Object endpoint (using s3gw) for a Longhorn volume.

### Related Issues

https://github.com/longhorn/longhorn/issues/4154


## Motivation

### Goals

* Provide an Object endpoint associated with a Longhorn volume, providing an
S3-compatible API.
* Multiple Object endpoints should be supported, with each endpoint backed by
a single Longhorn volume.


### Non-goals

* Integration of the s3gw UI for administration and management purposes. Such
an Enhancement Proposal should be a standalone LEP in its own right.
* Providing Object endpoints for multiple volumes. In this proposal we limit
ourselves to one object endpoint per Longhorn volume (see
longhorn/longhorn#5444).
* Multiple Object endpoints for a single volume, either in Read-Write Many, or
as active/passive for HA failover. While we believe it to be of the utmost
importance, we are considering it future work that should be addressed in its
own LEP.
* Specify how COSI can be implemented for Longhorn and s3gw. This too should be
addressed in a specific LEP.

## Proposal

### User Stories

#### Story

Currently, Longhorn does not support object storage. Should the user want to use
their Longhorn cluster for object storage, they have to rely on third-party
applications.

Instead, we propose to enhance the user experience by allowing a Longhorn volume
to be presented to the user as an object storage endpoint, without requiring the
user to install additional dependencies or manage different applications.

### User Experience In Detail

* A new page view exists specifically for *Object Storage*, with the following specifications that match the existing UI behaviour:
  * There is a *Create Object Store* button to create a new *Object Store*.
  * A table lists the existing *Object Stores*. This table contains a "Delete" action button and a search field on the right side of the table header section.

#### Creating a new *Object Store*

* The user clicks on *Create*
* A modal dialog is shown, with the various Object Endpoint related fields:
* An endpoint name.
* The size of the volume to be created. Defaults to `1 GiB`.
* The access and secret key for the administrator user. This can, potentially, be randomly generated as well.
* The number of replicas. Defaults to `3`.
* Replica Soft Anti Affinity. Defaults to global settings.
* Replica Zone Soft Anti Affinity. Defaults to global settings.
* Replica Disk Soft Anti Affinity. Defaults to global settings.
* Disk Selector
* Node Selector
* Data Locality. Defaults to `Disabled`.
* From Backup
* Stale Replica Timeout
* Recurring Job Selector
* Replica Auto Balance. Defaults to global settings.
* Revision Counter Disabled. Defaults to `False`.
* Unmap Mark Snap Chain Removed
* Backend Store Driver
* Target State
* Name of the s3gw container image.
* Name of the s3gw-ui container image.
* Finally, the user clicks *Ok* to deploy the *Object Store*.

#### Deleting an *Object Store*

One or several *Object Stores* can be selected for deletion in the table of existing *Object Stores*.
This is only possible for *Object Stores* whose status is not `Terminating`. After pressing the
*Delete* button, a confirmation dialog is displayed to the user before the deletion is executed.

#### Administration of buckets and objects

The administration of buckets and objects is done via the standalone *s3gw*
UI. To access that UI easily, there will be an `Administration` menu entry in
the action dropdown of each *Object Store* table row.

#### Theming of standalone *s3gw* UI

The standalone *s3gw* UI will use a Longhorn theme to match the look of the
Longhorn UI. The themed production build is created with:

```bash
npm run build:prod:longhorn
```

### API changes

The API will need a new endpoint to create an object endpoint, as well as to
list, update, and delete them. We believe it is not reasonable to reuse the
existing `/v1/volumes` API endpoints, given they are semantically distinct
from what we are trying to achieve.

We thus propose the creation of a `/v1/endpoint/object` API endpoint. This route
could also be `/v1/object-endpoint`, but we believe that by having a
`/v1/endpoint/...` route we can potentially future proof the API in case other
endpoint types (not just object) are eventually added.
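As a rough illustration of the shape we have in mind, the operations could map to methods and paths as sketched below. Only the collection route is taken from this proposal; the per-endpoint sub-paths (`<name>`) are an assumption for illustration, not a decided API.

```go
package main

import "fmt"

// objectEndpointRoutes maps each proposed operation to an HTTP method and
// path. "<name>" stands for a specific object endpoint; the per-endpoint
// sub-paths are illustrative assumptions.
var objectEndpointRoutes = map[string]string{
	"list":   "GET /v1/endpoint/object",
	"create": "POST /v1/endpoint/object",
	"get":    "GET /v1/endpoint/object/<name>",
	"update": "PUT /v1/endpoint/object/<name>",
	"delete": "DELETE /v1/endpoint/object/<name>",
}

func main() {
	// Print the sketched route table.
	for op, route := range objectEndpointRoutes {
		fmt.Printf("%-6s -> %s\n", op, route)
	}
}
```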


## Design

### Implementation Overview

Integrating Longhorn with s3gw will require the creation of mechanisms that
allow us to 1) describe an object endpoint; 2) deploy an `s3gw` pod consuming
a volume; and 3) deploy an `s3gw-ui` pod for management and administration. For
the purposes of this LEP, there will always be one single `s3gw-ui` pod
associated with an `s3gw` service, hence one `s3gw-ui` pod per Object Endpoint.
We do not exclude eventually allowing one single `s3gw-ui` pod being associated
with multiple Object Endpoints, but that should be considered as future work.

We believe we will need a new Custom Resource Definition, `ObjectEndpoint`,
representing an Object Endpoint deployment, containing information that will be
required to deploy an `s3gw` endpoint, as well as the `s3gw-ui` administration
interface.

Given we need backing storage for each `s3gw` instance, we will rely on
Persistent Volume Claims to request storage from the Longhorn cluster. This
allows us to abstract ourselves from volume creation, and rely on existing
Kubernetes infrastructure to provide us with the needed resources.

An `ObjectEndpointController` will also be necessary, responsible for creating
and managing `s3gw` pods, both the `s3gw` endpoint and the `s3gw-ui`. This
controller will listen for new resources of type `ObjectEndpoint`, and will
create everything necessary for proper deployment, including services and
pods.

#### Custom Resource Definition

We define a new Object Endpoint Custom Resource Definition, as follows:

```golang
type ObjectEndpoint struct {
	metav1.TypeMeta
	metav1.ObjectMeta

	Spec   ObjectEndpointSpec
	Status ObjectEndpointStatus
}
```

The endpoint `Spec`, as follows, contains the information that we consider most
relevant to the endpoint at this point. More information may be added should the
need arise.

```golang
type ObjectEndpointSpec struct {
	Credentials  ObjectEndpointCredentials
	StorageClass string
	Size         resource.Quantity
}
```

The `Credentials` field, as defined below, contains the access and secret keys
that are to be used as seed credentials for the object endpoint. We need these
to set the credentials for the default administrator user.

The `StorageClass` field contains the name of the Storage Class to be used when
obtaining a new volume to back the `s3gw` service. The user should specify an
existing Storage Class; otherwise, the default Storage Class will be used.

The `Size` field represents the initial size to provision for the new volume.

```golang
type ObjectEndpointCredentials struct {
	AccessKey string
	SecretKey string
}
```
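Since the proposal allows these keys to be randomly generated when not provided by the user, a minimal sketch of such generation follows. The key lengths are assumptions modeled on common S3-style credentials, not a decided format.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// randomKey returns a hex-encoded random string of 2*nBytes characters,
// suitable as a seed access or secret key.
func randomKey(nBytes int) string {
	b := make([]byte, nBytes)
	if _, err := rand.Read(b); err != nil {
		panic(err) // a failing system CSPRNG is not recoverable here
	}
	return hex.EncodeToString(b)
}

func main() {
	accessKey := randomKey(10) // 20 characters
	secretKey := randomKey(20) // 40 characters
	fmt.Println(len(accessKey), len(secretKey))
}
```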

The Object Endpoint CRD also contains a `Status` field, represented below. This
tracks the state of the object endpoint as observed by the object endpoint controller.

```golang
type ObjectEndpointStatus struct {
	State    ObjectEndpointState
	Endpoint string
}

type ObjectEndpointState string
```

The `State` field can have one of the following values: `unknown`, `starting`,
`running`, `stopping`, or `error`. The state machine begins at `unknown`, and
moves to `starting` once the new object endpoint is detected and resource
creation begins. Once all resources have been created, the controller moves
the state to `running`, where it remains until the object endpoint is deleted,
at which point the controller moves the state to `stopping` while waiting for
the associated resources to be cleaned up. The state `error` indicates that
one or more of the endpoint's resources are unhealthy or could not be created.
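The lifecycle above can be sketched as a small transition table. This is an illustration of our reading of the intended state machine (including `error` recovering to `running` once resources become healthy, as discussed later), not controller code:

```go
package main

import "fmt"

type ObjectEndpointState string

const (
	StateUnknown  ObjectEndpointState = "unknown"
	StateStarting ObjectEndpointState = "starting"
	StateRunning  ObjectEndpointState = "running"
	StateStopping ObjectEndpointState = "stopping"
	StateError    ObjectEndpointState = "error"
)

// validTransitions encodes the lifecycle: unknown -> starting -> running ->
// stopping, with error reachable along the way and recovering to running
// once resources are healthy again.
var validTransitions = map[ObjectEndpointState][]ObjectEndpointState{
	StateUnknown:  {StateStarting, StateError},
	StateStarting: {StateRunning, StateError},
	StateRunning:  {StateStopping, StateError},
	StateError:    {StateRunning, StateStopping},
	StateStopping: {}, // terminal: resources are being cleaned up
}

func canTransition(from, to ObjectEndpointState) bool {
	for _, s := range validTransitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(StateStarting, StateRunning)) // true
	fmt.Println(canTransition(StateRunning, StateStarting)) // false
}
```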


#### The Object Endpoint Controller

The Object Endpoint Controller will be responsible for lifecycle management of
Object Endpoints, from creation to their deletion.

The vast majority of the controller's logic will be in creating resources and
in handling the `starting` and `stopping` states discussed above.

When an Object Endpoint is created, the required resources will need to be
created. These include:

* a `Service` associated with the pods being deployed,
* the various `Secret` and `ConfigMap` resources needed by `s3gw` and `s3gw-ui`,
* the associated `Deployment`, and
* a `Persistent Volume Claim` providing the `s3gw` storage.

During resource creation we will also need to explicitly create a
`Persistent Volume` that will be bound to the `Persistent Volume Claim`
mentioned above. We need to do this so we can be opinionated about the file
system being used; in this case XFS, which `s3gw` requires for performance
reasons (notably reflink support, used to efficiently merge multipart uploads,
which ext4 cannot do).
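For illustration, such a statically provisioned pair could look roughly like the following. Names, the size, and the empty storage class are assumptions for this sketch; the controller would generate the real objects.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-example-object-endpoint
spec:
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteOnce"]
  storageClassName: ""
  csi:
    driver: driver.longhorn.io
    volumeHandle: example-object-endpoint
    fsType: xfs                           # opinionated file system choice
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-example-object-endpoint
  namespace: longhorn-system
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ""
  volumeName: pv-example-object-endpoint  # bind to the PV above
  resources:
    requests:
      storage: 10Gi
```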

While handling `starting`, the required resources have already been created
through the Kubernetes API, but the controller still has to wait for them to be
ready and healthy before the transition to `running` can be performed.
Coincidentally, this is also the behavior expected when the endpoint is in state
`error`, given the controller will need to wait for resources to be healthy
before being able to move the endpoint back to `running`.

In turn, handling `stopping` means waiting for those same resources to be
removed.


#### Required changes

Aside from what has been discussed previously, we believe we need to add two new
options as arguments to `longhorn-manager`: `--object-endpoint-image`, and
`--object-endpoint-ui-image`, both expecting their corresponding image names.
These will be essential for us to be able to spin up the pods for the object
endpoints being deployed.
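A minimal sketch of parsing these two options, using the standard library `flag` package; `longhorn-manager` has its own CLI plumbing, so only the option names here are taken from this proposal, and the image values are placeholders.

```go
package main

import (
	"flag"
	"fmt"
)

// parseObjectEndpointFlags sketches parsing of the two proposed options.
// Only the flag names are from the proposal; everything else is illustrative.
func parseObjectEndpointFlags(args []string) (endpointImage, uiImage string, err error) {
	fs := flag.NewFlagSet("longhorn-manager", flag.ContinueOnError)
	fs.StringVar(&endpointImage, "object-endpoint-image", "", "s3gw container image")
	fs.StringVar(&uiImage, "object-endpoint-ui-image", "", "s3gw-ui container image")
	err = fs.Parse(args)
	return endpointImage, uiImage, err
}

func main() {
	ep, ui, err := parseObjectEndpointFlags([]string{
		"--object-endpoint-image", "s3gw:latest",
		"--object-endpoint-ui-image", "s3gw-ui:latest",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(ep, ui)
}
```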

Additionally, we will need to add the new `ObjectEndpointController` to
`StartControllers()`, in `controller/controller_manager.go`.

A new informer, `ObjectEndpointInformer`, will need to be created as well and
added to the `DataStore`, so we can listen for `ObjectEndpoint` resources; the
`ObjectEndpointController` critically depends on this.

Finally, we expect to add `s3gw` images as dependencies to be downloaded by
the Longhorn chart.

Further changes may be needed as development evolves.

### Backup and Restore

Given the data being kept by an Object Endpoint is stored in a Longhorn volume,
we rely on Longhorn's backup and restore capabilities.

However, an Object Endpoint in this context is more than just the data held by
a given volume: there is additional state that needs to be backed up and
restored, in the form of Secrets, endpoint names and their associated volumes,
etc.

At this point in time we don't yet have a solution for this, but we believe this
should rely on whatever mechanisms Longhorn has to backup and restore its own
state. Insights are greatly appreciated.

### Test plan

It is not clear at this moment how this can be tested, largely due to a lack of
knowledge of how Longhorn testing works. Help on this topic would be much
appreciated.

### Upgrade strategy

Upgrading to this enhancement should be painless. Once this feature is available
in Longhorn, the user should be able to create new object endpoints without much
else to do.

At this stage it is not clear how upgrades between Longhorn versions will
happen. We expect to be able to simply restart existing pods using new images.

### Versioning

Including the `s3gw` containers in the Longhorn chart means that, for a specific
Longhorn version, only a specific `s3gw` version is expected to have been tested
and be in working condition. We don't make assumptions as to whether other
`s3gw` versions would correctly function.

An upgrade to `s3gw` will require an upgrade to Longhorn.