LEP: Object Support for Longhorn #5832
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,300 @@ | ||
# Object Support for Longhorn | ||
|
||
## Summary | ||
|
||
By integrating s3gw with Longhorn, we are able to provide an S3-compatible | ||
Object API to clients consuming Longhorn volumes. This is achieved by creating | ||
an Object endpoint (using s3gw) for a Longhorn volume. | ||
|
||
### Related Issues | ||
|
||
https://github.com/longhorn/longhorn/issues/4154 | ||
|
||
|
||
## Motivation | ||
|
||
### Goals | ||
|
||
* Provide an Object endpoint associated with a Longhorn volume, providing an | ||
S3-compatible API. | ||
* Multiple Object endpoints should be supported, with each endpoint being backed | ||
by one single Longhorn volume. | ||
|
||
|
||
### Non-goals | ||
|
||
* Integration of s3gw UI for administration and management purposes. Such an | ||
Enhancement Proposal should be a standalone LEP in its own right. | ||
* Providing Object endpoints for multiple volumes. In this proposal we limit | ||
ourselves to one object endpoint per Longhorn volume (see longhorn/longhorn#5444). | ||
* Multiple Object endpoints for a single volume, either in Read-Write Many, or | ||
as active/passive for HA failover. While we believe it to be of the utmost | ||
importance, we are considering it future work that should be addressed in its | ||
own LEP. | ||
* Specifying how COSI can be implemented for Longhorn and s3gw. This too should be | ||
addressed in a specific LEP. | ||
|
||
|
||
## Proposal | ||
|
||
### User Stories | ||
|
||
#### Story | ||
|
||
Currently, Longhorn does not support object storage. Should the user want to use | ||
their Longhorn cluster for object storage, they have to rely on third-party | ||
applications. | ||
|
||
Instead, we propose to enhance the user experience by allowing a Longhorn volume | ||
to be presented to the user as an object storage endpoint, without requiring the | ||
user to install additional dependencies or manage different applications. | ||
|
||
### User Experience In Detail | ||
|
||
* A new page view exists specifically for *Object Storage* with the following specifications that match the existing UI behaviour: | ||
* There is a *Create Object Store* button to create a new *Object Store*. | ||
* A table that lists the existing Object Endpoints. This table contains a "Delete" action button and a search field on the right side in the table header section. | ||
|
||
#### Creating a new *Object Store* | ||
|
||
* The user clicks on *Create* | ||
* A modal dialog is shown, with the various Object Endpoint related fields: | ||
Review comment: These fields are taken from https://github.com/longhorn/longhorn-manager/pull/2136/files#diff-c75a796f2c69c35ee5ab6293f23db5c8d5af9a241f4f2345e96b1fc41d72ddceR55 Is it necessary to request all those fields from the user, or can we hardcode some of them? |
||
* An endpoint name. | ||
* The size of the volume to be created. Defaults to `1 GiB`. | ||
* The access and secret key for the administrator user. This can, potentially, be randomly generated as well. | ||
* The number of replicas. Defaults to `3`. | ||
* Replica Soft Anti Affinity. Defaults to global settings. | ||
* Replica Zone Soft Anti Affinity. Defaults to global settings. | ||
* Replica Disk Soft Anti Affinity. Defaults to global settings. | ||
* Disk Selector | ||
* Node Selector | ||
* Data Locality. Defaults to `Disabled`. | ||
* From Backup | ||
* Stale Replica Timeout | ||
* Recurring Job Selector | ||
* Replica Auto Balance. Defaults to global settings. | ||
* Revision Counter Disabled. Defaults to `False`. | ||
* Unmap Mark Snap Chain Removed | ||
* Backend Store Driver | ||
* Target State | ||
* Name of the s3gw container image. | ||
* Name of the s3gw-ui container image. | ||
* Finally, the user clicks *Ok* to deploy the *Object Store*. | ||
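
The field list above notes that the administrator access and secret keys could be randomly generated. A minimal sketch of what that might look like follows. It reuses the `ObjectEndpointCredentials` shape from the Design section; the helper names and the key lengths (10 and 20 random bytes) are assumptions made for illustration, not requirements of s3gw or Longhorn.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
)

// ObjectEndpointCredentials mirrors the struct shown in the Design section.
type ObjectEndpointCredentials struct {
	AccessKey string
	SecretKey string
}

// randomHex returns n bytes from the system CSPRNG, hex-encoded, so the
// resulting string has 2*n characters. (Hypothetical helper.)
func randomHex(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// GenerateCredentials builds a random key pair. The lengths used here are
// arbitrary choices for this sketch.
func GenerateCredentials() (ObjectEndpointCredentials, error) {
	access, err := randomHex(10)
	if err != nil {
		return ObjectEndpointCredentials{}, err
	}
	secret, err := randomHex(20)
	if err != nil {
		return ObjectEndpointCredentials{}, err
	}
	return ObjectEndpointCredentials{AccessKey: access, SecretKey: secret}, nil
}
```

The dialog could call `GenerateCredentials()` to pre-fill both fields and still let the user overwrite them.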
|
||
#### Deleting an *Object Store* | ||
|
||
One or more *Object Stores* can be selected for deletion by checking them in the table of existing *Object Stores*. | ||
This is only possible for *Object Stores* whose status is not `Terminating`. After pressing the | ||
*Delete* button a confirmation dialog is displayed to the user before the deletion is executed. | ||
|
||
#### Administration of buckets and objects | ||
|
||
The administration of the buckets and objects is done via the standalone *s3gw* | ||
UI. To access that UI easily there will be a menu entry `Administration` in the | ||
action dropdown of each *Object Store* table row. | ||
|
||
#### Theming of standalone *s3gw* UI | ||
|
||
The standalone *s3gw* UI will use a Longhorn theme to match the look of the | ||
Longhorn UI. | ||
|
||
```bash | ||
npm run build:prod:longhorn | ||
``` | ||
|
||
### API changes | ||
|
||
The API will need a new endpoint to create object endpoints, as well as to list, | ||
update, and delete them. We believe it's not reasonable to reuse | ||
the existing `/v1/volumes` API endpoints, given they are semantically distinct | ||
from what we are trying to achieve. | ||
|
||
We thus propose the creation of a `/v1/endpoint/object` API endpoint. This route | ||
Review comment: This section does not match the reality of the PR https://github.com/longhorn/longhorn-manager/pull/2136/files#diff-cb740c237beef9090d8b3a15ca8b76aac83a3af2d52deadc382b4309916a715c and needs to be adapted. |
||
could also be `/v1/object-endpoint`, but we believe that by having a | ||
`/v1/endpoint/...` route we can potentially future-proof the API in case other | ||
endpoint types (not just object) are eventually added. | ||
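
To make the proposed route more concrete, here is a rough sketch of how create, list, update, and delete requests could be dispatched under `/v1/endpoint/object`. Only the route itself comes from this proposal. The mapping of HTTP methods to operations, the `action` names, and the helpers `route` and `newAPIMux` are assumptions for illustration, and the sketch uses only the standard library `net/http` rather than whatever router Longhorn actually uses.

```go
package main

import (
	"net/http"
	"strings"
)

// objectEndpointRoute is the route proposed in this section.
const objectEndpointRoute = "/v1/endpoint/object"

// action names the operation a request maps to (hypothetical names).
type action string

const (
	actionList   action = "list"
	actionCreate action = "create"
	actionUpdate action = "update"
	actionDelete action = "delete"
)

// route maps an HTTP method and path onto an action and an endpoint name.
// The method-to-action mapping is an assumption of this sketch.
func route(method, path string) (action, string, bool) {
	if path != objectEndpointRoute && !strings.HasPrefix(path, objectEndpointRoute+"/") {
		return "", "", false
	}
	name := strings.Trim(strings.TrimPrefix(path, objectEndpointRoute), "/")
	if strings.Contains(name, "/") {
		return "", "", false
	}
	switch {
	case method == http.MethodGet && name == "":
		return actionList, "", true
	case method == http.MethodPost && name == "":
		return actionCreate, "", true
	case method == http.MethodPut && name != "":
		return actionUpdate, name, true
	case method == http.MethodDelete && name != "":
		return actionDelete, name, true
	}
	return "", "", false
}

// newAPIMux registers one handler for the collection path and for named
// endpoints, and forwards recognised requests to the supplied callback.
func newAPIMux(handle func(a action, name string, w http.ResponseWriter, r *http.Request)) *http.ServeMux {
	mux := http.NewServeMux()
	h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		a, name, ok := route(r.Method, r.URL.Path)
		if !ok {
			http.Error(w, "unsupported method or path", http.StatusMethodNotAllowed)
			return
		}
		handle(a, name, w, r)
	})
	mux.Handle(objectEndpointRoute, h)
	mux.Handle(objectEndpointRoute+"/", h)
	return mux
}
```

Keeping the prefix in a single constant means that switching to `/v1/object-endpoint`, or adding a sibling `/v1/endpoint/<type>` route later, would touch one line.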
|
||
|
||
## Design | ||
|
||
### Implementation Overview | ||
|
||
Integrating Longhorn with s3gw will require the creation of mechanisms that | ||
allow us to 1) describe an object endpoint; 2) deploy an `s3gw` pod consuming | ||
a volume; and 3) deploy an `s3gw-ui` pod for management and administration. For | ||
Review comment: How can we support deploying the s3gw UI here, given that we don't help the user deploy an ingress or TLS certificate by default? If we can deploy s3gw-ui without deploying an ingress, middleware for CORS, and a TLS certificate, that would be great. But are we able to do this currently in s3gw?
Reply: I believe this has been discussed previously. There are on-going efforts to make the |
||
the purposes of this LEP, there will always be one single `s3gw-ui` pod | ||
associated with an `s3gw` service, hence one `s3gw-ui` pod per Object Endpoint. | ||
We do not exclude eventually allowing one single `s3gw-ui` pod to be associated | ||
with multiple Object Endpoints, but that should be considered as future work. | ||
Comment on lines +122 to +128
Review comment: This LEP now includes the UI integration as well?
Reply: Integration, no. That presumes the administration and management UI becomes part of the Longhorn UI. It has been stated from the beginning that this is not part of this LEP.
|
||
|
||
We believe we will need a new Custom Resource Definition, `ObjectEndpoint`, | ||
representing an Object Endpoint deployment, containing information that will be | ||
required to deploy an `s3gw` endpoint, as well as the `s3gw-ui` administration | ||
interface. | ||
jecluis marked this conversation as resolved.
|
||
|
||
Given we need backing storage for each `s3gw` instance, we will rely on | ||
Persistent Volume Claims to request storage from the Longhorn cluster. This | ||
allows us to abstract ourselves from volume creation, and rely on existing | ||
Kubernetes infrastructure to provide us with the needed resources. | ||
|
||
An `ObjectEndpointController` will also be necessary, responsible for creating | ||
and managing `s3gw` pods, both the `s3gw` endpoint and the `s3gw-ui`. This | ||
controller will be listening for new resources of type `objectendpoint`, and | ||
will create the necessary bits for proper deployment, including services and | ||
pods. | ||
|
||
#### Custom Resource Definition | ||
|
||
We define a new Object Endpoint Custom Resource Definition, as follows: | ||
|
||
```golang | ||
type ObjectEndpoint struct { | ||
// Review comment: When I demoed this S3 feature to other team members, there were comments about the name of the CRD: ObjectEndpoint sounds like a URL or link. How about the name ObjectStore, which indicates a storage system for S3 objects? Rook also appears to use the ObjectStore name: https://rook.io/docs/rook/v1.9/CRDs/Object-Storage/ceph-object-store-crd/
// Reply: Amazon calls theirs "S3 endpoints". And, well, they are the API endpoints associated with the S3 API too, and we can have multiple ones of them. I think either phrasing works and we shouldn't get too invested in this argument :-)
// Reply: I agree with Phan here. Using ObjectStore would leave endpoint free to describe TLS and ingress/domain configuration, allowing perhaps the use of multiple endpoints for one store, e.g. like so:
|
||
metav1.TypeMeta | ||
metav1.ObjectMeta | ||
|
||
Spec ObjectEndpointSpec | ||
Status ObjectEndpointStatus | ||
} | ||
``` | ||
|
||
The endpoint `Spec`, as follows, contains the information that we consider most | ||
relevant to the endpoint at this point. More information may be added should the | ||
need arise. | ||
|
||
```golang | ||
type ObjectEndpointSpec struct { | ||
Credentials ObjectEndpointCredentials | ||
StorageClass string | ||
// Review comment: The [1]:
// Reply: Agree to remove the |
||
Size resource.Quantity | ||
} | ||
``` | ||
|
||
The `Credentials` field, as defined below, contains the access and secret keys | ||
that are to be used as seed credentials for the object endpoint. We need these | ||
to set the credentials for the default administrator user. | ||
|
||
The `StorageClass` field contains the name of the Storage Class to be used when | ||
obtaining a new volume to back the `s3gw` service. The user may name any existing | ||
Storage Class; if none is specified, the default Storage Class will be used. | ||
|
||
The `Size` field represents the initial size to provision for the new volume. | ||
|
||
```golang | ||
type ObjectEndpointCredentials struct { | ||
AccessKey string | ||
SecretKey string | ||
} | ||
``` | ||
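
The defaulting described above for `StorageClass` and `Size` could be sketched as follows. This is only an illustration under stated assumptions: `objectEndpointSpec` is a simplified stand-in for `ObjectEndpointSpec` (it stores the size as a byte count rather than a `resource.Quantity`, so that no Kubernetes imports are needed), the 1 GiB fallback is borrowed from the default proposed for the creation dialog, and `withDefaults` is a hypothetical helper.

```go
package main

// objectEndpointSpec is a simplified stand-in for ObjectEndpointSpec.
type objectEndpointSpec struct {
	StorageClass string
	SizeBytes    int64
}

// defaultSizeBytes is 1 GiB, the default suggested for the creation dialog.
// Applying it at this level is an assumption of the sketch.
const defaultSizeBytes int64 = 1 << 30

// withDefaults returns a copy of spec with unset fields filled in.
// clusterDefaultClass is the name of the cluster's default Storage Class,
// which the caller is assumed to have looked up already.
func withDefaults(spec objectEndpointSpec, clusterDefaultClass string) objectEndpointSpec {
	if spec.StorageClass == "" {
		spec.StorageClass = clusterDefaultClass
	}
	if spec.SizeBytes <= 0 {
		spec.SizeBytes = defaultSizeBytes
	}
	return spec
}
```

Because `withDefaults` takes and returns the spec by value, the object submitted by the user is left untouched.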
|
||
The Object Endpoint CRD also contains a `Status` field, represented below. This | ||
tracks the state of the object endpoint as observed by the object endpoint controller. | ||
|
||
```golang | ||
type ObjectEndpointStatus struct { | ||
State ObjectEndpointState | ||
Endpoint string | ||
} | ||
|
||
type ObjectEndpointState string | ||
``` | ||
|
||
The `State` field can have one of the following values: `unknown`, `starting`, | ||
`running`, `stopping`, or `error`. The state machine begins at `unknown`, and | ||
moves to `starting` once the new object endpoint is detected and resource | ||
creation begins. Once all resources have been created, the controller then moves | ||
the state to `running`, and remains there until the object endpoint is deleted, | ||
at which point the controller moves the state to `stopping` while waiting for | ||
the associated resources to be cleaned up. The state `error` means something | ||
went wrong. | ||
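
As a sketch, the state values and the transitions described above can be written down as a small table. The constant names are hypothetical, since the proposal only fixes the `ObjectEndpointState` type and the string values. The transitions into and out of `error` are assumptions based on the controller description, which lets an endpoint in `error` move to `running` once its resources are healthy.

```go
package main

// ObjectEndpointState matches the type declared in the Status definition.
type ObjectEndpointState string

// Constant names are illustrative; only the string values come from the text.
const (
	StateUnknown  ObjectEndpointState = "unknown"
	StateStarting ObjectEndpointState = "starting"
	StateRunning  ObjectEndpointState = "running"
	StateStopping ObjectEndpointState = "stopping"
	StateError    ObjectEndpointState = "error"
)

// transitions lists the allowed moves. Entries involving StateError are
// assumptions of this sketch.
var transitions = map[ObjectEndpointState][]ObjectEndpointState{
	StateUnknown:  {StateStarting},
	StateStarting: {StateRunning, StateError},
	StateRunning:  {StateStopping, StateError},
	StateError:    {StateRunning, StateStopping},
	StateStopping: {},
}

// canTransition reports whether moving from one state to another is allowed.
func canTransition(from, to ObjectEndpointState) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}
```

A controller could call `canTransition` before updating `Status.State`, so that an unexpected jump such as `unknown` to `running` is rejected early.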
|
||
|
||
#### The Object Endpoint Controller | ||
|
||
The Object Endpoint Controller will be responsible for lifecycle management of | ||
Object Endpoints, from creation to their deletion. | ||
|
||
The vast majority of the controller's logic will be in creating resources and in | ||
handling the `starting` and `stopping` states discussed above. | ||
|
||
When the Object Endpoint is created, the required resources will need to be | ||
created. These include: | ||
|
||
* The `Service` associated with the pods being deployed | ||
* The various `Secret` and `ConfigMap` resources needed by `s3gw` and `s3gw-ui` | ||
* The associated `Deployment` | ||
* A `Persistent Volume Claim` required for `s3gw` storage. | ||
|
||
During resource creation we will also need to explicitly create a | ||
`Persistent Volume` that will be bound to the `Persistent Volume Claim` mentioned | ||
Comment on lines +229 to +230
Review comment: We will also need to create the Longhorn volume explicitly. |
||
above. We need to do this so we can dictate the file system being used; in | ||
this case, XFS, which `s3gw` requires for performance reasons. | ||
Review comment: (As a note, we need reflink support so we can effectively merge multipart uploads etc., which ext4 can't do.) |
||
|
||
While handling `starting`, the required resources have already been created | ||
through the Kubernetes API, but the controller still has to wait for them to be | ||
ready and healthy before the transition to `running` can be performed. | ||
Coincidentally, this is also the behavior expected when the endpoint is in state | ||
`error`, given the controller will need to wait for resources to be healthy | ||
before being able to move the endpoint to state `running`. | ||
Comment on lines +234 to +239
Review comment: Is this a requirement or is this documenting what the current iteration of the controller is doing?
Reply: It is incorporating what the current implementation is doing. |
||
|
||
In turn, handling `stopping` means waiting for those same resources to be | ||
removed. | ||
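
The waiting behaviour for `starting`, `error`, and `stopping` can be sketched as a single reconcile pass over the resources owned by an endpoint. Everything named here is hypothetical: `resourceStatus` is a made-up summary type, and a real controller would derive `Exists` and `Ready` from the Kubernetes API rather than receive them as plain booleans.

```go
package main

// resourceStatus is a hypothetical summary of one resource owned by an
// Object Endpoint (Service, Secret, ConfigMap, Deployment, PVC, ...).
type resourceStatus struct {
	Name   string
	Exists bool // the resource is still present in the cluster
	Ready  bool // the resource reports itself healthy
}

// allReady reports whether every resource exists and is healthy.
func allReady(resources []resourceStatus) bool {
	for _, r := range resources {
		if !r.Exists || !r.Ready {
			return false
		}
	}
	return true
}

// allGone reports whether every resource has been removed.
func allGone(resources []resourceStatus) bool {
	for _, r := range resources {
		if r.Exists {
			return false
		}
	}
	return true
}

// reconcile performs one controller pass. It returns the state to record and
// whether a stopping endpoint has finished cleaning up.
func reconcile(state string, resources []resourceStatus) (string, bool) {
	switch state {
	case "starting", "error":
		// Both states wait for the same condition before moving to running.
		if allReady(resources) {
			return "running", false
		}
	case "stopping":
		return state, allGone(resources)
	}
	return state, false
}
```

The controller would call `reconcile` repeatedly, requeueing the endpoint until the state changes or cleanup is reported as complete.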
|
||
|
||
#### Required changes | ||
|
||
Aside from what has been discussed previously, we believe we need to add two new | ||
options as arguments to `longhorn-manager`: `--object-endpoint-image`, and | ||
`--object-endpoint-ui-image`, both expecting their corresponding image names. | ||
These will be essential for us to be able to spin up the pods for the object | ||
endpoints being deployed. | ||
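
For illustration, the two new options could be parsed as shown below. This sketch uses the standard library `flag` package, which may well differ from the CLI library `longhorn-manager` actually uses. Only the flag names come from this proposal; `managerOptions`, `parseManagerOptions`, and the decision to treat both flags as mandatory are assumptions.

```go
package main

import (
	"errors"
	"flag"
)

// managerOptions holds the two image names (hypothetical type).
type managerOptions struct {
	ObjectEndpointImage   string
	ObjectEndpointUIImage string
}

// parseManagerOptions parses the proposed flags from args. Treating both
// flags as required is an assumption of this sketch.
func parseManagerOptions(args []string) (managerOptions, error) {
	var opts managerOptions
	fs := flag.NewFlagSet("longhorn-manager", flag.ContinueOnError)
	fs.StringVar(&opts.ObjectEndpointImage, "object-endpoint-image", "",
		"container image used for s3gw pods")
	fs.StringVar(&opts.ObjectEndpointUIImage, "object-endpoint-ui-image", "",
		"container image used for s3gw-ui pods")
	if err := fs.Parse(args); err != nil {
		return managerOptions{}, err
	}
	if opts.ObjectEndpointImage == "" || opts.ObjectEndpointUIImage == "" {
		return managerOptions{}, errors.New(
			"both --object-endpoint-image and --object-endpoint-ui-image are required")
	}
	return opts, nil
}
```

The parsed values would then be handed to the `ObjectEndpointController`, so that it knows which images to use when creating the `s3gw` and `s3gw-ui` pods.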
|
||
Additionally, we will need to add the new `ObjectEndpointController` to | ||
`StartControllers()`, in `controller/controller_manager.go`. | ||
|
||
A new informer, `ObjectEndpointInformer`, will need to be created as well and added | ||
to the `DataStore`, so we can listen for `ObjectEndpoint` resources, which the | ||
`ObjectEndpointController` critically depends on. | ||
|
||
Finally, we expect to add `s3gw` images as dependencies to be downloaded by | ||
the Longhorn chart. | ||
|
||
Further changes may be needed as development evolves. | ||
|
||
### Backup and Restore | ||
|
||
Given the data being kept by an Object Endpoint is stored in a Longhorn volume, | ||
we rely on Longhorn's backup and restore capabilities. | ||
|
||
However, an Object Endpoint in this context is more than just the data held by a | ||
given volume: there's meta state that needs to be backed up and restored, in the | ||
form of Secrets, endpoint names and their associated volume, etc. | ||
|
||
At this point in time we don't yet have a solution for this, but we believe this | ||
should rely on whatever mechanisms Longhorn has to back up and restore its own | ||
state. Insights are greatly appreciated. | ||
Comment on lines +265 to +276
Review comment: I am working on designing this part.
Reply: It'd be very interesting if we could restore an endpoint to a different/new one, so someone can pull up the "old" version while the new one remains around. |
||
|
||
### Test plan | ||
|
||
It is not clear at this moment how this can be tested, largely due to our lack of | ||
knowledge of how Longhorn testing works. Help on this topic would be much | ||
appreciated. | ||
Comment on lines +278 to +282
Review comment: There is an integration test suite in longhorn/longhorn-tests which we can utilize to test the controller's behavior from a K8s client point of view. |
||
|
||
### Upgrade strategy | ||
|
||
Upgrading to this enhancement should be painless. Once this feature is available | ||
in Longhorn, the user should be able to create new object endpoints without much | ||
else to do. | ||
|
||
At this stage it is not clear how upgrades between Longhorn versions will | ||
happen. We expect to be able to simply restart existing pods using new images. | ||
jecluis marked this conversation as resolved.
|
||
|
||
### Versioning | ||
|
||
Including the `s3gw` containers in the Longhorn chart means that, for a specific | ||
Longhorn version, only a specific `s3gw` version is expected to have been tested | ||
and be in working condition. We don't make assumptions as to whether other | ||
`s3gw` versions would correctly function. | ||
|
||
An upgrade to `s3gw` will require an upgrade to Longhorn. |
Review comment: While these are non-goals for this LEP, I feel that we should keep them in mind while designing this PR, to make future work easier.