Refine Alameda workflow and Rook integration

Signed-off-by: Matt Wu <mamafun@gmail.com>
matt committed Oct 19, 2018
1 parent 8683a59 commit e89a21d
Showing 2 changed files with 96 additions and 29 deletions.
Binary file modified design/Alameda/Alameda_work_with_Rook.png
125 changes: 96 additions & 29 deletions design/Alameda/design.md
@@ -1,6 +1,6 @@
## What is Alameda

Alameda is an intelligent resource orchestrator for Kubernetes, providing the features of autonomous balancing, scaling, and scheduling by using machine learning. Alameda learns the continuing changes of computing resources from K8S clusters, predicts the future computing resoruce demands for pods and nodes, and intelligently orchestrates the underlying computing resources without manual configuration.
Alameda is an intelligent resource orchestrator for Kubernetes, providing the features of autonomous balancing, scaling, and scheduling by using machine learning. Alameda learns the continuing changes of computing resources from K8S clusters, predicts the future computing resources demands for pods and nodes, and intelligently orchestrates the underlying computing resources without manual configuration.

For more details, please refer to https://github.com/containers-ai/Alameda

@@ -20,41 +20,108 @@ Our first take is to provide the following features, which we consider they are

## How Alameda works

1. Alameda data collector gets metrics from Prometheus (e.g., CPU, memory, Ceph metrics)
No Alameda Agent is needed.
2. Alameda AI engine generates resource prediction
3. Alameda resource operator monitors Rook cluster CRD
Alameda monitors rook CRDs with Alameda annotations. For example, Rook user can add ```container.ai/autoscale``` and ```container.ai/diskFailurePrediction``` annotations in their *cluster.yaml* as:
1. Users specify the objects that need Alameda services
Proposal 1: by adding Alameda annotations to Rook CRD objects
Proposal 2: by creating Alameda CRD objects that specify the users' K8S deployment objects.

2. Alameda watches creation, update, and deletion of the specified objects

3. Alameda uses Prometheus to scrape data, and this data is adapted into the Alameda plane
Alameda does not have its own data collection agent.

4. Alameda's AI engine predicts computing resource demands

5. Alameda exposes prediction raw data
With these predictions, Rook can (1) update CR spec, or (2) update CR spec with new definitions of planning.

6. Alameda generates operational plans based on the prediction for further automation

7. Third party projects such as Rook can automate resource orchestrations by either leveraging Alameda recommended operational plans or generating their own operational plans from the prediction raw data

8. Alameda has a feedback mechanism to evaluate the operation results for further refinement.
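
Step 2 of the workflow (watching creation, update, and deletion of the specified objects) can be sketched as a small bookkeeping loop. This is only an illustration, not Alameda's implementation: `apply_events` and the sample objects are hypothetical names, and the events are plain tuples rather than a real Kubernetes watch stream. The event type strings `ADDED`, `MODIFIED`, and `DELETED` are the ones Kubernetes watch streams report.

```python
from typing import Dict, Iterable, Tuple

# Event types reported by Kubernetes watch streams.
ADDED, MODIFIED, DELETED = "ADDED", "MODIFIED", "DELETED"


def apply_events(events: Iterable[Tuple[str, dict]]) -> Dict[str, dict]:
    """Replay watch events and return the objects still tracked, keyed by UID.

    Hypothetical helper: a real controller would consume a live watch stream.
    """
    tracked: Dict[str, dict] = {}
    for event_type, obj in events:
        uid = obj["metadata"]["uid"]
        if event_type in (ADDED, MODIFIED):
            tracked[uid] = obj        # start tracking, or refresh the stored copy
        elif event_type == DELETED:
            tracked.pop(uid, None)    # stop tracking; ignore unknown UIDs
    return tracked


# Made-up sample events for illustration.
events = [
    (ADDED, {"metadata": {"uid": "a", "name": "rook-ceph", "generation": 1}}),
    (ADDED, {"metadata": {"uid": "b", "name": "other", "generation": 1}}),
    (MODIFIED, {"metadata": {"uid": "a", "name": "rook-ceph", "generation": 2}}),
    (DELETED, {"metadata": {"uid": "b", "name": "other", "generation": 1}}),
]
tracked = apply_events(events)
```

After replaying the sample events, only the object with UID `a` remains tracked, in its most recently observed form.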


![work_flow](./Alameda_work_with_Rook.png)

## Request Alameda services by annotating Rook CRD objects (Proposal 1)

Rook users deploy a Ceph cluster by creating a Rook CRD cluster object. The Rook operator then creates _Deployment_, _ReplicaSet_ and _Pod_ objects according to this cluster object. All of these derived objects can be traced back to the Rook CRD cluster object through their _ownerReferences_ metadata. To request Alameda services, Rook users can annotate a Rook CRD cluster object with Alameda annotations. For example, Rook users can add ```containers.ai/autoscale```, ```containers.ai/diskFailurePrediction``` and ```containers.ai/capacityTrendingPrediction``` annotations in their *cluster.yaml* as:
<pre>
apiVersion: v1
kind: Namespace
metadata:
  name: rook
---
apiVersion: rook.io/v1alpha1
apiVersion: ceph.rook.io/v1beta1
kind: Cluster
metadata:
  name: rook
  namespace: rook
  <b>annocations:
    container.ai/autoscale: true
    container.ai/diskFailurePrediction: true
    container.ai/capacityTrendingPrediction: true</b>
  name: rook-ceph
  namespace: rook-ceph
  <b>annotations:
    containers.ai/autoscale: true
    containers.ai/diskFailurePrediction: true
    containers.ai/capacityTrendingPrediction: true</b>
spec:
  versionTag: v0.5.1
  dataDirHostPath:
  dataDirHostPath: /var/lib/rook
  serviceAccount: rook-ceph-cluster
  storage:
    useAllNodes: true
    useAllDevices: false
    storeConfig:
      storeType: filestore
      databaseSizeMB: 1024
      journalSizeMB: 1024
    useAllDevices: true
</pre>
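
As a rough sketch of how the annotations on such a cluster object could be interpreted, the snippet below collects the Alameda services that an object requests. The function name `requested_services` and the sample object are hypothetical and not part of Alameda. Note that Kubernetes stores annotation values as strings, so a real manifest would need `"true"` in quotes; the sketch accepts both forms because the example above writes the values unquoted.

```python
from typing import Any, Dict, Set

ALAMEDA_PREFIX = "containers.ai/"  # annotation prefix used in the example above


def requested_services(obj: Dict[str, Any]) -> Set[str]:
    """Return the Alameda services enabled by annotations on `obj`.

    Hypothetical helper: only annotations under the Alameda prefix whose
    value reads as "true" are counted.
    """
    annotations = obj.get("metadata", {}).get("annotations") or {}
    services = set()
    for key, value in annotations.items():
        if not key.startswith(ALAMEDA_PREFIX):
            continue  # not an Alameda annotation
        # Accept the string "true" as well as a parsed YAML boolean.
        if str(value).strip().lower() == "true":
            services.add(key[len(ALAMEDA_PREFIX):])
    return services


# Made-up cluster object mirroring the manifest above.
cluster = {
    "kind": "Cluster",
    "metadata": {
        "name": "rook-ceph",
        "namespace": "rook-ceph",
        "annotations": {
            "containers.ai/autoscale": "true",
            "containers.ai/diskFailurePrediction": True,
            "containers.ai/capacityTrendingPrediction": "false",
            "example.com/unrelated": "true",
        },
    },
}
services = requested_services(cluster)
```

Here `services` contains `autoscale` and `diskFailurePrediction`; the disabled and unrelated annotations are ignored.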

4. Alameda generates resource operation planning for Rook cluster
5. Alameda update cluster CRD
Alameda will (1) update CRD spec, or (2) update CRD spec with new definitions of planning. No matter they are (1) or (2), Rook needs to change the logic of watching CRD.
Then Rook operator will create a _Deployment_ object such as:
<pre>
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: 2018-09-24T06:59:25Z
  generation: 5
  labels:
    app: rook-ceph-osd
    ceph-osd-id: "1"
    rook_cluster: rook-ceph
  name: rook-ceph-osd-id-1
  namespace: rook-ceph
  ownerReferences:
  - apiVersion: v1beta1
    blockOwnerDeletion: true
    kind: Cluster
    name: rook-ceph
    uid: e1ec433a-96f4-11e8-b01a-0a168aa5aac2
  resourceVersion: "11765632"
  selfLink: /apis/extensions/v1beta1/namespaces/rook-ceph/deployments/rook-ceph-osd-id-1
  uid: 5f21a46e-bfc7-11e8-8e77-0645df3fb718
</pre>

![work_flow](./Alameda_work_with_Rook.png)
By tracing the _ownerReferences_ metadata back to the annotated Rook CRD cluster object, Alameda knows that users have requested Alameda services for this _Deployment_.
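
The trace-back can be sketched as a walk up the owner chain until an object carrying Alameda annotations is found. This is a minimal illustration under stated assumptions: `find_annotated_owner` is a hypothetical name, objects are plain dictionaries looked up by UID in memory instead of through the Kubernetes API, and only the first resolvable owner is followed at each level.

```python
from typing import Any, Dict, Optional

ALAMEDA_PREFIX = "containers.ai/"  # annotation prefix from the example above


def find_annotated_owner(
    obj: Dict[str, Any], objects_by_uid: Dict[str, Dict[str, Any]]
) -> Optional[Dict[str, Any]]:
    """Walk ownerReferences upward and return the first object (possibly
    `obj` itself) that carries an Alameda annotation, or None."""
    seen = set()
    current: Optional[Dict[str, Any]] = obj
    while current is not None:
        meta = current.get("metadata", {})
        uid = meta.get("uid")
        if uid in seen:
            return None  # guard against a cyclic owner chain
        seen.add(uid)
        annotations = meta.get("annotations") or {}
        if any(key.startswith(ALAMEDA_PREFIX) for key in annotations):
            return current
        owners = meta.get("ownerReferences") or []
        # Simplification: follow only the first owner we can resolve.
        current = next(
            (objects_by_uid[o["uid"]] for o in owners if o.get("uid") in objects_by_uid),
            None,
        )
    return None


# Sample objects reusing the UIDs shown in the manifests above.
cluster = {
    "kind": "Cluster",
    "metadata": {
        "name": "rook-ceph",
        "uid": "e1ec433a-96f4-11e8-b01a-0a168aa5aac2",
        "annotations": {"containers.ai/autoscale": "true"},
    },
}
deployment = {
    "kind": "Deployment",
    "metadata": {
        "name": "rook-ceph-osd-id-1",
        "uid": "5f21a46e-bfc7-11e8-8e77-0645df3fb718",
        "annotations": {"deployment.kubernetes.io/revision": "1"},
        "ownerReferences": [
            {"kind": "Cluster", "name": "rook-ceph",
             "uid": "e1ec433a-96f4-11e8-b01a-0a168aa5aac2"},
        ],
    },
}
objects_by_uid = {o["metadata"]["uid"]: o for o in (cluster, deployment)}
owner = find_annotated_owner(deployment, objects_by_uid)
```

For the sample data, the walk starts at the `rook-ceph-osd-id-1` Deployment, finds no Alameda annotation there, and resolves its owner reference to the annotated `rook-ceph` cluster object.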

## Request Alameda services by creating Alameda CRD objects (Proposal 2)

When Rook users create a Rook cluster CRD object, the Rook operator may create a _Deployment_ object with the following YAML:
<pre>
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: rook-ceph-osd-id-1
  namespace: rook-ceph
  labels:
    app: rook-ceph-osd
    ceph-osd-id: "1"
    rook_cluster: rook-ceph
spec:
  ...
</pre>
Meanwhile, the Rook operator also needs to create an Alameda CRD object, such as the following YAML, to request Alameda services:
<pre>
apiVersion: containers.ai/v1beta1
kind: Deployment
metadata:
  <b>annotations:
    containers.ai/autoscale: true
    containers.ai/diskFailurePrediction: true
    containers.ai/capacityTrendingPrediction: true</b>
spec:
  <b>selector:
    matchLabels:
      app: rook-ceph-osd
      ceph-osd-id: "1"
      rook_cluster: rook-ceph</b>
</pre>
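
To illustrate how the `selector` in the Alameda CRD object could pick out the Rook-created _Deployment_, the sketch below applies the usual Kubernetes `matchLabels` semantics: every key/value pair in the selector must be present on the target's labels. The function name `selector_matches` and the second sample Deployment are assumptions made for the example, not part of Alameda or Rook.

```python
from typing import Dict, List


def selector_matches(match_labels: Dict[str, str], labels: Dict[str, str]) -> bool:
    """True when every matchLabels pair is present in `labels` (pairs are ANDed)."""
    return all(labels.get(key) == value for key, value in match_labels.items())


# matchLabels copied from the Alameda CRD object above.
match_labels = {
    "app": "rook-ceph-osd",
    "ceph-osd-id": "1",
    "rook_cluster": "rook-ceph",
}

# Hypothetical Deployments; the second differs only in its OSD id.
deployments: List[dict] = [
    {"name": "rook-ceph-osd-id-1",
     "labels": {"app": "rook-ceph-osd", "ceph-osd-id": "1", "rook_cluster": "rook-ceph"}},
    {"name": "rook-ceph-osd-id-2",
     "labels": {"app": "rook-ceph-osd", "ceph-osd-id": "2", "rook_cluster": "rook-ceph"}},
]

selected = [d["name"] for d in deployments if selector_matches(match_labels, d["labels"])]
```

With these sample labels, only `rook-ceph-osd-id-1` is selected, because the other Deployment carries a different `ceph-osd-id`.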
