diff --git a/design/Alameda/Alameda_work_with_Rook.png b/design/Alameda/Alameda_work_with_Rook.png
index 613fa9fb6b3de..5918b21f857fe 100755
Binary files a/design/Alameda/Alameda_work_with_Rook.png and b/design/Alameda/Alameda_work_with_Rook.png differ
diff --git a/design/Alameda/design.md b/design/Alameda/design.md
index 4ac3c50825c1b..e4c075fe91505 100644
--- a/design/Alameda/design.md
+++ b/design/Alameda/design.md
@@ -1,6 +1,6 @@
 ## What is Alameda
 
-Alameda is an intelligent resource orchestrator for Kubernetes, providing the features of autonomous balancing, scaling, and scheduling by using machine learning. Alameda learns the continuing changes of computing resources from K8S clusters, predicts the future computing resoruce demands for pods and nodes, and intelligently orchestrates the underlying computing resources without manual configuration.
+Alameda is an intelligent resource orchestrator for Kubernetes, providing the features of autonomous balancing, scaling, and scheduling by using machine learning. Alameda learns the continuing changes of computing resources from K8S clusters, predicts the future computing resource demands for pods and nodes, and intelligently orchestrates the underlying computing resources without manual configuration.
 
 For more details, please refer to https://github.com/containers-ai/Alameda
 
@@ -20,41 +20,108 @@ Our first take is to provide the following features, which we consider they are
 
 ## How Alameda works
 
-1. Alameda data collector gets metrics from Prometheus (e.g., CPU, memory, Ceph metrics)
-No Alameda Agent is needed.
-2. Alameda AI engine generates resource prediction
-3. Alameda resource operator monitors Rook cluster CRD
-Alameda monitors rook CRDs with Alameda annotations. For example, Rook user can add ```container.ai/autoscale``` and ```container.ai/diskFailurePrediction``` annotations in their *cluster.yaml* as:
+1. Users specify the objects that need Alameda services
+Proposal 1: by adding Alameda annotations to Rook CRD objects
+Proposal 2: by creating Alameda CRD objects that specify the users' K8S deployment objects.
+
+2. Alameda watches creation, update, and deletion of the specified objects
+
+3. Alameda uses Prometheus to scrape data, and this data is adapted into the Alameda plane
+Alameda does not have a data collection agent.
+
+4. Alameda's AI engine predicts computing resource demands
+
+5. Alameda exposes the raw prediction data
+With these predictions, Rook can (1) update the CR spec, or (2) update the CR spec with new definitions of planning.
+
+6. Alameda generates operational plans based on the predictions for further automation
+
+7. Third-party projects such as Rook can automate resource orchestration either by leveraging Alameda's recommended operational plans or by generating their own operational plans from the raw prediction data
+
+8. Alameda has a feedback mechanism to evaluate the operation results for further refinement.
+
+
+![work_flow](./Alameda_work_with_Rook.png)
+
+## Request Alameda services by annotating Rook CRD objects (Proposal 1)
+
+Rook users deploy a Ceph cluster by creating a Rook CRD cluster object. The Rook operator then creates _Deployment_, _ReplicaSet_ and _Pod_ objects according to this cluster object. All of these objects can be traced back to the Rook CRD cluster object through their _ownerReferences_ metadata. To request Alameda services, Rook users can annotate a Rook CRD cluster object with Alameda annotations. For example, Rook users can add ```containers.ai/autoscale```, ```containers.ai/diskFailurePrediction``` and ```containers.ai/capacityTrendingPrediction``` annotations to their *cluster.yaml*:
-    apiVersion: v1
-    kind: Namespace
-    metadata:
-      name: rook
-    ---
-    apiVersion: rook.io/v1alpha1
+    apiVersion: ceph.rook.io/v1beta1
     kind: Cluster
     metadata:
-      name: rook
-      namespace: rook
-      annocations:
-        container.ai/autoscale: true
-        container.ai/diskFailurePrediction: true
-        container.ai/capacityTrendingPrediction: true
+      name: rook-ceph
+      namespace: rook-ceph
+      annotations:
+        containers.ai/autoscale: "true"
+        containers.ai/diskFailurePrediction: "true"
+        containers.ai/capacityTrendingPrediction: "true"
     spec:
-      versionTag: v0.5.1
-      dataDirHostPath:
+      dataDirHostPath: /var/lib/rook
+      serviceAccount: rook-ceph-cluster
       storage:
         useAllNodes: true
-        useAllDevices: false
-        storeConfig:
-          storeType: filestore
-          databaseSizeMB: 1024
-          journalSizeMB: 1024
+        useAllDevices: true
 
-4. Alameda generates resource operation planning for Rook cluster
-5. Alameda update cluster CRD
-Alameda will (1) update CRD spec, or (2) update CRD spec with new definitions of planning. No matter they are (1) or (2), Rook needs to change the logic of watching CRD.
+The Rook operator will then create a _Deployment_ object such as:
+
+    apiVersion: apps/v1beta1
+    kind: Deployment
+    metadata:
+      annotations:
+        deployment.kubernetes.io/revision: "1"
+      creationTimestamp: 2018-09-24T06:59:25Z
+      generation: 5
+      labels:
+        app: rook-ceph-osd
+        ceph-osd-id: "1"
+        rook_cluster: rook-ceph
+      name: rook-ceph-osd-id-1
+      namespace: rook-ceph
+      ownerReferences:
+      - apiVersion: v1beta1
+        blockOwnerDeletion: true
+        kind: Cluster
+        name: rook-ceph
+        uid: e1ec433a-96f4-11e8-b01a-0a168aa5aac2
+      resourceVersion: "11765632"
+      selfLink: /apis/extensions/v1beta1/namespaces/rook-ceph/deployments/rook-ceph-osd-id-1
+      uid: 5f21a46e-bfc7-11e8-8e77-0645df3fb718
+
-![work_flow](./Alameda_work_with_Rook.png)
+By tracing the _ownerReferences_ back to the annotated cluster object, Alameda knows that users have requested Alameda services for this _Deployment_.
+
+## Request Alameda services by creating Alameda CRD objects (Proposal 2)
+
+When Rook users create a Rook cluster CRD object, the Rook operator may create a _Deployment_ object with the following yaml:
+
+    apiVersion: apps/v1beta1
+    kind: Deployment
+    metadata:
+      name: rook-ceph-osd-id-1
+      namespace: rook-ceph
+      labels:
+        app: rook-ceph-osd
+        ceph-osd-id: "1"
+        rook_cluster: rook-ceph
+    spec:
+        ...
+
+Meanwhile, the Rook operator also needs to create an Alameda CRD object such as the following yaml to request Alameda services:
+
+    apiVersion: containers.ai/v1beta1
+    kind: Deployment
+    metadata:
+      annotations:
+        containers.ai/autoscale: "true"
+        containers.ai/diskFailurePrediction: "true"
+        containers.ai/capacityTrendingPrediction: "true"
+    spec:
+      selector:
+        matchLabels:
+          app: rook-ceph-osd
+          ceph-osd-id: "1"
+          rook_cluster: rook-ceph
+
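The two proposals differ mainly in how Alameda would discover that an object has requested services. The Python sketch below illustrates both lookups under stated assumptions: objects are plain dicts shaped like the YAML in this patch, no Kubernetes client is used, and the function names (`requested_services`, `services_via_owner`, `services_via_selector`) are hypothetical and not part of Alameda. It is an illustration of the idea, not Alameda's implementation.

```python
ALAMEDA_PREFIX = "containers.ai/"


def requested_services(annotations):
    """Return the Alameda services enabled by "true" annotations."""
    return {
        key[len(ALAMEDA_PREFIX):]
        for key, value in (annotations or {}).items()
        if key.startswith(ALAMEDA_PREFIX) and str(value).lower() == "true"
    }


def services_via_owner(obj, owners_by_uid):
    """Proposal 1: follow ownerReferences up to annotated owner objects.

    owners_by_uid is a hypothetical lookup table (uid -> object) standing in
    for queries against the API server.
    """
    services = set()
    for ref in obj.get("metadata", {}).get("ownerReferences", []):
        owner = owners_by_uid.get(ref.get("uid"))
        if owner is None:
            continue
        services |= requested_services(owner.get("metadata", {}).get("annotations"))
        # Keep climbing, e.g. Pod -> ReplicaSet -> Deployment -> Cluster.
        services |= services_via_owner(owner, owners_by_uid)
    return services


def services_via_selector(obj, alameda_objects):
    """Proposal 2: an Alameda object applies when every matchLabels entry
    equals the corresponding label on the target object.

    Assumption: an empty selector matches nothing in this sketch.
    """
    labels = obj.get("metadata", {}).get("labels", {})
    services = set()
    for cr in alameda_objects:
        match = cr.get("spec", {}).get("selector", {}).get("matchLabels", {})
        if match and all(labels.get(k) == v for k, v in match.items()):
            services |= requested_services(cr.get("metadata", {}).get("annotations"))
    return services


# Sample objects mirroring the YAML in this patch.
cluster = {
    "kind": "Cluster",
    "metadata": {
        "name": "rook-ceph",
        "uid": "e1ec433a-96f4-11e8-b01a-0a168aa5aac2",
        "annotations": {
            "containers.ai/autoscale": "true",
            "containers.ai/diskFailurePrediction": "true",
        },
    },
}
deployment = {
    "kind": "Deployment",
    "metadata": {
        "name": "rook-ceph-osd-id-1",
        "labels": {"app": "rook-ceph-osd", "ceph-osd-id": "1", "rook_cluster": "rook-ceph"},
        "ownerReferences": [{"kind": "Cluster", "name": "rook-ceph", "uid": cluster["metadata"]["uid"]}],
    },
}
alameda_cr = {
    "metadata": {"annotations": {"containers.ai/capacityTrendingPrediction": "true"}},
    "spec": {"selector": {"matchLabels": {"app": "rook-ceph-osd", "ceph-osd-id": "1"}}},
}

via_owner = services_via_owner(deployment, {cluster["metadata"]["uid"]: cluster})
via_selector = services_via_selector(deployment, [alameda_cr])
```

With Proposal 1 the request lives on the Rook cluster object and is inherited by everything it owns. With Proposal 2 the request is a separate object that targets workloads by label, so it does not depend on Rook setting _ownerReferences_.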