Propose Alameda design for rook integration #2182
## What is Alameda

Alameda is an intelligent resource orchestrator for Kubernetes that provides autonomous balancing, scaling, and scheduling through machine learning. It continuously learns the changing computing-resource usage of Kubernetes clusters, predicts future resource demand for pods and nodes, and intelligently orchestrates the underlying computing resources without manual configuration.

For more details, please refer to https://github.com/containers-ai/Alameda
## How Alameda works with Rook

![integration](./architecture_rook_integration.png)
(Note: the CRD AlamedaResource is renamed AlamedaScaler and the CRD AlamedaResourcePrediction is renamed AlamedaRecommendation from release 0.2.)

This figure illustrates Alameda's architecture and how it can work with Rook.
The primary purpose of Alameda is to **recommend optimal computing resource configuration for Kubernetes**.
To achieve that, users of Alameda (Rook, in this case) specify which pods require metrics prediction and resource configuration recommendation by creating *AlamedaResource* CRs ([example](https://github.com/containers-ai/alameda/blob/master/example/samples/nginx/alameda_deployment.yaml)).
After some machine-learning computation, users can read the metrics predictions and recommendations from the *AlamedaResourcePrediction* CRs; an example can be found [here](https://github.com/containers-ai/alameda/blob/master/docs/quickstart.md#example).
Please note that starting from Alameda release 0.2, metrics predictions are stored in a time-series database instead of in CRs, due to the performance and size constraints of etcd.
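As a rough illustration of this flow, a CR selecting pods for prediction might look like the sketch below. This is a hypothetical example: the API group, kind, and field names are assumptions (the linked example file is the authoritative reference) and may differ across Alameda releases.

```yaml
# Hypothetical sketch of an AlamedaResource/AlamedaScaler CR.
# apiVersion, kind, and field names are assumptions for illustration only.
apiVersion: autoscaling.containers.ai/v1alpha1
kind: AlamedaScaler
metadata:
  name: rook-ceph-osd-scaler
  namespace: rook-ceph
spec:
  # Select the pods that should receive metrics prediction and
  # resource configuration recommendations.
  selector:
    matchLabels:
      app: rook-ceph-osd
```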
> **Review comment:** What does the interface look like for an orchestrator to consume the predictions and recommendations? Does it call a specific Alameda API? How does it discover that endpoint? How big are the predictions and recommendations? I can see Alameda gathering a lot of metrics/stats and that taking up a lot of space, but I would expect the recommendations to have a comparatively small footprint. Maybe a CRD approach could still work if the predictions are small and each CR is updated over time with the latest recommendations, as opposed to new recommendation CRs constantly being generated.
>
> **Review comment:** Is it possible to add an example of what a recommendation looks like? What data/fields are exposed?
Another function that Alameda provides for Rook is **disk failure prediction**.
This function can be switched on/off when deploying Alameda or in Alameda's configmap.
When it is enabled, Alameda requires [prometheus_disk_log_exporter](https://github.com/containers-ai/prometheus_disk_log_exporter) to expose disk data such as S.M.A.R.T. attributes to Prometheus.
Once Alameda has disk failure prediction results, it exposes them in Alameda's recommendations.
Since physical disks can be identified by their world wide names (WWN) or serial numbers, Rook can pick up from here, or Alameda can write the result into Rook's configmap that is named with the *device-in-use* prefix.
For example, Rook maintains a configmap to track the devices in use, and the Crane component of Alameda can deliver the disk failure prediction by adding data to that configmap.
> **Review comment:** In the interest of a decoupled system, I'd lean towards the failure predictions being exposed through Alameda's recommendations, instead of Alameda having internal knowledge about Rook's internal config structures.
![rook_device_configmap](./rook_cm.png)
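To make the configmap option concrete, here is a minimal sketch of how Crane could package a prediction result as configmap `data` entries. The one-key-per-serial-number layout and the helper name are assumptions for illustration, not Rook's actual internal structure.

```python
import json

def build_device_prediction_data(prediction: dict) -> dict:
    """Build configmap `data` entries carrying one disk failure prediction.

    Assumption: one key per disk serial number; this is an illustrative
    layout, not Rook's actual device-in-use configmap schema.
    """
    key = "prediction-%s" % prediction["serial_number"]
    return {key: json.dumps(prediction, sort_keys=True)}

prediction = {
    "near_failure": "Good",
    "disk_wwn": "500a075118cb0318",
    "serial_number": "174718CB0318",
    "predicted": "2018-05-30 18:33:12",
}
data = build_device_prediction_data(prediction)
```

The resulting dict could then be merged into the existing configmap with a patch call (for instance the Kubernetes Python client's `patch_namespaced_config_map`), though the decoupled alternative is to surface the same data only through Alameda's recommendations.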
An example disk failure prediction result is:
```json
{
  "near_failure": "Good",
  "disk_wwn": "500a075118cb0318",
  "serial_number": "174718CB0318",
  "predicted": "2018-05-30 18:33:12"
}
```
The *near_failure* attribute indicates the disk's life expectancy as shown in the following table.

near_failure | Life expectancy (weeks)
-------------|------------------------
Good         | > 6 weeks
Warning      | 2 ~ 6 weeks
Bad          | < 2 weeks
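To show how a consumer might interpret a prediction result against the table above, here is a small sketch; the function and variable names are illustrative, not part of any Alameda API.

```python
import json

# Life-expectancy interpretation of `near_failure`, per the table above.
LIFE_EXPECTANCY = {
    "Good": "> 6 weeks",
    "Warning": "2 ~ 6 weeks",
    "Bad": "< 2 weeks",
}

def interpret(result_json: str) -> str:
    """Return a human-readable summary for one disk failure prediction."""
    result = json.loads(result_json)
    expectancy = LIFE_EXPECTANCY[result["near_failure"]]
    return "disk %s (wwn %s): life expectancy %s" % (
        result["serial_number"], result["disk_wwn"], expectancy)

example = ('{"near_failure": "Good", "disk_wwn": "500a075118cb0318", '
           '"serial_number": "174718CB0318", '
           '"predicted": "2018-05-30 18:33:12"}')
print(interpret(example))
# → disk 174718CB0318 (wwn 500a075118cb0318): life expectancy > 6 weeks
```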
> **Review comment:** The example AlamedaResource CRD link is broken (404): https://github.com/containers-ai/alameda/blob/master/example/samples/nginx/alameda_deployment.yaml
>
> **Review comment:** I'd recommend breaking up this paragraph to a new line per sentence; comments can be added by line, and it's difficult to make scoped comments on an entire paragraph.