Propose Alameda design for rook integration #2182

Open: wants to merge 9 commits into base: master

6 participants
@mamafun

mamafun commented Oct 3, 2018

Description of your changes:

Which issue is resolved by this Pull Request:
Resolves #

Checklist:

  • Documentation has been updated, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.
  • Comments have been added or updated based on the standards set in CONTRIBUTING.md

[skip ci]

@mamafun mamafun force-pushed the containers-ai:wip-Alameda-rook-integration branch from 0c5d4c5 to 3d18777 Oct 3, 2018

@travisn

Great to see the design doc, thanks for submitting! Overall what would help is to have more detail around the workflow of the rook components together with Alameda components.

## How Alameda works
1. First Alameda data collector gets metrics, events, and logs from Prometheus

@travisn

travisn Oct 3, 2018

Member

What reports the metrics to prometheus? Is there an Alameda agent that will run on each node to collect them? Or is this something that could be reported by the rook discovery agent?

@mamafun

mamafun Oct 4, 2018

Thanks for your review. Alameda does not use an agent; it looks up metrics from Prometheus. To my knowledge, the Rook agent collects read/write throughput, latency, and IOPS and sends them to Prometheus too. (Please advise if I am wrong.)

1. First Alameda data collector gets metrics, events, and logs from Prometheus
2. Alameda AI engine generates resource prediction
3. Alameda resource operator monitors Rook cluster CRD
4. Alameda generates resource operation planning for Rook cluster
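
As an illustration of step 1 only, here is a minimal sketch of how a collector could read time series through the Prometheus HTTP API (`/api/v1/query_range`). The function names and the server address are hypothetical and are not taken from Alameda; the sketch only builds the request URL and parses the documented "matrix" response shape, without making a network call.

```python
from urllib.parse import urlencode


def build_range_query_url(base_url, promql, start, end, step):
    """Build a Prometheus range-query URL (hypothetical helper, not Alameda code)."""
    params = {"query": promql, "start": start, "end": end, "step": step}
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


def parse_matrix(response):
    """Flatten a query_range response into {sorted label tuple: [(timestamp, value), ...]}."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    series = {}
    for item in response["data"]["result"]:
        key = tuple(sorted(item["metric"].items()))
        # Prometheus returns each sample as [unix_timestamp, "value-as-string"].
        series[key] = [(float(ts), float(value)) for ts, value in item["values"]]
    return series
```

The parsed series would then be handed to whatever prediction step follows; which metrics Alameda actually queries is not specified in this thread.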

@travisn

travisn Oct 3, 2018

Member

What is the output of the resource operation planning? How could the rook operator watch for the planning output?

@mamafun

mamafun Oct 4, 2018

Please see the refreshed design.md. Alameda plans to output disk failure predictions, performance predictions, and capacity trend predictions. These outcomes could be written to Prometheus and to Rook's CRDs. Please advise if Rook prefers another way to integrate.

3. Alameda resource operator monitors Rook cluster CRD
4. Alameda generates resource operation planning for Rook cluster
![work_flow](./Alameda_work_with_rook.png)

@travisn

travisn Oct 3, 2018

Member

Could you expand on a few items in the diagram? Perhaps a description of each component would help, though I know some is explained above already.

  • How would annotations be added for ceph?
  • Resource autoscaler: How does it work?

@mamafun

mamafun Oct 4, 2018

Please see the refreshed design.md.

Alameda is an intelligent resource orchestrator for Kubernetes, providing the features of autonomous balancing, scaling, and scheduling by using machine learning. Alameda learns the continuing changes of computing and I/O metrics from clusters, predicts the future demands for pods, and intelligently orchestrates underlying resources to fulfill the dynamic resource requests without manual configuration.
For more details, please refer to https://github.com/containers-ai/Alameda

@rootfs

rootfs Oct 3, 2018

Member

the repo offers rather limited details

@mamafun

mamafun Oct 4, 2018

Thanks for your review. We are doing our best to improve it.

- Disk health prediction
Based on a disk's S.M.A.R.T. values, Alameda predicts whether a commodity disk is likely to fail in the near future. Rook can then stop provisioning volumes from a disk in critical status.
- Performance prediction
Alameda learns patterns from the historical performance metrics of persistent volumes and pools. With this knowledge, Rook can:

@rootfs

rootfs Oct 3, 2018

Member

It would be great to expand on which metrics can be learned and how they are collected.

@mamafun

mamafun Oct 4, 2018

Please see the refreshed design.md.

Alameda provides Rook with the following features:
- Disk health prediction
Based on a disk's S.M.A.R.T. values, Alameda predicts whether a commodity disk is likely to fail in the near future. Rook can then stop provisioning volumes from a disk in critical status.
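
One way to picture the Rook side of disk health prediction is a filter that excludes disks flagged as critical before volumes are provisioned. The sketch below is an assumption-laden illustration: the `DiskPrediction` type, the status strings, and the `provisionable_disks` function are hypothetical and do not come from Rook or Alameda.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskPrediction:
    node: str
    device: str
    status: str  # hypothetical values: "good", "warning", "critical"


def provisionable_disks(predictions, blocked_statuses=("critical",)):
    """Return the disks whose predicted status still allows provisioning."""
    return [p for p in predictions if p.status not in blocked_statuses]


predictions = [
    DiskPrediction("node-1", "sda", "good"),
    DiskPrediction("node-1", "sdb", "critical"),
    DiskPrediction("node-2", "sda", "warning"),
]
usable = provisionable_disks(predictions)
```

Keeping the blocked statuses as a parameter lets an operator decide whether "warning" disks should also be excluded.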

@rootfs

mamafun added a commit to mamafun/rook that referenced this pull request Oct 4, 2018

mamafun added a commit to containers-ai/rook that referenced this pull request Oct 4, 2018

@mamafun mamafun force-pushed the containers-ai:wip-Alameda-rook-integration branch from 652e0e4 to da59d2d Oct 4, 2018

mamafun added a commit to containers-ai/rook that referenced this pull request Oct 4, 2018

refine rook#2182 (review) (#1)
* refine rook#2182 (review)

Signed-off-by: Matt Wu <mamafun@gmail.com>

mamafun and others added some commits Oct 2, 2018

draft alamada root design
Signed-off-by: Matt Wu <mamafun@gmail.com>
Fixed typos
Signed-off-by: Matt Wu <mamafun@gmail.com>
Update design.md
Fixed typos

Signed-off-by: Matt Wu <mamafun@gmail.com>
fix typos
Signed-off-by: Matt Wu <mamafun@gmail.com>
Update contents to reflect Rook's PR quesitons
Signed-off-by: Matt Wu <mamafun@gmail.com>
refine #2182 (review) (#1)
* refine #2182 (review)

Signed-off-by: Matt Wu <mamafun@gmail.com>

@mamafun mamafun force-pushed the containers-ai:wip-Alameda-rook-integration branch 2 times, most recently from 3cb980d to bcf9f41 Oct 5, 2018

Refine features for Rook
Signed-off-by: Matt Wu <mamafun@gmail.com>

@mamafun mamafun force-pushed the containers-ai:wip-Alameda-rook-integration branch from bcf9f41 to 8683a59 Oct 5, 2018

Refine Alameda workflow and Rook integration
Signed-off-by: Matt Wu <mamafun@gmail.com>

@mamafun mamafun force-pushed the containers-ai:wip-Alameda-rook-integration branch from e89a21d to 5ccc65a Oct 19, 2018

Refine designs of specifying targets that required Alameda services
Signed-off-by: Matt Wu <mamafun@gmail.com>

@mamafun mamafun force-pushed the containers-ai:wip-Alameda-rook-integration branch from 5f46bf2 to c018fc7 Oct 30, 2018

@galexrt galexrt added the design label Dec 9, 2018
