Add RFC-0002 (Layer Architecture) #20

Merged (7 commits) on Jul 20, 2021

Conversation

Member

@twelho commented Apr 21, 2021

This RFC shall describe the requirements of the individual layers of Racklet in more detail. It doesn't describe the implementation details of each layer; those will be covered by upcoming RFCs.

Closes #21.

@twelho added the do-not-merge/wip (The PR is still work in progress) and kind/design (Categorizes issue or PR as related to the design of the project) labels Apr 21, 2021
@twelho requested review from luxas and chiplet April 21, 2021 11:50
Member

@luxas left a comment

Thanks for making this, looks great 💯
Left a couple of comments/nits, but overall LGTM


_One paragraph explanation of the feature._

This RFC describes the overall Racklet architecture, its defining layers, and requirements for each such layer, derived from [RFC-0001]. For each layer the defining components are described at a high level (avoiding implementation details). The compnoents are associated with their role and five highlighted key requirements from the values and user goals of [RFC-0001].
Member

nit: compnoents


- Define well-known layers of Racklet.
- Describe the requirements for each layer.
- Describe roughly how to be "Racklet conformant" and what the differences are between Racklet and other similar alternatives.
Member

If we're not answering this now, we can just omit this. I don't think we need to solve this problem in full now anyways.

Member Author

RFC-0002 is intended to partially assess that topic, but I'll see if I can reword that to sound less like this document is going to give an in-depth answer.

Member Author

Done.


_The section should return to the examples given in the guide-level explanation below, and explain more fully how the detailed proposal makes those examples work._

Racklet is divided into 5 distinct layers, from lowest-level to highest-level:
Member

reverse this so it's in line with how the RFC is read?

Member Author

Yep, Markdown was being a bit difficult with reversed lists, but I managed to find a workaround.


| Component | Role | Key Requirements |
| -------------------------------- | ------------------------------------------------ | ------------------------------------------------------------------------------------------------------- |
| **VM image building automation** | Define and run VMs declaratively | [Improve status quo], [Openness], [Declarative management], [Documentation], [Fast reconfiguration] |
Member

s/VM image building automation/Micro-Virtual Machine orchestration/

| Component | Role | Key Requirements |
| -------------------------------- | ------------------------------------------------ | ------------------------------------------------------------------------------------------------------- |
| **VM image building automation** | Define and run VMs declaratively | [Improve status quo], [Openness], [Declarative management], [Documentation], [Fast reconfiguration] |
| **Kubernetes VM image** | Consume/use a Kubernetes cluster | [De-facto standards], [Declarative management], [Loose coupling], [Upgradability], [Utilize Kubernetes] |
Member

I'd omit "VM image" here as an implementation detail, and describe it instead as a system of components, which it is gonna be at the end of the day, so e.g. Kubernetes deployment automation

- If applicable, provide sample error messages, deprecation warnings, or migration guidance.
- If applicable, describe the differences between teaching this to a Racklet administrator versus a Racklet end user.

**TODO**: Mention what is the difference between "reference" implementation and "community" implementations. TBD
Member

Let's omit this in this RFC


Explain the proposal as if it was already a feature of the project and this would be the documentation for that feature.

- Introducing new named concepts.
Member

BMC and RMC would be two examples.

Member Author

This section needs a bit of a rewrite.

Member Author

Done in 177c369.


As stated in [Risks and Mitigations](#risks-and-mitigations), Racklet is (one of) the first of its kind with regard to its specification-first architecture. The initial layer separation presented here is the result of an iterative thought process by the core Racklet authors. The five layers are chosen to clearly separate roles and responsibilities of components, without going into too much detail (too many layers) or causing excessive overlap (too few layers). Firmware and system software are separated to achieve loose coupling and clear, secure communication between them. User software is separated from system software to define a border between software mostly provided by the Racklet project and external software that the user introduces (workloads).

[Loose coupling] plays a very important role in the architecture presented here. Racklet could have been designed as a fully integrated system with implementations that are strictly defined by the project, but while this could potentially make the system more compact and simple, it also has many drawbacks that make it incompatible with the values and goals of the project. For example, Racklet relies heavily on various projects in the Open Firmware and Cloud Native ecosystems, many of which evolve quickly and provide alternative implementations complying with standard APIs. We want Racklet to be accessible, transparent and modular, which means supporting a wide variety of hardware, and enabling user customization to a great extent. If loose coupling is implemented properly, we believe that the standardized architecture presented here will be relatively simple to maintain and extend, and community-built Racklet solutions will also be able to use the modules and different software implementations effortlessly. In summary, to fulfill the values defined in [RFC-0001] and to avoid ecosystem fragmentation, the Racklet project aims to provide interfaces, not implementations.
Member

nit: Open Source Firmware


TODO: Explain what is the difference between this try and prior RPi clusters.

At the time of Racklet creation the history of Raspberry Pi (and other SBC) based cluster computers is already very rich. Various private persons, educational institutes and companies have come up with a wide variety of designs (**TODO: Examples**) for different use cases for at least the past 10 years. What sets Racklet apart from these mostly one-off implementations is its **specification**. Instead of deriving a specification from some implementation, Racklet as a system is *primarily* defined as a set of RFC documents. This specification is intended to define a **standardized** way to build a miniature compute cluster, from the lowest-level hardware details up to a state-of-the-art software stack. Since the specification is defined from the ground up, we prioritize basing it on the most _secure_ and _modern_ technologies available today, essentially merging the core concepts of prior SBC cluster computer implementations with the state-of-the-art security and fleet management models of large-scale cloud providers.
Member

I'd say past 8 years :)


## Motivation

_Why are we doing this? What use cases does it support? What is the expected outcome?_
Member

let's remove these now. They make the document a bit hard to review.

Member Author

twelho commented Jun 28, 2021

I've now overhauled the guide-level explanation and done a bunch of improvements all around. Perhaps a bit embarrassingly, the core of Racklet, the compute unit, was missing in the layer description altogether, so that's now been added as well.

@twelho marked this pull request as ready for review July 20, 2021 12:39
@twelho removed the do-not-merge/wip (The PR is still work in progress) label Jul 20, 2021
@twelho requested a review from luxas July 20, 2021 12:39
Member

@luxas left a comment

Let's get this in 💯. Sorry I had missed to re-review this.

@luxas merged commit 112472d into main Jul 20, 2021
@luxas deleted the rfc/layer-architecture branch July 20, 2021 13:55
@twelho mentioned this pull request Jul 21, 2021
@twelho linked an issue Jul 21, 2021 that may be closed by this pull request