docs: rewrite the design document #132
New here, so forgive me if I'm raising any duplicate or already-discussed questions. This seems better than v1. Just two comments. For me, the name of Another suggestion is, could we separate out the scheduling parts from the whole picture? Apparently it's not the responsibility of a
No problem. You are welcome :)
Well, I am open to better names. But I don't agree that stable/sindex are better names 🙄
Do you mean the "Microunit" part? Do you mean it is easier to understand if we move that part to somewhere else?
That's exactly the idea. We are currently thinking about giving k8s a try first. You can check some discussions here or in Discord (history of last night).
It's OK with a separate image or a larger one. It's just that the scheduling part (the connection between module and microunit) needs more emphasis, I think.
Thanks. I'll follow up on those first.
Microunit serves as a simple abstraction for a resource pool. It can use k8s, or simply provision resources from cloud vendors or other internal machine management systems.
Oh, I see. That part is about "Microunit", which I haven't designed yet. I will surely add more description about that in the future.
do you mean
It is the bridge between Engula modules and the underlying resource pool. For example, if we run on k8s, we may have an operator in k8s, and then Microunit will provide some interfaces for Engula modules to manipulate k8s resources, like Pods or ReplicaSets.
say
I think a more precise description would be: Compute tells Background to run some jobs, Background calls Microunit to provision a Background unit (for example, a pod in k8s), and then runs the job in that unit.
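To make the flow described above easier to picture, here is a minimal Rust sketch. Every name in it (`Microunit`, `InMemoryPool`, `Background`, `UnitId`, `run_job`) is hypothetical and is not taken from the Engula code base. The pool is an in-memory stand-in for whatever actually backs Microunit (k8s, a cloud vendor, or an internal machine manager), and the "job" is just a closure.

```rust
use std::collections::HashMap;

/// Hypothetical identifier for a provisioned unit (for example, a pod in k8s).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct UnitId(pub u64);

/// Hypothetical resource-pool abstraction. A real implementation might
/// talk to a k8s operator; this trait only captures provision/release.
pub trait Microunit {
    fn provision(&mut self, kind: &str) -> UnitId;
    fn release(&mut self, id: UnitId) -> bool;
}

/// In-memory stand-in for a resource pool, for illustration only.
#[derive(Default)]
pub struct InMemoryPool {
    next_id: u64,
    units: HashMap<UnitId, String>,
}

impl InMemoryPool {
    /// Number of units that are currently provisioned.
    pub fn live_units(&self) -> usize {
        self.units.len()
    }
}

impl Microunit for InMemoryPool {
    fn provision(&mut self, kind: &str) -> UnitId {
        let id = UnitId(self.next_id);
        self.next_id += 1;
        self.units.insert(id, kind.to_string());
        id
    }

    fn release(&mut self, id: UnitId) -> bool {
        self.units.remove(&id).is_some()
    }
}

/// Background runs jobs on units that it obtains from a Microunit.
pub struct Background<M: Microunit> {
    pool: M,
}

impl<M: Microunit> Background<M> {
    pub fn new(pool: M) -> Self {
        Self { pool }
    }

    /// Provision a unit, run the job against it, then release the unit.
    pub fn run_job<T>(&mut self, job: impl FnOnce(UnitId) -> T) -> T {
        let unit = self.pool.provision("background");
        let result = job(unit);
        self.pool.release(unit);
        result
    }

    pub fn pool(&self) -> &M {
        &self.pool
    }
}
```

In this sketch the caller (the "Compute" side) only hands a job to `Background::run_job`; it never touches the pool directly, which mirrors the separation discussed above.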
Hmm, maybe I should rename "Warehouse" to "Manifest", since it is more like a module that manages metadata than a warehouse 🤔
OK, I renamed "Warehouse" to "Manifest", since "Warehouse" is confusing for people with an AP background and "Manifest" is more familiar to people with a RocksDB background :)
Since there're multiple Seems it's future work though...the finished part is good to me. Manifest is better :)
Yeah, I plan to add more details about that here or in another PR later. Thanks for your review.
I have some questions about the Manifest. It seems that a Manifest acts as both a metadata manager and a txn manager? So the Manifest needs some way to find obsolete objects, but those objects exist in BaseStore and DeltaStore. Does a Manifest need to scan both storage and journal to find those obsolete objects? At the same time, how does compute handle these obsolete objects to ensure atomicity?
@PatrickNicholas From your description, I feel that maybe you misunderstood the concept of objects. You may think that an object is a key-value record or something. But what objects mean in the document is actually a blob object in object storage, like a file.
Does anyone think that "object" is a confusing concept? Should we use "file" instead in upper-level modules?
@huachaohuang So the term "object" used in Manifest isn't the same as the "object" used in Storage?
They are the same. They all mean immutable data files/objects.
Can I understand the objects in the Manifest as the output of compute and background, and the atomic addition and deletion of objects as the version edit of RocksDB?
Yes, exactly.
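For readers without a RocksDB background, the "version edit" analogy can be sketched roughly as follows. This is only an illustration under stated assumptions: `VersionEdit`, `Manifest`, `apply`, and `live_objects` are hypothetical names, object names are plain strings, and nothing here is persisted. The one point it shows is that an edit is validated first and then applied as a whole, so a failed edit leaves the set of live objects untouched.

```rust
use std::collections::BTreeSet;

/// Hypothetical atomic change to the set of live objects,
/// loosely modeled on the idea of a RocksDB version edit.
#[derive(Default)]
pub struct VersionEdit {
    pub added: Vec<String>,
    pub deleted: Vec<String>,
}

/// Hypothetical in-memory manifest that tracks live (immutable) objects.
#[derive(Default)]
pub struct Manifest {
    live: BTreeSet<String>,
}

impl Manifest {
    /// Apply an edit all-or-nothing: validate everything, then mutate.
    pub fn apply(&mut self, edit: &VersionEdit) -> Result<(), String> {
        // Validation phase: no state is changed here.
        for name in &edit.deleted {
            if !self.live.contains(name) {
                return Err(format!("cannot delete unknown object {}", name));
            }
        }
        for name in &edit.added {
            if self.live.contains(name) {
                return Err(format!("object {} already exists", name));
            }
        }
        // Mutation phase: reached only if the whole edit is valid.
        for name in &edit.deleted {
            self.live.remove(name);
        }
        for name in &edit.added {
            self.live.insert(name.clone());
        }
        Ok(())
    }

    /// Names of the live objects, in sorted order.
    pub fn live_objects(&self) -> Vec<String> {
        self.live.iter().cloned().collect()
    }
}
```

A compaction-like job would then submit one edit that deletes its inputs and adds its output, so readers of the manifest see either the old set of objects or the new one, never a mix.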
The original position of "Manifest" was a very simple metadata abstraction. The previous design introduced some semantics that are beyond the scope of "Manifest". And as we add more functionality in the future, we will need a place to hold the extra semantics. So I decided to introduce the "Kernel" module. Kernel is the bridge between API and other modules, which I think also simplifies API interactions. Another thing is that I removed the "Compute" concept and simply use "API" instead, since we don't really have a "Compute" module. Some descriptions of API are also added.
docs/design.md
Outdated
Engula unbundles the storage engine into the following modules:

- **API** provides stateless data API services. For example, KV, SQL, or GraphQL.
May we focus on KV/collection APIs? Mentioning SQL here seems to go far beyond what Engula is going to do, and will likely confuse our audience. Will Engula be a SQL query engine? Or can a SQL query engine be built on Engula?
docs/design.md
Outdated
Engula unbundles the storage engine into the following modules:

- **API** provides stateless data API services. For example, KV, SQL, or GraphQL.
- **Kernel** provides the essential storage capabilities to implement upper-level APIs.
From Discord, I'm confused about whether Kernel will take the place of the previous Warehouse, that is, act as a wrapper for upper-level APIs, or whether it will also mix in deployment functionality?
I'm afraid that coupling different concerns into one abstraction would mess up the logic.
I prefer to let the kernel abstraction define the entire storage engine data API on the server side only, so that a client can communicate with a kernel through a client-side data API. Thus a kernel unit must start with existing storage and journal units.
We have a dedicated microunit implementation and a cluster management unit (control unit) for provisioning and (rolling) updating these units.
May we build a unified language for these topics? In #136 we introduced a new concept, "engine", which seems like what is defined as "Kernel" here.
Yeah, give me some time to prototype it before adding more precise descriptions.
Design documents of individual modules are also pulled together to provide a better landscape.
I have updated the design document to reflect recent discussions and implementations. The updated descriptions are mainly about
Merging it now, thanks all for the reviews.
A rendered version of the document is here.
Microunit's Design is omitted for now. As we are shifting to k8s, I will add related designs once I clear my mind.