docs: rewrite the design document #132

Merged
merged 4 commits into from
Dec 3, 2021

Conversation

huachaohuang
Contributor

@huachaohuang huachaohuang commented Nov 24, 2021

A rendered version of the document is here.

Design documents of individual modules are also pulled together to provide a better landscape.

Microunit's design is omitted for now. As we are shifting to k8s, I will add the related designs once I have sorted out my thoughts.

@huachaohuang huachaohuang added this to the Version 0.2 milestone Nov 24, 2021
@Ryan-Git

Ryan-Git commented Nov 24, 2021

I'm new here, so forgive me if I raise any duplicate or already-discussed questions. This seems better than v1. Just two comments.

For me, the name "warehouse" is somewhat confusing. It actually maintains a multi-version index of the immutable storage. In Linux we have the inode table, so maybe stable/sindex would be better? Naming is hard, of course...

Another suggestion: could we separate out the scheduling parts from the whole picture? Apparently it's not the responsibility of a Compute instance to deploy itself to some EC2 instance. Components can just claim resource requests, and the shared, customizable schedulers (k8s has a nice abstraction in this field) take care of the rest. This control-path view might make the design even clearer.

@huachaohuang
Contributor Author

> I'm new here, so forgive me if I raise any duplicate or already-discussed questions. Just two comments.

No problem. You are welcome :)

> For me, the name "warehouse" is somewhat confusing. It actually maintains a multi-version index of the immutable storage. In Linux we have the inode table, so maybe stable/sindex would be better? Naming is hard, of course...

Well, I am open to better names. But I don't agree that stable/sindex are better names 🙄

> Another suggestion: could we separate out (and meanwhile emphasize) the scheduling parts from the whole picture?

Do you mean the "Microunit" part? Do you mean it is easier to understand if we move that part to somewhere else?

> If different components (or implementations) can just claim resource requests and the shared, customizable scheduler(s) take care of the rest, that would be fantastic.

That's exactly the idea. We are currently thinking about giving k8s a try first. You can check some discussions here or in Discord (history of last night).

@Ryan-Git

Ryan-Git commented Nov 24, 2021

> Do you mean the "Microunit" part? Do you mean it is easier to understand if we move that part to somewhere else?

It's OK with a separate image or a larger one. I just think the scheduling part (the connection between a module and Microunit) needs more emphasis.

> That's exactly the idea. We are currently thinking about giving k8s a try first. You can check some discussions here or in Discord (history of last night).

Thanks. I'll follow up on those first.

@huachaohuang
Contributor Author

Microunit serves as a simple abstraction for a resource pool. It can use k8s, or simply provision resources from cloud vendors or other internal machine management systems.

@huachaohuang
Contributor Author

> > Do you mean the "Microunit" part? Do you mean it is easier to understand if we move that part to somewhere else?
>
> It's OK with a separate image or a larger one. I just think the scheduling part (the connection between a module and Microunit) needs more emphasis.

Oh, I see. That part is about "Microunit", which I haven't designed yet. I will surely add more description about that in the future.

@Ryan-Git

> Microunit serves as a simple abstraction for a resource pool. It can use k8s, or simply provision resources from cloud vendors or other internal machine management systems.

Do you mean Microunit is either a resource or a provisioner?

@huachaohuang
Contributor Author

> Do you mean Microunit is either a resource or a provisioner?

It is the bridge between Engula modules and the underlying resource pool. For example, if we run on k8s, we may have an operator in k8s, and then Microunit will provide some interfaces for Engula modules to manipulate k8s resources, like Pods or ReplicaSets.
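
As a rough illustration of that bridge, the sketch below shows one possible shape for such an interface in Rust. Every name in it (`Microunit` as a trait, `UnitSpec`, `UnitId`, `MemPool`) is hypothetical and not taken from the Engula codebase, and the in-memory pool only stands in for a real backend such as a k8s operator.

```rust
use std::collections::HashMap;

/// Hypothetical description of the resources a module asks for.
#[derive(Clone, Debug, PartialEq)]
pub struct UnitSpec {
    pub kind: String, // e.g. "background" or "storage"
    pub cpu_millis: u32,
    pub memory_mb: u32,
}

/// Hypothetical handle to a provisioned unit (a Pod, a VM, ...).
pub type UnitId = u64;

/// The bridge: modules talk to this trait and never to the platform directly.
pub trait Microunit {
    fn provision(&mut self, spec: UnitSpec) -> UnitId;
    /// Returns false if the unit was unknown or already released.
    fn release(&mut self, id: UnitId) -> bool;
    /// Lists live units in ascending id order.
    fn list(&self) -> Vec<UnitId>;
}

/// In-memory stand-in for a real resource pool.
#[derive(Default)]
pub struct MemPool {
    next_id: UnitId,
    units: HashMap<UnitId, UnitSpec>,
}

impl Microunit for MemPool {
    fn provision(&mut self, spec: UnitSpec) -> UnitId {
        self.next_id += 1;
        self.units.insert(self.next_id, spec);
        self.next_id
    }

    fn release(&mut self, id: UnitId) -> bool {
        self.units.remove(&id).is_some()
    }

    fn list(&self) -> Vec<UnitId> {
        let mut ids: Vec<UnitId> = self.units.keys().copied().collect();
        ids.sort_unstable();
        ids
    }
}
```

A k8s-backed implementation would presumably translate `provision` and `release` into operations on Pods or ReplicaSets, while the modules calling the trait stay unchanged.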

@Ryan-Git

Ryan-Git commented Nov 24, 2021

> > Do you mean Microunit is either a resource or a provisioner?
>
> It is the bridge between Engula modules and the underlying resource pool. For example, if we run on k8s, we may have an operator in k8s, and then Microunit will provide some interfaces for Engula modules to manipulate k8s resources, like Pods or ReplicaSets.

Say Compute decides to do something in the background: it requests Microunit to start a new Background (with a resource specification, a pre-defined template, etc.). Then Microunit communicates with the underlying platform (k8s, AWS, etc.) to fulfill the request. Am I right?

@huachaohuang
Contributor Author

> Say Compute decides to do something in the background: it requests Microunit to start a new Background (with a resource specification, a pre-defined template, etc.). Then Microunit communicates with the underlying platform (k8s, AWS, etc.) to fulfill the request. Am I right?

I think a more precise description would be: Compute tells Background to run some jobs, Background calls Microunit to provision some Background unit (for example, a pod in k8s) and then runs the job in that unit.
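
That call chain could be sketched roughly as follows. The names (`Background::run_job`, `provision_unit`, `release_unit`, `RecordingPool`) are hypothetical, the closure merely stands in for running a job inside the provisioned unit, and releasing the unit right after the job is an assumption of this sketch rather than something decided in this thread.

```rust
/// Hypothetical bridge to the resource pool (k8s, a cloud vendor, ...).
pub trait Microunit {
    fn provision_unit(&mut self) -> u64;
    fn release_unit(&mut self, unit: u64);
}

/// Background owns the Microunit handle; Compute only talks to Background.
pub struct Background<M: Microunit> {
    microunit: M,
}

impl<M: Microunit> Background<M> {
    pub fn new(microunit: M) -> Self {
        Self { microunit }
    }

    /// Called by Compute: provision a unit, run the job in it, release the unit.
    pub fn run_job<T>(&mut self, job: impl FnOnce(u64) -> T) -> T {
        let unit = self.microunit.provision_unit();
        let output = job(unit); // stands in for executing the job in that unit
        self.microunit.release_unit(unit); // assumption: units are released eagerly
        output
    }

    pub fn microunit(&self) -> &M {
        &self.microunit
    }
}

/// Fake backend that records the calls it receives, for illustration only.
#[derive(Default)]
pub struct RecordingPool {
    next: u64,
    pub events: Vec<String>,
}

impl Microunit for RecordingPool {
    fn provision_unit(&mut self) -> u64 {
        self.next += 1;
        self.events.push(format!("provision {}", self.next));
        self.next
    }

    fn release_unit(&mut self, unit: u64) {
        self.events.push(format!("release {}", unit));
    }
}
```

The point of the shape is that Compute never sees Microunit: it hands a job to Background, and only Background decides when to ask the resource pool for a unit.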

@huachaohuang
Contributor Author

Hmm, maybe I should rename "Warehouse" to "Manifest", since it is more like a module that manages metadata than a warehouse 🤔

@huachaohuang
Contributor Author

OK, I renamed "Warehouse" to "Manifest", since "Warehouse" is confusing for people with an AP background and "Manifest" is more familiar to people with a RocksDB background :)

@Ryan-Git

Ryan-Git commented Nov 25, 2021

> I think a more precise description would be: Compute tells Background to run some jobs, Background calls Microunit to provision some Background unit (for example, a pod in k8s) and then runs the job in that unit.

Since there are multiple Background Groups, do you mean a Background Group is a kind of service? If so, should we add an interface layer for Microunit to the image? Remote Background implementations should depend on that interface.

It seems to be future work, though... the finished part looks good to me. Manifest is better :)

@huachaohuang
Contributor Author

> Since there are multiple Background Groups, do you mean a Background Group is a kind of service? If so, should we add an interface layer for Microunit to the image? Remote Background implementations should depend on that interface.
>
> It seems to be future work, though... the finished part looks good to me. Manifest is better :)

Yeah, I plan to add more details about that here or in another PR later. Thanks for your review.

@w41ter
Contributor

w41ter commented Nov 26, 2021

I have some questions about the Manifest.

It seems that the Manifest acts as both a metadata manager and a txn manager? If so, the Manifest needs some way to find obsolete objects. But those objects exist in BaseStore and DeltaStore, so does the Manifest need to scan both storage and journal to find them?

At the same time, how does Compute handle these obsolete objects to ensure atomicity?

@huachaohuang
Contributor Author

@PatrickNicholas From your description, I think you may have misunderstood the concept of objects. You may think that an object is a key-value record or something similar. But what "object" means in the document is actually a blob in object storage, like a file.

@huachaohuang
Contributor Author

Does anyone think that "object" is a confusing concept? Should we use "file" instead in upper-level modules?

@w41ter
Contributor

w41ter commented Nov 26, 2021

> But what "object" means in the document is actually a blob in object storage, like a file.

@huachaohuang So the term "object" used in Manifest isn't the same as the "object" used in Storage?

@huachaohuang
Contributor Author

> @huachaohuang So the term "object" used in Manifest isn't the same as the "object" used in Storage?

They are the same. They both mean immutable data files/objects.

@w41ter
Contributor

w41ter commented Nov 26, 2021

Can I understand an object in the Manifest as the output of Compute and Background, and the atomic addition and deletion of objects as the equivalent of a version edit in RocksDB?

@huachaohuang
Contributor Author

> Can I understand an object in the Manifest as the output of Compute and Background, and the atomic addition and deletion of objects as the equivalent of a version edit in RocksDB?

Yes, exactly.
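
For readers less familiar with that analogy, here is a minimal sketch of what an atomic add/delete of immutable objects might look like. The `VersionEdit` and `Manifest` types are hypothetical illustrations, not the RocksDB or Engula API; the only property being shown is that an edit is validated as a whole before any part of it is applied.

```rust
use std::collections::BTreeSet;

/// Hypothetical atomic change: objects added and deleted together.
#[derive(Default, Clone, Debug)]
pub struct VersionEdit {
    pub added: Vec<String>,
    pub deleted: Vec<String>,
}

/// Hypothetical manifest tracking the set of live immutable objects.
#[derive(Default, Debug)]
pub struct Manifest {
    live: BTreeSet<String>,
    version: u64,
}

impl Manifest {
    /// Applies the edit all-or-nothing and returns the new version number.
    pub fn apply(&mut self, edit: &VersionEdit) -> Result<u64, String> {
        // Validate the whole edit first so a failure leaves the state untouched.
        for name in &edit.deleted {
            if !self.live.contains(name) {
                return Err(format!("cannot delete unknown object {}", name));
            }
        }
        for name in &edit.added {
            if self.live.contains(name) {
                return Err(format!("object {} already exists", name));
            }
        }
        // Only now mutate the live set.
        for name in &edit.deleted {
            self.live.remove(name);
        }
        for name in &edit.added {
            self.live.insert(name.clone());
        }
        self.version += 1;
        Ok(self.version)
    }

    /// Returns the live object names in sorted order.
    pub fn live_objects(&self) -> Vec<String> {
        self.live.iter().cloned().collect()
    }
}
```

In this picture, a compaction run by Background would submit a single edit that adds its output objects and deletes its inputs, so readers never observe a state with both or neither.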

w41ter
w41ter previously approved these changes Nov 26, 2021
@huachaohuang
Contributor Author

huachaohuang commented Nov 26, 2021

The original positioning of "Manifest" was a very simple metadata abstraction. The previous design introduced some semantics that are beyond the scope of "Manifest", and as we add more functionality in the future, we will need a place to hold those extra semantics. So I decided to introduce the "Kernel" module. Kernel is the bridge between API and the other modules, which I think also simplifies API interactions. I also removed the "Compute" concept and simply use "API" instead, since we don't really have a "Compute" module. Some descriptions of API have been added as well.

docs/design.md Outdated

Engula unbundles the storage engine into the following modules:

- **API** provides stateless data API services. For example, KV, SQL, or GraphQL.
Contributor

@tisonkun tisonkun Nov 28, 2021

Could we focus on KV/collection APIs? Mentioning SQL here seems to go far beyond what Engula is going to do, and will likely confuse our audience. Will Engula be a SQL query engine, or can a SQL query engine be built on Engula?

docs/design.md Outdated
Engula unbundles the storage engine into the following modules:

- **API** provides stateless data API services. For example, KV, SQL, or GraphQL.
- **Kernel** provides the essential storage capabilities to implement upper-level APIs.
Contributor

From the Discord discussion, I'm confused about whether Kernel will take the place of the previous Warehouse, which is a wrapper for upper-level APIs, or whether it will also mix in deployment functionality.

Contributor

I'm afraid that coupling different concerns into one abstraction will mess up the logic.

Contributor

@tisonkun tisonkun Nov 28, 2021

I would prefer that the kernel abstraction define all the server-side storage engine data APIs, so that a client can communicate with a kernel through the client-side data API. Thus a kernel unit must start with existing storage and journal units.

We have a dedicated microunit implementation and cluster management unit (control unit) for (rolling) updating/provisioning these units.

Contributor

Could we establish a unified vocabulary for these topics? In #136 we introduced a new concept, "engine", which seems like what is defined as "Kernel" here.

Contributor Author

Yeah, give me some time to prototype it before adding more precise descriptions.

@huachaohuang
Contributor Author

I have updated the design document to reflect recent discussions and implementations. The updated descriptions are mainly about Engine and Kernel. I think this is enough for v0.2.

@huachaohuang huachaohuang mentioned this pull request Dec 2, 2021
10 tasks
@huachaohuang
Contributor Author

Merging it now, thanks all for the reviews.

@huachaohuang huachaohuang merged commit fa8f2e2 into engula:main Dec 3, 2021
@huachaohuang huachaohuang deleted the docs branch December 3, 2021 08:10