Questions related to how to contribute this project #123

Open
Espresso-Kp opened this issue May 18, 2023 · 11 comments

Comments

@Espresso-Kp

Hi, the wish list mentions supporting an e2e testing framework such as https://github.com/kubernetes-sigs/e2e-framework to make writing tests easier. We are considering contributing to this, but to be honest we don't have much previous experience with it.

Could you suggest how to approach it, or point out the main difficulties?

In addition, we are looking for short-term, feasible tasks to contribute to this project, so we would be really happy to hear any insights you can share!

@tianyin
Member

tianyin commented May 19, 2023

Hi @Espresso-Kp Thank you for the interest!

Could you elaborate on your needs and use cases? For example, which controllers/operators are you trying to test? Which scenarios do you want to test (this determines the workloads to write)?

We did wish to have a more generic test infra based on the e2e test framework you referred to, but it is currently not a part of Sieve (yet).

@Espresso-Kp
Author

For example, there are lots of manifest files in the ZooKeeper operator, but I don't understand why you chose zookeeper.pravega.io_zookeeperclusters_crd.yaml, operator.yaml, and rbac.yaml as the manifests in examples/zookeeper-operator.

In the paper "Automatic Reliability Testing for Cluster Management Controllers", you mention that finding the manifests that specify how to build and deploy the controller under test is straightforward, but when I look at the manifests of the ZooKeeper operator, I don't really know which ones to use.

So we are wondering whether we can use e2e-framework/klient/decoder to make it easier to build a controller, because the decoder can decode a manifest into objects. I haven't tried it yet, so I am not sure whether it reduces the time needed to understand how to build the controller. Do you think it is a good idea?

@marshtompsxd
Member

marshtompsxd commented May 19, 2023

@Espresso-Kp Thanks for your interest!

> For example, there are lots of manifest files in the ZooKeeper operator, but I don't understand why you chose zookeeper.pravega.io_zookeeperclusters_crd.yaml, operator.yaml, and rbac.yaml as the manifests in examples/zookeeper-operator.

There is no single rule to decide which manifest files to choose for different controllers because controller developers tend to generate different manifests to deploy their controllers. It is usually straightforward for the controller developers to choose the manifests to deploy the controller.

Despite the diversity of controllers, there are some common patterns: (1) you need to install the CRD (custom resource definition) which is a data type used to represent the desired state of the system managed by the controller, (2) you need to deploy the (containerized) controller to your k8s cluster so that the controller will continuously monitor the current cluster state and match it to the desired state, (3) you need to configure the permission of your controller running in the k8s cluster.

For the zookeeper operator you mentioned, we learned how to deploy the controller following their readme: zookeeper.pravega.io_zookeeperclusters_crd.yaml is used to install the CRD, operator.yaml is used to deploy the controller in the k8s cluster, and rbac.yaml is used to set the permission of the controller.
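
The three steps above can be sketched as a small script that groups manifest files by the role their objects play. This is only an illustration, not part of Sieve: the kind-to-role mapping, the function names, and the sample manifests are all assumptions, and the regex looks only at top-level `kind:` lines instead of parsing YAML properly.

```python
import re
from collections import defaultdict

# Assumed mapping from Kubernetes kinds to the three deployment steps.
ROLE_BY_KIND = {
    "CustomResourceDefinition": "crd",
    "Deployment": "controller",
    "StatefulSet": "controller",
    "ServiceAccount": "rbac",
    "Role": "rbac",
    "ClusterRole": "rbac",
    "RoleBinding": "rbac",
    "ClusterRoleBinding": "rbac",
}

# Match only unindented "kind:" lines, so nested fields such as
# spec.names.kind or roleRef.kind are ignored.
TOP_LEVEL_KIND = re.compile(r"^kind:\s*([A-Za-z]+)", re.MULTILINE)


def classify_manifest(text):
    """Return the set of roles found among the top-level objects of one manifest."""
    return {ROLE_BY_KIND.get(kind, "other") for kind in TOP_LEVEL_KIND.findall(text)}


def group_by_role(manifests):
    """Map each role to the sorted names of the files that contain it."""
    grouped = defaultdict(list)
    for name, text in manifests.items():
        for role in classify_manifest(text):
            grouped[role].append(name)
    return {role: sorted(names) for role, names in grouped.items()}


# Hypothetical, heavily shortened manifests used only for illustration.
SAMPLE_MANIFESTS = {
    "crd.yaml": """\
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: examples.demo.example.com
spec:
  names:
    kind: Example
""",
    "operator.yaml": """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-operator
""",
    "rbac.yaml": """\
apiVersion: v1
kind: ServiceAccount
metadata:
  name: example-operator
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: example-operator
roleRef:
  kind: Role
  name: example-operator
""",
}

if __name__ == "__main__":
    for role, files in sorted(group_by_role(SAMPLE_MANIFESTS).items()):
        print(role, files)
```

A file that mixes several kinds shows up under every role it contains, which is usually a hint that it needs a closer look before it is used as an input.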

> So we are wondering whether we can use e2e-framework/klient/decoder to make it easier to build a controller, because the decoder can decode a manifest into objects.

This sounds exciting! The e2e-framework won't help build a controller, but it will help write test workloads for a controller which makes it easier to port a controller. I am not very familiar with the other two tools.

We welcome all kinds of contributions. Please let me know if you need more help.

@Espresso-Kp
Author

Thanks for such a detailed answer!
But what do you mean by saying that e2e-framework can make it easier to port a controller? From my understanding, the helper functions in e2e-framework make it easier to use client-go functionality (CRUD operations), which helps users write tests. Do you mean the same thing as this, or can you explain a bit how it makes porting a controller easier?

As for the decoder, it can decode YAML- or JSON-encoded Kubernetes objects from files, strings, and byte slices.
So my thinking is that even though the decoder won't help build a controller directly, it can help test developers understand controllers more easily by checking the objects it filters from the manifest instead of reading through the whole manifest. Do you think it is a good idea to integrate the decoder into the current Sieve testing workflow to save developers time in understanding how to build the controllers they want to test?

@tianyin
Member

tianyin commented May 20, 2023

> do you mean the same thing as this

Yes. We've been thinking about integrating the e2e test framework you pointed out to write tests, but we didn't get time to do it.

Our current workflow for porting a controller is not ideal, and we are thinking about how to make it easier to use. One valuable step (which we do plan to take when time allows) is to decouple the runtime and common library from the current Sieve system. We could then build more common utilities to make things easier (e.g., the decoder you mentioned). However, the refactoring is not trivial, as it requires knowing the codebase fairly well.

> Do you think it is a good idea to integrate the decoder into the current Sieve testing workflow to save developers time in understanding how to build the controllers they want to test?

I think it's an interesting feature to add. We would love it if you can contribute.

May I ask (again) what your use case for Sieve is? Are you doing research projects on top of it? Or are you actually using it to test your own controller(s)?

In our experience, if you are a developer of a controller, it's straightforward which YAML file should be the input of Sieve (as you know your controller code/config well).

Certainly, if you are testing many controllers on GitHub or OperatorHub, then it could be a very useful util.

Thanks!

@Espresso-Kp
Author

Thanks for the reply!

> May I ask (again) what your use case for Sieve is? Are you doing research projects on top of it? Or are you actually using it to test your own controller(s)?

Yes, I am a CS graduate student at TUM, and I am doing seminar research on top of the Sieve project with my teammate. We are not controller developers, which may be why we didn't understand at first why the paper says that finding which YAML files to use is straightforward.

> Certainly, if you are testing many controllers on GitHub or OperatorHub, then it could be a very useful util.

Glad to know that! Agreed, it could be more efficient when the use case is to test many controllers, or new controllers that the developer is not yet familiar with. We will keep working in this direction and try to come up with a reasonable practice!

@tianyin
Member

tianyin commented May 20, 2023

Thanks for your interest in Sieve; glad that you are developing research projects on top of it! We would love to hear your research idea(s) if possible.

> that's maybe the reason we didn't understand at first why you mentioned in the paper

Yes. Sieve was designed as a developer tool, not a market-scale tool (as it needs some tests written, as you pointed out). For the controllers we evaluated in the paper, we (mostly @marshtompsxd, @tylergu, and @laphets) spent some time understanding them so that we could "pretend" to be developers.

> it could be more efficient if the use case is to test many controllers or new controllers that the developer is not familiar with before

Let us know if you have any thoughts on that. We would love to hear!

And, you are right that making the porting process easier (e.g., using decoder to automatically find the CR YAML) is very useful.
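
One way to picture "automatically finding the CR YAML" is to match decoded objects against the CRDs found in the same set of manifests. The sketch below assumes the manifests have already been decoded into plain dictionaries (for instance by a YAML decoder); the function names and sample objects are hypothetical and not part of Sieve or e2e-framework.

```python
def crd_targets(objects):
    """Collect the (group, kind) pairs declared by every CRD in the decoded objects."""
    targets = set()
    for obj in objects:
        if obj.get("kind") == "CustomResourceDefinition":
            spec = obj.get("spec", {})
            targets.add((spec.get("group"), spec.get("names", {}).get("kind")))
    return targets


def find_custom_resources(objects):
    """Return the objects whose API group and kind match one of the CRDs."""
    targets = crd_targets(objects)
    found = []
    for obj in objects:
        # "group/version" -> "group"; core objects such as "v1" never match a CRD group.
        group = obj.get("apiVersion", "").partition("/")[0]
        if (group, obj.get("kind")) in targets:
            found.append(obj)
    return found


# Hypothetical decoded objects, trimmed to the fields the sketch needs.
SAMPLE_OBJECTS = [
    {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        "metadata": {"name": "zookeeperclusters.zookeeper.pravega.io"},
        "spec": {"group": "zookeeper.pravega.io", "names": {"kind": "ZookeeperCluster"}},
    },
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "operator"}},
    {"apiVersion": "v1", "kind": "ServiceAccount", "metadata": {"name": "operator"}},
    {
        "apiVersion": "zookeeper.pravega.io/v1beta1",
        "kind": "ZookeeperCluster",
        "metadata": {"name": "zookeeper-cluster"},
    },
]

if __name__ == "__main__":
    for cr in find_custom_resources(SAMPLE_OBJECTS):
        print(cr["kind"], cr["metadata"]["name"])
```

Matching on both group and kind avoids false positives when two CRDs from different projects happen to share a kind name.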

@Espresso-Kp
Author

Thanks for the support! We will update you once we have made more concrete progress.

For now, we are looking at two interesting indirect bugs: the ZooKeeper indirect bug and the MongoDB indirect bug. They appear to have very similar root causes, and I wonder how the test plans for exposing these two indirect bugs were generated. Were they written manually by you, or generated automatically by Sieve from the test workloads you provided?

@tianyin
Member

tianyin commented Jun 22, 2023

@Espresso-Kp Test plans are all automatically generated (based on the test workloads).

@Espresso-Kp
Author

Thanks! From my understanding, though, test plans are generated based on three perturbation policies, so how can Sieve automatically generate test plans beyond these three, for example zookeeper-operator-indirect-1.yaml?

Based on what you mention in the "Bugs indirectly detected by Sieve" part of the OSDI paper, I thought zookeeper-operator-indirect-1.yaml and mongodb-operator-indirect-2.yaml were written manually:

> after understanding the root causes, we were able to reproduce two of these bugs consistently with manually written test plans.

@marshtompsxd
Member

Hi @Espresso-Kp. Thanks for asking; let me clarify.

All the bugs were initially detected by test runs guided by the automatically generated test plans. In the bug_reproduction_test_plans folder, all the test plans for reproducing the bugs of the three main patterns targeted by Sieve are automatically generated based on the test workloads.

The indirect bugs are a bit different because they are not directly triggered by the three patterns (they are more like by-products of Sieve by running many different workloads) and the originally automatically generated test plans cannot always reliably reproduce all of the indirect bugs.

Interestingly, after further investigation, we found that a few of these indirect bugs can be reproduced by perturbation patterns that Sieve does not yet support. Although we haven't implemented a fourth perturbation pattern in Sieve, we manually encoded such patterns in a few test plans (the ZooKeeper and MongoDB ones you were referring to), and they turned out to be effective. To make these indirect bugs easier to reproduce, we also put these test plans under the bug_reproduction_test_plans folder.

An interesting direction for future work is to integrate more of these promising perturbation patterns into Sieve, so that Sieve can generate the corresponding test plans automatically and find more bugs. If you are interested, the existing policies for generating test plans would be a good starting point: sieve_perturbation_policies
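
To make the idea of adding a perturbation pattern concrete, here is a purely hypothetical sketch of a pluggable policy registry. It does not reflect Sieve's actual code, event model, or test plan format: the `Event` and `Plan` shapes, the policy names, and the `register` helper are all invented for illustration.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]  # hypothetical: one event recorded from a reference run
Plan = Dict[str, str]   # hypothetical: one perturbation to inject during a test run
Policy = Callable[[List[Event]], List[Plan]]

# Registry of perturbation policies; a new pattern is one more entry.
POLICIES: Dict[str, Policy] = {}


def register(name: str) -> Callable[[Policy], Policy]:
    """Decorator that adds a policy function to the registry under `name`."""
    def wrap(policy: Policy) -> Policy:
        POLICIES[name] = policy
        return policy
    return wrap


@register("restart-after-update")
def restart_after_update(events: List[Event]) -> List[Plan]:
    """Hypothetical pattern: restart the controller right after each update event."""
    return [
        {"policy": "restart-after-update", "action": "restart_controller", "after": e["id"]}
        for e in events
        if e["type"] == "update"
    ]


@register("pause-before-delete")
def pause_before_delete(events: List[Event]) -> List[Plan]:
    """Hypothetical pattern: pause the controller just before each delete event."""
    return [
        {"policy": "pause-before-delete", "action": "pause_controller", "before": e["id"]}
        for e in events
        if e["type"] == "delete"
    ]


def generate_plans(events: List[Event]) -> List[Plan]:
    """Run every registered policy over the recorded events, in name order."""
    plans: List[Plan] = []
    for name in sorted(POLICIES):
        plans.extend(POLICIES[name](events))
    return plans


if __name__ == "__main__":
    recorded = [
        {"id": "e1", "type": "create"},
        {"id": "e2", "type": "update"},
        {"id": "e3", "type": "delete"},
    ]
    for plan in generate_plans(recorded):
        print(plan)
```

With this shape, a manually written plan corresponds to a pattern that no registered policy produces yet; turning it into a policy function is what would let the plans be generated automatically.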
