Questions related to how to contribute this project #123
Hi @Espresso-Kp, thank you for your interest! Could you elaborate on your needs and use cases? For example, which controllers/operators are you trying to test, and which scenarios do you want to test (this determines the workloads to write)? We do wish to have a more generic test infrastructure based on the e2e test framework you referred to, but it is not part of Sieve yet.
For example, there are lots of manifest files in the ZooKeeper operator, but I don't understand why you chose zookeeper.pravega.io_zookeeperclusters_crd.yaml, operator.yaml, and rbac.yaml as the manifests in examples/zookeeper-operator. In the paper "Automatic Reliability Testing for Cluster Management Controllers", you mentioned that finding the manifests that specify how to build and deploy the controller under test is straightforward, but when I look at the manifests of the ZooKeeper operator, I don't really know which ones to use. So now we are wondering whether we can use e2e-framework/klient/decoder to make it easier to build a controller, because the decoder can decode manifests into objects. I haven't tried it yet, so I am not sure whether it can reduce the time needed to understand how to build the controller. Do you think it is a good idea?
@Espresso-Kp Thanks for your interest!
There is no single rule for deciding which manifest files to choose, because controller developers tend to generate different manifests to deploy their controllers. For the developers of a given controller, choosing the manifests that deploy it is usually straightforward. Despite the diversity of controllers, there are some common patterns: (1) you need to install the CRD (custom resource definition), which is the data type used to represent the desired state of the system managed by the controller; (2) you need to deploy the (containerized) controller to your k8s cluster so that the controller continuously monitors the current cluster state and reconciles it with the desired state; (3) you need to configure the permissions of your controller running in the k8s cluster. For the ZooKeeper operator you mentioned, we learned how to deploy the controller by following their README.
This sounds exciting! The e2e-framework won't help build a controller, but it will help write test workloads for a controller, which makes it easier to port a controller. I am not very familiar with the other two tools. We welcome all kinds of contributions. Please let me know if you need more help.
Thanks for such a detailed answer! As for the decoder, it can be used to decode YAML- or JSON-encoded Kubernetes objects from files, strings, and byte slices.
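To make the idea concrete, here is a rough Python sketch of the kind of helper discussed in this thread: it sorts manifest files into the three roles mentioned above (CRD, controller deployment, permissions). This is not the e2e-framework decoder (which is a Go API) and not part of Sieve. It only scans for top-level `kind:` lines with a regular expression, and the names `ROLE_BY_KIND`, `classify_manifest`, `group_manifests`, and the manifest contents in `example_manifests` are all hypothetical, chosen for illustration.

```python
import re
from collections import defaultdict

# Assumed mapping from Kubernetes kinds to the three roles described in the
# thread. This table is illustrative, not exhaustive.
ROLE_BY_KIND = {
    "CustomResourceDefinition": "crd",
    "Deployment": "controller",
    "StatefulSet": "controller",
    "ServiceAccount": "rbac",
    "Role": "rbac",
    "RoleBinding": "rbac",
    "ClusterRole": "rbac",
    "ClusterRoleBinding": "rbac",
}

# Match only unindented "kind:" lines, so nested fields such as
# spec.names.kind inside a CRD are ignored.
KIND_LINE = re.compile(r"^kind:\s*([A-Za-z]+)\s*$", re.MULTILINE)


def classify_manifest(text):
    """Return the set of roles found in one (possibly multi-document) YAML manifest."""
    return {ROLE_BY_KIND.get(kind, "other") for kind in KIND_LINE.findall(text)}


def group_manifests(manifests):
    """Map each role to the sorted list of manifest names that contain it."""
    grouped = defaultdict(list)
    for name, text in manifests.items():
        for role in classify_manifest(text):
            grouped[role].append(name)
    return {role: sorted(names) for role, names in grouped.items()}


# File names follow the ones mentioned in this thread; the contents are
# made-up stubs, not the real ZooKeeper operator manifests.
example_manifests = {
    "zookeeper.pravega.io_zookeeperclusters_crd.yaml": (
        "kind: CustomResourceDefinition\n"
        "spec:\n"
        "  names:\n"
        "    kind: ZookeeperCluster\n"
    ),
    "operator.yaml": "kind: Deployment\n",
    "rbac.yaml": "kind: ServiceAccount\n---\nkind: ClusterRole\n---\nkind: ClusterRoleBinding\n",
}
```

Calling `group_manifests(example_manifests)` groups the three files under `crd`, `controller`, and `rbac`. A real implementation would parse the YAML properly (for example with the decoder) instead of relying on a regular expression, and a custom resource file would land in `other` until its kind is matched against the installed CRD.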
Yes. We've been thinking about integrating the e2e test framework you pointed out to write tests, but we haven't had time to do it. Our current workflow for porting a controller is not the best, and we are thinking about how to make it easier to use. One thing that would be valuable (and that we do plan to do when time allows) is to decouple the runtime and common library from the current Sieve system. We could then build more common utilities to make things easier (e.g., the decoder you mentioned). But the refactoring is not trivial, as it requires knowing the codebase fairly well.
I think it's an interesting feature to add. We would love it if you could contribute. May I ask (again) what your use case for Sieve is? Are you doing research projects on top of it, or are you actually using it to test your own controller(s)? In our experience, if you are a developer of a controller, it's straightforward to tell which YAML files should be the input to Sieve (as you know your controller code/config well). Certainly, if you are testing many controllers on GitHub or OperatorHub, then it could be a very useful utility. Thanks!
Thanks for the reply!
Yes, I am a CS graduate student from TUM, and I am doing seminar research on top of the Sieve project with my teammate. We are not the developers of a controller, and that's maybe the reason we didn't understand at first why you mentioned in the paper that finding which YAML files to use is straightforward.
Glad to know that! Agreed, it could be more efficient if the use case is to test many controllers, or new controllers that the developer is not yet familiar with. We will keep investing in this path and try to come up with a reasonable practice!
Thanks for your interest in Sieve, and glad that you are developing research projects on top of it! We would love to know your research idea(s) if possible.
Yes. Sieve was designed as a developer tool, not a market-scale tool (as it needs some tests to be written, as you pointed out). For the controllers we evaluated in the paper, we (mostly @marshtompsxd, @tylergu, and @laphets) spent some time understanding them, so that we could "pretend" to be their developers.
Let us know if you have any thoughts on that. We would love to hear them! And you are right that making the porting process easier (e.g., using the decoder to automatically find the CR YAML) would be very useful.
Thanks for the support! We will update you once we have made more concrete progress. For now, we are looking at two interesting indirect bugs: the ZooKeeper indirect bug and the MongoDB indirect bug. Apparently they have very similar root causes, and I wonder how the test plans for exposing these two indirect bugs were generated. Were they written manually by you, or automatically generated by Sieve from the test workloads you provided?
@Espresso-Kp Test plans are all automatically generated (based on the test workloads).
Thanks! From my understanding, test plans are generated based on three perturbation policies, so how can Sieve automatically generate test plans beyond these three, for example zookeeper-operator-indirect-1.yaml? Given what you mentioned in the "Bugs indirectly detected by Sieve" part of the OSDI paper, I thought that zookeeper-operator-indirect-1.yaml and mongodb-operator-indirect-2.yaml were written manually.
Hi @Espresso-Kp. Thanks for asking, and let me clarify. All the bugs were initially detected by test runs guided by the automatically generated test plans. In the bug_reproduction_test_plans folder, all the test plans for reproducing bugs of the three main patterns targeted by Sieve are automatically generated based on the test workloads. The indirect bugs are a bit different: they are not directly triggered by the three patterns (they are more like by-products of running many different workloads with Sieve), and the originally generated test plans cannot always reliably reproduce all of them. Interestingly, after more investigation, we found that a few of these indirect bugs can be reproduced by other perturbation patterns that Sieve does not support yet. Although we haven't implemented a fourth perturbation pattern in Sieve, we tried manually encoding such patterns in a few test plans (the ZooKeeper and MongoDB ones you were referring to), and they turned out to be effective. To make it easier to reproduce some of the indirect bugs, we also put these test plans under the bug_reproduction_test_plans folder. An interesting piece of future work is to integrate more such promising perturbation patterns into Sieve, so that Sieve can generate these test plans automatically and find more bugs. If you are interested in that, the existing policies for generating test plans would be a good starting point: sieve_perturbation_policies
Hi, in the wish list you mentioned supporting an e2e testing framework like https://github.com/kubernetes-sigs/e2e-framework to make writing tests easier. We are considering contributing to this, but to be honest we don't have much previous experience with it.
Could you give us some suggestions on how to approach it, or on what the difficulties are?
In addition, we are looking for some short-term, feasible work to contribute to this project, so if you could give us some insights, we would be really happy to hear them!