WIP: send data to experimental ES #68
chaws wants to merge 2 commits into kernelci:master from chaws:add-experimental-callback
Conversation
I'm OK with this approach to send live data to additional backends for experimentation.

@khilman sorry for the delay, I've parameterized it now, but for one experimental_url only; if needed I can add multiple urls (maybe comma-separated).
khilman left a comment:
I don't understand why build.py is involved at all. For current callback URLs, it's not involved at all. This is not a kernel build-time decision, but one that should be made when creating the LAVA jobs (lava-v2-jobs-from-api.py).
Rather, I think that the --callback feature of lava-v2-jobs-from-api.py should be extended to support more than one callback URL. Also, why limit ourselves to 2?
@gctucker what's your thought on the right way to support multiple callbacks in LAVA jobs in a flexible way?
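One hedged way to let `--callback` take any number of URLs, as suggested above, is argparse's `action='append'`. This is only a sketch: the real option names and parsing in lava-v2-jobs-from-api.py may differ, and the URLs are placeholders.

```python
import argparse

# Sketch only: the actual lava-v2-jobs-from-api.py option may be named differently.
parser = argparse.ArgumentParser()
# Repeating --callback appends each URL to a list, so any number is allowed.
parser.add_argument('--callback', action='append', default=[],
                    help='callback URL; may be given multiple times')

args = parser.parse_args(['--callback', 'https://backend.example/cb',
                          '--callback', 'https://es.example/cb'])
print(args.callback)
```

Each occurrence of the flag adds one URL, so the same mechanism covers 2, 3, or N callbacks without further changes.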
@khilman the intention of having the experimental url in build.py was to send ES a fresh version of the … Agreed on …

@chaws ok, right. I forgot that …
@khilman I think this should be part of a general architecture to make the system more modular and enable alternatives (i.e. builds not in Jenkins, non-LAVA labs, alternative backends and frontends). It's probably worth a design document in Google Docs as something to aim for; I can dump some ideas there to get this started.
@khilman While I understand that sending the builds to both the current KernelCI backend and ElasticSearch helps us with this experiment, is there a good reason why we would keep both in a production environment? It would seem fine to make it possible to configure a KernelCI instance to use either the current backend, or ES, or something else. But is there any use-case for having more than one used by a single production instance?
@gctucker yes. It's not just for experiments like ES. The other obvious example is to have production and development backends operating on the same "real" data. To get around this in the past, I've just copied the db from a production backend to a development backend, but that's just a workaround for not being able to send the same data to multiple backends. That's why I also suggested we should support even more than 2. As the project grows, I think there will always be needs for production, development and experimental backends.
OK. I guess it would also help on staging to have multiple backend instances to test separate things being developed in parallel, running with different branches. It would still not strictly be part of the production path though.

One thing that might cause some issues is the storage server. If we need to send the kernel source tarballs, the kernel build artifacts, the rootfs etc. to several places, then it could start slowing things down as the number of them increases. So I think we may need to keep them separate, essentially having one or several "storage" backends independently from one or several "results" backends. It's not a trivial thing to do though, especially as the current backend is entangled with the storage part, so probably something to keep in mind for later if we're having issues with scaling and network bandwidth.
This option will allow multiple callbacks to be defined in a YAML file, to be used later when generating test job definitions.
`build.py` was a bit limited due to `getopt`; it now works with `argparse`, which allows a more flexible command-line setup. Also, the script now allows notifications to be sent to multiple backends when a new build is complete.
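As a rough illustration of the argparse migration described above (the actual build.py flag names are assumptions, not the real interface):

```python
import argparse

def parse_args(argv):
    # Hypothetical option names; the real build.py flags may differ.
    parser = argparse.ArgumentParser(
        description='build a kernel and notify one or more backends')
    parser.add_argument('--api', required=True,
                        help='primary backend API URL')
    parser.add_argument('--extra-api', action='append', default=[],
                        help='extra backend URL to notify; repeatable')
    return parser.parse_args(argv)

args = parse_args(['--api', 'https://api.kernelci.org',
                   '--extra-api', 'https://es.example.org'])

# Every backend in this list would be notified once the build completes.
backends = [args.api] + args.extra_api
print(backends)
```

Compared with `getopt`, argparse generates `--help` output, validates required options, and makes repeatable options like `--extra-api` a one-liner.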
```python
broken_callbacks = []

for c in callbacks:
    if not c.get('url'):
```
I would check whether all the mandatory fields ('expected_fields') are present. There is no point checking any further if a field is missing.
@mwasilew, only the url field would be mandatory. The rest of them can be optional. If an expected field is not present, it will be set to LAVA's default.
```python
for c in callbacks:
    if not c.get('url'):
        print('Warning: one of the callbacks has no "url" set, removing it from the list')
```
logging sounds like a better fit, but this is a case for another patch.
Agreed, I could propose some refactoring later on.
```python
[c.pop(k) for k in unknown_fields]

for broken in broken_callbacks:
    callbacks.remove(broken)
```
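Pieced together, the reviewed fragments appear to implement roughly the following filtering logic. This is a sketch, not the PR's actual code: the `EXPECTED_FIELDS` list and the surrounding function are assumptions, only the variable names come from the diff.

```python
# Assumed field whitelist; the PR's real 'expected_fields' may differ.
EXPECTED_FIELDS = ['url', 'method', 'token', 'dataset']

def filter_callbacks(callbacks):
    """Drop callbacks missing 'url' and strip unknown fields, in place."""
    broken_callbacks = []
    for c in callbacks:
        if not c.get('url'):
            print('Warning: one of the callbacks has no "url" set, '
                  'removing it from the list')
            broken_callbacks.append(c)
            continue
        # Unknown keys are popped so only LAVA-accepted fields remain.
        unknown_fields = [k for k in c if k not in EXPECTED_FIELDS]
        for k in unknown_fields:
            c.pop(k)
    # Remove broken entries after iterating, to avoid mutating mid-loop.
    for broken in broken_callbacks:
        callbacks.remove(broken)
    return callbacks

cbs = [{'url': 'https://es.example/cb', 'bogus': 1}, {'token': 'x'}]
filter_callbacks(cbs)
```

Collecting broken entries first and removing them afterwards matches the diff's two-pass structure, which avoids modifying the list while iterating over it.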
@khilman, @gctucker I rebased this PR and updated it. The callbacks file is a YAML file that contains pretty much what LAVA accepts in their callbacks block. The only mandatory field is `url`.
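As a rough illustration of such a file (field names follow LAVA's notify callback block; every value here is a placeholder, and per the discussion above only `url` is required):

```yaml
callbacks:
  - url: https://api.kernelci.org/callback   # placeholder URL
    method: POST                             # optional; LAVA default applies if omitted
    token: kernelci-token-name               # optional
  - url: https://es.example.org/callback     # experimental ES instance (placeholder)
```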
OK, I guess the file with the ES callback can be on a branch to test on staging, as long as it doesn't contain any secrets?
@chaws I'd really like to see the build.py argparse changes and the new features separated into two separate patches.
So I think the steps to get this merged are:
@chaws Could you please rebase and submit a new PR in https://github.com/kernelci/kernelci-core as we're shutting down this kernelci-core-staging repo this week?
@chaws Please resubmit this PR in https://github.com/kernelci/kernelci-core as we're shutting down this repo. cc @khilman
On last Monday's call (10/12/18), Milosz suggested a patch in kernelci so that job results from LAVA would also be sent to our experimental ElasticSearch instance.
This patch does two things:

1. Sends `build.json` to our server
2. Adds the callbacks block to the test job definition when `callback_type != 'custom'`

2.1. Apparently only LAVA V2 labs after 2018.5 support multiple callbacks, and from https://github.com/kernelci/kernelci-core-staging/blob/master/labs.ini, I came up with this list:

Since kernelci-core-staging uses Matt's, Collabora's and BayLibre's labs, we're safe adding the multiple callbacks block in the test job definition.