The master branch in kernelci-core is used as a development branch. The code used in production can be found on the kernelci.org branch, which is updated manually, typically once a week. This procedure is described below.
To update production
Updating the code used in production can be disruptive, especially as kernelci-backend, kernelci-frontend and kernelci-core may need to be kept in sync and updated all together. The Jenkins jobs configuration may also need to be adjusted to match kernelci-core changes (list of parameters, overall jobs configuration). Several things need to be done in order to maximise the continuity of the kernelci.org service, as explained below. These things should gradually become either automated or simplified to reduce the amount of manual work and chances of missing a step.
Create release tags
This is a simple step to keep track of which versions get put into production. Create a kernelci-yyyymmdd tag with the date on the master branch in the kernelci-core project. Similarly, create a new version tag if needed in kernelci-frontend (see previous version updates in the history).
Send an email to the kernelci.org mailing list with a summary of all the changes going into production. This first step should be done at least a day ahead of time in order to give anyone a chance to comment before the changes are rolled out into production.
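The tagging step can be sketched as a small shell helper. This is only an illustration: the function names, the "origin" remote and the tag message are assumptions, and the tag name is passed in by the caller. Setting DRY_RUN prints the commands without running them.

```shell
#!/bin/sh
# Sketch only: "origin" as the remote name and the tag message are assumptions.

# Print each command; execute it only when DRY_RUN is unset or empty.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# Create an annotated tag on the master branch and push it.
# $1: tag name (kernelci- followed by the date)
create_release_tag() {
    tag="$1"
    run git tag -a "$tag" -m "Production update $tag" master &&
    run git push origin "$tag"
}

# Usage, from a kernelci-core checkout:
#   DRY_RUN=1 create_release_tag kernelci-<date>
```

Running it with DRY_RUN=1 first makes it easy to review the exact commands before anything is pushed.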
Pause the kernel tree monitor job to flush Jenkins jobs
The kernel-tree-monitor job periodically looks for new changes in all the kernel trees. As updating the production code should be done atomically for all the components, it's necessary to wait until all the Jenkins jobs have completed. This can be achieved by pausing the monitor job and then waiting for any ongoing or queued job to complete in Jenkins.
Ideally, all the LAVA jobs should also complete, in case updating kernelci-backend causes some callbacks to fail. The email reports should also ideally all be sent before updating the kernelci-backend code, although scheduled reports will remain in the queue even if the code is updated. In practice, if the kernelci-backend code doesn't need to be updated or if the API hasn't changed, the downtime caused by the update is short enough that it is typically fine not to wait for these things to complete (which may take several hours).
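Waiting for Jenkins to drain can be sketched as a generic polling loop. The helper below is hypothetical: it takes any command that exits 0 once nothing is running or queued. The check shown in the comments assumes that curl and jq are installed, that JENKINS_URL is set and that no authentication is needed, none of which comes from this procedure.

```shell
#!/bin/sh
# Sketch only: the idle check itself is supplied by the caller.

# Poll a check command until it succeeds.
# $1: seconds between polls; remaining arguments: command exiting 0 when idle
wait_until_idle() {
    interval="$1"
    shift
    until "$@"; do
        echo "Jobs still running or queued, checking again in ${interval}s"
        sleep "$interval"
    done
    echo "No running or queued jobs left"
}

# Hypothetical check using the Jenkins JSON API (assumes curl, jq, JENKINS_URL):
#   jenkins_idle() {
#       queued="$(curl -s "$JENKINS_URL/queue/api/json" | jq '.items | length')"
#       busy="$(curl -s "$JENKINS_URL/computer/api/json" | jq '.busyExecutors')"
#       [ "$queued" -eq 0 ] && [ "$busy" -eq 0 ]
#   }
#   wait_until_idle 60 jenkins_idle
```

Keeping the check separate from the loop means the same loop could also be used to wait for LAVA jobs or pending email reports, given a suitable check command.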
Update kernelci-core production branch
Push the tagged revision from the kernelci-core master branch to the kernelci.org branch used in production. The production branch should match the tagged revision exactly, so if for any reason the history is not linear, the kernelci.org branch needs to be force-pushed.
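A minimal sketch of this push, assuming the remote is called "origin" and that the release tag already exists locally. The function names are hypothetical. The tag is peeled to its commit with ^{commit} so that the branch points at a commit rather than at an annotated tag object.

```shell
#!/bin/sh
# Sketch only: "origin" as the remote name is an assumption.

# Print each command; execute it only when DRY_RUN is unset or empty.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# Point the kernelci.org branch at the commit of a release tag.
# $1: release tag; $2: "force" to force-push when the history is not linear
update_production_branch() {
    tag="$1"
    if [ "${2:-}" = "force" ]; then
        run git push --force origin "${tag}^{commit}:refs/heads/kernelci.org"
    else
        run git push origin "${tag}^{commit}:refs/heads/kernelci.org"
    fi
}

# Usage:
#   DRY_RUN=1 update_production_branch kernelci-<date>
#   update_production_branch kernelci-<date> force
```

Trying the plain push first is the safer default, since it fails when the history has diverged and makes the need for a force-push explicit.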
Go through all the Jenkins jobs and update their configuration if necessary (job params...). This should be explained in the PR release notes, but it depends on each case. (ToDo: keep Jenkins jobs definitions in Git to automate that)
Update kernelci-backend and kernelci-frontend
Use ansible commands to update them, which ensures that all steps are done automatically (proper restart of the services, regenerating the static files, etc.).
For example, to update the backend:
cd kernelci-backend-config
ansible-playbook \
    -i hosts site.yml \
    -l api.kernelci.org \
    -D \
    -b \
    --ask-sudo-pass \
    --skip-tags=secrets \
    -t app \
    -e git_head=master
Update the Docker images
If any changes have been made to the Dockerfile definitions, build and push them all again to the registry. Wait until this has completed and be sure the latest images are available to be pulled before carrying on (see https://github.com/kernelci/kernelci-core-staging/pull/94).
If in doubt, this can be done every time: if there haven't been any changes, the Docker image builds should complete very quickly.
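Rebuilding and pushing all the images could look like the sketch below. The directory layout (one sub-directory per image, each containing a Dockerfile), the image naming scheme and the function names are assumptions made for illustration, not a description of the actual repository.

```shell
#!/bin/sh
# Sketch only: directory layout and image naming are assumptions.

# Print each command; execute it only when DRY_RUN is unset or empty.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# Build and push one image per sub-directory containing a Dockerfile.
# $1: directory holding the image definitions
# $2: registry prefix used to name the images
build_and_push_images() {
    dir="$1"
    prefix="$2"
    for dockerfile in "$dir"/*/Dockerfile; do
        # Skip the unexpanded pattern when no Dockerfile is found
        [ -f "$dockerfile" ] || continue
        context="$(dirname "$dockerfile")"
        image="$prefix/$(basename "$context")"
        run docker build -t "$image" "$context" || return 1
        run docker push "$image" || return 1
    done
}

# Usage:
#   DRY_RUN=1 build_and_push_images <dockerfiles-dir> <registry-prefix>
```

The loop stops at the first failure, so an image that fails to build is noticed before carrying on with the rest of the update.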
Update the root file systems
If there have been any changes to the debos recipes, or in order to get the latest version of the test suites, run a build of all the rootfs jobs (stretch-v4l2...). Then update the test-configs.yaml file with the new URLs for these file systems directly on the master branch. Once that has been done, the patch should be cherry-picked onto the staging branch to keep both branches in sync. If any rootfs fails to build, keep the previous revision in test-configs.yaml and report the issue so it can be fixed for the next production update.
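Keeping the staging branch in sync can be sketched as follows. The function names and the "origin" remote are assumptions, and the branch name is passed as a parameter because the exact name of the staging branch is not given here.

```shell
#!/bin/sh
# Sketch only: "origin" as the remote name is an assumption.

# Print each command; execute it only when DRY_RUN is unset or empty.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# Cherry-pick the commit updating test-configs.yaml onto another branch.
# $1: commit on master that updates the rootfs URLs
# $2: name of the branch to keep in sync
sync_rootfs_patch() {
    commit="$1"
    branch="$2"
    run git checkout "$branch" &&
    # -x records the original commit ID in the new commit message
    run git cherry-pick -x "$commit" &&
    run git push origin "$branch"
}

# Usage:
#   DRY_RUN=1 sync_rootfs_patch <commit> <staging-branch>
```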
If in doubt, this can be done every time. It can take about an hour to complete, but it's usually a good idea to keep the test suites updated and built with the latest available revisions.
Run a "pipe cleaner" job
Before enabling the tree monitor again, it's important to run a final check to verify that all the kernels are building correctly and all the tests are running as expected. This cannot be fully verified on staging: the staging instance doesn't have the capacity to build over 200 kernel variants like production does, and some labs are only available to run tests in production.
This can be done using any individual's tree listed in kernelci-builds.yaml, by updating a branch based on the latest stable release that is known to build and pass tests in production. Having a recent version is necessary to cover all the available hardware, and using a stable branch is necessary to avoid false positives (i.e. finding actual kernel problems rather than KernelCI infrastructure ones). It is however generally a good idea to build all the configs on all the architectures rather than the reduced set normally built on stable branches. For example, see
The kernel tree monitor job can be scheduled manually with only one build config specified as a parameter (this requires enabling the job to start it and then disabling it again, or having a copy of the job). Wait for all the builds and tests to complete, and for all the emails to be received. They should typically be sent to a limited audience, given the tree being built. If any issues arise, fix them if possible or revert the changes in the code so that production can be restarted shortly. Re-run parts or all of the pipe cleaner job after applying a fix to ensure things are working well before enabling the monitor job again.
Enable the tree monitor job again
Re-enable the tree monitor job, and manually start a run to avoid potentially waiting another hour until the next automated trigger occurs. Check that it works as expected and keep an eye on the results when they finally come in, to double-check that no regression has been introduced in spite of all the precautions explained above.