CentOS CI "get well" plan #18
@keszybz any thoughts on this one?
Failing tests:
Common error:
This happens even with
Workaround/fix:
"Both" is worthwhile, because different things are tested in each environment. But reliability is more important than having both, so if just one can be made to work, that's better than having flaky tests.
Notes from the "make the QEMU testsuite work again" session:
@evverx With the help of several other people, I finally found something that could get things moving again: I'm going to propose this ticket at the CentOS CBS meeting (every Monday, 2 PM UTC in #centos-devel on Freenode), and hopefully it will get us somewhere.
Apparently there was some miscommunication, so I didn't receive the previous email. However, I finally got the credentials, so we can start breaking things! (OT: Is there any chat where I can catch you, e.g. IRC, Telegram, etc.? @evverx)
That's great news! Congratulations!
I'm afraid it isn't possible to catch me there, but, on the positive side, I usually reply to comments on GitHub relatively fast.
Notes from the "why it doesn't work in CentOS CI infrastructure" session:
Could it be that you ran into systemd/systemd#10854? There are two PRs that are supposed to fix the issue. Could you try applying one of them to see if it helps?
If the failure is caused by systemd/systemd#10854, please provide any logs if possible. Thank you.
Another possibility is systemd/systemd#10754...
Unfortunately, neither of the mentioned issues seems to be relevant here. I did a quick bisect, but the issue occurs all the way down to systemd/systemd@80df8f2; without this commit systemd won't compile, so I'll try to work around that tomorrow. I'll also ask whether there's any way to get useful logs from the machine after it dies. Anyway, in my opinion, the issue is somewhere in the multipath setup used for the root filesystem...
Regarding systemd/systemd@80df8f2, I think
Could you try to boot with
Thanks a lot for the suggestions; unfortunately, neither of them helped. I raised the post-mortem debugging issue on the CentOS CI Users mailing list, so let's see if someone is able to help. In the meantime, I'll keep playing around with bisect in the hope of stumbling upon the root cause...
Notes from the "why it doesn't work in CentOS CI infrastructure" session, part 2:
@mrc0mmand thanks a lot for finding the offending commit! By the way, apparently GitHub doesn't send notifications when comments are edited, so major breakthroughs probably deserve to be written down separately :-) The easiest way to unbreak CentOS CI would be to revert that commit. @keszybz @poettering @yuwata what do you think? As usual, I agree that it would be much better to figure out what's going on and fix it, but in this case it's not that easy, and given how long it took to get access to the testing infrastructure, I don't think the question @mrc0mmand asked in https://lists.centos.org/pipermail/ci-users/2018-November/000918.html will be answered anytime soon.
@evverx I was trying to get a remote shell in the initrd to collect logs before pinging everyone, and, believe it or not, I managed to do it using https://github.com/dracut-crypt-ssh/dracut-crypt-ssh! Right now I have a working shell and access to the journal and the kernel ring buffer, so I'll open an issue shortly with as many logs as I can get.
@mrc0mmand that's great! I'm wondering if it would be possible to use it in the script that reboots and connects to the machine, so that issues like this would be a little easier to debug in the future. It could just dump all the logs somewhere, which I guess is better than nothing, and it would be more or less automatic.
I guess we could incorporate it into the CI scripts, as the setup is fairly simple.
The testsuite almost passes; there's some issue with networking, hopefully nothing major (https://ci.centos.org/job/systemd-pr-build/3673/console). Debug log from systemd-networkd-tests.py: (https://paste.fedoraproject.org/paste/jnhwagD3-saGbeCNzYYk0w). @ssahani could you shed some light on what's happening here?
I suspect Regarding
That makes sense, thanks for the reference link.
Unfortunately not. I even tried rebooting the machine before running the test itself, but it still fails in the same way.
@mrc0mmand could you create a new issue about
Tracking issues for current CentOS CI blockers:
@mrc0mmand I'm wondering if you have figured out what @systemd-centos-ci is. I think it would make sense to turn CentOS CI on as soon as possible to at least make sure that
@evverx IMHO @systemd-centos-ci was created simply to provide an API key for the GitHub builder plugin in the CentOS CI Jenkins; this allows Jenkins to update the commit/PR state according to the results of the test run. However, I don't know who has access to this account, so maybe it would be wise for me to just use my own API key (with limited permissions), so that we have everything under our control. I'll go ahead and temporarily disable the mentioned tests so the results are finally usable.
Ah, I take that back: I can't use my API key, as I don't have the appropriate permissions in systemd/systemd. We could either track down the owner of @systemd-centos-ci or just create a new account for this purpose.
I have no problem with a new account. If I understand correctly, it'll just have to be invited as a collaborator, and I can do that. But, as far as I know, https://wiki.centos.org/QaWiki/CI/GithubIntegration will no longer apply in that case, so it'd be great if you could let me know what the webhook is supposed to look like. Right now it just points to https://ci.centos.org/ghprbhook/ with no secret.
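For context on the "no secret" remark: a GitHub webhook can be configured with a shared secret, in which case GitHub signs every delivery so the receiver can check that the payload really came from GitHub. Whether the GitHub Pull Request Builder endpoint at https://ci.centos.org/ghprbhook/ verifies such a signature isn't covered in this thread. The sketch below assumes GitHub's X-Hub-Signature-256 header ("sha256=" followed by the hex HMAC-SHA256 of the raw request body; older deliveries used an SHA-1 X-Hub-Signature header instead). The function name and sample values are hypothetical.

```python
import hashlib
import hmac


def verify_webhook_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check a GitHub-style "X-Hub-Signature-256" header against the raw body.

    The header is expected to look like "sha256=<hex digest>", where the digest
    is HMAC-SHA256 of the exact request body keyed with the webhook secret.
    """
    prefix = "sha256="
    if not signature_header.startswith(prefix):
        return False
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = signature_header[len(prefix):]
    # compare_digest does a constant-time comparison to avoid timing leaks.
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))


if __name__ == "__main__":
    secret = b"example-secret"      # hypothetical shared secret
    body = b'{"action": "opened"}'  # hypothetical webhook payload
    good = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    print(verify_webhook_signature(secret, body, good))               # True
    print(verify_webhook_signature(secret, body, "sha256=deadbeef"))  # False
```

Without a secret, anyone who knows the hook URL can send it a payload that looks like it came from GitHub; with one, a receiver that performs this check can reject such requests.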
@keszybz it would be great if you could help here. Judging by the presence of @systemd-centos-ci, I assume there are reasons unknown to me for it to be here (most likely related to secure access to the repository, but who knows).
In light of recent events that shall remain nameless, one can never be too cautious about giving write access to the repository :-)
@evverx Sorry for the delay; I wanted to make sure everything works before we start messing with webhooks. I temporarily disabled the problematic parts of the testsuite in 42340c2, and it's finally passing. I guess now we just have to figure out which user to use for the CI, so I can configure it properly on the Jenkins side.
@mrc0mmand given that I already bother contributors with LGTM alerts like systemd/systemd#10249 (comment), I think we could use my account as the bearer of bad news (at least temporarily). What do you think?
Though I'd prefer it if @poettering and @keszybz chimed in here, because I'm still not sure whether anyone else is interested in getting this working.
On second thought, it also seems reasonable to me to invite @mrc0mmand as a collaborator to the systemd repository and point CentOS CI to @mrc0mmand's handle. I'm pretty sure it'll make everything much faster, simpler, and even a little more secure.
And
I guess CentOS CI could easily have prevented that... As for the ideas above, using your account, @evverx, is definitely possible, but I don't like the idea of being in charge of someone else's API key. Not that I have any ulterior motives, but it's still a responsibility.
@mrc0mmand I'm completely with you on this one; that's why I suggested inviting you as a collaborator to the systemd repository. I'd do that right now, but I'm not sure I can make decisions like that without at least one ACK. Maybe you could ping someone to speed up the process.
So, manually launching a CentOS VM and running
@mrc0mmand let me know when (and probably how) I should turn the webhook on. Six hours ago, https://ci.centos.org/ghprbhook/ responded with an HTTP 500, so I turned it off again.
@evverx will do! However, as usual, there is one small catch, because otherwise things would be too easy... In Jenkins, every user has their own credentials store for managing credentials for various plugins, but for some reason I can't manage credentials for the plugin we need (GitHub Pull Request Builder). I just asked about that on the #centos-devel channel, so let's hope for a (relatively) fast response.
@mrc0mmand in case the response isn't fast, I'm wondering whether it would be possible, as a last resort, to trigger CentOS CI via Travis CI. I'm fantasizing here and assuming you have everything you need to run Jenkins jobs that can produce reports like https://ci.centos.org/job/systemd-pr-build/3676/console. In theory, could we encrypt your credentials and use them to spawn VMs via
@evverx I just gave up and wrote a simple wrapper that does the status reporting, and it seems to be working. I'll definitely improve it as soon as possible (or ditch it completely if I figure out the Jenkins plugin madness), but for now it should finally start delivering results to PRs. Right now, just set up a webhook according to the CentOS CI documentation, i.e.:
I didn't select "Issue Comment" because I'm not sure it'd be useful. Judging by systemd/systemd#11045, the hook has started delivering :-)
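The status-reporting wrapper mentioned above isn't shown in this thread. As a rough illustration of what such a wrapper has to do, here is a minimal sketch that assumes it uses GitHub's REST "create a commit status" endpoint (POST /repos/{owner}/{repo}/statuses/{sha}). The function names, the default context string, and the token handling are hypothetical and not taken from the actual wrapper.

```python
import json
import urllib.request

# States accepted by GitHub's commit status API.
VALID_STATES = {"error", "failure", "pending", "success"}


def build_status_request(owner, repo, sha, state, target_url, token,
                         context="CentOS CI", description=""):
    """Build (but do not send) a request that sets a commit status on GitHub.

    All names here are hypothetical; "context" is the label shown next to
    the check on the PR, so each CI system should use its own.
    """
    if state not in VALID_STATES:
        raise ValueError(f"invalid commit status state: {state!r}")
    payload = {
        "state": state,
        "target_url": target_url,  # e.g. a link to the Jenkins console log
        "description": description,
        "context": context,
    }
    url = f"https://api.github.com/repos/{owner}/{repo}/statuses/{sha}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


def send_status(request):
    """Send a request built above and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(request) as response:
        return response.status
```

A wrapper along these lines would typically post a "pending" status when a job starts and a second status ("success" or "failure") pointing at the console log once the job finishes. Keeping request construction separate from sending makes the payload easy to inspect without touching the network.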
So, thanks to a collaboration with Brian, we now have a working CentOS CI without workarounds. As the next step, I'll sort out artifact exporting so the logs can be properly investigated in case of failure.
Quick update:
I'd say the main goal of this issue has been successfully achieved: the CentOS CI is working and delivering results. I'm going to close this issue and move any outstanding items to a new one to keep things easier to follow.
The purpose of this issue is to keep track of the things that need to be done to make systemd CentOS CI work again.
The following things still need to be done:
bootstrap.sh and testsuite.sh to run test-exec-deserialization.py (#14, 195510e)
user_namespace.enable=1 to the kernel cmdline and set user.max_user_namespaces > 0
Long term goals:
Notes:
bootstrap.sh and testsuite.sh to run test-exec-deserialization.py (#14 (comment))
systemd seems to be failing to compile on CentOS (systemd#10474)
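The kernel cmdline item in the to-do list above can be checked mechanically on a test machine before the testsuite runs. A minimal sketch, assuming the values are read from /proc/cmdline and /proc/sys/user/max_user_namespaces (the standard Linux locations). The helper names are hypothetical, only the literal value 1 for user_namespace.enable is treated as enabled (matching the wording of the to-do item), and the check is only meaningful on kernels that recognize that parameter, such as the CentOS kernel this issue is about.

```python
from pathlib import Path


def cmdline_enables_userns(cmdline: str) -> bool:
    """Return True if the kernel command line sets user_namespace.enable=1.

    If the parameter appears more than once, this sketch uses the last
    occurrence (an assumption about precedence).
    """
    enabled = False
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "user_namespace.enable" and sep:
            enabled = value == "1"
    return enabled


def max_user_namespaces_ok(value: str) -> bool:
    """Return True if the user.max_user_namespaces sysctl value is > 0."""
    try:
        return int(value.strip()) > 0
    except ValueError:
        return False


def host_ready_for_userns(proc: Path = Path("/proc")) -> bool:
    """Check both requirements on the running host (Linux only)."""
    cmdline = (proc / "cmdline").read_text()
    limit = (proc / "sys" / "user" / "max_user_namespaces").read_text()
    return cmdline_enables_userns(cmdline) and max_user_namespaces_ok(limit)
```

Splitting the parsing from the file reads keeps the logic testable on any machine, while host_ready_for_userns() is what a CI script would call on the provisioned VM.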