New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
periodically check the status of a copr build #490
periodically check the status of a copr build #490
Conversation
Build failed.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks simple but really promising!
(It will solve also the branch builds for now..;)
packit_service/worker/tasks.py
Outdated
bind=True, | ||
name="task.babysit_copr_build", | ||
# 120 minutes | ||
default_retry_delay=120 * 60, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the time when we start this baby-sit task for the first time? (Any then retry each minute?) (i.e. 120, 1, 1, 1, ...)
I would try some exponential sequence as Dominika suggested. (e.g. 1, 2, 4, 8, 16, ... ) or, a constant + exponential sequence (e.g. 10, 1, 2, 4, 8, ... )
[examples are in minutes and mean time to the next check]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too sold on the exponential backoff
d2b1f3c
to
144a231
Compare
Build failed.
|
two requre tests broke |
Let's check on other PRs if it fails everywhere. It may be caused by an API change in pygithub or Github.
|
LGTM 👍 |
144a231
to
a58a971
Compare
I'm definitely cursed - "simple" PR turned into... touch and change everything |
Build failed.
|
Makefile
Outdated
@@ -38,6 +38,33 @@ check_in_container: test_image | |||
-v $(CURDIR)/files/packit-service.yaml:/root/.config/packit-service.yaml \ | |||
$(TEST_IMAGE) make check | |||
|
|||
# This is my target so don't touch it! :) How to: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so... the target for running tests in a container is not good enough for me and I keep changing it all the time when working on p-s: so I decided to start my own target (which everyone can use)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I understood that as "don't touch it and don't try to use it"..;)
files/deployment.yaml
Outdated
|
||
# path_to_secrets = this is what we get from the outside | ||
# internal_secrets_path = this is what we'll use here | ||
- set_fact: secrets_path={{ deployment_dir }}/secrets/dev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please send the deployment changes in another PR? (This started to be really unreviewable.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
packit_service/service/events.py
Outdated
@@ -188,7 +188,9 @@ def __init__(self, trigger: TheJobTriggerType, project_url: str): | |||
) | |||
|
|||
def get_project(self) -> GitProject: | |||
return ServiceConfig.get_service_config().get_project(url=self.project_url) | |||
project = ServiceConfig.get_service_config().get_project(url=self.project_url) | |||
logger.debug(f"GitService = {project.service.__dict__}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very dangerous -- print the str
representation instead so we don't log service secrets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dangerous where? that's literally the point here: print the secrets so we know what's the setup
I mean, I can drop that commit since the issue is resolved - it's just these log lines helped me resolve it
thinking... it's true that we share srpm logs with people and this could potentially be printed in there, I'll drop it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The str
version contains the first and last character of the tokens to be sure we are using the right tokens...
a58a971
to
22d509c
Compare
Build failed.
|
22d509c
to
62a505c
Compare
Build failed.
|
62a505c
to
13c16cc
Compare
Build failed.
|
13c16cc
to
ab07d60
Compare
Build failed.
|
ab07d60
to
2afcad0
Compare
Build succeeded.
|
packit_service/worker/tasks.py
Outdated
bind=True, | ||
name="task.babysit_copr_build", | ||
# 120 minutes | ||
default_retry_delay=120 * 60, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this set to 120 minutes? countdown
in the retry()
bellow will override this to be 60 seconds, and when the task is sent, it's also 60 seconds.
Another note: isn't doing this work every minute too often? The cadence is very close to what we would get just listening to the message bus. I would go with 5 minutes, but I can imagine even 15 minutes would be fine. Longer times seem to be a better fit for what this functionality would try to achieve: pick up the leftovers.
How many times will this retry? 3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and how about making all these values part of the configuration, so that we don't have to touch the code, to change them? Would help a lot in fine tuning operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this is a good point
so I changed it to use the exponential backoff, current setup:
do the first check in 120s, then next one in 60s, next 120s, 240s, 480s, etc.
would that be okay?
(btw after we merge this, I'm stoked to see how it goes on stg and will fix all the issues - which should be much easier to do and review once we see it in action)
also, I set max retries to 7
thanks for such a thorough review
2afcad0
to
6bc807f
Compare
Build succeeded.
|
please don't merge this before fixing #504 we probably need to deploy to prod tomorrow, we can merge this PR after we fix the prod TFT problems |
6bc807f
to
da5d03c
Compare
Build failed.
|
recheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
name="task.babysit_copr_build", | ||
retry_backoff=60, # retry again in 60s, 120s, 240s, 480s... | ||
retry_backoff_max=60 * 60 * 8, # is 8 hours okay? gcc/kernel build really long | ||
max_retries=7, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work ok? In the celery doc, it is specified like this: retry_kwargs={'max_retries': 5}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually checking the celery code itself, and they are picking it from these kwargs, so, it should work
Build failed.
|
no longer needed, builds in the past 2 weeks are in psql Signed-off-by: Tomas Tomecek <ttomecek@redhat.com>
Signed-off-by: Tomas Tomecek <ttomecek@redhat.com>
Signed-off-by: Tomas Tomecek <ttomecek@redhat.com>
Signed-off-by: Tomas Tomecek <ttomecek@redhat.com>
da5d03c
to
bf6013b
Compare
Build succeeded.
|
Build succeeded (gate pipeline).
|
looking fwd fixing stg tomorrow! |
TODO:
Fixes #334