Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

periodically check the status of a copr build #490

Conversation

@TomasTomecek
Copy link
Contributor

TomasTomecek commented Mar 13, 2020

TODO:

  • write tests
  • merge #497

Fixes #334

@softwarefactory-project-zuul

This comment has been minimized.

Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 13, 2020

Build failed.

Copy link
Member

lachmanfrantisek left a comment

Looks simple but really promising!

(It will solve also the branch builds for now..;)

bind=True,
name="task.babysit_copr_build",
# 120 minutes
default_retry_delay=120 * 60,

This comment has been minimized.

Copy link
@lachmanfrantisek

lachmanfrantisek Mar 13, 2020

Member

This is the time when we start this baby-sit task for the first time? (Any then retry each minute?) (i.e. 120, 1, 1, 1, ...)

I would try some exponential sequence as Dominika suggested. (e.g. 1, 2, 4, 8, 16, ... ) or, a constant + exponential sequence (e.g. 10, 1, 2, 4, 8, ... )

[examples are in minutes and mean time to the next check]

This comment has been minimized.

Copy link
@TomasTomecek

TomasTomecek Mar 13, 2020

Author Contributor

I'm not too sold on the exponential backoff

packit_service/worker/build/copr_build.py Outdated Show resolved Hide resolved
packit_service/models.py Show resolved Hide resolved
packit_service/worker/tasks.py Outdated Show resolved Hide resolved
@TomasTomecek TomasTomecek force-pushed the TomasTomecek:fedora-massaging branch from d2b1f3c to 144a231 Mar 13, 2020
@softwarefactory-project-zuul

This comment has been minimized.

Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 13, 2020

Build failed.

@TomasTomecek

This comment has been minimized.

Copy link
Contributor Author

TomasTomecek commented Mar 13, 2020

two requre tests broke

@lachmanfrantisek

This comment has been minimized.

Copy link
Member

lachmanfrantisek commented Mar 16, 2020

two requre tests broke

Let's check on other PRs if it fails everywhere.

It may be caused by an API change in pygithub or Github.

github.GithubException.GithubException: 401 {"message":"'Issuer' claim ('iss') must be an Integer","documentation_url":"https://developer.github.com/v3"}
@sakalosj

This comment has been minimized.

Copy link
Contributor

sakalosj commented Mar 16, 2020

LGTM 👍

@TomasTomecek TomasTomecek force-pushed the TomasTomecek:fedora-massaging branch from 144a231 to a58a971 Mar 16, 2020
@TomasTomecek

This comment has been minimized.

Copy link
Contributor Author

TomasTomecek commented Mar 16, 2020

I'm definitely cursed - "simple" PR turned into... touch and change everything

@TomasTomecek TomasTomecek changed the title WIP: periodically check the status of a copr build periodically check the status of a copr build Mar 16, 2020
@softwarefactory-project-zuul

This comment has been minimized.

Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 16, 2020

Build failed.

Makefile Outdated
@@ -38,6 +38,33 @@ check_in_container: test_image
-v $(CURDIR)/files/packit-service.yaml:/root/.config/packit-service.yaml \
$(TEST_IMAGE) make check

# This is my target so don't touch it! :) How to:

This comment has been minimized.

Copy link
@lachmanfrantisek

This comment has been minimized.

Copy link
@TomasTomecek

TomasTomecek Mar 17, 2020

Author Contributor

so... the target for running tests in a container is not good enough for me and I keep changing it all the time when working on p-s: so I decided to start my own target (which everyone can use)

This comment has been minimized.

Copy link
@lachmanfrantisek

lachmanfrantisek Mar 17, 2020

Member

Oh, I understood that as "don't touch it and don't try to use it"..;)


# path_to_secrets = this is what we get from the outside
# internal_secrets_path = this is what we'll use here
- set_fact: secrets_path={{ deployment_dir }}/secrets/dev

This comment has been minimized.

Copy link
@lachmanfrantisek

lachmanfrantisek Mar 17, 2020

Member

Can you please send the deployment changes in another PR? (This started to be really unreviewable.)

This comment has been minimized.

Copy link
@TomasTomecek

TomasTomecek Mar 17, 2020

Author Contributor

yes

This comment has been minimized.

Copy link
@lachmanfrantisek
@@ -188,7 +188,9 @@ def __init__(self, trigger: TheJobTriggerType, project_url: str):
)

def get_project(self) -> GitProject:
return ServiceConfig.get_service_config().get_project(url=self.project_url)
project = ServiceConfig.get_service_config().get_project(url=self.project_url)
logger.debug(f"GitService = {project.service.__dict__}")

This comment has been minimized.

Copy link
@lachmanfrantisek

lachmanfrantisek Mar 17, 2020

Member

This is very dangerous -- print the str representation instead so we don't log service secrets.

This comment has been minimized.

Copy link
@TomasTomecek

TomasTomecek Mar 17, 2020

Author Contributor

dangerous where? that's literally the point here: print the secrets so we know what's the setup

I mean, I can drop that commit since the issue is resolved - it's just these log lines helped me resolve it

thinking... it's true that we share srpm logs with people and this could potentially be printed in there, I'll drop it

This comment has been minimized.

Copy link
@lachmanfrantisek

lachmanfrantisek Mar 17, 2020

Member

The str version contains the first and last character of the tokens to be sure we are using the right tokens...

@TomasTomecek TomasTomecek changed the title periodically check the status of a copr build WIP: periodically check the status of a copr build Mar 17, 2020
@TomasTomecek TomasTomecek force-pushed the TomasTomecek:fedora-massaging branch from a58a971 to 22d509c Mar 17, 2020
@softwarefactory-project-zuul

This comment has been minimized.

Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 17, 2020

Build failed.

@TomasTomecek TomasTomecek force-pushed the TomasTomecek:fedora-massaging branch from 22d509c to 62a505c Mar 17, 2020
@softwarefactory-project-zuul

This comment has been minimized.

Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 17, 2020

Build failed.

@TomasTomecek TomasTomecek force-pushed the TomasTomecek:fedora-massaging branch from 62a505c to 13c16cc Mar 17, 2020
@TomasTomecek TomasTomecek changed the title WIP: periodically check the status of a copr build periodically check the status of a copr build Mar 17, 2020
@softwarefactory-project-zuul

This comment has been minimized.

Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 17, 2020

Build failed.

@TomasTomecek TomasTomecek force-pushed the TomasTomecek:fedora-massaging branch from 13c16cc to ab07d60 Mar 17, 2020
@softwarefactory-project-zuul

This comment has been minimized.

Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 17, 2020

Build failed.

@TomasTomecek TomasTomecek force-pushed the TomasTomecek:fedora-massaging branch from ab07d60 to 2afcad0 Mar 17, 2020
@softwarefactory-project-zuul

This comment has been minimized.

Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 17, 2020

Build succeeded.

bind=True,
name="task.babysit_copr_build",
# 120 minutes
default_retry_delay=120 * 60,

This comment has been minimized.

Copy link
@csomh

csomh Mar 17, 2020

Member

Why is this set to 120 minutes? countdown in the retry() bellow will override this to be 60 seconds, and when the task is sent, it's also 60 seconds.

Another note: isn't doing this work every minute too often? The cadence is very close to what we would get just listening to the message bus. I would go with 5 minutes, but I can imagine even 15 minutes would be fine. Longer times seem to be a better fit for what this functionality would try to achieve: pick up the leftovers.

How many times will this retry? 3?

This comment has been minimized.

Copy link
@csomh

csomh Mar 17, 2020

Member

Oh, and how about making all these values part of the configuration, so that we don't have to touch the code, to change them? Would help a lot in fine tuning operations.

This comment has been minimized.

Copy link
@TomasTomecek

TomasTomecek Mar 17, 2020

Author Contributor

yeah, this is a good point

so I changed it to use the exponential backoff, current setup:

do the first check in 120s, then next one in 60s, next 120s, 240s, 480s, etc.

would that be okay?

(btw after we merge this, I'm stoked to see how it goes on stg and will fix all the issues - which should be much easier to do and review once we see it in action)

also, I set max retries to 7

thanks for such a thorough review

@TomasTomecek TomasTomecek force-pushed the TomasTomecek:fedora-massaging branch from 2afcad0 to 6bc807f Mar 17, 2020
@softwarefactory-project-zuul

This comment has been minimized.

Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 17, 2020

Build succeeded.

@TomasTomecek

This comment has been minimized.

Copy link
Contributor Author

TomasTomecek commented Mar 17, 2020

please don't merge this before fixing #504

we probably need to deploy to prod tomorrow, we can merge this PR after we fix the prod TFT problems

@csomh csomh added the do-not-merge label Mar 18, 2020
@TomasTomecek TomasTomecek force-pushed the TomasTomecek:fedora-massaging branch from 6bc807f to da5d03c Mar 18, 2020
@softwarefactory-project-zuul

This comment has been minimized.

Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 18, 2020

Build failed.

@TomasTomecek

This comment has been minimized.

Copy link
Contributor Author

TomasTomecek commented Mar 18, 2020

recheck

Copy link
Contributor

dhodovsk left a comment

LGTM

name="task.babysit_copr_build",
retry_backoff=60, # retry again in 60s, 120s, 240s, 480s...
retry_backoff_max=60 * 60 * 8, # is 8 hours okay? gcc/kernel build really long
max_retries=7,

This comment has been minimized.

Copy link
@dhodovsk

dhodovsk Mar 18, 2020

Contributor

Does this work ok? In the celery doc, it is specified like this: retry_kwargs={'max_retries': 5}.

This comment has been minimized.

Copy link
@TomasTomecek

TomasTomecek Mar 18, 2020

Author Contributor

I was actually checking the celery code itself, and they are picking it from these kwargs, so, it should work

@softwarefactory-project-zuul

This comment has been minimized.

Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 18, 2020

Build failed.

no longer needed, builds in the past 2 weeks are in psql

Signed-off-by: Tomas Tomecek <ttomecek@redhat.com>
Signed-off-by: Tomas Tomecek <ttomecek@redhat.com>
Signed-off-by: Tomas Tomecek <ttomecek@redhat.com>
Signed-off-by: Tomas Tomecek <ttomecek@redhat.com>
@TomasTomecek TomasTomecek force-pushed the TomasTomecek:fedora-massaging branch from da5d03c to bf6013b Mar 18, 2020
@softwarefactory-project-zuul

This comment has been minimized.

Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 18, 2020

Build succeeded.

@softwarefactory-project-zuul

This comment has been minimized.

Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 18, 2020

Build succeeded (gate pipeline).

@softwarefactory-project-zuul softwarefactory-project-zuul bot merged commit 8efe455 into packit-service:master Mar 18, 2020
4 checks passed
4 checks passed
LGTM analysis: Python No new or fixed alerts
Details
ci/dockercloud (/Dockerfile.worker) Your tests passed in Docker Cloud
Details
local/check check status: success
Details
local/gate gate status: success
Details
@TomasTomecek

This comment has been minimized.

Copy link
Contributor Author

TomasTomecek commented Mar 18, 2020

looking fwd fixing stg tomorrow!

@TomasTomecek TomasTomecek deleted the TomasTomecek:fedora-massaging branch Mar 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

5 participants
You can’t perform that action at this time.