Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

periodically check the status of a copr build #490

Merged

Conversation

TomasTomecek
Copy link
Member

@TomasTomecek TomasTomecek commented Mar 13, 2020

@softwarefactory-project-zuul
Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 13, 2020

Build failed.

Copy link
Member

@lachmanfrantisek lachmanfrantisek left a comment

Looks simple but really promising!

(It will solve also the branch builds for now..;)

bind=True,
name="task.babysit_copr_build",
# 120 minutes
default_retry_delay=120 * 60,
Copy link
Member

@lachmanfrantisek lachmanfrantisek Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the time when we start this baby-sit task for the first time? (Any then retry each minute?) (i.e. 120, 1, 1, 1, ...)

I would try some exponential sequence as Dominika suggested. (e.g. 1, 2, 4, 8, 16, ... ) or, a constant + exponential sequence (e.g. 10, 1, 2, 4, 8, ... )

[examples are in minutes and mean time to the next check]

Copy link
Member Author

@TomasTomecek TomasTomecek Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too sold on the exponential backoff

packit_service/worker/build/copr_build.py Outdated Show resolved Hide resolved
packit_service/models.py Show resolved Hide resolved
packit_service/worker/tasks.py Outdated Show resolved Hide resolved
@softwarefactory-project-zuul
Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 13, 2020

Build failed.

@TomasTomecek
Copy link
Member Author

TomasTomecek commented Mar 13, 2020

two requre tests broke

@lachmanfrantisek
Copy link
Member

lachmanfrantisek commented Mar 16, 2020

two requre tests broke

Let's check on other PRs if it fails everywhere.

It may be caused by an API change in pygithub or Github.

github.GithubException.GithubException: 401 {"message":"'Issuer' claim ('iss') must be an Integer","documentation_url":"https://developer.github.com/v3"}

@sakalosj
Copy link
Contributor

sakalosj commented Mar 16, 2020

LGTM 👍

@TomasTomecek
Copy link
Member Author

TomasTomecek commented Mar 16, 2020

I'm definitely cursed - "simple" PR turned into... touch and change everything

@TomasTomecek TomasTomecek changed the title WIP: periodically check the status of a copr build periodically check the status of a copr build Mar 16, 2020
@TomasTomecek TomasTomecek added the ready-for-review Pull request is ready for review. label Mar 16, 2020
@softwarefactory-project-zuul
Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 16, 2020

Build failed.

Makefile Outdated
@@ -38,6 +38,33 @@ check_in_container: test_image
-v $(CURDIR)/files/packit-service.yaml:/root/.config/packit-service.yaml \
$(TEST_IMAGE) make check

# This is my target so don't touch it! :) How to:
Copy link
Member

@lachmanfrantisek lachmanfrantisek Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Member Author

@TomasTomecek TomasTomecek Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so... the target for running tests in a container is not good enough for me and I keep changing it all the time when working on p-s: so I decided to start my own target (which everyone can use)

Copy link
Member

@lachmanfrantisek lachmanfrantisek Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I understood that as "don't touch it and don't try to use it"..;)


# path_to_secrets = this is what we get from the outside
# internal_secrets_path = this is what we'll use here
- set_fact: secrets_path={{ deployment_dir }}/secrets/dev
Copy link
Member

@lachmanfrantisek lachmanfrantisek Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please send the deployment changes in another PR? (This started to be really unreviewable.)

Copy link
Member Author

@TomasTomecek TomasTomecek Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Member

@lachmanfrantisek lachmanfrantisek Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@@ -188,7 +188,9 @@ def __init__(self, trigger: TheJobTriggerType, project_url: str):
)

def get_project(self) -> GitProject:
return ServiceConfig.get_service_config().get_project(url=self.project_url)
project = ServiceConfig.get_service_config().get_project(url=self.project_url)
logger.debug(f"GitService = {project.service.__dict__}")
Copy link
Member

@lachmanfrantisek lachmanfrantisek Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very dangerous -- print the str representation instead so we don't log service secrets.

Copy link
Member Author

@TomasTomecek TomasTomecek Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dangerous where? that's literally the point here: print the secrets so we know what's the setup

I mean, I can drop that commit since the issue is resolved - it's just these log lines helped me resolve it

thinking... it's true that we share srpm logs with people and this could potentially be printed in there, I'll drop it

Copy link
Member

@lachmanfrantisek lachmanfrantisek Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The str version contains the first and last character of the tokens to be sure we are using the right tokens...

@TomasTomecek TomasTomecek changed the title periodically check the status of a copr build WIP: periodically check the status of a copr build Mar 17, 2020
@softwarefactory-project-zuul
Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 17, 2020

Build failed.

@softwarefactory-project-zuul
Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 17, 2020

Build failed.

@TomasTomecek TomasTomecek changed the title WIP: periodically check the status of a copr build periodically check the status of a copr build Mar 17, 2020
@softwarefactory-project-zuul
Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 17, 2020

Build failed.

@softwarefactory-project-zuul
Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 17, 2020

Build failed.

@softwarefactory-project-zuul
Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 17, 2020

Build succeeded.

bind=True,
name="task.babysit_copr_build",
# 120 minutes
default_retry_delay=120 * 60,
Copy link
Member

@csomh csomh Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this set to 120 minutes? countdown in the retry() bellow will override this to be 60 seconds, and when the task is sent, it's also 60 seconds.

Another note: isn't doing this work every minute too often? The cadence is very close to what we would get just listening to the message bus. I would go with 5 minutes, but I can imagine even 15 minutes would be fine. Longer times seem to be a better fit for what this functionality would try to achieve: pick up the leftovers.

How many times will this retry? 3?

Copy link
Member

@csomh csomh Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and how about making all these values part of the configuration, so that we don't have to touch the code, to change them? Would help a lot in fine tuning operations.

Copy link
Member Author

@TomasTomecek TomasTomecek Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this is a good point

so I changed it to use the exponential backoff, current setup:

do the first check in 120s, then next one in 60s, next 120s, 240s, 480s, etc.

would that be okay?

(btw after we merge this, I'm stoked to see how it goes on stg and will fix all the issues - which should be much easier to do and review once we see it in action)

also, I set max retries to 7

thanks for such a thorough review

@softwarefactory-project-zuul
Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 17, 2020

Build succeeded.

@TomasTomecek
Copy link
Member Author

TomasTomecek commented Mar 17, 2020

please don't merge this before fixing #504

we probably need to deploy to prod tomorrow, we can merge this PR after we fix the prod TFT problems

@csomh csomh added the do-not-merge Work in progress. label Mar 18, 2020
@TomasTomecek TomasTomecek removed the do-not-merge Work in progress. label Mar 18, 2020
@softwarefactory-project-zuul
Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 18, 2020

Build failed.

@TomasTomecek
Copy link
Member Author

TomasTomecek commented Mar 18, 2020

recheck

Copy link
Contributor

@dhodovsk dhodovsk left a comment

LGTM

name="task.babysit_copr_build",
retry_backoff=60, # retry again in 60s, 120s, 240s, 480s...
retry_backoff_max=60 * 60 * 8, # is 8 hours okay? gcc/kernel build really long
max_retries=7,
Copy link
Contributor

@dhodovsk dhodovsk Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work ok? In the celery doc, it is specified like this: retry_kwargs={'max_retries': 5}.

Copy link
Member Author

@TomasTomecek TomasTomecek Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually checking the celery code itself, and they are picking it from these kwargs, so, it should work

@softwarefactory-project-zuul
Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 18, 2020

Build failed.

no longer needed, builds in the past 2 weeks are in psql

Signed-off-by: Tomas Tomecek <ttomecek@redhat.com>
Signed-off-by: Tomas Tomecek <ttomecek@redhat.com>
Signed-off-by: Tomas Tomecek <ttomecek@redhat.com>
Signed-off-by: Tomas Tomecek <ttomecek@redhat.com>
@TomasTomecek TomasTomecek added the mergeit When set, zuul wil gate and merge the PR. label Mar 18, 2020
@softwarefactory-project-zuul
Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 18, 2020

Build succeeded.

@softwarefactory-project-zuul
Copy link
Contributor

softwarefactory-project-zuul bot commented Mar 18, 2020

Build succeeded (gate pipeline).

@softwarefactory-project-zuul softwarefactory-project-zuul bot merged commit 8efe455 into packit:master Mar 18, 2020
4 checks passed
@TomasTomecek
Copy link
Member Author

TomasTomecek commented Mar 18, 2020

looking fwd fixing stg tomorrow!

@TomasTomecek TomasTomecek deleted the fedora-massaging branch Mar 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mergeit When set, zuul wil gate and merge the PR. ready-for-review Pull request is ready for review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't rely on fedora-messaging to give us results when builds are done
5 participants