This repository has been archived by the owner on Aug 22, 2022. It is now read-only.

[BB-4378] Fix CI cleanup issues #815

Merged
kewne merged 1 commit into master from guruprasad/BB-4378-fix-CI-cleanup-issues
Jul 9, 2021

Conversation

lgp171188
Contributor

@lgp171188 lgp171188 commented Jun 15, 2021

This PR fixes the issues that were preventing the cleanup of some unused CI resources like MySQL databases, Gandi DNS records, etc.

It also changes the CircleCI workflow definitions to allow pushing to the ci-cleanup branch for running cleanup on-demand (similar to how stage is used for the frontend).

Testing

  1. Push this branch to ci-cleanup
  2. Verify the output of the cleanup job

Links

@kewne kewne self-assigned this Jun 30, 2021
@kewne kewne force-pushed the guruprasad/BB-4378-fix-CI-cleanup-issues branch from 0236301 to 40d01c7 Compare June 30, 2021 11:31
shimulch
shimulch previously approved these changes Jul 2, 2021
Contributor

@shimulch shimulch left a comment

@kewne, @lgp171188 Thanks for the fix. LGTM 👍

  • I tested this: Checked output on CircleCI cleanup job. Tried to get a list of empty tables using the query provided.
  • I read through the code
  • [N/A] I checked for accessibility issues
  • Includes documentation
  • [N/A] I made sure any change in configuration variables is reflected in the corresponding client's configuration-secure repository.

@kewne kewne force-pushed the guruprasad/BB-4378-fix-CI-cleanup-issues branch 3 times, most recently from 28928e7 to 0f08136 Compare July 4, 2021 15:05
@kewne

kewne commented Jul 5, 2021

@shimulch It seems that making the cleanup step manual has the nasty side effect of blocking reruns of failed jobs. In this specific case, integration tests group 2 failed, but I couldn't select the "rerun from failed" option in CircleCI until I had launched the cleanup job as well.

This isn't a huge deal but maybe it warrants a follow-up?

@lgp171188
Contributor Author

lgp171188 commented Jul 5, 2021

@kewne,

This isn't a huge deal but maybe it warrants a follow-up?

This is a huge deal because the PR CI runs are blocked by something they shouldn't be blocked by and additional steps are now needed for something that worked fine before. Let's not create follow-up tasks by creating new issues 😛

I would recommend against changing anything that tweaks how the cleanup job is currently run periodically and separately, outside the typical workflow for the PR CI checks. There is already a way to run the cleanup job on demand - go to CircleCI and re-run the last run of the job. It is expected the automatic runs should do the job and manual run is only needed to debug and fix issues, if any.

This cleanup job has had multiple rewrites/fixes, each solving an issue that showed up some time after the previous fix/rewrite. Let's not introduce a lot of changes that increase the surface area for bugs.

We are in the process of reinventing/rewriting Ocim as we know it today into something that is going to be totally different and better. So I would recommend against adding more code/new features for something like this cleanup job, which will go away soon ™️.

CC @shimulch

Review thread on mkdocs.yml (outdated, resolved)
@kewne kewne force-pushed the guruprasad/BB-4378-fix-CI-cleanup-issues branch from 03378b5 to 18d0411 Compare July 5, 2021 10:16
@kewne

kewne commented Jul 5, 2021

@lgp171188

This is a huge deal because the PR CI runs are blocked by something they shouldn't be blocked by

To clarify, CI runs aren't blocked by the cleanup job; what happens is that, if a job fails in a workflow with another job pending approval, CircleCI doesn't enable the "re-run from ..." options.

Anyway, I've moved the approval job to a separate workflow, where it shouldn't interfere anymore.

It is expected the automatic runs should do the job and manual run is only needed to debug and fix issues, if any.

That is the intent of the change: your original PR changed the scheduled job to run on this branch in addition to master; that is now unnecessary: if you want to run it on a branch, you can simply use the link in the PR and trigger it right away.
The original schedule that runs from master hasn't been changed at all.

@kewne kewne requested a review from shimulch July 6, 2021 08:19
@kewne kewne dismissed shimulch’s stale review July 6, 2021 08:19

Significant changes done to the PR.

@kewne

kewne commented Jul 6, 2021

@shimulch It seems that the pending manual job blocks the GitHub checks; IIRC you can make that specific check optional in the repo settings, which I don't have permission to do.

@shimulch
Contributor

shimulch commented Jul 6, 2021

@kewne Looks like we can't have an optional job that requires approval. So let's just keep the old way of having a ci-cleanup branch for manual cleanup. And the scheduled job is okay, I guess.

Also, can you temporarily enable the scheduled cleanup in the PR branch as well, so that we can test whether it runs? Maybe just remove the "only" section from the scheduled task for now. After the PR review we can add it back. :-)

@kewne

kewne commented Jul 6, 2021

@shimulch I meant making the check optional in GitHub; this is an example of the setting in one of my personal repos:
[Screenshot: github_status]

In this case, you should be able to remove the on-demand-cleanup/hold-cleanup check, to make everything behave as before.

Let me know if you feel like this is the wrong thing to do and I'll revert my change.

@kewne kewne force-pushed the guruprasad/BB-4378-fix-CI-cleanup-issues branch from a792b9a to ed62fd8 Compare July 6, 2021 09:32
@shimulch
Contributor

shimulch commented Jul 6, 2021

@kewne Currently the only required status is the coverage. So the merge is not blocked by that task anyway.

[Screenshot, 2021-07-06]

Let me know if you feel like this is the wrong thing to do and I'll revert my change.

I think we should stay with the "push to the ci-cleanup branch" idea instead.

@kewne kewne force-pushed the guruprasad/BB-4378-fix-CI-cleanup-issues branch from c30332c to 49007a6 Compare July 6, 2021 15:34
@kewne

kewne commented Jul 6, 2021

@shimulch The cleanup only runs on the ci-cleanup branch now. Sorry for all the back and forth on this.

Ironically, I think we've stumbled on a CircleCI bug: the cleanup job from the ci-cleanup branch appears here when it shouldn't, and some of the other checks also link to jobs on the ci-cleanup branch 🤷

@shimulch
Contributor

shimulch commented Jul 7, 2021

@kewne, No worries, we weren't expecting these CI issues.

I am not sure either why this job is showing up there. Can you try filtering on the workflow instead? The branches filter in jobs seems to be deprecated. I don't know if it will work; I'm just throwing rocks in the dark. Also, check how pushing to the stage branch deploys the frontend. 😄

@lgp171188
Contributor Author

@kewne, @shimulch, can we drop the changes to how the cleanup job is run and revert it to how it was before? We have a way to re-run that job manually and that will avoid all the unnecessary complications that we have run into and are trying to resolve now.

@kewne kewne force-pushed the guruprasad/BB-4378-fix-CI-cleanup-issues branch from 49007a6 to 9be911f Compare July 7, 2021 19:32
@kewne

kewne commented Jul 7, 2021

@lgp171188 what do you mean by "re-run the job manually"? If I understood correctly the previous configuration only runs the cleanup job on a schedule, so the most you can do is search for a previous run and rerun it, and it'll only do so for the master branch.
If someone makes some changes to the cleanup job (as we have done here), I don't see how they can do it without changing the Circle CI configuration.

@shimulch The "issue" is that the GitHub API for setting statuses uses the SHA to identify commits: I've tried pushing "my own" status to the commit here, which resulted in this:
[Screenshot: commit statuses]

What's happening here, then, is that I'm pushing the same commit to CircleCI on both this branch and the ci-cleanup branch, which results in the status being set twice, but is otherwise harmless.
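
The behaviour described above can be sketched in Python. This is only an illustration, not code from this PR: the GitHub "create a commit status" endpoint is addressed by repository and commit SHA, so a branch name never appears in the request. The owner, repository, SHA, context names and token below are hypothetical placeholders, and the request is built but never sent.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"
EXAMPLE_SHA = "a" * 40  # hypothetical placeholder, not a real commit


def build_status_request(owner, repo, sha, state, context, token):
    """Build (but do not send) a 'create commit status' request.

    The URL contains only the repository and the commit SHA.
    """
    url = f"{GITHUB_API}/repos/{owner}/{repo}/statuses/{sha}"
    payload = {"state": state, "context": context}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def statuses_by_commit(builds):
    """Group reported contexts by SHA, ignoring the branch that built them."""
    grouped = {}
    for _branch, sha, context in builds:
        grouped.setdefault(sha, []).append(context)
    return grouped


# The same commit pushed to two branches ends up with both statuses.
builds = [
    ("guruprasad/BB-4378-fix-CI-cleanup-issues", EXAMPLE_SHA, "ci/tests"),
    ("ci-cleanup", EXAMPLE_SHA, "ci/cleanup"),
]
print(statuses_by_commit(builds))
```

Because both builds report against the same SHA, every status shows up on the pull request regardless of which branch triggered the build.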

I think this can be approved now: if you prefer to take @lgp171188's suggestion just put it in the approval and I'll revert my change before merging.

@shimulch
Contributor

shimulch commented Jul 8, 2021

@lgp171188, the current change is fine. We now have two ways to run the cleanup job:

  • Scheduled
  • On-demand by pushing to the ci-cleanup branch
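
The two trigger paths listed above can be modelled as a small decision function. This is a Python sketch of the rule as described in this thread (the schedule runs from master, on-demand runs come from pushes to ci-cleanup); it is not the CircleCI configuration itself, and the Trigger type and function name are made up for illustration.

```python
from dataclasses import dataclass

SCHEDULE_BRANCH = "master"     # branch the scheduled run uses, per this thread
CLEANUP_BRANCH = "ci-cleanup"  # branch that triggers an on-demand run


@dataclass(frozen=True)
class Trigger:
    kind: str    # "schedule" or "push" (hypothetical labels)
    branch: str


def should_run_cleanup(trigger):
    """Return True when the cleanup job is expected to run for this trigger."""
    if trigger.kind == "schedule":
        return trigger.branch == SCHEDULE_BRANCH
    if trigger.kind == "push":
        return trigger.branch == CLEANUP_BRANCH
    return False


for t in (
    Trigger("schedule", "master"),
    Trigger("push", "ci-cleanup"),
    Trigger("push", "some-feature-branch"),
):
    print(t, should_run_cleanup(t))
```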

The integration test CI job is failing because of a known, unrelated issue.

@kewne can you remove your own status and rerun CI checks in this branch?

Contributor

@shimulch shimulch left a comment

@kewne, @lgp171188 Thanks a lot for working on improving this. LGTM 👍

  • I tested this: checked the output of Circle CI job
  • I read through the code
  • I checked for accessibility issues: N/A
  • Includes documentation
  • I made sure any change in configuration variables is reflected in the corresponding client's configuration-secure repository: N/A

@kewne can you re-run the CI and fix the small nit I've mentioned? :) Then this is good to merge.

Review thread on documentation/development/ci.md (outdated, resolved)
This adds the following to the list of resources deleted by the cleanup job:
* DNS records for sub-domains
* DNS record for active vm
* Empty MySQL databases

Additionally, it allows running the cleanup job manually from any branch.
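
As an illustration of the "Empty MySQL databases" item above, here is a minimal Python sketch of one way to identify databases that contain no tables. It is not the query or code from this PR: the SQL string is an assumed example against MySQL's information_schema, and the helper works on plain data so it can be tried without a database connection.

```python
# Schemas that belong to MySQL itself and should never be deleted.
SYSTEM_SCHEMAS = frozenset(
    {"information_schema", "mysql", "performance_schema", "sys"}
)

# Assumed example query: schemas that have no rows in information_schema.tables.
EMPTY_DATABASES_QUERY = """
SELECT s.schema_name
FROM information_schema.schemata AS s
LEFT JOIN information_schema.tables AS t
       ON t.table_schema = s.schema_name
WHERE t.table_name IS NULL
"""


def empty_databases(schemas, table_rows):
    """Return the sorted schemas that have no tables, skipping system schemas.

    schemas: iterable of schema names.
    table_rows: iterable of (schema_name, table_name) pairs.
    """
    non_empty = {schema for schema, _table in table_rows}
    return sorted(
        name
        for name in schemas
        if name not in non_empty and name not in SYSTEM_SCHEMAS
    )


# Hypothetical data: one leftover CI database with no tables.
schemas = ["mysql", "sys", "ci_instance_1", "ci_instance_2"]
tables = [("mysql", "user"), ("ci_instance_1", "auth_user")]
print(empty_databases(schemas, tables))
```

Filtering out the system schemas explicitly is a defensive choice, so that a schema such as sys is never treated as a cleanup candidate.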

Related tickets:
* [BB-4378](https://tasks.opencraft.com/browse/BB-4378)
@kewne kewne force-pushed the guruprasad/BB-4378-fix-CI-cleanup-issues branch from 201456e to 50721ae Compare July 8, 2021 12:40
@kewne kewne merged commit 612da33 into master Jul 9, 2021
@kewne kewne deleted the guruprasad/BB-4378-fix-CI-cleanup-issues branch July 9, 2021 09:16
@lgp171188
Contributor Author

@kewne, it looks like there are some issues in the changes made in this PR that are causing the cleanup job to fail daily (example). It looks like the base domain for DNS records is not determined correctly and hence the DNS cleanup API calls are failing. Can you take a look and fix it?
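
The thread does not show how the cleanup job derives the base domain or what exactly went wrong, so the following Python sketch is only an illustration of one possible approach: match each record's fully qualified name against a known list of managed zones and pick the longest match. All domain names and function names here are hypothetical.

```python
def _normalize(name):
    """Lower-case a DNS name and drop any trailing dot."""
    return name.rstrip(".").lower()


def base_domain(fqdn, managed_zones):
    """Return the longest managed zone that fqdn belongs to, or None."""
    name = _normalize(fqdn)
    zones = [_normalize(zone) for zone in managed_zones]
    matches = [z for z in zones if name == z or name.endswith("." + z)]
    return max(matches, key=len, default=None)


def record_name(fqdn, zone):
    """Return the record name relative to zone ('@' for the zone apex)."""
    name, zone = _normalize(fqdn), _normalize(zone)
    if name == zone:
        return "@"
    if not name.endswith("." + zone):
        raise ValueError(f"{fqdn} is not inside zone {zone}")
    return name[: -(len(zone) + 1)]


zones = ["example.com", "ci.example.com"]
fqdn = "vm1.integration.ci.example.com."
zone = base_domain(fqdn, zones)
print(zone, record_name(fqdn, zone))
```

Choosing the longest matching zone avoids guessing the base domain from a fixed number of labels, which can break when zones are nested.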

CC @shimulch

@kewne

kewne commented Jul 15, 2021

@lgp171188 thanks for noticing! I don't think I have the access to debug this properly, though: to run it locally I'd need Gandi credentials, which I don't have, and to run it in the pipeline I'd have to break the domain name masking.

Any idea what I can do here?

@lgp171188
Contributor Author

@kewne, you can re-run the failing job with SSH and that will allow you to log in to the container with your GitHub SSH key(s) and debug the issue.

@kewne

kewne commented Jul 15, 2021

This issue will be fixed in #826
