Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor migrations commands to prevent subtle concurrency bugs #162

Closed

Conversation

ptzianos
Copy link
Contributor

@ptzianos ptzianos commented May 1, 2024

stacked on top of: #161

  • 6dd5f56 - fixes 159: modify draconctl compiling make target to always build binary

    The previous version of the draconctl compiling make target would only
    compile the binary if the binary didn't exist in the expected path. This
    however is a problem if the code changes, because Makefile doesn't have
    the ability to track that and recompile the binary. This commit changes
    the target to be a phony one so that the binary is always recompiled if
    necessary. Go internally is able to track changes and produce a new
    binary only when needed, so this shouldn't cause anything to slow down
    but it will definitely improve the development process.

  • 1e01522 - fixes 153: refactor 'draconctl migrations apply' command

    The previous version of the 'migrations apply' command suffered from a
    couple of shortcomings:

    1. The code tracking the logs and success of the generated pods was
      fairly complex and monolithic
    2. there were no tests for the pod tracking code
    3. if the code polling for new pods found one that had succeeded it
      would not show in the output the logs of the pod and it would exit
      immediately which could cause a user to misunderstand what was
      happening.
    4. If a SIGTERM signal was sent to the process inside the pod, the
      leader election loop was not terminated cleanly
    5. there was no proper error management since the last error would be
      returned which could be the signal handler having its channel closed
      or the leader election failure hook being invoked because the success
      of the command triggered the code to release the lock.

    All the above issues are fixed with this commit that does the following:

    1. the code tracking the progress of the pods is split away into the
      pkg/k8s package
    2. the tracking of the pod logs is split away from the pod polling
      functionality and the job watching functionality to make it easier to
      test.
    3. the leader election loop has been modified to use the sync.Once
      object to make sure that only the first error that triggers the exit
      of the main loop is recorded, so that we can get a better idea of
      what happened.
    4. a timeout parameter is introduced and a more careful use of contexts
      with timeouts is used to ensure that the code gracefully shuts down.

This new clientset is hidden behind the ClientInterface which is an
amalgamation of the core K8s clientset and the TektonV1Beta clientset.
Most importantly, it introduces the Apply method which replicates the
kubectl apply command, received a K8s runtime.Object and applying it on
the cluster using server-side apply
(https://kubernetes.io/docs/reference/using-api/server-side-apply/).

The clientset is complimented with a fake implementation that can be
used by the user to intercept the objects passed to the Apply method and
inject errors.
The previous version of the 'migrations apply' command suffered from a
couple of shortcomings:

1. The code tracking the logs and success of the generated pods was
   fairly complex and monolithic
2. there were no tests for the pod tracking code
3. if the code polling for new pods found one that had succeeded it
   would not show in the output the logs of the pod and it would exit
   immediately which could cause a user to misunderstand what was
   happening.
4. If a SIGTERM signal was sent to the process inside the pod, the
   leader election loop was not terminated cleanly
5. there was no proper error management since the last error would be
   returned which could be the signal handler having its channel closed
   or the leader election failure hook being invoked because the success
   of the command triggered the code to release the lock.

All the above issues are fixed with this commit that does the following:
1. the code tracking the progress of the pods is split away into the
   `pkg/k8s` package
2. the tracking of the pod logs is split away from the pod polling
   functionality and the job watching functionality to make it easier to
   test.
3. the leader election loop has been modified to use the sync.Once
   object to make sure that only the first error that triggers the exit
   of the main loop is recorded, so that we can get a better idea of
   what happened.
5. a timeout parameter is introduced and a more careful use of contexts
   with timeouts is used to ensure that the code gracefully shuts down.
The previous version of the draconctl compiling make target would only
compile the binary if the binary didn't exist in the expected path. This
however is a problem if the code changes, because Makefile doesn't have
the ability to track that and recompile the binary. This commit changes
the target to be a phony one so that the binary is always recompiled if
necessary. Go internally is able to track changes and produce a new
binary only when needed, so this shouldn't cause anything to slow down
but it will definitely improve the development process.
@ptzianos ptzianos added effort: 3d S (~3d) amount of effort to complete priority: need Something that needs to be done. type: bug Something isn't working. type: fix Iterations on existing features or infrastructure. Improvement labels May 1, 2024
@ptzianos ptzianos self-assigned this May 1, 2024
@ptzianos ptzianos marked this pull request as draft May 1, 2024 10:22
@ptzianos ptzianos changed the title Ptzianos/refactor migrations Refactor migrations commands to prevent subtle concurrency bugs May 1, 2024
@andream16
Copy link
Contributor

It looks very out of date - feel free to re-open it if relevant

@andream16 andream16 closed this Nov 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort: 3d S (~3d) amount of effort to complete Improvement priority: need Something that needs to be done. type: bug Something isn't working. type: fix Iterations on existing features or infrastructure.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants