Fix k8s tag propagation and controller-uid detection #656

Merged (2 commits) on Sep 13, 2022

@sjawhar sjawhar commented Sep 5, 2022

Purpose

  • K8s tags are currently being pulled from the Task object (where they are always empty) instead of the correct Cloud object.
  • When custom labels are added to a k8s job, the automatically added controller-uid label disappears from the job's ObjectMeta, which breaks anything that relies on it, such as waiting for the job to be created.

Approach

  • Pull the tags from Cloud
  • Use the job's Spec.Selector.MatchLabels instead of ObjectMeta to retrieve controller-uid
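
A minimal sketch of the two changes above, not the actual diff. The struct types are hypothetical stand-ins that only mirror the field shapes involved (in k8s.io/api/batch/v1, a Job exposes ObjectMeta.Labels and Spec.Selector.MatchLabels); Cloud, labelsFromCloud and controllerUID are illustrative names, and the UID is borrowed from the log output in this description.

```go
package main

import "fmt"

// Hypothetical stand-ins for the Kubernetes types; only the fields
// used in this sketch are mirrored.
type LabelSelector struct {
	MatchLabels map[string]string
}

type ObjectMeta struct {
	Labels map[string]string
}

type JobSpec struct {
	Selector *LabelSelector
}

type Job struct {
	ObjectMeta ObjectMeta
	Spec       JobSpec
}

// Cloud is a hypothetical stand-in for the object that carries the
// user-supplied tags.
type Cloud struct {
	Tags map[string]string
}

// labelsFromCloud copies the tags from the Cloud object so the job
// gets its own label map.
func labelsFromCloud(cloud Cloud) map[string]string {
	labels := make(map[string]string, len(cloud.Tags))
	for key, value := range cloud.Tags {
		labels[key] = value
	}
	return labels
}

// controllerUID reads controller-uid from the job's selector instead
// of its metadata labels. The boolean is false when it is absent.
func controllerUID(job Job) (string, bool) {
	if job.Spec.Selector == nil {
		return "", false
	}
	uid, ok := job.Spec.Selector.MatchLabels["controller-uid"]
	return uid, ok
}

func main() {
	cloud := Cloud{Tags: map[string]string{"app.kubernetes.io/instance": "jenkins"}}
	job := Job{
		// Custom labels only, as in the "With labels" log output.
		ObjectMeta: ObjectMeta{Labels: labelsFromCloud(cloud)},
		Spec: JobSpec{Selector: &LabelSelector{MatchLabels: map[string]string{
			"controller-uid": "5cf76b42-cbc8-484a-b496-6b867182e523",
		}}},
	}

	_, inMeta := job.ObjectMeta.Labels["controller-uid"]
	uid, inSelector := controllerUID(job)
	fmt.Println(inMeta, inSelector, uid)
	// prints: false true 5cf76b42-cbc8-484a-b496-6b867182e523
}
```

The nil check on Selector keeps the lookup safe for jobs that have not been populated yet; callers can treat a false result as "not found" rather than an empty UID.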

Logs

This is what kubectl describe job shows on master:

Without custom labels/tags:

Labels:         controller-uid=5cf76b42-cbc8-484a-b496-6b867182e523
                job-name=tpi-3aw4vu6287ot1-2biu93iu-1tw3q6af
Annotations:    batch.kubernetes.io/job-tracking: 

With labels:

Labels:         app.kubernetes.io/instance=jenkins
Annotations:    app.kubernetes.io/instance: jenkins
                batch.kubernetes.io/job-tracking: 

@0x2b3bfa0 added the following labels on Sep 5, 2022:

  • bug (Something isn't working)
  • technical-debt (Refactoring, linting & tidying)
  • cloud-common (Applies to every cloud vendor)
  • resource-task (iterative_task TF resource)
  • cloud-k8s (Kubernetes)
  • leo (standalone CLI binary)
@0x2b3bfa0 0x2b3bfa0 self-requested a review September 5, 2022 23:36
@0x2b3bfa0 0x2b3bfa0 left a comment

Smoke tests are failing, can you please take a look?

@0x2b3bfa0 0x2b3bfa0 left a comment

It seems like smoke tests keep failing after retrying. 🤔

    task_smoke_test.go:133: 
        	Error Trace:	task_smoke_test.go:133
        	Error:      	Received unexpected error:
        	            	container "tpi-smoke-test-2996915558-5wp6qvfp-4rg3ne32" in pod "tpi-smoke-test-2996915558-5wp6qvfp-4rg3ne32-sxj56" is not available
        	Test:       	TestTaskSmoke/k8s

@0x2b3bfa0

Finally tests are passing! 😅

@0x2b3bfa0 0x2b3bfa0 merged commit 6f10bfe into iterative:master Sep 13, 2022
@sjawhar sjawhar deleted the hotfix/k8s-task-id-selection branch October 19, 2022 00:29