Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't keep re-trying the same bake #997

Merged
merged 3 commits into from
Apr 10, 2020

Conversation

robfletcher
Copy link
Contributor

@robfletcher robfletcher commented Apr 10, 2020

Closes #856

Comment on lines 1 to 20
plugins {
`java-library`
id("kotlin-spring")
}

dependencies {
implementation(project(":keel-bakery-plugin"))
implementation(project(":keel-sql"))

testImplementation("com.netflix.spinnaker.kork:kork-sql-test")
testImplementation("io.strikt:strikt-core")
testImplementation(project(":keel-spring-test-support"))
testImplementation(project(":keel-test"))
testImplementation(project(":keel-core-test"))

testImplementation(project(":keel-web")) {
// avoid circular dependency which breaks Liquibase
exclude(module = "keel-bakery-history-sql")
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New module because I didn't want keel-sql to depend on keel-bakery-plugin

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if I should instead have keel-bakery-plugin depend on keel-sql. Not sure. I'm not entirely happy with adding this new module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it reasonable to keep all sql in one module? or does that present a lot of problems?

Comment on lines 3 to 6
interface BakeHistory {
fun contains(appVersion: String, baseAmiVersion: String): Boolean
fun add(appVersion: String, baseAmiVersion: String, regions: Collection<String>, taskId: String)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simple interface to record and query whether we've baked a particular appversion / base_ami_version combination

Comment on lines 55 to 62
log.info("Versions or regions for {} differ, launching bake", artifact.name)
launchBake(artifact, latestVersion)
if (bakeHistory.contains(latestArtifactVersion, latestBaseAmiVersion)) {
log.warn("Artifact version {} and base AMI version {} were baked previously", latestArtifactVersion, latestBaseAmiVersion)
publisher.publishEvent(RecurrentBakeDetected(latestArtifactVersion, latestBaseAmiVersion))
} else {
log.info("Versions or regions for {} differ, launching bake", artifact.name)
launchBake(artifact, latestArtifactVersion)
.also { tasks ->
bakeHistory.add(latestArtifactVersion, latestBaseAmiVersion, artifact.vmOptions.regions, tasks.first().id)
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't bake if we've done it before

Copy link
Contributor

@luispollo luispollo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what you're trying to do here, but don't we have that information in orca already? We even have the orca task IDs stored in the task_tracking table (maybe missing some columns for cross-referencing). My feeling is that we could just ask orca about the bake tasks for a particular artifact version. It could also very well be that I'm missing the point entirely. 😄

@robfletcher
Copy link
Contributor Author

I understand what you're trying to do here, but don't we have that information in orca already? We even have the orca task IDs stored in the task_tracking table (maybe missing some columns for cross-referencing). My feeling is that we could just ask orca about the bake tasks for a particular artifact version. It could also very well be that I'm missing the point entirely. 😄

Can we rely on Orca not having cleaned up those tasks? I'm kind of looking at this as an interim fix anyway. It seems like the reason we keep re-baking is because we don't really have a way to handle missing regions on an AMI. I'm going to do some digging in to that but for now this should stop us spamming bake tasks.

@luispollo
Copy link
Contributor

Can we rely on Orca not having cleaned up those tasks?

I think we only care about the short term, since we'll only be baking the latest approved version, no? In any case, our cleanup config looks like this right now (i.e. orchestration tasks live forever, apparently):

pollers:
  oldPipelineCleanup:
    enabled: false
    thresholdDays: 90
  pipelineCleanup:
    enabled: false
  orchestrationCleanup:
    enabled: false

@luispollo
Copy link
Contributor

P.S. Not trying to discourage you, just wanted to understand if we had that information available elsewhere already.

@robfletcher
Copy link
Contributor Author

P.S. Not trying to discourage you, just wanted to understand if we had that information available elsewhere already.

I don't think it's accessible in any very convenient way. I think the change to Orca to enable us to access it would maybe be bigger than this change is.

Copy link
Contributor

@emjburns emjburns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach. Hopefully this will help us figure out why we keep rebaking.

Comment on lines +46 to +54
@EventListener(RecurrentBakeDetected::class)
fun onRecurrentBakeDetected(event: RecurrentBakeDetected) {
spectator.counter(
RECURRENT_BAKE_DETECTED_ID,
listOf(
BasicTag("versions", "${event.appVersion}+${event.baseAmiVersion}")
)
)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emjburns new telemetry

@robfletcher robfletcher force-pushed the prevent-rebaking branch 2 times, most recently from 0d42de7 to feeb869 Compare April 10, 2020 19:40
publisher.publishEvent(RecurrentBakeDetected(latestArtifactVersion, latestBaseAmiVersion))
} else {
launchBake(artifact, latestArtifactVersion)
diffFingerprintRepository.store(artifact.name, diff)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these ever get cleaned up from the database?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's only ever one per resource but no

log.info("Image for {} delta: {}", artifact.name, diff.toDebug())
}

if (diffFingerprintRepository.seen(artifact.name, diff)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should hash this diff, since that's what the diff fingerprint repository stores

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does that internally in the method, same as store does

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL right. duh.

if (image != null && regionsDiffer) {
publisher.publishEvent(ImageRegionMismatchDetected(image, artifact.vmOptions.regions))
if (current != null && diff.affectedRootPropertyNames == setOf("regions")) {
publisher.publishEvent(ImageRegionMismatchDetected(current, artifact.vmOptions.regions))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great instrumentation. We'd been missing data on the frequency of this forever. 👏

Copy link
Contributor

@luispollo luispollo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I like this approach much better than the original.

Copy link
Contributor

@lorin lorin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +1 to +10
databaseChangeLog:
- changeSet:
id: rename-diff-fingerprint-column
author: fletch
changes:
- renameColumn:
tableName: diff_fingerprint
oldColumnName: resource_id
newColumnName: entity_id
columnDataType: varchar(255)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emjburns did the column name change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor

@emjburns emjburns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@robfletcher robfletcher added the ready to merge Approved and ready for merge label Apr 10, 2020
@mergify mergify bot merged commit 2ccd656 into spinnaker:master Apr 10, 2020
@mergify mergify bot added the auto merged label Apr 10, 2020
@robfletcher robfletcher deleted the prevent-rebaking branch April 10, 2020 22:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto merged ready to merge Approved and ready for merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix unnecessary (no-op) bakes
4 participants