-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't keep re-trying the same bake #997
Conversation
plugins { | ||
`java-library` | ||
id("kotlin-spring") | ||
} | ||
|
||
dependencies { | ||
implementation(project(":keel-bakery-plugin")) | ||
implementation(project(":keel-sql")) | ||
|
||
testImplementation("com.netflix.spinnaker.kork:kork-sql-test") | ||
testImplementation("io.strikt:strikt-core") | ||
testImplementation(project(":keel-spring-test-support")) | ||
testImplementation(project(":keel-test")) | ||
testImplementation(project(":keel-core-test")) | ||
|
||
testImplementation(project(":keel-web")) { | ||
// avoid circular dependency which breaks Liquibase | ||
exclude(module = "keel-bakery-history-sql") | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New module because I didn't want keel-sql
to depend on keel-bakery-plugin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if I should instead have keel-bakery-plugin
depend on keel-sql
. Not sure. I'm not entirely happy with adding this new module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it reasonable to keep all sql in one module? or does that present a lot of problems?
...-sql/src/test/kotlin/com/netflix/spinnaker/keel/bakehistory/sql/spring/SpringStartupTests.kt
Outdated
Show resolved
Hide resolved
interface BakeHistory { | ||
fun contains(appVersion: String, baseAmiVersion: String): Boolean | ||
fun add(appVersion: String, baseAmiVersion: String, regions: Collection<String>, taskId: String) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simple interface to record and query whether we've baked a particular appversion / base_ami_version combination
log.info("Versions or regions for {} differ, launching bake", artifact.name) | ||
launchBake(artifact, latestVersion) | ||
if (bakeHistory.contains(latestArtifactVersion, latestBaseAmiVersion)) { | ||
log.warn("Artifact version {} and base AMI version {} were baked previously", latestArtifactVersion, latestBaseAmiVersion) | ||
publisher.publishEvent(RecurrentBakeDetected(latestArtifactVersion, latestBaseAmiVersion)) | ||
} else { | ||
log.info("Versions or regions for {} differ, launching bake", artifact.name) | ||
launchBake(artifact, latestArtifactVersion) | ||
.also { tasks -> | ||
bakeHistory.add(latestArtifactVersion, latestBaseAmiVersion, artifact.vmOptions.regions, tasks.first().id) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't bake if we've done it before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand what you're trying to do here, but don't we have that information in orca already? We even have the orca task IDs stored in the task_tracking
table (maybe missing some columns for cross-referencing). My feeling is that we could just ask orca about the bake tasks for a particular artifact version. It could also very well be that I'm missing the point entirely. 😄
Can we rely on Orca not having cleaned up those tasks? I'm kind of looking at this as an interim fix anyway. It seems like the reason we keep re-baking is because we don't really have a way to handle missing regions on an AMI. I'm going to do some digging in to that but for now this should stop us spamming bake tasks. |
I think we only care about the short term, since we'll only be baking the latest approved version, no? In any case, our cleanup config looks like this right now (i.e. orchestration tasks live forever, apparently): pollers:
oldPipelineCleanup:
enabled: false
thresholdDays: 90
pipelineCleanup:
enabled: false
orchestrationCleanup:
enabled: false |
P.S. Not trying to discourage you, just wanted to understand if we had that information available elsewhere already. |
I don't think it's accessible in any very convenient way. I think the change to Orca to enable us to access it would maybe be bigger than this change is. |
3f8d2c6
to
a03f6ef
Compare
keel-bakery-plugin/src/main/kotlin/com/netflix/spinnaker/keel/bakery/artifact/ImageHandler.kt
Outdated
Show resolved
Hide resolved
keel-bakery-plugin/src/main/kotlin/com/netflix/spinnaker/keel/bakery/artifact/ImageHandler.kt
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach. Hopefully this will help us figure out why we keep rebaking.
a03f6ef
to
8c2af5d
Compare
@EventListener(RecurrentBakeDetected::class) | ||
fun onRecurrentBakeDetected(event: RecurrentBakeDetected) { | ||
spectator.counter( | ||
RECURRENT_BAKE_DETECTED_ID, | ||
listOf( | ||
BasicTag("versions", "${event.appVersion}+${event.baseAmiVersion}") | ||
) | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emjburns new telemetry
0d42de7
to
feeb869
Compare
publisher.publishEvent(RecurrentBakeDetected(latestArtifactVersion, latestBaseAmiVersion)) | ||
} else { | ||
launchBake(artifact, latestArtifactVersion) | ||
diffFingerprintRepository.store(artifact.name, diff) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these ever get cleaned up from the database?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's only ever one per resource but no
log.info("Image for {} delta: {}", artifact.name, diff.toDebug()) | ||
} | ||
|
||
if (diffFingerprintRepository.seen(artifact.name, diff)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should hash this diff, since that's what the diff fingerprint repository stores
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does that internally in the method, same as store
does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOL right. duh.
if (image != null && regionsDiffer) { | ||
publisher.publishEvent(ImageRegionMismatchDetected(image, artifact.vmOptions.regions)) | ||
if (current != null && diff.affectedRootPropertyNames == setOf("regions")) { | ||
publisher.publishEvent(ImageRegionMismatchDetected(current, artifact.vmOptions.regions)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great instrumentation. We'd been missing data on the frequency of this forever. 👏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I like this approach much better than the original.
keel-sql/src/main/kotlin/com/netflix/spinnaker/keel/sql/SqlDiffFingerprintRepository.kt
Outdated
Show resolved
Hide resolved
keel-orca/src/main/kotlin/com/netflix/spinnaker/keel/constraints/CanaryConstraintEvaluator.kt
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
feeb869
to
f7dcc11
Compare
databaseChangeLog: | ||
- changeSet: | ||
id: rename-diff-fingerprint-column | ||
author: fletch | ||
changes: | ||
- renameColumn: | ||
tableName: diff_fingerprint | ||
oldColumnName: resource_id | ||
newColumnName: entity_id | ||
columnDataType: varchar(255) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emjburns did the column name change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
f7dcc11
to
d2fa59c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Closes #856