Spinnaker 1.33.0 comes with changeset checksum changes #6941

Open
Badbond opened this issue Apr 11, 2024 · 4 comments
Labels
bug testing/needs-tests Code paths/features that should be exercised by our integration test suite.


Badbond commented Apr 11, 2024

Issue Summary:

While trying to upgrade to Spinnaker 1.33.0 recently, we noticed that Orca and Clouddriver were unable to start due to Liquibase checksum validation throwing exceptions.

Cloud Provider(s):

Kubernetes

Environment:

Clouddriver (5.83.0 and 5.84.0) and Orca (8.48.0) with PostgreSQL 16.2 storage.
Other Spinnaker services run with other persistence backends (Redis, S3, etc.), so we don't know whether they are affected too.

Feature Area:

PostgreSQL persistence for Clouddriver and Orca.

Description:

We are in the middle of upgrading our Spinnaker clusters and of configuring SQL persistence where possible (using PostgreSQL).
Before upgrading to Spinnaker 1.33.0, while still running Spinnaker 1.13.2, we had already fully configured the SQL backend for Clouddriver and were running a dual-repository setup for Orca (with Redis still as the primary). When we upgraded to 1.33.0, Liquibase changeset checksum validation errors started preventing the services from starting.

For Clouddriver, we narrowed it down to the 5.83.0 release, which coincides with the Liquibase upgrade from 3.10.3 to 4.24.0 (see spinnaker/kork#1117). Possibly related, 4 of the Orca changesets were updated in spinnaker/orca#4601 to handle afterColumn on PostgreSQL. I could not find a similar change for Clouddriver, though.

Steps to Reproduce:

For Clouddriver, we narrowed the issue down to the 5.83.0 release.

With the following Clouddriver SQL and Redis configuration:

sql:
  enabled: true
  read-only: false
  cache:
    enabled: true
  scheduler:
    enabled: false
  taskRepository:
    enabled: true
  unknown-agent-cleanup-agent:
    enabled: true
  connectionPools:
    default:
      default: true
      dialect: Postgres
      jdbcUrl: jdbc:postgresql://postgres:5432/clouddriver
      user: clouddriver_service
      password: <password>
  migration:
    user: clouddriver_service
    password: <password>
    jdbcUrl: jdbc:postgresql://postgres:5432/clouddriver
redis:
  enabled: true
  cache:
    enabled: false
  scheduler:
    enabled: true
  taskRepository:
    enabled: false

First run Clouddriver 5.82.2: it initializes the database tables and the migrations are recorded as EXECUTED.
Then run Clouddriver 5.83.0: the service fails to start because of unexpected checksums.

Additional Details:

Clouddriver 5.83.0 logs
2024-04-11 08:54:23.959 ERROR 1 --- [           main] o.s.boot.SpringApplication               : Application run failed

org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'dockerRegistryImageLookupController': Unsatisfied dependency expressed through field 'cacheView'; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'cacheView' defined in class path resource [com/netflix/spinnaker/clouddriver/cache/CacheConfig.class]: Unsatisfied dependency expressed through method 'cacheView' parameter 0; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'catsModule' defined in class path resource [com/netflix/spinnaker/config/SqlCacheConfiguration.class]: Unsatisfied dependency expressed through method 'catsModule' parameter 0; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'sqlAgentProvider' defined in class path resource [com/netflix/spinnaker/config/SqlCacheConfiguration.class]: Unsatisfied dependency expressed through method 'sqlAgentProvider' parameter 0; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'sqlTableMetricsAgent' defined in class path resource [com/netflix/spinnaker/config/SqlCacheConfiguration.class]: Unsatisfied dependency expressed through method 'sqlTableMetricsAgent' parameter 0; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'liquibase' defined in class path resource [com/netflix/spinnaker/kork/sql/config/DefaultSqlConfiguration.class]: Invocation of init method failed; nested exception is liquibase.exception.CommandExecutionException: liquibase.exception.ValidationFailedException: Validation Failed:
     3 changesets check sum
          db/changelog/20180919-initial-schema.yml::mysql-change-state-stauts-to-enum-type::robzienert was: 8:f0bfebd55de9168e38a8ef9c7217c610 but is now: 8:d41d8cd98f00b204e9800998ecf8427e
          db/changelog/20180919-initial-schema.yml::mysql-revert-change-state-stauts-to-enum-type::afeldman was: 8:d6f5eedc195011826620cc0355e8352d but is now: 8:d41d8cd98f00b204e9800998ecf8427e
          db/changelog/20190913-task-sagaids.yml::mysql-update-state-enum-values::robzienert was: 8:9601af668599fbc12e338b9b84c66f56 but is now: 8:d41d8cd98f00b204e9800998ecf8427e
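All three recomputed ("is now") checksums above are the same value, and its digest part, d41d8cd98f00b204e9800998ecf8427e, is the MD5 of an empty input. The short Python sketch below checks that; it assumes the stored format is "<checksum version>:<hex MD5 digest>", and `is_empty_input_checksum` is a hypothetical helper. One possible reading, which is an assumption and not confirmed in this thread, is that these MySQL-specific changesets contribute no content to the checksum when Liquibase 4.24.0 evaluates them against PostgreSQL, while Liquibase 3.10.3 had recorded non-empty checksums for them.

```python
import hashlib

# MD5 of zero bytes, a well-known constant.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()

# "is now" values copied from the Clouddriver 5.83.0 log above.
# Assumed format: "<checksum version>:<hex MD5 digest>".
NEW_CHECKSUMS = [
    "8:d41d8cd98f00b204e9800998ecf8427e",  # mysql-change-state-stauts-to-enum-type
    "8:d41d8cd98f00b204e9800998ecf8427e",  # mysql-revert-change-state-stauts-to-enum-type
    "8:d41d8cd98f00b204e9800998ecf8427e",  # mysql-update-state-enum-values
]


def is_empty_input_checksum(checksum: str) -> bool:
    """Hypothetical helper: True if the digest after the colon is MD5 of b""."""
    _version, _sep, digest = checksum.partition(":")
    return digest == EMPTY_MD5


print(EMPTY_MD5)  # d41d8cd98f00b204e9800998ecf8427e
print(all(is_empty_input_checksum(c) for c in NEW_CHECKSUMS))  # True
# The previously stored ("was") value is not the empty-input digest.
print(is_empty_input_checksum("8:f0bfebd55de9168e38a8ef9c7217c610"))  # False
```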
Orca 8.48.0 logs
14:09:39.886 ERROR 1 --- [           main] o.s.boot.SpringApplication               : [] Application run failed

org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'redisOrcaQueueConfiguration': Unsatisfied dependency expressed through method 'redisQueueObjectMapper' parameter 2; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'taskResolver' defined in com.netflix.spinnaker.orca.config.OrcaConfiguration: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.netflix.spinnaker.orca.TaskResolver]: Factory method 'taskResolver' threw exception; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'dependsOnExecutionTask' defined in URL [jar:file:/opt/orca/lib/orca-core-8.48.0.jar!/com/netflix/spinnaker/orca/pipeline/tasks/DependsOnExecutionTask.class]: Unsatisfied dependency expressed through constructor parameter 0; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'dualExecutionRepository' defined in URL [jar:file:/opt/orca/lib/orca-core-8.48.0.jar!/com/netflix/spinnaker/orca/pipeline/persistence/DualExecutionRepository.class]: Unsatisfied dependency expressed through constructor parameter 4; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'sqlExecutionRepository' defined in class path resource [com/netflix/spinnaker/config/SqlConfiguration.class]: Unsatisfied dependency expressed through method 'sqlExecutionRepository' parameter 0; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'liquibase' defined in class path resource [com/netflix/spinnaker/config/SqlConfiguration.class]: Invocation of init method failed; nested exception is liquibase.exception.CommandExecutionException: liquibase.exception.ValidationFailedException: Validation Failed:
     6 changesets check sum
         db/changelog/20180515-execution-canceled-column.yml::add-canceled-column::robzienert was: 8:4c214f01b403163ac014e78ef3603c71 but is now: 8:7758552b0792c8636c7c9b7c231a2a53
         db/changelog/20180510-add-legacy-id-fields.yml::add-legacy-id-fields::cthielen was: 8:f97b552b426f284932461b028e9d6d9c but is now: 8:f598d61450cc5c4d37255ffa0242a7ba
         db/changelog/20180521-status-enum.yml::modify-status-column-enum::afeldman was: 8:a927aca379a0eecf2a6eecdee281be9f but is now: 8:d41d8cd98f00b204e9800998ecf8427e
         db/changelog/20180724-partitions.yml::partition-updated-executions::rzienert was: 8:26bce67df849ed409cdd5dffc335a95b but is now: 8:1f52146b80fa28e7adca3abf66b7b035
         db/changelog/20181016-add-start-time.yml::20181016-add-start-time::robzienert was: 8:576429ab611ed541e5f99ab282de4177 but is now: 8:5858aae541ec6b018626e6c9f3312ca9
         db/changelog/20200327-deleted-executions-table.yml::create-deleted-executions-table::mvulfson was: 8:10687930e669629d8c370cd5a2348585 but is now: 8:11b5df8d131f259ddf2292832cd12219
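With six mismatches for Orca (plus three for Clouddriver), it can help to extract the changeset identifiers from the log mechanically instead of by hand. Below is a minimal Python sketch, assuming the "file::id::author was: X but is now: Y" line format shown in both logs; `ChecksumMismatch` and `parse_mismatches` are hypothetical names, and only three of the Orca lines are included as sample input. In this excerpt only modify-status-column-enum recomputes to the empty-input MD5; the other changesets recompute to different, non-empty digests.

```python
import hashlib
import re
from dataclasses import dataclass

EMPTY_MD5 = hashlib.md5(b"").hexdigest()

# One mismatch line from a Liquibase ValidationFailedException (format assumed
# from the logs in this issue).
LINE_RE = re.compile(
    r"^\s*(?P<file>\S+?)::(?P<changeset>\S+?)::(?P<author>\S+)\s+"
    r"was:\s+(?P<old>\d+:[0-9a-f]{32})\s+but is now:\s+(?P<new>\d+:[0-9a-f]{32})\s*$"
)


@dataclass(frozen=True)
class ChecksumMismatch:
    file: str
    changeset: str
    author: str
    old: str
    new: str

    @property
    def new_is_empty_input(self) -> bool:
        # True when the recomputed digest equals the MD5 of zero bytes.
        return self.new.split(":", 1)[1] == EMPTY_MD5


def parse_mismatches(log_text: str) -> list[ChecksumMismatch]:
    """Return every checksum mismatch line; all other lines are ignored."""
    found = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            found.append(ChecksumMismatch(**match.groupdict()))
    return found


ORCA_EXCERPT = """\
     6 changesets check sum
         db/changelog/20180515-execution-canceled-column.yml::add-canceled-column::robzienert was: 8:4c214f01b403163ac014e78ef3603c71 but is now: 8:7758552b0792c8636c7c9b7c231a2a53
         db/changelog/20180521-status-enum.yml::modify-status-column-enum::afeldman was: 8:a927aca379a0eecf2a6eecdee281be9f but is now: 8:d41d8cd98f00b204e9800998ecf8427e
         db/changelog/20200327-deleted-executions-table.yml::create-deleted-executions-table::mvulfson was: 8:10687930e669629d8c370cd5a2348585 but is now: 8:11b5df8d131f259ddf2292832cd12219
"""

for m in parse_mismatches(ORCA_EXCERPT):
    print(m.changeset, "empty-input digest" if m.new_is_empty_input else "non-empty digest")
```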
@kirangodishala

@Badbond - I am looking into it.

@kirangodishala

@Badbond - The following PRs will fix the issue, but they will take some time to go through review and become part of a release:

If it is possible for you to test these changes early in your local environment, that will be great.

@kirangodishala

kirangodishala commented May 22, 2024

Some background:

Broken Spinnaker upgrades

Kork 7.201.0 upgraded Liquibase to 4.24.0, and this causes checksum errors when upgrading Spinnaker. As a result, all releases starting from 1.33.0 (1.33.0, 1.33.1, 1.33.2, 1.34.0, 1.34.1 and 1.34.2) are broken for Spinnaker upgrades.

Broken new Spinnaker installations when Clouddriver uses PostgreSQL

After the Liquibase 4.24.0 upgrade, releases 1.33.0, 1.33.1, 1.34.0 and 1.34.1 lack the changes from spinnaker/clouddriver#6194, so a new Spinnaker installation fails if Clouddriver uses PostgreSQL, with the error "addAfterColumn is not allowed on postgresql". Releases 1.33.2 and 1.34.2 include those changes and have no issues for new Spinnaker installations.

Once the current fixes are merged and released, versions 1.35.0, 1.33.3 and 1.34.3, and all later versions, will work for both new installations and upgrades of Spinnaker.
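The release status described in this comment can be summarized as a small lookup table. The Python sketch below encodes only the versions explicitly named in the comment; the scenario names and `release_status` are hypothetical, and the "all later versions" statement is deliberately not modeled, so any unnamed version is reported as "not listed" rather than guessed.

```python
# Release lists copied from the maintainer comment above (May 22, 2024).
BROKEN_FOR_UPGRADE = {"1.33.0", "1.33.1", "1.33.2", "1.34.0", "1.34.1", "1.34.2"}
BROKEN_FOR_NEW_POSTGRES_INSTALL = {"1.33.0", "1.33.1", "1.34.0", "1.34.1"}
WORKS_FOR_NEW_POSTGRES_INSTALL = {"1.33.2", "1.34.2"}
# Expected to work for both scenarios once the pending fixes are released.
EXPECTED_FIXED = {"1.33.3", "1.34.3", "1.35.0"}


def release_status(version: str, scenario: str) -> str:
    """Hypothetical helper: classify a release for "upgrade" or "new-install-postgres"."""
    if scenario == "upgrade":
        broken, works = BROKEN_FOR_UPGRADE, set()
    elif scenario == "new-install-postgres":
        broken, works = BROKEN_FOR_NEW_POSTGRES_INSTALL, WORKS_FOR_NEW_POSTGRES_INSTALL
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    if version in broken:
        return "broken"
    if version in works:
        return "works"
    if version in EXPECTED_FIXED:
        return "expected fixed"
    return "not listed"


print(release_status("1.33.2", "upgrade"))               # broken
print(release_status("1.33.2", "new-install-postgres"))  # works
```

For example, 1.33.2 is broken for upgrades but fine for a new PostgreSQL-backed installation, which is why the two scenarios are kept as separate sets.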

xibz added the bug and testing/needs-tests labels on May 22, 2024
@Badbond

Badbond commented May 23, 2024

Hi @kirangodishala, thank you for following up on the ticket with Liquibase and for providing the fixes.

If it is possible for you to test these changes early in your local environment, that will be great.

Do you have some instructions on how to do this? Do you have a specific nightly image that I can use to test this?
