Helm chart unable to create fresh server: relation "accounts" does not exist unless db migration job is run manually #18

Open
deefdragon opened this issue Nov 20, 2022 · 9 comments
Labels
bug Something isn't working

Comments

@deefdragon

Steps to reproduce the problem

Creating a fresh, completely new Mastodon server using the Helm chart fails to initialize. The required database tables are never created, so both the web and sidekiq containers return relation "accounts" does not exist.

From my casual perusal, it appears that the tables should be created by job-db-migrate.yaml, but it never runs because the web container never finishes installing; the web container in turn needs the tables to be initialized, and is thus stuck in a crash loop.

Rendering the template to a file, copying the job out of it, and running the db migration job manually with kubectl results in a working deployment.
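
For reference, the manual workaround was roughly the following (the release name, namespace, and chart path are examples from my setup; helm template's --show-only flag is just a shortcut for rendering everything and copying the job out by hand):

    # render only the db migrate Job from the chart
    helm template mastodon ./chart -f values.yaml \
      --show-only templates/job-db-migrate.yaml > db-migrate-job.yaml

    # apply it by hand so the schema gets created
    kubectl apply -n mastodon -f db-migrate-job.yaml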

Expected behaviour

The Helm chart should create the required database tables on a fresh install.

Actual behaviour

Web and sidekiq containers crash loop on boot.

Detailed description

No response

Specifications

Chart: v3.0.0
Tag: latest (v4.0.2)
K8s

@deefdragon deefdragon added the bug Something isn't working label Nov 20, 2022
@godber

godber commented Nov 20, 2022

Have you tried restarting (deleting) the sidekiq and web pods? I've seen enough weird behavior similar to what you report on chart-based deployments that, as a rule, I now restart those two pods after a release. I have not seen failed migrations.
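
Concretely, something along these lines (the namespace and deployment names are examples; adjust them to whatever your release actually created):

    # recreate the web and sidekiq pods after the release has gone out
    kubectl -n mastodon rollout restart deployment mastodon-web
    kubectl -n mastodon rollout restart deployment mastodon-sidekiq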

@deefdragon
Author

This was not a migration. This was a fresh install (I'm trying out Mastodon for the first time), which is, I believe, why it was deadlocking.

The web container requires the DB tables to start -> the DB tables require the DB job to have run at least once -> the DB job currently waits for the web container/sidekiq to be up before it runs -> ... (deadlock)

Side note: I think the job containers did not get upgraded to v4?

@godber

godber commented Nov 20, 2022

I am also talking about a fresh install. A db migration happens both on initial install and on upgrades.

This is what determines the version of the db migrate job container:
https://github.com/mastodon/mastodon/blob/main/chart/templates/job-db-migrate.yaml#L46
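
Paraphrasing from memory (the linked line is the source of truth), it is roughly:

    image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"

so the migrate job should run whatever image.tag is set in your values, falling back to the chart's appVersion.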

@deefdragon
Author

Oh, I know what happened with the job's container version. I was testing with v3 to see if the Helm chart using v3 would work, and I exported to a file to dig around. I must have forgotten to do that before I pulled the jobs to run manually (which was my eventual solution to get Mastodon running at all).

Regarding restarting: I guess deleting the pods may have let the jobs run, but I don't know. It feels really hacky to have that kind of problem, and I feel the original deadlock should still be addressed.

@godber

godber commented Nov 21, 2022

I feel the original deadlock problem should still be addressed.

I agree. I think the problem is that the web and sidekiq pods manage to come up partway through the migration, but don't quite end up in the right state. Restarting them after rake db:migrate has completed fixes it.

These lines here make an effort to order the job relative to other things:

https://github.com/mastodon/mastodon/blob/main/chart/templates/job-db-migrate.yaml#L8-L10
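
From memory, those are Helm hook annotations along these lines (again, the linked file is authoritative):

    annotations:
      "helm.sh/hook": post-install,pre-upgrade
      "helm.sh/hook-delete-policy": before-hook-creation
      "helm.sh/hook-weight": "-2"

If I understand Helm's hook ordering correctly, a post-install hook only fires once the rest of the release has been installed (and, with --wait, once it is ready), which lines up with the deadlock described above.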

I've not thought about that vs the startup of the other containers.

@ineffyble ineffyble transferred this issue from mastodon/mastodon Dec 13, 2022
@scooberu

scooberu commented Jan 1, 2023

I also experience this issue regularly. It makes it rather difficult to run Helm from Terraform, because an apply always fails waiting for a db:migrate job that will never complete on its own.

@jodok

jodok commented Jan 3, 2023

I face the same issue. I'm also wondering how I can trigger the initialization of a fresh Postgres instance.

the error log of the web container shows:

[1] Puma starting in cluster mode...
[1] * Puma version: 5.6.5 (ruby 3.0.4-p208) ("Birdie's Version")
[1] *  Min threads: 5
[1] *  Max threads: 5
[1] *  Environment: production
[1] *   Master PID: 1
[1] *      Workers: 2
[1] *     Restarts: (✔) hot (✖) phased
[1] * Preloading application
bundler: failed to load command: puma (/opt/mastodon/vendor/bundle/ruby/3.0.0/bin/puma)
[1] ! Unable to load application: ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR:  relation "accounts" does not exist
LINE 8:  WHERE a.attrelid = '"accounts"'::regclass
                            ^
/opt/mastodon/vendor/bundle/ruby/3.0.0/gems/activerecord-6.1.7/lib/active_record/connection_adapters/postgresql/database_statements.rb:19:in `exec': PG::UndefinedTable: ERROR:  relation "accounts" does not exist (ActiveRecord::StatementInvalid)
LINE 8:  WHERE a.attrelid = '"accounts"'::regclass
                            ^
	from /opt/mastodon/vendor/bundle/ruby/3.0.0/gems/activerecord-6.1.7/lib/active_record/connection_adapters/postgresql/database_statements.rb:19:in `block (2 levels) in query'

...

the sidekiq one:

2023-01-03T13:28:50.933Z pid=1 tid=53x WARN: `config.options[:key] = value` is deprecated, use `config[:key] = value`: ["/opt/mastodon/lib/mastodon/redis_config.rb:38:in `<top (required)>'", "/opt/mastodon/config/application.rb:53:in `require_relative'"]
2023-01-03T13:28:51.135Z pid=1 tid=53x INFO: Booting Sidekiq 6.5.7 with Sidekiq::RedisConnection::RedisAdapter options {:driver=>:hiredis, :url=>"redis://:REDACTED@redis.pm161k.0001.euc1.cache.amazonaws.com:6379/0", :namespace=>nil}
2023-01-03T13:28:51.663Z pid=1 tid=53x WARN: ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR:  relation "accounts" does not exist
LINE 8:  WHERE a.attrelid = '"accounts"'::regclass
                            ^
...

@ekeih

ekeih commented Jan 3, 2023

As a workaround you can:

  1. Run helm template ... > workaround.yaml
  2. Edit the file to only contain the job
  3. kubectl apply -f workaround.yaml
  4. Wait for the initial job to finish
  5. kubectl delete -f workaround.yaml

This is not great but it should take care of the database initialization.
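
Roughly, assuming a release named mastodon in the mastodon namespace (the rendered job name depends on your release name, so check kubectl get jobs before waiting on it):

    # <chart> is however you normally reference the chart (local path or repo/chart)
    helm template mastodon <chart> -f values.yaml > workaround.yaml
    # edit workaround.yaml so only the db migrate Job remains, then:
    kubectl apply -n mastodon -f workaround.yaml
    kubectl wait -n mastodon --for=condition=complete job/mastodon-db-migrate --timeout=10m
    kubectl delete -n mastodon -f workaround.yaml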

@angdraug

PR #37 tries to address both this and #26.
