Unbreak Debian support with custom encoding #1564

Draft: wants to merge 11 commits into main from unbreak-debian-support-with-encoding

smortex (Collaborator) commented Jan 3, 2024

On Debian, versions 10.0.0 and later of the module produce a broken catalog due to a dependency cycle:

Anchor[postgresql::server::service::begin::main] =>
Postgresql_psql[Set template1 encoding to UTF8] =>
Postgresql::Server::Instance::Late_initdb[main] =>
Postgresql::Server::Instance::Initdb[main] =>
Class[Postgresql::Server::Initdb] =>
Postgresql_conf[port_for_instance_main] =>
Class[Postgresql::Server::Service] =>
Postgresql::Server::Instance::Service[main] =>
Anchor[postgresql::server::service::begin::main]
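To check a reported cycle like the one above outside of Puppet, the chain can be fed to the `TSort` module from Ruby's standard library. This is a minimal sketch, assuming each `=>` in Puppet's output is one edge of the relationship graph; the `CHAIN` constant and the `edges_from_chain` and `cycles` helpers are hypothetical, not part of the module.

```ruby
require 'tsort'

# The cycle as reported by Puppet; the last entry repeats the first.
CHAIN = [
  'Anchor[postgresql::server::service::begin::main]',
  'Postgresql_psql[Set template1 encoding to UTF8]',
  'Postgresql::Server::Instance::Late_initdb[main]',
  'Postgresql::Server::Instance::Initdb[main]',
  'Class[Postgresql::Server::Initdb]',
  'Postgresql_conf[port_for_instance_main]',
  'Class[Postgresql::Server::Service]',
  'Postgresql::Server::Instance::Service[main]',
  'Anchor[postgresql::server::service::begin::main]',
].freeze

# Turn consecutive pairs into an adjacency list: { from => [to, ...] }.
def edges_from_chain(chain)
  chain.each_cons(2).group_by(&:first).transform_values { |pairs| pairs.map(&:last) }
end

# Return every strongly connected component with more than one resource,
# i.e. every group of resources that depend on each other in a loop.
def cycles(edges)
  nodes = (edges.keys + edges.values.flatten).uniq
  each_node = ->(&blk) { nodes.each(&blk) }
  each_child = ->(node, &blk) { edges.fetch(node, []).each(&blk) }
  TSort.strongly_connected_components(each_node, each_child).select { |scc| scc.size > 1 }
end

edges = edges_from_chain(CHAIN)
p cycles(edges).map(&:size) # prints [8]
```

Deleting any single edge from `edges` (for example the one leaving the "begin" anchor) makes `cycles` return an empty list, which is the property a fix has to achieve in the real catalog.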

Re-enable the test that triggers this issue (taken from #1547) and try to fix it.

smortex force-pushed the unbreak-debian-support-with-encoding branch from 37dfadf to ed8981e on January 4, 2024 00:57
antaflos (Contributor) commented:
We are successfully running this branch and it works as advertised. No more dependency cycles. 👍

smortex (Collaborator, Author) commented Jan 11, 2024

@antaflos Beware that this is unfortunately still WIP: it breaks EL for reasons I have not had time to investigate yet. I have been busy this past week with other $WORK-related things and am about to go on vacation for a week, so I will not be able to look at it before I am back.

But thank you for your feedback 👍. So far I have only tested it in CI, and as d15ac61 shows, there is some nasty bug: A -> B + A -> B -> C and A -> B + B -> C should produce the same dependency graph, yet they behave differently.
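The equivalence claimed above can be checked mechanically: a chain `A -> B -> C` declares exactly the edges `A -> B` and `B -> C`, and a duplicate `A -> B` adds nothing. A minimal sketch with a hypothetical `declared_edges` helper that expands chains into a set of pairs:

```ruby
require 'set'

# Expand Puppet-style ordering chains (e.g. %w[A B C] for A -> B -> C)
# into the set of individual "before" edges they declare.
def declared_edges(*chains)
  chains.flat_map { |chain| chain.each_cons(2).to_a }.to_set
end

with_long_chain = declared_edges(%w[A B], %w[A B C]) # A -> B + A -> B -> C
with_two_chains = declared_edges(%w[A B], %w[B C])   # A -> B + B -> C
p with_long_chain == with_two_chains # prints true
```

If the catalog still behaves differently under the two forms, the difference presumably comes from something other than these declared edges, which is what makes the d15ac61 result suspicious.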

ekohl linked an issue on Apr 5, 2024 that may be closed by this pull request
smortex force-pushed the unbreak-debian-support-with-encoding branch 5 times, most recently from 6414904 to b2284f1 on April 16, 2024 02:37
@@ -135,11 +135,7 @@ def matches(value)
end

autorequire(:anchor) do
Collaborator:

As an alternative, could we autorequire the postgresql_conn_validator resource instead?

Collaborator (Author):

That makes a lot of sense. I'll dig into this.

puts '-------------------------------'
puts LitmusHelper.instance.run_shell('systemctl status postgresql*').stdout
puts '-------------------------------'
expect(port(5432)).to be_listening.on('127.0.0.1').with('tcp')
Collaborator:

The problem I had in the past: some systems report tcp, others tcp4. That was really ugly, so I switched to matching the IP address instead. That sometimes didn't work either, because some CI systems have IPv6, so the server binds to ::1, while on others it is only 127.0.0.1. All ugly...
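One way to sidestep both the tcp/tcp4 naming and the 127.0.0.1/::1 split is to check whether a connection succeeds on any loopback address instead of matching a protocol name. This is a sketch using plain Ruby sockets, not serverspec's matchers; the `listening_loopbacks` helper and its defaults are hypothetical.

```ruby
require 'socket'

# Return the loopback addresses on which a TCP connection to +port+ succeeds.
# Both IPv4 and IPv6 loopback are tried because CI hosts differ in which one
# PostgreSQL binds to; no protocol name (tcp vs tcp4) is involved.
def listening_loopbacks(port, addresses: ['127.0.0.1', '::1'], timeout: 1)
  addresses.select do |address|
    Socket.tcp(address, port, connect_timeout: timeout).close
    true
  rescue SystemCallError, SocketError, IOError
    # Refused, unreachable (e.g. IPv6 disabled), or timed out.
    false
  end
end

# A spec could then assert that at least one loopback address answers:
#   expect(listening_loopbacks(5432)).not_to be_empty
```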

Collaborator (Author):

I am running more tests because this is quite weird. Looking at the code, it should work; looking at the ss output, it seems to be working; but the test fails, and that makes me angry. Similar code works in other areas of the project, so I suspect something is very wrong… I would like to nail it down rather than ignore it, if possible.

ekohl (Collaborator) left a comment:

Looking at this, I think initdb (which doesn't need a running server) shouldn't include late_initdb (which does). Instead, postgresql::server_instance should be modified to split the two.

You run into this because on Debian $needs_initdb defaults to false unless the module manages the package repository:

$needs_initdb = pick($needs_initdb, $postgresql::globals::manage_package_repo == true)
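The `pick` call above returns its first argument that is neither undef nor an empty string, so an unset `$needs_initdb` falls back to whether the module manages the package repository. A hypothetical Ruby stand-in (with `nil` playing the role of undef; the real `pick` comes from puppetlabs-stdlib) shows the resulting defaults:

```ruby
# Hypothetical stand-in for stdlib's pick(): return the first value that is
# neither undef (nil here) nor an empty string, and fail if there is none.
def pick(*values)
  found = values.find { |v| !v.nil? && v != '' }
  raise ArgumentError, 'pick(): must receive at least one non-empty value' if found.nil?
  found
end

manage_package_repo = false # assumed Debian default
pick(nil, manage_package_repo == true)  # false: late_initdb path is taken
pick(true, manage_package_repo == true) # true: an explicit value wins
```

Note that `false` is a valid pick result: only undef and `''` are skipped, which is why Debian ends up with `needs_initdb` false rather than falling through to an error.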

So this code path is triggered:

postgresql::server::instance::late_initdb { $name:
  encoding       => $encoding,
  user           => $user,
  group          => $group,
  module_workdir => $module_workdir,
  psql_path      => $psql_path,
  port           => $port,
}

Wouldn't it make more sense to move it to postgresql::server_instance as:

if !$initdb_settings['needs_initdb'] and $initdb_settings['encoding'] {
  postgresql::server::instance::late_initdb { $instance_name:
    encoding       => $initdb_settings['encoding'],
    user           => $initdb_settings['user'],
    group          => $initdb_settings['group'],
    module_workdir => $initdb_settings['module_workdir'],
    psql_path      => $initdb_settings['psql_path'],
    port           => $initdb_settings['port'],
  }
}

It may break because those values may not be undef, but I hope the idea is clear.

["postgresql::server::service::begin::#{self[:instance]}"]
end

autorequire(:service) do
Collaborator:

I'm trying to understand the logic. Don't you always need the server to be running to execute any SQL statements?

Collaborator:

Yes, but the service doesn't seem to be available immediately after it starts. There's a custom resource that waits until the TCP port is open, and I think we should depend on that.
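A minimal sketch of the waiting idea, assuming "ready" simply means that a TCP connection succeeds. The module's actual `postgresql_conn_validator` may check readiness differently; `wait_for_port` and its parameters are hypothetical.

```ruby
require 'socket'

# Poll until a TCP connection to host:port succeeds or +timeout+ seconds pass.
# Returns true as soon as a connection is accepted, false on timeout.
def wait_for_port(host, port, timeout: 30, interval: 0.5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    begin
      Socket.tcp(host, port, connect_timeout: interval).close
      return true
    rescue SystemCallError, SocketError, IOError
      # Not accepting connections yet; give up once the deadline has passed.
      return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      sleep interval
    end
  end
end
```

Using a monotonic clock keeps the deadline correct even if the system time changes while waiting.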

Collaborator:

Oh yes, that makes sense. Though with the newest versions (Fedora 38 with PostgreSQL 15), I see the systemd unit is Type=notify, so I'd assume it is only really ready there once it's listening (spoiler: the patch sends READY=1 right after logging "database system is ready to accept connections").

Digging into this, it was introduced in postgres/postgres@7d17e68, so support became possible with PostgreSQL 9.6. EL8 is built with Type=notify, but EL7 probably isn't. Sadly, Debian 11 & 12 also use Type=forking. That means you're right and we still need to depend on the connection validator.

Contributor:

These seem to be the relevant bugs about this:

bastelfreak and others added 11 commits June 6, 2024 21:18
When using anchors, resources should be sandwiched between them. An ordering
relation was missing for the "end" anchor.

Some resources were not properly ordered between the "begin" and "end"
anchors. Make sure they are properly bound to them.

It does not really make sense to require the "begin" anchor: since the
`postgresql_psql` type communicates with the server, it makes more sense
to require the "end" anchor, which means that the server is running and
reachable.

We already have a relationship in `manifests/server/instance/service.pp`
that ensures `Service["postgresqld_instance_${name}"]` is realized before
`Anchor["postgresql::server::service::end::${name}"]`.

Fun fact: removing this duplicate fixes the circular dependency reported
by Puppet.

Add status info to help diagnose failures

We see that the service is listening on 127.0.0.1:5432, so this failure
is weird.

Maybe looking for a specific IP address and protocol will help?

Attempt to dig into serverspec internals for the various failures.

Postgres seems to be listening on the expected port, but serverspec
cannot connect to it?

Attempt to make some connections to see whether it is a serverspec bug
or an issue with the module.

ss(8) reports the same thing for a port serverspec considers working and
one it considers non-working. Maybe ordering has an impact? We know these
tests fail when run at the end of the test suite, so attempt to run them
earlier.
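The ordering argued for in these commits ("begin" anchor, then the service, then the "end" anchor, then any `postgresql_psql` statement) can be modeled as a small graph and sorted with Ruby's `TSort`. This is a simplified, hypothetical graph with shortened resource titles, not the module's full catalog; if `TSort.tsort` raises `TSort::Cyclic`, the graph contains a dependency cycle like the one in the original report.

```ruby
require 'tsort'

# Hypothetical, simplified ordering graph for one instance ("main").
# Each key must be applied before every resource in its value list.
BEFORE = {
  'Anchor[begin]' => ['Service[postgresqld_instance_main]'],
  'Service[postgresqld_instance_main]' => ['Anchor[end]'],
  # postgresql_psql requires the "end" anchor, i.e. a running server.
  'Anchor[end]' => ['Postgresql_psql[Set template1 encoding to UTF8]'],
}.freeze

# Return one valid apply order. TSort emits a node's children before the
# node itself, so each node's "children" here are its prerequisites.
# Raises TSort::Cyclic if the graph contains a dependency cycle.
def apply_order(before)
  nodes = (before.keys + before.values.flatten).uniq
  prereqs = {}
  before.each do |first, laters|
    laters.each { |later| (prereqs[later] ||= []) << first }
  end
  each_node = ->(&blk) { nodes.each(&blk) }
  each_child = ->(node, &blk) { prereqs.fetch(node, []).each(&blk) }
  TSort.tsort(each_node, each_child)
end

puts apply_order(BEFORE)
```

Had `postgresql_psql` required only the "begin" anchor, nothing in this graph would force it after the service, which is the gap described in the commit about requiring the "end" anchor instead.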
ShadNex commented Jul 12, 2024

I've tested this PR as suggested by @bastelfreak, and it does seem to solve the dependency issue from #1541 in our environment 👍 Would be great if this gets released soon!

@@ -42,6 +42,10 @@
status => $service_status,
}

Anchor["postgresql::server::service::begin::${name}"]
Member:

I may deeply regret asking, since I do not have the full background on this, but my understanding was that Anchor became a redundant pattern once the contain function was introduced?

Successfully merging this pull request may close these issues.

setting the encoding parameter causes a dependency cycle
8 participants