Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade flow for images RHEL/CentOS 8+ #562

Merged
merged 2 commits into from
Jun 18, 2024
Merged

Conversation

eifrach
Copy link
Contributor

@eifrach eifrach commented Apr 11, 2024

Creating flow for upgrading postgresSQL

  1. add install_weak_deps=false to installation which reduce the image size
  2. install postgresql-upgrade package for the upgrade

@eifrach
Copy link
Contributor Author

eifrach commented Apr 11, 2024

[test]

@@ -297,7 +297,7 @@ function wait_for_postgresql_master() {
run_pgupgrade ()
(
# Remove .pid file if the file persists after ugly shut down
if [ -f "$PGDATA/postmaster.pid" ] && ! pgrep -f "postgres" > /dev/null; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you removing the condition? We do not want to remove .pid file of the running process

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I removed it because I had some issues when I was testing the upgrade. I will look into it deeper.
we should check if the PID is up first

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is from a running container I added a stop before the line
seem that the service is not running, also there isn't any open socket

I'm assuming the it is partially match run-postgresql

bash-4.4$ pgrep -f "postgres" 
1
17

 ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
postgres       1       0  0 12:18 pts/0    00:00:00 /bin/bash /usr/bin/run-postgresql
postgres      17       1  0 12:18 pts/0    00:00:00 /bin/bash /usr/bin/run-postgresql
postgres      18      17  0 12:18 pts/0    00:00:00 /usr/bin/coreutils --coreutils-prog-shebang
postgres      19       0  0 12:19 pts/1    00:00:00 bash
postgres      27      19  0 12:22 pts/1    00:00:00 ps -ef

how about

Suggested change
if [ -f "$PGDATA/postmaster.pid" ] && ! pgrep -f "postgres" > /dev/null; then
if [ -f "$PGDATA/postmaster.pid" ] && ! pg_isready > /dev/null; then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense +1

@@ -352,7 +352,6 @@ run_pgupgrade ()
info_msg "Starting old postgresql once again for a clean shutdown..."
"${old_pgengine}/pg_ctl" start -w --timeout 86400 -o "-h ''"
info_msg "Waiting for postgresql to be ready for shutdown again..."
"${old_pgengine}/pg_isready"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you removing this safety procedure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my testing it returned exit code 2 - I did manual checks the socket was up and running but the pg_isready wasn't catching it.
On the above line we use -w which is waiting for the DB to be ready. I thought maybe we don't need to double check it unless there is another reason for it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after some searching, It seems we start the server as local only -- which mean it doesn't create a TCP port instead it creates a Linux socket.

the 'pg_isready' tries the port and fails

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Comment on lines 308 to 323
if test "$old_raw_version" = 92; then
old_collection=postgresql92
else
old_collection=rh-postgresql$old_raw_version
old_collection=postgresql-$old_raw_version
fi
Copy link
Member

@fila43 fila43 Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not correspond with the current state of PostgreSQL rpm names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@fila43
Copy link
Member

fila43 commented Apr 11, 2024

Overall, it looks good. Thank you for your contribution. We will need to generalize it and implement it in the master branch.
https://github.com/sclorg/postgresql-container/blob/master/src/root/usr/share/container-scripts/postgresql/common.sh

@@ -44,7 +44,8 @@ COPY root/usr/libexec/fix-permissions /usr/libexec/fix-permissions
RUN yum -y module enable postgresql:13 && \
INSTALL_PKGS="rsync tar gettext bind-utils nss_wrapper postgresql-server postgresql-contrib" && \
INSTALL_PKGS="$INSTALL_PKGS pgaudit" && \
yum -y --setopt=tsflags=nodocs install $INSTALL_PKGS && \
INSTALL_PKGS="$INSTALL_PKGS procps-ng util-linux postgresql-upgrade" && \
yum -y --setopt=tsflags=nodocs --setopt=install_weak_deps=false install $INSTALL_PKGS && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What difference in packages is there when using this option? I would rather not introduce it for already built image versions if the change is too disruptive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it not a must, it removes about 100MB of size - those are packages that are extra doesn't change / affect the core application

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the existing images, I would choose the more defensive approach. I would just add the necessary packages.

@eifrach
Copy link
Contributor Author

eifrach commented Apr 14, 2024

Overall, it looks good. Thank you for your contribution. We will need to generalize it and implement it in the master branch. https://github.com/sclorg/postgresql-container/blob/master/src/root/usr/share/container-scripts/postgresql/common.sh

I've issue fixes for all the remarks
once they approved I'll create a PR with changes to all common.sh files

@eifrach eifrach requested review from fila43 and pkubatrh April 15, 2024 12:00
@@ -311,8 +311,8 @@ run_pgupgrade ()
old_collection=rh-postgresql$old_raw_version
fi

old_pgengine=/opt/rh/$old_collection/root/usr/bin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be changed only for RHEL8+ images. The general change would cause problems in rhel7/centos7 containers. Condition needed in distgen source files.

@@ -312,7 +312,7 @@ run_pgupgrade ()
old_collection=rh-postgresql$old_raw_version
fi

old_pgengine=/opt/rh/$old_collection/root/usr/bin
old_pgengine=/usr/lib64/pgsql/postgresql-$old_raw_version/bin
new_pgengine=/opt/rh/rh-postgresql${new_raw_version}/root/usr/bin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/usr/bin ?

@fila43
Copy link
Member

fila43 commented Apr 16, 2024

It makes a good impression on me. Please implement the changes into /src directory. Once it is there, we will check and adjust the compatibility for rhel7/centos7 vs rhel8+ versions, or we can wait for the EOL of centos7/rhel7 images. Also, there should be the prepared upgrade test that needs to be enabled for rhel8+ containers.

@eifrach eifrach changed the title [WIP] adding upgrade flow for postgress 13 Upgrade flow for images RHEL/CentOS 8+ Apr 18, 2024
@eifrach eifrach marked this pull request as ready for review April 18, 2024 12:44
@eifrach
Copy link
Contributor Author

eifrach commented Apr 18, 2024

[test]

@eifrach
Copy link
Contributor Author

eifrach commented Apr 18, 2024

cc @danmanor

@fila43
Copy link
Member

fila43 commented Apr 24, 2024

LGTM, but there is dist-gen issue

@fila43
Copy link
Member

fila43 commented Apr 24, 2024

@phracek would it be possible to enable this test also for RHEL8+ ?

run_upgrade_test ()

@fila43
Copy link
Member

fila43 commented Apr 24, 2024

[test]

@erthalion
Copy link

Hi,

Thanks for the PR! I've got a couple of general commentaries about PostgreSQL
major version upgrade, and although they may not necessarily be related to the
PR, I reckon it's a good idea to collect them here. As an outside person I
could be missing some details of this project, please let me know if that's the
case.

  • POSTGRESQL_UPGRADE seems to allow only --link and --copy, but since PG
    11 there is also --clone option, which somewhat combines best of both
    worlds, but requires newer Linux kernel versions.

  • The docs mention --link option as an unsafe, but the old cluster should not
    be used only of the new cluster was started. If it wasn't the case, the old
    cluster was unmodified and should be safe to start.

  • It seems that it's possible to provide custom initdb options via
    POSTGRESQL_UPGRADE_INITDB_OPTIONS, but some of them must be the same as the
    old cluster (collate, encoding, data checksums, wal segment size). It sounds
    a bit unreliable to leave it to the container user, could those options be
    enforced?

  • It's worth running pg_upgrade --check to verify compatibility.

  • Maybe a good idea to add -j to upgrade_cmd for additional parallelism.

  • Usually triggering a vacuum after the upgrade to rebuild statistics is a
    good idea as well.

  • More open-ended question, what do you think about those situations where
    some coordination is needed? E.g. when upgrading a replica, most likely rsync
    has to be used in coordination with the primary upgrade. Or upgrading
    extensions, which is not handled via pg_upgrade either.

    It's probably beyond what you want to achieve within a container, but it's
    at least worth mentioning in the documentation, right?

https://www.postgresql.org/docs/current/pgupgrade.html

@fila43
Copy link
Member

fila43 commented Apr 30, 2024

[test]

3 similar comments
@fila43
Copy link
Member

fila43 commented Apr 30, 2024

[test]

@fila43
Copy link
Member

fila43 commented May 6, 2024

[test]

@fila43
Copy link
Member

fila43 commented May 6, 2024

[test]

@fila43
Copy link
Member

fila43 commented May 6, 2024

@phracek @zmiklank any idea about c9s failures? RHEL9 passed but c9s failed

@zmiklank
Copy link
Contributor

zmiklank commented May 7, 2024

@phracek @zmiklank any idea about c9s failures? RHEL9 passed but c9s failed

This is reproducible also in the master branch - so not seem to be related to this PR. Seems like a segfault (double free) from pg_isready binary.

Aren't there any differences in c9s and rhel9 binaries?

@zmiklank
Copy link
Contributor

zmiklank commented May 7, 2024

So this may be an postgresql[1] issue which is present only when using newer openssl (3.2.1) that got into c9s, but not into rhel9 yet. postgresql upstream patch seeems to exist already, more info is in the linked mailing thread[1].

[1] https://www.postgresql.org/message-id/CAN55FZ1eDDYsYaL7mv%2BoSLUij2h_u6hvD4Qmv-7PK7jkji0uyQ%40mail.gmail.com

@eifrach
Copy link
Contributor Author

eifrach commented May 28, 2024

sorry for the delay, I was on PTO
looking into the failures.

@zmiklank
Copy link
Contributor

sorry for the delay, I was on PTO looking into the failures.

No problem at all. Just note that c8s images will be deprecated on May the 31th. So no need to work further on them I guess. (This PR needs to merged first: #567)

@fila43
Copy link
Member

fila43 commented May 29, 2024

[test]

@fila43
Copy link
Member

fila43 commented Jun 18, 2024

I tested RHEL8 failed cases locally and it works, c9s fails are known issues so I would merge it. @eifrach could you please confirm my observation to be sure before merge?

@eifrach
Copy link
Contributor Author

eifrach commented Jun 18, 2024

I tested RHEL8 failed cases locally and it works, c9s fails are known issues so I would merge it. @eifrach could you please confirm my observation to be sure before merge?

yes it works for me - also I've tested the data migration

eifrach and others added 2 commits June 18, 2024 14:09
@fila43 fila43 merged commit e42cb5b into sclorg:master Jun 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants