Data deletion sometimes fails and there is no easy way to retry #430

Closed
yanokwa opened this issue May 16, 2023 · 1 comment · Fixed by #431
Labels
ops Docker, ops to deploy Central

Comments

@yanokwa
Member

yanokwa commented May 16, 2023

In our testing, data deletion always worked, but in every production install I've upgraded, it has failed, and I have no idea why. It'd be nice to have a way to force the deletion step at https://github.com/getodk/central/blob/master/files/postgres/upgrade-postgres.sh#L42 to run.

root@ip-xx-xx-xx-xx:~/central# docker compose up postgres
[+] Running 1/1
 ⠿ Container central-postgres-1  Recreated  0.1s
Attaching to central-postgres-1
central-postgres-1  | Tue 16 May 2023 08:32:00 PM GMT [upgrade-postgres.sh] Checking for existing upgrade marker file...
central-postgres-1  | Tue 16 May 2023 08:32:00 PM GMT [upgrade-postgres.sh] Upgrade has been run previously.
central-postgres-1  | Tue 16 May 2023 08:32:00 PM GMT [upgrade-postgres.sh] !!!
central-postgres-1  | Tue 16 May 2023 08:32:00 PM GMT [upgrade-postgres.sh] !!! WARNING: you still have old data from PostgreSQL 9.6
central-postgres-1  | Tue 16 May 2023 08:32:00 PM GMT [upgrade-postgres.sh] !!!
central-postgres-1  | Tue 16 May 2023 08:32:00 PM GMT [upgrade-postgres.sh] !!! This is taking up disk space: 311MB
central-postgres-1  | Tue 16 May 2023 08:32:00 PM GMT [upgrade-postgres.sh] !!!
central-postgres-1  | Tue 16 May 2023 08:32:00 PM GMT [upgrade-postgres.sh] !!! Continue with the instructions at https://docs.getodk.org/central-upgrade/
central-postgres-1  | Tue 16 May 2023 08:32:00 PM GMT [upgrade-postgres.sh] !!!
central-postgres-1  | Tue 16 May 2023 08:32:00 PM GMT [upgrade-postgres.sh] Complete.
central-postgres-1 exited with code 0

root@ip-172-31-18-74:~/central# touch ./files/postgres14/upgrade/delete-old-data \
   && docker compose up --abort-on-container-exit postgres
[+] Running 1/1
 ⠿ Container central-postgres-1  Recreated  0.1s
Attaching to central-postgres-1
central-postgres-1  | Tue 16 May 2023 08:32:08 PM GMT [upgrade-postgres.sh] Checking for existing upgrade marker file...
central-postgres-1  | Tue 16 May 2023 08:32:08 PM GMT [upgrade-postgres.sh] Upgrade has been run previously.
central-postgres-1  | Tue 16 May 2023 08:32:08 PM GMT [upgrade-postgres.sh] Deleting old data...
central-postgres-1  | Tue 16 May 2023 08:32:08 PM GMT [upgrade-postgres.sh] !!!
central-postgres-1  | Tue 16 May 2023 08:32:08 PM GMT [upgrade-postgres.sh] !!! ERROR: file missing: delete_old_cluster.sh
central-postgres-1  | Tue 16 May 2023 08:32:08 PM GMT [upgrade-postgres.sh] !!!
central-postgres-1  | Tue 16 May 2023 08:32:08 PM GMT [upgrade-postgres.sh] !!! Upgrade may not have completed successfully.
central-postgres-1  | Tue 16 May 2023 08:32:08 PM GMT [upgrade-postgres.sh] !!!
central-postgres-1  | Tue 16 May 2023 08:32:08 PM GMT [upgrade-postgres.sh] !!! Old data will not be deleted.
central-postgres-1  | Tue 16 May 2023 08:32:08 PM GMT [upgrade-postgres.sh] !!!
central-postgres-1 exited with code 1
Aborting on container exit...
[+] Running 1/0
 ⠿ Container central-postgres-1  Stopped

In this case, I can log into Central fine, and I can also run docker exec -it central-postgres14-1 psql -U odk -W odk -c "select version();" to confirm that I'm running 14.7.
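A sketch of the same version check in a script-friendly form. It assumes psql inside the container can authenticate without the interactive -W password prompt (if it can't, the interactive command above is the way to go). SHOW server_version returns only the server's version string, unlike select version(), which adds build details. The pg_major_version helper is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the major version from a PostgreSQL version
# string such as "14.7" or one with a distribution suffix like "14.7 (Debian ...)".
pg_major_version() {
  local version="$1"
  # Drop everything from the first "." or space onward: "14.7 (...)" -> "14".
  printf '%s\n' "${version%%[. ]*}"
}

# Example use against the running container (assumes no password prompt):
#   v=$(docker exec central-postgres14-1 psql -U odk -tAc 'SHOW server_version;' odk)
#   [ "$(pg_major_version "$v")" = "14" ] && echo "running PostgreSQL 14"
```

The -t (tuples only) and -A (unaligned) flags keep psql from printing headers or padding, so the captured value is just the version string.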

The logs from a Central migration where the delete worked end like this:

Tue 16 May 2023 08:27:38 PM GMT [upgrade-postgres.sh] Upgrade complete.
Tue 16 May 2023 08:27:38 PM GMT [upgrade-postgres.sh] Complete.

The logs from one where the delete failed end like this:

Sat 25 Mar 2023 05:28:40 AM GMT [upgrade-postgres.sh] Upgrade complete.
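The difference between these two log endings can be checked mechanically. A minimal sketch, assuming the upgrade output was saved to a file; the helper name and its three result labels are hypothetical, and a missing final Complete. line only shows that the script stopped early, not why.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a saved upgrade-postgres.sh log
# reached the final "Complete." line after "Upgrade complete.".
#   finished      -> "Upgrade complete." followed later by "Complete."
#   stopped-early -> "Upgrade complete." but no "Complete." afterwards
#   no-upgrade    -> no "Upgrade complete." line at all
upgrade_log_status() {
  awk '
    # index() matches literal text, so "." and "[" need no escaping.
    index($0, "[upgrade-postgres.sh] Upgrade complete.") { upgraded = 1; next }
    upgraded && index($0, "[upgrade-postgres.sh] Complete.") { finished = 1 }
    END {
      if (finished)      print "finished"
      else if (upgraded) print "stopped-early"
      else               print "no-upgrade"
    }
  ' "$1"
}
```

Usage: upgrade_log_status upgrade.log. Applied to the two excerpts above, the first would report finished and the second stopped-early.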

Perhaps delete_old_cluster.sh is somehow unavailable, and perhaps that doesn't matter. Rather than checking for delete_old_cluster.sh, maybe we should do the delete if /postgres14-upgrade/upgrade-successful exists.

tianon/docker-postgres-upgrade#11 (comment) seems to suggest that delete_old_cluster.sh is created in a working directory inside the container that is not persisted, so it can disappear.

It seems like the best solution is to check whether the upgrade completed and, if it did, delete the old data then.
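A minimal sketch of that proposal, gating deletion on the persisted success marker rather than on delete_old_cluster.sh. The function name is hypothetical, and all three paths are passed in as arguments, so nothing about Central's actual directory layout is assumed; this is not necessarily how #431 implements the fix.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: delete old PostgreSQL data only when deletion was
# requested AND the upgrade recorded success.
# Usage: delete_old_data_if_upgraded REQUEST_MARKER SUCCESS_MARKER OLD_DATA_DIR
#   REQUEST_MARKER  file the operator touches to ask for deletion
#                   (e.g. the delete-old-data file)
#   SUCCESS_MARKER  persisted marker written after a successful upgrade
#                   (e.g. the upgrade-successful file)
#   OLD_DATA_DIR    directory holding the old 9.6 cluster
delete_old_data_if_upgraded() {
  local request_marker="$1" success_marker="$2" old_data_dir="$3"

  # Only act when the operator has asked for deletion.
  if [ ! -e "$request_marker" ]; then
    echo "No deletion requested; nothing to do."
    return 0
  fi

  # Refuse to delete unless the upgrade recorded success.
  if [ ! -e "$success_marker" ]; then
    echo "ERROR: success marker missing; old data will not be deleted." >&2
    return 1
  fi

  # Guard against deleting an empty path or "/" by mistake.
  if [ -z "$old_data_dir" ] || [ "$old_data_dir" = "/" ]; then
    echo "ERROR: refusing to delete '$old_data_dir'." >&2
    return 1
  fi

  echo "Deleting old data in $old_data_dir..."
  rm -rf -- "$old_data_dir"
  echo "Old data deleted."
}
```

Because both markers survive container recreation, this check can be retried any number of times, whereas delete_old_cluster.sh may already be gone by the time the operator asks for deletion.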

yanokwa changed the title from "Data deletion sometimes fails and there is no easy way to retry." to "Data deletion sometimes fails and there is no easy way to retry" on May 16, 2023
matthew-white added the ops (Docker, ops to deploy Central) label on May 24, 2023
@matthew-white
Member

Closed by #431.

Projects
Status: ✅ done