Sam/nix and conventional ami #1012

Merged
merged 45 commits into from
Jul 19, 2024
samrose
Collaborator

@samrose samrose commented Jun 24, 2024

This PR will need a minor follow-up prior to approval/merge for the GitHub Actions workflows that are dedicated specifically to merging to develop. It supersedes PR #953.

Documentation of changes in #1012

Conventional AMI approach

The existing (conventional) AMI build approach installs postgres from the postgresql-common ubuntu/debian package at the time of the AMI build. It also builds extensions and wrappers from source at that point and installs them as ‘.deb’ packages.

[Image: Flowcharts (3)]

Nix packaged postgresql bundle approach

In the nix approach, we use the postgresql provided by nixpkgs (currently pinned at version 15.6 via the a76c4553d7e741e17f289224eda135423de0491d commit of the nixpkgs-unstable branch, locked at https://github.com/supabase/postgres/blob/sam/2-stage-ami-nix/flake.lock#L114 ).
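To make the pinning concrete, here is a minimal sketch of how the pinned nixpkgs revision could be read out of a flake.lock file. The sample lock content is trimmed and illustrative: the input name "nixpkgs" and the surrounding fields are assumptions about this repo's lock file, and only the rev value is taken from the description above.

```python
import json

# Trimmed, illustrative flake.lock content. The node name "nixpkgs" is an
# assumption; only the rev comes from the PR description.
SAMPLE_LOCK = """
{
  "nodes": {
    "nixpkgs": {
      "locked": {
        "owner": "NixOS",
        "repo": "nixpkgs",
        "rev": "a76c4553d7e741e17f289224eda135423de0491d",
        "type": "github"
      }
    },
    "root": {"inputs": {"nixpkgs": "nixpkgs"}}
  },
  "root": "root",
  "version": 7
}
"""


def pinned_rev(lock_text: str, input_name: str = "nixpkgs") -> str:
    """Return the locked git revision for a direct flake input.

    Assumes the input is referenced by a plain node name (a string),
    not by a "follows" path (a list).
    """
    lock = json.loads(lock_text)
    root_node = lock["nodes"][lock["root"]]
    node_name = root_node["inputs"][input_name]
    return lock["nodes"][node_name]["locked"]["rev"]


if __name__ == "__main__":
    print(pinned_rev(SAMPLE_LOCK))
```

Because the revision is recorded in the lock file, every build resolves the same nixpkgs tree until the lock is deliberately updated.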

Nixpkgs fetches the postgres source from the URL given at https://github.com/NixOS/nixpkgs/blob/a76c4553d7e741e17f289224eda135423de0491d/pkgs/servers/sql/postgresql/generic.nix#L52

The nixpkgs package applies the following patches for aarch64-linux pg 15.6:

https://github.com/NixOS/nixpkgs/blob/a76c4553d7e741e17f289224eda135423de0491d/pkgs/servers/sql/postgresql/patches/disable-resolve_symlinks.patch

https://github.com/NixOS/nixpkgs/blob/a76c4553d7e741e17f289224eda135423de0491d/pkgs/servers/sql/postgresql/patches/less-is-more.patch

https://github.com/NixOS/nixpkgs/blob/a76c4553d7e741e17f289224eda135423de0491d/pkgs/servers/sql/postgresql/patches/hardcode-pgxs-path.patch

https://github.com/NixOS/nixpkgs/blob/a76c4553d7e741e17f289224eda135423de0491d/pkgs/servers/sql/postgresql/patches/specify_pkglibdir_at_runtime.patch

https://github.com/NixOS/nixpkgs/blob/a76c4553d7e741e17f289224eda135423de0491d/pkgs/servers/sql/postgresql/patches/findstring.patch

https://github.com/NixOS/nixpkgs/blob/a76c4553d7e741e17f289224eda135423de0491d/pkgs/servers/sql/postgresql/patches/locale-binary-path.patch

https://github.com/NixOS/nixpkgs/blob/a76c4553d7e741e17f289224eda135423de0491d/pkgs/servers/sql/postgresql/patches/socketdir-in-run-13.patch

When a PR is submitted to the supabase/postgres repo updating any of the nix packages maintained there, a build of the entire bundle is triggered on supported systems (x86_64-linux and aarch64-linux as of this writing). When the nix CI workflow is initiated, nix sources from our binary cache (currently located in a publicly readable AWS S3 bucket at https://nix-postgres-artifacts.s3.amazonaws.com ) and checks for any component dependency that has an exact match and has already built successfully. Nix fetches that built version from the cache and builds only the items that have changed. If nix cannot build a changed item, the build fails. If the build succeeds, nix performs flake “checks” (scripted tests with dependencies managed by nix). An example of such a “check” is linked below.

Our CI implementation of nix has only 2 trusted public keys and 2 specified nix caches (ours and the upstream nixpkgs community cache). These are set in the Docker image at https://github.com/supabase/postgres/blob/sam/2-stage-ami-nix/docker/nix/Dockerfile#L5 and on the AMI at https://github.com/supabase/postgres/blob/sam/2-stage-ami-nix/scripts/nix-provision.sh#L20 .
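As an illustration of what "only 2 trusted public keys and 2 specified nix caches" could look like when verified, the sketch below parses nix.conf-style text and reports anything outside an allowlist. The `substituters` and `trusted-public-keys` setting names are standard nix.conf settings; the key strings, the upstream cache URL, and the helper names are placeholders and assumptions, not values copied from the Dockerfile or nix-provision.sh.

```python
# Placeholder configuration: both key strings are fake, and the upstream
# cache URL is an assumption about what "upstream community cache" means.
SAMPLE_CONF = """
substituters = https://cache.nixos.org https://nix-postgres-artifacts.s3.amazonaws.com
trusted-public-keys = upstream-cache-1:AAAA= our-cache-1:BBBB=
"""

ALLOWED_SUBSTITUTERS = [
    "https://cache.nixos.org",
    "https://nix-postgres-artifacts.s3.amazonaws.com",
]


def parse_nix_conf(text: str) -> dict[str, list[str]]:
    """Parse 'name = value value ...' lines; later lines override earlier ones."""
    settings: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first '=' only
        settings[key.strip()] = value.split()
    return settings


def check_allowlist(settings, allowed_substituters, expected_key_count=2):
    """Return a list of human-readable problems (empty means the config passes)."""
    problems = []
    extra = sorted(set(settings.get("substituters", [])) - set(allowed_substituters))
    if extra:
        problems.append(f"unexpected substituters: {extra}")
    keys = settings.get("trusted-public-keys", [])
    if len(keys) != expected_key_count:
        problems.append(f"expected {expected_key_count} trusted keys, found {len(keys)}")
    return problems


if __name__ == "__main__":
    print(check_allowlist(parse_nix_conf(SAMPLE_CONF), ALLOWED_SUBSTITUTERS))
```

Keeping the list of caches and keys this short limits which sources nix will accept prebuilt binaries from.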

https://github.com/supabase/postgres/actions/runs/9468138054/job/26083806922?pr=953#step:6:813 shows a check that starts the database and enables several extensions post-build. If this test fails, the build also fails. If the nix build and checks succeed, the workflow uploads the artifacts to the nix cache for re-use before it stops.
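The cache behaviour described above (reuse exact matches, rebuild only what changed) can be sketched with a deliberately simplified model. This is not how nix computes store paths; it only assumes that a component's identity is a hash over its own inputs plus the identities of its dependencies, which is enough to show why a change to one package leaves unrelated packages cacheable. The package graph and version strings are hypothetical.

```python
import hashlib


def closure_hashes(graph):
    """Map each component to a hash over its name, source, and dependency hashes.

    `graph` maps a name to (source_description, [dependency names]).
    """
    memo = {}

    def visit(name):
        if name not in memo:
            source, deps = graph[name]
            digest = hashlib.sha256(f"{name}\0{source}".encode())
            for dep in sorted(deps):
                digest.update(visit(dep).encode())
            memo[name] = digest.hexdigest()
        return memo[name]

    for name in graph:
        visit(name)
    return memo


def plan(graph, cache):
    """Split components into those fetched from the cache and those to build."""
    hashes = closure_hashes(graph)
    fetch = sorted(n for n, h in hashes.items() if h in cache)
    build = sorted(n for n, h in hashes.items() if h not in cache)
    return fetch, build


# Hypothetical bundle: versions are illustrative only.
BEFORE = {
    "postgresql": ("15.6", []),
    "pgsodium": ("v1", ["postgresql"]),
    "wrappers": ("v1", ["postgresql"]),
}
AFTER = dict(BEFORE, wrappers=("v2", ["postgresql"]))

if __name__ == "__main__":
    cache = set(closure_hashes(BEFORE).values())  # everything built previously
    print(plan(AFTER, cache))
```

In this model, bumping only `wrappers` leaves `postgresql` and `pgsodium` as cache hits, while a change to `postgresql` would invalidate everything that depends on it.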

[Image: Flowcharts (2)]

In the debian/ubuntu postgresql-common package, the “postgres” user is created, and postgres is installed to locations that are conventional for debian/ubuntu. In the nix approach, we explicitly create the “postgres” linux user, and then use the nix profile install method to install the nix-built postgres binaries into the nix profile for the “postgres” user (located at /home/postgres/.nixprofile on the AMI machine). We then alias the installed file locations to the conventional debian/ubuntu locations for a postgres installation. The nix profile command gives us an imperative way to install, uninstall, and upgrade packages that we build with nix going forward, allowing us to integrate our nix-built packages with debian/ubuntu distributions.
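The aliasing step can be pictured as creating symlinks from a conventional directory to the binaries inside the nix profile. The sketch below is an assumption about the general technique, not the repo's ansible tasks: the function name and both directory layouts are hypothetical, and it runs against a temporary directory rather than real system paths.

```python
import tempfile
from pathlib import Path


def alias_binaries(profile_bin: Path, conventional_bin: Path) -> list[Path]:
    """Symlink every entry in profile_bin into conventional_bin.

    Existing files or symlinks with the same name are replaced, so the
    function can be re-run after a profile upgrade.
    """
    conventional_bin.mkdir(parents=True, exist_ok=True)
    created = []
    for target in sorted(profile_bin.iterdir()):
        link = conventional_bin / target.name
        if link.is_symlink() or link.is_file():
            link.unlink()
        link.symlink_to(target)
        created.append(link)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical stand-ins for the nix profile and the debian-style path.
        profile_bin = root / "nix-profile" / "bin"
        profile_bin.mkdir(parents=True)
        (profile_bin / "postgres").write_text("#!/bin/sh\n")
        (profile_bin / "pg_isready").write_text("#!/bin/sh\n")
        for link in alias_binaries(profile_bin, root / "usr" / "lib" / "postgresql" / "bin"):
            print(link.name, "->", link.resolve())
```

Scripts that expect postgres tooling at the conventional location keep working, while the actual files stay managed by the nix profile.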

2 Stage AMI approach

The Ansible and Packer code has been forked in parallel in the same repo, so that both the nix-built approach and the existing ubuntu/debian package approach can be supported in parallel. This will allow continued production rollouts under the old method, while also allowing targeted rollouts with the nix-built AMI.

The existing build lives under the same ansible folder, and its companion packer hcl files have been retained. The nix AMI build has parallel packer files with nix inserted into the name, and a new folder, ansible-nix. Both builds are initiated with the same command-line recipe.

Description of 2 stage approach

The previous packer/ansible build used the ebssurrogate builder ( https://developer.hashicorp.com/packer/integrations/hashicorp/amazon/latest/components/builder/ebssurrogate ) exclusively. The new nix-based build retains the ebssurrogate approach to build and configure everything except for the postgres bundle.

Nix builds are already “sandboxed” and isolated at build time, and the results are stored in a read-only directory called the “nix store”. Nix has never needed to support building in a chroot, as the ebssurrogate packer build does, and so running nix in a chroot has never been supported.

Therefore, a second stage of the AMI build was introduced. It securely sources the private “stage1” AMI built by the stage 1 ebssurrogate approach, and then installs the nix-built supabase postgres/extensions/wrappers bundle from the binary cache using the conventional github.com/hashicorp/amazon packer plugin. This stage is limited to installing, configuring and testing postgres from either files uploaded in the first stage or sourced from the nix cache (other than the stage 2 ansible playbook and unit test files). The workflow that performs these 2 stages is located at https://github.com/supabase/postgres/blob/sam/2-stage-ami-nix/.github/workflows/ami-release-nix.yml . As more Supabase projects are packaged in nix, they will be moved into this 2nd stage for installation and configuration.

In the 2nd stage we run migration and unit tests, and linux user/group assignment checks with a temporarily installed copy of osquery ( https://github.com/supabase/postgres/blob/sam/2-stage-ami-nix/ansible-nix/tasks/stage2/playbook.yml#L71 and https://github.com/supabase/postgres/blob/sam/2-stage-ami-nix/ansible-nix/files/permission_check.py ).
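For readers unfamiliar with this kind of check, here is a minimal sketch of comparing observed user/group assignments against an expected mapping. It is not the contents of permission_check.py: the row shape (a list of dicts with `username` and `groupname` keys, as a JSON-emitting query tool might return), the function name, and the expected groups are all assumptions made for illustration.

```python
def check_memberships(rows, expected):
    """Compare observed (user, group) rows against expected group sets.

    rows:     iterable of dicts like {"username": ..., "groupname": ...}
    expected: dict mapping username to the exact list of groups it should have
    Returns a list of mismatch descriptions (empty means everything matches).
    """
    observed = {}
    for row in rows:
        observed.setdefault(row["username"], set()).add(row["groupname"])

    problems = []
    for user, groups in sorted(expected.items()):
        actual = observed.get(user, set())
        if actual != set(groups):
            problems.append(
                f"{user}: expected {sorted(groups)}, found {sorted(actual)}"
            )
    return problems


# Hypothetical expectations and query results, for illustration only.
EXPECTED = {"postgres": ["postgres", "ssl-cert"]}
ROWS = [
    {"username": "postgres", "groupname": "postgres"},
    {"username": "postgres", "groupname": "ssl-cert"},
]

if __name__ == "__main__":
    failures = check_memberships(ROWS, EXPECTED)
    print("ok" if not failures else "\n".join(failures))
```

Running a check like this at the end of stage 2 would surface drift between the nix-based image and the user/group layout that the conventional image provides.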

The 2nd stage also creates path aliases to the nix-installed binaries so that files and configurations are still where they are expected to be as much as possible. This allows post-AMI-build init scripts like https://github.com/supabase/infrastructure/blob/develop/init-scripts/project/00-init.sh to continue to succeed in running.

We are maintaining documentation on how to work with the nix portion of supabase/postgres at https://github.com/supabase/postgres/tree/sam/2-stage-ami-nix/nix/docs and will continue to expand that as much as possible.

Current Progress on adoption in https://github.com/supabase/postgres

There is an umbrella draft PR at #953 which includes the building of an aarch64-linux AMI

Docker image PR #986

Docker AIO Image PR #987

@samrose samrose requested a review from a team as a code owner June 24, 2024 17:40
@samrose samrose mentioned this pull request Jun 25, 2024
@samrose
Collaborator Author

samrose commented Jun 26, 2024

@darora just wanted to follow up: in this new PR, the testinfra AMI tests are now passing for the nix AMI build.

https://github.com/supabase/postgres/actions/runs/9672036422/job/26683662377?pr=1012#step:10:95

This resolves #953 (comment)

I have moved the docker work to a new PR that should be coming up tomorrow (and will deprecate the old one).

@olirice olirice mentioned this pull request Jun 26, 2024
2 tasks
Contributor

@pashkinelfe pashkinelfe left a comment

In my view we still need some automatic deduplication of config files, to be sure they don't differ now and won't in the future. E.g. ansible-nix/files/kong_config/kong.service.j2 and postgres/ansible/files/kong_config/kong.service.j2 look like copies, as do many other files. If we have files derived from other ones, I think it's better to have one "master" file and derive the others automatically. Otherwise, we could easily get confused by the changes between them. This also makes the PR somewhat bulky and hard to review. Maybe a script that derives config files from the main Supabase repo is a suitable solution for this. Or symlinks to existing files.

@samrose
Collaborator Author

samrose commented Jul 2, 2024

@pashkinelfe that's reasonable to me, thanks.

@samrose samrose force-pushed the sam/nix-and-conventional-ami branch 2 times, most recently from 83fd3ed to d5b4643 Compare July 3, 2024 21:29
@darora
Contributor

darora commented Jul 4, 2024

The diff itself looks fine, though someone will need to test an actual upgrade to see if there are any other hurdles that are not immediately obvious. Agreed with Pavel, if it's doable in a straightforward manner. Otherwise, I guess we'll have to review a diff of the two dirs locally as a one-time exercise.

Contributor

@darora darora left a comment

Approving once #1025 gets merged in to clean things up

@soedirgo soedirgo requested a review from a team as a code owner July 18, 2024 08:38
@soedirgo soedirgo force-pushed the sam/nix-and-conventional-ami branch from bc1b197 to 4afe5f0 Compare July 18, 2024 08:38
samrose and others added 22 commits July 19, 2024 10:30
Sam/nix and conventional consolidate (#1025)

* feat: consolidate ansible and use vars to toggle AMI builds

* fix: resolving merge conflict

* chore: merge conflict

* Revert "chore: merge conflict"

This reverts commit ddc6b1d.

* fix: update ansible location for script

* fix: ansible consolidated location

* fix: set up modes on system-setup

* fix: set vars

* fix: python True and False in extra_vars

* fix: adj vars

* fix: set all ami vars

* fix: args as json

* fix: nixpkg_mode

* fix: refining mode rules

* fix: consolidate create dirs

* fix: cleaning up modes

* fix: systemd psql service reload targets

* fix: starting postgres issues

* fix: timing for pgsodium_getkey script

* fix: packer file upload on stage 2

* fix: consolidation of ansible location

* fix: stage2 fix hostname

* fix: limit stage that tasks run on

* fix: setting hosts only on stage 2 nix ami

* fix: rewrite hosts in ansible to allow for re-use of playbook file

* chore: trigger checks

* fix: pgsodium getkey is different for deb vs nix builds

* fix: consolidated files location

* fix: on stage2 postgres server is already started at this point

* fix: without env vars

* fix: vars on the right mode

* fix: dedupe

* fix: locales

* fix: locales

* chore: try step with no env vars

* fix: no need to start pg at this point stage2

* fix: yaml

* fix: more cleanup of modes

* fix: snapd already absent at this point + consolidate tasks

* fix: already absent at this point

* fix: service not present at this stage

* fix: disable different services for first boot depending on mode

* fix: pg already restarted at this point in stage 2

* fix: no start on stage2

* fix: try to start in stage2

* chore: include env vars for stage2

* fix: stop before starting

* fix: debpkg mode only

* fix: should use conventional path

* fix: need to locale-gen prior to initdb

* fix: nix build needs .env

* fix: stage2 treatment of pgsodium_getket

* chore: re-introduce permission checks via osquery

* fix: correct the path to files

---------

Co-authored-by: Sam Rose <samuel@supabase.io>
* fix: was using the wrong sha256 hash for version

* chore: updating wrappers version

* itests: make sure we run the current commit on psql bundle test

---------

Co-authored-by: Sam Rose <samuel@supabase.io>
* fix: locale gen and ami deregister on any testinfra run

* fix: use more manual approach

---------

Co-authored-by: Sam Rose <samuel@supabase.io>
@samrose samrose force-pushed the sam/nix-and-conventional-ami branch from 48ecb12 to 12852b2 Compare July 19, 2024 14:33
@samrose samrose merged commit bad563a into develop Jul 19, 2024
12 of 13 checks passed
@samrose samrose deleted the sam/nix-and-conventional-ami branch July 19, 2024 16:50
damonrand pushed a commit to cepro/postgres that referenced this pull request Jun 15, 2025
* feat: nix-ami-changes

* chore: version bump

* chore: remap branch for ami build

* chore: bump version

* chore: bump version to trigger build

* feat: use /var/lib/postgresql as home for postgres user

* fix: makre sure bashrc exists

* fix: minor refactor

* chore: moving to a different PR

* chore: bump version and remove deprecated workflow

* feat: parallel testinfra-nix just for ami test

* chore: testing just testinfra-nix workflow

* chore: re-run build

* chore: re-trigger testinfra

* fix: wait for AMI to reach available state

* fix: use ami id in stage 3 testinfra ami-test

* fix: env vars

* chore: bump version

* chore: restore packer build

* chore: create a parallel test

* chore: bump version

* fix: capture and use ami name

* fix: aws regions

* chore: capture ami name

* chore: force_deregister all ami prior to create new

* fix: pass same ami name each time

* fix: manage concurrency of testinfra builds

* fix: no args on stage 2

* fix: re-intro original testinfra

* Revert "fix: re-intro original testinfra"

This reverts commit f719e66.

* chore: push to re-trigger build

* chore: update instance name

* fix: location of pg_isready binary

* fix: re-intro conventional ami infra test + more symlinks where expected

* fix: dealing with symlink creation issues

* fix: try concurrency rules on on all large builds

* chore; try with no concurrency rules

* chore: rerun

* chore: rebasing on develop
* Sam/timescale and wrappers (supabase#1052)

* fix: was using the wrong sha256 hash for version

* chore: updating wrappers version

* itests: make sure we run the current commit on psql bundle test

---------

Co-authored-by: Sam Rose <samuel@supabase.io>

* fix: locale gen and ami deregister on any testinfra run (supabase#1055)

* fix: locale gen and ami deregister on any testinfra run

* fix: use more manual approach

---------

Co-authored-by: Sam Rose <samuel@supabase.io>

* chore: update pg_upgrade initiate.sh to support nix-based upgrades (supabase#1057)

* chore: package nix flake revision in pg_upgrade binaries tarball when building the nix AMI (supabase#1058)

* chore: activate release workflow

* chore: bump version

---------

Co-authored-by: Sam Rose <samuel@supabase.io>
Co-authored-by: Paul Cioanca <paul.cioanca@supabase.io>