Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initdb huge_pages #715

Merged
merged 1 commit into from Oct 3, 2021
Merged

initdb huge_pages #715

merged 1 commit into from Oct 3, 2021

Conversation

baum
Copy link
Contributor

@baum baum commented Aug 13, 2021

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1946792

  • Extract Postgres initialization into second init container
  • During initialization wrap the Postgres binary, force huge_pages=off for initdb

Signed-off-by: Alexander Indenbaum aindenba@redhat.com

@baum baum force-pushed the initdb_huge_pages_pr branch 8 times, most recently from e55ca43 to 1155165 Compare August 24, 2021 13:11
@dannyzaken
Copy link
Contributor

@guymguym we had discussions in the past about how we should manage our RBAC resources. WDYT on these changes?

a few things that are done in this PR, as a result of the need to have more privileges for the DB init container:

  1. a new noobaa-db account was added
  2. the CSV was changed to only contain the noobaa account\role\binding.
  3. noobaa install from the CLI does not install noobaa-db and noobaa-endpoint accounts\roles
  4. noobaa-endpoint and noobaa-db RBAC resources along with SCC are created in the system reconcile.

we should also notice that noobaa-core is still using noobaa account, which gives it all of the k8s permissions noobaa operator has. we might want to change that in the future

@guymguym
Copy link
Member

@dannyzaken @baum

  1. I can't seem to understand why is this all needed for our postgres deployment and not any other postgres deployment.
  2. Why do we deploy different RBAC for OLM vs CLI - that can't be good.
  3. Why put a script in a configmap? Is this expected to be configured by a user? If not I would expect either a custom image or to write this script inline in the deployment yaml instead in its own configmap which is hinting towards a configuration.

@dannyzaken
Copy link
Contributor

dannyzaken commented Sep 13, 2021

@guymguym it's been a while so I'll try to answer your questions accurately. @baum correct me if I miss anything

  1. this is a problem for any Postgres deployment which is deployed on a cluster with hugepages enabled
  2. we don't deploy different RBAC for OLM vs CLI. for all deployments, the db and endpoint roles are created in system reconcile
  3. we don't want to maintain our own postgres image. writing the script inline in the yaml is an option. @baum what do you think?

@guymguym
Copy link
Member

First of all I want to paste the references to the issues that we are trying to workaround here:

  1. Bus error (core dumped) kubernetes/kubernetes#71233 this is the kubernetes issue that causes postgres to not be able to auto detect that hugepages are disabled, and instead the memory allocation passes, but then there's a runtime Bus error.
  2. https://github.com/postgres/postgres/blob/1882d6cca161dcf9fa05ecab5abeb1a027a5cfd2/src/bin/initdb/initdb.c#L2775 - this is the postgres initdb code that runs this autodetection very early in the db creation flow, which means that we cannot affect it by changing the configuration files and we have to force the binary to use the -c huge_pages=off in some hacky way.

To be continued.

@guymguym
Copy link
Member

Hey @baum @dannyzaken

My main concern with the proposed PR is its size and collateral damage for later.
We certainly want to workaround these issues, but in order to do so we are inserting quite a lot of changes, which I would want to understand if all are needed as a temporary fix.

To begin - I would separate the two commits to two separate PR's so that we can review and have a future reference to each one individually.

Can the SCC/RBAC refactor commit be separate to another PR? Is it needed for this PR to work?

Is this approach of creating RBAC and SCC's by the operator reconcile recommended for openshift operators?

Can we suggest any PR to the postgres centos image repo instead to allow this workaround for anyone else who runs the same image - https://github.com/sclorg/postgresql-container ? I would much prefer to see the workaround as a PR to the image repo itself instead of making so many changes that might be needed by all other postgres users.

@baum
Copy link
Contributor Author

baum commented Sep 14, 2021

See postgresql-container issue

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1946792

- Extract Postgres initialisation into second init container
- During initialization wrap the postgres binary, force huge_pages=off for initdb

Signed-off-by: Alexander Indenbaum <aindenba@redhat.com>
@baum baum merged commit 2ccc3d8 into noobaa:master Oct 3, 2021
baum pushed a commit to baum/noobaa-operator that referenced this pull request Oct 6, 2021
Use static YAML declaration for credentials secret references.
Remove golang logic, to simplify.

Suggested by @guymguym here:
  noobaa#715 (comment)

Signed-off-by: Alexander Indenbaum <aindenba@redhat.com>
baum pushed a commit to baum/noobaa-operator that referenced this pull request Oct 13, 2021
Use static YAML declaration for credentials secret references.
Remove golang logic, to simplify.

Suggested by @guymguym here:
  noobaa#715 (comment)

Signed-off-by: Alexander Indenbaum <aindenba@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants