Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run the etcd as non-root #100635

Merged
merged 1 commit into from
Apr 9, 2021
Merged

Conversation

cindy52
Copy link
Contributor

@cindy52 cindy52 commented Mar 29, 2021

What type of PR is this?

/kind feature

Run etcd as non-root on GCE provider'

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 29, 2021
@k8s-ci-robot
Copy link
Contributor

Welcome @cindy52!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 29, 2021
@k8s-ci-robot
Copy link
Contributor

Hi @cindy52. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/provider/gcp Issues or PRs related to gcp provider labels Mar 29, 2021
@k8s-ci-robot k8s-ci-robot added sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 29, 2021
@@ -1254,6 +1254,8 @@ ${CUSTOM_CALICO_NODE_DAEMONSET_YAML//\'/\'\'}
CUSTOM_TYPHA_DEPLOYMENT_YAML: |
${CUSTOM_TYPHA_DEPLOYMENT_YAML//\'/\'\'}
CONCURRENT_SERVICE_SYNCS: $(yaml-quote "${CONCURRENT_SERVICE_SYNCS:-}")
ETCD_RUNASUSER: $(yaml-quote "${ETCD_RUNASUSER:-2000}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is not set then this should be defaulted to 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already an etcd user as 2000, I'm afraid of defaulting it to 0 will end up with running as root again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the etcd user 2000 created?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find it anywhere in the code but when I ssh to the master node, I can see the etcd user by cat /etc/group.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

},
"runAsUser": {{runAsUser}},
"runAsGroup": {{runAsGroup}},
"runAsNonRoot": true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting it to true will prevent the pod run as user 0 which is root, not sure it's desired or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

sed -i -e "s@{{runAsUser}}@${ETCD_RUNASUSER}@g" "${temp_file}"
sed -i -e "s@{{runAsGroup}}@${ETCD_RUNASGROUP}@g" "${temp_file}"
chown -R ${ETCD_RUNASUSER}:${ETCD_RUNASGROUP} /var/etcd/
chown -R ${ETCD_RUNASUSER}:${ETCD_RUNASGROUP} /var/log/etcd.log
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already done on line 1893 and 1894

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -1878,10 +1889,15 @@ function start-etcd-servers {
if [[ -e /etc/init.d/etcd ]]; then
rm -f /etc/init.d/etcd
fi
prepare-log-file /var/log/etcd.log
if [[ -n "${ETCD_RUNASUSER:-}" && -n "${ETCD_RUNASGROUP:-}" ]]; then
prepare-log-file /var/log/etcd.log ${ETCD_RUNASUSER} ${ETCD_RUNASGROUP}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You only need to pass the user and not the group see:

prepare-log-file /var/log/kube-controller-manager.log ${KUBE_CONTROLLER_MANAGER_RUNASUSER}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 30, 2021
if [[ -n "${ETCD_RUNASUSER:-}" && -n "${ETCD_RUNASGROUP:-}" ]]; then
container_security_context=$(echo "{\"allowPrivilegeEscalation\": false, \"capabilities\": {\"drop\": [\"all\"]}}" | base64 | tr -d '\r\n')
else
container_security_context=$(echo "{\"allowPrivilegeEscalation\": false}" | base64 | tr -d '\r\n')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets just leave the securityContext alone if it is running as root?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

chown -R ${ETCD_RUNASUSER:-0}:${ETCD_RUNASGROUP:-0} /var/etcd/
# Replace capabilities
if [[ -n "${ETCD_RUNASUSER:-}" && -n "${ETCD_RUNASGROUP:-}" ]]; then
container_security_context=$(echo "{\"allowPrivilegeEscalation\": false, \"capabilities\": {\"drop\": [\"all\"]}}" | base64 | tr -d '\r\n')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need the base64 and the tr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -1858,6 +1858,18 @@ function prepare-etcd-manifest {
fi
# Replace the volume host path.
sed -i -e "s@/mnt/master-pd/var/etcd@/mnt/disks/master-pd/var/etcd@g" "${temp_file}"
# Replace the runAsUser and runAsGroup
sed -i -e "s@{{runAsUser}}@${ETCD_RUNASUSER:-0}@g" "${temp_file}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be safest to not edit the manifest if ETCD_RUNASUSER and ETCD_RUNASGROUP is not set. Basically if ETCD_RUNASUSER and ETCD_RUNASGROUP is not set then this change is a noop, WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to replace the {{runAsUser}} and {{runAsGroup}} in the manifest anyway isn't? I mean replace it to 0 if ETCD_RUNASUSER and ETCD_RUNASGROUP is not set.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do that? If we don't set it then it will anyways run as 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manifest is written as:

"runAsUser": {{runAsUser}},
"runAsGroup": {{runAsGroup}}

The {{runAsUser}} and group need to be replace at least with some value, or are we talking about different thing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I am talking about this, but I think we can make the manifest to be:-

 securityContext {
 ...
 {{runAsUser}}
 {{runAsGroup}}
 }

and replace {{runAsUser}} with "\"runAsUser\": ${ETCD_RUNASUSER}" and same for runAsGroup. That way we only insert it ETCD_RUNASUSER else we replace {{runAsUser}} with empty string.

Copy link
Contributor Author

@cindy52 cindy52 Apr 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to replace the entire securityContext as empty when it's root? Otherwise when it's root and runAsUser and runAsGroup is empty, the delimiter , at seccompProfile will be a problem.

"securityContext": {
    "seccompProfile": {
        "type": "RuntimeDefault"
    },
    {{runAsUser}}
    {{runAsGroup}}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm... you can add those before the seccompProfile 😃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

if [[ -n "${ETCD_RUNASUSER:-}" && -n "${ETCD_RUNASGROUP:-}" ]]; then
container_security_context=$(echo "{\"allowPrivilegeEscalation\": false, \"capabilities\": {\"drop\": [\"all\"]}}")
else
container_security_context=$(echo "{}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think instead of making the securityContext: {} when ETCD_RUNASUSER is not set, you can make it so that:-
instead of having

securityContext: {{security_context}}

You make it just:

{{containerSecurityContext}}

Then this would be

container_security_context=""
if [[ -n "${ETCD_RUNASUSER:-}" && -n "${ETCD_RUNASGROUP:-}" ]]; then
    container_security_context=$(echo "{\"allowPrivilegeEscalation\": false, \"capabilities\": {\"drop\": [\"all\"]}}")
fi
sed -i -e "s@{{security_context}}@${container_security_context}@g" "${temp_file}"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

pod_run_as_user="\"runAsUser\": ${ETCD_RUNASUSER},"
pod_run_as_group="\"runAsGroup\": ${{ETCD_RUNASGROUP},"
container_security_context="\"securityContext\": {\"allowPrivilegeEscalation\": false, \"capabilities\": {\"drop\": [\"all\"]}},"
chown -R ${ETCD_RUNASUSER}:${ETCD_RUNASGROUP} /var/etcd/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really like to avoid "chown" in the prepare manifests function.

Can we move this one to start-etcd-servers (also given that /var/etcd is actually shared between the two etcd's)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@wojtek-t
Copy link
Member

wojtek-t commented Apr 7, 2021

I'm mostly fine with this modulo the comment. But I would like @ptabor to also take a look at it.

@@ -26,7 +29,7 @@
"command": [
"/bin/sh",
"-c",
"set -o errexit; if [ -e /usr/local/bin/migrate-if-needed.sh ]; then /usr/local/bin/migrate-if-needed.sh 1>>/var/log/etcd{{ suffix }}.log 2>&1; fi; exec /usr/local/bin/etcd --name etcd-{{ hostname }} --listen-peer-urls {{ etcd_protocol }}://{{ host_ip }}:{{ server_port }} --initial-advertise-peer-urls {{ etcd_protocol }}://{{ hostname }}:{{ server_port }} --advertise-client-urls {{ etcd_apiserver_protocol }}://127.0.0.1:{{ port }} --listen-client-urls {{ etcd_apiserver_protocol }}://{{ listen_client_ip }}:{{ port }} {{ quota_bytes }} --data-dir /var/etcd/data{{ suffix }} --initial-cluster-state {{ cluster_state }} --initial-cluster {{ etcd_cluster }} {{ etcd_creds }} {{ etcd_apiserver_creds }} {{ etcd_extra_args }} 1>>/var/log/etcd{{ suffix }}.log 2>&1"
"set -o errexit; if [ -e /usr/local/bin/migrate-if-needed.sh ]; then /usr/local/bin/migrate-if-needed.sh 1>>/var/log/etcd{{ suffix }}.log 2>&1; fi; exec /usr/local/bin/etcd-{{ pillar.get('etcd_version', '3.4.13') }} --name etcd-{{ hostname }} --listen-peer-urls {{ etcd_protocol }}://{{ host_ip }}:{{ server_port }} --initial-advertise-peer-urls {{ etcd_protocol }}://{{ hostname }}:{{ server_port }} --advertise-client-urls {{ etcd_apiserver_protocol }}://127.0.0.1:{{ port }} --listen-client-urls {{ etcd_apiserver_protocol }}://{{ listen_client_ip }}:{{ port }} {{ quota_bytes }} --data-dir /var/etcd/data{{ suffix }} --initial-cluster-state {{ cluster_state }} --initial-cluster {{ etcd_cluster }} {{ etcd_creds }} {{ etcd_apiserver_creds }} {{ etcd_extra_args }} 1>>/var/log/etcd{{ suffix }}.log 2>&1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this? I assume that always a 'target' image is declared by the manifest,
and the .../bin/etcd in the image is the 'target' version within the image.

Long-term I wish we get rid of 'specific-version' binaries in the image... so we shouldn't add dependencies on this naming convention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's needed because in runMigrate, if the copy binary flag is true, it will copy the specific version of etcd and etcdclt to /etcd and /etcdclt https://github.com/kubernetes/kubernetes/blob/master/cluster/images/etcd/migrate/migrate.go#L77. As nonroot, we set the copy binary as false, thus we need to specify the version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see bellow - I think ./etcd is already in the image and does not requires copying.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -65,7 +72,7 @@
"command": [
"/bin/sh",
"-c",
"set -x; exec /usr/local/bin/etcdctl --endpoints=127.0.0.1:{{ port }} {{ etcdctl_certs }} --command-timeout=15s endpoint health"
"set -x; exec /usr/local/bin/etcdctl-{{ pillar.get('etcd_version', '3.4.13') }} --endpoints=127.0.0.1:{{ port }} {{ etcdctl_certs }} --command-timeout=15s endpoint health"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto. E.g. listed binaries in the 3.4.9 (gcr.io/google-containers/etcd-amd64:3.4.9) image:

-r-xr-xr-x  1 ptab primarygroup 23827424 Jun 25  2020 etcd
-r-xr-xr-x  1 ptab primarygroup 20186048 Jun 25  2020 etcd-3.0.17
-r-xr-xr-x  1 ptab primarygroup 16352288 Jun 25  2020 etcd-3.1.12
-r-xr-xr-x  1 ptab primarygroup 17899872 Jun 25  2020 etcd-3.2.24
-r-xr-xr-x  1 ptab primarygroup 22102784 Jun 25  2020 etcd-3.3.17
-r-xr-xr-x  1 ptab primarygroup 23827424 Jun 25  2020 etcd-3.4.9
-r-xr-xr-x  1 ptab primarygroup 17612384 Jun 25  2020 etcdctl
-r-xr-xr-x  1 ptab primarygroup 18468640 Jun 25  2020 etcdctl-3.0.17
-r-xr-xr-x  1 ptab primarygroup 14319264 Jun 25  2020 etcdctl-3.1.12
-r-xr-xr-x  1 ptab primarygroup 15279616 Jun 25  2020 etcdctl-3.2.24
-r-xr-xr-x  1 ptab primarygroup 17770784 Jun 25  2020 etcdctl-3.3.17
-r-xr-xr-x  1 ptab primarygroup 17612384 Jun 25  2020 etcdctl-3.4.9

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@wojtek-t
Copy link
Member

wojtek-t commented Apr 8, 2021

I will wait for @ptabor to take another look, but I'm generally fine with it.

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 8, 2021
@vinayakankugoyal
Copy link
Contributor

/ok-to-test

Instead of re-running everything consider re-running the only the ones that failed. You can do this by commenting /retest. Or by /test <test-name>

Co-authored-by: Vinayak Goyal <vinayakankugoyal@gmail.com>
@cindy52
Copy link
Contributor Author

cindy52 commented Apr 8, 2021

/retest

@cindy52
Copy link
Contributor Author

cindy52 commented Apr 8, 2021

/assign @dchen1107
/assign @tallclair

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 9, 2021
@wojtek-t
Copy link
Member

wojtek-t commented Apr 9, 2021

/triage accepted
/priority important-soon

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 9, 2021
@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 9, 2021
@wojtek-t
Copy link
Member

wojtek-t commented Apr 9, 2021

/triage accepted
/priority important-soon

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cindy52, ptabor, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ptabor
Copy link
Contributor

ptabor commented Apr 9, 2021

/lgtm

@k8s-ci-robot
Copy link
Contributor

@ptabor: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot merged commit 5b038e6 into kubernetes:master Apr 9, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Apr 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/gcp Issues or PRs related to gcp provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants