Run the etcd as non-root #100635
Welcome @cindy52!
Hi @cindy52. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
cluster/gce/util.sh
Outdated
@@ -1254,6 +1254,8 @@ ${CUSTOM_CALICO_NODE_DAEMONSET_YAML//\'/\'\'}
 CUSTOM_TYPHA_DEPLOYMENT_YAML: |
 ${CUSTOM_TYPHA_DEPLOYMENT_YAML//\'/\'\'}
 CONCURRENT_SERVICE_SYNCS: $(yaml-quote "${CONCURRENT_SERVICE_SYNCS:-}")
+ETCD_RUNASUSER: $(yaml-quote "${ETCD_RUNASUSER:-2000}")
If this is not set, it should default to 0.
There is already an etcd user with ID 2000. I'm afraid that defaulting to 0 would end up running etcd as root again.
Where is the etcd user 2000 created?
I can't find it anywhere in the code, but when I SSH to the master node, I can see the etcd user with cat /etc/group.
Fixed.
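The defaulting question in this thread comes down to shell parameter expansion. As a minimal sketch (the helper name etcd_uid_or_default is hypothetical and not part of this PR), falling back to 0 keeps the old run-as-root behaviour unless ETCD_RUNASUSER is explicitly set:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): print the UID etcd would run as.
# ${ETCD_RUNASUSER:-0} falls back to 0 (root) when the variable is unset
# or empty, so nothing changes unless the operator opts in.
etcd_uid_or_default() {
  echo "${ETCD_RUNASUSER:-0}"
}

unset ETCD_RUNASUSER
etcd_uid_or_default   # falls back to the root UID

ETCD_RUNASUSER=2000
etcd_uid_or_default   # uses the explicitly configured UID
```

With a default of 2000 instead, every cluster would switch to non-root even where the operator never asked for it, which seems to be the reviewer's concern.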
cluster/gce/manifests/etcd.manifest
Outdated
},
"runAsUser": {{runAsUser}},
"runAsGroup": {{runAsGroup}},
"runAsNonRoot": true
not really needed?
Setting it to true prevents the pod from running as user 0 (root). I'm not sure whether that is desired.
Fixed.
cluster/gce/gci/configure-helper.sh
Outdated
sed -i -e "s@{{runAsUser}}@${ETCD_RUNASUSER}@g" "${temp_file}"
sed -i -e "s@{{runAsGroup}}@${ETCD_RUNASGROUP}@g" "${temp_file}"
chown -R ${ETCD_RUNASUSER}:${ETCD_RUNASGROUP} /var/etcd/
chown -R ${ETCD_RUNASUSER}:${ETCD_RUNASGROUP} /var/log/etcd.log
This is already done on lines 1893 and 1894.
Fixed
cluster/gce/gci/configure-helper.sh
Outdated
@@ -1878,10 +1889,15 @@ function start-etcd-servers {
 if [[ -e /etc/init.d/etcd ]]; then
 rm -f /etc/init.d/etcd
 fi
-prepare-log-file /var/log/etcd.log
+if [[ -n "${ETCD_RUNASUSER:-}" && -n "${ETCD_RUNASGROUP:-}" ]]; then
+prepare-log-file /var/log/etcd.log ${ETCD_RUNASUSER} ${ETCD_RUNASGROUP}
You only need to pass the user, not the group. See:
kubernetes/cluster/gce/gci/configure-helper.sh
Line 2029 in 816bdd3
prepare-log-file /var/log/kube-controller-manager.log ${KUBE_CONTROLLER_MANAGER_RUNASUSER}
Fixed.
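To illustrate why a single owner argument can be enough, here is a rough sketch of a log-file helper with an optional owner. It is an assumption for illustration only: prepare_log_file_sketch is a hypothetical name, and this is not the real prepare-log-file from configure-helper.sh, whose defaults may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real prepare-log-file helper.
# $1: path of the log file to create
# $2: optional owner UID; defaults to the UID of the calling user
prepare_log_file_sketch() {
  local log_path="$1"
  local owner="${2:-$(id -u)}"
  touch "${log_path}"             # create the file if it does not exist
  chmod 644 "${log_path}"         # owner read/write, everyone else read-only
  chown "${owner}" "${log_path}"  # only the user changes; the group is left alone
}
```

A caller would then pass only the user, for example prepare_log_file_sketch /var/log/etcd.log "${ETCD_RUNASUSER}", mirroring the kube-controller-manager call quoted above.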
cluster/gce/gci/configure-helper.sh
Outdated
if [[ -n "${ETCD_RUNASUSER:-}" && -n "${ETCD_RUNASGROUP:-}" ]]; then
container_security_context=$(echo "{\"allowPrivilegeEscalation\": false, \"capabilities\": {\"drop\": [\"all\"]}}" | base64 | tr -d '\r\n')
else
container_security_context=$(echo "{\"allowPrivilegeEscalation\": false}" | base64 | tr -d '\r\n')
Let's just leave the securityContext alone if it is running as root?
Fixed
cluster/gce/gci/configure-helper.sh
Outdated
chown -R ${ETCD_RUNASUSER:-0}:${ETCD_RUNASGROUP:-0} /var/etcd/
# Replace capabilities
if [[ -n "${ETCD_RUNASUSER:-}" && -n "${ETCD_RUNASGROUP:-}" ]]; then
container_security_context=$(echo "{\"allowPrivilegeEscalation\": false, \"capabilities\": {\"drop\": [\"all\"]}}" | base64 | tr -d '\r\n')
do we really need the base64 and the tr?
Fixed.
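As a sketch of why the encoding step may be unnecessary: plain JSON can be substituted directly by sed as long as the replacement text contains none of the characters sed treats specially there (the chosen delimiter @, an ampersand, a backslash, or a newline). The function name and the one-line template below are assumptions made for this illustration, not code from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: substitute a JSON fragment into a template line
# without base64. This works because the JSON contains no '@', '&',
# backslash, or newline, which are the characters that would need
# escaping in a sed replacement that uses '@' as its delimiter.
substitute_security_context() {
  local template="$1"
  local ctx='{"allowPrivilegeEscalation": false, "capabilities": {"drop": ["all"]}}'
  printf '%s\n' "${template}" | sed -e "s@{{security_context}}@${ctx}@g"
}

substitute_security_context '"securityContext": {{security_context}}'
```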
cluster/gce/gci/configure-helper.sh
Outdated
@@ -1858,6 +1858,18 @@ function prepare-etcd-manifest {
 fi
 # Replace the volume host path.
 sed -i -e "s@/mnt/master-pd/var/etcd@/mnt/disks/master-pd/var/etcd@g" "${temp_file}"
+# Replace the runAsUser and runAsGroup
+sed -i -e "s@{{runAsUser}}@${ETCD_RUNASUSER:-0}@g" "${temp_file}"
I think it would be safest not to edit the manifest if ETCD_RUNASUSER and ETCD_RUNASGROUP are not set. Basically, if ETCD_RUNASUSER and ETCD_RUNASGROUP are not set, then this change is a no-op. WDYT?
We have to replace {{runAsUser}} and {{runAsGroup}} in the manifest anyway, don't we? I mean, replace them with 0 if ETCD_RUNASUSER and ETCD_RUNASGROUP are not set.
Why do that? If we don't set it, it will run as 0 anyway.
The manifest is written as:
"runAsUser": {{runAsUser}},
"runAsGroup": {{runAsGroup}}
{{runAsUser}} and {{runAsGroup}} need to be replaced with at least some value, or are we talking about different things?
No, I am talking about this, but I think we can make the manifest look like:
securityContext {
...
{{runAsUser}}
{{runAsGroup}}
}
and replace {{runAsUser}} with "\"runAsUser\": ${ETCD_RUNASUSER}", and likewise for runAsGroup. That way we only insert the field if ETCD_RUNASUSER is set; otherwise we replace {{runAsUser}} with an empty string.
Do we want to replace the entire securityContext with an empty value when it's root? Otherwise, when it's root and runAsUser and runAsGroup are empty, the trailing comma after seccompProfile will be a problem:
"securityContext": {
"seccompProfile": {
"type": "RuntimeDefault"
},
{{runAsUser}}
{{runAsGroup}}
}
Umm... you can add those before the seccompProfile 😃
fixed.
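The approach agreed on in this thread can be sketched roughly as follows. The function name render_security_context and the inline one-line template are assumptions made for illustration; the real manifest is a multi-line file edited in place. The point is that the optional placeholders sit before seccompProfile and each inserted fragment carries its own trailing comma, so the JSON stays valid whether or not the variables are set.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the placeholder scheme discussed above.
# When both variables are set, each placeholder becomes a complete
# "key": value, fragment; otherwise both become empty strings.
render_security_context() {
  local run_as_user=""
  local run_as_group=""
  if [ -n "${ETCD_RUNASUSER:-}" ] && [ -n "${ETCD_RUNASGROUP:-}" ]; then
    run_as_user="\"runAsUser\": ${ETCD_RUNASUSER},"
    run_as_group="\"runAsGroup\": ${ETCD_RUNASGROUP},"
  fi
  # Placeholders come BEFORE seccompProfile, so no dangling comma is
  # left behind when they are replaced with nothing.
  local template='"securityContext": { {{runAsUser}} {{runAsGroup}} "seccompProfile": { "type": "RuntimeDefault" } }'
  printf '%s\n' "${template}" \
    | sed -e "s@{{runAsUser}}@${run_as_user}@g" \
          -e "s@{{runAsGroup}}@${run_as_group}@g"
}

ETCD_RUNASUSER=2000 ETCD_RUNASGROUP=2000
render_security_context
```

With both variables unset, the same call leaves only the seccompProfile entry inside the braces, which is the no-op behaviour the reviewer asked for.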
cluster/gce/gci/configure-helper.sh
Outdated
if [[ -n "${ETCD_RUNASUSER:-}" && -n "${ETCD_RUNASGROUP:-}" ]]; then
container_security_context=$(echo "{\"allowPrivilegeEscalation\": false, \"capabilities\": {\"drop\": [\"all\"]}}")
else
container_security_context=$(echo "{}")
I think that instead of setting securityContext: {} when ETCD_RUNASUSER is not set, you can change the manifest. Instead of having
securityContext: {{security_context}}
make it just
{{containerSecurityContext}}
Then this would be:
if [[ -n "${ETCD_RUNASUSER:-}" && -n "${ETCD_RUNASGROUP:-}" ]]; then
container_security_context=$(echo "{\"allowPrivilegeEscalation\": false, \"capabilities\": {\"drop\": [\"all\"]}}")
fi
sed -i -e "s@{{security_context}}@${container_security_context}@g" "${temp_file}"
fixed.
cluster/gce/gci/configure-helper.sh
Outdated
pod_run_as_user="\"runAsUser\": ${ETCD_RUNASUSER},"
pod_run_as_group="\"runAsGroup\": ${ETCD_RUNASGROUP},"
container_security_context="\"securityContext\": {\"allowPrivilegeEscalation\": false, \"capabilities\": {\"drop\": [\"all\"]}},"
chown -R ${ETCD_RUNASUSER}:${ETCD_RUNASGROUP} /var/etcd/
I would really like to avoid "chown" in the prepare-manifest function.
Can we move this one to start-etcd-servers (especially since /var/etcd is actually shared between the two etcd instances)?
Fixed.
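A rough sketch of what the moved ownership change could look like. The function name chown_etcd_data_dir is hypothetical, and the real change in start-etcd-servers may be structured differently. Because the data directory is shared by both etcd instances, the sketch changes ownership once, outside manifest preparation, and does nothing when the variables are unset.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: hand the shared data directory to the non-root
# user once, before the etcd instances start. When either variable is
# unset or empty, ownership is left untouched.
chown_etcd_data_dir() {
  local data_dir="$1"
  if [ -n "${ETCD_RUNASUSER:-}" ] && [ -n "${ETCD_RUNASGROUP:-}" ]; then
    chown -R "${ETCD_RUNASUSER}:${ETCD_RUNASGROUP}" "${data_dir}"
  fi
}
```

In the PR's context the argument would be /var/etcd/, the path used in the diff above.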
I'm mostly fine with this modulo the comment. But I would like @ptabor to also take a look at it.
cluster/gce/manifests/etcd.manifest
Outdated
@@ -26,7 +29,7 @@
 "command": [
 "/bin/sh",
 "-c",
-"set -o errexit; if [ -e /usr/local/bin/migrate-if-needed.sh ]; then /usr/local/bin/migrate-if-needed.sh 1>>/var/log/etcd{{ suffix }}.log 2>&1; fi; exec /usr/local/bin/etcd --name etcd-{{ hostname }} --listen-peer-urls {{ etcd_protocol }}://{{ host_ip }}:{{ server_port }} --initial-advertise-peer-urls {{ etcd_protocol }}://{{ hostname }}:{{ server_port }} --advertise-client-urls {{ etcd_apiserver_protocol }}://127.0.0.1:{{ port }} --listen-client-urls {{ etcd_apiserver_protocol }}://{{ listen_client_ip }}:{{ port }} {{ quota_bytes }} --data-dir /var/etcd/data{{ suffix }} --initial-cluster-state {{ cluster_state }} --initial-cluster {{ etcd_cluster }} {{ etcd_creds }} {{ etcd_apiserver_creds }} {{ etcd_extra_args }} 1>>/var/log/etcd{{ suffix }}.log 2>&1"
+"set -o errexit; if [ -e /usr/local/bin/migrate-if-needed.sh ]; then /usr/local/bin/migrate-if-needed.sh 1>>/var/log/etcd{{ suffix }}.log 2>&1; fi; exec /usr/local/bin/etcd-{{ pillar.get('etcd_version', '3.4.13') }} --name etcd-{{ hostname }} --listen-peer-urls {{ etcd_protocol }}://{{ host_ip }}:{{ server_port }} --initial-advertise-peer-urls {{ etcd_protocol }}://{{ hostname }}:{{ server_port }} --advertise-client-urls {{ etcd_apiserver_protocol }}://127.0.0.1:{{ port }} --listen-client-urls {{ etcd_apiserver_protocol }}://{{ listen_client_ip }}:{{ port }} {{ quota_bytes }} --data-dir /var/etcd/data{{ suffix }} --initial-cluster-state {{ cluster_state }} --initial-cluster {{ etcd_cluster }} {{ etcd_creds }} {{ etcd_apiserver_creds }} {{ etcd_extra_args }} 1>>/var/log/etcd{{ suffix }}.log 2>&1"
Do we need this? I assume that the manifest always declares a 'target' image, and that .../bin/etcd in the image is the 'target' version within that image.
Long-term, I would like us to get rid of 'specific-version' binaries in the image, so we shouldn't add dependencies on this naming convention.
It's needed because in runMigrate, if the copy binary flag is true, it copies the specific version of etcd and etcdctl to /etcd and /etcdctl (https://github.com/kubernetes/kubernetes/blob/master/cluster/images/etcd/migrate/migrate.go#L77). As non-root, we set the copy binary flag to false, so we need to specify the version.
Please see below: I think ./etcd is already in the image and does not require copying.
Fixed.
cluster/gce/manifests/etcd.manifest
Outdated
@@ -65,7 +72,7 @@
 "command": [
 "/bin/sh",
 "-c",
-"set -x; exec /usr/local/bin/etcdctl --endpoints=127.0.0.1:{{ port }} {{ etcdctl_certs }} --command-timeout=15s endpoint health"
+"set -x; exec /usr/local/bin/etcdctl-{{ pillar.get('etcd_version', '3.4.13') }} --endpoints=127.0.0.1:{{ port }} {{ etcdctl_certs }} --command-timeout=15s endpoint health"
Ditto. For example, these are the binaries listed in the 3.4.9 image (gcr.io/google-containers/etcd-amd64:3.4.9):
-r-xr-xr-x 1 ptab primarygroup 23827424 Jun 25 2020 etcd
-r-xr-xr-x 1 ptab primarygroup 20186048 Jun 25 2020 etcd-3.0.17
-r-xr-xr-x 1 ptab primarygroup 16352288 Jun 25 2020 etcd-3.1.12
-r-xr-xr-x 1 ptab primarygroup 17899872 Jun 25 2020 etcd-3.2.24
-r-xr-xr-x 1 ptab primarygroup 22102784 Jun 25 2020 etcd-3.3.17
-r-xr-xr-x 1 ptab primarygroup 23827424 Jun 25 2020 etcd-3.4.9
-r-xr-xr-x 1 ptab primarygroup 17612384 Jun 25 2020 etcdctl
-r-xr-xr-x 1 ptab primarygroup 18468640 Jun 25 2020 etcdctl-3.0.17
-r-xr-xr-x 1 ptab primarygroup 14319264 Jun 25 2020 etcdctl-3.1.12
-r-xr-xr-x 1 ptab primarygroup 15279616 Jun 25 2020 etcdctl-3.2.24
-r-xr-xr-x 1 ptab primarygroup 17770784 Jun 25 2020 etcdctl-3.3.17
-r-xr-xr-x 1 ptab primarygroup 17612384 Jun 25 2020 etcdctl-3.4.9
Fixed.
I will wait for @ptabor to take another look, but I'm generally fine with it. /ok-to-test
Instead of re-running everything, consider re-running only the ones that failed. You can do this by commenting
Co-authored-by: Vinayak Goyal <vinayakankugoyal@gmail.com>
/retest
/assign @dchen1107
/triage accepted /lgtm
/triage accepted /lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cindy52, ptabor, wojtek-t. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/lgtm
@ptabor: changing LGTM is restricted to collaborators.
What type of PR is this?
/kind feature